Posts Tagged ‘CloudComputing’
[DevoxxPL2022] Challenges Running Planet-Wide Computer: Efficiency • Jacek Bzdak, Beata Strack
Jacek Bzdak and Beata Strack, software engineers at Google Poland, delivered an engaging session at Devoxx Poland 2022, exploring the intricacies of optimizing Google’s planet-scale computing infrastructure. Their talk focused on achieving efficiency in a distributed system spanning global data centers, emphasizing resource utilization, auto-scaling, and operational strategies. By sharing insights from Google’s internal cloud and Autopilot system, Jacek and Beata provided a blueprint for enhancing service performance while navigating the complexities of large-scale computing.
Defining Efficiency in a Global Fleet
Beata opened by framing Google’s data centers as a single “planet-wide computer,” where efficiency translates to minimizing operational costs: servers, CPU, memory, data centers, and electricity. Key metrics such as fleet-wide utilization, CPU/RAM allocation, and growth rate serve as proxies for these costs, though imperfectly: a healthy-looking utilization number can mask quality issues such as inflated memory usage. Beata stressed that efficiency begins at the service level, where individual jobs must optimize their resource consumption, and extends to the fleet through an ecosystem that maximizes resource sharing. This dual approach ensures that savings at the micro level scale globally, a principle applicable even to smaller organizations.
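To make these proxies concrete, here is a minimal Kotlin sketch of the utilization and slack calculations; the record fields are hypothetical and not Google’s internal schema.

```kotlin
// Hypothetical job record; field names are illustrative, not Google's schema.
data class Job(
    val cpuUsed: Double, val cpuReserved: Double,
    val ramUsedGb: Double, val ramReservedGb: Double
)

// Fleet-wide utilization: resources actually used divided by resources reserved.
fun cpuUtilization(fleet: List<Job>): Double =
    fleet.sumOf { it.cpuUsed } / fleet.sumOf { it.cpuReserved }

// Slack: the share of a reservation sitting idle; the gap auto-scaling tries to close.
fun memorySlack(job: Job): Double = 1.0 - job.ramUsedGb / job.ramReservedGb

fun main() {
    val fleet = listOf(
        Job(cpuUsed = 2.0, cpuReserved = 4.0, ramUsedGb = 6.0, ramReservedGb = 8.0),
        Job(cpuUsed = 1.0, cpuReserved = 4.0, ramUsedGb = 2.0, ramReservedGb = 8.0)
    )
    println("Fleet CPU utilization: ${cpuUtilization(fleet)}") // 0.375
    println("Memory slack of job 0: ${memorySlack(fleet[0])}") // 0.25
}
```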
Auto-Scaling: Balancing Utilization and Reliability
Jacek, a member of Google’s Autopilot team, delved into auto-scaling, a critical mechanism for achieving high utilization without compromising reliability. Autopilot’s vertical scaling adjusts resource limits (CPU/memory) for a fixed set of replicas, while horizontal scaling modifies the replica count. Jacek presented data from the published Autopilot paper showing that auto-scaled services keep memory slack below 20% at the median, compared to over 60% for manually managed services. Crucially, automation reduces outage risk by adjusting limits dynamically, as demonstrated by a real-world case in which Autopilot preempted a memory-induced crash. However, auto-scaling introduces complexity, particularly feedback loops: overzealous caching or load shedding can destabilize resource allocation, requiring careful integration with application-specific metrics.
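The Autopilot paper describes exponentially decaying histograms and per-resource policies; the sketch below captures only the basic percentile-plus-margin idea behind vertical scaling, with invented window, percentile, and margin values.

```kotlin
import kotlin.math.ceil

// Sketch of percentile-plus-margin vertical scaling. The real Autopilot uses
// decaying histograms and per-resource policies; the percentile and margin
// here are invented for illustration.
fun recommendLimit(
    recentUsage: List<Double>,
    percentile: Double = 0.95,
    margin: Double = 1.15
): Double {
    require(recentUsage.isNotEmpty()) { "need at least one usage sample" }
    val sorted = recentUsage.sorted()
    val index = (ceil(percentile * sorted.size).toInt() - 1).coerceIn(0, sorted.size - 1)
    return sorted[index] * margin
}

fun main() {
    // The recommendation tracks recent behaviour, unlike a hand-set limit
    // sized for a worst case that nobody dares to lower afterwards.
    val memoryUsageMb = listOf(610.0, 640.0, 655.0, 700.0, 710.0)
    println("Recommended limit: ${recommendLimit(memoryUsageMb)} MB") // 816.5 MB
}
```

The feedback loops Jacek warned about appear as soon as the application reacts to the limit the scaler sets, for instance by shrinking a cache, which shrinks usage, which shrinks the next recommendation in turn.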
Java-Specific Challenges in Auto-Scaling
The talk transitioned to language-specific hurdles, with Jacek highlighting Java’s unique challenges in auto-scaling environments. Just-in-Time (JIT) compilation spikes CPU usage during application startup, complicating horizontal scaling decisions. Memory management poses further issues: the JVM’s heap size is fixed at startup, and out-of-memory conditions may be masked by garbage collection (GC) thrashing, where excessive CPU is devoted to GC rather than request handling. To address this, Google sets static heap sizes and auto-scales non-heap memory, though Jacek envisioned a future in which Java aligns with other languages, eliminating heap-specific configuration. These insights underscore the need for language-aware auto-scaling strategies in heterogeneous environments.
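Google’s internal tooling is not public, but the heap/non-heap split Jacek described can be observed with standard JVM management APIs, as in this rough Kotlin sketch:

```kotlin
import java.lang.management.ManagementFactory

// Observing the heap/non-heap split with standard JVM APIs. The heap is
// bounded up front (e.g. via -Xmx) and is effectively static, while non-heap
// memory (metaspace, code cache, thread stacks, ...) varies at runtime and is
// the part an autoscaler can meaningfully track.
fun main() {
    val memory = ManagementFactory.getMemoryMXBean()
    val heap = memory.heapMemoryUsage
    val nonHeap = memory.nonHeapMemoryUsage

    println("Heap: used=${heap.used} max=${heap.max}")
    println("Non-heap: used=${nonHeap.used} committed=${nonHeap.committed}")

    // A crude GC-thrashing signal: cumulative GC time across all collectors.
    // If this grows almost as fast as wall-clock time, the process is
    // collecting garbage instead of serving requests.
    val gcMillis = ManagementFactory.getGarbageCollectorMXBeans().sumOf { it.collectionTime }
    println("Total GC time: $gcMillis ms")
}
```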
Operational Strategies for Resource Reclamation
Beata concluded by discussing operational techniques like overcommit and workload colocation to reclaim unused resources. Overcommit leverages the low probability of simultaneous resource spikes across unrelated services, allowing Google to pack more workloads onto machines. Colocating high-priority serving jobs with lower-priority batch workloads enables resource reclamation, with batch tasks evicted when serving jobs demand capacity. A 2015 experiment demonstrated significant machine savings through colocation, a concept influencing Kubernetes’ design. These strategies, combined with auto-scaling, create a robust framework for efficiency, though they demand rigorous isolation to prevent interference between workloads.
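Borg’s real scheduler is far more sophisticated, but a toy sketch can illustrate the statistical intuition behind overcommit: unrelated services rarely peak simultaneously, so a machine can safely host workloads whose declared limits sum to more than its capacity. All numbers and rules below are invented.

```kotlin
// Toy illustration of overcommit and colocation; the packing rule and numbers
// are invented and bear no resemblance to Borg's real scheduler.
data class Workload(
    val name: String,
    val limitCpu: Double,        // what the job reserved
    val expectedPeakCpu: Double, // what it is actually likely to use
    val evictable: Boolean       // batch work that can yield to serving jobs
)

// Pack by expected peak plus a safety buffer, not by the sum of declared limits.
fun fits(machineCpu: Double, packed: List<Workload>, candidate: Workload): Boolean {
    val safetyBuffer = 0.9
    val projectedPeak = packed.sumOf { it.expectedPeakCpu } + candidate.expectedPeakCpu
    return projectedPeak <= machineCpu * safetyBuffer
}

// If serving jobs do spike together, evictable batch work is reclaimed first.
fun evictionOrder(packed: List<Workload>): List<Workload> =
    packed.sortedByDescending { it.evictable }

fun main() {
    val machineCpu = 16.0
    val packed = mutableListOf<Workload>()
    val incoming = listOf(
        Workload("serving-frontend", limitCpu = 8.0, expectedPeakCpu = 5.0, evictable = false),
        Workload("serving-api", limitCpu = 8.0, expectedPeakCpu = 4.0, evictable = false),
        Workload("batch-log-crunch", limitCpu = 6.0, expectedPeakCpu = 3.0, evictable = true)
    )
    for (w in incoming) if (fits(machineCpu, packed, w)) packed += w
    // Declared limits sum to 22 CPUs on a 16-CPU machine, yet all three fit.
    println(packed.map { it.name })
    println(evictionOrder(packed).first().name) // batch-log-crunch goes first
}
```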
[PHPForumParis2021] Migrating a Bank-as-a-Service to Serverless – Louis Pinsard
Louis Pinsard, an engineering manager at Theodo, captivated the Forum PHP 2021 audience with a detailed recounting of his journey migrating a Bank-as-a-Service platform to a serverless architecture. Having returned to PHP after a hiatus, Louis shared his experience leveraging AWS serverless technologies to enhance scalability and reliability in a high-stakes financial environment. His narrative, rich with practical insights, illuminated the challenges and triumphs of modernizing critical systems. This post explores four key themes: the rationale for serverless, leveraging AWS tools, simplifying with Bref, and addressing migration challenges.
The Rationale for Serverless
Louis Pinsard opened by explaining the motivation behind adopting a serverless architecture for a Bank-as-a-Service platform at Theodo. Traditional server-based systems struggled with scalability and maintenance under the unpredictable demands of financial transactions. Serverless, with its pay-per-use model and automatic scaling, offered a solution to handle variable workloads efficiently. Louis highlighted how this approach reduced infrastructure management overhead, allowing his team to focus on business logic and deliver a robust, cost-effective platform.
Leveraging AWS Tools
A significant portion of Louis’s talk focused on the use of AWS services like Lambda and SQS to build a resilient system. He described how Lambda functions enabled event-driven processing, while SQS managed asynchronous message queues to handle transaction retries seamlessly. By integrating these tools, Louis’s team at Theodo ensured high availability and fault tolerance, critical for financial applications. His practical examples demonstrated how AWS’s native services simplified complex workflows, enhancing the platform’s performance and reliability.
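Louis’s platform is written in PHP; to keep this article’s examples in a single language, the sketch below shows the same Lambda-plus-SQS retry pattern using AWS’s official Java/Kotlin Lambda interfaces, with an invented payload and handler.

```kotlin
import com.amazonaws.services.lambda.runtime.Context
import com.amazonaws.services.lambda.runtime.RequestHandler
import com.amazonaws.services.lambda.runtime.events.SQSEvent

// Sketch of the Lambda + SQS pattern Louis described, using the official
// aws-lambda-java-core and aws-lambda-java-events interfaces. The payload
// shape and processing logic are invented for illustration.
class TransactionHandler : RequestHandler<SQSEvent, Unit> {
    override fun handleRequest(event: SQSEvent, context: Context) {
        for (message in event.records) {
            // If the handler throws, unconsumed messages become visible again
            // after the queue's visibility timeout and SQS retries them;
            // repeatedly failing messages can be routed to a dead-letter queue.
            processTransaction(message.body)
        }
    }

    private fun processTransaction(payload: String) {
        // Business logic would live here, e.g. posting the transaction to a ledger.
        println("Processing transaction: $payload")
    }
}
```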
Simplifying with Bref
Louis discussed the role of Bref, a PHP framework for serverless applications, in streamlining the migration process. While initially hesitant due to concerns about complexity, he found Bref to be a lightweight layer over AWS, making it nearly transparent for developers familiar with serverless concepts. Louis emphasized that Bref’s simplicity allowed his team to deploy PHP code efficiently, reducing the learning curve and enabling rapid development without sacrificing robustness, even in a demanding financial context.
Addressing Migration Challenges
Concluding his presentation, Louis addressed the challenges of migrating a legacy system to serverless, including team upskilling and managing dependencies. He shared how his team adopted AWS CloudFormation for infrastructure-as-code, simplifying deployments. Responding to an audience question, Louis noted that Bref’s minimal overhead made it a viable choice over native AWS SDKs for PHP developers. His insights underscored the importance of strategic planning and incremental adoption to ensure a smooth transition, offering valuable lessons for similar projects.
[KotlinConf2019] Kotless: A Kotlin-Native Approach to Serverless with Vladislav Tankov
Serverless computing has revolutionized how applications are deployed and scaled, but it often comes with its own set of complexities, including managing deployment DSLs like Terraform or CloudFormation. Vladislav Tankov, then a Software Developer at JetBrains, introduced Kotless at KotlinConf 2019 as a Kotlin Serverless Framework designed to simplify this landscape. Kotless aims to eliminate the need for external deployment DSLs by allowing developers to define serverless applications—including REST APIs and event handling—directly within their Kotlin code using familiar annotations. The project can be found on GitHub at github.com/JetBrains/kotless.
Vladislav’s presentation provided an overview of the Kotless Client API, demonstrated its use with a simple example, and delved into the architecture and design concepts behind its code-to-deployment pipeline. The core promise of Kotless is to make serverless computations easily understandable for anyone familiar with event-based architectures, particularly those comfortable with JAX-RS-like annotations.
Simplifying Serverless Deployment with Kotlin Annotations
The primary innovation of Kotless, as highlighted by Vladislav Tankov, is its ability to interpret Kotlin code and annotations to automatically generate the necessary deployment configurations for cloud providers like AWS (initially). Instead of writing separate configuration files in YAML or other DSLs, developers can define their serverless functions, API gateways, permissions, and scheduled events using Kotlin annotations directly on their functions and classes.
For example, creating a REST API endpoint could be as simple as annotating a Kotlin function with @Get("/mypath"). Kotless then parses these annotations during the build process and generates the required infrastructure definitions, deploys the lambdas, and configures the API Gateway. This approach significantly reduces boilerplate and the cognitive load associated with learning and maintaining separate infrastructure-as-code tools. Vladislav emphasized that a developer only needs familiarity with these annotations to create and deploy a serverless REST API application.
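Based on Kotless’s documented DSL from that period, a complete endpoint looks roughly like this:

```kotlin
import io.kotless.dsl.lang.http.Get

// A complete Kotless HTTP endpoint: the annotation is the only "deployment
// configuration" the developer writes. At build time Kotless derives the
// Lambda function, the API Gateway route, and the permissions from it.
@Get("/hello")
fun hello(): String = "Hello from Kotless!"
```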
Architecture and Code-to-Deployment Pipeline
Vladislav Tankov provided insights into the inner workings of Kotless, explaining its architecture and the pipeline that transforms Kotlin code into a deployed serverless application. This process generally involves:
1. Annotation Processing: During compilation, Kotless processes the special annotations in the Kotlin code to understand the desired serverless architecture (e.g., API routes, event triggers, scheduled tasks).
2. Terraform Generation: Kotless then generates the necessary infrastructure-as-code configurations based on the processed annotations, initially using Terraform as the backend for AWS. This includes defining Lambda functions, API Gateway resources, IAM roles, and event source mappings.
3. Deployment: Kotless handles deploying the generated configurations and the application code to the target cloud provider (a representative Gradle setup is sketched below).
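In a real project this pipeline is driven from the Gradle build. The configuration below approximates the Kotless 0.1.x DSL; the plugin version, bucket, profile, and exact property names are reconstructed from documentation of that era and should be treated as assumptions.

```kotlin
// build.gradle.kts: approximate Kotless 0.1.x setup. The plugin version,
// bucket, profile, and exact DSL property names are reconstructed from
// documentation of that era and should be treated as assumptions.
plugins {
    kotlin("jvm") version "1.3.50"
    id("io.kotless") version "0.1.2"
}

kotless {
    config {
        bucket = "my-kotless-deploy-bucket"   // S3 bucket for deployment artifacts
        terraform {
            profile = "my-aws-profile"        // AWS credentials profile for Terraform
            region = "us-east-1"
        }
    }
}
```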
He also touched upon optimizations built into Kotless, such as auto-warming of lambdas to reduce cold starts and size optimizations for the deployed lambdas. This focus on performance and ease of use is central to Kotless’s philosophy. The framework aims to abstract away the underlying complexities of serverless platforms, allowing developers to concentrate on their application logic.
Future Directions and Multiplatform Aspirations
Looking ahead, Vladislav Tankov discussed the future roadmap for Kotless, including ambitious plans to support Kotlin Multiplatform Projects (MPP). This would let developers choose different runtimes for their lambdas (JVM, JavaScript, or even Kotlin/Native) depending on the task and performance requirements. Supporting JavaScript lambdas, for example, could open up broader compatibility with platforms such as Google Cloud Platform, which at the time supported JavaScript runtimes for serverless functions better than the JVM.
Other planned enhancements included extended event handling for custom events on AWS and other platforms, and continued work on performance optimizations. The vision for Kotless was to provide a comprehensive and flexible serverless solution for Kotlin developers, empowering them to build efficient and scalable cloud-native applications with minimal friction. Vladislav encouraged attendees to try Kotless and contribute to its development, positioning it as a community-driven effort to improve the Kotlin serverless experience.