Recent Posts
Archives

Posts Tagged ‘AutoScaling’

PostHeaderIcon [DevoxxGR2025] Optimized Kubernetes Scaling with Karpenter

Alex König, an AWS expert, delivered a 39-minute talk at Devoxx Greece 2025, exploring how Karpenter enhances Kubernetes cluster autoscaling for speed, cost-efficiency, and availability.

Karpenter’s Dynamic Autoscaling

König introduced Karpenter as an open-source, Kubernetes-native autoscaling solution, contrasting it with the traditional Cluster Autoscaler. Unlike the latter, which relies on uniform node groups (e.g., nodes with four CPUs and 16GB RAM), Karpenter uses the EC2 Fleet API to dynamically provision nodes tailored to workload needs. For instance, if a pod requires one CPU, Karpenter allocates a node with minimal excess capacity, avoiding resource waste. This right-sizing, combined with groupless scaling, enables faster and more cost-effective scaling, especially in dynamic environments.

Ensuring Availability with Constraints

König addressed availability challenges reported by users, emphasizing Kubernetes-native scheduling constraints to mitigate disruptions. Topology spread constraints distribute pods across availability zones, reducing the risk of downtime if a node fails. Pod disruption budgets, affinity/anti-affinity rules, and priority classes further ensure critical workloads are scheduled appropriately. For stateful workloads using EBS, König recommended setting the volume binding mode to “wait for first consumer” to avoid pod-volume mismatches across zones, preventing crashes and ensuring reliability.

Integrating with KEDA for Application Scaling

For advanced scaling, König highlighted combining Karpenter with KEDA for event-driven, application-specific scaling. KEDA scales pods based on metrics like Kafka topic sizes or SQS queues, beyond CPU/memory. Karpenter then provisions nodes for pending pods, enabling seamless scaling for workloads like flash sales. König outlined a four-step migration from Cluster Autoscaler to Karpenter, emphasizing its simplicity and open-source documentation.

Links

PostHeaderIcon [DevoxxPL2022] Challenges Running Planet-Wide Computer: Efficiency • Jacek Bzdak, Beata Strack

Jacek Bzdak and Beata Strack, software engineers at Google Poland, delivered an engaging session at Devoxx Poland 2022, exploring the intricacies of optimizing Google’s planet-scale computing infrastructure. Their talk focused on achieving efficiency in a distributed system spanning global data centers, emphasizing resource utilization, auto-scaling, and operational strategies. By sharing insights from Google’s internal cloud and Autopilot system, Jacek and Beata provided a blueprint for enhancing service performance while navigating the complexities of large-scale computing.

Defining Efficiency in a Global Fleet

Beata opened by framing Google’s data centers as a singular “planet-wide computer,” where efficiency translates to minimizing operational costs—servers, CPU, memory, data centers, and electricity. Key metrics like fleet-wide utilization, CPU/RAM allocation, and growth rate serve as proxies for these costs, though they are imperfect, often masking quality issues like inflated memory usage. Beata stressed that efficiency begins at the service level, where individual jobs must optimize resource consumption, and extends to the fleet through an ecosystem that maximizes resource sharing. This dual approach ensures that savings at the micro level scale globally, a principle applicable even to smaller organizations.

Auto-Scaling: Balancing Utilization and Reliability

Jacek, a member of Google’s Autopilot team, delved into auto-scaling, a critical mechanism for achieving high utilization without compromising reliability. Autopilot’s vertical scaling adjusts resource limits (CPU/memory) for fixed replicas, while horizontal scaling modifies replica counts. Jacek presented data from an Autopilot paper, showing that auto-scaled services maintain memory slack below 20% for median cases, compared to over 60% for manually managed services. Crucially, automation reduces outage risks by dynamically adjusting limits, as demonstrated in a real-world case where Autopilot preempted a memory-induced crash. However, auto-scaling introduces complexity, particularly feedback loops, where overzealous caching or load shedding can destabilize resource allocation, requiring careful integration with application-specific metrics.

Java-Specific Challenges in Auto-Scaling

The talk transitioned to language-specific hurdles, with Jacek highlighting Java’s unique challenges in auto-scaling environments. Just-in-Time (JIT) compilation during application startup spikes CPU usage, complicating horizontal scaling decisions. Memory management poses further issues, as Java’s heap size is static, and out-of-memory errors may be masked by garbage collection (GC) thrashing, where excessive CPU is devoted to GC rather than request handling. To address this, Google sets static heap sizes and auto-scales non-heap memory, though Jacek envisioned a future where Java aligns with other languages, eliminating heap-specific configurations. These insights underscore the need for language-aware auto-scaling strategies in heterogeneous environments.

Operational Strategies for Resource Reclamation

Beata concluded by discussing operational techniques like overcommit and workload colocation to reclaim unused resources. Overcommit leverages the low probability of simultaneous resource spikes across unrelated services, allowing Google to pack more workloads onto machines. Colocating high-priority serving jobs with lower-priority batch workloads enables resource reclamation, with batch tasks evicted when serving jobs demand capacity. A 2015 experiment demonstrated significant machine savings through colocation, a concept influencing Kubernetes’ design. These strategies, combined with auto-scaling, create a robust framework for efficiency, though they demand rigorous isolation to prevent interference between workloads.

Links: