Posts Tagged ‘ASML’
[SpringIO2025] A cloud cost saving journey: Strategies to balance CPU for containerized JAVA workloads in K8s
Lecturers
Laurentiu Marinescu is a Lead Software Engineer at ASML, specializing in building resilient, cloud-native platforms with a focus on full-stack development. With expertise in problem-solving and software craftsmanship, he serves as a tech lead responsible for next-generation cloud platforms at ASML. He holds a degree from the Faculty of Economic Cybernetics and is an advocate for pair programming and emerging technologies.
Ajith Ganesan is a System Engineer at ASML with over 15 years of experience in software solutions, particularly in lithography process control applications. His work emphasizes data platform requirements and strategy, with a strong interest in AI opportunities. He holds degrees from Eindhoven University of Technology and is passionate about system design and optimization.
Abstract
This article investigates strategies for optimizing CPU resource utilization in Kubernetes environments for containerized Java workloads, emphasizing cost reduction and performance enhancement. It analyzes the trade-offs in resource allocation, including requests and limits, and presents data-driven approaches to minimize idle CPU cycles. Through examination of workload characteristics, scaling mechanisms, and JVM configurations, the discussion highlights practical implementations that balance efficiency, stability, and operational expenses in on-premises deployments.
Contextualizing Cloud Costs and CPU Utilization Challenges
The escalating costs of cloud infrastructure represent a significant challenge for organizations deploying containerized applications. Annual expenditures on cloud services have surpassed $600 billion, with many entities exceeding budgets by over 17%. In Kubernetes clusters, average CPU utilization hovers around 10%, even in large-scale environments exceeding 1,000 CPUs, where it reaches only 17%. This underutilization implies that up to 90% of provisioned resources remain idle, akin to maintaining expensive infrastructure on perpetual standby.
The inefficiency stems not from collective oversight but from inherent design trade-offs. Organizations deploy expansive clusters to ensure capacity for peak demands, yet this leads to substantial idle resources. The opportunity lies in reclaiming these for cost savings; even doubling utilization to 20% could yield significant reductions. This requires understanding application behaviors, load profiles, and the interplay between Kubernetes scheduling and Java Virtual Machine (JVM) dynamics.
In simulated scenarios with balanced nodes and containers, tight packing minimizes rollout costs but introduces risks. For instance, with limited spare capacity (e.g., 25% headroom), containers must be upgraded sequentially, which can rule out zero-downtime deployments. Scaling demands may fail due to resource constraints, necessitating cluster expansions that inflate expenses. These examples underscore the need for strategies that optimize utilization without compromising reliability.
Resource Allocation Strategies: Requests, Limits, and Workload Profiling
Effective CPU management in Kubernetes hinges on judicious setting of resource requests and limits. Requests guarantee minimum allocation for scheduling, while limits cap maximum usage to prevent monopolization. For Java workloads, these must align with JVM ergonomics, which adapt heap and thread pools based on detected CPU cores.
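Because JVM ergonomics size thread pools and heap from the detected CPU count, the effect of a container's CPU limit can be inspected directly. A minimal sketch (the class name Ergonomics is illustrative) that prints what the JVM derived:

```java
import java.util.concurrent.ForkJoinPool;

public class Ergonomics {
    public static void main(String[] args) {
        // In a container, recent JVMs derive this from the cgroup CPU limit,
        // so Kubernetes limits directly shape thread-pool and GC sizing.
        int cpus = Runtime.getRuntime().availableProcessors();
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        int parallelism = ForkJoinPool.commonPool().getParallelism();
        System.out.println("cpus=" + cpus
                + " commonPoolParallelism=" + parallelism
                + " maxHeapMb=" + maxHeapMb);
    }
}
```

Running the same image under different CPU limits changes the reported count, which is why requests, limits, and JVM tuning must be considered together rather than in isolation.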
Workload profiling is essential, categorizing applications into mission-critical (requiring deterministic latency) and non-critical (tolerant of variability). In practice, reducing requests by up to 75% for critical workloads counterintuitively enhanced performance, because pods gained burstable access to otherwise idle resources. Experiments demonstrated halved hardware, energy, and real estate costs, with improved stability.
A binary search over request values identified near-optimal settings, and underlying assumptions, such as non-simultaneous peaks, were validated through rigorous testing. For non-critical applications, minimal requests (sharing 99% of resources) maximized utilization. Scaling based on application-specific metrics, rather than default CPU thresholds, proved superior. For example, autoscaling on heap usage or queue sizes avoided premature scaling triggered by garbage collection spikes.
Code example for configuring Kubernetes resources in a Deployment YAML:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: java-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: java-app
  template:
    metadata:
      labels:
        app: java-app
    spec:
      containers:
        - name: app
          image: java-app:latest
          resources:
            requests:
              cpu: "500m"  # Reduced request for sharing
            limits:
              cpu: "2"     # Expanded limit for bursts
This configuration enables overcommitment, assuming workload diversity prevents concurrent peaks.
JVM and Application-Level Optimizations for Efficiency
Java workloads introduce unique considerations due to JVM behaviors like garbage collection (GC) and thread management. Default JVM settings often lead to inefficiencies; for instance, GC pauses can spike CPU usage, triggering unnecessary scaling. Tuning collectors (e.g., ZGC for low-latency) and limiting threads reduced contention.
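GC activity can be observed from inside the process via the standard management beans, which helps distinguish GC-induced CPU spikes from genuine load before feeding metrics to an autoscaler. A minimal sketch (collector names vary with the chosen GC):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcObserver {
    public static void main(String[] args) {
        // Each bean corresponds to one collector (e.g., young or old generation).
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", totalMs=" + gc.getCollectionTime());
        }
    }
}
```

Exporting such counters lets scaling policies ignore short GC bursts instead of treating them as sustained demand.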
Servlet containers like Tomcat exhibited high overhead; profiling revealed excessive thread creation. Switching to Undertow, with its non-blocking I/O, halved resource usage while maintaining throughput. Reactive applications benefited from Netty, leveraging asynchronous processing for better utilization.
Thread management is critical: unbounded queues in executors caused out-of-memory errors under load. Implementing bounded queues with rejection policies ensured stability. For example:
import java.util.concurrent.ThreadPoolExecutor;
import org.springframework.context.annotation.Bean;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

@Bean
public ThreadPoolTaskExecutor executor() {
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setCorePoolSize(10);  // Limit baseline threads
    executor.setMaxPoolSize(20);   // Cap burst threads
    executor.setQueueCapacity(50); // Bounded queue prevents memory exhaustion
    executor.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy()); // Back-pressure on saturation
    return executor;
}
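The back-pressure effect of CallerRunsPolicy can be demonstrated with the plain java.util.concurrent equivalent underlying the Spring wrapper; a minimal sketch with illustrative pool and queue sizes:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedExecutorDemo {
    public static void main(String[] args) throws InterruptedException {
        // 2 threads, queue of 4; tasks beyond capacity run on the submitting
        // thread instead of being dropped or piling up in unbounded memory.
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                2, 2, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(4),
                new ThreadPoolExecutor.CallerRunsPolicy());
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < 20; i++) {
            executor.execute(completed::incrementAndGet);
        }
        executor.shutdown();
        executor.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println("completed=" + completed.get()); // all 20 tasks complete
    }
}
```

Because overflow tasks execute on the caller, submission slows down under saturation, which throttles producers naturally rather than failing abruptly.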
Monitoring tools like Prometheus and Grafana facilitated iterative tuning, adapting to evolving workloads.
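Exposing an application-specific metric for such tooling can be sketched with only the JDK's built-in HTTP server; the endpoint path, port, and metric name below are illustrative, and in a Spring application Micrometer with Actuator would normally handle this:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class MetricsEndpoint {
    public static void main(String[] args) throws Exception {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                4, 8, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>(100));
        HttpServer server = HttpServer.create(new InetSocketAddress(8081), 0);
        server.createContext("/metrics", exchange -> {
            // Prometheus exposition format: one gauge per line.
            String body = "# TYPE app_executor_queue_size gauge\n"
                    + "app_executor_queue_size " + executor.getQueue().size() + "\n";
            byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, bytes.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(bytes);
            }
        });
        server.start();
    }
}
```

A Prometheus scrape of /metrics then yields the queue depth as a gauge, which a horizontal autoscaler can target instead of raw CPU.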
Cluster-Level Interventions and Success Metrics
Cluster-wide optimizations complement application-level efforts. Overcommitment, by reducing requests while expanding limits, smoothed resource contention. Pre-optimization graphs showed erratic throttling; post-optimization, latency decreased 10-20%, with 7x more requests handled.
Success hinged on validating assumptions through experiments. Despite risks of simultaneous scaling, diverse workloads ensured viability. Continuous monitoring—via vulnerability scans and metrics—enabled proactive adjustments.
Key metrics included reduced throttling, stabilized performance, and halved costs. Policies at namespace and node levels aligned with overcommitment strategies, incorporating backups for node failures.
Implications for Sustainable Infrastructure Management
Optimizing CPU for Java in Kubernetes demands balancing trade-offs: determinism versus sharing, cost versus performance. Strategies emphasize application understanding, JVM tuning, and adaptive scaling. While mission-critical apps benefit from resource sharing under validated assumptions, non-critical ones maximize efficiency with minimal requests.
Future implications involve AI-driven predictions for peak avoidance, enhancing sustainability by reducing energy consumption. Organizations must iterate: monitor, fine-tune, adapt—treating efficiency as a dynamic goal.
Links:
[DevoxxBE2023] How Sand and Java Create the World’s Most Powerful Chips
Johan Janssen, an architect at ASML, captivated the DevoxxBE2023 audience with a deep dive into the intricate process of chip manufacturing and the role of Java in optimizing it. Johan, a seasoned speaker and JavaOne Rock Star, explained how ASML’s advanced lithography machines, powered by Java-based software, enable the creation of cutting-edge computer chips used in devices worldwide.
From Sand to Silicon Wafers
Johan began by demystifying chip production, starting with silica sand, an abundant resource transformed into silicon ingots and sliced into wafers. These wafers, approximately 30 cm in diameter, serve as the foundation for chips, hosting up to 600 chips per wafer or thousands for smaller sensors. He passed around a wafer adorned with Java’s mascot, Duke, illustrating the physical substrate of modern electronics.
The process involves printing multiple layers—up to 200—onto wafers using extreme ultraviolet (EUV) lithography machines. These machines, requiring four Boeing 747s for transport, achieve precision at the nanometer scale, with transistors as small as three nanometers. Johan likened this to driving a car 300 km and retracing the path with only 2 mm deviation, highlighting the extraordinary accuracy required.
The Role of EUV Lithography
Johan detailed the EUV lithography process, where tin droplets are hit by a 40-kilowatt laser to generate plasma at sun-like temperatures, producing EUV light. This light, directed by ultra-flat mirrors, patterns wafers through reticles costing €250,000 each. The process demands cleanroom environments, as even a single dust particle can ruin a chip, and involves continuous calibration to maintain precision across thousands of parameters.
ASML’s machines, some over 30 years old, remain in use for producing sensors and less advanced chips, demonstrating their longevity. Johan also previewed future advancements, such as high numerical aperture (NA) machines, which will enable even smaller transistors, further enhancing chip performance and energy efficiency.
Java-Powered Analytics Platform
At the heart of Johan’s talk was ASML’s Java-based analytics platform, which processes 31 terabytes of data weekly to optimize chip production. Built on Apache Spark, the platform distributes computations across worker nodes, supporting plugins for data ingestion, UI customization, and processing. These plugins allow departments to integrate diverse data types, from images to raw measurements, and support languages like Julia and C alongside Java.
The platform, running on-premise to protect sensitive data, consolidates previously disparate applications, improving efficiency and user experience. Johan highlighted a machine learning use case where the platform increased defect detection from 70% to 92% without slowing production, showcasing Java’s role in handling complex computations.
Challenges and Solutions in Chip Manufacturing
Johan discussed challenges like layer misalignment, which can cause short circuits or defective chips. The platform addresses these by analyzing wafer plots to identify correctable errors, such as adjusting subsequent layers to compensate for misalignments. Non-correctable errors may result in downgrading chips (e.g., from 16 GB to 8 GB RAM), ensuring minimal waste.
He emphasized a pragmatic approach to tool selection, starting with REST endpoints and gradually adopting Kafka for streaming data as needs evolved. Johan also noted ASML’s collaboration with tool maintainers to enhance compatibility, such as improving Spark’s progress tracking for customer feedback.
Future of Chip Manufacturing
Looking ahead, Johan highlighted the industry’s push to diversify chip production beyond Taiwan, driven by geopolitical and economic factors. However, building new factories, or “fabs,” costing $10–20 billion, faces challenges like equipment backlogs and the need for highly skilled operators. ASML’s customer support teams, working alongside clients like Intel, underscore the specialized knowledge required.
Johan concluded by stressing the importance of a forward-looking mindset, with ASML’s roadmap prioritizing innovation over rigid methodologies. This approach, combined with Java’s robustness, ensures the platform’s scalability and adaptability in a rapidly evolving industry.