Posts Tagged ‘PerMinborg’
[DevoxxUK2024] Project Leyden: Capturing Lightning in a Bottle by Per Minborg
Per Minborg, a seasoned member of Oracle’s Core Library team, delivered an insightful session at DevoxxUK2024, unveiling the ambitions of Project Leyden, a transformative initiative to enhance Java application performance. Focused on slashing startup time, accelerating warmup, and reducing memory footprint, Per’s talk explores how Java can evolve to meet modern demands while preserving its dynamic nature. By strategically shifting computations to optimize execution, Project Leyden introduces innovative techniques like condensers and enhanced Class Data Sharing (CDS). This session provides a roadmap for developers seeking to harness Java’s potential in high-performance environments, balancing flexibility with efficiency.
The Vision of Project Leyden
Per begins by outlining the core objectives of Project Leyden: improving startup time, warmup time, and memory footprint. Startup time, the duration from launching an application to its first meaningful output (e.g., a “Hello World” or serving a web request), is critical for user experience. Warmup time, the period until an application reaches peak performance through JIT compilation, can hinder responsiveness in dynamic systems. Footprint, encompassing memory and storage use, impacts scalability, especially in cloud environments. Per emphasizes that the best approach is to eliminate unnecessary computations, but when that’s not feasible, shifting them temporally—either earlier to compile time or later to runtime—can yield significant gains. This philosophy underpins Leyden’s strategy to refine Java’s execution model.
Shifting Computations for Efficiency
A cornerstone of Project Leyden is the concept of temporal computation shifting. Per explains that Java’s dynamic nature—encompassing dynamic class loading, JIT compilation, and runtime optimizations—enables expressive programming but can inflate startup and warmup times. By moving computations to build time, such as through constant folding or ahead-of-time (AOT) compilation, Leyden reduces runtime overhead. Alternatively, lazy evaluation postpones non-critical tasks, streamlining startup. Per introduces condensers, a novel mechanism that transforms program representations by shifting computations earlier, adding metadata, or imposing constraints on dynamism. Condensers are composable, meaning-preserving, and selectable, allowing developers to tailor optimizations based on application needs. For instance, a condenser might precompile lambda expressions into bytecode at build time, slashing runtime costs.
Enhancing Class Data Sharing (CDS)
Per delves into Class Data Sharing (CDS), a long-standing Java feature that Project Leyden enhances to achieve dramatic performance boosts. CDS allows pre-initialized JDK classes to be stored in a file, bypassing costly class loading during startup. With CDS++, Leyden extends this to include application classes, compiled code, and resolved constant pool references. Per shares compelling benchmarks: a test compiling 100 small Java files achieved a 2x startup improvement, while an XML parsing workload saw an 8x boost. For the Spring Pet Clinic benchmark, Leyden’s optimizations, including early class loading and cached compiled code, yielded up to 4x faster startup. These gains stem from a training run approach, where a representative execution gathers profiling data to inform optimizations, ensuring compatibility across platforms.
Balancing Dynamism and Performance
Java’s dynamism—encompassing dynamic typing, class loading, and reflection—empowers developers but complicates optimization. Per proposes selective constraints to balance this trade-off. For example, developers can restrict dynamic class loading for specific modules, enabling aggressive optimizations without sacrificing Java’s flexibility. The stable value feature, initially part of Leyden but now a standalone JEP, allows delayed initialization of final fields while maintaining performance akin to compile-time constants. Per illustrates this with a Fibonacci computation example, where memoization using stable values drastically reduces recursive overhead. By offering a “mixer board” of concessions, Leyden empowers developers to fine-tune performance, ensuring compatibility and preserving program semantics across diverse use cases.
Links:
[DevoxxBE2023] The Panama Dojo: Black Belt Programming with Java 21 and the FFM API by Per Minborg
In an engaging session at Devoxx Belgium 2023, Per Minborg, a Java Core Library team member at Oracle and an OpenJDK contributor, guided attendees through the intricacies of the Foreign Function and Memory (FFM) API, a pivotal component of Project Panama. With a blend of theoretical insights and live coding, Per demonstrated how this API, in its third preview in Java 21, enables seamless interaction with native memory and functions using pure Java code. His talk, dubbed the “Panama Dojo,” showcased the API’s potential to enhance performance and safety, culminating in a hands-on demo of a lightweight microservice framework built with memory segments, arenas, and memory layouts.
Unveiling the FFM API’s Capabilities
Per introduced the FFM API as a solution to the limitations of Java Native Interface (JNI) and direct buffers. Unlike JNI, which requires cumbersome C stubs and inefficient data passing, the FFM API allows direct native memory access and function calls. Per illustrated this with a Point
struct example, where a memory segment models a contiguous memory region with 64-bit addressing, supporting both heap and native segments. This eliminates the 2GB limit of direct buffers, offering greater flexibility and efficiency.
The API introduces memory segments with constraints like size, lifetime, and thread confinement, preventing out-of-bounds access and use-after-free errors. Per highlighted the importance of deterministic deallocation, contrasting Java’s automatic memory management with C’s manual approach. The FFM API’s arenas, such as confined and shared arenas, manage segment lifecycles, ensuring resources are freed explicitly, as demonstrated in a try-with-resources block that deterministically deallocates a segment.
Structuring Memory with Layouts and Arenas
Memory layouts, a key FFM API feature, provide a declarative way to define memory structures, reducing manual offset computations. Per showed how a Point
layout with x
and y
doubles uses var handles to access fields safely, leveraging JIT optimizations for atomic operations. This approach minimizes bugs in complex structs, as var handles inherently account for offsets, unlike manual calculations.
Arenas further enhance safety by grouping segments with shared lifetimes. Per demonstrated a confined arena, restricting access to a single thread, and a shared arena, allowing multi-threaded access with thread-local handshakes for safe closure. These constructs bridge the gap between C’s flexibility and Rust’s safety, offering a balanced model for Java developers. In his live demo, Per used an arena to allocate a MarketInfo
segment, showcasing deterministic deallocation and thread safety.
Building a Persistent Queue with Memory Mapping
The heart of Per’s session was a live coding demo constructing a persistent queue using memory mapping and atomic operations. He defined a MarketInfo
record for stock exchange data, including timestamp, symbol, and price fields. Using a record mapper, Per serialized and deserialized records to and from memory segments, demonstrating immutability and thread safety. The mapper, a potential future JDK feature, simplifies data transfer between Java objects and native memory.
Per then implemented a memory-mapped queue, where a file-backed segment stores headers and payloads. Headers use atomic operations to manage mutual exclusion across threads and JVMs, ensuring safe concurrent access. In the demo, a producer appended MarketInfo
records to the queue, while two consumers read them asynchronously, showcasing low-latency, high-performance data sharing. Per’s use of sparse files allowed a 1MB queue to scale virtually, highlighting the API’s efficiency.
Crafting a Microservice Framework
The session culminated in assembling these components into a microservice framework. Per’s queue, inspired by Chronicle Queue, supports persistent, high-performance data exchange across JVMs. The framework leverages memory mapping for durability, atomic operations for concurrency, and record mappers for clean data modeling. Per demonstrated its practical application by persisting a queue to a file and reading it in a separate JVM, underscoring its robustness for distributed systems.
He emphasized the reusability of these patterns across domains like machine learning and graphics processing, where native libraries are prevalent. Tools like jextract, briefly mentioned, further unlock native libraries like TensorFlow, enabling Java developers to integrate them effortlessly. Per’s framework, though minimal, illustrates how the FFM API can transform Java’s interaction with native code, offering a safer, faster alternative to JNI.
Performance and Safety in Harmony
Throughout, Per stressed the FFM API’s dual focus on performance and safety. Native function calls, faster than JNI, and memory segments with strict constraints outperform direct buffers while preventing common errors. The API’s integration with existing JDK features, like var handles, ensures compatibility and optimization. Per’s live coding, despite its complexity, flowed seamlessly, reinforcing the API’s practicality for real-world applications.
Conclusion: Embracing the Panama Dojo
Per’s session was a masterclass in leveraging the FFM API to push Java’s boundaries. By combining memory segments, layouts, arenas, and atomic operations, he crafted a framework that exemplifies the API’s potential. His call to action—experiment with the FFM API in Java 21—invites developers to explore this transformative tool, promising enhanced performance and safety for native interactions. The Panama Dojo left attendees inspired to break new ground in Java development.