
[DevoxxBE2025] Virtual Threads, Structured Concurrency, and Scoped Values: Putting It All Together

Lecturer

Balkrishna Rawool leads IT chapters at ING Bank, focusing on scalable software solutions and Java concurrency. He actively shares insights on Project Loom through conferences and writings, drawing from practical implementations in financial systems.

Abstract

This review examines Project Loom’s additions to Java concurrency: virtual threads for lightweight multitasking, structured concurrency for task orchestration, and scoped values for safe data sharing. Set in a web development context, it explains their APIs and their combined use through a Spring Boot loan-processing application. The evaluation covers integration techniques, the shortcomings of traditional threading, and the impact on readability, scalability, and maintainability of concurrent code.

Project Loom Foundations and Virtual Threads

Project Loom overhauls Java concurrency with lightweight alternatives to OS-bound threads, which limit scale due to overheads. Virtual threads, managed by the JVM, enable vast concurrency on few carriers, ideal for IO-heavy web services.

In the loan app, which computes offers via credit, account, and loan service calls, virtual threads parallelize the work without resource strain. Configuring Tomcat to use them boosts TPS from hundreds to thousands, because blocking calls unmount virtual threads from their carriers rather than holding OS threads.

The API mirrors the traditional one: Thread.ofVirtual().start(task). Internally, continuations let a virtual thread suspend so its carrier can be reused. The consequences are lower memory use and a natural exception flow.
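
As a minimal runnable sketch of that API (JDK 21+; the class and thread names are illustrative):

```java
public class VirtualHello {
    // Returns which kind of thread the task ran on; a small sketch of the
    // Thread.ofVirtual() builder API.
    static String threadKind() {
        var kind = new java.util.concurrent.atomic.AtomicReference<String>();
        Thread t = Thread.ofVirtual()
                .name("offer-task")
                .start(() -> kind.set(
                        Thread.currentThread().isVirtual() ? "virtual" : "platform"));
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return kind.get();
    }

    public static void main(String[] args) {
        System.out.println(threadKind()); // prints "virtual"
    }
}
```

The builder accepts the same Runnable a platform thread would, which is why migration is mostly a matter of changing how threads are created.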

Care is needed around pinning: blocking inside a synchronized block pins the carrier thread, while ReentrantLock avoids this and sustains performance.
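
To make that guidance concrete, here is a small runnable sketch (JDK 21+; names are illustrative) that guards a blocking section with ReentrantLock, letting each virtual thread unmount while it sleeps; a synchronized block in its place would pin the carrier on JDK 21–23:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

public class LockNotSynchronized {
    private static final ReentrantLock LOCK = new ReentrantLock();
    private static int counter;

    // Blocking while holding a ReentrantLock lets the virtual thread
    // unmount from its carrier; a synchronized block here would pin it.
    static void criticalSection() throws InterruptedException {
        LOCK.lock();
        try {
            Thread.sleep(1); // simulated blocking I/O
            counter++;
        } finally {
            LOCK.unlock();
        }
    }

    static int run(int tasks) {
        counter = 0; // reset so repeated calls start fresh
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            threads.add(Thread.ofVirtual().start(() -> {
                try {
                    criticalSection();
                } catch (InterruptedException ignored) {
                }
            }));
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(run(100)); // prints 100
    }
}
```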

Structured Concurrency for Unified Task Control

Structured concurrency organizes subtasks as a cohesive unit, addressing the scattering of tasks across executors. A StructuredTaskScope scopes the forked subtasks and ensures they complete before the program proceeds.

In the app, forking the credit, account, and loan calls inside a ShutdownOnFailure scope cancels the remaining subtasks on error, avoiding thread leaks. Example:

try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
    var credit = scope.fork(() -> getCredit(request));
    var account = scope.fork(() -> getAccount(request));
    var loan = scope.fork(() -> calculateLoan(request));
    scope.join();           // wait for all subtasks
    scope.throwIfFailed();  // propagate the first subtask failure, if any
    // aggregate via credit.get(), account.get(), loan.get()
} catch (Exception e) {
    // handle failure or interruption
}

This ensures orderly shutdowns, contrasting unstructured daemons. Effects: simpler debugging, no dangling tasks.

Scoped Values for Immutable Inheritance

Scoped values replace ThreadLocals for virtual threads, binding data immutably within a scope. ThreadLocals are mutable and risk inconsistencies; scoped values are inherited safely by child threads.

For request IDs in logs: ScopedValue.where(ID, uuid).run(() -> tasks); the ID propagates to forked subtasks within the scope.

Example:

private static final ScopedValue<UUID> REQ_ID = ScopedValue.newInstance();

ScopedValue.where(REQ_ID, UUID.randomUUID()).run(() -> {
    // forked subtasks read the ID via REQ_ID.get()
});

This addresses ThreadLocal’s cost and mutability problems under Loom, enabling safe sharing across task hierarchies.

Combined Usage and Prospects

Together these features yield maintainable concurrency: virtual threads provide scale, structured scopes provide organization, and scoped values provide safe sharing. The loan app processes requests concurrently yet stays organized, with request IDs traceable end to end.

The effects are higher IO throughput and easier maintenance; the prospect is framework integrations reshaping Java concurrency.

In overview, Loom’s features enable efficient, readable parallel systems.

Links:

  • Lecture video: https://www.youtube.com/watch?v=iO79VR0zAhQ
  • Balkrishna Rawool on LinkedIn: https://nl.linkedin.com/in/balkrishnarawool
  • Balkrishna Rawool on Twitter/X: https://twitter.com/BalaRawool
  • ING Bank website: https://www.ing.com/

[DevoxxBE2024] The Next Phase of Project Loom and Virtual Threads by Alan Bateman

At Devoxx Belgium 2024, Alan Bateman delivered a comprehensive session on the advancements in Project Loom, focusing on virtual threads and their impact on Java concurrency. As a key contributor to OpenJDK, Alan explored how virtual threads enable high-scale server applications with a thread-per-task model, addressing challenges like pinning, enhancing serviceability, and introducing structured concurrency. His talk provided practical insights into leveraging virtual threads for simpler, more scalable code, while detailing ongoing improvements in JDK 24 and beyond.

Understanding Virtual Threads and Project Loom

Project Loom, a transformative initiative in OpenJDK, aims to enhance concurrency in Java by introducing virtual threads—lightweight, user-mode threads that support a thread-per-task model. Unlike traditional platform threads, which are resource-intensive and often pooled, virtual threads are cheap, allowing millions to run within a single JVM. Alan emphasized that virtual threads enable developers to write simple, synchronous, blocking code that is easy to read and debug, avoiding the complexity of reactive or asynchronous models. Finalized in JDK 21 after two preview releases, virtual threads have been widely adopted by frameworks like Spring and Quarkus, with performance and reliability proving robust, though challenges like pinning remain.

The Pinning Problem and Its Resolution

A significant pain point with virtual threads is “pinning,” where a virtual thread cannot unmount from its carrier thread during blocking operations within synchronized methods or blocks, hindering scalability. Alan detailed three scenarios causing pinning: blocking inside synchronized methods, contention on synchronized methods, and object wait/notify operations. These can lead to scalability issues or even deadlocks if all carrier threads are pinned. JEP 444 acknowledged this as a quality-of-implementation issue, not a flaw in the synchronized keyword itself. JEP 491, currently in Early Access for JDK 24, addresses this by allowing carrier threads to be released during such operations, eliminating the need to rewrite code to use java.util.concurrent.locks.ReentrantLock. Alan urged developers to test these Early Access builds to validate reliability and performance, noting successful feedback from initial adopters.

Enhancing Serviceability for Virtual Threads

With millions of virtual threads in production, diagnosing issues is critical. Alan highlighted improvements in serviceability tools, such as thread dumps that now distinguish carrier threads and include stack traces for mounted virtual threads in JDK 24. A new JSON-based thread dump format, introduced with virtual threads, supports parsing for visualization and preserves thread groupings, aiding debugging of complex applications. For pinning, JFR (Java Flight Recorder) events now capture stack traces when blocking occurs in synchronized methods, with expanded support for FFM and JNI in JDK 24. Heap dumps in JDK 23 include unmounted virtual thread stacks, and new JMX-based monitoring interfaces allow dynamic inspection of the virtual thread scheduler, enabling fine-tuned control over parallelism.
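
As a concrete illustration of the JSON format described above, such a dump can be produced with the jcmd tool shipped with the JDK since 21 (`<pid>` and the output path are placeholders):

```shell
jcmd <pid> Thread.dump_to_file -format=json /tmp/threads.json
```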

Structured Concurrency: Simplifying Concurrent Programming

Structured concurrency, a preview feature in JDK 21–23, addresses the complexity of managing concurrent tasks. Alan presented a motivating example of aggregating data from a web service and a database, comparing sequential and concurrent approaches using thread pools. Traditional thread pools with Future.get() can lead to leaks or wasted cycles if tasks fail, requiring complex cancellation logic. The StructuredTaskScope API simplifies this by ensuring all subtasks complete before the main task proceeds, using a single join method to wait for results. If a subtask fails, others are canceled, preventing leaks and preserving task relationships in a tree-like structure. An improved API in Loom Early Access builds, planned for JDK 24 preview, introduces static factory methods and streamlined exception handling, making structured concurrency a powerful complement to virtual threads.

Future Directions and Community Engagement

Alan outlined Project Loom’s roadmap, focusing on JEP 491 for pinning resolution, enhanced diagnostics, and structured concurrency’s evolution. He emphasized that virtual threads are not a performance boost for individual methods but excel in scalability through sheer numbers. Misconceptions, like replacing all platform threads with virtual threads or pooling them, were debunked, urging developers to focus on task migration. Structured concurrency’s simplicity aligns with virtual threads’ lightweight nature, promising easier debugging and maintenance. Alan encouraged feedback on Early Access builds for JEP 491 and structured concurrency (JEP 480), highlighting their importance for production reliability. Links to JEP 444, JEP 491, and JEP 480 provide further details for developers eager to explore.

[DevoxxGR2024] Butcher Virtual Threads Like a Pro at Devoxx Greece 2024 by Piotr Przybyl

Piotr Przybyl, a Java Champion and developer advocate at Elastic, captivated audiences at Devoxx Greece 2024 with a dynamic exploration of Java 21’s virtual threads. Through vivid analogies, practical demos, and a touch of humor, Piotr demystified virtual threads, highlighting their potential and pitfalls. His talk, rich with real-world insights, offered developers a guide to leveraging this transformative feature while avoiding common missteps. As a seasoned advocate for technologies like Elasticsearch and Testcontainers, Piotr’s presentation was a masterclass in navigating modern Java concurrency.

Understanding Virtual Threads

Piotr began by contextualizing virtual threads within Java’s concurrency evolution. Introduced in Java 21 under Project Loom, virtual threads address the limitations of traditional platform threads, which are costly to create and limited in number. Unlike platform threads, virtual threads are lightweight, managed by a scheduler that mounts and unmounts them from carrier threads during I/O operations. This enables a thread-per-request model, scaling applications to handle millions of concurrent tasks. Piotr likened virtual threads to taxis in a busy city like Athens, efficiently transporting passengers (tasks) without occupying resources during idle periods.

However, virtual threads are not a universal solution. Piotr emphasized that they do not inherently speed up individual requests but improve scalability by handling more concurrent tasks. Their API remains familiar, aligning with existing thread practices, making adoption seamless for developers accustomed to Java’s threading model.

Common Pitfalls and Pinning

A central theme of Piotr’s talk was “pinning,” a performance issue where virtual threads remain tied to carrier threads, negating benefits. Pinning occurs during I/O or native calls within synchronized blocks, akin to keeping a taxi running during a lunch break. Piotr demonstrated this with a legacy Elasticsearch client, using Testcontainers and Toxiproxy to simulate slow network calls. By enabling tracing with -Djdk.tracePinnedThreads=full, he identified and resolved pinning issues, replacing synchronized methods with modern, non-blocking clients.

Piotr cautioned against misuses like thread pooling or reusing virtual threads, which disrupt their lightweight design. He advocated for careful monitoring using JFR events to ensure threads remain unpinned, ensuring optimal performance in production environments.
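
Two concrete ways to surface pinning in practice (the jar and recording file names are placeholders): the JDK’s pinned-thread tracing flag, and the jdk.VirtualThreadPinned JFR event Piotr referred to:

```shell
# Emit a stack trace whenever a virtual thread pins its carrier (JDK 21)
java -Djdk.tracePinnedThreads=full -jar app.jar

# Inspect an existing JFR recording for pinning events
jfr print --events jdk.VirtualThreadPinned recording.jfr
```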

Structured Concurrency and Scoped Values

Piotr explored structured concurrency, a preview feature in Java 21, designed to eliminate thread leaks and cancellation delays. By creating scopes that manage forks, developers can ensure tasks complete or fail together, simplifying error handling. He demonstrated a shutdown-on-failure scope, where a single task failure cancels all others, contrasting this with the complexity of managing interdependent futures.

Scoped Values, another preview feature, offer immutable, one-way thread locals to prevent bugs like data leakage in thread pools. Piotr illustrated their use in maintaining request context, warning against mutability to preserve reliability. These features, he argued, complement virtual threads, fostering robust, maintainable concurrent applications.

Practical Debugging and Best Practices

Through live coding, Piotr showcased how debugging with logging can inadvertently introduce I/O, unmounting virtual threads and degrading performance. He compared this to a concert where logging scatters tasks, reducing completion rates. To mitigate this, he recommended avoiding I/O in critical paths and using structured concurrency for monitoring.

Piotr’s best practices included using framework-specific annotations (e.g., Quarkus, Spring) to enable virtual threads and ensuring tasks are interruptible. He urged developers to test thoroughly, leveraging tools like Testcontainers to simulate real-world conditions. His blog post on testing unpinned threads provides further guidance for practitioners.

Conclusion

Piotr’s presentation was a clarion call to embrace virtual threads with enthusiasm and caution. By understanding their mechanics, avoiding pitfalls like pinning, and leveraging structured concurrency, developers can unlock unprecedented scalability. His engaging analogies and practical demos made complex concepts accessible, empowering attendees to modernize Java applications responsibly. As Java evolves, Piotr’s insights ensure developers remain equipped to navigate its concurrency landscape.

[SpringIO2024] Continuations: The Magic Behind Virtual Threads in Java by Balkrishna Rawool @ Spring I/O 2024

At Spring I/O 2024 in Barcelona, Balkrishna Rawool, a software engineer at ING Bank, captivated attendees with an in-depth exploration of continuations, the underlying mechanism powering Java’s virtual threads. Introduced as a final feature in Java 21 under Project Loom, virtual threads promise unprecedented scalability for Java applications. Balkrishna’s session demystified how continuations enable this scalability by allowing programs to pause and resume execution, offering a deep dive into their mechanics and practical applications.

Understanding Virtual Threads

Virtual threads, a cornerstone of Project Loom, are lightweight user threads designed to enhance scalability in Java applications. Unlike platform threads, which map directly to operating system threads and are resource-intensive, virtual threads require minimal memory, enabling developers to create millions without significant overhead. Balkrishna illustrated this by comparing platform threads, often pooled due to their cost, to virtual threads, which are created and discarded as needed, avoiding pooling anti-patterns. He emphasized that virtual threads rely on platform threads—termed carrier threads—for execution, with a scheduler mounting and unmounting them dynamically. This mechanism ensures efficient CPU utilization, particularly in I/O-bound applications where threads spend considerable time waiting, thus boosting scalability.

The Power of Continuations

Continuations, the core focus of Balkrishna’s talk, are objects that represent a program’s current state or the “rest” of its computation. They allow developers to pause a program’s execution and resume it later, a capability critical to virtual threads’ efficiency. Using Java’s Continuation API, Balkrishna demonstrated how continuations pause execution via the yield method, transferring control back to the caller, and resume via the run method. He showcased this with a simple example where a continuation printed values, paused at specific points, and resumed, highlighting the manipulation of the call stack to achieve this control transfer. Although the Continuation API is not intended for direct application use, understanding it provides insight into virtual threads’ behavior and scalability.

Building a Generator with Continuations

To illustrate continuations’ versatility, Balkrishna implemented a generator—a data structure yielding values lazily—using only the Continuation API, eschewing Java’s streams or iterators. Generators are ideal for resource-intensive computations, producing values only when needed. In his demo, Balkrishna created a generator yielding strings (“a,” “b,” “c”) by defining a Source object to handle value yields and pauses via continuations. The generator paused after each yield, allowing consumers to iterate over values in a loop, demonstrating how continuations enable flexible control flow beyond virtual threads, applicable to constructs like coroutines or exception handling.
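
Balkrishna’s generator uses the internal Continuation API directly. As a rough analogue using only public APIs (JDK 21+; the class and method names here are illustrative, not his implementation), the same yield-and-pause behavior can be sketched with a virtual-thread producer blocking on a SynchronousQueue:

```java
import java.util.concurrent.SynchronousQueue;
import java.util.function.Consumer;

public class Generator<T> {
    private final SynchronousQueue<T> channel = new SynchronousQueue<>();

    // The source receives an 'emit' callback; each emit hands one value
    // to the consumer and blocks until it is taken -- mimicking yield.
    public Generator(Consumer<Consumer<T>> source) {
        Thread.ofVirtual().start(() -> source.accept(value -> {
            try {
                channel.put(value); // blocks (unmounts) until next() runs
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }));
    }

    // Blocks until the producer emits the next value.
    public T next() {
        try {
            return channel.take();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        var letters = new Generator<String>(emit -> {
            emit.accept("a");
            emit.accept("b");
            emit.accept("c");
        });
        System.out.println(letters.next() + letters.next() + letters.next()); // prints abc
    }
}
```

This sketch omits end-of-stream handling (calling next() after the source finishes blocks forever), and the Continuation-based version from the talk achieves the same pausing without a dedicated producer thread.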

Crafting a Simple Virtual Thread

In the session’s climax, Balkrishna guided attendees through implementing a simplified virtual thread class using continuations. The custom virtual thread paused execution during blocking operations, freeing platform threads, and supported a many-to-many relationship with carrier threads. He introduced a scheduler to manage virtual threads on a fixed pool of platform threads, using a queue for first-in-first-out scheduling. A demo with thousands of virtual threads, each simulating blocking calls, outperformed an equivalent platform-thread implementation, underscoring virtual threads’ scalability. By leveraging scoped values and timers, Balkrishna ensured accurate thread identification and resumption, providing a clear, hands-on understanding of virtual threads’ mechanics.

[DevoxxFR2012] Optimizing Resource Utilization: A Deep Dive into JVM, OS, and Hardware Interactions

Lecturers

Ben Evans and Martijn Verburg are titans of the Java performance community. Ben, co-author of The Well-Grounded Java Developer and a Java Champion, has spent over a decade dissecting JVM internals, GC algorithms, and hardware interactions. Martijn, known as the “Diabolical Developer,” co-leads the London Java User Group, serves on the JCP Executive Committee, and advocates for developer productivity and open-source tooling. Together, they have shaped modern Java performance practices through books, tools, and conference talks that bridge the gap between application code and silicon.

Abstract

This exhaustive exploration revisits Ben Evans and Martijn Verburg’s seminal 2012 DevoxxFR presentation on JVM resource utilization, expanding it with a decade of subsequent advancements. The core thesis remains unchanged: Java’s “write once, run anywhere” philosophy comes at the cost of opacity—developers deploy applications across diverse hardware without understanding how efficiently they consume CPU, memory, power, or I/O. This article dissects the three-layer stack (JVM, Operating System, and Hardware) to reveal how Java applications interact with modern CPUs, memory hierarchies, and power management systems. Through diagnostic tools (jHiccup, SIGAR, JFR), tuning strategies (NUMA awareness, huge pages, GC selection), and cloud-era considerations (vCPU abstraction, noisy neighbors), it provides a comprehensive playbook for achieving 90%+ CPU utilization and minimal power waste. Updated for 2025, this piece incorporates ZGC’s generational mode, Project Loom’s virtual threads, ARM Graviton processors, and green computing initiatives, offering a forward-looking vision for sustainable, high-performance Java in the cloud.

The Abstraction Tax: Why Java Hides Hardware Reality

Java’s portability is its greatest strength and its most significant performance liability. The JVM abstracts away CPU architecture, memory layout, and power states to ensure identical behavior across x86, ARM, and PowerPC. But this abstraction hides critical utilization metrics:
– A Java thread may appear busy but spend 80% of its time in GC pauses or context switches.
– A 64-core server running 100 Java processes might achieve only 10% aggregate CPU utilization due to lock contention and GC thrashing.
– Power consumption in data centers—8% of U.S. electricity in 2012, projected at 13% by 2030—is driven by underutilized hardware.

Ben and Martijn argue that visibility is the prerequisite for optimization. Without knowing how resources are used, tuning is guesswork.

Layer 1: The JVM – Where Java Meets the Machine

The HotSpot JVM is a marvel of adaptive optimization, but its default settings prioritize predictability over peak efficiency.

Garbage Collection: The Silent CPU Thief

GC is the largest source of CPU waste in Java applications. Even “low-pause” collectors like CMS introduce stop-the-world phases that halt all application threads.

// Example: CMS GC log
[GC (CMS Initial Mark) 1024K->768K(2048K), 0.0123456 secs]
[Full GC (Allocation Failure) 1800K->1200K(2048K), 0.0987654 secs]

Martijn demonstrates how a 10ms pause every 100ms reduces effective CPU capacity by 10%. In 2025, ZGC and Shenandoah achieve sub-millisecond pauses even at 1TB heaps:

-XX:+UseZGC -XX:ZCollectionInterval=100

JIT Compilation and Code Cache

The JIT compiler generates machine code on-the-fly, but code cache eviction under memory pressure forces recompilation:

-XX:ReservedCodeCacheSize=512m -XX:+PrintCodeCache

Ben recommends tiered compilation (-XX:+TieredCompilation) to balance warmup and peak performance.

Threading and Virtual Threads (2025 Update)

Traditional Java threads map 1:1 to OS threads, incurring 1MB stack overhead and context switch costs. Project Loom introduces virtual threads in Java 21:

// blockingIO() is a placeholder for any blocking call (network, DB, file)
try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
    IntStream.range(0, 100_000).forEach(i ->
        executor.submit(() -> blockingIO()));
} // close() waits for all submitted tasks to finish

This enables millions of concurrent tasks with minimal OS overhead, saturating CPU without thread explosion.

Layer 2: The Operating System – Scheduler, Memory, and Power

The OS mediates between JVM and hardware, introducing scheduling, caching, and power management policies.

CPU Scheduling and Affinity

Linux’s CFS scheduler fairly distributes CPU time, but noisy neighbors in multi-tenant environments cause jitter. CPU affinity pins JVMs to cores:

taskset -c 0-7 java -jar app.jar

In NUMA systems, memory locality is critical:

// JNA call to sched_setaffinity

Memory Management: RSS vs. USS

Resident Set Size (RSS) includes shared libraries, inflating perceived usage. Unique Set Size (USS) is more accurate:

smem -t -k -p <pid>

Huge pages reduce TLB misses:

-XX:+UseLargePages -XX:LargePageSizeInBytes=2m
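
The -XX:+UseLargePages flag only takes effect if the OS has huge pages reserved; on Linux a minimal, illustrative setup (run as root, sizes are examples) looks like:

```shell
# Reserve 1024 x 2 MB huge pages (2 GB total) for the JVM heap
sysctl -w vm.nr_hugepages=1024
```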

Power Management: P-States and C-States

CPUs dynamically adjust frequency (P-states) and enter sleep states (C-states). Java has no direct control over these, though busy spinning keeps cores out of deep sleep; flags like the following pre-fault the heap at startup and enable NUMA-aware allocation, avoiding stalls once the application is under load:

-XX:+AlwaysPreTouch -XX:+UseNUMA

Layer 3: The Hardware – Cores, Caches, and Power

Modern CPUs are complex hierarchies of cores, caches, and interconnects.

Cache Coherence and False Sharing

Adjacent fields in objects can reside on the same cache line, causing false sharing:

class Counters {
    volatile long c1; // cache line 1
    volatile long c2; // same cache line!
}

Padding or @Contended (introduced in Java 8 as an internal annotation; application code needs -XX:-RestrictContended) resolves this:

@Contended
public class PaddedLong { public volatile long value; }

NUMA and Memory Bandwidth

Non-Uniform Memory Access means local memory is 2–3x faster than remote. JVMs should bind threads to NUMA nodes:

numactl --cpunodebind=0 --membind=0 java -jar app.jar

Diagnostics: Making the Invisible Visible

jHiccup: Measuring Pause Times

java -jar jHiccup.jar -i 1000 -w 5000

Generates histograms of application pauses, revealing GC and OS scheduling hiccups.

Java Flight Recorder (JFR)

-XX:StartFlightRecording=duration=60s,filename=app.jfr

Captures CPU, GC, I/O, and lock contention with <1% overhead.

async-profiler and Flame Graphs

./profiler.sh -e cpu -d 60 -f flame.svg <pid>

Visualizes hot methods and inlining decisions.

Cloud and Green Computing: The Ultimate Utilization Challenge

In cloud environments, vCPUs are abstractions—often half-cores with hyper-threading. Noisy neighbors cause 50%+ variance in performance.

Green Computing Initiatives

  • Facebook’s Open Compute Project: 38% more efficient servers.
  • Google’s Borg: 90%+ cluster utilization via bin packing.
  • ARM Graviton3: 20% better perf/watt than x86.

Spot Markets for Compute (2025 Vision)

Ben and Martijn foresee a commodity market for compute cycles, enabled by:

  • Live migration via CRIU.
  • Standardized pricing (e.g., $0.001 per CPU-second).
  • Java’s portability as the ideal runtime.

Conclusion: Toward a Sustainable Java Future

Evans and Verburg’s central message endures: Utilization is a systems problem. Achieving 90%+ CPU efficiency requires coordination across JVM tuning, OS configuration, and hardware awareness. In 2025, tools like ZGC, Loom, and JFR have made this more achievable than ever, but the principles remain:

  • Measure everything (JFR, async-profiler).
  • Tune aggressively (GC, NUMA, huge pages).
  • Design for the cloud (elastic scaling, spot instances).

By making the invisible visible, Java developers can build faster, cheaper, and greener applications—ensuring Java’s dominance in the cloud-native era.
