
Posts Tagged ‘Concurrency’

[DevoxxGR2024] Butcher Virtual Threads Like a Pro at Devoxx Greece 2024 by Piotr Przybyl

Piotr Przybyl, a Java Champion and developer advocate at Elastic, captivated audiences at Devoxx Greece 2024 with a dynamic exploration of Java 21’s virtual threads. Through vivid analogies, practical demos, and a touch of humor, Piotr demystified virtual threads, highlighting their potential and pitfalls. His talk, rich with real-world insights, offered developers a guide to leveraging this transformative feature while avoiding common missteps. As a seasoned advocate for technologies like Elasticsearch and Testcontainers, Piotr’s presentation was a masterclass in navigating modern Java concurrency.

Understanding Virtual Threads

Piotr began by contextualizing virtual threads within Java’s concurrency evolution. Introduced in Java 21 under Project Loom, virtual threads address the limitations of traditional platform threads, which are costly to create and limited in number. Unlike platform threads, virtual threads are lightweight, managed by a scheduler that mounts and unmounts them from carrier threads during I/O operations. This enables a thread-per-request model, scaling applications to handle millions of concurrent tasks. Piotr likened virtual threads to taxis in a busy city like Athens, efficiently transporting passengers (tasks) without occupying resources during idle periods.

However, virtual threads are not a universal solution. Piotr emphasized that they do not inherently speed up individual requests but improve scalability by handling more concurrent tasks. Their API remains familiar, aligning with existing thread practices, making adoption seamless for developers accustomed to Java’s threading model.
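To ground the model, here is a minimal thread-per-task sketch in the spirit of Piotr's demos (the class name, task count, and simulated I/O are illustrative, not taken from the talk):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class VirtualThreadDemo {

    // Submit one virtual thread per task; the scheduler multiplexes them over
    // a small pool of carrier threads, so thousands of tasks stay cheap.
    static int runTasks(int count) {
        AtomicInteger completed = new AtomicInteger();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < count; i++) {
                executor.submit(() -> {
                    try {
                        // Stand-in for blocking I/O: the virtual thread unmounts
                        // from its carrier here instead of holding an OS thread.
                        Thread.sleep(5);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    completed.incrementAndGet();
                });
            }
        } // close() waits for all submitted tasks to finish
        return completed.get();
    }
}
```

Note that the same code with a fixed platform-thread pool would be limited by pool size; with virtual threads, concurrency is bounded only by memory.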

Common Pitfalls and Pinning

A central theme of Piotr’s talk was “pinning,” a performance issue where a virtual thread stays mounted on its carrier thread during a blocking operation, negating the benefits. Pinning occurs when I/O or native calls happen inside synchronized blocks, akin to keeping a taxi running during a lunch break. Piotr demonstrated this with a legacy Elasticsearch client, using Testcontainers and Toxiproxy to simulate slow network calls. By enabling tracing with the -Djdk.tracePinnedThreads=full flag, he identified and resolved pinning issues, replacing synchronized methods with modern, non-blocking clients.

Piotr cautioned against misuses like thread pooling or reusing virtual threads, which disrupt their lightweight design. He advocated for careful monitoring using JFR events to ensure threads remain unpinned, ensuring optimal performance in production environments.
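A sketch of the kind of unpinning fix Piotr's demo points toward: in Java 21, blocking inside a synchronized block pins the carrier thread, while parking on a java.util.concurrent lock does not (the class and method names are invented; running with -Djdk.tracePinnedThreads=full would confirm the difference):

```java
import java.util.concurrent.locks.ReentrantLock;

public class QuoteCache {

    private final ReentrantLock lock = new ReentrantLock();
    private long total;

    // Pinning variant (avoid on virtual threads): blocking I/O inside a
    // synchronized method keeps the virtual thread mounted on its carrier.
    // synchronized long fetchAndAddPinned(long n) { /* blocking call pins */ }

    // Unpinned variant: while this virtual thread blocks under a j.u.c. lock,
    // the scheduler can unmount it and reuse the carrier for other work.
    long fetchAndAdd(long n) {
        lock.lock();
        try {
            try {
                Thread.sleep(5); // stand-in for a slow network call
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            total += n;
            return total;
        } finally {
            lock.unlock();
        }
    }
}
```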

Structured Concurrency and Scoped Values

Piotr explored structured concurrency, a preview feature in Java 21, designed to eliminate thread leaks and cancellation delays. By creating scopes that manage forks, developers can ensure tasks complete or fail together, simplifying error handling. He demonstrated a shutdown-on-failure scope, where a single task failure cancels all others, contrasting this with the complexity of managing interdependent futures.
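A minimal shutdown-on-failure sketch along the lines Piotr demonstrated (a preview API in Java 21, so it must be compiled and run with --enable-preview; the subtask values are invented for illustration):

```java
import java.util.concurrent.StructuredTaskScope;

public class ScopeDemo {

    // Fork two subtasks in one scope: either both succeed, or the first
    // failure shuts the scope down and cancels the sibling.
    static String fetchBoth() {
        try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
            StructuredTaskScope.Subtask<String> user = scope.fork(() -> "user-42");
            StructuredTaskScope.Subtask<String> order = scope.fork(() -> "order-7");
            scope.join().throwIfFailed();   // wait for both, rethrow the first failure
            return user.get() + "/" + order.get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } // leaving the try-with-resources guarantees no subtask outlives the scope
    }
}
```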

Scoped Values, another preview feature, offer an immutable, one-way alternative to thread locals, preventing bugs like data leakage across pooled threads. Piotr illustrated their use in maintaining request context, warning against mutability to preserve reliability. These features, he argued, complement virtual threads, fostering robust, maintainable concurrent applications.
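To make the idea concrete, a sketch of a Scoped Value carrying request context (also a preview API in Java 21 requiring --enable-preview; the REQUEST_ID name and value are mine, not from the talk):

```java
public class RequestContext {

    // A ScopedValue is bound once for a bounded stretch of execution and
    // never mutated: no cleanup calls, no leakage into reused pool threads.
    static final ScopedValue<String> REQUEST_ID = ScopedValue.newInstance();

    static String handle() {
        StringBuilder out = new StringBuilder();
        ScopedValue.where(REQUEST_ID, "req-123")
                   .run(() -> out.append("handling ").append(REQUEST_ID.get()));
        // Outside run(), REQUEST_ID is unbound again.
        return out.toString();
    }
}
```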

Practical Debugging and Best Practices

Through live coding, Piotr showcased how debugging with logging can inadvertently introduce I/O, unmounting virtual threads and degrading performance. He compared this to a concert where logging scatters tasks, reducing completion rates. To mitigate this, he recommended avoiding I/O in critical paths and using structured concurrency for monitoring.

Piotr’s best practices included using framework-specific annotations (e.g., Quarkus, Spring) to enable virtual threads and ensuring tasks are interruptible. He urged developers to test thoroughly, leveraging tools like Testcontainers to simulate real-world conditions. His blog post on testing unpinned threads provides further guidance for practitioners.

Conclusion

Piotr’s presentation was a clarion call to embrace virtual threads with enthusiasm and caution. By understanding their mechanics, avoiding pitfalls like pinning, and leveraging structured concurrency, developers can unlock unprecedented scalability. His engaging analogies and practical demos made complex concepts accessible, empowering attendees to modernize Java applications responsibly. As Java evolves, Piotr’s insights ensure developers remain equipped to navigate its concurrency landscape.


[PHPForumParis2021] Fiber: The Gateway to Asynchronous PHP – Benoit Viguier

Benoit Viguier, a developer at Bedrock, enthralled the Forum PHP 2021 audience with an exploration of PHP 8.1’s Fiber feature, a groundbreaking step toward asynchronous programming. With a history of discussing async development at AFUP events, Benoit shared early experiments with Fibers, positioning them as a future cornerstone of PHP. His talk blended technical insight with forward-thinking optimism, urging developers to embrace this new paradigm. This post covers three themes: understanding Fibers, practical applications, and the need for standards.

Understanding Fibers

Benoit Viguier introduced Fibers as a low-level feature in PHP 8.1, enabling lightweight, cooperative concurrency. Unlike traditional threading, Fibers allow developers to pause and resume execution without blocking the main thread, ideal for I/O-heavy tasks. Drawing on his work at Bedrock, Benoit explained how Fibers extend PHP’s async capabilities, building on libraries like Amphp and ReactPHP. His clear explanation demystified this cutting-edge feature for the audience.

Practical Applications

Delving into practical use cases, Benoit showcased how Fibers enhance performance in applications like Bedrock’s streaming platforms, such as 6play and Salto. By enabling non-blocking HTTP requests and database queries, Fibers reduce latency and improve user experience. Benoit shared early experiments, noting that while Fibers are not yet production-ready, their potential to streamline async workflows is immense, particularly for high-traffic systems requiring real-time responsiveness.

The Need for Standards

Benoit concluded by advocating for a standardized async ecosystem in PHP. He highlighted recent collaborations between Amphp and ReactPHP teams to propose a PSR standard for Fibers, fostering interoperability. By making libraries “Fiber-ready,” developers can create reusable, non-blocking APIs. Benoit’s vision for a unified async framework, inspired by his work at Bedrock, positions Fibers as a potential “killer feature” for PHP, encouraging community contributions to shape its future.


[KotlinConf2018] Mastering Concurrency: Roman Elizarov’s Practical Guide to Kotlin Coroutines

Lecturer

Roman Elizarov is a seasoned software developer at JetBrains, with over 17 years of experience in high-performance trading software at Devexperts. An expert in Java and JVM, he teaches concurrent programming at St. Petersburg ITMO and serves as Chief Judge for the Northeastern European Regional Programming Contest. Relevant links: JetBrains Blog (publications); LinkedIn Profile (professional page).

Abstract

This article follows Roman Elizarov’s practical application of Kotlin coroutines to address concurrency challenges. Set in the context of large-scale systems, it examines methodologies for state confinement and communication via channels. The analysis highlights coroutines’ innovations in eliminating shared mutable state, with implications for robust, scalable architectures.

Introduction and Context

Roman Elizarov engaged KotlinConf 2018 attendees with a deep dive into coroutines, building on his prior introductory talk. With a vision for a unified language across distributed systems, Elizarov showcased coroutines as a solution to concurrency without shared mutable state. His examples addressed real-life coordination, set against his experience with high-throughput trading systems processing millions of events per second.

Methodological Approaches to Concurrency

Elizarov demonstrated coroutines confining state to single coroutines, communicating via channels. Each coroutine handles a specific task, receiving input and sending output through channels, avoiding locks. For UI integration, coroutines on the main thread directly update views or report via channels for decoupled architectures. Builders like launch and async orchestrate tasks, while suspend functions enable non-blocking code.

Analysis of Innovations and Features

Coroutines innovate by simplifying async programming. Channels provide fan-out communication, unlike threads’ shared state. Compared to Java’s CompletableFuture, coroutines preserve sequential code structure. Limitations include a learning curve for channel patterns and ensuring proper context management.

Implications and Consequences

Elizarov’s approach implies cleaner, safer concurrency models, reducing bugs in complex systems. It suits UI-driven apps and distributed systems, enhancing scalability. The consequence is a shift toward channel-based designs, though teams must master coroutine semantics.

Conclusion

Elizarov’s practical guide positions coroutines as a cornerstone for modern concurrency, offering a robust alternative to traditional threading models.


[DevoxxFR2013] Dispelling Performance Myths in Ultra-High-Throughput Systems

Lecturer

Martin Thompson stands as a preeminent authority in high-performance and low-latency engineering, having accumulated over two decades of expertise across transactional and big-data realms spanning automotive, gaming, financial, mobile, and content management sectors. As co-founder and former CTO of LMAX, he now consults globally, championing mechanical sympathy—the harmonious alignment of software with underlying hardware—to craft elegant, high-velocity solutions. His Disruptor framework exemplifies this philosophy.

Abstract

Martin Thompson systematically dismantles entrenched performance misconceptions through rigorous empirical analysis derived from extreme low-latency environments. Spanning Java and C implementations, third-party libraries, concurrency primitives, and operating system interactions, he promulgates a “measure everything” ethos to illuminate genuine bottlenecks. The discourse dissects garbage collection behaviors, logging overheads, parsing inefficiencies, and hardware utilization, furnishing actionable methodologies to engineer systems delivering millions of operations per second at microsecond latencies.

The Primacy of Empirical Validation: Profiling as the Arbiter of Truth

Thompson underscores that anecdotal wisdom often misleads in performance engineering. Comprehensive profiling under production-representative workloads unveils counterintuitive realities, necessitating continuous measurement with tools like perf, VTune, and async-profiler.

He categorizes fallacies into language-specific, library-induced, concurrency-related, and infrastructure-oriented myths, each substantiated by real-world benchmarks.

Garbage Collection Realities: Tuning for Predictability Over Throughput

A pervasive myth asserts that garbage collection pauses are an inescapable tax, best mitigated by throughput-oriented collectors. Thompson counters that Concurrent Mark-Sweep (CMS) consistently achieves sub-10ms pauses in financial trading systems, whereas G1 frequently doubles minor collection durations due to fragmented region evacuation and reference spidering in cache structures.

Strategic heap sizing that limits young-generation promotion, coupled with object pooling on critical paths, minimizes pause variability. Direct ByteBuffers, often touted for zero-copy I/O, incur kernel transition penalties; heap-allocated buffers prove superior for modest payloads.

Code-Level Performance Traps: Parsing, Logging, and Allocation Patterns

Parsing dominates CPU cycles in message-driven architectures. XML and JSON deserialization routinely consumes 30-50% of processing time; binary protocols with zero-copy parsers slash this overhead dramatically.

Synchronous logging cripples latency; asynchronous, lock-free appenders built atop ring buffers sustain millions of events per second. Thompson’s Disruptor-based logger exemplifies this, outperforming traditional frameworks by orders of magnitude.
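As a rough illustration of the pattern (a blocking-queue approximation, not the Disruptor's lock-free ring buffer; the class and sink are mine): the hot path only enqueues, and a single background thread absorbs the slow write.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class AsyncLogger {

    private static final String EOF = "__EOF__"; // sentinel to stop the writer
    private final BlockingQueue<String> buffer = new ArrayBlockingQueue<>(1024);
    private final List<String> sink = new ArrayList<>(); // stand-in for file/socket
    private final Thread writer;

    AsyncLogger() {
        writer = new Thread(() -> {
            try {
                String event;
                while (!(event = buffer.take()).equals(EOF)) {
                    sink.add(event); // the slow write happens off the hot path
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();
    }

    // Hot path: enqueue and return immediately; on overflow, drop (or count)
    // rather than block the business thread.
    boolean log(String event) {
        return buffer.offer(event);
    }

    List<String> close() {
        try {
            buffer.put(EOF);
            writer.join(); // join gives a happens-before edge, so reading sink is safe
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sink;
    }
}
```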

Frequent object allocation triggers premature promotions and GC pressure. Flyweight patterns, preallocation, and stack confinement eliminate heap churn on hot paths.

Concurrency Engineering: Beyond Thread Proliferation

The notion that scaling threads linearly accelerates execution collapses under context-switching and contention costs. Thompson advocates thread affinity to physical cores, aligning counts with hardware topology.

Contended locks serialize execution; lock-free algorithms leveraging compare-and-swap (CAS) preserve parallelism. False sharing—cache line ping-pong between adjacent variables—devastates throughput; 64-byte padding ensures isolation.
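A sketch of the CAS retry loop Thompson describes (in practice AtomicLong.incrementAndGet performs exactly this loop intrinsically; padding hot counters onto separate cache lines would additionally guard against false sharing):

```java
import java.util.concurrent.atomic.AtomicLong;

public class CasCounter {

    private final AtomicLong value = new AtomicLong();

    // Optimistic, lock-free increment: read, compute, and retry the
    // compare-and-swap if another thread won the race, instead of making
    // every thread queue behind a single contended lock.
    long increment() {
        long current;
        do {
            current = value.get();
        } while (!value.compareAndSet(current, current + 1));
        return current + 1;
    }
}
```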

Infrastructure Optimization: OS, Network, and Storage Synergy

Operating system tuning involves interrupt coalescing, huge pages to reduce TLB misses, and scheduler affinity. Network kernel bypass (e.g., Solarflare OpenOnload) shaves microseconds from round-trip times.

Storage demands asynchronous I/O and batching; fsync calls must be minimized or offloaded to dedicated threads. SSD sequential writes eclipse HDDs, but random access patterns require careful buffering.

Cultural and Methodological Shifts for Sustained Performance

Thompson exhorts engineering teams to institutionalize profiling, automate benchmarks, and challenge assumptions relentlessly. The Disruptor’s single-writer principle, mechanical sympathy, and batching yield over six million operations per second on commodity hardware.

Performance is not an afterthought but an architectural cornerstone, demanding cross-disciplinary hardware-software coherence.


[DevoxxFR2013] Java EE 7 in Detail

Lecturer

David Delabassee is a Principal Product Manager in Oracle’s GlassFish team. Previously at Sun for a decade, he focused on end-to-end Java, related technologies, and tools. Based in Belgium, he contributes to Devoxx Belgium’s steering committee.

Abstract

David Delabassee’s overview details Java EE 7’s innovations, emphasizing developer simplicity and HTML5 support. Covering WebSockets, JSON-P, JAX-RS 2, JMS 2, concurrency, caching, and batch processing, he demonstrates features via GlassFish. The analysis explores alignments with modern needs like cloud and modularity, implications for productivity, and forward compatibility.

Evolution and Key Themes: Simplifying Development and Embracing Modern Web

Delabassee notes the popularity of Java EE 6 (2009), with widespread application-server adoption. Java EE 7, nearing finalization, builds on this via the JCP, comprising 13 updated and 4 new specifications.

Themes: ease of development (defaults, pruning), web enhancements (HTML5 via WebSockets), alignment with trends (cloud, multi-tenancy). Pruning removes outdated techs like EJB CMP; new APIs address gaps.

GlassFish 4, the reference implementation, enables early testing. Delabassee demos features, stressing community feedback.

Core API Enhancements: WebSockets, JSON, and REST Improvements

WebSocket (JSR 356): Enables full-duplex, bidirectional communication over single TCP. Annotate endpoints (@ServerEndpoint), handle messages (@OnMessage).

@ServerEndpoint("/echo")
public class EchoEndpoint {
    @OnMessage
    public void echo(String message, Session session) throws IOException {
        // sendText is a blocking send and may throw IOException
        session.getBasicRemote().sendText(message);
    }
}

JSON-P (JSR 353): Parsing/processing API with streaming, object models. Complements JAX-RS for RESTful services.

JAX-RS 2 (JSR 339): Client API, filters/interceptors, async support. Client example:

Client client = ClientBuilder.newClient();
WebTarget target = client.target("http://example.com");
Response response = target.request().get();

These foster efficient, modern web apps.

Messaging and Concurrency: JMS 2 and Utilities for EE

JMS 2 simplifies: annotation-based injection (@JMSConnectionFactory), simplified API for sending/receiving.

@Inject
private JMSContext context;

@Resource(lookup = "myQueue")
private Queue queue;

public void sendMessage(String message) {
    // JMSContext has no direct send method; create a producer first
    context.createProducer().send(queue, message);
}

Concurrency Utilities (JSR 236): Managed executors, scheduled tasks in EE context. Propagate context to threads, avoiding direct Thread creation.

Batch Applications (JSR 352): Framework for chunk/step processing, job management. XML-defined jobs with readers, processors, writers.

Additional Features and Future Outlook: Caching, CDI, and Java EE 8

Though JCache (JSR 107) was deferred from the platform, it enables standardized distributed caching and remains usable alongside EE 7.

CDI 1.1 enhances: @Vetoed for exclusions, alternatives activation.

Java EE 8 plans: modularity, cloud (PaaS/SaaS), further HTML5. Community shapes via surveys.

Delabassee urges Adopt-a-JSR participation for influence.

Implications for Enterprise Development: Productivity and Adaptability

Java EE 7 boosts productivity via simplifications, aligns with web/cloud via new APIs. Demos show practical integration, like WebSocket chats or batch jobs.

Challenges: Learning curve for new features; benefits outweigh via robust, scalable apps.

Forward, EE 7 paves for EE 8’s evolutions, ensuring Java’s enterprise relevance.


[DevoxxFR2013] NIO, Not So Simple?

Lecturer

Emmanuel Lecharny is a member of the Apache Software Foundation, contributing to projects like Apache Directory Server and Apache MINA. He also mentors incubating projects such as Deft and Syncope. As founder of his own company, he collaborates on OpenLDAP development through partnerships.

Abstract

Emmanuel Lecharny’s presentation delves into the intricacies of network input/output (NIO) in Java, contrasting it with blocking I/O (BIO) and asynchronous I/O (AIO). Through detailed explanations and code examples, he explores concurrency management, scalability, encoding/decoding, and performance in building efficient servers using Apache MINA. The talk emphasizes practical challenges and solutions, advocating framework use to simplify complex implementations while highlighting system-level considerations like buffers and selectors.

Fundamentals of I/O Models: BIO, NIO, and AIO Compared

Lecharny begins by outlining the three primary I/O paradigms in Java: blocking I/O (BIO), non-blocking I/O (NIO), and asynchronous I/O (AIO). BIO, the traditional model, assigns a thread per connection, blocking until data arrives. This simplicity suits low-connection scenarios but falters under high load, as threads consume resources—up to 1MB stack each—leading to context switching overhead.

NIO introduces selectors and channels, enabling a single thread to monitor multiple connections via events like OP_READ or OP_WRITE. This non-blocking approach scales better, handling thousands of connections without proportional threads. However, it requires manual state management, as partial reads/writes necessitate buffering.
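The event-driven model above can be condensed into a single selector poll (an illustration, not code from the talk; a real server would loop on select() and dispatch continuously):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;

public class SelectorDemo {

    // One selector, one thread, many channels: register interest in events,
    // then poll and dispatch instead of dedicating a thread per connection.
    static int pollOnce() {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.configureBlocking(false);        // mandatory before register()
            server.bind(new InetSocketAddress(0));  // ephemeral port
            server.register(selector, SelectionKey.OP_ACCEPT);

            int ready = selector.selectNow();       // non-blocking poll
            for (SelectionKey key : selector.selectedKeys()) {
                if (key.isAcceptable()) { /* accept, then register OP_READ */ }
                if (key.isReadable())   { /* read into a ByteBuffer */ }
            }
            selector.selectedKeys().clear();        // must clear after handling
            return ready;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```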

AIO, added in Java 7, builds on NIO with callbacks or futures for completion notifications, reducing polling needs. Yet, it demands careful handler design to avoid blocking the callback thread, often necessitating additional threading for processing.

These models address concurrency differently: BIO is straightforward but resource-intensive; NIO offers efficiency through event-driven multiplexing; AIO provides true asynchrony but with added complexity in callback handling.

Building Scalable Servers with Apache MINA: Core Components and Configuration

Apache MINA simplifies NIO/AIO development by abstracting low-level details. Lecharny demonstrates a basic UDP server: instantiate IoAcceptor, bind to a port, and set a handler for messages. The framework manages buffers, threading, and protocol encoding/decoding.

Key components include IoService (for acceptors/connectors), IoHandler (for events like messageReceived), and filters (e.g., logging, protocol codecs). Configuration involves thread pools: one for I/O (typically one thread suffices due to selectors), another for application logic to prevent blocking.

Scalability hinges on proper setup: use direct buffers for large data to avoid JVM heap copies, but heap buffers for small payloads in Java 7 for speed. MINA’s executor filter offloads heavy computations, maintaining responsiveness.

Code example:

DatagramAcceptor acceptor = new NioDatagramAcceptor();
acceptor.setHandler(new MyHandler());
SocketAddress address = new InetSocketAddress(port);
acceptor.bind(address);

This binds a UDP acceptor, ready for incoming datagrams.

Handling Data: Encoding, Decoding, and Buffer Management

Encoding/decoding is pivotal; MINA’s ProtocolCodecFilter uses encoders/decoders for byte-to-object conversion. Lecharny explains cumulative decoding for fragmented messages: maintain a buffer, append incoming data, and decode when complete (e.g., via length prefixes).

Buffers in NIO are crucial: ByteBuffer for data storage, with position, limit, and capacity. Direct buffers (allocateDirect) bypass JVM heap for zero-copy I/O, ideal for large transfers, but allocation is costlier. Heap buffers (allocate) are faster for small sizes.

Performance tests show Java 7 heap buffers outperforming direct ones up to 64KB; beyond, direct excels. UDP limits (64KB max) favor heap buffers.

Partial writes require looping until completion, tracking written bytes. MINA abstracts this, but understanding underlies effective use.
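A partial-write loop might look like the following sketch of what MINA handles internally (with a truly non-blocking socket one would register OP_WRITE interest rather than spin, since write can return 0):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

public class WriteLoop {

    // A write may flush only part of the buffer; keep writing until the
    // buffer's position reaches its limit, tracking bytes actually sent.
    static int writeFully(WritableByteChannel channel, ByteBuffer buf) {
        try {
            int written = 0;
            while (buf.hasRemaining()) {
                written += channel.write(buf); // advances buf.position()
            }
            return written;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```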

public class LengthPrefixedDecoder extends CumulativeProtocolDecoder {
    @Override
    protected boolean doDecode(IoSession session, IoBuffer in, ProtocolDecoderOutput out) {
        if (in.remaining() < 4) return false;
        in.mark();                  // remember position before consuming the header
        int length = in.getInt();
        if (in.remaining() < length) {
            in.reset();             // rewind so the header is re-read when more data arrives
            return false;
        }
        byte[] payload = new byte[length];
        in.get(payload);
        out.write(payload);         // emit the decoded message
        return true;
    }
}

This decoder checks for complete messages via prefixed length.

Concurrency and Performance Optimization in High-Load Scenarios

Concurrency management involves separating I/O from processing: MINA’s single I/O thread uses selectors for event polling, dispatching to worker pools. Avoid blocking in handlers; use executors for database queries or computations.

Scalability tests: on a quad-core machine, MINA handles 10,000+ connections efficiently. UDP benchmarks show Java 7 20-30% faster than Java 6, nearing native speeds. TCP may lag BIO slightly due to overhead, but NIO/AIO shine in connection volume.

Common pitfalls: over-allocating threads (match to cores), ignoring backpressure (queue overloads), and poor buffer sizing. Monitor via JMX: MINA exposes metrics for queued events, throughput.

Lecharny stresses: network rarely bottlenecks; focus on application I/O (databases, disks). 10Gbps networks outpace SSDs, so optimize backend.

Practical Examples: From Simple Servers to Real-World Applications

Lecharny presents realistic servers: a basic echo server with MINA requires minimal code—set acceptor, handler, bind. For protocols like LDAP, integrate codecs for ASN.1 encoding.

In Directory Server, NIO enables handling massive concurrent searches without thread explosion. MINA’s modularity allows stacking filters: SSL for security, compression for efficiency.

For UDP-based services (e.g., DNS), similar setup but with DatagramAcceptor. Handle datagram fragmentation manually if exceeding MTU.

AIO variant: Use AsyncIoAcceptor with CompletionHandlers for callbacks, reducing selector polling.

These examples illustrate MINA’s brevity: functional servers in under 50 lines, versus hundreds in raw NIO.

Implications and Recommendations for NIO Adoption

NIO/AIO demand understanding OS-level mechanics: epoll (Linux) vs. kqueue (BSD) for selectors, impacting portability. Java abstracts this, but edge cases (e.g., IPv6) require vigilance.

Performance gains are situational: BIO suffices for <1000 connections; NIO for scalability. Frameworks like MINA or Netty mitigate complexity, encapsulating best practices.

Lecharny concludes: embrace frameworks to avoid reinventing; comprehend fundamentals for troubleshooting. Java 7+ enhancements make NIO more viable, but test rigorously under load.
