Posts Tagged ‘HighPerformance’
[SpringIO2024] Mind the Gap: Connecting High-Performance Systems at a Leading Crypto Exchange @ Spring I/O 2024
At Spring I/O 2024, Marcos Maia and Lars Werkman from Bitvavo, Europe’s leading cryptocurrency exchange, unveiled the architectural intricacies of their high-performance trading platform. Based in the Netherlands, Bitvavo processes thousands of transactions per second with sub-millisecond latency. Marcos and Lars detailed how they integrate ultra-low-latency systems with Spring Boot applications, offering a deep dive into their strategies for scalability and performance. Their talk, rich with technical insights, challenged conventional software practices, urging developers to rethink performance optimization.
Architecting for Ultra-Low Latency
Marcos opened by highlighting Bitvavo’s mission to enable seamless crypto trading for nearly two million customers. The exchange’s hot path, where orders are processed, demands microsecond response times. To achieve this, Bitvavo employs the Aeron framework, an open-source tool designed for high-performance messaging. By using memory-mapped files, UDP-based communication, and lock-free algorithms, the platform minimizes latency. Marcos explained how they bypass traditional databases, opting for in-memory processing with eventual disk synchronization, ensuring deterministic outcomes critical for trading fairness.
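To make the shared-memory, lock-free style of messaging concrete, the sketch below shows a single-producer, single-consumer ring buffer living in a memory-mapped file. It is not Aeron's API and not Bitvavo's code (their hot path is built on Aeron and the JVM); the layout and names are invented purely to illustrate the mechanism.
#include <fcntl.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define SLOTS 1024                    /* fixed number of message slots      */
#define SLOT_SIZE 256                 /* bytes per message                  */

typedef struct {
    _Atomic uint64_t head;            /* next slot the producer will write  */
    _Atomic uint64_t tail;            /* next slot the consumer will read   */
    char slots[SLOTS][SLOT_SIZE];
} ring_t;

/* Producer and consumer processes map the same file and share the ring. */
static ring_t *ring_map(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) return NULL;
    if (ftruncate(fd, sizeof(ring_t)) != 0) { close(fd); return NULL; }
    void *p = mmap(NULL, sizeof(ring_t), PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    return p == MAP_FAILED ? NULL : (ring_t *)p;
}

/* Producer: copy the message into the next slot, then publish it by
   advancing head with release ordering. No locks, no system calls. */
static int ring_offer(ring_t *r, const void *msg, size_t len)
{
    uint64_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint64_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == SLOTS || len > SLOT_SIZE) return 0;    /* ring is full */
    memcpy(r->slots[head % SLOTS], msg, len);
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return 1;
}

/* Consumer: read the oldest unconsumed slot, then advance tail. */
static int ring_poll(ring_t *r, void *out)
{
    uint64_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint64_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail == head) return 0;                               /* ring is empty */
    memcpy(out, r->slots[tail % SLOTS], SLOT_SIZE);
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return 1;
}
The acquire/release pairing is what lets the two sides exchange messages without ever taking a lock or entering the kernel, which is the property the hot path depends on.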
Optimizing the Hot Path
The hot path’s design is uncompromising, as Marcos elaborated. Bitvavo avoids garbage collection by preallocating and reusing objects, ensuring predictable memory usage. Single-threaded processing, counterintuitive to many, leverages CPU caches for nanosecond-level performance. The platform uses distributed state machines, guaranteeing consistent outputs across executions. Lars complemented this by discussing inter-process communication via shared memory and DPDK for kernel-bypassing network operations. These techniques, rooted in decades of trading system expertise, enable Bitvavo to handle peak loads of 30,000 transactions per second.
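The preallocate-and-reuse pattern Marcos described translates naturally into a free-list pool. The C sketch below is illustrative only (Bitvavo's implementation lives in Java); it shows how acquiring and releasing an order object never touches the allocator once the pool is initialized.
#include <stddef.h>

#define POOL_SIZE 4096

typedef struct order {
    long id;
    long price;
    long quantity;
    struct order *next_free;        /* intrusive free-list link            */
} order_t;

static order_t pool[POOL_SIZE];     /* every object preallocated up front  */
static order_t *free_list;          /* head of the reusable objects        */

static void pool_init(void)
{
    for (size_t i = 0; i < POOL_SIZE - 1; i++)
        pool[i].next_free = &pool[i + 1];
    pool[POOL_SIZE - 1].next_free = NULL;
    free_list = &pool[0];
}

/* Acquire never allocates: it just pops from the free list. */
static order_t *order_acquire(void)
{
    order_t *o = free_list;
    if (o) free_list = o->next_free;
    return o;                       /* NULL means the pool is exhausted    */
}

/* Release puts the object back for reuse instead of freeing it. */
static void order_release(order_t *o)
{
    o->next_free = free_list;
    free_list = o;
}
Because memory is never allocated or freed on the hot path, latency stays predictable and, in the Java case, the garbage collector has nothing to do.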
Bridging with Spring Boot
Integrating high-performance systems with the broader organization poses significant challenges. Marcos detailed the “cold sink,” a Spring Boot application that consumes data from the hot path’s Aeron archive, feeding it into Kafka and MySQL for downstream processing. By batching requests and using object pools, the cold sink minimizes garbage collection, maintaining performance under heavy loads. Fine-tuning batch sizes and applying backpressure ensure the system keeps pace with the hot path’s output, preventing data lags in Bitvavo’s 24/7 operations.
Enhancing JWT Signing Performance
Lars concluded with a case study on optimizing JWT token signing, a “warm path” process targeting sub-millisecond latency. Initially, their RSA-based signing took 8.8 milliseconds, far from the goal. By switching to symmetric HMAC signing and adopting Azul Prime’s JVM, they achieved a 30x performance boost, reaching 260-280 microsecond response times. Lars emphasized the importance of benchmarking with JMH and leveraging Azul features such as the Falcon JIT compiler for stable throughput. This optimization underscores Bitvavo’s commitment to performance across all system layers.
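As a rough illustration of why the switch pays off, the sketch below signs a JWT-style "header.payload" string with HMAC-SHA256 via OpenSSL: a single keyed hash replaces an RSA private-key operation. It is not Bitvavo's code (their service runs on the JVM); the key and token values are invented for the example.
#include <openssl/evp.h>
#include <openssl/hmac.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    const unsigned char secret[] = "demo-secret-key";          /* illustrative key      */
    const char *signing_input =
        "eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJ1c2VyLTEifQ";          /* header.payload        */

    unsigned char mac[EVP_MAX_MD_SIZE];
    unsigned int mac_len = 0;

    /* One keyed hash over the signing input replaces the RSA private-key operation. */
    HMAC(EVP_sha256(), secret, (int)sizeof(secret) - 1,
         (const unsigned char *)signing_input, strlen(signing_input),
         mac, &mac_len);

    printf("HMAC-SHA256 signature: %u bytes\n", mac_len);
    return 0;
}
In practice a JWT library handles the header, claims, and Base64URL encoding; the point is simply that the symmetric primitive is far cheaper per call than an RSA signature.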
[DevoxxFR2013] The Lightning Memory-Mapped Database: A Revolutionary Approach to High-Performance Key-Value Storage
Lecturer
Howard Chu stands as one of the most influential figures in open-source systems programming, with a career that spans decades of foundational contributions to the software ecosystem. As the founder and Chief Technology Officer of Symas Corporation, he has dedicated himself to building robust, high-performance solutions for enterprise and embedded environments. His involvement with the OpenLDAP Project began in 1999, and by 2007 he had assumed the role of Chief Architect, guiding the project through significant scalability improvements that power directory services for millions of users worldwide. Chu’s earlier work in the 1980s on GNU tools and his invention of parallel make — a feature that enables concurrent compilation of source files and is now ubiquitous in build systems like GNU Make — demonstrates his deep understanding of system-level optimization and concurrency. The creation of the Lightning Memory-Mapped Database (LMDB) emerged directly from the practical challenges faced in OpenLDAP, where existing storage backends like Berkeley DB introduced unacceptable overhead in data copying, deadlock-prone locking, and maintenance-intensive compaction. Chu’s design philosophy emphasizes simplicity, zero-copy operations, and alignment with modern hardware capabilities, resulting in a database library that not only outperforms its predecessors by orders of magnitude but also fits entirely within a typical CPU’s L1 cache at just 32KB of object code. His ongoing work continues to influence a broad range of projects, from authentication systems to mobile applications, cementing his legacy as a pioneer in efficient, reliable data management.
Abstract
The Lightning Memory-Mapped Database (LMDB) represents a paradigm shift in embedded key-value storage, engineered by Howard Chu to address the critical performance bottlenecks encountered in the OpenLDAP Project when using Berkeley DB. This presentation provides an exhaustive examination of LMDB’s design principles, architectural innovations, and operational characteristics, demonstrating why it deserves the moniker “lightning.” Chu begins with a detailed analysis of Berkeley DB’s shortcomings — including data copying between kernel and user space, lock-based concurrency leading to deadlocks, and periodic compaction requirements — and contrasts these with LMDB’s solutions: direct memory mapping via POSIX mmap() for zero-copy access, append-only writes for instantaneous crash recovery, and multi-version concurrency control (MVCC) for lock-free, linearly scaling reads. He presents comprehensive benchmarks showing read throughput scaling perfectly with CPU cores, write performance exceeding SQLite by a factor of twenty, and a library footprint so compact that it executes entirely within L1 cache. The session includes an in-depth API walkthrough, transactional semantics, support for sorted duplicates, and real-world integrations with OpenLDAP, Cyrus SASL, Heimdal Kerberos, SQLite, and OpenDKIM. Attendees gain a complete understanding of how LMDB achieves unprecedented efficiency, simplicity, and reliability, making it the ideal choice for performance-critical applications ranging from embedded devices to high-throughput enterprise systems.
The Imperative for Change: Berkeley DB’s Limitations in High-Performance Directory Services
The development of LMDB was not an academic exercise but a direct response to the real-world constraints imposed by Berkeley DB in the OpenLDAP environment. OpenLDAP, as a mission-critical directory service, demands sub-millisecond response times for millions of authentication and authorization queries daily. Berkeley DB, while robust, introduced several fundamental inefficiencies that became unacceptable under such loads.
The most significant issue was data copying overhead. Berkeley DB maintained its own page cache in user space, requiring data to be copied from kernel buffers to this cache and then again to the application buffer — a process that violated the zero-copy principle essential for minimizing latency in I/O-bound operations. This double-copy penalty became particularly egregious with modern solid-state drives and multi-core processors, where memory bandwidth is often the primary bottleneck.
Another critical flaw was lock-based concurrency. Berkeley DB’s fine-grained locking mechanism, while theoretically sound, frequently resulted in deadlocks under high contention, especially in multi-threaded LDAP servers handling concurrent modifications. The overhead of lock management and deadlock detection negated much of the benefit of parallel processing.
Finally, compaction and maintenance represented an operational burden. Berkeley DB required periodic compaction to reclaim space from deleted records, a process that could lock the database for minutes or hours in large installations, rendering the system unavailable during peak usage periods.
These limitations collectively threatened OpenLDAP’s ability to scale with growing enterprise demands, prompting Chu to design a completely new storage backend from first principles.
Architectural Foundations: Memory Mapping, Append-Only Writes, and Flattened B+Trees
LMDB’s architecture is built on three core innovations that work in concert to eliminate the aforementioned bottlenecks.
The first and most fundamental is direct memory mapping using POSIX mmap(). Rather than maintaining a separate cache, LMDB maps the entire database file directly into the process’s virtual address space. This allows data to be accessed via pointers with zero copying — the operating system’s virtual memory manager handles paging transparently. This approach leverages decades of OS optimization for memory management while eliminating the complexity and overhead of a user-space cache.
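A minimal sketch of the underlying mechanism, independent of LMDB itself: map a file, then read it through a pointer while the kernel pages data in on demand. The file name is arbitrary and error handling is kept to the bare minimum.
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("data.mdb", O_RDONLY);                /* any existing file */
    struct stat st;
    if (fd < 0 || fstat(fd, &st) != 0) return 1;

    /* The mapping itself is the "cache": no user-space buffer, no copy. */
    const unsigned char *base = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) return 1;

    printf("first byte: 0x%02x\n", base[0]);            /* direct pointer access */

    munmap((void *)base, st.st_size);
    close(fd);
    return 0;
}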
The second innovation is append-only write semantics. When a transaction modifies data, LMDB does not update pages in place. Instead, it appends new versions of modified pages to the end of the file and updates the root pointer atomically using msync(). This design yields instantaneous crash recovery: in the event of a system failure, the previous root pointer remains valid, and no log replay or checkpoint recovery is required. The append-only model also naturally supports Multi-Version Concurrency Control (MVCC), where readers access a consistent snapshot of the database without acquiring locks, while writers operate on private copies of pages.
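Conceptually, a commit can be sketched as follows: append the new page version, flush it, and only then publish the new root by overwriting one of two alternating meta slots. The layout, offsets, and function below are simplified inventions for illustration; LMDB's real on-disk format differs.
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

#define PAGE 4096

typedef struct { uint64_t txn_id; uint64_t root_page; } meta_t;

/* 'map' is the writable, already-sized memory mapping of the database file;
   'file_pages' is its current length in pages. Returns the new length. */
static uint64_t commit(unsigned char *map, uint64_t file_pages,
                       const void *new_root_page, uint64_t txn_id)
{
    /* 1. Append the new version of the root page at the end of the file. */
    uint64_t new_page_no = file_pages;
    memcpy(map + new_page_no * PAGE, new_root_page, PAGE);
    msync(map + new_page_no * PAGE, PAGE, MS_SYNC);

    /* 2. Publish the new root by overwriting the alternate meta slot
          (pages 0 and 1 alternate per transaction); the other slot still
          points at the previous, fully valid root. */
    meta_t *meta = (meta_t *)(map + (txn_id % 2) * PAGE);
    meta->txn_id = txn_id;
    meta->root_page = new_page_no;
    msync(map + (txn_id % 2) * PAGE, PAGE, MS_SYNC);

    return file_pages + 1;
}
If the process dies between the two steps, the untouched meta slot still points at the old root, which is why no recovery pass is needed at startup.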
The third architectural choice is a flattened B+tree structure. Traditional B+tree implementations traverse several levels of internal nodes, each potentially requiring an additional page fetch. LMDB keeps the tree shallow by storing all data in leaf pages, with internal nodes holding only keys and child pointers, which minimizes the number of page fetches required for lookup operations. Keys within pages are maintained in sorted order, enabling efficient range scans and supporting sorted duplicates for multi-valued attributes common in directory schemas.
API Design: Simplicity and Power in Harmony
Despite its sophisticated internals, LMDB’s API is remarkably concise and intuitive, reflecting Chu’s philosophy that complexity should be encapsulated, not exposed. The core operations fit within a handful of functions:
#include <lmdb.h>
MDB_env *env;
mdb_env_create(&env);
mdb_env_set_maxdbs(env, 2);                       /* needed before opening named databases */
mdb_env_set_mapsize(env, 10485760);               /* 10 MB initial map size */
mdb_env_open(env, "./mydb", MDB_NOSUBDIR, 0664);  /* single data file, not a directory */
MDB_txn *txn;
mdb_txn_begin(env, NULL, 0, &txn);                /* read-write transaction */
MDB_dbi dbi;
mdb_dbi_open(txn, "mytable", MDB_CREATE, &dbi);   /* named database, created on demand */
MDB_val key  = { .mv_size = 5, .mv_data = (void *)"hello" };
MDB_val data = { .mv_size = 5, .mv_data = (void *)"world" };
mdb_put(txn, dbi, &key, &data, 0);                /* store key -> value */
mdb_txn_commit(txn);                              /* durable once this returns */
For databases supporting duplicate values:
MDB_val data1 = { .mv_size = 5, .mv_data = (void *)"alpha" };
MDB_val data2 = { .mv_size = 4, .mv_data = (void *)"beta" };
mdb_dbi_open(txn, "multival", MDB_CREATE | MDB_DUPSORT, &dbi);  /* sorted duplicate values per key */
mdb_put(txn, dbi, &key, &data1, 0);
mdb_put(txn, dbi, &key, &data2, MDB_APPENDDUP);  /* caller guarantees data2 sorts after data1, so the comparison is skipped */
The API supports full ACID transactions with nested transactions, cursor-based iteration, and range queries. Error handling is straightforward, with return codes indicating success or specific failure conditions.
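Continuing the fragment above, a cursor-based range scan positions at the first key greater than or equal to a search key and walks forward in sorted order (a sketch; error handling omitted):
MDB_cursor *cursor;
MDB_val k = { .mv_size = 5, .mv_data = (void *)"hello" };
MDB_val v;
size_t entries = 0;
mdb_txn_begin(env, NULL, MDB_RDONLY, &txn);       /* consistent snapshot; never blocks the writer */
mdb_dbi_open(txn, "mytable", 0, &dbi);
mdb_cursor_open(txn, dbi, &cursor);
if (mdb_cursor_get(cursor, &k, &v, MDB_SET_RANGE) == 0) {   /* first key >= "hello" */
    do {
        entries++;                                /* k.mv_data and v.mv_data point into the mapped file */
    } while (mdb_cursor_get(cursor, &k, &v, MDB_NEXT) == 0);
}
mdb_cursor_close(cursor);
mdb_txn_abort(txn);                               /* read-only transactions are simply aborted */
Because the returned MDB_val structures point directly into the memory map, the scan itself performs no copies.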
Performance Characteristics: Linear Scaling and Unparalleled Efficiency
LMDB’s performance profile is nothing short of revolutionary, particularly in read-heavy workloads. Benchmarks conducted by Chu demonstrate:
- Read scaling: Perfect linear scaling with CPU cores, achieving over 1.5 million operations per second on an 8-core system. This is possible because readers never contend for locks and operate on consistent snapshots.
- Write performance: Approximately 100,000 operations per second, compared with roughly 5,000 for Berkeley DB and a similar figure for SQLite, a 20x improvement.
- Memory efficiency: The shared memory mapping means multiple processes accessing the same database share physical RAM, dramatically reducing per-process memory footprint.
- Cache residency: At 32KB of object code, the entire library fits in L1 cache, eliminating instruction cache misses during operation.
These metrics translate directly to real-world gains. OpenLDAP with LMDB handles 10 times more queries per second than with Berkeley DB, while SQLite gains a 2x speedup when using LMDB as a backend.
Operational Excellence: Zero Maintenance and Instant Recovery
LMDB eliminates the operational overhead that plagues traditional databases. There is no compaction, vacuuming, or index rebuilding required. The database file grows only with actual data, and deleted records are reclaimed automatically during transaction commits. The map size, specified at environment creation, can be increased dynamically without restarting the application.
Crash recovery is instantaneous — the last committed root pointer is always valid, and no transaction log replay is needed. This makes LMDB ideal for embedded systems and mobile devices where reliability and quick startup are paramount.
Concurrency Model: One Writer, Unlimited Readers
LMDB enforces a strict concurrency model: one writer at a time, unlimited concurrent readers. This design choice, while seemingly restrictive, actually improves performance. Chu’s testing revealed that even with Berkeley DB’s multi-writer support, serializing all writers behind a single global lock increased write throughput. The overhead of managing multiple concurrent writers, with its deadlock detection, lock escalation, and cache invalidation, often outweighs the benefits.
For applications requiring multiple writers, separate LMDB environments can be used, or higher-level coordination (e.g., via a message queue) can serialize write access. An experimental patch exists to allow concurrent writes to multiple databases within a single environment, but the single-writer model remains the recommended approach for maximum performance.
Ecosystem Integrations: Powering Critical Infrastructure
LMDB’s versatility is evident in its adoption across diverse projects:
- OpenLDAP: The primary motivation, enabling directory servers to handle millions of entries with sub-millisecond latency
- Cyrus SASL: Efficient storage of authentication credentials
- Heimdal Kerberos: High-throughput ticket management in distributed authentication
- SQLite: As a backend, providing embedded SQL with LMDB’s speed and reliability
- OpenDKIM: Accelerating domain key lookups for email authentication
These integrations demonstrate LMDB’s ability to serve as a drop-in replacement for slower, more complex storage engines.
Future Directions: Replication and Distributed Systems
While LMDB focuses on local storage, Chu envisions its use as a high-performance backend for distributed NoSQL systems like Riak and HyperDex, which provide native replication. This separation of concerns allows LMDB to excel at what it does best — ultra-fast, reliable local access — while leveraging other systems for network coordination.
The library’s compact size and zero-dependency design make it particularly attractive for edge computing, IoT devices, and mobile applications, where resource constraints are severe.
Conclusion: Redefining the Possible in Database Design
The Lightning Memory-Mapped Database represents a triumph of focused engineering over feature bloat. By ruthlessly optimizing for the common case — read-heavy workloads with occasional writes — and leveraging modern OS capabilities like mmap(), Howard Chu created a storage engine that is simultaneously faster, simpler, and more reliable than its predecessors. LMDB proves that sometimes the most revolutionary advances come not from adding features, but from removing complexity. For any application where performance, reliability, and simplicity matter, LMDB is not just an option — it is the new standard.