From JMS and Message Queues to Kafka Streams: Why Kafka Had to Be Invented
For decades, enterprise systems relied on message queues and JMS-based brokers to decouple applications and ensure reliable communication. Technologies such as IBM MQ, ActiveMQ, and later RabbitMQ solved an important problem: how to move messages safely from one system to another without tight coupling.
However, as systems grew larger, more distributed, and more data-driven, the limitations of this model became increasingly apparent. Kafka, and later Kafka Streams, did not emerge because JMS and MQ were poorly designed. They emerged because those technologies were built for a different era and a different class of problems.
What JMS and MQ Were Designed to Do
Traditional message brokers focus on delivery. A producer sends a message, the broker stores it temporarily, and a consumer receives it. Once the message is acknowledged, it is typically removed. The broker’s primary responsibility is to guarantee that messages are delivered reliably and, in some cases, transactionally.
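To make this concrete, here is a minimal point-to-point sketch using the JMS 1.1 API, assuming a local ActiveMQ broker; the broker URL, queue name, and message text are all illustrative.

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.Message;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;
import org.apache.activemq.ActiveMQConnectionFactory;

public class QueueRoundTrip {
    public static void main(String[] args) throws Exception {
        // Illustrative URL; assumes an ActiveMQ broker running locally.
        ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection connection = factory.createConnection();
        connection.start();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Queue queue = session.createQueue("orders");

        // Producer: the broker holds the message until a consumer takes it.
        session.createProducer(queue)
               .send(session.createTextMessage("order-42 submitted"));

        // Consumer: with AUTO_ACKNOWLEDGE, the message is acknowledged as
        // receive() returns, and the broker then deletes it. A second
        // receive() on the same queue would find nothing.
        Message received = session.createConsumer(queue).receive(1000);
        System.out.println(((TextMessage) received).getText());

        connection.close();
    }
}
```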
This model works very well for command-style interactions such as order submission, workflow orchestration, and request-driven integration between systems. Messages are transient by design, consumers are expected to be online, and the system’s success is measured by how quickly and reliably messages move through it.
For many years, this was sufficient.
The Problems That Started to Appear
As companies began operating at internet scale, the assumptions underlying JMS and MQ started to break down. Data volumes increased dramatically, and systems needed to handle not thousands, but millions of events per second. Message brokers that tracked delivery state per consumer became bottlenecks, both technically and operationally.
More importantly, the nature of the data changed. Events were no longer just instructions to be executed and discarded. They became facts: user actions, transactions, logs, metrics, and behavioral signals that needed to be stored, analyzed, and revisited.
With JMS and MQ, once a message was consumed, it was gone. Reprocessing required complex duplication strategies or external storage. Adding a new consumer meant replaying data manually, if it was even possible. The broker was optimized for delivery, not for history.
At the same time, architectures became more decoupled. Multiple teams wanted to consume the same data independently, at their own pace, and for different purposes. In a traditional queue-based system, this required copying messages or creating parallel queues, increasing cost and complexity.
These pressures revealed a fundamental mismatch between what message queues were built for and what modern systems required.
The Conceptual Shift That Led to Kafka
Kafka was created to answer a different question. Instead of asking how to deliver messages efficiently, its designers asked how to store events reliably at scale and allow many consumers to read them independently.
The key idea was deceptively simple: treat data as an append-only log. Producers write events to a log, and consumers read from that log at their own pace. Events are not deleted when consumed. They are retained for a configurable period, or even indefinitely.
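The write side of this model can be sketched with the official Kafka Java client; the bootstrap address and topic name here are assumptions for illustration.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class LogProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // illustrative address
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Each record is appended to a partition of the "page-views" log.
            // It stays there for the topic's retention period, no matter how
            // many consumers have already read it.
            producer.send(new ProducerRecord<>("page-views", "user-17", "viewed /pricing"));
        }
    }
}
```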
In this model, the broker no longer tracks who consumed what. Each consumer keeps track of its own position. This small change eliminates a major scalability bottleneck and makes replay a natural operation rather than an exceptional one.
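The read side can be sketched with the same client: the group.id is the consumer's own bookmark, and rewinding it to replay history is an ordinary API call rather than a broker-side operation. The group name and topic are again illustrative.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ReplayConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // illustrative address
        props.put("group.id", "audit-replay");            // this group's own position
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("page-views"));
            consumer.poll(Duration.ofSeconds(1));            // join the group, get partitions
            consumer.seekToBeginning(consumer.assignment()); // replay: rewind our own offsets
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            records.forEach(r -> System.out.printf(
                    "offset=%d key=%s value=%s%n", r.offset(), r.key(), r.value()));
        }
    }
}
```

Because offsets belong to the consumer group, another team can read the same topic with a different group.id without either group affecting the other.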
Kafka’s architecture reflects this shift. It is disk-first rather than memory-first, optimized for sequential writes and reads. It scales horizontally through partitioning. It treats durability and throughput as complementary goals rather than trade-offs.
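Partitioning and retention are ordinary topic-level settings. As a sketch, a hypothetical topic with six partitions, three replicas, and seven days of disk retention might be created with Kafka's Admin client like this (the sizing values are illustrative, not recommendations):

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewTopic;

public class CreatePartitionedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // illustrative address

        try (Admin admin = Admin.create(props)) {
            // Six partitions spread writes and reads across brokers;
            // retention.ms keeps seven days of history on disk whether
            // or not any consumer has read it.
            NewTopic topic = new NewTopic("page-views", 6, (short) 3)
                    .configs(Map.of("retention.ms", "604800000"));
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```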
Kafka was not created to replace message queues; it was created to solve problems message queues were never meant to solve.
From Transport to Platform: Why Kafka Streams Exists
Kafka alone provides storage and distribution of events, but it does not process them. Early Kafka users still needed external systems to transform, aggregate, and analyze data flowing through Kafka.
Kafka Streams was created to close this gap.
Instead of introducing another centralized processing cluster, Kafka Streams embeds stream processing directly into applications. This is a deliberate contrast with both JMS consumers and large external processing frameworks.
In a JMS-based system, consumers typically process messages one at a time, often statelessly, and rely on external databases for aggregation and state. Rebuilding state after a failure is complex and error-prone.
Kafka Streams, by contrast, assumes that stateful processing is normal. It provides abstractions for event streams and for state that evolves over time. It stores state locally for performance and backs it up to Kafka so it can be restored automatically. Processing logic, state, and data history are all aligned around the same event log.
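A minimal Kafka Streams sketch shows this alignment: a running count per key lives in a local state store and is backed by a changelog topic in Kafka. The application id and topic names are assumptions for illustration.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;

public class ViewCounts {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "view-counts-app"); // illustrative
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("page-views", Consumed.with(Serdes.String(), Serdes.String()))
               .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
               // Stateful aggregation: the counts live in a local store and
               // are replicated to a Kafka changelog topic, so they can be
               // rebuilt automatically after a failure.
               .count(Materialized.as("view-counts-store"))
               .toStream()
               .to("view-counts", Produced.with(Serdes.String(), Serdes.Long()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

Note that this is an ordinary Java application, not a job submitted to a cluster; scaling out means starting another instance, and Kafka redistributes partitions and state among the instances automatically.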
This approach turns Kafka from a passive transport layer into an active data platform.
What Kafka and Kafka Streams Do Differently
The fundamental difference between JMS/MQ and Kafka is not syntax or APIs, but philosophy.
Message queues focus on messages as transient instructions. Kafka focuses on events as durable facts. Message queues optimize for delivery guarantees. Kafka optimizes for scalability, retention, and replay. Message queues treat consumers as part of the broker’s responsibility. Kafka treats consumers as independent actors.
Kafka Streams builds on this by assuming that computation belongs close to the data. Instead of shipping data to a processing engine, it ships processing logic to where the data already is. This inversion dramatically simplifies architectures while increasing reliability.
Why Someone “Woke Up and Created Kafka”
Kafka was born out of necessity. At LinkedIn, where Kafka originated, existing messaging systems could not handle the volume, variety, and longevity of the data being produced. Its engineers needed a system that could ingest everything, store it reliably, and make it available to many consumers without coordination.
Kafka Streams followed naturally. Once data became durable and replayable, processing it in a stateless, fire-and-forget manner was no longer sufficient. Systems needed to compute continuously, maintain state, and recover automatically — all while remaining simple to operate.
Kafka and Kafka Streams are the result of rethinking messaging from first principles, in response to scale, data-driven architectures, and the need to treat events as first-class citizens.
Conclusion
JMS and traditional message queues remain excellent tools for command-based integration and transactional workflows. Kafka was not designed to replace them, but to address a different category of problems.
Kafka introduced the idea of a distributed, durable event log as the backbone of modern systems. Kafka Streams extended that idea by embedding real-time processing directly into the applications that need it. Together they treat events as durable facts rather than transient instructions, complementing rather than replacing the message queues that came before.