EventStoreDB vs Kafka

There is a lot of confusion in the community around EventStoreDB and Kafka, especially when it comes to event sourcing. Developers struggle to decide which technology they should use, how the two compare, and what trade-offs each choice involves. This article helps you understand what the two solutions offer and how to use them effectively. First, we need to explore the definition of event sourcing and the requirements a solution should meet.

Event sourcing

The name comes directly from the fact that in event sourcing, events are the source of truth. All other data and data structures are derived from the events, so in theory we can erase all of those other storages: as long as we keep the event log, we can always regenerate them. Event sourcing records an ordered log of our operations. Consider a shopping cart:

  • First, we initialize the shopping cart.
  • We add a new product.
  • We may remove that product, deciding we added it by mistake.
  • Then we add another product.
  • At the end, we confirm the cart.

A nice thing about event sourcing is that it enables time travel. Since we have recorded the sequence of events, we can always go back: we take the events up to a given point, apply them to an empty state, and see exactly what the state was at that time.
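
To make this concrete, here is a minimal sketch in plain Java of replaying events to rebuild state. The event and class names (CartEvent, ShoppingCart, and so on) are hypothetical, invented for this article rather than taken from any library:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class EventSourcingSketch {

    // Hypothetical event types matching the shopping cart story above.
    sealed interface CartEvent permits CartInitialized, ProductAdded, ProductRemoved, CartConfirmed {}
    record CartInitialized(String cartId) implements CartEvent {}
    record ProductAdded(String productId) implements CartEvent {}
    record ProductRemoved(String productId) implements CartEvent {}
    record CartConfirmed() implements CartEvent {}

    static class ShoppingCart {
        final Set<String> products = new HashSet<>();
        boolean confirmed;

        // Applies a single event to the current state.
        void apply(CartEvent e) {
            if (e instanceof ProductAdded a) products.add(a.productId());
            else if (e instanceof ProductRemoved r) products.remove(r.productId());
            else if (e instanceof CartConfirmed) confirmed = true;
        }
    }

    // Replaying a prefix of the log is the "time travel" described above:
    // the full list yields the current state; the first N events yield the
    // state as it was after the Nth operation.
    static ShoppingCart replay(List<CartEvent> events) {
        ShoppingCart cart = new ShoppingCart();
        events.forEach(cart::apply);
        return cart;
    }

    public static void main(String[] args) {
        List<CartEvent> log = List.of(
                new CartInitialized("cart-123"),
                new ProductAdded("p-1"),
                new ProductRemoved("p-1"),   // added by mistake, so removed
                new ProductAdded("p-2"),
                new CartConfirmed());

        System.out.println(replay(log).products);               // current state: [p-2]
        System.out.println(replay(log.subList(0, 2)).products); // state back in time: [p-1]
    }
}
```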

Advantages:

  • Event Sourcing solves one of the key problems in implementing an event-driven architecture and makes it possible to reliably publish events whenever state changes.
  • Because it persists events rather than domain objects, it mostly avoids the object-relational impedance mismatch problem.
  • It provides a 100% reliable audit log of the changes made to a business entity.
  • It makes it possible to implement temporal queries that determine the state of an entity at any point in time.
  • Its logic consists of loosely coupled business entities that exchange events, which makes it much easier to migrate from a monolithic application to a microservices architecture.

Issues:

1. Scaling with snapshots:

Entities with long and complex lifespans, defined by frequent changes in state, can become a problem due to the large number of events that have to be processed to determine the current state. Event store implementations typically address this by creating snapshots that summarize the state up to a particular point in time. This reduces query load, since you only need the latest snapshot along with any events committed since the snapshot's creation.

When and how should snapshots be created? This is not straightforward, as it usually requires an asynchronous process to create a snapshot before any expected query load, which can be difficult to forecast in the real world. An effective snapshot strategy may require a complex set of algorithms tailored to each process that accesses the event store.
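
A rough sketch of that loading path, reusing the ShoppingCart and CartEvent types from the sketch above; the SnapshotStore and EventStore interfaces here are hypothetical placeholders, not a real API:

```java
import java.util.List;

// Assumes ShoppingCart and CartEvent from the earlier sketch are visible.
public class SnapshotLoader {

    // A snapshot couples a saved state with the version (event number)
    // up to which it summarizes the stream.
    record Snapshot(long version, ShoppingCart state) {}

    // Hypothetical storage interfaces, for illustration only.
    interface SnapshotStore { Snapshot latestFor(String streamId); /* null if none yet */ }
    interface EventStore   { List<CartEvent> readFrom(String streamId, long fromVersion); }

    final SnapshotStore snapshots;
    final EventStore events;

    SnapshotLoader(SnapshotStore snapshots, EventStore events) {
        this.snapshots = snapshots;
        this.events = events;
    }

    // Load the latest snapshot (if any), then apply only the events
    // committed after it, instead of replaying the whole history.
    ShoppingCart load(String streamId) {
        Snapshot snap = snapshots.latestFor(streamId);
        ShoppingCart state = (snap == null) ? new ShoppingCart() : snap.state();
        long from = (snap == null) ? 0 : snap.version() + 1;
        for (CartEvent e : events.readFrom(streamId, from)) {
            state.apply(e);
        }
        return state;
    }
}
```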

2. Longer boot-up time:

Long boot-up times can be an issue if you are using multiple heterogeneous databases, due to the initialization of different data contexts. If you use something simple like ADO.NET to interact with the event store and a micro-ORM for the read side, the system will "cold start" faster than a full-featured ORM would. This is really a problem that CQRS should solve: the read side should be modeled for the views, so there is no overhead of re-mapping the data.

EventStoreDB

EventStoreDB is an event sourcing database that stores your critical data in streams of immutable events. It was built from the ground up for event sourcing and offers an unprecedented solution for building event-sourced systems. It provides a low-level protocol in the form of an asynchronous TCP protocol that exchanges protobuf objects. Protobuf is a binary data-interchange format developed by Google, whereas JSON is a human-readable data-interchange format.

Event sourcing stores data as events in an append-only log: every command that changes the state of the database is first recorded in the log before it is applied to the database. Every change is represented as an event and appended to the event log, and the current state of an entity can be rebuilt by iterating over all its events in order of occurrence. The protocol has adapters for .NET and the JVM, and EventStoreDB also offers an HTTP-based interface, based specifically on the AtomPub protocol.
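
For a flavour of the developer experience, here is roughly what appending and reading a stream look like with the official EventStoreDB Java client (com.eventstore.dbclient). Recent versions of this client communicate over gRPC rather than the older TCP protocol described above, and exact method names differ between client versions, so treat this as a sketch rather than a definitive reference:

```java
import com.eventstore.dbclient.EventData;
import com.eventstore.dbclient.EventStoreDBClient;
import com.eventstore.dbclient.EventStoreDBClientSettings;
import com.eventstore.dbclient.EventStoreDBConnectionString;
import com.eventstore.dbclient.ReadResult;
import com.eventstore.dbclient.ReadStreamOptions;
import com.eventstore.dbclient.ResolvedEvent;
import java.util.Map;

public class EsdbExample {
    public static void main(String[] args) throws Exception {
        // Connect to a local, insecure dev node.
        EventStoreDBClientSettings settings =
                EventStoreDBConnectionString.parseOrThrow("esdb://localhost:2113?tls=false");
        EventStoreDBClient client = EventStoreDBClient.create(settings);

        // Append an event (serialized as JSON) to the end of a stream.
        EventData event = EventData
                .builderAsJson("ProductAdded", Map.of("productId", "p-1"))
                .build();
        client.appendToStream("shopping-cart-123", event).get();

        // Rebuild state by reading the stream forwards from the start.
        ReadStreamOptions options = ReadStreamOptions.get().forwards().fromStart();
        ReadResult result = client.readStream("shopping-cart-123", options).get();
        for (ResolvedEvent resolved : result.getEvents()) {
            System.out.println(resolved.getOriginalEvent().getEventType());
        }
    }
}
```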

The system's information is sourced from the events. We tend to assume event sourcing is mainly for auditing, but that is a limited view: an append-only log is certainly great for auditing, but event sourcing is much more than that. Beyond providing an audit log, an event-sourced system has a wealth of information and context stored within it that can be valuable to the business. A plain audit log is a chronological record of what changed, without context; in an event-sourced system, the context is stored within the events, so the 'why' and 'when' of each change are captured implicitly in the event's data.

Kafka

Users of modern cloud applications expect a real-time experience. How is this achieved? Apache Kafka is an open-source, distributed streaming platform that enables the development of (among other things) real-time, event-driven applications.

Events represent facts: pieces of information about something that happened in the past. Events are immutable by nature; they can be ignored, but not retracted, and they can be interpreted differently by different consumers. Generally, an event is an action that drives another action as part of a process. Someone placing an order or choosing a seat on a train are examples of events. An event doesn't require a person to be involved; for example, a connected thermostat's temperature report at a given time is also an event.

An event stream is an ordered sequence of events, and it too is immutable. Each stream represents a specific object. Why is it called a stream? Because in a live, ongoing system, events flow continuously: between one look at the system and the next, new events have probably been recorded. If no events are being recorded, either no one is using the system or there is a serious bug. So it is a stream because it is a continuous flow of events: picture a timeline, with each new event appended at the end of the event stream.
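
In Kafka terms, appending to an event stream means producing records to a topic. Here is a minimal producer sketch; the topic name and broker address are assumptions for a local setup:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class CartEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Keying by cart id keeps all events for one cart in a single
            // partition, which preserves their order of occurrence.
            producer.send(new ProducerRecord<>("shopping-cart-events",
                    "cart-123", "{\"type\":\"ProductAdded\",\"productId\":\"p-1\"}"));
        }
    }
}
```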

Advantages:

  • Kafka allows developers to build applications that continuously produce and consume streams of data records. It runs as a cluster that can span multiple servers or even multiple data centers. The records are replicated and partitioned in a way that allows a large number of users to use the application simultaneously without any visible lag in performance.
  • Apache Kafka is very fast.
  • It maintains a very high level of accuracy with the data records.
  • It maintains the order of their occurrence.
  • It is resilient and fault-tolerant.

Use Cases:

  • Messaging system.
  • Activity tracking.
  • Gathering metrics from many different locations.
  • Application log gathering.
  • Stream processing (with the Kafka Streams API, Spark, etc.); see the sketch after this list.
  • De-coupling of system dependencies.
  • Integration with Spark, Flink, Storm, Hadoop and many other Big Data technologies.
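
As a small illustration of the stream-processing use case, here is a minimal Kafka Streams topology that continuously reads one stream of events and routes a subset of them to another topic; the topic names and the filter condition are assumptions for this example:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import java.util.Properties;

public class ConfirmedCartFilter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "confirmed-cart-filter");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> events = builder.stream("shopping-cart-events");
        // Continuously route cart-confirmation events to their own topic.
        events.filter((cartId, json) -> json.contains("CartConfirmed"))
              .to("confirmed-carts");

        new KafkaStreams(builder.build(), props).start();
    }
}
```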

Demerits:

  • No complete set of built-in monitoring tools.
  • Issues with message tweaking: modifying messages in flight degrades the broker's performance.
  • No support for wildcard topic selection.
  • Performance suffers as message sizes grow, since messages must be compressed and decompressed.
  • Can behave clumsily when the number of topics in a cluster grows large.
  • Lacks some messaging paradigms, such as request/reply and point-to-point queues.

Comparing the two:

1. Replaying events:

With EventStoreDB, a replay is a natural operation and can easily be implemented by replaying all events since the system was released. The consumer can continue to read new events from the stream once the replay has ended. Replay is equally easy with Kafka, as the model is much the same, assuming that the retention policy keeps events in the topic forever. To rebuild a read model, we need to reset the consumer group to read messages from the beginning.
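
A sketch of that reset using the plain Kafka consumer API: the consumer subscribes, then rewinds every partition it is assigned back to the beginning. The group id and topic name are assumptions:

```java
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import java.time.Duration;
import java.util.Collection;
import java.util.List;
import java.util.Properties;

public class ReadModelReplay {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "read-model-builder");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(List.of("shopping-cart-events"), new ConsumerRebalanceListener() {
            @Override public void onPartitionsRevoked(Collection<TopicPartition> partitions) {}
            @Override public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                consumer.seekToBeginning(partitions); // rewind to replay the full history
            }
        });

        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            records.forEach(r -> System.out.println(r.value())); // rebuild the read model here
        }
    }
}
```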

2. Data archiving policy:

Most medium and large systems need a way to archive data to reduce operating costs and maintain system performance. Neither Kafka nor EventStoreDB supports deleting specific individual messages from a topic or stream. With Kafka, we can send a message with a given partition key and a null payload, which effectively marks all messages with that key for deletion. With EventStoreDB, we can remove an entire stream, and this is one of the basic functions the database supports.
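
Sketching the Kafka side with the producer API: a record whose value is null is a "tombstone", and on a topic configured with cleanup.policy=compact, log compaction will eventually remove every earlier record with the same key. (On the EventStoreDB side, the Java client exposes a deleteStream operation for removing a whole stream.) Topic and key names below are assumptions:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class TombstoneExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Null value = tombstone: after compaction, all earlier records
            // keyed "cart-123" are removed from the compacted topic.
            producer.send(new ProducerRecord<>("shopping-cart-events", "cart-123", null));
        }
    }
}
```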

Conclusion:

EventStoreDB seems to be the clear winner when it comes to reading, writing and deleting data. Kafka, in turn, shines in throughput and scalability: it can accept many more writes and reads per second and makes auto-scaling of competing consumers very easy. From my point of view, the two solutions coexist in the same space of message storage and processing, but their strengths are different.

Written by 

Gaurav Dubey is a Java Developer at Knoldus Software LLP. He holds an MCA from Kurukshetra University and a Bachelor's degree in Computer Science from M. D. University. He is a tech enthusiast with good knowledge of Java and focuses mainly on Java practice. On the personal front, he loves to cook and travel.