Analytics, Bigdata, Kafka

Moving to communication of events between subsystems — CQRS-ES with open source…

Before going into definitions of EP, CEP, and QSQS let us start with some basic database term and what problem we are trying to address here. We have commercial databases and database professionals those who publicized CRUD operations a lot. It is one-row-per-pattern works well in most of the projects and enough to build an application more quickly and securely. I have probably implemented 100 CRUD projects (including web applications) and we do that way because we have limited budgets and projects have deadlines.

CRUD work well until someone asked for historical data and I saw few managers complaining lack of access to historical data. We can’t deny value of historical data and can’t simply say no — we don’t collect it. Ok, let say at some point we started organizing historical data on top of relational database provided we don’t have storage problem. But still considering fact that our immutable tables are stable model.

Aside from advertising and consumer behavior analysis, there are many industries that have a need for such historical data. The very idea that it must be necessarily valuable to store historical data about “all the things”.

Event sourcing

“Event sourcing” is not a new term among the old programmers. Let me explain what is ES and its promises. Event sourcing achieves atomicity without two phase commit but rather using a radically different, event-centric approach to persisting business entities. Rather than store the current state of an entity, the application stores a sequence of state-changing events. The application reconstructs an entity’s current state by replaying the events.

In an event sourced application, there is an eventstore that persist events. The Event Store also behaves like the Message Broker. It provides an API that enables services to subscribe to events. The Event Store delivers events to all interested subscribers. The Event Store is the backbone of an event-sourced microservices architecture. For more in depth description of event sourcing, check out introducing event sourcing post by Martin Fowler.

But many general myths follow Event Source model and you can see below arguments for not taking event sourcing seriously.

1. Event sourcing model takes too much time to implement solution and as we all need to justify the cost.

2. Event sourcing (and CQRS generally) carries enough implementation overhead in the amount of code.

3. All those moving parts/code make changing the system extremely painful, lots of ripple effects — and every time you have to make a change to your events.

I can say we are working on live projects that support CQRS and Event Processing with minimum cost using opensource technology.

Command and Query Responsibility Segregation(CQRS)

CQRS is a simple design pattern for separating concerns. Object’s methods should be either commands or queries but not both. A query returns data and does not alter the state of the object; a command changes the state of an object but does not return any data. The benefit is that you have a better understanding of what does, and what does not change the state in your system.

CQRS pattern comes in handy when building an event sourced application. Since event sourcing stores current state as a series of events, it becomes time consuming when you need to retrieve the current state. This is because to get the current state of an entity, we will need to rebuild state within the system by replaying the series of events for that entity. What we can do to alleviate this program is to maintain two separate model (write model — eventstore and read model-a normal database). Once an event is persisted in an event store, a different handler will be in charge of updating the read store. We can read data directly from the read store but not write to it. By using CQRS, we have completely separated read operations(via readstore) from write operations(via eventstore).

But to build Event Source model It is important to really know and understand the problem domain you are applying ES to.

To understand more on problem domain you can read my article.

Having a strategy to upgrade streams to new versions of your domain model is a good idea if you’re applying ES to a business without a well-understood domain. However, it is very, very difficult to back into an ES implementation from a “naive” solution like “CRUD”.

You make your life harder If you shoehorn everything into ES without doing the design work to establish a domain, its boundaries, and what events make sense within it.

Let me rephrase above words, if your event streams contain mostly CRUD (possibly ANY) then you’re most likely applying it incorrectly. It’s not just a version history of your data. The event type itself is data, which provides context and semantics over and above the notion of writes and deletes. If you’re falling back to CRUD events all you’re doing is creating a lot more work for yourself and deriving almost no benefit from the use of ES — in that case, you should just use CRUD of your choice.

A good way to think about ES is that as events in an ES system are like facts in RDBMS, and event-types in ES are like tables in an RDBMS define a category of facts with a particular shape. The difference is that whereas in an RDBMS the facts represented by rows can be general, events are facts about a specific occurrence in the world rather than the state of the world (and the “state of the world” is an aggregate function of the collection of events.)

Now I guess we are in shape to define Complex Event Processing(CEP). You will find many definitions of it as this word is active since 1990. Lets have a look on common definition of CEP. Note that we are not doing any CRUD operation in CEP, it is simply to notify you (generate a “complex event”) generating another event whenever a match is found.

With CEP, you write queries or rules that match certain patterns in the events. They are comparable to SQL queries (which describe what results you want to return from a database). For use cases that can be easily described in terms of a CEP query language, such a high-level language is much more convenient than a low-level event processing API. On the other hand, a low-level API gives you more freedom, allowing you to do a wider range of things than a query language would let you do. Also, by focusing their efforts on scalability and fault tolerance, stream processing frameworks provide a solid foundation upon which query languages can be built.

A use case is fraud detection or idea for doing full-text search on streams, whereby you register a search query in advance and then are notified whenever an event matches your query.

Many CEP products are commercial, expensive enterprise software, although we can build it using opensource technology stack.

Kafka as an EventStore

Kafka is typically a message broker or message queue(comparable to AMQP, JMS, NATS, RabbitMQ). What makes Kafka interesting and why it’s suitable for use as an eventstore is that it is structured as a log. Each kafka topic maps to a bounded context, which in turn maps to a microservice. Event sourcing solves a lot of problems inherent in traditional systems and facilitates applying the pattern, but ES is not the pattern itself.

So there’s a ton of benefit using Kafka and events as a big part of the architecture. Things get simpler to reason about, easier to test, and result in less dependencies between teams.

Leave a Reply

Your email address will not be published. Required fields are marked *