“We started happily transitioning our monolithic application to a set of independent microservices but soon it turned into a nightmare as various services started talking to each other and we struggled to manage state & handle partial failures”
This is a typical reaction of a big company trying to migrate from a largely unmanaged chunk of messy code (read: a monolithic J2EE application) to a set of microservices that are supposed to be:
- independently scalable
- polyglot (can be developed in different languages or with different technology stacks)
- developed & managed by independent teams in parallel
All this works fine in theory until these microservices need to interact with each other, which any sufficiently complex system requires.
A typical first attempt at facilitating this interaction has each service expose a REST API that the other services consume to carry a transaction forward. As an example, consider a decomposed view of an Order Management System: each block is a microservice in its own right, owns its own persistence, and interacts with the others through their exposed APIs over HTTP.
The probable downsides of this approach include
- Handling “partial failures” when a subsystem fails or takes too long to respond. A typical request/response can span multiple synchronous REST calls across microservices, and any one of them can fail.
- Service Discovery: Each service must be discoverable by its callers, or sit behind an API gateway that acts as a facade.
- Tight coupling between services: Every consumer depends on the provider's API, so APIs need to evolve in a backwards-compatible manner.
Another well-established communication pattern is to use event sourcing and CQRS instead, as outlined below. But first, let’s look at the problem:
Since each microservice has its own database (each may choose a NoSQL store, a relational database, a search engine like ElasticSearch or a graph database like Neo4j), it becomes really difficult to manage business transactions that span multiple microservices, and two-phase commit is usually not a practical option in a distributed setting for modern applications. Another challenge is to efficiently query and join data that lives in different databases.
One possible way to solve this with an event-driven architecture is to use a message broker to communicate between services. Say that when an Order is placed by a Customer, we need to update the Inventory Service, the Product Management Service and the real-time Reporting Service. To achieve this, the Order Service updates its own local database and then publishes an OrderPlaced event through a highly available message broker (like Kafka, ActiveMQ, RabbitMQ etc).
Other services listen on the same message broker for events of interest, update their own state (in their own databases) and may publish further events. Although seemingly simple, this has one major issue: the Order Service must update its local database and publish the event to the message broker atomically. If either step fails, the other must be rolled back, otherwise the services end up with inconsistent data.
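The atomicity problem described above can be made concrete with a small sketch. Everything here is hypothetical and in-memory: `FlakyBroker`, `OrderService` and `demo_dual_write_failure` are illustration names, a dict stands in for the local database, and no real broker client is involved.

```python
class BrokerDown(Exception):
    """Raised by the hypothetical broker when it cannot accept a message."""


class FlakyBroker:
    """Stand-in for a message broker; refuses messages when unavailable."""

    def __init__(self, available=True):
        self.available = available
        self.published = []

    def publish(self, event):
        if not self.available:
            raise BrokerDown("broker unreachable")
        self.published.append(event)


class OrderService:
    """Hypothetical service doing the naive 'write, then publish' sequence."""

    def __init__(self, broker):
        self.orders = {}  # stands in for the service's local database
        self.broker = broker

    def place_order(self, order_id, item):
        # Step 1: commit the order to the local database.
        self.orders[order_id] = {"item": item, "status": "PLACED"}
        # Step 2: publish the event. This is NOT part of the same
        # transaction as step 1, so it can fail independently.
        self.broker.publish(
            {"type": "OrderPlaced", "order_id": order_id, "item": item}
        )


def demo_dual_write_failure():
    """Place an order while the broker is down and return both states."""
    broker = FlakyBroker(available=False)
    service = OrderService(broker)
    try:
        service.place_order("o-1", "book")
    except BrokerDown:
        pass  # the local write has already happened
    return service.orders, broker.published
```

After the failed call the order exists in the local store while no other service has been told about it, which is exactly the inconsistency the two writes need to be protected against.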
Event Sourcing to the rescue!
Event sourcing advocates storing all events in an Event Store as they occur. A service does not update the state of an entity in a database; instead, it stores every state-change event, with all the relevant details, in the Event Store in the order in which the events occur. Because storing the event is the state change, a single append to the Event Store is atomic and there is no second write to keep in sync. The Event Store also acts as a message broker (like Kafka) in this design. The store typically provides APIs for storing and retrieving the events of an entity. Storing all events as they occur also helps with auditing requirements.
To get the current state of an entity, a typical access pattern is to replay all the events related to that entity, fetching them in order from the Event Store. Obviously, this does not scale well as the number of events grows, so it is recommended to store a snapshot of the entity’s state periodically and then replay only the events that occurred after that snapshot.
Another widely used pattern is CQRS-based materialized views, which can be queried directly without having to go through all the events sequentially at request time. Command Query Responsibility Segregation, or CQRS, splits up an application by building separate read (query) and write (command) paths.
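As a rough illustration of the store-and-retrieve API mentioned above, here is a minimal in-memory sketch. `InMemoryEventStore`, `Event`, `append`, `events_for` and `subscribe` are hypothetical names chosen for this example; they are not the API of Kafka or of any particular event store product.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class Event:
    entity_id: str
    type: str
    data: Dict[str, Any]
    sequence: int  # 1-based position within the entity's stream


class InMemoryEventStore:
    """Hypothetical append-only store that also notifies subscribers."""

    def __init__(self) -> None:
        self._streams: Dict[str, List[Event]] = {}
        self._subscribers: List[Callable[[Event], None]] = []

    def append(self, entity_id: str, event_type: str, data: Dict[str, Any]) -> Event:
        # The only write is this append; there is no separate
        # "update the database" step that could get out of sync.
        stream = self._streams.setdefault(entity_id, [])
        event = Event(entity_id, event_type, dict(data), len(stream) + 1)
        stream.append(event)
        # Broker role: hand the stored event to every subscriber.
        for handler in self._subscribers:
            handler(event)
        return event

    def events_for(self, entity_id: str, after: int = 0) -> List[Event]:
        """Return the entity's events, in order, with sequence > after."""
        return [e for e in self._streams.get(entity_id, []) if e.sequence > after]

    def subscribe(self, handler: Callable[[Event], None]) -> None:
        self._subscribers.append(handler)
```

A service would call `append("o-1", "OrderPlaced", {...})` instead of updating a row, and any interested service would register a handler through `subscribe`. A real store would persist the events and deliver them asynchronously; the sketch does both in memory to keep the shape of the API visible.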
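The replay-from-snapshot idea can be sketched as follows. The event shapes, the `apply_event` and `rebuild` functions and the sample `EVENTS` list are assumptions made for illustration, with a snapshot modelled as a `(state, version)` pair.

```python
def apply_event(state, event):
    """Return the new order state after one event, without mutating the old one."""
    kind, data = event["type"], event["data"]
    if kind == "OrderPlaced":
        return {"items": dict(data["items"]), "status": "PLACED"}
    if kind == "OrderChanged":
        items = dict(state["items"])
        items.update(data["items"])
        return {"items": items, "status": state["status"]}
    if kind == "OrderRemoved":
        return {"items": dict(state["items"]), "status": "REMOVED"}
    return state  # unknown event types are ignored


def rebuild(events, snapshot=None):
    """Replay events on top of an optional (state, version) snapshot."""
    state, version = snapshot if snapshot else (None, 0)
    for event in events:
        # Events already covered by the snapshot are skipped.
        if event["sequence"] > version:
            state = apply_event(state, event)
            version = event["sequence"]
    return state, version


# Hypothetical event stream for a single order.
EVENTS = [
    {"sequence": 1, "type": "OrderPlaced", "data": {"items": {"book": 1}}},
    {"sequence": 2, "type": "OrderChanged", "data": {"items": {"pen": 2}}},
    {"sequence": 3, "type": "OrderChanged", "data": {"items": {"book": 3}}},
]
```

Calling `rebuild(EVENTS)` replays the whole stream, while `rebuild(EVENTS, snapshot=rebuild(EVENTS[:2]))` starts from a snapshot taken after the second event and applies only the third. Both paths should arrive at the same state; in practice the second one would also fetch only the events after the snapshot version from the store.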
Although the event handlers are shown as a separate box in the above diagram, they are an integral part of the service itself, as signified by the dashed arrows. The typical interaction pattern as governed by CQRS is:
- Command: The Order Service stores the state changes to Order as Events
- There can be different types of events for one entity (e.g. OrderPlaced, OrderChanged, OrderRemoved etc)
- Different services can subscribe to these events (here: Order & Inventory)
- The event handlers update their own databases with whatever information they need
- Query: When the Order Service is asked for the total number of orders placed so far, it can query its own database directly instead of replaying the events from the Event Store.
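The command and query steps listed above can be sketched in a few lines. All class names here (`OrderCommandSide`, `OrderReadModel`, `InventoryReadModel`) are hypothetical, the event log is a plain list, and handlers are called synchronously only to keep the example short; in a real deployment they would receive events from the Event Store asynchronously.

```python
class OrderReadModel:
    """Query side of the Order Service: a view maintained by its handler."""

    def __init__(self):
        self.total_orders = 0
        self.open_orders = set()

    def handle(self, event):
        if event["type"] == "OrderPlaced":
            self.total_orders += 1
            self.open_orders.add(event["order_id"])
        elif event["type"] == "OrderRemoved":
            self.open_orders.discard(event["order_id"])


class InventoryReadModel:
    """Inventory Service keeps its own stock levels from the same events."""

    def __init__(self, stock):
        self.stock = dict(stock)

    def handle(self, event):
        if event["type"] == "OrderPlaced":
            self.stock[event["item"]] -= event["quantity"]


class OrderCommandSide:
    """Command side: records state changes as events and fans them out."""

    def __init__(self, handlers):
        self.event_log = []  # stands in for the Event Store
        self.handlers = list(handlers)

    def place_order(self, order_id, item, quantity):
        self._record({"type": "OrderPlaced", "order_id": order_id,
                      "item": item, "quantity": quantity})

    def remove_order(self, order_id):
        self._record({"type": "OrderRemoved", "order_id": order_id})

    def _record(self, event):
        self.event_log.append(event)
        for handle in self.handlers:
            handle(event)
```

With `orders = OrderReadModel()` and `inventory = InventoryReadModel({"book": 10})` passed in as handlers, answering "how many orders so far?" is a read of `orders.total_orders`; no events are replayed on the query path.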
As with everything else, event sourcing & CQRS are not an answer to all your microservices problems, so their pros and cons must be weighed before adopting them for your use case:
- No 2PC – Since publishing an event is inherently atomic, there is no need for transactions that span multiple databases and hence no need for two-phase commits.
- Separation of Concerns – It provides a clear separation between writing and reading paths for a service and each one could be developed and scaled independently.
- Auditing – As mentioned above, the event trail could be easily used to audit the state changes by simply consuming all the events from a given time for an entity.
- Complexity & Learning Curve – It is significantly more complex to develop and has a steeper learning curve than the typical microservice development cycle.
- Eventual Consistency – This is not really a problem in itself, but it is important to remember that event sourcing and CQRS make your application's data eventually consistent: a read view can briefly lag behind the events that were just written.
To conclude, there is no silver bullet when it comes to designing & developing highly scalable microservices, but it’s always good to be aware of design patterns that avoid transactions spanning multiple databases and two-phase commits while providing a clear separation of concerns and scalability at the same time.