Microservices and the Saga Pattern


Hi all,

Microservices are not new; they have been part of our day-to-day work for a while now. In this blog I would like to share my understanding of what microservices are and of the Saga pattern, which is widely used with them. We will look at (i) what we mean when we say we need a microservice, (ii) what it means to be reactive, and (iii) the Saga pattern in a distributed system, along with an easy-to-understand real-life example.

So let’s get started.

Microservice Basics

I hope most of us have heard the term microservice. If not, in layman’s terms I would define a microservice as something that is enough on its own to perform one specific piece of functionality within a larger system and is independent of the things around it. It is concerned only with the functionality it was created for and should deal with that alone. For example, if we are building a product like a Restaurant application, we would create several small microservices such as Orders, Customer, and Reservation. Each would perform the tasks around one specific functionality of the Restaurant application, and they would interact if and only if those functionalities need to be combined, and even then only through their exposed APIs. For now, we can think of an API as the endpoints of a service that are exposed to the outside world (outside of the microservice itself). Here are some key characteristics of a microservice:

Easy Deployment: Each microservice is deployed independently. This is what makes microservices different from a general Service Oriented Architecture (SOA).
Data Ownership: Each service should own its data, meaning that no external entity, component, or service should ever be able to read or write this data directly.
Data Flow: Communication between reactive microservices, or within a reactive microservice, must only be in the form of asynchronous messaging. (Note: Synchronous communication tends to move the service away from the notion of being reactive.)
Isolation of Failure: Out of the n services in your application, only the service that has a problem should suffer; the other microservices should remain responsive even while that one service is under the weather. Serious failures are isolated to a single service so that the complete application doesn’t go down. Failures should not propagate to dependent services and cause them to fail as well.
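To make the Data Ownership point a bit more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the class names OrdersService and CustomerService, the in-memory dictionary standing in for a database, and the method names are all hypothetical, and in a real system the two services would talk over the network rather than through a direct method call.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Order:
    order_id: int
    customer_id: int
    status: str  # e.g. "placed", "rejected", "waiting"


class OrdersService:
    """Owns the order data; nothing outside this class touches the store."""

    def __init__(self) -> None:
        # Private store, standing in for the service's own database (assumption).
        self._orders: dict[int, Order] = {}

    def place_order(self, order_id: int, customer_id: int) -> Order:
        order = Order(order_id, customer_id, "placed")
        self._orders[order_id] = order
        return order

    def get_order(self, order_id: int) -> Optional[Order]:
        # Exposed API: the only way to read an order from outside the service.
        return self._orders.get(order_id)


class CustomerService:
    """Holds a reference to the Orders API, never to its data store."""

    def __init__(self, orders_api: OrdersService) -> None:
        self._orders_api = orders_api

    def order_status(self, order_id: int) -> str:
        order = self._orders_api.get_order(order_id)
        return order.status if order else "unknown"


orders = OrdersService()
orders.place_order(order_id=1, customer_id=42)
customers = CustomerService(orders)
print(customers.order_status(1))  # placed
print(customers.order_status(2))  # unknown
```

The point of the sketch is only the shape of the dependency: the Customer side never sees the dictionary, so the Orders side is free to change how it stores its data.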

Here’s a top-level view of how our Restaurant application would look with a microservices architecture:

Now, with the help of the characteristics mentioned above, let us see why the services in a Restaurant application should be called microservices. We have 3 services in the Restaurant application: 1. Orders, 2. Customer, and 3. Reservation. So let’s analyze their behavior and answer the question of whether they should be classified as microservices or not:

Independent Deployments: All these services work independently of one another and are deployed independently. For example, an enhancement to the Orders service does not wait for the Customer service to adopt it. Instead, the Orders service is deployed with the enhancement (while still supporting the previous version, so that we don’t break any existing usage), and the Customer service switches to the enhanced version of the Orders service whenever it is ready to use it.
Data Rights/Ownership: The data used by the Orders microservice is conceptually related to the functionality for which the microservice was created in the first place; it may contain information like placedOrders, rejectedOrders, waitingOrders, etc. Similarly, the Customer service has data specific to its use case, and we do not allow the Customer service to interact with the database of the Orders service directly. If the Customer service needs some information about a particular order, it can get it only by calling the Orders service’s exposed endpoints/APIs.
Asynchronous Communication: Asynchronous communication between the services and within a service avoids the contention problem, i.e., sitting idle or blocked while waiting for a response or a resource. The internals of the services have to be implemented in such a way that they don’t wait for a response from some other part; instead, when the response arrives, we process it at that time. It’s something like this: if I go to a stationery shop to get my documents printed and I see that the printer is already busy, I should not sit there idle. I should rather ask the attendant to print the documents and inform me when it’s done, and meanwhile go to some other store or section and get a folder to keep those documents in.
Isolation of Failures: Suppose our Reservation service is down. That does not mean that customers should be unable to see their details from the Customer service or their previous orders from the Orders service. Only reservations should be affected by the failure of the Reservation service: the customer won’t be able to book a reservation at the moment. (We try to keep the impact of a failure as small as possible.)
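The printing analogy from the Asynchronous Communication point can be sketched with Python’s asyncio. This is just one possible way to show the idea inside a single process; the function names and the sleep durations are made up for the illustration, and real microservices would exchange messages over a broker or the network instead.

```python
import asyncio


async def print_documents(log: list[str]) -> str:
    # The printer is busy; the attendant finishes the job a little later.
    await asyncio.sleep(0.05)
    log.append("documents printed")
    return "printed documents"


async def buy_folder(log: list[str]) -> str:
    # A shorter errand we can run while the printing is still pending.
    await asyncio.sleep(0.01)
    log.append("folder bought")
    return "folder"


async def run_errands() -> tuple[list[str], list[str]]:
    log: list[str] = []
    # Hand the job to the attendant and move on without waiting.
    printing = asyncio.create_task(print_documents(log))
    folder = await buy_folder(log)
    # Come back for the documents only when we actually need them.
    documents = await printing
    return [folder, documents], log


results, log = asyncio.run(run_errands())
print(log)  # ['folder bought', 'documents printed']
```

The folder is bought while the printing is still in progress, so nobody sits idle next to the busy printer.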

So do these services qualify to be called microservices? I would say yes, since they all possess the characteristics of a microservice.

I hope that by now you are fairly clear about what a microservice is, and you may already have started weighing its strengths and weaknesses. I tried to keep it simple here by giving just an overview of what a microservice should look like, but there’s a lot more to it, which is out of the scope of this blog. 🙂

SOA vs Microservice

I would like to mention a key difference between microservices and SOA (Service Oriented Architecture). Yes, the two are not the same, and that is because SOA does not say anything about how services are deployed. This is the reason that when we build a system using SOA we can end up building it in a monolithic style, where all the services are deployed together as one application. Microservices, on the other hand, are a subset of SOA, but they require that each service is deployed independently, meaning that the services can be put on many different machines and any number of copies of an individual service can be deployed.

Microservice’s Advantages

Here are a few advantages of using microservices:

Rapid Deployments: Since each microservice is deployed independently of the other services, deployments within an application can be rapid.
Database: We can have multiple independent databases, one for each microservice.
Loose Coupling: The components of the application are loosely coupled.
Communication: Components can communicate through synchronous or asynchronous messaging (reactive microservices); the asynchronous mode is preferable.
Scaling: Each microservice is scaled independently.
Failure Isolation: Failures are isolated to the originating service and do not cascade to the other services and bring down the complete application. Only the functionality of the failing service is affected; the rest of the application keeps working as expected.

Microservice’s Disadvantages

Just as monoliths have advantages and disadvantages, microservices have their disadvantages too. Here are a few:

Complex Deployments: We may have many microservices in an application, and those services could be written in different languages, which may lead to complex deployment systems.
Complex Monitoring: With a large number of microservices deployed, we need a way to monitor multiple services to see if any of them is having a problem. This is not the case with a monolith, where we just monitor one application as a whole.
Support Older Versions: We need to keep supporting an older version of the API so that customers who are still using it can continue their daily operations without any hassle. The general convention is to keep at least 2 versions (the current one and its closest predecessor) of the same service deployed until all the customers have migrated to the newer version (there are certain issues with that approach as well, but that’s a story for next time).

Reactive Systems

So moving on to our next question, what does it mean to be Reactive?

We say that a system is reactive if it follows the principles of reactive architecture, which are:

Responsive- It means that the system should be able to respond to the user within a specified time t under all circumstances (some negligible deviation is acceptable), whether the traffic is heavy or nominal. If a system’s response time degrades with increasing load, it is not considered responsive. A reactive system consistently responds in a timely fashion.
Resilient- Being resilient means that the system, even if it is suffering from some internal failure, should stay responsive and continue to give the user a response in the same time t as it would if there were no internal failures. However, the response received while the system is under the weather could differ from the response when the system is working normally. A reactive system remains responsive, even when a failure occurs.
Elastic- A system is said to be elastic if it can scale up or down as requirements change, which results in better resource utilization. Elasticity implies that we can not only scale up when needed but also scale back down when the load decreases, in order to conserve resources. The Reactive Manifesto states that a system needs to be both resilient and elastic in order to achieve responsiveness. A reactive system remains responsive, despite changes to system load.
Message-Driven- Reactive systems rely on asynchronous message-passing to establish a boundary between components that ensures loose coupling, isolation and location transparency. Non-blocking communication allows recipients to only consume resources while active, leading to less system overhead.
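One way to picture the Resilient point (still answering within time t, possibly with a different response) is a bounded wait with a fallback. The sketch below is only my own illustration: reservations_service is a hypothetical downstream call, the deadline value is arbitrary, and a timeout with a fallback is just one of several techniques a reactive system might use.

```python
import asyncio


async def reservations_service(healthy: bool) -> str:
    # Hypothetical downstream call; an unhealthy service responds too slowly.
    await asyncio.sleep(0.01 if healthy else 1.0)
    return "table for two at 8 pm"


async def reservation_summary(healthy: bool, deadline: float = 0.1) -> str:
    try:
        # Bound the wait so the caller always gets an answer within the deadline.
        return await asyncio.wait_for(reservations_service(healthy), timeout=deadline)
    except asyncio.TimeoutError:
        # Degraded but timely response instead of a hung request.
        return "reservations are temporarily unavailable"


normal = asyncio.run(reservation_summary(healthy=True))
degraded = asyncio.run(reservation_summary(healthy=False))
print(normal)    # table for two at 8 pm
print(degraded)  # reservations are temporarily unavailable
```

In both cases the caller hears back within roughly the same deadline; only the content of the answer differs.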

So we conclude that the primary goal of reactive architecture is to provide an experience that is responsive under all conditions. More information is available here: The Reactive Manifesto

SAGA Patterns

Now moving on to the third point of discussion: Saga Patterns.

A reactive system is message-driven, and it puts emphasis on asynchronous and non-blocking messages. Ideally, we should always go with asynchronous messaging between the components, but we can have synchronous messages as well. It is important to understand here that the need for synchronous messages should be driven by the domain requirements rather than technical convenience. Sometimes, though, the asynchronous nature of the communication doesn’t help much. Consider a scenario:

In a distributed system we want to do a database transaction. Our application is spread across multiple microservices, and they are all communicating asynchronously. Trying to open a transaction when you’re potentially accessing multiple databases doesn’t really work. And even if you could do that, because of the nature of asynchronous messages you’d have to hold that transaction open for a potentially long period of time, which makes things very brittle. Sometimes we have multiple stages or steps, all of which have to complete or all of which have to fail. That’s how a transaction works, but when the work is spread out across multiple microservices we can’t simply use a transaction. What should be used then? The answer to that would be a SAGA.

A Saga pattern is a way of representing a long-running transaction. We have multiple requests, managed by a saga. These requests can be run in sequence or in parallel. Only when all the requests have completed successfully do we say that our saga is completed. But what happens if one request (R1) finished successfully but another request (R2) failed? How do we deal with this failure? The answer lies within the implementation of the saga pattern. In a saga, each request is paired with a compensating action. If a request fails, compensating actions are executed for all the steps of the saga that have already completed. Once such a scenario occurs we do not complete the saga; instead, we fail the saga. And if a compensating action also fails, we retry it until it succeeds.
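The flow described above can be sketched in a few lines of Python. Please treat it as a minimal, sequential sketch and not as how any particular saga framework does it: Step, run_saga and compensate_with_retry are names I made up, the steps run in one process, and I cap the number of compensation retries (max_retries) as a simplifying assumption where the description above says to retry until success.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    action: Callable[[], None]
    compensation: Callable[[], None]


def compensate_with_retry(step: Step, max_retries: int) -> None:
    # Assumption: retries are capped here; the text above retries until success.
    for _ in range(max_retries):
        try:
            step.compensation()
            return
        except Exception:
            continue
    raise RuntimeError(f"compensation for {step.name!r} kept failing")


def run_saga(steps: list[Step], max_retries: int = 3) -> bool:
    """Run steps in sequence; on failure, compensate completed steps in reverse."""
    completed: list[Step] = []
    for step in steps:
        try:
            step.action()
            completed.append(step)
        except Exception:
            for done in reversed(completed):
                compensate_with_retry(done, max_retries)
            return False  # the saga failed
    return True  # every request succeeded, so the saga is complete


events: list[str] = []


def fail_reservation() -> None:
    raise RuntimeError("no tables left")


steps = [
    Step("create order",
         lambda: events.append("order created"),
         lambda: events.append("order cancelled")),
    Step("reserve table",
         fail_reservation,
         lambda: events.append("reservation released")),
]
ok = run_saga(steps)
print(ok, events)  # False ['order created', 'order cancelled']
```

Here the second request fails, so the saga fails and the compensating action of the first, already completed request is executed.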

We should not confuse a compensating action with a rollback. A rollback implies that a transaction has not completed, and when we roll back we erase the evidence of the transaction, so that we don’t even know a transaction was initiated and failed. In effect we say that the thing we were trying to do never happened. This is not the case with a compensating action in a saga. With compensating actions we acknowledge that the thing we were trying to do did not complete successfully, and as a consequence we apply fixes, in the form of compensating actions, to whatever changes had already been made to the state of the system before the failure. The evidence of the original actions remains, and we are being honest and transparent with the customer about what happened and what failed.

A real-world example of the Saga pattern in use can be found in the banking sector. Suppose you initiated a transaction, “debit 500 bucks”, from your bank account while making a purchase at a nearby stationery shop. For some reason you received a notification that your account was debited, but the shopkeeper didn’t receive the payment, so they asked you to do the transaction (“debit 500 bucks”) again, and this time it succeeded. Now, when you look at your bank statement, you see two “debit 500 bucks” transactions on the same date for the same purchase, and you are curious/stressed 🙁: did that shopkeeper lie to you? Did he charge you twice for the same item? :O But that is not true: neither was the shopkeeper lying nor were you charged twice for the same item. Relief! 🙂 But how come there are two “debit 500 bucks” transactions, yet your account was effectively debited only once? Curious again? :O

The answer is that the Saga pattern was in place for your transaction, and this condition (the failure of transaction 1) is where it came into the picture. Your first transaction request failed, and that failure triggered a compensating action that fixed up the operations performed before the failure. What happened is that 500 bucks were debited from your account, but since the request then failed, the compensating action paired with the request “debit 500 bucks” was executed. It resulted in the command “credit 500 bucks”, which was applied to the state of your bank account. This way your money never reached the shopkeeper the first time, and it ended up back in your account. Only when you initiated the same transaction again and it completed without any failures was your money transferred to the shopkeeper’s account. So if you look carefully at your bank statement, you will see that there aren’t just 2 transactions for that shop on that day; there are 3, and they look like this:

(Transaction-1) Debited 500 bucks from your account for payment at ABC shop.
(Transaction-2) Credited 500 bucks to your account via refund initiated etc.
(Transaction-3) Debited 500 bucks from your account for payment at ABC shop.

That is how a saga works. You can clearly see that there must have been some issue with (Transaction-1) that caused (Transaction-2) to take place, which is why the shopkeeper asked you to repeat (Transaction-1) in the form of (Transaction-3).
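The three statement entries can be reproduced with a tiny append-only ledger. This is a toy sketch under my own assumptions: Account and pay_shop are hypothetical names, the starting balance of 2000 is arbitrary, and the transfer_succeeds flag simply simulates whether the payment reached the shop. No real bank works exactly like this.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    balance: int
    statement: list[str] = field(default_factory=list)

    def debit(self, amount: int, note: str) -> None:
        self.balance -= amount
        self.statement.append(f"Debited {amount} bucks {note}")

    def credit(self, amount: int, note: str) -> None:
        self.balance += amount
        self.statement.append(f"Credited {amount} bucks {note}")


def pay_shop(account: Account, amount: int, transfer_succeeds: bool) -> bool:
    account.debit(amount, "for payment at ABC shop")
    if not transfer_succeeds:
        # Compensating action: a new credit entry, not an erased debit.
        account.credit(amount, "via refund")
        return False
    return True


account = Account(balance=2000)
first = pay_shop(account, 500, transfer_succeeds=False)  # first attempt fails
second = pay_shop(account, 500, transfer_succeeds=True)  # retry succeeds
print(account.balance)  # 1500
for line in account.statement:
    print(line)
```

The failed attempt stays visible on the statement as a debit followed by a credit, which is exactly the difference between a compensating action and a rollback.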

I hope that explains the basic idea of a saga pattern in action. 🙂

Conclusion:

We saw what a microservice is and what it takes for a service in an SOA to be categorized as a microservice. We also saw some pros and cons of working with a microservice architecture. We went through the fundamentals of reactive architecture, and at last we saw the Saga pattern in action with a real-world example.

With that, I would conclude this blog. In the next blog, we’ll see some other interesting concepts, patterns, and properties of microservices.

I hope that this was informative and easy to understand. If you have any queries, please feel free to list them in the comments section.

A special thanks to Team_Lightbend for preparing such informative content in their #Reactive_Architecture course series at Cognitive classes. Do check out these courses at Cognitive classes.