Build & design highly available and resilient Fintech products

Reading Time: 7 minutes

FinTech, as defined by the Oxford Dictionary, means computer programs and other technology used to provide banking and financial services.

When it comes to using financial services, every user wants the service to be available 24/7, i.e. no downtime, and always responsive.

And this means that FinTech products, by nature, should always be available and user-responsive.

But wait, how does the Reactive Architecture fit in here? Let’s see what reactive architecture has to say on this.

Reactive Architecture comes from the Reactive Manifesto, which focuses mainly on responding to users in a timely manner. It says that an application is useless if it doesn’t respond on time. As the saying goes, data that is not available on time is as good as unavailable. For example, the world uses the Google search engine because it gives you a response in the blink of an eye. We would probably stop using Google if it started taking noticeably longer for every search query.

Systems built with Reactive Architecture are more flexible, loosely coupled, and elastic. Systems built on this architecture should have the following properties:

– Responsive

– Resilient

– Elastic, and

– Message Driven

Responsive –

The system should respond in a timely manner if at all possible. It should deliver a consistent quality of service to its users, including consistent response times. When your system reacts and responds in a timely fashion, it builds trust and confidence among its users, which in turn attracts new users.

Resilient –

The term resilient means that the system should stay responsive even in the case of failure. A system that is not resilient will not stay responsive. The question may arise: is it possible to stay responsive in the case of failure? The answer is YES, of course. This can be achieved by following patterns like replication, isolation, and containment. The failure of one component or one part of the system should not impact the other parts. In the worst case, the system should respond with an error message or provide a degraded but still useful level of service.
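As an illustration of graceful degradation, here is a minimal Java sketch (the method, class name, and fallback values are hypothetical, not from any particular library): the failure of the primary call is contained at the service boundary, and the caller still receives a useful, if stale, response.

```java
import java.util.function.Supplier;

public class DegradedService {
    // Try the primary operation; on failure, contain the error at this
    // boundary and return a degraded-but-useful fallback instead.
    static String fetchBalance(Supplier<String> primary, String cachedFallback) {
        try {
            return primary.get();
        } catch (RuntimeException e) {
            // Containment: the failure does not propagate to the caller.
            return cachedFallback + " (cached, may be stale)";
        }
    }

    public static void main(String[] args) {
        // Normal case: the primary source answers.
        System.out.println(fetchBalance(() -> "balance=120.50", "balance=118.00"));
        // Failure case: the database is down, so the cached value is served.
        System.out.println(fetchBalance(
                () -> { throw new RuntimeException("db down"); },
                "balance=118.00"));
    }
}
```

The important property is that the caller never sees the exception; the system stays responsive even while a dependency is failing.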

Elastic –

The system should stay responsive even under heavy load. This means that there should not be any visible delay in the user response even during peak hours. The system should be designed in such a way that it is able to handle and distribute the request load to remove any central bottleneck. Your application should be able to scale up and scale down automatically under varying workloads.

For example, suppose your web application is running with one instance which is able to handle approx. 10-15 user requests per second. When the request rate increases (assume it to be 50-60 per second), the system should automatically scale up and create at least 4 more instances of the same application, so that the load is distributed among these 1+4 instances and your users still get responses in a timely manner. Happy users = fewer troubles and more profit.
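The scaling arithmetic in this example can be sketched as a simple capacity calculation (the numbers mirror the example above; a real autoscaler such as the Kubernetes Horizontal Pod Autoscaler is considerably more involved):

```java
public class AutoScaler {
    // Desired instances = ceil(observed request rate / per-instance capacity),
    // never dropping below a configured minimum.
    static int desiredInstances(int requestsPerSecond, int capacityPerInstance, int minInstances) {
        int needed = (requestsPerSecond + capacityPerInstance - 1) / capacityPerInstance;
        return Math.max(needed, minInstances);
    }

    public static void main(String[] args) {
        // One instance handles ~12 req/s; at 60 req/s we need 1+4 instances.
        System.out.println(desiredInstances(10, 12, 1)); // light load: 1
        System.out.println(desiredInstances(60, 12, 1)); // peak load: 5
    }
}
```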

Message Driven –

Reactive systems should rely on asynchronous message passing. Asynchronous messaging establishes a boundary between the different parts of the system, ensuring loose coupling, isolation, and location transparency.
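A toy sketch of asynchronous message passing, using a plain Java queue as the boundary (a real reactive system would use an actor library or message broker such as Akka or Kafka; the message format here is illustrative):

```java
import java.util.concurrent.*;

public class MessageDriven {
    // Pass one message through an asynchronous mailbox and return the
    // receiver's reply. The mailbox is the only coupling between the two
    // sides: sender and receiver hold no direct reference to each other.
    static String sendAndReceive(String msg) {
        BlockingQueue<String> mailbox = new LinkedBlockingQueue<>();
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // The receiver runs on its own thread and processes messages
            // asynchronously as they arrive.
            Future<String> receiver = pool.submit(() -> "processed:" + mailbox.take());
            mailbox.put(msg); // fire-and-forget send
            return receiver.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sendAndReceive("DebitAccount(acc-42, 10.00)"));
    }
}
```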

As mentioned earlier in this blog, FinTech applications require 24/7 availability and are also critical in nature because money is directly involved.

So, in order to make your FinTech application available round the clock, you also need to follow the reactive patterns and practices that fall under Reactive Architecture. These help you design highly cohesive, resilient, and responsive products.

Reactive Architecture fits best with microservices, where each microservice is built around a single function. One microservice is responsible for handling and managing one part of the entire system. For example, in a sample FinTech application, the account service is independent of the inventory service. Since we can draw a clear boundary between these two components, there can be two different services acting independently of each other, one for accounts and another for inventory. If they need to communicate, they can do so through an API.

Under the Reactive Manifesto, there are several design patterns that help us design highly cohesive, resilient, and responsive applications:

1) Domain-Driven Design (DDD):

Domain-driven design is a great way to define the service boundaries. When building a microservice, you must first look at the domain of the microservice. But the question arises, how to define the domain?

To understand the domain, the tech side needs to be in constant touch with the business side so that requirements can be understood clearly. Both parties, business and tech, use a ubiquitous language to understand the requirements and define the domain, or bounded context, of the application.

With this approach, complex business requirements can be designed in a much simpler way. It provides a way to convert business requirements into DDD primitives like the business domain, domain logic, domain context, and domain boundary. Most importantly, it allows non-technical people to participate in the analysis and design phase without obstacles, which helps gather accurate business requirements at an early stage.

For example, in the sample FinTech application mentioned above, there are two different features or functionalities: account management and inventory management.
The account consists of the following possible use cases –

  • Create or update an account
  • Delete or disable an account
  • Add or subtract credits
  • Make a transaction on the account
  • Check the transaction history

Similarly, for inventory management, the business has the following use cases for now –

  • Add, update or remove inventories
  • Inventory tracking
  • Reporting on a regular basis

Looking through the business requirements, there is a clear boundary, and hence there should be two different microservices: the account service and the inventory service.

Here we can clearly see that the account service will have only account-related features, and similarly, the inventory service will have only inventory-related features. Both services will run completely independently of each other, each solely responsible for its own data and state, and each should follow a separate SDLC process.
In the future, if you need to add new features to the inventory, you only need to make changes in the inventory service. The account service is unaffected by inventory changes.
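To make the boundary concrete, here is a minimal Java sketch of the two contexts (the interface and method names are illustrative, not prescribed by DDD): each service owns its own data, exposes its own interface, and holds no reference to the other.

```java
import java.util.HashMap;
import java.util.Map;

// Each bounded context gets its own interface; neither mentions the other.
interface AccountOps   { void credit(String id, long cents); long balance(String id); }
interface InventoryOps { void add(String sku, int qty); int stock(String sku); }

public class BoundedContexts {
    // Minimal in-memory implementations, one per context, each owning its own data.
    static class AccountService implements AccountOps {
        private final Map<String, Long> balances = new HashMap<>();
        public void credit(String id, long cents) { balances.merge(id, cents, Long::sum); }
        public long balance(String id) { return balances.getOrDefault(id, 0L); }
    }

    static class InventoryService implements InventoryOps {
        private final Map<String, Integer> levels = new HashMap<>();
        public void add(String sku, int qty) { levels.merge(sku, qty, Integer::sum); }
        public int stock(String sku) { return levels.getOrDefault(sku, 0); }
    }

    public static void main(String[] args) {
        AccountOps accounts = new AccountService();
        InventoryOps inventory = new InventoryService();
        accounts.credit("acc-1", 5000);
        inventory.add("sku-9", 3);
        System.out.println(accounts.balance("acc-1")); // 5000
        System.out.println(inventory.stock("sku-9"));  // 3
    }
}
```

Because the two types share nothing, a change to the inventory side cannot force a change to the account side, which is exactly the independence described above.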

2) Command Query Responsibility Segregation (CQRS):

Once the boundaries of the services are defined, the services’ read and write operations need to be designed effectively.
On top of DDD, you can design your microservice in the form of CQRS. With this, you separate the read/query side completely from the write/command side.

Command side requests are the ones, which change or modify the state of your system or database. Similarly, the query side requests do not modify the state and are only used to view the state. 

Having separate command and query sides simplifies the design and implementation.

When the command and query sides are separated, the corresponding request load can also be scaled accordingly, and this allows you to scale each side separately. If your application needs high processing power for the query side, then you can scale out the query-side implementation (horizontally or vertically). You can also apply optimization techniques on the read side without touching the write side, or vice versa. Taking an example from the account service, the possible command-side and query-side requests can be:

Command side requests:

  • CreateAccount
  • DeleteAccount
  • DebitAccount
  • CreditAccount

And query side requests can be:

  • GetAccountSummary
  • GetLastTransaction
  • GetLastNTransactions
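A compact Java sketch of this split (a single class is used for brevity; in a real CQRS system the two sides would typically be separate services with separate models and stores): the command methods are the only ones that mutate state, while the query methods are strictly read-only.

```java
import java.util.ArrayList;
import java.util.List;

public class AccountCqrs {
    private long balanceCents = 0;
    private final List<String> transactions = new ArrayList<>();

    // --- Command side: the only methods allowed to change state ---
    public void creditAccount(long cents) {
        balanceCents += cents;
        transactions.add("CREDIT " + cents);
    }

    public void debitAccount(long cents) {
        balanceCents -= cents;
        transactions.add("DEBIT " + cents);
    }

    // --- Query side: read-only views, never mutate anything ---
    public long getAccountSummary() { return balanceCents; }

    public String getLastTransaction() {
        return transactions.isEmpty() ? "none" : transactions.get(transactions.size() - 1);
    }

    public static void main(String[] args) {
        AccountCqrs account = new AccountCqrs();
        account.creditAccount(10000);
        account.debitAccount(2500);
        System.out.println(account.getAccountSummary());  // 7500
        System.out.println(account.getLastTransaction()); // DEBIT 2500
    }
}
```

Because the query methods touch no state, they can later be backed by a separate read-optimized store and scaled independently of the command side.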

Key benefits of CQRS:

  • Independent scaling. It allows the read and write workloads to scale independently, and may result in less lock contention.
  • Optimized data schemas. The read side can use a schema that is optimized for queries, while the write side uses a schema that is optimized for updates.
  • Security. It’s easier to ensure that only the right domain entities are performing writes on the data.
  • Separation of concerns. Segregating the read and write sides can result in models that are more maintainable and flexible. Most of the complex business logic goes into the write model. The read model can be relatively simple.
  • Simpler queries. By storing a materialized view in the read database, the application can avoid complex joins when querying.

3) Event Sourcing:

After identifying the business boundary and separating the read and write operations, we have another design pattern that builds on the benefits of the above two. It captures every state change so that the system state is fully recoverable in case of any failure.
Event Sourcing is the idea of ensuring that every change in the system state is captured in the form of events. In most cases, CQRS and Event Sourcing are used together. For every command received by the system, one or more events are generated, acting as proof that the command was processed. An event, when processed, changes the state of the system. The events are stored in a separate event log table. Events are only generated from commands; the read side does not generate events because it doesn’t change the state of the system.
At first glance, the need to generate events for each command might seem confusing. But in case of a system failure, the events help recover the system: the current state can be rebuilt by re-executing the events from the event log table one by one. This can be explained by a simple example:

From the sample FinTech application, for the account service, the following commands generate the following events:
– CreateAccount -> AccountCreated
– UpdateAccount -> AccountUpdated
– AddBalance -> AccountBalanceAdded
– DebitAccount -> AccountDebited

Commands are received on the left, and each command generates an event. The important thing with Event Sourcing is that the system maintains a log of every state change in the form of events. Not only can we see the current state of the account, but we can also see when and how the state changed for a particular account.
Event Sourcing can be very helpful and important for a FinTech product. Not only does it help in rebuilding the current state during failure and recovery, but it also keeps a full history of how the system state has changed. These event logs can be used for auditing and reporting in your FinTech application.
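The command-to-event flow and the replay-based recovery can be sketched as follows (the event names follow the mapping above; the string-based event encoding is purely illustrative, and a real event store would persist the log durably):

```java
import java.util.ArrayList;
import java.util.List;

public class AccountEventSourcing {
    // Every state change is captured as an event in an append-only log.
    final List<String> eventLog = new ArrayList<>();
    long balanceCents = 0;

    // Handling a command first records an event, then applies it to the state.
    void handle(String command, long cents) {
        String event;
        if (command.equals("AddBalance"))        event = "AccountBalanceAdded:" + cents;
        else if (command.equals("DebitAccount")) event = "AccountDebited:" + cents;
        else throw new IllegalArgumentException("unknown command: " + command);
        eventLog.add(event);
        apply(event);
    }

    // Applying an event is the ONLY way state changes; handle() goes through it too.
    void apply(String event) {
        String[] parts = event.split(":");
        long cents = Long.parseLong(parts[1]);
        if (parts[0].equals("AccountBalanceAdded")) balanceCents += cents;
        else if (parts[0].equals("AccountDebited")) balanceCents -= cents;
    }

    // Recovery: rebuild the current state from scratch by replaying the log
    // one event at a time, exactly as described above.
    static long replay(List<String> log) {
        AccountEventSourcing fresh = new AccountEventSourcing();
        for (String event : log) fresh.apply(event);
        return fresh.balanceCents;
    }

    public static void main(String[] args) {
        AccountEventSourcing account = new AccountEventSourcing();
        account.handle("AddBalance", 10000);
        account.handle("DebitAccount", 2500);
        System.out.println(account.balanceCents);     // 7500
        System.out.println(replay(account.eventLog)); // 7500, rebuilt purely from events
    }
}
```

Note that `replay` never sees the original commands, only the events, which is why the event log alone is enough to recover the state after a crash, and doubles as an audit trail.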


These are not the only patterns under Reactive Architecture; there are others as well. But the patterns mentioned above are the most popular ones in the industry.

Below are a few companies that followed the principles of Reactive Architecture to deliver hassle-free benefits to their users.
– Capital One scaled real-time auto loan decision-making with Reactive architecture.
– PayPal migrated from its old architecture to Reactive and scaled to billions of daily transactions with improved throughput.
– Verizon Wireless adopted Reactive architecture and doubled its overall business and performance results using half the hardware.

Conclusion

To conclude, when designing FinTech applications on Reactive Architecture, it is important to follow the concepts of the single responsibility principle and asynchronous message passing.

In a nutshell, when we design a FinTech product using reactive principles along with these patterns, we can deliver a resilient system with 24/7 availability and high throughput: a cohesive, maintainable, and scalable product.


Also published on Medium.

Written by 

Harshit Daga is a Sr. Software Consultant with more than 4 years of experience. He is passionate about Scala development and has worked across the complete Scala ecosystem. He is a quick learner, curious about new technologies, responsible, and a good team player. He has a good understanding of building reactive applications and has worked on various Lightbend technologies like Scala, Akka, the Play framework, Lagom, etc.
