Building Scalable Systems

Building a Reactive System is all about the balance between consistency and availability and the consequences of picking one over the other. This article mainly focuses on consistency and availability and how they impact the scalability of a system.

What are Scalability, Consistency and Availability?

A system is scalable if it can meet the increase in demand while remaining responsive.
It is consistent if all the nodes show the same data at the same time.
It is available if it remains responsive despite any failures.

How does the scalability of a system differ from the performance of the system?

Scalability and performance are related but different concepts and we need to understand what the difference is.

Scalability is the number of requests a system can handle at a time, i.e. load. It’s about optimizing the ability to handle load, which means improving how many requests the system can handle at a time. Performance, on the other hand, is the time the system takes to complete a single request, i.e. latency. It’s about optimizing the response time, which means improving how quickly the system can handle a single request.

There is a limit to how far we can reduce the response time, and we will eventually reach that limit. Scalability, on the other hand, has no theoretical limit. We may be restricted by the implementation, but in a perfectly scalable system we could scale forever.

So when we build Reactive Microservices, we tend to focus on improving scalability rather than improving performance.

How can we measure scalability and performance of a system?

A measurement like requests-per-second actually measures both. This makes it a valuable metric, because we can use it to see whether we have improved either our scalability or our performance. But it is also somewhat restrictive: if it improves, we can’t tell which of the two changed. So if we want to know where an improvement came from, we have to track scalability and performance individually.
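To see why requests-per-second mixes the two, here is a minimal sketch based on Little’s Law (throughput = concurrency / latency). The function name and the numbers are illustrative assumptions, not measurements from a real system.

```python
def requests_per_second(concurrent_requests: int, latency_seconds: float) -> float:
    """Steady-state throughput from Little's Law: concurrency / latency."""
    if latency_seconds <= 0:
        raise ValueError("latency_seconds must be positive")
    return concurrent_requests / latency_seconds


# Hypothetical baseline: 10 requests in flight, each taking 0.5 seconds.
baseline = requests_per_second(concurrent_requests=10, latency_seconds=0.5)

# Performance improvement: each request finishes twice as fast.
faster = requests_per_second(concurrent_requests=10, latency_seconds=0.25)

# Scalability improvement: twice as many requests are handled at once.
wider = requests_per_second(concurrent_requests=20, latency_seconds=0.5)

print(baseline, faster, wider)
```

Both changes double the requests-per-second figure, so the metric alone cannot say whether latency or capacity improved.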

How can we explain consistency in distributed systems?

Distributed systems are systems whose parts are separated by space. This means the system could be deployed across multiple data centers or within the same data center, to different hardware or even to the same hardware.

Even if it’s deployed to the same hardware, a distributed system is one where information has to be transferred between different parts of that system and when that information is transferred it’s crossing some sort of space. It could be going over a local network, or it could be writing to a disk, or it could be writing to a database.

Information cannot be transferred instantaneously; it takes some time. Granted, that time could be very small, but some amount of time always elapses during the transfer of information. While the transfer takes place, the state of the original sender may change.

The key here is to recognize that when we are dealing with a distributed system, we are always dealing with stale data. Reality is basically eventually consistent.

What is Eventual Consistency?

When a system stops receiving updates, at least for some period of time, we can guarantee that all parts of the system will eventually converge on the same state. That convergence is the level of consistency an eventually consistent system provides.

Common source control tools (Git, Subversion, etc.) operate on an eventually consistent model. They rely on a later merge operation to bring things back into alignment. That’s how modern source control tools achieve consistency.

Traditional monolithic architectures are usually based around strong consistency; they use a strongly consistent database such as a SQL database.

What is Strong Consistency?

A system is strongly consistent when all of its members agree on a new state before that state becomes available.

We can achieve strong consistency by introducing mechanisms like locks. A distributed system problem occurs when we have multiple things that are responsible for the same piece of data. As long as only one thing is responsible for that data, that is, as long as we only have one instance of the lock, it’s not a distributed system problem anymore. In this way we resolve the distributed system problem by using a non-distributed resource (the lock).

But a lock introduces overhead in the form of contention. That overhead has consequences for our ability to be elastic and to be resilient, and it has other consequences as well.

What is Contention?

Contention occurs when two or more things compete with each other for a single limited resource. That competition can only have one winner, which means that the others are forced to wait in line for the winner to complete.

As the number of competing things increases, so does the time each one waits for the resource to be freed up.
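The waiting line can be sketched with simple arithmetic. This is an idealized model under stated assumptions: all contenders arrive at the same moment, the lock is handed out in arrival order, and every holder keeps it for the same hypothetical `hold_ms`.

```python
def queue_waits(contenders: int, hold_ms: int) -> list[int]:
    """Wait time in ms for each contender when one lock is granted in arrival order."""
    # The contender at position k waits for the k holders ahead of it.
    return [position * hold_ms for position in range(contenders)]


for n in (2, 4, 8):
    waits = queue_waits(n, hold_ms=10)
    print(n, "contenders: last one waits", waits[-1], "ms; total waiting", sum(waits), "ms")
```

In this model the wait of the last contender grows linearly with the number of contenders, while the total time spent waiting grows quadratically.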

What is Coherency Delay?

In a distributed system we have a system of multiple nodes and they want to reach a state of coherence where they all agree on the state of the system. Now in order to reach that state of coherence each node in the system will have to send messages to each other node informing them of any state changes. Over time all of the nodes in the system will receive all of the messages notifying them of the state changes and they will reach that state of coherence, but this does take time. The time that it takes to reach this synchronization is what we call coherency delay.

As the number of nodes increases, coherency delay also increases.

What are the Laws of Scalability?

The two laws of scalability are Amdahl’s Law and Gunther’s Law.

Amdahl’s Law of Scalability:
Amdahl’s Law shows the effect of contention on a distributed system. The key part of Amdahl’s Law is to recognize that contention limits parallelization. The law defines the maximum improvement that can be gained from parallel processing: improvements from parallelization are limited to the code that can be parallelized. So if we have pieces that can be parallelized, we can improve those. For the pieces that can’t be parallelized there is nothing we can do, because there we are contending for a single limited resource. Amdahl’s Law demonstrates that when we have contention and we try to add concurrency or scale up the system, we start to see diminishing returns.
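The diminishing returns can be made concrete with the usual statement of Amdahl’s Law, speedup = 1 / ((1 - p) + p / n), where p is the fraction of work that can be parallelized and n is the number of workers. The sketch below uses a hypothetical workload with p = 0.95; the function name is illustrative.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Amdahl's Law: speedup = 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Hypothetical workload: 95% of the work can run in parallel.
for n in (1, 8, 64, 1024):
    print(n, "workers ->", round(amdahl_speedup(0.95, n), 2), "x speedup")
```

With 5% of the work stuck behind contention, the speedup can never reach 1 / 0.05 = 20, no matter how many workers are added.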


Gunther’s Law of Scalability:
Gunther’s Law shows the effect of contention as well as the effect of coherency delay on a distributed system. It demonstrates that as we scale up the system, we need more communication and coordination between all the nodes, and we eventually reach a point where the cost of that communication and coordination exceeds any benefit gained from scaling up. In other words, increasing concurrency can cause negative returns due to contention and coherency delay, which reduces our ability to parallelize.
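Gunther’s Law is commonly written as the Universal Scalability Law, C(N) = N / (1 + α(N - 1) + βN(N - 1)), where α models contention and β models coherency delay. The sketch below evaluates it for hypothetical coefficients; the values of `ALPHA` and `BETA` are assumptions chosen only to make the peak visible.

```python
def usl_capacity(nodes: int, contention: float, coherency: float) -> float:
    """Universal Scalability Law: capacity at `nodes` relative to a single node."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return nodes / (1.0 + contention * (nodes - 1) + coherency * nodes * (nodes - 1))


# Hypothetical coefficients, not measured from any real system.
ALPHA = 0.05   # contention
BETA = 0.001   # coherency delay

capacities = {n: usl_capacity(n, ALPHA, BETA) for n in range(1, 129)}
peak_nodes = max(capacities, key=capacities.get)

print("capacity peaks at", peak_nodes, "nodes")
print("capacity at 128 nodes:", round(capacities[128], 2))
```

Past the peak, every added node lowers the total capacity. With the coherency coefficient set to zero, the formula reduces to Amdahl’s Law with a serial fraction of α, and the curve flattens instead of turning downward.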


So both of these laws show that linear scalability is almost always unachievable. To achieve linear scalability we require total isolation: we basically need a system that is stateless, with no state stored in either the application or the database. It is very difficult, if not impossible, to design a system that is truly stateless.


The limitations imposed by these laws are well understood in Reactive Systems. We try to minimize their impact by exploiting the laws rather than trying to avoid them.

How can we reduce the impact of the laws of scalability on reactive systems?

Reactive systems can reduce contention in the following ways:

  • By isolating locks. An isolated lock creates less contention.
  • By eliminating transactions.
  • By avoiding blocking operations. Blocking operations tie up memory, threads, CPU and various other resources for some period of time.
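As an illustration of the last point, here is a minimal sketch of non-blocking waits using Python’s standard `asyncio` module. The `fetch` coroutine and the service names are hypothetical stand-ins for real I/O calls.

```python
import asyncio


async def fetch(name: str, delay_seconds: float) -> str:
    """Hypothetical I/O call; asyncio.sleep yields control instead of blocking the thread."""
    await asyncio.sleep(delay_seconds)
    return f"{name} done"


async def main() -> list[str]:
    # The three waits overlap on a single thread.
    # gather returns the results in the order the awaitables were passed in.
    return await asyncio.gather(
        fetch("orders", 0.03),
        fetch("payments", 0.02),
        fetch("inventory", 0.01),
    )


results = asyncio.run(main())
print(results)
```

While one call is waiting, the thread is free to make progress on the others, so no thread sits idle holding resources for the duration of the wait.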

Reactive systems can mitigate coherency delay in the following ways:

  • By embracing eventual consistency. If we accept the fact that things take a certain amount of time due to coherency delay, then we won’t be surprised by it.
  • By building autonomy. If our systems are more autonomous, we don’t have to deal with communication going back and forth all the time, which reduces the amount of coherency delay.

Thus by avoiding contention and coherency delay, we can achieve higher scalability. The goal here is to reduce or eliminate the things that prevent scalability in reactive systems.

What is the CAP Theorem?

The CAP Theorem states that a distributed system cannot provide more than two of the following at the same time:
i. consistency,
ii. availability,
iii. partition tolerance.

Partition tolerance means that the system continues to run even when any number of messages are dropped or delayed by the network between nodes. A partition-tolerant system can sustain such network failures without failing as a whole.

Here is a Venn diagram to illustrate the CAP theorem:

The more we want to move towards consistency and partition tolerance, the more we have to sacrifice availability. Similarly, the more we want to move towards availability and partition tolerance, the more we have to sacrifice consistency. There is a trade-off between consistency and availability.

When a partition occurs, a distributed system has two options:

  • AP: The system sacrifices consistency by allowing writes to both sides of the partition. When the partition is resolved, we need a way to merge the data in order to restore consistency.
  • CP: The system sacrifices availability by disabling or terminating one side of the partition. During the partition, some or all of our system will be unavailable.

We can use a technique like sharding to balance the need for consistency with the need for scalability. We can use a technique like Conflict-Free Replicated Data Types (CRDTs) to balance the need for availability with the need for scalability.
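To give a feel for how a CRDT restores consistency after writes on both sides of a partition, here is a minimal sketch of one of the simplest CRDTs, a grow-only counter (G-Counter). The class and node names are illustrative; this is not the API of any particular library.

```python
class GCounter:
    """Grow-only counter CRDT: one slot per node, merged by element-wise maximum."""

    def __init__(self, node_id: str) -> None:
        self.node_id = node_id
        self.counts: dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a grow-only counter cannot be decremented")
        # Each node only ever writes to its own slot.
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        # Taking the maximum per slot is commutative, associative and idempotent,
        # so replicas converge regardless of merge order or repetition.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

    def value(self) -> int:
        return sum(self.counts.values())


# During a partition, both sides keep accepting writes.
a, b = GCounter("a"), GCounter("b")
a.increment(3)
b.increment(2)

# When the partition heals, the replicas exchange state and converge.
a.merge(b)
b.merge(a)
print(a.value(), b.value())
```

Because merging never loses an increment and never counts one twice, both replicas stay available during the partition and still agree on the total afterwards.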

Consistency or Availability?

The choice between consistency and availability isn’t really a technical decision; it’s a business decision. So the decision of when and where to sacrifice consistency or availability should be discussed with domain experts and product owners. We need to consider various factors before making a choice:

  • What is the impact on revenue if the system is unavailable versus eventually consistent?
  • What is going to have a bigger impact on the business?
  • What is going to have a bigger impact on the customer?
  • How is it going to affect them in the long run?