Couchbase High Availability and Disaster Recovery: Part 1


Couchbase Server is an open-source, distributed, NoSQL, document-oriented engagement database. It offers features such as:
– Flexible data model
– Simple administration
– Query and Analytics
– Memory first architecture
– High availability, and more

Couchbase strongly emphasizes reliability, high availability, and simple management. It aims to perform operations while the system remains online, without interrupting running applications.

High Availability

To achieve high availability, Couchbase relies on replication and failover. Together, these mechanisms limit downtime caused by unplanned incidents and improve system availability.

Rack Zone Aware Replication

Replicating data across a distributed environment is the basis of availability. Data is copied to multiple nodes or data centers, which guards against data loss and keeps the replicated data accessible if a data node is lost.

A Couchbase Rack is a logical group of nodes. It can be based on their physical location in the network.

Figure: Couchbase Racks

Couchbase Rack Zone Awareness ensures that active and replica documents are automatically assigned to different physical groups (racks), so that if a physical rack fails, a copy remains safe and available in another rack.

Figure: Couchbase Rack Zone Awareness

Rebalancing and re-replication are then initiated by an administrator.
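To make the idea of rack-aware placement concrete, here is a minimal sketch in Python. It is a toy round-robin model, not Couchbase's actual placement algorithm; the names `place_vbuckets`, `survives_group_loss`, and the rack and node labels are hypothetical.

```python
def place_vbuckets(groups, num_vbuckets=8):
    """Give each vBucket an active node and a replica node in different groups.

    `groups` maps a group (rack) name to a non-empty list of node names.
    Toy round-robin placement for illustration only.
    """
    if len(groups) < 2:
        raise ValueError("need at least two groups to separate active and replica")
    names = sorted(groups)
    placement = {}
    for vb in range(num_vbuckets):
        active_group = names[vb % len(names)]
        # The next group in the rotation is always a different one.
        replica_group = names[(vb + 1) % len(names)]
        active_nodes = groups[active_group]
        replica_nodes = groups[replica_group]
        placement[vb] = {
            "active": (active_group, active_nodes[vb % len(active_nodes)]),
            "replica": (replica_group, replica_nodes[vb % len(replica_nodes)]),
        }
    return placement


def survives_group_loss(placement, lost_group):
    """True if every vBucket still has a copy outside the lost group."""
    return all(
        p["active"][0] != lost_group or p["replica"][0] != lost_group
        for p in placement.values()
    )


groups = {"rack-1": ["n1", "n2"], "rack-2": ["n3", "n4"]}
placement = place_vbuckets(groups)
```

Because the active and replica copies never share a group in this model, `survives_group_loss(placement, "rack-1")` holds for either rack.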

XDCR: Cross Data Center Replication

Couchbase Server supports two forms of replication:

Local, or intra-cluster replication

A Couchbase cluster consists of one or more Couchbase Server nodes.

Intra-cluster replication involves replicating data across the nodes of a cluster.
To enable intra-cluster replication, a bucket is configured with a number of replicas, although how many replicas are actually maintained depends on the number of nodes in the cluster. Replica copies are maintained and kept up to date on different nodes, ready to become active in the event of a node failure.
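The dependence on node count can be written as a one-line model. This sketch assumes that a node never holds two copies of the same data, so each replica needs its own node; the function name `effective_replicas` is hypothetical.

```python
def effective_replicas(configured_replicas, node_count):
    """Replica copies that can be placed on distinct nodes.

    Assumes one node holds the active copy and each replica needs
    a node of its own.
    """
    if node_count < 1:
        raise ValueError("a cluster has at least one node")
    return min(configured_replicas, node_count - 1)
```

Under this assumption, a bucket configured with two replicas on a two-node cluster can maintain only one of them until more nodes are added.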

Remote, or Cross Data Center Replication (XDCR)

XDCR involves replicating data across different clusters, each of which may occupy a different data center.

Figure: Couchbase XDCR

To configure XDCR, one bucket is designated as the replication source and a bucket on the remote cluster as the replication target. Changes made to the source bucket are then automatically replicated to the target bucket.

XDCR can be unidirectional, bidirectional, or hybrid.
If applications need to access both the source and target buckets directly, configure XDCR in both directions. Updates made on either cluster are then replicated to the other, which makes each bucket both a source and a target.

Beyond basic replication, XDCR provides the following features:
– Secure, continuous, memory-to-memory replication: replication is set up as robust, SSL-encrypted pipelines.
– Cluster topology neutral and aware: XDCR works irrespective of cluster configuration, since each cluster may have a different size and different resources.
– Only the most recent mutation of a document is streamed across the pipeline.
– Control through filtering: streams can be filtered on document ID patterns, so that only certain documents flow through.
– Because XDCR uses DCP (Database Change Protocol), replication can be paused and resumed from any checkpoint. This allows automatic recovery to the most recent checkpoint without data loss after the failure of any node, rack, or zone.
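Two of these behaviours, sending only the latest mutation per document and filtering by document ID, can be sketched in a few lines of Python. This is a toy model of a replication batch, not the XDCR implementation; the function `xdcr_batch`, the tuple layout, and the sample keys are assumptions made for illustration.

```python
import re


def xdcr_batch(mutations, key_filter=None):
    """Return what a toy replication pipeline would send to the target.

    `mutations` is a list of (doc_id, revision, body) in arrival order.
    Only the highest revision per document is kept, and documents whose
    ID does not match `key_filter` (a regular expression) are dropped.
    """
    pattern = re.compile(key_filter) if key_filter else None
    latest = {}
    for doc_id, revision, body in mutations:
        if pattern and not pattern.search(doc_id):
            continue  # filtered out: never leaves the source cluster
        current = latest.get(doc_id)
        if current is None or revision > current[0]:
            latest[doc_id] = (revision, body)
    return latest


mutations = [
    ("user::1", 1, {"name": "Ann"}),
    ("order::9", 1, {"total": 10}),
    ("user::1", 2, {"name": "Anna"}),
]
sent = xdcr_batch(mutations, key_filter=r"^user::")
```

Here `sent` contains a single entry for `user::1` at revision 2: the earlier revision is superseded and the `order::9` document is excluded by the filter.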

Administration and Auto failover

When a failover happens:
– Replica documents are promoted to active, and
– Cluster maps are updated on the clients.
A cluster map holds the connection and cluster topology details managed by the SDK.
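The two failover steps can be illustrated with a small Python sketch. It models a cluster map as a plain dictionary, which is an assumption for illustration and not the format the SDK uses; `fail_over` and the node names are hypothetical.

```python
def fail_over(cluster_map, failed_node):
    """Return a new cluster map with replicas promoted for the failed node.

    `cluster_map` maps vBucket id -> {"active": node, "replicas": [nodes]}.
    The input map is left unchanged.
    """
    new_map = {}
    for vb, entry in cluster_map.items():
        active = entry["active"]
        # The failed node can no longer hold replicas either.
        replicas = [n for n in entry["replicas"] if n != failed_node]
        if active == failed_node:
            if not replicas:
                raise RuntimeError(f"vBucket {vb} has no surviving replica")
            # Promote the first surviving replica to active.
            active, replicas = replicas[0], replicas[1:]
        new_map[vb] = {"active": active, "replicas": replicas}
    return new_map


cluster_map = {
    0: {"active": "n1", "replicas": ["n2"]},
    1: {"active": "n2", "replicas": ["n3"]},
    2: {"active": "n3", "replicas": ["n1"]},
}
after = fail_over(cluster_map, "n1")
```

In this model, clients that receive `after` route requests for vBucket 0 to `n2`. vBuckets 0 and 2 are left without a replica, which is the gap that re-replication later closes.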

Figure: Couchbase Auto Failover

Similarly, when a new node is added to the cluster:
– vBuckets are recalculated (each bucket is divided into 1024 virtual buckets), and documents are then transferred incrementally.
During the transfer, cluster maps are updated continuously so that clients have the most current data and service configurations.
As a result, nodes can be added or removed with zero downtime.
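The following Python sketch shows why adding a node only moves part of the data. It assumes a simplified CRC32-based key hash (the exact bit manipulation real SDKs apply is not reproduced here) and a naive rebalancing rule; `vbucket_for`, `rebalance`, and the node names are hypothetical.

```python
import zlib

NUM_VBUCKETS = 1024


def vbucket_for(key):
    """Map a document key to a vBucket id (simplified CRC32-based hash)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def rebalance(owners, new_node):
    """Move just enough vBuckets to `new_node` to even out ownership.

    `owners` maps vBucket id -> node name. Returns (new_owners, moved_ids).
    """
    nodes = sorted(set(owners.values()) | {new_node})
    target = len(owners) // len(nodes)  # share the new node should receive
    counts = {n: 0 for n in nodes}
    for node in owners.values():
        counts[node] += 1
    new_owners = dict(owners)
    moved = []
    for vb in sorted(owners):
        if len(moved) >= target:
            break
        donor = new_owners[vb]
        if counts[donor] > target:  # only take from nodes above their share
            new_owners[vb] = new_node
            counts[donor] -= 1
            counts[new_node] += 1
            moved.append(vb)
    return new_owners, moved


owners = {vb: ("n1" if vb % 2 == 0 else "n2") for vb in range(NUM_VBUCKETS)}
new_owners, moved = rebalance(owners, "n3")
```

A key always hashes to the same vBucket; only the vBucket-to-node mapping changes. In this example roughly a third of the vBuckets move to `n3`, and the rest stay where they were, which is why an updated map is all a client needs to keep finding documents.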

Couchbase also allows failover to be fully automated, if desired. Rebalancing and re-replication, however, are initiated by an administrator because of the risk of cascading failure.

Resilient App design

The Couchbase SDK includes features for resilient, reactive, asynchronous data access that respond intelligently to timeouts when failures occur.
Both active and replica documents can serve read requests, so applications do not have to wait for failover and replica promotion.
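The replica-read pattern can be sketched as follows. This is a self-contained toy in Python that does not use the Couchbase SDK; `ToyNode`, `NodeDown`, and `read_with_replica_fallback` are hypothetical names, and a real replica may lag slightly behind the active copy.

```python
class NodeDown(Exception):
    """Raised by the toy node when it cannot serve a request."""


class ToyNode:
    """Stand-in for a data node holding a copy of the documents."""

    def __init__(self, name, docs, up=True):
        self.name, self.docs, self.up = name, docs, up

    def get(self, key):
        if not self.up:
            raise NodeDown(self.name)
        return self.docs[key]


def read_with_replica_fallback(key, active, replicas):
    """Try the active copy first, then each replica in order."""
    for node in [active, *replicas]:
        try:
            return node.get(key), node.name
        except NodeDown:
            continue  # this copy is unreachable; try the next one
    raise NodeDown(f"no copy of {key!r} is reachable")


docs = {"user::1": {"name": "Anna"}}
active = ToyNode("n1", docs, up=False)  # simulate a failed active node
replica = ToyNode("n2", docs)
value, served_by = read_with_replica_fallback("user::1", active, [replica])
```

The read is served by `n2` even though the active node is down and no failover has taken place yet.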

Furthermore, with bidirectional XDCR, an application can write not just to a single node but to any cluster in the network. The cluster that receives the write is immediately consistent, while the remaining clusters become eventually consistent, depending on the speed of the network.

The auto-retry timeout can also be configured to as little as 5 ms, so that requests bounce to a hot-standby data center.


So far, we have learned how Couchbase achieves high availability by leveraging replication and failover mechanisms in its distributed environment.

In addition, Cross Data Center Replication (XDCR) replicates data between clusters, which protects against data center failure and provides high-performance data access for globally distributed, mission-critical applications.

In the next post, we’ll discuss how Couchbase’s high availability mechanisms support disaster recovery, protecting against potential data loss from unplanned incidents.

Hope you like this blog. Feel free to share your queries or thoughts in the comments section. 




Written by 

Neha is a Senior Software Consultant with more than 3 years of experience. She is a big data enthusiast and knows various programming languages, including Scala and Java. She is always eager to learn new and advanced concepts to expand her horizons and apply them in project development.