HDFS Erasure Coding in Hadoop 3.0


HDFS Erasure Coding (EC) in Hadoop 3.0 is the solution to a problem we had in earlier versions of Hadoop: the default 3x replication factor. Replication is the simplest way to protect our data even when a DataNode fails, but it needs too much extra storage, since the two extra copies of every block consume the same amount of resources as the original data block. With EC, the storage overhead drops to 50%, compared to the 200% caused by HDFS's default 3x replication.

Therefore, Hadoop 3.0 can use Erasure Coding (EC) in place of replication, which provides the same level of fault tolerance with much less storage space. In typical Erasure Coding (EC) setups, the storage overhead is no more than 50%.

To understand Erasure Coding in detail, I would first like to introduce two terms:

  1. Durability: How many simultaneous failures can be tolerated?
  2. Storage Efficiency: How much of the storage is used for useful data?

Earlier, when the replication factor was 3:


Data Durability is 2, as we can handle 2 simultaneous failures.

Storage Efficiency is 33% (useful blocks / total blocks, i.e. 1/3).

Apart from this, it causes 200% overhead, since we store two extra copies of each data block.
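For a quick sanity check, here is a minimal Java sketch (plain arithmetic, not Hadoop code) that computes these two numbers for 3x replication:

public class ReplicationMath {
    public static void main(String[] args) {
        int usefulBlocks = 1;                 // the original data block
        int replicationFactor = 3;            // HDFS default
        int totalBlocks = usefulBlocks * replicationFactor;

        int durability = replicationFactor - 1;                    // simultaneous failures tolerated
        double efficiency = (double) usefulBlocks / totalBlocks;   // useful / total
        double overhead = (double) (totalBlocks - usefulBlocks) / usefulBlocks;

        System.out.printf("Durability: %d failures%n", durability);            // 2
        System.out.printf("Storage efficiency: %.0f%%%n", efficiency * 100);   // 33%
        System.out.printf("Storage overhead: %.0f%%%n", overhead * 100);       // 200%
    }
}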

Today’s HDFS Erasure Coding

There are two algorithms available for it:

1) XOR Algorithm (Simple EC Algorithm)

It is the simplest implementation of HDFS Erasure Coding. Let’s assume X and Y are data cells; then the parity cell is the XOR of these two data cells.


Here, Data Durability is 1, as it can handle 1 simultaneous failure, and Storage Efficiency is 75% (as we use only 1 parity cell for every 3 data cells, i.e. 3/4).

X ⊕ Y is the XOR operation, which generates only one parity cell. If any one cell is lost, it can be recovered from the remaining data cells and the parity cell. XOR is very limited: because it produces only 1 parity cell, it can tolerate just 1 failure in a group of any size n, but in exchange we get better Storage Efficiency.
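A minimal Java sketch of the idea (illustrative only, not Hadoop's implementation) is shown below; it builds the parity cell from two data cells and then recovers one lost cell from the other cell and the parity:

public class XorParityDemo {
    public static void main(String[] args) {
        byte[] x = {1, 2, 3, 4};
        byte[] y = {9, 8, 7, 6};

        // Encode: the parity cell is the bitwise XOR of the data cells.
        byte[] parity = new byte[x.length];
        for (int i = 0; i < x.length; i++) {
            parity[i] = (byte) (x[i] ^ y[i]);
        }

        // Simulate losing cell Y, then recover it: Y = X ^ parity.
        byte[] recoveredY = new byte[x.length];
        for (int i = 0; i < x.length; i++) {
            recoveredY[i] = (byte) (x[i] ^ parity[i]);
        }

        System.out.println(java.util.Arrays.equals(y, recoveredY)); // true
    }
}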

2) Reed-Solomon Algorithm (Improved EC Algorithm)

The limitation of the XOR operation is solved by an improved EC algorithm, commonly known as the Reed-Solomon algorithm. Reed-Solomon uses linear algebra operations to generate multiple parity cells, so instead of tolerating only one failure at a time, we can tolerate multiple failures per group. It works by multiplying a generator matrix (GT) with d data cells to generate a codeword consisting of d data cells and p parity cells. In Reed-Solomon, fault tolerance is up to p (the number of parity cells) and storage efficiency is d/(d+p), where d is the number of data cells and p is the number of parity cells.
In this particular example, when you look at the codeword, 6 of the cells (blue) are the actual data cells and 3 (red) are the parity cells, which are obtained simply by multiplying our data cells by the generator matrix.
A storage failure can be recovered by multiplying the inverse of the generator matrix with the surviving cells of the codeword, as long as any d out of the d+p cells (here, 6 out of 9) are available.
Therefore, here Data Durability is 3, as it can handle 3 simultaneous failures, and Storage Efficiency is 67% (6 useful cells out of 9 total, i.e. 6/9). Since the 3 parity cells are only half the number of original data cells, the storage overhead is just 50%.
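To make the linear-algebra idea concrete, here is a toy Java sketch. It is not HDFS's implementation (which operates on bytes in the finite field GF(2^8)); it uses ordinary doubles and a small 3-data / 2-parity layout purely to illustrate the principle: encoding multiplies the data cells by a generator matrix, and recovery solves the surviving rows of that matrix for the original cells.

// Toy Reed-Solomon-style demo over ordinary doubles (real HDFS uses GF(2^8) byte arithmetic).
public class ToyReedSolomon {
    // Solve A*x = b with Gaussian elimination (A is n x n).
    static double[] solve(double[][] a, double[] b) {
        int n = b.length;
        for (int col = 0; col < n; col++) {
            int pivot = col;
            for (int r = col; r < n; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) pivot = r;
            }
            double[] tmpRow = a[col]; a[col] = a[pivot]; a[pivot] = tmpRow;
            double tmpVal = b[col]; b[col] = b[pivot]; b[pivot] = tmpVal;
            for (int r = col + 1; r < n; r++) {
                double f = a[r][col] / a[col][col];
                for (int c = col; c < n; c++) a[r][c] -= f * a[col][c];
                b[r] -= f * b[col];
            }
        }
        double[] x = new double[n];
        for (int r = n - 1; r >= 0; r--) {
            double s = b[r];
            for (int c = r + 1; c < n; c++) s -= a[r][c] * x[c];
            x[r] = s / a[r][r];
        }
        return x;
    }

    public static void main(String[] args) {
        double[] data = {7, 11, 3};          // d = 3 data cells
        double[][] generator = {             // (d + p) x d generator matrix
            {1, 0, 0},                       // identity rows keep the data cells as-is
            {0, 1, 0},
            {0, 0, 1},
            {1, 1, 1},                       // parity row 1
            {1, 2, 4}                        // parity row 2 (Vandermonde style)
        };

        // Encode: codeword = generator * data  ->  3 data cells + 2 parity cells.
        double[] codeword = new double[5];
        for (int r = 0; r < 5; r++)
            for (int c = 0; c < 3; c++)
                codeword[r] += generator[r][c] * data[c];

        // Erase two cells (indices 0 and 2); any 3 surviving cells are enough.
        int[] surviving = {1, 3, 4};
        double[][] a = new double[3][3];
        double[] b = new double[3];
        for (int i = 0; i < 3; i++) {
            a[i] = generator[surviving[i]].clone();
            b[i] = codeword[surviving[i]];
        }

        // Decode: solve the surviving rows for the original data cells.
        double[] recovered = solve(a, b);
        System.out.println(java.util.Arrays.toString(recovered)); // [7.0, 11.0, 3.0]
    }
}

With 2 parity rows, any 2 of the 5 cells can be lost and the remaining 3 rows still determine the data, which is exactly the "tolerate up to p failures" property described above.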

Advantages of HDFS Erasure Coding in Hadoop

  • Saving Storage – Initially, blocks are triplicated; once a block is no longer being changed by additional writes, a background task encodes it into a codeword and deletes its replicas.
  • Two-way Recovery – HDFS block errors are discovered and recovered not only while reading the data, but can also be checked and repaired actively in the background.
  • Low overhead – Storage overhead is reduced from 200% to just 50% with the RS encoding algorithm.

Conclusion

Integrating EC (with the Reed-Solomon algorithm) into HDFS can improve storage efficiency while still providing data durability similar to traditional replication-based HDFS deployments. As an example, a 3x-replicated file with 6 blocks will consume 6 * 3 = 18 blocks of disk space, but with an EC (6 data, 3 parity) deployment it will consume only 9 blocks of disk space.
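The same arithmetic, as a tiny sketch:

public class StorageComparison {
    public static void main(String[] args) {
        int fileBlocks = 6;
        int replicatedBlocks = fileBlocks * 3;    // 3x replication: 18 blocks on disk
        int erasureCodedBlocks = fileBlocks + 3;  // RS(6,3): 6 data + 3 parity = 9 blocks
        System.out.println("Replication: " + replicatedBlocks + " blocks, EC: " + erasureCodedBlocks + " blocks");
    }
}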


Written by Ayush

Ayush is a Software Consultant with more than one year of experience. He specialises in Hadoop and has good knowledge of many programming languages, including C, Java and Scala. HQL, Pig Latin, HDFS, Flume and HBase add to his forte. He is familiar with technologies such as Scala, Spark, Kafka, Cassandra, DynamoDB, Akka and many more. His hobbies include playing football and biking.
