Data Storage Techniques: Object Storage versus HDFS

July 17, 2015

The sudden explosion in data is forcing people to rethink their strategies for storing data over the long term. Distributed file systems will have a place whatever the case may be, but the question is: which distributed system should one pick?


Is a file-based system such as HDFS the better choice, or will an object-based storage system such as the one offered by Amazon S3 be more beneficial? This is where agreement ends and the debate over which is better begins.

HDFS: Hadoop Distributed File System

Right now, HDFS (Hadoop Distributed File System) is the top contender in the market for building a data lake. HDFS offers scalability, reliability and cost-effectiveness for storing data before you have realized its real value, and with the growing ecosystem around Hadoop, it is no surprise that organizations today look to HDFS both for long-term storage and for their data processing needs.

Object Storage System

On the other side lies the modern object storage system, whose storage costs are measured in cents per gigabyte. Many large web-scale companies, including giants such as Amazon, Google and Facebook, use object-based storage systems, which give them an advantage in storing data efficiently.
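To put "cents per gigabyte" in perspective, here is a small back-of-the-envelope calculation in Python. The rate of 2 cents per gigabyte per month is a hypothetical figure chosen only for illustration (real prices vary by provider, region and storage tier), and `monthly_cost_dollars` is a made-up helper name.

```python
# Hypothetical price: the article only says costs are "cents per gigabyte";
# the 2-cent figure below is an illustrative assumption, not a quoted rate.
CENTS_PER_GB_MONTH = 2.0


def monthly_cost_dollars(size_gb: float, cents_per_gb_month: float = CENTS_PER_GB_MONTH) -> float:
    """Return the monthly storage bill in dollars for size_gb gigabytes."""
    return size_gb * cents_per_gb_month / 100.0


# Decimal units: 1 TB = 1,000 GB, 1 PB = 1,000,000 GB.
for label, size_gb in [("1 TB", 1_000), ("100 TB", 100_000), ("1 PB", 1_000_000)]:
    print(f"{label}: ${monthly_cost_dollars(size_gb):,.2f} per month")
```

At that assumed rate, a full petabyte comes to $20,000 a month, which is why per-gigabyte pricing matters at data-lake scale.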

Getting to the Point: The Real Benefits of Each

Individual choice and preference aside, the question remains: under what circumstances should you opt for HDFS, and when for an object-based storage system? Let us take a moment to look at the benefits each storage technology offers.

Object stores are gaining popularity among companies that want to make sure no data is ever lost.

These companies use Hadoop to analyze their data, but they do not see it as long-term storage. By design, Hadoop is suited to crunching large data sets, but it lacks the reliability, compliance features and other attributes that would make it the first choice for a long-term data store.

Object-based storage systems, on the other hand, offer better reliability than Hadoop for long-term storage, and the reasons are simple:

  1. They use an algorithm called erasure coding, which can spread data across any number of disks.

  2. Object stores also keep spare drives in their architecture to handle unexpected failures of other drives, and rely on erasure coding to rebuild the data automatically when a drive fails.

  3. With Hadoop's default settings, data is stored three times, delivering five 9s of reliability, a gold standard in enterprise computing. However, Hadoop's lead architect, Arun Murthy, recently pointed out that if you change the default settings to store everything twice in HDFS, you still get four 9s, which sounds really good.
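The article does not say which erasure code object stores use; production systems commonly rely on Reed-Solomon codes that survive several simultaneous drive failures. As a simpler stand-in, the Python sketch below uses single XOR parity (the idea behind RAID 5): data is split across k "disks" plus one parity disk, and any one lost block is rebuilt from the survivors. The helpers `xor_blocks`, `encode` and `rebuild` are hypothetical names for illustration only.

```python
from __future__ import annotations

from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal-size data blocks (zero-padded) plus one XOR parity block."""
    block_size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(block_size * k, b"\0")
    data_blocks = [padded[i * block_size:(i + 1) * block_size] for i in range(k)]
    return data_blocks + [xor_blocks(data_blocks)]


def rebuild(blocks: list[bytes | None]) -> list[bytes]:
    """Recompute the single missing block (data or parity) from the survivors."""
    missing = [i for i, b in enumerate(blocks) if b is None]
    if len(missing) > 1:
        raise ValueError("single XOR parity can only rebuild one lost block")
    blocks = list(blocks)  # work on a copy so the caller's list is untouched
    if missing:
        survivors = [b for b in blocks if b is not None]
        blocks[missing[0]] = xor_blocks(survivors)
    return blocks


# Usage: simulate a failed drive and rebuild it.
disks = encode(b"object stores spread data across disks", k=4)
original = list(disks)
disks[2] = None  # drive 2 fails
restored = rebuild(disks)
assert restored == original
print(b"".join(restored[:4]).rstrip(b"\0"))  # prints the original message
```

Because the XOR of all k + 1 blocks is zero, any single missing block equals the XOR of the others. Tolerating m failures at once requires a code with m parity blocks, such as Reed-Solomon, which is what the "spare drives plus automatic rebuild" design in point 2 depends on.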
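The replication point is easier to weigh with some arithmetic. The Python sketch below compares the raw disk needed for 3-way and 2-way replication with an erasure-coded layout of 10 data blocks plus 4 parity blocks (a hypothetical choice of parameters, not a setting from the article or from any specific product), and shows that dropping from five 9s to four 9s means a tenfold higher chance of failure. The 100 TB figure and the function names are illustrative assumptions.

```python
def replicated_raw_tb(logical_tb: float, copies: int) -> float:
    """Raw disk needed when every block is stored `copies` times."""
    return logical_tb * copies


def erasure_coded_raw_tb(logical_tb: float, k: int, m: int) -> float:
    """Raw disk needed when every k data blocks get m parity blocks."""
    return logical_tb * (k + m) / k


def failure_probability(nines: int) -> float:
    """'n nines' of reliability leaves a failure probability of 10**-n."""
    return 10.0 ** -nines


logical = 100.0  # TB of user data (illustrative assumption)
print("3 copies:", replicated_raw_tb(logical, 3), "TB raw")        # 3 copies: 300.0 TB raw
print("2 copies:", replicated_raw_tb(logical, 2), "TB raw")        # 2 copies: 200.0 TB raw
print("EC 10+4:", erasure_coded_raw_tb(logical, 10, 4), "TB raw")  # EC 10+4: 140.0 TB raw

# Four 9s versus five 9s: how many times more likely is a failure?
ratio = failure_probability(4) / failure_probability(5)
print(f"four 9s fail about {ratio:.0f}x as often as five 9s")
```

Going from three copies to two saves a third of the raw capacity at the cost of one fewer 9, while an erasure-coded layout can use less raw capacity than either, which is part of why object stores lean on it.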