When Facebook Concluded Largest Hadoop Data Migration Ever

11/04/2013 12:28 pm

Since the rise of Facebook in particular, the era of storing massive amounts of data on servers has arrived. The volume of content shared on the internet grows enormously with every passing day, and managing it is becoming a problem for organizations across the globe.

Facebook recently undertook the largest data migration ever. The Facebook infrastructure team moved dozens of petabytes of data to a new data center – not an easy job, but one that was well executed.

Over the past couple of years, the amount of data stored and processed by Facebook servers has grown exponentially, increasing the need for warehouse infrastructure and superior IT architecture.

Facebook stores its data on HDFS — the Hadoop Distributed File System. In 2011, Facebook had almost 60 petabytes of data on Hadoop, which posed serious power and storage capacity issues. Geeks at Facebook were then compelled to move this data to a larger data center.
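For a sense of what that storage layer looks like to an engineer, here is a minimal Java sketch that asks HDFS how much data sits under a directory, using the standard Hadoop FileSystem API. The namenode address and warehouse path are placeholders for illustration, not Facebook's actual layout.

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Ask HDFS how much data sits under a directory, via the Hadoop FileSystem API.
public class WarehouseUsage {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical namenode address; in practice this comes from core-site.xml.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode.example.com:8020"), conf);
        Path warehouse = new Path("/user/hive/warehouse"); // hypothetical warehouse root
        long bytes = fs.getContentSummary(warehouse).getLength();
        System.out.printf("Data under %s: %.2f TB%n", warehouse, bytes / 1e12);
        fs.close();
    }
}
```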

Data Move

The amount of content exchanged on Facebook daily has created the need for a large team of data infrastructure professionals, who analyze all of this data and serve it back in the quickest and most convenient way. Handling data at this scale requires large data centers.

So, considering the amount of data that had piled up, Facebook's infrastructure team recently concluded the largest data migration ever, moving petabytes of data to a new, larger data center.

For a move on this scale, Facebook set up a replication system to mirror changes from the smaller cluster to the larger one, which allowed all of the files to be transferred.

First, the infrastructure team used the replication system to bulk-copy data from the source cluster to the destination cluster. Then the smaller files, Hive objects and user directories were copied over to the new cluster.
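Facebook has not published the internals of its replication system, but as a rough illustration of the bulk phase, the sketch below copies a directory tree from one HDFS cluster to another using the standard Hadoop FileSystem and FileUtil APIs. The cluster addresses and the warehouse path are hypothetical, and a real migration at this scale would use a parallel tool such as DistCp rather than a single-process copy.

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

// Bulk phase of a cluster-to-cluster move: copy every top-level entry under the
// data root from the source cluster to the destination cluster.
public class BulkCopy {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical namenode addresses for the source and destination clusters.
        FileSystem src = FileSystem.get(URI.create("hdfs://old-cluster:8020"), conf);
        FileSystem dst = FileSystem.get(URI.create("hdfs://new-cluster:8020"), conf);

        Path root = new Path("/user/hive/warehouse"); // hypothetical data root
        for (FileStatus entry : src.listStatus(root)) {
            // FileUtil.copy recurses into directories, so tables, partitions and
            // user directories are each copied as a unit.
            Path dstPath = new Path(entry.getPath().toUri().getPath());
            FileUtil.copy(src, entry.getPath(), dst, dstPath, false, conf);
        }
        src.close();
        dst.close();
    }
}
```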

The process was complex, but because the replication approach minimizes downtime (the time it takes to bring the old and new clusters to an identical state), the team was able to transfer data on this scale without a glitch.
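The "mirror changes" idea can be pictured as repeated incremental sync passes: each pass re-copies only what is missing or newer on the source, so the two clusters steadily converge and the final cutover needs only a short freeze. The sketch below illustrates one such pass under the same hypothetical cluster addresses and paths as above; it is not Facebook's actual replication code.

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

// One incremental sync pass: copy files that are missing on the destination
// or newer on the source. Repeating such passes narrows the gap between the
// clusters until a short final pass can bring them to an identical state.
public class SyncPass {

    static void sync(FileSystem src, FileSystem dst, Path dir, Configuration conf) throws Exception {
        for (FileStatus status : src.listStatus(dir)) {
            Path srcPath = status.getPath();
            // Use the same path on the destination cluster.
            Path dstPath = new Path(srcPath.toUri().getPath());
            if (status.isDirectory()) {
                sync(src, dst, srcPath, conf); // recurse into subdirectories
            } else if (!dst.exists(dstPath)
                    || dst.getFileStatus(dstPath).getModificationTime() < status.getModificationTime()) {
                // Overwrite the stale or missing copy on the destination.
                FileUtil.copy(src, srcPath, dst, dstPath, false, true, conf);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical namenode addresses for the old and new clusters.
        FileSystem src = FileSystem.get(URI.create("hdfs://old-cluster:8020"), conf);
        FileSystem dst = FileSystem.get(URI.create("hdfs://new-cluster:8020"), conf);
        sync(src, dst, new Path("/user/hive/warehouse"), conf); // hypothetical data root
        src.close();
        dst.close();
    }
}
```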

Learning curve

According to Facebook, the infrastructure team had used a replication system like this one before. Back then, however, the clusters were smaller and could not keep up with the rate at which data was being created, which meant they were no longer enough.

The team worked day in and day out on the transfer, and with the replication approach the migration became a seamless process.

Now that this massive amount of data has been moved to a bigger cluster, Facebook can keep delivering relevant data to all of its users.
