High-Speed Backup and Efficient Recovery Solution for a Big Data Enterprise


Posted on March 13, 2018



Client introduction

Our client is one of the world’s leading biotechnology companies. It is a value-based company, deeply rooted in science and innovation, that transforms new ideas and discoveries into medicines for patients with serious illnesses.

The Opportunity

The biotech company’s Enterprise Data Lake consisted of multiple Hadoop clusters hosted on-premise in its own network. These clusters contained over 300 terabytes of processed and unprocessed datasets critical to business operations.

The company needed a clean and efficient disaster recovery solution which would do the following:


 Back up datasets that are updated on a daily or weekly basis to AWS S3


 Back up cluster configurations regularly to AWS S3


 Back up all cluster metadata and logs stored in Oracle tables to AWS S3


 Restore the on-premise cluster from the backed up configurations


 Restore all data backed up in AWS S3


 Restore all metadata and logs backed up in AWS S3



Solution

MLens from Knowledge Lens was the chosen solution, as it satisfied all of the backup and recovery requirements.

 Back up datasets that are updated on a daily or weekly basis to S3

MLens Data Migration HDFS backup synchronized HDFS directories to AWS S3 buckets and ran scheduled incremental backups, which detected changes in the datasets and transferred only the updated files.
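The case study does not describe how MLens detects changes internally. As a minimal sketch of the general idea behind an incremental backup, the following Python compares a current file listing against a manifest recorded at the last backup and selects only new or modified files. The manifest format, the function name `select_changed_files`, and the sample paths are all hypothetical, chosen for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical manifest entry: (size in bytes, last-modified timestamp).
FileStat = Tuple[int, float]


def select_changed_files(current: Dict[str, FileStat],
                         last_backup: Dict[str, FileStat]) -> List[str]:
    """Return paths that are new, or whose size/mtime differ from the last backup."""
    changed = []
    for path, stat in current.items():
        # A missing entry (new file) or a differing entry (modified file)
        # both mean the file must be transferred again.
        if last_backup.get(path) != stat:
            changed.append(path)
    return sorted(changed)


# Illustrative data only; these paths do not come from the case study.
last_backup = {
    "/data/sales/part-0000": (1024, 1520000000.0),
    "/data/sales/part-0001": (2048, 1520000000.0),
}
current = {
    "/data/sales/part-0000": (1024, 1520000000.0),  # unchanged
    "/data/sales/part-0001": (4096, 1520900000.0),  # modified
    "/data/sales/part-0002": (512, 1520900000.0),   # new
}

to_transfer = select_changed_files(current, last_backup)
print(to_transfer)
```

Only the modified and the new file are selected here, so an unchanged dataset costs nothing to back up again. A real tool would also need to handle deletions and partially written files, which this sketch ignores.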

 Back up cluster configurations regularly to AWS S3

MLens Platform Migration Backup connected to Cloudera Manager and backed up the latest configurations at the defined frequency.

 Back up all cluster metadata and logs stored in Oracle tables to AWS S3

The MLens Data Migration RDBMS backup feature backed up tables from the Oracle database using the supplied connection details, table names, and specific queries.

 Restore the on-premise cluster from the backed up configurations

MLens Platform Migration Restore created a new Hadoop cluster from the configurations backed up in AWS S3, mapping new hosts to the old ones to support a disaster recovery scenario.
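To make the host-mapping step concrete, here is a small sketch of how role assignments from a backed-up configuration might be rewritten onto replacement hosts. It is not MLens code: the `remap_roles` function, the role names, and the host names are hypothetical, and the real backup format is not described in this case study.

```python
from typing import Dict, List


def remap_roles(roles: Dict[str, List[str]],
                host_map: Dict[str, str]) -> Dict[str, List[str]]:
    """Rewrite each role's host list using an old-host -> new-host mapping."""
    old_hosts = {host for hosts in roles.values() for host in hosts}
    missing = sorted(old_hosts - set(host_map))
    if missing:
        # Refuse to build a partial cluster layout: every old host
        # needs a replacement before the restore can proceed.
        raise KeyError(f"no replacement host for: {missing}")
    return {role: [host_map[h] for h in hosts] for role, hosts in roles.items()}


# Hypothetical backed-up role assignments and host mapping.
backed_up_roles = {
    "NAMENODE": ["old-master-1"],
    "DATANODE": ["old-worker-1", "old-worker-2"],
}
host_map = {
    "old-master-1": "dr-master-1",
    "old-worker-1": "dr-worker-1",
    "old-worker-2": "dr-worker-2",
}

restored_roles = remap_roles(backed_up_roles, host_map)
print(restored_roles)
```

Failing fast on an unmapped host is a deliberate choice in this sketch: during disaster recovery it is usually safer to stop than to bring up a cluster with roles silently missing.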

 Fast recovery of all data backed up in AWS S3

The MLens Data Migration Restore feature recovered the directories that had been backed up by the MLens Data Migration job. Its distributed framework kept recovery fast and business impact minimal.

 Restore all metadata and logs backed up in AWS S3

MLens Data Migration Restore restored the RDBMS tables and records that had been scheduled for backup by the MLens Data Migration RDBMS job.


Additional benefits

 Zero downtime, achieved through MLens’s unique support for running queries directly on live backups.


 MLens delivered high-speed parallel data processing without landing zones.


 It supported compression and format conversion during backup.



Key takeaway

While many data migration, backup, and recovery tools are available, no other solution provides such a wide range of disaster recovery capabilities, covering not only data migration but also recovery of both the cluster and its data.
