Following a survey made for a well estabilished group at LinkedIn, Big Data professionals impute the causes of failure of most Big Data projects to a lack of knowledge on how to set up, deploy and manage a Hadoop cluster. Although I consider this vision a bit too much simplistic, while I endorse a vision where the change management matters, the bootstrap difficulties for the IT departments are indeed huge and steep. Major Hadoop distribution companies, like Cloudera and IBM, are aware of this flaw and provide sophisticated installers and managers, while they offer quick start VMs for developers at same time. First of all, this cannot be enough for those Enterprises which need to put their production environments under control and it’s even not enough for those developers which need to use something more reliable than a single-node virtual cluster. Finally, it’s important to understand how Hadoop works under the hood, but of course without wasting time on configuration details.
Vagrant definitively comes to help. Using a Cloudera Vagrant box, you’ll be able to build a complete scalable cluster that (with some minor tuning) could even be used in production environments. In this tutorial we’re going to build a cluster based on Cloudera Hadoop Distributions in minutes and without user interventions.