rhadoop

Before going through the installation steps, I'd like to give a short introduction to RHadoop.

What is RHadoop?

RHadoop is an open-source project that combines R and Hadoop.

It contains four packages that connect R to different Hadoop projects and one package that adapts familiar functions to the MapReduce framework.

  • rhdfs: Connects R to Hadoop's HDFS.
  • rhbase: Connects R to Hadoop's HBase.
  • rmr2: Runs Hadoop MapReduce jobs from R.
  • ravro: Reads and writes Hadoop's Avro files from R.
  • plyrmr: Provides a familiar plyr-like interface on top of MapReduce.

You can refer to the official RHadoop GitHub repository: https://github.com/RevolutionAnalytics/RHadoop

Requirements

First of all, I have installed HDP 2.5 on CentOS 7 as my demo Hadoop cluster.

I suggest using Apache Ambari to deploy your own Hadoop cluster.

You can refer to my article for more instructions on using Apache Ambari to deploy a Hadoop cluster.

http://ammozon.co.in/gif/HA0_MyEnvironmentCentos6.8introduction.gif : Series of Hadoop installation Part 1 to Part 9 using Ambari.

Coming back to RHadoop, I've documented the steps and published them at the GitHub site below.

https://github.com/mkjmkumar/RHADOOP_Installation_on_HDP2.5

What does this script actually do?

  1. Install the EPEL repository.
  2. Install R from the EPEL repository.
  3. Install the packages that RHadoop requires.
  4. Set the environment variables for RHadoop.
  5. Install RHadoop.
  6. Run the rhdfs smoke test.
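
The six steps above can be sketched as a dry-run plan. This is only an illustration, not the script from the GitHub repository: the HDP paths, the list of R dependencies, the tarball names, and the helper names (build_plan, print_plan, rscript) are all assumptions made for this example. HADOOP_CMD and HADOOP_STREAMING are the environment variables that rhdfs and rmr2 look for. The code only prints commands and does not run anything.

```python
import shlex

# Assumed HDP-style locations; verify both paths on your own cluster.
HADOOP_CMD = "/usr/hdp/current/hadoop-client/bin/hadoop"
HADOOP_STREAMING = (
    "/usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming.jar"
)

# R packages commonly needed by rhdfs and rmr2 (an assumption for this sketch;
# check the RHadoop wiki for the list matching your versions).
R_DEPENDENCIES = [
    "rJava", "Rcpp", "RJSONIO", "bitops", "digest",
    "functional", "reshape2", "stringr", "plyr", "caTools",
]


def rscript(expr):
    """Return a shell command that evaluates one R expression."""
    return "Rscript -e " + shlex.quote(expr)


def build_plan(hadoop_cmd=HADOOP_CMD,
               streaming_jar=HADOOP_STREAMING,
               tarballs=("rhdfs.tar.gz", "rmr2.tar.gz")):
    """Return the six installation steps as (title, [commands]) pairs.

    The tarball names are placeholders for the package archives
    downloaded from the RHadoop GitHub repository.
    """
    deps = ", ".join('"{}"'.format(p) for p in R_DEPENDENCIES)
    install_deps = (
        "install.packages(c({}), "
        'repos="https://cloud.r-project.org")'.format(deps)
    )
    return [
        ("Install the EPEL repository", ["yum install -y epel-release"]),
        ("Install R from the EPEL repository", ["yum install -y R"]),
        ("Install the packages that RHadoop requires",
         [rscript(install_deps)]),
        ("Set the environment variables for RHadoop",
         ["export HADOOP_CMD=" + shlex.quote(hadoop_cmd),
          "export HADOOP_STREAMING=" + shlex.quote(streaming_jar)]),
        ("Install RHadoop",
         ["R CMD INSTALL " + shlex.quote(t) for t in tarballs]),
        ("Run the rhdfs smoke test",
         [rscript('library(rhdfs); hdfs.init(); hdfs.ls("/")')]),
    ]


def print_plan(plan):
    """Print each step with its commands, numbered like the list above."""
    for number, (title, commands) in enumerate(plan, start=1):
        print("{}. {}".format(number, title))
        for command in commands:
            print("     " + command)


if __name__ == "__main__":
    print_plan(build_plan())
```

Printing the plan first makes it easy to review each command against your own cluster layout before running anything as root.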

You can find a GIF of the step-by-step installation on my cluster at the location below:

http://ammozon.co.in/gif/RHadoop_Installation.gif

After a few hurdles, I was able to install RHadoop successfully, and the complete capture of the installation is in the GIF above.

Next, I'll create some use cases on RHadoop. Keep in touch!

 
