Before going through the installation steps, I’d like to give a short introduction to RHadoop.
What is RHadoop?
RHadoop is an open source project that combines R with Hadoop.
It consists of four packages that connect R to different Hadoop components, plus one package that adapts familiar data-manipulation functions to the MapReduce framework.
- rhdfs: Connects R to Hadoop’s HDFS, so files in HDFS can be browsed, read and written from R.
- rhbase: Connects R to Hadoop’s HBase.
- rmr2: Lets you write and run Hadoop MapReduce jobs in R.
- ravro: Lets R read and write Avro files.
- plyrmr: Provides a familiar plyr-like interface on top of MapReduce.
For more details, see the official RHadoop GitHub repository: https://github.com/RevolutionAnalytics/RHadoop
Requirements
First of all, I have installed HDP 2.5 on CentOS 7 as my demo Hadoop cluster.
I suggest using Apache Ambari to deploy your own Hadoop cluster.
For more detailed instructions on deploying a Hadoop cluster with Apache Ambari, see my article series:
http://ammozon.co.in/gif/HA0_MyEnvironmentCentos6.8introduction.gif : Hadoop installation series using Ambari, Part 1 to Part 9.
Coming back to RHadoop, I’ve documented the installation steps as a script, which is available in the GitHub repository below.
https://github.com/mkjmkumar/RHADOOP_Installation_on_HDP2.5
What does this script actually do?
- Installs the EPEL repository.
- Installs R from the EPEL repository.
- Installs the packages required by RHadoop.
- Sets the environment variables for RHadoop.
- Installs RHadoop.
- Runs an rhdfs smoke test.
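To give a rough idea of what these steps look like, here is a minimal dry-run sketch in shell. It is not the script from the repository above. The R package list, the HDP paths used for HADOOP_CMD and HADOOP_STREAMING, and the tarball file names are my assumptions, so please check them against your own cluster. With DRY_RUN left at its default of 1, it only prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Hedged sketch of the installation steps; NOT the actual script from the repository.
# DRY_RUN=1 (the default here) prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Environment variables read by rhdfs and rmr2.
# The paths are assumed HDP client locations; verify them on your cluster.
export HADOOP_CMD="${HADOOP_CMD:-/usr/hdp/current/hadoop-client/bin/hadoop}"
export HADOOP_STREAMING="${HADOOP_STREAMING:-/usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming.jar}"

install_rhadoop() {
  # Install the EPEL repository, then R from EPEL.
  run yum install -y epel-release
  run yum install -y R
  # R packages that rhdfs and rmr2 depend on (assumed list).
  run Rscript -e 'install.packages(c("rJava","Rcpp","RJSONIO","bitops","digest","functional","reshape2","stringr","plyr","caTools"), repos="https://cloud.r-project.org")'
  # RHadoop packages, assumed to be downloaded already; file names are placeholders.
  run R CMD INSTALL "${RHDFS_TARBALL:-rhdfs.tar.gz}"
  run R CMD INSTALL "${RMR2_TARBALL:-rmr2.tar.gz}"
  # Smoke test: initialise rhdfs and list the HDFS root directory.
  run Rscript -e 'library(rhdfs); hdfs.init(); print(hdfs.ls("/"))'
}

install_rhadoop
```

Running it with DRY_RUN=0 as root would execute the commands for real, so review the printed output first.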
A GIF showing the step-by-step installation on my cluster is available at the location below:
http://ammozon.co.in/gif/RHadoop_Installation.gif
After a few hurdles, I was able to install RHadoop successfully, and the complete capture of the installation is in the GIF above.
Next, I’ll create some use cases for RHadoop. Stay tuned!