Install and Set Up Cloudera Hadoop on Linux

This article describes how to install Hadoop on a single host machine.

Hadoop is a framework for processing large amounts of data in parallel.

Hadoop distributions are provided by several vendors, such as Hortonworks and Cloudera.

The steps below install Cloudera's distribution of Hadoop.

Cloudera Hadoop requires Java. If Java is not already installed, install JDK 1.6, update 8 or later.
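The Java requirement can be checked from the shell before going further. This is a sketch that assumes the old-style version format (for example 1.6.0_18) printed by `java -version`; java_version_ok and detect_java_version are hypothetical helper names, and treating 1.7 or a newer-format version as acceptable is an assumption, not something stated in this article.

```shell
#!/usr/bin/env sh
# Sketch: check whether a JDK meets the "1.6, update 8 or later" requirement.
# Assumes old-style version strings such as 1.6.0_18; a major version above 1
# (e.g. 11.0.2) or a 1.x minor version above 6 is treated as new enough.

# java_version_ok VERSION -> exit status 0 if VERSION is at least 1.6.0_08
java_version_ok() {
  ver="$1"
  first="${ver%%.*}"                          # "1" in 1.6.0_18
  [ "$first" -gt 1 ] 2>/dev/null && return 0
  second="$(echo "$ver" | cut -d. -f2)"       # "6" in 1.6.0_18
  [ "$second" -gt 6 ] 2>/dev/null && return 0
  [ "$second" -eq 6 ] 2>/dev/null || return 1
  update="${ver##*_}"                         # "18" in 1.6.0_18
  [ "$update" != "$ver" ] || return 1         # no "_NN" update suffix at all
  [ "$update" -ge 8 ] 2>/dev/null             # test compares as decimal, so "08" is 8
}

# detect_java_version -> prints the quoted version from `java -version`,
# which writes its output to stderr (hence 2>&1)
detect_java_version() {
  java -version 2>&1 | head -n 1 | sed -n 's/.*version "\([^"]*\)".*/\1/p'
}
```

For example, `java_version_ok "$(detect_java_version)" && echo "Java is new enough"` prints the message only when a sufficiently recent JDK is on the PATH.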

Download the Cloudera-testing.repo file from Cloudera, copy it to /etc/yum.repos.d/, and then refresh yum so that it picks up the new repository.
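A minimal sketch of this step, assuming the downloaded repo file is in the current directory and the commands run as root. install_repo_file and refresh_yum are hypothetical helper names; yum clean all and yum makecache are standard yum subcommands that discard and then rebuild the cached repository metadata.

```shell
#!/usr/bin/env sh
# Sketch: copy a downloaded .repo file into yum's repository directory
# and refresh yum's metadata. Helper names are hypothetical.

# install_repo_file SRC [DEST_DIR] -> copies SRC into DEST_DIR (default /etc/yum.repos.d)
install_repo_file() {
  src="$1"
  dest_dir="${2:-/etc/yum.repos.d}"
  case "$src" in
    *.repo) ;;
    *) echo "expected a .repo file, got: $src" >&2; return 1 ;;
  esac
  [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }
  cp "$src" "$dest_dir/"
}

# refresh_yum -> drop cached metadata and download it again (needs root)
refresh_yum() {
  yum clean all && yum makecache
}

# Example (as root):
#   install_repo_file ./Cloudera-testing.repo && refresh_yum
```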

Run the following commands as root to install Hadoop, Hive, and Pig:

yum install hadoop-0.20 -y

yum install hadoop-hive -y

yum install hadoop-pig -y

These commands install Hadoop to /usr/lib/hadoop, Hive to /usr/lib/hive, and Pig to /usr/lib/pig.
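The installation can also be scripted and then verified. This sketch assumes the package names and directories given above; install_packages and check_install_dirs are hypothetical helpers, and the optional prefix argument exists only so the check can be pointed at a different root directory.

```shell
#!/usr/bin/env sh
# Sketch: install the three packages, then confirm their directories exist.

# install_packages -> runs yum for each package; requires root
install_packages() {
  for pkg in hadoop-0.20 hadoop-hive hadoop-pig; do
    yum install -y "$pkg" || return 1   # stop at the first failed install
  done
}

# check_install_dirs [PREFIX] -> exit status 0 only if hadoop, hive and pig
# directories all exist under PREFIX (default /usr/lib)
check_install_dirs() {
  prefix="${1:-/usr/lib}"
  missing=0
  for d in hadoop hive pig; do
    if [ -d "$prefix/$d" ]; then
      echo "found   $prefix/$d"
    else
      echo "missing $prefix/$d" >&2
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}
```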

Set the following environment variables in your ~/.bashrc file:

$ vi ~/.bashrc

export HADOOP_HOME=/usr/lib/hadoop

export HIVE_HOME=/usr/lib/hive

export PIG_HOME=/usr/lib/pig

export PATH=$HADOOP_HOME/bin:$PATH:$PIG_HOME/bin:$HIVE_HOME/bin

Save the file and reload it in the current shell:

$ source ~/.bashrc
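If the setup is repeated, editing ~/.bashrc by hand can leave duplicate exports. A small sketch that appends the same four lines only when HADOOP_HOME is not already exported; add_hadoop_env is a hypothetical helper, and the check for an existing setting is a simple line match, not a full parse of the file.

```shell
#!/usr/bin/env sh
# Sketch: append the Hadoop/Hive/Pig exports to a shell rc file once.

# add_hadoop_env [RC_FILE] -> appends the exports unless RC_FILE already
# contains a line starting with "export HADOOP_HOME="
add_hadoop_env() {
  rc="${1:-$HOME/.bashrc}"
  if grep -q '^export HADOOP_HOME=' "$rc" 2>/dev/null; then
    echo "$rc already sets HADOOP_HOME; leaving it unchanged"
    return 0
  fi
  # Quoted 'EOF' keeps $PATH etc. literal, so they expand when the file is sourced.
  cat >> "$rc" <<'EOF'
export HADOOP_HOME=/usr/lib/hadoop
export HIVE_HOME=/usr/lib/hive
export PIG_HOME=/usr/lib/pig
export PATH=$HADOOP_HOME/bin:$PATH:$PIG_HOME/bin:$HIVE_HOME/bin
EOF
}

# Example:
#   add_hadoop_env && . ~/.bashrc
```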

Open $HADOOP_HOME/conf/hadoop-env.sh and set JAVA_HOME to the path of your JDK installation, for example:

export JAVA_HOME=/usr/java/jdk1.6.0_18
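The same change can be made without an editor. This is a sketch, not a Hadoop-provided tool: set_java_home is a hypothetical helper that rewrites an existing uncommented `export JAVA_HOME=` line, or appends one if the file contains only a commented-out example or nothing at all. It assumes the JDK path contains no `|` character, which is used as the sed delimiter.

```shell
#!/usr/bin/env sh
# Sketch: set JAVA_HOME in hadoop-env.sh non-interactively.

# set_java_home ENV_FILE JAVA_HOME_PATH
set_java_home() {
  env_file="$1"
  java_home="$2"
  if grep -q '^export JAVA_HOME=' "$env_file"; then
    # Rewrite the existing uncommented line; copying back through cat
    # keeps the original file's permissions.
    sed "s|^export JAVA_HOME=.*|export JAVA_HOME=$java_home|" "$env_file" > "$env_file.tmp" &&
      cat "$env_file.tmp" > "$env_file" &&
      rm -f "$env_file.tmp"
  else
    # No active setting found: append a new line.
    echo "export JAVA_HOME=$java_home" >> "$env_file"
  fi
}

# Example:
#   set_java_home "$HADOOP_HOME/conf/hadoop-env.sh" /usr/java/jdk1.6.0_18
```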

Open $HADOOP_HOME/conf/core-site.xml and set fs.default.name to the NameNode's host name (or localhost) and port, for example:

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
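On a fresh single-host install, the whole file can be generated. This sketch assumes core-site.xml needs no properties other than fs.default.name; write_core_site is a hypothetical helper that replaces the file, so it first copies any existing core-site.xml to core-site.xml.bak. The URI defaults to the localhost example given above.

```shell
#!/usr/bin/env sh
# Sketch: write a minimal core-site.xml containing only fs.default.name.

# write_core_site CONF_DIR [FS_URI]
write_core_site() {
  conf_dir="$1"
  fs_uri="${2:-hdfs://localhost:9000}"
  if [ -f "$conf_dir/core-site.xml" ]; then
    cp "$conf_dir/core-site.xml" "$conf_dir/core-site.xml.bak" || return 1
  fi
  # Unquoted EOF so that $fs_uri is expanded inside the here-document.
  cat > "$conf_dir/core-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>$fs_uri</value>
  </property>
</configuration>
EOF
}

# Example:
#   write_core_site "$HADOOP_HOME/conf"
```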