Hadoop for DBAs (2/13): Building Hadoop for Oracle Linux 7

You may prefer to install Hadoop from a distribution such as the ones from Cloudera or Hortonworks… Even better, you could deploy Hadoop in the cloud or on an appliance like NetApp’s or Oracle’s. Those solutions come prebuilt for your operating system and help you deploy and manage the cluster. That’s not so obvious with the Apache Hadoop software library.

No pain, no gain! This series of articles is written so that you can taste the benefits of Hadoop as well as some of its challenges. It relies on the latest raw release from Apache, i.e. 2.4.1, which lets you test the latest features if needed. And because Apache Hadoop comes with 32-bit compiled native libraries, we’ll need to rebuild it from source. I’m keen on it, so I’ll be using Oracle Linux 7. It should not be too difficult to adapt the steps to RHEL 7, CentOS 7 or Fedora…

Package Installation

To build Hadoop from source, several packages, libraries and tools are required. The 3 commands below install more than what is necessary to perform that task:
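As a rough sketch, assuming a yum-based system and sudo rights, the three commands could look like the ones below. The function name and the package selection are assumptions derived from what the native build needs (a C/C++ toolchain, CMake, the zlib and OpenSSL headers), not a definitive list:

```shell
# Hypothetical helper: installs a build toolchain for Hadoop's native code.
# Package names are assumptions for Oracle Linux 7 and its yum repositories.
install_build_packages() {
    # Compilers, make, autotools and related tools
    sudo yum -y groupinstall "Development Tools" || return 1
    # CMake and the headers needed by the native libraries
    sudo yum -y install cmake zlib-devel openssl-devel || return 1
    # Tools used later to fetch and unpack sources
    sudo yum -y install git wget tar bzip2
}
```

Nothing is installed until you call install_build_packages yourself.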

Google Protocol Buffers Installation

An important prerequisite for compiling Apache Hadoop is Google Protocol Buffers 2.5. You might want to install it from the EPEL 7 repository (in beta for now). You can also install it from source:
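A source installation could be sketched as follows. The download URL, the variable names and the function name are assumptions; point PROTOBUF_URL at whatever location you trust for the 2.5.0 archive:

```shell
# Assumed archive location and version; adjust if you use another source.
PROTOBUF_VERSION=2.5.0
PROTOBUF_URL="https://github.com/protocolbuffers/protobuf/releases/download/v${PROTOBUF_VERSION}/protobuf-${PROTOBUF_VERSION}.tar.gz"

# Hypothetical helper: downloads, builds and installs protobuf.
# With no --prefix given to configure, everything lands under /usr/local.
build_protobuf() {
    wget "$PROTOBUF_URL" || return 1
    tar -xzf "protobuf-${PROTOBUF_VERSION}.tar.gz" || return 1
    (
        cd "protobuf-${PROTOBUF_VERSION}" || exit 1
        ./configure || exit 1
        make || exit 1
        sudo make install
    )
}
```

The subshell keeps the cd from changing your current directory.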

Note:
When built from source, protobuf installs protoc in /usr/local/bin by default. Make sure that directory is included in the PATH variable, or change the installation prefix, before you build Hadoop.
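One way to check this before starting the build is sketched below. The helper name check_protoc_version is hypothetical; it only compares the output of protoc --version with the string a 2.5.0 installation prints:

```shell
# Put /usr/local/bin first so the freshly built protoc is the one found.
export PATH=/usr/local/bin:$PATH

# Hypothetical helper: succeeds only if the protoc on the PATH reports 2.5.0.
check_protoc_version() {
    [ "$(protoc --version 2>/dev/null)" = "libprotoc 2.5.0" ]
}
```

For instance, check_protoc_version || echo "protoc 2.5.0 not found" warns you about a missing or different version.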

Java SE 8 JDK Installation

Most of Hadoop is written in Java, so you’ll need to install a Java SE JDK too. You can use the Java SE 8 RPM from the Oracle website or rely on OpenJDK [2]:
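If you go the OpenJDK way, the installation might look like the sketch below. The package name is an assumption and depends on what your yum repositories provide, so check it with yum search openjdk first:

```shell
# Hypothetical helper: installs an OpenJDK development kit with yum.
# The package name is an assumption; the -devel package provides javac.
install_openjdk() {
    sudo yum -y install java-1.8.0-openjdk-devel || return 1
    # Confirm that a compiler, not only a runtime, is available
    javac -version
}
```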

Add the 2 lines below to ~/.bashrc or a profile file to access Java during the build:
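As an illustration, assuming the Oracle RPM and its /usr/java/latest symbolic link, the two lines could be:

```shell
# Assumed JDK location: the Oracle RPM maintains the /usr/java/latest link.
# With OpenJDK, point JAVA_HOME at the JDK directory under /usr/lib/jvm.
export JAVA_HOME=/usr/java/latest
export PATH=$JAVA_HOME/bin:$PATH
```

Prepending to PATH makes this JDK win over any other java already installed on the system.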

Maven Installation

Maven is used to build Hadoop. Download and install the Maven distribution from one of the Apache mirror sites:
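A possible sketch, assuming Maven 3.2.2, the Apache archive site as the download location and /opt as the target directory (all three are assumptions, as are the variable and function names):

```shell
# Assumed Maven version and download location (Apache archive site).
MAVEN_VERSION=3.2.2
MAVEN_URL="https://archive.apache.org/dist/maven/maven-3/${MAVEN_VERSION}/binaries/apache-maven-${MAVEN_VERSION}-bin.tar.gz"

# Hypothetical helper: unpacks Maven into /opt/apache-maven-<version>.
install_maven() {
    wget "$MAVEN_URL" || return 1
    sudo tar -xzf "apache-maven-${MAVEN_VERSION}-bin.tar.gz" -C /opt
}
```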

Add the lines below to ~/.bashrc or a profile file to access Maven during the build:
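For example, if Maven was unpacked in /opt/apache-maven-3.2.2 (an assumed location, change it to match yours), the lines could be:

```shell
# Assumed Maven installation directory
export M2_HOME=/opt/apache-maven-3.2.2
export PATH=$M2_HOME/bin:$PATH
```

Running mvn -version in a new shell should then report both the Maven and the Java versions in use.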

Download Hadoop Source

Like all Apache projects, Hadoop uses Subversion as its source code management system. The good news is that Apache also provides a Git repository. Download the Hadoop source and check out the 2.4.1 version:
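This step could be sketched as below. Both the repository URL and the tag name are assumptions; list the available tags with git tag -l '*2.4.1*' if the checkout fails:

```shell
# Assumed repository URL and release tag; verify them before relying on them.
HADOOP_REPO=https://github.com/apache/hadoop.git
HADOOP_TAG=release-2.4.1

# Hypothetical helper: clones the source into ./hadoop and checks out the tag.
fetch_hadoop_source() {
    git clone "$HADOOP_REPO" hadoop || return 1
    # A subshell is used instead of `git -C`, which older git releases lack
    ( cd hadoop && git checkout "$HADOOP_TAG" )
}
```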

Build Hadoop

With the prerequisites installed, you should be good to run the build. The command below generates the distribution, including the 64-bit native C shared libraries. The result is a compressed archive in the hadoop-dist/target directory:
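A build along these lines, using the profiles and options described in Hadoop's BUILDING.txt plus the standard Maven property that skips Javadoc, might be written as follows. The function wrapper is hypothetical and the exact set of flags is an assumption:

```shell
# Hypothetical helper: run it from the top of the Hadoop source tree.
build_hadoop() {
    # -Pdist,native             : distribution profile with the native C libraries
    # -DskipTests               : do not run the unit tests
    # -Dtar                     : also produce a .tar.gz archive
    # -Dmaven.javadoc.skip=true : skip Javadoc generation
    mvn clean package -Pdist,native -DskipTests -Dtar -Dmaven.javadoc.skip=true
}
```

For the 2.4.1 release, the archive should show up as hadoop-dist/target/hadoop-2.4.1.tar.gz.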

Note:
The Hadoop Javadoc is not well formed: it contains some unescaped punctuation characters, which the Java 8 javadoc tool rejects. That is why you must skip Javadoc generation during the build.
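Once the build completes, you may want to confirm that the native libraries really are 64-bit. The sketch below relies on the file utility; the helper names and the library path used in the example are assumptions:

```shell
# Succeeds if a `file` description string denotes a 64-bit ELF object.
describes_64bit_elf() {
    case "$1" in
        *"ELF 64-bit"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: checks one library; -b drops the file name from the
# output and -L follows symbolic links.
check_native_lib() {
    describes_64bit_elf "$(file -b -L "$1")"
}
```

For example, check_native_lib hadoop-dist/target/hadoop-2.4.1/lib/native/libhadoop.so && echo "64-bit" should print 64-bit after a successful native build.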

Here we are, ready to install a Hadoop cluster on Oracle Linux 7…

References:
To learn more about building Hadoop, read:
[1] How to Contribute to Hadoop Common.
[2] Hadoop Wiki Java Versions Page

Gregory Guillou

Senior Technical Architect at Easyteam
