Fading Coder

One Final Commit for the Last Sprint

Hadoop Development Environment Setup Guide

Tech | May 19

Overview

This guide sets up a Hadoop development environment in two parts: configuring the Java JDK, and installing Hadoop 3.1 in pseudo-distributed mode, where every Hadoop daemon runs as a separate process on a single machine. Complete both sections in order, because the Hadoop configuration in Section 2 points to the JDK installed in Section 1.

Section 1: Java JDK Configuration

Hadoop 3.1 runs on Java 8, so the first step is to install and configure JDK 8 (here, 8u171).

Implementation Steps

Follow these commands to install and configure Java:

# Create the application directory; the JDK archive is assumed to be in /opt
mkdir -p /app
cd /opt

# Extract Java archive
tar -zxvf jdk-8u171-linux-x64.tar.gz
mv jdk1.8.0_171/ /app

# Configure system environment variables
vim /etc/profile

Add the following configuration to the profile file:

#----------------------------------------------------------
JAVA_HOME=/app/jdk1.8.0_171
CLASSPATH=.:$JAVA_HOME/lib/tools.jar
PATH=$JAVA_HOME/bin:$PATH

export JAVA_HOME CLASSPATH PATH
#----------------------------------------------------------

Apply the changes and verify the installation:

source /etc/profile
java -version
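The profile above puts $JAVA_HOME/bin first on PATH, but `java -version` only reports the `java` it finds first. That could still be a preinstalled JRE if the profile was not sourced. The sketch below is a hypothetical helper, `check_jdk8_home`, which is not part of Java or Hadoop. It checks the directory itself under one assumption: a JDK 8 install has an executable bin/java and also lib/tools.jar, which a standalone JRE lacks and which the CLASSPATH line above refers to.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Java or Hadoop): check that a directory
# looks like a JDK 8 installation. Defaults to $JAVA_HOME when no argument.
check_jdk8_home() {
  local home="${1:-${JAVA_HOME:-}}"
  if [ -z "$home" ] || [ ! -d "$home" ]; then
    echo "not a directory: '$home'" >&2
    return 1
  fi
  # The java launcher must exist and be executable.
  if [ ! -x "$home/bin/java" ]; then
    echo "missing executable: $home/bin/java" >&2
    return 1
  fi
  # JDK 8 ships lib/tools.jar; a standalone JRE does not.
  if [ ! -f "$home/lib/tools.jar" ]; then
    echo "missing $home/lib/tools.jar (JRE instead of JDK?)" >&2
    return 1
  fi
  echo "OK: $home"
}
```

For this guide, `check_jdk8_home /app/jdk1.8.0_171` should print `OK: /app/jdk1.8.0_171`, and `command -v java` should resolve to /app/jdk1.8.0_171/bin/java.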

Section 2: Hadoop Installation and Pseudo-Distributed Cluster Setup

This section covers installing Hadoop and configuring it for pseudo-distributed operation.

Initial Setup

Begin by extracting and organizing the Hadoop installation:

cd /opt
tar -zxvf hadoop-3.1.0.tar.gz -C /app
cd /app
mv hadoop-3.1.0 hadoop3.1

SSH Key Generation

Hadoop's start scripts (start-dfs.sh, start-yarn.sh) use SSH to launch daemons, even when every daemon runs on localhost, so configure passwordless SSH access:

mkdir -p ~/.ssh && chmod 700 ~/.ssh
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
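Rerunning the commands above appends a duplicate line to authorized_keys, and ssh-keygen asks before overwriting an existing key. The sketch below is a hypothetical helper, `setup_ssh_keys`, that can be run repeatedly. It assumes RSA keys at the default file names. It also sets the SSH directory to mode 700, because sshd's default StrictModes check refuses keys when ~/.ssh or authorized_keys is writable by other users.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rerunnable passwordless-SSH setup for the current user.
# Pass a directory to try it somewhere other than ~/.ssh.
setup_ssh_keys() {
  local ssh_dir="${1:-$HOME/.ssh}" pubkey
  mkdir -p "$ssh_dir" || return 1
  chmod 700 "$ssh_dir"
  # Generate an RSA key pair with an empty passphrase only if id_rsa.pub does
  # not exist yet; -f names the key file so ssh-keygen does not prompt for it.
  if [ ! -f "$ssh_dir/id_rsa.pub" ]; then
    ssh-keygen -t rsa -P '' -f "$ssh_dir/id_rsa" -q || return 1
  fi
  pubkey=$(cat "$ssh_dir/id_rsa.pub") || return 1
  touch "$ssh_dir/authorized_keys"
  # Append the public key only if that exact line is not present yet.
  if ! grep -qxF -- "$pubkey" "$ssh_dir/authorized_keys"; then
    printf '%s\n' "$pubkey" >> "$ssh_dir/authorized_keys"
  fi
  chmod 600 "$ssh_dir/authorized_keys"
}
```

After running `setup_ssh_keys`, `ssh localhost true` should return without asking for a password.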

SSH Configuration

Update SSH server configuration to support key-based authentication:

vim /etc/ssh/sshd_config

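Modify the following settings. Older OpenSSH releases do not treat a # after a value as the start of a comment, so keep each comment on its own line:

#----------------------------------------------------------
# Enable public/private key pair authentication
PubkeyAuthentication yes
# Public key file path (%h expands to the user's home directory)
AuthorizedKeysFile %h/.ssh/authorized_keys
#----------------------------------------------------------

RSAAuthentication applies only to the legacy SSH-1 protocol. Current OpenSSH releases ignore it with a deprecation warning, so it is not needed here.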
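sshd reads its configuration only at startup, so restart it after saving. The service is usually named sshd on RHEL/CentOS and ssh on Debian/Ubuntu, for example `systemctl restart sshd`. The sshd_config(5) manual also notes that, for most keywords, sshd uses the first value it obtains. An uncommented line earlier in the file can therefore override a setting appended at the end. `sshd -T`, run as root, prints the effective configuration. The sketch below is a hypothetical offline approximation, `sshd_first_value`. It ignores Include directives, Match blocks and the `Keyword=value` form.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value sshd would take for KEYWORD from a
# config file, using the "first value wins" rule. Rough approximation only:
# Include directives, Match blocks and "Keyword=value" syntax are not handled.
sshd_first_value() {
  local keyword="$1" config="${2:-/etc/ssh/sshd_config}"
  awk -v kw="$keyword" '
    # Skip full-line comments and blank lines.
    /^[ \t]*#/ || /^[ \t]*$/ { next }
    # Keywords are case-insensitive; print the arguments of the first match.
    tolower($1) == tolower(kw) {
      $1 = ""
      sub(/^[ \t]+/, "")
      print
      exit
    }
  ' "$config"
}
```

If `sshd_first_value PubkeyAuthentication` prints `no`, an earlier line in the file is overriding the edit.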

Hadoop Environment Configuration

Navigate to the Hadoop configuration directory:

cd /app/hadoop3.1/etc/hadoop/

Java Environment Setup

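Set JAVA_HOME explicitly for Hadoop. The start scripts launch daemons over SSH in a non-interactive shell that may not read /etc/profile, so the value exported there is not guaranteed to reach them: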

vim hadoop-env.sh

Add the Java home configuration:

#----------------------------------------------------------
export JAVA_HOME=/app/jdk1.8.0_171
#----------------------------------------------------------

Similarly update the YARN environment:

vim yarn-env.sh

Add:

export JAVA_HOME=/app/jdk1.8.0_171
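Both edits can also be made without an editor. The sketch below is a hypothetical helper, `set_java_home_in`, which is not a Hadoop tool. It appends the export line only when a file has no uncommented `export JAVA_HOME=` yet, so it is safe to rerun.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Hadoop tool): make each given env script export
# JAVA_HOME, without duplicating the line when run again.
# Usage: set_java_home_in JDK_DIR FILE...
set_java_home_in() {
  local jdk="$1" file
  shift
  for file in "$@"; do
    if [ ! -f "$file" ]; then
      echo "missing: $file" >&2
      continue
    fi
    # Only an uncommented export counts (whatever its value); a commented-out
    # template line such as "# export JAVA_HOME=" is ignored.
    if grep -Eq '^[[:space:]]*export[[:space:]]+JAVA_HOME=' "$file"; then
      echo "unchanged: $file"
    else
      printf 'export JAVA_HOME=%s\n' "$jdk" >> "$file"
      echo "updated: $file"
    fi
  done
}
```

Run it from /app/hadoop3.1/etc/hadoop as `set_java_home_in /app/jdk1.8.0_171 hadoop-env.sh yarn-env.sh`.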

Core Configuration Files

Update the core-site.xml file:

vim core-site.xml

Insert the following configuration:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
    <description>Default filesystem URI, hdfs://namenode_host:port</description>
  </property>

  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/hadoop/tmp</value>
    <description>Base directory for Hadoop's local temporary and working files</description>
  </property>
</configuration>
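Once Hadoop is on the PATH, `hdfs getconf -confKey fs.defaultFS` prints the value Hadoop actually resolves. Before that point, a rough check is possible with the hypothetical helper `conf_value` below. It assumes the layout used in this guide, with `<name>` and `<value>` each on its own line. It is a line-based sketch, not an XML parser.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the <value> of property NAME from a Hadoop
# *-site.xml file. Assumes the layout used in this guide (<name> and <value>
# on their own lines); it is a line-based sketch, not an XML parser.
# Usage: conf_value FILE NAME
conf_value() {
  local file="$1" name="$2"
  awk -v want="$name" '
    /<name>/ {
      n = $0
      sub(/.*<name>[ \t]*/, "", n)
      sub(/[ \t]*<\/name>.*/, "", n)
      found = (n == want)
    }
    found && /<value>/ {
      v = $0
      sub(/.*<value>[ \t]*/, "", v)
      sub(/[ \t]*<\/value>.*/, "", v)
      print v
      exit
    }
  ' "$file"
}
```

For example, `conf_value core-site.xml hadoop.tmp.dir` should print `/usr/hadoop/tmp`.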

Configure HDFS settings in hdfs-site.xml:

vim hdfs-site.xml

Add the following configuration:

<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/usr/hadoop/hdfs/name</value>
    <description>Storage location for HDFS namespace metadata on namenode</description>
  </property>

  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/usr/hadoop/hdfs/data</value>
    <description>Physical storage location for data blocks on datanode</description>
  </property>

  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
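Hadoop 2 and later still accept several Hadoop 1 property names but log deprecation warnings for them. For example, fs.default.name is now fs.defaultFS, dfs.name.dir is now dfs.namenode.name.dir, and dfs.data.dir is now dfs.datanode.data.dir. The hypothetical helper `check_deprecated_keys` below flags only those three names, using the same line-based assumption about file layout. Hadoop's "Deprecated Properties" documentation lists the full set.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report deprecated property names in *-site.xml files.
# The mapping covers only three common names; returns 1 if any were found.
# Usage: check_deprecated_keys FILE...
check_deprecated_keys() {
  local file status=0
  for file in "$@"; do
    awk -v f="$file" '
      BEGIN {
        repl["fs.default.name"] = "fs.defaultFS"
        repl["dfs.name.dir"]    = "dfs.namenode.name.dir"
        repl["dfs.data.dir"]    = "dfs.datanode.data.dir"
      }
      /<name>/ {
        n = $0
        sub(/.*<name>[ \t]*/, "", n)
        sub(/[ \t]*<\/name>.*/, "", n)
        if (n in repl) { printf "%s: %s -> %s\n", f, n, repl[n]; hits++ }
      }
      END { exit (hits > 0) }
    ' "$file" || status=1
  done
  return "$status"
}
```

Running `check_deprecated_keys core-site.xml hdfs-site.xml` from the configuration directory lists any name that should be updated.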

MapReduce Configuration

Set up MapReduce framework in mapred-site.xml:

vim mapred-site.xml

Configure the framework name:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

YARN Configuration

Configure YARN resource management in yarn-site.xml:

vim yarn-site.xml

Add the following settings:

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>192.168.2.10:8099</value>
    <description>ResourceManager web UI address; replace 192.168.2.10 with this host's IP address</description>
  </property>
</configuration>
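Hand-editing several *-site.xml files makes it easy to drop a closing tag. As an alternative sketch, the hypothetical helper `hadoop_site_xml` below prints a complete configuration document from NAME=VALUE pairs. It assumes values contain no XML special characters such as < or &, and it omits descriptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a Hadoop *-site.xml document from NAME=VALUE
# pairs. Values are written verbatim, so they must not contain XML special
# characters such as < or &. No <description> elements are generated.
hadoop_site_xml() {
  local pair
  printf '<?xml version="1.0"?>\n<configuration>\n'
  for pair in "$@"; do
    # Split on the first '=' only: property names have none, values might.
    printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' \
      "${pair%%=*}" "${pair#*=}"
  done
  printf '</configuration>\n'
}
```

For example, `hadoop_site_xml yarn.nodemanager.aux-services=mapreduce_shuffle yarn.resourcemanager.webapp.address=192.168.2.10:8099 > yarn-site.xml` would produce the YARN settings above without descriptions. Note that the redirect overwrites the existing file.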

Directory Creation and Final Setup

Create the local directories referenced in core-site.xml and hdfs-site.xml:

mkdir -p /usr/hadoop/tmp /usr/hadoop/hdfs/name /usr/hadoop/hdfs/data

Add Hadoop to system PATH:

vim /etc/profile

Add these environment variables:

#----------------------------------------------------------
# Hadoop Environment Variables
export HADOOP_HOME=/app/hadoop3.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
#----------------------------------------------------------

Apply the changes:

source /etc/profile
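To confirm the PATH change took effect, check that each Hadoop command resolves under HADOOP_HOME rather than to another program with the same name, such as the unrelated Node.js `yarn` package manager. The hypothetical helper `check_hadoop_path` below is a sketch that checks the client commands in bin/ and the start scripts in sbin/.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm that the Hadoop commands on PATH come from
# HADOOP_HOME (bin/ for clients, sbin/ for the start scripts).
# Usage: check_hadoop_path [hadoop_home]   (defaults to $HADOOP_HOME)
check_hadoop_path() {
  local home="${1:-${HADOOP_HOME:-}}" cmd path status=0
  if [ -z "$home" ]; then
    echo "HADOOP_HOME is not set" >&2
    return 1
  fi
  for cmd in hadoop hdfs yarn start-dfs.sh start-yarn.sh; do
    if ! path=$(command -v "$cmd"); then
      echo "not on PATH: $cmd"
      status=1
      continue
    fi
    case "$path" in
      "$home"/bin/*|"$home"/sbin/*) echo "ok: $cmd -> $path" ;;
      *) echo "other copy: $cmd -> $path"; status=1 ;;
    esac
  done
  return "$status"
}
```

With the profile above, `check_hadoop_path /app/hadoop3.1` should report `ok` for all five commands.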

Initialize and Start Services

Format the NameNode once, before HDFS is started for the first time (formatting again wipes the HDFS namespace). Then change to the sbin directory that holds the start and stop scripts:

hdfs namenode -format
cd /app/hadoop3.1/sbin

Daemon User Configuration

When the start scripts run as root, Hadoop 3 aborts unless a *_USER variable names the account for each daemon, so do not start any services yet. Add the variables to the start and stop scripts:

Edit start-dfs.sh:

vim start-dfs.sh

Add at the beginning of the file (the #!/usr/bin/env bash line is already present and must stay first):

#!/usr/bin/env bash
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root

Edit stop-dfs.sh:

vim stop-dfs.sh

Add the same lines:

#!/usr/bin/env bash
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root

Edit start-yarn.sh:

vim start-yarn.sh

Add:

#!/usr/bin/env bash
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root

Edit stop-yarn.sh:

vim stop-yarn.sh

Add:

#!/usr/bin/env bash
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root

Final Startup

Start HDFS and YARN, then verify the installation:

start-dfs.sh
start-yarn.sh
jps
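Once both HDFS and YARN are running in pseudo-distributed mode, `jps` should list NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager, plus Jps itself. The hypothetical helper `check_daemons` below scripts that check. It reads `jps` output on stdin, assumes one "PID ClassName" pair per line, and names any daemon that is missing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `jps` output ("PID ClassName" per line) on stdin
# and report any pseudo-distributed Hadoop daemon that is not listed.
check_daemons() {
  local listing daemon status=0
  listing=$(cat)
  for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # Compare the class-name column exactly, so "NameNode" does not
    # match "SecondaryNameNode".
    if ! printf '%s\n' "$listing" |
        awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
      echo "not running: $daemon"
      status=1
    fi
  done
  return "$status"
}
```

Usage: `jps | check_daemons` prints nothing and returns 0 when all five daemons are up. When a daemon is missing, its log under /app/hadoop3.1/logs is the place to look.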
