Fading Coder

One Final Commit for the Last Sprint


Configuring a Hadoop Runtime Environment


Base Virtual Machine Configuration

Provision a base virtual machine with 4 GB of RAM, a 50 GB hard disk, the hostname node00, and the IP address 10.0.2.100.

Ensure the VM has internet connectivity before using package managers:

[root@node00 ~]# ping google.com
PING google.com (142.250.190.46) 56(84) bytes of data.
64 bytes from 142.250.190.46: icmp_seq=1 ttl=128 time=10.2 ms

Install the EPEL repository and essential utilities (if using a minimal installation):

[root@node00 ~]# yum install -y epel-release
[root@node00 ~]# yum install -y net-tools vim

Disable the firewall and prevent it from starting on boot:

[root@node00 ~]# systemctl disable --now firewalld

Create a dedicated user and set its password:

[root@node00 ~]# useradd dataadmin
[root@node00 ~]# echo "dataadmin:password" | chpasswd

Grant the new user passwordless sudo privileges: run visudo and add the line shown below the %wheel group entry:

## Allows root to run any commands anywhere
root    ALL=(ALL)       ALL

## Allows people in group wheel to run all commands
%wheel  ALL=(ALL)       ALL
dataadmin   ALL=(ALL)       NOPASSWD:ALL
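A malformed sudoers entry can lock you out of sudo entirely, so it is worth checking the new line before saving. On a real node, `visudo -cf /etc/sudoers` is the authoritative syntax check; the sketch below only validates the format of the single rule added above, using a copy of it in a variable so it can be run anywhere:

```shell
# Verify the NOPASSWD rule matches the expected sudoers format.
# (For a full syntax check of the real file, use: visudo -cf /etc/sudoers)
line='dataadmin   ALL=(ALL)       NOPASSWD:ALL'
if echo "$line" | grep -Eq '^dataadmin[[:space:]]+ALL=\(ALL\)[[:space:]]+NOPASSWD:ALL$'; then
  echo "rule format OK"
fi
```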

Create directories for application modules and installation packages, then assign ownership:

[root@node00 ~]# mkdir -p /opt/apps /opt/archives
[root@node00 ~]# chown -R dataadmin:dataadmin /opt/apps /opt/archives

Remove pre-installed OpenJDK packages (skip if using a minimal ISO):

[root@node00 ~]# rpm -qa | grep -i java | xargs -r rpm -e --nodeps

Reboot the system to apply changes:

[root@node00 ~]# reboot

Replicating Virtual Machines

Shut down the base VM and clone it to create three nodes: node01, node02, and node03.

On each cloned node, configure a static IP address. For example, on node01, edit the network script:

[root@node01 ~]# vim /etc/sysconfig/network-scripts/ifcfg-ens33

Update the configuration to match the new addressing scheme:

DEVICE=ens33
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=static
NAME="ens33"
IPADDR=10.0.2.101
PREFIX=24
GATEWAY=10.0.2.2
DNS1=10.0.2.2
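The same edit must be repeated on node02 and node03 with their own addresses. As a sketch (the file name ifcfg-ens33 and the 10.0.2.101-103 schema are taken from this guide), the three per-clone configs can be rendered from a template instead of typed by hand; the files land in the current directory for review, and on each clone the matching content belongs in /etc/sysconfig/network-scripts/ifcfg-ens33:

```shell
# Render an ifcfg fragment for each clone (node01 - node03), varying
# only the last octet of IPADDR. Files are written to the current
# directory so the output can be inspected before copying to a node.
for i in 1 2 3; do
  cat > "ifcfg-ens33.node0$i" <<EOF
DEVICE=ens33
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=static
NAME="ens33"
IPADDR=10.0.2.10$i
PREFIX=24
GATEWAY=10.0.2.2
DNS1=10.0.2.2
EOF
done
```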

Update the hostname for the cloned machine:

[root@node01 ~]# echo "node01" > /etc/hostname

Configure hostname mappings by editing /etc/hosts on all Linux nodes:

10.0.2.100 node00
10.0.2.101 node01
10.0.2.102 node02
10.0.2.103 node03
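Since the addresses follow a fixed schema (node00 = 10.0.2.100 through node03 = 10.0.2.103, as defined above), the four entries can also be generated rather than typed, which avoids copy-paste typos:

```shell
# Print the hosts entries for node00 - node03 from the base schema.
# Append the output to /etc/hosts on each node (as root) if preferred.
for i in 0 1 2 3; do
  printf '10.0.2.10%d node0%d\n' "$i" "$i"
done
```

After the entries are in place on every node and the clones have been rebooted, each node should be able to reach the others by name (for example, ping node03 from node01).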

Restart the clone to apply network and hostname changes:

[root@node01 ~]# reboot

Update the Windows hosts file located at C:\Windows\System32\drivers\etc\hosts with the same IP mappings so the host machine can resolve the node hostnames.

Deploying Java Development Kit

Transfer the JDK archive (e.g., jdk-11.0.12_linux-x64_bin.tar.gz) to the /opt/archives directory on node01.

Extract the archive to the applications directory:

[dataadmin@node01 ~]$ tar -zxf /opt/archives/jdk-11.0.12_linux-x64_bin.tar.gz -C /opt/apps/

Configure the Java environment variables by creating a custom profile script:

[dataadmin@node01 ~]$ sudo vim /etc/profile.d/custom_env.sh

Add the following lines:

export JAVA_HOME=/opt/apps/jdk-11.0.12
export PATH=$JAVA_HOME/bin:$PATH

Apply the environment variables and verify the installation:

[dataadmin@node01 ~]$ source /etc/profile
[dataadmin@node01 ~]$ java -version
openjdk version "11.0.12"
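If a script later needs just the major version (Hadoop 3.3.x supports Java 8 and 11 runtimes), it can be extracted from the banner. A minimal sketch using a sample banner string for an 11-style version; on a real node, substitute `java -version 2>&1 | head -1`:

```shell
# Extract the major version from a `java -version` banner line.
# `banner` is a sample; on a node use: banner=$(java -version 2>&1 | head -1)
banner='openjdk version "11.0.12" 2021-07-20'
major=$(echo "$banner" | sed -E 's/[^"]*"([0-9]+)[.-].*/\1/')
echo "major=$major"   # prints: major=11
```

Note that Java 8 banners use the older "1.8.0_xxx" form, which this pattern would report as 1; the sketch assumes a 9+ style version string.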

Deploying Hadoop

Download Hadoop (e.g., version 3.3.1) and transfer the tarball to /opt/archives on node01.

Extract the Hadoop package:

[dataadmin@node01 ~]$ tar -zxf /opt/archives/hadoop-3.3.1.tar.gz -C /opt/apps/

Append Hadoop environment variables to the previously created script:

[dataadmin@node01 ~]$ sudo vim /etc/profile.d/custom_env.sh

Add the following configurations:

export HADOOP_HOME=/opt/apps/hadoop-3.3.1
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

Reload the profile and verify Hadoop is functioning:

[dataadmin@node01 ~]$ source /etc/profile
[dataadmin@node01 ~]$ hadoop version
Hadoop 3.3.1
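Because the profile script prepends $HADOOP_HOME/bin and $HADOOP_HOME/sbin to PATH, the Hadoop binaries shadow any same-named commands found later in the search order. The effect can be checked in a throwaway subshell without touching the current session (paths are the ones used in this guide):

```shell
# Re-apply the exports in a subshell and confirm the Hadoop bin
# directory ends up first in PATH; the parent shell is untouched.
(
  export HADOOP_HOME=/opt/apps/hadoop-3.3.1
  export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
  case $PATH in
    /opt/apps/hadoop-3.3.1/bin:*) echo "PATH OK" ;;
    *) echo "PATH wrong" ;;
  esac
)
```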

Hadoop Directory Overview

Inside the Hadoop installation directory, the following folders are the most significant:

  • bin: Contains executable scripts for interacting with HDFS, YARN, and MapReduce services.
  • etc: Holds the core configuration files required by the Hadoop framework.
  • lib: Stores native libraries used for data compression and decompression.
  • sbin: Includes shell scripts for starting and stopping Hadoop daemons.
  • share: Contains dependency JAR files, documentation, and official example programs.
