Configuring a Hadoop Runtime Environment
Base Virtual Machine Configuration
Provision a base virtual machine with 4GB RAM, 50GB hard disk, hostname node00, and IP address 10.0.2.100.
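The hypervisor is not specified here; as a sketch, if you happen to use VirtualBox, the memory and disk for an existing, powered-off VM named node00 can be set from the host's command line (the values mirror the specification above):

# Assumes VirtualBox on the host; adjust for your hypervisor.
VBoxManage modifyvm "node00" --memory 4096
VBoxManage createmedium disk --filename node00.vdi --size 51200

Note that the new 50GB disk (size given in MB) still has to be attached to a storage controller in the VM's settings before installation.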
Ensure the VM has internet connectivity before using package managers:
[root@node00 ~]# ping google.com
PING google.com (142.250.190.46) 56(84) bytes of data.
64 bytes from 142.250.190.46: icmp_seq=1 ttl=128 time=10.2 ms
Install the EPEL repository and essential utilities (if using a minimal installation):
[root@node00 ~]# yum install -y epel-release
[root@node00 ~]# yum install -y net-tools vim
Disable the firewall and prevent it from starting on boot:
[root@node00 ~]# systemctl disable --now firewalld
Create a dedicated user and set its password:
[root@node00 ~]# useradd dataadmin
[root@node00 ~]# echo "dataadmin:password" | chpasswd
Grant sudo privileges without a password prompt to the new user by running visudo and adding the following line below the %wheel group entry:
## Allows root to run any commands anywhere
root ALL=(ALL) ALL
## Allows people in group wheel to run all commands
%wheel ALL=(ALL) ALL
dataadmin ALL=(ALL) NOPASSWD:ALL
Create directories for application modules and installation packages, then assign ownership:
[root@node00 ~]# mkdir -p /opt/apps /opt/archives
[root@node00 ~]# chown -R dataadmin:dataadmin /opt/apps /opt/archives
Remove pre-installed OpenJDK packages (skip if using a minimal ISO):
[root@node00 ~]# rpm -qa | grep -i java | xargs rpm -e --nodeps
Reboot the system to apply changes:
[root@node00 ~]# reboot
Replicating Virtual Machines
Shut down the base VM and clone it to create three nodes: node01, node02, and node03.
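How you clone depends on the hypervisor; assuming VirtualBox again, a full clone of the powered-off base VM can be created and registered for each node:

# Assumes VirtualBox; repeat with node02 and node03.
VBoxManage clonevm "node00" --name "node01" --register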
On each cloned node, configure a static IP address. For example, on node01, edit the network script:
[root@node01 ~]# vim /etc/sysconfig/network-scripts/ifcfg-ens33
Update the configuration to reflect the new IP scheme:
DEVICE=ens33
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=static
NAME="ens33"
IPADDR=10.0.2.101
PREFIX=24
GATEWAY=10.0.2.2
DNS1=10.0.2.2
Update the hostname for the cloned machine:
[root@node01 ~]# echo "node01" > /etc/hostname
Configure hostname mappings by editing /etc/hosts on all Linux nodes:
10.0.2.100 node00
10.0.2.101 node01
10.0.2.102 node02
10.0.2.103 node03
Restart the clone to apply network and hostname changes:
[root@node01 ~]# reboot
Update the Windows hosts file located at C:\Windows\System32\drivers\etc\hosts with the same IP mappings to allow local hostname resolution.
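Once the hosts file is saved (editing it requires administrator rights), hostname resolution can be checked from the Windows side, for example:

C:\> ping node01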
Deploying Java Development Kit
Transfer the JDK archive (e.g., jdk-11.0.12_linux-x64_bin.tar.gz) to the /opt/archives directory on node01.
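Any transfer method works; one option, assuming the archive was downloaded on another machine with SSH access to node01, is scp:

scp jdk-11.0.12_linux-x64_bin.tar.gz dataadmin@node01:/opt/archives/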
Extract the archive to the applications directory:
[dataadmin@node01 ~]$ tar -zxf /opt/archives/jdk-11.0.12_linux-x64_bin.tar.gz -C /opt/apps/
Configure the Java environment variables by creating a custom profile script:
[dataadmin@node01 ~]$ sudo vim /etc/profile.d/custom_env.sh
Add the following lines:
export JAVA_HOME=/opt/apps/jdk-11.0.12
export PATH=$JAVA_HOME/bin:$PATH
Apply the environment variables and verify the installation:
[dataadmin@node01 ~]$ source /etc/profile
[dataadmin@node01 ~]$ java -version
openjdk version "11.0.12"
Deploying Hadoop
Download Hadoop (e.g., version 3.3.1) and transfer the tarball to /opt/archives on node01.
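If node01 has direct internet access, the tarball can instead be fetched in place with wget; the Apache archive URL below is one possible source:

[dataadmin@node01 ~]$ wget -P /opt/archives https://archive.apache.org/dist/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz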
Extract the Hadoop package:
[dataadmin@node01 ~]$ tar -zxf /opt/archives/hadoop-3.3.1.tar.gz -C /opt/apps/
Append Hadoop environment variables to the previously created script:
[dataadmin@node01 ~]$ sudo vim /etc/profile.d/custom_env.sh
Add the following configurations:
export HADOOP_HOME=/opt/apps/hadoop-3.3.1
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
Reload the profile and verify Hadoop is functioning:
[dataadmin@node01 ~]$ source /etc/profile
[dataadmin@node01 ~]$ hadoop version
Hadoop 3.3.1
Hadoop Directory Overview
Inside the Hadoop installation directory, the following folders are the most significant:
- bin: Contains executable scripts for interacting with HDFS, YARN, and MapReduce services.
- etc: Holds the core configuration files required by the Hadoop framework.
- lib: Stores native libraries used for data compression and decompression.
- sbin: Includes shell scripts for starting and stopping Hadoop daemons.
- share: Contains dependency JAR files, documentation, and official example programs.
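A quick listing of the installation root confirms this layout (license and readme files omitted; exact contents may vary slightly between releases):

[dataadmin@node01 ~]$ ls /opt/apps/hadoop-3.3.1
bin  etc  include  lib  libexec  sbin  share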