Hadoop 3.x High Availability (HA) Configuration: A Step-by-Step Guide
Prerequisites for Hadoop HA Configuration
- JDK (version used: JDK 1.8; configure JDK environment variables independently)
- ZooKeeper (version used: ZooKeeper 3.8.3)
- Hadoop (version used: Hadoop 3.3.6)
- Hadoop cluster configured with three nodes:
  master (primary), slave1 (secondary), slave2 (secondary)
- Host IP mappings and passwordless SSH setup are omitted here
- Modify paths in configuration files as needed
- Critical: Disable firewalls on all nodes
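For the firewall prerequisite, the following is a minimal sketch assuming a firewalld-based distribution such as CentOS/RHEL (adjust for ufw or iptables on other systems):

```shell
# Run on every node (master, slave1, slave2).
# Stop the firewall now and keep it disabled across reboots.
systemctl stop firewalld
systemctl disable firewalld
# Verify: should report "inactive"
systemctl is-active firewalld
```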
ZooKeeper Configuration
(Configuration details omitted; refer to standard ZooKeeper deployment guides)
Hadoop HA Configuration
1. Configure Environment Variables (All Nodes)
Edit /etc/profile.d/bigdata_env.sh:
# Hadoop Environment Variables
export HADOOP_HOME=/opt/module/hadoop
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
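After saving the file, the variables can be applied to the current shell and checked; a quick sketch (paths assume the layout above):

```shell
# Load the new environment variables now (they load automatically on next login)
source /etc/profile.d/bigdata_env.sh
# Verify Hadoop is on PATH; this prints the Hadoop version banner
hadoop version
```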
2. Hadoop HA Core Configuration (/opt/module/hadoop/etc/hadoop Directory)
2.1 Edit hadoop-env.sh
export JAVA_HOME=/opt/module/jdk
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_JOURNALNODE_USER=root
# HA clusters run a ZKFC on each NameNode instead of a Secondary NameNode
export HDFS_ZKFC_USER=root
2.2 Edit yarn-env.sh
export JAVA_HOME=/opt/module/jdk
export YARN_NODEMANAGER_USER=root
export YARN_RESOURCEMANAGER_USER=root
2.3 Edit core-site.xml
<!-- Logical nameservice that groups the NameNodes into cluster 'hdfscluster' -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://hdfscluster</value>
</property>
<!-- Directory for Hadoop runtime generated files -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/data/hadoop/tmpdir</value>
</property>
<!-- ZooKeeper quorum address for ZKFC -->
<property>
<name>ha.zookeeper.quorum</name>
<value>master:2181,slave1:2181,slave2:2181</value>
</property>
<!-- Static user identity for Hadoop web services -->
<property>
<name>hadoop.http.staticuser.user</name>
<value>root</value>
</property>
<!-- Hosts root can proxy (all hosts) -->
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
<!-- User groups root can proxy (all groups) -->
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
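Since ZKFC depends on the quorum configured above, it is worth confirming ZooKeeper is healthy on all three nodes before proceeding; a sketch, assuming zkServer.sh is on the PATH:

```shell
# Run on each of master, slave1, slave2.
# In a healthy ensemble, one node reports Mode: leader and the others Mode: follower.
zkServer.sh status
```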
2.4 Edit hdfs-site.xml
<!-- Replication factor configuration -->
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
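Hadoop does not propagate configuration between nodes, so the edited files must be identical everywhere. A minimal sketch using scp, assuming the passwordless SSH from the prerequisites and identical install paths on all hosts:

```shell
# Copy the whole configuration directory from master to the other nodes
for host in slave1 slave2; do
  scp -r /opt/module/hadoop/etc/hadoop/ ${host}:/opt/module/hadoop/etc/
done
```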