Deploying Apache Hudi Docker Demo on CentOS 7

Allocate a minimum of 8GB RAM to the CentOS 7 virtual machine to ensure sufficient memory for the Docker containers.
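
Before continuing, it may be worth confirming that the guest actually sees that much memory. The following is a minimal sketch, assuming a Linux-style /proc/meminfo; the helper names mem_total_kb and has_min_memory are hypothetical and not part of any tool used in this guide.

```shell
# Hypothetical helper: print MemTotal (in kB) from a meminfo-style file.
# Defaults to /proc/meminfo when no file is given.
mem_total_kb() {
  awk '/^MemTotal:/ {print $2}' "${1:-/proc/meminfo}"
}

# Hypothetical helper: succeed only if MemTotal is at least the given
# number of kB. Usage: has_min_memory <required_kb> [meminfo_file]
has_min_memory() {
  [ "$(mem_total_kb "${2:-/proc/meminfo}")" -ge "$1" ]
}
```

For example, has_min_memory 7500000 || echo "less memory than expected" flags an undersized VM. The 7500000 kB threshold is an assumption: the kernel reserves part of the RAM, so an 8GB guest normally reports somewhat less than 8GB in MemTotal.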

System Preparations

Map the required service hostnames by appending the following entries to the /etc/hosts file:

cat <<EOF >> /etc/hosts
127.0.0.1 adhoc-1 adhoc-2 namenode datanode1
127.0.0.1 hiveserver hivemetastore kafkabroker
127.0.0.1 sparkmaster zookeeper
EOF
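
Note that the append above adds the entries again each time it is run. If the setup might be repeated, an idempotent variant could look like the sketch below; add_hosts_line is a hypothetical helper name, and the sketch assumes the target file ends with a newline.

```shell
# Hypothetical helper: append a line to a hosts-style file only if that
# exact line is not already present, so repeated runs do not duplicate it.
# Usage: add_hosts_line "<line>" [file]   (file defaults to /etc/hosts)
add_hosts_line() {
  line=$1
  file=${2:-/etc/hosts}
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```

Run as root, add_hosts_line "127.0.0.1 sparkmaster zookeeper" would add the third entry once and do nothing on later runs.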

Dependency Installation

Install essential utilities and disable the firewall to prevent network interference:

yum install -y epel-release net-tools vim
systemctl disable --now firewalld

Java Development Kit (JDK) 8

CentOS 7 includes a JRE by default, but the Hudi build requires a full JDK. Remove the pre-installed Java packages first:

rpm -qa | grep -i java | xargs -r -n1 rpm -e --nodeps

Extract the downloaded JDK archive to a custom path:

mkdir -p /usr/local/jdk8
tar -xzf jdk-8u212-linux-x64.tar.gz -C /usr/local/jdk8 --strip-components=1

Apache Maven

Maven is required to compile the Hudi source code. Download version 3.8.1 and extract it:

wget https://repo.huaweicloud.com/apache/maven/maven-3/3.8.1/binaries/apache-maven-3.8.1-bin.tar.gz
tar -xzf apache-maven-3.8.1-bin.tar.gz -C /usr/local/
ln -s /usr/local/apache-maven-3.8.1 /usr/local/mvn

Configure Maven repository mirrors by editing /usr/local/mvn/conf/settings.xml to include the Aliyun mirror and set a local repository path:

<settings>
  <localRepository>/usr/local/mvn/repo</localRepository>
  <mirrors>
    <mirror>
      <id>aliyun-central</id>
      <name>Aliyun Central Mirror</name>
      <url>https://maven.aliyun.com/repository/central/</url>
      <mirrorOf>central</mirrorOf>
    </mirror>
    <mirror>
      <id>maven-default-http-blocker</id>
      <mirrorOf>external:http:*</mirrorOf>
      <name>Pseudo repository to mirror external repositories initially using HTTP.</name>
      <url>http://0.0.0.0/</url>
      <blocked>true</blocked>
    </mirror>
  </mirrors>
</settings>

Scala

Scala is a prerequisite for building Hudi and Spark. Install version 2.11.12:

wget https://downloads.lightbend.com/scala/2.11.12/scala-2.11.12.tgz
tar -xzf scala-2.11.12.tgz -C /usr/local/
ln -s /usr/local/scala-2.11.12 /usr/local/scala

Environment Variables

Create a dedicated profile script for all path configurations:

cat <<'EOF' > /etc/profile.d/custom_env.sh
export JDK8_HOME=/usr/local/jdk8
export MAVEN_HOME=/usr/local/mvn
export SCALA_HOME=/usr/local/scala
export PATH=$PATH:$JDK8_HOME/bin:$MAVEN_HOME/bin:$SCALA_HOME/bin
EOF

source /etc/profile.d/custom_env.sh

Verify the installations:

java -version
mvn -version
scala -version
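
If any of these commands is not found, the PATH changes have probably not been picked up by the current shell. A small sketch for checking all three at once follows; missing_tools is a hypothetical helper name, not an existing utility.

```shell
# Hypothetical helper: print each command from the argument list that is
# not found on PATH. Returns non-zero if at least one is missing.
missing_tools() {
  status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      status=1
    fi
  done
  return "$status"
}
```

For example, missing_tools java mvn scala prints nothing when all three are available and otherwise lists the ones still missing.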

Git

Install Git to clone the Hudi repository:

yum install -y curl-devel expat-devel gettext-devel openssl-devel zlib-devel gcc-c++ perl-ExtUtils-MakeMaker git

Remote Access Configuration

Start the SSH daemon for remote terminal access:

systemctl start sshd

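Retrieve the VM's IPv4 address with ip addr show ens33, then use that address to connect from an SSH client. The interface is named ens33 in this setup; if yours differs, run ip addr to list all interfaces.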
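
To print only the address rather than the full interface details, the one-line output mode of ip can be filtered. This is a minimal sketch; ipv4_from_ip_output is a hypothetical helper name, and it assumes the field layout that ip -4 -o addr show normally prints (index, interface, "inet", address/prefix).

```shell
# Hypothetical helper: read `ip -4 -o addr show <iface>` output on stdin and
# print each IPv4 address without its prefix length.
ipv4_from_ip_output() {
  awk '$3 == "inet" {split($4, parts, "/"); print parts[1]}'
}
```

Usage: ip -4 -o addr show ens33 | ipv4_from_ip_output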

Building Apache Hudi

Clone the Hudi repository, checking out the 0.14.0 release:

mkdir -p /opt/repos && cd /opt/repos
git clone --branch release-0.14.0 https://github.com/apache/hudi.git hudi-0.14.0
cd hudi-0.14.0

Compile the source code. This process may take over an hour on the first run:

mvn package -DskipTests

Docker and Docker Compose Setup

Install the latest version of Docker CE to avoid missing signature errors common in older releases:

yum install -y yum-utils device-mapper-persistent-data lvm2
yum-config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
yum remove -y 'docker*'
yum install -y docker-ce docker-ce-cli containerd.io

Download and configure Docker Compose v2:

curl -SL https://github.com/docker/compose/releases/download/v2.24.0/docker-compose-linux-x86_64 -o /usr/local/bin/docker-compose
chmod +x /usr/local/bin/docker-compose

To prevent image pull timeouts and connection rejections, configure Docker daemon registry mirrors in /etc/docker/daemon.json:

{ "registry-mirrors": [ "https://mpoqfnbe.mirror.aliyuncs.com", "https://docker.888666222.xyz", "https://atomhub.openatom.cn/", "https://docker.m.daocloud.io", "https://dockerproxy.com", "https://docker.mirrors.ustc.edu.cn", "https://docker.nju.edu.cn", "https://reg-mirror.qiniu.com", "https://docker.rainbond.cc" ] }
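
Public mirrors may stop responding over time, so the list might need trimming to the ones reachable from your network. The sketch below generates the same kind of file from a list of URLs; render_registry_mirrors is a hypothetical helper name, and it assumes the URLs contain no quotes or backslashes that would need JSON escaping.

```shell
# Hypothetical helper: print a Docker daemon configuration document whose
# "registry-mirrors" array contains the URLs passed as arguments.
render_registry_mirrors() {
  printf '{\n  "registry-mirrors": ['
  sep=''
  for url in "$@"; do
    # Each entry goes on its own line; a comma precedes all but the first.
    printf '%s\n    "%s"' "$sep" "$url"
    sep=','
  done
  printf '\n  ]\n}\n'
}
```

For example, render_registry_mirrors https://docker.m.daocloud.io https://docker.nju.edu.cn prints a two-entry configuration that can be redirected into the daemon configuration file after reviewing it.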

Apply the Docker configuration changes:

systemctl daemon-reload
systemctl restart docker

Launching the Hudi Demo

Navigate to the Docker directory within the compiled Hudi source and execute the setup script:

cd /opt/repos/hudi-0.14.0/docker
./setup_demo.sh

Wait for the orchestration process to complete. Once finished, all 20 service containers will be actively running.
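
One way to check the result is to count the running containers. A minimal sketch, where count_names is a hypothetical helper that simply counts non-empty lines on standard input:

```shell
# Hypothetical helper: count non-empty lines on stdin, for example the
# container names printed one per line by `docker ps --format '{{.Names}}'`.
count_names() {
  # grep -c exits non-zero when the count is 0, so mask that status.
  grep -c . || true
}
```

Usage: docker ps --format '{{.Names}}' | count_names, then compare the number with the container count stated above.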
