Deploying a Distributed ZooKeeper and HBase Cluster
ZooKeeper and HBase Overview
ZooKeeper
ZooKeeper is an open-source coordination framework, originally developed at Yahoo!, that provides simple, robust coordination services to distributed applications. It abstracts complex and error-prone consensus protocols into an efficient and reliable set of primitives exposed through simple interfaces. Distributed systems leverage ZooKeeper for data publication/subscription, load distribution, naming services, coordination notifications, cluster administration, leader election, distributed locking, and queue management. Service providers register their endpoints within the ZooKeeper registry; consumers then query the registry to discover provider details before initiating direct communication.
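As a brief illustration of this registry pattern, the following zkCli.sh session (runnable once the ensemble deployed later in this guide is up; the path, service name, and endpoint are hypothetical) registers an ephemeral endpoint and reads it back:
[hadoopuser@primary ~]$ zkCli.sh -server primary:2181
[zk: primary:2181(CONNECTED) 0] create /services ""
Created /services
[zk: primary:2181(CONNECTED) 1] create -e /services/orders "worker1:9090"
Created /services/orders
[zk: primary:2181(CONNECTED) 2] ls /services
[orders]
[zk: primary:2181(CONNECTED) 3] get /services/orders
worker1:9090
Because the endpoint is an ephemeral znode, ZooKeeper deletes it automatically when the provider's session ends, so consumers never discover a stale endpoint.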
HBase
HBase is a highly reliable, performant, scalable, and column-oriented distributed storage system designed to run on commodity hardware. It targets the storage and processing of massive datasets, easily handling tables comprising billions of rows and millions of columns using standard server configurations.
HBase Characteristics
- Massive Storage: Capable of managing petabyte-scale data while still answering queries within tens to hundreds of milliseconds, thanks to its scale-out architecture.
- Column-Family Storage: Data is organized into column families, which must be defined during table creation; a family can then hold an unlimited number of columns (see the example after this list).
- Extreme Scalability: Expansion is supported both computationally (by adding RegionServers to handle more regions) and in storage capacity (by adding DataNodes to the underlying HDFS).
- High Concurrency: On the commodity hardware HBase typically runs on, a single I/O may take tens of milliseconds, but per-request latency degrades very little even under heavy concurrent access.
- Sparsity: Empty columns within a column family consume no physical storage, allowing sparse, flexible schemas without wasted disk space.
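A short hbase shell session (runnable once HBase is deployed below; the table, family, and column names are hypothetical) makes the column-family and sparsity points concrete:
[hadoopuser@primary ~]$ hbase shell
hbase> create 'orders', 'info'
hbase> put 'orders', 'row1', 'info:customer', 'alice'
hbase> put 'orders', 'row2', 'info:total', '42.50'
hbase> scan 'orders'
Because row1 never writes info:total and row2 never writes info:customer, those absent cells occupy no space on disk; only the cells actually written are stored.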
ZooKeeper Deployment
Extraction
Ensure firewall services are disabled across all nodes to prevent connection failures.
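On systemd-based distributions that ship firewalld, for example, the service can be stopped and prevented from starting at boot as follows (run the same two commands on worker1 and worker2, or adapt them to your distribution's firewall tooling):
[root@primary ~]# systemctl stop firewalld
[root@primary ~]# systemctl disable firewalld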
[root@primary ~]# tar -xzvf /opt/archives/apache-zookeeper-3.5.10-bin.tar.gz -C /opt/apps/
[root@primary ~]# mv /opt/apps/apache-zookeeper-3.5.10-bin /opt/apps/zk
Primary Node Configuration
Create required data and logging directories within the installation path.
[root@primary ~]# cd /opt/apps/zk
[root@primary zk]# mkdir zk_data && mkdir zk_logs
Assign a unique identifier for this node; the value must match this host's server.N entry in zoo.cfg (configured below).
[root@primary zk]# echo "1" > /opt/apps/zk/zk_data/myid
Generate the configuration file from the provided sample and modify it.
[root@primary zk]# cp /opt/apps/zk/conf/zoo_sample.cfg /opt/apps/zk/conf/zoo.cfg
[root@primary zk]# vi /opt/apps/zk/conf/zoo.cfg
Update the dataDir parameter, and point the transaction log at the logging directory created earlier:
dataDir=/opt/apps/zk/zk_data
dataLogDir=/opt/apps/zk/zk_logs
Append the cluster node definitions at the end of the file. In each server.N entry, the first port (2888) is used by followers to connect to the leader, and the second (3888) is used for leader election:
server.1=primary:2888:3888
server.2=worker1:2888:3888
server.3=worker2:2888:3888
Transfer ownership of the installation directory to the designated service account.
[root@primary zk]# chown -R hadoopuser:hadoopgroup /opt/apps/zk
Worker Node Configuration
Transfer the configured directory to the remaining nodes.
[root@primary ~]# scp -r /opt/apps/zk worker1:/opt/apps/
[root@primary ~]# scp -r /opt/apps/zk worker2:/opt/apps/
On worker1, set the appropriate permissions and update its identifier.
[root@worker1 ~]# chown -R hadoopuser:hadoopgroup /opt/apps/zk
[root@worker1 ~]# echo "2" > /opt/apps/zk/zk_data/myid
On worker2, apply the same permissions and set a distinct identifier.
[root@worker2 ~]# chown -R hadoopuser:hadoopgroup /opt/apps/zk
[root@worker2 ~]# echo "3" > /opt/apps/zk/zk_data/myid
Environment Variables
Append the following environment configurations to /etc/profile on all machines.
export ZK_HOME=/opt/apps/zk
export PATH=$PATH:$ZK_HOME/bin
Service Activation
Switch to the service account on all nodes, reload the profile, and start the daemon.
[hadoopuser@primary ~]$ source /etc/profile
[hadoopuser@primary ~]$ zkServer.sh start
[hadoopuser@worker1 ~]$ source /etc/profile
[hadoopuser@worker1 ~]$ zkServer.sh start
[hadoopuser@worker2 ~]$ source /etc/profile
[hadoopuser@worker2 ~]$ zkServer.sh start
Once all instances are active, verify the cluster state. One node will assume the leader role while the others become followers.
[hadoopuser@primary ~]$ zkServer.sh status
[hadoopuser@worker1 ~]$ zkServer.sh status
[hadoopuser@worker2 ~]$ zkServer.sh status
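Exact wording varies by version, but each node reports its role on a Mode line, and exactly one node should report leader. Sample output:
ZooKeeper JMX enabled by default
Using config: /opt/apps/zk/bin/../conf/zoo.cfg
Mode: follower
If a node instead reports that the service is probably not running, confirm that a quorum (at least two of the three nodes) is up and that the firewall is still disabled.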
HBase Deployment
Extraction and Relocation
[root@primary ~]# tar -xzvf /opt/archives/hbase-2.4.11-bin.tar.gz -C /opt/apps/
[root@primary ~]# mv /opt/apps/hbase-2.4.11 /opt/apps/hbase
Environment Variables
Add the HBase environment paths to /etc/profile across all nodes.
export HBASE_HOME=/opt/apps/hbase
export PATH=$HBASE_HOME/bin:$PATH
Apply the changes on every machine.
[root@primary ~]# source /etc/profile
[root@worker1 ~]# source /etc/profile
[root@worker2 ~]# source /etc/profile
Primary Node Configuration
Navigate to the configuration directory.
[root@primary ~]# cd /opt/apps/hbase/conf/
Edit hbase-env.sh to define the Java location, disable the embedded ZooKeeper in favor of the external ensemble deployed above, and point HBase at the Hadoop configuration directory.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk
export HBASE_MANAGES_ZK=false
export HBASE_CLASSPATH=/opt/apps/hadoop/etc/hadoop/
Modify hbase-site.xml to define the distributed properties. The hbase.rootdir value must use the same NameNode host and port as fs.defaultFS in the Hadoop core-site.xml.
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://primary:8020/hbase_data</value>
  </property>
  <property>
    <name>hbase.master.info.port</name>
    <value>16010</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
  <property>
    <name>zookeeper.session.timeout</name>
    <value>90000</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>primary,worker1,worker2</value>
  </property>
  <property>
    <name>hbase.tmp.dir</name>
    <value>/opt/apps/hbase/temp_store</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
</configuration>
Update the regionservers file to list the worker nodes, one hostname per line.
worker1
worker2
Create the temporary directory specified in the configuration.
[root@primary conf]# mkdir /opt/apps/hbase/temp_store
Cluster Distribution and Permissions
Synchronize the installation folder to the worker nodes.
[root@primary conf]# scp -r /opt/apps/hbase/ worker1:/opt/apps/
[root@primary conf]# scp -r /opt/apps/hbase/ worker2:/opt/apps/
Assign proper ownership on all nodes.
[root@primary ~]# chown -R hadoopuser:hadoopgroup /opt/apps/hbase/
[root@worker1 ~]# chown -R hadoopuser:hadoopgroup /opt/apps/hbase/
[root@worker2 ~]# chown -R hadoopuser:hadoopgroup /opt/apps/hbase/
Service Activation
Log in as the service user on the primary node, ensure the environment is loaded, and launch the cluster; HDFS and the ZooKeeper ensemble must already be running. start-hbase.sh starts the HMaster locally and then starts a RegionServer over SSH on each host listed in regionservers, so it only needs to be run once.
[hadoopuser@primary ~]$ source /etc/profile
[hadoopuser@primary ~]$ start-hbase.sh
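To confirm the cluster is healthy, check the Java processes on each node (process IDs below are illustrative): HMaster should appear on the primary and HRegionServer on each worker. The web UI is then reachable on the hbase.master.info.port configured above, at http://primary:16010.
[hadoopuser@primary ~]$ jps | grep HMaster
2481 HMaster
[hadoopuser@worker1 ~]$ jps | grep HRegionServer
1932 HRegionServer
[hadoopuser@worker2 ~]$ jps | grep HRegionServer
2044 HRegionServer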