Big Data Components Installation, Configuration Files, and Service Management Commands Guide
Flume
Remove conflicting JAR file:
rm /opt/module/flume/lib/guava-11.0.2.jar
Launch Flume monitoring:
bin/flume-ng agent -n a1 -c conf/ -f job/flume-file-hdfs.conf
Stop Flume monitoring:
# Find the Flume process ID, then terminate it
ps aux | grep flume
kill <process_id>
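The manual lookup above can be scripted. The following is a minimal sketch, assuming the agent's command line contains the string "flume" (as the grep above also assumes); the function name flume_pids is hypothetical and not part of Flume.

```shell
#!/bin/sh
# flume_pids: read `ps aux`-style lines on stdin and print the PID (column 2)
# of every line that mentions "flume", skipping the grep process itself.
flume_pids() {
    awk '/flume/ && !/grep/ { print $2 }'
}

# Usage (assumption: a plain TERM signal is enough to stop the agent):
#   for pid in $(ps aux | flume_pids); do kill "$pid"; done
```

Filtering out the grep line avoids trying to kill the lookup process itself, which has usually exited by the time kill runs.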
Hadoop (Cluster)
Configuration Files:
- core-site.xml
- hdfs-site.xml
- mapred-site.xml
- yarn-site.xml
- workers/slaves
Common Port Numbers:
Version 2.x:
- NameNode communication port: 8020/9000
- HDFS web UI port: 50070
- ResourceManager web UI port: 8088
- History server web UI port: 19888
Version 3.x:
- NameNode communication port: 8020/9000/9820
- HDFS web UI port: 9870
- ResourceManager web UI port: 8088
- History server web UI port: 19888
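Because the HDFS web UI port differs between the two major versions, a script that builds the UI address may want to pick the port from the version string. A small sketch, assuming default ports as listed above; hdfs_web_port is a hypothetical helper name.

```shell
#!/bin/sh
# hdfs_web_port: print the default HDFS web UI port for a Hadoop version string.
# Only the defaults listed above are covered; custom ports set in hdfs-site.xml
# are not detected.
hdfs_web_port() {
    case "$1" in
        2.*) echo 50070 ;;
        3.*) echo 9870 ;;
        *)   echo "unknown Hadoop version: $1" >&2; return 1 ;;
    esac
}

# Usage (assumption: the first line of `hadoop version` reads "Hadoop <version>"):
#   hdfs_web_port "$(hadoop version | awk 'NR==1 { print $2 }')"
```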
Cluster Startup Process:
Initial cluster startup (first-time formatting):
hdfs namenode -format
If the first startup fails and requires reformatting:
- Ensure namenode and datanode processes are stopped
- Delete data and logs directories across all cluster machines
- Re-execute formatting command
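The reformatting steps above can be sketched as a script that only prints the commands, so they can be reviewed before anything is deleted. Everything specific here is an assumption: the install path /opt/module/hadoop, the host names node1 to node3, and the idea that data/ and logs/ sit directly under the install directory (the actual data location depends on your configuration).

```shell
#!/bin/sh
# Assumptions: HADOOP_HOME is the install directory, data/ and logs/ live
# under it, and WORKERS lists every cluster host. Adjust all three.
HADOOP_HOME="${HADOOP_HOME:-/opt/module/hadoop}"
WORKERS="${WORKERS:-node1 node2 node3}"

# reformat_plan: print (do not execute) the commands for a clean re-format.
reformat_plan() {
    # 1. Stop the NameNode and DataNode processes.
    echo "$HADOOP_HOME/sbin/stop-dfs.sh"
    # 2. Delete the data and logs directories on every machine.
    for host in $WORKERS; do
        echo "ssh $host rm -rf $HADOOP_HOME/data $HADOOP_HOME/logs"
    done
    # 3. Format the NameNode again.
    echo "hdfs namenode -format"
}
```

Calling reformat_plan and reading its output before running any line by hand keeps the destructive rm step under manual control.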
Startup commands (HDFS → YARN → HistoryServer):
sbin/start-dfs.sh
sbin/start-yarn.sh
bin/mapred --daemon start historyserver
Shutdown commands (HistoryServer → YARN → HDFS):
bin/mapred --daemon stop historyserver
sbin/stop-yarn.sh
sbin/stop-dfs.sh
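The two orderings can be wrapped in one helper so the sequence is not retyped each time. This is a sketch only: hadoop_cluster and run are hypothetical names, and it assumes the script is run from the Hadoop install directory on a host where all three services may be started (in many clusters the NameNode and ResourceManager live on different machines).

```shell
#!/bin/sh
# run: execute a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# hadoop_cluster: start or stop HDFS, YARN, and the history server in order.
hadoop_cluster() {
    case "$1" in
        start)
            # Start in dependency order; stop at the first failure.
            run sbin/start-dfs.sh &&
            run sbin/start-yarn.sh &&
            run bin/mapred --daemon start historyserver
            ;;
        stop)
            # Stop in reverse order; attempt every step even if one fails.
            run bin/mapred --daemon stop historyserver
            run sbin/stop-yarn.sh
            run sbin/stop-dfs.sh
            ;;
        *)
            echo "usage: hadoop_cluster start|stop" >&2
            return 2
            ;;
    esac
}
```

Running with DRY_RUN=1 prints the commands in the order they would execute, which is a cheap way to check the sequence before touching a live cluster.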
Kafka (Cluster)
Startup Commands:
Foreground startup:
bin/kafka-server-start.sh config/server.properties
Background startup:
bin/kafka-server-start.sh -daemon config/server.properties
# OR
nohup bin/kafka-server-start.sh config/server.properties &
Shutdown Command:
bin/kafka-server-stop.sh
Topic Management Commands:
Create topic:
# With zookeeper address specified
bin/kafka-topics.sh --zookeeper localhost:2181 --create --partitions 3 --replication-factor 3 --topic sample_kafka_topic
# With a ZooKeeper chroot path (/kafka)
bin/kafka-topics.sh --create --zookeeper localhost:2181/kafka --replication-factor 3 --partitions 3 --topic sample_kafka_topic
List topics:
bin/kafka-topics.sh --list --zookeeper localhost:2181
Describe specific topic:
bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic sample_kafka_topic
Increase partition count:
bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic sample_kafka_topic --partitions 5
Check topic partition offset values:
# Maximum offset
bin/kafka-run-class.sh kafka.tools.GetOffsetShell --topic sample_kafka_topic --time -1 --broker-list 127.0.0.1:9092 --partitions 0
# Minimum offset
bin/kafka-run-class.sh kafka.tools.GetOffsetShell --topic sample_kafka_topic --time -2 --broker-list 127.0.0.1:9092 --partitions 0
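Subtracting the minimum offset from the maximum offset gives the span of offsets currently retained in the partition. A minimal sketch, assuming each GetOffsetShell output line has the form topic:partition:offset; the helper names offset_of and retained_span are hypothetical.

```shell
#!/bin/sh
# offset_of: extract the offset (last colon-separated field) from one
# GetOffsetShell output line of the form topic:partition:offset.
offset_of() {
    printf '%s\n' "$1" | awk -F: '{ print $NF }'
}

# retained_span: maximum offset minus minimum offset for one partition.
retained_span() {
    echo $(( $(offset_of "$1") - $(offset_of "$2") ))
}

# Usage: capture the output of the two commands above, then compare them:
#   retained_span "$max_line" "$min_line"
```

On compacted topics the span can exceed the number of messages actually stored, so treat the result as an upper bound.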
Delete topic:
bin/kafka-topics.sh --delete --zookeeper localhost:2181 --topic sample_kafka_topic
Note: Set delete.topic.enable=true in ${KAFKA_HOME}/config/server.properties to enable topic deletion.
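Before deleting a topic it can help to confirm the setting is actually present. A small sketch; topic_deletion_enabled is a hypothetical helper, and it only checks for an explicit, uncommented line in the file (it cannot tell what the broker uses when the line is absent).

```shell
#!/bin/sh
# topic_deletion_enabled: succeed if the given properties file sets
# delete.topic.enable=true on an uncommented line.
topic_deletion_enabled() {
    grep -Eq '^[[:space:]]*delete\.topic\.enable[[:space:]]*=[[:space:]]*true[[:space:]]*$' "$1"
}

# Usage:
#   topic_deletion_enabled "${KAFKA_HOME}/config/server.properties" \
#       && echo "deletion enabled" || echo "not set explicitly"
```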
Message Operations:
Producer message sending:
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic sample_kafka_topic
Consumer message consumption:
From beginning:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic sample_kafka_topic
Latest messages:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic sample_kafka_topic
Specific partition from latest:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic sample_kafka_topic --offset latest --partition 0
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic sample_kafka_topic --offset latest --partition 1
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic sample_kafka_topic --offset latest --partition 2
Specific partition with offset:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic sample_kafka_topic --partition 0 --offset 100
Consumer group consumption:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic sample_kafka_topic --group group1
Limited message consumption:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic sample_kafka_topic --offset latest --partition 0 --max-messages 10
Consumer Group Management:
List consumer groups:
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list
Describe group details:
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group test_group --describe
Sample output:
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
sample_topic 0 5 5 0 - - -
Meaning:
- CURRENT-OFFSET: Last offset committed by the consumer group for this partition
- LOG-END-OFFSET: High watermark offset of the partition on the broker
- LAG: LOG-END-OFFSET minus CURRENT-OFFSET, i.e. how far the group is behind the end of the partition
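The lag definition can be checked by recomputing it from the other two columns. The sketch below sums LOG-END-OFFSET minus CURRENT-OFFSET over all partitions; total_lag is a hypothetical helper, and it assumes whitespace-separated columns with a header row like the sample output above.

```shell
#!/bin/sh
# total_lag: read `--describe` output on stdin, locate the two offset columns
# by header name, and print the sum of LOG-END-OFFSET minus CURRENT-OFFSET.
# Rows whose offsets are not numeric (for example "-") are skipped.
total_lag() {
    awk '
        /CURRENT-OFFSET/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "CURRENT-OFFSET") cur_col = i
                if ($i == "LOG-END-OFFSET") end_col = i
            }
            next
        }
        cur_col && end_col && $cur_col ~ /^[0-9]+$/ && $end_col ~ /^[0-9]+$/ {
            sum += $end_col - $cur_col
        }
        END { print sum + 0 }
    '
}

# Usage:
#   bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
#       --group test_group --describe | total_lag
```

Locating columns by header name rather than by fixed position keeps the helper working if the tool prints extra leading columns.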
Delete topic from group:
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group test_group --topic test_topic --delete
Delete entire group:
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group test_group --delete
Additional Commands:
Rebalance leaders:
bin/kafka-preferred-replica-election.sh --bootstrap-server localhost:9092
Performance testing tool:
bin/kafka-producer-perf-test.sh --topic test --num-records 100 --record-size 1 --throughput 100 --producer-props bootstrap.servers=localhost:9092
Configuration File Locations:
Kafka configuration files are located in the config/ directory of your Kafka installation.