Fading Coder

One Final Commit for the Last Sprint

Big Data Components Installation, Configuration Files, and Service Management Commands Guide

Tech · May 10

Flume

  1. Remove the conflicting Guava JAR (it clashes with the newer Guava bundled with Hadoop 3.x):

    rm /opt/module/flume/lib/guava-11.0.2.jar
    
  2. Launch Flume monitoring:

    bin/flume-ng agent -n a1 -c conf/ -f job/flume-file-hdfs.conf 
    
  3. Stop Flume monitoring:

    # Find the Flume agent's process ID with ps -ef, then terminate it
    ps -ef | grep flume
    kill <process_id>
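
    The lookup-then-kill steps above can be folded into one guarded command; a sketch using pgrep (the [f] bracket keeps the pattern from matching the grep process itself):

    ```shell
    # find the Flume agent by command line and stop it; -f matches the full
    # command line, and "[f]lume" prevents the pattern from matching itself
    pid=$(pgrep -f "[f]lume-ng agent" || true)
    if [ -n "$pid" ]; then
      kill $pid && echo "stopped Flume agent: $pid"
    else
      echo "no Flume agent running"
    fi
    ```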
    

Hadoop (Cluster)

Configuration Files:

  • core-site.xml
  • hdfs-site.xml
  • mapred-site.xml
  • yarn-site.xml
  • workers (3.x; named slaves in 2.x)
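
As a reference point, a minimal core-site.xml might look like the following; the hostname hadoop102 is a placeholder for your NameNode host, and the port matches the internal ports listed below:

```xml
<!-- core-site.xml: minimal sketch; hostname and data directory are placeholders -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop102:8020</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/module/hadoop/data</value>
  </property>
</configuration>
```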

Common Port Numbers:

Version 2.x:

  • NameNode communication port: 8020/9000
  • HDFS web UI port: 50070
  • ResourceManager web UI port: 8088
  • History server communication port: 19888

Version 3.x:

  • NameNode communication port: 8020/9000/9820
  • HDFS web UI port: 9870
  • ResourceManager web UI port: 8088
  • History server communication port: 19888

Cluster Startup Process:

  1. Initial cluster startup (first-time formatting):

    hdfs namenode -format
    

    If first startup fails and requires reformatting:

    1. Ensure namenode and datanode processes are stopped
    2. Delete data and logs directories across all cluster machines
    3. Re-execute formatting command
  2. Startup commands (HDFS → YARN → HistoryServer):

    sbin/start-dfs.sh
    sbin/start-yarn.sh
    bin/mapred --daemon start historyserver
    
  3. Shutdown commands (HistoryServer → YARN → HDFS):

    bin/mapred --daemon stop historyserver
    sbin/stop-yarn.sh
    sbin/stop-dfs.sh
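
After start-up it is worth verifying that the expected daemons are actually running. A rough sketch using jps (the list below assumes HDFS, YARN, and the history server all run on the current host; on a real cluster the daemons are spread across machines):

```shell
# check jps output for the expected Hadoop daemons; prints "missing" for any
# daemon not found (or for all of them when jps itself is unavailable)
for d in NameNode DataNode ResourceManager NodeManager JobHistoryServer; do
  if jps 2>/dev/null | grep -q "$d"; then
    echo "$d: running"
  else
    echo "$d: missing"
  fi
done
```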
    

Kafka (Cluster)

Startup Commands:

Foreground startup:

bin/kafka-server-start.sh config/server.properties

Background startup:

bin/kafka-server-start.sh -daemon config/server.properties
# OR
nohup bin/kafka-server-start.sh config/server.properties &

Shutdown Command:

bin/kafka-server-stop.sh
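
kafka-server-start.sh only starts the local broker, so on a cluster a small wrapper run from one node is common. A sketch (broker1..broker3 and the install path are placeholders; the actual ssh call is left commented, and passwordless ssh plus an identical install path on every host are assumed):

```shell
# loop over the brokers and start Kafka on each one via ssh
KAFKA_HOME=/opt/module/kafka
for host in broker1 broker2 broker3; do
  echo "starting Kafka on $host"
  # ssh "$host" "$KAFKA_HOME/bin/kafka-server-start.sh -daemon $KAFKA_HOME/config/server.properties"
done
```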

Topic Management Commands:

Create topic:

# With ZooKeeper address specified
bin/kafka-topics.sh --zookeeper localhost:2181 --create --partitions 3 --replication-factor 3 --topic sample_kafka_topic

# With a ZooKeeper chroot path
bin/kafka-topics.sh --create --zookeeper localhost:2181/kafka --replication-factor 3 --partitions 3 --topic sample_kafka_topic

List topics:

bin/kafka-topics.sh --list --zookeeper localhost:2181

Describe specific topic:

bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic sample_kafka_topic

Increase partition count (partitions can only be increased, never decreased):

bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic sample_kafka_topic --partitions 5

Check topic partition offset values:

# Maximum offset
bin/kafka-run-class.sh kafka.tools.GetOffsetShell --topic sample_kafka_topic --time -1 --broker-list 127.0.0.1:9092 --partitions 0

# Minimum offset
bin/kafka-run-class.sh kafka.tools.GetOffsetShell --topic sample_kafka_topic --time -2 --broker-list 127.0.0.1:9092 --partitions 0

Delete topic:

bin/kafka-topics.sh --delete --zookeeper localhost:2181 --topic sample_kafka_topic

Note: Set delete.topic.enable=true in ${KAFKA_HOME}/config/server.properties to enable topic deletion.
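
One way to make that setting idempotent is to append it only when absent. The sketch below demonstrates this on a temporary stand-in file; on a real broker, point CONF at ${KAFKA_HOME}/config/server.properties instead:

```shell
# idempotently ensure delete.topic.enable=true in the broker config
CONF=$(mktemp)
printf 'broker.id=0\nlog.dirs=/tmp/kafka-logs\n' > "$CONF"   # stand-in for the real file
grep -q '^delete.topic.enable=' "$CONF" || echo 'delete.topic.enable=true' >> "$CONF"
grep '^delete.topic.enable' "$CONF"
```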

Message Operations:

Producer message sending:

bin/kafka-console-producer.sh --broker-list localhost:9092 --topic sample_kafka_topic

Consumer message consumption:

From beginning:

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic sample_kafka_topic

Latest messages (the default behavior when --from-beginning is omitted):

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic sample_kafka_topic

Specific partition from latest:

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic sample_kafka_topic --offset latest --partition 0
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic sample_kafka_topic --offset latest --partition 1
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic sample_kafka_topic --offset latest --partition 2

Specific partition with offset:

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic sample_kafka_topic --partition 0 --offset 100

Consumer group consumption:

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic sample_kafka_topic --group group1

Limited message consumption:

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic sample_kafka_topic --offset latest --partition 0 --max-messages 10

Consumer Group Management:

List consumer groups:

bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list

Describe group details:

bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group test_group --describe

Sample output:

TOPIC           PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID     HOST            CLIENT-ID
sample_topic    0          5               5               0               -               -               -

Meaning:

  • CURRENT-OFFSET: the consumer group's last committed offset for the partition
  • LOG-END-OFFSET: the broker's log end offset (high watermark) for the partition
  • LAG: LOG-END-OFFSET minus CURRENT-OFFSET, i.e. how far the group is behind
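
The LAG column can also be totaled per group by parsing the --describe output, e.g. for a simple alerting check. A sketch fed with canned sample output (normally you would pipe the kafka-consumer-groups.sh --describe command into the awk step):

```shell
# sum the LAG column (field 5) across partitions, skipping the header row
describe_output='TOPIC           PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG
sample_topic    0          5               5               0
sample_topic    1          3               8               5'
echo "$describe_output" | awk 'NR > 1 { total += $5 } END { print total }'   # total lag: 5
```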

Delete a topic's offsets from a group:

bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group test_group --topic test_topic --delete

Delete entire group:

bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group test_group --delete

Additional Commands:

Rebalance partition leaders (deprecated in Kafka 2.4+ in favor of kafka-leader-election.sh):

bin/kafka-preferred-replica-election.sh --bootstrap-server localhost:9092

Performance testing tool:

bin/kafka-producer-perf-test.sh --topic test --num-records 100 --record-size 1 --throughput 100 --producer-props bootstrap.servers=localhost:9092 

Configuration File Locations:

Kafka configuration files are located in the config/ directory of your Kafka installation.
