Fading Coder

One Final Commit for the Last Sprint


Working with HDFS File System Commands and Performance Benchmarking


File System Operations

Searching Files

To locate files within HDFS, use -find with a glob pattern after the -name flag (-iname matches case-insensitively). Quote the pattern so the local shell does not expand it:

hadoop fs -find / -name "application_*"
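If the goal is to see which YARN applications left files behind, the matched paths can be collapsed into distinct application IDs. A minimal sketch, assuming the IDs follow YARN's application_<clusterTimestamp>_<sequence> form; app_ids_from_paths is a hypothetical helper name:

```shell
# Hypothetical helper: read HDFS paths on stdin (e.g. from hadoop fs -find)
# and print each distinct YARN application ID once, sorted.
# Assumes IDs look like application_<clusterTimestamp>_<sequence>.
app_ids_from_paths() {
  grep -o 'application_[0-9][0-9]*_[0-9][0-9]*' | sort -u
}

# Usage against a live cluster:
#   hadoop fs -find / -name "application_*" | app_ids_from_paths
```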

Modifying Permissions

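Changing permissions requires owning the path or being the HDFS superuser, which is the user that runs the NameNode (hadoop here). Running the command as the local root user therefore fails:

hadoop fs -chmod -R 777 /
# Result: Permission denied. Switch to the hadoop user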

After switching to the hadoop user, the command succeeds:

su - hadoop
hadoop fs -chmod -R 777 /

Directory and File Counting

The count command provides metrics about directories, files, and their sizes:

hdfs dfs -count [-q] [-h] <paths>
  • -q: Displays quota information
  • -h: Shows sizes in human-readable format

Output columns without -q:

DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME

Output columns with -q:

QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
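To read these columns without counting them by eye, each output line can be labeled with a few lines of awk. A sketch under stated assumptions: label_count is a hypothetical helper, it must be given -q when the count command used -q, and it should not be combined with -h, since human-readable sizes (for example "1.5 K") can contain a space and shift the columns.

```shell
# Hypothetical helper: label the columns of hdfs dfs -count output.
# Pass -q when the count command used -q. Do not use with -h: human-readable
# sizes such as "1.5 K" can contain a space and shift the columns.
label_count() {
  if [ "$1" = "-q" ]; then _start=5; else _start=1; fi   # -q adds 4 quota columns
  awk -v s="$_start" '{
    # PATHNAME is the last column; re-join it in case the path contains spaces
    path = $(s + 3)
    for (i = s + 4; i <= NF; i++) path = path " " $i
    line = "dirs=" $s " files=" $(s + 1) " bytes=" $(s + 2) " path=" path
    if (s == 5)
      line = "quota=" $1 " rem_quota=" $2 " space_quota=" $3 " rem_space_quota=" $4 " " line
    print line
  }'
}

# Usage:
#   hdfs dfs -count /user/hadoop | label_count
#   hdfs dfs -count -q /user/hadoop | label_count -q
```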

Bypassing Trash During Deletion

When removing the YARN staging directory on COS fails with a symbolic link error:

hadoop fs -rm -r cosn://{bucket}/emr/hadoop-yarn/staging/hadoop/.staging -skipTrash
# Error: Symbolic link does not exist

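Put -skipTrash before the path instead. The shell stops reading options at the first path argument, so a trailing -skipTrash is not treated as a flag; placed first, it deletes the directory immediately rather than moving it to the trash: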

hadoop fs -rm -r -skipTrash cosn://{bucket}/emr/hadoop-yarn/staging/hadoop/.staging

Performance Evaluation

I/O Benchmark Testing

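Run a write benchmark with TestDFSIO, which ships in the MapReduce job-client tests jar (typically under share/hadoop/mapreduce in the Hadoop installation):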

hadoop jar hadoop-mapreduce-client-jobclient-3.2.2-tests.jar TestDFSIO -write -nrFiles 10 -size 4GB -bufferSize 8388608 -resFile ./TestDFSIOwrite.log

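Key parameters:

  • -nrFiles: Number of files to write or read; each file is handled by one map task
  • -size: Size of each file (accepts units such as MB and GB)
  • -bufferSize: I/O buffer size in bytes (default 1000000, roughly 1 MB)
  • -resFile: Local file the result summary is written to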
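Because every file has the same size, the volume a run touches is nrFiles × size. A small sketch of that arithmetic (the function names are hypothetical): with -nrFiles 10 -size 4GB the run processes 10 × 4 × 1024 = 40960 MB, which matches the Total MBytes line in the results below, and -bufferSize 8388608 works out to 8 MiB.

```shell
# Hypothetical helpers for TestDFSIO sizing arithmetic (integer math only).

# Total data volume in MB for a run: nrFiles x per-file size in GB.
dfsio_total_mb() {
  echo $(( $1 * $2 * 1024 ))
}

# Convert a -bufferSize value in bytes to whole MiB.
bytes_to_mib() {
  echo $(( $1 / 1048576 ))
}
```

By the same arithmetic, the 40-file runs in the storage tests later on each move 163840 MB (160 GB).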

Handling Dependencies

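If a ClassNotFoundException occurs, the tests jar is likely missing its JUnit dependency; copy the JUnit jar into Hadoop's common library directory: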

scp junit-4.11.jar /usr/local/service/hadoop/share/hadoop/common/lib/

Analyzing Results

Sample output from the write run above (10 files of 4GB):

----- TestDFSIO ----- : write
            Date & time: Thu Apr 18 20:50:27 CST 2024
        Number of files: 10
 Total MBytes processed: 40960
      Throughput mb/sec: 122.97
 Average IO rate mb/sec: 123.25
  IO rate std deviation: 5.85
     Test exec time sec: 53.6

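Throughput mb/sec divides the total data by the sum of the individual task times, so it describes the rate of a single map task rather than of the whole cluster. Average IO rate is the mean of the per-file rates, and the standard deviation shows how evenly the tasks performed. For a rough cluster-wide figure, divide the total volume by the wall-clock run time: 40960 MB / 53.6 s ≈ 764 MB/s. Because that time includes job startup and scheduling, the figure understates the raw parallel I/O rate.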
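Cluster-level estimates can be pulled straight from the -resFile output. A minimal sketch, assuming the "Key: value" layout shown above; dfsio_summary is a hypothetical name, and if the file holds several runs the last one's values win. The first figure is total MB over wall-clock seconds; the second multiplies the per-task Throughput by the number of files, which only approximates the cluster rate if every map task ran at the same time.

```shell
# Hypothetical helper: summarize a TestDFSIO result file (the -resFile output).
# Assumes lines of the form "   Key: value" as printed by TestDFSIO.
dfsio_summary() {
  awk -F': ' '
    { key = $1; sub(/^[ \t]+/, "", key) }       # strip the alignment padding
    key == "Number of files"        { files = $2 + 0 }
    key == "Total MBytes processed" { mb    = $2 + 0 }
    key == "Throughput mb/sec"      { tput  = $2 + 0 }
    key == "Test exec time sec"     { secs  = $2 + 0 }
    END {
      # Wall-clock aggregate: includes job startup and scheduling overhead
      if (secs > 0) printf "aggregate_mb_s=%.1f\n", mb / secs
      # Upper estimate: assumes every map task ran concurrently
      printf "concurrent_upper_mb_s=%.1f\n", files * tput
    }' "$@"
}
```

Usage: dfsio_summary ./TestDFSIOwrite.log. For the sample above it prints aggregate_mb_s=764.2 and concurrent_upper_mb_s=1229.7; a wide gap between the two suggests time lost to job overhead or to tasks that did not run in parallel.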

Benchmark Data Locations

TestDFSIO keeps its working data (control files, generated files and per-test results) under /benchmarks/TestDFSIO:

hadoop fs -ls /benchmarks/TestDFSIO/

The raw reducer output behind the write summary is stored in:

hadoop fs -cat /benchmarks/TestDFSIO/io_write/part-00000
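The part file holds the reducer's accumulated totals. Assuming TestDFSIO's analysis step computes Throughput mb/sec as size × 1000 / (time × 1048576), with size the total bytes and time the summed task time in milliseconds, the figure can be recomputed from this file. A sketch, assuming tab-separated key/value lines whose keys end in :size and :time; dfsio_throughput is a hypothetical name:

```shell
# Hypothetical helper: recompute "Throughput mb/sec" from the reducer output.
# Assumes TestDFSIO's accumulated keys: *:size = total bytes,
# *:time = sum of per-task times in milliseconds.
dfsio_throughput() {
  awk -F'\t' '
    $1 ~ /:size$/ { size = $2 + 0 }
    $1 ~ /:time$/ { time = $2 + 0 }
    END {
      if (time > 0) printf "%.2f\n", size * 1000 / (time * 1048576)
    }' "$@"
}

# Usage:
#   hadoop fs -cat /benchmarks/TestDFSIO/io_write/part-00000 | dfsio_throughput
```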

Practical Test Scenarios

Bandwidth tests against Tencent Cloud COS, through both the COSN connector and the S3A connector:

# COSN write test
hadoop jar hadoop-mapreduce-client-jobclient-3.2.2-tests.jar TestDFSIO -Dfs.defaultFS=cosn://{bucket} -Dfs.AbstractFileSystem.cosn.impl=org.apache.hadoop.fs.CosN -libjars /data/xxx.jar -write -nrFiles 40 -size 4GB -bufferSize 8388608 -resFile ./50gCosnTestDFSIOwrite.log

# COSN read test
hadoop jar hadoop-mapreduce-client-jobclient-3.2.2-tests.jar TestDFSIO -Dfs.defaultFS=cosn://{bucket} -Dfs.AbstractFileSystem.cosn.impl=org.apache.hadoop.fs.CosN -libjars /data/xxx.jar -read -nrFiles 40 -size 4GB -bufferSize 8388608 -resFile ./50gCosnTestDFSIOread.log

# S3A write test
hadoop jar hadoop-mapreduce-client-jobclient-3.2.2-tests.jar TestDFSIO -Dfs.defaultFS=s3a://{bucket} -Dfs.s3a.endpoint=https://cos.ap-shanghai.myqcloud.com -write -nrFiles 40 -size 4GB -bufferSize 8388608 -resFile ./50gS3ATestDFSIOwrite.log

# S3A read test
hadoop jar hadoop-mapreduce-client-jobclient-3.2.2-tests.jar TestDFSIO -Dfs.defaultFS=s3a://{bucket} -Dfs.s3a.endpoint=https://cos.ap-shanghai.myqcloud.com -read -nrFiles 40 -size 4GB -bufferSize 8388608 -resFile ./50gS3ATestDFSIOread.log
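The four commands differ only in the URI scheme, the connector options and the mode, so they can be generated from one function. A minimal POSIX sh sketch; run_dfsio, TESTS_JAR, BUCKET, COSN_JAR and DRY_RUN are hypothetical names to adapt, and the generic -D/-libjars options stay ahead of the TestDFSIO arguments, as in the commands above.

```shell
# Hypothetical wrapper that rebuilds the COSN/S3A TestDFSIO runs shown above.
# BUCKET and COSN_JAR are placeholders you must set; DRY_RUN=1 prints the
# command instead of running it.
TESTS_JAR=hadoop-mapreduce-client-jobclient-3.2.2-tests.jar

run_dfsio() {
  scheme=$1   # cosn or s3a
  mode=$2     # write or read
  case $mode in
    write|read) ;;
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac

  # Generic Hadoop options (-D, -libjars) must come before the TestDFSIO arguments
  set -- jar "$TESTS_JAR" TestDFSIO "-Dfs.defaultFS=$scheme://$BUCKET"
  case $scheme in
    cosn)
      tag=Cosn
      set -- "$@" -Dfs.AbstractFileSystem.cosn.impl=org.apache.hadoop.fs.CosN \
        -libjars "$COSN_JAR" ;;
    s3a)
      tag=S3A
      set -- "$@" -Dfs.s3a.endpoint=https://cos.ap-shanghai.myqcloud.com ;;
    *)
      echo "unknown scheme: $scheme" >&2; return 1 ;;
  esac

  # Same workload as above: 40 files of 4GB with an 8388608-byte buffer
  set -- "$@" "-$mode" -nrFiles 40 -size 4GB -bufferSize 8388608 \
    -resFile "./50g${tag}TestDFSIO${mode}.log"

  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo hadoop "$@"
  else
    hadoop "$@"
  fi
}
```

Run write before read for each scheme (for example run_dfsio cosn write, then run_dfsio cosn read), since the read test reads the files the write test produced.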

Random-read test with the optimized COSN input stream enabled:

hadoop jar hadoop-mapreduce-client-jobclient-3.2.2-tests.jar TestDFSIO -Dfs.defaultFS=cosn://{bucket} -Dfs.AbstractFileSystem.cosn.impl=org.apache.hadoop.fs.CosN -Dfs.cosn.impl=org.apache.hadoop.fs.CosFileSystem -Dfs.cosn.read.inputstream.optimized.enabled=true -libjars /home/hadoop/hadoop-cos-8.3.10.jar,/home/hadoop/cos_api-bundle-5.6.137.2.jar -read -random -nrFiles 40 -size 4GB -bufferSize 8388608 -resFile ./50gCosnOptTestDFSIOreadrandom.log
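Once the COSN, S3A and optimized runs have finished, their result files can be lined up for comparison. A sketch assuming each file holds TestDFSIO's "Key: value" summary (the last run in a file wins); dfsio_compare is a hypothetical name, and it reports the per-task Throughput and Average IO rate, not a cluster-wide rate.

```shell
# Hypothetical helper: one tab-separated line per result file with the
# test type, per-task throughput and average IO rate (both MB/s).
dfsio_compare() {
  for f in "$@"; do
    awk -F': ' -v name="$f" '
      /TestDFSIO/ { type = $2 }                     # header: "----- TestDFSIO ----- : <type>"
      { key = $1; sub(/^[ \t]+/, "", key) }         # strip the alignment padding
      key == "Throughput mb/sec"      { tput = $2 + 0 }
      key == "Average IO rate mb/sec" { rate = $2 + 0 }
      END { printf "%s\t%s\t%.2f\t%.2f\n", name, type, tput, rate }
    ' "$f"
  done
}

# Usage:
#   dfsio_compare ./50gCosnTestDFSIOread.log ./50gS3ATestDFSIOread.log ./50gCosnOptTestDFSIOreadrandom.log
```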