Disk I/O Benchmarking and Performance Metric Interpretation
Prerequisites and I/O Monitoring Tools
Before conducting storage benchmarks, ensure the monitoring utilities are installed. On RPM-based distributions, the required package is typically installed via:
sudo yum install sysstat -y
The iostat utility provides granular visibility into block device operations. To capture extended metrics at five-second intervals for five iterations, execute:
iostat -dx 5 5
Key output fields require precise interpretation:
- rrqm/s & wrqm/s: Rate of merged read/write requests per second.
- r/s & w/s: Completed read/write operations per second to the device.
- rsec/s & wsec/s: Total sectors transferred per second for reads and writes.
- rkB/s & wkB/s: Data throughput in kilobytes per second.
- avgrq-sz: Average request size measured in sectors.
- avgqu-sz: Mean queue length awaiting processing.
- await: Total latency in milliseconds, encompassing both queue time and service execution.
- r_await & w_await: Distinct latency values separated by operation type.
- %util: Percentage of wall-clock time the device spent actively processing requests. Values approaching 100% indicate saturation.
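The %util field above lends itself to simple scripted triage. A minimal sketch, using awk on an illustrative iostat-style line (the sample values and the 90% threshold are assumptions, not measured data):

```shell
# Flag devices whose %util (the last field of iostat -dx output) exceeds a
# chosen threshold. The sample line is fabricated for illustration.
sample='sda 0.00 12.40 3.10 45.20 124.00 4833.60 205.30 1.85 38.20 12.10 40.00 1.95 94.20'
echo "$sample" | awk -v limit=90 '$NF > limit { print $1 " is near saturation: " $NF "% util" }'
```

In practice the same awk filter can be piped directly from `iostat -dx` output, skipping the header lines first.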
Supplementary utilities like du -sh (directory size aggregation) and df -h (filesystem capacity overview) are essential for contextualizing storage allocation.
Core Storage Performance Dimensions
Storage evaluation revolves around five primary axes:
- Utilization: The ratio of active processing time to total elapsed time.
- Saturation: Queue depth indicating pending requests that exceed immediate processing capacity.
- IOPS: Total completed input/output operations per second.
- Throughput: Volume of data successfully transferred per second.
- Latency: Duration between request submission and completion acknowledgment.
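These axes are not independent: throughput is approximately IOPS multiplied by the average request size. A quick back-of-the-envelope check (the 2000 IOPS and 64 kB figures below are illustrative assumptions):

```shell
# throughput (kB/s) ≈ IOPS × average request size (kB)
# Example: a device sustaining 2000 IOPS at an average 64 kB request size.
awk 'BEGIN { iops = 2000; req_kb = 64; printf "%.0f kB/s (%.1f MB/s)\n", iops * req_kb, iops * req_kb / 1024 }'
```

Comparing this derived figure against the measured rkB/s or wkB/s columns helps reveal whether a workload is IOPS-bound (many small requests) or bandwidth-bound (fewer large requests).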
Cache Management Procedures
Linux leverages available RAM for page caching to accelerate disk operations. To guarantee that benchmark results reflect physical media performance rather than cached memory hits, purge the buffer cache beforehand:
echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null
The numeric parameter controls the scope:
- 1: Clears the pagecache.
- 2: Removes dentries and inodes.
- 3: Eliminates pagecache, dentries, and inodes simultaneously.

Following execution, free memory will increase while the cache and buff values drop significantly.
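The before/after effect can be inspected with `free -m`. A minimal sketch using two fabricated snapshots (the numbers are assumptions, chosen only to show the expected direction of change; column 6 is buff/cache in the modern free(1) layout):

```shell
# Illustrative `free -m` "Mem:" rows captured before and after dropping caches.
before='Mem: 7821 512 6100 12 1209 7050'
after='Mem: 7821 510 7180 12 131 7120'
echo "$before" | awk '{ print "buff/cache before drop: " $6 " MB" }'
echo "$after"  | awk '{ print "buff/cache after drop:  " $6 " MB" }'
```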
Sequential Write Throughput Benchmark
- Initialize a clean state by flushing caches using the procedure above.
- Generate synthetic write traffic using a block copy utility. The following command creates a 400 MB payload written in 40 MB chunks:
time dd if=/dev/zero of=./io_write_benchmark bs=40M count=10 conv=fdatasync status=progress
- if=/dev/zero: Supplies a continuous stream of null bytes without incurring source disk reads.
- of=./io_write_benchmark: Specifies the target path for the generated data.
- bs=40M count=10: Configures the transfer size and iteration count.
- conv=fdatasync: Forces physical write completion before returning, ensuring accurate timing.
- In a parallel terminal, observe memory state transitions with vmstat 2. Expect a rise in cache and bo (blocks out), while free memory declines.
- Correlate with iostat -dx 2. The wkB/s column will spike, and w_await will reflect the underlying storage's commit speed.
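The write benchmark above can be wrapped in a small script for repeatable runs. A scaled-down sketch (4 MB instead of 400 MB; the temporary file path and sizes are arbitrary choices, and the cache-drop step is omitted since it requires root):

```shell
# Scaled-down write benchmark against a temporary file.
target=$(mktemp /tmp/io_write_benchmark.XXXXXX)
dd if=/dev/zero of="$target" bs=1M count=4 conv=fdatasync 2>/dev/null
# Confirm the expected number of bytes actually reached the file.
wc -c < "$target"
rm -f "$target"
```

For a real measurement, restore the 40M block size and prefix the dd invocation with time, as shown above.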
Sequential Read Throughput Benchmark
- Purge system caches again to prevent reading from RAM.
- Stream data directly from the physical block device into a null sink. This bypasses filesystem overhead and measures raw device read capability:
time dd if=/dev/sda1 of=/dev/null bs=64M count=10 status=progress
if=/dev/sda1: Targets the specific partition for direct sector access.of=/dev/null: Discards output instantly, isolating read performance.
- Monitor with vmstat 2. Look for elevated bi (blocks in) and stable cache values.
- Verify via iostat -dx 2. High rkB/s values should appear alongside moderate r_await figures.
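When direct partition access is unavailable (no root privileges, or /dev/sda1 does not exist on the system), the same pattern can be exercised against a regular file. A scaled-down sketch under that assumption:

```shell
# Scaled-down read pass: stream a regular file into /dev/null instead of a
# raw partition. The temporary file stands in for real on-disk data.
src=$(mktemp /tmp/io_read_benchmark.XXXXXX)
dd if=/dev/zero of="$src" bs=1M count=4 conv=fdatasync 2>/dev/null   # create test data
dd if="$src" of=/dev/null bs=1M 2>/dev/null && echo "read completed"
rm -f "$src"
```

Note that a freshly written file will largely be served from the page cache, so caches must still be dropped between the write and read passes for the timing to reflect physical media.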
Memory Bandwidth Baseline and Comparative Analysis
To establish a reference for system memory transfer rates, execute a pure RAM-to-RAM copy operation:
time dd if=/dev/zero of=/dev/null bs=1G count=5 status=progress
Since both the source and destination are virtual memory constructs, storage controllers remain idle. The measured throughput reflects main memory bandwidth exclusively.
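Converting a timed dd run into a throughput figure is simple arithmetic: bytes transferred divided by elapsed seconds. A sketch with illustrative numbers (the 0.42 s elapsed time below is an assumption, not a measured result):

```shell
# 5 GB copied in a hypothetical 0.42 s of elapsed time.
awk 'BEGIN { bytes = 5 * 1024 * 1024 * 1024; secs = 0.42; printf "%.1f GB/s\n", bytes / secs / (1024^3) }'
```

Modern GNU dd also prints its own throughput summary on completion, which should agree with this manual calculation.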
Comparative baselines typically show disk subsystems operating between 200 MB/s and 1.5 GB/s depending on media type, whereas DDR memory consistently achieves multi-gigabyte per second rates. During write-heavy workloads, wkB/s and bo dominate while cache utilization grows. Conversely, read-heavy patterns elevate rkB/s and bi, with less aggressive cache expansion. Monitoring these divergent states accurately identifies whether a bottleneck originates from storage latency, queue saturation, or insufficient bandwidth.