Advanced Text Processing with AWK and Sed Commands in Linux
AWK and Sed are powerful command-line utilities for manipulating and analyzing text data in files or streams. This guide covers their core functionalities with practical examples.
AWK Command Overview
AWK is a pattern scanning and processing language designed for text extraction and reporting. It processes input one record at a time (by default, one line per record) and splits each record into fields that can be addressed individually.
Basic AWK Syntax
The general structure of an AWK command is:
awk [options] 'pattern { action }' input_file
Common options include -F to define the field separator. The pattern specifies conditions for line selection, and the action defines operations to perform on matched lines.
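As a minimal sketch of how the pattern and action parts interact: either part may be omitted. A pattern with no action prints the matching lines, and an action with no pattern runs on every line. The three-line input and the variable names below are made up for illustration.

```shell
# Hypothetical input: a name and a number on each line.
input=$(printf '%s\n' 'alpha 1' 'beta 2' 'gamma 3')

# Pattern only: the default action prints the whole matching line.
pattern_only=$(printf '%s\n' "$input" | awk '/beta/')

# Action only: runs on every line; fields are split on whitespace by default.
action_only=$(printf '%s\n' "$input" | awk '{ print $2 }')

# Pattern and action together: print the name where the number exceeds 1.
both=$(printf '%s\n' "$input" | awk '$2 > 1 { print $1 }')

printf '%s\n' "$pattern_only" "$action_only" "$both"
```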
Extracting Specific Columns
To retrieve a particular column from a file, use the field separator and print the desired field number. For instance, to extract usernames from /etc/passwd:
awk -F ':' '{ print $1 }' /etc/passwd
Here, -F ':' sets the colon as the delimiter, and $1 outputs the first field (username).
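The same approach extends to several columns at once. The sketch below assumes a small temporary file in /etc/passwd format (the accounts are invented so the output is predictable) and prints the username together with the login shell.

```shell
# Hypothetical sample data in /etc/passwd format (not real accounts).
sample=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' \
  'alice:x:1000:1000:Alice:/home/alice:/bin/bash' > "$sample"

# Print field 1 (username) and field 7 (login shell).
# The comma in print separates the values with a single space by default.
columns=$(awk -F ':' '{ print $1, $7 }' "$sample")
printf '%s\n' "$columns"

rm -f "$sample"
```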
Counting Lines in a File
AWK can tally the total number of records using the NR variable, which holds the current record number. To count lines:
awk 'END { print NR }' /etc/passwd
The END block executes after all lines have been processed, at which point NR holds the total number of records.
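Counting only the lines that satisfy a condition follows the same idea, with a counter variable in place of NR. This is a sketch using invented sample data; the variable name n is arbitrary.

```shell
# Hypothetical sample data in /etc/passwd format (not real accounts).
sample=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' \
  'alice:x:1000:1000:Alice:/home/alice:/bin/bash' > "$sample"

# Total number of records in the file.
total=$(awk 'END { print NR }' "$sample")

# Number of records whose seventh field is /bin/bash.
# "n + 0" forces a numeric 0 when no line matched and n was never set.
bash_users=$(awk -F ':' '$7 == "/bin/bash" { n++ } END { print n + 0 }' "$sample")

printf 'total=%s bash_users=%s\n' "$total" "$bash_users"

rm -f "$sample"
```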
Filtering Lines Based on Patterns
Lines can be filtered by matching specific patterns. For example, to display lines containing "root":
awk '/root/ { print }' /etc/passwd
The pattern /root/ selects every line in which "root" appears anywhere, including in fields such as the home directory, and print with no arguments outputs the whole line.
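Because a regular expression pattern matches anywhere in the line, it can select more than intended. A sketch of the alternative, assuming invented sample data, is to test a specific field instead:

```shell
# Hypothetical sample data: "operator" has /root as its home directory.
sample=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'operator:x:11:0:operator:/root:/sbin/nologin' \
  'alice:x:1000:1000:Alice:/home/alice:/bin/bash' > "$sample"

# Whole-line match: counts both the root and the operator entries.
root_lines=$(awk '/root/ { n++ } END { print n + 0 }' "$sample")

# Field match: only the entry whose username is exactly "root".
exact=$(awk -F ':' '$1 == "root" { print $1 }' "$sample")

# Numeric comparison on the user ID field.
regular=$(awk -F ':' '$3 >= 1000 { print $1 }' "$sample")

printf '%s %s %s\n' "$root_lines" "$exact" "$regular"

rm -f "$sample"
```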
Formatted Output with AWK
AWK supports formatted printing using printf for precise control. To output usernames and user IDs from /etc/passwd:
awk -F ':' '{ printf "Username: %s\tID: %s\n", $1, $3 }' /etc/passwd
This uses %s for string placeholders, \t for tabs, and \n for newlines. Unlike print, printf does not append a newline automatically, so \n must be written explicitly.
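Field widths in the format string give aligned columns. The sketch below assumes invented sample data; %-10s left-justifies a string in 10 characters and %5d right-justifies an integer in 5.

```shell
# Hypothetical sample data in /etc/passwd format (not real accounts).
sample=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'alice:x:1000:1000:Alice:/home/alice:/bin/bash' > "$sample"

# Username left-aligned in 10 columns, user ID right-aligned in 5 columns.
table=$(awk -F ':' '{ printf "%-10s %5d\n", $1, $3 }' "$sample")
printf '%s\n' "$table"

rm -f "$sample"
```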
Sed Command Overview
Sed is a stream editor that performs text transformations on input streams. It processes text line-by-line and is ideal for batch editing tasks.
Basic Sed Syntax
The standard Sed command format is:
sed [options] 'command' file_name
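Each command is applied to every input line unless an address (a line number or pattern) restricts it. Options such as -e allow several commands to be supplied in one invocation.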
Replacing Text Strings
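Sed can substitute every occurrence of a string. To replace "old" with "new" throughout the contents of data.txt:
sed 's/old/new/g' data.txt
The s command performs substitution, and the g flag ensures all occurrences on a line are changed; without it, only the first occurrence on each line is replaced. The result is written to standard output, and data.txt itself is left unchanged.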
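A short sketch of how the g flag changes the result, using a made-up input line. It also shows that any character can follow s as the delimiter, which avoids escaping slashes in paths.

```shell
# Hypothetical input line containing "old" three times.
line='old path: /usr/old/bin, old value'

# Without g: only the first occurrence on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/old/new/')

# With g: every occurrence on the line is replaced.
all=$(printf '%s\n' "$line" | sed 's/old/new/g')

# Using | as the delimiter so the slashes in the path need no escaping.
alt=$(printf '%s\n' '/usr/local/bin' | sed 's|/usr/local|/opt|')

printf '%s\n' "$first_only" "$all" "$alt"
```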
Inserting New Lines
Text can be inserted at specified line positions. To add a line before line 2:
sed '2i Inserted text line.' data.txt
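The i command inserts the following text before the given line number. This one-line form is a GNU sed extension; POSIX sed expects a backslash and a newline between i and the text.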
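For scripts that may run on a non-GNU sed, the backslash-newline form is the more portable way to write the same insertion. The sketch below assumes a three-line temporary file and also shows the companion a command, which appends after the addressed line instead of inserting before it.

```shell
# Hypothetical three-line input file.
data=$(mktemp)
printf '%s\n' 'line 1' 'line 2' 'line 3' > "$data"

# Insert before line 2: "i", a backslash, a newline, then the text.
inserted=$(sed '2i\
Inserted text line.' "$data")

# Append after line 2 using the "a" command in the same form.
appended=$(sed '2a\
Appended text line.' "$data")

printf '%s\n---\n%s\n' "$inserted" "$appended"

rm -f "$data"
```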
Deleting Specific Lines
Lines matching a pattern can be removed. To delete lines containing "old":
sed '/old/d' data.txt
The d command deletes lines that match the pattern /old/.
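The d command accepts any address, not only a text pattern. As a sketch with a made-up five-line file, the following deletes by pattern, removes blank lines, and removes a range of line numbers:

```shell
# Hypothetical input with a comment, a blank line, and an "old" entry.
data=$(mktemp)
printf '%s\n' '# comment' 'keep 1' '' 'old entry' 'keep 2' > "$data"

# Delete lines containing "old".
no_old=$(sed '/old/d' "$data")

# Delete empty lines (^$ matches a line with no characters).
no_blank=$(sed '/^$/d' "$data")

# Delete lines 2 through 3 by number.
no_range=$(sed '2,3d' "$data")

printf '%s\n---\n%s\n---\n%s\n' "$no_old" "$no_blank" "$no_range"

rm -f "$data"
```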
Executing Multiple Commands
Multiple editing operations can be combined in a single Sed invocation. For example, to perform two substitutions:
sed -e 's/old/new/' -e 's/foo/bar/' data.txt
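Each -e flag supplies one command, and the commands are applied to every line in the order given. Because neither substitution here uses the g flag, each one changes only the first match on a line.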
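Since later commands see the output of earlier ones, the order can change the result. The sketch below uses made-up input to show this, and also shows the semicolon-separated form, which common sed implementations accept for simple commands such as s.

```shell
# The second command operates on the result of the first:
# "old and new" becomes "new and new", then the first "new" becomes "newer".
chained=$(printf '%s\n' 'old and new' | sed -e 's/old/new/' -e 's/new/newer/')

# Reversed order gives a different result:
# "new" becomes "newer" first, then "old" becomes "new".
reversed=$(printf '%s\n' 'old and new' | sed -e 's/new/newer/' -e 's/old/new/')

# Two commands in one script, separated by a semicolon.
combined=$(printf '%s\n' 'old foo' | sed 's/old/new/; s/foo/bar/')

printf '%s\n' "$chained" "$reversed" "$combined"
```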
These tools are essential for efficient text processing in shell scripting and data manipulation tasks.