Advanced AWK Text Manipulation Techniques
AWK Syntax and Fundamentals
AWK functions as a pattern scanning and processing language. It interprets data as a series of records (lines) and fields (columns). The default separator is whitespace. The general execution flow follows awk 'pattern { action }' filename. If a pattern evaluates to true, the associated action is executed on that record.
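The pattern and the action are each optional, which is easiest to see side by side. The following is a minimal sketch on made-up sample data (the names and values are purely illustrative):

```shell
# Hypothetical sample input: name, department, age
data='alice eng 34
bob ops 28
carol eng 41'

# Pattern only: the default action prints the matching record.
pattern_only=$(printf '%s\n' "$data" | awk '$2 == "eng"')

# Action only: with no pattern, the action runs on every record.
action_only=$(printf '%s\n' "$data" | awk '{ print $1 }')

# Pattern and action together.
both=$(printf '%s\n' "$data" | awk '$3 > 30 { print $1, $3 }')

printf '%s\n---\n%s\n---\n%s\n' "$pattern_only" "$action_only" "$both"
```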
Field Extraction and Display
Fields are accessed via $1, $2, up to $NF. $0 represents the entire line.
who | awk '{printf "User: %-15s | Location: %s\n", $1, $2}'
This command formats the output of who, displaying the user and terminal with aligned spacing.
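Because NF holds the number of fields, $NF always names the last field and $(NF-1) the one before it, however long the line is. A small sketch on made-up input with a varying field count:

```shell
# Hypothetical input: the two lines have 3 and 5 fields respectively.
fields=$(printf 'a b c\nd e f g h\n' |
  awk '{ print NF, $1, $(NF-1), $NF }')

printf '%s\n' "$fields"
# 3 a b c
# 5 d g h
```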
Formatting Output with printf
While print separates its comma-separated arguments with OFS (a space by default) and appends a newline, printf offers C-style formatting and adds nothing on its own, so the format string must include \n. This is essential for generating tabular reports.
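echo "ID:101 Name:Dev Salary:50000" | awk -F'[ :]' '{printf "ID: %d, Name: %s, Pay: $%d\n", $2, $4, $NF}'
The -F'[ :]' option splits on both spaces and colons, so the values land in $2, $4, and $NF (here the sixth field). With the default whitespace separator, $2 would be "Name:Dev" and %d would print 0. Note that format specifiers like %d for integers and %s for strings must correspond, in order, to the arguments provided.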
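Width and precision modifiers are what make printf useful for aligned reports. A sketch with invented item data (name, quantity, unit price), assuming a locale that uses a period as the decimal point:

```shell
# %-10s left-aligns in 10 columns, %5d right-aligns an integer,
# %8.2f right-aligns a number with two decimal places.
report=$(printf 'widget 4 2.5\ngadget 12 10\n' |
  awk '{ printf "%-10s|%5d|%8.2f\n", $1, $2, $2 * $3 }')

printf '%s\n' "$report"
# widget    |    4|   10.00
# gadget    |   12|  120.00
```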
Regular Expressions and Pattern Matching
AWK filters lines using pattern matching. The ~ operator tests whether a field matches a regular expression, while a bare /regex/ pattern is tested against the entire line ($0).
awk -F: '$1 ~ /^admin/ {print $7}' /etc/passwd
awk '/error|fail/ {print NR": "$0}' system.log
The first example prints the login shell for usernames starting with "admin". The second example prints lines containing "error" or "fail", prefixed by the line number.
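Matches are case-sensitive, and a match can be negated with !~. The sketch below uses a made-up log excerpt; lowercasing the field with tolower is one portable way to ignore case:

```shell
# Hypothetical log lines: id, level, message
log='1 INFO service started
2 Error disk full
3 WARN retrying
4 error timeout'

# Case-insensitive match on the second field only.
errors=$(printf '%s\n' "$log" | awk 'tolower($2) ~ /^error$/ { print $1 }')

# Negated match: every record whose level is not INFO.
not_info=$(printf '%s\n' "$log" | awk '$2 !~ /^INFO$/ { print $1 }')

printf 'errors: %s\nnot info: %s\n' "$(echo $errors)" "$(echo $not_info)"
```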
Built-in Variables
Key variables that control AWK behavior include:
NR: Current line number across all files.
FNR: Current line number in the current file.
NF: Number of fields in the current line.
FS: Input field separator (equivalent to -F).
OFS: Output field separator.
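The difference between NR and FNR only shows up when AWK reads more than one file. A minimal sketch using two throwaway files (the file names are placeholders created just for this demonstration):

```shell
# Create two small temporary files to read in sequence.
tmp=$(mktemp -d)
printf 'a b\nc d e\n' > "$tmp/one.txt"
printf 'f\n' > "$tmp/two.txt"

# NR keeps counting across files; FNR restarts at 1 for each file.
counters=$(awk '{ print NR, FNR, NF }' "$tmp/one.txt" "$tmp/two.txt")
rm -rf "$tmp"

printf '%s\n' "$counters"
# 1 1 2
# 2 2 3
# 3 1 1
```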
Configuring Delimiters
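Separators can be set via the -F option or by assigning the FS variable, typically in a BEGIN block so that it takes effect before the first line is read. To split on any of several characters, list them in square brackets: a separator longer than one character is treated as a regular expression.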
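Two behaviors are worth checking on a small input: a bracket expression splits on any of the listed characters, and the output separator OFS is applied only when AWK rebuilds the line (for example after a field assignment) or when print is given comma-separated arguments. A sketch with made-up input:

```shell
# Any of , ; : acts as a delimiter, giving four fields here.
split_demo=$(printf 'a,b;c:d\n' | awk -F'[,;:]' '{ print NF, $3 }')

# Assigning to a field (even to itself) makes AWK rebuild $0 with OFS.
converted=$(printf 'a:b:c\n' |
  awk 'BEGIN { FS = ":"; OFS = "," } { $1 = $1; print }')

# Without an assignment, $0 is printed exactly as it was read.
unchanged=$(printf 'a:b:c\n' |
  awk 'BEGIN { FS = ":"; OFS = "," } { print }')

printf '%s\n%s\n%s\n' "$split_demo" "$converted" "$unchanged"
# 4 c
# a,b,c
# a:b:c
```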
awk -F'[,;:]' '{print $1, $3}' data.csv
awk 'BEGIN{FS=":"; OFS=" - "} {print $1, $NF}' /etc/passwd
BEGIN and END Blocks
Use BEGIN to initialize variables or print headers before processing starts. Use END to perform calculations or summaries after processing finishes.
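The order of execution can be sketched on inline data. The names and scores below are invented for illustration:

```shell
summary=$(printf 'alice 30\nbob 50\n' | awk '
  BEGIN { print "name score" }          # runs once, before any input is read
  { total += $2; print $1, $2 }         # runs for every line
  END { print "average", total / NR }   # runs once, after the last line
')

printf '%s\n' "$summary"
# name score
# alice 30
# bob 50
# average 40
```

In the END block NR still holds the number of the last line read, which is why it can serve as the divisor here. On empty input NR would be 0, so a real script should guard the division.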
awk 'BEGIN {count=0}
$4 == "Technology" {count++}
END {print "Tech entries: " count}' employees.txt
Relational Operators
AWK supports standard logical and comparison operators to filter data:
>, <, >=, <=
==, !=
&&, ||
!~ (Does not match regex)
awk '$3 > 50 && $5 ~ /North/ {print $0}' regions.txt
Advanced Manipulation Examples
Column Arithmetic
Summing values from the last column of a file.
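The same accumulator pattern extends to averages and maxima. A sketch on made-up figures, where the variable names sum, max, and v are illustrative choices:

```shell
stats=$(printf 'q1 100\nq2 250\nq3 70\n' | awk '
  {
    v = $NF + 0                          # force numeric interpretation
    sum += v
    if (NR == 1 || v > max) max = v      # track the largest value seen
  }
  END { if (NR > 0) print sum, sum / NR, max }
')

printf '%s\n' "$stats"
# 420 140 250
```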
awk '{sum += $NF} END {print "Total Sum: " sum}' financials.txt
Text Substitution
Replacing text using gsub (global substitution) or sub (single substitution).
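Both functions modify $0 unless a third argument names another target, and both return the number of replacements made. A sketch on a made-up line with three matches:

```shell
line='foo bar foo baz foo'

# sub replaces only the first match.
first_only=$(printf '%s\n' "$line" | awk '{ sub(/foo/, "X"); print }')

# gsub replaces every match and returns how many it made.
all=$(printf '%s\n' "$line" | awk '{ n = gsub(/foo/, "X"); print n ": " $0 }')

# A third argument restricts the substitution to one field.
field_only=$(printf '%s\n' "$line" | awk '{ gsub(/foo/, "X", $3); print }')

printf '%s\n%s\n%s\n' "$first_only" "$all" "$field_only"
# X bar foo baz foo
# 3: X bar X baz X
# foo bar X baz foo
```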
awk '{gsub(/Windows/, "Linux"); print}' os_list.txt
Excluding Columns
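To omit a column from the output, either set the field to an empty string or loop over only the indices you want. Blanking a field does not renumber the others: the field stays in place as an empty string, so the separators on either side of it remain in the output.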
# Blank the 3rd column (its surrounding separators remain)
awk '{$3=""; print $0}' file.txt
# Loop to print columns 2 through the end, without a trailing space
awk '{for(i=2; i<=NF; i++) printf "%s%s", $i, (i<NF ? " " : ""); print ""}' file.txt
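To drop a column from the middle of a line without leaving a doubled separator, one portable option is to print every field except the unwanted one. In this sketch the variable k (the column to skip) and the sample input are assumptions made for illustration:

```shell
dropped=$(printf 'a b c d\n1 2 3 4\n' | awk -v k=3 '{
  sep = ""
  for (i = 1; i <= NF; i++) {
    if (i == k) continue        # skip the unwanted column
    printf "%s%s", sep, $i      # separator goes before every field but the first
    sep = OFS
  }
  print ""                      # end the output line
}')

printf '%s\n' "$dropped"
# a b d
# 1 2 4
```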