Advanced AWK Text Manipulation Techniques

AWK Syntax and Fundamentals

AWK is a pattern-scanning and text-processing language. It treats input as a sequence of records (lines by default), each split into fields (columns); by default, fields are separated by runs of whitespace. A program has the form awk 'pattern { action }' filename: for every record the pattern matches, the associated action is executed. If the pattern is omitted, the action runs on every record; if the action is omitted, matching records are printed.
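
As a minimal sketch of that flow, the snippet below pushes three made-up records through a single pattern-action pair. The sample data and the shell variable out are assumptions chosen only for illustration.

```shell
# Hypothetical sample input: a name and a score separated by whitespace.
out=$(printf 'alice 42\nbob 17\ncarol 88\n' |
  awk '$2 > 40 { print $1 }')   # pattern: $2 > 40, action: print the first field
echo "$out"
```

Only the records whose second field exceeds 40 reach the action, so bob is skipped.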

Field Extraction and Display

Fields are accessed as $1, $2, and so on up to $NF, the last field (NF holds the number of fields in the current record). $0 represents the entire line.

who | awk '{printf "User: %-15s | Location: %s\n", $1, $2}'

This command formats the output of who, displaying the user and terminal with aligned spacing.
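
Because who output varies between systems, a self-contained sketch with a fixed, made-up record may be easier to follow. It shows $1, $NF, NF, and $0 side by side; the input string is an assumption.

```shell
# Hypothetical record with four whitespace-separated fields.
out=$(echo 'alpha beta gamma delta' |
  awk '{ print "first=" $1, "last=" $NF, "count=" NF; print "whole=" $0 }')
echo "$out"
```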

Formatting Output with printf

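While print separates comma-separated arguments with the output field separator (a space by default) and appends a newline, printf offers C-style formatting and adds no newline unless the format string includes \n. This makes it well suited to tabular reports.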

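echo "ID:101 Name:Dev Salary:50000" | awk -F'[: ]' '{printf "ID: %d, Name: %s, Pay: $%d\n", $2, $4, $NF}'

Here -F'[: ]' splits on both colons and spaces, so $2, $4, and $NF hold 101, Dev, and 50000. With the default whitespace separator, $2 would be the string Name:Dev and %d would print 0. Note that format specifiers like %d for integers and %s for strings must correspond, in order, to the arguments provided.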
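
To illustrate the tabular use case, here is a small sketch with width and precision specifiers. The item names and prices are invented; %-10s left-aligns a string in 10 columns and %8.2f right-aligns a number in 8 columns with two decimals.

```shell
# Hypothetical sample data: item name and price.
out=$(printf 'widget 4.5\ngear 12\n' |
  awk '{ printf "%-10s|%8.2f\n", $1, $2 }')
echo "$out"
```

Both output lines come out 19 characters wide, which is what keeps the columns aligned.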

Regular Expressions and Pattern Matching

AWK filters records using patterns. A bare /regex/ pattern is tested against the whole line, while the ~ operator tests whether a specific field (or any expression) matches a regular expression.

awk -F: '$1 ~ /^admin/ {print $7}' /etc/passwd

awk '/error|fail/ {print NR": "$0}' system.log

The first example prints the login shell for usernames starting with "admin". The second example prints lines containing "error" or "fail", prefixed by the line number.
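
The difference between matching the whole line and matching one field can be sketched as follows. The log lines are invented, and the third field is assumed to hold a severity level.

```shell
# Hypothetical log lines; field 3 is the severity.
out=$(printf '10:01 web error disk full\n10:02 db info error count reset\n10:03 web fail timeout\n' |
  awk '$3 ~ /^(error|fail)$/ { print NR ": " $0 }')
echo "$out"
```

The second line contains the word "error" but is skipped, because only field 3 is tested and the anchors ^ and $ require the field to match in full.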

Built-in Variables

Key variables that control AWK behavior include:

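  • NR: Number of the current record (line), counted across all input files.
  • FNR: Number of the current record within the current file; it resets to 1 for each new file.
  • NF: Number of fields in the current record.
  • FS: Input field separator (the -F option sets it from the command line).
  • OFS: Output field separator, inserted between comma-separated print arguments (a single space by default).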
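
A quick way to see NR and FNR diverge is to read two files in one run. The sketch below uses two throwaway files created with mktemp; their contents are made up.

```shell
# Two temporary files with hypothetical contents.
f1=$(mktemp); f2=$(mktemp)
printf 'a\nb\n' > "$f1"
printf 'c\n'    > "$f2"
# Print the global record number, per-file record number, field count, and line.
out=$(awk '{ print NR, FNR, NF, $0 }' "$f1" "$f2")
rm -f "$f1" "$f2"
echo "$out"
```

On the third record NR is 3 while FNR is back to 1, because that record is the first line of the second file.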

Configuring Delimiters

Separators can be set with the -F option or by assigning the FS variable in a BEGIN block. A separator longer than one character is treated as a regular expression, so a bracket expression such as [,;:] splits on any one of the listed characters.

awk -F'[,;:]' '{print $1, $3}' data.csv

awk 'BEGIN{FS=":"; OFS=" - "} {print $1, $NF}' /etc/passwd
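
One detail worth sketching: OFS is applied between comma-separated print arguments, or when the record is rebuilt after a field assignment. A plain print of an unmodified $0 keeps the original separators. The record below is made up, and $1 = $1 is a common idiom for forcing the rebuild.

```shell
# Hypothetical colon-separated record, re-emitted with commas.
out=$(echo 'root:x:0:0' |
  awk 'BEGIN { FS = ":"; OFS = "," } { $1 = $1; print }')
echo "$out"
```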

BEGIN and END Blocks

Use BEGIN to initialize variables or print headers before processing starts. Use END to perform calculations or summaries after processing finishes.

awk 'BEGIN {count=0} 
     $4 == "Technology" {count++} 
     END {print "Tech entries: " count}' employees.txt
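
The same structure can carry a header and a summary in one pass. This is a minimal sketch over invented name/score pairs; the NR check in END guards against dividing by zero on empty input.

```shell
# Hypothetical input: name and score.
out=$(printf 'Ann 30\nBob 50\n' |
  awk 'BEGIN { print "Name Score" }
       { total += $2; print $1, $2 }
       END { if (NR > 0) print "Average:", total / NR }')
echo "$out"
```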

Relational Operators

AWK supports standard logical and comparison operators to filter data:

  • >, <, >=, <=
  • ==, !=
  • &&, ||
  • ~, !~ (matches / does not match a regex)

awk '$3 > 50 && $5 ~ /North/ {print $0}' regions.txt
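
Since regions.txt is not shown, here is a self-contained sketch that combines a numeric comparison with the negated match operator. The region names and scores are assumptions.

```shell
# Hypothetical records: region name, then a score.
out=$(printf 'North 60\nSouth 40\nNortheast 70\nWest 80\n' |
  awk '$2 >= 60 && $1 !~ /^North/ { print $1 }')
echo "$out"
```

South fails the numeric test, and both North and Northeast are rejected by !~ because they start with "North", leaving only West.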

Advanced Manipulation Examples

Column Arithmetic

Summing values from the last column of a file.

awk '{sum += $NF} END {print "Total Sum: " sum}' financials.txt
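
Arithmetic can also combine columns within a record before accumulating. The sketch below assumes a made-up layout of item, quantity, and unit price, and prints a per-line amount plus a grand total.

```shell
# Hypothetical input: item, quantity, unit price.
out=$(printf 'pen 3 2\nbook 2 10\n' |
  awk '{ line = $2 * $3; sum += line; print $1, line }
       END { print "Total:", sum }')
echo "$out"
```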

Text Substitution

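Replace text with gsub, which substitutes every match, or sub, which substitutes only the first match. Both operate on the whole line ($0) unless a target is given as a third argument.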

awk '{gsub(/Windows/, "Linux"); print}' os_list.txt
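
The difference between the two functions, and the optional third argument, can be sketched on a made-up record. Here sub works on a copy of the line, while gsub is restricted to field 2 and its return value (the number of replacements) is kept in n.

```shell
# Hypothetical input: the same token twice, so both fields start out identical.
out=$(echo 'a-b-c a-b-c' |
  awk '{
    first = $0
    sub(/-/, "+", first)      # sub changes only the first match in the copy
    n = gsub(/-/, "+", $2)    # gsub changes every match in field 2, returns the count
    print first
    print n, $0
  }')
echo "$out"
```

The first output line has a single replacement; the second shows the count 2 and a record in which only field 2 was altered.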

Excluding Columns

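To omit a column from the output, either blank it out or loop over the fields you want to keep. Assigning an empty string blanks the field but keeps its separators, so the output contains two consecutive spaces where the column used to be.

# Blank out the 3rd column (its surrounding separators remain)
awk '{$3=""; print $0}' file.txt

# Loop to print columns 2 through the end, without a trailing space
awk '{for(i=2; i<=NF; i++) printf "%s%s", $i, (i<NF ? " " : ""); print ""}' file.txt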
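
To drop a column in the middle without leaving a gap, one option is to rebuild the line from the remaining fields. This is a sketch on made-up input; the variable names line and sep are arbitrary.

```shell
# Hypothetical record; column 3 is removed entirely.
out=$(echo 'a b c d e' |
  awk '{
    line = ""; sep = ""
    for (i = 1; i <= NF; i++)
      if (i != 3) { line = line sep $i; sep = OFS }   # skip field 3, join the rest with OFS
    print line
  }')
echo "$out"
```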
