Fading Coder



Essential Text Processing Tools in CentOS 7: grep, sed, and awk


Overview of the Three Essential Tools

  • grep: Filters and searches for specific patterns in text.
  • sed: Modifies and replaces content in files, particularly effective for line-based operations.
  • awk: Analyzes and processes file content, especially powerful for column-based (field-based) operations.

Regular Expressions

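Regular expressions are the foundation of pattern matching in all three tools:

  • .: Matches any single character.
  • *: Matches zero or more occurrences of the preceding character.
  • .*: Matches any sequence of characters, including an empty one.
  • {}: Specifies how many times the preceding character may repeat, e.g. {2} or {2,4} (written \{2\} in basic regular expressions).
  • +: Matches one or more occurrences of the preceding character.
  • ?: Matches zero or one occurrence of the preceding character.
  • |: Alternation (logical OR) between patterns.
  • ^ and $: Anchor a match to the start or the end of the line.
  • [...]: Matches any one character from the set, e.g. [0-9].

The {}, +, ? and | operators belong to extended regular expressions, so use them with grep -E (egrep) or sed -r. Plain grep and sed treat them as literal characters unless they are escaped with a backslash (\+, \? and \| are GNU extensions).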
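To see the metacharacters and the basic-versus-extended distinction in action, here is a minimal sketch that runs grep over a few made-up words. It assumes GNU grep, as shipped with CentOS 7.

```shell
# Sample words (made up for illustration), one per line
words='ro
rao
rooot
go
cat'

# '.' matches exactly one character: 'r.o' matches "rao" and the "roo" inside "rooot"
printf '%s\n' "$words" | grep 'r.o'

# In basic regex (plain grep) braces must be escaped...
printf '%s\n' "$words" | grep 'o\{2,\}'
# ...while grep -E (extended regex) takes them unescaped
printf '%s\n' "$words" | grep -E 'o{2,}'

# '+' needs grep -E: one or more 'o' between 'r' and 't'
printf '%s\n' "$words" | grep -E 'ro+t'

# '?' makes the preceding character optional: matches "ro" and "rao"
printf '%s\n' "$words" | grep -E '^ra?o$'

# '|' is alternation: lines that are exactly "go" or "cat"
printf '%s\n' "$words" | grep -E '^(go|cat)$'
```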

grep: Search and Filter

Common Options

  • -c: Count matching lines.
  • -n: Display line numbers.
  • -i: Case-insensitive search.
  • -v: Invert match (show non-matching lines).
  • -r: Recursively search subdirectories.
  • -A2: Show matching lines and the next 2 lines.
  • -B3: Show matching lines and the previous 3 lines.
  • -C1: Show matching lines and one line above and below.
  • -o: Show only the matching part.
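The options can be tried against a small scratch file. This sketch assumes GNU grep; the file name app.log and its contents are hypothetical.

```shell
# Hypothetical log file in a temporary directory, removed when the shell exits
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cat > "$tmp/app.log" <<'EOF'
INFO start
ERROR disk full
retrying
info stop
EOF

grep -c 'ERROR' "$tmp/app.log"        # count of matching lines: 1
grep -in 'info' "$tmp/app.log"        # case-insensitive, with line numbers (1 and 4)
grep -v 'INFO' "$tmp/app.log"         # lines without uppercase "INFO"
grep -A1 'ERROR' "$tmp/app.log"       # the match plus the line after it
grep -o 'disk [a-z]*' "$tmp/app.log"  # only the matched text: "disk full"
```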

Usage Examples

# Search for 'nologin' in /etc/passwd with line numbers
grep -n 'nologin' /etc/passwd

# Filter docker images containing 'months' OR 'GB'
docker images | grep "months\|GB"

# Filter docker images containing 'months' AND 'GB'
docker images | grep "months" | grep "GB"

# Show lines containing 'root' with line numbers
grep -n 'root' /etc/passwd

# Show lines NOT containing 'nologin'
grep -vn 'nologin' /etc/passwd

# Show lines containing digits
grep '[0-9]' /etc/inittab

# Show lines NOT containing digits
grep -v '[0-9]' /etc/inittab

# Show lines starting with '#'
grep '^#' /etc/inittab

# Show lines NOT starting with '#'
grep -v '^#' /etc/inittab

# Show empty lines
grep '^$' /etc/inittab

# Show lines matching 'r.o' (where '.' is any character)
grep 'r.o' /etc/passwd

# Show lines matching 'r*o' (zero or more 'r's followed by 'o'; since zero 'r's is allowed, any line containing 'o' matches)
grep 'r*o' /etc/passwd

# Show all lines (matches any character sequence)
grep '.*' /etc/passwd

# Show lines containing two consecutive 'o' characters (lines with "ooo" match too)
grep 'o\{2\}' /etc/passwd

# Same match with egrep (equivalent to grep -E), where braces need no backslashes
egrep 'o{2}' /etc/passwd

# Match lines containing 'root' OR 'nologin'
egrep 'root|nologin' /etc/passwd

# Count established TCP connections
netstat -an | grep 'ESTABLISHED' | grep 'tcp' | wc -l

# Count established connections on port 5672 (local or remote); -i is pointless for digits,
# and ':5672 ' avoids matching ports such as 56720
netstat -an | grep 'ESTABLISHED' | grep ':5672 ' | wc -l
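Two details from these examples are easy to verify on sample input: o\{2\} matches any line with two consecutive o's (so "fooo" matches as well), and grep -c can replace a trailing wc -l. The netstat-style lines below are made up, and the ':5672 ' filter assumes the port is followed by a space, as in netstat's column layout (GNU grep assumed).

```shell
# 'o\{2\}' only requires two consecutive o's somewhere in the line
printf 'root\nfooo\nfoo bar\nhello\n' | grep 'o\{2\}'     # root, fooo, foo bar

# Requiring a non-'o' (or line edge) on both sides means "exactly two o's in a row"
printf 'root\nfooo\nfoo bar\nhello\n' | grep -E '(^|[^o])o{2}([^o]|$)'   # root, foo bar

# Hypothetical netstat -an lines; grep -c replaces 'grep ... | wc -l'
conns='tcp   0  0 10.0.0.1:5672   10.0.0.2:41000   ESTABLISHED
tcp   0  0 10.0.0.1:22     10.0.0.3:52000   ESTABLISHED
tcp   0  0 10.0.0.1:5672   10.0.0.4:41001   TIME_WAIT'
printf '%s\n' "$conns" | grep 'ESTABLISHED' | grep -c ':5672 '   # 1
```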

sed: Stream Editor for Text Transformation

Common Options

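  • -n: Suppress automatic printing; only lines printed explicitly (for example with p) are shown.
  • -r: Enable extended regular expressions.
  • -e: Add an editing command; repeat it to apply several commands in one run.
  • -i: Edit files in place (-i.bak also keeps a backup copy with the .bak suffix).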

Common Commands

  • p: Print lines.
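  • I: Modifier for case-insensitive matching, as in /miao/Ip or s/a/b/I (GNU sed).
  • d: Delete lines.
  • a: Append text after a line.
  • i: Insert text before a line.
  • s: Substitute text.
  • =: Print the current line number.
  • r: Read a file and append its contents after the current line (e.g. 3r other.txt).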
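A minimal sketch of the most common commands on made-up input, piped in rather than read from a file so nothing is modified. It assumes GNU sed, which provides the I modifier and the one-line form of i used here.

```shell
# Made-up records: id, name, city
input='101 miao beijing
102 xue shanghai
103 Miao guangzhou'

printf '%s\n' "$input" | sed -n '/miao/p'       # p with -n: only "101 miao beijing"
printf '%s\n' "$input" | sed -n '/miao/Ip'      # I modifier: both miao lines
printf '%s\n' "$input" | sed '2d'               # d: everything except line 2
printf '%s\n' "$input" | sed 's/beijing/BJ/'    # s: substitute on matching lines
printf '%s\n' "$input" | sed -n '/xue/='        # =: line number of the match (2)
printf '%s\n' "$input" | sed '1i ID NAME CITY'  # i: new header line before line 1
```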

Usage Examples

Searching Data

# Print lines containing 'miao'
sed -n '/miao/p' data.txt

# Print lines containing 'miao' OR 'xue' (one alternation, so a line containing both prints once)
sed -n -r '/miao|xue/p' data.txt

# Print from the first line containing 'miao' through the next line containing 'xue' (inclusive)
sed -n '/miao/,/xue/p' data.txt

# Print lines 1 to 3
sed -n '1,3p' data.txt

# Print line 3
sed -n '3p' data.txt
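The difference between a range address and two separate p commands shows up as soon as a line contains both words. The sketch below writes a hypothetical data.txt into a temporary directory (GNU sed assumed).

```shell
# Hypothetical data.txt contents, one record per line
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cat > "$tmp/data.txt" <<'EOF'
101 miao beijing
102 zhang tianjin
103 xue shanghai
104 miao xue chengdu
EOF

# Range: lines 1-3 (first 'miao' through the next 'xue'), then line 4 matches
# 'miao' again and opens a new range that runs to the end of the file
sed -n '/miao/,/xue/p' "$tmp/data.txt"

# Two separate p commands print line 4 twice, because it matches both
sed -n '/miao/p;/xue/p' "$tmp/data.txt"

# A single alternation prints each matching line once
sed -n -r '/miao|xue/p' "$tmp/data.txt"

# Numeric range: lines 2 and 3
sed -n '2,3p' "$tmp/data.txt"
```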

Adding Data

# Insert text before line 1
sed -i '1i099 huang beijing' data.txt

# Insert text before line 3
sed -i '3i111 ming shanghai' data.txt

# Append text after the last line
sed -i '$a107 huang beijing' data.txt

# Append text after line 4
sed -i '4a112 shen guangzhou' data.txt

# Insert text before and after lines containing 'hai'
sed -i -e '/hai/ihaiqian' -e '/hai/ahaihou' data.txt

# Append two lines at the end (GNU sed turns \n into a line break; without -i the result is only printed)
sed '$achengdu01\nchengdu02' data.txt
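Since -i rewrites the file, it is worth trying insertions on a scratch copy first. A sketch, assuming GNU sed (for the one-line i/a syntax and \n in appended text); the file contents are made up.

```shell
# Throwaway file so the in-place edits are safe to try
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '101 miao beijing\n102 xue shanghai\n' > "$tmp/data.txt"

sed -i '1i 099 huang beijing' "$tmp/data.txt"   # new first line
sed -i '$a 107 huang beijing' "$tmp/data.txt"   # new last line
# Lines around every line containing 'hai' (inserted text is not re-matched)
sed -i -e '/hai/i haiqian' -e '/hai/a haihou' "$tmp/data.txt"

cat "$tmp/data.txt"

# Preview two appended lines without modifying the file
sed '$a chengdu01\nchengdu02' "$tmp/data.txt" | tail -n 2
```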

Deleting Data

# Delete line 3
sed -i '3d' data.txt

# Delete lines 1 to 3
sed -i '1,3d' data.txt

# Delete lines containing 'haiqian'
sed -i '/haiqian/d' data.txt

# Delete lines 3 and 5
sed -i '3d;5d' data.txt

# Delete empty lines
sed -i '/^$/d' data.txt
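Dropping -i previews the result on standard output before committing to an in-place edit. A sketch on a hypothetical scratch file, assuming GNU sed:

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Line 3 is empty, line 4 holds 'haiqian'
printf 'l1\nl2\n\nhaiqian\nl5\nl6\n' > "$tmp/data.txt"

sed '3d;5d' "$tmp/data.txt"          # preview: drops the empty line 3 and line 5
sed -i '/haiqian/d' "$tmp/data.txt"  # in place: remove the 'haiqian' line
sed -i '/^$/d' "$tmp/data.txt"       # in place: remove empty lines
cat "$tmp/data.txt"                  # l1, l2, l5, l6
```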

Modifying Data

# Replace all occurrences of 'python' with 'java' ('#' serves as the delimiter instead of '/')
sed -i 's#python#java#g' data.txt

# Replace in place, keeping the original as data.txt.bak
sed -i.bak 's#python#java#g' data.txt

# Extract the IPv4 address of eth0 (assumes it is on line 3 of the output; any prefix length)
ip a s eth0 | sed -n '3p' | sed -r 's#.*inet ([0-9.]+)/.*#\1#'

# Batch rename .txt files to .png (breaks on filenames containing spaces)
ls *.txt | sed -r 's#(.*)\.txt$#mv & \1.png#' | bash
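Both modification recipes can be checked without touching a real interface or real files. The ip output below is a canned, made-up sample, and the rename runs inside a temporary directory; GNU sed and bash are assumed.

```shell
# Canned 'ip a s eth0' output (addresses are made up); line 3 holds the IPv4 address
sample='2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.20/24 brd 192.168.1.255 scope global eth0'

# On line 3 only, keep the dotted address before the '/' prefix length
printf '%s\n' "$sample" | sed -n -r '3s#.*inet ([0-9.]+)/.*#\1#p'

# Batch rename inside a scratch directory; stop if cd fails so real files are never touched
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
touch a.txt b.txt
ls *.txt | sed -r 's#(.*)\.txt$#mv & \1.png#'          # preview the generated mv commands
ls *.txt | sed -r 's#(.*)\.txt$#mv & \1.png#' | bash   # execute them
ls
```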

awk: Pattern Scanning and Processing Language

Basic Usage

# Print first field using ':' as delimiter
awk -F ':' '{print $1}' data.txt

# Print entire line
awk -F ':' '{print $0}' data.txt

# Print first three fields (the commas insert the output separator, a space by default)
awk -F ':' '{print $1,$2,$3}' data.txt

# Print first three fields with custom separator
awk -F ':' '{print $1"#"$2"#"$3}' data.txt

# Filter docker images with 'months' OR 'GB'
docker images | awk '/months|GB/{print $1":"$2}'

# Filter docker images with 'months' AND 'GB'
docker images | awk '/months/&&/GB/{print $1":"$2}'
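The comma in print matters: it inserts the output field separator (OFS, a single space by default), while juxtaposed fields are concatenated. A sketch on two made-up passwd-style lines, runnable with any POSIX awk:

```shell
# Two made-up /etc/passwd-style lines
lines='root:x:0:0:root:/root:/bin/bash
miao:x:1000:1000::/home/miao:/sbin/nologin'

printf '%s\n' "$lines" | awk -F ':' '{print $1}'       # root, miao
printf '%s\n' "$lines" | awk -F ':' '{print $1,$3}'    # comma -> OFS: "root 0"
printf '%s\n' "$lines" | awk -F ':' '{print $1 $3}'    # no comma -> concatenated: "root0"
printf '%s\n' "$lines" | awk -F ':' '{print $1"#"$3}'  # explicit separator: "root#0"
```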

Pattern Matching

# Print lines containing 'oo'
awk '/oo/' data.txt

# Print lines where first field contains 'oo'
awk -F ':' '$1~/oo/' data.txt

# Print lines where first field matches regex 'o+'
awk -F ':' '$1~/o+/' data.txt

# Print first field for lines containing 'root' OR 'user'
awk -F ':' '/root|user/{print $1}' data.txt
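Matching the whole line and matching a single field can give different results when the pattern appears elsewhere in the record. The passwd-style lines below are invented for illustration:

```shell
# Made-up records: only 'root' has "oo" in its first field,
# but the ftp line has "oo" in its comment field ("pool")
lines='root:x:0:0:root:/root:/bin/bash
ftp:x:14:50:pool account:/var/ftp:/sbin/nologin
user1:x:1000:1000::/home/user1:/bin/bash'

printf '%s\n' "$lines" | awk '/oo/'                           # root and ftp lines
printf '%s\n' "$lines" | awk -F ':' '$1~/oo/'                 # root line only
printf '%s\n' "$lines" | awk -F ':' '$1!~/oo/{print $1}'      # negated match: ftp, user1
printf '%s\n' "$lines" | awk -F ':' '/root|user/{print $1}'   # root, user1
```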

Numeric Matching

# Print lines where third field equals 1
awk -F ':' '$3==1' data.txt

# Print first field where third field equals 4
awk -F ':' '$3==4 {print $1}' data.txt

Custom Output Field Separator

# Print first three fields with '##' as the output separator
awk -F ':' 'BEGIN{OFS="##"} {print $1,$2,$3}' data.txt

Conditional Statements

# Print first and third fields, separated by '#', where the third field > 7
awk -F ':' 'BEGIN{OFS="#"} {if ($3>7) print $1,$3}' data.txt
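A BEGIN block runs once before any input is read, which makes it the natural place to set OFS; a comparison can also serve directly as the pattern instead of an if statement. A sketch on made-up passwd-style records (POSIX awk assumed):

```shell
# Made-up records: name:password:uid:gid
lines='root:x:0:0
bin:x:1:1
lp:x:4:7
sync:x:5:0
mail:x:8:12'

printf '%s\n' "$lines" | awk -F ':' '$3==4 {print $1}'                        # lp
printf '%s\n' "$lines" | awk -F ':' 'BEGIN{OFS="#"} {if ($3>4) print $1,$3}'  # sync#5, mail#8
# Same filter written as a pattern, with OFS passed via -v
printf '%s\n' "$lines" | awk -F ':' -v OFS='#' '$3>4 {print $1,$3}'
```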

NF (Number of Fields)

# Print last field using '/' as delimiter
awk -F '/' '{print $NF}' data.txt

NR (Record Number)

# Print lines with line numbers
awk -F ':' '{print NR":"$0}' data.txt

# Print first 6 lines
awk -F ':' 'NR<=6' data.txt

# Among the first 9 lines, print those whose first field contains 'root'
awk -F ':' 'NR<=9 && $1~/root/' data.txt

# On line N, print field N and the last field
awk -F ':' '{print $NR":"$NF}' data.txt
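NF and NR combine well with arithmetic: $(NF-1) is the second-to-last field, and comparisons on NR select line intervals. The paths below are just sample input:

```shell
paths='/usr/local/bin
/etc/sysconfig/network-scripts
/var/log/messages'

printf '%s\n' "$paths" | awk -F '/' '{print $NF}'           # bin, network-scripts, messages
printf '%s\n' "$paths" | awk -F '/' '{print NF}'            # 4 each: the leading '/' makes $1 empty
printf '%s\n' "$paths" | awk 'NR>=2 && NR<=3'               # lines 2 and 3 only
printf '%s\n' "$paths" | awk -F '/' '{print NR":"$(NF-1)}'  # 1:local, 2:sysconfig, 3:log
```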

$0 (Entire Line)

# Print lines with line numbers
awk '{print NR":"$0}' data.txt

Assignment

# Set every first field to 'root' (output fields are then separated by spaces)
awk -F ':' '$1="root"' data.txt

# Set every first field to 'root', keeping ':' as the output separator
awk -F ':' 'BEGIN{OFS=":"} $1="root"' data.txt
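Assigning to any field makes awk rebuild $0 using OFS, which is why the separator changes unless OFS is set. A sketch on one made-up passwd line:

```shell
line='bin:x:1:1:bin:/bin:/sbin/nologin'

# Rebuilt with the default OFS (a single space)
printf '%s\n' "$line" | awk -F ':' '{$1="root"; print}'
# Rebuilt with ':' because OFS is set before any input is read
printf '%s\n' "$line" | awk -F ':' 'BEGIN{OFS=":"} {$1="root"; print}'
```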

Summation and Counting

# Count empty lines in /etc/services (count+0 prints 0 rather than an empty line when nothing matches)
awk '/^$/{count=count+1}END{print count+0}' /etc/services

# Count users whose shell ends in 'bash'
awk '/bash$/{count=count+1}END{print count+0}' /etc/passwd

# Count users whose shell does not end in 'bash'
awk '!/bash$/{count=count+1}END{print count+0}' /etc/passwd

# Count TCP connections by state
netstat -an | awk '/^tcp/{++state[$NF]}END{for (s in state) print s, state[s]}'
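The associative-array counter can be tested on canned netstat -an output with made-up addresses. Because the iteration order of for (s in state) is unspecified, the sketch pipes the result through sort:

```shell
# Canned 'netstat -an' lines (addresses made up); the state is the last field
sample='tcp        0      0 0.0.0.0:22          0.0.0.0:*           LISTEN
tcp        0      0 10.0.0.1:22         10.0.0.9:50000      ESTABLISHED
tcp        0      0 10.0.0.1:5672       10.0.0.8:41000      ESTABLISHED
tcp6       0      0 :::80               :::*                LISTEN
udp        0      0 0.0.0.0:68          0.0.0.0:*'

# /^tcp/ also matches tcp6 lines; ++state[$NF] creates each key on first use
printf '%s\n' "$sample" | awk '/^tcp/{++state[$NF]} END{for (s in state) print s, state[s]}' | sort
# -> ESTABLISHED 2
#    LISTEN 2
```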
