Essential Text Processing Tools in CentOS 7: grep, sed, and awk
Overview of the Three Essential Tools
- grep: Filters and searches for specific patterns in text.
- sed: Modifies and replaces content in files, particularly effective for line-based operations.
- awk: Analyzes and processes file content, especially powerful for column-based operations.
Regular Expressions
Regular expressions are fundamental for pattern matching with these tools:
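- .: Matches any single character.
- *: Matches zero or more occurrences of the preceding character.
- .*: Matches any sequence of characters, including an empty one.
- {n,m}: Matches the preceding character between n and m times; {n} means exactly n consecutive times.
- +: Matches one or more occurrences of the preceding character.
- ?: Matches zero or one occurrence of the preceding character.
- |: Represents logical OR between two patterns.
In basic regular expressions (the default for GNU grep and sed), {}, +, ?, and | must be written with a backslash (\{\}, \+, \?, \|). With grep -E (egrep), sed -r, and awk they work unescaped.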
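A quick way to check these operators is to run them against a handful of words. The sketch below assumes a made-up word list and a helper function named match (both are illustrative only), and uses grep -E so the quantifiers need no backslashes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the sample words that match an extended regex,
# joined on one line. The word list is made up for illustration.
match() {
    printf '%s\n' rt rot root rooot boot | grep -E "$1" | paste -sd ' ' -
}

match '^r.t$'        # .   : exactly one character between r and t -> rot
match '^ro*t$'       # *   : zero or more o  -> rt rot root rooot
match '^ro+t$'       # +   : one or more o   -> rot root rooot
match '^ro?t$'       # ?   : zero or one o   -> rt rot
match '^ro{2}t$'     # {2} : exactly two consecutive o -> root
match '^(rt|boot)$'  # |   : either alternative -> rt boot
```

The ^ and $ anchors matter here: without them, 'o{2}' would also match rooot, because the pattern only has to match somewhere in the line.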
grep: Search and Filter
Common Options
- -c: Count matching lines.
- -n: Display line numbers.
- -i: Case-insensitive search.
- -v: Invert match (show non-matching lines).
- -r: Recursively search subdirectories.
- -A2: Show matching lines and the next 2 lines.
- -B3: Show matching lines and the previous 3 lines.
- -C1: Show matching lines and one line above and below.
- -o: Show only the matching part.
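The options above can be tried safely on a throwaway file rather than on system files. This is a minimal sketch, assuming a hypothetical four-line passwd-style sample; the variable names are illustrative.

```shell
#!/usr/bin/env bash
# Build a small passwd-style sample file (contents are hypothetical).
sample=$(mktemp)
cat > "$sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
alice:x:1000:1000::/home/alice:/bin/bash
EOF

count=$(grep -c 'nologin' "$sample")      # -c: number of matching lines
numbered=$(grep -n '^alice' "$sample")    # -n: prefix the match with its line number
inverted=$(grep -vc 'nologin' "$sample")  # -v: count lines that do NOT match
nocase=$(grep -ic 'ROOT' "$sample")       # -i: ignore case
only=$(grep -o '/home/[a-z]*' "$sample")  # -o: print just the matched text
context=$(grep -A1 '^bin:' "$sample")     # -A1: the match plus one line after it

printf '%s\n' "$count" "$numbered" "$inverted" "$nocase" "$only" "$context"
rm -f "$sample"
```

Options combine freely, as in -vc and -ic above, so a count of non-matching lines needs no extra wc -l stage.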
Usage Examples
# Search for 'nologin' in /etc/passwd with line numbers
grep -n 'nologin' /etc/passwd
# Filter docker images containing 'months' OR 'GB'
docker images | grep "months\|GB"
# Filter docker images containing 'months' AND 'GB'
docker images | grep "months" | grep "GB"
# Show lines containing 'root' with line numbers
grep -n 'root' /etc/passwd
# Show lines NOT containing 'nologin'
grep -vn 'nologin' /etc/passwd
# Show lines containing digits
grep '[0-9]' /etc/inittab
# Show lines NOT containing digits
grep -v '[0-9]' /etc/inittab
# Show lines starting with '#'
grep '^#' /etc/inittab
# Show lines NOT starting with '#'
grep -v '^#' /etc/inittab
# Show empty lines
grep '^$' /etc/inittab
# Show lines matching 'r.o' (where '.' is any character)
grep 'r.o' /etc/passwd
# Show lines matching 'r*o' (zero or more 'r's followed by 'o', so any line containing 'o' matches)
grep 'r*o' /etc/passwd
# Show all lines (matches any character sequence)
grep '.*' /etc/passwd
# Show lines containing two consecutive 'o's (lines with 'ooo' also match)
grep 'o\{2\}' /etc/passwd
# Same match using egrep for extended regex (equivalent to grep -E)
egrep 'o{2}' /etc/passwd
# Match lines containing 'root' OR 'nologin'
egrep 'root|nologin' /etc/passwd
# Count established TCP connections
netstat -an | grep 'ESTABLISHED' | grep 'tcp' | wc -l
# Count established TCP connections on port 5672 (local or remote); ':5672 ' avoids matching ports such as 15672
netstat -an | grep 'ESTABLISHED' | grep ':5672 ' | wc -l
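The counting pipelines can be checked without a live network by feeding them canned text. The sketch below assumes made-up netstat-style lines (all addresses and ports are hypothetical) and shows why a bare port number is a loose filter.

```shell
#!/usr/bin/env bash
# Hypothetical netstat-style lines; addresses and ports are made up.
sample='tcp        0      0 10.0.0.5:5672       10.0.0.9:51000      ESTABLISHED
tcp        0      0 10.0.0.5:22         10.0.0.7:40022      ESTABLISHED
tcp        0      0 0.0.0.0:5672        0.0.0.0:*           LISTEN
tcp        0      0 10.0.0.5:15672      10.0.0.8:42000      ESTABLISHED
udp        0      0 0.0.0.0:68          0.0.0.0:*'

# grep -c counts matching lines, so no separate 'wc -l' stage is needed.
established=$(printf '%s\n' "$sample" | grep 'ESTABLISHED' | grep -c '^tcp')
# A bare '5672' also matches port 15672.
loose=$(printf '%s\n' "$sample" | grep 'ESTABLISHED' | grep -c '5672')
# Requiring the leading colon and trailing space pins the match to port 5672.
strict=$(printf '%s\n' "$sample" | grep 'ESTABLISHED' | grep -c ':5672 ')

echo "established=$established loose=$loose strict=$strict"
```

On this sample the script prints established=3 loose=2 strict=1.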
sed: Stream Editor for Text Transformation
Common Options
- -n: Suppress automatic printing; only lines printed explicitly (for example with p) are shown.
- -r: Enable extended regular expressions.
- -e: Add an editing command; repeat it to chain several commands.
- -i: Edit files in place.
Common Commands
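- p: Print lines.
- I: Case-insensitive matching (a GNU sed flag added to an address or to s, not a standalone command).
- d: Delete lines.
- a: Append text after a line.
- i: Insert text before a line.
- s: Substitute text.
- =: Print the line number.
- r: Read a file and insert its content.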
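A few of these commands can be compared side by side on a scratch file. This is a small sketch, assuming a hypothetical three-line data file; none of the commands below modify it, since -i is not used.

```shell
#!/usr/bin/env bash
# Hypothetical three-line data file, created in a temporary location.
sample=$(mktemp)
printf '%s\n' '101 miao beijing' '102 xue shanghai' '103 li guangzhou' > "$sample"

printed=$(sed -n '2p' "$sample")                      # p: print only line 2
deleted=$(sed '2d' "$sample")                         # d: everything except line 2
replaced=$(sed -n 's/shanghai/shenzhen/p' "$sample")  # s with the p flag: print changed lines only
lineno=$(sed -n '/guangzhou/=' "$sample")             # =: line number of the matching line

printf '%s\n' "$printed" "$deleted" "$replaced" "$lineno"
rm -f "$sample"
```

Running commands without -i first is a cheap way to preview an edit before applying it to the file.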
Usage Examples
Searching Data
# Print lines containing 'miao'
sed -n '/miao/p' data.txt
# Print lines containing 'miao' OR 'xue' (a line containing both is printed twice)
sed -n '/miao/p;/xue/p' data.txt
# Print lines from 'miao' to 'xue' (inclusive)
sed -n '/miao/,/xue/p' data.txt
# Print lines 1 to 3
sed -n '1,3p' data.txt
# Print line 3
sed -n '3p' data.txt
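Two details of the search commands are easy to miss: a range address prints everything from the first match through the closing match, and two separate p commands print a line once per command that matches it. The sketch below assumes hypothetical sample lines and GNU sed (as shipped with CentOS 7) for the -r option.

```shell
#!/usr/bin/env bash
# Hypothetical sample data; the names and cities are made up.
sample='101 miao beijing
102 li shanghai
103 xue guangzhou
104 wang shenzhen'

# Range address: lines from the first 'miao' through the next 'xue'.
range_lines=$(printf '%s\n' "$sample" | sed -n '/miao/,/xue/p' | grep -c '')

# A line containing both words is printed by each matching p command.
twice=$(printf '105 miao xue\n' | sed -n '/miao/p;/xue/p' | grep -c '')

# One extended-regex alternation prints such a line only once.
once=$(printf '105 miao xue\n' | sed -n -r '/miao|xue/p' | grep -c '')

echo "range_lines=$range_lines twice=$twice once=$once"
```

Here grep -c '' is used only to count output lines; the script prints range_lines=3 twice=2 once=1.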
Adding Data
# Insert text before line 1
sed -i '1i099 huang beijing' data.txt
# Insert text before line 3
sed -i '3i111 ming shanghai' data.txt
# Append text after the last line
sed -i '$a107 huang beijing' data.txt
# Append text after line 4
sed -i '4a112 shen guangzhou' data.txt
# Insert text before and after lines containing 'hai'
sed -i -e '/hai/ihaiqian' -e '/hai/ahaihou' data.txt
# Append multiple lines at the end (GNU sed; without -i the result goes to stdout and data.txt is unchanged)
sed '$achengdu01\nchengdu02' data.txt
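Because -i rewrites the file immediately, it can help to preview an insert or append first and keep a backup when applying it. The following is a sketch under the assumption of GNU sed (one-line a and i commands and -i.bak), working on a hypothetical temporary file rather than a real data.txt.

```shell
#!/usr/bin/env bash
# Hypothetical two-line data file in a temporary location.
f=$(mktemp)
printf '%s\n' '101 miao beijing' '102 xue shanghai' > "$f"

# Preview: without -i the edited text goes to stdout and the file is untouched.
preview=$(sed -e '1i099 huang beijing' -e '$a107 huang beijing' "$f")
before=$(grep -c '' "$f")      # line count of the file after the preview

# Apply: -i.bak edits in place and keeps the original as "$f.bak".
sed -i.bak '$a107 huang beijing' "$f"
after=$(grep -c '' "$f")       # line count after the in-place append
backup=$(grep -c '' "$f.bak")  # line count of the untouched backup

printf '%s\n' "$preview"
echo "before=$before after=$after backup=$backup"
rm -f "$f" "$f.bak"
```

The preview shows four lines, while the counts print as before=2 after=3 backup=2, confirming that only the -i run changed the file.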
Deleting Data
# Delete line 3
sed -i '3d' data.txt
# Delete lines 1 to 3
sed -i '1,3d' data.txt
# Delete lines containing 'haiqian'
sed -i '/haiqian/d' data.txt
# Delete lines 3 and 5
sed -i '3d;5d' data.txt
# Delete empty lines
sed -i '/^$/d' data.txt
Modifying Data
# Replace all occurrences of 'python' with 'java'
sed -i 's#python#java#g' data.txt
# Replace with backup creation
sed -i.bak 's#python#java#g' data.txt
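# Extract the IPv4 address of eth0 (works for any prefix length, not only /24)
ip a s eth0 | sed -n -r 's#.*inet ([0-9.]+)/.*#\1#p'
# Batch rename .txt files to .png (the dot is escaped and the match anchored; breaks on file names containing spaces)
ls *.txt | sed -r 's#(.*)\.txt$#mv & \1.png#' | bash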
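Generating mv commands with sed and piping them into bash splits file names on whitespace. As an alternative technique (a plain shell loop with parameter expansion, not sed), the sketch below renames files in a temporary directory; the file names are hypothetical.

```shell
#!/usr/bin/env bash
# Work in a scratch directory with made-up file names, one containing a space.
workdir=$(mktemp -d)
touch "$workdir/a.txt" "$workdir/my notes.txt" "$workdir/keep.log"

for f in "$workdir"/*.txt; do
    [ -e "$f" ] || continue         # skip the literal pattern if nothing matched
    mv -- "$f" "${f%.txt}.png"      # ${f%.txt} strips the trailing .txt
done

# Collect the resulting names on one line for display.
renamed=$(ls "$workdir" | LC_ALL=C sort | paste -sd ',' -)
echo "$renamed"
rm -rf "$workdir"
```

Quoting "$f" keeps names with spaces intact, and files with other extensions are left alone: the script prints a.png,keep.log,my notes.png.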
awk: Pattern Scanning and Processing Language
Basic Usage
# Print first field using ':' as delimiter
awk -F ':' '{print $1}' data.txt
# Print entire line
awk -F ':' '{print $0}' data.txt
# Print first three fields (input is split on ':'; output is separated by spaces, the default OFS)
awk -F ':' '{print $1,$2,$3}' data.txt
# Print first three fields with custom separator
awk -F ':' '{print $1"#"$2"#"$3}' data.txt
# Filter docker images with 'months' OR 'GB'
docker images | awk '/months|GB/{print $1":"$2}'
# Filter docker images with 'months' AND 'GB'
docker images | awk '/months/&&/GB/{print $1":"$2}'
Pattern Matching
# Print lines containing 'oo'
awk '/oo/' data.txt
# Print lines where first field contains 'oo'
awk -F ':' '$1~/oo/' data.txt
# Print lines where first field matches regex 'o+'
awk -F ':' '$1~/o+/' data.txt
# Print first field for lines containing 'root' OR 'user'
awk -F ':' '/root|user/{print $1}' data.txt
Numeric Matching
# Print lines where third field equals 1
awk -F ':' '$3==1' data.txt
# Print first field where third field equals 4
awk -F ':' '$3==4 {print $1}' data.txt
Custom Output Field Separator
# Print first three fields with '##' separator
awk -F ':' 'BEGIN{OFS="##"}{print $1,$2,$3}' data.txt
Conditional Statements
# Print first and third fields, separated by '#', where third field > 7
awk -F ':' 'BEGIN{OFS="#"}{if ($3>7) print $1,$3}' data.txt
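The output separator and the numeric condition can be tried on a few inline records. This is a minimal sketch with hypothetical passwd-style data; it sets OFS once in a BEGIN block, which runs before any input is read.

```shell
#!/usr/bin/env bash
# Hypothetical colon-separated records (name:password:uid:gid).
sample='root:x:0:0
bin:x:1:1
alice:x:1000:1000
bob:x:8:8'

# The comma in print inserts OFS between the fields.
joined=$(printf '%s\n' "$sample" | awk -F ':' 'BEGIN{OFS="##"} NR==1 {print $1,$2,$3}')

# String concatenation ignores OFS; the separator is written out by hand.
concat=$(printf '%s\n' "$sample" | awk -F ':' 'NR==1 {print $1 "#" $3}')

# Numeric test on the third field inside an if statement.
filtered=$(printf '%s\n' "$sample" | awk -F ':' 'BEGIN{OFS="#"} {if ($3>7) print $1,$3}')

printf '%s\n' "$joined" "$concat" "$filtered"
```

The first two commands print root##x##0 and root#0; the filter keeps alice#1000 and bob#8.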
NF (Number of Fields)
# Print last field using '/' as delimiter
awk -F '/' '{print $NF}' data.txt
NR (Record Number)
# Print lines with line numbers
awk -F ':' '{print NR":"$0}' data.txt
# Print first 6 lines
awk -F ':' 'NR<=6' data.txt
# Print lines <=9 where first field contains 'root'
awk -F ':' 'NR<=9 && $1~/root/' data.txt
# For each line, print the field whose index equals the line number, followed by the last field
awk -F ':' '{print $NR":"$NF}' data.txt
$0 (Entire Line)
# Print lines with line numbers
awk '{print NR":"$0}' data.txt
Assignment
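# Set the first field of every line to 'root' (the line is rebuilt with the default space separator)
awk -F ':' '{$1="root"; print}' data.txt
# Set the first field of every line to 'root', keeping ':' as the separator
awk -F ':' 'BEGIN{OFS=":"}{$1="root"; print}' data.txt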
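Assigning to a field makes awk rebuild the whole line with OFS, which is why the separator changes unless OFS is set. The sketch below uses a single hypothetical record; it also shows that using an assignment as a bare pattern prints the line only when the assigned value is truthy.

```shell
#!/usr/bin/env bash
# One hypothetical colon-separated record.
line='alice:x:1000'

# Default OFS is a space, so the rebuilt line loses its colons.
default_ofs=$(printf '%s\n' "$line" | awk -F ':' '{$1="root"; print}')

# Setting OFS to ':' keeps the original layout.
colon_ofs=$(printf '%s\n' "$line" | awk -F ':' 'BEGIN{OFS=":"} {$1="root"; print}')

# As a bare pattern, the assignment's value decides whether the line prints;
# an empty string is false, so nothing is printed here.
empty_value=$(printf '%s\n' "$line" | awk -F ':' '$1=""')

printf '[%s]\n' "$default_ofs" "$colon_ofs" "$empty_value"
```

The script prints [root x 1000], [root:x:1000], and [] in that order, so an explicit {...; print} action is the more predictable form.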
Summation and Counting
# Count empty lines in /etc/services (count+0 prints 0 instead of a blank line when nothing matches)
awk '/^$/{count++}END{print count+0}' /etc/services
# Count users with shell ending in 'bash'
awk '/bash$/{count++}END{print count+0}' /etc/passwd
# Count users without 'bash' shell
awk '!/bash$/{count++}END{print count+0}' /etc/passwd
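One edge case with END{print count}: if no line matches, count is never set and awk prints an empty line rather than 0. A small sketch with made-up input shows the difference that adding 0 makes.

```shell
#!/usr/bin/env bash
# Input with three empty lines (hypothetical text).
empty=$(printf 'a\n\nb\n\n\n' | awk '/^$/{count++} END{print count+0}')

# Input with no empty lines: count+0 forces a numeric 0.
none=$(printf 'a\nb\n' | awk '/^$/{count++} END{print count+0}')

# Same input without +0: the unset variable prints as an empty string.
bare=$(printf 'a\nb\n' | awk '/^$/{count++} END{print count}')

echo "empty=$empty none=$none bare=[$bare]"
```

The script prints empty=3 none=0 bare=[], which matters when the result feeds a numeric comparison in a shell script.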
# Count TCP connections by state
netstat -an | awk '/^tcp/{++state[$NF]}END{for (s in state) print s, state[s]}'
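The state-counting one-liner relies on an associative array keyed by the last field. The sketch below runs the same awk program on hypothetical netstat-style lines; because the order of for (s in state) is unspecified, the output is piped through sort to make it stable.

```shell
#!/usr/bin/env bash
# Hypothetical netstat-style lines; addresses and ports are made up.
sample='tcp 0 0 10.0.0.5:22 10.0.0.7:40022 ESTABLISHED
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN
tcp 0 0 10.0.0.5:80 10.0.0.9:51000 ESTABLISHED
tcp6 0 0 :::22 :::* LISTEN
tcp 0 0 10.0.0.5:80 10.0.0.8:42000 TIME_WAIT
udp 0 0 0.0.0.0:68 0.0.0.0:*'

# state[$NF] is created on first use and incremented once per tcp/tcp6 line.
by_state=$(printf '%s\n' "$sample" |
    awk '/^tcp/{++state[$NF]} END{for (s in state) print s, state[s]}' |
    LC_ALL=C sort)

printf '%s\n' "$by_state"
```

On this sample the result is ESTABLISHED 2, LISTEN 2, and TIME_WAIT 1, one per line; the udp line is skipped by the /^tcp/ pattern.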