
Essential Linux Command-Line Text Processing Utilities

Tech May 12 15

Core Text Processing Utilities

Much of the work on a Linux system involves manipulating text, and three tools, often called the 'text processing triad', handle most of it: grep, sed, and awk.

grep: Searches text using patterns defined by regular expressions.
sed: A stream editor for filtering and transforming text.
awk: A specialized language designed for text reporting and data formatting.
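As a rough illustration of how the three tools divide the work, the sketch below pipes some made-up sample data (the names and numbers are hypothetical) through all three. grep filters lines, sed rewrites them, and awk computes a small report.

```shell
# Hypothetical sample data: a comment line plus "name age" records
printf '# name age\nalice 30\nbob 25\n' |
  grep -v '^#' |                             # grep: drop the comment line
  sed 's/ /:/' |                             # sed: replace the first space with ':'
  awk -F: '{ sum += $2 } END { print sum }'  # awk: total the second field (prints 55)
```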

Grep: Pattern Matching and Extraction

The grep command searches input files (or standard input) for lines matching a pattern and prints them. Its name comes from the ed editor command g/re/p: globally search for a regular expression and print the matching lines. grep supports three pattern syntaxes:

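grep (or grep -G): Basic Regular Expressions (BRE), the default.
egrep (or grep -E): Extended Regular Expressions (ERE).
fgrep (or grep -F): Fixed strings; the pattern is matched literally, with no regex parsing.

Recent GNU grep releases treat egrep and fgrep as obsolescent and print a warning when they are used; grep -E and grep -F are the preferred spellings.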
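The following sketch shows how the three modes treat the same characters differently. It uses three hypothetical sample lines and assumes GNU grep, where a bare + in a BRE is an ordinary character.

```shell
# Three hypothetical sample lines
data='a+b
aab
a.b'

# BRE (default): '+' is literal, so only the line "a+b" matches
printf '%s\n' "$data" | grep 'a+b'

# ERE: '+' means "one or more of the preceding item", so only "aab" matches
printf '%s\n' "$data" | grep -E 'a+b'

# BRE: '.' is a wildcard, so all three lines match
printf '%s\n' "$data" | grep 'a.b'

# Fixed string: '.' is literal, so only "a.b" matches
printf '%s\n' "$data" | grep -F 'a.b'
```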

Common Command Options

--color=auto: Highlights matching text.
-i: Case-insensitive search.
-v: Inverts the match, showing non-matching lines.
-o: Outputs only the matched string, not the full line.
-q: Quiet mode; prints nothing and only sets the exit status (0 if a match is found, 1 if not).
-A n, -B n, -C n: Prints n lines of context after, before, or on both sides of each match.
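A minimal sketch of these options on some hypothetical log lines; the log content is invented purely for illustration.

```shell
# Hypothetical log lines used only for illustration
log='INFO start
error: disk full
INFO done
Error: timeout'

# -i: case-insensitive; matches both "error" and "Error"
printf '%s\n' "$log" | grep -i 'error'

# -v: invert; shows the lines that do NOT contain "INFO"
printf '%s\n' "$log" | grep -v 'INFO'

# -o: prints only the matched part of the line ("disk full")
printf '%s\n' "$log" | grep -o 'disk [a-z]*'

# -q: no output; the exit status drives the condition
if printf '%s\n' "$log" | grep -q 'timeout'; then
  echo "timeout found"
fi

# -A 1: the matching line plus one line after it
printf '%s\n' "$log" | grep -A 1 'start'
```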

Regular Expression Fundamentals

Regular expressions define the search pattern using metacharacters.

Character Matching:

.: Matches any single character.
[...]: Matches any single character listed within the brackets.
[^...]: Matches any single character NOT listed within the brackets.
POSIX classes such as [:digit:], [:alpha:], and [:space:] are only valid inside a bracket expression, e.g. [[:digit:]].
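The character-matching rules can be checked against a few hypothetical sample lines; the expected matches are noted in the comments.

```shell
# Hypothetical sample lines
lines='cat
cot
cut
c9t
c t'

# '.' matches any single character: all five lines match
printf '%s\n' "$lines" | grep 'c.t'

# [ao] matches one of the listed characters: cat, cot
printf '%s\n' "$lines" | grep 'c[ao]t'

# [^ao] matches any character except a or o: cut, c9t, "c t"
printf '%s\n' "$lines" | grep 'c[^ao]t'

# A POSIX class must sit inside a bracket expression: c9t
printf '%s\n' "$lines" | grep 'c[[:digit:]]t'

# [[:space:]] matches whitespace: "c t"
printf '%s\n' "$lines" | grep 'c[[:space:]]t'
```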

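Quantifiers (BRE syntax; \? and \+ are GNU extensions):

*: Matches the preceding character (or group) zero or more times.
.*: Matches any sequence of characters, including an empty one.
\?: Matches the preceding item zero or one time.
\+: Matches the preceding item one or more times.
\{m,n\}: Matches the preceding item between m and n times.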
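A short sketch of the BRE quantifiers, assuming GNU grep (for \? and \+) and hypothetical sample words. The -x option, which is not covered above, forces each pattern to match the whole line so the counts are easy to see.

```shell
# Hypothetical sample words
words='ct
cat
caat
caaat'

# *: zero or more a's -> all four lines
printf '%s\n' "$words" | grep -x 'ca*t'
# \?: zero or one a -> ct, cat
printf '%s\n' "$words" | grep -x 'ca\?t'
# \+: one or more a's -> cat, caat, caaat
printf '%s\n' "$words" | grep -x 'ca\+t'
# \{2,3\}: two or three a's -> caat, caaat
printf '%s\n' "$words" | grep -x 'ca\{2,3\}t'
```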

Anchors:

^: Anchors to the start of the line.
$: Anchors to the end of the line.
\<: Anchors to the start of a word.
\>: Anchors to the end of a word.
\b: Matches a word boundary, at either the start or the end of a word (GNU extension).
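The anchors can be compared on a few hypothetical lines that contain "root" in different positions. GNU grep is assumed for \<, \>, and \b.

```shell
# Hypothetical sample lines
text='root user
chroot jail
rooted
the root'

# ^: lines starting with "root" -> "root user", "rooted"
printf '%s\n' "$text" | grep '^root'
# $: lines ending with "root" -> "the root"
printf '%s\n' "$text" | grep 'root$'
# \<root\>: "root" as a whole word -> "root user", "the root"
printf '%s\n' "$text" | grep '\<root\>'
# \broot\b: the same result using the \b boundary escape
printf '%s\n' "$text" | grep '\broot\b'
```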

Practical Grep Examples

To display lines in /etc/passwd that do NOT end with /sbin/nologin:

grep -v "/sbin/nologin$" /etc/passwd

To find empty lines or lines containing only whitespace:

grep "^[[:space:]]*$" filename.txt

To match a complete word (e.g., 'root') using word boundaries:

grep "\<root\>" /etc/passwd

Grouping and Back References

Patterns can be grouped with \( and \) in BRE. The text matched by each group is captured and can be referred to later in the same pattern as \1, \2, and so on (back references).

For example, to find lines in which a word reappears later on the same line, given a file repetition.txt with content:

Time after time.
Win win situation.

The pattern captures a whole word and then refers back to it with \1. Because the repeated words differ in case, -i is needed for either line to match:

grep -i "\(\<[a-z]\+\>\).*\<\1\>" repetition.txt
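To see the back-reference mechanism without involving case folding, the sketch below uses hypothetical all-lowercase lines. It also shows why bounding \1 with \< and \> matters: without those anchors, a captured word can match as a fragment of a longer word.

```shell
# Hypothetical sample lines (all lowercase, so -i is not needed)
sample='the cat saw the dog
a b c
the theme'

# \<\1\> requires the captured word to reappear as a whole word
printf '%s\n' "$sample" | grep '\(\<[a-z]\+\>\).*\<\1\>'   # prints: the cat saw the dog

# Without the anchors, "the" inside "theme" also counts as a repeat
printf '%s\n' "$sample" | grep '\(\<[a-z]\+\>\).*\1'       # prints both "the ..." lines
```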

Egrep: Extended Regular Expressions

ERE, used by egrep or grep -E, removes the need to escape most metacharacters: the quantifiers +, ?, and {m,n} and the grouping parentheses ( ) work without backslashes. ERE also provides the alternation (logical OR) operator |.

To search for lines starting with 'S' or 's' in /proc/meminfo:

egrep "^(s|S)" /proc/meminfo
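The same search can be written several ways. This sketch uses made-up lines in the style of /proc/meminfo, so it runs anywhere; the BRE form with \| relies on a GNU extension.

```shell
# Made-up lines in the style of /proc/meminfo
mem='MemTotal: 100 kB
SwapTotal: 50 kB
shmem: 5 kB
Buffers: 10 kB'

# ERE: grouping and alternation need no backslashes
printf '%s\n' "$mem" | grep -E '^(s|S)'
# GNU BRE equivalent: group and alternation are escaped
printf '%s\n' "$mem" | grep '^\(s\|S\)'
# A bracket expression gives the same result here
printf '%s\n' "$mem" | grep '^[sS]'
```

For a single-character choice, the bracket expression is usually the simplest; alternation becomes worthwhile when the alternatives are longer strings.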

To extract whole numbers between 0 and 255 (useful when checking IPv4 address octets):

egrep -o "\<([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\>" filename.txt
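A quick check of the pattern on a made-up input line, using grep -Eo. The out-of-range value 300 produces no output because no alternative can end at a word boundary inside it.

```shell
# Made-up input: the final octet, 300, is out of range
echo "10.0.0.255 192.168.1.300" |
  grep -Eo '\<([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\>'
# Prints 10, 0, 0, 255, 192, 168, 1, each on its own line
```

This validates each number in isolation; it does not check that exactly four octets appear in dotted form.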

Supplementary Text Utilities

Beyond searching, Linux provides tools for cutting, sorting, and analyzing text streams.

cut

Extracts specific sections from each line of a file.

# Extract the 1st field using ':' as delimiter
cut -d':' -f1 /etc/passwd
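A few more cut variations, shown on one hypothetical line in /etc/passwd format: field lists, field ranges, and character positions.

```shell
# One hypothetical line in /etc/passwd format
line='alice:x:1000:1000:Alice:/home/alice:/bin/bash'

# Fields 1 and 7 (user name and shell)
echo "$line" | cut -d':' -f1,7      # prints: alice:/bin/bash
# Fields 3 through 4 (UID and GID)
echo "$line" | cut -d':' -f3-4      # prints: 1000:1000
# -c selects character positions instead of fields
echo "$line" | cut -c1-5            # prints: alice
```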

sort

Sorts lines of text files. Common options include -n (numeric sort), -r (reverse order), -u (output unique lines only), -t (field separator), and -k (sort key, given as a field position).

# Sort numerically by the 3rd field (the UID) only
sort -t':' -k3,3n /etc/passwd
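Why -n matters: on hypothetical passwd-style records, a plain text sort compares the UID field character by character, while a numeric sort compares values. The -k3,3 form limits the key to field 3 instead of running from field 3 to the end of the line.

```shell
# Hypothetical passwd-style records
users='carol:x:1001:1001::/home/carol:/bin/sh
root:x:0:0:root:/root:/bin/bash
bob:x:200:200::/home/bob:/bin/sh'

# Text sort on field 3: "0" < "1001" < "200" -> root, carol, bob
printf '%s\n' "$users" | sort -t':' -k3,3 | cut -d':' -f1
# Numeric sort on field 3: 0 < 200 < 1001 -> root, bob, carol
printf '%s\n' "$users" | sort -t':' -k3,3n | cut -d':' -f1
```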

uniq

Removes (or, with -c, counts) duplicate lines, but only when they are adjacent, so its input is usually sorted first.

sort data.log | uniq -c
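The adjacency rule is easy to miss. This sketch uses a hypothetical three-line list whose duplicates are not adjacent, then shows the common "count and rank" idiom.

```shell
# Hypothetical list with a non-adjacent duplicate
items='b
a
b'

# uniq alone only collapses ADJACENT duplicates: prints b, a, b
printf '%s\n' "$items" | uniq
# Sorting first groups the duplicates: prints a, b
printf '%s\n' "$items" | sort | uniq
# Count occurrences, then list the most frequent first
printf '%s\n' "$items" | sort | uniq -c | sort -rn
```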

wc

Counts lines, words, and bytes (-l, -w, and -c respectively).

# Count lines in a file
wc -l /etc/passwd
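A small sketch of the three counters on hypothetical two-line input, plus the redirection form, which prints the count without the file name.

```shell
# Hypothetical two-line input
sample='one two
three'

# -l counts newlines (2), -w counts words (3), -c counts bytes (14)
printf '%s\n' "$sample" | wc -l
printf '%s\n' "$sample" | wc -w
printf '%s\n' "$sample" | wc -c

# Redirecting the file prints just the number, without the file name
wc -l < /etc/passwd
```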

diff

Compares files line by line.

diff original.file modified.file
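A self-contained sketch using two temporary files with made-up content. It shows the default output format, the unified format (-u, the same style git diff produces), and the exit status, which scripts often rely on.

```shell
# Two temporary files with made-up content
old=$(mktemp)
new=$(mktemp)
printf 'one\ntwo\nthree\n' > "$old"
printf 'one\n2\nthree\n' > "$new"

# Default format: "2c2" means line 2 of the first file changed into line 2 of the second
diff "$old" "$new"

# Unified format: removed lines start with '-', added lines with '+'
diff -u "$old" "$new"

# Exit status: 0 = identical, 1 = different, 2 = trouble (e.g. a missing file)
diff -q "$old" "$new" > /dev/null
echo "exit status: $?"

rm -f "$old" "$new"
```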

Copyright © fadingcoder.top