Essential Shell Scripting Tips and Tricks for Bioinformatics Workflows

Batch File Processing Example

#!/bin/bash

python preprocess_annotation.py -i wheat_annotation.gff3 -o wheat_annotation_filtered.gff4

base_fasta="wheat_transcripts.fasta"
annotation_gff="wheat_annotation_filtered.gff4"

motif_types=("G4" "C4" "A4" "T4")

for type in "${motif_types[@]}"
do
    python extract_motifs.py -f1 "$base_fasta" -f2 "motifs_${type}_raw.fasta"

    Rscript calculate_score.R -i "motifs_${type}_raw.fasta" -o "motifs_${type}_scored.fasta"

    python annotate_results.py -g "$annotation_gff" -f "motifs_${type}_scored.fasta" -o "motifs_${type}_annotated.fasta"
    
    # Split by transcript region: keep lines matching 'Id' or the region keyword.
    grep -E 'Id|five' "motifs_${type}_annotated.fasta" > "five_${type}_total.txt"
    grep -E 'Id|CDS' "motifs_${type}_annotated.fasta" > "cds_${type}_total.txt"
    grep -E 'Id|three' "motifs_${type}_annotated.fasta" > "three_${type}_total.txt"
    cp "motifs_${type}_annotated.fasta" "mrna_${type}_total.txt"
    
    python filter_nonoverlap.py -f1 "five_${type}_total.txt" -f2 "five_${type}_filtered.txt"
    python filter_nonoverlap.py -f1 "cds_${type}_total.txt" -f2 "cds_${type}_filtered.txt"
    python filter_nonoverlap.py -f1 "three_${type}_total.txt" -f2 "three_${type}_filtered.txt"
    python filter_nonoverlap.py -f1 "mrna_${type}_total.txt" -f2 "mrna_${type}_filtered.txt"
done

for type in "${motif_types[@]}"
do
    mv "five_${type}_filtered.txt" "five_${type}_total.txt"
    mv "cds_${type}_filtered.txt" "cds_${type}_total.txt"
    mv "three_${type}_filtered.txt" "three_${type}_total.txt"
    mv "mrna_${type}_filtered.txt" "mrna_${type}_total.txt"

    rm "motifs_${type}_raw.fasta"
    rm "motifs_${type}_scored.fasta"
    rm "motifs_${type}_annotated.fasta"
done
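
The three grep calls in the first loop differ only in the region keyword, so they can be folded into a small loop. The sketch below is one possible way to do that. The function name split_by_region is hypothetical and not part of the original pipeline, and it assumes the annotated file is line-oriented in the way the grep patterns above suggest.

```bash
# Hypothetical helper (not part of the original pipeline): write one
# "<region>_<type>_total.txt" file per transcript region.
split_by_region() {
    local type="$1" annotated="$2"
    local region prefix
    for region in five CDS three; do
        # Output names are lowercase (five_, cds_, three_), as in the script above.
        prefix=$(printf '%s' "$region" | tr '[:upper:]' '[:lower:]')
        grep -E "Id|${region}" "$annotated" > "${prefix}_${type}_total.txt"
    done
}

# Usage inside the main loop:
#   split_by_region "$type" "motifs_${type}_annotated.fasta"
```

Adding a fourth region then means editing one list instead of copying another grep line.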

Process Management

Terminating Background Jobs

To terminate a running background script, first identify its process ID with ps, then pass that ID to kill (the PID below is an example):

ps aux | grep batch_processing_pipeline.sh
kill -SIGINT 2852911
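
When the job is launched from the same shell, its PID is available in $! right after it starts, which avoids searching the ps output. The sketch below uses sleep 300 as a stand-in for the pipeline script. It sends SIGTERM rather than SIGINT because a shell running without job control (a script, for example) starts background commands with SIGINT ignored. Both choices are assumptions for illustration.

```bash
# Stand-in for: bash batch_processing_pipeline.sh &
sleep 300 &
job_pid=$!                      # PID of the most recent background job

kill -TERM "$job_pid"           # ask the job to terminate

# Reap the job and record its exit status.
job_status=0
wait "$job_pid" 2>/dev/null || job_status=$?

echo "Job $job_pid exited with status $job_status"
```

In bash, a status of 143 (128 + 15) means the job was ended by SIGTERM.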

Monitoring Thread Usage

Check the number of threads (lightweight processes) consumed by a specific process:

ps -o nlwp,pid,cmd -C python
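
For a single known PID on Linux, the same number is exposed in the Threads: field of /proc/<pid>/status. The helper name thread_count below is hypothetical, and the sketch assumes a Linux /proc filesystem.

```bash
# Hypothetical helper: print the thread count of one process (Linux only).
thread_count() {
    awk '/^Threads:/ { print $2 }' "/proc/$1/status"
}

thread_count "$$"    # thread count of the current shell
```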

File Manipulation

Removing Carriage Returns

Convert Windows-style line endings to Unix format:

sed -i 's/\r//g' analysis_script.R
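
This form relies on GNU sed behaviour (-i without a backup suffix, and the \r escape). Where GNU sed is not available, tr gives the same result. The sketch below wraps it in a hypothetical strip_cr helper that rewrites the file through a temporary copy.

```bash
# Hypothetical helper: delete every carriage return from a file in place.
strip_cr() {
    local file="$1" tmp status
    tmp=$(mktemp) || return 1
    # Write the cleaned text to a temporary file, then copy it back so the
    # original file keeps its permissions.
    tr -d '\r' < "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Usage: strip_cr analysis_script.R
```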

In-place String Replacement

Replace all occurrences of a substring across an entire file. Here the replacement is empty, so every "Chr" is deleted:

sed -i 's/Chr//g' coordinates.bed
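
Because the pattern is unanchored, s/Chr//g also removes Chr from any other column that happens to contain it. If only the chromosome column should change, anchoring the pattern to the start of the line is one option. The sample BED line below is made up for illustration.

```bash
# Made-up BED line: the name column also contains "Chr".
line=$(printf 'Chr1\t100\t200\tChrGene')

# Unanchored: strips every occurrence on the line.
unanchored=$(printf '%s\n' "$line" | sed 's/Chr//g')

# Anchored: strips only a leading "Chr" in the first column.
anchored=$(printf '%s\n' "$line" | sed 's/^Chr//')

printf '%s\n%s\n' "$unanchored" "$anchored"
```

The first result loses the prefix in the gene name as well, while the second keeps ChrGene intact.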

Directory Operations

Safe Directory Creation

Recreate a directory from scratch: remove it (and everything in it) if it already exists, then create it:

DIRECTORY="analysis_output"

if [ -d "$DIRECTORY" ]; then
    rm -rf "$DIRECTORY"
    echo "Directory $DIRECTORY removed."
fi

mkdir "$DIRECTORY"
echo "Directory $DIRECTORY created."
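
Since rm -rf is unforgiving, it can help to guard against an empty variable before deleting. The sketch below is one way to package the same logic. The function name recreate_dir is hypothetical.

```bash
# Hypothetical helper: remove a directory if present, then create it empty.
recreate_dir() {
    # ${1:?...} stops with an error if the path is missing or empty,
    # so rm -rf never runs on an empty string.
    local dir="${1:?usage: recreate_dir <path>}"
    rm -rf -- "$dir"
    mkdir -p -- "$dir"
    echo "Directory $dir recreated."
}

# Usage: recreate_dir "analysis_output"
```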

Shell Navigation

Return to Previous Directory

Jump back to the last working directory (not the parent directory):

cd -
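
cd - is shorthand for cd "$OLDPWD" and also prints the directory it lands in. A quick sketch, using / only as an arbitrary second location:

```bash
start_dir=$PWD

cd /                 # go somewhere else; OLDPWD now holds start_dir
cd - > /dev/null     # jump back; the printed path is suppressed here
echo "$PWD"          # prints the directory we started in
```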

Clear Command Line

Clear text after the cursor position:

Ctrl + K

Clear text before the cursor position:

Ctrl + U

Text Processing

Deduplication by Column

Remove lines whose value in a given column (here, column 2) has already been seen. The first occurrence is kept, input order is preserved, and no sorting is needed:

awk '!seen[$2]++' data.txt
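
To see what the one-liner keeps, here is a small run on made-up data. seen[$2]++ evaluates to 0 (false) the first time a value appears in column 2, so !seen[$2]++ is true only for first occurrences.

```bash
# Made-up input: column 2 repeats the value 1 on the third line.
deduped=$(printf 'a 1\nb 2\nc 1\nd 3\n' | awk '!seen[$2]++')
printf '%s\n' "$deduped"
# Output:
# a 1
# b 2
# d 3
```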

Bioinformatics Utilities

Quick Sequence Length Retrieval

Fetch a specific transcript from a FASTA file and display its length:

seqkit grep -p "TraesCS5D03G0974400.1" transcripts.fa | seqkit fx2tab -l -n
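
If seqkit is not installed, a rough awk equivalent is sketched below. The function name fasta_length is hypothetical, and the sketch assumes the ID is the first whitespace-delimited word after > and that the file uses Unix line endings.

```bash
# Hypothetical helper: print "<id><TAB><length>" for one FASTA record.
fasta_length() {
    local id="$1" fasta="$2"
    awk -v id="$id" '
        /^>/  { if (found) exit                 # reached the next record: stop
                found = (substr($1, 2) == id); next }
        found { len += length($0) }             # sum the wrapped sequence lines
        END   { if (found) print id "\t" (len + 0) }
    ' "$fasta"
}

# Usage: fasta_length "TraesCS5D03G0974400.1" transcripts.fa
```

Nothing is printed when the ID is absent, which makes the result easy to test in a script.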

Vim Text Editing

Bulk Commenting

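Comment out every line containing a pattern, in all files on Vim's argument list. argdo runs the substitution in each file, and update saves any file that changed: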

:argdo %s/^.*pattern.*$/# &/ | update
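
Outside Vim, the same edit can be previewed on the command line with sed. This is a swapped-in technique rather than the Vim command itself, and the sample lines are made up. The word pattern stands for whatever text should be matched.

```bash
# Prefix "# " to every line that contains the text "pattern".
commented=$(printf 'keep this\nhas pattern here\n' | sed '/pattern/s/^/# /')
printf '%s\n' "$commented"
# Output:
# keep this
# # has pattern here
```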

Directory Statistics

Count Directories in Current Path

ls -l | grep ^d | wc -l
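
ls -l skips hidden directories and relies on parsing its long-format output. A find-based count is a common alternative. The sketch below assumes a find that supports -mindepth and -maxdepth (GNU and BSD find both do), and the function name count_dirs is hypothetical.

```bash
# Hypothetical helper: count the immediate subdirectories of a path.
count_dirs() {
    # Hidden directories are included; the starting directory itself is not.
    find "${1:-.}" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' '
}

count_dirs    # count directories in the current path
```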
