MOCAT2: Installation, Configuration, and Usage Guide for Metagenomic and Metatranscriptomic Analysis

Common Modules and Output Files in MOCAT2

1. mocat_preprocessing Module:

  • Output Files:
    • clean_reads_1.fastq, clean_reads_2.fastq: Sequencing data after quality control and preprocessing.
    • summary_statistics.txt: Statistical information about the quality control steps, such as sequence counts and quality score statistics.

2. mocat_assembly Module:

  • Output Files:
    • contigs.fasta: Assembled contig sequences.
    • assembly_stats.txt: Statistics on assembly quality and performance, including N50, maximum/minimum contig lengths, etc.

3. mocat_analysis Module:

  • Output Files:
    • blast_results.txt: Results from BLAST annotation, showing sequence similarity to reference databases.
    • gene_catalog.fasta: Gene catalog sequences generated based on alignment results.
    • functional_annotation.txt: Functional annotation results, including gene or sequence functional descriptions, KEGG or COG annotations, etc.
    • classification_results.txt: Classification results, displaying taxonomic information for sequences or genes, such as strain, genus, or phylum-level classifications.

4. mocat_metaquant Module (Optional, for quantitative analysis):

  • Output Files:
    • gene_abundance_table.txt: Gene abundance table, showing estimated abundance of each gene in samples.
    • transcript_abundance_table.txt: Transcript abundance table, showing estimated abundance of transcripts in samples.
    • Other files may include sample abundance information.

Notes:

  • The format and content of output files generated by each module may vary depending on applied parameters and experimental design.
  • Information in the result files helps researchers understand data quality, sequence annotation, assembly quality, and functional annotation.
  • Data in output files are typically presented in text or FASTA formats and can be viewed and further analyzed using text editors or specialized bioinformatics software.
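
For a quick look at a FASTA output, standard shell tools are usually enough. The sketch below counts the records in a file shaped like the contigs.fasta described above. The demo file and the function name count_fasta_records are made up for illustration, so point the function at a real output file instead.

```shell
set -eu

# Count records in a FASTA file by counting header lines (those starting with '>').
count_fasta_records() {
    grep -c '^>' "$1" || true   # grep exits non-zero when the count is 0
}

# Demo on a tiny made-up FASTA file in a scratch directory.
demo_dir=$(mktemp -d)
printf '>contig_1\nACGTACGT\n>contig_2\nGGCC\n' > "$demo_dir/contigs.fasta"

n_contigs=$(count_fasta_records "$demo_dir/contigs.fasta")
echo "contigs: $n_contigs"
```

The same function works on any FASTA file, such as a gene catalog, because it only relies on the '>' header convention.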

MOCAT2 Usage Workflow

Data Preparation:

  • Obtain metagenomic/metatranscriptomic sequencing data in FASTQ format.
  • Prepare reference databases, such as genome databases or functional annotation databases.

Running MOCAT2:

Main modules and example commands for MOCAT2 are as follows:

mocat_preprocessing: Perform quality control and preprocessing.

mocat_preprocessing -t 4 -o output_directory --input-files reads_1.fastq,reads_2.fastq

mocat_assembly: Execute sequence assembly.

mocat_assembly -t 4 -o output_directory --input-files reads_1.fastq,reads_2.fastq

mocat_analysis: Conduct functional annotation and classification analysis.

mocat_analysis -t 4 -o output_directory --input-files contigs.fasta

Here, the -t option specifies the number of threads, -o specifies the output directory, and --input-files specifies the input files.

Result Interpretation and Analysis:

Output files generated by MOCAT2 include assembled sequences, annotation results, and classification information. These results can be further interpreted and analyzed using other tools or analysis pipelines.

Example Code

Below is a simple Shell script example demonstrating a basic analysis workflow using MOCAT2:

# Quality control and preprocessing
mocat_preprocessing -t 4 -o preprocessing_output --input-files reads_1.fastq,reads_2.fastq

# Sequence assembly
mocat_assembly -t 4 -o assembly_output --input-files preprocessing_output/clean_reads_1.fastq,preprocessing_output/clean_reads_2.fastq

# Functional annotation and classification analysis
mocat_analysis -t 4 -o analysis_output --input-files assembly_output/contigs.fasta
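
Before starting a long run like the one above, it can help to confirm that every input file exists and is non-empty. The following is a minimal sketch of such a pre-flight check. The function name check_inputs and the demo files are hypothetical and not part of MOCAT2.

```shell
set -eu

# Return 0 if every listed file exists and is non-empty;
# report each missing or empty file on stderr and return 1 otherwise.
check_inputs() {
    status=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty input: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Demo with made-up files in a scratch directory: only reads_1.fastq is created.
work_dir=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n' > "$work_dir/reads_1.fastq"

if check_inputs "$work_dir/reads_1.fastq" "$work_dir/reads_2.fastq"; then
    echo "inputs OK"
else
    echo "inputs incomplete, not starting the pipeline"
fi
```

In a real script the pipeline commands would go in the first branch, so that a missing read file stops the run before any compute time is spent.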

Notes:

  • MOCAT2 offers a wide range of features and modules; specific usage methods and parameter settings should be adjusted based on data type and experimental design.
  • The analysis process may require significant time and computational resources, especially for large-scale metagenomic/metatranscriptomic data.
  • Depending on data type and analysis needs, further downstream analysis and interpretation may be necessary.

Full Parameter Help Information for MOCAT.pl

MOCAT.pl --help
===============================================================================
                  MOCAT - Metagenomics Analysis Toolkit                 v2.1.3
 by Jens Roat Kultima, Luis Pedro Coelho, Shinichi Sunagawa @ Bork Group, EMBL
===============================================================================

                    Full manual & FAQ: MOCAT.pl -man

                    How to cite MOCAT: MOCAT.pl -cite

            Have you tried the wrapper runMOCAT.sh? Try it!

Usage: MOCAT.pl -sf|sample_file 'FILE' [Pipeline, Statistics, & Additional Options]

 'FILE'
   Contains the list of folder names (sample names), one per line,
   in which the raw sample data is located

Examples

Process, Assemble, Revise Assembly, Predict Genes, cluster genes into gene catalog, annotate gene catalog, profile against gene catalog
                            MOCAT.pl -sf my.samples -rtf
                            MOCAT.pl -sf my.samples -a
                            MOCAT.pl -sf my.samples -gp assembly
                            MOCAT.pl -sf my.samples -make_gene_catalog -assembly_type assembly
                            MOCAT.pl -sf my.samples -annotate_gene_catalog
                            MOCAT.pl -sf my.samples -s my.samples.padded -identity 95
                            MOCAT.pl -sf my.samples -f my.samples.padded -identity 95
                            MOCAT.pl -sf my.samples -p my.samples.padded -identity 95 -mode functional

Assemble and predict genes: MOCAT.pl -sf my.samples -rtf
  (no screen)               MOCAT.pl -sf my.samples -a
                            MOCAT.pl -sf my.samples -gp assembly
  fetch marker genes:       MOCAT.pl -sf my.samples -fmg assembly
                            MOCAT.pl -sf my.samples -ss

Assemble and predict genes: MOCAT.pl -sf my.samples -rtf
  (DB screen)               MOCAT.pl -sf my.samples -s hg19 -screened_files -identity 90
                            MOCAT.pl -sf my.samples -a -r hg19
                            MOCAT.pl -sf my.samples -gp assembly -r hg19
                            MOCAT.pl -sf my.samples -ss

Assemble and predict genes: MOCAT.pl -sf my.samples -rtf
  (remove eg. adapters      MOCAT.pl -sf my.samples -sff adapters.fa -screened_files
   and then DB screen)      MOCAT.pl -sf my.samples -bwa hg19 -r adapters.fa  -screened_files
                            MOCAT.pl -sf my.samples -a -r screened.adapters.fa.on.hg19
                            MOCAT.pl -sf my.samples -gp assembly -r screened.adapters.fa.on.hg19
                            MOCAT.pl -sf my.samples -ss

Pipeline Options

 -r|reads ['reads.processed', 'DATABASE' or 'FASTA FILE']
   Required for all pipeline options, except rtf|read_trim_filter
   Specify whether processing trim & filtered, or screened reads.
   A default value to this setting can also be specified in config file

 -e|extracted
   Optional for all pipeline options, except rtf|read_trim_filter, see full manual


 -rtf|read_trim_filter
   performs trimming and filtering of reads

 -a|assembly
   Performs assembly of reads

 -ar|assembly_revision
   Further improves assemblies

 -gp|gene_prediction ['assembly', 'assembly.revised']
   Predicts protein coding genes on assemblies

 -fmg|fetch_mg ['assembly', 'assembly.revised']
   Extracts marker genes among the predicted genes

 -soap|bwa ['DB1 DB2 ...',s,c,f,r]
   Screen, extract and map reads against a reference database (hg19 is provided) or (s)caftigs,
   (c)ontigs, sca(f)folds from an assembly, or scaftigs from a (r)evised assembly.
   This mapping step uses SOAPaligner2 (soap) or BWA (bwa).
   Additional options:
    -screened_files : If set, screened read files are generated, these are reads not matching the DB
    -extracted_files : If set, extracted read files are generated, these are reads matching the DB
    -use_mem  : If set, copies the DB into memory for faster loading

 -sff|screen_fastafile 'FASTA FILE'
   Same as 's|screen' above, but uses USearch, rather than SOAPaligner2.

 -fsoap ['DB1 DB2 ...',s,c,f,r]
   Filter screened reads, (s)caftigs, (c)ontigs, sca(f)folds or (r)evised assembly scaftigs
    at higher %ID and length cutoff. This step has to be run before calculating profiles if the option soap was used

   Additional options:
    -shm   : If set, faster, but saves data for the filtering step in /dev/shm/<USER>
	
 -psoap|pbwa ['DB1 DB2 ...',s,c,f,r] -m|mode [gene, NCBI, mOTU, functional] -o [OUTPUT FOLDER]
   Generate gene, mOTU, NCBI or functional profiles on filtered reads,
   (s)caftigs, (c)ontigs, sca(f)folds or (r)evised assembly scaftigs. 
   If -mode is set to either NCBI or mOTU, it is expected that the 
   reads have been correctly mapped to the corresponding databases.
   Specify psoap if you used the command 'soap' previously, and 'pbwa' if you used 'bwa'.
   Additional options:
    -no_horizontal : Do not calculate horizontal gene & functional coverages
    -verbose       : Prints extra information about status of profiling steps
    -shm           : Faster, but saves 2-5 GB of data for the profiling step in /dev/shm/<USER>
    -uniq          : Specify this flag if you find duplicated row names
                     (e.g. if you have mapped to a DB where the same reference appears multiple times)

Available modules

 These are installed in the folder /nfs/data/Downloads/mocat2/stable/2.1.3/mod
 Each module requires a NAME.sh and NAME.cfg file inside the NAME folder

 -annotate_gene_catalog [leave empty for using sample file generated catalog or enter full path to catalog; use amino acid sequence file]
   Required options:
    -blasttype [should be "blastp" normally for amino acid sequences, but can be set to "blastx"]

 -make_gene_catalog [samples specified in sample file will be used to generate catalog]
   Required options:
    -assembly_type [assembly or assembly.revised]


Statistics Options

 -sfq|stats_fastqc
   Produces statistics for each lane with raw reads using the FastQC toolkit
 -ss|sample_status
   Prints a simple view of the processing status of each sample,
   and stores this in <sample_file>.status

Additional Options

 -cfg|config [file]
   Specify another config file than MOCAT.cfg
 -x|no_execute
   Only create job scripts, but don't execute them
 -nt|no_temp
   Overrides any temp folders specified in the config file
 -cpus [integer]
   Not recommended, but specifies a fixed number of cores for each job,
   please read the full manual using MOCAT.pl -man
 -host [hostname]
   Runs the jobs on a different host machine
 -identity [integer]
   Overrides any percentage cutoff setting in cfg file
 -length [integer]
   Overrides any length cutoff setting in cfg file
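
The help text above describes the sample file as a list of folder names, one per line, and gives an "Assemble and predict genes (no screen)" example. The sketch below builds such a sample file and prints those steps in order. It assumes each sample sits in its own subfolder of the project directory. The folder names sampleA and sampleB, the DRY_RUN switch and the run_step helper are made up for illustration and are not part of MOCAT.

```shell
set -eu

# Scratch project directory with two made-up sample folders (demo only).
project_dir=$(mktemp -d)
mkdir "$project_dir/sampleA" "$project_dir/sampleB"
cd "$project_dir"

# Sample file: one folder name per line, without the trailing slash.
for d in */; do
    if [ -d "$d" ]; then
        printf '%s\n' "${d%/}"
    fi
done > my.samples

DRY_RUN=${DRY_RUN:-1}   # hypothetical switch: 1 = only print the commands

# Print a step and, unless DRY_RUN=1, execute it; set -e stops at the first failure.
run_step() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

# Steps copied from the "Assemble and predict genes (no screen)" help example.
run_step MOCAT.pl -sf my.samples -rtf
run_step MOCAT.pl -sf my.samples -a
run_step MOCAT.pl -sf my.samples -gp assembly
run_step MOCAT.pl -sf my.samples -fmg assembly
run_step MOCAT.pl -sf my.samples -ss
```

Setting DRY_RUN=0 would execute the steps, which assumes MOCAT.pl is on the PATH and a config file is in place. MOCAT's own -x|no_execute option is a separate mechanism that creates the job scripts without executing them.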
