Ecological Phylogenetics: R Packages, File Formats, and Construction Workflows
R Packages for Phylogenetic Analysis
Several specialized R packages facilitate the handling and analysis of phylogenetic data in ecological contexts:
- pegas: Processes population genetics data, accepting tabular genetic markers and population details to compute gene flow, population structure, and genetic diversity.
- adegenet: Handles polymorphic datasets for multivariate analysis. It accepts tabular genetic data to perform clustering, discriminant analysis, and spatial structure detection.
- poppr: Designed for population genetic analysis of clonal, partially clonal, and sexual populations. It provides tools for clone correction, linkage disequilibrium, and customized visualization of reproductive patterns.
- phrapl: Performs phylogeographic inference using approximate likelihoods. It takes gene trees and population assignments to compare alternative models of population history, such as divergence and migration between populations.
- treeio: A utility for importing and exporting diverse tree file formats (Newick, Nexus, PhyloXML, NeXML, Jplace) as tidy data structures in R.
- ggtree: An extension of ggplot2 tailored for phylogenetic visualization. It accepts ape or treeio objects to render highly customizable vector graphics (SVG/PDF).
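As a minimal sketch of how these packages hand data to one another, the snippet below parses a tree with ape and passes it to ggtree when that package is available. The Newick string and its branch lengths are hypothetical, and the ggtree branch is optional: if ggtree is not installed, the sketch falls back to ape's own plot method.

```r
library(ape)

# Hypothetical three-taxon tree in Newick notation (branch lengths are made up)
tree <- read.tree(text = "((Sample_A:0.10,Sample_B:0.20):0.05,Sample_C:0.30);")

# ape stores the tree as an object of class "phylo"
print(class(tree))
print(tree$tip.label)

if (requireNamespace("ggtree", quietly = TRUE)) {
  # ggtree accepts the phylo object directly and returns a ggplot object
  p <- ggtree::ggtree(tree) + ggtree::geom_tiplab()
  # Save as a vector graphic (PDF) in a temporary file
  ggplot2::ggsave(tempfile(fileext = ".pdf"), p, width = 5, height = 4)
} else {
  # Fall back to ape's base-graphics plot method
  plot(tree)
}
```

The same `phylo` object can be passed on to most of the other packages listed above, which is why ape is usually the first package loaded in a workflow.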
Phylogenetic File Formats and Generation Tools
- Newick (.tre, .nwk):
  - Tools: PhyML, RAxML, FastTree, MrBayes
  - Input: Multiple sequence alignment (e.g., FASTA)
  - Output: Text-based tree representation with branch lengths and node support.
  - Example (RAxML):
    raxmlHPC -s input_aln.fasta -n result -m GTRCAT -p 12345
- Nexus (.nex):
  - Tools: PAUP*, TNT, MrBayes
  - Input: Aligned sequences alongside analysis metadata
  - Output: Comprehensive file holding sequence data, model parameters, and tree topologies.
  - Example (MrBayes):
    mb run_mrbayes.nex
- NHX (.nhx):
  - Tools: ETE Toolkit, TreeBeST
  - Input: Standard Newick trees
  - Output: Extended Newick format embedding additional metadata (e.g., duplication events, bootstrap values).
- PHYLIP (.phy):
  - Tools: PHYLIP, RAxML, IQ-TREE
  - Input: Multiple sequence alignment
  - Output: Sequential or interleaved text mapping taxa names to their sequences.
  - Example (RAxML):
    raxmlHPC-PTHREADS -s input_aln.phy -n result -m GTRGAMMA -p 12345 -T 4
- Jplace (.jplace):
  - Tools: EPA, pplacer
  - Input: Query sequences and a reference tree
  - Output: JSON format detailing the placement of query sequences onto specific branches of the reference tree.
- BEAST XML (.xml):
  - Tools: BEAST, BEAUti
  - Input: Aligned sequences and MCMC configuration
  - Output: XML configuration file encoding all parameters required for Bayesian phylogenetic inference.
- PhyloXML (.phyloxml):
  - Tools: PhyloXML Tools, ETE Toolkit
  - Input: Various phylogenetic data
  - Output: XML schema designed for extensive annotation and metadata storage alongside tree topologies.
- NeXML (.nexml):
  - Tools: nexml-python, CDAOtools
  - Input: Various phylogenetic data
  - Output: XML standard supporting rich metadata and semantic web ontologies.
- NHWT (.tre):
  - Tools: Dendroscope
  - Input: Standard Newick files
  - Output: Binary format optimized for rapid loading and rendering of massive trees within Dendroscope.
- PDF / SVG (.pdf, .svg):
  - Tools: FigTree, iTOL, EvolView
  - Input: Standard tree formats
  - Output: Scalable vector graphics for publication-ready figures.
- CSV / TSV (.csv, .tsv):
  - Tools: Custom R/Python scripts, ete3, Biopython
  - Input: Tree objects
  - Output: Tabular extraction of node attributes, branch lengths, or tip labels for downstream statistical analysis.
- JSON (.json):
  - Tools: Custom scripts, Phylocanvas
  - Input: Tree objects
  - Output: Lightweight data-interchange format ideal for rendering trees in web browsers.
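Several of these formats can be converted into one another from within R. The following is a small sketch, assuming only the ape package: it parses a hypothetical Newick string, writes it out as both Newick and Nexus, reads the Nexus copy back, and extracts a CSV-style edge table. The tree, its branch lengths, and the temporary file names are illustrative and not taken from any real dataset.

```r
library(ape)

# Hypothetical Newick tree (made-up branch lengths)
newick_txt <- "((Sample_A:0.10,Sample_B:0.20):0.05,Sample_C:0.30);"
tree <- read.tree(text = newick_txt)

# Write the same tree in Newick and in Nexus format
nwk_file <- tempfile(fileext = ".nwk")
nex_file <- tempfile(fileext = ".nex")
write.tree(tree, file = nwk_file)
write.nexus(tree, file = nex_file)

# Read the Nexus copy back to confirm the round trip
tree_from_nexus <- read.nexus(nex_file)

# Tabular extraction: one row per edge of the tree
edge_table <- data.frame(
  parent        = tree$edge[, 1],
  child         = tree$edge[, 2],
  branch_length = tree$edge.length
)
# Tips are numbered 1..Ntip(tree); internal nodes get NA as their label
edge_table$tip_label <- tree$tip.label[edge_table$child]

# Save the table for downstream statistical analysis
csv_file <- tempfile(fileext = ".csv")
write.csv(edge_table, csv_file, row.names = FALSE)
print(edge_table)
```

Formats that carry extra annotation (NHX, Jplace, BEAST output) are better handled by treeio, since plain `phylo` objects keep only the topology, branch lengths, and labels.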
Phylogenetic Construction Workflows
Distance-Based Tree Construction with ape
Input data for distance-based methods typically consists of FASTA sequences:
```text
>Sample_A
ATCGATCGATCG
>Sample_B
ATCGATCGTGC
>Sample_C
ATCGTAGCTAG
```
The workflow involves reading the alignment, computing a pairwise distance matrix, constructing the tree using the Neighbor-Joining algorithm, and plotting the result.
```R
# Load the package
library(ape)
# Import the aligned FASTA sequences as a DNAbin object
dna_seqs <- read.dna("gene_alignment.fasta", format = "fasta")
# Ensure sequences are formatted as a DNAbin alignment matrix
# (this fails if the sequences are not all the same length)
dna_bin <- as.matrix(dna_seqs)
# Calculate the genetic distance using the Kimura 2-parameter model
k2p_dist <- dist.dna(dna_bin, model = "K80")
# Build the Neighbor-Joining phylogeny
nj_phylogeny <- nj(k2p_dist)
# Plot the unrooted tree with adjusted label formatting
plot(nj_phylogeny, type = "unrooted", cex = 0.7, label.offset = 0.4)
```
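To make the K80 step in the workflow above less of a black box, here is a base-R sketch of the Kimura 2-parameter distance for a single pair of aligned sequences. With P the proportion of sites showing a transition and Q the proportion showing a transversion, the distance is d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)). The function name `k80_distance` is hypothetical, and the handling of gaps (dropping any site where either sequence lacks an A, C, G, or T) is a simplifying assumption rather than a copy of what `dist.dna` does internally.

```r
# Kimura 2-parameter (K80) distance between two aligned DNA sequences.
# Illustrative only; use ape::dist.dna for real analyses.
k80_distance <- function(seq1, seq2) {
  s1 <- strsplit(toupper(seq1), "")[[1]]
  s2 <- strsplit(toupper(seq2), "")[[1]]
  stopifnot(length(s1) == length(s2))  # sequences must be aligned

  # Assumption: keep only sites where both sequences have A, C, G, or T
  bases <- c("A", "C", "G", "T")
  keep <- s1 %in% bases & s2 %in% bases
  s1 <- s1[keep]
  s2 <- s2[keep]

  purines <- c("A", "G")
  differs <- s1 != s2
  # A substitution within purines or within pyrimidines is a transition
  same_class <- (s1 %in% purines) == (s2 %in% purines)

  P <- mean(differs & same_class)   # proportion of transitions
  Q <- mean(differs & !same_class)  # proportion of transversions

  -0.5 * log((1 - 2 * P - Q) * sqrt(1 - 2 * Q))
}

# One transition (A <-> G) in ten sites
d_transition <- k80_distance("AAAAAAAAAA", "GAAAAAAAAA")
# One transversion (A <-> C) in ten sites
d_transversion <- k80_distance("AAAAAAAAAA", "CAAAAAAAAA")
print(c(transition = d_transition, transversion = d_transversion))
```

Because transitions and transversions enter the formula separately, two pairs of sequences with the same raw number of differences can receive different K80 distances.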
Trait Mapping with phytools
The phytools package enables the integration of phenotypic trait data with phylogenetic trees. Trait data is usually stored in a CSV structure:
```csv
Taxa,BodyMass,Habitat
Taxon_X,45.2,Forest
Taxon_Y,12.8,Wetland
Taxon_Z,88.1,Grassland
```
Prior to mapping, a tree must be generated with external software and then imported into R. The example below reads a Nexus file, as written by MrBayes for instance; RAxML writes Newick output, which is imported with read.tree instead.
```R
# Load the package (phytools also attaches ape)
library(phytools)
# Read the Nexus formatted tree
nexus_phylo <- read.nexus("phylo_output.nex")
# Import trait data, setting taxa names as row identifiers
phenotype_df <- read.csv("taxa_attributes.csv", header = TRUE, row.names = 1)
# Extract a specific continuous trait as a vector named by taxon
body_mass <- setNames(phenotype_df$BodyMass, rownames(phenotype_df))
# Reorder the trait values to match the tip labels of the tree
body_mass <- body_mass[nexus_phylo$tip.label]
# Map the trait onto the tree (ancestral values are estimated along branches)
mapped_phylo <- contMap(nexus_phylo, body_mass, plot = FALSE)
# Visualize the tree as a fan diagram with trait overlay
plot(mapped_phylo, type = "fan", fsize = 0.6)
```
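Trait mapping only works when the taxon names in the trait table match the tip labels of the tree, so it is worth checking the two against each other first. The sketch below does this in base R. The `tip_labels` vector is a hypothetical stand-in for `nexus_phylo$tip.label` (including an extra tip, Taxon_W, that has no trait data), and the trait table is read from inline text so that the example runs without any files.

```r
# Hypothetical tip labels, standing in for nexus_phylo$tip.label
tip_labels <- c("Taxon_Z", "Taxon_X", "Taxon_Y", "Taxon_W")

# Trait table from the CSV example, read from inline text
phenotype_df <- read.csv(
  text = "Taxa,BodyMass,Habitat
Taxon_X,45.2,Forest
Taxon_Y,12.8,Wetland
Taxon_Z,88.1,Grassland",
  header = TRUE, row.names = 1
)

# Names present on one side but missing on the other
tips_without_traits <- setdiff(tip_labels, rownames(phenotype_df))
traits_without_tips <- setdiff(rownames(phenotype_df), tip_labels)

# Taxa present in both, kept in tip-label order
shared <- intersect(tip_labels, rownames(phenotype_df))

# Named trait vector restricted to the shared taxa
body_mass <- setNames(phenotype_df[shared, "BodyMass"], shared)

print(tips_without_traits)
print(body_mass)
```

Tips that lack trait data can then be pruned from the tree, for example with `drop.tip()` from ape, before the mapping step.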
Maximum Likelihood Inference with RAxML
RAxML (Randomized Axelerated Maximum Likelihood) is optimized for large-scale ML phylogenetic inference.
Step 1: Prepare the alignment file
Ensure sequences are aligned and saved as gene_alignment.fasta.
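Step 2: Execute RAxML
Run the ML analysis with rapid bootstrapping:
```bash
raxmlHPC -f a -s gene_alignment.fasta -n ml_run -m GTRCAT -p 91827 -x 12345 -N 150
```
Parameters:
- -f a: Rapid bootstrap analysis followed by a search for the best-scoring ML tree in the same run.
- -s: Input alignment file.
- -n: Suffix for output files.
- -m: Substitution model (GTRCAT is GTR with the CAT approximation of among-site rate heterogeneity, a faster stand-in for GTR+Gamma).
- -p: Random seed for parsimony inferences.
- -x: Random seed for rapid bootstrapping.
- -N: Number of bootstrap replicates.
Step 3: Visualize outputs
RAxML produces RAxML_bestTree.ml_run, RAxML_bootstrap.ml_run, and RAxML_bipartitions.ml_run (the best tree annotated with bootstrap support values). These can be opened in GUI applications like FigTree for annotation, rooting, and exporting publication-quality images.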
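For workflows that otherwise live in R, the RAxML run can also be launched from a script. The following is a sketch under stated assumptions: it assembles the arguments for a RAxML 8 rapid-bootstrap run (`-f a` together with a `-x` seed), and only calls the program if an executable named `raxmlHPC` is on the PATH and the alignment file exists. The seed values are arbitrary, and the executable name may differ between installations (for example `raxmlHPC-PTHREADS`).

```r
# Arguments for a rapid-bootstrap run followed by a best-tree search
raxml_args <- c(
  "-f", "a",                     # rapid bootstrapping plus ML search
  "-s", "gene_alignment.fasta",  # input alignment
  "-n", "ml_run",                # suffix for output files
  "-m", "GTRCAT",                # substitution model
  "-p", "91827",                 # seed for parsimony starting trees
  "-x", "12345",                 # seed for rapid bootstrapping (arbitrary)
  "-N", "150"                    # number of bootstrap replicates
)

# Locate the executable; Sys.which returns "" when it is not on the PATH
raxml_bin <- Sys.which("raxmlHPC")

if (nzchar(raxml_bin) && file.exists("gene_alignment.fasta")) {
  exit_status <- system2(raxml_bin, args = raxml_args)
  if (exit_status != 0) {
    warning("RAxML exited with status ", exit_status)
  }
} else {
  message("Skipping run: raxmlHPC or gene_alignment.fasta not found.")
  message("Command would be: raxmlHPC ", paste(raxml_args, collapse = " "))
}
```

Keeping the arguments in a vector makes it easy to record the exact command, including both seeds, alongside the results.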
Visualizing Trees with FigTree
FigTree is a graphical application for rendering and editing phylogenetic trees. After launching the program, import the RAxML output via File > Open. The interface allows users to collapse clades, adjust branch line widths, colorize specific taxa, and annotate node support values. Finished graphics can be exported directly as SVG or PDF for further editing in vector illustration software.