Home > Tech > Content

Visualizing PCA in R with ggbiplot: sample scatter, group ellipses, and loading vectors

Tech Apr 1 19

Overview

ggbiplot is a small R package that renders PCA results with ggplot2. It works directly with prcomp() or princomp() outputs and can:

Plot samples in the PCA space
Color by groups with optional Normal ellipses
Display variable loading vectors and an optional correlation circle
Produce scree plots of explained variance

The package provides two main functions: ggbiplot() for biplots and ggscreeplot() for variance diagnostics.

Installation and quick start

# Install from GitHub (pick one approach)
# install.packages("devtools"); devtools::install_github("vqv/ggbiplot")
# or
# install.packages("remotes"); remotes::install_github("vqv/ggbiplot")

library(ggbiplot)
library(ggplot2)

# Demo data bundled with ggbiplot
data(wine)              # loads 'wine' and 'wine.class'

# Fit PCA (scale to unit variance so loadings are comparable)
wine_fit <- prcomp(wine, scale. = TRUE)

# Scree plot of explained variance
# (useful to decide how many PCs to inspect)
ggscreeplot(wine_fit, type = "bar", ncomp = 6)

# Biplot with groups, 68% Normal ellipses, and the correlation circle
p <- ggbiplot(
  wine_fit,
  obs.scale  = 1,
  var.scale  = 1,
  groups     = wine.class,
  ellipse    = TRUE,
  circle     = TRUE,
  alpha      = 0.8,
  varname.abbrev = TRUE
) +
  scale_color_discrete(name = NULL) +
  theme(legend.position = "top", legend.direction = "horizontal")

p

# For comparison: a very plain base R scatter of the first two PCs
plot(wine_fit$x[, 1:2], pch = 19, col = as.integer(wine.class))

How to read the biplot

PC1 and PC2 axes show the fraction of total variance explained by each component
Points are samples; color encodes group membership
Ellipses summarize the group distribution under a Normal asssumption (by default, ~68% probability mass)
Arrows are variable loadings; arrow length indicates contribution magnitude and direction indicates correlation with the PCs

Working with microbiome OTU tables

The following example demonstrates PCA on an OTU count matrix with a sample-level design file. Replace file paths with your own.

library(ggbiplot)
library(ggplot2)

# Read study design (rows are samples)
meta <- read.table("design.txt", header = TRUE, row.names = 1, sep = "\t", check.names = FALSE)

# Read OTU/ASV table (rows are taxa/OTUs, columns are samples)
otu_raw <- read.delim("otu_table.txt", header = TRUE, row.names = 1, sep = "\t", check.names = FALSE)

# Align samples present in both tables and preserve a common order
common_ids <- intersect(rownames(meta), colnames(otu_raw))
meta_sub   <- meta[common_ids, , drop = FALSE]
counts     <- otu_raw[, common_ids, drop = FALSE]

# PCA on samples: transpose so samples are rows
# Scale = TRUE treats all OTUs on a comparable scale (useful when counts differ widely)
ota_fit <- prcomp(t(counts), scale. = TRUE)

# Group by a design factor (e.g., genotype)
# Replace 'genotype' with the column you want to use
b <- ggbiplot(
  ota_fit,
  groups     = meta_sub$genotype,
  ellipse    = TRUE,
  obs.scale  = 1,
  var.scale  = 1,
  var.axes   = FALSE,
  alpha      = 0.7
) +
  scale_color_brewer(palette = "Set1", name = "Group") +
  theme_minimal() +
  theme(legend.position = "top")

b

If groups are well separated along PC1/PC2, that indicates strong between-group structure in the data.

Focusing on high-variance taxa and showing correlations

To highlight taxa that drive separation, subset to the most variable features and re-run the PCA with loading vectors and the correlation circle.

# Convert counts to column-wise percentages (relative abundance)
rel_abund <- sweep(counts, 2, colSums(counts, na.rm = TRUE), FUN = "/") * 100

# Option A: keep taxa with MAD > 0.5
mad_vals <- apply(rel_abund, 1, mad, na.rm = TRUE)
high_var_taxa <- rel_abund[mad_vals > 0.5, , drop = FALSE]

# Option B: take the top N most variable taxa by MAD
N <- 6
ord <- order(mad_vals, decreasing = TRUE)
high_var_topN <- rel_abund[ord[seq_len(min(N, length(ord)))], , drop = FALSE]

# Choose one of the subsets to proceed (here, top N)
core_mat <- high_var_topN

# PCA on the subset (transpose so samples are rows)
core_fit <- prcomp(t(core_mat), scale. = TRUE)

# Biplot with loading vectors and correlation circle enabled
c <- ggbiplot(
  core_fit,
  groups        = meta_sub$genotype,
  ellipse       = TRUE,
  var.axes      = TRUE,
  circle        = TRUE,
  obs.scale     = 1,
  var.scale     = 1,
  varname.size  = 3,
  varname.abbrev = FALSE
) +
  scale_color_brewer(palette = "Dark2", name = "Group") +
  theme_minimal() +
  theme(legend.position = "top")

c

Arrows pointing in similar directions indicate taxa positively correlated with each other and with that PC; arrows pointing in opposite directions suggest negative correlation. Longer arrows denote stronger contributions to the separation captured by the plotted PCs.

ggbiplot function arguments (quick reference)

pcobj: output of prcomp() or princomp()
choices: which PCs to plot (default c(1, 2))
scale: 1 for covariance biplot, 0 for form biplot; affects how observations and variables are scaled
obs.scale: scaling applied to the observation scores (samples)
var.scale: scaling applied to variable loadings (arrows)
pc.biplot: match biplot.princomp() conventions if TRUE
groups: factor (or vector coercible to factor) for coloring samples
ellipse: draw Normal probability ellipses per group when TRUE
ellipse.prob: probability mass for ellipses (default 0.68)
labels: optional character vector for sample labels
labels.size: size of sample labels
alpha: point transparency for samples (0 transparent, 1 opaque)
var.axes: draw variable loading vectors (arrows) when TRUE
circle: draw the unit correlation circle; meaningful when prcomp(scale.=TRUE) and var.scale = 1
circle.prob: probability used to compute the circle radius (default ≈ 0.69)
varname.size: text size for variable names
varname.adjust: nudges labels away from arrow tips (≥ 1 pushes labels farther)
varname.abbrev: abbreviate varible names when TRUE
...: additional ggplot2 layer arguments (e.g., color scales, themes)

Tags: R PCA

Back to List

Prev: Configure Samba to Share Users’ Home Directories

Next: Deploying Snort with Barnyard2 and BASE on Ubuntu Linux

Fading Coder

Visualizing PCA in R with ggbiplot: sample scatter, group ellipses, and loading vectors

Overview

Installation and quick start

How to read the biplot

Working with microbiome OTU tables

Focusing on high-variance taxa and showing correlations

ggbiplot function arguments (quick reference)

Related Articles

Understanding Strong and Weak References in Java

Comprehensive Guide to SSTI Explained with Payload Bypass Techniques

Implement Image Upload Functionality for Django Integrated TinyMCE Editor

Leave a Comment

Copyright © fadingcoder.top

Fading Coder

Visualizing PCA in R with ggbiplot: sample scatter, group ellipses, and loading vectors

Overview

Installation and quick start

How to read the biplot

Working with microbiome OTU tables

Focusing on high-variance taxa and showing correlations

ggbiplot function arguments (quick reference)

Related Articles

Understanding Strong and Weak References in Java

Comprehensive Guide to SSTI Explained with Payload Bypass Techniques

Implement Image Upload Functionality for Django Integrated TinyMCE Editor

Leave a CommentCancel Reply

Copyright © fadingcoder.top

Leave a Comment