When studying how environmental factors shape biodiversity, correlation analysis is usually the first stop. But which method should you use — Pearson, Spearman, or Kendall? Pick wrong and you might draw misleading conclusions.
This tutorial walks through the complete workflow using real-world ecological data (50 sites × 12 variables), covering method selection, correlation computation, significance testing, and corrplot visualization at publication quality.
Quick Reference: Which Method to Use?
| Method | Normality Required | Relationship | Outlier Sensitivity | Best For |
|---|---|---|---|---|
| Pearson | Yes | Linear | Sensitive | Normally distributed continuous data |
| Spearman | No | Monotonic | Robust | Default for ecological data ✅ |
| Kendall | No | Monotonic | Robust | Small samples (n < 30) with many ties |
Rule of thumb: Run Shapiro-Wilk first. If all variables are normal → Pearson. Otherwise → Spearman (most common in ecology).
Data Overview
The dataset contains 50 sampling sites with 12 variables:
Environmental factors (6): pH, TN (total nitrogen), TP (total phosphorus), SOM (soil organic matter), MAP (mean annual precipitation), MAT (mean annual temperature)
Diversity indices (6): PD (phylogenetic diversity), NRI (net relatedness index), NTI (nearest taxon index), Shannon-Wiener index, Simpson index, S (species richness)
# Load data
df <- read.csv("alpha_diversity.csv", header = TRUE)
head(df)
The data file (alpha_diversity.csv) contains 50 rows and 12 numeric columns — one row per sampling site, one column per variable.
Step 1: Normality Test
Before choosing a correlation method, test whether your data follows a normal distribution:
# Shapiro-Wilk normality test for all variables
p_values <- sapply(df, function(x) shapiro.test(x)$p.value)
norm_table <- data.frame(
Variable = names(df),
p_value = round(p_values, 4),
Normal = ifelse(p_values > 0.05, "Yes", "No")
)
print(norm_table)
In our dataset, most environmental factors and diversity indices have p < 0.05 (non-normal). This means Spearman is the appropriate choice.
Step 2: Calculate Correlation Coefficients
R's cor() function computes all three methods in one line:
# All three methods at once
pearson_r <- cor(df, method = "pearson")
spearman_r <- cor(df, method = "spearman")
kendall_r <- cor(df, method = "kendall")
The sign and relative magnitude are broadly consistent across methods. When a relationship is significant under Spearman but not Pearson, it often indicates a non-linear monotonic relationship — exactly where Spearman excels.
Step 3: Get Coefficients and p-values Together
cor() only returns the correlation matrix. For significance tests, use rcorr() from the Hmisc package:
library(Hmisc)
res <- rcorr(as.matrix(df), type = "spearman")
res$r # Correlation coefficients
res$P # p-values
Example interpretation (pH vs. Shannon):
- Correlation coefficient r = 0.54
- p < 0.001 (highly significant)
- Interpretation: Higher pH is associated with higher Shannon diversity — soil acidity is an important driver of community diversity.
# Export results
write.csv(round(res$r, 3), "spearman_corr_coef.csv")
write.csv(round(res$P, 4), "spearman_p_values.csv")
Step 4: corrplot Visualization
Let's build up from a basic plot to a publication-ready figure. All examples use corrplot.
4.1 Basic corrplot — One Line
library(corrplot)
corrplot(res$r)
This gives you a colored circle matrix out of the box — blue for positive, red for negative, color intensity reflects strength. It's useful for a quick look but not publication-ready (no labels, no sorting).
4.2 Custom Colors + Upper/Lower Triangular Mix
Three improvements:
-
order = "AOE"— auto-cluster by angular order of eigenvectors - Upper triangle: colored circles; Lower triangle: numbers
- Custom blue-white-red color palette
corrplot(res$r, order = "AOE", type = "upper", tl.pos = "d",
tl.col = "black", tl.cex = 0.8,
col = colorRampPalette(c("#0072B5", "white", "#BC3C29"))(200))
corrplot(res$r, add = TRUE, type = "lower", method = "number",
order = "AOE", diag = FALSE, tl.pos = "n", cl.pos = "n",
col = "black", number.cex = 0.7)
Key parameters:
-
order = "AOE": Reorders variables to reveal clustered structure -
add = TRUE: Overlays the second layer (numbers) on top of the first (circles) -
tl.pos = "d": Places variable labels on the diagonal to avoid overlap
4.3 Publication Standard: Significance Star Annotation ⭐
This is the format most commonly seen in peer-reviewed papers — all correlations shown, significant ones marked with stars (* p < 0.05, ** p < 0.01, *** p < 0.001):
p_mat <- res$P
diag(p_mat) <- 1
order_aoe <- corrMatOrder(res$r, order = "AOE")
corrplot(res$r[order_aoe, order_aoe], type = "upper", tl.pos = "d",
tl.col = "black", tl.cex = 0.8,
col = colorRampPalette(c("#0072B5", "white", "#BC3C29"))(200),
p.mat = p_mat[order_aoe, order_aoe],
sig.level = c(0.001, 0.01, 0.05),
insig = "label_sig",
pch.cex = 1.0, pch.col = "black")
corrplot(res$r[order_aoe, order_aoe], add = TRUE, type = "lower",
method = "number", diag = FALSE, tl.pos = "n", cl.pos = "n",
col = "black", number.cex = 0.7)
Critical details:
-
sig.levelmust be in ascending order (0.001, 0.01, 0.05) -
insig = "label_sig"annotates significant correlations with stars -
p.matmust use the same AOE ordering as the correlation matrix to keep upper and lower triangles aligned - Pre-compute
order_aoewithcorrMatOrder()and apply it to bothres$randp_mat
Step 5: Reading the Results
The corrplot reveals clear patterns in environment–diversity relationships:
Environment vs. Diversity:
- pH positively correlated with Shannon (r = 0.54, p < 0.001), Simpson (r = 0.46, p < 0.001), and S (r = 0.44, p < 0.01) — near-neutral to alkaline soils support higher diversity
- TN shows consistent positive correlations with diversity indices (r ≈ 0.43–0.53) — nitrogen availability is another key driver
- SOM only weakly correlates with PD (r = 0.32, p < 0.05)
Among diversity indices:
- Shannon and Simpson are strongly correlated (r = 0.85) — they capture overlapping information
- Shannon and S also highly correlated (r = 0.74) — Shannon is inherently influenced by richness
- NRI and NTI moderately correlated (r = 0.53) — both reflect phylogenetic structure
Method note: Comparing Pearson and Spearman results, most significant relationships agree in direction, but Spearman's rank correlation is more robust when normality assumptions are violated. Spearman is recommended as the default in ecological studies.
Complete Runnable Script
# ============================================================
# Complete Correlation Analysis: Pearson / Spearman / Kendall
# ============================================================
# First run: install.packages(c("Hmisc", "corrplot"))
library(Hmisc)
library(corrplot)
# --- Load data ---
df <- read.csv("alpha_diversity.csv", header = TRUE)
# --- Normality test ---
p_vals <- sapply(df, function(x) shapiro.test(x)$p.value)
data.frame(Variable = names(df), p_value = round(p_vals, 4),
Normal = ifelse(p_vals > 0.05, "Yes", "No"))
# --- Three correlation methods ---
cor(df, method = "pearson")
cor(df, method = "spearman")
cor(df, method = "kendall")
# --- Spearman with p-values ---
res <- rcorr(as.matrix(df), type = "spearman")
res$r
res$P
# --- Visualization ---
# 1) Basic
pdf("corrplot_basic.pdf", width = 10, height = 8)
corrplot(res$r)
dev.off()
# 2) Custom colors + upper/lower mix
pdf("corrplot_color.pdf", width = 10, height = 8)
corrplot(res$r, order = "AOE", type = "upper", tl.pos = "d",
tl.col = "black", tl.cex = 0.8,
col = colorRampPalette(c("#0072B5", "white", "#BC3C29"))(200))
corrplot(res$r, add = TRUE, type = "lower", method = "number",
order = "AOE", diag = FALSE, tl.pos = "n", cl.pos = "n",
col = "black", number.cex = 0.7)
dev.off()
# 3) Publication standard with significance stars
p_mat <- res$P
diag(p_mat) <- 1
order_aoe <- corrMatOrder(res$r, order = "AOE")
pdf("corrplot_significance.pdf", width = 10, height = 8)
corrplot(res$r[order_aoe, order_aoe], type = "upper", tl.pos = "d",
tl.col = "black", tl.cex = 0.8,
col = colorRampPalette(c("#0072B5", "white", "#BC3C29"))(200),
p.mat = p_mat[order_aoe, order_aoe],
sig.level = c(0.001, 0.01, 0.05),
insig = "label_sig",
pch.cex = 1.0, pch.col = "black")
corrplot(res$r[order_aoe, order_aoe], add = TRUE, type = "lower",
method = "number", diag = FALSE, tl.pos = "n", cl.pos = "n",
col = "black", number.cex = 0.7)
dev.off()
To use with your own data: Prepare a CSV file where each column is a numeric variable and each row is an observation. Replace "alpha_diversity.csv" in the read.csv() call with your file name.
Key Takeaways
- Check normality first — Shapiro-Wilk test tells you which correlation method to use
- Spearman is the safe default for ecological and environmental data
-
corrplot with
order = "AOE"reveals clustered structure automatically - Significance star annotations make your figure publication-ready
- Correlation ≠ causation — even r = 0.9 doesn't mean A causes B
- Multiple testing — with 12 variables (66 pairs), some significant results may be random. Consider Bonferroni or FDR correction for strict inference
Need help with R data analysis? I provide data analysis, visualization, and publication-ready figure services.
Follow me on Dev.to for more R tutorials. GitHub and project links available on request.
Let me know in the comments if you have questions or want the example dataset!




Top comments (0)