Dementia affects over 55 million people worldwide, and Alzheimer's disease is its most common cause. Yet the precise molecular changes happening inside individual brain cells remain poorly understood. I wanted to dig into that question, not at the tissue level, but at single-cell resolution.
So I built a full scRNA-seq analysis pipeline in Python using Scanpy, working with a publicly available single-nucleus RNA-seq (snRNA-seq) dataset of 63,608 nuclei from human prefrontal cortex tissue (sourced from CZ CELLxGENE). The donors spanned three Braak stages: 0 (no tau tangle pathology, used as the normal control), 2 (early-stage Alzheimer's pathology), and 6 (severe, late-stage Alzheimer's pathology).
Here's what I found and how I found it.
The Dataset
The data came from a study on the molecular characterisation of selectively vulnerable neurons in AD. It covers the superior frontal gyrus, a prefrontal region known to be hit hard by neurodegeneration, and includes seven major brain cell types:
- Glutamatergic neurons
- GABAergic neurons
- Oligodendrocytes
- OPCs (oligodendrocyte precursor cells)
- Astrocytes
- Microglia
- Endothelial cells
31,997 genes. 63,608 cells. Three disease stages. A lot to work with.
The Pipeline
1. Quality Control
No dataset is clean out of the box. I filtered cells to keep only those with between 200 and 6,000 detected genes, and excluded anything with more than 20% mitochondrial gene content (high mitochondrial reads usually signal a dying or damaged cell). This removed 2,809 low-quality cells.
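As a rough illustration of what these thresholds do, here is a plain-NumPy sketch on a small dense cells × genes count matrix. In Scanpy the same filters are typically applied with `sc.pp.calculate_qc_metrics` (given a boolean `mt` column in `adata.var`) and `sc.pp.filter_cells`. The function name `qc_mask` and the `MT-` prefix used to flag human mitochondrial genes are assumptions of this sketch, not code taken from the notebook.

```python
import numpy as np

def qc_mask(counts, gene_names, min_genes=200, max_genes=6000, max_pct_mt=20.0):
    """Hypothetical helper: boolean mask of cells (rows) that pass QC."""
    counts = np.asarray(counts, dtype=float)
    # Number of genes detected (non-zero) in each cell
    n_genes = (counts > 0).sum(axis=1)
    total = counts.sum(axis=1)
    # Assumption: human mitochondrial genes are the ones prefixed "MT-"
    is_mt = np.array([g.startswith("MT-") for g in gene_names])
    mt_counts = counts[:, is_mt].sum(axis=1)
    # Percentage of each cell's counts from mitochondrial genes (0 for empty cells)
    pct_mt = np.divide(100.0 * mt_counts, total,
                       out=np.zeros_like(total), where=total > 0)
    return (n_genes >= min_genes) & (n_genes <= max_genes) & (pct_mt <= max_pct_mt)
```

On real data the defaults match the thresholds described above; the toy test uses smaller values so a four-gene matrix can exercise every branch.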
2. Normalisation
Library sizes were normalised to 10,000 counts per cell, followed by a log1p transformation. This is standard practice and makes cells comparable regardless of how deeply they were sequenced. I then identified 5,607 highly variable genes to focus the downstream analysis.
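In Scanpy this step is usually `sc.pp.normalize_total(adata, target_sum=1e4)` followed by `sc.pp.log1p(adata)`. The arithmetic behind it is simple enough to sketch in NumPy; the function name `normalize_log1p` is hypothetical, and highly-variable-gene selection is not covered here.

```python
import numpy as np

def normalize_log1p(counts, target_sum=1e4):
    """Scale each cell (row) to target_sum total counts, then apply log(1 + x)."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    # Divide every gene count by its cell's total, leaving all-zero cells at 0
    scaled = np.divide(counts * target_sum, totals,
                       out=np.zeros_like(counts), where=totals > 0)
    return np.log1p(scaled)
```

A cell with counts `[1, 3]` and one with `[10, 30]` end up with identical normalised values, which is exactly the "comparable regardless of sequencing depth" property.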
3. Dimensionality Reduction
PCA (50 components) → neighbourhood graph (10 neighbours, 20 PCs) → UMAP embedding.
The UMAP is where the biology starts to become visible. All seven cell types formed distinct clusters, with neuronal subtypes clearly separated from glial populations.
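The first two arrows of that chain correspond to Scanpy calls along the lines of `sc.tl.pca(adata, n_comps=50)` and `sc.pp.neighbors(adata, n_neighbors=10, n_pcs=20)`, with `sc.tl.umap(adata)` last. Below is a simplified NumPy sketch of the idea behind the first two: PCA via SVD of the centred matrix, then each cell's 10 nearest neighbours in the first 20 PCs. It is an assumption-laden illustration, not Scanpy's implementation: it returns plain Euclidean neighbour indices rather than a weighted connectivity graph, and UMAP itself is not reimplemented.

```python
import numpy as np

def pca_scores(X, n_comps=50):
    """Project cells onto the top principal components (SVD of centred data)."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    k = min(n_comps, S.size)
    return U[:, :k] * S[:k]

def knn_indices(scores, n_neighbors=10, n_pcs=20):
    """Indices of each cell's nearest neighbours (Euclidean) in the first n_pcs PCs."""
    Z = scores[:, :n_pcs]
    sq = (Z ** 2).sum(axis=1)
    # Squared pairwise distances: |a|^2 + |b|^2 - 2 a.b
    d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    np.fill_diagonal(d2, np.inf)  # exclude each cell from its own neighbour list
    return np.argsort(d2, axis=1)[:, :n_neighbors]
```

The all-pairs distance matrix is fine for a toy example but would not scale to 60,000+ cells as written.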
4. Differential Expression
For the microglial analysis, I used a Wilcoxon rank-sum test comparing AD vs normal microglia, with Benjamini-Hochberg multiple testing correction to control the false discovery rate.
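In Scanpy this comparison is typically a single call such as `sc.tl.rank_genes_groups(adata, groupby=..., method="wilcoxon", corr_method="benjamini-hochberg")`. To make the two statistical pieces concrete, here is a plain-NumPy sketch: a two-sided rank-sum p-value for one gene using the normal approximation (average ranks for ties, no tie correction of the variance), then Benjamini-Hochberg adjustment across genes. The function names are hypothetical, and p-values may differ slightly from Scanpy's or SciPy's implementations.

```python
import math
import numpy as np

def rank_sum_pvalue(x, y):
    """Two-sided Wilcoxon rank-sum p-value via the normal approximation."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = len(x), len(y)
    combined = np.concatenate([x, y])
    # Ordinal ranks 1..N in sorted order
    order = np.argsort(combined, kind="mergesort")
    ranks = np.empty(len(combined))
    ranks[order] = np.arange(1, len(combined) + 1)
    # Give tied values the average of their ranks
    _, inverse = np.unique(combined, return_inverse=True)
    ranks = (np.bincount(inverse, weights=ranks) / np.bincount(inverse))[inverse]
    # Rank sum of the first group versus its mean and SD under the null
    w = ranks[:n1].sum()
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted
```

Running `rank_sum_pvalue` once per gene (AD microglia vs normal microglia) and passing the resulting p-values to `benjamini_hochberg` gives the adjusted values used to call a gene significant.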
The Findings
Glutamatergic Neurons Are Selectively Depleted
One of the most striking results: glutamatergic (excitatory) neurons dropped from ~34% of cells in normal tissue to ~30% in AD tissue. That might sound like a small shift, but it is a relative decrease of roughly 12%, and it's consistent with what the literature already tells us about the selective vulnerability of excitatory neurons in AD.
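A minimal sketch of how such proportions can be computed from per-cell metadata with pandas. The column names `cell_type` and `condition`, the labels in the test, and both helper names are assumptions, not the dataset's actual field names.

```python
import pandas as pd

def cell_type_proportions(obs, cell_type_col="cell_type", group_col="condition"):
    """Percentage of each cell type within each group (each row sums to 100)."""
    return pd.crosstab(obs[group_col], obs[cell_type_col], normalize="index") * 100

def relative_change(before, after):
    """Relative change in percent, e.g. 34% -> 30% is about -11.8%."""
    return (after - before) / before * 100
```

This pools cells across donors. Because donors can contribute very different numbers of cells, computing proportions per donor before comparing groups is a more conservative variant.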
Alzheimer's Leaves a Clear Signature in Microglia
Microglia are the brain's resident immune cells, and they showed the most dramatic transcriptomic shifts between AD and normal tissue. The differential expression analysis revealed:
Upregulated in AD microglia:
- MALAT1 - a long non-coding RNA linked to neuroinflammation
- FTH1 - ferritin heavy chain, pointing to iron dysregulation
- B2M - beta-2 microglobulin, a known AD biomarker reflecting immune activation
- FOXP1 - a transcription factor tied to microglial activation states
Downregulated in AD microglia:
- MT-CO3, MT-CO1, MT-ATP6, MT-ND2 - mitochondrially encoded subunits of respiratory chain complexes I (MT-ND2), IV (MT-CO1, MT-CO3) and V (MT-ATP6), suggesting impaired energy metabolism in AD-affected microglia
This pattern is consistent with what's described as disease-associated microglia (DAM) in the literature, a distinct activation state that emerges in neurodegeneration.
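The up/down split above is what a volcano plot visualises: fold change on one axis, adjusted significance on the other. Here is a minimal sketch of that classification, assuming a log2 fold-change cut-off of 1 and an FDR threshold of 0.05. Those thresholds and the name `volcano_categories` are illustrative assumptions, not values taken from the notebook.

```python
import numpy as np

def volcano_categories(log2fc, padj, fc_cut=1.0, alpha=0.05):
    """Return -log10(adjusted p) and an 'up'/'down'/'ns' label for each gene."""
    log2fc = np.asarray(log2fc, dtype=float)
    padj = np.asarray(padj, dtype=float)
    significant = padj < alpha
    labels = np.full(log2fc.shape, "ns", dtype=object)
    labels[significant & (log2fc >= fc_cut)] = "up"
    labels[significant & (log2fc <= -fc_cut)] = "down"
    # Clip so that p-values of exactly 0 do not produce infinity
    neglog10 = -np.log10(np.clip(padj, 1e-300, 1.0))
    return neglog10, labels
```

Plotting `neglog10` against `log2fc` and colouring points by label (for example with matplotlib's `scatter`) gives the familiar volcano shape.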
Disease Progression Captured Across Braak Stages
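Cells from all three Braak stages were distributed across every cluster in the UMAP. In other words, the clusters are defined by cell identity rather than disease stage, and every cell type contains cells from each stage. That is consistent with AD-associated transcriptomic changes not being confined to a single cell type but appearing across the whole cellular ecosystem as the disease progresses.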
What I Learned
- Memory management matters. 60K+ cells × 30K+ genes is a big matrix. Working with sparse AnnData objects and being deliberate about which steps you checkpoint to disk makes a real difference.
- Cell type annotation is an art. The dataset came with pre-annotated cell types, but validating them against canonical marker genes (the dotplot step) is essential and satisfying when the biology confirms itself.
- Volcano plots are still one of the most readable ways to communicate differential expression. They give you significance and fold change in one glance.
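To put a number on "a big matrix", here is a back-of-the-envelope sketch comparing dense and sparse (CSR) storage for 63,608 cells × 31,997 genes. It assumes 32-bit floats for values and 32-bit integers for CSR indices, and the 5% non-zero density is a hypothetical figure for illustration, not a measurement from this dataset.

```python
def dense_bytes(n_cells, n_genes, itemsize=4):
    """Memory for a dense matrix, assuming 4-byte (float32) values."""
    return n_cells * n_genes * itemsize

def csr_bytes(n_cells, n_genes, density, value_size=4, index_size=4):
    """Approximate CSR memory: values + column indices per non-zero, plus row pointers."""
    nnz = int(n_cells * n_genes * density)
    return nnz * (value_size + index_size) + (n_cells + 1) * index_size

dense = dense_bytes(63_608, 31_997)
sparse = csr_bytes(63_608, 31_997, density=0.05)  # hypothetical 5% density
print(f"dense: {dense / 1e9:.2f} GB, sparse: {sparse / 1e9:.2f} GB")
```

Under these assumptions the dense matrix needs about 8.1 GB, while the sparse one needs about 0.8 GB.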
The Code
Everything is in a fully annotated Jupyter Notebook. If you want to reproduce the analysis, download the H5AD file from CZ CELLxGENE and drop it in the data/ folder.
Farhan89082/alzheimers-scrna-analysis: Single-cell transcriptomic analysis of Alzheimer's disease using Scanpy - cell-type-specific gene expression in the human prefrontal cortex
If you're working with single-cell data or have questions about the pipeline, I'd love to hear from you in the comments. There's something fascinating about watching biology emerge from a matrix of gene counts.