Uncovering the Ancient Origins of Obesity: Neanderthals' Surprising "Fat Factories" 125,000 Years Ago
Spoiler: This isn’t a history lesson; it’s a hands-on bioinformatics adventure. We’re going to use public genomic resources (and simulated data where live access is impractical) to explore how Neanderthal DNA might influence modern human metabolism, including fat storage. You’ll learn the basic steps of analyzing ancient DNA, comparing genomes, and identifying obesity-linked variants, all in Python.
Let’s dig into the science behind the headlines.
🧬 Why This Matters
Recent studies (like those from the Max Planck Institute) suggest that some modern humans carry Neanderthal gene variants linked to increased fat storage. These “thrifty genes” may have helped Neanderthals survive Ice Age winters, but today they may contribute to obesity.
We’ll use public genomic datasets to:
- Download Neanderthal and modern human genomes
- Identify key SNPs (genetic variants)
- Cross-reference with known obesity-related genes
- Visualize the results
No prior genomics experience? No problem.
🛠️ Tools We’ll Use
- Python 3 (with pandas, requests, matplotlib)
- UCSC Genome Browser API (for genomic data)
- dbSNP & GWAS Catalog (for disease-linked variants)
- Jupyter Notebook (recommended)
Install dependencies:
pip install pandas requests matplotlib
Step 1: Fetch Neanderthal Genome Data
We’ll use the publicly available Altai Neanderthal genome (sequenced in 2013). We can access it via the UCSC Genome Browser’s API.
import requests
import pandas as pd

# Query the UCSC DAS endpoint for the DNA sequence near the PPARG gene (key in fat regulation).
# The genome name "neandertal1" is the one this tutorial assumes; an unrecognized name returns an error.
def fetch_genome_data(chromosome, start, end, genome="neandertal1"):
    url = f"http://genome.ucsc.edu/cgi-bin/das/{genome}/dna"
    params = {'segment': f'chr{chromosome}:{start},{end}'}
    try:
        # A timeout keeps the notebook from hanging if the server is slow
        response = requests.get(url, params=params, timeout=30)
    except requests.RequestException as exc:
        print(f"Request failed: {exc}")
        return None
    if response.status_code == 200:
        return response.text
    print(f"Error: {response.status_code}")
    return None

# Example: Get region around PPARG (chromosome 3, position ~12,400,000)
data = fetch_genome_data(3, 12400000, 12401000)
if data is not None:
    print(data[:500])  # Preview first 500 chars
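DAS servers return XML rather than a bare sequence, so the preview printed by the example is markup. Here is a minimal sketch of pulling out the bases, assuming the response follows the usual DASDNA layout (a `DNA` element nested inside `SEQUENCE`). The helper name `extract_sequence` and the sample document are hypothetical, written by hand for illustration:

```python
import xml.etree.ElementTree as ET

def extract_sequence(das_xml):
    """Return the DNA string from a DAS 'dna' response, or None if absent.

    Assumes the XML contains a <DNA> element holding the bases,
    possibly wrapped across several lines.
    """
    root = ET.fromstring(das_xml)
    dna = root.find(".//DNA")
    if dna is None or dna.text is None:
        return None
    # Join the wrapped lines and normalize to uppercase
    return "".join(dna.text.split()).upper()

# Hand-written sample in the assumed layout (not real Neanderthal data)
sample = """<DASDNA>
  <SEQUENCE id="chr3" start="12400000" stop="12400011" version="1.00">
    <DNA length="12">
acgtac
gtacgt
    </DNA>
  </SEQUENCE>
</DASDNA>"""

sequence = extract_sequence(sample)
print(sequence)
```

With a live response, the same call would be `extract_sequence(data)`, guarded by a check that `data` is not `None`.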
🔍 Note: The UCSC DAS server may be slow. For this tutorial, we’ll simulate data if needed.
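If the request fails or times out, a stand-in sequence keeps the rest of the walkthrough runnable. This is a minimal sketch, assuming a uniformly random sequence is good enough for exercising the code (it carries no biological meaning). `simulate_sequence` and the seed value are hypothetical choices:

```python
import random

def simulate_sequence(length, seed=42):
    """Generate a reproducible random DNA string of the given length."""
    rng = random.Random(seed)  # local generator, so global random state is untouched
    return "".join(rng.choice("ACGT") for _ in range(length))

# Same window as the PPARG example, assuming inclusive start and end coordinates
fallback = simulate_sequence(12401000 - 12400000 + 1)
print(len(fallback))   # 1001
print(fallback[:50])
```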
Step 2: Load Modern Human Variants (dbSNP)
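We’ll compare Neanderthal DNA to known human SNPs. A real analysis would pull these from dbSNP; to keep things simple, we’ll build a small table by hand. Treat the positions and annotations below as simulated placeholders, not curated dbSNP records.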
# Simulate a small dataset of obesity-linked SNPs
obesity_snps = pd.DataFrame({
    'rsID': ['rs1801282', 'rs3856806', 'rs4684847'],
    'gene': ['PPARG', 'PPARG', 'BSX'],
    'chromosome': [3, 3, 1],
    'position': [12405680, 12407980, 20112345],
    'effect_allele': ['C', 'T', 'A'],
    'effect': ['increased fat storage', 'insulin resistance', 'appetite regulation']
})
print(obesity_snps)
Output:
        rsID   gene  chromosome  position  effect_allele                 effect
0  rs1801282  PPARG           3  12405680              C  increased fat storage
1  rs3856806  PPARG           3  12407980              T     insulin resistance
2  rs4684847    BSX           1  20112345              A    appetite regulation
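In practice the SNP table would more likely live in a file than in the notebook. Below is a sketch of reading the same three rows from CSV text with the standard library's `csv` module. The inline string stands in for a hypothetical file, `load_snps` is a hypothetical helper, and the values are the simulated ones from this step, not a dbSNP export:

```python
import csv
import io

# Stand-in for a CSV file on disk; same simulated rows as the table above
csv_text = """rsID,gene,chromosome,position,effect_allele,effect
rs1801282,PPARG,3,12405680,C,increased fat storage
rs3856806,PPARG,3,12407980,T,insulin resistance
rs4684847,BSX,1,20112345,A,appetite regulation
"""

def load_snps(handle):
    """Read SNP rows from a file-like object, converting numeric fields to int."""
    rows = []
    for row in csv.DictReader(handle):
        row["chromosome"] = int(row["chromosome"])
        row["position"] = int(row["position"])
        rows.append(row)
    return rows

snps = load_snps(io.StringIO(csv_text))
for snp in snps:
    print(snp["rsID"], snp["gene"], snp["position"])
```

Passing the resulting list of dicts to `pd.DataFrame(snps)` gives a frame with the same columns as `obesity_snps`. With a real file, `pd.read_csv` would do both steps at once.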
Step 3: Simulate Neanderthal Genotype Matching
We don’t have direct SNP calls from the API, so let’s simulate what we’d do with real alignment data.
Assume we’ve aligned Neanderthal reads and found:
# Simulated Neanderthal genotype calls
neanderthal_calls = {
    'rs1801282': 'C/C',  # Homozygous for 'C', the risk allele
    'rs3856806': 'T/T',
    'rs4684847': 'A/A'
}
# Add to our dataframe
obesity_snps['neanderthal_genotype'] = obesity_snps['rsID'].map(neanderthal_calls)
print(obesity_snps)
Now we see:
        rsID   gene  chromosome  position  effect_allele                 effect  neanderthal_genotype
0  rs1801282  PPARG           3  12405680              C  increased fat storage                   C/C
1  rs3856806  PPARG           3  12407980              T     insulin resistance                   T/T
2  rs4684847    BSX           1  20112345              A    appetite regulation                   A/A
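Once genotypes sit next to the effect alleles, the comparison itself is a count: how many copies of the effect allele does each genotype carry? Here is a minimal sketch on plain Python values, assuming genotypes are written as two alleles separated by `/` as in the simulated calls. `effect_allele_count` is a hypothetical helper:

```python
def effect_allele_count(genotype, effect_allele):
    """Count copies of effect_allele in a genotype string such as 'C/T'.

    Returns None for missing calls so they are not mistaken for zero copies.
    """
    if not isinstance(genotype, str) or not genotype:
        return None
    alleles = genotype.split("/")
    return sum(1 for allele in alleles if allele == effect_allele)

# Simulated calls and effect alleles from the steps above
calls = {"rs1801282": "C/C", "rs3856806": "T/T", "rs4684847": "A/A"}
effect_alleles = {"rs1801282": "C", "rs3856806": "T", "rs4684847": "A"}

for rsid, genotype in calls.items():
    copies = effect_allele_count(genotype, effect_alleles[rsid])
    print(rsid, genotype, copies)
```

On the dataframe, the same function can be applied row by row, for example with `obesity_snps.apply(lambda r: effect_allele_count(r['neanderthal_genotype'], r['effect_allele']), axis=1)`.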