<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: StiiWann</title>
    <description>The latest articles on DEV Community by StiiWann (@stiiwann_35eb8bb2cf8dc53e).</description>
    <link>https://dev.to/stiiwann_35eb8bb2cf8dc53e</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3941032%2F00c211bf-b3b5-4a26-8e2e-9eec25965287.jpg</url>
      <title>DEV Community: StiiWann</title>
      <link>https://dev.to/stiiwann_35eb8bb2cf8dc53e</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/stiiwann_35eb8bb2cf8dc53e"/>
    <language>en</language>
    <item>
      <title>Fentanyl Poverty: Building a Big Data Pipeline to Map America's Overdose Epidemic</title>
      <dc:creator>StiiWann</dc:creator>
      <pubDate>Tue, 19 May 2026 20:53:49 +0000</pubDate>
      <link>https://dev.to/stiiwann_35eb8bb2cf8dc53e/fentanyl-x-poverty-building-a-big-data-pipeline-to-map-americas-overdose-epidemic-5dhm</link>
      <guid>https://dev.to/stiiwann_35eb8bb2cf8dc53e/fentanyl-x-poverty-building-a-big-data-pipeline-to-map-americas-overdose-epidemic-5dhm</guid>
      <description>&lt;p&gt;The United States is in the grip of an opioid crisis. Between 2019 and 2023, fentanyl-related overdose deaths skyrocketed, but the impact is not uniform across the country. Are the hardest-hit states also the poorest? We built a full Big Data pipeline to answer that question.&lt;/p&gt;

&lt;h2&gt;The Data&lt;/h2&gt;

&lt;p&gt;We combined two official U.S. government sources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CDC VSRR&lt;/strong&gt; (Vital Statistics Rapid Release) — state-level fentanyl overdose 
deaths per 12-month rolling period, from 2015 to 2023&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;U.S. Census Bureau ACS 5-Year&lt;/strong&gt; — median household income, poverty rate, 
and unemployment rate for all 50 states + D.C.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;The Architecture&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CDC API ──┐
          ├── Apache Spark ── Elasticsearch ── Kibana Dashboard
Census ───┘        │
                   └── scikit-learn (ML)
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;h3&gt;Step 1 — Ingestion&lt;/h3&gt;

&lt;p&gt;Python scripts fetch both datasets via REST APIs and land them in a raw data lake (&lt;code&gt;data/raw/&lt;/code&gt;), with UTC timestamps for traceability.&lt;/p&gt;
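
&lt;p&gt;As a rough illustration of the landing step, the sketch below writes one API response to the raw zone under a UTC-stamped file name. The function names, the file-naming scheme, and the JSON format are assumptions made for this example, not details taken from the project's code.&lt;/p&gt;

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def raw_path(raw_dir, source, now=None):
    """Build a UTC-stamped path, e.g. data/raw/cdc_20260519T205349Z.json.

    The naming scheme is hypothetical; `now` should be timezone-aware.
    """
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    return Path(raw_dir) / f"{source}_{stamp}.json"


def land_raw(records, source, raw_dir="data/raw", now=None):
    """Write one decoded API response to the raw zone and return its path."""
    path = raw_path(raw_dir, source, now)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(records), encoding="utf-8")
    return path
```

&lt;p&gt;In a real run, &lt;code&gt;records&lt;/code&gt; would be the decoded JSON from the HTTP client, for example &lt;code&gt;requests.get(url, timeout=30).json()&lt;/code&gt;. Keeping the timestamp in the file name means each fetch is preserved rather than overwritten.&lt;/p&gt;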

&lt;h3&gt;Step 2 — Spark Processing&lt;/h3&gt;

&lt;p&gt;Apache Spark formats and combines both sources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Filters for fentanyl-specific death indicators&lt;/li&gt;
&lt;li&gt;Joins CDC deaths with Census socioeconomic data by state and year&lt;/li&gt;
&lt;li&gt;Computes a &lt;strong&gt;risk score&lt;/strong&gt;: 40% poverty + 30% unemployment + 30% inverse income&lt;/li&gt;
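&lt;li&gt;Outputs Parquet files via PyArrow (Windows-compatible, no winutils needed)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Step 3 — Machine Learning&lt;/h3&gt;

&lt;p&gt;Two scikit-learn models add predictive power:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Linear Regression&lt;/strong&gt; predicts deaths from socioeconomic features. Result: R² = 0.066, meaning the features explain under 7% of the variance in deaths, so poverty alone does not explain deaths linearly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;K-Means Clustering (k=3)&lt;/strong&gt; groups states into risk profiles:&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Cluster&lt;/th&gt;&lt;th&gt;States&lt;/th&gt;&lt;th&gt;Avg Deaths/Year&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;LOW_RISK&lt;/td&gt;&lt;td&gt;21&lt;/td&gt;&lt;td&gt;~415&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;MEDIUM_RISK&lt;/td&gt;&lt;td&gt;17&lt;/td&gt;&lt;td&gt;~1,200&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;HIGH_RISK&lt;/td&gt;&lt;td&gt;11&lt;/td&gt;&lt;td&gt;~3,800+&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Ohio, Pennsylvania, West Virginia and California consistently appear in the HIGH_RISK cluster, driven by both high absolute populations and deep socioeconomic distress.&lt;/p&gt;

&lt;h3&gt;Step 4 — Elasticsearch + Kibana&lt;/h3&gt;

&lt;p&gt;All 637 documents, spread across 3 indices, are indexed on &lt;strong&gt;Elastic Cloud Serverless&lt;/strong&gt;. Each state carries &lt;code&gt;geo_point&lt;/code&gt; coordinates, which enable map visualizations.&lt;/p&gt;

&lt;p&gt;The Kibana dashboard includes:&lt;/p&gt;

&lt;ul&gt;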
&lt;li&gt;🗺️ &lt;strong&gt;Map&lt;/strong&gt; — deaths per state with bubble sizing&lt;/li&gt;
&lt;li&gt;📈 &lt;strong&gt;Timeline&lt;/strong&gt; — the 2019-2021 explosion visible at a glance&lt;/li&gt;
&lt;li&gt;💰 &lt;strong&gt;Scatter plot&lt;/strong&gt; — income vs. deaths (weak but visible inverse trend)&lt;/li&gt;
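&lt;li&gt;🏷️ &lt;strong&gt;Risk table&lt;/strong&gt; — all 50 states plus D.C. ranked by risk score and cluster&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Key Finding&lt;/h2&gt;

&lt;p&gt;The correlation between poverty and fentanyl deaths is &lt;strong&gt;positive but weak&lt;/strong&gt; (Pearson r ≈ 0.04 for poverty rate, r ≈ 0.36 for unemployment rate). This suggests the epidemic crosses socioeconomic lines, although unemployment is a stronger signal than the poverty rate itself. The K-Means clustering is more revealing: states with &lt;em&gt;combined&lt;/em&gt; economic distress and large populations form the HIGH_RISK cluster.&lt;/p&gt;

&lt;h2&gt;The Stack&lt;/h2&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Component&lt;/th&gt;&lt;th&gt;Tool&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Ingestion&lt;/td&gt;&lt;td&gt;Python + Requests&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Processing&lt;/td&gt;&lt;td&gt;Apache Spark 3.5 + PyArrow&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;ML&lt;/td&gt;&lt;td&gt;scikit-learn (LinearRegression, KMeans)&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Storage&lt;/td&gt;&lt;td&gt;Parquet (datalake) + Elasticsearch 8.13&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Visualization&lt;/td&gt;&lt;td&gt;Kibana on Elastic Cloud&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Orchestration&lt;/td&gt;&lt;td&gt;Apache Airflow (daily DAG)&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Version control&lt;/td&gt;&lt;td&gt;GitHub&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;h2&gt;Try It Yourself&lt;/h2&gt;

&lt;p&gt;The full pipeline is open source: 👉 &lt;a href="https://github.com/tristandaniel8/fentanyl-poverty-epidemic" rel="noopener noreferrer"&gt;github.com/tristandaniel8/fentanyl-poverty-epidemic&lt;/a&gt;&lt;/p&gt;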



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
git clone https://github.com/tristandaniel8/fentanyl-poverty-epidemic
cd fentanyl-poverty-epidemic
pip install -r requirements.txt
python pipeline.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
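
&lt;p&gt;For readers who want to experiment before cloning the repository, the risk score from Step 2 (40% poverty + 30% unemployment + 30% inverse income) can be sketched in plain Python. The post does not say how the three indicators are put on a common scale, so the min-max normalization across states used here is an assumption, as are the function names. The sample figures are invented for illustration and do not come from the ACS data.&lt;/p&gt;

```python
def min_max(values):
    """Rescale numbers to the range 0..1 (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All states identical on this indicator: no spread to rank on.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def risk_scores(poverty, unemployment, income):
    """Score per state: 40% poverty, 30% unemployment, 30% inverse income."""
    pov = min_max(poverty)
    unemp = min_max(unemployment)
    # Inverse income: the lowest-income state gets 1.0, the highest 0.0.
    inv_inc = [1.0 - v for v in min_max(income)]
    return [
        round(0.4 * p + 0.3 * u + 0.3 * i, 3)
        for p, u, i in zip(pov, unemp, inv_inc)
    ]


# Illustrative numbers only, one entry per hypothetical state.
poverty = [10.0, 15.0, 20.0]          # poverty rate, percent
unemployment = [4.0, 5.0, 6.0]        # unemployment rate, percent
income = [80000.0, 60000.0, 40000.0]  # median household income, USD
print(risk_scores(poverty, unemployment, income))  # prints [0.0, 0.5, 1.0]
```

&lt;p&gt;With this scaling the score always falls between 0 and 1, which makes it easy to rank states, but it also means the score is relative to the other states in the same batch rather than an absolute measure.&lt;/p&gt;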

</description>
      <category>bigdata</category>
      <category>elasticsearch</category>
      <category>spark</category>
      <category>python</category>
    </item>
  </channel>
</rss>
