DEV Community

Cover image for Building a Food Price & Inflation Analysis Pipeline in Kenya (2006–2024)
Nick Kahihu
Nick Kahihu

Posted on

Building a Food Price & Inflation Analysis Pipeline in Kenya (2006–2024)

Kenyan households face soaring food prices and inflation. By building a robust ETL pipeline, dimensional model and visual dashboards, I analyzed how inflation impacts staple food prices across Kenyan markets from 2006 to 2024.

Why This Project Matters

  • Food consistently makes up 30–40% of Kenya’s consumer basket, driving inflation volatility.  
  • By analyzing monthly market data, we uncover inflation’s impacts on food security and policy effectiveness.

Project Overview

Full project code on Github

Tech stack:

  • Python
  • Pandas
  • PostgreSQL (star schema)
  • Apache Airflow
  • Grafana

Data sources:

  1. Food prices: HDX CSV from WFP/Humanitarian Data Exchange
  2. Inflation: World Bank CPI API for Kenya (2006–2024)

Star Schema:

The cleaned data is structured using a star schema to enable efficient querying and dashboarding. At its center is the fact_prices table, which records transactional price data and connects to dimension tables for time, market and commodity information.

Star schema for the food prices dataset

ETL Pipeline Overview

a. Extraction: Collect CSV (food prices, with daily / monthly
sampling) and JSON from World Bank (CPI series).

Extraction image for food prices

Extraction image for CPI data

b. Transformation: Clean dates, normalize numeric types, drop nulls
and dedupe.

Extraction image for food prices

c. Schema model: Build dimension and fact tables using Python + psycopg2.

Schema model for food prices

d. Automation: Used Airflow (daily runs) with DAGs etl_food_prices.py and etl_cpi.py.

e. Loading: Tables are loaded into capstone schema in PostgreSQL.

Loading image for the fact and dimension tables

Visualizing in Grafana

By connecting Grafana to PostgreSQL, I crafted multiple dashboards using SQL:

a. Yearly CPI vs Food Prices (2006–2024)

SELECT
  TO_TIMESTAMP(d.year::TEXT,'YYYY') AS time,
  ROUND(AVG(f.price::NUMERIC),2) AS avg_food_price,
  c.cpi::NUMERIC AS cpi
FROM capstone.fact_food_prices f
JOIN capstone.date_dim d ON f.date_key = d.date_key
JOIN capstone.fact_cpi c ON c.year = d.year
WHERE d.year::INT BETWEEN 2006 AND 2024
GROUP BY d.year, c.cpi
ORDER BY d.year;
Enter fullscreen mode Exit fullscreen mode

b. Monthly Price Seasonality (All Years)

SELECT
  d.month,
  ROUND(AVG(f.price::NUMERIC),2) AS avg_price
FROM capstone.fact_food_prices f
JOIN capstone.date_dim d ON f.date_key = d.date_key
GROUP BY d.month
ORDER BY d.month::INT;
Enter fullscreen mode Exit fullscreen mode

c. Top 10 Most Expensive Commodities (2006–2024)

SELECT
  c.commodity,
  ROUND(AVG(f.price::NUMERIC),2) AS avg_price
FROM capstone.fact_food_prices f
JOIN capstone.commodity_dim c ON f.commodity_id = c.commodity_id
JOIN capstone.date_dim d ON f.date_key = d.date_key
WHERE d.year::INT BETWEEN 2006 AND 2024
GROUP BY c.commodity
ORDER BY avg_price DESC
LIMIT 10;
Enter fullscreen mode Exit fullscreen mode

d. Market Heatmap (requires latitude/longitude)

SELECT
  m.market, m.latitude, m.longitude,
  ROUND(AVG(f.price::NUMERIC),2) AS avg_price
FROM capstone.fact_food_prices f
JOIN capstone.market_dim m ON f.market_id = m.market_id
JOIN capstone.date_dim d ON f.date_key = d.date_key
WHERE d.year::INT BETWEEN 2006 AND 2024
GROUP BY m.market, m.latitude, m.longitude
HAVING COUNT(*) > 10;
Enter fullscreen mode Exit fullscreen mode

Dynamic Filtering in Grafana

You can enhance interactivity using:

  1. Variable placeholders ($commodity, $market), passed into WHERE clauses.
  2. Grafana $__timeFrom() / $__timeTo() to use dashboard time-range picker (e.g. 2008–2022).

Example:

WHERE c.commodity = '$commodity'
  AND TO_TIMESTAMP(d.year || '-' || d.month || '-01','YYYY-MM-DD')
      BETWEEN $__timeFrom() AND $__timeTo()
Enter fullscreen mode Exit fullscreen mode

Grafana Dashboard

Dashboard Image

Insights & Observations

  • Food prices ,notably maize, legumes and dairy, steadily rose, especially around crises in 2008 and 2020.
  • CPI mirrored food price trends but was smoothed, showing lagged effects.
  • Seasonal cycles were evident: vegetable prices spiked July–September across years.
  • Nairobi vs Mombasa vs Kisumu market differences stood out in the geographic heatmap.

Impact

  1. Policy relevance: Local inflation trends inform policymakers and NGOs targeting vulnerable regions.
  2. Data-driven planning: Seasonal insights assist supply chain managers and food security planners.
  3. Extendable: This framework can integrate price forecasts or poverty impact analysis.

Post Insights & Next Steps

  • I used Airflow to operationalize daily data loads and schema tables.

Future enhancements:

  • Build forecasting models.
  • Add more granular seasonal dashboard filtering

Conclusion

This project showcases how thoughtful data modeling can transform raw information into actionable insights. By combining clean architecture with real-world relevance, we not only improve technical workflows but also create tools that can support smarter decisions in critical sectors.

Top comments (0)