"Because waking up at 2am to check Bitcoin prices isn’t scalable."
The Idea Behind the Project
Crypto markets never sleep, and neither should your data pipeline. I built a fully automated ETL pipeline using Apache Airflow that extracts hourly snapshots of crypto data from CoinGecko, stores them in a PostgreSQL database (on Aiven), and sets the stage for real-time analysis, dashboards, and trading models.
Whether you're tracking Bitcoin moonshots or studying volume dips in altcoins, this pipeline's got your back.
What It Actually Does
Every hour, my pipeline:
Pulls real-time data from the CoinGecko API for 15 top
cryptocurrencies (BTC, ETH, SOL, etc.)Captures price, market cap, and trading volume
Stores the data in a time-series friendly PostgreSQL table: crypto.crypto_prices
Retries automatically on failure and logs each step in Airflow’s UI
No manual refresh. No scripts to rerun. Just clean, structured data—hour after hour.
Stack Breakdown
Tools
Apache Airflow DAG orchestration + scheduling
Python 3.12 Core logic and scripting
CoinGecko API Crypto market data
PostgreSQL (Aiven) Cloud-hosted database storage
psycopg2 PostgreSQL database connector
python-dotenv Secure secret management via .env
How I Built It
This project lives in my GitHub repo, and it’s fully reproducible:
Clone the repo
git clone https://github.com/KellyKiprop/Crypto-price-pipeline.git
cd crypto-etl-pipeline
Create a virtual environment & install dependencies
python -m venv venv
source venv/bin/activate # Or venv\Scripts\activate on Windows
pip install -r requirements.txt
Set up environment variables
Create a .env file in the root folder with your PostgreSQL config:
DB_NAME=defaultdb
DB_HOST=your-db-host.aivencloud.com
DB_USER=avnadmin
DB_PASSWORD=yourpassword
DB_PORT=17440
Initialize and start Airflow
airflow db init
airflow webserver --port 8080
airflow scheduler
Trigger the DAG from the UI
Visit localhost:8080, toggle coin_price_etl_dag, and trigger a run.
Testing the Results
Check the crypto.crypto_prices table in your database — you should see hourly updates like:
name symbol price market_cap total_volume timestamp
Bitcoin BTC 71,231 ... ... 2025-06-07T12:00:00Z
You can now plug this into:
Grafana dashboards
ML models for price prediction
Long-term trend analysis
Why This Matters
It’s a hands-on intro to Airflow and ETL pipelines
Crypto data is noisy and dynamic — perfect for learning real-time data workflows
You’ll walk away with a reusable pattern for any public API + DB project
About the Author
Kelly Kiprop
kipropkelly4@gmail.com
This project uses Airflow to extract hourly crypto data from CoinGecko and store it in PostgreSQL. It’s cloud-ready, beginner-friendly, and built for real-time insight.
Top comments (0)
Some comments may only be visible to logged-in visitors. Sign in to view all comments.