# Crypto ETL Pipeline (Python + PostgreSQL)

## Overview
This project is a simple ETL (Extract, Transform, Load) pipeline that retrieves real-time cryptocurrency market data from the CoinPaprika API, processes it using Python (Pandas), and loads it into a PostgreSQL database.
The pipeline is designed for learning and demonstrating core data engineering concepts such as API ingestion, data transformation, and relational database storage.
## Architecture
Extract → Transform → Load
### Extract
- Data is fetched from the CoinPaprika REST API (a minimal sketch follows this list)
- Tracked cryptocurrencies: Bitcoin, Ethereum, XRP, Solana
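The extraction code isn't reproduced in this post, but a minimal sketch might look like the following. The endpoint `https://api.coinpaprika.com/v1/tickers/{coin_id}` and the coin IDs are CoinPaprika's public ones; the function name is illustrative only:

```python
import requests

# CoinPaprika coin IDs for the assets listed above.
COIN_IDS = ["btc-bitcoin", "eth-ethereum", "xrp-xrp", "sol-solana"]

def extract_tickers() -> list[dict]:
    """Fetch the current ticker JSON for each tracked coin."""
    records = []
    for coin_id in COIN_IDS:
        resp = requests.get(
            f"https://api.coinpaprika.com/v1/tickers/{coin_id}",
            timeout=10,
        )
        resp.raise_for_status()  # surface HTTP errors immediately
        records.append(resp.json())
    return records
```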
### Transform
- The JSON response is normalized using Pandas
- Unnecessary fields are removed
- Columns are renamed for clarity
- An ingestion timestamp (`ingested_at`) is added (see the sketch after this list)
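Again as a hedged sketch: the column names below assume CoinPaprika's ticker layout, where USD figures live under `quotes.USD.*`, and the rename mapping simply mirrors the schema shown later in this post:

```python
from datetime import datetime, timezone

import pandas as pd

def transform(records: list[dict]) -> pd.DataFrame:
    """Flatten raw ticker JSON into the columns stored in the database."""
    df = pd.json_normalize(records)  # flattens nested quotes.USD.* fields
    df = df.rename(columns={
        "id": "coin_id",
        "name": "coin_name",
        "symbol": "coin_symbol",
        "quotes.USD.price": "price_usd",
        "quotes.USD.volume_24h": "volume_24h_usd",
        "quotes.USD.volume_24h_change_24h": "volume_24h_change",
        "quotes.USD.market_cap": "market_cap_usd",
    })
    # Keep only the columns described in the schema below.
    df = df[[
        "coin_id", "coin_name", "coin_symbol", "price_usd",
        "volume_24h_usd", "volume_24h_change", "market_cap_usd",
    ]]
    df["ingested_at"] = datetime.now(timezone.utc)  # ingestion timestamp
    return df
```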
### Load
- Cleaned data is inserted into PostgreSQL (see the sketch after this list)
- Target table: `crypto_market_data`
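A minimal load step, assuming the `DB_*` variables from the Setup section below, could use SQLAlchemy together with pandas' `to_sql`, which also creates the table on the first run if it doesn't exist:

```python
import os

import pandas as pd
from dotenv import load_dotenv
from sqlalchemy import create_engine

def load(df: pd.DataFrame) -> None:
    """Append the cleaned frame to crypto_market_data."""
    load_dotenv()  # reads DB_* settings from the .env file
    url = (
        f"postgresql+psycopg2://{os.environ['DB_USER']}:{os.environ['DB_PASSWORD']}"
        f"@{os.environ['DB_HOST']}:{os.environ['DB_PORT']}/{os.environ['DB_NAME']}"
    )
    engine = create_engine(url)
    df.to_sql("crypto_market_data", engine, if_exists="append", index=False)
```

With the three stages defined this way, the entry point reduces to something like `load(transform(extract_tickers()))`.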
## Tech Stack
- Python 3.x
- Pandas
- Requests
- SQLAlchemy
- Psycopg2
- PostgreSQL
- python-dotenv
## Project Structure

```
crypto-etl/
│
├── crypto_etl.py       # Main ETL script
├── requirements.txt    # Python dependencies
├── .env                # Environment variables (not pushed to GitHub)
├── .gitignore          # Ignored files (venv, .env, etc.)
└── README.md           # Project documentation
```
## Setup Instructions

1. Clone the repository:

   ```bash
   git clone https://github.com/your-username/crypto-etl.git
   cd crypto-etl
   ```

2. Create and activate a virtual environment:

   ```bash
   python3 -m venv venv
   source venv/bin/activate
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Configure environment variables by creating a `.env` file:

   ```
   DB_USER=your_db_user
   DB_PASSWORD=your_db_password
   DB_HOST=localhost
   DB_PORT=5432
   DB_NAME=crypto_db
   ```

5. Run the ETL pipeline:

   ```bash
   python crypto_etl.py
   ```
## Database Schema

Table: `crypto_market_data`
| Column | Description |
|---|---|
| coin_id | Unique coin identifier |
| coin_name | Name of the cryptocurrency |
| coin_symbol | Ticker symbol (BTC, ETH, etc.) |
| price_usd | Current price in USD |
| volume_24h_usd | 24-hour trading volume in USD |
| volume_24h_change | 24-hour volume change (%) |
| market_cap_usd | Market capitalization in USD |
| ingested_at | Timestamp of ingestion |
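As a quick sanity check after a run, the table can be read back with pandas. This is a sketch that rebuilds the same connection URL used in the load step:

```python
import os

import pandas as pd
from dotenv import load_dotenv
from sqlalchemy import create_engine

load_dotenv()
engine = create_engine(
    f"postgresql+psycopg2://{os.environ['DB_USER']}:{os.environ['DB_PASSWORD']}"
    f"@{os.environ['DB_HOST']}:{os.environ['DB_PORT']}/{os.environ['DB_NAME']}"
)

# Pull the most recent rows to confirm the pipeline wrote what we expect.
latest = pd.read_sql(
    "SELECT * FROM crypto_market_data ORDER BY ingested_at DESC LIMIT 10",
    engine,
)
print(latest)
```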
## Features
- Real-time crypto data ingestion
- Automated transformation using Pandas
- PostgreSQL data storage
- Environment variable configuration
- Modular and extensible ETL structure
## Future Improvements
- Add scheduling (cron/Airflow)
- Implement upsert logic to avoid duplicate rows (see the sketch after this list)
- Dockerize the pipeline
- Add logging and error handling
- Build dashboard for visualization
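For the upsert item, one possible direction is PostgreSQL's `INSERT ... ON CONFLICT` through SQLAlchemy. This is a sketch only: it assumes a UNIQUE constraint on `(coin_id, ingested_at)`, which the current schema does not yet declare:

```python
import pandas as pd
from sqlalchemy import MetaData, Table
from sqlalchemy.dialects.postgresql import insert

def upsert(df: pd.DataFrame, engine) -> None:
    """Insert rows, silently skipping duplicate (coin_id, ingested_at) pairs.

    Assumes a UNIQUE constraint on (coin_id, ingested_at) exists.
    """
    table = Table("crypto_market_data", MetaData(), autoload_with=engine)
    stmt = insert(table).values(df.to_dict(orient="records"))
    stmt = stmt.on_conflict_do_nothing(
        index_elements=["coin_id", "ingested_at"]
    )
    with engine.begin() as conn:  # commits on success, rolls back on error
        conn.execute(stmt)
```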



