DEV Community

Wangeci Ndovu
Wangeci Ndovu

Posted on

SIMPLE BEGINNER CRYPTO ETL PIPELINE.

Crypto ETL Pipeline (Python + PostgreSQL)

Overview

This project is a simple ETL (Extract, Transform, Load) pipeline that retrieves real-time cryptocurrency market data from the CoinPaprika API, processes it using Python (Pandas), and loads it into a PostgreSQL database.

The pipeline is designed for learning and demonstrating core data engineering concepts such as API ingestion, data transformation, and relational database storage.


Architecture

Extract → Transform → Load

  1. Extract

    • Data is fetched from the CoinPaprika REST API
    • Cryptocurrencies include: Bitcoin, Ethereum, XRP, Solana
  2. Transform

    • JSON response is normalized using Pandas
    • Unnecessary fields are removed
    • Columns are renamed for clarity
    • Timestamp (ingested_at) is added
  3. Load

    • Cleaned data is inserted into PostgreSQL
    • Table: crypto_market_data

Tech Stack

  • Python 3.x
  • Pandas
  • Requests
  • SQLAlchemy
  • Psycopg2
  • PostgreSQL
  • python-dotenv

Project Structure

Crypto_etl.py/

├── crypto_etl.py Main ETL script
├── requirements.txt Python dependencies
├── .env Environment variables (not pushed to GitHub)
├── .gitignore Ignored files (venv, env, etc.)
└── README.md Project documentation


Setup Instructions

1. Clone repository

bash
git clone https://github.com/your-username/crypto-etl.git
cd crypto-etl
Enter fullscreen mode Exit fullscreen mode

2. Create virtual environment

python3 -m venv venv
source venv/bin/activate
Enter fullscreen mode Exit fullscreen mode

3. Install dependencies

pip install -r requirements.txt
Enter fullscreen mode Exit fullscreen mode

4. Configure environment variables

Create a .env file:

DB_USER=your_db_user
DB_PASSWORD=your_db_password
DB_HOST=localhost
DB_PORT=5432
DB_NAME=crypto_db
Enter fullscreen mode Exit fullscreen mode

5. Run ETL pipeline

python crypto_etl.py
Enter fullscreen mode Exit fullscreen mode

Database Schema

Table: crypto_market_data

Column Description
coin_id Unique coin identifier
coin_name Name of cryptocurrency
coin_symbol Symbol (BTC, ETH, etc.)
price_usd Current price in USD
volume_24h_usd 24h trading volume
volume_24h_change Volume change percentage
market_cap_usd Market capitalization
ingested_at Timestamp of ingestion

Features

  • Real-time crypto data ingestion
  • Automated transformation using Pandas
  • PostgreSQL data storage
  • Environment variable configuration
  • Modular and extensible ETL structure

Future Improvements

  • Add scheduling (cron/Airflow)
  • Implement upsert logic (avoid duplicates)
  • Dockerize the pipeline
  • Add logging and error handling
  • Build dashboard for visualization

Author


Wangeci Ndovu
Data Engineer

Top comments (0)