In this comprehensive guide, we'll walk through building a complete cryptocurrency ETL (Extract, Transform, Load) pipeline using Apache Airflow orchestrated through Astronomer's Astro CLI. This project demonstrates how to create a robust data pipeline that extracts cryptocurrency data from APIs, transforms it, and loads it into a PostgreSQL database, all while leveraging containerization for consistent development and deployment.
What is Apache Airflow?
Apache Airflow is an open-source platform designed to programmatically author, schedule, and monitor workflows. It allows you to define workflows as Directed Acyclic Graphs (DAGs) of tasks, making it perfect for ETL processes where data flows through multiple stages of processing.
What is Docker?
Docker is a containerization platform that packages applications and their dependencies into lightweight, portable containers. Think of it as a virtual box that contains everything your application needs to run: the code, runtime, system tools, libraries, and settings.
Prerequisites
Before starting, ensure you have:
- Windows machine with administrative privileges
- Docker installed and running
- Visual Studio Code
- Basic understanding of Python and SQL
Project Setup
Step 1: Project Initialization
First, create a dedicated folder for your project:
mkdir CryptoETL
cd CryptoETL
Open the folder in Visual Studio Code by running:
code .
Step 2: Installing Astro CLI
The Astro CLI is Astronomer's command-line tool that makes it easy to develop and deploy Airflow projects locally. Since we're on Windows, we'll use the Windows Package Manager (winget) for installation.
In your VS Code terminal, run:
winget install -e --id Astronomer.Astro
Important: After installation, restart Visual Studio Code to ensure the Astro CLI is properly loaded and available in your terminal.
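To confirm the CLI is available on your PATH, you can check the installed version from a fresh terminal:
astro version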
Step 3: Initializing the Astro Project
With the Astro CLI installed, initialize your Airflow project:
astro dev init
This command creates a complete Airflow development environment by:
- Pulling the latest Astro Runtime (which includes Apache Airflow)
- Creating necessary project structure and configuration files
- Setting up Docker containers for local development
- Initializing an empty Astro project in your current directory
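The exact layout can vary slightly between CLI versions, but the generated project typically includes files like these:
dags/                  # DAG definitions live here
include/               # Supporting files (SQL, helper scripts)
plugins/               # Custom Airflow plugins
tests/                 # Example DAG tests
Dockerfile             # Builds on the Astro Runtime image
requirements.txt       # Extra Python packages for your DAGs
packages.txt           # Extra OS-level packages
airflow_settings.yaml  # Local connections and variables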
Project Architecture
Our ETL pipeline consists of three main components:
- Extract: Fetch cryptocurrency data from external APIs
- Transform: Process and clean the raw data
- Load: Store the processed data in PostgreSQL database
Building the DAG
Understanding DAGs
A Directed Acyclic Graph (DAG) in Airflow represents a workflow where:
- Directed: Tasks have a specific order and direction
- Acyclic: No circular dependencies (tasks can't loop back)
- Graph: Visual representation of task relationships
The complete code for this Astro Airflow ETL pipeline is available on GitHub.
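As a rough sketch rather than the repository's exact code, a TaskFlow-style DAG for this kind of pipeline might look like the one below. The CoinGecko endpoint, table name, and column names are placeholders, and the Postgres provider package (apache-airflow-providers-postgres) may need to be added to requirements.txt if your runtime image doesn't already include it.
# dags/crypto_etl.py - illustrative sketch, not the repository code
from datetime import datetime

import requests
from airflow.decorators import dag, task
from airflow.providers.postgres.hooks.postgres import PostgresHook

@dag(schedule="@hourly", start_date=datetime(2024, 1, 1), catchup=False, tags=["crypto"])
def crypto_etl():

    @task
    def extract() -> list[dict]:
        # Extract: pull current market data (placeholder public endpoint)
        url = "https://api.coingecko.com/api/v3/coins/markets"
        response = requests.get(url, params={"vs_currency": "usd", "per_page": 10}, timeout=30)
        response.raise_for_status()
        return response.json()

    @task
    def transform(raw: list[dict]) -> list[tuple]:
        # Transform: keep only the fields we want to persist
        return [(coin["id"], coin["symbol"], coin["current_price"]) for coin in raw]

    @task
    def load(rows: list[tuple]) -> None:
        # Load: write into Postgres via the postgres_default connection
        # (assumes a crypto_prices table with these columns already exists)
        hook = PostgresHook(postgres_conn_id="postgres_default")
        hook.insert_rows(table="crypto_prices", rows=rows,
                         target_fields=["coin_id", "symbol", "price_usd"])

    load(transform(extract()))

crypto_etl()
The task dependencies here are expressed simply by chaining the function calls: extract feeds transform, which feeds load.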
Docker Configuration
Docker Compose Setup
The project's Docker Compose configuration (a sketch follows the list below) sets up:
- PostgreSQL database for storing cryptocurrency data
- Environment variables for database connection
- Port mapping for external access
- Persistent volume for data storage
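The compose file itself isn't reproduced here, but with the Astro CLI one way to add an extra service is a docker-compose.override.yml in the project root. A minimal sketch covering the points above might look like the following; the service name, credentials, database name, and host port are illustrative, and whatever service name you choose is what Airflow will later use as the connection Host:
services:
  postgres_crypto:
    image: postgres:15
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: airflow
      POSTGRES_DB: crypto_db
    ports:
      - "5433:5432"   # map to host port 5433 to avoid clashing with Airflow's own metadata database
    volumes:
      - crypto_data:/var/lib/postgresql/data

volumes:
  crypto_data: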
Running the Project
Step 4: Starting the Development Environment
astro dev start
This command:
- Builds Docker containers based on your project configuration
- Starts all necessary services (Airflow webserver, scheduler, database)
- Makes the Airflow UI available at http://localhost:8080
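To check that all containers came up, you can list them with the Astro CLI:
astro dev ps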
Step 5: Accessing the Airflow UI
Once the containers are running, open your web browser and navigate to:
http://localhost:8080
Default credentials:
- Username: admin
- Password: admin
Configuration and Connections
Setting Up Airflow Connections
For your ETL pipeline to work properly, you need to configure connections in Airflow:
1. Navigate to Admin > Connections in the Airflow UI
2. Add PostgreSQL Connection:
   - Connection Id: postgres_default
   - Connection Type: Postgres
   - Host: postgres (the Docker service name)
   - Schema: crypto_db
   - Login: airflow
   - Password: airflow
   - Port: 5432
3. Add API Connections (if using authenticated APIs):
   - Configure HTTP connections for your cryptocurrency APIs
   - Store API keys securely using Airflow Variables or Connections
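If you'd rather not re-enter connections in the UI after every rebuild, Airflow can also read them from environment variables. One option (the URI below simply mirrors the illustrative values above) is to add a line like this to the .env file in your Astro project root, which the Astro CLI loads into the local containers:
AIRFLOW_CONN_POSTGRES_DEFAULT=postgres://airflow:airflow@postgres:5432/crypto_db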
Next Steps
Consider these enhancements for your pipeline:
- Implement data quality monitoring
- Add email notifications for task failures
- Create data visualization dashboards
- Implement automated testing for DAG logic
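As a starting point for the last item, a common pattern is a simple import test that fails whenever any DAG in the project has a syntax or import error. The sketch below assumes it lives in the project's tests/ folder, and astro dev pytest can run it inside the containers:
# tests/dags/test_dag_integrity.py - minimal sketch of an automated DAG check
from airflow.models import DagBag

def test_no_import_errors():
    # Parse everything under dags/ without executing any tasks
    dag_bag = DagBag(dag_folder="dags", include_examples=False)
    assert dag_bag.import_errors == {}, f"DAG import errors: {dag_bag.import_errors}"

def test_dags_are_tagged():
    # Basic hygiene: every DAG should declare at least one tag
    dag_bag = DagBag(dag_folder="dags", include_examples=False)
    for dag_id, dag in dag_bag.dags.items():
        assert dag.tags, f"{dag_id} has no tags"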