Navas Herbert

Building a Crypto ETL Pipeline with Apache Airflow and Astro CLI

In this comprehensive guide, we'll walk through building a complete cryptocurrency ETL (Extract, Transform, Load) pipeline using Apache Airflow orchestrated through Astronomer's Astro CLI. This project demonstrates how to create a robust data pipeline that extracts cryptocurrency data from APIs, transforms it, and loads it into a PostgreSQL database, all while leveraging containerization for consistent development and deployment.

What is Apache Airflow?

Apache Airflow is an open-source platform designed to programmatically author, schedule, and monitor workflows. It allows you to define workflows as Directed Acyclic Graphs (DAGs) of tasks, making it perfect for ETL processes where data flows through multiple stages of processing.
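
For a feel of what that looks like in code, here is a minimal, self-contained sketch (the DAG id, schedule, and task names are placeholders rather than part of this project's pipeline; the schedule argument assumes Airflow 2.4+, which recent Astro Runtime versions include):

from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pretend we fetched some data")

def load():
    print("pretend we stored it somewhere")

with DAG(
    dag_id="example_dag",            # name shown in the Airflow UI
    start_date=datetime(2024, 1, 1),
    schedule="@daily",               # run once per day
    catchup=False,                   # do not back-fill past dates
):
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> load_task        # "directed": extract must finish before load starts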

What is Docker?

Docker is a containerization platform that packages applications and their dependencies into lightweight, portable containers. Think of it as a self-contained box holding everything your application needs to run: the code, runtime, system tools, libraries, and settings.

Prerequisites

Before starting, ensure you have:

  • Windows machine with administrative privileges
  • Docker installed and running
  • Visual Studio Code
  • Basic understanding of Python and SQL

Project Setup

Step 1: Project Initialization

First, create a dedicated folder for your project:

mkdir CryptoETL
cd CryptoETL

Open the folder in Visual Studio Code by running:

code .

Step 2: Installing Astro CLI

The Astro CLI is Astronomer's command-line tool that makes it easy to develop and deploy Airflow projects locally. Since we're on Windows, we'll use the Windows Package Manager (winget) for installation.
In your VS Code terminal, run:

winget install -e --id Astronomer.Astro

Important: After installation, restart Visual Studio Code to ensure the Astro CLI is properly loaded and available in your terminal.

Step 3: Initializing the Astro Project

With the Astro CLI installed, initialize your Airflow project:

astro dev init

This command scaffolds a complete Airflow development environment by:

  • Initializing an empty Astro project in your current directory
  • Creating the necessary project structure and configuration files
  • Generating a Dockerfile based on the latest Astro Runtime image (which bundles Apache Airflow)
  • Preparing the Docker-based local setup (the containers themselves are built and started later by astro dev start)
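
Once it finishes, the generated project typically looks something like this (file names may vary slightly between Astro CLI versions):

CryptoETL/
├── dags/                  # your DAG files (the ETL pipeline lives here)
├── include/               # helper scripts and other supporting files
├── plugins/               # custom Airflow plugins, if any
├── tests/                 # DAG tests
├── Dockerfile             # builds on the Astro Runtime image
├── packages.txt           # extra OS-level packages
├── requirements.txt       # extra Python packages for your DAGs
└── airflow_settings.yaml  # local-only connections and variables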

Project Architecture

Our ETL pipeline consists of three main components:

  • Extract: Fetch cryptocurrency data from external APIs
  • Transform: Process and clean the raw data
  • Load: Store the processed data in PostgreSQL database

Building the DAG

Understanding DAGs

A Directed Acyclic Graph (DAG) in Airflow represents a workflow where:

  • Directed: Tasks have a specific order and direction
  • Acyclic: No circular dependencies (tasks can't loop back)
  • Graph: Visual representation of task relationships

The complete code for this Astro Airflow ETL pipeline is available on GitHub.
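
For orientation before opening the repository, here is a condensed sketch of how the three stages might be wired together in a single DAG file under dags/. The API endpoint, coin list, and the crypto_prices table name are illustrative assumptions rather than the project's actual code; the postgres_default connection is the one configured later in this guide, and PostgresHook needs the apache-airflow-providers-postgres package (add it to requirements.txt if your Runtime image doesn't already include it).

from datetime import datetime

import requests
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.postgres.hooks.postgres import PostgresHook

def extract(ti):
    # Extract: pull current USD prices from a public endpoint (illustrative URL)
    resp = requests.get(
        "https://api.coingecko.com/api/v3/simple/price",
        params={"ids": "bitcoin,ethereum", "vs_currencies": "usd"},
        timeout=30,
    )
    resp.raise_for_status()
    ti.xcom_push(key="raw", value=resp.json())

def transform(ti):
    # Transform: flatten the nested JSON into (coin, price) rows
    raw = ti.xcom_pull(task_ids="extract", key="raw")
    rows = [(coin, values["usd"]) for coin, values in raw.items()]
    ti.xcom_push(key="rows", value=rows)

def load(ti):
    # Load: write the rows into PostgreSQL via the postgres_default connection
    rows = ti.xcom_pull(task_ids="transform", key="rows")
    hook = PostgresHook(postgres_conn_id="postgres_default")
    hook.run(
        "CREATE TABLE IF NOT EXISTS crypto_prices ("
        "coin TEXT, price_usd NUMERIC, fetched_at TIMESTAMP DEFAULT NOW())"
    )
    hook.insert_rows(table="crypto_prices", rows=rows, target_fields=["coin", "price_usd"])

with DAG(
    dag_id="crypto_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",      # pull fresh prices every hour
    catchup=False,
):
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> transform_task >> load_task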

Docker Configuration

Docker Compose Setup

The project's Docker Compose configuration (typically a docker-compose.override.yml alongside the Dockerfile) sets up:

  • PostgreSQL database for storing cryptocurrency data
  • Environment variables for database connection
  • Port mapping for external access
  • Persistent volume for data storage

Running the Project

Step 4: Starting the Development Environment

astro dev start

This command:

  • Builds Docker containers based on your project configuration
  • Starts all necessary services (Airflow webserver, scheduler, database)
  • Makes the Airflow UI available at http://localhost:8080

Step 5: Accessing the Airflow UI

Once the containers are running, open your web browser and navigate to:

http://localhost:8080

Default credentials:

  • Username: admin
  • Password: admin

Configuration and Connections

Setting Up Airflow Connections

For your ETL pipeline to work properly, you need to configure connections in Airflow:

  1. Navigate to Admin > Connections in the Airflow UI
  2. Add PostgreSQL Connection:
  • Connection Id: postgres_default
  • Connection Type: Postgres
  • Host: postgres (Docker service name)
  • Schema: crypto_db
  • Login: airflow
  • Password: airflow
  • Port: 5432

  3. Add API Connections (if using authenticated APIs):
  • Configure HTTP connections for your cryptocurrency APIs
  • Store API keys securely using Airflow Variables or Connections
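
Once these connections and variables exist, tasks look them up by name instead of hard-coding secrets. A small sketch, where the connection id crypto_api and the variable name crypto_api_key are illustrative rather than values required by this project:

import requests
from airflow.hooks.base import BaseHook
from airflow.models import Variable

def fetch_with_api_key():
    # Secret stored under Admin > Variables in the Airflow UI
    api_key = Variable.get("crypto_api_key")

    # Host and other details stored on an HTTP connection under Admin > Connections
    conn = BaseHook.get_connection("crypto_api")
    url = f"https://{conn.host}/v1/prices"   # illustrative path

    resp = requests.get(url, headers={"Authorization": f"Bearer {api_key}"}, timeout=30)
    resp.raise_for_status()
    return resp.json()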

Next Steps

Consider these enhancements for your pipeline:

  • Implement data quality monitoring
  • Add email notifications for task failures (a sketch follows this list)
  • Create data visualization dashboards
  • Implement automated testing for DAG logic
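
As a starting point for the failure-notification idea, email alerts in Airflow mostly come down to default_args on the DAG; Airflow also needs working SMTP settings before it can actually send mail, and the address below is a placeholder:

from datetime import datetime, timedelta
from airflow import DAG

default_args = {
    "email": ["alerts@example.com"],     # placeholder address
    "email_on_failure": True,            # send a mail when a task fails
    "retries": 2,                        # retry a couple of times first
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="crypto_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",
    catchup=False,
    default_args=default_args,
):
    ...   # same extract >> transform >> load tasks as before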
