NBA Statistics Pipeline
Introduction
This project is an NBA Statistics Pipeline that fetches NBA team statistics from the SportsData API and stores them in AWS DynamoDB. The project also implements structured logging using AWS CloudWatch, enabling efficient monitoring and debugging.
This project was built to demonstrate my proficiency in AWS services, Python, API integrations, and Infrastructure as Code (IaC).
Tech Stack
- Python (Data processing, API requests, logging)
- AWS DynamoDB (NoSQL database for storing NBA stats)
- AWS CloudWatch (Logging & monitoring)
- Boto3 (AWS SDK for Python)
- Docker (Containerization)
- AWS EC2 (Compute environment for development)
Features
- Fetches real-time NBA statistics from the SportsData API
- Stores team stats in AWS DynamoDB
- Structured logging with AWS CloudWatch
- Error handling and logging with JSON structured logs
- Uses environment variables for sensitive credentials
- Implements batch writing for efficiency
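The "JSON structured logs" feature above can be sketched with only the standard library: each log record is serialized as one JSON object per line, a shape that CloudWatch Logs can filter and query by field. The field names here are illustrative, not necessarily the ones the project uses.

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("nba_stats")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("Fetched team stats")  # emitted as one JSON line
```

Because every record is machine-parseable, CloudWatch Logs Insights queries like `filter level = "ERROR"` work without regex gymnastics.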
Snapshots
- API Response Sample
- DynamoDB Table Data
- CloudWatch Logs (Structured logs for monitoring)
- Terminal Output (Successful execution of the pipeline)
Project Architecture
nba-stats-pipeline/
├── src/
│   ├── __init__.py
│   ├── nba_stats.py
│   └── lambdafunction.py
├── requirements.txt   # Dependencies
├── .env               # Environment variables
├── Dockerfile         # Containerization setup (if applicable)
└── README.md          # Project documentation
Step-by-Step Guide to Building the NBA Stats Pipeline
Step 1: Launch an EC2 Instance and SSH Into It
ssh -i "nba-stats-pipeline.pem" ubuntu@ec2-18-212-173-76.compute-1.amazonaws.com
Step 2: Clone the Repository
git clone https://github.com/onlyfave/nba-stats-pipeline.git
cd nba-stats-pipeline
Step 3: Install Python 3
Python 3 is required to run the project:
sudo apt update
sudo apt install python3
Step 4: Install pip
On most systems, pip comes bundled with Python 3. To verify, run:
pip3 --version
If pip is not installed, install it with:
sudo apt install python3-pip
Step 5: Install Dependencies
pip3 install -r requirements.txt
Step 6: Set Up Environment Variables
Create a .env file with the following content:
SPORTDATA_API_KEY=your_api_key
DYNAMODB_TABLE_NAME=nba-player-stats
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
AWS_REGION=us-east-1
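A minimal sketch of how the pipeline can read these values at startup (the helper name `load_config` is an assumption, not necessarily what nba_stats.py does). In development you can first load the .env file into the process environment with python-dotenv (`pip install python-dotenv`, then `from dotenv import load_dotenv; load_dotenv()`); on EC2 or in Docker the variables are typically injected directly.

```python
import os

def load_config() -> dict:
    """Collect the pipeline's settings, failing fast if the API key is missing."""
    api_key = os.getenv("SPORTDATA_API_KEY")
    if not api_key:
        raise RuntimeError("SPORTDATA_API_KEY is not set; check your .env file")
    return {
        "api_key": api_key,
        "table_name": os.getenv("DYNAMODB_TABLE_NAME", "nba-player-stats"),
        "region": os.getenv("AWS_REGION", "us-east-1"),
    }
```

Failing fast on a missing key gives a clear error at startup instead of a confusing 401 deep inside the fetch step.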
Step 7: Change Into the Source Directory
cd src
Step 8: Run the Pipeline
python3 nba_stats.py
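Under the hood, the fetch step might look roughly like the sketch below. SportsData.io authenticates with the `Ocp-Apim-Subscription-Key` header; the exact endpoint path and the function name are assumptions, so check your subscription's API docs for the real URL.

```python
import os

import requests

def fetch_team_stats(season: str = "2024") -> list[dict]:
    """Fetch team standings/stats from the SportsData API (hypothetical endpoint)."""
    url = f"https://api.sportsdata.io/v3/nba/scores/json/Standings/{season}"
    resp = requests.get(
        url,
        headers={"Ocp-Apim-Subscription-Key": os.environ["SPORTDATA_API_KEY"]},
        timeout=10,
    )
    resp.raise_for_status()  # surface HTTP errors instead of silently storing bad data
    return resp.json()
```

Calling `raise_for_status()` before `json()` means an expired key or rate limit shows up as a clear exception in the logs rather than a malformed DynamoDB item.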
Sample Data Format
[
  {
    "TeamID": 1,
    "TeamName": "Los Angeles Lakers",
    "Wins": 25,
    "Losses": 15,
    "PointsPerGameFor": 112.5,
    "PointsPerGameAgainst": 108.3
  }
]
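Storing items shaped like the sample above needs one gotcha handled: DynamoDB rejects Python floats, so per-game averages must be converted to `Decimal` first. The sketch below (function names are assumptions) does that conversion and uses boto3's `batch_writer()`, which buffers `put_item` calls into 25-item `BatchWriteItem` requests automatically, matching the "batch writing for efficiency" feature.

```python
from decimal import Decimal

def to_dynamo_item(team: dict) -> dict:
    """Convert float stats to Decimal, which DynamoDB requires for numbers."""
    return {
        key: (Decimal(str(value)) if isinstance(value, float) else value)
        for key, value in team.items()
    }

def store_stats(teams: list[dict], table_name: str = "nba-player-stats") -> None:
    import boto3  # imported here so the conversion helper stays dependency-free
    table = boto3.resource("dynamodb").Table(table_name)
    with table.batch_writer() as batch:
        for team in teams:
            batch.put_item(Item=to_dynamo_item(team))
```

Going through `Decimal(str(value))` rather than `Decimal(value)` avoids the binary-float artifacts (e.g. 112.5000000000000142...) that a direct float-to-Decimal conversion can produce.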
Deployment (Optional: Dockerized Version)
To run this project inside a Docker container:
docker build -t nba-stats-pipeline .
docker run --env-file .env nba-stats-pipeline
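If the repo's Dockerfile differs, treat this as a sketch of what the build command above expects: install dependencies, copy the source, and run the pipeline script (paths are assumptions based on the project layout shown earlier).

```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY src/ ./src/
CMD ["python", "src/nba_stats.py"]
```

Note that `--env-file .env` passes the credentials at run time, so the .env file never gets baked into the image.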
Key Takeaways
- AWS Expertise: Used DynamoDB & CloudWatch for data storage & monitoring
- DevOps Skills: Managed credentials, logging, and error handling efficiently
- Cloud-Native Thinking: Designed a cloud-based ETL pipeline
Next Steps
- Implement Lambda Functions for automated execution
- Deploy using AWS ECS or Kubernetes
- Integrate with Grafana for real-time data visualization
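For the Lambda step, the existing src/lambdafunction.py presumably wraps the fetch-and-store logic in a handler that an EventBridge schedule can invoke. A hedged sketch of that shape (the imported helper names are hypothetical):

```python
import json
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    """AWS Lambda entry point: run the pipeline once per invocation."""
    try:
        # Hypothetical helpers from nba_stats.py; uncomment in the real module:
        # from nba_stats import fetch_team_stats, store_stats
        # store_stats(fetch_team_stats())
        logger.info(json.dumps({"event": "pipeline_run", "status": "ok"}))
        return {"statusCode": 200, "body": json.dumps({"message": "Stats updated"})}
    except Exception as exc:
        logger.error(json.dumps({"event": "pipeline_run", "error": str(exc)}))
        return {"statusCode": 500, "body": json.dumps({"error": str(exc)})}
```

Returning a JSON-serializable dict with a status code keeps the handler compatible with both scheduled invocations and a quick manual test from the Lambda console.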