Fave😌✨

Building a Scalable Real-Time NBA Stats Pipeline With AWS: Unlocking Seamless Data Integration

NBA Statistics Pipeline 🏀

🚀 Introduction

This project is an NBA Statistics Pipeline that fetches NBA team statistics from the SportsData API and stores them in AWS DynamoDB. The project also implements structured logging using AWS CloudWatch, enabling efficient monitoring and debugging.
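The post doesn't reproduce the logging setup itself, but the structured JSON logs shipped to CloudWatch can be sketched with the standard library alone. This is a minimal illustration; the field names are mine, not the project's actual schema:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Serialize each log record as a single JSON line."""
    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload)

logger = logging.getLogger("nba_stats")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("Fetched team stats")
```

Once log lines are single JSON objects, CloudWatch Logs Insights can filter and aggregate on individual fields instead of grepping free text.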

This project was built to demonstrate my proficiency in AWS services, Python, API integrations, and Infrastructure as Code (IaC).

🛠 Tech Stack

  • Python (Data processing, API requests, logging)
  • AWS DynamoDB (NoSQL database for storing NBA stats)
  • AWS CloudWatch (Logging & monitoring)
  • Boto3 (AWS SDK for Python)
  • Docker (Containerization)
  • EC2 Instance (Compute environment for development)

🎯 Features

  • Fetches real-time NBA statistics from the SportsData API
  • Stores team stats in AWS DynamoDB
  • Structured logging with AWS CloudWatch
  • Error handling and logging with JSON structured logs
  • Uses environment variables for sensitive credentials
  • Implements batch writing for efficiency
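The batch-writing code isn't shown in the post. The relevant constraint is that DynamoDB's `BatchWriteItem` accepts at most 25 items per request; boto3's `Table.batch_writer()` handles that buffering automatically, but the chunking idea can be sketched on its own:

```python
from typing import List

BATCH_SIZE = 25  # DynamoDB BatchWriteItem accepts at most 25 items per call

def chunk(items: List[dict], size: int = BATCH_SIZE) -> List[List[dict]]:
    """Split a list of items into DynamoDB-sized batches."""
    return [items[i:i + size] for i in range(0, len(items), size)]

# In the pipeline itself, boto3 does this for you:
#   with table.batch_writer() as batch:
#       for item in items:
#           batch.put_item(Item=item)
```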

📸 Snapshots

  • API Response Sample
  • DynamoDB Table Data
  • CloudWatch Logs (Structured logs for monitoring)
  • Terminal Output (Successful execution of the pipeline)

๐Ÿ— Project Architecture

```
nba-stats-pipeline
├── src
│   ├── __init__.py
│   ├── nba_stats.py
│   └── lambdafunction.py
├── requirements.txt      # Dependencies
├── .env                  # Environment variables
├── Dockerfile            # Containerization setup (if applicable)
└── README.md             # Project documentation
```

🚀 Step-by-Step Guide to Building the NBA Stats Pipeline

4๏ธโƒฃ Launch EC2 Instance and SSH Into It

ssh -i "nba-stats-pipeline.pem" ubuntu@ec2-18-212-173-76.compute-1.amazonaws.com
Enter fullscreen mode Exit fullscreen mode

1๏ธโƒฃ Clone the Repository

git clone https://github.com/onlyfave/nba-stats-pipeline.git
cd nba-stats-pipeline
Enter fullscreen mode Exit fullscreen mode

1๏ธโƒฃ Install Python3

Python3 is required to run the project.

sudo apt update
sudo apt install python3
Enter fullscreen mode Exit fullscreen mode

1๏ธโƒฃ Install Pip

On most systems, pip comes pre-installed with Python3. To verify, run:

pip3 --version
Enter fullscreen mode Exit fullscreen mode

If you don't have pip installed, use the following command:

sudo apt install python3-pip
Enter fullscreen mode Exit fullscreen mode

2๏ธโƒฃ Install Dependencies

pip install -r requirements.txt
Enter fullscreen mode Exit fullscreen mode
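The post doesn't reproduce requirements.txt, but judging from the tech stack above it would list at least something like:

```
boto3
requests
python-dotenv
```

(This is an assumption from the libraries mentioned, not the repository's actual pinned list.)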

3๏ธโƒฃ Set Up Environment Variables

Create a .env file with the following content:

SPORTDATA_API_KEY=your_api_key
DYNAMODB_TABLE_NAME=nba-player-stats
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
AWS_REGION=us-east-1
Enter fullscreen mode Exit fullscreen mode
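One plausible way nba_stats.py would read these values (a sketch; the helper name is mine, and the variable names match the .env file above):

```python
import os

def load_config() -> dict:
    """Read pipeline settings from the environment (names match .env)."""
    return {
        "api_key": os.environ["SPORTDATA_API_KEY"],  # required: fail fast if missing
        "table_name": os.getenv("DYNAMODB_TABLE_NAME", "nba-player-stats"),
        "region": os.getenv("AWS_REGION", "us-east-1"),
    }
```

Note that boto3 picks up AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from the environment on its own, so the code only needs to read the pipeline-specific variables explicitly.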

4๏ธโƒฃ CD Into the Folder Containing the Pipeline

cd src
Enter fullscreen mode Exit fullscreen mode

4๏ธโƒฃ Run the Pipeline

python3 nba_stats.py
Enter fullscreen mode Exit fullscreen mode

📊 Sample Data Format

```json
[
  {
    "TeamID": 1,
    "TeamName": "Los Angeles Lakers",
    "Wins": 25,
    "Losses": 15,
    "PointsPerGameFor": 112.5,
    "PointsPerGameAgainst": 108.3
  }
]
```
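One gotcha when writing records like this to DynamoDB: boto3 rejects Python floats, so numeric fields such as PointsPerGameFor must be converted to Decimal before put_item. A small sketch (the function name is illustrative, not necessarily what the pipeline uses):

```python
import json
from decimal import Decimal

def to_dynamo_item(team: dict) -> dict:
    """Round-trip through JSON so floats become Decimal, as DynamoDB requires."""
    return json.loads(json.dumps(team), parse_float=Decimal)

item = to_dynamo_item({
    "TeamID": 1,
    "TeamName": "Los Angeles Lakers",
    "PointsPerGameFor": 112.5,
})
```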

๐Ÿ— Deployment (Optional: Dockerized Version)

To run this project inside a Docker container:

```shell
docker build -t nba-stats-pipeline .
docker run --env-file .env nba-stats-pipeline
```
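The Dockerfile itself isn't shown in the post; a plausible minimal version for the project tree above (base image and paths are my assumptions):

```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY src/ ./src/
CMD ["python3", "src/nba_stats.py"]
```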

🔥 Key Takeaways

  • AWS Expertise: Used DynamoDB & CloudWatch for data storage & monitoring
  • DevOps Skills: Managed credentials, logging, and error handling efficiently
  • Cloud-Native Thinking: Designed a cloud-based ETL pipeline

📌 Next Steps

  • Implement Lambda Functions for automated execution
  • Deploy using AWS ECS or Kubernetes
  • Integrate with Grafana for real-time data visualization
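For the Lambda step, the repository already contains a lambdafunction.py, so the general handler shape would look something like this (a sketch: the names are hypothetical and the actual fetch-and-store call is stubbed, since the post doesn't show that code):

```python
import json

def lambda_handler(event, context):
    """Entry point for a scheduled invocation, e.g. an EventBridge rule."""
    # The real handler would fetch from the SportsData API and batch-write
    # to DynamoDB; stubbed here for illustration.
    teams_processed = 0
    return {
        "statusCode": 200,
        "body": json.dumps({"teams_processed": teams_processed}),
    }
```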

📢 Connect With Me

🚀 LinkedIn | 🐦 Twitter/X
