NBA Statistics Pipeline
Introduction
This project is an NBA Statistics Pipeline that fetches NBA team statistics from the SportsData API and stores them in AWS DynamoDB. The project also implements structured logging using AWS CloudWatch, enabling efficient monitoring and debugging.
This project was built to demonstrate my proficiency in AWS services, Python, API integrations, and Infrastructure as Code (IaC).
Tech Stack
- Python (Data processing, API requests, logging)
- AWS DynamoDB (NoSQL database for storing NBA stats)
- AWS CloudWatch (Logging & monitoring)
- Boto3 (AWS SDK for Python)
- Docker (Containerization)
- AWS EC2 (Compute environment for development)
Features
- Fetches real-time NBA statistics from the SportsData API
- Stores team stats in AWS DynamoDB
- Structured logging with AWS CloudWatch
- Error handling and logging with JSON structured logs
- Uses environment variables for sensitive credentials
- Implements batch writing for efficiency
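The "JSON structured logs" feature above can be sketched with only the standard library: each log record is serialized as one JSON object per line, a shape that CloudWatch Logs can filter and query by field. The field names here are illustrative, not necessarily the ones the project uses.

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("nba_stats")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("Fetched team stats")  # emitted as one JSON line
```

Because every record is machine-parseable, CloudWatch Logs Insights queries like `filter level = "ERROR"` work without regex gymnastics.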
Snapshots
- API Response Sample
- DynamoDB Table Data
- CloudWatch Logs (Structured logs for monitoring)
- Terminal Output (Successful execution of the pipeline)
Project Architecture
nba-stats-pipeline/
├── src/
│   ├── __init__.py
│   ├── nba_stats.py
│   └── lambdafunction.py
├── requirements.txt   # Dependencies
├── .env               # Environment variables
├── Dockerfile         # Containerization setup (if applicable)
└── README.md          # Project documentation
Step-by-Step Guide to Building the NBA Stats Pipeline
Step 1: Launch an EC2 Instance and SSH Into It
ssh -i "nba-stats-pipeline.pem" ubuntu@ec2-18-212-173-76.compute-1.amazonaws.com
Step 2: Clone the Repository
git clone https://github.com/onlyfave/nba-stats-pipeline.git
cd nba-stats-pipeline
Step 3: Install Python 3
Python 3 is required to run the project:
sudo apt update
sudo apt install python3
Step 4: Install pip
On most systems, pip comes bundled with Python 3. To verify, run:
pip3 --version
If pip is not installed, install it with:
sudo apt install python3-pip
Step 5: Install Dependencies
pip3 install -r requirements.txt
Step 6: Set Up Environment Variables
Create a .env file with the following content:
SPORTDATA_API_KEY=your_api_key
DYNAMODB_TABLE_NAME=nba-player-stats
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
AWS_REGION=us-east-1
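A minimal sketch of how the pipeline can read these values at startup (the helper name `load_config` is an assumption, not necessarily what nba_stats.py does). In development you can first load the .env file into the process environment with python-dotenv (`pip install python-dotenv`, then `from dotenv import load_dotenv; load_dotenv()`); on EC2 or in Docker the variables are typically injected directly.

```python
import os

def load_config() -> dict:
    """Collect the pipeline's settings, failing fast if the API key is missing."""
    api_key = os.getenv("SPORTDATA_API_KEY")
    if not api_key:
        raise RuntimeError("SPORTDATA_API_KEY is not set; check your .env file")
    return {
        "api_key": api_key,
        "table_name": os.getenv("DYNAMODB_TABLE_NAME", "nba-player-stats"),
        "region": os.getenv("AWS_REGION", "us-east-1"),
    }
```

Failing fast on a missing key gives a clear error at startup instead of a confusing 401 deep inside the fetch step.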
Step 7: Change Into the Source Directory
cd src
Step 8: Run the Pipeline
python3 nba_stats.py
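Under the hood, the fetch step might look roughly like the sketch below. SportsData.io authenticates with the `Ocp-Apim-Subscription-Key` header; the exact endpoint path and the function name are assumptions, so check your subscription's API docs for the real URL.

```python
import os

import requests

def fetch_team_stats(season: str = "2024") -> list[dict]:
    """Fetch team standings/stats from the SportsData API (hypothetical endpoint)."""
    url = f"https://api.sportsdata.io/v3/nba/scores/json/Standings/{season}"
    resp = requests.get(
        url,
        headers={"Ocp-Apim-Subscription-Key": os.environ["SPORTDATA_API_KEY"]},
        timeout=10,
    )
    resp.raise_for_status()  # surface HTTP errors instead of silently storing bad data
    return resp.json()
```

Calling `raise_for_status()` before `json()` means an expired key or rate limit shows up as a clear exception in the logs rather than a malformed DynamoDB item.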
Sample Data Format
[
  {
    "TeamID": 1,
    "TeamName": "Los Angeles Lakers",
    "Wins": 25,
    "Losses": 15,
    "PointsPerGameFor": 112.5,
    "PointsPerGameAgainst": 108.3
  }
]
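Storing items shaped like the sample above needs one gotcha handled: DynamoDB rejects Python floats, so per-game averages must be converted to `Decimal` first. The sketch below (function names are assumptions) does that conversion and uses boto3's `batch_writer()`, which buffers `put_item` calls into 25-item `BatchWriteItem` requests automatically, matching the "batch writing for efficiency" feature.

```python
from decimal import Decimal

def to_dynamo_item(team: dict) -> dict:
    """Convert float stats to Decimal, which DynamoDB requires for numbers."""
    return {
        key: (Decimal(str(value)) if isinstance(value, float) else value)
        for key, value in team.items()
    }

def store_stats(teams: list[dict], table_name: str = "nba-player-stats") -> None:
    import boto3  # imported here so the conversion helper stays dependency-free
    table = boto3.resource("dynamodb").Table(table_name)
    with table.batch_writer() as batch:
        for team in teams:
            batch.put_item(Item=to_dynamo_item(team))
```

Going through `Decimal(str(value))` rather than `Decimal(value)` avoids the binary-float artifacts (e.g. 112.5000000000000142...) that a direct float-to-Decimal conversion can produce.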
Deployment (Optional: Dockerized Version)
To run this project inside a Docker container:
docker build -t nba-stats-pipeline .
docker run --env-file .env nba-stats-pipeline
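If the repo's Dockerfile differs, treat this as a sketch of what the build command above expects: install dependencies, copy the source, and run the pipeline script (paths are assumptions based on the project layout shown earlier).

```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY src/ ./src/
CMD ["python", "src/nba_stats.py"]
```

Note that `--env-file .env` passes the credentials at run time, so the .env file never gets baked into the image.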
Key Takeaways
- AWS Expertise: Used DynamoDB & CloudWatch for data storage & monitoring
- DevOps Skills: Managed credentials, logging, and error handling efficiently
- Cloud-Native Thinking: Designed a cloud-based ETL pipeline
Next Steps
- Implement Lambda Functions for automated execution
- Deploy using AWS ECS or Kubernetes
- Integrate with Grafana for real-time data visualization
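For the Lambda step, the existing src/lambdafunction.py presumably wraps the fetch-and-store logic in a handler that an EventBridge schedule can invoke. A hedged sketch of that shape (the imported helper names are hypothetical):

```python
import json
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    """AWS Lambda entry point: run the pipeline once per invocation."""
    try:
        # Hypothetical helpers from nba_stats.py; uncomment in the real module:
        # from nba_stats import fetch_team_stats, store_stats
        # store_stats(fetch_team_stats())
        logger.info(json.dumps({"event": "pipeline_run", "status": "ok"}))
        return {"statusCode": 200, "body": json.dumps({"message": "Stats updated"})}
    except Exception as exc:
        logger.error(json.dumps({"event": "pipeline_run", "error": str(exc)}))
        return {"statusCode": 500, "body": json.dumps({"error": str(exc)})}
```

Returning a JSON-serializable dict with a status code keeps the handler compatible with both scheduled invocations and a quick manual test from the Lambda console.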