DEV Community

Ameh Mathias Ejeh

Building an NBA Sport Data Lake Analytic using AWS Services

Overview

The NBA Sport Data Lake Analytic project is a cloud-native solution that builds a scalable data lake for NBA analytics. By leveraging AWS services, this project automates data ingestion, cataloging, and querying, enabling efficient storage and analysis of NBA-related data.

Architecture

The project uses three AWS services to store, catalog, and query NBA data:

  • Amazon S3: Stores raw and processed data.
  • AWS Glue: Automates data cataloging and schema creation.
  • Amazon Athena: Enables SQL querying of the data stored in S3.

Architecture Diagram

[Image: architecture diagram]

Workflow

  • Data Ingestion: Fetch data from SportsData.io's NBA API.
  • Data Storage: Store the raw data in Amazon S3.
  • Data Cataloging: Use AWS Glue to create a database and table schema.
  • Data Querying: Query the data using Amazon Athena for analytics.
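
As a rough illustration of the ingestion and storage steps, the sketch below fetches records over HTTP and prepares them for upload. It is not the project's actual script. The `Ocp-Apim-Subscription-Key` header name, the `nba_player_data.json` filename, and the choice of newline-delimited JSON (one record per line, which Athena's JSON SerDe generally expects) are assumptions made for this example.

```python
import json
import urllib.request


def fetch_nba_data(api_key, endpoint):
    """Fetch player records from the API.

    Assumption: the API accepts the key in an Ocp-Apim-Subscription-Key header.
    """
    request = urllib.request.Request(
        endpoint,
        headers={"Ocp-Apim-Subscription-Key": api_key},  # assumed header name
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def to_json_lines(records):
    """Serialize a list of dicts as newline-delimited JSON, one record per line."""
    return "\n".join(json.dumps(record) for record in records)


def raw_data_key(filename="nba_player_data.json"):
    """Build the S3 object key under the raw-data folder (filename is hypothetical)."""
    return f"raw-data/{filename}"
```

The string returned by `to_json_lines` could then be uploaded with an S3 client's `put_object` call, using `raw_data_key()` as the object key.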

Prerequisites

Required Accounts and Tools

  • SportsData.io API Key: Sign up at SportsData.io to get access to the NBA API.
  • AWS Account: An active AWS account with permissions to use S3, Glue, and Athena.
  • Python Environment: Python 3 installed locally, with a virtual environment for dependency management.

Permissions

Ensure the IAM user or role has the following AWS permissions:

  • S3: s3:CreateBucket, s3:PutObject, s3:DeleteBucket, s3:ListBucket
  • Glue: glue:CreateDatabase, glue:CreateTable, glue:DeleteDatabase, glue:DeleteTable
  • Athena: athena:StartQueryExecution, athena:GetQueryResults
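
The permissions listed above could be expressed as an IAM policy document. The sketch below builds one as a plain Python dictionary. The `"*"` resource and the statement IDs are assumptions for illustration, and a real policy would normally be scoped to specific buckets and databases. Depending on how the scripts are written, extra actions (for example `s3:DeleteObject` or `athena:GetQueryExecution`) may also be needed.

```python
import json

# Actions taken directly from the permissions list above.
REQUIRED_ACTIONS = {
    "S3": ["s3:CreateBucket", "s3:PutObject", "s3:DeleteBucket", "s3:ListBucket"],
    "Glue": [
        "glue:CreateDatabase",
        "glue:CreateTable",
        "glue:DeleteDatabase",
        "glue:DeleteTable",
    ],
    "Athena": ["athena:StartQueryExecution", "athena:GetQueryResults"],
}


def build_policy(actions_by_service, resource="*"):
    """Return an IAM policy document with one Allow statement per service.

    Assumption: resource="*" keeps the sketch short; narrow it in practice.
    """
    statements = [
        {
            "Sid": f"{service}Access",
            "Effect": "Allow",
            "Action": actions,
            "Resource": resource,
        }
        for service, actions in actions_by_service.items()
    ]
    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    print(json.dumps(build_policy(REQUIRED_ACTIONS), indent=2))
```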

Setup Guide

Step 1: Clone the Repository

git clone https://github.com/ameh0429/ameh0429-NBA-Sport-Data-Lake-Analytic.git
cd ameh0429-NBA-Sport-Data-Lake-Analytic

Step 2: Install Dependencies

  • With your virtual environment created and activated, install the dependencies:
pip install -r requirements.txt

Step 3: Configure Environment Variables

  • Create a .env file with your API key and endpoint:
echo "SPORTS_DATA_API_KEY=your_api_key" >> .env
echo "NBA_ENDPOINT=https://api.sportsdata.io/v3/nba/scores/json/Players" >> .env
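
To show how a script might read these two values, here is a minimal standard-library sketch of a `.env` loader. The project itself may well use a package such as python-dotenv instead. The function names `parse_env_lines` and `load_env_file` are hypothetical.

```python
import os


def parse_env_lines(lines):
    """Parse KEY=value lines, skipping blanks and # comments."""
    values = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path=".env"):
    """Read a .env file and export its values without overriding existing ones."""
    with open(path, encoding="utf-8") as handle:
        values = parse_env_lines(handle)
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values
```

After calling `load_env_file()`, the script could read `os.environ["SPORTS_DATA_API_KEY"]` and `os.environ["NBA_ENDPOINT"]`.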

Step 4: Run the Data Lake Setup Script

  • In your terminal, create the setup_nba_data_lake.py file and paste in the script

  • Run the script
python setup_nba_data_lake.py

The script performs the following actions:

  • Creates an S3 bucket named sports-analytics-data-lake-0429.
  • Uploads NBA player data to the raw-data folder.
  • Configures a Glue database and table.
  • Sets up Athena for querying.
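
The Glue table step might look roughly like the sketch below, which builds a table definition pointing at the raw-data folder. This is an illustration, not the project's script. The column list is inferred from the test queries in this post, the column types and JSON SerDe settings are assumptions, and the Glue database name is left as a parameter because the post does not state it.

```python
BUCKET_NAME = "sports-analytics-data-lake-0429"


def build_players_table_input(bucket_name=BUCKET_NAME):
    """Build a Glue TableInput for JSON player records stored under raw-data/."""
    return {
        "Name": "nba_players",
        "TableType": "EXTERNAL_TABLE",
        "Parameters": {"classification": "json"},
        "StorageDescriptor": {
            # Columns inferred from the test queries; types are assumptions.
            "Columns": [
                {"Name": "PlayerID", "Type": "int"},
                {"Name": "FirstName", "Type": "string"},
                {"Name": "LastName", "Type": "string"},
                {"Name": "Team", "Type": "string"},
                {"Name": "Position", "Type": "string"},
            ],
            "Location": f"s3://{bucket_name}/raw-data/",
            "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
            "SerdeInfo": {
                "SerializationLibrary": "org.openx.data.jsonserde.JsonSerDe"
            },
        },
    }


def create_players_table(glue_client, database_name, table_input):
    """Register the table in an existing Glue database."""
    glue_client.create_table(DatabaseName=database_name, TableInput=table_input)
```

With boto3 installed, `glue_client` would be `boto3.client("glue")`, and the database would need to exist first (for example via `create_database`).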

Step 5: Validate Setup

  • S3: Verify the bucket and data file in the AWS Management Console.

  • Athena: Run a test query:

Query 1

SELECT FirstName, LastName, Position, Team
FROM nba_players
WHERE Position = 'PG';

The output

[Image: Query 1 results]

Query 2

SELECT PlayerID, FirstName, LastName, Team, Position
FROM nba_players
WHERE Team = 'LAL';

The output

[Image: Query 2 results]
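
The same test queries can also be run from Python rather than the console. The sketch below submits a query and polls until it finishes. It assumes a boto3-style Athena client is passed in, and the database name and results location are placeholders you would supply. Polling with `get_query_execution` may require the `athena:GetQueryExecution` permission in addition to those listed earlier.

```python
import time


def run_athena_query(athena_client, sql, database, output_location,
                     poll_seconds=1.0, max_polls=60):
    """Submit a query, wait for it to finish, and return the raw results."""
    start = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = start["QueryExecutionId"]

    for _ in range(max_polls):
        status = athena_client.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state == "SUCCEEDED":
            return athena_client.get_query_results(QueryExecutionId=query_id)
        if state in ("FAILED", "CANCELLED"):
            raise RuntimeError(f"Query {query_id} ended in state {state}")
        time.sleep(poll_seconds)  # still QUEUED or RUNNING

    raise TimeoutError(f"Query {query_id} did not finish after {max_polls} polls")
```

Passing the client in as an argument keeps the function easy to exercise with a stub, and in real use it would be `boto3.client("athena")`.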

Cleanup

To delete all the resources created by the project, run the cleanup script:

python delete_resources.py

This will:

  • Remove the S3 bucket and its contents.
  • Delete the Glue database and table.
  • Clean up Athena configurations.
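
For the S3 part of the cleanup, a bucket has to be emptied before it can be deleted. The sketch below shows one way this might be done with a boto3-style S3 client passed in as an argument. It is not the contents of delete_resources.py, it does not handle versioned buckets, and deleting objects may require the `s3:DeleteObject` permission in addition to those listed earlier.

```python
def empty_and_delete_bucket(s3_client, bucket_name):
    """Delete every object in the bucket, then the bucket itself.

    Returns the number of objects removed.
    """
    deleted = 0
    while True:
        # Each listing returns at most 1000 keys, so loop until none remain.
        listing = s3_client.list_objects_v2(Bucket=bucket_name)
        contents = listing.get("Contents", [])
        if not contents:
            break
        s3_client.delete_objects(
            Bucket=bucket_name,
            Delete={"Objects": [{"Key": obj["Key"]} for obj in contents]},
        )
        deleted += len(contents)

    s3_client.delete_bucket(Bucket=bucket_name)
    return deleted
```

In real use, `s3_client` would be `boto3.client("s3")` and `bucket_name` would be sports-analytics-data-lake-0429.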
