Inspired by AWS Cookbook by John Culkin & Mike Zazon - Chapter 7: Big Data
My Journey into Data Analytics
While exploring AWS AI/ML services, I realized that artificial intelligence and machine learning are fundamentally built upon quality data foundations. This insight led me to step back and master the data analytics fundamentals first. What better way than to build a complete serverless pipeline?
In this post, I'll share how I created a scalable analytics solution using Amazon S3, Athena, and QuickSight to analyze Premier League data.
Why Premier League data? As a passionate football enthusiast, I'm fascinated by the rich statistical narratives that unfold each season. Every match generates meaningful data points, from goals and assists to tactical formations and player performance metrics. This abundance of structured, real-world data makes football analytics an ideal playground for learning data engineering concepts while working with something I genuinely care about.
What we'll build:
- Serverless data storage with Amazon S3
- SQL querying with Amazon Athena
- Interactive dashboards with Amazon QuickSight
⚠️ Disclaimer: The Premier League data used in this project is completely fictional and for demonstration purposes only. If you see Manchester City with 150 points or Tottenham actually winning something, that's just my creative data generation at work! 😄 Please don't use this for your fantasy football decisions - you've been warned! For real Premier League data, check the official sources (and prepare for more realistic disappointment).
Architecture Overview
This serverless architecture eliminates infrastructure complexity while providing:
- Scalability: Automatic scaling without server management
- Cost-efficiency: Pay-per-query pricing model
- Speed: Query results in seconds
Implementation Highlights
1. S3 Data Lake Setup
I stored Premier League CSV files in S3, creating a scalable data foundation:
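# Create a bucket with a random suffix (bucket names are globally unique) and upload the data
BUCKET="premier-league-data-$(openssl rand -hex 3)"
# Outside us-east-1, also pass: --create-bucket-configuration LocationConstraint=<your-region>
aws s3api create-bucket --bucket "$BUCKET"
aws s3 cp data/ "s3://$BUCKET/raw-data/" --recursive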
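S3 bucket names must be globally unique, 3 to 63 characters long, and limited to lowercase letters, digits, hyphens, and dots, which is why the command above appends a random hex suffix. Here is a minimal Python sketch of the same naming scheme with a simplified validity check. The helper names `make_bucket_name` and `is_valid_bucket_name` are hypothetical, and the regex deliberately skips some S3 rules (for example, the ban on names formatted like IP addresses).

```python
import re
import secrets

# Simplified S3 naming check: 3-63 chars, lowercase letters, digits, dots and
# hyphens, starting and ending with a letter or digit. Not the full rule set.
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def make_bucket_name(prefix: str = "premier-league-data") -> str:
    """Append 3 random bytes as 6 hex characters, like `openssl rand -hex 3`."""
    return f"{prefix}-{secrets.token_hex(3)}"


def is_valid_bucket_name(name: str) -> bool:
    """Return True if the name passes the simplified check above."""
    return bool(_BUCKET_RE.match(name))


if __name__ == "__main__":
    name = make_bucket_name()
    print(name, is_valid_bucket_name(name))
```

Six hex characters give about 16.7 million possible suffixes, which is plenty for a personal project but does not guarantee the name is free; `create-bucket` will still fail if someone else already owns it.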
2. Athena SQL Querying
Created External Tables:
Athena's power lies in querying data directly from S3 without moving it. Here's how I created the tables:
-- Create standings table
-- Athena reads every file under LOCATION, so each table needs its own prefix.
-- The standings/ subfolder is an assumed layout; adjust it to match your upload.
CREATE EXTERNAL TABLE IF NOT EXISTS standings (
    team_name STRING,
    matches_played INT,
    wins INT,
    draws INT,
    losses INT,
    goals_for INT,
    goals_against INT,
    goal_difference INT,
    points INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://your-bucket/raw-data/standings/'
TBLPROPERTIES ('skip.header.line.count'='1');
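-- Create match results table
-- Athena reads every file under LOCATION, so each table needs its own prefix.
-- The match_results/ subfolder is an assumed layout; adjust it to match your upload.
CREATE EXTERNAL TABLE IF NOT EXISTS match_results (
    team_name STRING,
    result_type STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://your-bucket/raw-data/match_results/'
TBLPROPERTIES ('skip.header.line.count'='1');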
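Athena treats a table's LOCATION as a folder and reads every object under that prefix, so tables with different schemas are easier to keep apart when each CSV sits under its own prefix. The sketch below shows one way to plan that layout. The file names `standings.csv` and `match_results.csv` and the helpers `table_key` and `table_location` are assumptions for illustration, not the layout used in the original project.

```python
from pathlib import PurePosixPath


def table_key(local_file: str, base_prefix: str = "raw-data") -> str:
    """Place each CSV under a prefix named after the file, e.g.
    data/standings.csv -> raw-data/standings/standings.csv."""
    path = PurePosixPath(local_file)
    return f"{base_prefix}/{path.stem}/{path.name}"


def table_location(bucket: str, table: str, base_prefix: str = "raw-data") -> str:
    """Build a LOCATION string for a table, ending with a trailing slash."""
    return f"s3://{bucket}/{base_prefix}/{table}/"


if __name__ == "__main__":
    # Hypothetical local file names
    for local in ["data/standings.csv", "data/match_results.csv"]:
        print(local, "->", table_key(local))
    print(table_location("your-bucket", "standings"))
```

With this layout, each `CREATE EXTERNAL TABLE` statement points at exactly one folder, and adding a new season's file to that folder makes it queryable without changing the table definition.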
Sample Analytics Queries:
Once the tables were created, I ran a few queries against them:
-- Top 6 teams analysis
SELECT team_name, points, goal_difference
FROM standings
ORDER BY points DESC
LIMIT 6;
-- Win percentage calculation
SELECT team_name,
ROUND((wins * 100.0 / matches_played), 2) as win_percentage
FROM standings
ORDER BY win_percentage DESC;
-- Sanity check: confirm both tables return the expected rows
SELECT * FROM standings;
SELECT * FROM match_results;
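Before paying for scans in Athena, queries like the ones above can be dry-run locally. The sketch below loads a few made-up rows into an in-memory SQLite database, runs the win percentage query, and adds a stricter integrity check than `SELECT *`. The team names and numbers are invented, SQLite's dialect is not identical to Athena's, and the points rule (3 per win, 1 per draw) assumes no points deductions.

```python
import sqlite3

# Fictional rows: (team, played, wins, draws, losses, gf, ga, gd, points)
ROWS = [
    ("Team A", 38, 28, 5, 5, 90, 30, 60, 89),
    ("Team B", 38, 19, 10, 9, 60, 45, 15, 67),
    ("Team C", 38, 10, 8, 20, 40, 65, -25, 38),
]

WIN_PCT_SQL = """
SELECT team_name,
       ROUND(wins * 100.0 / matches_played, 2) AS win_percentage
FROM standings
ORDER BY win_percentage DESC
"""

# Returns teams whose row breaks basic football arithmetic.
INTEGRITY_SQL = """
SELECT team_name
FROM standings
WHERE wins + draws + losses <> matches_played
   OR goals_for - goals_against <> goal_difference
   OR 3 * wins + draws <> points
"""


def load(rows):
    """Create an in-memory standings table and insert the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE standings (team_name TEXT, matches_played INT, wins INT, "
        "draws INT, losses INT, goals_for INT, goals_against INT, "
        "goal_difference INT, points INT)"
    )
    conn.executemany("INSERT INTO standings VALUES (?,?,?,?,?,?,?,?,?)", rows)
    return conn


if __name__ == "__main__":
    conn = load(ROWS)
    print(conn.execute(WIN_PCT_SQL).fetchall())
    print(conn.execute(INTEGRITY_SQL).fetchall())  # empty list means no bad rows
```

The integrity query uses only standard comparisons and arithmetic, so it should carry over to Athena unchanged, but it is worth confirming there before relying on it.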
Key Athena Benefits:
- No data movement required
- Standard SQL interface
- Pay only for data scanned ($5/TB)
- Results in seconds
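To make the pay-per-scan model concrete, here is a rough cost estimator. It uses the $5/TB figure quoted above and Athena's 10 MB minimum charge per query. Treating a terabyte as 1024^4 bytes is an assumption, prices vary by region, and the sketch ignores rounding up to the nearest megabyte.

```python
PRICE_PER_TB_USD = 5.0              # figure quoted in this post; varies by region
BYTES_PER_TB = 1024 ** 4            # assumption: binary terabyte
MIN_BILLED_BYTES = 10 * 1024 ** 2   # Athena bills at least 10 MB per query


def query_cost_usd(bytes_scanned: int) -> float:
    """Estimate the cost of one query from the bytes it scanned."""
    billed = max(bytes_scanned, MIN_BILLED_BYTES)
    return billed / BYTES_PER_TB * PRICE_PER_TB_USD


if __name__ == "__main__":
    # A tiny CSV still pays the 10 MB minimum; a full terabyte costs the list price.
    for scanned in (2_000, 500 * 1024 ** 2, 1024 ** 4):
        print(f"{scanned:>15,d} bytes -> ${query_cost_usd(scanned):.6f}")
```

For small CSVs like the ones in this project, every query lands on the 10 MB minimum, so the monthly bill depends mostly on how many queries run rather than on file size.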
3. QuickSight Dashboards
Built interactive visualizations, including:
- League standings table
- Points comparison charts
- Goal difference analysis
- Team performance metrics
Business Value for Management
QuickSight delivers immediate ROI through:
- Decision Speed: Real-time dashboards eliminate waiting for IT reports
- Cost Savings: $9/user/month vs $70+/user/month for traditional BI tools like Tableau
- Self-Service Analytics: Business users create their own insights without technical dependencies
- Mobile Access: Executive dashboards available anywhere, anytime
- Scalability: Handles 10 users or 10,000 users with the same architecture
- Security: Enterprise-grade AWS security and compliance built in

Image source: amazon.com QuickSight page
Management Benefits:
- Reduce reporting cycle from weeks to minutes
- Democratize data access across all departments
- Lower total cost of ownership by 60-80% vs traditional solutions
- Eliminate server maintenance and upgrade costs
Results & Insights
Cost Breakdown:
- S3 Storage: ~$0.05/month
- Athena Queries: ~$0.25/month
- QuickSight: $9/user/month
Total: ~$9.30/month (with one QuickSight user) for enterprise-grade analytics!
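The total is the sum of the three line items, with QuickSight as the only per-user cost. A small sketch makes it easy to rerun the arithmetic for a larger team. It assumes storage and query costs stay flat as users are added, which is a simplification; the function name `monthly_total` is hypothetical.

```python
from decimal import Decimal

# Monthly figures from the cost breakdown above (USD)
S3_STORAGE = Decimal("0.05")
ATHENA_QUERIES = Decimal("0.25")
QUICKSIGHT_PER_USER = Decimal("9.00")


def monthly_total(users: int = 1) -> Decimal:
    """Sum the fixed costs and the per-user QuickSight fee."""
    return S3_STORAGE + ATHENA_QUERIES + QUICKSIGHT_PER_USER * users


if __name__ == "__main__":
    for users in (1, 5, 25):
        print(f"{users:>3} user(s): ${monthly_total(users)}/month")
```

`Decimal` is used instead of floats so that the cents add up exactly.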
Key Learnings:
✅ Setup completed in under 2 hours
✅ Serverless = zero infrastructure management
✅ SQL familiarity accelerated development
⚠️ QuickSight permissions required some initial troubleshooting
Next Steps
This foundation opens doors to:
- Real-time data integration
- Machine learning predictions
- Advanced ETL pipelines with AWS Glue
Final Reflections
Starting with data fundamentals before diving into AI/ML proved invaluable. This serverless analytics pipeline demonstrates that powerful data solutions don't require complex infrastructure - just the right AWS services working together.
The S3 + Athena + QuickSight combination delivers enterprise-grade analytics at startup costs, making it perfect for both learning and production use cases.
Resources
- GitHub Repository: AWS-Analytics Project
- AWS Cookbook: Chapter 7 - Big Data
- AI-Powered BI Tool: Amazon Quick Sight
- Athena SQL: Amazon Athena
- Serverless S3: Amazon S3
Building your own data pipeline? Connect with me on LinkedIn. I'd love to hear about your experience!






