Overview provided by: Alicia Ahl
Github Link: https://github.com/MikeAA97/DevOpsAllStarsChallenge2025/tree/main/Day3
At a high level, this day's project centered on building out an AWS Data Lake and understanding the components that make it up. I'd never used AWS Glue or Athena before, so this was a great introduction to those tools.
AWS Resources Used:
- AWS Lambda: Handles the logic for querying the API, transforming the data, and pushing it to S3
- AWS S3: Stores both the API data and the results of the Athena queries mentioned later in this section
- AWS Glue Database: A container that groups related AWS Glue Tables, segmenting them by project or dataset.
- AWS Glue Table: Defines the location of the data, as well as the schema and format, so that other tools such as Athena know how to interpret the data.
- AWS Athena Workgroup: A container for queries and their shared settings. In this case, I wanted to specify the output location instead of using the default.
- AWS Athena Queries: A few template queries to start interacting with the dataset.
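To make the Glue pieces above concrete, here is a minimal sketch of how a Glue database and table might be declared in Terraform. The resource labels `day3` and `nba_players_table` come from the named query later in this post; the S3 prefix, SerDe choice, and column names are my own assumptions for illustration, not the project's actual schema.

```hcl
resource "aws_glue_catalog_database" "day3" {
  name = "day3"
}

# The table tells Athena where the data lives (an S3 prefix), how it is
# serialized (line-delimited JSON here, as an assumed example), and what
# columns to expect.
resource "aws_glue_catalog_table" "nba_players_table" {
  name          = "nba_players"
  database_name = aws_glue_catalog_database.day3.name
  table_type    = "EXTERNAL_TABLE"

  storage_descriptor {
    location      = "s3://${aws_s3_bucket.day3.bucket}/raw-data/" # assumed prefix
    input_format  = "org.apache.hadoop.mapred.TextInputFormat"
    output_format = "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"

    ser_de_info {
      serialization_library = "org.openx.data.jsonserde.JsonSerDe"
    }

    columns {
      name = "player_name" # assumed column
      type = "string"
    }
    columns {
      name = "team" # assumed column
      type = "string"
    }
  }
}
```

Because the table is metadata only, no data is moved: Athena reads the files directly from S3 at query time using this schema.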
Terraform Experience
I initially struggled with managing Athena's query-result output location because I was creating a new Athena database and table instead of just referencing my Glue Catalog. I realized the database creation only existed to manipulate the default (primary) workgroup's output location, which left me with a dummy database. Instead, I created a dedicated workgroup for my named queries and specified a custom output location within the workgroup resource.
```hcl
resource "aws_athena_workgroup" "output" {
  name = "output" # I'm bad with naming things

  configuration {
    result_configuration {
      output_location = "s3://${aws_s3_bucket.day3.bucket}/athena-results"
    }
  }
}

resource "aws_athena_named_query" "test-query" {
  name      = "EXAMPLE-QUERY"
  workgroup = aws_athena_workgroup.output.id
  database  = aws_glue_catalog_database.day3.name
  query     = "SELECT * FROM ${aws_glue_catalog_database.day3.name}.${aws_glue_catalog_table.nba_players_table.name} LIMIT 10;"
}
```
This gave me better control without needing a dummy database.
Key Takeaways
Terraform versus Boto3 is going to be an interesting battle as the architecture gets more complex. I'm already seeing how I have to refactor certain API calls to fit into a Terraform-centric model.
Oddly enough, I feel like seeing the resource references within the Terraform resources helps me envision how everything relates to one another better than looking at pure Python code. It forces me to scrutinize every resource's place in the workflow.
Last Thoughts
I was able to reuse a lot of the logic from Days 1 and 2 to tackle this project, which saved a ton of time. I redirected that energy toward understanding what I was trying to automate.
I'll probably delay Day 4 to begin trying out integrations with Splunk for these previous projects, as well as work on my diagramming skills. Hopefully, going forward, the entire process will be more streamlined and visually appealing.