ICHINO Kazuaki for AWS Community Builders

Posted on Jun 15 • Originally published at blog.kazzpapa3.com

How I Built a CTF Game with Kiro CLI

Note: My English proficiency is limited, so I relied on GenAI (Kiro CLI) to help translate this post.

On June 13, 2026, we held 【JAWS-UG Kobe #12】CTF Game Tournament as a hybrid event (in-person at the Chuo-ku Cultural Center in Kobe + online).

Nearly 40 participants competed in a CTF challenge built on CTFd and AWS, themed around the scenario: "Your access key has been leaked — now play CTF (Capture The Flag)."

This blog post is a behind-the-scenes look at how the game was created and implemented.

Background

I'd been spending a lot of time recently dealing with access key leaks — investigating incidents and helping with containment — and through that work, I started noticing consistent patterns in attacker behavior.

I suspected that generative AI tools like Kiro might already recognize these as known techniques. Add to that my discovery of CTFd through a different community, and I figured a CTF could be a great way to help more people learn about these attack patterns. That was the original motivation.

So I thought: why not use Kiro CLI to build it in a vibe-coding style? Once I had a mock version working, I posted a tweet about it.

Early Development

I kicked things off with the following prompt to Kiro CLI:

I want to create a CTF game environment using CTFd, themed around an access key leak incident.

The scenario should run from identifying the leaked key, through containment, to identifying the attacker's actions, with players analyzing CloudTrail logs to find (or combine) flags and submit them as answers.

The initial output used Terraform to provision the AWS environment from the get-go.

A setup script generated JSON files mimicking CloudTrail logs and uploaded them to an S3 bucket, serving as the CloudTrail event history for each challenge.
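The post doesn't show the generated script, so here is a minimal sketch, assuming Python, of what writing such mock logs can look like. The field names (eventTime, eventSource, eventName, awsRegion, sourceIPAddress, userIdentity.accessKeyId, eventCategory) follow CloudTrail's record format, and the {"Records": [...]} envelope matches CloudTrail's S3 log files. The function names are hypothetical, and only a subset of CloudTrail's fields is filled in.

```python
import json
import uuid
from datetime import datetime
from pathlib import Path


def make_event(event_time: datetime, event_source: str, event_name: str,
               access_key_id: str, source_ip: str, region: str) -> dict:
    """Build one record shaped like a CloudTrail management event (subset of fields)."""
    return {
        "eventVersion": "1.08",
        "eventTime": event_time.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "eventSource": event_source,
        "eventName": event_name,
        "awsRegion": region,
        "sourceIPAddress": source_ip,
        "userIdentity": {"type": "IAMUser", "accessKeyId": access_key_id},
        "eventID": str(uuid.uuid4()),
        # Rough heuristic for illustration only; real CloudTrail sets readOnly per API.
        "readOnly": event_name.startswith(("Get", "List", "Describe")),
        "managementEvent": True,
        "eventCategory": "Management",
    }


def write_log_file(path: Path, records: list[dict]) -> None:
    """Write records in the {"Records": [...]} envelope CloudTrail uses in S3."""
    path.write_text(json.dumps({"Records": records}, indent=2), encoding="utf-8")
```

For example, make_event(datetime(2026, 6, 1, 9, 0, tzinfo=timezone.utc), "sts.amazonaws.com", "GetCallerIdentity", "AKIAIOSFODNN7EXAMPLE", "203.0.113.10", "us-east-1") produces a caller-identity check from a leaked key. The key ID is AWS's documented example value, and the IP comes from a documentation-only address range.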

The setup script also created the challenges, already in the format ctfcli uses to install them into CTFd, so it was usable with almost no modifications.

Kiro CLI also generated the challenges themselves, and about 70% of the questions in the current version are ones Kiro CLI created, used nearly as-is.

I adopted them because they reflected real attack techniques I'd encountered during actual investigations — which tells me these are well-known attack patterns that generative AI already understands.

It took about 3 hours to go from the first prompt to a fully playable game, which I then trialed internally at my company.

Improvements

Part 1 — Improving Scalability

The initial setup ran CTFd on a single EC2 instance with CTFd's default SQLite database.

This could bottleneck with many players, so I asked Kiro CLI for a performance assessment. Here's what it said:

Regarding SQLite's concurrent connection performance:

SQLite Characteristics

Multiple concurrent reads are possible, but writes lock the entire database (even in WAL mode, only one writer)

CTFd runs an INSERT on every flag submission, so concurrent submissions cause lock contention

Practical Guidelines

Participants	Experience
~10	No issues
10–30	Mostly fine, occasional delays of a few hundred ms during simultaneous submissions
30–50	Noticeable delays on scoreboard display and flag submission
50+	External RDS recommended

Bottleneck Points

Flag submission: INSERT + validation. Highest load right after competition starts or hints are released

Scoreboard: SELECT aggregating all team scores. Gets heavier with more participants and competes with write locks

CTFd itself on Gunicorn (default 1 worker): Application-level concurrency is limited even before SQLite locks
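To see the single-writer behavior described above for yourself, here is a small, self-contained sketch using Python's built-in sqlite3 module. It is not CTFd code: the submissions table is a hypothetical stand-in for flag submissions. One connection holds an open write transaction while a second connection first reads (which succeeds in WAL mode) and then tries to start its own write (which fails immediately because timeout=0).

```python
import os
import sqlite3
import tempfile


def demo_single_writer() -> dict:
    """Show that SQLite in WAL mode allows a concurrent read but only one writer."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "ctfd_demo.db")
        # isolation_level=None: we issue BEGIN/COMMIT ourselves.
        # timeout=0: fail immediately instead of waiting for the lock.
        writer = sqlite3.connect(path, isolation_level=None, timeout=0)
        writer.execute("PRAGMA journal_mode=WAL").fetchall()
        writer.execute("CREATE TABLE submissions (id INTEGER PRIMARY KEY, flag TEXT)")
        other = sqlite3.connect(path, isolation_level=None, timeout=0)
        try:
            # Player A's flag submission holds the write lock (not yet committed).
            writer.execute("BEGIN IMMEDIATE")
            writer.execute("INSERT INTO submissions (flag) VALUES ('flag{a}')")

            # A scoreboard-style read still succeeds and sees the last committed state.
            rows_during_write = other.execute(
                "SELECT COUNT(*) FROM submissions").fetchall()[0][0]

            # Player B's submission cannot start a second write transaction.
            try:
                other.execute("BEGIN IMMEDIATE")
                second_writer_blocked = False
                other.execute("ROLLBACK")
            except sqlite3.OperationalError as exc:
                second_writer_blocked = "locked" in str(exc)

            writer.execute("COMMIT")
            rows_after_commit = other.execute(
                "SELECT COUNT(*) FROM submissions").fetchall()[0][0]
        finally:
            other.close()
            writer.close()
    return {
        "rows_during_write": rows_during_write,
        "second_writer_blocked": second_writer_blocked,
        "rows_after_commit": rows_after_commit,
    }


print(demo_single_writer())
```

Run as-is, it should report that the read saw 0 rows during the open write, that the second writer was blocked, and that 1 row was visible after the commit. With a non-zero busy timeout (Python's sqlite3 defaults to 5 seconds), the second writer would wait instead of failing, which would surface as delays rather than errors.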


This initial setup (a single EC2 instance running CTFd with SQLite, with the mock logs served from S3) was the prototype of what became "EC2 mode" + "S3 mode" in the current architecture (described below).

When I told Kiro "I want it to scale easily and support serverless deployment," it suggested creating "Deploy modes" — "EC2 mode" and "Fargate mode" — and got it working in essentially one shot.

Switching between deploy modes comes down to setting a single ctfd_mode variable in terraform.tfvars. Clean and easy.

This significantly improved scalability.

Part 2 — Adding Realism

Scalability was better, but I felt that downloading CloudTrail logs from S3 and investigating locally would just turn into a cat *.json | jq party.
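For reference, the local approach looks roughly like this. The sketch below assumes plain .json files in the {"Records": [...]} layout (real CloudTrail deliveries to S3 are gzip-compressed .json.gz files). It merges every file, sorts by eventTime, and summarizes what a single access key did. The function names are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def load_records(log_dir: Path) -> list[dict]:
    """Read every *.json file in log_dir and return all records sorted by time."""
    records: list[dict] = []
    for path in sorted(log_dir.glob("*.json")):
        with path.open(encoding="utf-8") as f:
            records.extend(json.load(f).get("Records", []))
    # Timestamps in the same ISO 8601 "Z" format sort correctly as strings.
    return sorted(records, key=lambda r: r.get("eventTime", ""))


def summarize(records: list[dict], access_key_id: str) -> dict:
    """Summarize what one (leaked) access key did: API calls, source IPs, regions."""
    mine = [r for r in records
            if r.get("userIdentity", {}).get("accessKeyId") == access_key_id]
    return {
        "first_seen": mine[0]["eventTime"] if mine else None,
        "last_seen": mine[-1]["eventTime"] if mine else None,
        "event_names": Counter(r["eventName"] for r in mine),
        "source_ips": Counter(r.get("sourceIPAddress") for r in mine),
        "regions": Counter(r.get("awsRegion") for r in mine),
    }
```

This is essentially the same work as the jq pipeline, just in one place, which is exactly why investigating in the account itself felt like a better game.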

I figured it'd be better to have actual logs in the AWS account. Both my own ideas and Kiro's suggestions led to the same conclusion: we'd need to actually execute the API calls.

In real incidents like this, it's extremely common for no CloudTrail trail to be configured, which leaves only the management events in Event History. (In practice, this means data events such as S3 object-level operations often cannot be traced at all.)
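The distinction shows up in the records themselves, because each CloudTrail record states what kind of event it is. Here is a minimal sketch that decides which records would remain visible in Event History when no trail exists. It assumes records carry CloudTrail's eventCategory field and falls back to the managementEvent field. The helper names are hypothetical.

```python
def visible_in_event_history(record: dict) -> bool:
    """Return True if a CloudTrail record is a management event.

    Event History only keeps management events, so data events (for example
    S3 GetObject or PutObject) are visible only through a configured trail.
    """
    category = record.get("eventCategory")
    if category is not None:
        return category == "Management"
    # Fall back to the boolean managementEvent field when eventCategory is absent.
    return bool(record.get("managementEvent", False))


def split_by_visibility(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split records into (visible in Event History, trail-only)."""
    visible = [r for r in records if visible_in_event_history(r)]
    hidden = [r for r in records if not visible_in_event_history(r)]
    return visible, hidden


sample = [
    {"eventName": "GetCallerIdentity", "eventCategory": "Management"},
    {"eventName": "GetObject", "eventCategory": "Data"},
]
visible, hidden = split_by_visibility(sample)
print([r["eventName"] for r in visible], [r["eventName"] for r in hidden])
```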

So I accepted that we didn't need anything as rich as Athena queries against a configured trail; we would simply fire off the API requests and let Event History record them.

However, if all the account setup and attack-simulating API calls came from my own environment, they'd all share the same source IP address. So I ran them from different regions within the same AWS account to distribute the IP addresses.¹
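The post doesn't say exactly how steps were mapped to regions. One hypothetical scheme keeps the setup calls in one region and spreads the attack-simulation calls round-robin across others, so the two groups are unlikely to share a source IP in CloudTrail. The step names and regions in this sketch are placeholders. Because the public IP of a Lambda function is not fixed, this only separates the groups; it does not pin exact addresses.

```python
from itertools import cycle


def assign_regions(setup_steps: list[str], attack_steps: list[str],
                   setup_region: str, attack_regions: list[str]) -> list[tuple[str, str]]:
    """Map each API-calling step to the region its executor should run in.

    Setup calls stay in one region; attack-simulation calls rotate through the
    others, so CloudTrail is unlikely to show both groups coming from one IP.
    """
    if not attack_regions:
        raise ValueError("need at least one attack region")
    if setup_region in attack_regions:
        raise ValueError("setup region should differ from the attack regions")
    plan = [(step, setup_region) for step in setup_steps]
    # zip stops when attack_steps runs out; cycle repeats the regions as needed.
    plan += list(zip(attack_steps, cycle(attack_regions)))
    return plan


# Placeholder steps and regions, for illustration only.
for step, region in assign_regions(
        ["create-s3-bucket", "create-iam-user"],
        ["GetCallerIdentity", "ListBuckets", "CreateAccessKey"],
        "ap-northeast-1", ["us-east-1", "eu-west-1"]):
    print(f"{region:15} {step}")
```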

This created two modes: "download logs from S3 and investigate locally" and "use CloudTrail Event History recorded in the account." Internally, this was implemented as Investigation mode.

Part 3 — Auto-Generating Participant Guide Docs

People in the security community may be familiar with CTF (Capture The Flag), but it's not something you hear much about in the JAWS (AWS user group) community — I first learned about it through a different community myself.

So I felt it was necessary to explain what "flags" are, what the game is asking you to find, and how to submit answers.

I aimed to templatize the documentation so it would auto-generate alongside the CTFd game environment.

Since this blog is built with Markdown + MkDocs, I knew a similar setup would be straightforward. I used S3 static website hosting to publish the participant guide.
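As an illustration of the templating idea, here is a minimal sketch that renders one wave as the kind of Markdown table shown in the challenge overview, ready for MkDocs to build. It assumes the challenge metadata is kept as simple Python objects. The Challenge class and function names are hypothetical; only the columns mirror the guide.

```python
from dataclasses import dataclass


@dataclass
class Challenge:
    number: int
    difficulty: int  # 1-3, rendered as stars
    category: str
    hint: str


def stars(level: int, max_level: int = 3) -> str:
    """Render a difficulty level as filled/empty stars, e.g. 2 -> ★★☆."""
    if not 1 <= level <= max_level:
        raise ValueError(f"difficulty must be 1..{max_level}, got {level}")
    return "★" * level + "☆" * (max_level - level)


def render_wave_table(title: str, challenges: list[Challenge]) -> str:
    """Render one wave of challenges as a Markdown section for MkDocs."""
    lines = [f"## {title}", "", "| # | Difficulty | Category | Hint |", "|---|---|---|---|"]
    for c in challenges:
        lines.append(f"| {c.number} | {stars(c.difficulty)} | {c.category} | {c.hint} |")
    return "\n".join(lines) + "\n"


print(render_wave_table("Wave 1: Access Key Leak", [
    Challenge(1, 1, "Recon", "Read GuardDuty findings"),
    Challenge(2, 2, "Investigation", "CloudTrail log analysis"),
]))
```

Generating the guide from the same data that defines the challenges keeps the two from drifting apart when a challenge is added or renumbered.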

Current Architecture

After all these improvements, here's what the current architecture looks like:

Define Deploy mode and Investigation mode in terraform.tfvars
CTFd runtime environment is provisioned based on the Deploy mode setting
Based on Investigation mode, either logs are generated or attack scripts execute real AWS API calls → challenges and answers are generated accordingly
Participant guide documentation is auto-generated
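The flow above can be summarized as a small dispatch on the two settings. This is a hypothetical Python sketch, not the project's code (the real logic lives in Terraform and setup scripts that aren't public). ctfd_mode is the variable named in this post, but investigation_mode and the mode strings "ec2"/"fargate" and "s3"/"cloudtrail" are assumed names.

```python
from dataclasses import dataclass

# Assumed mode strings: the post names EC2/Fargate deploy modes and an
# S3 vs. Event History investigation split, but not the exact values.
DEPLOY_MODES = ("ec2", "fargate")
INVESTIGATION_MODES = ("s3", "cloudtrail")


@dataclass(frozen=True)
class Settings:
    ctfd_mode: str           # deploy mode, as set in terraform.tfvars
    investigation_mode: str  # hypothetical name for the Investigation mode setting


def build_plan(settings: Settings) -> list[str]:
    """Return the ordered build steps implied by the two mode settings."""
    if settings.ctfd_mode not in DEPLOY_MODES:
        raise ValueError(f"unknown ctfd_mode: {settings.ctfd_mode!r}")
    if settings.investigation_mode not in INVESTIGATION_MODES:
        raise ValueError(f"unknown investigation_mode: {settings.investigation_mode!r}")

    steps = []
    # 1. The CTFd runtime depends on the Deploy mode.
    if settings.ctfd_mode == "ec2":
        steps.append("provision CTFd on a single EC2 instance")
    else:
        steps.append("provision CTFd on Fargate")

    # 2. The log source depends on the Investigation mode.
    if settings.investigation_mode == "s3":
        steps.append("generate CloudTrail-style JSON logs and upload them to S3")
    else:
        steps.append("run attack scripts that make real AWS API calls")

    # 3-4. Challenges, answers, and the participant guide follow from the scenario.
    steps.append("generate challenges and answers, then install them with ctfcli")
    steps.append("build the participant guide with MkDocs and publish it to S3")
    return steps


for step in build_plan(Settings(ctfd_mode="fargate", investigation_mode="cloudtrail")):
    print(step)
```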

Challenge Overview

Here's an excerpt from the auto-generated participant guide — these are the currently available scenarios.

If you can already tell what each challenge is about from the category and hint alone, feel free to scroll past. (We might reuse the same challenges at a future event 🙏)

Wave 1: Access Key Leak (Stages 1–5)

#	Difficulty	Category	Hint
1	★☆☆	Recon	Read GuardDuty findings
2	★★☆	Investigation	CloudTrail log analysis
3	★★☆	Investigation	Track S3 operations
4	★★☆	Containment	Perform containment via AWS CLI
5	★★★	Persistence	Multi-region investigation

Wave 2: Lateral Movement & Deep Compromise (Stages 6–19)

#	Difficulty	Category	Hint
6	★★☆	Investigation	AssumeRole traces
7	★★☆	Investigation	Multi-region EC2 operations
8	★★☆	Forensics	Attacks on CloudTrail itself
9	★★☆	Network	Security group modifications
10	★★★	Persistence	SSM command contents
11	★★★	Investigation	DNS record changes
12	★★★	Persistence	Lambda + EventBridge
13	★★☆	Investigation	Access to Secrets Manager
14	★★★	Investigation	RDS snapshot sharing
15	★★☆	Persistence	Inline policies
16	★★☆	Persistence	Adding console access
17	★★★	Persistence	MFA device configuration
18	★★☆	Investigation	S3 bucket policy
19	★☆☆	Investigation	Traces of failed operations

Wave 3: Large-Scale Attack Blocked by SCP (Stages 20–23)

The attacker attempted an even larger-scale attack, but SCP (Service Control Policy) blocked everything.

Track these "attempted but thwarted attacks" and decipher the attacker's intent.

#	Difficulty	Category	Hint
20	★★★	Persistence	IAM role trust policy rewrite
21	★★☆	Investigation	Attempt to enable opt-in regions
22	★★★	Persistence	Auto-resurrecting cluster via ASG + ECS
23	★★☆	Investigation	SageMaker notebook launch attempt

Final Challenge (Stage 24)

#	Difficulty	Category	Hint
24	★★★	Forensics	Build the complete timeline

What's Next

The build process is mostly established, and running the event on June 13, 2026 surfaced several issues.

I'll be making improvements based on those findings, but since the repo contains both question-generation and attack scripts, it's currently managed as a private repository.

From the event survey, some participants solved everything on their own while others leveraged generative AI effectively.

This tells me the content is approachable for people with various skill sets — some will enjoy it, others will suffer through it. I'd like to find a way to avoid spoilers while making it public so more people can play.

If people fork it and contribute improvements, that would accelerate the mission to eliminate access key usage.

Until I figure out a good way to hide the answers and make it public, I'm happy to deliver it on request — so please reach out if you're interested.

Until next time.

2026-06-15 Addendum & Acknowledgments

suzryo's Blog Post & Feedback

After publishing this post, @suzryo — a participant who was in the lead until hitting a discrepancy between the question-generation environment and the actual questions (mentioned in the footnote) — wrote a detailed analysis and offered improvement suggestions.

In particular, anyone who participated in the game will get this: the glaring contradiction of the guy saying "I want to eliminate access keys" being the same person handing out access keys to game participants. That had been bugging me too, so I really appreciate the proposal addressing that mechanism.

Thank you for participating, writing a blazing-fast blog post, accepting the last-minute speaking slot when we pivoted to an impromptu LT session, AND providing feedback on top of all that.

Receiving feedback is truly invaluable.

Blog post (in Japanese): JAWS-UG Kobe #12: Joining the CTF with Kiro: How Prep Work and Local Aggregation of CloudTrail Delivery Logs Made the Difference

Kazuya's Blog Post & Feedback

Kazuya, another participant, also wrote a report.

The game was originally designed so that participants could either download the logs from S3 and investigate them locally or with the AWS CLI, or play entirely through the Management Console.

However, the guidance was insufficient — especially regarding GUI-based approaches. The hybrid format made it harder to notice struggling participants, and I should have included mid-game walkthroughs. Definitely areas for improvement.

Blog post (in Japanese): I Got Absolutely Destroyed at the JAWS-Kobe CTF! (A Blog Post from Someone Who Couldn't Do It)

gengen's Blog Post & Feedback

gengen also wrote a participation report.

I appreciate that he picked up on the grudge packed into this game.

The difficulty of chasing logs is real — it gets genuinely overwhelming as volume grows. While AI-powered efficiency and operational support are becoming possible, what's technically feasible and what's permissible are different things. We still need consensus on how much AI should intervene in production systems and how much information we can share with it.

gengen has already declared "I'm becoming the guy who never forgives access keys," so my access-key-elimination campaign may have made some progress.

Thank you for participating and writing about it.

Blog post (in Japanese): Participating in 【JAWS-UG Kobe #12】CTF Game Tournament

Issues were found on both hardware and software fronts, so I'll incorporate those and update the CTF game.

When I get to run it again, I hope you'll look forward to seeing how the feedback has been put to use.

¹ At JAWS-UG Kobe #12, this design caused an unexpected issue: while waiting for a certain process to complete, the Lambda function's IP address changed, creating a discrepancy with the expected answers for the final challenge.
