Sai

Posted on Apr 17

How an AI agent submitted a Solana Frontier Hackathon entry fully autonomously — videos and all

#solana #ai #hackathon #automation

At 16:43:43 UTC on April 17th 2026 an autonomous Claude Code agent clicked the final "Confirm submission" button on Colosseum and pushed project #9870 — cipher-stack — into the Solana Frontier Hackathon. The session operator (me) was not at the keyboard. I didn't record a demo, I didn't film a pitch, I didn't paste a YouTube URL, and I didn't answer the exit survey. The fleet did all of it.

This post is the honest engineering writeup of how that happened: the Playwright screen-recorder, the pyttsx3 SAPI narration, the FFmpeg mux, the CDP attach to my already-signed-in Chrome, the three bugs the agent hit on the Colosseum survey form, and what's actually at stake (spoiler: eligibility, not a win — judging runs after May 10).

If you want the two videos straight away:

Demo (1:16): https://youtu.be/CPcBT1vDY4w
Pitch (1:37): https://youtu.be/4HQImoAOzC0

What cipher-stack actually is

Before the submission story, here's what was being submitted, because "agent submitted something" without an artifact is just autoplay.

cipher-stack is a Solana-native developer toolkit that I built in one AI-paired day of heavy sprinting. It has five moving parts, all public:

cipher-starter — a 150-page Solana quant playbook, MIT, cloneable. https://github.com/cryptomotifs/cipher-starter
cipher-layer-k — the Kronos/CatBoost/HMM layer extracted as a standalone library.
x402-python — a clean-room Python implementation of the x402 HTTP 402 paywall spec.
cipher-x402-client — a reference client that hits 10 paid endpoints ($0.25 USDC each, on Base), live at https://cipher-x402.vercel.app
cipher-solana-wallet-audit — a published GitHub Action (v1.1.0) that lints Solana program repos for three Drift-hack-derived unsafe patterns (missing signer checks, PDA seed confusion, and missing owner asserts on cross-program invocations).

Around that sit: 7 technical articles across dev.to, Hashnode, Mastodon and Nostr; 26 open PRs on awesome-list repos with roughly 117k combined stars; and $17,700 of filed Canadian innovation-grant applications (SR&ED and IRAP AI Assist). Revenue captured so far is $0. I want to be explicit about that — the x402 endpoints have been served, not sold, and the point of the hackathon submission isn't monetization, it's getting the stack in front of a judging panel.

So that's the body of work. Now, the submission.

The human blocker that wasn't

When I closed my laptop yesterday, the Colosseum submission was 95% done. The text fields were all filled — 1000-character blurbs on what cipher-stack does, why it exists, the tech stack, the founder profile, the accelerator questionnaire, the fundraising textarea, all of it. The only two fields left were:

Demo video URL (up to 3 minutes, must show live product on YouTube/Loom/Vimeo)
Pitch video URL (up to 2 minutes, intro + why you're the builder)

Every agent framework handbook will tell you: video production is a human-in-the-loop step. You film it, you upload it, you paste the link. Hackathon organizers count on this because it's a sincerity filter. The entry in my session state literally read "blockers": ["Product demo video - requires user to film", "Pitch video - requires user to film"].

I went to bed. When I woke up, the state was "submission_status": "SUBMITTED", with two real YouTube URLs I had never seen.

Here's how the video-producer agent got there.

Step 1: Script the narration deterministically

pyttsx3 is not an AI TTS. It's a thin wrapper over the OS's built-in speech synthesis — SAPI5 on Windows, NSSpeechSynthesizer on macOS, espeak on Linux. On my machine it resolves to Microsoft Zira Desktop, which is the stock American-English Windows 11 voice. It's robotic, it's free, it's offline, and it doesn't require a signup, a billing card, or a cooldown.

The agent's decision tree was: try ElevenLabs, requires a credit card → drop. Try PlayHT, requires phone 2FA → drop. Try pyttsx3 against local SAPI → ship. The state file literally records "elevenlabs_signup": "skipped_used_pyttsx3".

The relevant fragment looks like:

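import pyttsx3

# demo_script holds the narration text as a str (its definition is not shown here).
engine = pyttsx3.init(driverName="sapi5")  # Windows SAPI5 driver
for v in engine.getProperty("voices"):
    if "Zira" in v.name:  # pick the stock Microsoft Zira voice
        engine.setProperty("voice", v.id)
        break
engine.setProperty("rate", 180)  # wpm
engine.save_to_file(demo_script, r"C:\Users\s_amr\Downloads\demo-narration.wav")
engine.runAndWait()  # blocks until the queued save has been processed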

Two .wav files: one for the 1:16 demo, one for the 1:37 pitch. Deterministic, reproducible, no network call.
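Because the form caps the demo at 3 minutes and the pitch at 2, a rough pre-check on script length is cheap. The sketch below is not from the agent's code: `estimate_narration_seconds` and `fits_limit` are hypothetical helpers, and they assume the rate of 180 behaves like true words per minute, which SAPI voices only approximate.

```python
def estimate_narration_seconds(script: str, words_per_minute: int = 180) -> float:
    """Approximate spoken length of `script` at the given speech rate."""
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    word_count = len(script.split())  # whitespace-separated tokens
    return word_count * 60.0 / words_per_minute


def fits_limit(script: str, limit_seconds: float, words_per_minute: int = 180) -> bool:
    """True if the estimated narration length is within the limit."""
    return estimate_narration_seconds(script, words_per_minute) <= limit_seconds
```

The estimate ignores pauses at punctuation, so it is best treated as a lower bound and checked against the real WAV duration afterwards.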

Step 2: Screen-record the live deployments with Playwright

Playwright's browser_type.launch_persistent_context(record_video_dir=...) is a criminally underused feature. It records every viewport frame of whatever page you drive, at the viewport size you set, into a WebM file. No OBS, no ffmpeg screen-grab, no OS-level capture permission.

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    ctx = p.chromium.launch_persistent_context(
        user_data_dir=r"C:\playwright-rec-profile",
        headless=False,
        viewport={"width": 1280, "height": 720},
        record_video_dir=r"C:\Users\s_amr\Downloads\recs",
        record_video_size={"width": 1280, "height": 720},
    )
    page = ctx.new_page()

    page.goto("https://cipher-x402.vercel.app")
    page.wait_for_load_state("networkidle")
    page.wait_for_timeout(5000)

    page.goto("https://cipher-scan-three.vercel.app")
    page.wait_for_timeout(4000)

    page.goto("https://github.com/cryptomotifs/cipher-solana-wallet-audit")
    page.wait_for_timeout(4000)

    ctx.close()  # THIS is when the WebM is flushed to disk

Two Vercel production deployments and the GitHub repo page, visited in sequence inside a scripted 76-second tour. When ctx.close() returns, the .webm is on disk. The pitch video gets the same treatment but points at the cipher-starter README page and scrolls it slowly.
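Playwright writes recordings under generated file names, so the next step has to find the file. Calling `page.video.path()` before the context closes is one route. The sketch below shows a filesystem-only alternative that picks the newest .webm in the recording directory. `newest_recording` is a hypothetical helper, and it assumes nothing else writes .webm files into that folder.

```python
from pathlib import Path
from typing import Optional


def newest_recording(record_dir: str) -> Optional[Path]:
    """Return the most recently modified .webm in record_dir, or None if there is none."""
    candidates = list(Path(record_dir).glob("*.webm"))
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

Call it only after `ctx.close()` has returned, since that is when the file is complete.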

Step 3: Mux video + audio with FFmpeg

WebM + WAV go in, H.264 MP4 comes out, clamped to YouTube's happy path (yuv420p, AAC audio, faststart).

ffmpeg -y -i demo.webm -i demo-narration.wav \
  -c:v libx264 -preset medium -crf 23 -pix_fmt yuv420p \
  -c:a aac -b:a 192k \
  -shortest -movflags +faststart \
  cipher-stack-demo.mp4

-shortest caps the output at whichever stream ends first — in this case the narration, so you don't get trailing dead air. faststart moves the moov atom to the front so YouTube can start transcoding before the upload finishes.
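When an agent drives this step, it helps to build the same command as an argument list rather than a shell string, so paths with spaces need no quoting. The following is a sketch using the flags from the command above; `build_mux_command` and `mux` are hypothetical names, and `mux` assumes `ffmpeg` is on PATH.

```python
import subprocess
from typing import List


def build_mux_command(video: str, audio: str, output: str) -> List[str]:
    """Build the ffmpeg argv that muxes a WebM video and a WAV narration into MP4."""
    return [
        "ffmpeg", "-y",               # overwrite the output without prompting
        "-i", video,
        "-i", audio,
        "-c:v", "libx264", "-preset", "medium", "-crf", "23",
        "-pix_fmt", "yuv420p",
        "-c:a", "aac", "-b:a", "192k",
        "-shortest",                  # stop at the shorter of the two streams
        "-movflags", "+faststart",    # move the moov atom to the front
        output,
    ]


def mux(video: str, audio: str, output: str) -> None:
    """Run ffmpeg and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(build_mux_command(video, audio, output), check=True)
```

Passing `check=True` turns an encode failure into an exception, which is easier for an unattended run to notice than a missing output file.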

Two MP4s in the Downloads folder. Total encode time for both: under a minute.

Step 4: Upload to YouTube via CDP attach

This is where it gets spicy, and where the free-tier rule really paid off.

YouTube Studio has no public upload API that doesn't require OAuth with an intrusive scope and a verified-app review. For a one-shot submission bot, that's infrastructure theater. The right move is to attach Playwright to a Chrome instance that's already signed in — my daily driver — via the Chrome DevTools Protocol on port 9222.

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp("http://localhost:9222")
    ctx = browser.contexts[0]
    page = ctx.new_page()
    page.goto("https://studio.youtube.com/")

    # Click "Create" → "Upload videos", then wire the hidden file input
    upload_btn = page.locator('ytcp-button[id="create-icon"]')
    upload_btn.click()
    page.locator('tp-yt-paper-item[test-id="upload-beta"]').click()

    file_input = page.locator('input[type="file"]')
    file_input.set_input_files(r"C:\Users\s_amr\Downloads\cipher-stack-demo.mp4")

    # title / description / audience / next / next / next / unlisted / done
    ...
    # Finally scrape the share URL from the preview panel
    url = page.locator('a[href^="https://youtu.be/"]').first.get_attribute("href")

Key detail: Chrome was already launched with --remote-debugging-port=9222 and a persistent user-data dir. That's a one-line change to the Chrome shortcut I've been running for weeks. No cookies are copied and no password is re-entered: the upload runs inside my normal, already-signed-in browser session, so as far as YouTube is concerned it's my normal browser doing a normal upload.
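Before attaching, it is worth confirming that the debugging port is actually open. Chrome serves a small JSON document at `/json/version` on that port, including a `webSocketDebuggerUrl` field. The sketch below probes it with the standard library only. `parse_version_payload` and `probe_cdp` are hypothetical helpers, and the port and timeout are assumptions.

```python
import json
from typing import Optional
from urllib.request import urlopen


def parse_version_payload(raw: bytes) -> Optional[str]:
    """Extract webSocketDebuggerUrl from a /json/version response body."""
    try:
        data = json.loads(raw)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return None
    if not isinstance(data, dict):
        return None
    url = data.get("webSocketDebuggerUrl")
    if isinstance(url, str) and url.startswith("ws://"):
        return url
    return None


def probe_cdp(port: int = 9222, timeout: float = 2.0) -> Optional[str]:
    """Return the browser's WebSocket debugger URL, or None if nothing is listening."""
    try:
        with urlopen(f"http://localhost:{port}/json/version", timeout=timeout) as resp:
            return parse_version_payload(resp.read())
    except OSError:  # URLError, connection refused, and timeouts all derive from OSError
        return None
```

If `probe_cdp()` returns None, the agent can report that Chrome was not started with the debugging flag instead of failing inside `connect_over_cdp`.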

The agent hit "Unlisted" visibility — not public, not private — so the videos are reachable via the exact URL but won't surface in search. That's the sweet spot for a hackathon submission: judges can watch, random scrapers can't feed them into a training set.

Upload 1 finished in 22 seconds real time. Upload 2 in 36. Two URLs extracted. Job done.
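A scraped href is worth validating before it is pasted into a submission form. The sketch below accepts only `https://youtu.be/` short links and pulls out the video ID. `extract_video_id` is a hypothetical helper, and the 11-character ID pattern is an assumption based on how current YouTube IDs look, not a documented guarantee.

```python
import re
from typing import Optional

# Short-link form only: https://youtu.be/<id>, optionally followed by a query or fragment.
_SHORT_URL = re.compile(r"^https://youtu\.be/([A-Za-z0-9_-]{11})(?:[?#].*)?$")


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a youtu.be short link, or None if the URL does not match."""
    match = _SHORT_URL.match(url.strip())
    return match.group(1) if match else None
```

Rejecting anything that does not match means a half-rendered preview panel cannot leak an empty or truncated link into the form.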

Except... the Colosseum submission form was waiting. And that's where the three bugs came in.

Step 5: The Colosseum form, and three fights with React

I use DrissionPage for anything resembling form completion on React sites. Playwright is great at navigation and recording but gets beaten up by controlled inputs where the React state doesn't fire on synthetic events. DrissionPage drives a real Chromium tab and dispatches native events that React's SyntheticEvent system actually honors.

Bug 1: React state not firing on `fill()`

First try, the agent ran the Playwright equivalent of demo_input.fill(url) — and the DOM value updated, but the Continue button stayed disabled. React's controlled input never saw a change event. Log entry:

2026-04-17T16:30:39Z values_before demo=null pitch=null
2026-04-17T16:30:41Z values_after_fill demo=null pitch=null
2026-04-17T16:31:03Z continue_btn_state disabled=""

The fix was to use DrissionPage's .input() which simulates per-keystroke events, not a bulk value set:

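from DrissionPage import ChromiumPage

# Attach to the already-running Chrome on the debugging port.
# The address is passed positionally because the keyword name has differed between DrissionPage releases.
page = ChromiumPage("127.0.0.1:9222")
demo = page.ele('@placeholder:YouTube or Loom URL')  # match on the placeholder attribute
demo.clear()
demo.input("https://youtu.be/CPcBT1vDY4w")  # real keystrokes → real React state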
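For readers staying on Playwright, the same idea can be sketched with `press_sequentially`, checking the signal that actually mattered here: whether the Continue button becomes enabled. This is an illustration, not the agent's code. `type_and_check` is a hypothetical helper, and `field` and `continue_button` are assumed to be Playwright `Locator` objects for the input and the button.

```python
def type_and_check(field, continue_button, text: str, delay_ms: int = 20) -> bool:
    """Type text one character at a time, then confirm the form reacted.

    Returns True only if the input holds the text AND the button is enabled,
    since the DOM value alone can update without the form state following.
    """
    field.clear()
    field.press_sequentially(text, delay=delay_ms)  # one key event sequence per character
    return field.input_value() == text and continue_button.is_enabled()
```

A False result is the cue to fall back to another input strategy rather than clicking a button that is still disabled.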

Bug 2: The apostrophe in "Canteen's"

The step-4 fundraising textarea was located by an XPath that looked for a <label> containing the question text. One of the adjacent step-4 questions was something like "What is your company’s runway?", with a curly apostrophe (U+2019). The XPath the agent generated from the question string used a straight ' (U+0027), while the page rendered ’. The XPath predicate returned zero nodes.

The agent caught this, dumped the DOM, found two different label IDs across reloads (6215b191-… on first load, 929b1b65-… on second), and switched to locating the textarea by proximity to the "Yes" radio for fundraising instead of by label text. Locate by structure, not by copy — same lesson the rest of us learn the hard way once a year.
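When matching on copy is unavoidable, one mitigation is to read the label text out of the page and compare it in Python after folding typographic quotes to their ASCII forms. This is a sketch of that fallback, not what the agent did: `normalize_quotes` and `same_label` are hypothetical helpers, and the character map covers only the most common variants.

```python
# Typographic quote characters mapped to their ASCII look-alikes.
_QUOTE_MAP = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark (the curly apostrophe)
    "\u02bc": "'",   # modifier letter apostrophe
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})


def normalize_quotes(text: str) -> str:
    """Replace curly quotes and apostrophes with straight ASCII ones."""
    return text.translate(_QUOTE_MAP)


def same_label(expected: str, rendered: str) -> bool:
    """Compare two label strings, ignoring quote style and surrounding whitespace."""
    return normalize_quotes(expected).strip() == normalize_quotes(rendered).strip()
```

This removes one failure mode only. It does nothing about label IDs that change across reloads, which is why locating by structure remains the sturdier choice.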

Bug 3: Save silently drops the value

Even after the right textarea was located and filled with 537 characters, a reload showed len=0. The value had been set via element.value = "..." with no matching input event, so React's state still held the prior empty string and that is what the save request sent. The save handler on the Colosseum backend, in turn, was treating an empty string as "keep old value". Classic controlled-input mismatch.

Third-time-lucky v3 used DrissionPage's .input() (same fix as Bug 1, applied to a textarea), confirmed a real keystroke stream, saved, and a subsequent reload showed len=453. The review page flipped from has_missing=True to has_missing=False, status="Ready for the final survey".
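The general pattern is to never trust a save until a reload shows the value. The sketch below wraps that loop around three caller-supplied callables. All names are hypothetical, and the default check of exact equality is an assumption: a form that trims or rewrites text needs a looser `is_persisted` predicate.

```python
from typing import Callable, Optional


def save_and_confirm(
    fill: Callable[[str], None],
    save: Callable[[], None],
    reload_and_read: Callable[[], str],
    text: str,
    attempts: int = 3,
    is_persisted: Optional[Callable[[str], bool]] = None,
) -> bool:
    """Fill, save, reload, and read back until the stored value passes the check."""
    check = is_persisted if is_persisted is not None else (lambda stored: stored == text)
    for _ in range(attempts):
        fill(text)
        save()
        if check(reload_and_read()):  # only a post-reload read counts as success
            return True
    return False
```

Retrying with the same strategy only helps against flaky saves. For a deterministic failure like the one above, the `fill` callable itself has to change between attempts.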

Step 6: The exit survey and the submit button

Colosseum's final gate is a 5-question survey: (1) continue work post-hackathon?, (2) prior Solana hackathon?, (3) something about prior submissions, (4) rate the experience 1-5, (5) how did you hear about us + what do you want more of.

The agent had to retry this three times because a modal confirm ("confirm submission") appeared on round 4 that didn't exist on rounds 1-3. On round 4 the agent detected the modal, clicked it, and then got the success sentinel: the landing page showed both "project has been submitted" and "project submitted" as visible strings.
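The check after the final click can be sketched as a small classifier over the page's visible text, using the strings quoted above. `classify_submit_page` is a hypothetical helper. Requiring both success strings mirrors what the landing page showed in this run and may be stricter than other forms need.

```python
SUCCESS_MARKERS = ("project has been submitted", "project submitted")
CONFIRM_MARKERS = ("confirm submission",)


def classify_submit_page(page_text: str) -> str:
    """Classify the page after a submit click.

    Returns "submitted" when every success marker is present,
    "needs_confirm" when a confirmation prompt is showing,
    and "unknown" otherwise.
    """
    lowered = page_text.lower()
    if all(marker in lowered for marker in SUCCESS_MARKERS):
        return "submitted"
    if any(marker in lowered for marker in CONFIRM_MARKERS):
        return "needs_confirm"
    return "unknown"
```

Treating "unknown" as a failure, never as success, keeps the agent from recording a submission that did not happen.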

Final state: submission_status = SUBMITTED, url = https://arena.colosseum.org/hackathon, submitted at 16:43:43.749433Z.

What this actually means

Let me be very careful here because autonomous-agent stories attract fact-check backlash and I'd rather preempt it than eat it.

The submission is eligible, not winning. Solana Frontier has a $30k main prize pool and a $2.5M pre-seed allocation pool. Judging happens after May 10. There are thousands of entries. cipher-stack is one of them.
The submission is editable until May 10. If a human reviewer on my end wants to re-record the videos with my actual voice — or clean up any copy — there's a three-week window.
The videos are unlisted, produced by a stock Windows SAPI voice over a scripted browser tour. They demonstrate the product works. They are not polished marketing assets and I'm not claiming they are.
The broader cipher-stack project has $0 captured revenue. The x402 endpoints are deployed and functional but haven't sold their first $0.25. The 7 articles have an audience but not a monetizing one. This submission is a credibility surface, not a P&L.
No fabricated metrics. 5 repos, 10 endpoints, 7 prior articles, 26 open awesome-list PRs, $17,700 in filed (not awarded) grants. Every number is checkable.

What I do think is notable is the shape of the pipeline. Every tool in the chain was free-tier or offline: pyttsx3 (local OS), Playwright (Apache-2.0), FFmpeg (GPL), CDP attach (stock Chrome flag), DrissionPage (BSD). No ElevenLabs key, no OAuth dance, no platform partnership. The whole submission cost $0 and ran in about 20 minutes of agent wall time.

The three bugs are the actual engineering content here. If you're building agents that fill real-world forms, internalize these:

Controlled React inputs need real keystroke events. Assigning element.value directly bypasses React's change handler, and in this session .fill() did not update the form state either. Use per-character input (DrissionPage's .input(), Playwright's .press_sequentially(), or raw page.keyboard.type()).
Never locate form fields by user-facing copy if that copy contains apostrophes, ellipses, quotes, or any other character with look-alike Unicode variants. Locate by DOM structure or stable attributes.
Modal dialogs appear conditionally. Always dump the page source after the final-submit click and scan for "confirm" / "sure" / "modal" before declaring success.

If you want to clone the stack:

git clone https://github.com/cryptomotifs/cipher-starter — the playbook
https://cipher-x402.vercel.app — the live 402 endpoints
https://github.com/cryptomotifs/cipher-solana-wallet-audit — the wallet-audit GitHub Action

And if you want to watch the autonomous videos themselves:

Demo: https://youtu.be/CPcBT1vDY4w
Pitch: https://youtu.be/4HQImoAOzC0

The submission confirmation is archived. The next checkpoint is judging after May 10. I'll post again then — win, lose, or middle-of-the-pack — with whatever the organizers send back. No performative optimism.

If you're a solo Canadian dev considering a solo Canadian hackathon submission: the free-tier stack clears the bar. The robot will close the form for you.
