Our Experience from the FCA AI Supercharged Sandbox: Infrastructure, Iteration, and Microsoft Aspire

#ai #llm #agents #aspire

The Financial Conduct Authority have announced that applications for cohort 2 of the FCA AI Supercharged Sandbox are now open! As a grateful participant in the first cohort, I can’t recommend it highly enough to anyone building serious AI for UK financial services.

Picking up from a previous post I wrote for Serene, I want to dive a little deeper into our experience and what we learned about getting the most out of our time in the programme.

Into the Sandbox

The programme ran from September 2025 to January 2026. The structured environment provided by NayaOne gave us an incredible stack of toys to play with in Amazon Web Services (AWS), including GPU-enabled infrastructure backed by NVIDIA's AI Enterprise platform.

There were high-quality datasets with support from firms like bigspark, and many expert mentors from regulated firms who brought genuinely different lenses on architecture, regulation, delivery, and scale.

We were able to test whether the architecture we were building held up under real operational constraints.

Looking to Cohort 2, the FCA has said it is particularly interested in agentic AI applications for payment services, compliance, and customer interaction or servicing. That's a direct signal about where they see the most interesting and most challenging problems. If that's the space you're working in, it’s a no-brainer!

The first few weeks

We spent the initial weeks of the programme familiarising ourselves with the sandbox layout and the individual pieces of technology, and working out the best way to pivot our current Azure-native stack to AWS. During this time, we had some onboarding calls with the amazing FCA team (thanks Séamus Merrin!), and started requesting sessions with the mentors who had volunteered their precious time.

The Sandbox gave us access to GPU compute on reserved AWS EC2 instances. The sessions were time-boxed. You reserved your window in advance, you had a generous but fixed budget, and you used it well or you didn't.

You have a development VM instance running Linux where your code can run, and that session persists for the duration of the sandbox. The GPU instances are different: you lose any deployments the moment your session window ends.

It quickly became apparent that there was no room to waste the first half of a session wrestling with infrastructure.

We were spending the best part of 30 minutes installing Docker and the NVIDIA Container Toolkit, verifying GPU visibility, standing up useful tools for our stack such as Qdrant, and pulling in different models for our use cases.

By the time everything was running and verified, a meaningful chunk of the window was already gone.

I also considered "Last night, Darth Vader came down from Planet Vulcan and told me that if I didn't automate my GPU setup, he'd melt my brain."

Enter Microsoft Aspire

We were already using Aspire as our application host; it’s a default for most of our solutions now.

If you haven't used it, Aspire is Microsoft's open source, cloud-native development stack for multiple languages (not just C#), and its superpower is orchestrating distributed applications through a single AppHost project. It’s extremely well supported with frequent releases, enhancements and features by the brilliant David Fowler, Damian Edwards and Maddy Montaquila.

For us, one dotnet run starts our APIs and their dependencies (such as containers running tools like Qdrant), wires up service discovery, and gives us a dashboard to watch it all.

That model turned out to be exactly right for the Sandbox problem.

We built a small worker service called ‘SandboxAwsConnector’ and plugged it into the Aspire AppHost as a first-class resource. Set a SANDBOX_GPU_HOST environment variable for the IP of the GPU instance, run with the SandboxAws launch profile, and the connector evolved to:

SSH into the reserved EC2 instance
Upload and execute runbook_gpu_stack.sh, a setup script that handled Docker Engine installation, the NVIDIA Container Toolkit (including a fallback repo path for Ubuntu 24.04), Ollama and Qdrant via Docker Compose, and model pulls, in a single unattended pass
Establish local port forwards over the SSH tunnel to tools like Qdrant and Ollama
Monitor the health of the tunnels for the duration of the session, reconnecting automatically if anything dropped
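
The tunnel and monitoring steps above can be sketched roughly as follows. This is a minimal Python illustration, not the SandboxAwsConnector itself (which runs as a worker service in the Aspire AppHost). The function names, the `ubuntu` login user, and the forwarded ports (6333 and 11434, the usual Qdrant and Ollama defaults) are assumptions made for the example.

```python
import socket
from typing import Callable, Dict, Iterable, List


def build_ssh_forward_command(host: str, user: str, forwards: Dict[int, int]) -> List[str]:
    """Build an OpenSSH command that forwards local ports to the GPU host.

    `forwards` maps local port -> remote port (assumed values, see lead-in).
    """
    cmd = [
        "ssh",
        "-N",                              # tunnel only, run no remote command
        "-o", "ExitOnForwardFailure=yes",  # fail fast if a forward cannot be set up
        "-o", "ServerAliveInterval=30",    # keepalives so a dead link is noticed
    ]
    for local_port, remote_port in sorted(forwards.items()):
        cmd += ["-L", f"{local_port}:localhost:{remote_port}"]
    cmd.append(f"{user}@{host}")
    return cmd


def port_is_open(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the forwarded port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_and_reconnect(
    ports: Iterable[int],
    is_open: Callable[[int], bool],
    reconnect: Callable[[], None],
) -> bool:
    """One monitoring pass: reconnect if any forwarded port is down.

    Returns True when a reconnect was triggered.
    """
    if all(is_open(port) for port in ports):
        return False
    reconnect()
    return True
```

A real monitor would call `check_and_reconnect` on a timer for the length of the session, with `reconnect` restarting the `ssh` process built by `build_ssh_forward_command`. Keeping the check and the reconnect as injected callables makes the loop easy to test without a live GPU instance.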

The health check and model pull section of runbook_gpu_stack.sh. It waits, verifies, then pulls. Nothing starts until everything is ready.

This adheres to our everything-as-code approach: the script itself was written to be idempotent and conflict-proof. It evolved to handle gotchas and discoveries as we continued to develop our solution on this amazing technology. We could pre-pull the models we wanted to test and verify our findings with, so embeddings and inference didn’t have to wait.
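
The "wait, verify, then pull" ordering and the idempotency described above could look something like this. It is a hedged Python sketch of the pattern only; the real logic lives in runbook_gpu_stack.sh, and every name here (`wait_until_ready`, `provision`, the callables passed in) is hypothetical.

```python
import time
from typing import Callable, Iterable, List


def wait_until_ready(
    check: Callable[[], bool],
    attempts: int = 30,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll a health check until it passes or the attempts run out."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)  # back off before the next probe
    return False


def provision(
    checks: Iterable[Callable[[], bool]],
    models: Iterable[str],
    already_pulled: Callable[[], Iterable[str]],
    pull: Callable[[str], None],
    **wait_kwargs,
) -> List[str]:
    """Pull models only after every service check has passed.

    Models that are already present are skipped, so re-running is harmless.
    Returns the list of models pulled during this run.
    """
    for check in checks:
        if not wait_until_ready(check, **wait_kwargs):
            raise RuntimeError("service never became ready; not pulling models")

    present = set(already_pulled())
    pulled = []
    for model in models:
        if model in present:
            continue  # idempotent: nothing to do for this model
        pull(model)
        pulled.append(model)
    return pulled
```

In practice the checks would probe the services started by Docker Compose, and `pull` would ask Ollama to download a model. Running `provision` a second time against the same instance pulls nothing, which is the property that makes an unattended setup pass safe to repeat.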

The pivot capability came from Aspire's abstractions.

In the AppHost project, the same embedding model and chat model resources resolved to Azure AI Foundry in an Azure deployment, to a local Ollama container in local development, and to the remote GPU instance via SSH tunnel in Sandbox mode.

Switching between them was a configuration change, not a code change. The rest of the application (the agents, the RAG pipeline, the API service) had no idea which backend was running underneath. It didn’t need to.
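
To make the "configuration change, not a code change" idea concrete, here is a small language-neutral sketch written in Python. It is not the Aspire AppHost code (that is C#), and apart from SANDBOX_GPU_HOST, which the post mentions, the variable names `MODEL_BACKEND` and `AZURE_AI_ENDPOINT` and the Ollama port 11434 are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ModelBackend:
    name: str       # which deployment mode was selected
    endpoint: str   # base URL the rest of the app talks to


def resolve_backend(env: Mapping[str, str]) -> ModelBackend:
    """Pick a model backend purely from configuration.

    Callers only ever see a ModelBackend; they never branch on the mode.
    """
    mode = env.get("MODEL_BACKEND", "local").lower()  # hypothetical variable

    if mode == "azure":
        # Hosted deployment: endpoint comes from configuration (hypothetical name).
        return ModelBackend("azure", env["AZURE_AI_ENDPOINT"])

    if mode == "sandbox":
        # Remote GPU instance, reached through a local SSH port forward.
        if not env.get("SANDBOX_GPU_HOST"):
            raise ValueError("SANDBOX_GPU_HOST must be set in sandbox mode")
        return ModelBackend("sandbox", "http://localhost:11434")

    if mode == "local":
        # Local Ollama container on its usual default port.
        return ModelBackend("local", "http://localhost:11434")

    raise ValueError(f"unknown backend mode: {mode}")
```

The agents and the RAG pipeline would receive the resolved `ModelBackend` and nothing else. In sandbox mode the endpoint is still `localhost` because the tunnel makes the remote GPU instance look local, which is part of why the application code does not have to change.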

From 30 minutes of manual steps to a few minutes of automated ones. In a time-boxed session against a fixed compute budget, that difference was real. We got to spend our GPU time on the actual work.

Beyond the Sandbox

This saved us hours during our development on the AWS GPU instances. We also took extensive notes from our experience and from the advice of the mentors, which backed a theory that formed as our agentic UI took shape.

The regulated firms that we are working with need to see governance and control within agentic applications. Aspire helped us achieve that by letting us build a flexible, portable solution that supports an extensible “bring your own model” approach. This became a key part of our presentation and offering to those in attendance.

When you're testing whether an agent's compliance grounding holds up in edge cases, whether your safety policy handles a specific escalation correctly, or whether your event stream maintains the right ordering under load, you need to iterate quickly. Run the scenario, see what happened, adjust, run it again. If standing up the environment costs you 30 minutes each time, you do fewer iterations. You find fewer things.

The Sandbox gave us GPU compute and a structured environment to test in. Aspire gave us the ability to actually use it. The combination is what allowed us to make real progress in the time we had.

The mentors we worked with pushed us on things we hadn't fully resolved, and I believe that challenge sharpened our thinking in ways that wouldn't have happened otherwise.

Watching the Aspire dashboard go green for the first time.

Next time

If this is interesting to you, I’ll eventually publish another piece on the Microsoft Agent Framework and how we used a preview framework to iterate on our ideas quickly for the sandbox.

_This post was originally published on LinkedIn. [Read the original here.](https://www.linkedin.com/pulse/our-experience-from-fca-ai-supercharged-sandbox-aspire-ian-rathbone-zviee/?ref=blog.rathbone.dev)_