Some of the most interesting recent developments in Generative AI are the different ways we can ask models and agents to work together on our tasks. We have orchestration, choreography, and every permutation we can think of.
One of the concepts that many of us have experimented with is the LLM Council pattern from Andrej Karpathy at https://github.com/karpathy/llm-council. This project sets up three configurable models and asks each one the user's question. The answers from each model go through peer review and ranking. Finally, the chairman of the LLM Council compiles the responses into a final judgment.
Why would we choose this framework? Each model has its own combination of strengths and weaknesses. By combining the best of each, we can come up with more accurate, more diverse, and more complete answers.
I built a variant of this using AWS AgentCore (https://github.com/mgbec/Council-agents). I substituted a few of Andrej Karpathy's components with AgentCore elements:
Instead of OpenRouter + FastAPI + JSON files, this version uses:
- Amazon Bedrock for multi-model access (Claude, Llama, Mistral, etc.)
- AgentCore Runtime for serverless hosting with session management
- AgentCore Memory for conversation persistence across sessions
I did substitute some of the models with different versions (easy to change in config.py):
```python
COUNCIL_MODELS = [
    "us.anthropic.claude-sonnet-4-20250514-v1:0",
    "us.meta.llama4-maverick-17b-instruct-v1:0",
    "mistral.mistral-large-2411-v1:0",
]
CHAIRMAN_MODEL = "us.anthropic.claude-sonnet-4-20250514-v1:0"
```
The basic functions are still the same, however:
- Ask a question and receive individual responses
- Peer ranking of the answers
- A final Council decision
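Those three stages can be sketched with raw Bedrock Converse calls. This is a minimal sketch, not the repo's actual code: the prompt helpers and the exact shape of the chairman prompt are my own simplification.

```python
COUNCIL_MODELS = [
    "us.anthropic.claude-sonnet-4-20250514-v1:0",
    "us.meta.llama4-maverick-17b-instruct-v1:0",
    "mistral.mistral-large-2411-v1:0",
]
CHAIRMAN_MODEL = "us.anthropic.claude-sonnet-4-20250514-v1:0"

def ask(client, model_id, prompt):
    """One Bedrock Converse call; returns the model's text reply."""
    resp = client.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return resp["output"]["message"]["content"][0]["text"]

def review_prompt(question, answers):
    """Present the collected answers for peer ranking."""
    listing = "\n\n".join(
        f"Answer {i + 1}:\n{text}" for i, text in enumerate(answers.values())
    )
    return f"Question: {question}\n\nRank these answers from best to worst:\n\n{listing}"

def run_council(question):
    import boto3  # imported lazily so the prompt helpers work without AWS configured

    client = boto3.client("bedrock-runtime")
    # Stage 1: each council member answers independently
    answers = {m: ask(client, m, question) for m in COUNCIL_MODELS}
    # Stage 2: each member reviews and ranks the collected answers
    reviews = {m: ask(client, m, review_prompt(question, answers)) for m in COUNCIL_MODELS}
    # Stage 3: the chairman compiles answers and rankings into a final judgment
    material = review_prompt(question, answers) + "\n\nPeer rankings:\n" + "\n\n".join(reviews.values())
    return ask(client, CHAIRMAN_MODEL, material + "\n\nCompile a final judgment.")
```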
This is all hosted on AWS with a React frontend. The workflow keeps credentials server-side, authenticates users via Cognito, and serves the React app from CloudFront.
Some of the learning opportunities I had:
* API Gateway REST APIs have a hard 29-second timeout, but the council takes 30–90 seconds. To work around this, the system uses an async pattern: the frontend submits a request (instant response with a request ID), then polls for the result every 5 seconds. The heavy work runs in a separate SQS-triggered Lambda, which isn't subject to the API Gateway timeout.
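The submit side of that async pattern looks roughly like this. The queue URL and field names are hypothetical placeholders; the real handler lives in the repo.

```python
import json
import time
import uuid

def make_job(question):
    """Build the job record that goes onto the queue and into DynamoDB."""
    return {
        "request_id": str(uuid.uuid4()),
        "question": question,
        "status": "PENDING",
        "submitted_at": int(time.time()),
    }

def submit_handler(event, context):
    """API Gateway handler: enqueue the job and return instantly with a request ID.

    The SQS-triggered worker Lambda runs the council and writes the result to
    DynamoDB, where a second handler serves the frontend's 5-second polls.
    """
    import boto3  # imported lazily so make_job stays testable without AWS

    job = make_job(json.loads(event["body"])["question"])
    boto3.client("sqs").send_message(
        QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/council-jobs",  # placeholder
        MessageBody=json.dumps(job),
    )
    return {"statusCode": 202, "body": json.dumps({"request_id": job["request_id"]})}
```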
* I originally tried a Lambda Function URL to work around the API Gateway timeout. It would have worked, but the way I had implemented it was not very secure. First, the Lambda function was set up as public, which was not safe at all. My second attempt had the Lambda itself validate the Cognito JWT on every request: checking token structure, expiration, issuer, app client ID, and that the key ID (kid) exists in the Cognito JSON Web Key Set. It did not do RSA signature verification, however, so I scrapped that plan in favor of the async pattern with API Gateway, Lambdas, DynamoDB, and SQS. The full architecture is at https://github.com/mgbec/Council-agents/blob/master/architecture.md.
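For reference, the checks that scrapped validator performed can be done with the standard library alone, which is exactly why it was incomplete: nothing below touches the RSA signature, so a forged token with plausible claims would pass. The claim names follow Cognito's access-token layout; proper verification needs a JWKS-aware library such as PyJWT.

```python
import base64
import json
import time

def _b64url(seg):
    """Decode a base64url segment, restoring stripped padding."""
    return base64.urlsafe_b64decode(seg + "=" * (-len(seg) % 4))

def check_claims(token, issuer, client_id, known_kids):
    """The incomplete validation: structure, kid, issuer, client ID, expiry.

    Deliberately missing: RSA signature verification against the JWKS,
    which is what actually proves the token came from Cognito.
    """
    try:
        header_b64, payload_b64, _signature = token.split(".")
    except ValueError:
        return False  # structure check: a JWT has exactly three segments
    header = json.loads(_b64url(header_b64))
    claims = json.loads(_b64url(payload_b64))
    return (
        header.get("kid") in known_kids
        and claims.get("iss") == issuer
        and claims.get("client_id") == client_id  # Cognito access tokens carry client_id
        and claims.get("exp", 0) > time.time()
    )
```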
* For the AgentCore deployment, we can use CodeZip uploaded to S3, or a Docker image pushed to ECR. In the past I have used the Docker/ECR method, but Kiro told me that the best option for this project is the CodeZip method: "For this project, CodeZip (S3) is the right choice — it's pure Python with pip-installable dependencies, nothing exotic in the runtime. Container mode is more useful when you need system-level packages, custom binaries, or a specific OS setup."
* The Lambda is used as a thin proxy that calls InvokeAgentRuntime, keeping the AgentCore ARN and AWS credentials server-side, never exposed to the browser. The Lambda then uses the Cognito sub claim to namespace AgentCore sessions, so each user's memory is isolated.
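A sketch of that thin proxy, under some assumptions: the runtime ARN and payload shape are placeholders, and while invoke_agent_runtime is the boto3 operation for AgentCore Runtime, check the current SDK docs for exact parameters.

```python
import json

def namespaced_session(sub, conversation_id):
    """Prefix the session ID with the Cognito sub so each user's memory is isolated."""
    return f"{sub}-{conversation_id}"

def proxy_handler(event, context):
    import boto3  # imported lazily so namespaced_session is testable without AWS

    # The Cognito authorizer on API Gateway puts verified claims into the event,
    # so the browser never sees credentials or the AgentCore ARN.
    claims = event["requestContext"]["authorizer"]["claims"]
    body = json.loads(event["body"])
    resp = boto3.client("bedrock-agentcore").invoke_agent_runtime(
        agentRuntimeArn="arn:aws:bedrock-agentcore:us-east-1:123456789012:runtime/council",  # placeholder
        runtimeSessionId=namespaced_session(claims["sub"], body["conversation_id"]),
        payload=json.dumps({"question": body["question"]}),
    )
    return {"statusCode": 200, "body": resp["response"].read().decode()}
```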
* I really enjoy AgentCore Observability in all of its detail, but for this project I didn't see any sessions captured in the Observability Dashboard. I saw plenty of traces but no sessions at all. I asked Kiro about that, and the answer was: "The issue is that our agent code (main.py) uses raw boto3 calls via bedrock_client.py rather than the Strands Agent framework. When you use a Strands Agent() with the BedrockAgentCoreApp, the framework automatically propagates session context into the OTEL spans. Our code bypasses that — it just calls boto3.client("bedrock-runtime").converse() directly, so the traces show the Bedrock calls but don't associate them with the AgentCore session."
Kiro suggested two possible fixes to see the sessions in AgentCore Observability. The agent code would need to use one of these options:
- Use a Strands Agent with session management (the framework handles OTEL context automatically)
- Manually inject the session ID into the OTEL span attributes
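The second option amounts to something like this. The session.id attribute key is my reading of the AgentCore observability docs, not something I've verified end to end; the OpenTelemetry calls themselves are standard API.

```python
def session_attributes(session_id):
    """Span attributes that AgentCore Observability should correlate into a session."""
    return {"session.id": session_id}

def annotate_current_span(session_id):
    """Attach the AgentCore session ID to the currently active OTEL span."""
    from opentelemetry import trace  # requires the opentelemetry-api package

    span = trace.get_current_span()
    for key, value in session_attributes(session_id).items():
        span.set_attribute(key, value)
```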
I did attempt to refactor to use Strands Agent session management, but this created a metastasizing string of errors. I also tried manually injecting the session ID into the span attributes, which did not work either. Finally, I tried something I saw in the AgentCore documentation about OTEL baggage (https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/observability-configure.html), but I had no luck with that as well, so sessions in the Observability Dashboard are a problem for another day.
* Kiro was great at fixing the Dependabot vulnerabilities when asked to: "All 9 vulnerabilities fixed — npm audit fix updated 13 packages and now shows 0 vulnerabilities. Let me verify the build still works, then commit."
This was a fun way to implement Andrej Karpathy's LLM Council idea. The next steps for me might be fixing the session observability, speeding up the responses, or trying a cheaper model. I asked my council to recommend a cost-effective model for a chairman, and this was actually a snappy response: it recommended Claude 3 Haiku.
I’m looking forward to all the creativity, new arrangements and workflows we will see in the future. Thanks for reading!