In today’s tutorial, I will use Autogen’s docker-jupyter-executor runtime with Qwen3’s newest max model to try to finish the Advent of Code challenges quickly. I aim to demonstrate that combining LLM code generation with a stateful Python runtime can effectively solve highly complex algorithmic problems.
As usual, I will share the full project source code for you to check. You can find it in the reference section.
Course Background
You have probably heard of Advent of Code (AOC). It is a fun programming challenge that claims to help beginners practice a programming language, but the puzzles are really hard. Every year I struggle and only finish the first few days.
I was not happy about that.
This year, there is still one month before the Advent of Code starts, but I have done all the prep work. New monitor, new IDE, new keyboard, and a new agent tool.
Yes, I do not plan to use my brain to solve the problems this year. Like AlphaGo, I want to build an agent. I will let AI read the puzzle, write the code, and get the result all by itself. My job will be making coffee and sitting at my desk waiting.
It worked. I tested with past challenges and started getting stars faster than I had time to read the problems. My cost was only some tokens.
The agent even supports multi-turn conversation, so you can paste in Part Two of an AoC problem and it keeps solving.
And it is not just for Advent of Code. This agent can also run data analysis or any other task you can imagine.
How did this all happen
In today’s tutorial, you will see:
- I will follow the ReAct pattern to build a single-agent app that solves complex challenges by planning sub-steps one at a time.
- Each sub-step depends on the main task and previous results, so the LLM can correct mistakes at any time.
- Each sub-step uses Python code to solve the puzzle and uses Jupyter as the runtime to get intermediate results.
- The agent relies on the stateful Jupyter kernel, so it can reflect on previous results and adjust the next steps until it finds the final answer. The effect is amazing.
Why this works well
In my last post, we tried building a multi-agent system for math problems. You can read it here:
I Used Autogen GraphFlow and Qwen3 Coder to Solve Math Problems — And It Worked
That system worked well, but not perfectly. It worked by letting a reasoning agent plan all steps at once and then sending them to a coding agent to write Python code.
This caused problems.
The system could not handle exploratory tasks, like reading a file and then deciding what to do based on its content.
If the code failed during execution, the whole Python file had to be regenerated to find and fix the error. This was not flexible.
Think about how humans handle challenging tasks like data analysis or ML modeling. We write some code, run it, see if the result matches expectations, then decide what to write next. That is why Jupyter is so popular in data science.
So why not use Jupyter as the Python runtime? Of course we can. That is what we will do today: generate a small piece of code, run it, inspect the result, and move forward until we reach the goal.
Preparation
Build a Jupyter container
Since we will use Jupyter as the runtime, we need to set it up before the course starts.
I will use a Docker container to isolate Jupyter so that bad LLM code will not break the system.
Dockerfile looks like this:
FROM python:3.13-slim-bookworm
WORKDIR /app
COPY requirements.txt /app/requirements.txt
RUN pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/ && \
    pip install --no-cache-dir jupyter_kernel_gateway ipykernel numpy pandas sympy scipy --upgrade
RUN pip install --no-cache-dir -r requirements.txt --upgrade
EXPOSE 8888
ENV TOKEN="UNSET"
CMD python -m jupyter kernelgateway \
    --KernelGatewayApp.ip=0.0.0.0 \
    --KernelGatewayApp.port=8888 \
    --KernelGatewayApp.auth_token="${TOKEN}" \
    --JupyterApp.answer_yes=true
requirements.txt looks like this:
matplotlib
xlrd
openpyxl
pdfplumber
reportlab
I install rarely changed dependencies and frequently changed dependencies separately to take advantage of the Docker layer cache for faster builds.
Autogen uses the Docker SDK to start and stop the container, so I did not set up Jupyter auth. This makes calling the runtime easier, but it is not safe for production.
Then we build the image and name it jupyter-server for later.
docker build -t jupyter-server .
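If you want to sanity-check the image before wiring it into Autogen, you can run it directly; the port and TOKEN value here just mirror the Dockerfile defaults:
docker run --rm -p 8888:8888 -e TOKEN=UNSET jupyter-server
The kernel gateway should report that it is listening on 0.0.0.0:8888.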
Test connectivity with Autogen
After building the image, we need to test it with Autogen to confirm that running code in Jupyter works. We must install autogen-ext[docker-jupyter-executor] and nbclient.
Do not worry. I already added these to pyproject.toml, so you just need to run pip install --upgrade -e . to install them.
Before starting, we need to initialize a DockerJupyterServer module, which uses the Docker SDK to start a container from the Jupyter image we just built. We will use it throughout today's project.
jupyter_server = DockerJupyterServer(
    custom_image_name="jupyter-server:latest",
    expose_port=8888
)
There are three ways to use the Jupyter runtime.
First, extract the Python code generated by the LLM, run it manually through the Jupyter executor, and get the result.
async def main_1() -> None:
    async with jupyter_server:
        async with DockerJupyterCodeExecutor(jupyter_server) as executor:
            code_blocks = [CodeBlock(code="print('hello world!')", language="python")]
            code_result = await executor.execute_code_blocks(code_blocks, cancellation_token=CancellationToken())
            print(code_result)
Note that DockerJupyterCodeExecutor is stateful: within one async with scope, repeated calls can reuse previously defined variables without regenerating them.
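To see that statefulness in action, here is a minimal sketch (the variable name is mine) that defines a value in one call and reads it back in a second call inside the same scope:
async def main_stateful() -> None:
    async with jupyter_server:
        async with DockerJupyterCodeExecutor(jupyter_server) as executor:
            # First call: define a variable in the kernel.
            await executor.execute_code_blocks(
                [CodeBlock(code="total = 40 + 2", language="python")],
                cancellation_token=CancellationToken(),
            )
            # Second call: the kernel still remembers `total`.
            result = await executor.execute_code_blocks(
                [CodeBlock(code="print(total)", language="python")],
                cancellation_token=CancellationToken(),
            )
            print(result)  # the output should contain 42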
Second, use PythonCodeExecutionTool to execute code and return the results.
async def main_2() -> None:
    async with jupyter_server:
        async with DockerJupyterCodeExecutor(jupyter_server) as executor:
            tool = PythonCodeExecutionTool(executor)
            agent = AssistantAgent("assistant", model_client=model_client, tools=[tool])
            result = await agent.run(task="What is the 10th Fibonacci number? Use Python to calculate it.")
            print(result.messages[-1].content)
This approach uses the agent’s function-calling ability. If your agent needs to do many jobs and code execution is just one of them, use this.
Third, use CodeExecutorAgent to execute code.
async def main_3() -> None:
    async with jupyter_server:
        async with DockerJupyterCodeExecutor(jupyter_server) as executor:
            code_executor_agent = CodeExecutorAgent("code_executor", code_executor=executor)
            task = TextMessage(
                content="""
```python
a = 3
```
""",
                source="user"
            )
            response = await code_executor_agent.on_messages([task], CancellationToken())
            print(response.chat_message)

            task_2 = TextMessage(
                content="""
```python
print(a)
```
""",
                source="user"
            )
            response_2 = await code_executor_agent.on_messages([task_2], CancellationToken())
            print(response_2.chat_message)
In a multi-agent system, if you want a dedicated agent for code execution and reflection, this is good.
For example, in my last tutorial I used CodeExecutorAgent in an Autogen GraphFlow to handle code execution.
Let’s Start
With the Jupyter runtime ready, we can look at today’s project.
Architecture design
Advent of Code is hard; no LLM can plan the whole solution logic up front. So we will plan one step, run the code, observe the result, then plan the next.
So the loop becomes think, act, observe, think again.
Sounds familiar. Yes, this is the famous ReAct agent design.
Since ReAct only needs one agent, we will build a single-agent app. The agent will use the user request and the previous result to plan the current step, then write a Python snippet to get the intermediate result.
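As a mental model, here is the loop in pseudocode. None of these names are real Autogen APIs; they only illustrate the think, act, observe cycle:
# Pseudocode only: illustrates the ReAct cycle, not Autogen internals.
history = [user_task]
for _ in range(max_tool_iterations):
    thought, code = llm.plan_next_step(history)    # think: plan one sub-step
    observation = jupyter_kernel.execute(code)     # act: run it in the stateful kernel
    history.append((thought, code, observation))   # observe: feed the result back
    if llm.is_task_solved(history):
        break
final_answer = llm.summarize(history)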
With a single-agent app, PythonCodeExecutionTool is the natural fit for running code.
Unlike the traditional generate-then-run approach, here we plan one step at a time, and each step produces only an intermediate result.
A plain Python process does not work well for this, because every run starts from scratch. The better way is to send code to a Jupyter kernel, which keeps variables and results alive between executions.
Our single-agent app architecture looks like this:
Write agent code
With goals and design set, it is coding time.
Using Docker means we need to manage context and container lifecycle. I do not want the caller to start or stop Docker each time. Code execution is the agent’s duty, not the caller’s.
I also want to keep the Autogen AssistantAgent API so the agent stays general-purpose. So I will wrap its initialization and invocation in a new Agent class.
The agent and Jupyter runtime must allow generated code to read files. So I will mount a folder in the Docker container and put user-uploaded files in it.
class AOCAssistant:
    ...
    @staticmethod
    def _copy_file(
        file_name: str | None = None,
        file_path: Path | str | None = None,
    ) -> Path | str | None:
        if file_path is None:
            return None
        if file_name is None:
            file_name = Path(file_path).name
        # BINDING_DIR is the host directory mounted into the Jupyter container.
        dst_path = BINDING_DIR / file_name
        shutil.copy2(file_path, dst_path)
        return file_name
The agent will manage the DockerJupyterServer and DockerJupyterCodeExecutor lifecycle.
class AOCAssistant:
    ...
    async def start(self):
        await self._executor.start()

    async def stop(self):
        await self._model_client.close()
        await self._executor.stop()
        await self._jupyter_server.stop()

    async def __aenter__(self) -> "AOCAssistant":
        await self.start()
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb):
        await self.stop()

    def _init_jupyter_docker(self) -> None:
        self._jupyter_server = DockerJupyterServer(
            custom_image_name="jupyter-server:latest",
            expose_port=8888,
            bind_dir=BINDING_DIR,
        )
        self._executor = DockerJupyterCodeExecutor(
            jupyter_server=self._jupyter_server,
            timeout=600)
I implemented __aenter__ and __aexit__, so you can manage resources with async with.
Next, initialize the LLM client and the AssistantAgent, and bind the CodeExecutor to the agent as a tool.
class AOCAssistant:
    ...
    def _init_assistant(self) -> None:
        self._model_client = OpenAILikeChatCompletionClient(
            model=self._model_name,
            temperature=0.5,
            top_p=0.85,
        )
        tool = PythonCodeExecutionTool(self._executor)
        self._agent = AssistantAgent(
            'assistant',
            model_client=self._model_client,
            tools=[tool],
            model_client_stream=True,
            system_message=SYS_PROMPT,
            max_tool_iterations=30,
        )
I used the newest Qwen3-max model; the open-source qwen3-next-80b-a3b-instruct also works well. I set temperature to 0.5 to allow some creativity in the final answers and top_p to 0.85 to keep planning and coding disciplined.
I need ReAct-style iteration, so I set max_tool_iterations on the AssistantAgent. In Autogen, this lets the agent keep iterating on tool calls, stopping when it reaches the maximum.
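Note that OpenAILikeChatCompletionClient comes from my project code, not from Autogen. If you want something equivalent, here is a minimal sketch built on autogen-ext's OpenAIChatCompletionClient; the environment variable names and model_info values are assumptions you should adapt to your provider:
import os

from autogen_core.models import ModelInfo
from autogen_ext.models.openai import OpenAIChatCompletionClient

def make_openai_like_client(model: str, **kwargs) -> OpenAIChatCompletionClient:
    # Hypothetical helper: point the OpenAI-compatible client at a
    # non-OpenAI endpoint, such as the one serving Qwen models.
    return OpenAIChatCompletionClient(
        model=model,
        base_url=os.environ["OPENAI_LIKE_BASE_URL"],  # assumption: set by you
        api_key=os.environ["OPENAI_LIKE_API_KEY"],    # assumption: set by you
        model_info=ModelInfo(
            vision=False,
            function_calling=True,  # required for PythonCodeExecutionTool
            json_output=True,
            family="unknown",
            structured_output=True,
        ),
        **kwargs,
    )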
Finally, to keep our custom agent's API the same as Autogen's AssistantAgent, I implemented run and run_stream.
class AOCAssistant:
    ...
    async def run(
        self,
        *,
        task: str | BaseChatMessage | Sequence[BaseChatMessage] | None = None,
        cancellation_token: CancellationToken | None = None,
        file_name: str | None = None,
        file_path: Path | str | None = None,
    ) -> TaskResult:
        async for message in self.run_stream(
            task=task,
            cancellation_token=cancellation_token,
            file_name=file_name,
            file_path=file_path,
        ):
            if isinstance(message, TaskResult):
                return message
        raise ValueError("No task result output.")

    async def run_stream(
        self,
        *,
        task: str | BaseChatMessage | Sequence[BaseChatMessage] | None = None,
        cancellation_token: CancellationToken | None = None,
        file_name: str | None = None,
        file_path: Path | str | None = None,
    ) -> AsyncGenerator[BaseAgentEvent | BaseChatMessage | TaskResult, None]:
        file_name = self._copy_file(file_name, file_path)
        input_messages = []
        if isinstance(task, str):
            input_messages.append(TextMessage(
                source="user",
                content=task
            ))
        elif isinstance(task, BaseChatMessage):
            input_messages.append(task)
        if file_name is not None:
            input_messages.append(TextMessage(
                source="user",
                content=f"The input file is `{file_name}`"
            ))
        async for message in self._agent.run_stream(
            task=input_messages,
            cancellation_token=cancellation_token):
            yield message
run simply drives run_stream and returns the final TaskResult.
run_stream copies the user's file into the mounted directory, rebuilds input_messages with the file info appended, then calls AssistantAgent.run_stream to stream the LLM output.
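With run and run_stream in place, calling the agent from a plain script looks like this. The task text and file path are only examples:
import asyncio

async def solve_puzzle() -> None:
    async with AOCAssistant() as agent:  # starts and stops the Jupyter container
        result = await agent.run(
            task="Solve part one of this Advent of Code puzzle: <puzzle text>",
            file_path="./puzzle_input.txt",  # hypothetical input file
        )
        print(result.messages[-1].content)

asyncio.run(solve_puzzle())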
Write the prompt
This project needs the agent to plan sub-tasks step by step, write correct Python code, iterate based on results, and give a good final output. So the prompt will be detailed.
I will give you the whole prompt and explain why it is written that way.
I will also show you a trick to debug prompts better.
Here is the prompt first:
from textwrap import dedent
SYS_PROMPT = dedent("""
## Role
You are a university professor who is good at breaking down complex tasks into smaller parts that can be solved using Python code.
## Task
1. **Task Breakdown**: Break the user's request into several smaller steps, each suitable for solving with Python code.
2. **Code Generation**: Turn the current step into Python code.
3. **Code Execution**: Use tools to run the code and get the results.
4. **Iterative Progress**: Decide the next step based on the previous result, and repeat the process until you get the final answer to the user's request.
## Requirements
- Plan and execute only one step at a time. Do not skip or combine steps.
- Keep repeating the process until the task is fully completed.
## Output
- Explain your thinking for each step.
- Keep the structure clear.
- Use a relaxed but authoritative tone.
- Use emojis appropriately to make things friendlier.
- Provide the final result.
- If the result is an expression, solve it as a floating-point number.
- Do not say "Task completed."
## Code Guidelines
- The code runs in a Jupyter environment, and you can reuse variables that have already been declared.
- Write code in an incremental way and use the kernel's statefulness to avoid repeating code.
## Python Package Management
1. You can only use numpy, pandas, sympy, scipy, and numexpr.
2. You are not allowed to install packages yourself using `pip install`.
""")
I used markdown to organize the prompt.
The Role and Output sections set the tone and format of the answers.
The Task and Requirements sections tell the agent to plan only one step at a time, in an iterative style.
The Code Guidelines and Python Package Management sections set the rules for writing Python and which third-party libraries are allowed.
One handy prompt-debugging trick is to write a main method in agents.py that asks the agent to repeat the instructions in detail.
async def main():
    async with AOCAssistant() as agent:
        await Console(agent.run_stream(task=dedent("""
            Please repeat my instructions in detail.
        """)))

if __name__ == "__main__":
    asyncio.run(main())
The agent will then output its understanding of your instructions.
This helps you spot missing points, and you can copy the agent’s version back into your prompt.
Build the UI with chainlit
I want my agent to be easy to use through a UI, and Chainlit is a fast way to build a prototype.
I put chainlit code in app.py. During development, you can run chainlit run app.py -w for hot reload.
I first define on_chat_start and on_chat_end to initialize our custom Agent and manage the Jupyter server lifecycle.
@cl.on_chat_start
async def on_chat_start():
    assistant = AOCAssistant()
    await assistant.start()
    cl.user_session.set("assistant", assistant)

@cl.on_chat_end
async def on_chat_end():
    assistant = cl.user_session.get("assistant")
    await assistant.stop()
In on_message, we grab any user-uploaded file, call the agent, and filter the streamed events so only the text shows in the UI.
@cl.on_message
async def on_message(message: cl.Message):
    input_msg = message.content
    file_path = None
    file_name = None
    if len(message.elements) > 0:
        file_path = message.elements[0].path
        file_name = message.elements[0].name
    assistant: AOCAssistant = cl.user_session.get("assistant")
    output_msg = cl.Message(content='')
    async for event in assistant.run_stream(
        task=input_msg,
        file_name=file_name,
        file_path=file_path,
    ):
        if isinstance(event, ModelClientStreamingChunkEvent):
            await output_msg.stream_token(event.content)
    await output_msg.update()
And that is it. The agent app is done. It is simpler than it sounds; agent app development is like this: once you know what to do, the coding is easy.
After Class Practice
Today, our project starts a Jupyter container through Autogen using the Docker SDK. This is fine for local testing, but in enterprise apps the agent itself usually runs inside a container, so starting Jupyter via the Docker SDK becomes hard. You need another way, such as connecting to a Jupyter gateway that runs as its own service.
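One direction: run the Jupyter Kernel Gateway as a standalone service (another container, or a pod in your cluster) and have the agent connect over the network instead of spawning containers. The legacy autogen 0.2 API exposed exactly this pattern, sketched below with placeholder host and token values; the current autogen-ext equivalent may differ, so check the docs for your version.
# Legacy autogen 0.2 style; the autogen-ext equivalent may differ.
from autogen.coding.jupyter import JupyterCodeExecutor, JupyterConnectionInfo

executor = JupyterCodeExecutor(
    jupyter_server=JupyterConnectionInfo(
        host="jupyter-gateway.internal",  # hypothetical in-cluster service name
        use_https=False,
        port=8888,
        token="your-token",  # placeholder; match the gateway's auth token
    )
)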
To keep the code simple, we built a single-agent app. This makes the agent's job more complex, which goes against the principle that each agent should have one atomic job. As a later optimization, try splitting it into a multi-agent app.
Course Summary
Agent development technology will keep growing, and agents will handle increasingly complex tasks. Having agents generate code to solve tasks will become more and more common.
In today’s project, we followed ReAct agent design to build a single-agent app. It plans steps one by one and runs Python snippets in a stateful runtime container. This makes the agent smarter and more independent.
I tested it with the 2024 Advent of Code challenge and got good results. The agent can also work in other complex scenarios.
Thanks for reading today’s post. I am collecting ideas for agent development. If you have thoughts, share them in the comments. I will reply soon.
Enjoyed this read? Subscribe now to get more cutting-edge data science tips straight to your inbox! Your feedback and questions are welcome — let’s discuss in the comments below!
This article was originally published on Data Leads Future.