<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Roberto de la Cámara</title>
    <description>The latest articles on DEV Community by Roberto de la Cámara (@robertodelacamara).</description>
    <link>https://dev.to/robertodelacamara</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3859520%2Facc1832f-bd9c-480e-808f-610747919691.jpeg</url>
      <title>DEV Community: Roberto de la Cámara</title>
      <link>https://dev.to/robertodelacamara</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/robertodelacamara"/>
    <language>en</language>
    <item>
      <title>Variance testing flipped my Ollama benchmark ranking</title>
      <dc:creator>Roberto de la Cámara</dc:creator>
      <pubDate>Sun, 26 Apr 2026 17:26:30 +0000</pubDate>
      <link>https://dev.to/robertodelacamara/variance-testing-flipped-my-ollama-benchmark-ranking-3pod</link>
      <guid>https://dev.to/robertodelacamara/variance-testing-flipped-my-ollama-benchmark-ranking-3pod</guid>
      <description>&lt;p&gt;I ran 6 local Ollama models against strict code-gen prompts, then re-ran the most discriminating prompt 3 times each. The single-shot winner was unstable, and the actual best was a general-purpose model the single-shot run had ranked 5th.&lt;/p&gt;

&lt;p&gt;I've been picking models for a local Ollama pool that handles small, well-scoped coding chores delegated from a main agent. Before wiring routing rules into the agent, I wanted a defensible answer to "which model for which task family." So I built a tiny benchmark. The interesting part wasn't the ranking. It was that the ranking changed after I added variance testing.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;I ran 6 models against 3 strict, single-function prompts (auto-graded by I/O equivalence, 32 test cases). Then I ran the most discriminating prompt 3 times on every model. Findings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Single-shot ranking placed &lt;code&gt;qwen3.5:9b&lt;/code&gt; at the top and &lt;code&gt;gemma4:latest&lt;/code&gt; 5th.&lt;/li&gt;
&lt;li&gt;Post-variance, &lt;code&gt;gemma4:latest&lt;/code&gt; was the only byte-stable perfect model. &lt;code&gt;qwen3.5:9b&lt;/code&gt; produced byte-identical buggy code in 2 of 3 runs at &lt;code&gt;temperature=0.2&lt;/code&gt;. Its dominant decoding mode is broken on this prompt.&lt;/li&gt;
&lt;li&gt;The Qwen3 thinking variants returned empty &lt;code&gt;response&lt;/code&gt; fields on 100% of constrained code-gen prompts until I set &lt;code&gt;think:false&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The "obvious coder" pick (&lt;code&gt;qwen2.5-coder:14b&lt;/code&gt;) lost to a general-purpose model (&lt;code&gt;gemma4&lt;/code&gt;) on every code-gen prompt that didn't require Python runtime reasoning.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Methodological lesson: single-shot LLM benchmarks lie in both directions. The "winner" was unstable, and the "loser" was best-in-class for a specific task family.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setup
&lt;/h2&gt;

&lt;p&gt;Single workstation, 16 GB VRAM, Ollama on &lt;code&gt;127.0.0.1:11434&lt;/code&gt;. A 60-line bash wrapper POSTs each prompt with &lt;code&gt;temperature=0.2&lt;/code&gt;, &lt;code&gt;stream=false&lt;/code&gt;. A Python verifier strips markdown fences, &lt;code&gt;exec()&lt;/code&gt;s the model's output, and runs valid + invalid inputs against the resulting function. All scores are automated.&lt;/p&gt;
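
&lt;p&gt;A minimal sketch of that harness in pure Python (the real wrapper is bash; the helper names &lt;code&gt;ask&lt;/code&gt;/&lt;code&gt;grade&lt;/code&gt; and the test-case encoding are illustrative, not the benchmark's exact code):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import re
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434/api/generate"

def ask(model: str, prompt: str, temperature: float = 0.2) -&gt; str:
    """POST one prompt to Ollama, non-streaming."""
    body = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": temperature},
    }).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

def grade(source: str, fn_name: str, cases) -&gt; int:
    """Strip markdown fences, exec() the candidate, score by I/O equivalence."""
    source = re.sub(r"^```\w*\s*$", "", source.strip(), flags=re.M)
    ns: dict = {}
    exec(source, ns)                       # candidate must define fn_name
    fn = ns[fn_name]
    passed = 0
    for arg, expected in cases:            # expected: a value, or ValueError
        try:
            passed += fn(arg) == expected
        except ValueError:
            passed += expected is ValueError
        except Exception:
            pass                           # any other crash scores zero
    return passed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;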

&lt;p&gt;Three prompts, all forbidding markdown fences and preamble:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;P1&lt;/strong&gt;: a pytest test generator with a stale-reference trap (the function under test rebinds the module global, so the test must re-read by attribute, not hold a local).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;P2&lt;/strong&gt;: &lt;code&gt;parse_iso_duration(s) -&amp;gt; int&lt;/code&gt; for &lt;code&gt;PT&amp;lt;H&amp;gt;H&amp;lt;M&amp;gt;M&amp;lt;S&amp;gt;S&lt;/code&gt; strings, raising &lt;code&gt;ValueError&lt;/code&gt; on malformed input. 6 valid + 8 invalid cases.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;P3&lt;/strong&gt;: &lt;code&gt;flatten(d, sep=".") -&amp;gt; dict&lt;/code&gt; recursing into nested dicts but leaving lists alone. 10 cases.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Variance test (P2, 3 runs per model)
&lt;/h2&gt;

&lt;p&gt;Same prompt, same &lt;code&gt;temperature=0.2&lt;/code&gt;, three independent calls:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Run 1&lt;/th&gt;
&lt;th&gt;Run 2&lt;/th&gt;
&lt;th&gt;Run 3&lt;/th&gt;
&lt;th&gt;Mean&lt;/th&gt;
&lt;th&gt;Stability&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;gemma4:latest&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;22/22&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;22/22&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;22/22&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;22.0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;byte-stable perfect&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;qwen2.5-coder:14b&lt;/td&gt;
&lt;td&gt;22/22&lt;/td&gt;
&lt;td&gt;20/22&lt;/td&gt;
&lt;td&gt;20/22&lt;/td&gt;
&lt;td&gt;20.7&lt;/td&gt;
&lt;td&gt;tight cluster&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;qwen3:14b (&lt;code&gt;think:false&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;17/22&lt;/td&gt;
&lt;td&gt;16/22&lt;/td&gt;
&lt;td&gt;17/22&lt;/td&gt;
&lt;td&gt;16.7&lt;/td&gt;
&lt;td&gt;stable, mediocre&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;deepseek-coder-v2:16b&lt;/td&gt;
&lt;td&gt;16/22&lt;/td&gt;
&lt;td&gt;16/22&lt;/td&gt;
&lt;td&gt;12/22&lt;/td&gt;
&lt;td&gt;14.7&lt;/td&gt;
&lt;td&gt;stable, wrong&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;qwen3.5:9b (&lt;code&gt;think:false&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;9/22&lt;/td&gt;
&lt;td&gt;9/22&lt;/td&gt;
&lt;td&gt;21/22&lt;/td&gt;
&lt;td&gt;13.0&lt;/td&gt;
&lt;td&gt;bimodal&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;qwen3.5:4b (&lt;code&gt;think:false&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;4/22&lt;/td&gt;
&lt;td&gt;19/22&lt;/td&gt;
&lt;td&gt;16/22&lt;/td&gt;
&lt;td&gt;13.0&lt;/td&gt;
&lt;td&gt;wild&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The bug &lt;code&gt;qwen3.5:9b&lt;/code&gt; produced byte-identically in runs 1 and 2 was a regex requiring all three unit letters: &lt;code&gt;^(\d+)?H(\d+)?M(\d+)?S$&lt;/code&gt;. Only the digits are optional, so &lt;code&gt;"PT5M"&lt;/code&gt; falsely fails: the input has no &lt;code&gt;H&lt;/code&gt; or &lt;code&gt;S&lt;/code&gt; for the mandatory literals to match. Subtle, plausible-looking, and it ships unless you actually run the function. The 21/22 score in single-shot was the less common sampling path.&lt;/p&gt;
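
&lt;p&gt;The failure mode, concretely (assuming the &lt;code&gt;PT&lt;/code&gt; prefix is stripped before matching, as the quoted pattern suggests):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import re

# The buggy shape: digits optional, all three unit letters mandatory.
buggy = re.compile(r"^(\d+)?H(\d+)?M(\d+)?S$")
print(bool(buggy.match("1H30M5S")))   # True
print(bool(buggy.match("5M")))        # False -- so "PT5M" is rejected

# A correct shape makes each digits+letter pair optional as a unit
# (a real parser must still reject the all-empty match).
ok = re.compile(r"^(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")
print(bool(ok.match("5M")))           # True
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;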

&lt;p&gt;&lt;code&gt;deepseek-coder-v2:16b&lt;/code&gt; is stably wrong: 0/6 valid inputs across all 3 runs. Same regex bug every time. Rerunning won't save it.&lt;/p&gt;

&lt;p&gt;I ran a cross-prompt confirmation on the two stable models with P3, 3 runs each. &lt;code&gt;gemma4&lt;/code&gt; 10/10/10. &lt;code&gt;qwen2.5-coder:14b&lt;/code&gt; 10/10/9. &lt;code&gt;gemma4&lt;/code&gt; went 6 for 6 across both code-gen prompts, byte-stable. The point qwen2.5-coder lost was using &lt;code&gt;if v:&lt;/code&gt; (truthy check) instead of &lt;code&gt;if v is not None&lt;/code&gt;, silently dropping a &lt;code&gt;None&lt;/code&gt; value. Idiomatic but wrong.&lt;/p&gt;
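
&lt;p&gt;A minimal reconstruction of that failure mode (not the model's literal output):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def flatten(d, sep="."):
    out = {}
    for k, v in d.items():
        if isinstance(v, dict):
            for kk, vv in flatten(v, sep).items():
                out[k + sep + kk] = vv
        elif v:                  # buggy truthy check: drops None (and 0, "", [])
            out[k] = v           # the fix is a plain `else: out[k] = v`
    return out

print(flatten({"a": {"b": None}, "c": 1}))   # {'c': 1} -- key "a.b" vanished
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;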

&lt;h2&gt;
  
  
  The thinking-mode trap
&lt;/h2&gt;

&lt;p&gt;First pass on Qwen3 with the default &lt;code&gt;think:true&lt;/code&gt;: &lt;code&gt;qwen3:14b&lt;/code&gt; returned 1 byte (&lt;code&gt;\n&lt;/code&gt;) after 1174 seconds of GPU time. Twenty minutes for nothing. Ollama's &lt;code&gt;/api/generate&lt;/code&gt; returns two fields for thinking-mode models: &lt;code&gt;response&lt;/code&gt; and &lt;code&gt;thinking&lt;/code&gt;. My script only logged &lt;code&gt;response&lt;/code&gt;. When I dumped the raw JSON, the 9B's &lt;code&gt;thinking&lt;/code&gt; field was 21 KB of this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;* Wait, I need to check if I can use `src` if `import src.main_improved` is used.
* Yes.
* So I will use `src.main_improved`.
* Wait, I need to check if I can use `src` if `import src` is used.
* Yes.
* So I will use `src.main_improved`.
[...repeats until context fills...]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;done_reason: "stop"&lt;/code&gt; on a 21,000-character thinking trace with no committed answer. The fix was one parameter: &lt;code&gt;"think": false&lt;/code&gt; in the request body. With it, all three Qwen3 sizes responded in 8 to 11 seconds and produced clean code.&lt;/p&gt;

&lt;p&gt;If you're benchmarking thinking-capable models against strict output requirements: smoke-test with &lt;code&gt;think:false&lt;/code&gt; first, and log both fields. One missing line of logging cost me 20 minutes of GPU time chasing what looked like crashes but was actually the model arguing with itself in a loop inside &lt;code&gt;thinking&lt;/code&gt;.&lt;/p&gt;
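
&lt;p&gt;A smoke-test sketch of that advice (the model name and prompt are placeholders; &lt;code&gt;response&lt;/code&gt; and &lt;code&gt;thinking&lt;/code&gt; are Ollama's field names):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import urllib.request

body = json.dumps({
    "model": "qwen3:14b",
    "prompt": "Output only the Python function. No markdown, no preamble.",
    "stream": False,
    "think": False,                      # the one-parameter fix
    "options": {"temperature": 0.2},
}).encode()
req = urllib.request.Request(
    "http://127.0.0.1:11434/api/generate",
    data=body, headers={"Content-Type": "application/json"})
with urllib.request.urlopen(req) as r:
    data = json.loads(r.read())

# Log BOTH fields: an empty response next to a 21 KB thinking trace
# looks like a crash if you only ever print response.
print("response:", len(data.get("response", "")), "chars")
print("thinking:", len(data.get("thinking") or ""), "chars")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;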

&lt;h2&gt;
  
  
  Routing rules I ended up with
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Parsers, regex, recursive transformers&lt;/strong&gt;: &lt;code&gt;gemma4:latest&lt;/code&gt;. Byte-stable perfect scores across 6 runs of 2 different prompts at temp 0.2 (22/22 on P2, 10/10 on P3).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tests, fixtures, anything needing Python module/runtime semantics&lt;/strong&gt;: &lt;code&gt;qwen2.5-coder:14b&lt;/code&gt;. Stable 20-22/22, the only model that handled the test-scaffolding trap correctly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mini tier (laptop, 4 GB VRAM)&lt;/strong&gt;: &lt;code&gt;qwen3.5:4b&lt;/code&gt; with &lt;code&gt;think:false&lt;/code&gt;, sample 5x at temp 0.7, run a verifier, keep the passer (sketched after this list). 3.4 GB, ~20s total. Hit rate &amp;gt;=18/22 was 60% in my runs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skip&lt;/strong&gt;: &lt;code&gt;qwen3:14b&lt;/code&gt; (stably mediocre, 16/22 mean) and &lt;code&gt;deepseek-coder-v2:16b&lt;/code&gt; (stably wrong on valid inputs, same regex bug 3/3 runs).&lt;/li&gt;
&lt;/ul&gt;
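
&lt;p&gt;A sketch of that best-of-N loop, reusing the hypothetical &lt;code&gt;ask&lt;/code&gt; and &lt;code&gt;grade&lt;/code&gt; helpers from the harness sketch above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def best_of_n(model, prompt, fn_name, cases, n=5, temperature=0.7):
    """Sample up to n candidates; return the first full pass, else the best."""
    best_code, best_score = None, -1
    for _ in range(n):
        code = ask(model, prompt, temperature=temperature)
        try:
            score = grade(code, fn_name, cases)
        except Exception:
            continue                     # unrunnable candidate scores nothing
        if score == len(cases):
            return code, score           # early exit on a full pass
        if score &gt; best_score:
            best_code, best_score = code, score
    return best_code, best_score         # fall back to the best partial
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;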

&lt;p&gt;The most useful single observation: a general-purpose model beat the dedicated coder on every code-gen prompt that didn't require Python runtime reasoning. The "coder" label means trained on code, not best at every code task.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd do differently next time
&lt;/h2&gt;

&lt;p&gt;Run every prompt against every model 5+ times from the start. The cheap single-shot run already cost me one wrong recommendation. Add &lt;code&gt;mypy --strict&lt;/code&gt; to the verifier to catch type-hint laziness that &lt;code&gt;exec()&lt;/code&gt; doesn't. And test &lt;code&gt;phi-4-mini&lt;/code&gt; and &lt;code&gt;granite-code:3b&lt;/code&gt; against &lt;code&gt;qwen3.5:4b&lt;/code&gt; for the mini-tier slot.&lt;/p&gt;

&lt;p&gt;If you've shipped &lt;code&gt;qwen3.5:4b&lt;/code&gt; (or anything smaller) in a best-of-N + verifier loop in production, I'd be curious about your hit rate and N.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>python</category>
      <category>ollama</category>
    </item>
    <item>
      <title>Building a multi-source autonomous research agent with LangGraph, ThreadPoolExecutor and Ollama</title>
      <dc:creator>Roberto de la Cámara</dc:creator>
      <pubDate>Fri, 03 Apr 2026 13:11:08 +0000</pubDate>
      <link>https://dev.to/robertodelacamara/building-a-multi-source-autonomous-research-agent-with-langgraph-threadpoolexecutor-and-ollama-1ahk</link>
      <guid>https://dev.to/robertodelacamara/building-a-multi-source-autonomous-research-agent-with-langgraph-threadpoolexecutor-and-ollama-1ahk</guid>
      <description>&lt;p&gt;I wanted a tool that could research any topic deeply — not just one web search, but Wikipedia, arXiv, Semantic Scholar, GitHub, Hacker News, Stack Overflow, Reddit, YouTube and local documents, all at once. So I built it.&lt;/p&gt;

&lt;p&gt;This post covers the architecture decisions, the parallel execution model, the self-correction loop, and a few things that didn't work before I got it right.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Live demo:&lt;/strong&gt; &lt;a href="https://huggingface.co/spaces/ecerocg/research-agent" rel="noopener noreferrer"&gt;https://huggingface.co/spaces/ecerocg/research-agent&lt;/a&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Source:&lt;/strong&gt; &lt;a href="https://github.com/RobertoDeLaCamara/Research-Agent" rel="noopener noreferrer"&gt;https://github.com/RobertoDeLaCamara/Research-Agent&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  The problem with sequential research agents
&lt;/h2&gt;

&lt;p&gt;Most agent examples I found do this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;search web → process → search wiki → process → search arxiv → process → synthesize
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If each source takes 5–10 seconds (network + LLM processing), a 10-source agent takes 50–100 seconds minimum — before synthesis.&lt;/p&gt;

&lt;p&gt;The fix is obvious: run everything in parallel.&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture overview
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;initialize_state
      │
plan_research  ←──────────────────────┐
      │                               │
parallel_search                    re-plan
      │                               │
consolidate ──→ evaluate ─────────────┘
      │              │
   report          END
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The graph is implemented with LangGraph's &lt;code&gt;StateGraph&lt;/code&gt;. Each node receives the full &lt;code&gt;AgentState&lt;/code&gt; TypedDict and returns a partial update.&lt;/p&gt;
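
&lt;p&gt;For context, a plausible shape of that state (field names here are illustrative; the repo has the exact schema):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from typing import TypedDict

class AgentState(TypedDict, total=False):
    topic: str
    persona: str                 # planning persona (see below)
    research_plan: list[str]     # source names chosen by plan_research
    web_results: str             # one results key per source, so parallel
    wiki_results: str            # writers never collide
    iteration_count: int
    next_node: str
    evaluation_report: str
    final_report: str
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;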




&lt;h2&gt;
  
  
  Parallel execution with ThreadPoolExecutor
&lt;/h2&gt;

&lt;p&gt;The core of the agent is &lt;code&gt;parallel_search_node&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;parallel_search_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AgentState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;source_functions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;web&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;       &lt;span class="n"&gt;search_web_node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wiki&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;      &lt;span class="n"&gt;search_wiki_node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;arxiv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;     &lt;span class="n"&gt;search_arxiv_node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;scholar&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;   &lt;span class="n"&gt;search_scholar_node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;github&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;    &lt;span class="n"&gt;search_github_node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hn&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;        &lt;span class="n"&gt;search_hn_node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;so&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;        &lt;span class="n"&gt;search_so_node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reddit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;    &lt;span class="n"&gt;search_reddit_node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;local_rag&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;local_rag_node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;youtube&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;   &lt;span class="n"&gt;_youtube_combined_node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;plan&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;research_plan&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[])&lt;/span&gt;
    &lt;span class="n"&gt;combined&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="n"&gt;futures_map&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;ThreadPoolExecutor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_workers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;plan&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;source_name&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;plan&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;fn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;source_functions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;future&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;submit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;futures_map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;future&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;source_name&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;future&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;as_completed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;futures_map&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;source_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;futures_map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;future&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;future&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;result&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                &lt;span class="n"&gt;combined&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Source &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;source_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; failed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;combined&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each source function is independent and returns a partial state dict. &lt;code&gt;combined.update(result)&lt;/code&gt; merges all results. No locking is needed: each source writes to different state keys, and the merge itself runs on the main thread as futures complete.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;YouTube is an exception&lt;/strong&gt; — search must complete before summarize can run, so it gets a sequential wrapper inside the parallel executor:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_youtube_combined_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AgentState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;search_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;search_videos_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;merged_state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;search_result&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;summarize_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;summarize_videos_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;merged_state&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;combined&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="n"&gt;combined&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;search_result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;combined&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;summarize_result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;combined&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This brings total research time from ~5 min sequential to ~45s on a decent connection.&lt;/p&gt;




&lt;h2&gt;
  
  
  The self-correction loop
&lt;/h2&gt;

&lt;p&gt;After parallel search, an evaluation node checks for knowledge gaps:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;evaluate_research_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AgentState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;iteration&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;iteration_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;iteration&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;next_node&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;END&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;evaluation_report&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Max iterations reached.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

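    &lt;span class="c1"&gt;# prompt (built from the consolidated results) asks the LLM to list&lt;/span&gt;
    &lt;span class="c1"&gt;# remaining gaps; gaps_detected() parses that answer (both elided here)&lt;/span&gt;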
    &lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="nc"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;gaps_detected&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;next_node&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;re_plan&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;iteration_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;iteration&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;next_node&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;END&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The LangGraph conditional edge routes back to &lt;code&gt;plan_research&lt;/code&gt; or forward to &lt;code&gt;END&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_conditional_edges&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;evaluate_research&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;next_node&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;END&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;re_plan&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;plan_research&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;END&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;consolidate_research&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On re-plan, the LLM can select different or additional sources based on what was missing. On niche topics this second pass noticeably improves coverage.&lt;/p&gt;




&lt;h2&gt;
  
  
  Dynamic research planning with personas
&lt;/h2&gt;

&lt;p&gt;Before searching, &lt;code&gt;plan_research_node&lt;/code&gt; asks the LLM which sources are relevant for the topic. This avoids wasting API calls on irrelevant sources.&lt;/p&gt;

&lt;p&gt;Five personas shape the planning:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Persona&lt;/th&gt;
&lt;th&gt;Focus&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Generalist&lt;/td&gt;
&lt;td&gt;Balanced across all sources&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Software Architect&lt;/td&gt;
&lt;td&gt;GitHub, HN, SO heavily weighted&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Market Analyst&lt;/td&gt;
&lt;td&gt;Web, Reddit, HN&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scientific Reviewer&lt;/td&gt;
&lt;td&gt;arXiv, Semantic Scholar&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Product Manager&lt;/td&gt;
&lt;td&gt;Web, Reddit, YouTube&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The persona prompt changes what the LLM considers "relevant", so the research plan — and therefore which threads run in parallel — differs per persona.&lt;/p&gt;
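
&lt;p&gt;A minimal sketch of how a persona-weighted planner can look (the prompt wording and &lt;code&gt;PERSONA_HINTS&lt;/code&gt; are illustrative, not the repo's exact code):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json

from langchain_core.messages import HumanMessage

PERSONA_HINTS = {
    "Generalist": "Weight all sources evenly.",
    "Software Architect": "Prefer github, hn and so.",
    "Scientific Reviewer": "Prefer arxiv and scholar.",
}

def plan_research_node(state: AgentState) -&gt; dict:
    hint = PERSONA_HINTS.get(state.get("persona", "Generalist"), "")
    prompt = (
        f"Topic: {state['topic']}\n{hint}\n"
        "From [web, wiki, arxiv, scholar, github, hn, so, reddit, "
        "local_rag, youtube], return a JSON list of the sources worth querying."
    )
    llm = get_llm(temperature=0.1)       # the factory from the section below
    response = llm.invoke([HumanMessage(content=prompt)])
    return {"research_plan": json.loads(response.content)}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;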




&lt;h2&gt;
  
  
  LLM factory: local or cloud with zero config changes
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OLLAMA_BASE_URL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:11434&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OLLAMA_MODEL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qwen2.5:1.5b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="nf"&gt;_is_cloud_endpoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ollama&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;ChatOllama&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The same agent code runs against local Ollama or Groq/Gemini/OpenAI: just swap env vars. The factory reads &lt;code&gt;os.environ&lt;/code&gt; at call time (not at import) so Streamlit sidebar overrides take effect without a restart.&lt;/p&gt;




&lt;h2&gt;
  
  
  What didn't work
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;nonlocal&lt;/code&gt; in threaded callbacks&lt;/strong&gt; — I originally used &lt;code&gt;nonlocal&lt;/code&gt; to capture results from threads. Race conditions appeared under load. Fixed by switching to a mutable container pattern (&lt;code&gt;container = {"data": []}&lt;/code&gt;) and reading only after &lt;code&gt;thread.join()&lt;/code&gt;.&lt;/p&gt;
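
&lt;p&gt;The shape of that fix, schematically (&lt;code&gt;do_search&lt;/code&gt; stands in for any real source call):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import threading

def do_search(source: str) -&gt; str:      # placeholder for a real source call
    return f"results for {source}"

results: dict = {"data": []}

def fetch(container: dict, source: str) -&gt; None:
    # write into a shared container instead of rebinding a closure
    # variable via nonlocal
    container["data"].append((source, do_search(source)))

threads = [threading.Thread(target=fetch, args=(results, s))
           for s in ("web", "wiki")]
for t in threads:
    t.start()
for t in threads:
    t.join()        # read results["data"] only after every join()
print(results["data"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;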

&lt;p&gt;&lt;strong&gt;Pydantic v1 validators&lt;/strong&gt; — &lt;code&gt;@validator&lt;/code&gt; with positional &lt;code&gt;cls&lt;/code&gt; broke on Pydantic v2. Migrated to &lt;code&gt;@field_validator&lt;/code&gt; with &lt;code&gt;@classmethod&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;YouTube as just another parallel source&lt;/strong&gt; — the first version treated YouTube like the other sources. Summarization needs the transcript, which needs the video URL, which needs the search. Making it a composed sequential node within the parallel executor fixed this.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;.env&lt;/code&gt; baked into Docker image&lt;/strong&gt; — &lt;code&gt;COPY . .&lt;/code&gt; was copying &lt;code&gt;.env&lt;/code&gt; into the image, leaking credentials. Added &lt;code&gt;.env&lt;/code&gt; to &lt;code&gt;.dockerignore&lt;/code&gt;. Docker Compose also interpolates &lt;code&gt;${OLLAMA_MODEL:-default}&lt;/code&gt; from &lt;code&gt;.env&lt;/code&gt;, which overrode the intended demo model. Hardcoded the values in &lt;code&gt;docker-compose.full.yml&lt;/code&gt; instead.&lt;/p&gt;




&lt;h2&gt;
  
  
  Running it locally
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# With local Ollama — no API keys needed:&lt;/span&gt;
git clone https://github.com/RobertoDeLaCamara/Research-Agent
&lt;span class="nb"&gt;cd &lt;/span&gt;Research-Agent
docker compose &lt;span class="nt"&gt;-f&lt;/span&gt; docker-compose.full.yml up
&lt;span class="c"&gt;# pulls qwen2.5:1.5b automatically, starts at localhost:8501&lt;/span&gt;

&lt;span class="c"&gt;# With Groq free tier:&lt;/span&gt;
&lt;span class="nb"&gt;cp &lt;/span&gt;env.example .env
&lt;span class="c"&gt;# OPENAI_API_KEY=your_groq_key&lt;/span&gt;
&lt;span class="c"&gt;# OLLAMA_BASE_URL=https://api.groq.com/openai/v1&lt;/span&gt;
&lt;span class="c"&gt;# OLLAMA_MODEL=llama-3.1-8b-instant&lt;/span&gt;
docker compose up
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Streaming output so the UI updates as each source completes&lt;/li&gt;
&lt;li&gt;Better synthesis prompts for small models (1.5b demo)&lt;/li&gt;
&lt;li&gt;Persistent research sessions with diff between iterations&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;Live demo:&lt;/strong&gt; &lt;a href="https://huggingface.co/spaces/ecerocg/research-agent" rel="noopener noreferrer"&gt;https://huggingface.co/spaces/ecerocg/research-agent&lt;/a&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Source:&lt;/strong&gt; &lt;a href="https://github.com/RobertoDeLaCamara/Research-Agent" rel="noopener noreferrer"&gt;https://github.com/RobertoDeLaCamara/Research-Agent&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Happy to answer questions about any part of the architecture.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>langchain</category>
      <category>python</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
