<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Marc Builds</title>
    <description>The latest articles on DEV Community by Marc Builds (@marcbuilds).</description>
    <link>https://dev.to/marcbuilds</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2020657%2F1a0997ab-823e-46fe-b7d4-a841074d63cf.jpeg</url>
      <title>DEV Community: Marc Builds</title>
      <link>https://dev.to/marcbuilds</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/marcbuilds"/>
    <language>en</language>
    <item>
      <title>TinySearch: Let Small Local LLMs Search the Web Without Burning Context</title>
      <dc:creator>Marc Builds</dc:creator>
      <pubDate>Fri, 29 May 2026 07:47:03 +0000</pubDate>
      <link>https://dev.to/marcbuilds/tinysearch-let-small-local-llms-search-the-web-without-burning-context-17ic</link>
      <guid>https://dev.to/marcbuilds/tinysearch-let-small-local-llms-search-the-web-without-burning-context-17ic</guid>
      <description>&lt;p&gt;I’ve been playing around with local LLM agents a lot lately, mostly smaller models, MCP tools, Cline/Roo-style workflows, and home lab setups.&lt;/p&gt;

&lt;p&gt;Not the “infinite context, infinite budget” world.&lt;/p&gt;

&lt;p&gt;More like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Can this 4B/9B model actually use the web without getting buried alive by garbage context?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That was the problem that kept annoying me.&lt;/p&gt;

&lt;p&gt;Most web-search tools technically work, but they often dump way too much raw page text into the model. You ask a simple question, and suddenly your local model is trying to reason through cookie banners, broken markdown, SEO filler, navigation menus, duplicated paragraphs, and five pages of irrelevant junk.&lt;/p&gt;

&lt;p&gt;For small models, that is painful.&lt;/p&gt;

&lt;p&gt;They do not need “the whole web”.&lt;/p&gt;

&lt;p&gt;They need a small, useful, source-grounded slice of the web that matches the actual query.&lt;/p&gt;

&lt;p&gt;So I built &lt;strong&gt;TinySearch&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/MarcellM01/TinySearch" rel="noopener noreferrer"&gt;https://github.com/MarcellM01/TinySearch&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0kygwztzj8btusb05aok.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0kygwztzj8btusb05aok.gif" alt="GIF: what a practical search looks like, from crawling to the returned prompt" width="600" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What TinySearch does
&lt;/h2&gt;

&lt;p&gt;TinySearch is a small open-source MCP research tool that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;searches the web&lt;/li&gt;
&lt;li&gt;crawls selected pages&lt;/li&gt;
&lt;li&gt;chunks the extracted content&lt;/li&gt;
&lt;li&gt;reranks the useful parts&lt;/li&gt;
&lt;li&gt;returns a compact, source-grounded prompt for your model&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The flow is basically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;search -&amp;gt; crawl -&amp;gt; rerank -&amp;gt; return grounded prompt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is the whole idea.&lt;/p&gt;

&lt;p&gt;TinySearch does &lt;strong&gt;not&lt;/strong&gt; answer the question itself.&lt;/p&gt;

&lt;p&gt;It prepares the evidence.&lt;/p&gt;

&lt;p&gt;Your actual LLM then answers using that evidence.&lt;/p&gt;

&lt;p&gt;That matters because I do not want another LLM layer summarizing summaries. I want the model to receive clean, ranked, URL-attached context and reason from there.&lt;/p&gt;
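
&lt;p&gt;To make that split concrete, here is a minimal Python sketch of how the four stages compose. It is not TinySearch’s code. &lt;code&gt;run_pipeline&lt;/code&gt; and the stage functions are hypothetical stand-ins passed in as plain callables, so the flow runs on toy data with no network access. Nothing in it calls an LLM; it only returns a prompt.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from typing import Callable, Dict, List

# Hypothetical page shape used throughout this sketch: {"url": ..., "text": ...}
Page = Dict[str, str]


def run_pipeline(query: str,
                 search: Callable[[str], List[str]],
                 crawl: Callable[[List[str]], List[Page]],
                 rerank: Callable[[str, List[Page]], List[Page]],
                 build_prompt: Callable[[str, List[Page]], str]):
    """Compose search, crawl, rerank, and prompt building. No LLM is called."""
    urls = search(query)                # 1. find candidate pages
    pages = crawl(urls)                 # 2. fetch them and extract text
    ranked = rerank(query, pages)       # 3. put the most relevant text first
    return build_prompt(query, ranked)  # 4. package evidence with its URLs


# Toy stand-ins so the sketch runs offline.
def fake_search(q):
    return ["https://example.com/a", "https://example.com/b"]


def fake_crawl(urls):
    return [{"url": u, "text": "text from " + u} for u in urls]


def fake_rerank(q, pages):
    return list(reversed(pages))  # pretend page b scored higher


def fake_build_prompt(q, pages):
    lines = ["QUESTION", q, "", "RESULTS"]
    for p in pages:
        lines += ["URL", p["url"], "RELEVANT TEXT", p["text"]]
    return "\n".join(lines)


prompt = run_pipeline("What is MCP?", fake_search, fake_crawl,
                      fake_rerank, fake_build_prompt)
print(prompt)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Each stage is injected, so you can swap one (say, a different search backend) without touching the others.&lt;/p&gt;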




&lt;h2&gt;
  
  
  Why I built it
&lt;/h2&gt;

&lt;p&gt;The original pain was simple: I wanted local agents to have web research without insane context overhead.&lt;/p&gt;

&lt;p&gt;When a tool dumps entire pages into context, it creates three problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;it burns tokens for no reason&lt;/li&gt;
&lt;li&gt;it confuses smaller models&lt;/li&gt;
&lt;li&gt;it makes agent workflows feel heavier than they need to be&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;TinySearch is for that annoying middle ground where you want web research, but you do &lt;strong&gt;not&lt;/strong&gt; want to set up a whole search stack or pay for a commercial API for every agent query.&lt;/p&gt;

&lt;p&gt;It makes sense for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;local LLM agents&lt;/li&gt;
&lt;li&gt;MCP workflows&lt;/li&gt;
&lt;li&gt;small RAG experiments&lt;/li&gt;
&lt;li&gt;personal research tools&lt;/li&gt;
&lt;li&gt;coding agents that occasionally need current docs&lt;/li&gt;
&lt;li&gt;smaller models that cannot handle context bloat&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is not trying to replace Perplexity, Exa, Tavily, or a production search system.&lt;/p&gt;

&lt;p&gt;That would be cope.&lt;/p&gt;

&lt;p&gt;It is just a small research layer that gives your model cleaner web context.&lt;/p&gt;




&lt;h2&gt;
  
  
  You can use it through Glama
&lt;/h2&gt;

&lt;p&gt;One of the easiest ways to try TinySearch is through &lt;strong&gt;Glama&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;So if you do &lt;strong&gt;not&lt;/strong&gt; want to host it yourself, you can use the Glama option instead.&lt;/p&gt;

&lt;p&gt;That is probably the lowest-friction path if you just want to plug it into an MCP-compatible workflow and test whether the tool fits your setup.&lt;/p&gt;

&lt;p&gt;But if you prefer running things yourself, there is also a Docker image.&lt;/p&gt;




&lt;h2&gt;
  
  
  Docker image
&lt;/h2&gt;

&lt;p&gt;The Docker version is the easiest self-hosted setup.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; 8000:8000 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;MCP_TRANSPORT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;streamable-http &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;MCP_HOST&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0.0.0.0 &lt;span class="se"&gt;\&lt;/span&gt;
  marcellm01/tinysearch:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then connect your MCP client to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"tinysearch"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://localhost:8000/mcp"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;TinySearch exposes one simple tool:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;research(query)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You pass the user’s question as-is, and TinySearch handles the search, crawl, rerank, and prompt-building flow.&lt;/p&gt;
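
&lt;p&gt;For the curious: MCP tool invocations are JSON-RPC 2.0 &lt;code&gt;tools/call&lt;/code&gt; requests. The sketch below only builds that message with the Python standard library, so you can see its shape. It assumes the tool’s argument is named &lt;code&gt;query&lt;/code&gt;, as in the &lt;code&gt;research(query)&lt;/code&gt; signature above, and &lt;code&gt;research_call&lt;/code&gt; is a hypothetical helper. A real MCP client also performs the &lt;code&gt;initialize&lt;/code&gt; handshake and manages the Streamable HTTP session, which this sketch deliberately leaves out.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json


def research_call(query: str, request_id: int = 1):
    """Build a JSON-RPC 2.0 tools/call message for the research tool.

    Only the message body is built here; sending it (after the MCP
    initialize handshake) is the MCP client's job.
    """
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "research",
            # Assumption: the tool's single parameter is named "query".
            "arguments": {"query": query},
        },
    }
    return json.dumps(message)


# Pass the user's question as-is; the tool handles the rest.
print(research_call("What are the latest Basel III updates?"))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In practice your MCP client sends this for you. The point is that the model only has to supply one string.&lt;/p&gt;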

&lt;p&gt;There is also an optional FastAPI server if you want to use it over HTTP instead of MCP.&lt;/p&gt;




&lt;h2&gt;
  
  
  What happens under the hood
&lt;/h2&gt;

&lt;p&gt;The current pipeline looks roughly like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;user question
   ↓
DuckDuckGo HTML search
   ↓
search-result reranking
   ↓
crawl selected pages with Crawl4AI
   ↓
chunk extracted markdown
   ↓
global chunk reranking
   ↓
dedupe + source quotas
   ↓
build source-grounded answer prompt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
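
&lt;p&gt;The “chunk extracted markdown” step could look something like the sketch below. It is an illustration, not TinySearch’s chunker. The heading-aware split, the word-based windows, and the &lt;code&gt;size&lt;/code&gt;/&lt;code&gt;overlap&lt;/code&gt; defaults are all assumptions. Since TinySearch exposes tokenizer settings, its real limits may be counted in tokens rather than words.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import re


def split_sections(markdown: str):
    """Split markdown into (heading, body) pairs at ATX headings (#, ##, ...)."""
    sections, heading, body = [], "", []
    for line in markdown.splitlines():
        if re.match(r"#{1,6}\s", line):
            if body:
                sections.append((heading, " ".join(body)))
            heading, body = line.lstrip("#").strip(), []
        elif line.strip():
            body.append(line.strip())
    if body:
        sections.append((heading, " ".join(body)))
    return sections


def chunk_markdown(markdown: str, size: int = 120, overlap: int = 20):
    """Overlapping word windows per section, each tagged with its heading."""
    if overlap not in range(size):  # overlap must be 0 .. size - 1
        raise ValueError("overlap must be in [0, size)")
    step = size - overlap
    chunks = []
    for heading, body in split_sections(markdown):
        words = body.split()
        # Each window starts `step` words after the previous one, so
        # consecutive chunks share `overlap` words of context.
        for start in range(0, max(len(words) - overlap, 1), step):
            chunks.append({"heading": heading,
                           "text": " ".join(words[start:start + size])})
    return chunks


md = "# Setup\nInstall the package and run the server.\n\n## Usage\nCall the tool with a query."
for c in chunk_markdown(md):
    print(c["heading"], "|", c["text"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Keeping the heading with each chunk gives the reranker a little extra context about where a paragraph came from.&lt;/p&gt;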



&lt;p&gt;The final output is not just scraped text.&lt;/p&gt;

&lt;p&gt;It is a structured prompt containing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the original question&lt;/li&gt;
&lt;li&gt;today’s date&lt;/li&gt;
&lt;li&gt;instructions for the answering model&lt;/li&gt;
&lt;li&gt;source titles&lt;/li&gt;
&lt;li&gt;URLs&lt;/li&gt;
&lt;li&gt;search previews&lt;/li&gt;
&lt;li&gt;the most relevant extracted chunks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal is to shrink the web into something a small model can actually use.&lt;/p&gt;
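
&lt;p&gt;Much of that shrinking happens in the final selection steps of the pipeline: global chunk reranking plus “dedupe + source quotas”. Here is a minimal sketch of that selection logic. It assumes each chunk already carries a relevance score from the reranker. The dict shape, the normalized exact-match dedupe, and the &lt;code&gt;max_chunks&lt;/code&gt;/&lt;code&gt;max_per_source&lt;/code&gt; defaults are illustrative assumptions, not TinySearch’s actual settings.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import re
from collections import Counter


def normalize(text: str):
    """Lowercase and keep only word characters so trivial variants match."""
    return " ".join(re.findall(r"\w+", text.lower()))


def select_chunks(chunks, max_chunks=8, max_per_source=3):
    """Globally rank scored chunks, drop duplicates, and enforce per-URL quotas.

    `chunks` is a list of dicts with "url", "text", and a relevance "score"
    (hypothetical shape; scores would come from the reranker).
    """
    kept, seen, per_source = [], set(), Counter()
    for chunk in sorted(chunks, key=lambda c: c["score"], reverse=True):
        if len(kept) == max_chunks:
            break
        key = normalize(chunk["text"])
        if key in seen or per_source[chunk["url"]] == max_per_source:
            continue  # duplicate text, or this source already used its quota
        seen.add(key)
        per_source[chunk["url"]] += 1
        kept.append(chunk)
    return kept


scored = [
    {"url": "https://a.example", "text": "Basel III update", "score": 0.9},
    {"url": "https://a.example", "text": "basel  III update!", "score": 0.8},
    {"url": "https://b.example", "text": "Liquidity rules", "score": 0.6},
]
for c in select_chunks(scored):
    print(c["score"], c["url"], c["text"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A per-source quota like this stops one long, keyword-heavy page from crowding every other source out of the prompt.&lt;/p&gt;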




&lt;h2&gt;
  
  
  Example output shape
&lt;/h2&gt;

&lt;p&gt;TinySearch returns something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;================================================================================
SEARCH-GROUNDED ANSWER PROMPT
================================================================================

QUESTION
What are the latest Basel III updates?

TODAY
2026-05-18

CRITICAL INSTRUCTIONS
Use only the text under RESULTS.
If the answer is not supported, say the results are not enough.
Cite source URLs after factual claims.

RESULTS

RESULT 1
TITLE
...

URL
...

SEARCH PREVIEW
...

RELEVANT TEXT
...

RESULT 2
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That format is intentional.&lt;/p&gt;

&lt;p&gt;I want the final model to know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what evidence it can use&lt;/li&gt;
&lt;li&gt;where the evidence came from&lt;/li&gt;
&lt;li&gt;when the search happened&lt;/li&gt;
&lt;li&gt;when it should say “not enough information”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This matters a lot for questions involving “latest”, “today”, “this year”, or anything time-sensitive.&lt;/p&gt;
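
&lt;p&gt;If you want to reproduce that layout in your own harness, a small builder is enough. The following Python sketch is not TinySearch’s source. The result dict shape and the &lt;code&gt;build_prompt&lt;/code&gt; name are assumptions, while the section labels and instruction lines are copied from the example prompt above. Stamping &lt;code&gt;TODAY&lt;/code&gt; from the local clock gives the answering model a reference point for “latest” and “this year”.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import datetime

RULE = "=" * 80
INSTRUCTIONS = [
    "Use only the text under RESULTS.",
    "If the answer is not supported, say the results are not enough.",
    "Cite source URLs after factual claims.",
]


def build_prompt(question, results, today=None):
    """Render results in the sectioned layout of the example prompt.

    `results` is a list of dicts with "title", "url", "preview", and
    "chunks" (a list of strings); that shape is an assumption.
    """
    today = today or datetime.date.today()
    lines = [RULE, "SEARCH-GROUNDED ANSWER PROMPT", RULE, "",
             "QUESTION", question, "",
             "TODAY", today.isoformat(), "",
             "CRITICAL INSTRUCTIONS", *INSTRUCTIONS, "",
             "RESULTS", ""]
    for n, r in enumerate(results, start=1):
        lines += [f"RESULT {n}", "TITLE", r["title"], "",
                  "URL", r["url"], "",
                  "SEARCH PREVIEW", r["preview"], "",
                  "RELEVANT TEXT", *r["chunks"], ""]
    return "\n".join(lines)


print(build_prompt(
    "What are the latest Basel III updates?",
    [{"title": "Example title", "url": "https://example.com/basel",
      "preview": "Search snippet", "chunks": ["Most relevant chunk."]}],
))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;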




&lt;h2&gt;
  
  
  Embeddings
&lt;/h2&gt;

&lt;p&gt;TinySearch supports local ONNX embeddings or OpenAI-compatible embedding APIs.&lt;/p&gt;

&lt;p&gt;The repo includes local presets like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fast      -&amp;gt; all-MiniLM-L6-v2 ONNX
balanced  -&amp;gt; bge-small-en-v1.5
quality   -&amp;gt; bge-base-en-v1.5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
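
&lt;p&gt;Whichever backend you choose, embedding-based reranking typically comes down to cosine similarity between the query vector and each chunk vector. The sketch below shows only that scoring step, using toy vectors. The &lt;code&gt;PRESETS&lt;/code&gt; dict mirrors the list above, but loading the models is left out. TinySearch’s real scoring may also combine similarity with other signals.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import math

# Preset names and models as listed above; model loading is not shown.
PRESETS = {
    "fast": "all-MiniLM-L6-v2",  # ONNX
    "balanced": "bge-small-en-v1.5",
    "quality": "bge-base-en-v1.5",
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_similarity(query_vec, chunk_vecs):
    """Return chunk indices ordered from most to least similar to the query."""
    scores = [cosine(query_vec, v) for v in chunk_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy 3-d vectors; real ones are 384-d (MiniLM, bge-small) or 768-d (bge-base).
query = [1.0, 0.0, 0.5]
chunks = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.4], [0.5, 0.5, 0.5]]
print(rank_by_similarity(query, chunks))  # prints [1, 2, 0]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;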



&lt;p&gt;You can start simple, then tune later.&lt;/p&gt;

&lt;p&gt;Search depth, rerank weights, chunk limits, crawl concurrency, tokenizer settings, and embedding backend are configurable.&lt;/p&gt;
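
&lt;p&gt;As a rough illustration of what “rerank weights” and “chunk limits” can mean, here is a sketch of a blended rerank score. Everything in it is hypothetical. The &lt;code&gt;RerankConfig&lt;/code&gt; fields, their default values, and the weighted sum of embedding similarity and query-term overlap are assumptions for illustration, not TinySearch’s configuration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from dataclasses import dataclass


@dataclass
class RerankConfig:
    """Hypothetical knobs; TinySearch's real option names may differ."""
    semantic_weight: float = 0.7  # weight on embedding similarity (illustrative)
    lexical_weight: float = 0.3   # weight on query-term overlap (illustrative)
    max_chunks: int = 8           # chunk limit for the final prompt (illustrative)


def lexical_overlap(query: str, text: str):
    """Fraction of distinct query terms that also appear in the text."""
    q_terms = set(query.lower().split())
    t_terms = set(text.lower().split())
    if not q_terms:
        return 0.0
    return len(q_terms.intersection(t_terms)) / len(q_terms)


def blended_score(semantic: float, query: str, text: str, cfg: RerankConfig):
    """Weighted sum of a precomputed semantic score and lexical overlap."""
    return (cfg.semantic_weight * semantic
            + cfg.lexical_weight * lexical_overlap(query, text))


cfg = RerankConfig()
query = "basel iii updates"
# (text, semantic score) pairs; the scores are made-up stand-ins.
candidates = [("basel iii capital rules", 0.6), ("cookie banner accept all", 0.65)]
ranked = sorted(candidates,
                key=lambda c: blended_score(c[1], query, c[0], cfg),
                reverse=True)
print([text for text, _ in ranked[:cfg.max_chunks]])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Shifting weight toward lexical overlap favors pages that repeat the query’s exact terms. Shifting it toward the semantic score favors paraphrases.&lt;/p&gt;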

&lt;p&gt;But the default idea stays the same:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;return useful research context, not a landfill of raw page text.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  What TinySearch is not
&lt;/h2&gt;

&lt;p&gt;TinySearch is not magic.&lt;/p&gt;

&lt;p&gt;It does not guarantee perfect search coverage.&lt;/p&gt;

&lt;p&gt;It does not build a long-term index.&lt;/p&gt;

&lt;p&gt;It does not replace proper production search infrastructure.&lt;/p&gt;

&lt;p&gt;And honestly, that is the point.&lt;/p&gt;

&lt;p&gt;The goal is not to be everything.&lt;/p&gt;

&lt;p&gt;The goal is to be small, inspectable, and useful enough that you can drop it into a local agent workflow and immediately get better web research without context-window abuse.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why I care about this
&lt;/h2&gt;

&lt;p&gt;I think a lot of local agent work is blocked less by model intelligence and more by the surrounding harness.&lt;/p&gt;

&lt;p&gt;People focus a lot on the model, but then the model gets wrapped in tools that behave as if context is free and every model has infinite attention.&lt;/p&gt;

&lt;p&gt;That works badly for small models.&lt;/p&gt;

&lt;p&gt;But even large models benefit from cleaner inputs.&lt;/p&gt;

&lt;p&gt;LLMs need less junk.&lt;/p&gt;

&lt;p&gt;Context is not a trash can.&lt;/p&gt;

&lt;p&gt;That is the actual problem TinySearch is trying to solve.&lt;/p&gt;

&lt;p&gt;Not “web search for AI” in some huge abstract way.&lt;/p&gt;

&lt;p&gt;More like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Can we give a local model just enough high-quality web context to answer properly, without burying it alive?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is the game.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/MarcellM01/TinySearch" rel="noopener noreferrer"&gt;https://github.com/MarcellM01/TinySearch&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Docker:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; 8000:8000 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;MCP_TRANSPORT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;streamable-http &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;MCP_HOST&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0.0.0.0 &lt;span class="se"&gt;\&lt;/span&gt;
  marcellm01/tinysearch:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or use the Glama option if you do not want to host it yourself.&lt;/p&gt;

&lt;p&gt;Feedback is very welcome, especially from people building with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;local LLMs&lt;/li&gt;
&lt;li&gt;MCP&lt;/li&gt;
&lt;li&gt;Cline / Roo-style coding agents&lt;/li&gt;
&lt;li&gt;RAG systems&lt;/li&gt;
&lt;li&gt;small model workflows&lt;/li&gt;
&lt;li&gt;personal research agents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And yeah, roast it too.&lt;/p&gt;

&lt;p&gt;That is usually where the useful feedback is anyway.&lt;/p&gt;







</description>
      <category>ai</category>
      <category>opensource</category>
      <category>mcp</category>
      <category>rag</category>
    </item>
  </channel>
</rss>
