<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Amito Vrito</title>
    <description>The latest articles on DEV Community by Amito Vrito (@amito_843a9904d48).</description>
    <link>https://dev.to/amito_843a9904d48</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3878061%2F2791b56d-e3ea-4b5b-b6fc-402c5b8ee65f.jpeg</url>
      <title>DEV Community: Amito Vrito</title>
      <link>https://dev.to/amito_843a9904d48</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/amito_843a9904d48"/>
    <language>en</language>
    <item>
      <title>SynapseKit - A Production-Grade LLM Framework Built for Speed, Simplicity, and Scale</title>
      <dc:creator>Amito Vrito</dc:creator>
      <pubDate>Thu, 23 Apr 2026 07:39:11 +0000</pubDate>
      <link>https://dev.to/amito_843a9904d48/synapsekit-a-production-grade-llm-framework-built-for-speed-simplicity-and-scale-1ag0</link>
      <guid>https://dev.to/amito_843a9904d48/synapsekit-a-production-grade-llm-framework-built-for-speed-simplicity-and-scale-1ag0</guid>
      <description>&lt;p&gt;*&lt;em&gt;&lt;a href="https://github.com/SynapseKit/SynapseKit" rel="noopener noreferrer"&gt;https://github.com/SynapseKit/SynapseKit&lt;/a&gt;&lt;br&gt;
&lt;a href="https://synapsekit.github.io/synapsekit-docs/" rel="noopener noreferrer"&gt;https://synapsekit.github.io/synapsekit-docs/&lt;/a&gt;&lt;br&gt;
*&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;SynapseKit is an async-first Python framework for building LLM applications - chains, agents, RAG pipelines, tool calling, and multi-agent orchestration. Two base dependencies. 48 built-in tools. 31 LLM providers. Designed for engineers who need production-grade tooling without production-grade complexity.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"The right abstraction disappears. You stop thinking about the framework and start thinking about the problem."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;What SynapseKit Is&lt;/strong&gt;&lt;br&gt;
SynapseKit is an open-source Python framework for building applications powered by large language models. It covers the full surface area - from a single LLM call to multi-agent orchestration with cost guardrails - with a design philosophy that prioritizes speed, debuggability, and minimal abstraction.&lt;/p&gt;

&lt;p&gt;The core principle: every layer of abstraction must earn its place by making the engineer faster, not by making the framework more flexible.&lt;/p&gt;

&lt;p&gt;What ships in the box:&lt;/p&gt;

&lt;p&gt;31 LLM providers - OpenAI, Anthropic, Google, Mistral, Cohere, Ollama, and 25 more. Switch providers by changing one string.&lt;br&gt;
48 built-in tools - 12 work with zero configuration. No pip install, no API key, no setup.&lt;br&gt;
43 document loaders - PDF, HTML, CSV, JSON, Markdown, DOCX, and more. Standardized interface across all formats.&lt;br&gt;
Multi-agent primitives - Sequential, parallel, supervisor, hierarchical, pipeline, and feedback loop patterns. All six supported out of the box.&lt;br&gt;
MCP server support - Model Context Protocol integration for tool-rich agent deployments.&lt;br&gt;
Cost guardrails - Built into the execution engine. Set a budget, the agent stops cleanly instead of burning your API credits.&lt;/p&gt;
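&lt;p&gt;To make the one-string provider switch concrete, here is a minimal sketch using the LLM class from the RAG example later in this post. Note that generate() is an assumed method name - the post itself only shows stream() and query():&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from synapsekit import LLM

llm = LLM("openai/gpt-4o")
# Same code, different provider - only the string changes:
# llm = LLM("anthropic/claude-sonnet-4-20250514")
answer = await llm.generate("Summarize Q3 in one sentence.")  # generate() is assumed
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;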

&lt;p&gt;&lt;strong&gt;Design Philosophy&lt;/strong&gt;&lt;br&gt;
Two Dependencies&lt;br&gt;
SynapseKit's base install pulls two packages. Not 67. Not 43. Two.&lt;/p&gt;

&lt;p&gt;SynapseKit:  2 dependencies  · 48 MB RAM  · 80ms cold start&lt;br&gt;
LangChain:  67 dependencies  · 189 MB RAM · 2,400ms cold start&lt;br&gt;
LlamaIndex: 43 dependencies  · 112 MB RAM · 1,100ms cold start&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo9av3ik09pu0mdk2ue4h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo9av3ik09pu0mdk2ue4h.png" alt=" " width="800" height="793"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fecgscfptszubl0bopsfq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fecgscfptszubl0bopsfq.png" alt=" " width="800" height="484"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Fewer dependencies means fewer version conflicts, faster installs, smaller container images, and cold starts that don't punish your users. In serverless deployments where every scale-from-zero event pays the cold start tax, 80ms vs 2.4 seconds is the difference between responsive and broken.&lt;/p&gt;

&lt;p&gt;Async From the Ground Up&lt;br&gt;
Every base class - BaseTool, BaseRetriever, BaseLLM - is async def by default. Not sync with an async wrapper bolted on. Not run_in_executor hiding a blocking call.&lt;/p&gt;

&lt;p&gt;This matters because async correctness propagates. When the base class is async, every implementation is async. Contributors don't accidentally write sync tools. The framework never silently dispatches to a thread pool. At 50 concurrent requests, SynapseKit achieves 96.8% of theoretical throughput - near-baseline async efficiency.&lt;/p&gt;
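&lt;p&gt;As a hedged illustration of what "async by default" implies for contributors - BaseTool is named above, but the method name and signature here are assumptions:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from synapsekit import BaseTool

class WordCountTool(BaseTool):
    name = "word_count"

    # The base class is async-first, so the override must be async def too.
    # run() is an assumed method name, not confirmed by this post.
    async def run(self, text: str) -&gt; dict:
        return {"words": len(text.split())}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;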

&lt;p&gt;Shallow Call Stacks&lt;br&gt;
When something fails at 3am in production, the traceback is 8 lines, not 47. The agent loop is 47 lines of readable Python. No RunnableSequence.__call__ chains, no middleware dispatch, no callback manager traversal. You read the error, you find the bug, you fix it.&lt;/p&gt;

&lt;p&gt;One Tool Interface&lt;br&gt;
Define a tool once with a JSON schema. Export to OpenAI format with .schema(). Export to Anthropic format with .anthropic_schema(). Same source of truth, zero duplication. One definition that works across all 31 providers.&lt;/p&gt;
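&lt;p&gt;A sketch of the single source of truth in practice - .schema() and .anthropic_schema() are the method names given above; the instance reuses the hypothetical WordCountTool sketched earlier:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;tool = WordCountTool()

openai_spec = tool.schema()               # OpenAI function-calling format
anthropic_spec = tool.anthropic_schema()  # Anthropic tool format
# One definition, multiple export targets - no duplicated JSON schemas.
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;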

&lt;p&gt;What You Can Build&lt;br&gt;
RAG Pipelines&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from synapsekit import LLM, RAGPipeline, PDFLoader

docs = PDFLoader("reports/").load()
rag = RAGPipeline(docs=docs, llm=LLM("openai/gpt-4o"))
rag.build()

answer = await rag.query("What were Q3 revenue figures?")
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Seven lines. Load, build, query. Chunking, embedding, indexing, retrieval, and generation - all handled. Switch to Anthropic by changing "openai/gpt-4o" to "anthropic/claude-sonnet-4-20250514". Nothing else changes.&lt;/p&gt;

&lt;p&gt;Agents with Tools&lt;br&gt;
Built-in tools for calculation, datetime, web search, file operations, and more. Define custom tools with a class and a JSON schema. The agent loop handles reasoning, tool selection, execution, and observation routing.&lt;/p&gt;
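&lt;p&gt;Roughly, wiring a custom tool into an agent looks like this - Agent and the tools= argument appear in the crew example below, but run() here is an assumed entry point:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from synapsekit import Agent

# WordCountTool is the hypothetical tool sketched earlier in this post.
agent = Agent(name="analyst", tools=[WordCountTool()])
answer = await agent.run("How many words are in this brief?")  # run() is assumed
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;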

&lt;p&gt;Multi-Agent Orchestration&lt;br&gt;
The Crew and Task primitives support six orchestration patterns. Declare dependencies between tasks, not between agents. The framework handles execution order, context passing, and result aggregation.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from synapsekit import Crew, Task, Agent

researcher = Agent(name="researcher", tools=[search_tool])
writer = Agent(name="writer", tools=[])

research_task = Task(agent=researcher, description="Find latest data on X")
write_task = Task(agent=writer, description="Write report", context_from=[research_task])

crew = Crew(agents=[researcher, writer], tasks=[research_task, write_task])
result = await crew.run()
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Streaming&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;async for token in llm.stream("Explain quantum computing"):
    print(token, end="")
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;First-class streaming with the cleanest API of any framework. No callback handlers, no special configuration.&lt;/p&gt;

&lt;p&gt;Where SynapseKit Fits&lt;br&gt;
SynapseKit is built for a specific engineer: the one building LLM-powered products that need to work reliably in production, not just in a notebook demo.&lt;/p&gt;

&lt;p&gt;Use SynapseKit when:&lt;/p&gt;

&lt;p&gt;You need fast cold starts (serverless, edge, CLI tools)&lt;br&gt;
You want minimal dependency footprint in containerized deployments&lt;br&gt;
You're building agent-heavy applications with multiple tools&lt;br&gt;
You need to switch between LLM providers without rewriting code&lt;br&gt;
You want cost controls built into the execution layer&lt;/p&gt;

&lt;p&gt;Consider alternatives when:&lt;/p&gt;

&lt;p&gt;You need LlamaIndex's advanced chunking strategies (SemanticSplitterNodeParser, KnowledgeGraphIndex)&lt;br&gt;
You need LangChain's ecosystem breadth and community integrations&lt;br&gt;
You need LangChain's ToolException error recovery pattern for complex agent loops&lt;/p&gt;

&lt;p&gt;We publish these tradeoffs openly. The 30-notebook LLM Framework Showdown on Kaggle benchmarks SynapseKit against LangChain and LlamaIndex across 18 production dimensions - including the dimensions where SynapseKit loses. Honest benchmarking means publishing the uncomfortable numbers too.&lt;/p&gt;

&lt;p&gt;The Vision&lt;br&gt;
LLM frameworks today are where web frameworks were in 2010. Too many abstractions solving for flexibility instead of velocity. Too much ceremony for simple operations. Too many dependencies for production deployments.&lt;/p&gt;

&lt;p&gt;SynapseKit is a bet on a different direction: that the best framework is the one that disappears. You think about your application logic, not about the framework's internal architecture. You debug your code, not the framework's middleware. You deploy with confidence because you understand every line between your function call and the LLM API.&lt;/p&gt;

&lt;p&gt;The roadmap:&lt;/p&gt;

&lt;p&gt;Evaluation harness - standardized benchmarks you can run against your own agents&lt;br&gt;
Visual debugger - trace agent execution, tool calls, and token usage in real time&lt;br&gt;
Plugin marketplace - community tools and integrations with a single install command&lt;br&gt;
Enterprise features - audit logging, role-based access, deployment presets for AWS/GCP/Azure&lt;/p&gt;

&lt;p&gt;SynapseKit is MIT-licensed, fully open source, and built in the open. Every design decision is documented. Every benchmark is reproducible. Every line of code is readable.&lt;/p&gt;

&lt;p&gt;Get Started&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip install synapsekit
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;GitHub: github.com/SynapseKit/SynapseKit&lt;br&gt;
Benchmarks: LLM Framework Showdown on Kaggle&lt;br&gt;
Documentation: Ships with the package&lt;/p&gt;

&lt;p&gt;Two dependencies. One pip install. Start building.&lt;/p&gt;

&lt;p&gt;Engineers of AI&lt;/p&gt;

&lt;p&gt;Read more: &lt;a href="http://www.engineersofai.com" rel="noopener noreferrer"&gt;www.engineersofai.com&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If this was useful, forward it to one engineer who should be reading it.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>python</category>
      <category>llm</category>
    </item>
    <item>
      <title>Why I Modelled My LLM Pipeline as a DAG Instead of a Chain — and What I Found Out</title>
      <dc:creator>Amito Vrito</dc:creator>
      <pubDate>Thu, 16 Apr 2026 09:27:58 +0000</pubDate>
      <link>https://dev.to/amito_843a9904d48/why-i-modelled-my-llm-pipeline-as-a-dag-instead-of-a-chain-and-what-i-found-out-22cj</link>
      <guid>https://dev.to/amito_843a9904d48/why-i-modelled-my-llm-pipeline-as-a-dag-instead-of-a-chain-and-what-i-found-out-22cj</guid>
      <description>&lt;h1&gt;
  
  
  The problem with chains in production
&lt;/h1&gt;

&lt;p&gt;Every major Python LLM framework gives you the same primitive: a chain.&lt;/p&gt;

&lt;p&gt;LangChain's LCEL. LlamaIndex's pipeline. Haystack's components. They all model your pipeline as a linear sequence of steps — input flows through A, then B, then C, output comes out the end.&lt;/p&gt;

&lt;p&gt;For a hello-world RAG demo, that's fine. For a production system, you hit the wall fast.&lt;/p&gt;

&lt;h2&gt;
  
  
  What chains can't express cleanly
&lt;/h2&gt;

&lt;p&gt;Here's a real pipeline I needed to build:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Classify the incoming query&lt;/li&gt;
&lt;li&gt;Based on classification: route to either semantic search, keyword search, or both in parallel&lt;/li&gt;
&lt;li&gt;If both: merge and re-rank results&lt;/li&gt;
&lt;li&gt;Generate response from ranked context&lt;/li&gt;
&lt;li&gt;If any retrieval stage fails: surface a clear error, don't silently continue&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Try expressing that as a chain. You end up with nested chains, manual asyncio.gather() calls outside the framework, try/except blocks swallowing exceptions to keep the chain going, and no clean way to express "step D depends on both B and C."&lt;/p&gt;
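&lt;p&gt;For concreteness, the by-hand version tends to look like this - every stage function here (classify, semantic_search, keyword_search, merge, rerank, generate) is a hypothetical stand-in, not a real API:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import asyncio

async def handle(query: str) -&gt; str:
    label = await classify(query)                # step 1
    if label == "hybrid":
        sem, kw = await asyncio.gather(          # steps 2-3, orchestrated by hand
            semantic_search(query), keyword_search(query)
        )
        context = await rerank(merge(sem, kw))
    elif label == "semantic":
        context = await semantic_search(query)
    else:
        context = await keyword_search(query)
    return await generate(query, context)        # step 4; step 5 means try/except everywhere
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;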

&lt;p&gt;The abstraction is fighting you.&lt;/p&gt;

&lt;h2&gt;
  
  
  DAGs are the right model
&lt;/h2&gt;

&lt;p&gt;A directed acyclic graph expresses all of this naturally.&lt;/p&gt;

&lt;p&gt;Nodes are tasks. Edges are data dependencies. Execution is topologically ordered — a node fires when all its upstream dependencies have resolved.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;synapsekit&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Pipeline&lt;/span&gt;

&lt;span class="n"&gt;pipeline&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Pipeline&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;classify&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;ClassifierNode&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;semantic_search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;RAGNode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;vector_store&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;keyword_search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;BM25Node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;bm25_index&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rerank&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;RerankerNode&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;generate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;LLMNode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;classify&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;semantic_search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;classify&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;keyword_search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;semantic_search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rerank&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;keyword_search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rerank&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rerank&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;generate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;explain async-native design&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;semantic_search and keyword_search run concurrently. rerank waits for both. generate waits for rerank. The execution engine handles ordering. You describe the dependencies.&lt;/p&gt;

&lt;h2&gt;
  
  
  Failure propagation that actually works
&lt;/h2&gt;

&lt;p&gt;In a chain, a failed step either kills the whole pipeline or gets swallowed silently.&lt;/p&gt;

&lt;p&gt;In a DAG, failure has semantics. If semantic_search fails, rerank — which depends on it — is cancelled. generate — which depends on rerank — is also cancelled. You get a clear error naming the failed node and its dependents.&lt;/p&gt;

&lt;p&gt;No silent degradation. Failure is explicit and traceable.&lt;/p&gt;
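&lt;p&gt;The same semantics can be sketched in plain asyncio - illustrative only, not SynapseKit internals. A TaskGroup cancels sibling tasks when one fails, which mirrors the "failed node cancels its dependents" behaviour described above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import asyncio

# semantic_search and keyword_search are hypothetical async stages.
async def retrieval_level(query: str) -&gt; dict:
    async with asyncio.TaskGroup() as tg:  # Python 3.11+
        sem = tg.create_task(semantic_search(query))
        kw = tg.create_task(keyword_search(query))
    # If either task raises, the other is cancelled and an ExceptionGroup
    # propagates - downstream stages (rerank, generate) never start.
    return {"semantic": sem.result(), "keyword": kw.result()}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;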

&lt;h2&gt;
  
  
  The async piece
&lt;/h2&gt;

&lt;p&gt;Nodes with no dependency relationship between them run concurrently automatically. The execution engine handles asyncio.gather() at each topological level. You write individual async node functions. The graph handles orchestration.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SemanticSearchNode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Node&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vector_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;asearch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No manual gather calls. The graph structure encodes the parallelism.&lt;/p&gt;
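&lt;p&gt;To show the mechanism rather than the framework, here is a minimal level-wise DAG executor in plain asyncio - a sketch of the idea, not SynapseKit's actual engine:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import asyncio

async def run_dag(nodes, edges, inputs):
    """nodes: {name: async fn(results)}; edges: set of (upstream, downstream) pairs."""
    indegree = {name: 0 for name in nodes}
    for _, dst in edges:
        indegree[dst] += 1
    done, results = set(), dict(inputs)
    while len(done) &lt; len(nodes):
        # Every node whose upstream dependencies have all resolved is ready.
        level = [n for n in nodes if n not in done and indegree[n] == 0]
        if not level:
            raise ValueError("cycle detected - not a DAG")
        # One gather() per topological level: independent nodes run concurrently.
        outputs = await asyncio.gather(*(nodes[n](results) for n in level))
        for name, out in zip(level, outputs):
            results[name] = out
            done.add(name)
        for src, dst in edges:
            if src in level:
                indegree[dst] -= 1
    return results
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;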

&lt;h2&gt;
  
  
  Is it worth the complexity?
&lt;/h2&gt;

&lt;p&gt;For simple A → B → C pipelines with no branching: a chain is fine.&lt;/p&gt;

&lt;p&gt;The moment you have parallel retrieval, conditional routing, or stages where failure isolation matters — the chain abstraction costs more than it saves.&lt;/p&gt;

&lt;p&gt;SynapseKit is the framework I built around this model:&lt;br&gt;
&lt;a href="https://github.com/SynapseKit/SynapseKit" rel="noopener noreferrer"&gt;https://github.com/SynapseKit/SynapseKit&lt;/a&gt;&lt;br&gt;
API Doc: &lt;a href="https://synapsekit.github.io/synapsekit-docs/" rel="noopener noreferrer"&gt;https://synapsekit.github.io/synapsekit-docs/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;10k PyPI downloads since launch. The engineers who need this know exactly why.&lt;/p&gt;




&lt;p&gt;What does your production RAG architecture look like? Drop it in the comments.&lt;/p&gt;

</description>
      <category>python</category>
      <category>llm</category>
      <category>architecture</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Introducing SynapseKit: The Async-Native Python LLM Framework I Built Because LangChain's Async is Broken</title>
      <dc:creator>Amito Vrito</dc:creator>
      <pubDate>Wed, 15 Apr 2026 11:44:29 +0000</pubDate>
      <link>https://dev.to/amito_843a9904d48/introducing-synapsekit-the-async-native-python-llm-framework-i-built-because-langchains-async-is-46h8</link>
      <guid>https://dev.to/amito_843a9904d48/introducing-synapsekit-the-async-native-python-llm-framework-i-built-because-langchains-async-is-46h8</guid>
      <description>

&lt;h1&gt;
  
  
  Why I built another LLM framework
&lt;/h1&gt;

&lt;p&gt;I know. Another one.&lt;/p&gt;

&lt;p&gt;But hear me out — because the reason I built SynapseKit is specific, and it might be the same reason you've been frustrated too.&lt;/p&gt;

&lt;h2&gt;
  
  
  The async lie in Python LLM frameworks
&lt;/h2&gt;

&lt;p&gt;LangChain has async support. LlamaIndex has async support. Haystack has async support.&lt;/p&gt;

&lt;p&gt;Technically true. Practically — look at the source.&lt;/p&gt;

&lt;p&gt;You'll find &lt;code&gt;asyncio.get_event_loop().run_in_executor()&lt;/code&gt; wrapping sync functions. You'll find internal blocking IO disguised behind &lt;code&gt;async def&lt;/code&gt;. You'll find &lt;code&gt;ThreadPoolExecutor&lt;/code&gt; doing the actual work.&lt;/p&gt;

&lt;p&gt;That's not async-native. That's sync code wearing an async costume.&lt;/p&gt;
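&lt;p&gt;The two shapes side by side - blocking_http_call and nonblocking_http_call are placeholders, not real library functions:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import asyncio

# The costume: async def on the outside, a thread pool doing blocking IO inside.
async def costume_call(prompt):
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, blocking_http_call, prompt)

# Async-native: the IO itself is non-blocking end to end.
async def native_call(prompt):
    return await nonblocking_http_call(prompt)
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;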

&lt;p&gt;For simple scripts and demos, it doesn't matter. For production services handling concurrent requests — FastAPI services, real-time RAG systems, high-throughput agent workflows — it matters enormously. You're paying the cost of threads AND the overhead of the async event loop, with none of the actual throughput benefits.&lt;/p&gt;

&lt;h2&gt;
  
  
  What SynapseKit does differently
&lt;/h2&gt;

&lt;p&gt;I built the async layer first. Every IO operation — LLM calls, retrieval, embedding generation — is genuinely non-blocking from the ground up. There's no sync wrapper underneath.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;synapsekit&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Pipeline&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;RAGNode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;LLMNode&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;pipeline&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Pipeline&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retrieve&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;RAGNode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vectorstore&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;my_store&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;generate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;LLMNode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retrieve&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;generate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is async-native design?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice: no &lt;code&gt;.run_in_executor()&lt;/code&gt;. No thread pool. Just async.&lt;/p&gt;

&lt;h2&gt;
  
  
  DAGs, not chains
&lt;/h2&gt;

&lt;p&gt;The second architectural decision: pipelines are directed acyclic graphs, not linear chains.&lt;/p&gt;

&lt;p&gt;Every major framework pushes you toward &lt;code&gt;.pipe()&lt;/code&gt; or &lt;code&gt;|&lt;/code&gt; operator chains. That works for the happy path. Production systems aren't the happy path.&lt;/p&gt;

&lt;p&gt;In a real RAG system you might:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Retrieve from multiple sources in parallel&lt;/li&gt;
&lt;li&gt;Route to different generation strategies based on query classification&lt;/li&gt;
&lt;li&gt;Have fallback paths when a retrieval stage fails&lt;/li&gt;
&lt;li&gt;Run a re-ranking step that depends on two upstream retrievers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A chain can't express that cleanly. A DAG can.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;classify&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;ClassifierNode&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retrieve_docs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;RAGNode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vectorstore&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;doc_store&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retrieve_web&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;WebSearchNode&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rerank&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;RerankerNode&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;generate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;LLMNode&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

&lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;classify&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retrieve_docs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;classify&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retrieve_web&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retrieve_docs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rerank&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retrieve_web&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rerank&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rerank&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;generate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both retrieval nodes run concurrently. The reranker waits for both. The LLM waits for the reranker. Topological ordering handles execution automatically.&lt;/p&gt;

&lt;h2&gt;
  
  
  The numbers
&lt;/h2&gt;

&lt;p&gt;~10,000 PyPI downloads in ~20 days of active development. No Product Hunt. No HN. No launch post.&lt;/p&gt;

&lt;p&gt;Developers found it through PyPI search. That told me the demand is real.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;synapsekit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;GitHub: &lt;a href="https://github.com/SynapseKit/SynapseKit" rel="noopener noreferrer"&gt;https://github.com/SynapseKit/SynapseKit&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you've been frustrated with async in your LLM stack — or you're building something where throughput actually matters — I'd genuinely love your feedback.&lt;/p&gt;

&lt;p&gt;And if this resonates, a GitHub star helps surface it to other developers who are hitting the same walls.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;SynapseKit is open source under the Apache license. Built by a Senior AI Specialist and the founder of EngineersOfAI.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>mcp</category>
      <category>opensource</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
