Why I built another LLM framework
I know. Another one.
But hear me out — because the reason I built SynapseKit is specific, and it might be the same reason you've been frustrated too.
The async lie in Python LLM frameworks
LangChain has async support. LlamaIndex has async support. Haystack has async support.
Technically true. Practically — look at the source.
You'll find asyncio.get_event_loop().run_in_executor() wrapping sync functions. You'll find internal blocking IO disguised behind async def. You'll find ThreadPoolExecutor doing the actual work.
That's not async-native. That's sync code wearing an async costume.
For simple scripts and demos, it doesn't matter. For production services handling concurrent requests — FastAPI services, real-time RAG systems, high-throughput agent workflows — it matters enormously. You're paying the cost of threads AND the overhead of the async event loop, with none of the actual throughput benefits.
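To make the antipattern concrete, here is a minimal sketch of the "async costume" described above. The names `blocking_llm_call` and `acomplete` are hypothetical stand-ins, not code from any specific framework: a synchronous SDK call is shipped to a thread pool behind an `async def`, so each in-flight request still consumes a thread.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a blocking SDK call (e.g. a sync HTTP client).
def blocking_llm_call(prompt: str) -> str:
    time.sleep(0.1)  # simulates blocking network IO
    return f"response to {prompt!r}"

_pool = ThreadPoolExecutor()

# The "async costume": an async def that just ships the sync call to a thread.
# Concurrency is still bounded by thread count, and every call pays both the
# thread cost and the event-loop overhead.
async def acomplete(prompt: str) -> str:
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(_pool, blocking_llm_call, prompt)

async def main() -> list[str]:
    return await asyncio.gather(*(acomplete(f"q{i}") for i in range(3)))

results = asyncio.run(main())
print(results)
```

The `gather` call looks concurrent from the caller's side, which is exactly why the pattern is easy to miss in a framework's source.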
What SynapseKit does differently
I built the async layer first. Every IO operation — LLM calls, retrieval, embedding generation — is genuinely non-blocking from the ground up. There's no sync wrapper underneath.
import asyncio
from synapsekit import Pipeline, RAGNode, LLMNode

async def main():
    pipeline = Pipeline()
    pipeline.add_node("retrieve", RAGNode(vectorstore=my_store))
    pipeline.add_node("generate", LLMNode(model="gpt-4o"))
    pipeline.add_edge("retrieve", "generate")

    result = await pipeline.run(query="What is async-native design?")
    print(result)

asyncio.run(main())
Notice: no .run_in_executor(). No thread pool. Just async.
DAGs, not chains
The second architectural decision: pipelines are directed acyclic graphs, not linear chains.
Every major framework pushes you toward .pipe() or | operator chains. That works for the happy path. Production systems aren't the happy path.
In a real RAG system you might:
- Retrieve from multiple sources in parallel
- Route to different generation strategies based on query classification
- Have fallback paths when a retrieval stage fails
- Run a re-ranking step that depends on two upstream retrievers
A chain can't express that cleanly. A DAG can.
pipeline.add_node("classify", ClassifierNode())
pipeline.add_node("retrieve_docs", RAGNode(vectorstore=doc_store))
pipeline.add_node("retrieve_web", WebSearchNode())
pipeline.add_node("rerank", RerankerNode())
pipeline.add_node("generate", LLMNode())
pipeline.add_edge("classify", "retrieve_docs")
pipeline.add_edge("classify", "retrieve_web")
pipeline.add_edge("retrieve_docs", "rerank")
pipeline.add_edge("retrieve_web", "rerank")
pipeline.add_edge("rerank", "generate")
Both retrieval nodes run concurrently. The reranker waits for both. The LLM waits for the reranker. Topological ordering handles execution automatically.
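If the "topological ordering handles execution" claim sounds abstract, the scheduling idea can be sketched in plain asyncio. This is an illustrative toy, not SynapseKit's internals: `run_node` is a placeholder coroutine, and the scheduler repeatedly gathers every node whose parents have finished.

```python
import asyncio
from collections import defaultdict

# Placeholder for a real async node (retrieval, reranking, LLM call).
async def run_node(name: str, inputs: dict) -> str:
    await asyncio.sleep(0)  # stand-in for genuine non-blocking IO
    return f"{name}({', '.join(sorted(inputs))})" if inputs else name

async def run_dag(edges: list[tuple[str, str]], nodes: set[str]) -> dict:
    parents = defaultdict(set)
    for src, dst in edges:
        parents[dst].add(src)
    results: dict[str, str] = {}
    pending = set(nodes)
    while pending:
        # Every node whose parents are all done is "ready"; ready nodes
        # run concurrently in one gather, then unlock their children.
        ready = [n for n in pending if parents[n] <= results.keys()]
        outs = await asyncio.gather(
            *(run_node(n, {p: results[p] for p in parents[n]}) for n in ready)
        )
        results.update(zip(ready, outs))
        pending -= set(ready)
    return results

edges = [("classify", "retrieve_docs"), ("classify", "retrieve_web"),
         ("retrieve_docs", "rerank"), ("retrieve_web", "rerank"),
         ("rerank", "generate")]
nodes = {n for edge in edges for n in edge}
results = asyncio.run(run_dag(edges, nodes))
print(results["rerank"])
```

Here both retrievers land in the same `gather` as soon as `classify` finishes, while `rerank` waits for both: the concurrency falls out of the graph structure rather than being hand-wired.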
The numbers
~10,000 PyPI downloads in ~20 days of active development. No Product Hunt. No HN. No launch post.
Developers found it through PyPI search. That told me the demand is real.
Try it
pip install synapsekit
GitHub: https://github.com/AmitoVrito/synapsekit
If you've been frustrated with async in your LLM stack — or you're building something where throughput actually matters — I'd genuinely love your feedback.
And if this resonates, a GitHub star helps surface it to other developers who are hitting the same walls.
SynapseKit is open source under the Apache license. Built by a senior AI specialist and founder of EngineersOfAI.