<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Srinath Reddy</title>
    <description>The latest articles on DEV Community by Srinath Reddy (@srinath_reddy_d2bf468f07a).</description>
    <link>https://dev.to/srinath_reddy_d2bf468f07a</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2497564%2F5bf02259-d430-4078-b1dc-4450970762a4.png</url>
      <title>DEV Community: Srinath Reddy</title>
      <link>https://dev.to/srinath_reddy_d2bf468f07a</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/srinath_reddy_d2bf468f07a"/>
    <language>en</language>
    <item>
      <title>How I Built a Visual AI Orchestration Engine</title>
      <dc:creator>Srinath Reddy</dc:creator>
      <pubDate>Tue, 19 May 2026 16:30:10 +0000</pubDate>
      <link>https://dev.to/srinath_reddy_d2bf468f07a/how-i-built-a-visual-ai-orchestration-engine-51kd</link>
      <guid>https://dev.to/srinath_reddy_d2bf468f07a/how-i-built-a-visual-ai-orchestration-engine-51kd</guid>
      <description>&lt;p&gt;Every time I started a new AI project I wrote the same code.&lt;/p&gt;

&lt;p&gt;Chain the LLM call. Wire up the tools. Handle the tool loop. Stream the output. Add a REST endpoint. Write logs. Fix the one case where the model calls two tools at once and the whole thing breaks.&lt;/p&gt;

&lt;p&gt;By the fourth project I wasn't building products anymore — I was rebuilding infrastructure. So I stopped and built the infrastructure once, properly, as a visual tool I could just open and use.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;If you've built anything non-trivial with LLMs you know the real work isn't prompting. It's orchestration.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You need a loop that keeps running until the model stops calling tools&lt;/li&gt;
&lt;li&gt;You need to pass tool results back correctly&lt;/li&gt;
&lt;li&gt;You need to handle multiple nodes executing in the right order&lt;/li&gt;
&lt;li&gt;You need structured output so your frontend isn't parsing free-text&lt;/li&gt;
&lt;li&gt;You need streaming so users aren't staring at a blank screen for 8 seconds&lt;/li&gt;
&lt;li&gt;You need a public endpoint so the workflow is actually callable from your app&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every one of these is solvable. But solving all of them, again, for every new project? That's the tax that kills momentum.&lt;/p&gt;

&lt;p&gt;The other frustration was debugging. When something broke in a multi-step pipeline I had no visual sense of where it broke. I was reading logs and mentally reconstructing the execution graph that I should just be &lt;em&gt;able to see&lt;/em&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  My Approach
&lt;/h2&gt;

&lt;p&gt;I built &lt;strong&gt;Pipecat&lt;/strong&gt; — a visual AI workflow builder where you design pipelines as a DAG (Directed Acyclic Graph) on a canvas and run them via a public API.&lt;/p&gt;

&lt;p&gt;The core idea: the execution graph should be visible. Not inferred from logs — actually drawn, with nodes that light up as they run.&lt;/p&gt;

&lt;p&gt;The secondary idea: once built, a workflow should be callable with a single &lt;code&gt;curl&lt;/code&gt; command. No re-deployment. No glue code. Just an API key and an endpoint.&lt;/p&gt;

&lt;p&gt;The original target was developers — people who build AI-powered features but don't want to maintain a custom orchestration layer for every one of them. That's still the core.&lt;/p&gt;

&lt;p&gt;But as I built it I kept running the same workflow myself: connect to an external API, search something, return a structured result. The most common version of that pattern turned out to be product search for e-commerce stores. So Pipecat grew a second use case: a drop-in AI shopping assistant for Shopify and any other storefront, built on the same DAG engine underneath. Same infrastructure, different front door — developers get a visual workflow builder and a public invoke API; merchants get a chat widget that knows their catalog and can push items straight to a Shopify cart.&lt;/p&gt;




&lt;h2&gt;
  
  
  Technical Breakdown
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Canvas
&lt;/h3&gt;

&lt;p&gt;The workflow is a DAG. Nodes execute in topological order. The three primitives are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Input node&lt;/strong&gt; — receives the user's prompt&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM node&lt;/strong&gt; — runs the model, handles the tool-use loop internally&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output node&lt;/strong&gt; — returns the result (plain text or structured JSON)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You wire them on a canvas. Parallel branches are supported — if two nodes have no dependency on each other they execute concurrently.&lt;/p&gt;
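
&lt;p&gt;To make "topological order with parallel branches" concrete, here is a minimal sketch that groups nodes into levels, where every node in a level depends only on earlier levels and could therefore run concurrently. It is an illustration under assumed names (&lt;code&gt;topologicalLevels&lt;/code&gt;, the node ids, the edge list format), not Pipecat's actual scheduler.&lt;/p&gt;

```javascript
// Group DAG nodes into levels: each node in a level depends only on nodes
// from earlier levels, so nodes within one level could run concurrently.
// `edges` is a list of [from, to] pairs. All names here are illustrative.
function topologicalLevels(nodeIds, edges) {
  const indegree = new Map(nodeIds.map((id) => [id, 0]));
  const children = new Map(nodeIds.map((id) => [id, []]));
  for (const [from, to] of edges) {
    children.get(from).push(to);
    indegree.set(to, indegree.get(to) + 1);
  }

  const levels = [];
  let ready = nodeIds.filter((id) => indegree.get(id) === 0);
  let visited = 0;
  while (ready.length > 0) {
    levels.push(ready);
    visited += ready.length;
    const next = [];
    for (const id of ready) {
      for (const child of children.get(id)) {
        indegree.set(child, indegree.get(child) - 1);
        if (indegree.get(child) === 0) next.push(child);
      }
    }
    ready = next;
  }

  // Any node never reached sits on a cycle, so the graph is not a DAG.
  if (visited !== nodeIds.length) {
    throw new Error("workflow graph contains a cycle");
  }
  return levels;
}

const levels = topologicalLevels(
  ["input", "llm_a", "llm_b", "output"],
  [["input", "llm_a"], ["input", "llm_b"], ["llm_a", "output"], ["llm_b", "output"]]
);
console.log(JSON.stringify(levels));
```

&lt;p&gt;In this example the two LLM nodes land in the same level because neither depends on the other, which is the case the canvas runs concurrently.&lt;/p&gt;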

&lt;p&gt;The LLM node runs an agentic loop: call the model → if it wants a tool, call it → feed the result back → repeat until the model produces a final response. &lt;code&gt;max_iterations&lt;/code&gt; caps runaway loops.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzybzt3qh27kriuwf0wli.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzybzt3qh27kriuwf0wli.png" alt="Pipecat canvas — workflow DAG with nodes lighting up during execution" width="800" height="502"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Tools
&lt;/h3&gt;

&lt;p&gt;Tools are HTTP endpoints you register in the dashboard. You give each one a name, a description, a method, a URL, headers, and parameters.&lt;/p&gt;

&lt;p&gt;The description is what the model reads to decide whether to call the tool. Write it like a prompt, not a docstring — be specific about inputs and what the tool returns.&lt;/p&gt;

&lt;p&gt;Parameters work exactly like function-calling schemas. The model extracts values from the user's message and maps them to your API's expected fields.&lt;/p&gt;

&lt;p&gt;Headers (including auth tokens) are encrypted before they are stored.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjrhpcqe11acwisv1dpt1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjrhpcqe11acwisv1dpt1.png" alt="Tool configuration panel — name, description, HTTP method, URL, parameters" width="800" height="520"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Invoke API
&lt;/h3&gt;

&lt;p&gt;Once you publish a workflow it becomes callable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST https://api.pipecat.in/invoke/&lt;span class="o"&gt;{&lt;/span&gt;workflow_id&lt;span class="o"&gt;}&lt;/span&gt;/invoke &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"X-API-Key: your-key"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"input": "What is the weather in Tokyo?"}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Response:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"run_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"123e4567-..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"success"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"output"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"It's currently 22°C and sunny in Tokyo."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tool_calls"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"tool"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"get_weather"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"input"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"location"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Tokyo"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"result"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"{&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;temperature&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: 22, &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;condition&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;Sunny&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;}"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you turn on &lt;strong&gt;structured output&lt;/strong&gt; on the Output node, &lt;code&gt;output&lt;/code&gt; becomes a parsed JSON object matching the schema you defined — not a string. Useful when the downstream consumer is code, not a human.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm3xlukfz33zszx4x9bws.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm3xlukfz33zszx4x9bws.png" alt="Structured output toggle and schema editor on the Output node" width="800" height="520"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbzumwqnlcl4tkjxfj1db.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbzumwqnlcl4tkjxfj1db.png" alt="Output schema field editor — define field names, types, and descriptions" width="800" height="520"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  WebSocket Streaming
&lt;/h3&gt;

&lt;p&gt;For real-time UIs there's a WebSocket endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ws&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;WebSocket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`wss://api.pipecat.in/ws/run/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;workflowId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nx"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;onopen&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;token&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;userMessage&lt;/span&gt; &lt;span class="p"&gt;}));&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="nx"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;onmessage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;node_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;output&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="c1"&gt;// status: "running" | "tool_call" | "tool_result" | "success" | "error"&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You get events as each node starts, as tools fire and return, and when the full workflow completes. Enough to build a proper "thinking..." UI with per-step feedback.&lt;/p&gt;
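
&lt;p&gt;One way to turn that event stream into per-step feedback is a small reducer that folds each event into a UI state object. This is a sketch: the event fields match the snippet above, but the shape of the state object, the step labels, and the assumption that &lt;code&gt;tool_result&lt;/code&gt; events carry &lt;code&gt;tool_name&lt;/code&gt; are mine.&lt;/p&gt;

```javascript
// Fold one streamed event into a UI state object without mutating the old state.
// Field names (status, node_id, tool_name, output) follow the snippet above;
// the shape of `state` is an assumption for illustration.
function applyEvent(state, event) {
  const steps = [...state.steps];
  switch (event.status) {
    case "running":
      steps.push(`Running node ${event.node_id}`);
      return { ...state, steps };
    case "tool_call":
      steps.push(`Calling tool ${event.tool_name}`);
      return { ...state, steps };
    case "tool_result":
      steps.push(`Tool ${event.tool_name} returned`);
      return { ...state, steps };
    case "success":
      return { ...state, steps, done: true, output: event.output };
    case "error":
      return { ...state, steps, done: true, failed: true };
    default:
      return state; // ignore statuses this sketch does not know about
  }
}

const initialState = { steps: [], done: false, failed: false, output: null };
const events = [
  { status: "running", node_id: "llm_1" },
  { status: "tool_call", node_id: "llm_1", tool_name: "get_weather" },
  { status: "tool_result", node_id: "llm_1", tool_name: "get_weather" },
  { status: "success", output: "It's currently 22°C and sunny in Tokyo." },
];
const finalState = events.reduce(applyEvent, initialState);
console.log(finalState.steps.join(" | "), finalState.done);
```

&lt;p&gt;Inside &lt;code&gt;ws.onmessage&lt;/code&gt; you would call the reducer with the parsed event and re-render from the returned state.&lt;/p&gt;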

&lt;h3&gt;
  
  
  Multi-provider LLM Support
&lt;/h3&gt;

&lt;p&gt;LLM nodes support OpenAI, Anthropic, Gemini, and OpenRouter. You switch models from the canvas without touching anything else in the workflow. Useful for cost experiments — swap GPT-4.1 for Gemini Flash on a high-volume node and see the difference in the stats panel.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzcobcd9g99721gors82e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzcobcd9g99721gors82e.png" alt="LLM node configuration — provider, model, system prompt, max tokens, max iterations" width="800" height="520"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Shopify Integration and Hybrid Product Retrieval
&lt;/h3&gt;

&lt;p&gt;One of the workflows I built on top of Pipecat is a shop chat assistant — a widget you embed in a Shopify storefront that answers product questions and pushes items to cart. It's a good example of how the node model composes with external data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsblp7t9r5cia6wvcd9an.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsblp7t9r5cia6wvcd9an.png" alt="Connect store — paste URL, name, description, optional Shopify domain and token" width="799" height="180"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The sync path.&lt;/strong&gt; When a store connects their Shopify Storefront API token, we pull the full product catalog via paginated GraphQL rather than scraping HTML. Then each product goes through two enrichment passes: a Gemini call that extracts structured attributes (style, occasion, gender, materials, type) and an embedding pass that turns the product's title + description + tags into a semantic vector stored in pgvector.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The retrieval path is where it gets interesting.&lt;/strong&gt; The naive approach would be: embed the user's query → cosine similarity → return top-N. That works fine for "red dress" but falls apart for "something under $50 for a 3-year-old boy." Vector similarity has no way to enforce a hard price constraint.&lt;/p&gt;

&lt;p&gt;Instead, the chat endpoint does a multi-stage search that treats vector similarity as a ranker, not a filter:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Filter extraction&lt;/strong&gt; — an LLM call pulls structured filters out of the natural-language query: &lt;code&gt;max_price&lt;/code&gt;, &lt;code&gt;min_price&lt;/code&gt;, &lt;code&gt;style&lt;/code&gt;, &lt;code&gt;gender&lt;/code&gt;, &lt;code&gt;occasion&lt;/code&gt;, &lt;code&gt;material&lt;/code&gt;, &lt;code&gt;type&lt;/code&gt;, &lt;code&gt;keyword&lt;/code&gt;, age range.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SQL pre-filter&lt;/strong&gt; — those filters become hard &lt;code&gt;WHERE&lt;/code&gt; clauses against our product table. Price and gender are hard stops; style, occasion, and material use case-insensitive &lt;code&gt;ILIKE&lt;/code&gt; pattern matching, which is looser than an exact comparison.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector ranking within candidates&lt;/strong&gt; — the query is embedded in parallel with the candidate count check. We rank the filtered set by cosine distance and return the top 10. Crucially, we only consult vector space &lt;em&gt;after&lt;/em&gt; the structured filters have already narrowed the catalog.&lt;/li&gt;
&lt;/ol&gt;
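
&lt;p&gt;Steps 2 and 3 can be sketched as one parameterized query builder. The table and column names (&lt;code&gt;products&lt;/code&gt;, &lt;code&gt;price&lt;/code&gt;, &lt;code&gt;gender&lt;/code&gt;, &lt;code&gt;embedding&lt;/code&gt;) and the function name are assumptions, and the sketch uses pgvector's &lt;code&gt;cosine_distance&lt;/code&gt; function form for ranking; the real schema and query may well differ.&lt;/p&gt;

```javascript
// Build a parameterized query: structured filters become WHERE clauses,
// then the surviving rows are ranked by cosine distance to the query vector.
// Table and column names are hypothetical.
function buildCandidateQuery(filters, queryVector) {
  const clauses = [];
  const params = [];
  // Register a value and return its positional placeholder ($1, $2, ...).
  const bind = (value) => {
    params.push(value);
    return `$${params.length}`;
  };

  // Hard constraints: price bounds and exact gender match.
  if (filters.max_price != null) clauses.push(`${bind(filters.max_price)} >= price`);
  if (filters.min_price != null) clauses.push(`price >= ${bind(filters.min_price)}`);
  if (filters.gender) clauses.push(`gender = ${bind(filters.gender)}`);

  // Softer constraints: case-insensitive substring match.
  for (const field of ["style", "occasion", "material"]) {
    if (filters[field]) {
      const pattern = `%${filters[field]}%`;
      clauses.push(`${field} ILIKE ${bind(pattern)}`);
    }
  }

  const parts = ["SELECT id, title, price FROM products"];
  if (clauses.length > 0) parts.push(`WHERE ${clauses.join(" AND ")}`);
  // pgvector accepts a vector as text such as "[0.1,0.2]".
  const vectorParam = bind(JSON.stringify(queryVector));
  parts.push(`ORDER BY cosine_distance(embedding, ${vectorParam}::vector)`);
  parts.push("LIMIT 10");
  return { text: parts.join(" "), params };
}

const query = buildCandidateQuery(
  { max_price: 50, gender: "boys", style: "casual" },
  [0.1, 0.2]
);
console.log(query.text);
console.log(query.params);
```

&lt;p&gt;Because every value travels as a bound parameter, nothing the LLM extracts from the user's message is spliced into the SQL text itself.&lt;/p&gt;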

&lt;p&gt;&lt;strong&gt;The fallback cascade is what makes it reliable in practice.&lt;/strong&gt; Three things can go wrong:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The keyword extracted from the query is too specific and returns fewer than 5 candidates — we retry without the keyword filter, keeping price/gender/style intact.&lt;/li&gt;
&lt;li&gt;All metadata filters combined still produce fewer than 3 candidates — we drop to a pure embedding search across the whole store.&lt;/li&gt;
&lt;li&gt;The embedding service is down — we fall back to recency-sorted SQL results.&lt;/li&gt;
&lt;/ul&gt;
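
&lt;p&gt;The cascade above can be sketched as a single function. The thresholds (5 and 3) come from the list; the function name, the strategy labels, and the three injected callbacks are hypothetical stand-ins for the real SQL and embedding calls, and they are synchronous here only to keep the sketch short.&lt;/p&gt;

```javascript
// Thresholds taken from the article; everything else is illustrative.
const MIN_WITH_KEYWORD = 5;
const MIN_WITH_FILTERS = 3;

// deps.embedQuery()                    returns the query vector or throws
// deps.filteredSearch(filters, vector) returns ranked candidates
// deps.recentProducts(filters)         returns recency-sorted SQL results
function searchWithFallback(filters, deps) {
  let vector;
  try {
    vector = deps.embedQuery();
  } catch (err) {
    // Embedding service is down: fall back to recency-sorted SQL results.
    return { strategy: "recency", items: deps.recentProducts(filters) };
  }

  let strategy = "filtered";
  let candidates = deps.filteredSearch(filters, vector);

  // Keyword too specific: retry without it, keeping the other filters.
  if (filters.keyword !== undefined) {
    if (MIN_WITH_KEYWORD > candidates.length) {
      const { keyword, ...withoutKeyword } = filters;
      strategy = "filtered_no_keyword";
      candidates = deps.filteredSearch(withoutKeyword, vector);
    }
  }

  if (candidates.length >= MIN_WITH_FILTERS) {
    return { strategy, items: candidates };
  }

  // Still too few: pure embedding search across the whole store.
  return { strategy: "vector_only", items: deps.filteredSearch({}, vector) };
}
```

&lt;p&gt;Returning the strategy alongside the items makes it easy to log how often each fallback fires, which is useful when tuning the thresholds.&lt;/p&gt;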

&lt;p&gt;The upshot: a query like "gift for my toddler nephew, budget around 30 USD" routes through price + gender + keyword filters first, gets ranked by semantic similarity within that matching subset, and only falls back to pure vector search if the filtered set is too small (fewer than 3 candidates). Compared to pure vector search, this eliminates entire categories of wrong results (expensive items surfaced for a budget query, adult products surfaced for a children's query) before the model ever touches the output.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Support queries are short-circuited entirely.&lt;/strong&gt; If the message matches known support keywords (returns, shipping, order status, size guide) and the store has scraped FAQ content, we skip product search and route straight to an LLM call that answers from the FAQ text. No embeddings, no SQL, just context injection.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;At checkout time&lt;/strong&gt;, the widget calls the Shopify Storefront API directly — fetching real-time variant availability and creating carts via GraphQL mutations — so inventory and pricing are always live, not cached.&lt;/p&gt;

&lt;p&gt;If you're a Shopify merchant rather than a workflow developer, the dedicated landing page covers setup end-to-end: &lt;strong&gt;&lt;a href="https://app.pipecat.in/ecomm" rel="noopener noreferrer"&gt;https://app.pipecat.in/ecomm&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Going Live: One Script Tag
&lt;/h3&gt;

&lt;p&gt;Once the store is connected and the product catalog is synced, embedding the assistant takes three steps:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Grab your embed snippet from the Overview tab:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;script
  &lt;/span&gt;&lt;span class="na"&gt;src=&lt;/span&gt;&lt;span class="s"&gt;"https://app.pipecat.in/embed.js"&lt;/span&gt;
  &lt;span class="na"&gt;data-store-id=&lt;/span&gt;&lt;span class="s"&gt;"YOUR_STORE_ID"&lt;/span&gt;
  &lt;span class="na"&gt;defer&lt;/span&gt;
&lt;span class="nt"&gt;&amp;gt;&amp;lt;/script&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. Paste it before &lt;code&gt;&amp;lt;/body&amp;gt;&lt;/code&gt; in your theme.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For Shopify: Admin → Online Store → Themes → Edit code → &lt;code&gt;layout/theme.liquid&lt;/code&gt; → paste before &lt;code&gt;&amp;lt;/body&amp;gt;&lt;/code&gt; → Save.&lt;/p&gt;

&lt;p&gt;For any other site: same snippet, same place in your HTML.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. That's it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The script is a self-contained IIFE. It reads the &lt;code&gt;data-store-id&lt;/code&gt; attribute, injects a fixed-position floating bubble, and lazy-loads the chat UI inside a sandboxed iframe — no framework dependencies, no extra requests on page load. On first click it fetches your widget config (brand color, logo URL, welcome message) and applies it to the bubble in real time, so the launcher always matches your store's branding without you touching CSS.&lt;/p&gt;

&lt;p&gt;The iframe communicates back to the parent via &lt;code&gt;postMessage&lt;/code&gt; — when the user closes the chat, the message &lt;code&gt;cw:close&lt;/code&gt; collapses the iframe and brings the bubble back. The whole open/close animation is CSS transitions, no JavaScript jank.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft2wc6h5sn35850ocw2ys.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft2wc6h5sn35850ocw2ys.png" alt="Store settings — brand color, widget title, logo, starter prompts" width="800" height="520"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Challenges
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The tool-call loop edge cases were brutal.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The spec says: model outputs tool calls → you call the tools → you feed results back → model continues. Simple. But in practice: what if the model calls three tools simultaneously? What if a tool returns an error — do you surface it to the model or abort? What if &lt;code&gt;max_iterations&lt;/code&gt; is hit mid-loop?&lt;/p&gt;

&lt;p&gt;I had to make explicit decisions for all of these and test each one. Concurrent tool calls now execute in parallel. Tool errors are returned to the model as error messages so it can attempt recovery. Hitting &lt;code&gt;max_iterations&lt;/code&gt; terminates with a partial result rather than silently dropping output.&lt;/p&gt;
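
&lt;p&gt;Those three decisions fit in a fairly small loop. The sketch below is not Pipecat's code: &lt;code&gt;callModel&lt;/code&gt; and &lt;code&gt;callTool&lt;/code&gt; are hypothetical injected functions, and the message and result shapes are assumptions chosen to keep the example self-contained.&lt;/p&gt;

```javascript
// Hypothetical contracts:
//   callModel(messages) resolves to { text, toolCalls: [{ name, args }] }
//   callTool(name, args) resolves to a result or rejects with an error
async function runAgentLoop(callModel, callTool, userInput, maxIterations) {
  const messages = [{ role: "user", content: userInput }];
  let lastText = "";
  let remaining = maxIterations;

  while (remaining > 0) {
    remaining -= 1;
    const reply = await callModel(messages);
    lastText = reply.text || lastText;

    // No tool calls means the model produced its final response.
    if (!reply.toolCalls || reply.toolCalls.length === 0) {
      return { status: "success", output: reply.text };
    }
    messages.push({ role: "assistant", content: reply.text, toolCalls: reply.toolCalls });

    // Run every requested tool concurrently. A failure becomes an error
    // message for the model instead of aborting the whole run.
    const results = await Promise.all(
      reply.toolCalls.map(async (call) => {
        try {
          return { tool: call.name, result: await callTool(call.name, call.args) };
        } catch (err) {
          return { tool: call.name, error: String(err.message || err) };
        }
      })
    );
    for (const r of results) {
      messages.push({ role: "tool", content: JSON.stringify(r) });
    }
  }

  // Iteration cap reached: return the latest text rather than nothing.
  return { status: "max_iterations", output: lastText };
}
```

&lt;p&gt;Catching errors inside each mapped call is what keeps one failing tool from rejecting the whole &lt;code&gt;Promise.all&lt;/code&gt; and discarding the results of the others.&lt;/p&gt;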

&lt;p&gt;&lt;strong&gt;Debugging parallel branches.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When two branches execute concurrently and one fails, the error needs to be attributed to the right node without corrupting the other branch's state. Early versions had race conditions that produced interleaved log entries and wrong duration timestamps. Fixing this meant being much more deliberate about execution context isolation per-node.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Making the canvas feel fast.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The visual editor needs to feel instant even on large workflows. React Flow handles most of the heavy lifting but there were edge cases with node state updates during live execution (nodes lighting up as they run) that caused full re-renders. Fixed by memoizing node state and only updating the affected node's data rather than broadcasting to the full graph.&lt;/p&gt;
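
&lt;p&gt;The core of that fix is an update that replaces only the affected node object and leaves every other node's identity untouched, so memoized node components can skip re-rendering. A minimal sketch, assuming nodes shaped like React Flow's (&lt;code&gt;id&lt;/code&gt; plus &lt;code&gt;data&lt;/code&gt;) and a hypothetical &lt;code&gt;status&lt;/code&gt; field:&lt;/p&gt;

```javascript
// Return a new array in which only the matching node is a new object.
// Untouched nodes are the same references as before, so a memoized
// component receiving them sees no prop change.
function setNodeStatus(nodes, nodeId, status) {
  return nodes.map((node) =>
    node.id === nodeId ? { ...node, data: { ...node.data, status } } : node
  );
}

const before = [
  { id: "input", data: { status: "done" } },
  { id: "llm", data: { status: "waiting" } },
];
const after = setNodeStatus(before, "llm", "running");
console.log(after[0] === before[0], after[1] === before[1], after[1].data.status);
```

&lt;p&gt;The first comparison stays true because the input node was never copied, which is exactly what lets the rest of the graph skip work during a live run.&lt;/p&gt;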




&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;The default workflow when you sign up looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Input] → [LLM] → [Output]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can turn that into a real research agent in about 5 minutes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Add a Tool node pointing to a search API&lt;/li&gt;
&lt;li&gt;Attach it to the LLM node&lt;/li&gt;
&lt;li&gt;Set the system prompt: &lt;em&gt;"You are a research assistant. Use the search tool to find current information before answering."&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Enable structured output — fields: &lt;code&gt;summary&lt;/code&gt; (string), &lt;code&gt;sources&lt;/code&gt; (array), &lt;code&gt;confidence&lt;/code&gt; (number)&lt;/li&gt;
&lt;li&gt;Publish&lt;/li&gt;
&lt;li&gt;Call the invoke endpoint from anywhere&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The canvas shows you each node's status in real time during a run — grey (waiting), blue (running), green (done), red (error). When a tool fires you see it light up separately from the LLM node. It makes the execution graph immediately legible.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzybzt3qh27kriuwf0wli.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzybzt3qh27kriuwf0wli.png" alt="Live canvas execution — nodes lighting up as the workflow runs" width="800" height="502"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Lessons Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Write the description for every tool as if the model is a new hire.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The LLM decides whether to call your tool based entirely on the description. Vague descriptions ("fetches data") produce unpredictable tool usage. Specific descriptions ("returns current weather conditions for a given city name — use this when the user asks about weather, temperature, or climate in a specific location") produce reliable behavior.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Structured output is underused.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most people default to plain text output and then write parsing logic downstream. Defining the output schema upfront and letting the model fill it in is almost always the right move when your consumer is code. It's more reliable than regex on a free-text response and the model respects it well across all the major providers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Don't use vector search as a filter — use it as a ranker.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The biggest retrieval mistake I see in demos is embedding the query and ranking the entire catalog by cosine similarity. That doesn't enforce hard constraints. The right pattern: extract structured filters from the query with an LLM call, apply them as SQL &lt;code&gt;WHERE&lt;/code&gt; clauses to narrow the candidate set, then rank &lt;em&gt;that subset&lt;/em&gt; by vector similarity. You get semantic relevance within a constraint-respecting pool, with cascading fallbacks when the intersection is too narrow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Visual debugging changes how you think about architecture.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once I could see the graph execute I started designing workflows differently. You notice immediately when a workflow runs sequentially but could run in parallel. You see duration bottlenecks at the node level. The canvas isn't just a UI — it's also a profiler.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;If you're building anything that involves chaining LLM calls, calling external APIs from a model, or exposing an AI feature via a REST endpoint — Pipecat cuts the setup down significantly.&lt;/p&gt;

&lt;p&gt;Free tier gets you started without a credit card: &lt;strong&gt;&lt;a href="https://app.pipecat.in" rel="noopener noreferrer"&gt;https://app.pipecat.in&lt;/a&gt;&lt;/strong&gt;&lt;br&gt;
For stores: &lt;strong&gt;&lt;a href="https://app.pipecat.in/ecomm" rel="noopener noreferrer"&gt;https://app.pipecat.in/ecomm&lt;/a&gt;&lt;/strong&gt;&lt;br&gt;
For tutorials: &lt;strong&gt;&lt;a href="https://app.pipecat.in/blogs" rel="noopener noreferrer"&gt;https://app.pipecat.in/blogs&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Happy to answer questions and hear your feedback.&lt;/p&gt;

&lt;h2&gt;
  
  
  Actively Iterating on the Product
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; &lt;code&gt;ai&lt;/code&gt; &lt;code&gt;python&lt;/code&gt; &lt;code&gt;webdev&lt;/code&gt; &lt;code&gt;productivity&lt;/code&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>showdev</category>
      <category>tooling</category>
    </item>
  </channel>
</rss>
