
Ben Racicot

Posted on • Originally published at modelpiper.com

Ollama Pipelines on Mac: Chain Models Without Writing Glue Code

Ollama runs one model at a time. Send it a prompt, get a response. For single-turn chat, that's enough.

But useful work chains capabilities. Record a meeting, transcribe the audio, summarize the transcript, extract action items. Each step needs a different model or tool. Ollama handles one of those steps. The orchestration is your problem.

The usual workaround is a Python script calling Ollama's API in a loop, piping output from one model into the next. Then you want to swap the summarization model, add a translation step, or figure out why step three produced garbage. Now you're maintaining a custom orchestration layer for something that should be a drag-and-drop operation.
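That glue script usually looks something like this. A minimal sketch, assuming a local Ollama server on the default port; the model names and prompts are examples, not recommendations:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def ollama_generate(model: str, prompt: str) -> str:
    """Call Ollama's /api/generate endpoint and return the full response text."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

def run_chain(steps, text, generate=ollama_generate):
    """Pipe text through (model, prompt_template) steps, in order."""
    for model, template in steps:
        text = generate(model, template.format(input=text))
    return text

# Transcript in, summary out, action items after that.
steps = [
    ("llama3.2:3b", "Summarize this meeting transcript:\n\n{input}"),
    ("llama3.2:3b", "Extract action items as a bullet list:\n\n{input}"),
]
# notes = run_chain(steps, transcript)
```

It works until it doesn't: every swapped model, added step, or debugging session means editing this script.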

What a pipeline looks like

A pipeline is a visual workflow where each block represents a model or operation. Data flows between blocks through connections on a canvas. You build workflows, not prompts.

Block types include text generation, speech-to-text, text-to-speech, OCR, embedding, and image upscaling. Text generation blocks can use any model from any connected provider - Ollama, ToolPiper's built-in llama.cpp, or a cloud API. The pipeline builder handles data flow automatically.

Configurations are stored as JSON. Duplicate a pipeline, swap one block, have a variation running in seconds.
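The duplicate-and-swap move is just a JSON edit. A sketch with a hypothetical config shape (the real ToolPiper schema may differ):

```python
import copy
import json

# Hypothetical pipeline config -- field names are illustrative.
pipeline = {
    "name": "meeting-notes",
    "blocks": [
        {"id": "stt", "type": "speech_to_text", "model": "parakeet-v3"},
        {"id": "sum", "type": "text_generation", "model": "llama3.2:3b",
         "system": "Summarize this transcript in 3-5 concise bullet points."},
    ],
    "connections": [["stt", "sum"]],
}

# Duplicate the pipeline, swap one block's model, done.
variant = copy.deepcopy(pipeline)
variant["name"] = "meeting-notes-13b"
variant["blocks"][1]["model"] = "llama2:13b"

print(json.dumps(variant, indent=2))
```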

Three pipelines you can build with Ollama models

Voice conversation: STT → LLM → TTS

The simplest multi-model pipeline. Speech-to-text (Parakeet v3) transcribes your voice. An Ollama chat model reasons about the transcript. Text-to-speech reads the response aloud. Three blocks, three capabilities, one workflow.

Document Q&A: OCR → Embed → Index → Chat

Drop a scanned PDF in. OCR (Apple Vision) extracts text. An embedding block indexes it in a local vector collection. A chat block with RAG context answers questions, citing specific passages. Documents stay on your Mac and become searchable through natural language.
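The retrieval step behind that chat block boils down to cosine similarity over chunk embeddings. A minimal sketch in plain Python, where the vectors stand in for whatever embedding model the pipeline uses:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, index, k=3):
    """index: list of (chunk_text, vector) pairs. Returns the k closest chunks."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

The top-k chunks get pasted into the chat block's prompt as context, which is where the passage citations come from.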

Multilingual content: Chat → Translate → TTS

Ask your Ollama model a question in English. A second chat block translates the response. TTS reads it aloud in the target language. Changing the language is a one-field edit in the translation block's system prompt.
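That translation block maps onto Ollama's /api/chat endpoint: the target language lives in the system message, so switching it really is a one-string change. A sketch of the request payload, with an example model name:

```python
def translation_request(text: str, target_lang: str, model: str = "llama3.2:3b") -> dict:
    """Build an Ollama /api/chat payload whose system prompt pins the target language."""
    return {
        "model": model,
        "stream": False,
        "messages": [
            {"role": "system",
             "content": f"Translate the user's message into {target_lang}. "
                        "Reply with the translation only."},
            {"role": "user", "content": text},
        ],
    }

payload = translation_request("Pipelines chain models.", "French")
# POST this as JSON to http://localhost:11434/api/chat
```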

How Ollama fits in

Connect Ollama as a provider in ToolPiper. Every downloaded model appears as an option in the pipeline builder's text generation blocks. No re-downloading, no format conversion.

The practical advantage: you've already invested time pulling the right models. A 7B coding model, a 3B fast-chat model, a 13B for complex reasoning. In a pipeline, use each where it's strongest - fast model for classification, large one for generation - without managing separate API calls.

The Ollama connection works through localhost:11434. You'll need CORS configured for the browser-based builder to reach Ollama. Or use ToolPiper's built-in engine, which needs no CORS setup.
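Ollama reads the OLLAMA_ORIGINS environment variable to decide which browser origins may call it. A minimal macOS setup; "*" allows any origin, so prefer the builder's exact origin where you can:

```shell
# Set the variable for GUI apps (macOS), then restart the Ollama app.
# Replace "*" with the builder's actual origin in practice.
launchctl setenv OLLAMA_ORIGINS "*"

# Verify the server is reachable after the restart:
curl -s http://localhost:11434/api/tags | head -c 200
```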

Build one from scratch

A three-block pipeline: transcribe an audio clip, then summarize the transcript. Meeting notes in two clicks.

  1. Open the pipeline builder. Empty canvas with a block palette.
  2. Drag an STT block onto the canvas. Defaults to Parakeet v3 (Neural Engine).
  3. Drag a text generation block next to it. Select an Ollama model (Llama 3.2 3B works well for summarization). Set the system prompt: "Summarize this transcript in 3-5 concise bullet points."
  4. Draw a connection from STT output to the chat block input.
  5. Drop an audio file into the STT block. Click run.

Parakeet transcribes. The transcript flows to your Ollama model. Summary appears in the output panel. Two models, one click, no scripting.

Want to extend it? Add a TTS block after the summary to hear bullet points read aloud. Add a translation block between summarization and TTS for a multilingual workflow. Each extension is another block and another connection.

Limitations

Latency compounds. Each block adds processing time. A three-block voice pipeline adds ~1-2s total on M2 Max. Five blocks with OCR, embedding, retrieval, chat, and TTS takes longer. For real-time interaction, keep chains short.

Memory adds up. Each model block needs its own RAM. Voice chat (STT + 3B + TTS) needs ~3GB. Document Q&A (OCR + embeddings + 7B) might need 6-7GB. ToolPiper's resource monitor shows whether a pipeline's models fit before you run it.
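A quick back-of-envelope check is enough to catch a pipeline that won't fit. The per-block figures below are illustrative estimates consistent with the totals above, not measurements:

```python
# Ballpark RAM per block type, in GB. Illustrative estimates only.
BLOCK_RAM_GB = {
    "stt": 0.6,       # e.g. Parakeet v3
    "tts": 0.4,
    "ocr": 0.2,       # Apple Vision runs in-process
    "embedding": 0.5,
    "chat_3b": 2.0,   # ~3B model, quantized
    "chat_7b": 4.5,   # ~7B model, quantized
}

def pipeline_fits(blocks, available_gb):
    """Sum estimated RAM for a list of block types; return (needed_gb, fits?)."""
    needed = sum(BLOCK_RAM_GB[b] for b in blocks)
    return needed, needed <= available_gb

needed, ok = pipeline_fits(["stt", "chat_3b", "tts"], available_gb=8.0)
```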

Single-model chat doesn't need a pipeline. If you're asking a question and reading an answer, the pipeline builder is overhead. Pipelines earn their complexity when the workflow involves more than one capability.

Full walkthrough with more pipeline examples
