<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jonathan Gastón Löwenstern</title>
    <description>The latest articles on DEV Community by Jonathan Gastón Löwenstern (@jonigl).</description>
    <link>https://dev.to/jonigl</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3134065%2F67c767ca-e297-454c-87bc-328e24e4506e.jpg</url>
      <title>DEV Community: Jonathan Gastón Löwenstern</title>
      <link>https://dev.to/jonigl</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jonigl"/>
    <language>en</language>
    <item>
      <title>Ollama’s New Thinking Mode in less than 5 Minutes</title>
      <dc:creator>Jonathan Gastón Löwenstern</dc:creator>
      <pubDate>Sat, 07 Jun 2025 11:03:13 +0000</pubDate>
      <link>https://dev.to/jonigl/ollamas-new-thinking-mode-in-less-than-5-minutes-13ig</link>
      <guid>https://dev.to/jonigl/ollamas-new-thinking-mode-in-less-than-5-minutes-13ig</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftvrvthlrmk7yk0mk2yb2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftvrvthlrmk7yk0mk2yb2.png" alt="Cover" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Get the most of Ollama’s reasoning models with the new thinking mode.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Ollama&lt;/strong&gt; &lt;code&gt;v0.9.0&lt;/code&gt; was just released with support for &lt;em&gt;thinking mode&lt;/em&gt;, and now the &lt;strong&gt;Ollama Python SDK&lt;/strong&gt; has reached parity with &lt;code&gt;v0.5.0&lt;/code&gt;. This means you can start using this powerful reasoning feature right away to build smarter local AI agents.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this is exciting
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Benefits of Thinking Mode&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Improved performance on complex tasks&lt;/strong&gt;: thinking before responding leads to more accurate, step‑by‑step answers for reasoning and planning.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Better understanding of user instructions&lt;/strong&gt;: the model can unpack nuanced prompts and pinpoint key requirements.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;More creative and informative responses&lt;/strong&gt;: by exploring multiple possibilities internally, it surfaces fresh ideas and richer explanations.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What you will learn
&lt;/h2&gt;

&lt;p&gt;Ollama's new thinking mode allows models to reason through complex tasks before providing a final answer. This is a game-changer for building local &lt;strong&gt;AI agents&lt;/strong&gt; that can think through problems, plan solutions, and provide more accurate responses.&lt;/p&gt;

&lt;p&gt;In this tutorial, I will guide you through setting up a simple interactive chat application that demonstrates this new feature using the Ollama Python SDK. You’ll see how to pull a thinking‑capable model, install the SDK, and run a chat that reveals the model's thought process in real-time.&lt;/p&gt;

&lt;h2&gt;
  
  
  What you will do
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Upgrade to the latest &lt;code&gt;Ollama&lt;/code&gt; release&lt;/li&gt;
&lt;li&gt;Pull thinking‑ready models (&lt;code&gt;qwen3:0.6b&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Install the brand‑new &lt;code&gt;Ollama Python SDK&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Run a fully interactive “thought bubble” chat in your terminal using the &lt;code&gt;rich&lt;/code&gt; library&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Python 3.10&lt;/code&gt; or higher&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/astral-sh/uv" rel="noopener noreferrer"&gt;uv&lt;/a&gt; for Python package management&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://ollama.com/" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt; version ≥ &lt;code&gt;v0.9.0&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;A thinking‑capable model like &lt;code&gt;qwen3:0.6b&lt;/code&gt; pulled from Ollama&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Heads‑up:&lt;/strong&gt; Only models trained to expose their reasoning support thinking today. Check the list of &lt;em&gt;thinking models&lt;/em&gt; that Ollama maintains.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Let’s get started!&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1. Let's Upgrade Ollama to v0.9.0
&lt;/h2&gt;

&lt;p&gt;We will need the latest Ollama release to use thinking mode. If you already have Ollama installed, ensure it is at least version &lt;code&gt;0.9.0&lt;/code&gt;. You can check your version with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama &lt;span class="nt"&gt;--version&lt;/span&gt;   &lt;span class="c"&gt;# should print 0.9.0 or higher&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you need to upgrade, you can do so with the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# If you have the desktop app installed, it will prompt you to update.&lt;/span&gt;

&lt;span class="c"&gt;# macOS or Linux (Homebrew)&lt;/span&gt;
brew upgrade ollama

&lt;span class="c"&gt;# Windows&lt;/span&gt;
winget upgrade Ollama
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you don’t have Ollama installed yet, check the official website: &lt;a href="https://ollama.com/" rel="noopener noreferrer"&gt;https://ollama.com/&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2. Pull a thinking‑capable model
&lt;/h2&gt;

&lt;p&gt;We will use the &lt;code&gt;qwen3:0.6b&lt;/code&gt; model, since it is super small and fast, yet supports thinking mode. You can pull it with the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull qwen3:0.6b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's run it quickly to see the CLI in action:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama run qwen3:0.6b &lt;span class="s2"&gt;"Is 5 a Fibonacci number?"&lt;/span&gt; &lt;span class="nt"&gt;--think&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll see two distinct sections: first, the dim &lt;em&gt;Thinking...&lt;/em&gt; output showing the model's internal reasoning, followed by the clean final answer. Because it is only 0.6B parameters, this tiny model blazes through tokens faster than you can read them 🤣&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3. Install the Python SDK with thinking support
&lt;/h2&gt;

&lt;p&gt;Now let's create a &lt;strong&gt;virtual environment&lt;/strong&gt; using &lt;code&gt;uv&lt;/code&gt; and install the latest Ollama Python SDK.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Tip:&lt;/strong&gt; If you don't have &lt;code&gt;uv&lt;/code&gt; installed, check the &lt;a href="https://github.com/astral-sh/uv?tab=readme-ov-file#installation" rel="noopener noreferrer"&gt;uv documentation&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create a new directory for the demo&lt;/span&gt;
&lt;span class="nb"&gt;mkdir &lt;/span&gt;ollama-thinking-demo

&lt;span class="c"&gt;# Change into the new directory&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;ollama-thinking-demo

&lt;span class="c"&gt;# Create a new virtual environment&lt;/span&gt;
uv venv
&lt;span class="c"&gt;# Activate the virtual environment&lt;/span&gt;
&lt;span class="nb"&gt;source&lt;/span&gt; .venv/bin/activate  &lt;span class="c"&gt;# On Windows use: .venv\Scripts\activate&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now let's install the latest Ollama Python SDK, which adds support for thinking mode, along with the &lt;code&gt;rich&lt;/code&gt; library for pretty terminal output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv add ollama rich
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Tip:&lt;/strong&gt; Ollama Python SDK &lt;code&gt;0.5.0&lt;/code&gt; introduces the &lt;code&gt;think&lt;/code&gt; parameter in both the &lt;code&gt;generate&lt;/code&gt; and &lt;code&gt;chat&lt;/code&gt; helpers.&lt;/p&gt;
&lt;/blockquote&gt;
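
&lt;p&gt;Before wiring up the full demo, here is a minimal sketch of the &lt;code&gt;think&lt;/code&gt; parameter in a plain, non‑streaming &lt;code&gt;chat&lt;/code&gt; call. It assumes Ollama &lt;code&gt;v0.9.0&lt;/code&gt; is running locally and &lt;code&gt;qwen3:0.6b&lt;/code&gt; is already pulled; the reasoning should arrive in &lt;code&gt;message.thinking&lt;/code&gt; and the final answer in &lt;code&gt;message.content&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import ollama

# Minimal sketch (no streaming): one question, thinking enabled.
# Assumes a local Ollama &amp;gt;= 0.9.0 with qwen3:0.6b pulled.
response = ollama.chat(
    model="qwen3:0.6b",
    messages=[{"role": "user", "content": "Is 5 a Fibonacci number?"}],
    think=True,  # ask the model to expose its reasoning
)

print("Thinking:", response.message.thinking)  # internal reasoning
print("Answer:", response.message.content)     # final answer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;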

&lt;h2&gt;
  
  
  Step 4. Copy &amp;amp; paste the &lt;strong&gt;ThinkingChat&lt;/strong&gt; demo
&lt;/h2&gt;

&lt;p&gt;Create a new file called &lt;code&gt;ollama_thinking_chat.py&lt;/code&gt; and copy the following code into it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ollama&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;rich.console&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Console&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;rich.live&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Live&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;rich.markdown&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Markdown&lt;/span&gt;


&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ThinkingChat&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qwen3:0.6b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;console&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ollama&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ollama&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;AsyncClient&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant that thinks through answers.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;ask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Ask a question and see the model think through the answer&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ollama&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;think&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;  &lt;span class="c1"&gt;# &amp;lt;-- Enable thinking mode
&lt;/span&gt;        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;thinking&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
        &lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;

        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;Live&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;console&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;console&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;refresh_per_second&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;live&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

                &lt;span class="c1"&gt;# Show thinking process
&lt;/span&gt;                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;thinking&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;thinking&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                        &lt;span class="n"&gt;thinking&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;🤔 **Thinking:**&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="n"&gt;thinking&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;thinking&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                    &lt;span class="n"&gt;live&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Markdown&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;thinking&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;style&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dim&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

                &lt;span class="c1"&gt;# Show final answer
&lt;/span&gt;                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                    &lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                    &lt;span class="n"&gt;live&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Markdown&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Simple chat loop&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[bold green]💭 Thinking Chat[/bold green] [dim](&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;)[/dim]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[yellow]Ask me anything! Type &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;quit&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; to exit.[/yellow]&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;question&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Question: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;quit&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;exit&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
                    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Goodbye! 👋&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="k"&gt;break&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# Add space after response
&lt;/span&gt;            &lt;span class="nf"&gt;except &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;KeyboardInterrupt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;EOFError&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Goodbye! 👋&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;break&lt;/span&gt;


&lt;span class="c1"&gt;# Run the chat
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;chat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ThinkingChat&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Save the file as &lt;code&gt;ollama_thinking_chat.py&lt;/code&gt; and run it with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv run ollama_thinking_chat.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you have a fully interactive chat that shows the model's thought process in real-time! Let's try it out. Type a question like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;Question: Is 5 a Fibonacci number?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see the model's thinking process displayed in a dimmed format, followed by the final answer. Check out the demo in action:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqlv6rwsozwcn9wdugevu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqlv6rwsozwcn9wdugevu.png" alt="Demo example" width="800" height="566"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What can you build next?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Educational tutors&lt;/strong&gt; that teach by example, revealing step‑by‑step logic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Debugging dashboards&lt;/strong&gt; that compare the chain‑of‑thought across models.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Creative assistants&lt;/strong&gt; that brainstorm ideas and show their reasoning.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Interactive agents&lt;/strong&gt; that explain their decisions in real-time.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Check out the complete code on my &lt;a href="https://github.com/jonigl/ai-publications/tree/03c66dd11c74955877bfbf94762be4198477500a/publications/2025/05/Ollama%E2%80%99s%20New%20Thinking%20Mode%C2%A0in%20less%20than%205%20Minutes/code" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;. There you will find:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;ollama_thinking_chat.py&lt;/code&gt; file with the full implementation.&lt;/li&gt;
&lt;li&gt;The extended version, &lt;code&gt;ollama_thinking_chat_extended.py&lt;/code&gt;, with additional features and capabilities.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you like this repository, consider dropping a ⭐️&lt;/p&gt;

&lt;h2&gt;
  
  
  Final thoughts
&lt;/h2&gt;

&lt;p&gt;With Ollama's new thinking mode and the Ollama Python SDK, you can now build applications that leverage the model's &lt;strong&gt;reasoning capabilities&lt;/strong&gt;. This opens up exciting possibilities for creating &lt;strong&gt;more intelligent local AI agents&lt;/strong&gt; that can think through complex tasks and provide &lt;strong&gt;better answers&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Enjoy building!&lt;/strong&gt; If this guide saved you time, consider sharing a ❤️ on this post. Thank you for your support, and happy coding! 🚀&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://ollama.com/" rel="noopener noreferrer"&gt;Ollama Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/ollama/ollama-python" rel="noopener noreferrer"&gt;Ollama Python SDK&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://rich.readthedocs.io/" rel="noopener noreferrer"&gt;Rich Library Documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>howto</category>
      <category>ollama</category>
      <category>thinkingmode</category>
    </item>
    <item>
      <title>Using Ollama with TypeScript: A Simple Guide</title>
      <dc:creator>Jonathan Gastón Löwenstern</dc:creator>
      <pubDate>Wed, 14 May 2025 20:05:26 +0000</pubDate>
      <link>https://dev.to/jonigl/using-ollama-with-typescript-a-simple-guide-1nf4</link>
      <guid>https://dev.to/jonigl/using-ollama-with-typescript-a-simple-guide-1nf4</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc5ezoxn4who7emw42kjl.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc5ezoxn4who7emw42kjl.jpg" alt="Cover article image" width="800" height="535"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once you've installed Ollama and experimented with running models from the command line, the next logical step is to integrate these powerful AI capabilities into your TypeScript applications. This guide will show you how to use Ollama with TypeScript.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: If you prefer Python, check out my other guide: &lt;a href="https://dev.to/jonigl/using-ollama-with-python-a-simple-guide-27c2"&gt;Using Ollama with Python: A Simple Guide&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Setting Up
&lt;/h2&gt;

&lt;p&gt;First, make sure Ollama is installed and running on your system.&lt;/p&gt;

&lt;p&gt;You can check this other article &lt;a href="https://dev.to/jonigl/using-ollama-with-python-a-simple-guide-27c2"&gt;Getting Started with Ollama: Run LLMs on Your Computer&lt;/a&gt; if you are not familiar with Ollama yet.&lt;/p&gt;

&lt;h3&gt;
  
  
  Required Ollama Models
&lt;/h3&gt;

&lt;p&gt;Before running the TypeScript examples in this guide, make sure you have the necessary models pulled. You can pull them using the Ollama CLI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Pull the models used in these examples
ollama pull llama3.2:1b  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You only need to pull a model once. Check which models you already have with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ollama list
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
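
&lt;p&gt;Once your project is set up (next section), you can also verify model availability from code. Here is a small sketch, assuming the same local Ollama instance; &lt;code&gt;ollama.list()&lt;/code&gt; returns the same inventory as the CLI command above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;import { Ollama } from "ollama";

async function main() {
  const ollama = new Ollama();

  // List the locally available models, like `ollama list`
  const { models } = await ollama.list();

  const hasModel = models.some((m) =&amp;gt; m.name === "llama3.2:1b");
  console.log(hasModel ? "llama3.2:1b is ready" : "Run: ollama pull llama3.2:1b");
}

main().catch(console.error);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;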



&lt;h2&gt;
  
  
  Setting Up Your TypeScript Project
&lt;/h2&gt;

&lt;p&gt;Let's create a new TypeScript project to work with Ollama:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create a new directory for your project&lt;/span&gt;
&lt;span class="nb"&gt;mkdir &lt;/span&gt;ollama-typescript
&lt;span class="nb"&gt;cd &lt;/span&gt;ollama-typescript

&lt;span class="c"&gt;# Initialize a new Node.js project&lt;/span&gt;
npm init &lt;span class="nt"&gt;-y&lt;/span&gt;

&lt;span class="c"&gt;# Install TypeScript and tsx for running TypeScript files&lt;/span&gt;
npm &lt;span class="nb"&gt;install &lt;/span&gt;typescript @types/node &lt;span class="nt"&gt;--save-dev&lt;/span&gt;
npm &lt;span class="nb"&gt;install &lt;/span&gt;tsx &lt;span class="nt"&gt;--save-dev&lt;/span&gt;

&lt;span class="c"&gt;# Install the Ollama library&lt;/span&gt;
npm &lt;span class="nb"&gt;install &lt;/span&gt;ollama
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, create a TypeScript configuration file (&lt;code&gt;tsconfig.json&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"compilerOptions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"target"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ES2020"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"module"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"NodeNext"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"moduleResolution"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"NodeNext"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"esModuleInterop"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"strict"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"outDir"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"./dist"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"include"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"src/**/*"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create a &lt;code&gt;src&lt;/code&gt; directory for your TypeScript files:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;src
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Basic Usage
&lt;/h2&gt;

&lt;p&gt;Let's start with a simple example using the Llama 3.2 1B model.&lt;/p&gt;

&lt;p&gt;Create a file named &lt;code&gt;src/generate.ts&lt;/code&gt; with this content:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Ollama&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ollama&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ollama&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Ollama&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="c1"&gt;// Regular response&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;ollama&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;llama3.2:1b&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Why is the sky blue?&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="k"&gt;catch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run this example with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx tsx src/generate.ts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will output the model's explanation of why the sky is blue as a complete response.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhkxgfi26h52aiohxi2sc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhkxgfi26h52aiohxi2sc.png" alt="basic usage screenshot" width="800" height="290"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Streaming Responses
&lt;/h2&gt;

&lt;p&gt;For a more interactive experience, you can get the response as it's being generated.&lt;/p&gt;

&lt;p&gt;Create a file named &lt;code&gt;src/generate-stream.ts&lt;/code&gt; with this content:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Ollama&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ollama&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ollama&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Ollama&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Streaming response:&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Streaming response&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;ollama&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;llama3.2:1b&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Why is the sky blue?&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;await &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stdout&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// New line at the end&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="k"&gt;catch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This displays the response incrementally as it's generated, creating a more interactive experience.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7egufzvz2yuw7yits0rx.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7egufzvz2yuw7yits0rx.gif" alt="streaming usage screenshot" width="1928" height="918"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How Streaming Works in TypeScript
&lt;/h2&gt;

&lt;p&gt;When you use the streaming functionality with Ollama in TypeScript, &lt;code&gt;ollama.generate()&lt;/code&gt; with &lt;code&gt;stream: true&lt;/code&gt; returns an &lt;code&gt;AsyncIterable&lt;/code&gt; that you can consume with a &lt;code&gt;for await...of&lt;/code&gt; loop, processing each chunk as it becomes available from the model:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Each chunk contains a small piece of the response in &lt;code&gt;chunk.response&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Using &lt;code&gt;process.stdout.write()&lt;/code&gt; prevents adding newlines between chunks&lt;/li&gt;
&lt;li&gt;The chunks are displayed immediately as they arrive&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This creates the effect of watching the AI "think" in real-time, similar to watching someone type.&lt;/p&gt;
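
&lt;p&gt;To make the pattern reusable, here is a small helper sketch (same assumptions as the examples above: a local Ollama instance with &lt;code&gt;llama3.2:1b&lt;/code&gt; pulled) that prints each chunk as it arrives while also collecting the full response for later use:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;import { Ollama } from "ollama";

// Sketch: stream a generation to stdout and return the full text
async function streamToString(prompt: string): Promise&amp;lt;string&amp;gt; {
  const ollama = new Ollama();

  const stream = await ollama.generate({
    model: "llama3.2:1b",
    prompt,
    stream: true,
  });

  let full = "";
  for await (const chunk of stream) {
    process.stdout.write(chunk.response); // print incrementally
    full += chunk.response;               // accumulate the pieces
  }
  console.log(); // final newline
  return full;
}

streamToString("Why is the sky blue?").catch(console.error);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;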

&lt;h2&gt;
  
  
  Using System Prompts
&lt;/h2&gt;

&lt;p&gt;The system prompt allows you to set context and instructions for the model before the conversation starts. It's a powerful way to define the model's behavior.&lt;/p&gt;

&lt;p&gt;Create a file named &lt;code&gt;src/chat-system-role.ts&lt;/code&gt; with this content:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Ollama&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ollama&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ollama&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Ollama&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="c1"&gt;// Define a system prompt&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;systemPrompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;You speak and sound like a pirate with short sentences.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="c1"&gt;// Chat with a system prompt&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;ollama&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;llama3.2:1b&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;system&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;systemPrompt&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Tell me about your boat.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="k"&gt;catch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The system prompt influences how the model responds to every user message in the request. Note that it persists across turns only if you include it in the &lt;code&gt;messages&lt;/code&gt; array of each call, as the next example does.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdv2pxrj0m7fd17depkti.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdv2pxrj0m7fd17depkti.png" alt="system prompt usage screenshot" width="800" height="123"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conversational Context
&lt;/h2&gt;

&lt;p&gt;Maintain a conversation with context using streaming for a more interactive experience.&lt;/p&gt;

&lt;p&gt;Create a file named &lt;code&gt;src/chat-history-stream.ts&lt;/code&gt; with this content:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Ollama&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ollama&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;readline&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;readline&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ollama&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Ollama&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="c1"&gt;// Initialize an empty message history&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;rl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;readline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createInterface&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stdin&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stdout&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;askQuestion&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;rl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;question&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Chat with history: &lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userInput&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userInput&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toLowerCase&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;exit&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;rl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;

      &lt;span class="c1"&gt;// Get streaming response while maintaining conversation history&lt;/span&gt;
      &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;responseContent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;ollama&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;llama3.2:1b&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
          &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;system&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;You are a helpful assistant. You only give a short sentence by answer.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
          &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;userInput&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="na"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
      &lt;span class="p"&gt;});&lt;/span&gt;

      &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;await &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;responseChunk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
          &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stdout&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;responseChunk&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
          &lt;span class="nx"&gt;responseContent&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;responseChunk&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;

      &lt;span class="c1"&gt;// Add the exchange to the conversation history&lt;/span&gt;
      &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;userInput&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;assistant&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;responseContent&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;);&lt;/span&gt;

      &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// Add space after response&lt;/span&gt;
      &lt;span class="nf"&gt;askQuestion&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;

  &lt;span class="nf"&gt;askQuestion&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="k"&gt;catch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here you can see what this example looks like.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsd3bsscn8dk4b9xo94z6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsd3bsscn8dk4b9xo94z6.png" alt="chat history usage screenshot" width="800" height="172"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The Ollama TypeScript library makes it easy to integrate powerful language models into your TypeScript and JavaScript applications. Whether you're building a simple script, a Node.js application, or integrating AI into a web app, the library's straightforward API allows you to focus on creating value rather than managing the underlying AI infrastructure.&lt;/p&gt;

&lt;p&gt;As you become more comfortable with the basics, explore more advanced features and consider how you can use these capabilities to solve real-world problems in your projects.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;p&gt;In this GitHub repository, you'll find working code examples: &lt;a href="https://github.com/jonigl/ai-publications/tree/main/publications/2025/03/Using%20Ollama%20with%20TypeScript%20-%20A%20Simple%20Guide/code" rel="noopener noreferrer"&gt;GitHub Repository&lt;/a&gt;&lt;/p&gt;

</description>
      <category>howto</category>
      <category>ollama</category>
      <category>ai</category>
      <category>typescript</category>
    </item>
    <item>
      <title>Using Ollama with Python: A Simple Guide</title>
      <dc:creator>Jonathan Gastón Löwenstern</dc:creator>
      <pubDate>Mon, 12 May 2025 13:10:49 +0000</pubDate>
      <link>https://dev.to/jonigl/using-ollama-with-python-a-simple-guide-27c2</link>
      <guid>https://dev.to/jonigl/using-ollama-with-python-a-simple-guide-27c2</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4jd3b05xdxlecmm3bat0.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4jd3b05xdxlecmm3bat0.webp" alt="Cover image" width="800" height="567"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once you’ve installed Ollama and experimented with running models from the command line, the next logical step is to integrate these powerful AI capabilities into your Python applications. This guide will show you how to use Ollama with Python.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting Up
&lt;/h2&gt;

&lt;p&gt;First, make sure Ollama is installed and running on your system.&lt;/p&gt;

&lt;p&gt;You can check out this other article, &lt;a href="https://dev.to/jonigl/getting-started-with-ollama-run-llms-on-your-computer-35d6"&gt;Getting Started with Ollama: Run LLMs on Your Computer&lt;/a&gt;, if you are not familiar with Ollama yet.&lt;/p&gt;

&lt;h3&gt;
  
  
  Required Ollama Models
&lt;/h3&gt;

&lt;p&gt;Before running the Python examples in this guide, make sure you have the necessary models pulled. You can pull them using the Ollama CLI:&lt;/p&gt;

&lt;h4&gt;
  
  
  Pull the models used in these examples
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ollama pull llama3.2:1b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You only need to pull these models once. Check which models you already have with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ollama list
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
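
&lt;p&gt;The output will look something like this (the model names, IDs, and sizes below are illustrative; yours will differ):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;NAME           ID              SIZE      MODIFIED
llama3.2:1b    baf6a787fdff    1.3 GB    2 minutes ago
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;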



&lt;h3&gt;
  
  
  Creating a Virtual Environment
&lt;/h3&gt;

&lt;p&gt;It’s a good practice to use a virtual environment for your Python projects. This keeps your dependencies isolated and makes your project more portable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Create a virtual environment  

python -m venv ollama-env  

# Activate the virtual environment  

# On Windows:  

ollama-env\Scripts\activate  

# On macOS/Linux:  
source ollama-env/bin/activate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Installing Dependencies
&lt;/h4&gt;

&lt;p&gt;Install the Ollama Python library:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install ollama
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Creating a requirements.txt
&lt;/h4&gt;

&lt;p&gt;For better project management, create a requirements.txt file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip freeze &amp;gt; requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To install from this file in the future:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install -r requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Basic Usage
&lt;/h2&gt;

&lt;p&gt;Let’s start with a simple example using the Llama 3.2 1B model.&lt;/p&gt;

&lt;p&gt;Create a file named  &lt;code&gt;generate.py&lt;/code&gt;  with this content:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from ollama import generate  
# Regular response  
response = generate('llama3.2:1b', 'Why is the sky blue?')  
print(response['response'])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will output the model’s explanation of why the sky is blue as a complete response.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2d7sceoe697rairkd1m2.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2d7sceoe697rairkd1m2.webp" alt="basic usage screenshot" width="800" height="256"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Streaming Responses
&lt;/h2&gt;

&lt;p&gt;For a more interactive experience, you can get the response as it’s being generated.&lt;/p&gt;

&lt;p&gt;Create a file named  &lt;code&gt;generate-stream.py&lt;/code&gt;  with this content:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from ollama import generate  
# Streaming response  
print("Streaming response:")  
for chunk in generate('llama3.2:1b', 'Why is the sky blue?', stream=True):  
    print(chunk['response'], end='', flush=True)  
print()  # New line at the end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This displays the response incrementally as it’s generated, creating a more interactive experience.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9ac2mp2f001o18m3cosq.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9ac2mp2f001o18m3cosq.gif" alt="streaming usage screenshot" width="1408" height="364"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Why is &lt;code&gt;for chunk in generate&lt;/code&gt; used?
&lt;/h3&gt;

&lt;p&gt;When you use the streaming functionality with Ollama, the response isn’t returned all at once. Instead, it’s broken into small pieces (chunks) that arrive one at a time as they’re generated by the model.&lt;/p&gt;

&lt;p&gt;The  &lt;code&gt;generate()&lt;/code&gt;  function with  &lt;code&gt;stream=True&lt;/code&gt;  returns an iterator in Python. This iterator yields new chunks of text as they become available from the model. The  &lt;code&gt;for&lt;/code&gt;  loop processes these chunks one by one as they arrive:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Each chunk contains a small piece of the response in  &lt;code&gt;chunk['response']&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt; The  &lt;code&gt;end=''&lt;/code&gt;  parameter prevents adding newlines between chunks&lt;/li&gt;
&lt;li&gt; The  &lt;code&gt;flush=True&lt;/code&gt;  ensures text displays immediately&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This creates the effect of watching the AI “think” in real-time, similar to watching someone type.&lt;/p&gt;
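
&lt;p&gt;If you also want the complete answer afterwards, for logging or for chat history as in the example later in this guide, you can accumulate the chunks while streaming. A minimal sketch:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from ollama import generate

# Print each chunk as it arrives and keep the full text for later use
full_response = ""
for chunk in generate('llama3.2:1b', 'Why is the sky blue?', stream=True):
    piece = chunk['response']     # each chunk carries a small text fragment
    print(piece, end='', flush=True)
    full_response += piece
print()
print(f"Total characters received: {len(full_response)}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;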

&lt;h2&gt;
  
  
  Using System Prompts
&lt;/h2&gt;

&lt;p&gt;The system prompt allows you to set context and instructions for the model before the conversation starts. It’s a powerful way to define the model’s behavior.&lt;/p&gt;

&lt;p&gt;Create a file named  &lt;code&gt;chat-system-role.py&lt;/code&gt;  with this content:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from ollama import chat  

# Define a system prompt  
system_prompt = "You speaks and sounds like a pirate with short sentences."  
# Chat with a system prompt  
response = chat('llama3.2:1b',   
                messages=[  
                    {'role': 'system', 'content': system_prompt},  
                    {'role': 'user', 'content': 'Tell me about your boat.'}  
                ])  
print(response.message.content)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The system prompt stays active throughout the conversation, influencing how the model responds to all user inputs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fafd1p7c0mj47tinlo118.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fafd1p7c0mj47tinlo118.webp" alt="system role usage screenshot" width="800" height="136"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conversational Context
&lt;/h2&gt;

&lt;p&gt;Maintain a conversation with context using streaming for a more interactive experience.&lt;/p&gt;

&lt;p&gt;Create a file named  &lt;code&gt;chat-history-stream.py&lt;/code&gt;  with this content:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from ollama import chat  

# Initialize an empty message history  
messages = []  
while True:  
    user_input = input('Chat with history: ')  
    if user_input.lower() == 'exit':  
        break  
    # Get streaming response while maintaining conversation history  
    response_content = ""  
    for chunk in chat(  
        'llama3.2:1b',  
        messages=messages + [  
            {'role': 'system', 'content': 'You are a helpful assistant. You only give a short sentence by answer.'},  
            {'role': 'user', 'content': user_input},  
        ],  
        stream=True  
    ):  
        if chunk.message:  
            response_chunk = chunk.message.content  
            print(response_chunk, end='', flush=True)  
            response_content += response_chunk  
    # Add the exchange to the conversation history  
    messages += [  
        {'role': 'user', 'content': user_input},  
        {'role': 'assistant', 'content': response_content},  
    ]  
    print('\n')  # Add space after response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here you can see what this example looks like.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn2gcee8hq4qb8uyygzra.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn2gcee8hq4qb8uyygzra.gif" alt="Chat usage screenshot" width="1190" height="338"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The Ollama Python library makes it easy to integrate powerful language models into your Python applications. Whether you’re building a simple script or a complex application, the library’s straightforward API allows you to focus on creating value rather than managing the underlying AI infrastructure.&lt;/p&gt;

&lt;p&gt;As you become more comfortable with the basics, explore more advanced features and consider how you can use these capabilities to solve real-world problems in your projects.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;p&gt;In this GitHub repository, you'll find working code examples: &lt;a href="https://github.com/jonigl/ai-publications/tree/main/publications/2025/02/Using%20Ollama%20with%20Python%20-%20A%20Simple%20Guide/code" rel="noopener noreferrer"&gt;GitHub Repository&lt;/a&gt;&lt;/p&gt;

</description>
      <category>howto</category>
      <category>ollama</category>
      <category>ai</category>
      <category>python</category>
    </item>
    <item>
      <title>Build an MCP Client in Minutes: Local AI Agents Just Got Real</title>
      <dc:creator>Jonathan Gastón Löwenstern</dc:creator>
      <pubDate>Fri, 09 May 2025 14:48:19 +0000</pubDate>
      <link>https://dev.to/jonigl/build-an-mcp-client-in-minutes-local-ai-agents-just-got-real-4gj6</link>
      <guid>https://dev.to/jonigl/build-an-mcp-client-in-minutes-local-ai-agents-just-got-real-4gj6</guid>
<description>&lt;p&gt;Built an MCP server already? Well done! But that's only half the story. Without a client, your model is shouting into the void.&lt;/p&gt;

&lt;p&gt;Give me &lt;strong&gt;8 minutes&lt;/strong&gt; and you'll:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Ship a full MCP client in under 100 lines&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;First steps into the world of local AI agents&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Plug it into any MCP server&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Keep every byte local. No cloud fees.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Privacy first. No API keys.&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why roll your own client?
&lt;/h2&gt;

&lt;p&gt;Cloud agents are fun until the bill hits, and who really knows where your data ends up. A local MCP client means:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Full privacy &amp;amp; control.&lt;/strong&gt; Every token lives on your machine, nowhere else.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero API keys.&lt;/strong&gt; No provider lock‑in, no surprise invoices.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Good‑enough performance today, better tomorrow.&lt;/strong&gt; Local models aren’t GPT‑4o yet, but they’re getting sharper every release.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The code below is &lt;em&gt;inspired&lt;/em&gt; by the &lt;a href="https://modelcontextprotocol.io/quickstart/client" rel="noopener noreferrer"&gt;official MCP client quickstart&lt;/a&gt; (which targets Anthropic models) and tweaked for Ollama so every byte runs offline. &lt;strong&gt;Let's build! 🚀&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Requirements
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Python 3.10+ (uv handles the virtual env)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ollama installed; follow the &lt;a href="https://ollama.com/" rel="noopener noreferrer"&gt;official install guide&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A local model pulled; the client defaults to &lt;code&gt;llama3.2:3b&lt;/code&gt;, so run &lt;code&gt;ollama pull llama3.2:3b&lt;/code&gt; or switch with &lt;code&gt;--model&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Quick setup with uv
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. create project folder&lt;/span&gt;
&lt;span class="nb"&gt;mkdir &lt;/span&gt;simple-mcp-client &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;cd &lt;/span&gt;simple-mcp-client

&lt;span class="c"&gt;# 2. init uv (fast Python package manager)&lt;/span&gt;
uv init

&lt;span class="c"&gt;# 3. create &amp;amp; activate virtual env&lt;/span&gt;
uv venv
&lt;span class="nb"&gt;source&lt;/span&gt; .venv/bin/activate

&lt;span class="c"&gt;# 4. add deps&lt;/span&gt;
uv add mcp ollama rich

&lt;span class="c"&gt;# 5. drop in the two files&lt;/span&gt;
client.py   &lt;span class="c"&gt;# we will code this in this article&lt;/span&gt;
server.py   &lt;span class="c"&gt;# from this repo https://github.com/jonigl/mcp-client-for-ollama/blob/simple-client/server.py&lt;/span&gt;

&lt;span class="c"&gt;# 6. run it&lt;/span&gt;
uv run client.py &lt;span class="nt"&gt;--mcp-server&lt;/span&gt; server.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Five commands and you're chatting locally ⚡️&lt;/p&gt;

&lt;h2&gt;
  
  
  The 93‑line client
&lt;/h2&gt;

&lt;p&gt;Below is the heart of our client. Copy it, paste it, run it. I'll highlight the spicy parts right after.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# client.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;argparse&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;contextlib&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AsyncExitStack&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ollama&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;ollama&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatResponse&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mcp&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ClientSession&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;StdioServerParameters&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mcp.client.stdio&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;stdio_client&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;rich.console&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Console&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;rich.markdown&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Markdown&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MCPClient&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Initialize the MCP client with a model and console
&lt;/span&gt;    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llama3.2:3b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;              
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;console&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exit_stack&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AsyncExitStack&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ollama&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ollama&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;AsyncClient&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ClientSession&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

    &lt;span class="c1"&gt;# Connect to the MCP server using the provided script path
&lt;/span&gt;    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;connect_to_server&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;server_script_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;        
        &lt;span class="n"&gt;is_python&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;server_script_path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;endswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.py&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;is_js&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;server_script_path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;endswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.js&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;is_python&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;is_js&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Server script must be a .py or .js file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;command&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;python&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;is_python&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;node&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;server_params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StdioServerParameters&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;command&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;command&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;server_script_path&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;stdio_transport&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exit_stack&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enter_async_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;stdio_client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;server_params&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stdio&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;write&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;stdio_transport&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exit_stack&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enter_async_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ClientSession&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stdio&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;write&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;initialize&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="c1"&gt;# List tools
&lt;/span&gt;        &lt;span class="n"&gt;meta&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;list_tools&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Server connected. Tools:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;style&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dim green&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Process a query by sending it to the model and handling tool calls
&lt;/span&gt;    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
        &lt;span class="n"&gt;meta&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;list_tools&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;tools_meta&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parameters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;inputSchema&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="c1"&gt;# Initial call
&lt;/span&gt;        &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ChatResponse&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ollama&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tools_meta&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;final&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;getattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;final&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# Check for tool calls
&lt;/span&gt;        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="c1"&gt;# Call the tool
&lt;/span&gt;                &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;call_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;                
                &lt;span class="c1"&gt;# Call the model again with the tool result, so we can get the final answer. Max tokens is set to 500 speed up the process. Adjust as needed.
&lt;/span&gt;                &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ollama&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tools_meta&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
                &lt;span class="n"&gt;final&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;final&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Main loop for user interaction
&lt;/span&gt;    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;chat_loop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[bold green]MCP Client Started![/bold green] [cyan]Model: {}[/cyan]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[yellow]Type your queries below or [bold]&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;quit&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;[/bold] to exit.[/yellow]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[bold blue]Query:[/bold blue] &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;quit&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;break&lt;/span&gt;
            &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;ans&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;process_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Markdown&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ans&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;style&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;orange3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;style&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bold red&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;cleanup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exit_stack&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;aclose&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Main function to parse arguments and run the client
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;argparse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ArgumentParser&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--mcp-server&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;required&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llama3.2:3b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse_args&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MCPClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect_to_server&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mcp_server&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat_loop&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;finally&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cleanup&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Run it
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv run client.py &lt;span class="nt"&gt;--mcp-server&lt;/span&gt; server.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Ask something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;What's the weather in Tokyo?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your server exposes &lt;code&gt;get_weather&lt;/code&gt;, the model silently calls it and answers like a pro. That's &lt;strong&gt;MCP + Ollama&lt;/strong&gt; working together on your desk.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbvb37xobqqbzri8ojk5q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbvb37xobqqbzri8ojk5q.png" alt="terminal execution" width="800" height="176"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Here you can see what it looks like!&lt;/em&gt; 🤩&lt;/p&gt;
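
&lt;p&gt;For context, the server side of &lt;code&gt;get_weather&lt;/code&gt; can be tiny. Here's a hypothetical minimal sketch using &lt;code&gt;FastMCP&lt;/code&gt; with canned data; the real, ready-to-run &lt;code&gt;server.py&lt;/code&gt; is in the repo linked below:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# server-sketch.py: hypothetical minimal MCP server exposing get_weather
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("weather")

@mcp.tool()
def get_weather(city: str) -&amp;gt; str:
    """Return a (canned) weather report for the given city."""
    return f"It's sunny and 22°C in {city}."  # stub data for the demo

if __name__ == "__main__":
    mcp.run(transport="stdio")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;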

&lt;h2&gt;
  
  
  How it works
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;connect_to_server&lt;/strong&gt; launches the mcp server (&lt;code&gt;python&lt;/code&gt; or &lt;code&gt;node&lt;/code&gt;) and chats over stdio.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;list_tools&lt;/strong&gt; grabs available tools and hands them to Ollama as function specs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;process_query&lt;/strong&gt; lets the model decide, executes the chosen tool, then loops back for the final reply.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's the same pattern the big providers use, just running locally; latency depends on your hardware and the model you load. A minimal sketch of the tool-spec mapping follows.&lt;/p&gt;
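
&lt;p&gt;The bridge between the two worlds is the &lt;code&gt;tools_meta&lt;/code&gt; translation in &lt;code&gt;process_query&lt;/code&gt;: each MCP tool becomes a function spec Ollama understands. Pulled out as a standalone helper (a sketch, mirroring the client code above):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Sketch: translate MCP tool metadata into Ollama function specs.
# `meta` is the object returned by session.list_tools() in the client above.
def to_ollama_tools(meta):
    return [
        {
            "type": "function",
            "function": {
                "name": t.name,
                "description": t.description,
                "parameters": t.inputSchema,  # JSON Schema provided by the MCP server
            },
        }
        for t in meta.tools
    ]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;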

&lt;h2&gt;
  
  
  Want turbo mode?
&lt;/h2&gt;

&lt;p&gt;I’ve expanded this simple MCP client into a more feature-rich CLI tool called &lt;code&gt;ollmcp&lt;/code&gt;. Check it out in my repo 👉 &lt;strong&gt;&lt;a href="https://github.com/jonigl/mcp-client-for-ollama" rel="noopener noreferrer"&gt;mcp-client-for-ollama&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Quickstart with &lt;code&gt;ollmcp&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Run it instantly with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uvx ollcmp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or install it globally for easy access:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;ollcmp &lt;span class="nt"&gt;--upgrade&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;ollmcp&lt;/code&gt; ships with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🌐 &lt;strong&gt;Multi‑Server Support&lt;/strong&gt; — connect to several MCP servers at once&lt;/li&gt;
&lt;li&gt;🎨 &lt;strong&gt;Rich Terminal UI&lt;/strong&gt; — slick interactive console&lt;/li&gt;
&lt;li&gt;🚀 &lt;strong&gt;Dynamic Model Switching&lt;/strong&gt; — swap any installed Ollama model on the fly&lt;/li&gt;
&lt;li&gt;🛠️ &lt;strong&gt;Tool Management&lt;/strong&gt; — toggle tools or whole servers mid‑chat&lt;/li&gt;
&lt;li&gt;🔄 &lt;strong&gt;Cross‑Language Servers&lt;/strong&gt; — Python &lt;em&gt;and&lt;/em&gt; JavaScript MCP servers work out of the box&lt;/li&gt;
&lt;li&gt;🔌 &lt;strong&gt;Plug‑and‑Play&lt;/strong&gt; — point at any MCP‑compliant server and go&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And if you want, just fork it, bend it, contribute to it.&lt;/p&gt;

&lt;p&gt;If you found this helpful, consider giving it a reaction ❤️ to show your support!&lt;/p&gt;

&lt;h2&gt;
  
  
  Next moves
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Browse the &lt;a href="https://github.com/modelcontextprotocol/servers" rel="noopener noreferrer"&gt;official MCP server list&lt;/a&gt; and plug one in.&lt;/li&gt;
&lt;li&gt;Try other Ollama models (for example qwen2.5:7b).&lt;/li&gt;
&lt;li&gt;Wrap your own script in MCP and watch your LLM gain a superpower.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Clone it, hack it, show it off.&lt;/strong&gt; Drop a link to your build below and let’s continue learning together.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Grab the full example
&lt;/h3&gt;

&lt;p&gt;All the code, including &lt;code&gt;client.py&lt;/code&gt;, a ready-to-run &lt;code&gt;server.py&lt;/code&gt;, and the &lt;code&gt;uv&lt;/code&gt; project scaffold, is on GitHub:&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;&lt;a href="https://github.com/jonigl/mcp-client-for-ollama/tree/simple-client" rel="noopener noreferrer"&gt;https://github.com/jonigl/mcp-client-for-ollama/tree/simple-client&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Clone it, run it, and start wiring your own tools in minutes 🙌&lt;/p&gt;

&lt;h3&gt;
  
  
  Check out the asciinema demo
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://asciinema.org/a/718592" rel="noopener noreferrer"&gt;https://asciinema.org/a/718592&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Want to know how to build an MCP server?
&lt;/h3&gt;

&lt;p&gt;You can check out my article on how to build an MCP server in minutes here: &lt;a href="https://dev.to/jonigl/your-first-mcp-server-quick-35eg"&gt;Your first MCP Server (quick)&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>ollama</category>
      <category>ai</category>
      <category>agents</category>
    </item>
    <item>
      <title>MCP filesystem: Server disconnected</title>
      <dc:creator>Jonathan Gastón Löwenstern</dc:creator>
      <pubDate>Wed, 07 May 2025 15:23:10 +0000</pubDate>
      <link>https://dev.to/jonigl/mcp-filesystem-server-disconnected-44db</link>
      <guid>https://dev.to/jonigl/mcp-filesystem-server-disconnected-44db</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiiq0pc68s9u4vprr9lfg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiiq0pc68s9u4vprr9lfg.png" alt="MCP filesystem: Server disconnected screenshot image" width="720" height="510"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you’re excited about using your Claude Desktop app with the new Model Context Protocol (MCP) but keep running into frustrating configuration errors, you’re not alone. This is especially common if you’re a Node Version Manager (nvm) user.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;You followed the “Quickstart guide: For Claude Desktop User,” carefully copied the JSON configuration into your own &lt;code&gt;claude_desktop_config.json&lt;/code&gt; file, but it keeps failing.&lt;/p&gt;

&lt;p&gt;You’ll likely see this specific error alert:  &lt;strong&gt;“MCP filesystem: Server disconnected”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After some investigation, I discovered the root cause from a comment in the project’s GitHub repository (see Sources section below):&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The MCP Server Commands environment cannot directly access Node.js executables installed via nvm unless they are properly configured.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solution: Create a Wrapper Script
&lt;/h2&gt;

&lt;p&gt;The solution is elegant: create a wrapper script that ensures the correct Node.js environment is used.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Find your Node.js path
&lt;/h2&gt;

&lt;p&gt;First, determine your nvm-installed Node.js path:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;which node
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will return something like:  &lt;code&gt;/Users/username/.nvm/versions/node/v16.x.x/bin/node&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Create a wrapper script
&lt;/h2&gt;

&lt;p&gt;Create a new file at  &lt;code&gt;/usr/local/bin/npx-for-claude&lt;/code&gt;  with the following content:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/usr/bin/env bash&lt;/span&gt;

&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;PATH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"/Users/YOUR-USERNAME/.nvm/versions/node/YOUR-NODE-VERSION/bin:&lt;/span&gt;&lt;span class="nv"&gt;$PATH&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;  
&lt;span class="nb"&gt;exec &lt;/span&gt;npx &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$@&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Make sure to replace:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;code&gt;YOUR-USERNAME&lt;/code&gt;  with your actual username&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;YOUR-NODE-VERSION&lt;/code&gt;  with your actual Node.js version (e.g., v16.20.0)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 3: Make the wrapper script executable
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;chmod&lt;/span&gt; +x /usr/local/bin/npx-for-claude
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
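

&lt;p&gt;Optionally, sanity-check the wrapper before touching Claude Desktop. Running &lt;code&gt;npx-for-claude --version&lt;/code&gt; in a terminal works; the Python equivalent below is just an illustrative sketch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import subprocess

# The wrapper should behave exactly like npx; --version is a harmless probe.
result = subprocess.run(["npx-for-claude", "--version"],
                        capture_output=True, text=True)
print(result.stdout.strip())  # prints the npx version if the wrapper works
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;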



&lt;h2&gt;
  
  
  Step 4: Configure Claude Desktop
&lt;/h2&gt;

&lt;p&gt;Edit your  &lt;code&gt;claude_desktop_config.json&lt;/code&gt;  file to use the wrapper script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;  
    &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;  
        &lt;/span&gt;&lt;span class="nl"&gt;"filesystem"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;  
            &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx-for-claude"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;  
            &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;  
                &lt;/span&gt;&lt;span class="s2"&gt;"-y"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;  
                &lt;/span&gt;&lt;span class="s2"&gt;"@modelcontextprotocol/server-filesystem"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;  
                &lt;/span&gt;&lt;span class="s2"&gt;"/Users/username/path/to/allowed/directory"&lt;/span&gt;&lt;span class="w"&gt;  
            &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;  
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;  
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;  
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Final Step
&lt;/h2&gt;

&lt;p&gt;Restart your Claude Desktop app, and it should start working with the new configuration!&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Works
&lt;/h2&gt;

&lt;p&gt;This solution works because:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; The wrapper script sets up the correct PATH environment that includes your nvm-installed Node.js binaries&lt;/li&gt;
&lt;li&gt; It then executes the  &lt;code&gt;npx&lt;/code&gt;  command with all the arguments passed to it&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The wrapper script essentially bridges the gap between Claude Desktop’s execution environment and your nvm setup, allowing you to use the standard configuration pattern recommended in the documentation.&lt;/p&gt;
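
&lt;p&gt;You can watch the same mechanism at work from Python: prepending a directory to &lt;code&gt;PATH&lt;/code&gt; changes which binary gets resolved. A small sketch (the nvm path is a placeholder, exactly as in the wrapper script):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os
import shutil

print(shutil.which("npx"))  # whatever a bare environment resolves (maybe None)

# Prepend the nvm bin directory, exactly as the wrapper script does
os.environ["PATH"] = (
    "/Users/YOUR-USERNAME/.nvm/versions/node/YOUR-NODE-VERSION/bin:"
    + os.environ["PATH"]
)
print(shutil.which("npx"))  # now resolves to the nvm-managed npx
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;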

&lt;h2&gt;
  
  
  Source
&lt;/h2&gt;

&lt;p&gt;This solution was inspired by some comments in the ModelContextProtocol GitHub repository:  &lt;a href="https://github.com/modelcontextprotocol/servers/issues/64#issuecomment-2611032995" rel="noopener noreferrer"&gt;Issue #64&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I have also added this solution as a comment on this GitHub issue:  &lt;a href="https://github.com/modelcontextprotocol/servers/issues/64#issuecomment-2730913259" rel="noopener noreferrer"&gt;comment&lt;/a&gt;&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>ai</category>
      <category>howtofix</category>
      <category>howto</category>
    </item>
    <item>
      <title>Getting Started with Ollama: Run LLMs on Your Computer</title>
      <dc:creator>Jonathan Gastón Löwenstern</dc:creator>
      <pubDate>Wed, 07 May 2025 15:17:24 +0000</pubDate>
      <link>https://dev.to/jonigl/getting-started-with-ollama-run-llms-on-your-computer-35d6</link>
      <guid>https://dev.to/jonigl/getting-started-with-ollama-run-llms-on-your-computer-35d6</guid>
      <description>&lt;p&gt;Ollama makes it easy to run large language models (LLMs) locally on your own computer. This simple guide will show you how to install Ollama, run your first model, and use it in a Python script.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installing Ollama
&lt;/h2&gt;

&lt;h3&gt;
  
  
  macOS
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt; Download the installer from  &lt;a href="https://ollama.ai/" rel="noopener noreferrer"&gt;ollama.ai&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt; Open the downloaded file and drag Ollama to your Applications folder&lt;/li&gt;
&lt;li&gt; Open Ollama from your Applications&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Windows
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt; Download the installer from  &lt;a href="https://ollama.ai/" rel="noopener noreferrer"&gt;ollama.ai&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt; Run the .exe file and follow the installation wizard&lt;/li&gt;
&lt;li&gt; Ollama will start automatically when installation completes&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Linux
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt; Just run this command:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -fsSL https://ollama.ai/install.sh | sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Docker
&lt;/h3&gt;

&lt;p&gt;Ollama is also available as a Docker container:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker pull ollama/ollama  
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
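

&lt;p&gt;Once the container is up, you can check that the API is reachable on port 11434. A minimal Python check (Ollama should answer a plain GET on the root path with a short status string):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import urllib.request

# The Ollama server listens on port 11434 by default;
# a GET on / should return "Ollama is running".
print(urllib.request.urlopen("http://localhost:11434/").read().decode())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;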



&lt;h2&gt;
  
  
  Running Your First Model
&lt;/h2&gt;

&lt;p&gt;Let’s try Llama 3.2 1B, a compact but capable model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ollama run llama3.2:1b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first time you run this command, Ollama will download the model. Once it’s ready, you’ll see a prompt where you can start chatting:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; Why is the sky blue?  
The sky appears blue due to a phenomenon called Rayleigh scattering. As sunlight travels through the atmosphere, the shorter blue wavelengths of light are scattered more by air molecules than the longer red wavelengths. This scattered blue light comes to us from all directions in the sky, making the sky appear blue during the day.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To exit the Ollama terminal, you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Type  &lt;code&gt;/bye&lt;/code&gt;  and press Enter&lt;/li&gt;
&lt;li&gt;  Press Ctrl+D (on macOS/Linux)&lt;/li&gt;
&lt;li&gt;  Press Ctrl+C twice&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Basic Ollama Commands
&lt;/h2&gt;

&lt;p&gt;Here are some useful commands to get you started:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# List all your downloaded models  
ollama list
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Download a model without running it  
ollama pull llama3.2:1b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Remove a model you no longer need  
ollama rm llama3.2:1b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Get information about a model  
ollama show llama3.2:1b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Next Steps
&lt;/h3&gt;

&lt;p&gt;Once you’re comfortable with the basics, you can try the Ollama Python library to integrate it into your Python applications. Check out this article: &lt;a href="https://medium.com/@jonigl/using-ollama-with-python-a-simple-guide-0752369e1e55" rel="noopener noreferrer"&gt;Using Ollama with Python: A Simple Guide&lt;/a&gt;.&lt;/p&gt;
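
&lt;p&gt;As a quick taste, here’s a minimal sketch of what that looks like, assuming you’ve installed the library with &lt;code&gt;pip install ollama&lt;/code&gt; and already pulled &lt;code&gt;llama3.2:1b&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import ollama  # pip install ollama

# Send one chat message to the locally running model and print the reply.
response = ollama.chat(
    model="llama3.2:1b",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response["message"]["content"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;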

&lt;p&gt;Enjoy the freedom of running AI locally with Ollama!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>ollama</category>
      <category>howto</category>
      <category>beginners</category>
    </item>
    <item>
      <title>How to Build Your First MCP Server (Fast)</title>
      <dc:creator>Jonathan Gastón Löwenstern</dc:creator>
      <pubDate>Wed, 07 May 2025 13:59:00 +0000</pubDate>
      <link>https://dev.to/jonigl/your-first-mcp-server-quick-35eg</link>
      <guid>https://dev.to/jonigl/your-first-mcp-server-quick-35eg</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;If you’ve landed on this article, you’re probably wondering: &lt;em&gt;“What’s all this MCP stuff about?”&lt;/em&gt; and &lt;em&gt;“Why is it getting so much hype lately?”&lt;/em&gt; Or maybe you already have an idea and just want to build your own MCP server to let LLMs interact with your tools. So, let’s quickly answer the basics to get on the same page and then jump right into building your first MCP Server.&lt;/p&gt;

&lt;h3&gt;
  
  
  What’s all this MCP stuff about?
&lt;/h3&gt;

&lt;p&gt;At this point you probably know what &lt;strong&gt;MCP&lt;/strong&gt; is, but if you don’t, no worries - I’ve got you covered. MCP was introduced last year by Anthropic (the company behind Claude) and stands for &lt;strong&gt;Model Context Protocol&lt;/strong&gt;. It might sound complex at first, but it’s actually quite simple. MCP is a way to let &lt;strong&gt;LLMs&lt;/strong&gt; (&lt;strong&gt;Large Language Models&lt;/strong&gt;) interact with tools in an open, standard way, allowing them to get context from different data sources and even execute tasks on your behalf, so building agents is just around the corner.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why is it getting so much hype lately?
&lt;/h3&gt;

&lt;p&gt;Answering this isn’t easy, because everyone has their own take. But here’s the deal: MCP gives the &lt;strong&gt;community&lt;/strong&gt; a &lt;strong&gt;standard&lt;/strong&gt; way to build connectors for AI apps, so you don’t have to build everything yourself. Most tools can be &lt;strong&gt;plug-and-play.&lt;/strong&gt; Third-party services can run their own servers, and you can use them without needing custom implementations. That &lt;strong&gt;simplicity&lt;/strong&gt; is what’s getting everyone excited and backing MCP as the go-to standard.&lt;/p&gt;

&lt;h2&gt;
  
  
  The clock is ticking! How can I build my first MCP Server (fast)?
&lt;/h2&gt;

&lt;p&gt;First of all, we need to decide what tool we want to make available to LLMs. Let’s keep it simple and create a get-weather-by-city tool, which is kind of a “hello world” example. For this we will use wttr.in, which, as they put it, is “the right way to check curl the weather!”, and it will make building this weather server really straightforward!&lt;/p&gt;
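
&lt;p&gt;If you want to preview what wttr.in returns before wiring it into MCP, a couple of lines of plain Python will do (the &lt;code&gt;format=%C+%t&lt;/code&gt; query asks for the condition plus the temperature, the same format our server will use):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import urllib.parse
import urllib.request

# %C = weather condition, %t = temperature (wttr.in one-line format)
city = urllib.parse.quote_plus("Ibiza")
url = f"https://wttr.in/{city}?format=%C+%t"
print(urllib.request.urlopen(url).read().decode("utf-8"))  # e.g. "Sunny +20°C"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;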

&lt;p&gt;Now that we know what tool we want to build, we need to select one of the &lt;strong&gt;SDKs&lt;/strong&gt; available. I will pick one for you this time, selecting the &lt;strong&gt;python&lt;/strong&gt; one for our first server.&lt;/p&gt;

&lt;p&gt;We will use &lt;a href="https://github.com/astral-sh/uv?tab=readme-ov-file#installation" rel="noopener noreferrer"&gt;&lt;strong&gt;uv&lt;/strong&gt;&lt;/a&gt; to manage our &lt;strong&gt;Python project&lt;/strong&gt;. Open your favorite terminal and follow me on these commands. Let’s create a project together:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you are not familiar with &lt;strong&gt;uv&lt;/strong&gt; you can check &lt;a href="https://github.com/astral-sh/uv?tab=readme-ov-file#installation" rel="noopener noreferrer"&gt;this page&lt;/a&gt; and learn how to use it and install it.&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv init weather-mcp-server  
&lt;span class="nb"&gt;cd &lt;/span&gt;weather-mcp-server
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add the mcp dependency to your project:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv add &lt;span class="s2"&gt;"mcp[cli]"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we need a file to write our MCP server code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;touch &lt;/span&gt;weather-server.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Nice, we are ready to start writing code! Open this file in your favorite code editor. If you’re new to &lt;strong&gt;decorators&lt;/strong&gt; or &lt;strong&gt;docstrings&lt;/strong&gt;, pay extra attention to those parts, they’re key to how tools are defined and described for the LLM. So, don’t forget to take a look at the comments.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# FastMCP is all we need from the mcp dependency  
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mcp.server.fastmcp&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastMCP&lt;/span&gt;  
&lt;span class="c1"&gt;# We will use these modules to request the weather from wttr.in   
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;urllib.parse&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;urllib.request&lt;/span&gt;  

&lt;span class="c1"&gt;# Now let's create an MCP Server  
&lt;/span&gt;&lt;span class="n"&gt;mcp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastMCP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Weather&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  

&lt;span class="c1"&gt;# Now let's register a tool with this decorator,   
&lt;/span&gt;&lt;span class="nd"&gt;@mcp.tool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_weather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="c1"&gt;# define a function with city argument  
&lt;/span&gt;  &lt;span class="c1"&gt;# And now we will document this function using Python docstrings  
&lt;/span&gt;  &lt;span class="c1"&gt;# FastMCP will add this documentation to the LLM so it can decide when to use  
&lt;/span&gt;  &lt;span class="c1"&gt;# this tool and how to use it.  
&lt;/span&gt;  &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;  
  Get the current weather for a given city  
  Args:  
    city (str): The name of the city  
  Returns:  
    str: The current weather in the city, for example, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Sunny +20°C&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  
  &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;  
  &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  
    &lt;span class="c1"&gt;# URL-encode the city name.  
&lt;/span&gt;    &lt;span class="n"&gt;url_encoded_city&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;urllib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;quote_plus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  
    &lt;span class="c1"&gt;# Prepare the wttr.in URL request  
&lt;/span&gt;    &lt;span class="n"&gt;wttr_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://wttr.in/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;url_encoded_city&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;?format=%C+%t&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;  
    &lt;span class="c1"&gt;# Request weather  
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;urllib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;urlopen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;wttr_url&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;    
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  
  &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  
    &lt;span class="c1"&gt;# If something goes wrong we let the LLM know about it  
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error fetching weather data: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  

&lt;span class="c1"&gt;# And here we add the main entry point for the server  
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  
  &lt;span class="c1"&gt;# Here we initialize and run the server  
&lt;/span&gt;  &lt;span class="c1"&gt;# We select stdio transport for process-based communication.   
&lt;/span&gt;  &lt;span class="c1"&gt;# This allows the server (a child process) to communicate with its   
&lt;/span&gt;  &lt;span class="c1"&gt;# parent process (the client) through pipes using standard input/output.  
&lt;/span&gt;  &lt;span class="n"&gt;mcp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;transport&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;stdio&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Yay! We have our first MCP server ready!&lt;/p&gt;
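
&lt;p&gt;Before plugging it into a full client, you can smoke-test the server from Python with the same &lt;code&gt;mcp&lt;/code&gt; package we installed. This is a minimal sketch using the SDK’s stdio client; the directory path is the same placeholder you’ll see in the JSON config below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    # Launch the server the same way an MCP client would (stdio transport)
    params = StdioServerParameters(
        command="uv",
        args=["--directory", "/REPLACE/ME/WITH/THE/dir/path/to/your/mcp/weather-server/",
              "run", "weather-server.py"],
    )
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print("Tools:", [tool.name for tool in tools.tools])
            result = await session.call_tool("get_weather", {"city": "Ibiza"})
            print("Result:", result.content)

asyncio.run(main())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;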

&lt;h2&gt;
  
  
  Using your MCP Server
&lt;/h2&gt;

&lt;p&gt;Now we need to prepare a configuration &lt;strong&gt;JSON file&lt;/strong&gt; to let local &lt;strong&gt;MCP clients&lt;/strong&gt; like Claude Desktop, VS Code, etc. know how to execute your server.&lt;/p&gt;

&lt;p&gt;Let’s create a file named &lt;code&gt;mcp-servers-config.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;  
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;  
    &lt;/span&gt;&lt;span class="nl"&gt;"weather"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;  
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"uv"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;  
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;  
        &lt;/span&gt;&lt;span class="s2"&gt;"--directory"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;  
        &lt;/span&gt;&lt;span class="s2"&gt;"/REPLACE/ME/WITH/THE/dir/path/to/your/mcp/weather-server/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;  
        &lt;/span&gt;&lt;span class="s2"&gt;"run"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;  
        &lt;/span&gt;&lt;span class="s2"&gt;"weather-server.py"&lt;/span&gt;&lt;span class="w"&gt;  
      &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;  
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;  
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;  
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Make sure you are replacing &lt;code&gt;/REPLACE/ME/WITH/THE/dir/path/to/your/mcp/weather-server/&lt;/code&gt; with the directory where you have your &lt;code&gt;weather-server.py&lt;/code&gt; file.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;At this point we only need to find an &lt;strong&gt;MCP client&lt;/strong&gt; to test it. I have written one that works with &lt;strong&gt;Ollama&lt;/strong&gt; called &lt;strong&gt;&lt;a href="https://github.com/jonigl/mcp-client-for-ollama" rel="noopener noreferrer"&gt;ollmcp&lt;/a&gt;&lt;/strong&gt;. Let’s use it here for simplicity, since it can be executed in any terminal and it is &lt;strong&gt;open source&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you are not familiar with Ollama you can check my article &lt;a href="https://dev.to/jonigl/getting-started-with-ollama-run-llms-on-your-computer-35d6"&gt;Getting Started with Ollama: Run LLMs on Your Computer&lt;/a&gt;&lt;br&gt;
Ollama is a great starting point because it’s easy to set up locally and supports models that run on your machine and can use tools through MCP.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Let’s install the MCP client&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;ollmcp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I recommend using the &lt;strong&gt;&lt;a href="https://ollama.com/library/qwen2.5" rel="noopener noreferrer"&gt;qwen2.5:7b&lt;/a&gt;&lt;/strong&gt; model, but you can use any Ollama model that supports tools.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull qwen2.5:7b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run the MCP client using our JSON configuration pointing to our MCP Server and the model we want to use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# if you are in the same directory as the json file execute it like this  &lt;/span&gt;
ollmcp &lt;span class="nt"&gt;--servers-json&lt;/span&gt; mcp-servers-config.json &lt;span class="nt"&gt;--model&lt;/span&gt; qwen2.5:7b &lt;span class="c"&gt;# run it  &lt;/span&gt;

&lt;span class="c"&gt;# Pro tip: ollmcp --help&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command will run the &lt;code&gt;ollmcp&lt;/code&gt; MCP client, showing the MCP server and the tools available to your Ollama model. You can interact by writing queries, and the LLM will decide when to use your MCP server. Here is how the interface looks:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff2n08286wpwtyyaiqqls.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff2n08286wpwtyyaiqqls.png" alt="ollmcp interface screenshot image" width="800" height="299"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;This is what the &lt;strong&gt;ollmcp&lt;/strong&gt; interface looks like when started&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;As you can see in the screenshot, we have &lt;code&gt;1/1 tools enabled&lt;/code&gt;, which is great! You can see that our &lt;code&gt;weather&lt;/code&gt; MCP Server is listed with one tool called &lt;code&gt;get_weather&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Try writing something like: &lt;code&gt;What is the weather in Ibiza?&lt;/code&gt; and hit return. It might take some time depending on the model and your machine’s resources.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F983bkzsyfndbdfqvy5ux.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F983bkzsyfndbdfqvy5ux.png" alt="ollmcp query screenshot image" width="800" height="374"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The LLM will seamlessly call your tool, grab the response from your MCP Server, and craft a complete answer for you. Congrats, you have your first MCP Server working!&lt;/p&gt;

&lt;p&gt;If you like &lt;code&gt;ollmcp&lt;/code&gt;, you can check out the repo here: &lt;a href="https://github.com/jonigl/mcp-client-for-ollama" rel="noopener noreferrer"&gt;https://github.com/jonigl/mcp-client-for-ollama&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;MCP servers are super easy to build and let you share your APIs, apps, or tools with different AI applications without much hassle. And if you’re making an AI app yourself, you can tap into the huge number of ready-to-go MCP servers out there, no need to build or maintain everything from scratch!&lt;/p&gt;

&lt;p&gt;Now you’re ready to build even more powerful tools for your AI!&lt;/p&gt;

&lt;p&gt;If you liked this article, you can leave a ⭐️&lt;/p&gt;

&lt;h2&gt;
  
  
  What's next?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  Try different MCP Servers. Check out the &lt;a href="https://github.com/modelcontextprotocol/servers" rel="noopener noreferrer"&gt;official and community list of MCP servers&lt;/a&gt; to find new servers to connect with.&lt;/li&gt;
&lt;li&gt;  Explore different MCP Clients too! You can continue using &lt;a href="https://github.com/jonigl/mcp-client-for-ollama" rel="noopener noreferrer"&gt;ollmcp&lt;/a&gt; as we did earlier, or try other clients like &lt;a href="https://claude.ai/download" rel="noopener noreferrer"&gt;Claude Desktop&lt;/a&gt;, &lt;a href="https://code.visualstudio.com/" rel="noopener noreferrer"&gt;Visual Studio Code&lt;/a&gt;, and more to see how different environments interact with your server.&lt;/li&gt;
&lt;li&gt;  Build your own MCP Server. Base it on what you learned in this article, or dive deeper with the &lt;a href="https://modelcontextprotocol.io/introduction" rel="noopener noreferrer"&gt;official documentation&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;  Write your own MCP Client! I’ll be publishing a new article focused on this topic soon. Let me know in the comments if you’re interested!&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;h3&gt;
  
  
  MCP Server code
&lt;/h3&gt;

&lt;p&gt;You can also find this MCP Server code in the following &lt;a href="//code/README.md"&gt;GitHub repo&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Acknowledgments
&lt;/h2&gt;

&lt;p&gt;Big thanks to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;a href="https://ollama.com/" rel="noopener noreferrer"&gt;&lt;strong&gt;Ollama&lt;/strong&gt;&lt;/a&gt; for providing an awesome local platform to run LLMs easily.&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://modelcontextprotocol.io/" rel="noopener noreferrer"&gt;&lt;strong&gt;Model Context Protocol (MCP)&lt;/strong&gt;&lt;/a&gt; for creating a powerful open standard to connect LLMs with external tools.&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://wttr.in/" rel="noopener noreferrer"&gt;&lt;strong&gt;wttr.in&lt;/strong&gt;&lt;/a&gt; for offering a simple and reliable weather API service used in our example.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>ollama</category>
      <category>mcp</category>
      <category>howto</category>
    </item>
  </channel>
</rss>
