Lingdas1

Posted on • Originally published at github.com

Function Calling for Local LLMs: DeepSeek, Qwen, GLM-4 & LangChain

06 — Function Calling & Tool Use

🔴 Advanced — Give your local LLM superpowers: let it call APIs, run code, search the web, and interact with other software — all autonomously.


What Is Function Calling? (Plain English First)

Imagine you ask an assistant: "What's the weather in Tokyo right now?"

A normal LLM can only guess — it doesn't know today's weather. But with function calling, the LLM can say:

"I don't know the weather, but I know someone who does. Let me call the weather API."

The pattern is simple:

User: "What's the weather in Tokyo?"
  ↓
LLM: "I should call get_weather(city='Tokyo')"
  ↓
Your code: calls the actual weather API → gets result
  ↓
LLM: "The weather in Tokyo is 22°C and sunny."

Function calling = the LLM decides when to use a tool, and your code executes it.

💡 Why this matters without the cloud: On a cloud API (GPT-4, Claude), function calling is a checkbox feature. On local LLMs, it's not automatic — you need to know which models support it, how to format the tool definitions, and how to handle the response correctly. That's what this chapter covers.


How Function Calling Works (The Technical Pattern)

Every function calling flow follows the same 5-step cycle:

Step 1: Define your tools (as JSON schema)
Step 2: Send user message + tool definitions to the LLM
Step 3: LLM responds with either:
         - A normal text reply (no tool needed)
         - A "tool call" request (which tool + what arguments)
Step 4: Your code executes the requested tool
Step 5: Send the tool result back to the LLM
         → LLM produces the final response
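The five steps above can be sketched end to end without a running model. In this minimal example the model is replaced by a stand-in function, so the names `fake_llm`, `run_cycle`, and `REGISTRY` are hypothetical and the weather data is hard-coded; only the message shapes follow the OpenAI-style format used throughout this chapter.

```python
import json

# Step 1: tool definitions
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def get_weather(city):
    # Hypothetical tool body; a real one would call a weather API
    return {"city": city, "temperature_c": 22, "condition": "Sunny"}

REGISTRY = {"get_weather": get_weather}

def fake_llm(messages, tools):
    """Stand-in for the model: requests a tool once, then answers in text."""
    last = messages[-1]
    if last["role"] == "tool":
        data = json.loads(last["content"])
        text = (f"The weather in {data['city']} is "
                f"{data['temperature_c']}°C and {data['condition'].lower()}.")
        return {"role": "assistant", "content": text}
    return {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_1",
            "type": "function",
            "function": {"name": "get_weather",
                         "arguments": json.dumps({"city": "Tokyo"})},
        }],
    }

def run_cycle(user_text, llm=fake_llm, max_rounds=5):
    messages = [{"role": "user", "content": user_text}]
    for _ in range(max_rounds):
        reply = llm(messages, TOOLS)            # Steps 2 and 3
        messages.append(reply)
        calls = reply.get("tool_calls")
        if not calls:
            return reply["content"]             # plain text reply: done
        for call in calls:                      # Step 4
            fn = REGISTRY[call["function"]["name"]]
            args = json.loads(call["function"]["arguments"])
            messages.append({                   # Step 5
                "role": "tool",
                "tool_call_id": call["id"],
                "content": json.dumps(fn(**args)),
            })
    return None  # no text answer within max_rounds

print(run_cycle("What's the weather in Tokyo?"))
# → The weather in Tokyo is 22°C and sunny.
```

Swapping `fake_llm` for a function that posts to a real endpoint leaves the rest of the loop unchanged, which is why the later examples all share this shape.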

Here's what a tool definition looks like in JSON:

{
  "type": "function",
  "function": {
    "name": "get_weather",
    "description": "Get current weather for a city",
    "parameters": {
      "type": "object",
      "properties": {
        "city": {
          "type": "string",
          "description": "City name, e.g., 'Tokyo'"
        }
      },
      "required": ["city"]
    }
  }
}
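Because the `parameters` block is ordinary JSON Schema, your code can check a model's arguments against it before executing anything. The sketch below is a small hand-rolled check (required keys, basic types, enum values), not a full JSON Schema validator; `validate_args` is a hypothetical helper name introduced for illustration.

```python
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g., 'Tokyo'"}
            },
            "required": ["city"],
        },
    },
}

# Maps JSON Schema type names to the Python types json.loads produces
_TYPES = {"string": str, "integer": int, "number": (int, float),
          "boolean": bool, "object": dict, "array": list}

def validate_args(tool_def, args):
    """Return a list of problems; an empty list means the call looks valid."""
    schema = tool_def["function"]["parameters"]
    props = schema.get("properties", {})
    problems = []
    for name in schema.get("required", []):
        if name not in args:
            problems.append(f"missing required argument: {name}")
    for name, value in args.items():
        if name not in props:
            problems.append(f"unexpected argument: {name}")
            continue
        type_name = props[name].get("type")
        expected = _TYPES.get(type_name)
        if expected and not isinstance(value, expected):
            problems.append(
                f"{name}: expected {type_name}, got {type(value).__name__}")
        allowed = props[name].get("enum")
        if allowed and value not in allowed:
            problems.append(f"{name}: {value!r} is not one of {allowed}")
    return problems

print(validate_args(weather_tool, {"city": "Tokyo"}))  # → []
print(validate_args(weather_tool, {}))                 # → ['missing required argument: city']
```

Returning the problem list to the model as the tool result often lets it correct the call on the next round.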

1. DeepSeek-R1: Function Calling

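DeepSeek-R1 uses the same OpenAI-compatible tool format as the other models in this chapter, so the code you'd write for GPT-4 carries over unchanged. Tool support depends on the specific model build and your Ollama version: if the server answers with a "does not support tools" error, update Ollama and re-pull the model, or use one of the models from the following sections.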

Basic Setup

First, make sure DeepSeek-R1 is running locally:

ollama pull deepseek-r1:14b

# Or for smaller setups:
ollama pull deepseek-r1:7b

Single Tool Call Example (Python)

import json
import requests

# Step 1: Define the tools available to the LLM
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                },
                "required": ["city"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "calculator",
            "description": "Perform a mathematical calculation",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "Math expression, e.g., '2 + 2' or 'sqrt(144)'"
                    }
                },
                "required": ["expression"]
            }
        }
    }
]

# Step 2: Send message + tools to the model
def chat_with_tools(messages, tools):
    response = requests.post(
        "http://localhost:11434/v1/chat/completions",
        json={
            "model": "deepseek-r1:14b",
            "messages": messages,
            "tools": tools,
            "stream": False
        }
    )
    return response.json()

# Step 3: Execute tool calls and return results
def execute_tool(tool_call):
    """Execute the tool the LLM requested and return the result."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])

    if name == "get_weather":
        # In real code, you'd call a real weather API here
        city = args["city"]
        unit = args.get("unit", "celsius")
        return json.dumps({
            "city": city,
            "temperature": 22 if unit == "celsius" else 72,
            "condition": "Sunny",
            "humidity": "65%"
        })

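    elif name == "calculator":
        # NOTE: eval() with empty __builtins__ is not a real sandbox. It is fine
        # for a local demo, but parse the expression properly before exposing it.
        try:
            result = eval(args["expression"], {"__builtins__": {}}, {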
                "sqrt": __import__("math").sqrt,
                "sin": __import__("math").sin,
                "cos": __import__("math").cos,
                "pi": __import__("math").pi
            })
            return json.dumps({"result": result})
        except Exception as e:
            return json.dumps({"error": str(e)})

    return json.dumps({"error": f"Unknown tool: {name}"})

# Step 4: Run the full interaction
def run_with_tools(user_message):
    messages = [
        {"role": "system", "content": "You are a helpful assistant with access to tools."},
        {"role": "user", "content": user_message}
    ]

    # First LLM call
    response = chat_with_tools(messages, tools)
    response_message = response["choices"][0]["message"]
    messages.append(response_message)

    # Check if the LLM wants to call tools
    if response_message.get("tool_calls"):
        for tool_call in response_message["tool_calls"]:
            result = execute_tool(tool_call)
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call["id"],
                "content": result
            })

        # Second LLM call — now it has the tool results
        final_response = chat_with_tools(messages, tools)
        return final_response["choices"][0]["message"]["content"]

    return response_message["content"]

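# Test it (replies are illustrative; exact wording varies by model and run)
print(run_with_tools("What's the weather in Tokyo in celsius?"))
# → e.g. "The weather in Tokyo is 22°C and sunny."

# Use ** for powers: in Python, ^ is bitwise XOR, so "2^10" would not mean 1024
print(run_with_tools("Calculate 2**10 + 5*3"))
# → e.g. "The result is 1024 + 15 = 1039."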
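One fragile spot in `execute_tool` is `json.loads(tool_call["function"]["arguments"])`: local models sometimes emit malformed JSON, and some APIs hand back the arguments as an already-parsed object rather than a string. A defensive parser might look like the sketch below; `parse_tool_arguments` is a hypothetical helper, and the set of cases it handles is an assumption about what a local server may return.

```python
import json

def parse_tool_arguments(tool_call):
    """Return the call's arguments as a dict, or raise ValueError."""
    raw = tool_call.get("function", {}).get("arguments")
    if isinstance(raw, dict):
        return raw                      # already parsed by the server
    if raw is None or raw == "":
        return {}                       # tool called with no arguments
    try:
        parsed = json.loads(raw)
    except (TypeError, json.JSONDecodeError) as exc:
        raise ValueError(f"tool arguments are not valid JSON: {raw!r}") from exc
    if not isinstance(parsed, dict):
        raise ValueError(f"tool arguments must be a JSON object, got: {raw!r}")
    return parsed

call = {"id": "call_1",
        "function": {"name": "get_weather", "arguments": '{"city": "Tokyo"}'}}
print(parse_tool_arguments(call))  # → {'city': 'Tokyo'}
```

Catching the `ValueError` and sending its message back as the `tool` result gives the model a chance to retry with well-formed arguments instead of crashing the loop.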

Key Differences from Cloud APIs

| Aspect | GPT-4 (Cloud) | DeepSeek-R1 (Local) |
|---|---|---|
| tool_choice | Supports "auto", "required", "none" | Supports "auto" and "none" |
| Parallel tool calls | ✅ Yes | ✅ Yes (multiple tools in one response) |
| Streaming with tools | ✅ Yes | ⚠️ Partially (use stream: false for reliability) |
| Response format | OpenAI format | OpenAI-compatible ✅ |

Tip: If DeepSeek-R1 doesn't call tools when you expect it to, try adding explicit instructions in the system prompt like: "You have access to tools. Use them when the user asks for information you don't know."


2. Qwen 3.6 / 2.5: Function Calling

Qwen models have native function calling support and are particularly good at following complex tool schemas.

Setup

# Qwen 3.6 (newer, better function calling)
ollama pull qwen3.6:8b

# Or Qwen 2.5 (more widely tested)
ollama pull qwen2.5:7b

Example: Multi-Tool Chatbot

import json
import requests

def qwen_chat_with_tools(messages, tools):
    """Qwen uses the same OpenAI-compatible format."""
    response = requests.post(
        "http://localhost:11434/v1/chat/completions",
        json={
            "model": "qwen3.6:8b",  # or "qwen2.5:7b"
            "messages": messages,
            "tools": tools,
            "tool_choice": "auto",
            "temperature": 0.3,  # Lower = more deterministic tool selection
            "stream": False
        }
    )
    return response.json()

# Define a web search tool (mock)
tools = [
    {
        "type": "function",
        "function": {
            "name": "search_web",
            "description": "Search the internet for current information",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"}
                },
                "required": ["query"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read contents of a file on the local system",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string", "description": "Absolute file path"}
                },
                "required": ["path"]
            }
        }
    }
]

def execute_qwen_tool(tool_call):
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])

    if name == "search_web":
        # In production, use a real search API
        return json.dumps({
            "query": args["query"],
            "results": [
                {"title": f"Result about {args['query']}", "url": "https://example.com"}
            ]
        })
    elif name == "read_file":
        try:
            with open(args["path"], "r") as f:
                content = f.read()[:2000]  # Limit to 2000 chars
            return json.dumps({"path": args["path"], "content": content})
        except Exception as e:
            return json.dumps({"error": str(e)})

    return json.dumps({"error": "Unknown tool"})

# Full interaction loop
messages = [
    {"role": "system", "content": "You are an AI assistant with access to search and file tools. Use them when needed."}
]

user_input = "Can you read my config file and tell me what model I'm using?"
messages.append({"role": "user", "content": user_input})

# First response
response = qwen_chat_with_tools(messages, tools)
msg = response["choices"][0]["message"]
messages.append(msg)

# Handle tool calls
if msg.get("tool_calls"):
    for tc in msg["tool_calls"]:
        result = execute_qwen_tool(tc)
        messages.append({
            "role": "tool",
            "tool_call_id": tc["id"],
            "content": result
        })

    # Get final response
    final = qwen_chat_with_tools(messages, tools)
    print(final["choices"][0]["message"]["content"])
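The `read_file` tool above opens whatever path the model supplies. One way to limit that is to resolve every requested path against a single allowed directory and refuse anything that lands outside it. This is a minimal sketch: `resolve_inside` is a hypothetical helper, and the choice of base directory is up to you.

```python
import tempfile
from pathlib import Path

def resolve_inside(base_dir, requested):
    """Resolve `requested` and make sure it stays inside `base_dir`."""
    base = Path(base_dir).resolve()
    # An absolute `requested` replaces `base` here, then fails the check below
    target = (base / requested).resolve()
    if target != base and base not in target.parents:
        raise PermissionError(f"{requested!r} is outside {base}")
    return target

# Usage: a throwaway directory stands in for your project folder
with tempfile.TemporaryDirectory() as workdir:
    print(resolve_inside(workdir, "config.yaml").name)  # → config.yaml
    try:
        resolve_inside(workdir, "../../etc/passwd")
    except PermissionError as err:
        print("blocked:", err)
```

Inside the tool you would call `resolve_inside(...)` first and only then `open()` the returned path. Because `resolve()` follows symlinks, a link that points outside the base directory is rejected as well.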

Qwen-Specific Tips

| Tip | Why |
|---|---|
| Use temperature: 0.3 | Qwen is more creative by default; a lower temperature gives more reliable tool selection |
| Describe tools in Chinese + English | Qwen was trained bilingually; English descriptions work fine, but Chinese descriptions can improve accuracy |
| Max 5 parallel tools | Qwen 3.6 supports parallel tool calls but performs best with ≤5 at once |
| Use tool_choice: "auto" | States explicitly that the model may call tools. Some local servers ignore this field, so keep the tool instruction in the system prompt as well |

3. GLM-4: Tool Use & Agents

GLM-4 (from Zhipu AI / z.ai) is specifically designed for agentic workflows. It has the strongest tool-use capabilities among Chinese local models — it was trained with tool use as a first-class feature, not an afterthought.

Setup

ollama pull glm4:9b

GLM's Unique Tool Format

GLM uses a slightly different tool definition format. Note the required_parameters field instead of required:

import json
import requests

# GLM tool definition format
glm_tools = [
    {
        "type": "function",
        "function": {
            "name": "send_email",
            "description": "Send an email to a recipient",
            "parameters": {
                "type": "object",
                "properties": {
                    "to": {"type": "string", "description": "Recipient email address"},
                    "subject": {"type": "string", "description": "Email subject"},
                    "body": {"type": "string", "description": "Email body content"}
                },
                "required_parameters": ["to", "subject", "body"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "list_directory",
            "description": "List files in a directory",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string", "description": "Directory path"}
                },
                "required_parameters": ["path"]
            }
        }
    }
]

def glm_chat(messages, tools):
    response = requests.post(
        "http://localhost:11434/v1/chat/completions",
        json={
            "model": "glm4:9b",
            "messages": messages,
            "tools": tools,
            "stream": False
        }
    )
    return response.json()

Multi-Step Agent Example

GLM-4 excels at multi-step reasoning — deciding to call tools in sequence:

messages = [
    {"role": "system", "content": "You are an AI assistant that can use tools. Use them when helpful."},
    {"role": "user", "content": "List the files in /home/user/projects, then tell me which ones are Python files."}
]

# GLM will:
# 1. Call list_directory("/home/user/projects")
# 2. Receive the file list
# 3. Analyze and respond with which are Python files

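response = glm_chat(messages, glm_tools)
msg = response["choices"][0]["message"]
# Keep the assistant's tool-call message in the history so the tool
# results appended below have a request to answer
messages.append(msg)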

if msg.get("tool_calls"):
    for tc in msg["tool_calls"]:
        result = execute_glm_tool(tc)  # Your tool execution function
        messages.append({
            "role": "tool",
            "tool_call_id": tc["id"],
            "content": result
        })

    # GLM will now synthesize the results
    final = glm_chat(messages, glm_tools)
    print(final["choices"][0]["message"]["content"])
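The agent example calls `execute_glm_tool`, which is left for you to write. A minimal sketch for the two tools defined in `glm_tools` could look like this. It assumes arguments arrive as a JSON string, as in the earlier examples, and it deliberately stubs out `send_email` rather than sending anything.

```python
import json
import os

def execute_glm_tool(tool_call):
    """Run the requested tool and return its result as a JSON string."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])

    if name == "list_directory":
        try:
            files = sorted(os.listdir(args["path"]))
            return json.dumps({"path": args["path"], "files": files})
        except OSError as e:
            return json.dumps({"error": str(e)})

    if name == "send_email":
        # Stub: report what would be sent instead of contacting a mail server
        return json.dumps({
            "status": "not_sent",
            "reason": "email sending is stubbed in this sketch",
            "to": args["to"],
            "subject": args["subject"],
        })

    return json.dumps({"error": f"Unknown tool: {name}"})

demo_call = {
    "id": "call_1",
    "function": {
        "name": "send_email",
        "arguments": json.dumps(
            {"to": "a@example.com", "subject": "Hi", "body": "Hello"}),
    },
}
print(execute_glm_tool(demo_call))
```

Returning errors as JSON instead of raising keeps the conversation going: the model sees the failure as a tool result and can explain it or try a different path.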

GLM vs Others: When to Use Each

| Task | Best Model | Why |
|---|---|---|
| Simple tool call (1-2 tools) | DeepSeek-R1:7b | Fastest inference, reliable |
| Complex multi-step (3+ tools) | GLM-4:9b | Best agentic reasoning |
| Following exact tool schema | Qwen 3.6:8b | Most accurate parameter extraction |
| Cost-sensitive (low VRAM) | Qwen 2.5:7b | 4.5GB at Q4, works on most GPUs |

4. LangChain Integration

LangChain is the most popular framework for building LLM-powered applications. Here's how to use your local models with function calling in LangChain.

Installation

pip install langchain langchain-ollama

Basic LangChain + Ollama Tools

# ChatOllama from the langchain-ollama package supports tool binding,
# which create_tool_calling_agent relies on
from langchain_ollama import ChatOllama
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain.tools import tool
from langchain_core.prompts import ChatPromptTemplate

# Step 1: Define tools using the @tool decorator
@tool
def get_weather(city: str) -> str:
    """Get current weather for a city. Input: city name."""
    # Replace with real API call
    return f"The weather in {city} is 22°C and sunny."

@tool
def calculate(expression: str) -> str:
    """Perform a mathematical calculation. Input: math expression string."""
    import math
    safe_dict = {
        "sqrt": math.sqrt, "sin": math.sin, "cos": math.cos,
        "pi": math.pi, "e": math.e, "abs": abs
    }
    try:
        result = eval(expression, {"__builtins__": {}}, safe_dict)
        return f"Result: {result}"
    except Exception as e:
        return f"Error: {e}"

@tool
def search_web(query: str) -> str:
    """Search the web for current information. Input: search query."""
    # In production, use DuckDuckGo or similar
    return f"Top result for '{query}': [Example result]"

# Step 2: Create the LLM
llm = ChatOllama(
    model="qwen2.5:7b",  # or "deepseek-r1:7b", "glm4:9b"
    temperature=0.3,
)

# Step 3: Create the agent
tools = [get_weather, calculate, search_web]
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful AI assistant with access to tools."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])

agent = create_tool_calling_agent(llm, tools, prompt)
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,  # Shows you what tools are being called
    max_iterations=5,  # Safety limit
)

# Step 4: Run it
result = agent_executor.invoke({
    "input": "What's the weather in London and calculate 15% of 200?"
})
print(result["output"])
# → "The weather in London is 22°C and sunny. 15% of 200 is 30."

Running the LangChain Example

# Save the code above as langchain-agent.py
python langchain-agent.py

# You should see:
# > Entering new AgentExecutor chain...
# > Invoking: get_weather with {'city': 'London'}
# > Invoking: calculate with {'expression': '0.15 * 200'}
# > The weather in London is 22°C and sunny. 15% of 200 is 30.
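The `calculate` tool above hands model-written text to `eval`. If you would rather not evaluate arbitrary expressions, one alternative is to parse the input with Python's `ast` module and accept only arithmetic nodes. The sketch below is a swapped-in technique, not the original article's approach; `safe_eval` and the operator tables are hypothetical names, and very large exponents can still be slow.

```python
import ast
import math
import operator

_BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
            ast.Mult: operator.mul, ast.Div: operator.truediv,
            ast.Pow: operator.pow, ast.Mod: operator.mod}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}
_FUNCS = {"sqrt": math.sqrt, "sin": math.sin, "cos": math.cos, "abs": abs}
_CONSTS = {"pi": math.pi, "e": math.e}

def safe_eval(expression):
    """Evaluate a plain arithmetic expression; raise ValueError otherwise."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](walk(node.operand))
        if isinstance(node, ast.Name) and node.id in _CONSTS:
            return _CONSTS[node.id]
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in _FUNCS and not node.keywords):
            return _FUNCS[node.func.id](*[walk(arg) for arg in node.args])
        # Anything else (attribute access, other names, strings) is refused
        raise ValueError(f"unsupported expression element: {type(node).__name__}")

    return walk(ast.parse(expression, mode="eval"))

print(safe_eval("2**10 + 5*3"))  # → 1039
print(safe_eval("sqrt(144)"))    # → 12.0
```

To use it in the agent, replace the `eval(...)` line inside `calculate` with `safe_eval(expression)`; the surrounding `try`/`except` already turns a `ValueError` into an error string for the model.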

Model-Specific LangChain Tips

| Model | LangChain Model Class | Notes |
|---|---|---|
| DeepSeek-R1 | ChatOllama(model="deepseek-r1:14b") | Best for reasoning-heavy agents |
| Qwen 3.6/2.5 | ChatOllama(model="qwen3.6:8b") | Most reliable with LangChain's tool format |
| GLM-4 | ChatOllama(model="glm4:9b") | May need `stop: ["< |

5. Practical: Build a Code Assistant Bot

Let's put it all together — a real tool-using assistant that can:

  • Read and write files
  • Run shell commands
  • Search for packages
  • Answer questions about your codebase

import json
import requests
import subprocess
import os

# === Tool Definitions ===

TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read the contents of a file",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string", "description": "Absolute path to file"}
                },
                "required": ["path"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "write_file",
            "description": "Write content to a file (overwrites existing)",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string"},
                    "content": {"type": "string", "description": "File content"}
                },
                "required": ["path", "content"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "run_command",
            "description": "Run a shell command (read-only, safe commands only)",
            "parameters": {
                "type": "object",
                "properties": {
                    "command": {"type": "string", "description": "Shell command to run"}
                },
                "required": ["command"]
            }
        }
    }
]

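import shlex  # tokenizes commands so they can run without a shell

def execute(name, args):
    if name == "read_file":
        try:
            with open(args["path"], "r") as f:
                return f.read()[:3000]
        except Exception as e:
            return f"Error: {e}"

    elif name == "write_file":
        try:
            with open(args["path"], "w") as f:
                f.write(args["content"])
            return f"Written {len(args['content'])} characters to {args['path']}"
        except Exception as e:
            return f"Error: {e}"

    elif name == "run_command":
        # Safety: allow only a short list of read-only programs. The command is
        # tokenized and run WITHOUT a shell, so "ls; rm -rf ~" cannot chain a
        # second command. Pipes, globs, and redirects are unavailable as a result.
        # "find" is left out because -exec and -delete can modify files.
        safe_commands = {"ls", "cat", "grep", "pwd", "echo", "which", "head", "tail"}
        try:
            argv = shlex.split(args["command"])
        except ValueError as e:
            return f"Error: could not parse command: {e}"
        if not argv or argv[0] not in safe_commands:
            blocked = argv[0] if argv else "(empty)"
            return f"Blocked: '{blocked}' is not in the allowed command list."
        try:
            result = subprocess.run(
                argv, capture_output=True, text=True, timeout=10
            )
            output = (result.stdout + result.stderr)[:3000]
            return output if output else "(no output)"
        except subprocess.TimeoutExpired:
            return "Command timed out after 10 seconds"
        except Exception as e:
            return f"Error: {e}"

    # Tell the model when it asks for a tool that does not exist
    return f"Error: unknown tool '{name}'"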

# === Main Loop ===

def chat_tool(ollama_host="http://localhost:11434", model="qwen2.5:7b"):
    messages = [{
        "role": "system",
        "content": "You are a coding assistant. Use your tools to read files, write code, and run commands."
    }]

    print(f"🤖 Code Assistant ({model}) — type 'quit' to exit\n")

    while True:
        user = input("You: ")
        if user.lower() in ("quit", "exit", "q"):
            break

        messages.append({"role": "user", "content": user})

        # Tool-call loop (max 5 iterations to prevent infinite loops)
        for i in range(5):
            resp = requests.post(
                f"{ollama_host}/v1/chat/completions",
                json={
                    "model": model,
                    "messages": messages,
                    "tools": TOOLS,
                    "stream": False
                }
            ).json()

            msg = resp["choices"][0]["message"]
            messages.append(msg)

            if not msg.get("tool_calls"):
                break  # No more tools needed

            # Execute each tool
            for tc in msg["tool_calls"]:
                fn_name = tc["function"]["name"]
                fn_args = json.loads(tc["function"]["arguments"])
                print(f"  🔧 Calling: {fn_name}({json.dumps(fn_args)})")
                result = execute(fn_name, fn_args)
                messages.append({
                    "role": "tool",
                    "tool_call_id": tc["id"],
                    "content": str(result)
                })

        # Print final response (content is empty if the loop ended on a tool call)
        print(f"🤖 {msg.get('content') or '(no final answer after 5 tool rounds)'}\n")

if __name__ == "__main__":
    chat_tool()

Save and run:

python3 code-assistant.py

Example interaction:

You: Read my main.py and tell me if there are any bugs
  🔧 Calling: read_file({"path": "./main.py"})
🤖 I can see your main.py. It looks mostly fine, but I notice
   line 42 has a typo: "retrun" should be "return".

You: Fix it
  🔧 Calling: read_file({"path": "./main.py"})
  🔧 Calling: write_file({"path": "./main.py", "content": "..."})
🤖 Fixed! Changed "retrun" to "return" on line 42.

Quick Reference: Model Function Calling Support

Feature DeepSeek-R1 Qwen 3.6 / 2.5 GLM-4 Notes
OpenAI format Same tools parameter
Parallel calls Multiple tools at once
tool_choice: "auto" LLM decides when to use tools
tool_choice: "required" ⚠️ Partial Not widely supported locally
Streaming + tools ⚠️ Partial ⚠️ Partial Use stream: false to be safe
Multi-step reasoning Good Very Good Excellent GLM-4 leads on agentic workflows
Min VRAM (Q4) ~4.5 GB (7b) ~5 GB (8b) ~5.5 GB (9b) All fit on 8GB GPUs
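
The VRAM figures in the last row can be approximated with a rule of thumb: the number of parameters times the bits per weight gives the weight size, plus some headroom for the KV cache and runtime buffers. In the sketch below the 1.25 overhead factor is an assumption chosen to land near the table's numbers, not a measured value, and `estimate_vram_gb` is a hypothetical helper.

```python
def estimate_vram_gb(params_billions, bits_per_weight=4, overhead=1.25):
    """Rough VRAM estimate in GB for a quantized model."""
    # 1 billion parameters at 8 bits per weight is roughly 1 GB
    weights_gb = params_billions * bits_per_weight / 8
    # Headroom for KV cache and runtime buffers (assumed factor)
    return weights_gb * overhead

for size in (7, 8, 9):
    print(f"{size}B at Q4: about {estimate_vram_gb(size):.2f} GB")
```

Longer context windows grow the KV cache, so treat the result as a floor for short prompts rather than a guarantee.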

Common Mistakes & Solutions

| Mistake | Symptom | Fix |
|---|---|---|
| Wrong model name | "does not support tools" error | Verify: curl -s http://localhost:11434/api/tags |
| Missing system prompt | Model never calls tools | Add: "You have access to tools. Use them when helpful." |
| Too many tools | Model calls wrong tool | Limit to ≤5 tool definitions per call |
| No tool_choice: "auto" | Model ignores tools | Explicitly set tool_choice: "auto" |
| Infinite tool loop | Model keeps calling tools | Add max_iterations guard (e.g., 5) |
| Temperature too high | Tool calls are random/lazy | Set temperature: 0.3 or lower |
| Wrong Ollama port | Connection refused | Check: ollama serve is running on 11434 |
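
The first fix in the table, checking `/api/tags`, can also be done from Python before you start a chat. This sketch assumes the endpoint returns a JSON object with a `models` list whose entries carry a `name` field; `fetch_tags` and `check_model` are hypothetical helper names, and the sample response is made up so the example runs without a server.

```python
import json
import urllib.request

def fetch_tags(host="http://localhost:11434"):
    """Fetch the installed-model list from a running Ollama server."""
    with urllib.request.urlopen(f"{host}/api/tags", timeout=5) as resp:
        return json.load(resp)

def installed_models(tags_response):
    """Extract model names from an /api/tags response."""
    return sorted(m["name"] for m in tags_response.get("models", []) if "name" in m)

def check_model(wanted, tags_response):
    """Return a one-line status message for the requested model name."""
    names = installed_models(tags_response)
    if wanted in names:
        return f"OK: {wanted} is installed"
    return f"Missing: {wanted}. Installed: {', '.join(names) or '(none)'}"

# Made-up sample response; with a live server use check_model(name, fetch_tags())
sample = {"models": [{"name": "qwen2.5:7b"}, {"name": "glm4:9b"}]}
print(check_model("qwen2.5:7b", sample))       # → OK: qwen2.5:7b is installed
print(check_model("deepseek-r1:14b", sample))  # → Missing: deepseek-r1:14b. Installed: glm4:9b, qwen2.5:7b
```

A `URLError` from `fetch_tags` corresponds to the "Connection refused" row: the server is not running or is listening on a different port.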

What's Next

You now have a local LLM that can see files, run commands, search the web, and execute code. This is the foundation for building:

  • AI coding assistants that read and modify your codebase
  • Personal research agents that search the web and summarize
  • Automation bots that interact with APIs and databases
  • Your own AutoGPT — a multi-step reasoning agent

The GitHub repo has ready-to-run scripts for all the examples above. Star it to get notified when new chapters drop! ⭐


