Emilien Lancelot
Calling code with a local LLM is a hoax


Having a local LLM spew text is nice. But what you really need is for the LLM to execute YOUR code!

Introduction

Is tool calling even doable? Sure, ChatGPT makes it easy. But what about your local LLMs? In this article, we'll try multiple agent frameworks with tool-calling capabilities and see whether our local LLMs can use them.

My configuration is:

  • RTX 4090 with 32 GB of RAM

Using the following LLMs for testing:

  • llama3:8b
  • dolphin-mixtral:8x7b-v2.7-q4_K_M
  • mistral:latest

Powered locally by Ollama.

I. AutoGPT


AutoGPT is a framework that seems nice. It has a cool CLI and a Flutter UI for creating agents from the browser. Its main purpose is to work with your local stuff (documents, audio, videos, etc.).

BUT

It mostly relies on ChatGPT or other proprietary LLM providers to do the heavy lifting. At least, that's how I understand it.

Using local models

The configuration happens in AutoGPT's .env file.

We must trick AutoGPT into using the Ollama endpoint as if it were ChatGPT.

## OPENAI_API_KEY - OpenAI API Key (Example: sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx)
OPENAI_API_KEY="helloworld"

...

## OPENAI_API_BASE_URL - Custom url for the OpenAI API, useful for connecting to custom backends. No effect if USE_AZURE is true, leave blank to keep the default url
# the following is an example:
OPENAI_API_BASE_URL=http://localhost:11434/v1

...

## SMART_LLM - Smart language model (Default: gpt-4-turbo)
SMART_LLM=dolphin-mixtral:8x7b-v2.7-q4_K_M

## FAST_LLM - Fast language model (Default: gpt-3.5-turbo)
FAST_LLM=mistral:latest

This should do the trick.

./autogpt.sh run

value is not a valid enumeration member; permitted: 'text-embedding-ada-002', 'text-embedding-3-small', 'text-embedding-3-large', 'gpt-3.5-turbo-0301', 'gpt-3.5-turbo-0613', 'gpt-3.5-turbo-16k-0613', 'gpt-3.5-turbo-1106', 'gpt-3.5-turbo-0125', 'gpt-3.5-turbo', 'gpt-3.5-turbo-16k', 'gpt-4-0314', 'gpt-4-32k-0314', 'gpt-4-0613', 'gpt-4-32k-0613', 'gpt-4-1106-preview', 'gpt-4-1106-vision-preview', 'gpt-4-0125-preview', 'gpt-4-turbo-2024-04-09', 'gpt-4', 'gpt-4-32k', 'gpt-4-turbo', 'gpt-4-turbo-preview', 'gpt-4-vision-preview' (type=type_error.enum; enum_values=[<OpenAIModelName.EMBEDDING_v2: 'text-embedding-ada-002'>, <OpenAIModelName.EMBEDDING_v3_S: 'text-embedding-3-small'>, <OpenAIModelName.EMBEDDING_v3_L: 'text-embedding-3-large'>, <OpenAIModelName.GPT3_v1: 'gpt-3.5-turbo-0301'>, <OpenAIModelName.GPT3_v2: 'gpt-3.5-turbo-0613'>, <OpenAIModelName.GPT3_v2_16k: 'gpt-3.5-turbo-16k-0613'>, <OpenAIModelName.GPT3_v3: 'gpt-3.5-turbo-1106'>, <OpenAIModelName.GPT3_v4: 'gpt-3.5-turbo-0125'>, <OpenAIModelName.GPT3_ROLLING: 'gpt-3.5-turbo'>, <OpenAIModelName.GPT3_ROLLING_16k: 'gpt-3.5-turbo-16k'>, <OpenAIModelName.GPT4_v1: 'gpt-4-0314'>, <OpenAIModelName.GPT4_v1_32k: 'gpt-4-32k-0314'>, <OpenAIModelName.GPT4_v2: 'gpt-4-0613'>, <OpenAIModelName.GPT4_v2_32k: 'gpt-4-32k-0613'>, <OpenAIModelName.GPT4_v3: 'gpt-4-1106-preview'>, <OpenAIModelName.GPT4_v3_VISION: 'gpt-4-1106-vision-preview'>, <OpenAIModelName.GPT4_v4: 'gpt-4-0125-preview'>, <OpenAIModelName.GPT4_v5: 'gpt-4-turbo-2024-04-09'>, <OpenAIModelName.GPT4_ROLLING: 'gpt-4'>, <OpenAIModelName.GPT4_ROLLING_32k: 'gpt-4-32k'>, <OpenAIModelName.GPT4_TURBO: 'gpt-4-turbo'>, <OpenAIModelName.GPT4_TURBO_PREVIEW: 'gpt-4-turbo-preview'>, <OpenAIModelName.GPT4_VISION: 'gpt-4-vision-preview'>])

Seems like it's not... The model name MUST be one of the proprietary names from the list above, like "gpt-4-turbo". Unfortunately, my models are not named like that.


To see if it could go a bit further with a fake (but compliant) model name, I set "gpt-4-turbo" and ran again.

./autogpt.sh run
2024-05-19 16:03:01,937 ERROR  Invalid OpenAI API key! Please set your OpenAI API key in .env or as an environment variable.
2024-05-19 16:03:01,938 INFO  You can get your key from https://platform.openai.com/account/api-keys

It doesn't like my API key. I've tried many different keys. It won't go further.

Conclusion on AutoGPT

To get past the model-name check you could create a custom model in Ollama called "gpt-4-turbo", based on any local model you already have. It's just a way of renaming your model to trick AutoGPT. But that wouldn't fix the API key error.
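For reference, the aliasing would look something like this (a sketch; pick whichever local base model you like):

# Modelfile — base the alias on a local model you already have
FROM dolphin-mixtral:8x7b-v2.7-q4_K_M

# Then register the alias under a name AutoGPT accepts:
ollama create gpt-4-turbo -f Modelfile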

Also, as mentioned HERE, you could maybe duplicate the OpenAI model-provider file from AutoGPT and strip out the non-compliant parts. But I'm unsure how to perform such an operation.

The documentation doesn't have anything about using local models and doesn't mention calling tools.

In the end, I don't think that AutoGPT is ready for local model use and you should wait and hope the paradigm shifts toward a more local approach.

II. LangChain & LangGraph


LangChain has been at the core of many projects since the beginning of the AI gold rush. If it isn't king already, it's probably because of its complex syntax, which many developers don't have the time to learn.

LangChain has a way of using the most obscure Python features, making you feel like you've never read Python code before.

For instance:

chain = prompt | model | output_parser
chain.invoke("Question.")

The LCEL system (LangChain Expression Language) uses pipes ("|") to string things together. This is made possible by overriding Python's __or__ magic method; in other words, LangChain overloads operators like you would in C++.

But did we really need this kind of idea? I'll let you make up your mind…
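If you're curious, the trick itself is small. Here's a minimal, self-contained sketch of the idea (not LangChain's actual code) showing how overriding __or__ makes the pipe syntax work:

class Runnable:
    """Toy LCEL-style piping — illustrative only, not LangChain's classes."""
    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # "self | other" returns a new Runnable feeding self's output into other
        return Runnable(lambda x: other.func(self.func(x)))

    def invoke(self, value):
        return self.func(value)

prompt = Runnable(lambda q: f"Question: {q}")
model = Runnable(lambda p: f"Answer to [{p}]")
output_parser = Runnable(lambda a: a.upper())

chain = prompt | model | output_parser
print(chain.invoke("Why pipes?"))  # ANSWER TO [QUESTION: WHY PIPES?]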


Now, about using local models:

LangChain has two Ollama plugins:

  • Ollama chat: allows you to chat with an LLM
  • Ollama functions: allows an LLM to answer in a specific output format. For instance, if you want your LLM to answer in JSON or YAML, you can define the format, keys, and value types you expect.

Beware of the "function calling" capability! It's a troll from OpenAI… the worst feature naming possible! It doesn't call functions the way "using tools" would. It's only about formatting the output of the LLM.
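To make the distinction concrete, here is roughly what that "function calling" amounts to with the Ollama integration: constrained output, nothing executed (a sketch assuming the langchain_community ChatOllama wrapper):

from langchain_community.chat_models import ChatOllama

# format="json" forces the model to reply with valid JSON — that's all it does
llm = ChatOllama(model="mistral:latest", format="json")

reply = llm.invoke("Give the capital of France as JSON with keys 'city' and 'country'.")
print(reply.content)  # e.g. {"city": "Paris", "country": "France"} — nothing was executed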

=> Now, what about tool calling (aka executing real code locally)?

Well… The Ollama plugin doesn't have this functionality…

from langchain_core.tools import tool
from langchain_community.chat_models import ChatOllama

@tool
def multiply(first_number: int, second_number: int):
    """Multiplies two numbers together."""
    return first_number * second_number

model = ChatOllama(model="mistral:latest")
model_with_tools = model.bind_tools([multiply])  # <== Binding tool here

Running this will output:

ChatOllama doesn't have a method bind_tools()

And I can confirm it doesn't… So we're F*****.

Conclusion on LangChain & LangGraph

I'm a bit disappointed. As this framework powers many others, like CrewAI, I thought it would have nice integration with local tools. In the end, it's not that great: just a complicated mess that doesn't address our main concern.

III. Rivet


I must say that I love this one! It's kind of new, but it has tremendous potential.

It's a kind of IDE for LLM interactions that uses a canvas to create an execution diagram (a DAG). It can run in the browser, but you can also export the DAG and run it as code to power your own software.

(Screenshot of a Rivet graph on the canvas.)

Look at this! How cool is that? There is an Ollama plugin, so you can use it locally.

Just make sure to click the three dots in the top right and change the executor to "node", otherwise it might not run.

Unfortunately, I didn't find a way to call custom tools, and the documentation is quite lacking anyway. It's clearly a project to watch for further updates!

Conclusion on Rivet

Cool software! Free and open source. Love the canvas system; it feels like what LangGraph should have been.

It still needs tool calling before it's truly useful. But have fun with it if you have a ChatGPT account.

IV. AutoGen


One of the best candidates on this list. AutoGen is backed by one of the largest tech companies out there (Microsoft).

I did the tutorial and I must say… I don't understand most of what I'm doing! The first pages are okay, but the situation rapidly gets out of control, to the point where you'd need an AI agent framework just to explain how all of this works.

However, it does have all you need and supports Ollama out of the box:

from autogen import ConversableAgent

code_writer_agent = ConversableAgent(
    "code_writer_agent",
    system_message=code_writer_system_message,
    llm_config={"config_list": [{
        "model": "dolphin-mixtral:8x7b-v2.7-q4_K_M",
        "api_key": "hello world",                 # any non-empty string will do
        "base_url": "http://127.0.0.1:11434/v1",  # Ollama's OpenAI-compatible API
    }]},
    code_execution_config=False,  # this agent writes code but doesn't run it
)

The best functionalities are, IMO:

  • Generating code on the fly and executing it
  • Calling tools (aka calling your code)
  • Human input

But does tool calling work?

Still doesn't… Only LLMs with OpenAI-compatible tool calling can use this, so Ollama + Mistral won't make the cut. However, the code generation and execution thingy works quite well (see the sketch below). Also, note that calling LangChain tools is not supported.
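Since code execution is the part that does work, here's a minimal sketch of that generate-and-execute loop (same config style as above; the agent names and the task are my own):

from autogen import AssistantAgent, UserProxyAgent

llm_config = {"config_list": [{
    "model": "dolphin-mixtral:8x7b-v2.7-q4_K_M",
    "api_key": "hello world",                 # Ollama ignores the key
    "base_url": "http://127.0.0.1:11434/v1",
}]}

# The assistant writes Python; the user proxy extracts and runs it locally.
assistant = AssistantAgent("assistant", llm_config=llm_config)
executor = UserProxyAgent(
    "executor",
    human_input_mode="NEVER",
    code_execution_config={"work_dir": "coding", "use_docker": False},
)

executor.initiate_chat(assistant, message="Print the first 10 Fibonacci numbers.")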


Available chat mechanisms you can use

  • Two chats pattern: Two LLMs speak to each other to complete the task


  • Sequential chats: Tasks will be evaluated in the order you specified


This is starting to get complicated. The carryover mechanism, which holds the context accumulated over the multiple conversations, is a hard concept to grasp. And why is each task still a conversation between two agents?? And why is it A=>B, A=>C, A=>D, A=>E? Why always start with A? God knows.

  • Group chat: Don't expect an explanation...


This is when things get out of hand! One agent seems to be the brain, imposing some kind of hierarchy on the others. The concept is appealing, but the examples from the documentation are not really helpful.


It also supports the latest trending prompting techniques:

  • ReAct: decomposes the work into a plan, then tries to follow each step; if things go wrong, it makes another plan and starts again. It's all about creating context that has semantic meaning for the LLM and helps it focus on what it should do right now. (See the prompt sketch after this list.)

  • Reflection: kind of like ReAct, but with an emphasis on its own output. After "speaking" it asks itself "Is this correct?". It seems that iterating over its own answers yields better results.

As always, "better results" means "fewer hallucinations", as this is the main issue with LLMs.
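To give you a feel for ReAct, the scaffold behind it is basically a prompt shaped like this (an illustrative sketch, not AutoGen's exact prompt):

# Bare-bones ReAct-style prompt scaffold (illustrative only)
REACT_PROMPT = """Answer the following question. You can use these tools: {tools}

Use this format:
Thought: reason about what to do next
Action: the tool to call, with its arguments
Observation: the result of the action
... (repeat Thought/Action/Observation as needed)
Final Answer: the final answer to the question

Question: {question}"""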


AutoGen Studio

Also, if you don't want to mess with code, you can download AutoGen Studio, which lets you define agents without coding. It's an interesting piece of software, but it doesn't really help you grasp the core functionality of the framework.


Conclusion on AutoGen

AutoGen clearly has a bright future ahead of it. As it's made by Microsoft, we can only hope they won't pull the plug on it or turn it into OpenAI-only software.

However, there is still no tool calling available with local LLMs. :-(

V. CrewAI


Another excellent piece of software. 

The documentation is okay and the framework is simple, but it does have a few issues.

On the bright side:

  • Ollama support
  • LangChain tool calling
  • Custom tool calling
  • Human input

On the dark side:

  • Tool calling still isn't working!
  • Human input doesn't always trigger
  • Low consistency with infinite loops
  • Bugs
  • Soooooo many prompts to write

Available chat mechanisms you can use

There are "sequential" and "hierarchical". Sequential will allow your LLMs to go through the tasks in the order you choose. Hierarchical on the other side will create a ghost agent that automatically decides which one of your agents should be triggered using its description.

Hierarchical would be great if only it worked. There are constant errors about agents that can't find their co-workers. It rapidly gets tedious.


The framework offers three types of classes:

First, you have the Agents, which have the following prompts bound to them:

  • A role: what it does for a living
  • A goal: what it should do in the team
  • A backstory: a story of its life…

from crewai import Agent

writer = Agent(
  role='Writer',
  goal='Write a fake anecdote using a number.',
  backstory='An experienced writer with vivid imagination.',
  llm=ollama_mistral,  # an Ollama-backed model, defined elsewhere
  verbose=True
)

Then you have the Tasks, which also have prompts:

  • description: what the task should do
  • expected_output: the output expected from this task

from crewai import Task

teacher_task = Task(
  description='Decompose the arithmetic operations.',
  expected_output='A concise list of operations to execute',
  agent=teacher  # an Agent defined like "writer" above
)

Finally, you have the Tools:

Tools can be bound to Agents to give them capabilities. But for some reason, they can also be bound to Tasks… which I don't think makes much sense.

@tool("sleep")
def my_sleep(nb_seconds: int) -> str:
    """Will sleep the amount of specified seconds provided as a number"""
    print(nb_seconds)
    return time.sleep(nb_seconds)
Enter fullscreen mode Exit fullscreen mode

I like having the @tool decorator. You simply pass a string that describes your tool, and the LLM should know whether to use it or not.
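For completeness, here's roughly how the pieces above wire together (a sketch; it reuses writer, teacher, and teacher_task from the earlier snippets):

from crewai import Crew, Process

# Assemble agents and tasks; "sequential" runs tasks in the order listed.
crew = Crew(
    agents=[teacher, writer],
    tasks=[teacher_task],
    process=Process.sequential,
    verbose=True,
)

result = crew.kickoff()
print(result)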

In the end, you'll have so many prompts to write that you'll lose yourself.

Does this prompt belong to a task or an agent? Does this tool belong to this agent or that one? Or maybe the tool should be bound to the task itself… So many questions, but so few answers, as the CrewAI documentation is quite scarce!

Conclusion on CrewAI

If you wish to have agents speak to each other, it's the simplest framework out there. Aside from having too many prompts to write, it's quick and easy. However, tool calling doesn't work, so we still have the same issue.

Also, consistency is quite bad. You'll often see your agents going into infinite loops.


A note on the constant YouTube AI-trend bullshit: haven't you noticed how many YouTubers have made videos on agent frameworks? The subject is always writing about stupid AI trends or building poor RAG systems. Well, that's because there is currently not much else you can do, as calling local tools isn't a thing right now. Unless you use ChatGPT, Grok, or Claude.

VI. Conclusion of all conclusions

We're screwed.


Honestly, it's time we got a way to better integrate low-cost LLMs into our applications. Tool calling is the way to go, but it needs a simpler architecture, one that doesn't rely on OpenAI's complex format.

What use would small models like Phi be on our mobile devices if all they can do is spew text, without integrating any of it into our applications?
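For what it's worth, the kind of simple architecture I mean doesn't need much: prompt the model to answer in JSON, parse the reply, and dispatch to your own functions. A hand-rolled sketch against Ollama's /api/chat endpoint (the endpoint is real; every other name here is my own):

import json

import requests  # third-party "requests" package

# Toy tool registry — hypothetical names for illustration.
TOOLS = {
    "multiply": lambda a, b: a * b,
}

SYSTEM = (
    'You can call one tool: multiply(a, b). '
    'Reply ONLY with JSON, either {"tool": "multiply", "args": {"a": 2, "b": 3}} '
    'to call it, or {"answer": "..."} when you are done.'
)

def ask(question: str) -> dict:
    # format="json" makes Ollama force valid JSON out of the model
    resp = requests.post("http://localhost:11434/api/chat", json={
        "model": "mistral:latest",
        "format": "json",
        "stream": False,
        "messages": [
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": question},
        ],
    })
    return json.loads(resp.json()["message"]["content"])

reply = ask("What is 6 times 7?")
if "tool" in reply:
    print("tool result:", TOOLS[reply["tool"]](**reply["args"]))  # 42, if the model cooperates
else:
    print("answer:", reply.get("answer"))

It's crude, and a stubborn model can still break the contract, but it shows that local "tool calling" is mostly a parsing problem, not an OpenAI-only feature.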


If I have made mistakes or overlooked anything, please let me know in the comments.

Any idea on how to get local code executed is worth knowing. Please advise in the comment section!

Thx for reading. Leave a thumbs up if you liked this article. ❤


Other authors you might like

https://medium.com/@rootOrNothingElse/the-rise-of-human-based-botnets-unconventional-threats-in-cyberspace-cb084b87c5bf
