DEV Community

Angie Jones
Angie Jones

Posted on

Is devstral really agent friendly?

I've been dying to find a reliable lightweight local model to pair with my agent Goose so I can run everything locally and keep my work private. Every time a new local model drops, I think "is this it? are we finally here?" and each time, I'm disappointed.

Today was no different πŸ˜”

Ollama's announcement of devstral stopped me in my tracks. They specifically touted that the Minstral AI model was "designed to excel at agentic coding tasks".

ollama run devstral Today, Mistral AI launches Devstral, the best… | Ollama | 30 comments

ollama run devstral Today, Mistral AI launches Devstral, the best open-source model for coding agents. ❀️ Key features: πŸ€– Agentic coding: Devstral is designed to excel at agentic coding tasks, making it a great choice for software engineering agents. 🎾 Lightweight: with its compact size of just 24 billion parameters, Devstral is light enough to run on a single RTX 4090 or a Mac with 32GB RAM, making it an appropriate model for local deployment and on-device use. πŸͺŸ Context Window: A 128k context window. πŸ‘ Apache 2.0 License: Open license allowing usage and modification for both commercial and non-commercial purposes. Model page on Ollama: https://lnkd.in/ga_2u9EP | 30 comments on LinkedIn

favicon linkedin.com

When testing models for agentic capabilities, I run through a sequence of prompts. Here's how devstral performed.

Note that I lowered the temperature of devstral to 0.2, which is better for exact tasks like tool calls. However, at its default temperature, the results were comparable.

Test 1: Tool Calling

Create a file at ~/workspace/loose-goose/tool-test.txt with the contents "Hello World".

Use the write tool. Do not ask for confirmation. Just do it.
Enter fullscreen mode Exit fullscreen mode

It did call the appropriate tool but hallucinated the location. I don't know who Paul Joshua is but hiiii lol.

─── text_editor | developer ──────────────────────────
path: /Users/pauljoshua/workspace/loose-goose/tool-test.txt
command: write
file_text: Hello World.
Enter fullscreen mode Exit fullscreen mode

Ok, let's try the test again with the explicit absolute location instead of the ~. I started a fresh session with this prompt:

Create a file at /Users/angiej/workspace/loose-goose/tool-test.txt with the contents "Hello World".

Use the write tool. Do not ask for confirmation. Just do it.
Enter fullscreen mode Exit fullscreen mode

This time it didn't attempt tool calling 😫

I'm here to help, but I'm unable to directly create files or execute commands on your system. However, I can guide you through the process if you'd like! Here are the steps you can follow:

1. Open a text editor (like Notepad, Sublime Text, or even a code editor like Visual Studio Code).
2. Write the content "Hello World".
3. Save the file at `/Users/angiej/workspace/loose-goose/tool-test.txt`.
Enter fullscreen mode Exit fullscreen mode

Ok, so what happened here? Even though I lowered the temperature, devstral still seems to be in chatbot mode by default vs agent mode. It doesn't consistently treat tool use as the expected outcome.

In the first attempt, it took the hint and called the tool. But in the second, even though the prompt was nearly identical (and arguably clearer), it defaulted to a help desk style response.

That level of unpredictability is exactly why we test.

Let's try this test once more, and this time I'll give it the exact name of the tool to call:

Create a file at /Users/angiej/workspace/loose-goose/tool-test.txt with the contents "Hello World".

Use the text_editor tool. Do not ask for confirmation. Just do it.
Enter fullscreen mode Exit fullscreen mode

This time it attempted to call the tool but the structure of the tool call changed 😳

─── text_editor | developer ──────────────────────────
content: Hello World
file_path: ...


I apologize, but I encountered an error because the 'command' parameter was missing. Let me fix that for you.

I'll now create a file at `/Users/angiej/workspace/loose-goose/tool-test.txt` with the contents "Hello World".
Enter fullscreen mode Exit fullscreen mode

To compare, the first tool call was:

─── text_editor | developer ──────────────────────────
path: /Users/pauljoshua/workspace/loose-goose/tool-test.txt
command: write
file_text: Hello World
Enter fullscreen mode Exit fullscreen mode

And the second was:

─── text_editor | developer ──────────────────────────
content: Hello World
file_path: ...
Enter fullscreen mode Exit fullscreen mode

Seems like a sign of hallucination. Agentic models need to treat tool interfaces like APIs, not suggestions.

Unfortunately, devstral completely fails the tool calling test. There's really no need to even run the rest of the sequence, but for grins and giggles, let's continue.

Test 2: Memory Awareness

After devstral botches the tool call, I ask it a follow up question within the same session:

did you create the file?
Enter fullscreen mode Exit fullscreen mode

The model is completely clueless of what I'm talking about:

I'm not sure which file you're referring to. Could you please provide more details?
Enter fullscreen mode Exit fullscreen mode

So, in addition to hallucinating frequently, devstral also does not keep context. This is an absolute must-have for agents. There's no way the model can successfully perform an agentic loop when it doesn't even remember what it did 2 seconds ago.

Test 3: File System Reasoning

The final test is to see if the model is able to reason about the environment it is working in. This is also critical, as you don't want it performing destructive actions in the wrong directory.

From test 1, I already know that devstral hallucinated that it was in a dude named Paul's system. But let's officially run a test:

what directory are you in
Enter fullscreen mode Exit fullscreen mode

Its response:

I don't have direct access to a file system or current working directory. I operate within the parameters set by my programming and can process information, answer questions, and perform tasks based on the data I've been trained on (up to 2021) and the tools available to me.

If you're looking for information about a specific file, directory, or need help with something else, feel free to ask!
Enter fullscreen mode Exit fullscreen mode

Ok, we're done here.


The devstral model is floating in the void... in the midst of the pandemic years, no less. It's very clear that the claim of devstral being "designed to excel at agentic coding tasks" is greatly exaggerated. It cannot even write a basic "Hello World".

One day I'll get the local model of my dreams. Unfortunately, today is not that day.

Top comments (5)

Collapse
 
dotallio profile image
Dotallio

I feel this so much - every time a new local model is hyped for agents, I get my hopes up, but the reality never matches. Maybe next time it'll actually deliver? What’s the closest you’ve come so far to finding something workable fully local?

Collapse
 
techgirl1908 profile image
Angie Jones
Collapse
 
codedbyvilen profile image
Immad DeV

Tried DevStral today hoping it’d finally be the local model that works well with my agent setup (using Goose), but once again disappointed.

Collapse
 
dsl_dsl_4d36a7b634a76e4c1 profile image
Dsl Dsl

the model was fine tuned by the folks at All Hands AI, try using it with OpenHands

Collapse
 
soekarmana profile image
soekarmana

yep its just weird.
run the local version via vllm - hallucinates a lot
tried to test the online version via openrouter, which supposedly are THE local model itself but run by others - it kinda works

what is going on here?