Vektor Memory

Posted on Jul 1 • Edited on Jul 3

VEKTOR Slipstream v1.7.4: Effort Control & Real Memory Search

#openai #ai #claude #agents

Photo by Sindre Fjerdingby Korsviken on Pexels

And a Look at What’s Coming from OpenAI & Anthropic

If you’ve been running VEKTOR Slipstream for a while, you’ll know the last few releases have mostly been about defense. v1.7.3 brought Faraday-Gate, our MCP Prompt Injection Shield, the security proxy that scans every MCP tool call for threats before it touches your memory graph. Before that we were deep in causal inference and FadeMem decay layers, teaching the system to forget the right things at the right time.

v1.7.4 is a model-focused release, but it fixes something that’s been bugging me for months: the Desk agent had no way to actually search your memory. And it adds a feature that changes how you think about cost and latency when you’re running Claude models day to day. Here’s what’s new, why it matters, and a bit about where the model landscape is heading next.

The problem with one-size-fits-all inference

Every LLM call you make costs something, in tokens, in latency, in dollars. Most tools treat that as fixed. You pick a model, you pick a prompt, and whatever the model decides to do with its reasoning budget is what you get. If you’re running a quick fact lookup and a complex multi-step synthesis through the same model, they cost roughly the same to run, even though one of them barely needed to think at all.

Anthropic’s newer Claude models, Sonnet 5 & Fable 5, expose an effort parameter that lets you control this directly, through output_config.effort in the API. Instead of picking a different model for cheap tasks versus hard tasks, you keep the same model and just dial the reasoning effort up or down. Low effort for a quick tag suggestion. High or extended effort for something that actually needs the model to work through a problem.
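Here's a rough sketch of what that looks like in a request body. The output_config.effort path is the one named above; the model id, the prompts, and the buildRequest helper are placeholders I've made up for illustration, so treat this as a shape to adapt rather than VEKTOR's actual code.

```typescript
// Effort levels mentioned in this post. Which ones a given model accepts varies.
type Effort = "low" | "medium" | "high" | "xhigh" | "max";

interface MessagesRequest {
  model: string;
  max_tokens: number;
  messages: { role: "user" | "assistant"; content: string }[];
  output_config?: { effort: Effort };
}

// Hypothetical helper: builds a request body and attaches
// output_config.effort only when a level was actually chosen.
function buildRequest(model: string, prompt: string, effort?: Effort): MessagesRequest {
  const request: MessagesRequest = {
    model,
    max_tokens: 1024,
    messages: [{ role: "user", content: prompt }],
  };
  if (effort !== undefined) {
    request.output_config = { effort };
  }
  return request;
}

// "claude-sonnet-5" is a placeholder id, not a confirmed API model string.
const quick = buildRequest("claude-sonnet-5", "Suggest three tags for this note.", "low");
const careful = buildRequest("claude-sonnet-5", "Plan the pg migration step by step.", "high");
```

Same model in both calls; only the effort level changes between the cheap task and the hard one.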

Sonnet 5 results

v1.7.4 wires this straight into VEKTOR. There’s an EFFORT_CAPABLE map that knows which levels each Claude model supports, because not every tier goes all the way up to max. If you ask for a level a given model doesn't support, VEKTOR clamps it down to whatever that model's ceiling actually is rather than throwing an error at you. Ask for something that isn't a Claude model at all, and the parameter just gets quietly dropped, no errors, no dead code paths.
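To make the clamping behaviour concrete, here's a minimal sketch of how a map like that could work. The EFFORT_CAPABLE name comes from the release; the model ids in it, the levels assigned to each, and the resolveEffort function are all assumptions for the sake of the example, not the shipped implementation.

```typescript
const EFFORT_ORDER = ["low", "medium", "high", "xhigh", "max"] as const;
type Effort = (typeof EFFORT_ORDER)[number];

// Hypothetical contents: the real per-model ceilings live in VEKTOR's own map.
const EFFORT_CAPABLE: Record<string, readonly Effort[]> = {
  "claude-example-a": ["low", "medium", "high"],
  "claude-example-b": ["low", "medium", "high", "xhigh", "max"],
};

// Returns the effort level to send, or undefined when the parameter
// should be dropped because the model is not in the map at all.
function resolveEffort(model: string, requested: Effort): Effort | undefined {
  const supported = EFFORT_CAPABLE[model];
  if (!supported || supported.length === 0) return undefined;

  // Walk down from the requested level to the highest supported one at or below it.
  for (let i = EFFORT_ORDER.indexOf(requested); i >= 0; i--) {
    const level = EFFORT_ORDER[i];
    if (level !== undefined && supported.indexOf(level) !== -1) return level;
  }
  // Requested level sits below everything the model supports: use its lowest.
  return supported[0];
}

resolveEffort("claude-example-a", "max");  // "high": clamped to that model's ceiling
resolveEffort("claude-example-b", "max");  // "max": supported as requested
resolveEffort("some-other-model", "high"); // undefined: parameter dropped
```

Returning undefined rather than throwing is what keeps non-Claude providers on the same code path with no special casing.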

The practical bit: there’s a new effort pill row sitting right in the CONFIG panel under your Active Model card. Low, medium, high, xhigh, max, whichever your model supports. Pick one, it saves through the same config store everything else uses, and it applies whether you’re in the main chat path or running the Desk agent’s tool-calling loop. You set it once per session and both surfaces respect it.

This matters more than it sounds like on paper. If you’re running VEKTOR against a big batch job, like re-embedding a session transcript or doing a background REM cycle synthesis, you can drop effort down and save real money without switching to a weaker model entirely. And when you’re doing something that actually needs the model to reason carefully, you can bump it up without touching your provider config.

Claude Sonnet 5 and Claude Fable 5 land in the model catalog

Alongside the effort work, the model catalog got a refresh. Claude Sonnet 5 and Claude Fable 5 are both in the CONFIG model list now, and the stale claude-sonnet-4-6 reference that had been floating around the codebase since the last naming cycle is finally gone. If you've been manually overriding your model string to point at Sonnet 5 already, you can drop that override and just pick it from the list.

Fable 5 is worth a quick note if you haven’t been following the naming changes on Anthropic’s side. It sits at the same tier as Mythos 5, with the difference being extra safety measures around biology, cybersecurity, and LLM R&D topics. For most VEKTOR use cases (agent memory, JOT synthesis, Desk chat) you won’t notice a difference day to day, but if you’re doing anything in those more sensitive domains it’s the variant you want configured.

Worth flagging: access to the Mythos-tier models is currently paused while Anthropic works through an export control matter, so if you go looking for Fable 5 or Mythos 5 in your provider dashboard and it’s not there yet, that’s why. It’s not a VEKTOR issue. Keep an eye on Anthropic’s announcements page if you want the exact timeline.

The Desk agent can finally search its own memory

This is the fix I’m most pleased about, mostly because it’s the kind of gap you don’t notice until it actively annoys you. The Desk chat agent, the one running at /api/desk/chat, is genuinely agentic. It's got a full tool-calling loop, it can plan, it can execute multi-step work. What it didn't have, until now, was a way to reach into your own VEKTOR memory store.

So if you asked it something broad, like “catch me up on what I’ve been working on this week” or “what did we decide about the pg migration,” it had nothing to actually search. It would either hedge, or worse, guess. Not because the model is bad, but because it genuinely had no tool available to answer the question honestly.

search_memory fixes that. It's a new tool in the Desk agent's tool list, and it routes internally through the same BM25 plus semantic fusion logic that powers /api/memory/recall, so there's no duplicated retrieval code sitting around waiting to drift out of sync with the rest of the system. It takes a query and an optional k for how many results you want back, defaulting to 20. Because VEKTOR supports both Anthropic-style tool use and OpenAI-style function calling from the same shared tool definitions, this works identically no matter which provider is driving your Desk session.
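For a feel of how one shared definition can serve both providers, here's a sketch. The search_memory name, the query and k parameters, and the default of 20 are from this release; the SharedTool type, the description strings, and the two converter functions are illustrative assumptions, not VEKTOR's actual source. The output shapes follow the publicly documented Anthropic tool-use format (input_schema) and OpenAI function-calling format (type: "function").

```typescript
// Hypothetical provider-neutral tool shape; parameters is plain JSON Schema.
interface SharedTool {
  name: string;
  description: string;
  parameters: Record<string, unknown>;
}

const searchMemory: SharedTool = {
  name: "search_memory",
  description: "Search the VEKTOR memory store and return the top matches.",
  parameters: {
    type: "object",
    properties: {
      query: { type: "string", description: "What to look for" },
      k: { type: "integer", description: "How many results to return", default: 20, minimum: 1 },
    },
    required: ["query"],
  },
};

// Anthropic-style tool use expects the schema under input_schema.
function toAnthropicTool(tool: SharedTool) {
  return { name: tool.name, description: tool.description, input_schema: tool.parameters };
}

// OpenAI-style function calling wraps the same fields under function.
function toOpenAITool(tool: SharedTool) {
  return {
    type: "function" as const,
    function: { name: tool.name, description: tool.description, parameters: tool.parameters },
  };
}

const anthropicTools = [toAnthropicTool(searchMemory)];
const openaiTools = [toOpenAITool(searchMemory)];
```

Because both outputs are derived from the same object, a change to the schema shows up for every provider at once.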

Practically, this means the Desk agent stops being a chat window bolted onto your memory graph and starts actually behaving like it’s read the graph. Ask it a genuinely open-ended question about your own history and it goes and looks, instead of pattern-matching off whatever happened to be in the last few messages.

Desk chat results

Rest of the model catalog: OpenRouter and Groq

The OpenRouter side of the catalog got a proper live audit this release, not just a check against the published docs, which it turns out lag actual availability by hours in a few cases. A handful of free-tier models that had quietly gone dead, some GLM and Kimi and DeepSeek variants, got pulled.

In their place, openrouter/free was added, which is OpenRouter's own auto-router. It's a specific hedge against the churn on the free tier: rather than hardcoding a model that might vanish next week, you point at the router and let it pick something live. A handful of newly-confirmed Poolside and Nvidia Nemotron free models went in alongside it.
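If you'd rather keep a preferred free model and only lean on the router when it disappears, something like this would do it. To be clear, this fallback pattern is my own illustration and not something VEKTOR ships; the pickFreeModel function, the liveModels list, and the example model ids are all hypothetical. Only the openrouter/free id comes from the catalog.

```typescript
const AUTO_ROUTER = "openrouter/free";

// liveModels would come from a live availability check; here it is a plain list.
function pickFreeModel(preferred: string, liveModels: readonly string[]): string {
  return liveModels.indexOf(preferred) !== -1 ? preferred : AUTO_ROUTER;
}

// Placeholder ids for illustration only.
const live = ["vendor/model-a:free", "vendor/model-b:free"];

pickFreeModel("vendor/model-a:free", live);    // still live: used as-is
pickFreeModel("vendor/model-gone:free", live); // pulled: falls back to "openrouter/free"
```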

Groq lost llama-3.3-70b-versatile ahead of its official deprecation in mid-August, replaced by Qwen 3.6 27B and Qwen 3 32B following Groq's own recommended migration path. The default fallback model for Groq calls also moved to openai/gpt-oss-120b.

None of this is exciting on its own, but if you’ve had a background job silently fail because a free model got pulled out from under you, you’ll know why this kind of housekeeping matters. It’s a perpetual game of model whack-a-mole.

Bug fixes worth knowing about

Two are worth calling out specifically. OpenAI’s newer o-series and GPT-5-and-up models now reject the max_tokens parameter outright; they need max_completion_tokens instead. That had been patched in one place, but a full audit then turned up nine more call sites making the same mistake across eight different files, everything from the fact extraction pipeline to the session ingest worker to the web scout summariser. All fixed, and all now select the right parameter based on a simple regex check against the active model name.
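The check itself is small. Here's a sketch of the idea; the exact regex VEKTOR uses isn't reproduced here, so the pattern below, the tokenLimit helper, and the "gpt-5.5" id string are my assumptions about what a check like this could look like.

```typescript
// Assumed pattern: o-series ids ("o1", "o3-mini", ...) and gpt-5 or later,
// with an optional "openai/" prefix. Not the shipped regex.
const NEEDS_COMPLETION_TOKENS = /^(?:openai\/)?(?:o\d|gpt-(?:[5-9]|\d{2,}))/i;

type TokenLimit = { max_tokens: number } | { max_completion_tokens: number };

// Picks the right parameter name for the active model.
function tokenLimit(model: string, limit: number): TokenLimit {
  return NEEDS_COMPLETION_TOKENS.test(model)
    ? { max_completion_tokens: limit }
    : { max_tokens: limit };
}

// Spread the result into the request so call sites never name the field directly.
const requestBody = {
  model: "gpt-5.5",
  messages: [{ role: "user", content: "Summarise this session." }],
  ...tokenLimit("gpt-5.5", 512),
};
```

Routing every call site through one helper is what stops the same mistake from reappearing in a tenth place.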

The other one was a dangling reference to an undefined selectEffort function, left over from an in-progress patch, which was throwing a ReferenceError the instant the CONFIG module initialised. Because that whole module runs as one continuous script block, the failure silently took the entire Active Model card down with it (provider tabs, model grid, everything) with no visible error on screen.

Fixed now, and we went back and checked all 82 onclick handlers across the graph UI against their actual module exports while we were in there, just to be sure nothing else was quietly broken the same way.
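An audit like that is easy to script. This is a simplified sketch of the approach, not the tool we actually ran: findDanglingHandlers and the selectProvider name are hypothetical, and the regex only catches inline handlers of the plain name(...) form.

```typescript
// Returns handler names used in inline onclick attributes that are
// missing from the list of exported function names.
function findDanglingHandlers(html: string, exported: readonly string[]): string[] {
  const pattern = /onclick\s*=\s*["']\s*([A-Za-z_$][\w$]*)\s*\(/g;
  const missing: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    const handler = match[1];
    if (exported.indexOf(handler) === -1 && missing.indexOf(handler) === -1) {
      missing.push(handler);
    }
  }
  return missing;
}

const sample = `
  <button onclick="selectProvider('openai')">OpenAI</button>
  <button onclick="selectEffort('high')">High</button>
`;

const dangling = findDanglingHandlers(sample, ["selectProvider"]);
// dangling is ["selectEffort"]: referenced in markup, never exported
```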

What’s coming: OpenAI’s next model family

Worth a mention, even though it’s not shipped yet. OpenAI has been previewing a new model family, currently going by Sol, Terra, and Luna, sitting under the GPT-5.6 generation. Sol is the frontier end, built for long-horizon agentic work and heavier reasoning. Terra sits in the middle, aiming for GPT-5.5-competitive performance at roughly half the cost. Luna is the fast, cheap end of the lineup.

OpenAI Sol Ultra results

Right now it’s in a limited preview with a small number of partners, and public availability looks likely by the end of July based on what OpenAI has said so far, though preview periods have a habit of running long. Because VEKTOR’s provider config is fully abstracted through model.{provider} keys, none of this needs a code change on our end when it does land. The day OpenAI opens the API up, you'll be able to point your model.openai config at whichever of the three fits your workload and go. Same story as when GPT-5.5 landed a couple of months back.
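In practice the swap is a one-line config change. Here's a sketch of the lookup, assuming a flat key-value config: the model.{provider} key shape is as described above, while the modelFor helper and the placeholder model ids are made up for illustration, since the real ids for the new family aren't published yet.

```typescript
type Config = Record<string, string>;

// Looks up the configured model for a provider via its model.{provider} key.
function modelFor(settings: Config, provider: string): string | undefined {
  return settings[`model.${provider}`];
}

// Placeholder ids; substitute whatever your provider actually exposes.
const settings: Config = {
  "model.openai": "example-current-model",
  "model.anthropic": "example-claude-model",
};

const before = modelFor(settings, "openai");

// Moving to a new model is just a new value under the same key.
settings["model.openai"] = "example-new-model";
const after = modelFor(settings, "openai");
```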

If you’re on the OpenAI provider already, keep an eye on the changelog. When Sol, Terra, and Luna go generally available, we’ll get the catalog updated the same day.

Updated OpenAI models added

Upgrading

Model catalog changes and the effort parameter are backend logic loaded once at process start, so you’ll need to restart after upgrading for either to take effect. The UI-only fixes (the reload icon, the wider model grid, the MODES bar copy) apply immediately on refresh, no restart needed.

npm install -g ./vektor-slipstream-1.7.4-preview.tgz

Or grab it straight from Downloads. Upgrade from v1.7.3 any time you like: there’s no forced migration path and your existing memory database is untouched.

Full changelog is up at vektormemory.com/docs/changelog if you want the complete list, including everything that got condensed out of this post. And if you hit anything odd after upgrading, the forum is the fastest way to reach us directly.

A view into the health diagnostics screen of Vektor

VEKTOR Memory builds local-first persistent memory infrastructure for AI agents. The VEKTOR Slipstream SDK scored 81% on LongMemEval using a local SQLite database, beating full-context GPT-4 by twelve points. Documentation and downloads at vektormemory.com.

