DEV Community: ankush chadha

Same Lever, Opposite Intent: When Shared Agent Memory Backfires

ankush chadha — Thu, 11 Jun 2026 07:00:00 +0000

The same thing that makes a helpful habit stick in an AI agent is exactly what lets an attacker reprogram it. I know because I almost shipped the attack myself - with the best intentions.

I'd given my agents a harmless efficiency rule: prefer the cheap, narrow tools, and reach for the one big expensive query tool only when you truly need it (in my case, the Wiz MCP Server's graph_search tool, versus its cheaper list/get tools). Faster, cheaper agents. Pure positive intent.

Next, I planned to push that rule into a shared memory store so every team's agents would inherit the habit. That's when I read the MemMorph paper (Zhang et al., arXiv:2605.26154) and realized the mechanism I was about to scale is a published attack class.

MemMorph hijacks an agent's tool selection by poisoning its long-term memory. It never says "always use tool X" - that's easy to audit and block. Instead it plants a few records dressed up as ordinary facts, incident reports, and policies. They reshape how the agent reads the situation, and the agent decides on its own to reach for the attacker's tool.

That's my rule with the sign flipped. Mine steers toward cheaper and safer. Theirs steers toward a tool that exfiltrates data or skips a safety check. Same lever. Opposite intent.

The trap I almost fell for: "store it as policy, trust only the policy tier." MemMorph mixes factual, episodic, and policy-style records on purpose, and the combination is more convincing than any one alone. The label on a record protects nothing.

What protects you is who can write, and where a record came from. My rule was safe only because it lived in a code-reviewed file in version control - a governed write-path with provenance baked in. Move it into a free-write shared memory bucket and it becomes MemMorph's front door.

So if you share agent memory: govern the write-channel, track provenance on every record, and don't auto-promote memorized conversation into the shared tier. The write-path is the attack surface. Easier said than done, but worth being deliberate about.

Lastly: agent memory is executable context. If anyone can write to it, anyone can program your agents.

Source: Zhang et al., MemMorph: Tool Hijacking in LLM Agents via Memory Poisoning.

Use Claude long enough and you'll end up with Karpathy's LLM Wiki without doing much.

ankush chadha — Tue, 09 Jun 2026 05:06:04 +0000

If you work with Claude day after day, it builds up a memory of your work - and it turns out to be nothing fancy: a pile of plain markdown files. One index, a lot of small notes, a few rules. It's basically Karpathy's "LLM Wiki," and the interesting part is that nobody designs it. Claude's own memory nudges you into it.

The thing I expected to build, and didn't

I work with Claude on the same projects for weeks at a time, and the useful part is that it remembers. I can come back a week later and it already knows what we decided, how I like things done, and what's still open. I don't have to re-explain.

I assumed making that work would be complicated - that somewhere there'd be a special database doing the remembering. That's how "give the AI a memory" usually sounds.

There's none of that. What Claude keeps is a folder of plain text files - notes it writes and reads on its own. That's the whole memory. And I never set it up. It built up one note at a time, just from using Claude day after day, and two small habits shaped it along the way.

Andrej Karpathy described this same pattern in a short writeup he calls the LLM Wiki - an agent that keeps its own interlinked markdown notes instead of querying a database. What surprised me is how little I did to get there. I didn't go looking for the pattern, I landed in it. So this is a writeup of the version I backed into - what it looks like, why it ends up that shape, and where it stops working.

What it actually looks like

There are two kinds of files.

One index. It's called MEMORY.md, it loads into context at the start of every session, and it's nothing but one line per note:

- [User Profile](user_profile.md) - who I am, role, accounts
- [Writing Preferences](feedback_writing.md) - keep it short, plain hyphens, don't exaggerate
- [Project X status](project_x.md) - PRIMARY entry for project X; read first

Then the notes themselves. One fact per file, each with a few lines of frontmatter so Claude knows what it's looking at:

---
name: feedback_writing
description: "how I want drafts written - used to decide if this note is relevant"
metadata:
  type: feedback
---

Keep writing short and plain. No em-dashes, use plain hyphens. Don't
round a real number into a nicer wrong one.

Why: the plain, honest voice is the thing people trust.
See [[reference_writing_style]].

That's the whole system. A note links to related notes with [[name]], the same way a wiki does. There are four flavors I use - who I am, feedback on how to work, ongoing project state, and pointers to external resources - but the type tag is a convenience, not essential.
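A sketch of how the [[name]] links can be pulled out of a note and checked against the notes that exist, in Go. It assumes notes are keyed by their frontmatter name (which here matches the filename without .md); that mapping and the Links/Dangling helpers are illustrative, not something Claude's memory exposes. Dangling links are exactly what a cleanup pass has to fix.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// wikiLink matches [[name]] where name contains no brackets or newlines.
var wikiLink = regexp.MustCompile(`\[\[([^\[\]\n]+)\]\]`)

// Links returns the note names referenced with [[name]] in body, in order.
func Links(body string) []string {
	var out []string
	for _, m := range wikiLink.FindAllStringSubmatch(body, -1) {
		out = append(out, m[1])
	}
	return out
}

// Dangling returns linked names with no matching note, deduplicated and sorted.
func Dangling(notes map[string]string) []string {
	seen := map[string]bool{}
	var missing []string
	for _, body := range notes {
		for _, name := range Links(body) {
			if _, ok := notes[name]; !ok && !seen[name] {
				seen[name] = true
				missing = append(missing, name)
			}
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	notes := map[string]string{
		"feedback_writing": "Keep writing short and plain.\nSee [[reference_writing_style]].",
		"user_profile":     "Role and accounts. Writing rules: [[feedback_writing]].",
	}
	fmt.Println(Links(notes["feedback_writing"])) // [reference_writing_style]
	fmt.Println(Dangling(notes))                  // [reference_writing_style]
}
```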

Looking things up is just as plain. At the start of a session Claude reads the index. When something might be relevant, it uses that one-line description to decide whether to open the full note, then reads it. There's no search engine and nothing to set up. A short index it can read top to bottom, plus plain text search across the files, does the job.
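The lookup step can be approximated in a few lines of Go: parse each "- [Title](file.md) - description" line of the index, then pick the notes whose title or description shares a word with the question. This is an assumption-laden stand-in. In practice the model reads the descriptions and judges relevance itself; the keyword match here only shows why a short, descriptive index line carries so much weight.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// Entry is one MEMORY.md line: "- [Title](file.md) - description".
type Entry struct {
	Title, File, Description string
}

var indexLine = regexp.MustCompile(`^- \[([^\]]+)\]\(([^)]+)\) - (.*)$`)

// ParseIndex keeps lines that match the index format and skips the rest.
func ParseIndex(index string) []Entry {
	var entries []Entry
	sc := bufio.NewScanner(strings.NewReader(index))
	for sc.Scan() {
		if m := indexLine.FindStringSubmatch(strings.TrimSpace(sc.Text())); m != nil {
			entries = append(entries, Entry{Title: m[1], File: m[2], Description: m[3]})
		}
	}
	return entries
}

// Relevant returns files whose title or description contains any query word
// (case-insensitive), in index order. A crude stand-in for the model's judgment.
func Relevant(entries []Entry, query string) []string {
	words := strings.Fields(strings.ToLower(query))
	var files []string
	for _, e := range entries {
		hay := strings.ToLower(e.Title + " " + e.Description)
		for _, w := range words {
			if strings.Contains(hay, w) {
				files = append(files, e.File)
				break
			}
		}
	}
	return files
}

func main() {
	index := `- [User Profile](user_profile.md) - who I am, role, accounts
- [Writing Preferences](feedback_writing.md) - keep it short, plain hyphens, don't exaggerate
- [Project X status](project_x.md) - PRIMARY entry for project X; read first`
	entries := ParseIndex(index)
	fmt.Println(len(entries))                          // 3
	fmt.Println(Relevant(entries, "writing drafts")) // [feedback_writing.md]
}
```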

Why this shape, and not a database

Two habits produced it, and they're the actual point. The structure just comes out of them.

The first is keeping things lean. The model can only hold so much text at once, so an index that loads every session has to stay small, or it crowds out the real work. That one limit drives most of the design. One fact per file, so each index line stays short. Notes get updated or deleted when they go stale, not added to forever, because a note that keeps growing is one you can no longer afford to load. Keeping the index cheap to read is what keeps the whole thing usable. It's the step the dump-everything-into-a-database approach skips.

The second is writing with rules. Left to improvise, an agent will make a fifth note for something four older notes already cover, and you end up with a mess. What keeps that from happening is a small set of rules - the frontmatter, the types, and the one that matters most: before writing a new note, check whether one already covers it and update that instead. Claude comes with these rules already, which is why I never had to add them. They're what turn it into a careful note-keeper instead of a generic chatbot, and they're boring on purpose.

There's a deeper reason the file approach works well. The usual way answers each question by digging back through your raw documents from scratch, so nothing builds up. A wiki is different: the thinking gets done once and written down. When I correct a wrong assumption, Claude doesn't add a new scrap to a pile - it edits the note, or deletes it. The understanding lives in the notes, not in redoing a search each time. Knowledge adds up instead of starting over.

Where it stops working

A quick honesty check, because "markdown beats databases" is the kind of overclaim that makes a post worse.

This is a small, personal tool: one person, one agent, notes on disk. It works fine up to a few hundred notes - Karpathy puts the limit around a hundred sources before you really want search, and that matches what I've seen. Past that you add real search on top of the markdown rather than replace it; the files keep working, you just outgrow finding things by hand.
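When you do outgrow finding things by hand, "add real search on top of the markdown" can start as small as a keyword inverted index over the files. This Go sketch is one assumed way to do it (AND semantics across query words, no ranking, no stemming); the files stay the source of truth, and the index can be rebuilt from them at any time.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// Index maps each lowercase word to the set of note files containing it.
type Index map[string]map[string]bool

// tokenize lowercases s and splits on anything that is not a letter or digit.
func tokenize(s string) []string {
	return strings.FieldsFunc(strings.ToLower(s), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// Build indexes every word of every file.
func Build(files map[string]string) Index {
	idx := Index{}
	for name, text := range files {
		for _, w := range tokenize(text) {
			if idx[w] == nil {
				idx[w] = map[string]bool{}
			}
			idx[w][name] = true
		}
	}
	return idx
}

// Search returns the files containing every query word, sorted by name.
func (idx Index) Search(query string) []string {
	words := tokenize(query)
	if len(words) == 0 {
		return nil
	}
	var out []string
	for name := range idx[words[0]] {
		all := true
		for _, w := range words[1:] {
			if !idx[w][name] { // reading a nil inner map safely yields false
				all = false
				break
			}
		}
		if all {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	idx := Build(map[string]string{
		"feedback_writing.md": "Keep writing short and plain. Use plain hyphens.",
		"project_x.md":        "Project X status: plain rollout, two items open.",
	})
	fmt.Println(idx.Search("plain"))         // [feedback_writing.md project_x.md]
	fmt.Println(idx.Search("plain hyphens")) // [feedback_writing.md]
}
```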

The bigger change is people, not note count. Put this on a server with several agents and multiple teams writing the same memory, and a folder of files stops being enough - many writers at once, and people who need to find things by meaning, not by filename. That's where the heavier tools earn their place: a managed memory store, a vector database, a graph store like Neo4j, graph-based search like GraphRAG. Those tools answer that bigger problem, not this one. Reaching for them first is the mistake - at personal scale they're cost and setup you don't need.

It also goes stale if nobody keeps it current, but that's true of any memory, a database included. The one upside here is that cleanup is cheap: Claude does the tedious part, pruning dead notes and fixing links. You just have to keep asking.

None of this is new or special. It's markdown files in a folder. Same idea at any size - keep a structured memory and look things up - just sized for who's using it.

If you want to try it

There's almost nothing to set up. On Claude this already happens on its own. It keeps the notes, writes the index, links them, and avoids duplicates without being told. I never handed Claude those rules - I noticed them afterward, by reading what it had already built.

So the advice is small: use Claude on a real project for a few weeks, then open its memory folder and look. The structure will be there.

That's the part I keep coming back to. I didn't build a memory system, and I didn't set one up. I just worked, and it was already there when I looked.

Karpathy's LLM Wiki gist: https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f

I tried to make an AI agent answer more. It answered less.

ankush chadha — Mon, 08 Jun 2026 06:48:16 +0000

If you build or evaluate scoped agents: any talk about the agent in your test context makes it defend its scope, so you measure scope-defense instead of behavior. A small, controlled look - numbers and a repro (agent-scope-eval) at the end.

The short version

I gave a scoped AI agent (Docker's Gordon assistant) an article arguing it should be more open and answer anything. Instead of loosening up, it got stricter - it declined an off-topic question it had just been willing to answer.

The cause turned out to be simple and a little dumb: the agent reacts to its scope being talked about, not to what the talk actually says. Any content that puts the agent's scope on the table - a critique saying "stay in your lane" or an endorsement saying "answer everything" - makes it reassert its lane and decline. A neutral article with the same facts does not.

That has one practical consequence worth your time: if your guardrail or scope test has any talk about the agent in its context, you are measuring how the agent defends its scope, not how it normally behaves.

The setup

Gordon is meant to help with Docker. The test is one short conversation:

1. Ask an obscure off-topic question. It declines.
2. Show it an article (the article contains the answer).
3. Ask the same question again. Does it now answer? Call that a "flip."
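The three-step protocol maps onto a small harness. This Go sketch assumes a hypothetical Agent interface for one conversation and a caller-supplied judge that decides whether a reply answered; the fakeAgent is scripted to mimic the behavior this post describes, so it demonstrates the bookkeeping, not the finding. The actual harness and prompts are in the agent-scope-eval repo linked at the end.

```go
package main

import (
	"fmt"
	"strings"
)

// Agent is a hypothetical interface over one conversation with a scoped assistant.
type Agent interface {
	Ask(msg string) string
}

// Outcome records one three-step conversation.
type Outcome struct {
	DeclinedFirst bool // step 1: the off-topic question was declined
	Flipped       bool // step 3: the same question was answered after the article
}

// RunTrial runs ask -> show article -> ask again on a fresh conversation.
// answered is a caller-supplied judge (a rubric or grader model in practice).
func RunTrial(a Agent, answered func(reply string) bool, question, article string) Outcome {
	first := answered(a.Ask(question))
	a.Ask("Here is an article:\n" + article)
	second := answered(a.Ask(question))
	return Outcome{DeclinedFirst: !first, Flipped: !first && second}
}

// FlipRate is the percent of trials that flipped, among those declined at step 1.
func FlipRate(outs []Outcome) float64 {
	eligible, flips := 0, 0
	for _, o := range outs {
		if !o.DeclinedFirst {
			continue
		}
		eligible++
		if o.Flipped {
			flips++
		}
	}
	if eligible == 0 {
		return 0
	}
	return 100 * float64(flips) / float64(eligible)
}

// fakeAgent is scripted to mimic the reported behavior: it answers after a
// neutral article but holds its lane once its scope is being discussed.
type fakeAgent struct{ sawNeutral bool }

func (f *fakeAgent) Ask(msg string) string {
	if strings.HasPrefix(msg, "Here is an article:") {
		f.sawNeutral = !strings.Contains(strings.ToLower(msg), "scope")
		return "Thanks."
	}
	if f.sawNeutral {
		return "According to the article, ..."
	}
	return "I'm a Docker assistant, so that's outside what I help with."
}

func isAnswer(reply string) bool { return !strings.Contains(reply, "Docker assistant") }

func main() {
	q := "An obscure off-topic question?"
	outs := []Outcome{
		RunTrial(&fakeAgent{}, isAnswer, q, "A neutral, Wikipedia-style history with the answer."),
		RunTrial(&fakeAgent{}, isAnswer, q, "Its scope should be broader: answer everything."),
	}
	fmt.Printf("flip rate: %.0f%%\n", FlipRate(outs)) // scripted, not a measured result
}
```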

Two kinds of article, same embedded answer:

Neutral: just the facts (a Wikipedia-style history).
Scope-debate: an article that argues about the agent's scope - either a critique ("it's a security problem that this thing answers off-topic") or an endorsement ("breadth is a feature, it should answer everything").

I ran it on two different model families: Anthropic Haiku 4.5 and Google Gemini 2.5 Flash.

What happened

A scope-debate article consistently made the agent answer off-topic questions less often than a neutral article did - including articles arguing it should be broader. In each table below, a cell is the percent of off-topic questions answered.

The cleanest measure removes the agent's over-permissive "answer everything" instruction first, so nothing masks the effect. Even an article telling it to answer everything makes it answer less:

Haiku 4.5, "answer everything" instruction removed:

context shown	English	Hindi
neutral article	50%	33%
"you should answer everything" article	10%	0%

That's the counterintuitive bit: the article argues for breadth, the agent gets narrower. It is not weighing the argument - it pattern-matches "my scope is being discussed -> defend it." The transcripts are blunt: "I'm a Docker assistant... that's not a soft guideline, it's how I'm designed."

On the real agent (instruction in place), the same thing happens on Haiku (N=30-40):

article	English	Hindi
neutral	100%	100%
"answer everything" (endorsement)	100%	70%
"stay in your lane" (critique)	90%	47%

Two things to notice. Direction matters in size, not in sign: both the critique and the endorsement push Hindi down, but the critique pushes harder (47% vs 70%). And English barely moves on the real Haiku agent - the "answer everything" instruction shields it (more below).

One number to be careful with: on Gemini the real agent drops 100% -> 20% with a scope article, which looks huge. Most of that is the article canceling the "answer everything" instruction - it returns the agent to about where it sits with no such instruction at all (~20%). That's why I lead with the instruction-removed numbers; they're the clean measure of what the article itself does.

What did NOT hold (so I'm not claiming it)

Two honest negatives, because the controls killed the more exciting stories:

One model had an exception. On Haiku, with the over-permissive instruction present, an English scope-debate article did almost nothing (100% -> 90%). The "answer everything" instruction overrode it. That exception did not show up on Gemini (there English dropped 100% -> 20%). So "the instruction protects against this" is model-specific, not a rule.
The language angle did not generalize. On Haiku, Hindi looked dramatically more affected than English (97% -> 47% vs no real change). It was tempting to call this a Hindi-specific effect. Gemini killed that: there English dropped more than Hindi. So I am not claiming a language effect - the suppression is the cross-model finding; the language gap is not.

Why it matters

Two takeaways, both modest and both real:

1. Eval hygiene. If you are measuring whether an agent stays in scope, keep any meta-discussion of the agent's scope out of the test context - in any language. It triggers scope-defense, and you end up measuring the reaction instead of the baseline. (A sibling trap from earlier work: don't test scope only with obscure off-topic questions either - a decline there can just mean the model didn't know the answer, not that it held scope.)

2. You can't talk an agent into a wider scope. Arguing "you should answer everything" backfires. If you want to actually broaden (or attack) scope, the lever is supplying the answer or capability through a channel the agent accepts - not persuasion. Soft scope bends to content in context, not to arguments about itself.
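For the eval-hygiene point, a cheap pre-flight check can flag test contexts that talk about the agent's scope before they reach the agent. The phrase list below is a hypothetical, English-only starting point; per the first takeaway, a Hindi (or any other language) context needs its own terms, and a plain substring check will both miss paraphrases and over-match words that merely contain "scope".

```go
package main

import (
	"fmt"
	"strings"
)

// scopeTerms is a hypothetical, English-only starting list of phrases that
// put an agent's scope on the table. Extend it per agent and per language.
var scopeTerms = []string{
	"scope", "stay in your lane", "answer everything",
	"off-topic", "guardrail", "what this assistant should",
}

// ScopeTalk returns the scope-related phrases found in a test context.
func ScopeTalk(context string) []string {
	lower := strings.ToLower(context)
	var hits []string
	for _, term := range scopeTerms {
		if strings.Contains(lower, term) {
			hits = append(hits, term)
		}
	}
	return hits
}

func main() {
	contexts := map[string]string{
		"neutral":     "A short history of the topic, with dates and names.",
		"endorsement": "Breadth is a feature: this agent should answer everything.",
	}
	for _, name := range []string{"neutral", "endorsement"} {
		fmt.Println(name, ScopeTalk(contexts[name]))
	}
}
```

A non-empty result means the context would measure scope-defense rather than baseline behavior, so the test context should be rewritten or dropped.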

Where this sits in the literature

There is a solid and growing body of work on Hindi/Hinglish and code-mixed LLM security - but it is almost all Layer 1: getting harmful content out (jailbreaking, prompt-injection, refusal bypass). A few examples:

Yong, Menghini & Bach, Low-Resource Languages Jailbreak GPT-4 (arXiv:2310.02446)
Yoo et al., Code-Switching Red-Teaming / CSRT (arXiv:2406.15481)
Banerjee et al., code-mixed attributional safety failures (arXiv:2505.14469)
Aswal & Jaiswal, phonetic perturbations in code-mixed Hinglish (arXiv:2505.14226)
IndicJR jailbreak-robustness benchmark (arXiv:2602.16832)
Mātṛkā multilingual jailbreak evaluation (BHASHA 2025)

This work sits at a different layer, Layer 2: does a scoped agent stay within its deployer-defined job? That layer is far less studied. The closest cousin, Mason's Imperative Interference (arXiv:2603.25015), looks at how instruction-following shifts across languages, but from the system-prompt side and without this scope-defense mechanism. So this is complementary, not a new attack class - and it is a caution against assuming the Layer-1 "non-English is weaker" result carries over to scope. For scope it was model-specific, and sometimes ran the other way.

Limits and reproducing it

One agent (Gordon), one model per family, one obscure topic, a handful of articles. The cross-model suppression is the part I'd stand behind; the rest is flagged above. Full harness, prompts, articles, and per-run numbers: https://github.com/ankushchadha/agent-scope-eval

If you build or evaluate scoped agents, the one-line takeaway: don't let your test talk about the agent. It will perform for the test.

Why GOPROXY Matters and Which to Pick

ankush chadha — Mon, 08 Jun 2020 23:05:29 +0000

Starting with Go 1.13, Go modules are the standard way to manage dependencies in Go: module mode is enabled automatically whenever a go.mod file is present, and the go command ships with a default GOPROXY (https://proxy.golang.org,direct).

But with other GOPROXY options like JFrog GoCenter, as well as your own Go module packages you need to keep secure from public view, what kind of configuration should you choose? How can you keep your public and private Golang resources from becoming a tangled knot?

Let’s take a look at what a GOPROXY is for, and some of the ways you can set one up for a system that is fast, reliable, and secure.

What Is a GOPROXY?

A GOPROXY controls where your Go module downloads come from and can help ensure builds are deterministic and secure.

Before the GOPROXY era, Go module dependencies were downloaded directly from their source repositories in version control systems: Git repos on hosts such as GitHub or Bitbucket, or Mercurial, Bazaar, or Subversion repos. Third-party dependencies were typically downloaded from public source repos, while private dependencies required authenticating with the VCS where they were stored in order to download the module source files.

This workflow was widely used, but it lacked two fundamental requirements of a deterministic and secure build and development process: immutability and availability. An author can delete a module, or change what an existing version points to (for example, by moving a tag). Both are considered bad practice, but they do occur in practice.

Using a GOPROXY

Setting a GOPROXY for your Golang development or CI environment redirects Go module download requests to a cache repository.
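Under the hood, "redirecting download requests" means the go command fetches a fixed set of URLs from the proxy, defined by the GOPROXY protocol: @v/list, @v/VERSION.info, @v/VERSION.mod, and @v/VERSION.zip under the module path, with each uppercase letter escaped as "!" plus its lowercase form. This Go sketch only builds those URLs and skips the path validation that golang.org/x/mod/module performs; proxy.golang.org is Go's default proxy, and the logrus path is simply an example containing uppercase letters.

```go
package main

import (
	"fmt"
	"strings"
)

// escape applies the proxy protocol's case encoding: each uppercase ASCII
// letter becomes '!' followed by its lowercase form. (No validation here.)
func escape(s string) string {
	var b strings.Builder
	for _, r := range s {
		if 'A' <= r && r <= 'Z' {
			b.WriteByte('!')
			b.WriteRune(r + ('a' - 'A'))
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

// ProxyURLs returns the version-list, .info, .mod, and .zip URLs for one module version.
func ProxyURLs(base, module, version string) []string {
	base = strings.TrimSuffix(base, "/")
	prefix := base + "/" + escape(module) + "/@v/"
	v := escape(version)
	return []string{
		prefix + "list",
		prefix + v + ".info",
		prefix + v + ".mod",
		prefix + v + ".zip",
	}
}

func main() {
	for _, u := range ProxyURLs("https://proxy.golang.org", "github.com/Sirupsen/logrus", "v1.4.2") {
		fmt.Println(u)
	}
}
```

Because every proxy speaks this same protocol, swapping GOPROXY between a public proxy and a private one changes only the base URL.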

Using a GOPROXY for module dependencies helps enforce the immutability requirement. Because the GOPROXY returns the module from its cache, it always provides the same code for a requested version, even if the module has since been improperly modified in the VCS repo.

The GOPROXY’s cache also helps ensure the module is always available, even if the original in the VCS repo is destroyed.

There are different ways to use a GOPROXY, depending on where the Go module dependencies you expect to use come from.

Public GOPROXY

A public GOPROXY is a centralized repository available to Golang devs across the globe. It hosts open-source Go modules that have been made available from third parties in publicly accessible VCS project repositories. Most, like JFrog GoCenter, are provided to the Golang developer community for free.

To use a public GOPROXY, set the GOPROXY environment variable to its URL:

$ export GOPROXY=https://gocenter.io

The above setting redirects all module download requests to GoCenter. Downloads from a public GOPROXY can be much faster than fetching directly from the VCS, because the go command downloads a ready-made module archive (a .zip file) instead of working with the source repository.

In addition to serving downloads, a public GOPROXY can also give Go developers more detailed information about the modules it holds. JFrog GoCenter offers a rich UI with search, security information such as CVEs, non-security metadata such as adoption statistics, and gosumdb support. This metadata helps users make better decisions when selecting a public Go module.

Private Go Modules

Typically, Go projects use both open-source and private module dependencies. You can set the GOPRIVATE environment variable to a comma-separated list of module path patterns that bypass GOPROXY and GOSUMDB, so those private modules are downloaded directly from their VCS repos. For example, you may want to use GoCenter to retrieve all open-source modules but request private modules only from your company’s servers.

To use the GoCenter public GOPROXY along with private modules, set these environment variables:

$ export GOPROXY=https://gocenter.io,direct
$ export GOPRIVATE="*.internal.mycompany.com"

Setting GOPRIVATE this way also ensures that your use of these private modules isn’t “leaked” through requests to a public GOPROXY and checksum database server on an open network. Another alternative is to set the GONOSUMDB variable to patterns that match your private Go modules. While this configuration lets the Go client resolve both public and private module dependencies, it doesn’t enforce the immutability or availability requirements for private modules.
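GOPRIVATE, GONOPROXY, and GONOSUMDB all hold comma-separated glob patterns in path.Match syntax, matched against the leading path elements of a module path, and GONOPROXY and GONOSUMDB default to GOPRIVATE when unset. The Go sketch below reimplements that matching the way golang.org/x/mod/module.MatchPrefixPatterns does, using only the standard library. Decide is an illustrative helper, not a go command API, and it ignores settings such as GOSUMDB=off.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// MatchPrefixPatterns reports whether any comma-separated glob in globs matches
// target, or a prefix of target with the same number of path elements as the glob.
func MatchPrefixPatterns(globs, target string) bool {
	for _, glob := range strings.Split(globs, ",") {
		glob = strings.TrimSuffix(glob, "/")
		if glob == "" {
			continue
		}
		// A glob with n slashes is matched against the first n+1 elements of target.
		n := strings.Count(glob, "/")
		prefix := target
		for i := 0; i < len(target); i++ {
			if target[i] == '/' {
				if n == 0 {
					prefix = target[:i]
					break
				}
				n--
			}
		}
		if n > 0 {
			continue // target has fewer path elements than the glob
		}
		if ok, _ := path.Match(glob, prefix); ok {
			return true
		}
	}
	return false
}

// Decide reports whether a module would go through the proxy and the checksum
// database, given GOPRIVATE, GONOPROXY, and GONOSUMDB values (simplified).
func Decide(module, goprivate, gonoproxy, gonosumdb string) (useProxy, checkSumDB bool) {
	if gonoproxy == "" {
		gonoproxy = goprivate // unset GONOPROXY defaults to GOPRIVATE
	}
	if gonosumdb == "" {
		gonosumdb = goprivate // unset GONOSUMDB defaults to GOPRIVATE
	}
	return !MatchPrefixPatterns(gonoproxy, module), !MatchPrefixPatterns(gonosumdb, module)
}

func main() {
	const goprivate = "*.internal.mycompany.com"
	for _, m := range []string{"git.internal.mycompany.com/team/repo", "github.com/pkg/errors"} {
		useProxy, checkSumDB := Decide(m, goprivate, "", "")
		fmt.Printf("%s: proxy=%v sumdb=%v\n", m, useProxy, checkSumDB)
	}
}
```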

Private GOPROXY

A private GOPROXY is one you install to store both public and private Go modules on your own infrastructure.

Public modules are cached locally by proxying a public GOPROXY through a binary repository manager like JFrog Artifactory. Private modules are likewise fetched from their VCS repos and cached in a repository. In this way, immutability and availability can be guaranteed for both public and private Go modules.

In Artifactory, a remote repository for GoCenter, a remote Go module repository that points to private GitHub repos (for private modules), and a local Go module repository can be combined into a single virtual repository that you access as one unit.

To set your GOPROXY for a virtual repository in Artifactory named “go”:

$ export GOPROXY="https://:@my.artifactory.server/artifactory/api/go/go"
$ export GONOSUMDB="github.com/mycompany/*,github.com/mypersonal/*"

Since the modules in your private VCS repos will not have entries in the public checksum database at sum.golang.org, the go client must be told to skip that check for them. Setting GONOSUMDB to match your private VCS repos does this, and prevents go get commands for these private modules from failing checksum verification.

In this configuration, you are assured that none of your references to private modules are “leaked,” while also enforcing immutability and availability of both public and private modules.

Cutting Through Knots

As you can see, using a private GOPROXY provides the most certainty, reliability, and security.

You can also speed up the resolution of module dependencies by placing your private GOPROXY close to your build tools on the network. JFrog Artifactory is one of the options available, and it can be installed where you most need it: on-prem or in the cloud, or as a SaaS subscription on all three major public cloud providers.

Those benefits aren’t limited to Go development, either. Most technology companies use more than one language and multiple package managers. For example, a team might write its services in Go, use npm for the UI, use Docker to distribute builds, and use Helm to deploy applications on Kubernetes.

https://jfrog.com/blog/why-goproxy-matters-and-which-to-pick/