
Scott Raisbeck

Did some actual coding today - found a blind spot example for coding agents

TL;DR - Wrote some code for a new feature, had to refactor existing code to ensure we didn't have a double spend on sorting, tested to see if coding agents would spot the same issue - they didn't.

So for the past few weeks I have been busy as hell; my GitHub activity bar, like so many of ours now, is lit up like a Christmas tree.

I am now more or less 100% utilising coding agents, with the exception of producing 'Walking Skeletons' for anything new that I haven't tackled before (usually still with the help of AI - more to build an understanding of it than anything else). Most of this activity is taking place in Go projects, as that is something I've been learning over the past few months.

I realised the other day that while I had been reviewing a lot of Python and C# (my more native languages), I would struggle to write something from scratch on my own again without using an AI.

I have been maintaining an open source memory MCP server for AI agents, and I wanted to add some additional reranker providers.

I figured it would be a straightforward implementation: add support for a generic HttpProvider, letting users call the majority of cloud reranker providers but also allowing them to use self-hosted rerankers such as those served by llama.cpp or vLLM.

My existing reranker code for FastEmbed (the original reranker provider) was implemented with Protocols to allow for further expansion and additional adapters. The Protocol dictated that the reranker return a simple list of floats - the relevance score of each document to the query, in the order the documents were presented - and the calling method in the memory repository would handle ordering them into the appropriate rank for top_k filtering.
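To make the shape of that contract concrete, here's a minimal sketch - the names RerankAdapter and KeywordReranker are mine for illustration, not the project's:

```python
from typing import Protocol


class RerankAdapter(Protocol):
    """Sketch of the contract described above: scores come back
    in the same order the documents were passed in."""

    async def rerank(self, query: str, documents: list[str]) -> list[float]:
        ...


class KeywordReranker:
    """Toy adapter for illustration: scores each document by the
    fraction of query terms it contains."""

    async def rerank(self, query: str, documents: list[str]) -> list[float]:
        terms = set(query.lower().split())
        return [
            len(terms & set(doc.lower().split())) / max(len(terms), 1)
            for doc in documents
        ]
```

Any class with a matching async `rerank` method satisfies the Protocol, which is what makes swapping in new providers cheap.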

Here's the code in the memory repository:

# Scores come back in the same order as the documents we sent
scores = await self.rerank_adapter.rerank(query=rerank_query, documents=documents)

# Pair each candidate with its score, then sort by score, highest first
scored_candidates = list(zip(dense_candidates, scores))
scored_candidates.sort(key=lambda x: x[1], reverse=True)

# Keep only the top-k ranked memories
top_k_memories = [memory for memory, score in scored_candidates[:k]]

When I came to implement the HttpAdapter, I noticed the rerank API was returning both the original index and the score for each document, already sorted by score.
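For context, here is roughly what that looks like - the exact field names vary between providers, so treat this response shape as an assumption based on common v1/rerank APIs rather than any specific vendor's contract:

```python
# Assumed v1/rerank response shape: results arrive pre-sorted by
# relevance_score, each carrying the index of the original document.
response = {
    "results": [
        {"index": 2, "relevance_score": 0.91},
        {"index": 0, "relevance_score": 0.45},
        {"index": 1, "relevance_score": 0.07},
    ]
}

documents = ["doc a", "doc b", "doc c"]

# Mapping results straight back to documents preserves the API's
# ranking - no extra sort required on our side.
ranked = [(documents[r["index"]], r["relevance_score"]) for r in response["results"]]
```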

I realised at this point that I needed to refactor the repository and the existing adapter so that the repository would now expect the reranking adapter to handle the ordering. Otherwise, if we left it as is, the new HTTP adapter would have to re-order the results back to the original document order and hand them to the repository, only for the repository to re-order them by score again.
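The refactored contract can be sketched like this - a hypothetical helper, not the actual repository code - where the adapter now returns (index, score) pairs already ordered by score and the repository just slices:

```python
def select_top_k(
    candidates: list[str], ranked: list[tuple[int, float]], k: int
) -> list[str]:
    """Take (index, score) pairs the adapter has already sorted by
    score (descending) and return the top-k candidates in that order."""
    return [candidates[i] for i, _score in ranked[:k]]
```

With this shape, the HTTP adapter passes the API's ordering through untouched, and the FastEmbed adapter does its one sort internally before returning.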

Just one of those moments where you realise you need to do a little more work - nothing major, though. I made the change and the tests still passed. I then asked Claude whether I might have missed any impact, and it confirmed I had not.

I then proceeded to implement the rest of the HttpAdapter code, when I suddenly had a thought: would Claude, or any of the other models, have picked this up, or would they have just gone ahead and implemented the double spend on sorting?

So I stashed the changes and fired up some coding agents to see how they would do. My workflow was simple: I used my context gather command and then entered the following:

Create a plan for implementing a http reranker adapter using httpx, the user should be able to configure a reranker http endpoint, model, api key (optional), as well as specifying a reranking provider of http. When the rerank provider is set to HTTP, we should use the provided environment variables to make http calls to the v1/rerank endpoint.

When the http provider is configured then the memory repository should use it for ranking memories

I tried with three models:

  • Claude Opus 4.6 (Claude Code Agent Harness)
  • Codex 5.3 (Co-pilot CLI Agent Harness)
  • Gemini Pro 3.0 (Co-pilot CLI Agent Harness)

All of the models implemented a change that re-sorted the scores back to the original document order we had sent, which the repository would then sort by score all over again. Effectively a double spend on sorting.
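A toy illustration of what that double spend looks like - undoing the API's ordering only to immediately recompute it:

```python
# The API returns (index, score) pairs already sorted by score.
api_results = [(2, 0.9), (0, 0.4), (1, 0.1)]

# What the agents implemented (sketched): restore original document order...
scores_in_doc_order = [score for _index, score in sorted(api_results)]

# ...so the repository can sort by score all over again.
resorted = sorted(
    enumerate(scores_in_doc_order), key=lambda x: x[1], reverse=True
)

# Both sorts together recover exactly the ordering the API already gave us.
assert [i for i, _ in resorted] == [i for i, _ in api_results]
```

Two O(n log n) passes to arrive back where the HTTP response started.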

Now, to be clear, the models might have been trying to stick to the task at hand - the prompt did not mention any optimisation. My global Claude.MD file (which my Copilot CLI is also reading) contains some development philosophy that encourages only making changes needed to achieve the objectives, but it also includes the following in its list of Development Philosophy:

When extending existing code, understand why the current design works the way it does, not just what it does. Don't assume existing patterns are optimal - if something seems off, raise it with the user before proceeding rather than silently conforming or silently refactoring.

There is good reason you might not want them to go off on a tangent refactoring everything; at the same time, though, none of the models highlighted this inefficiency or even considered it an issue.

For me, when I encounter something like this, I think about my guardrails and prompts, and to some extent there might be something I can come up with. But part of me was also brought back to a recent talk by Professor Michael John Wooldridge at the Royal Society, 'This is not the AI we were promised', which I highly encourage people to watch.

Anyhow, that's me for my Sunday musings.
