DEV Community

Articles in the series "Building an AI agent that works in production"

ThomasP

Mar 17

Why finding where a product is made is an AI problem

#ai #machinelearning #webdev #beginners

9 min read

ThomasP

Mar 23

The prompt engineering that didn't work (and what did)

#ai #llm #promptengineering #machinelearning

9 min read

ThomasP

Mar 27

Why your LLM agent needs a benchmark before it needs a prompt

#ai #llm #agents #testing

8 min read

ThomasP

Mar 28

GPT-5.1 scored 26%. Gemini 3 Flash scored 74%. Same prompt, same tools.

#ai #llm #benchmark #agents

8 min read

ThomasP

Apr 8

LLM-as-Judge: using Claude to review a Gemini agent

#ai #llm #agents #evaluation

7 min read