DEV Community: Alex Chaplinsky

Postman for AI – a tool that has been missing for a while

Alex Chaplinsky — Thu, 19 Mar 2026 07:54:12 +0000

Everyone’s building AI agents nowadays. I have also been building AI agents for several years now. At some point, I got tired of the same friction: vendor playgrounds that don’t support variables or multi-provider comparisons, cloud SaaS tools that are great at one thing (logging, evals, tracing) but force you to stitch four of them together, and all of it routing your prompts and API keys through someone else’s servers.

So I (with the help of AI and other humans) built Reticle — a local desktop app for designing, running, and evaluating LLM scenarios and agents.

Yes, you could do all of this with code. Just like you can test APIs with curl. But Postman exists for a reason — the right GUI collapses the feedback loop, makes iteration faster, and catches issues you'd only find after shipping. That's what Reticle is trying to be for AI development.

Here’s how I built it, and the key decisions behind each part.

The stack: Tauri + React + Rust + SQLite

Reticle is a Tauri app — a React frontend with a Rust backend, packaged as a native desktop binary. Everything lives locally: all scenarios, agents, test cases, run history, and API keys are stored in a SQLite database on your machine. No account, no sync, no cloud.

The Local-first approach was the main idea for this app. Your keys never leave your machine. Your prompts and traces are yours alone. There’s no subscription standing between you and your own development environment — and no vendor lock-in on your iteration history.

There’s also a practical performance angle: a local SQLite database with no network round-trips is fast in a way that cloud-backed tools simply can’t match for tight iteration loops. When you’re running evals across 50 test cases, that difference is noticeable.

This wasn’t just a privacy checkbox for me — it was a first-class design constraint. Everything else in Reticle’s architecture follows from it.

Scenarios and Agents: the two first-class citizens

Reticle is built around two core primitives. Scenarios are single-shot LLM calls — a system prompt, conversation history, model config, and variables. Agents are ReAct loops where the model reasons, calls tools, gets results, and iterates until it reaches a final answer.

Scenarios are where you work out what the model should say and how it should say it. The {{variable}} syntax lets you define a prompt template once and fill in values per test run, because in production, prompts are always templates, and testing them with hardcoded strings doesn't reflect real behavior. The same scenario can be run across OpenAI, Anthropic, and Google in one click, with outputs, latency, and cost landing side by side for a direct comparison.

Agents are where things get more complex and more interesting. The hard part of building agents isn’t getting them to work; it’s debugging them when they don’t. Every agent run streams a structured event log in real time: loop iterations, LLM requests with exact messages, tool calls with arguments and results, token usage and latency per step. When the model passes a wrong argument to a tool or gets stuck in a loop, you see exactly where and why.

Runs history and built-in usage tracking

Every run also records token usage and calculates an estimated cost based on each model’s published per-token pricing. It’s an approximation, not your exact cloud invoice. But it’s accurate enough to answer the questions that actually matter during development: which model is cheapest for this use case, how much does a full eval suite cost to run, and why did that last agent run cost 10× the previous one. That last question usually has an answer — a runaway loop, an unexpectedly long context, a model being used where a smaller one would’ve been fine. Token-level visibility makes it findable instead of mysterious.

All runs are stored and fully inspectable after the fact. You can go back to any previous execution, see exactly what the model received and returned, and compare behavior across runs. This matters more than it sounds: the number of times I’ve made a change, gotten a worse result, and had nothing to compare against used to be embarrassing. Now the history is just there.

Evals: text matching, schema validation, and LLM-as-judge

The eval system is what helps an engineer sleep at night. Prompt changes, model upgrades, and tool updates are all silent regressions waiting to happen. Unless you have assertions in place.

Reticle supports five assertion types: contains/equals/not_contains for text, json_schema for structured output validation via AJV, tool_called/tool_sequence for verifying agent behavior, and llm_judge for anything subjective.

The llm_judge type is the one that is useful for non-deterministic outputs. You write a criteria statement in plain English — "the response should be empathetic and avoid technical jargon" — and delegate evaluation to a configurable model (default: gpt-4o-mini) at temperature 0. It returns PASS or FAIL with a reason. For a huge category of real-world outputs that can't be rule-checked, this makes testing practical.

Try it / tear it apart

Reticle is open source and in public beta. If you’re building agents and the current tooling landscape feels as fragmented to you as it did to me, give it a try.

Download: reticle.run
Source: github.com/fwdai/reticle
I’d genuinely love feedback — what’s missing, what’s wrong, what you’d prioritize. And I’m curious: how are you building and testing your agents today? What does your workflow look like? Drop it in the comments — I’m always looking to learn how others are approaching this.

If you made it till the end, give this project a ⭐ on GitHub, this helps a lot!

Online Security 101 or why you Should use a Password Manager

Alex Chaplinsky — Wed, 13 Nov 2019 21:49:12 +0000

Passwords are part of our daily life whether you log in on any website or enter a passphrase for your ssh key. Every person who works with a computer enters at least one password a day.

The majority of people use very weak passwords and reuse them on different websites and rarely change them. Mostly because they don't fully realize how dangerous it might be to use an easily-guessable password or even worse use the same password for every single website. Those are anti-patterns of personal cybersecurity.

Nobody likes remembering long and complex passwords, especially when you need to change them frequently and memorize a new one again. But in order to be sure that your accounts are not an easy target for hackers it is better to follow a couple of simple rules with passwords.

1. Use long and hard to guess passwords

It is definitely not a good idea to use your pet name as a password. Well, actually any name or commonly used words. Attacker already has a couple of giant dictionaries with commonly used words to iterate over and try to match your password.

Combining a couple of words together would also not help. The best solution is to come up with more than 10 characters of letters, numbers and special symbols jumbled together. Something that would be definitely hard to find in a dictionary :)

2. Use a unique password for every website (or service)

Using one password for all your accounts definitely makes it easier to remember credentials for all websites and services. And nobody will ever know, right?

Not really, if an attacker hacks a website and steals all passwords from the database. He now has an ability to log in with your credentials to all other services that you're using. Even though passwords in the database are probably encrypted (well, in fact, they should be!), a hacker can also run your encrypted passwords against a couple of rainbow tables and find a match. And voila, he knows your real password!

So please, never use the same password for multiple different services.

3. Change passwords every once in a while

It is a good habit to change passwords every 3-6 months. This reduces the risk of being pwned. If your password gets leaked into some database that hackers sell to interested parties, it would be useless for them if you change your passwords frequently enough.

So that being said, we have to admit that it is hard to come up with a password complex enough and memorize it when you are creating a new account on some website. But fortunately, you don't have to. A Password Manager can help you with this!

Password Manager is basically a piece of software that helps you to generate strong and unique passwords for every account, store them securely on your device and rotate them as frequently as you want. All those credentials are kept encrypted and secured with one master password that you need to remember.

There are of course a lot of different password managers out there. Some of them can work offline and store data on your computer and some of them sync your encrypted data to their web servers. Which is ok, since data is encrypted and can only be decrypted with a master password that only you know.

Swifty is a Free Password Manager which works offline by default and keeps all your sensitive data encrypted on your computer. It is a simple tool like a notepad where you can write all your passwords and keep them for yourself but a lot more secure than regular notepad :)

Swifty also helps you to easily generate passwords containing letters, numbers and special characters with length up to 50 characters. It also tells you if you have duplicate passwords or passwords older than 6 months (since they were added to Swifty).

If you want to feel a bit safer and not be afraid of losing all your credentials in case your hard drive dies you can sync your encrypted vault file to your personal Google Drive. And then later you can restore your data with Swifty on another computer. Still, no data is sent to third-party web services. Just to your personal GDrive.

And also Swifty is FREE and Open Source! So you can easily check out its code and maybe contribute ;)