DEV Community: lukas kunhardt

How to avoid locking yourself into one AI provider

lukas kunhardt — Tue, 16 Jun 2026 12:39:00 +0000

Most model providers give you two ways to work: a “chat” and a “code” environment. Most people just use chat, but for anything you’ll come back to later the code environment is actually the better choice, because it allows you to work in your own file system.

The name “code” is a bit misleading, because the difference isn’t about doing more or less programming (I use it for many tasks that have nothing to do with coding), but about where your work ends up.

In chat, everything you produce lives in the chat history on the provider’s server. The inputs (files you drag in), the outputs (files the model creates for you), and the whole conversation in between are accessible to you only through the chat interface.

With a coding agent you point it at a folder on your own machine, and all the inputs and outputs are just files sitting on your disk.

This matters when you want to switch providers. If they raise their prices, or another model gets better, or their best model is no longer accessible outside of the US, you’ll likely want to move. If all your work lives inside one provider’s chat interface, that’s difficult to do.

So the thing you want is to keep the model separate from your files. The model is just a processor you’re renting; the files are the part you actually care about. It makes sense to store them somewhere that doesn’t depend on which model you happen to be using this month.

There’s a great essay called “File over app” by Obsidian’s CEO Steph Ango that makes this argument: store your work in simple formats you control instead of locking it inside whatever app created it. A few reasons this helps:

if an app or provider goes out of business, your work is still there
your files are on your own machine, so you can get to them offline
multiple programs / users / AI agents can read and write the same file

That last part I find especially useful when working together with AI, because it allows me to actually work alongside whatever AI agent I am using. My usual setup is Obsidian open on the right half of my screen and Claude or Codex on the left, both pointed at the same project folder.

I can edit the same documents the agent is working on, and my changes show up for it immediately. Sometimes I use Claude, sometimes Codex - there’s no switching penalty, because neither of them is holding onto anything.

Another problem I often see is that people don’t make files at all. When the model produces a good analysis, or you come to a decision, or you finally figure something out, it shouldn’t just stay in the conversation; it should be pulled out into its own document.

If you come back a month later you just open a folder, rather than finding the relevant chat and then re-reading a transcript trying to find the part that mattered.

So the rule I follow is pretty simple: if it’s a one-off thing I’ll never need again, I use chat (chat basically replaces googling something). If it’s something I’ll come back to, I’ll use a coding agent in a dedicated folder.

When I start a project I make a new folder, put the relevant context in it, and point the agent at it. When the agent produces something useful I have it write the result back into the folder as a document, table, or whatever digital artifact is most appropriate. Next time I don’t have to find the right provider and then the right chat, I just open the Projects folder.
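A minimal sketch of this folder routine in Python, for readers who want to script it. The function names, the `context.md` file name, and the dated file naming scheme are all assumptions made for illustration; the workflow described above only requires a dedicated folder that holds the context and the results.

```python
from datetime import date
from pathlib import Path


def start_project(root: Path, name: str, context_notes: str) -> Path:
    """Create a dedicated project folder with a context file an agent can read.

    Hypothetical helper: 'context.md' is an assumed file name, not a convention
    required by any particular coding agent.
    """
    project = root / name
    project.mkdir(parents=True, exist_ok=True)
    context_file = project / "context.md"
    # Never overwrite context that has already been gathered.
    if not context_file.exists():
        context_file.write_text(context_notes, encoding="utf-8")
    return project


def save_result(project: Path, title: str, body: str) -> Path:
    """Store a useful result as its own dated document inside the project folder."""
    slug = "-".join(title.lower().split())
    path = project / f"{date.today().isoformat()}-{slug}.md"
    path.write_text(f"# {title}\n\n{body}\n", encoding="utf-8")
    return path
```

Calling `start_project(Path.home() / "Projects", "taxes", "...")` once and `save_result(...)` whenever something is worth keeping leaves everything in plain files that any agent, or any text editor, can open later.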

Another advantage of keeping your files separate from the model is that they keep getting more useful over time as the models improve.

Models change fast, and the one you use today probably won’t be the one you use next month. But your project folder mostly just sits there and grows slowly as you keep adding to it.

This matters most for the things you only touch now and then. You might open your tax return folder once a year, and by the time you come back the models have changed completely. If everything lived in last year’s chat you’d be starting over. If it lives in files, you just point this year’s model at it.

The way I think about it is that you have two things growing independently. One is your context, which is all the information and insights you’ve gathered on a project. The other is model capability. And what you get out of it is roughly the two multiplied together.

The same folder of tax documents is worth more next year than this year, even if you didn’t change anything, just because the model working on it got better. This year the model might not be able to actually file your taxes for you. Next year it probably can - and when it can, it can do it against everything you already gathered.

I try to think of myself as a context hoarder. The job is to gather everything a future model would find useful and get it out of my head and out of the chat history into structured files in the right folder.

That way my own knowledge and whatever the model figures out along the way both end up in the same place - not stuck in my brain, not stuck in a transcript, just sitting in files, waiting for whatever model comes next.

Don't try to delegate your understanding

lukas kunhardt — Tue, 16 Jun 2026 12:38:15 +0000

I have been developing headaches and brain fog when working with Claude lately - and it’s actually due to an improvement: It can now work for long stretches without me.

Working with recent frontier models with subagent-heavy “workflows” enabled feels less like a chat and more like email:

Long complicated instructions in … wait … long complicated result out.

Obviously I don’t sit there waiting for the agents to finish; I pick up another todo in a new tab. The more complex the tasks, the longer the wait, and the more tasks I end up juggling.

I end up waiting for one of the 4-5 Claude instances I have working simultaneously to finish.

This is mentally taxing because it forces me to switch tasks constantly, and the outputs I have to verify tend to be more complex.

Increasing the model’s output quality by having it work on the problems for longer made the interaction asynchronous in nature.

Depending on the task, this kind of async interaction is either exactly what I want or counterproductive.

I split the tasks I have to work on into two categories:

Tasks where I know what good output looks like
Tasks where I have to figure out what good output looks like

If the desired output is already known and easily verifiable, I want to delegate the task and have the model disappear for an hour and report back to me when it’s done.

But for all other tasks, tasks where I am actively building an understanding of what it is that I want to build, I actually want a conversational interaction at a speed as close to real time as possible. That way I can stay focussed on working on one thing at a time.

So there are essentially two modes I need:

One “thinking partner” interactive mode where responses are as fast as possible, with sufficient output quality to augment my ability to research and implement while staying inside the same thread of thought
One “fire and forget” delegation mode that finishes tasks for me that I can easily verify with little to no effort (currently either Claude or Codex)

The thinking partner is used to build an understanding, and this is arguably the most important part. LLMs make it trivial to create huge amounts of “volume”, so quality is more than ever the distinguishing feature.

The only way to output better quality work than other people using the same models is to augment the model in some area it’s lacking, so it’s necessary to understand the problem as deeply as possible.

I experimented with different models and currently use Cursor’s Composer 2.5 model for these types of interactive tasks, which is incredibly fast.

I analyzed my response times from the last couple of days, and Composer’s average was around 25 seconds. For comparison, my average Claude response takes about 80 seconds, rising to an average of about 6 minutes in sessions with subagents.
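The post does not say how these averages were measured. One way to get comparable numbers, sketched here under the assumption that you can export a (sent, answered) timestamp pair for every turn, is to average the differences. The function name and the sample timestamps below are made up for illustration.

```python
from datetime import datetime
from statistics import mean


def average_response_seconds(turns):
    """Average time between sending a prompt and receiving the answer.

    `turns` is an iterable of (sent_at, answered_at) ISO 8601 strings.
    Hypothetical helper: the log format is an assumption, not a format
    produced by any particular tool.
    """
    durations = [
        (datetime.fromisoformat(answered) - datetime.fromisoformat(sent)).total_seconds()
        for sent, answered in turns
    ]
    if not durations:
        raise ValueError("no turns to average")
    return mean(durations)


# Made-up sample data: two turns that took 20 and 30 seconds.
sample_turns = [
    ("2026-06-15T09:00:00", "2026-06-15T09:00:20"),
    ("2026-06-15T09:05:00", "2026-06-15T09:05:30"),
]
print(average_response_seconds(sample_turns))  # prints 25.0
```

Running the same function over separate logs, one per tool or per mode, gives numbers that can be compared directly.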

Don't make this personal

lukas kunhardt — Tue, 16 Jun 2026 12:37:46 +0000

The reason LLMs behave sycophantically when answering your questions is that you unnecessarily cast yourself as a character in a relationship.

Consider this question:

This idea is clearly far from “fantastic”, so Gemini is now giving you bad advice.

The answer is a product not only of the model’s understanding of terrible startup ideas, but also of its understanding of relationships.

Because of how the question is framed, the model now simulates an interaction between you, the person excitedly pitching their “great” idea, and an I, which doesn’t exist - it will now start emulating some personality, and suddenly interpersonal dynamics come into play. (You wouldn’t want to break your friend’s heart by telling them their startup idea sucks, would you?)

This issue is inherent to the way the models are created. After pretraining (where the model learns to predict the next word), the model goes through a second phase called RLHF (reinforcement learning from human feedback). In this phase, the model is fine-tuned to give answers that people rate highly.

The issue is that humans tend to like being affirmed more than being challenged, hence the models develop a tendency to be suckups.

Let’s compare our previous attempt to this question:

The model clearly didn’t spare your feelings in any way in this response, mostly because it didn’t know it was your idea in the first place.

If you ask an LLM and a human the same question, the LLM’s answers will be about 50% more sycophantic than the human’s - it will take your side, even when you are wrong.

When the model tries to guess what you want to hear, the truthfulness of its output gets significantly worse.

Try removing the “you” and the “I” from the question, so that the model isn’t tempted to flatter you.
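The difference between the two framings can be sketched as two prompt builders. The wording of both prompts and the small word check are hypothetical examples written for illustration; they are not the questions from the original screenshots.

```python
import re

# Words that cast the asker, or the model, as a character in the exchange.
PERSONAL_WORDS = {"i", "i'm", "me", "my", "mine", "you", "your"}


def personal_prompt(idea: str) -> str:
    """Framing that presents the idea as the asker's own, which invites flattery."""
    return f"I had this idea and I'm really excited about it: {idea} What do you think?"


def neutral_prompt(idea: str) -> str:
    """Framing with no 'you' and no 'I': only the idea itself is up for assessment."""
    return (
        "Assess the following startup idea. "
        "List its strongest points and its most serious weaknesses.\n\n"
        f"Idea: {idea}"
    )


def mentions_asker(prompt: str) -> bool:
    """Rough check for first- or second-person words in a prompt."""
    words = re.findall(r"[a-z']+", prompt.lower())
    return any(word in PERSONAL_WORDS for word in words)
```

A check like `mentions_asker` is only a crude reminder before sending a prompt; the real work is rewriting the question so the idea has no owner.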

Best Dictation for Mac

lukas kunhardt — Tue, 16 Jun 2026 12:36:52 +0000

Dictating text is about 3 to 4 times faster than typing, so it really makes sense to give it a shot if you haven’t tried it already.

If you happen to have one of the recent Mac models with an M-series chip (even an M1), it contains a really powerful Neural Engine that makes it unusually good at running powerful transcription models locally. There’s really no need to use cloud services.

I tried a lot of different dictation tools over the last couple of months, e.g. Whisperflow, Mac Whisper, and now Snaply.ai, and I have settled on Snaply.ai as the best one. (no affiliation)

Winning points:

it’s completely free for individuals, with no features gated behind a paywall
runs on your own Mac (works offline)
clean speaker separation when transcribing meetings, and does so without joining your calls
European company (founded in Zurich, founder is from Trieste, Italy)

Caveats:

Whisperflow transcribes more accurately, but Snaply.ai is good enough that the difference is too small to matter.

The simplest way to improve LLM Answers

lukas kunhardt — Tue, 16 Jun 2026 12:21:31 +0000

If you ask a language model a question, you'll get a prediction of the internet's average answer - whatever is said most often in reply to a question like yours.

You improve the output with one additional step: you force the model to first reason about which specific person would be best to answer this question, and then answer as them.

This works so well because experts in each field usually have a large public body of writing (books, lecture transcripts, blogs, etc.) that went into the model's training data. The model therefore has a great understanding of how an expert thinks, and simulating "what does Paul Graham think about my startup idea" pulls far better answers than the crowd average.

I initially got the idea for this approach from this Karpathy tweet:

Since then I have experimented with it and found that the best way to use this framing is to have the model really pick one specific person, or a panel of specific people to discuss this, and first explain why the chosen people are particularly well suited to answer this question. This seems to reinforce the character simulation.
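The framing described above can be sketched as a small prompt builder. This is an illustration only: the function name and the exact wording are assumptions, and it is not the content of the skill mentioned below.

```python
def expert_prompt(question: str, panel_size: int = 1) -> str:
    """Build a prompt that makes the model choose its expert(s) before answering.

    Hypothetical helper: the three-step wording is an assumed phrasing of
    "pick specific people, explain why, then answer as them".
    """
    if panel_size < 1:
        raise ValueError("panel_size must be at least 1")

    if panel_size == 1:
        who = "the one specific, real person"
        voice = "as that person would"
    else:
        who = f"a panel of {panel_size} specific, real people"
        voice = "as a discussion between the panel members"

    return "\n".join([
        f"Step 1: Name {who} best placed to answer the question below.",
        "Step 2: Explain why this choice is particularly well suited to it.",
        f"Step 3: Answer the question {voice}.",
        "",
        f"Question: {question}",
    ])
```

Putting the "explain why" step before the answer keeps the order described above: the justification comes first, so the answer is written after the model has committed to its chosen people.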

I turned this into a reusable skill. Install it with one command, npx skills add lukaskunhardt/skills, or grab it on GitHub.