AI can make your product feel elevated. It can also make it feel unreliable, expensive, and risky. So the real question isn’t “Should I add AI?” but rather:
“Can I add AI without breaking my UX or my budget?”
Building with LLMs is not like integrating a normal API.
You’re dealing with:
- Non-determinism: the same prompt can yield different outputs
- Tokens: every request/response has a budget that drives cost + latency
- New attack surfaces: prompt injection, data leakage
- Privacy constraints: you may be sending sensitive user text to a third party
And there’s a mindset trap too:
- AI isn’t your product (most of the time). It’s just a tool from your larger engineering toolbox that can elevate your stack if integrated properly
- LLMs aren’t truth machines. If you treat them like one, your UX will suffer. The models can be prone to hallucinations and need to be monitored with guardrails in place.
- Overfitting your UX to AI often makes the product worse, not better, because of the added complexity
I have been leveraging LLMs over the last few months, and here are some of my key learnings:
Start with mindset, not models.
Before I write a single prompt, I ask myself:
- What user pain does this solve today?
- What’s the “manual” version of this flow? And how does AI improve the experience, without taking control away?
- A rule of thumb for my app: I only add AI when I can justify it through discovery and feedback, and I usually layer it on top of an existing non-AI flow first. That gives me a baseline UX that still works when the AI is slow, wrong, or unavailable.
Technical discovery is time well spent.
For larger features, I stretch technical discovery across a few days, exploring different use cases. It’s a habit that helps me gain the user’s perspective.
What I actually did during my last discovery session:
- Wrote acceptance criteria (what “good” looks like)
- Sketched the visual journey in a tool like Excalidraw
- Decided where users need control (accept/reject AI-generated outputs)
- Defined the scope for faster iterations and early feedback
Treat every AI feature like a pipeline, not a simple function call.
When implementing AI features, I think in terms of a sequential pipeline (sketched in code below):
- Normalize input
- Sanitize and redact (privacy-first)
- Schema validation and assertions
- Trigger logic (Threshold)
- Generation logic (prompt + tool choices)
- Post-processing (format, structure, safety)
- Delivery (UI controls, logging, persistence)
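To make that concrete, here is a minimal TypeScript sketch of the idea. Every name in it (shouldTrigger, generate, postProcess, and so on) is a hypothetical stand-in for your own stages, not code from my app or any library:

// A minimal sketch of the pipeline idea: each stage is a small, testable function.

type PipelineResult =
  | { status: 'generated'; output: string }
  | { status: 'skipped'; reason: string };

function shouldTrigger(input: string): boolean {
  // Trigger logic (threshold): only call the model when it is worth it.
  return input.length >= 40;
}

async function generate(input: string): Promise<string> {
  // Generation logic: in a real pipeline this calls your LLM provider
  // with a tight system prompt and a few few-shot examples.
  return `Suggested next step for: ${input.slice(0, 40)}`;
}

function postProcess(raw: string): string {
  // Post-processing: format, structure, safety checks.
  return raw.trim();
}

async function runAiFeaturePipeline(rawInput: string): Promise<PipelineResult> {
  // 1. Normalize input.
  const normalized = rawInput.trim().replace(/\s+/g, ' ');

  // 2. Sanitize and redact (privacy-first), e.g. strip email addresses.
  const sanitized = normalized.replace(/[\w.+-]+@[\w-]+\.[\w.]+/g, '[redacted-email]');

  // 3. Validate the input before spending tokens on it.
  if (sanitized.length === 0) {
    return { status: 'skipped', reason: 'empty input' };
  }

  // 4. Trigger logic (threshold).
  if (!shouldTrigger(sanitized)) {
    return { status: 'skipped', reason: 'below trigger threshold' };
  }

  // 5-6. Generate, then post-process.
  const output = postProcess(await generate(sanitized));

  // 7. Delivery (UI controls, logging, persistence) happens in the caller.
  return { status: 'generated', output };
}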
This pipeline approach solves two problems at once:
- It tames non-deterministic outputs with constraints and checks
- It makes the system observable, so potential risks get uncovered instead of slipping through
How I control output quality:
- Strict assertions where possible (format, JSON shape, required fields)
- Tight system prompts + a few high-quality examples of input/output pairs (few-shot)
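For the first point, a strict assertion can be as simple as a parser that refuses anything off-contract. The AiTaskSuggestion shape below is made up for illustration:

// Hypothetical expected shape for an AI-generated task suggestion.
interface AiTaskSuggestion {
  title: string;
  description: string;
}

// Returns null when the model's output doesn't match the contract,
// so malformed generations never reach the UI.
function assertAiTaskSuggestion(raw: string): AiTaskSuggestion | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // not even valid JSON
  }

  if (typeof parsed !== 'object' || parsed === null) return null;
  const { title, description } = parsed as Record<string, unknown>;

  if (typeof title !== 'string' || title.length === 0) return null;
  if (typeof description !== 'string') return null;

  return { title, description }; // anything else gets logged and falls back
}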
Assume latency and cost will spike. Then design against it.
If your AI feature becomes popular, you will feel it in both your bill and your latency.
What I put in place early:
- Feature-level AI usage logging (so I know what’s expensive)
// `completion` is the response object returned by the OpenAI chat completions call
await aiUsageLogger({
  userID: userID,
  type: 'AI_PROMPT',
  model: 'gpt-4o-mini',
  inputTokens: completion.usage?.prompt_tokens ?? 0,
  inputCachedTokens: completion.usage?.prompt_tokens_details?.cached_tokens ?? 0,
  outputTokens: completion.usage?.completion_tokens ?? 0,
});
- Rate limiting + hard daily caps (avoid surprise bills)
- Deduping (event table + hash keys)
- Similarity checks to avoid “same output, different words” fatigue
- Feature flags to ship safely and roll back fast
- Breadcrumbs + Alerts in Sentry across the whole pipeline for visibility. When the feature fails, I want to know where and why in seconds.
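As an illustration of the deduping idea, here is a simplified sketch that hashes (user, feature, normalized input) into a key. The real version checks an event table in the database; the in-memory set below is just a stand-in:

import { createHash } from 'node:crypto';

// Stand-in for the event table: in production this lookup hits the database.
const seenPromptHashes = new Set<string>();

function promptHashKey(userId: string, feature: string, normalizedInput: string): string {
  return createHash('sha256')
    .update(`${userId}:${feature}:${normalizedInput}`)
    .digest('hex');
}

function shouldGenerate(userId: string, feature: string, normalizedInput: string): boolean {
  const key = promptHashKey(userId, feature, normalizedInput);
  if (seenPromptHashes.has(key)) {
    return false; // same user, same feature, same input: skip the model call
  }
  seenPromptHashes.add(key);
  return true;
}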
Evaluate like an engineer, not like a researcher.
Early on, I don’t start with “LLM-as-a-judge.” It adds complexity fast.
My default is simpler:
- Manual review of recent traces
- Bottom-up error analysis: group failures, count patterns, fix the top ones
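A tiny sketch of what that counting step can look like; the trace shape here is made up for illustration:

// Hypothetical failed-trace record: however you log failures, the analysis is the same.
interface FailedTrace {
  id: string;
  failureTag: string; // e.g. 'invalid-json', 'off-topic', 'too-long'
}

// Group failures by tag and sort so the most common pattern gets fixed first.
function topFailurePatterns(traces: FailedTrace[]): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const trace of traces) {
    counts.set(trace.failureTag, (counts.get(trace.failureTag) ?? 0) + 1);
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}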
Don’t aim for perfection from the start; just keep tinkering.
Recent examples of AI features I implemented for my app.
- AI-generated prompts
- AI-generated tasks
Each feature has its own pipeline. That separation gives clear ownership and keeps changes localized. I also created reusable utility functions to be shared across these pipelines.
lib/
├─ ai/
│ ├─ AiQuickPrompts/
│ │ ├─ assert.ts
│ │ ├─ generate.ts
│ │ └─ threshold.ts
│ ├─ AiTasks/
│ │ ├─ assert.ts
│ │ ├─ generate.ts
│ │ └─ threshold.ts
│ ├─ utils/
│ │ ├─ fuzzy.ts
│ │ ├─ normalize.ts
│ │ └─ sanitize.ts
│ └─ types.ts
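The shared utilities are deliberately small. As an illustration, something like utils/normalize.ts could contain a helper along these lines (a sketch, not the actual file):

// Illustrative sketch of a shared helper like lib/ai/utils/normalize.ts.
// The point is that every pipeline funnels user text through the same
// small, well-tested functions before anything else happens.
export function normalizeUserText(input: string): string {
  return input
    .normalize('NFC')     // consistent unicode form
    .replace(/\s+/g, ' ') // collapse whitespace
    .trim();
}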
One UX decision made a big difference: users can accept or reject the AI output.
That does two things:
- It keeps the user in control (mindset)
- It gives me a clean signal of usefulness from users (technical feedback loop). When users reject an output, I don’t take it personally. I treat it like a failing test.
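Capturing that signal doesn’t need to be fancy. Here is an illustrative sketch of the kind of event you could record per decision (the field names and logging target are hypothetical, not my actual schema):

// Accepted/rejected outputs become data points, not judgments.
// A spike in rejections for one feature reads like a failing test.
interface AiFeedbackEvent {
  userId: string;
  feature: 'AiQuickPrompts' | 'AiTasks';
  outputId: string;
  decision: 'accepted' | 'rejected';
  createdAt: Date;
}

async function recordAiFeedback(event: AiFeedbackEvent): Promise<void> {
  // Persist wherever your analytics/events live (DB table, analytics tool, ...).
  console.log('ai_feedback', JSON.stringify(event));
}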
If you’re adding AI to your product this week, I’d do this:
Start with one question: What improves for the user if AI is added?
If the answer is “nothing,” you’re adding complexity without any added value.
If yes, ship the smallest AI feature that has:
- A clear input contract
- A pipeline (even a simple one)
- Logging + cost visibility
- A UX escape hatch (edit, reject, fallback)
- Security layers in place
Keep your mindset clear:
- AI is a tool, not the product
- Your UX should still make sense without it
- Don’t let the model steer the roadmap
The same principles I described above for software engineering also apply to life in general.
I think about AI as a power tool. Used well, it cuts out the mundane parts. Used poorly, it cuts into your judgment.
So I try to keep a balance. I use AI to accelerate repetitive work and explore options. But I keep the important decisions and reflections grounded in my own thinking.
If you’re building with LLMs, I’m curious: what do you wish you knew before you shipped your first LLM feature?
And if you’re into improving your critical thinking skills as an engineer, I’ve been building Jots to help. Jots uses research-backed frameworks and AI assistance to prompt you with the right questions, so you can reflect and learn more effectively.

