From Prompt to Production: Practical Lessons from Generative AI in .NET

YousufAmre — Wed, 03 Jun 2026 18:48:42 +0000

Everyone is excited about Generative AI, but after building AI features into a .NET application using Microsoft's Semantic Kernel and Azure AI, I've learned that the real challenge isn't calling an LLM, it's controlling the context you send to it.

A few lessons that made a significant difference:

🔹 Don't send everything to the model

The temptation is to dump an entire object graph or database response into the prompt. Resist it.

With Semantic Kernel, create dedicated context builders that extract only the data relevant to the user's question. Every unnecessary token increases cost, latency, and the chances of the model getting distracted.

The best prompt is often the shortest prompt that still contains the necessary context.

🔹 Token optimization matters more than model selection

Many teams spend days debating GPT versions while sending thousands of unnecessary tokens on every request.

Before upgrading models:

✅ Remove redundant fields
✅ Summarize large datasets before injection
✅ Chunk documents intelligently
✅ Cache reusable context

A 30% reduction in tokens often provides a bigger ROI than switching models.

🔹 Temperature is not just a number

Different use cases require different settings.

📊 Product facts, analytics, rankings, reports:

Temperature: 0 - 0.2

💡 Brainstorming and ideation:

Temperature: 0.7+

If users complain that AI is "making things up", the first place I check is the temperature setting.

🔹 Vague prompts produce vague answers

If the user asks:

"Tell me about this Thing."

The model doesn't know whether they want:

A summary
Its history
The attributes

The quality of the response is directly proportional to the specificity of the request.

Guide users with suggested prompts rather than relying on free-form questions.

However, a well-engineered system prompt can significantly reduce this problem. Instead of forcing users to ask perfect questions, the system prompt can instruct the model to:
Infer likely intent from the conversation context
Ask clarifying questions when ambiguity is high
Default to a predefined response structure
Prioritize the most relevant information based on the application domain
The goal isn't to train users to write better prompts—it's to design the AI experience so that even imperfect prompts produce useful results.
A strong system prompt often contributes more to response quality than adding additional context or increasing model size.
This addition introduces an important engineering principle: good AI products compensate for imperfect user inputs rather than expecting users to become prompt engineers.

🔹 Feedback loops are critical

The most valuable AI telemetry isn't token usage.

It's:

👍 Helpful

👎 Not Helpful

Every thumbs-up and thumbs-down becomes training data for prompt engineering.

The fastest way to improve an AI assistant is to learn where users disagree with it.

🔹 Maintain conversation state carefully

A chatbot without memory feels broken.

A chatbot with too much memory becomes expensive and confused.

Maintain session history, but periodically summarize older conversations and inject the summary instead of the entire chat history.

Semantic Kernel makes this pattern relatively straightforward.

🔹 *Turn on Debug Mode
*
One of the most underrated features while developing AI solutions.

Track:

Prompt tokens
Completion tokens
Total cost
Latency
Retrieved context
Function calls
Model selection

When a response looks wrong, the answer is usually hiding in the prompt or retrieved context.

🔹 RAG is not always required

Many AI projects immediately jump to vector databases and Retrieval-Augmented Generation.

Ask yourself first:

Can the required data already be loaded from an API, database, or domain object?

If yes, simply inject the relevant data into the prompt.

RAG becomes valuable when:

✅ Knowledge is large
✅ Data is unstructured
✅ Documents change frequently
✅ Users need semantic search across thousands of records

Not every chatbot needs a vector database.

Sometimes a well-designed query against SQL is all you need.

My biggest takeaway:

Building AI features is becoming easier.

Building AI features that are fast, reliable, cost-effective, and trustworthy is where the real engineering begins.

dotnet #SemanticKernel #OpenAI #AzureOpenAI #GenerativeAI #LLM #SoftwareArchitecture #RAG #AIEngineering #CSharp

Redis Is Not Free Performance

YousufAmre — Wed, 24 Dec 2025 08:20:29 +0000

Why adding Redis often shifts complexity instead of removing it and what that means for correctness.

You’re building an application.
You care about fast pages.
You want to protect your database.
You don’t want to keep recalculating the same results on every request.
So you introduce Redis.
Responses speed up.
Latency drops.
The database finally gets a break.
You deploy feeling confident.
Performance feels “solved.”
Then the product evolves.
User traffic grows.
Features multiply.
Caching starts to look like the obvious solution everywhere.
And slowly, production feels… off.
Some users see outdated data.
Counts stop lining up.
Memory usage keeps rising for Redis servers.
A single cache miss suddenly overwhelms the database.
Nothing is outright failing.
But the system feels brittle.
What’s really happened is simple.
You didn’t eliminate complexity.
You relocated it.
Redis isn’t “free performance.”

It’s:
• Another data layer
• With its own structure
• Its own edge cases
• And its own failure scenarios

That fundamentally changes your system.

Before Redis: App → Database
One authority.
Clear consistency.

After Redis: App → Redis → Database

Now Redis is:
• The primary read path
• A buffer in front of the database
• A possible source of stale or incorrect data
• A core part of system correctness
Cache invalidation stops being theoretical.

You now have to decide:
• What should expire
• What must stay consistent
• What’s allowed to be slightly out of date

Redis won’t guide those choices.
It will faithfully do exactly what you configure — even if that leads to subtle bugs over time.

The fundamentals of performance don’t change.

Key design matters.
TTL decisions matter.
Data shape matters.
Redis helps avoid redundant work.
It reduces unnecessary database pressure.
But it doesn’t remove accountability.

You still need to:
• Choose carefully what gets cached
• Understand read and write behavior
• Design for cache misses
• Treat Redis as an optimization, not a system of record

Redis enables fast systems, no doubt. Making them correct, stable,
and maintainable over time, well
that responsibility never leaves you.

DEV Community: YousufAmre

From Prompt to Production: Practical Lessons from Generative AI in .NET

dotnet #SemanticKernel #OpenAI #AzureOpenAI #GenerativeAI #LLM #SoftwareArchitecture #RAG #AIEngineering #CSharp

Redis Is Not Free Performance