YousufAmre

Posted on Jun 3

From Prompt to Production: Practical Lessons from Generative AI in .NET

#ai #openai #llm #dotnet

Everyone is excited about Generative AI, but after building AI features into a .NET application using Microsoft's Semantic Kernel and Azure AI, I've learned that the real challenge isn't calling an LLM; it's controlling the context you send to it.

A few lessons that made a significant difference:

🔹 Don't send everything to the model

The temptation is to dump an entire object graph or database response into the prompt. Resist it.

With Semantic Kernel, create dedicated context builders that extract only the data relevant to the user's question. Every unnecessary token increases cost, latency, and the chances of the model getting distracted.

The best prompt is often the shortest prompt that still contains the necessary context.
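
As a rough illustration of a dedicated context builder, here is a minimal sketch in plain C# (records, so .NET 5 or later is assumed). The `Order` and `OrderLine` types, their fields, and `OrderContextBuilder` are hypothetical stand-ins for whatever your domain model looks like; nothing here is a Semantic Kernel API.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical domain types; a real entity would carry many more fields.
public record OrderLine(string Name, int Quantity, decimal UnitPrice);

public record Order(
    int Id,
    string Status,
    DateTime PlacedOn,
    string InternalNotes,   // present on the entity, never sent to the model
    string RawGatewayJson,  // present on the entity, never sent to the model
    IReadOnlyList<OrderLine> Lines);

public static class OrderContextBuilder
{
    // Projects an order into a few compact lines instead of serializing the whole graph.
    public static string Build(Order order, int maxLines = 5)
    {
        var inv = CultureInfo.InvariantCulture;

        var header = string.Format(
            inv, "Order {0}: {1}, placed {2:yyyy-MM-dd}",
            order.Id, order.Status, order.PlacedOn);

        // Cap the number of lines so one huge order cannot blow up the prompt.
        var lines = order.Lines
            .Take(maxLines)
            .Select(l => string.Format(
                inv, "- {0} x {1} @ {2:0.00}", l.Quantity, l.Name, l.UnitPrice));

        return string.Join("\n", new[] { header }.Concat(lines));
    }
}
```

Calling `OrderContextBuilder.Build(order)` returns a short block of text that can be dropped into the prompt, while the notes and gateway payload stay on the server.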

🔹 Token optimization matters more than model selection

Many teams spend days debating GPT versions while sending thousands of unnecessary tokens on every request.

Before upgrading models:

✅ Remove redundant fields
✅ Summarize large datasets before injection
✅ Chunk documents intelligently
✅ Cache reusable context

A 30% reduction in tokens often provides a bigger ROI than switching models.
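
To put numbers on that, here is a small sketch of the cost arithmetic. The `TokenCost` helper is my own name, and the prices used in the note below are made up for illustration; plug in your provider's real per-million-token rates.

```csharp
using System;

public static class TokenCost
{
    // Estimated monthly spend in dollars, given per-million-token prices.
    public static decimal Monthly(
        long requestsPerMonth,
        int promptTokens,
        int completionTokens,
        decimal inputPricePerMillion,
        decimal outputPricePerMillion)
    {
        decimal perRequest =
            promptTokens * inputPricePerMillion / 1_000_000m +
            completionTokens * outputPricePerMillion / 1_000_000m;

        return requestsPerMonth * perRequest;
    }
}
```

With hypothetical prices of $2.50 (input) and $10.00 (output) per million tokens, one million requests at 4,000 prompt tokens and 300 completion tokens come to $13,000 a month. Trimming the prompt by 30% to 2,800 tokens brings that down to $10,000 with no change of model.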

🔹 Temperature is not just a number

Different use cases require different settings.

📊 Product facts, analytics, rankings, reports:

Temperature: 0 - 0.2

💡 Brainstorming and ideation:

Temperature: 0.7+

If users complain that the AI is "making things up", the temperature setting is the first place I check. A low temperature won't prevent hallucination on its own, but a high one makes it more likely.
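
One way to keep these settings from drifting is to decide them once per use case rather than per call site. A minimal sketch follows; the `UseCase` enum and `SamplingDefaults` class are hypothetical, and the exact values are just points inside the ranges above. In Semantic Kernel the result would typically be assigned to the `Temperature` property of the prompt execution settings (for example `OpenAIPromptExecutionSettings`), which is not shown here to keep the snippet dependency-free.

```csharp
using System;

// Hypothetical categories; extend to match the features in your application.
public enum UseCase { Factual, Brainstorming }

public static class SamplingDefaults
{
    // Central place for temperature choices, so every feature picks one deliberately.
    public static double TemperatureFor(UseCase useCase) => useCase switch
    {
        UseCase.Factual => 0.1,        // product facts, analytics, rankings, reports
        UseCase.Brainstorming => 0.8,  // ideation, where variety is the point
        _ => throw new ArgumentOutOfRangeException(nameof(useCase))
    };
}
```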

🔹 Vague prompts produce vague answers

If the user asks:

"Tell me about this Thing."

The model doesn't know whether they want:

A summary
Its history
The attributes

The quality of the response is directly proportional to the specificity of the request.

Guide users with suggested prompts rather than relying on free-form questions.

However, a well-engineered system prompt can significantly reduce this problem. Instead of forcing users to ask perfect questions, the system prompt can instruct the model to:

Infer likely intent from the conversation context
Ask clarifying questions when ambiguity is high
Default to a predefined response structure
Prioritize the most relevant information based on the application domain

The goal isn't to train users to write better prompts; it's to design the AI experience so that even imperfect prompts produce useful results. Good AI products compensate for imperfect user input rather than expecting users to become prompt engineers.

A strong system prompt often contributes more to response quality than adding more context or moving to a larger model.
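
As an illustration, the four instructions above might be encoded roughly like this. The wording of the prompt, the response structure, and the `AssistantPrompts` helper are all assumptions of mine; treat it as a starting point to test against your own users, not a recommended template.

```csharp
using System;

public static class AssistantPrompts
{
    // Builds a system prompt for a given application domain (for example "a product catalog").
    public static string BuildSystemPrompt(string domain)
    {
        if (string.IsNullOrWhiteSpace(domain))
            throw new ArgumentException("A domain description is required.", nameof(domain));

        return string.Join("\n", new[]
        {
            $"You are an assistant for {domain}.",
            "- Infer the user's likely intent from the conversation so far.",
            "- If the request is still ambiguous, ask one short clarifying question before answering.",
            "- Otherwise answer in this structure: Summary, Key attributes, Suggested follow-up.",
            $"- Prioritize information that matters for {domain} and leave out the rest."
        });
    }
}
```

The string returned by `BuildSystemPrompt("a product catalog")` would be sent as the system message ahead of the user's question.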

🔹 Feedback loops are critical

The most valuable AI telemetry isn't token usage.

It's:

👍 Helpful

👎 Not Helpful

Every thumbs-up and thumbs-down becomes a labeled example you can use to evaluate and refine your prompts.

The fastest way to improve an AI assistant is to learn where users disagree with it.
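
A minimal sketch of what that can look like in practice: store each vote alongside the prompt version that produced the answer, then rank versions by helpful rate. The `Feedback` record, the `PromptVersion` label, and `FeedbackReport` are hypothetical names; how you persist the votes is up to you.

```csharp
using System.Collections.Generic;
using System.Linq;

// PromptVersion is an assumed label identifying which prompt template produced the answer.
public record Feedback(string PromptVersion, bool Helpful);

public static class FeedbackReport
{
    // Helpful rate per prompt version, lowest rate first so problem prompts surface at the top.
    public static IReadOnlyList<(string PromptVersion, double HelpfulRate, int Votes)> ByPromptVersion(
        IEnumerable<Feedback> feedback) =>
        feedback
            .GroupBy(f => f.PromptVersion)
            .Select(g => (
                PromptVersion: g.Key,
                HelpfulRate: g.Count(f => f.Helpful) / (double)g.Count(),
                Votes: g.Count()))
            .OrderBy(r => r.HelpfulRate)
            .ToList();
}
```

Keeping the vote count next to the rate matters: a version with two votes and a 50% rate says much less than one with two thousand.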

🔹 Maintain conversation state carefully

A chatbot without memory feels broken.

A chatbot with too much memory becomes expensive and confused.

Maintain session history, but periodically summarize older conversations and inject the summary instead of the entire chat history.

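Semantic Kernel makes this pattern relatively straightforward: its ChatHistory type holds the running conversation, and the library includes chat history reducers for truncating or summarizing older messages.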
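
To show the shape of the idea without any library dependency, here is a sketch in plain C#. `ChatTurn` and `HistoryCompactor` are hypothetical types, and the `summarize` delegate stands in for what would normally be a call to the model; this is not how Semantic Kernel implements it internally.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public record ChatTurn(string Role, string Text);

public static class HistoryCompactor
{
    // Keeps the last `keepRecent` turns verbatim and replaces everything older
    // with a single summary turn. `summarize` is a placeholder for an LLM call.
    public static List<ChatTurn> Compact(
        IReadOnlyList<ChatTurn> history,
        int keepRecent,
        Func<IReadOnlyList<ChatTurn>, string> summarize)
    {
        if (keepRecent < 0)
            throw new ArgumentOutOfRangeException(nameof(keepRecent));

        // Nothing to compact yet: return a copy of the history unchanged.
        if (history.Count <= keepRecent)
            return history.ToList();

        var older = history.Take(history.Count - keepRecent).ToList();
        var recent = history.Skip(history.Count - keepRecent);

        var compacted = new List<ChatTurn>
        {
            new ChatTurn("system", "Summary of earlier conversation: " + summarize(older))
        };
        compacted.AddRange(recent);
        return compacted;
    }
}
```

Running the compaction only when the history crosses a size threshold, rather than on every turn, avoids paying for a summarization call each time the user sends a message.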

🔹 Turn on Debug Mode

One of the most underrated features while developing AI solutions.

Track:

Prompt tokens
Completion tokens
Total cost
Latency
Retrieved context
Function calls
Model selection

When a response looks wrong, the answer is usually hiding in the prompt or retrieved context.
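
A sketch of a trace record covering the fields listed above, plus a small timing helper. The `LlmTrace` and `LlmTracing` names and the field layout are illustrative; token counts and cost would come from whatever usage metadata your model client returns.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// One record per model call; log it as structured data so it can be queried later.
public record LlmTrace(
    string Model,
    int PromptTokens,
    int CompletionTokens,
    decimal EstimatedCost,
    TimeSpan Latency,
    IReadOnlyList<string> RetrievedContext,
    IReadOnlyList<string> FunctionCalls)
{
    public int TotalTokens => PromptTokens + CompletionTokens;
}

public static class LlmTracing
{
    // Runs an arbitrary call and returns its result together with the elapsed time.
    public static (T Result, TimeSpan Latency) Timed<T>(Func<T> call)
    {
        var stopwatch = Stopwatch.StartNew();
        T result = call();
        stopwatch.Stop();
        return (result, stopwatch.Elapsed);
    }
}
```

Logging the retrieved context verbatim is what makes the record useful for debugging, so check it for sensitive data before enabling it outside development.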

🔹 RAG is not always required

Many AI projects immediately jump to vector databases and Retrieval-Augmented Generation.

Ask yourself first:

Can the required data already be loaded from an API, database, or domain object?

If yes, simply inject the relevant data into the prompt.

RAG becomes valuable when:

✅ Knowledge is large
✅ Data is unstructured
✅ Documents change frequently
✅ Users need semantic search across thousands of records

Not every chatbot needs a vector database.

Sometimes a well-designed query against SQL is all you need.

My biggest takeaway:

Building AI features is becoming easier.

Building AI features that are fast, reliable, cost-effective, and trustworthy is where the real engineering begins.

#dotnet #SemanticKernel #OpenAI #AzureOpenAI #GenerativeAI #LLM #SoftwareArchitecture #RAG #AIEngineering #CSharp
