Building LLM Applications: Core Concepts of RAG, Embeddings, and Orchestration

Objective
This article explains the core architecture and implementation of LLM-based systems.

Table of Contents

LLM Invocation
Prompt Engineering
Embeddings & Vector Search
RAG Pipeline
LangGraph Workflows
Production Architecture
Streaming & Scaling
Key Takeaways
References

LLM Invocation
LLM Invocation: How We “Call” Large Language Models (and What Actually Happens)

Large Language Models (LLMs) like GPT-style models are usually accessed through something that looks like an API call. But what you’re really doing is an LLM invocation: sending structured input (messages) into a model and receiving generated output back.

This post explains what “LLM invocation” means, why it’s different from typical APIs, and the execution flow that happens every time you ask a model a question.

What is LLM Invocation?

LLM Invocation is the process of interacting with a large language model by:

sending structured input (usually a list of messages)
receiving generated output (the model’s response)

Unlike traditional APIs, LLM invocation has some unique characteristics.

How LLM Invocation differs from traditional APIs

1) Input is natural language (plus structure)

In a typical REST API, your input is rigid (JSON payloads with fixed fields). With LLMs, your “input” is mostly language.

Even though the request may be wrapped in a JSON format (roles, messages, metadata), the substance is natural language instructions.
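To make the "language wrapped in structure" idea concrete, here is a minimal sketch of what a chat-style request body might look like. The helper name buildChatRequest, the placeholder model name, and the field names (model, temperature, messages) are assumptions that follow a common chat-API convention; check your provider's documentation for the exact schema.

```javascript
// Hypothetical helper: wraps natural-language text in a structured chat payload.
// Field names follow a common chat-API convention and are assumptions,
// not a specific provider's schema.
function buildChatRequest(systemPrompt, userText, options = {}) {
  return {
    model: options.model ?? "example-model", // placeholder model name
    temperature: options.temperature ?? 0.7,
    messages: [
      { role: "system", content: systemPrompt },
      { role: "user", content: userText },
    ],
  };
}

const request = buildChatRequest(
  "You are a technical assistant",
  "Explain vector databases"
);

// The envelope is rigid JSON, but the substance lives in the free-text `content` fields.
console.log(JSON.stringify(request, null, 2));
```

Notice that only the envelope is fixed. Everything the model actually acts on is free text inside content.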

2) Output is probabilistic

Traditional APIs return deterministic results for the same request (assuming the underlying data doesn’t change).

LLMs don’t work like that. They generate output via token-by-token prediction, so the result can vary depending on:

randomness settings (temperature, top_p, etc.)
tiny wording differences in the prompt
context length and ordering
model version

In short: the same prompt does not always produce exactly the same output.
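To see why randomness settings matter, here is a toy sketch of temperature sampling. The vocabulary and scores are made-up values, not output from a real model, and the function names are hypothetical. This sketch also assumes temperature is greater than 0 (providers typically handle a temperature of 0 as a special, near-greedy case).

```javascript
// Toy illustration of temperature sampling over next-token scores (logits).
// The vocabulary and logits below are invented for illustration.
function softmaxWithTemperature(logits, temperature) {
  const scaled = logits.map((x) => x / temperature);
  const max = Math.max(...scaled); // subtract the max for numerical stability
  const exps = scaled.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Draw one index from a probability distribution using a random number in [0, 1).
function sampleIndex(probs, rand = Math.random) {
  const r = rand();
  let cumulative = 0;
  for (let i = 0; i < probs.length; i++) {
    cumulative += probs[i];
    if (r < cumulative) return i;
  }
  return probs.length - 1; // guard against floating-point rounding
}

const vocab = ["database", "index", "banana"];
const logits = [2.0, 1.5, 0.1];

for (const t of [0.2, 1.0, 2.0]) {
  const probs = softmaxWithTemperature(logits, t);
  const picks = Array.from({ length: 5 }, () => vocab[sampleIndex(probs)]);
  console.log(`temperature=${t}`, probs.map((p) => p.toFixed(2)), picks);
}
```

At a low temperature the top-scoring token dominates, so repeated runs look almost identical. At a higher temperature the distribution flattens and less likely tokens get picked more often.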

3) Context is everything

This is the most important point operationally:

LLMs don’t “remember” in the way apps do.

They only see what you send inside the context window during that invocation. If something isn’t included in the messages, the model can’t use it (unless it’s part of the model’s training data, which is general knowledge, not your private state).
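Here is a small sketch of what that statelessness means in practice. fakeModel is a hypothetical stand-in, not a real LLM: it can only answer from the messages it is handed, which is exactly the property we want to make visible.

```javascript
// Stand-in for a real model: it can only "know" what appears in the messages it receives.
// fakeModel is hypothetical and exists purely to make statelessness visible.
function fakeModel(messages) {
  const seen = messages.map((m) => m.content).join(" ");
  const match = seen.match(/my name is (\w+)/i);
  return match ? `Your name is ${match[1]}.` : "I don't know your name.";
}

// Call 1: the user states a fact, and the app stores both turns.
const history = [{ role: "user", content: "Hi, my name is Priya." }];
history.push({ role: "assistant", content: fakeModel(history) });

// Call 2 without history: the earlier fact is invisible to the model.
const withoutHistory = fakeModel([{ role: "user", content: "What is my name?" }]);

// Call 2 with history: the app resends prior turns, so the fact is back in context.
const withHistory = fakeModel([...history, { role: "user", content: "What is my name?" }]);

console.log(withoutHistory); // I don't know your name.
console.log(withHistory); // Your name is Priya.
```

Any "memory" in a chat app comes from the application resending earlier turns, not from the model itself.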

Core Concept: Context Window

LLMs operate inside a context window, which is basically the maximum amount of text (tokens) the model can consider at once.

That means:

The model does not retain memory across requests by default
Every time you invoke it, it processes the full message stack you provide
Input quality determines output quality

If your prompt is unclear, contradictory, or missing key constraints, the model’s output will reflect that.
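Because the window is finite, applications usually have to decide what to keep. The sketch below shows one simple strategy: always keep the system message, then keep the most recent turns that fit a token budget. Both function names are hypothetical, and the "about 4 characters per token" estimate is only a rough rule of thumb for English text; a real system would count tokens with the model's own tokenizer.

```javascript
// Rough token estimate: about 4 characters per token is a common rule of thumb for English.
// This is an approximation; a real system would use the model's own tokenizer.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Keep the system message(s), then add the most recent turns until the budget is used up.
function fitToContextWindow(messages, maxTokens) {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");

  let used = system.reduce((n, m) => n + estimateTokens(m.content), 0);
  const kept = [];

  // Walk backwards so the newest messages are kept first.
  for (let i = rest.length - 1; i >= 0; i--) {
    const cost = estimateTokens(rest[i].content);
    if (used + cost > maxTokens) break;
    kept.unshift(rest[i]);
    used += cost;
  }
  return [...system, ...kept];
}

const conversation = [
  { role: "system", content: "You are a technical assistant" },
  { role: "user", content: "What is an embedding?" },
  { role: "assistant", content: "An embedding is a numeric vector that represents meaning." },
  { role: "user", content: "Explain vector databases" },
];

// With a small budget, the oldest turns are dropped first.
console.log(fitToContextWindow(conversation, 30).map((m) => m.role));
```

Dropping the oldest turns is only one option. Summarizing older history or retrieving only the relevant parts are common alternatives, each with its own trade-offs.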

Execution Flow: What happens during an LLM call?

A simplified invocation pipeline looks like this:

1) User Query

2) Message Formatting (system + user + optional assistant history)

3) LLM Processing (token-by-token prediction)

4) Generated Response

The “magic” is in step 2 and step 3:

Step 2 determines what the model is allowed to assume and how it should behave (system instructions are especially powerful).
Step 3 is not retrieval of a stored answer; the model repeatedly predicts the next token, choosing or sampling from the likely candidates, until it emits a stop token or hits a length limit.
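The two steps above can be sketched end to end with a toy model. Everything here is hypothetical: formatMessages, generate, and the tiny lookup table stand in for a real model, which would instead compute a probability distribution over its whole vocabulary from the full context. The point is only the shape of the loop: format, predict one token, append it to the context, repeat.

```javascript
// Step 2: turn a raw query (plus optional history) into a message list.
function formatMessages(systemPrompt, userQuery, history = []) {
  return [
    { role: "system", content: systemPrompt },
    ...history,
    { role: "user", content: userQuery },
  ];
}

// Toy "model": picks the next token from a made-up table keyed by the last token in context.
const TOY_TABLE = new Map([
  ["databases", "store"],
  ["store", "embeddings"],
]);

function toyNextToken(contextTokens) {
  const last = contextTokens[contextTokens.length - 1];
  return TOY_TABLE.get(last) ?? "<end>";
}

// Step 3: generate one token at a time until an end token or a length limit.
function generate(messages, maxNewTokens = 16) {
  const context = messages.flatMap((m) => m.content.split(/\s+/));
  const output = [];
  while (output.length < maxNewTokens) {
    const next = toyNextToken(context);
    if (next === "<end>") break;
    context.push(next); // the new token becomes part of the context for the next step
    output.push(next);
  }
  return output.join(" ");
}

const messages = formatMessages(
  "You are a technical assistant",
  "Explain vector databases"
);
console.log(generate(messages)); // store embeddings
```

Each generated token is fed back into the context before the next prediction, which is why earlier tokens shape everything that follows.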

Example: A simple LLM invocation (JavaScript)

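// Assumes `model` is an already-configured chat model client that exposes
// an async invoke(messages) method (for example, a LangChain.js chat model).
const response = await model.invoke([
  { role: "system", content: "You are a technical assistant" }, // sets behavior
  { role: "user", content: "Explain vector databases" }, // the actual question
]);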

Final Takeaways

LLM invocation = sending structured messages + receiving generated output.
Unlike traditional APIs, LLM outputs are probabilistic and context-dependent.
LLMs don’t remember across calls; they only know what you include in the context window.
The quality and structure of your input (especially system + user messages) strongly influence the quality of the output.

Note

This article is based on my hands-on learning and implementation of LLM systems. AI tools were used to assist in structuring and refining the content.
This is part 1 of the series. In upcoming parts, we will dive into other topics.
Follow along to build a complete understanding of LLM-based systems.