DEV Community

Autonix Lab

Posted on • Originally published at autonix-lab.online

AI Acronyms Explained: A 2026 Glossary of Prompting, Agent & Training Terms

Spend ten minutes reading about modern AI and you'll drown in abbreviations. A research paper mentions CoT and ReAct; a vendor pitch promises RLHF; an engineer says they'll just LoRA a model. The jargon is a barrier — but the underlying ideas are surprisingly approachable once someone spells them out.

This glossary decodes the acronyms that show up most often in 2026, split into two families: how we get AI to think and act (prompting and agents), and how we build and shape the models themselves (training). For each term you'll get the expansion, a plain-English definition, and a note on when it actually matters.

On this page
Prompting & Agents
CoT — Chain of Thought
ToT — Tree of Thoughts
ReAct — Reason + Act
AoT — Agent of Thought
HITL — Human In The Loop
A2A — Agent-to-Agent
MAS — Multi-Agent System
Model Training
FT — Fine-Tuning
SFT — Supervised Fine-Tuning
RLHF — RL from Human Feedback
DPO — Direct Preference Optimization
PPO — Proximal Policy Optimization
PEFT — Parameter-Efficient Fine-Tuning
LoRA — Low-Rank Adaptation
FAQ
Part 1 — Prompting & Agents
This first family is about inference time — techniques you apply to a model that's already trained, to make it reason better, use tools, and operate as part of a larger system. None of these require touching the model's weights; they're about how you prompt, structure, and orchestrate. If you're new to the concept of an agent altogether, start with our primer on what AI agents are and why your business needs one.

| Term | Stands for | In one line |
|---|---|---|
| CoT | Chain of Thought | Reason step by step before answering |
| ToT | Tree of Thoughts | Explore many reasoning branches, then pick the best |
| ReAct | Reason + Act | Alternate thinking with tool use |
| AoT | Agent of Thought | Treat each reasoning step as an agent action |
| HITL | Human In The Loop | A person reviews or approves at key steps |
| A2A | Agent-to-Agent | Agents talk directly to other agents |
| MAS | Multi-Agent System | A team of coordinated specialist agents |

CoT — Chain of Thought
The simplest and most influential trick in prompting. Instead of asking a model for an answer outright, you ask it to show its reasoning step by step ("let's think this through"). Walking through intermediate steps can substantially improve accuracy on tasks that need multi-step logic, such as math, multi-step questions, and planning, with the largest gains typically seen on larger models. The usual intuition is that the model "thinks out loud," giving itself room to work rather than guessing in a single leap. Chain of Thought is the foundation the other reasoning techniques on this list (ToT, ReAct, AoT) build on.
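
A minimal sketch of zero-shot CoT prompting in Python. It only builds prompt strings and parses a reply; sending the prompt to a model is left out, and the sample completion is hand-written rather than real model output. "Let's think step by step" is the well-known zero-shot CoT trigger phrase, while the `Answer:` line convention and the function names are our own assumptions.

```python
from typing import Optional


def direct_prompt(question: str) -> str:
    """Ask for the answer outright, leaving no room to reason."""
    return f"Q: {question}\nA: Reply with only the final answer."


def cot_prompt(question: str) -> str:
    """Zero-shot chain of thought: ask for the reasoning before the answer."""
    return (
        f"Q: {question}\n"
        "A: Let's think step by step. Finish with a line that starts with 'Answer:'."
    )


def extract_answer(completion: str) -> Optional[str]:
    """Return the final answer from a CoT completion, skipping the reasoning lines."""
    for line in reversed(completion.strip().splitlines()):
        if line.strip().startswith("Answer:"):
            return line.split("Answer:", 1)[1].strip()
    return None


# Hand-written stand-in for what a model might return to cot_prompt(...).
sample = (
    "A notebook costs 3 times the $2 pen, so it costs $6.\n"
    "Two notebooks plus one pen: 2 * 6 + 2 = 14.\n"
    "Answer: $14"
)
print(extract_answer(sample))  # $14
```

Parsing only the last `Answer:` line lets you keep the reasoning for debugging while your application consumes just the answer.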

ToT — Tree of Thoughts
Tree of Thoughts generalizes CoT from a single line of reasoning into a branching tree. The model generates several possible next steps, evaluates how promising each one looks, and explores the best — backtracking when a path turns out to be a dead end, much like a chess player considering several moves ahead. It costs more compute, but it shines on problems with a large search space and many valid approaches, such as puzzles, planning, and creative problem-solving where the first idea isn't always the best one.
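
The search loop can be sketched as a breadth-first Tree of Thoughts with a fixed beam width, one of the search strategies used with ToT (a depth-first variant backtracks out of dead ends instead of pruning them). To keep the sketch runnable, the two LLM calls are replaced by hand-coded stand-ins: `propose` plays the model suggesting next steps, and `score` plays the model judging how promising a partial solution looks. The toy puzzle (reach 11 from 1 using `+3` and `*2`) and all names are our own.

```python
from typing import Callable, List, Optional, Tuple

# A "thought" here is a partial solution: (current value, steps taken so far).
State = Tuple[int, Tuple[str, ...]]


def tree_of_thoughts(
    root: State,
    propose: Callable[[State], List[State]],
    score: Callable[[State], float],
    is_goal: Callable[[State], bool],
    beam_width: int = 3,
    max_depth: int = 6,
) -> Optional[State]:
    """Breadth-first ToT: expand every kept thought, score the children,
    keep only the `beam_width` most promising ones, and repeat."""
    frontier = [root]
    for _ in range(max_depth):
        children = [child for state in frontier for child in propose(state)]
        for child in children:
            if is_goal(child):
                return child
        # Weak branches fall out of the beam here; dead ends produce no children.
        frontier = sorted(children, key=score, reverse=True)[:beam_width]
        if not frontier:
            break
    return None


TARGET = 11


def propose(state: State) -> List[State]:
    """Stand-in for an LLM proposing next steps."""
    value, steps = state
    moves = [("+3", value + 3), ("*2", value * 2)]
    # Overshooting the target is a dead end, so those branches are never proposed.
    return [(v, steps + (op,)) for op, v in moves if v <= TARGET]


def score(state: State) -> float:
    """Stand-in for an LLM rating a partial solution: closer to the target is better."""
    return -abs(TARGET - state[0])


def is_goal(state: State) -> bool:
    return state[0] == TARGET


result = tree_of_thoughts((1, ()), propose, score, is_goal)
print(result)  # (11, ('+3', '*2', '+3'))
```

Raising `beam_width` keeps more branches alive (more compute, fewer missed solutions); that trade-off is exactly the extra cost ToT pays over a single chain.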

ReAct — Reason + Act
ReAct is the pattern that turns a chatbot into an agent. The model interleaves reasoning ("I need the current exchange rate") with acting ("call the currency API") and then observing the result ("the rate is 1.08") before reasoning again. This think–act–observe loop lets a model use tools, search the web, run code, and react to real-world feedback instead of relying only on what's in its head. Nearly every production agent in 2026 runs some version of this loop under the hood — see our breakdown of agentic AI moving from pilots to production for how it plays out in practice.
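
A stripped-down think, act, observe loop might look like the following. The `Thought:` / `Action:` / `Observation:` transcript format follows the style popularized by the ReAct paper, but the `tool[argument]` syntax, the `get_exchange_rate` tool with its hard-coded rate, and the scripted model that replaces a real LLM call are all hypothetical.

```python
import re
from typing import Callable, Dict, List


def get_exchange_rate(pair: str) -> str:
    """Hypothetical tool standing in for a real currency API."""
    rates = {"EUR/USD": "1.08"}
    return rates.get(pair, "unknown pair")


TOOLS: Dict[str, Callable[[str], str]] = {"get_exchange_rate": get_exchange_rate}


def scripted_model(transcript: List[str]) -> str:
    """Stand-in for an LLM call: returns the next Thought plus an Action or a Final Answer."""
    if not any(line.startswith("Observation:") for line in transcript):
        return "Thought: I need the current exchange rate.\nAction: get_exchange_rate[EUR/USD]"
    rate = float(transcript[-1].split(":", 1)[1])
    return f"Thought: The rate is {rate}.\nFinal Answer: 100 EUR is {100 * rate:.2f} USD"


def react(question: str, model: Callable[[List[str]], str], max_steps: int = 5) -> str:
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        step = model(transcript)                  # reason (and possibly choose an action)
        transcript.append(step)
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        match = re.search(r"Action: (\w+)\[(.*?)\]", step)
        if match is None:
            raise ValueError(f"model produced neither an action nor an answer: {step!r}")
        tool_name, argument = match.groups()
        observation = TOOLS[tool_name](argument)  # act
        transcript.append(f"Observation: {observation}")  # observe, then loop back to reasoning
    return "Stopped: step limit reached"


print(react("How many USD is 100 EUR?", scripted_model))  # 100 EUR is 108.00 USD
```

The `max_steps` cap matters in practice: without it, a model that never emits a final answer would loop forever.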

AoT — Agent of Thought
Agent of Thought pushes the reasoning-as-action idea further: each step in the model's thinking is framed as an autonomous action an agent takes, rather than just a sentence in a chain. In practice this blurs the line between "reasoning" and "doing": the model decides, acts, and reflects in a tighter, more self-directed loop. It's a newer and less standardized term than the others here, and the acronym is overloaded; in the research literature, AoT more often stands for Algorithm of Thoughts, a prompting method that walks a model through algorithm-style examples so it can explore a search space within a single prompt. Treat "Agent of Thought" as a direction of travel (more agentic, more self-guided reasoning) rather than a single fixed recipe.

HITL — Human In The Loop
Not every decision should be automated. Human In The Loop means a person reviews, edits, or approves the AI's output at the points that matter — before a payment goes out, an email is sent, or a record is changed. HITL is the single most important safety pattern for deploying AI in a real business: it lets you capture most of the efficiency while keeping a human accountable for high-stakes calls. The right design expands autonomy gradually, as the system earns trust through a track record you can measure.
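
A minimal approval gate, assuming a policy function decides which actions are high-stakes and a reviewer callback stands in for whatever interface your people actually use (a queue, a chat message, a dashboard). The action kinds, the 500 threshold, and the reviewer's behaviour are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    kind: str          # e.g. "payment", "send_email", "update_record"
    amount: float      # money at stake; 0 for non-financial actions
    description: str


def run_with_approval(
    actions: List[Action],
    needs_review: Callable[[Action], bool],
    approve: Callable[[Action], bool],
) -> List[str]:
    """Run low-stakes actions automatically and route high-stakes ones to a human."""
    log = []
    for action in actions:
        if needs_review(action) and not approve(action):
            log.append(f"BLOCKED {action.description}")
            continue
        # A real system would execute the action here.
        log.append(f"DONE {action.description}")
    return log


def policy(action: Action) -> bool:
    """Hypothetical rule: payments over 500 and every outbound email need sign-off."""
    return (action.kind == "payment" and action.amount > 500) or action.kind == "send_email"


def reviewer(action: Action) -> bool:
    """Stand-in for a real review UI; this reviewer approves emails but rejects payments."""
    return action.kind != "payment"


plan = [
    Action("update_record", 0, "update CRM record"),
    Action("payment", 1200, "pay supplier invoice"),
    Action("send_email", 0, "email customer"),
]
for entry in run_with_approval(plan, policy, reviewer):
    print(entry)
```

Expanding autonomy gradually then amounts to loosening `needs_review` (for example, raising the payment threshold) once the approval log shows the system can be trusted.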

A2A — Agent-to-Agent
As more organizations run their own agents, those agents increasingly need to talk to each other: your procurement agent negotiating with a supplier's sales agent, for example. A2A refers to the protocols and patterns that let agents communicate, share context, and delegate directly, rather than routing everything through a human or a brittle integration. It's an emerging standard layer, and it's what makes genuinely cross-organization automation possible. One concrete example is Agent2Agent (A2A), an open protocol Google introduced in 2025 for exactly this kind of agent-to-agent communication.
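
To make the idea concrete, here is an in-process sketch of two agents exchanging structured messages through a router. It is deliberately generic: the `Message` envelope, the `Bus` class, the agent names, and the price list are hypothetical, and none of it reproduces the wire format of any real A2A protocol, which would run over the network with its own schema.

```python
import itertools
from dataclasses import dataclass, field
from typing import Callable, Dict

_conversation_ids = itertools.count(1)


@dataclass
class Message:
    """Hypothetical message envelope (not any real protocol's schema)."""
    sender: str
    recipient: str
    intent: str   # what the sender wants, e.g. "quote_request"
    payload: dict
    conversation_id: int = field(default_factory=lambda: next(_conversation_ids))


class Bus:
    """In-process stand-in for the network layer that routes messages between agents."""

    def __init__(self) -> None:
        self.agents: Dict[str, Callable[[Message], Message]] = {}

    def register(self, name: str, handler: Callable[[Message], Message]) -> None:
        self.agents[name] = handler

    def send(self, msg: Message) -> Message:
        return self.agents[msg.recipient](msg)


def supplier_sales_agent(msg: Message) -> Message:
    """The supplier's agent answers a quote request using its own (hypothetical) price list."""
    unit_prices = {"widget": 4.0}
    total = unit_prices[msg.payload["item"]] * msg.payload["qty"]
    # Reply on the same conversation so the requesting agent can match it up.
    return Message(msg.recipient, msg.sender, "quote", {"total": total}, msg.conversation_id)


bus = Bus()
bus.register("supplier.sales", supplier_sales_agent)
request = Message("acme.procurement", "supplier.sales", "quote_request", {"item": "widget", "qty": 250})
reply = bus.send(request)
print(reply.intent, reply.payload)  # quote {'total': 1000.0}
```

Carrying `conversation_id` through every reply is what lets each side keep context across a multi-turn negotiation.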

MAS — Multi-Agent System
A Multi-Agent System is a team of specialized agents collaborating on work no single agent could handle reliably — one researching, another drafting, a third checking compliance, with an orchestrator coordinating the whole effort. MAS architectures improve accuracy through specialization, speed through parallelism, and safety through isolation. They also multiply cost and complexity, so they're worth it only when a single agent structurally fails. We cover the patterns that work in production in our deep dive on multi-agent systems and when one agent isn't enough.
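
A toy orchestrator shows the shape of the pattern: independent research tasks fan out in parallel, a drafting agent works from the combined notes, and a separate checker reviews the result before anything is released. Each "agent" is a plain Python function standing in for a model with its own prompt and tools; the agent names and the banned-phrase compliance rule are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

BANNED_PHRASES = ("guaranteed returns",)  # hypothetical compliance rule


def researcher(topic: str) -> str:
    return f"notes on {topic}"


def drafter(notes: List[str]) -> str:
    return "Draft based on: " + "; ".join(notes)


def compliance_checker(draft: str) -> List[str]:
    return [phrase for phrase in BANNED_PHRASES if phrase in draft.lower()]


def orchestrator(topics: List[str]) -> Dict[str, object]:
    # Parallelism: independent research tasks run concurrently, one researcher per topic.
    with ThreadPoolExecutor() as pool:
        notes = list(pool.map(researcher, topics))  # map preserves input order
    # Specialization: a separate drafter works only from the combined notes.
    draft = drafter(notes)
    # Separation of duties: a different agent checks the draft before release.
    issues = compliance_checker(draft)
    return {"draft": draft, "approved": not issues, "issues": issues}


result = orchestrator(["pricing", "competitors"])
print(result["approved"], result["draft"])
```

Every extra agent here is another model call in a real system, which is where the cost and complexity mentioned above come from.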

Prompting and agent techniques don't change the model — they change how cleverly you use it. That's why they're the fastest, cheapest lever most businesses have for getting more out of AI.

Part 2 — Model Training
The second family is about changing the model itself — adjusting its internal weights so it behaves differently. This is heavier machinery: it needs data, compute, and expertise, and it's usually only worth it when prompting and retrieval have hit their limits. Understanding these terms helps you ask the right question of any vendor: are you actually training a model, or just prompting one well? The two have very different cost and risk profiles — something we get into in how much AI implementation actually costs.

| Term | Stands for | In one line |
|---|---|---|
| FT | Fine-Tuning | Further-train a pretrained model on your data |
| SFT | Supervised Fine-Tuning | Train on labeled input→output examples |
| RLHF | Reinforcement Learning from Human Feedback | Align a model using human preference rankings |
| DPO | Direct Preference Optimization | Learn from preferences without a reward model |
| PPO | Proximal Policy Optimization | The RL algorithm often used inside RLHF |
| PEFT | Parameter-Efficient Fine-Tuning | Tune a tiny fraction of the model's weights |
| LoRA | Low-Rank Adaptation | The most popular PEFT method |

FT — Fine-Tuning
Fine-tuning means taking a model that's already been trained on the internet at large and continuing its training on your own narrower data so it adapts to a specific style, domain, or task. Think of it as sending a well-educated generalist to specialize. It can bake in your brand voice, teach a niche vocabulary, or improve performance on a repetitive task — but it requires quality data and ongoing maintenance, and it's frequently overkill when good prompting or retrieval would do the job.

SFT — Supervised Fine-Tuning
The most common form of fine-tuning. In Supervised Fine-Tuning you provide explicit pairs of input and the correct output — a question and its ideal answer, a document and its ideal summary — and the model learns to imitate those examples. "Supervised" simply means every example comes with a known right answer. SFT is usually the first stage of adapting a base model, and for many business use cases it's all you need before reaching for the more advanced alignment methods below.
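
Here is what a single SFT training example often looks like once tokenized, assuming a PyTorch-style training loop. A common (though not universal) convention is to supervise only the response: prompt positions get the label -100, which PyTorch's `CrossEntropyLoss` skips by default through its `ignore_index` argument. The whitespace tokenizer and growing vocabulary are toy stand-ins for a real model's tokenizer.

```python
from typing import Dict, List

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss

# Hypothetical toy vocabulary; a real pipeline would use the model's own tokenizer.
vocab: Dict[str, int] = {}


def tokenize(text: str) -> List[int]:
    return [vocab.setdefault(word, len(vocab)) for word in text.split()]


def build_sft_example(prompt: str, response: str) -> Dict[str, List[int]]:
    prompt_ids = tokenize(prompt)
    response_ids = tokenize(response)
    input_ids = prompt_ids + response_ids
    # Only response positions carry a target: the model learns to produce the
    # known right answer, not to reproduce the question.
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids
    return {"input_ids": input_ids, "labels": labels}


example = build_sft_example("Summarize: the meeting moved to Friday", "Meeting now Friday")
print(example)
```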

RLHF — Reinforcement Learning from Human Feedback
RLHF is the technique that made modern AI assistants feel helpful and well-behaved. Rather than showing the model one "correct" answer, you let it produce several, have humans rank which they prefer, train a separate "reward model" to predict those preferences, and then use reinforcement learning to nudge the model toward higher-rated responses. It's powerful for capturing fuzzy qualities like helpfulness, tone, and safety that are hard to express as a single labeled answer — but it's complex, expensive, and involves several moving parts.
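
Two pieces of the RLHF pipeline reduce to short formulas. The reward model is typically trained with a pairwise (Bradley-Terry style) loss that is small when the human-preferred answer scores higher, and the RL step then maximizes that reward minus a penalty for drifting away from the starting (SFT) model. The sketch below works on plain floats rather than model outputs, and the `beta` value is an arbitrary illustration.

```python
import math


def reward_model_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise loss -log(sigmoid(r_chosen - r_rejected)), where r_* are the reward
    model's scores for the human-preferred and the rejected answer."""
    margin = r_chosen - r_rejected
    return math.log1p(math.exp(-margin))  # equals -log(sigmoid(margin))


def kl_penalized_reward(reward: float, logp_policy: float, logp_ref: float, beta: float = 0.1) -> float:
    """Per-sample signal for the RL step: reward minus beta times the log-ratio between
    the policy being trained and the frozen reference model (a sample estimate of the KL)."""
    return reward - beta * (logp_policy - logp_ref)


print(round(reward_model_loss(2.0, 0.0), 4))  # 0.1269: ranking agrees with the human
print(round(reward_model_loss(0.0, 2.0), 4))  # 2.1269: ranking disagrees, large loss
```

The KL term is what stops the model from "gaming" the reward model with strange outputs that score well but read badly.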

DPO — Direct Preference Optimization
DPO is a streamlined alternative to the reinforcement-learning stage of RLHF. It uses the same kind of human preference data ("answer A is better than answer B") but skips the separate reward model and the reinforcement-learning loop entirely, optimizing the model directly on those preference pairs with a single, classification-style loss computed against a frozen reference model. The result is simpler, cheaper, and more stable to train while often reaching comparable quality, which is why DPO has become the default preference-tuning method for many teams in 2026.
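
The DPO objective for a single preference pair fits in a few lines. Its inputs are the summed log-probabilities of the preferred and rejected responses under the model being trained and under a frozen reference model (usually the SFT model); `beta` controls how far the model may move from that reference. The numbers in the usage lines are made up for illustration.

```python
import math


def dpo_loss(
    logp_chosen: float,
    logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,
) -> float:
    """-log(sigmoid(beta * [(chosen log-ratio) - (rejected log-ratio)]))."""
    chosen_log_ratio = logp_chosen - ref_logp_chosen        # policy vs reference on the preferred answer
    rejected_log_ratio = logp_rejected - ref_logp_rejected  # ...and on the rejected answer
    logits = beta * (chosen_log_ratio - rejected_log_ratio)
    return math.log1p(math.exp(-logits))                    # equals -log(sigmoid(logits))


# At the start of training the policy equals the reference, so the loss is log(2).
print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 4))  # 0.6931
# Raising the preferred answer's probability relative to the reference lowers the loss.
print(round(dpo_loss(-8.0, -12.0, -10.0, -12.0), 4))
```

There is no reward model and no sampling loop here: the gradient of this loss with respect to the policy's log-probabilities is the whole training signal.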

PPO — Proximal Policy Optimization
PPO is not specific to language models at all: it's a general-purpose reinforcement-learning algorithm, introduced by OpenAI in 2017 and benchmarked on simulated robotics and Atari games, that became the standard "engine" inside RLHF. Its key idea is to improve the model in small, controlled steps: a clipped objective removes the incentive for any single update to move the policy too far from the previous version, which keeps training stable. When people say RLHF is fiddly, PPO's sensitivity to hyperparameters is a big part of why, and avoiding it is precisely what makes DPO attractive.
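
The "small, controlled steps" come from PPO's clipped surrogate objective. For each sampled action, the probability ratio between the new and old policy is clipped to the range [1 - eps, 1 + eps], and taking the minimum removes any extra credit for moving further. The sketch computes that objective for one sample with plain floats; in an RLHF setup the advantage would typically come from a learned value model, and `eps = 0.2` is simply a commonly used default.

```python
import math


def ppo_clipped_objective(logp_new: float, logp_old: float, advantage: float, eps: float = 0.2) -> float:
    """PPO clipped surrogate for one sample (to be maximized)."""
    ratio = math.exp(logp_new - logp_old)            # how much more likely the action is under the new policy
    clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)
    # The pessimistic min means pushing the ratio past the clip range earns nothing extra.
    return min(ratio * advantage, clipped_ratio * advantage)


# A good action (advantage > 0) whose probability already rose 50%: credit is capped at 1.2.
print(ppo_clipped_objective(math.log(1.5), 0.0, advantage=1.0))
# A small change inside the clip range is credited in full.
print(ppo_clipped_objective(math.log(1.1), 0.0, advantage=1.0))
```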

PEFT — Parameter-Efficient Fine-Tuning
Full fine-tuning updates all of a model's weights (billions of numbers), which is slow, expensive, and produces a giant new copy of the model for every task. PEFT is the umbrella term for methods that freeze the original model and train only a small set of new or selected parameters instead. You get most of the benefit of fine-tuning at a fraction of the compute, memory, and storage cost, and you can keep many lightweight specializations around one shared base model. PEFT is what put custom models within reach of teams that aren't hyperscalers.
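
The storage argument is easy to check with back-of-envelope arithmetic. The sketch assumes a hypothetical 7-billion-parameter base model stored at 2 bytes per parameter (16-bit weights) and a hypothetical 20-million-parameter adapter per task; swap in your own numbers.

```python
def storage_gb(base_params: float, adapter_params: float, n_tasks: int, bytes_per_param: int = 2):
    """Return (full fine-tuning, PEFT) storage in GB for n_tasks specializations."""
    full_copies = n_tasks * base_params * bytes_per_param / 1e9                      # one full model per task
    shared_base = (base_params + n_tasks * adapter_params) * bytes_per_param / 1e9  # one base + small adapters
    return full_copies, shared_base


full, peft = storage_gb(base_params=7e9, adapter_params=20e6, n_tasks=10)
print(f"full fine-tuning: {full:.1f} GB, PEFT: {peft:.1f} GB")
print(f"trainable fraction per task: {20e6 / 7e9:.2%}")
```

Ten full copies need ten times the base model's storage, while ten adapters add only a few percent on top of a single copy.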

LoRA — Low-Rank Adaptation
LoRA is the most popular PEFT method by a wide margin. It works by adding a pair of small low-rank matrices alongside selected frozen weight matrices (often the attention projections) and training only those: a few million parameters instead of billions. The original model is untouched, the trained "adapter" is tiny enough to swap in and out, and quality often stays close to full fine-tuning. LoRA (and its memory-saving variant QLoRA, which keeps the frozen base model in 4-bit precision) is the practical reason a startup can now fine-tune a capable model on a single GPU. When an engineer says they'll "just LoRA it," this is what they mean.
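
A pure-Python sketch of a LoRA layer, following the formulation in the LoRA paper: the frozen weight W is left alone, and the layer adds (alpha / r) · B · A · x, where A (r × d_in) starts small and random and B (d_out × r) starts at zero, so training begins from exactly the pretrained behaviour. Real implementations run on tensors in a deep-learning framework; the tiny matrices, seed, and `alpha` value here are illustrative.

```python
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def add_scaled(a: Matrix, b: Matrix, scale: float) -> Matrix:
    """Return a + scale * b, elementwise."""
    return [[x + scale * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


class LoRALinear:
    """y = W x + (alpha / r) * B (A x); W is frozen, only A and B would be trained."""

    def __init__(self, W: Matrix, r: int, alpha: float, seed: int = 0) -> None:
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W                                                                # frozen pretrained weight
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]  # r x d_in, small random
        self.B = [[0.0] * r for _ in range(d_out)]                                # d_out x r, zeros
        self.scale = alpha / r

    def forward(self, x: Matrix) -> Matrix:
        """x is a d_in x batch matrix (one column per input vector)."""
        return add_scaled(matmul(self.W, x), matmul(self.B, matmul(self.A, x)), self.scale)

    def merged_weight(self) -> Matrix:
        """Fold the adapter into W after training so inference needs no extra work."""
        return add_scaled(self.W, matmul(self.B, self.A), self.scale)

    def trainable_params(self) -> int:
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


# For one 4096 x 4096 weight matrix, a rank-8 adapter trains 8 * (4096 + 4096) values
# instead of 4096 * 4096: 65,536 vs 16,777,216.
W = [[1.0, 0.0, 2.0, 0.0], [0.0, 1.0, 0.0, 3.0], [1.0, 1.0, 1.0, 1.0]]  # toy 3 x 4 "pretrained" weight
layer = LoRALinear(W, r=2, alpha=4.0)
x = [[1.0], [2.0], [3.0], [4.0]]
print(layer.forward(x) == matmul(W, x))  # True: B starts at zero, so behaviour is unchanged
print(layer.trainable_params())          # 14 = 2*4 + 3*2
```

Merging the adapter into W removes any inference overhead, while keeping adapters separate lets one base model serve many swappable specializations.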

How it all fits together
These two families aren't competitors — they're layers. A model is first trained and aligned (SFT, then RLHF or DPO, often via PEFT/LoRA) to be capable and well-behaved. Then, at inference time, you apply prompting and agent techniques (CoT, ReAct, HITL, MAS) to put that capability to work on real tasks. Most businesses spend the bulk of their effort in the second layer — and rightly so, because it's faster, cheaper, and lower-risk than training. Knowing where a given acronym sits helps you judge whether a proposed solution is appropriately sized for the problem.

If you're trying to map these choices onto an actual roadmap and budget, our guide to building an AI strategy that delivers ROI walks through how to decide what's worth doing — and what isn't.

Frequently asked questions
What's the difference between Chain of Thought and Tree of Thoughts?
Chain of Thought reasons in a single straight line of steps. Tree of Thoughts explores several reasoning branches at once, scores them, and can backtrack — spending more compute for better results on problems with many possible paths.

RLHF vs DPO — which is better?
They solve the same problem (aligning a model to human preferences) but DPO does it without a separate reward model or reinforcement-learning loop, making it simpler and more stable to train. RLHF is more flexible and battle-tested. For most teams in 2026, DPO is the pragmatic default; RLHF remains valuable where you need its extra flexibility.

Do I need to fine-tune a model for my business?
Usually not as a first step. Strong prompting, retrieval (feeding the model your documents), and a well-designed agent solve a large share of business use cases without any training at all. Fine-tuning (FT/SFT) and preference tuning (RLHF/DPO) are worth it once you've hit the limits of those cheaper approaches and have quality data to train on.

Is LoRA the same as fine-tuning?
LoRA is a type of fine-tuning — specifically a parameter-efficient one. Instead of updating the whole model, it trains a small add-on adapter, giving you most of the benefit at a fraction of the cost. It's the most common way smaller teams fine-tune models today.
