Beginner’s AI Glossary

#ai #machinelearning #tutorial #beginners

If terms like LLM, agent, and deep learning make you feel like everyone secretly attended an AI meeting without inviting you, this guide is for you. Let’s decode the jargon before the robots fully take over.

Core AI Concepts
Data & Training Concepts
Learning Methods
Modern LLM Concepts
AI Applications
Development & Infrastructure
Popular AI Tools & Platforms
AI Safety & Society

1. Core AI Concepts

Term	Acronym	Definition	Example
Artificial Intelligence	AI	Machines performing tasks that normally require human intelligence	ChatGPT, Claude
Machine Learning	ML	Systems that learn patterns from data without being explicitly programmed	Netflix recommendations
Deep Learning	—	A subset of machine learning that uses neural networks with many layers	Face recognition
Neural Network	—	A computational model loosely inspired by the structure of the human brain	Image classification
Foundation Model	—	A large AI model trained on broad data that can be adapted to many tasks	GPT-4, Claude, Gemini
Generative AI	—	AI that creates new content — text, images, audio, code, and more	AI-generated images, ChatGPT
Large Language Model	LLM	AI trained on massive text datasets, capable of generating and understanding language	GPT-4, Claude, Llama
Agentic AI	—	AI that can autonomously plan and execute multi-step tasks, often using tools	AutoGPT, Claude with tool use
AI Agent	—	A single AI system that perceives its environment, makes decisions, and takes actions to achieve a goal	A coding agent that writes, tests, and fixes code on its own
Multimodal AI	—	AI that can process and generate multiple types of content — text, images, and audio together	GPT-4o image analysis
Artificial General Intelligence	AGI	A hypothetical AI that matches or exceeds human-level intelligence across all tasks — does not yet exist	Often discussed as a long-term goal in AI research
Chatbot	—	A software application that simulates conversation with users, often powered by an LLM	Customer support bots, ChatGPT
AI Alignment	—	The challenge of ensuring AI systems behave in ways that are safe and consistent with human intentions and values	Preventing an AI from pursuing goals that harm people
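
To make "layered neural network" a little more concrete, here is a minimal sketch of a two-layer network in plain Python. The weights are hand-picked for illustration (an assumption; a real network learns them from data), and the names `dense_layer` and `tiny_network` are made up for this example. With these weights the network outputs 1 when exactly one of its two inputs is 1, and 0 otherwise.

```python
def relu(value):
    """A common activation function: negative values become zero."""
    return max(0.0, value)


def dense_layer(inputs, weights, biases, activation=None):
    """Each neuron computes a weighted sum of the inputs plus a bias."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(activation(total) if activation else total)
    return outputs


# Hand-picked (not learned) weights, purely for illustration
HIDDEN_WEIGHTS = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_BIASES = [0.0, -1.0]
OUTPUT_WEIGHTS = [[1.0, -2.0]]
OUTPUT_BIASES = [0.0]


def tiny_network(x1, x2):
    """Layer 1 (hidden, with ReLU) feeds layer 2 (output)."""
    hidden = dense_layer([x1, x2], HIDDEN_WEIGHTS, HIDDEN_BIASES, activation=relu)
    return dense_layer(hidden, OUTPUT_WEIGHTS, OUTPUT_BIASES)[0]


for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, tiny_network(a, b))
```

Deep learning models follow the same idea, just with far more layers and neurons, and with weights found by training rather than by hand.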

2. Data & Training Concepts

Term	Acronym	Definition	Example
Dataset	—	A structured collection of data used to train or evaluate AI models	A database of labelled customer photos
Training Data	—	The specific data an AI model learns from during training	Millions of labelled images
Model	—	The trained AI system that makes predictions or generates outputs	A spam detector
Parameters	—	The internal numerical values a model learns during training; more parameters often, but not always, mean more capability	GPT-3 has 175 billion parameters
Weights	—	The numerical values stored inside a model after training; the term is often used interchangeably with parameters	"Downloading model weights" means downloading the trained model itself
Token	—	A small unit of text that an AI model processes, roughly a word or part of a word	A short common word is often one token, while a long word like "unbelievable" may be split into several; exact counts depend on the model's tokeniser
Embeddings	—	Numerical representations of text that capture meaning and relationships	Used in semantic search
Vector Database	—	A database that stores embeddings so AI can quickly retrieve relevant information	Pinecone, Weaviate
Pre-training	—	The initial large-scale training phase where a model learns from a huge, general dataset before any specialisation	Training an LLM on a large crawl of public web text, books, and code
Fine-Tuning	—	Training an existing pre-trained model further on specialised data to improve it for a specific task	Training a general model on medical records to create a medical chatbot
Transfer Learning	—	Reusing a model trained on one task as the starting point for a different but related task	Using an image model trained on photos to kickstart a medical imaging model
Synthetic Data	—	Artificially generated data used to train or test models when real data is scarce or sensitive	Generating fake patient records to train a healthcare AI
Epoch	—	One complete pass through the entire training dataset during model training	Training for 10 epochs means the model sees all the data 10 times
Batch Size	—	The number of training examples processed together in one step	A batch size of 32 means the model updates its weights after every 32 examples
Gradient Descent	—	The core algorithm that adjusts a model's weights during training to minimise errors	How a neural network "learns" by slowly correcting its mistakes
Inference	—	The process of a trained model generating outputs in response to new inputs	An AI answering your question
Model Card	—	A short document published alongside an AI model describing what it does, how it was trained, and its limitations	Hugging Face model cards
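
Several terms in this table (parameters, epoch, batch size, gradient descent, inference) fit together in one training loop. The sketch below is a toy illustration in plain Python, not how production frameworks implement it: it fits a straight line `y = w * x + b` to made-up data. The function name `train_linear_model` and all the numbers are assumptions chosen for the example.

```python
import random


def train_linear_model(data, epochs=500, batch_size=4, learning_rate=0.05, seed=0):
    """Fit y = w * x + b with mini-batch gradient descent on mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0  # the model's parameters, learned during training
    data = list(data)
    for _ in range(epochs):  # one epoch = one full pass over the dataset
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]  # one batch = one update step
            # Gradients of the mean squared error with respect to w and b
            grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            # Step against the gradient to reduce the error a little
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b


# Toy training data generated from y = 2x + 1
training_data = [(x / 4, 2 * (x / 4) + 1) for x in range(8)]
w, b = train_linear_model(training_data)

# Inference: apply the trained parameters to an input the model has not seen
prediction = w * 3.0 + b
print(round(w, 2), round(b, 2), round(prediction, 2))
```

After training, `w` and `b` should land close to 2 and 1. A real LLM does the same kind of thing with billions of parameters instead of two.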

3. Learning Methods

Term	Acronym	Definition	Example
Supervised Learning	—	Training a model using labelled input-output pairs	Spam detection (email → spam/not spam)
Unsupervised Learning	—	Finding patterns in data without predefined labels	Customer segmentation
Reinforcement Learning	—	Training a model through a system of rewards and penalties for its actions	Game-playing AI like AlphaGo
Reinforcement Learning from Human Feedback	RLHF	A training technique where human raters score AI outputs, and the model learns to produce responses humans prefer	How ChatGPT and Claude were fine-tuned to be helpful and safe
Classification	—	A model predicting which category an input belongs to	Fraud detection (fraudulent vs. legitimate)
Regression	—	A model predicting a continuous numeric value	Predicting house prices
Clustering	—	Grouping similar data points together without predefined labels	Market segmentation
Data Augmentation	—	Artificially expanding a training dataset by creating modified versions of existing data	Flipping, rotating, or cropping images to give a model more variety to learn from
Overfitting	—	When a model memorises training data too closely and performs poorly on new data	A model that aces training tests but fails in the real world
Underfitting	—	When a model is too simple to learn the underlying patterns in the data	A model that makes weak or random predictions
Cross-Validation	—	A technique for testing how well a model generalises by training and evaluating it on different subsets of data	Splitting data into 5 "folds" and rotating which one is used for testing
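
The "folds" idea in cross-validation is easy to see in code. This is a minimal sketch that only produces the index splits; the helper name `k_fold_splits` is made up for this example, and a real project would usually shuffle the data first.

```python
def k_fold_splits(n_samples, k=5):
    """Yield (train_indices, test_indices) pairs, one pair per fold."""
    indices = list(range(n_samples))
    # Spread any remainder over the first few folds so sizes differ by at most 1
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


for fold_number, (train, test) in enumerate(k_fold_splits(10, k=5), start=1):
    print(f"fold {fold_number}: test on {test}, train on {len(train)} examples")
```

Every example is used for testing exactly once and for training in the other folds. In practice you would likely reach for a library helper such as scikit-learn's `KFold` rather than writing this yourself.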

4. Modern LLM Concepts

Term	Acronym	Definition	Example
Natural Language Processing	NLP	The field of AI focused on enabling machines to understand and generate human language	Language translation, sentiment analysis
Prompt	—	The instruction or question you give to an AI model	"Write a blog post about climate change"
System Prompt	—	A set of instructions, usually hidden from the end user, given to an AI model before the conversation starts to set its behaviour, tone, or rules	A company using a system prompt to make Claude respond only about their product
Prompt Engineering	—	The practice of crafting and refining prompts to get better, more reliable AI outputs	Using structured formatting or examples in your prompt
Zero-Shot Prompting	—	Asking an AI to complete a task with no examples provided	"Translate this sentence to French."
Few-Shot Prompting	—	Giving an AI a small number of examples before asking it to complete a task	Showing 2–3 example summaries before asking it to summarise a new article
Chain-of-Thought Prompting	—	Encouraging an AI to reason step by step before giving a final answer, which improves accuracy on complex tasks	Adding "Think step by step" to a maths or logic prompt
Transformer	—	An attention-based neural network architecture that is the foundation of most modern LLMs	GPT, Claude, and Gemini are all transformer-based models
Attention Mechanism	—	The part of a transformer that lets the model focus on the most relevant parts of the input when generating each word	How a model knows "it" in "The cat sat because it was tired" refers to the cat
Retrieval-Augmented Generation	RAG	A technique that combines AI generation with real-time retrieval of relevant documents or data	A chatbot that searches your company's PDF documents before answering
Function Calling	—	A feature that lets an LLM trigger external tools or APIs — such as searching the web or running code — as part of its response	An AI assistant that calls a weather API to answer "Will it rain tomorrow?"
Context Window	—	The maximum amount of text an AI can read and "remember" in a single interaction	A model with a 200,000-token context window can read roughly 150,000 words at once
Temperature	—	A setting that controls how predictable or creative an AI's output is. Low = more focused; high = more varied and creative	Set low for factual Q&A; set high for creative writing
Top-p Sampling	—	A setting that controls output variety by sampling only from the most likely next tokens whose combined probability reaches a threshold p	Often used alongside temperature to tune output quality
Hallucination	—	When an AI confidently states something that is factually incorrect or entirely made up	An AI inventing a citation to a research paper that doesn't exist
Guardrails	—	Rules or filters applied to an AI to prevent it from producing harmful, off-topic, or inappropriate outputs	A customer service bot that refuses to discuss competitors
Jailbreak	—	A technique used to trick an AI into bypassing its safety guidelines or guardrails	Roleplaying prompts designed to make an AI ignore its rules
Prompt Injection	—	An attack where malicious instructions are hidden in content the AI reads, trying to hijack its behaviour	A webpage that contains hidden text telling a browsing AI to send your data elsewhere
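
Temperature and top-p are easier to grasp with numbers. The sketch below is a simplified illustration, not any particular vendor's implementation: the vocabulary and scores are invented, and the function names are assumptions for this example. It assumes temperature is greater than zero.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities; low temperature sharpens them."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probabilities, top_p=0.9):
    """Keep the most likely tokens until their combined probability reaches top_p."""
    ranked = sorted(enumerate(probabilities), key=lambda pair: pair[1], reverse=True)
    kept, cumulative = [], 0.0
    for index, p in ranked:
        kept.append((index, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {index: p / total for index, p in kept}  # renormalised pool


def sample_next_token(vocabulary, logits, temperature=1.0, top_p=0.9, rng=random):
    """Apply temperature, filter with top-p, then draw one token at random."""
    pool = top_p_filter(softmax_with_temperature(logits, temperature), top_p)
    indices = list(pool)
    chosen = rng.choices(indices, weights=[pool[i] for i in indices], k=1)[0]
    return vocabulary[chosen]


# Hypothetical next-word candidates and scores for "Tomorrow will be ..."
vocabulary = ["sunny", "rainy", "cloudy", "purple"]
logits = [2.0, 1.5, 1.0, -1.0]

for t in (0.2, 1.0, 2.0):
    print(t, [round(p, 2) for p in softmax_with_temperature(logits, t)])
print(sample_next_token(vocabulary, logits, temperature=1.0, top_p=0.9))
```

At a low temperature almost all the probability sits on "sunny"; at a high temperature the options even out. With these scores, a top-p of 0.9 removes the unlikely "purple" from the pool before sampling.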

5. AI Applications

Term	Acronym	Definition	Example
Computer Vision	—	AI that can interpret and understand visual information from images and video	CCTV object recognition, medical imaging
Speech Recognition	—	AI that converts spoken audio into text	Voice dictation in Siri or Google Assistant
Text-to-Speech	TTS	AI that converts written text into natural-sounding spoken audio	ElevenLabs, Google Text-to-Speech
Text-to-Image	—	AI that generates images from a text description	DALL·E, Midjourney, Stable Diffusion
Sentiment Analysis	—	AI that identifies the emotional tone of a piece of text — positive, negative, or neutral	Analysing customer reviews to gauge satisfaction
Recommendation System	—	AI that predicts what a user might want to see or do next, based on past behaviour	YouTube's "Up Next" queue, Spotify's Discover Weekly
Automation	—	Using AI to reduce or eliminate manual, repetitive tasks	Auto-generating reports, routing support tickets
Explainable AI	XAI	AI systems designed so that their reasoning and decisions can be understood by humans	A loan-rejection system that shows which factors (income, credit score) influenced the decision
AI Ethics	—	The principles and practices for developing and deploying AI responsibly and fairly	Preventing bias, ensuring transparency, protecting privacy
Bias	—	When an AI system produces unfair or skewed outcomes, often because of imbalanced training data	A hiring tool that systematically ranks male applicants higher than equally qualified female applicants
Red Teaming	—	Deliberately trying to break or misuse an AI system to find safety vulnerabilities before release	Researchers probing a model with harmful prompts to see how it responds
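
As a rough illustration of how a recommendation system can work "based on past behaviour", here is a tiny sketch that finds the most similar other user and suggests something they liked. The film titles, ratings, and the `recommend` function are all invented for this example; real systems are far more sophisticated.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two rating lists: closer to 1.0 means more alike."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical ratings from 1 to 5 (0 = not watched), in the order of FILMS
FILMS = ["Space Saga", "Robot Wars", "Baking Show", "Garden Diaries"]
RATINGS = {
    "ana": [5, 0, 0, 1],
    "ben": [4, 5, 1, 0],
    "chloe": [1, 0, 5, 4],
}


def recommend(user, ratings=RATINGS, films=FILMS):
    """Suggest the unwatched film rated highest by the most similar other user."""
    others = [name for name in ratings if name != user]
    nearest = max(others, key=lambda name: cosine_similarity(ratings[user], ratings[name]))
    unwatched = [i for i, score in enumerate(ratings[user]) if score == 0]
    if not unwatched:
        return None
    best = max(unwatched, key=lambda i: ratings[nearest][i])
    return films[best]


print(recommend("ana"))
```

Ana's tastes are closest to Ben's, so the sketch suggests the film Ben rated highest among those Ana has not watched.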

6. Development & Infrastructure

Term	Acronym	Definition	Example
Application Programming Interface	API	A defined way for software systems to communicate with each other	The OpenAI API lets developers build apps powered by GPT
API Key	—	A private authentication token that identifies you when making API calls	Your code sends the key with each request; keep it secret, for example in an environment variable rather than pasted into source code
Graphics Processing Unit	GPU	Specialised hardware that dramatically accelerates AI training and inference workloads	NVIDIA A100 GPUs used in data centres
Tensor Processing Unit	TPU	Hardware designed specifically for AI workloads, developed by Google	Used to train Google's AI models
Cloud Computing	—	Running applications and storing data on remote internet-connected servers rather than locally	AWS, Azure, Google Cloud
Edge AI	—	Running AI models directly on a local device rather than in the cloud	AI on a smart camera that processes footage without sending it to a server
Latency	—	The delay between sending a request to an AI and receiving a response	A model with low latency feels instant; high latency feels slow
Model Quantisation	—	A technique that reduces a model's size and memory usage by representing its weights with less precision, making it faster and cheaper to run, usually at a small cost in accuracy	Running a compressed version of Llama on a laptop instead of a server
Open Source Model	—	An AI model whose weights and/or code are publicly available for anyone to use and modify	Meta's Llama models (open weights, released under Meta's own licence)
Hugging Face	—	A popular platform for sharing, discovering, and running open-source AI models and datasets	Often called "the GitHub of AI"
Benchmark	—	A standardised test used to evaluate and compare AI model performance	MMLU, HumanEval
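
Model quantisation sounds abstract, but the core trick is just storing numbers with less precision. Below is a minimal sketch of one simple scheme (symmetric 8-bit quantisation with a single shared scale), using invented weights and function names; real tools use more refined variants.

```python
def quantise_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    largest = max(abs(w) for w in weights)
    scale = largest / 127 if largest else 1.0
    quantised = [round(w / scale) for w in weights]
    return quantised, scale


def dequantise(quantised, scale):
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in quantised]


# Hypothetical weights from a trained model
weights = [0.82, -1.27, 0.05, 0.4]
quantised, scale = quantise_int8(weights)
restored = dequantise(quantised, scale)

print(quantised)
print([round(w, 3) for w in restored])
```

Each weight now fits in one byte instead of the four bytes a 32-bit float needs, which is where the memory saving comes from. The restored values are close to the originals but not always identical, which is the small accuracy trade-off quantisation makes.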

7. Popular AI Tools & Platforms

Category	Tools
AI Assistants	ChatGPT, Claude, Gemini
AI Model Hub	Hugging Face
Image Generation	DALL·E, Midjourney, Stable Diffusion
Coding & Development	Python, Jupyter Notebook
Data Analysis	Pandas, NumPy
Data Visualisation	Power BI, Tableau
Machine Learning Frameworks	Scikit-learn, TensorFlow, PyTorch
Cloud Platforms	AWS, Azure, GCP
Workflow & AI Orchestration	LangChain, n8n
API Testing	Postman, SoapUI

8. AI Safety & Society

These terms come up constantly in news, policy, and real-world AI discussions. Every beginner should know them.

Term	Acronym	Definition	Example
AI Safety	—	The field focused on ensuring AI systems behave reliably and don't cause unintended harm as they become more capable	Research into preventing models from pursuing dangerous goals
AI Alignment	—	The challenge of ensuring AI systems pursue goals that are actually consistent with human intentions and values	Ensuring a powerful AI optimises for human wellbeing, not just task completion
Deepfake	—	AI-generated video, audio, or images that realistically depict someone saying or doing something they never did	Synthetic video of a public figure making a fake speech
Copyright & IP	—	Legal questions about who owns AI-generated content and whether training data was used lawfully	Ongoing lawsuits between AI companies and artists or publishers
Data Privacy	—	The concern about how personal data is collected, stored, and used to train AI models	Whether your chat history is used to improve a model
AI Regulation	—	Government laws and policies designed to govern how AI is developed and deployed	The EU AI Act, US executive orders on AI
Carbon Footprint of AI	—	The energy and environmental cost of training and running large AI models	Published estimates put the training of GPT-3 at roughly 1,300 megawatt-hours of electricity
Human-in-the-Loop	HITL	A system design where a human reviews or approves AI decisions before they take effect	A doctor reviewing an AI's diagnosis before acting on it

DEV Community