AI Engineering Explained

"AI Engineer" is a new role that has emerged because of Artificial Intelligence platforms like ChatGPT, Gemini, and Claude.

Therefore, it is essential that developers now master not only their core skills but also this emerging field called AI Engineering.

The aim of this blog is to cover essential basics of AI Engineering so that beginners have a guiding path to dive deeper into this field.

Conventional AI vs Generative AI

Conventional AI vs Generative AI Infographic

Conventional AI is mostly about making decisions or predictions from data.

You give it an input, and it gives you a very specific kind of output — a label, a number, or a yes/no answer.

For example, it might tell you whether an email is spam, whether a loan should be approved, or what the price of a house might be. It’s intelligent, but it’s not creative. It’s essentially learning patterns so it can make better decisions.

Generative AI creates entirely new data based on learned patterns. So instead of just predicting an outcome, it actually creates new content.

You give it a prompt, and it generates something that didn’t exist before — like text, images, code, or even audio.

Models like ChatGPT don’t just choose from predefined options; they generate responses word by word based on the patterns they’ve learned from massive amounts of data.

Another important difference is determinism.

Traditional AI systems usually give you the same output for the same input.

Generative AI doesn’t — two prompts that look identical can still result in slightly different outputs, because the model is sampling from probabilities rather than following fixed rules.
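To make the sampling idea concrete, here is a toy Python sketch. The token probabilities are made up for illustration and do not come from any real model; the point is only that sampling from a probability distribution can give different outputs on different runs:

```python
import random

# Hypothetical next-token probabilities after the prompt "The weather is".
# A real model computes these from billions of parameters; here they are invented.
next_token_probs = {"sunny": 0.5, "rainy": 0.3, "cold": 0.15, "purple": 0.05}

def sample_next_token(probs):
    # Sample one token according to its probability, instead of always
    # picking the single most likely one (which would be deterministic).
    tokens, weights = zip(*probs.items())
    return random.choices(tokens, weights=weights, k=1)[0]

# The "same prompt" can yield different continuations on different runs:
print(sample_next_token(next_token_probs))  # e.g. "sunny"
print(sample_next_token(next_token_probs))  # e.g. "rainy"
```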

Keyword Search vs Semantic Search vs AI Search

Keyword Search vs Semantic Search vs AI Search InfoGraphic

Keyword Search uses direct text matching, finding pages with the exact keywords you type.

It is best for exact lookups, technical searches, or simple queries (e.g., "weather").

Example: Twitter Search (as of 2026) still uses traditional Keyword Search.

Semantic Search understands the meaning, context, and intent behind your query (e.g., "Apple stock price" means the company, not the fruit).

It uses AI, Natural Language Processing (NLP), and vector embeddings to grasp concepts and relationships.

Example: Google Search Engine uses Semantic Search at the core, although it is gearing towards AI Search.
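A tiny illustration of the difference, using hand-made stand-in vectors rather than a real embedding model (the documents and the numbers are invented purely for illustration):

```python
import numpy as np

docs = ["Apple reports record quarterly earnings",
        "How to grow an apple tree in your garden"]

# Keyword search: literal string matching only.
keyword_hits = [d for d in docs if "apple stock" in d.lower()]
print(keyword_hits)  # [] -- no exact match, even though the first doc is relevant

# Semantic search: compare meaning via vector embeddings.
# Toy 3-dimensional vectors stand in for a real embedding model's output.
doc_vecs = np.array([[0.9, 0.1, 0.2],    # finance-flavoured document
                     [0.1, 0.8, 0.3]])   # gardening-flavoured document
query_vec = np.array([0.85, 0.05, 0.25]) # "Apple stock price" (the company)

def cosine(a, b):
    # Cosine similarity: 1.0 means same direction, 0.0 means unrelated.
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

scores = [cosine(query_vec, v) for v in doc_vecs]
print(docs[int(np.argmax(scores))])  # the earnings article wins
```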

AI Search interprets complex natural language, provides direct answers, and learns from interactions, with semantic understanding at its core.

It delivers conversational, context-aware, and personalized results, often generating direct answers.

It builds upon semantic search using advanced AI models (LLMs) and techniques like Retrieval-Augmented Generation (RAG).

Example: OpenAI ChatGPT, Google Gemini, Anthropic Claude, and xAI Grok are a few notable ones.

Context vs Memory

Context vs Memory Infographic

Context is the immediate, temporary information within a session that an AI uses for real-time understanding.

Memory is the broader, persistent system (such as databases or user profiles) that stores long-term knowledge, past interactions, and facts. It provides continuity and personalization beyond a single session, and context is often dynamically retrieved from memory to enrich the current interaction.
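A minimal sketch of the distinction in plain Python, where a JSON file stands in for a real memory store (the file name and the stored fields are just illustrative assumptions):

```python
import json
from pathlib import Path

MEMORY_FILE = Path("user_memory.json")  # hypothetical persistent store

def load_memory():
    # Memory: survives across sessions (could be a database or user profile).
    return json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else {}

def save_memory(memory):
    MEMORY_FILE.write_text(json.dumps(memory))

def chat_session(user_messages):
    memory = load_memory()
    # Context: lives only for this session and is rebuilt each time,
    # optionally enriched with facts retrieved from memory.
    context = [{"role": "system", "content": f"Known user facts: {memory}"}]
    for msg in user_messages:
        context.append({"role": "user", "content": msg})
        # ... send `context` to the model here ...
    memory["last_topic"] = user_messages[-1]  # persist something for next time
    save_memory(memory)

chat_session(["Hi, I'm planning a trip to Goa"])
```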

LLM vs AI Agent

LLM vs AI Agent Infographic

An LLM (Large Language Model) is the "brain" that understands and generates text.

An AI Agent is a system that uses an LLM as its core to take action, plan, use tools (APIs, code), and complete multi-step tasks autonomously. It acts like an orchestrator with memory and decision-making loops, rather than just answering prompts.

Think of an LLM as a smart assistant who can write a recipe (text output), and

an Agent as a chef who reads the recipe, uses kitchen tools (APIs, software), and actually cooks the meal (completed task).
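Here is a hypothetical, stripped-down agent loop in Python. The `call_llm` and `get_weather` functions are stand-ins for a real model API and a real weather API; the point is only the plan-act-observe loop wrapped around the LLM:

```python
def call_llm(prompt):
    # Stand-in for a real LLM API call; returns either a tool request or a final answer.
    if "28°C" in prompt:
        return "FINAL: It is 28°C and sunny in Pune."
    return 'TOOL:get_weather("Pune")'

def get_weather(city):
    # Stand-in for a real weather API.
    return f"28°C and sunny in {city}"

TOOLS = {"get_weather": get_weather}

def run_agent(task, max_steps=5):
    observations = []
    for _ in range(max_steps):
        decision = call_llm(f"Task: {task}\nObservations: {observations}")
        if decision.startswith("FINAL:"):
            return decision                      # the agent decides it is done
        # Otherwise parse a tool call like TOOL:get_weather("Pune") and run it.
        name, arg = decision[5:].split("(", 1)
        observations.append(TOOLS[name](arg.strip('")')))
    return "Gave up after too many steps."

print(run_agent("What's the weather in Pune?"))
```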

Vector Database

Vector Database Examples

A vector database stores and manages data as vector embeddings, which are numerical representations of unstructured data like text, images, and audio.

They differ from traditional databases in a fundamental way:

A Traditional Database (SQL/NoSQL) is designed for the storage and retrieval of scalar data (text, numbers, booleans) and is optimized for deterministic, exact-match queries.

A Vector Database is engineered for probabilistic retrieval in high-dimensional vector space. It stores unstructured data (like text, images, and audio) as dense vector embeddings (arrays of floats) generated by deep learning models (e.g., Transformers).

They are a critical component for modern AI applications, powering tools like recommendation engines, semantic search, and large language models (LLMs).

Example vector DBs include Pinecone, Chroma, Weaviate, Qdrant, and pgvector.
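As a small sketch, here is how storing and querying documents might look with Chroma, one of the databases listed above (assuming `pip install chromadb`; Chroma applies a default embedding model behind the scenes, and the documents here are invented):

```python
import chromadb

client = chromadb.Client()                       # in-memory instance
collection = client.create_collection("articles")

# Store documents; Chroma converts them to vector embeddings automatically.
collection.add(
    ids=["1", "2", "3"],
    documents=[
        "How to train a neural network",
        "Best pizza recipes for beginners",
        "Understanding transformer attention",
    ],
)

# Similarity search: returns documents closest in embedding space,
# not documents sharing exact keywords.
results = collection.query(query_texts=["deep learning basics"], n_results=2)
print(results["documents"])
```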

How it works

Vector Database Infographic

Vector Embeddings: Machine learning models convert data into high-dimensional vectors (arrays of numbers), where each dimension represents a feature of the data. For example, an image of a car could be represented by dimensions capturing its color, number of doors, and size.

Indexing: The vectors are organized into a special index within the database. This index groups similar vectors together, so items with similar characteristics are stored close to each other.

Similarity Search: When a query is made (e.g., a user searches for a car image), the database converts the query (text, image, etc.) into a numerical vector embedding.

The index is then searched with that query vector to quickly find its "nearest neighbors" – the most similar vectors stored in the database.

Closeness in this high-dimensional space is measured with distance metrics like Cosine Similarity or Euclidean distance.

To do this matching efficiently, libraries like FAISS are used, which provide fast similarity search and clustering of dense vectors (see the sketch after this list).

Example algorithms include k-Nearest Neighbors (k-NN) and more efficient Approximate Nearest Neighbor (ANN) methods like Hierarchical Navigable Small World (HNSW).

Result: The database returns the most relevant results based on this similarity, rather than just matching keywords.
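A minimal sketch of this nearest-neighbour search using FAISS (assuming `pip install faiss-cpu`); random vectors stand in for real embeddings produced by a model:

```python
import numpy as np
import faiss

dim = 64
np.random.seed(0)
doc_vectors = np.random.rand(1000, dim).astype("float32")          # "document" embeddings
query_vector = (doc_vectors[42:43] + 0.01).astype("float32")       # a query close to doc 42

index = faiss.IndexFlatL2(dim)        # exact search with Euclidean (L2) distance
index.add(doc_vectors)                # indexing step
distances, ids = index.search(query_vector, 3)  # find the 3 nearest neighbours

print(ids)        # doc 42 should be the top hit
print(distances)  # smaller distance = more similar
```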

What is RAG?

RAG stands for Retrieval Augmented Generation. There are four steps in a RAG pipeline:

1) Indexing: This foundational step involves preparing the external knowledge base for efficient search and retrieval.

Data Sourcing: Raw data (e.g., documents, web pages, database records) is crawled and collected from various data sources and made ready for ingestion.

Chunking: The collected data is then ingested, parsed, and broken down into smaller, manageable pieces called "chunks".

This step is crucial because LLMs have limits on how much text they can process at once (the context window), and smaller chunks allow for more precise information retrieval.
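A toy chunking sketch (the chunk size and overlap values are arbitrary choices, not recommendations):

```python
def chunk_text(text, chunk_size=200, overlap=50):
    # Split a long document into fixed-size chunks with some overlap,
    # so context is not lost at chunk boundaries.
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

document = "Your long policy document or web page text goes here... " * 20
for i, chunk in enumerate(chunk_text(document)[:3]):
    print(i, chunk[:40], "...")
```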

Indexing Infographic

Embedding Generation: Each text chunk is converted into a numerical vector called a vector embedding using an embedding model. These embeddings capture the semantic meaning of the text.

Some example Embedding Models are Word2Vec for words, BERT for context-aware text, and Sentence-BERT (SBERT) for sentence similarity.

Vector Storage: The resulting vector embeddings are stored in a specialized database, known as a vector database (or vector store), which is optimized for rapid similarity searches.
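A small sketch of embedding generation using Sentence-BERT via the sentence-transformers library (assuming `pip install sentence-transformers`; the model name is just one common choice, and the chunks are invented):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
chunks = ["Refunds are processed within 5 business days.",
          "Shipping is free for orders above $50."]

embeddings = model.encode(chunks)   # one vector per chunk
print(embeddings.shape)             # e.g. (2, 384) -- 384-dimensional embeddings
# These vectors are then written into the vector store (see the Chroma/FAISS
# sketches above) alongside the original chunk text.
```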

2) Retrieval: When a user submits a query, the RAG system searches the prepared knowledge base for relevant information.

Query Encoding: The user's input query is also converted into a vector embedding using the same embedding model used during indexing.

Similarity Search: The query's vector is used to perform a similarity search within the vector database to find the top "k" most semantically similar data chunks. This process efficiently identifies the most pertinent information from the external source.
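A minimal retrieval sketch, again using sentence-transformers (same assumptions as above; the chunks are invented):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
chunks = ["Refunds are processed within 5 business days.",
          "Shipping is free for orders above $50."]
chunk_embeddings = model.encode(chunks)

# Query encoding: the same model that embedded the chunks embeds the query.
query_embedding = model.encode("How long do refunds take?")

# Similarity search: cosine similarity against every chunk, keep the top k.
scores = util.cos_sim(query_embedding, chunk_embeddings)[0]
top_k = scores.argsort(descending=True)[:1]
retrieved_chunks = [chunks[int(i)] for i in top_k]
print(retrieved_chunks)   # ['Refunds are processed within 5 business days.']
```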

Retrieval Augmented Generation Infographic

3) Augmentation: The retrieved chunks of information are then used to create a more comprehensive prompt for the LLM.

Contextual Fusion: The system takes the original user query and the retrieved documents and combines them into a single, structured prompt.

Prompt Engineering: The prompt is engineered to provide clear instructions to the LLM, effectively telling it to use the provided context to answer the question.

For example, the prompt might be structured as: "Context: [Retrieved Documents]. User Question: [Original Query]. Please provide an answer based solely on the context provided."
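A tiny sketch of building such an augmented prompt (the exact wording and format are up to you):

```python
retrieved_chunks = ["Refunds are processed within 5 business days."]
user_question = "How long do refunds take?"

# Contextual fusion: retrieved chunks + original question in one prompt.
augmented_prompt = (
    "Context:\n"
    + "\n".join(f"- {chunk}" for chunk in retrieved_chunks)
    + f"\n\nUser Question: {user_question}\n"
    + "Please provide an answer based solely on the context provided."
)
print(augmented_prompt)
```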

4) Generation: In the final step, the augmented prompt is sent to the Large Language Model to generate the final output.

LLM Processing: The LLM uses its inherent knowledge combined with the specific, relevant context provided in the augmented prompt to formulate an accurate and grounded response.

Response Delivery: The system returns the generated, factually accurate, and context-aware answer to the user. This process helps mitigate common issues with standard LLMs like generating incorrect information or "hallucinations".
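A minimal generation sketch using the OpenAI Python SDK as one possible backend (assumes `pip install openai`, an `OPENAI_API_KEY` in your environment, and that the chosen model name is available to you; any chat model would work):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The augmented prompt built in the previous step.
augmented_prompt = (
    "Context:\n- Refunds are processed within 5 business days.\n\n"
    "User Question: How long do refunds take?\n"
    "Please provide an answer based solely on the context provided."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",   # model name is just an example
    messages=[{"role": "user", "content": augmented_prompt}],
)
print(response.choices[0].message.content)   # grounded, context-aware answer
```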

What is an Agentic Workflow?

An agentic workflow is a dynamic, AI-driven process where autonomous agents plan, decide, and execute complex, multi-step tasks with minimal human intervention, adapting in real-time to achieve a goal.

Unlike static workflows, they use AI's reasoning and tool-use capabilities to break down problems, select actions, and self-correct, enabling greater autonomy and efficiency in automated processes, from IT support to customer service.

LangChain vs LangGraph vs LangSmith

LangChain: The libraries (code) to build the app.

LangGraph: The architecture (logic) to control complex agents and loops.

LangSmith: The platform (dashboard) to test, debug, and monitor the app.

LangChain versus LangGraph versus LangSmith Infographic

LangChain is a framework that simplifies building LLM applications by providing abstractions. It connects LLMs (like GPT-4) to other data sources and tools.

Core Concept: "Chains" (DAGs - Directed Acyclic Graphs). It is excellent for linear workflows where step A leads to step B, which leads to step C.

LangGraph is a library built on top of LangChain specifically for building agents and stateful applications.

Core Concept: "Cyclic Graphs." Unlike LangChain's linear chains, LangGraph allows loops. This enables an agent to try a task, fail, critique its own work, and try again (a loop) before finishing.

LangSmith is a developer platform (cloud service) for observability, testing, and fine-tuning. It is not a code library you import to build logic; it is a dashboard you log into.

Core Concept: "Tracing and Evaluation." LLMs are "black boxes"—it is hard to know why they failed. LangSmith records every step the AI took so you can inspect it.


And that is it: these are all the concepts you need to get started in the field of AI Engineering.

To take your knowledge further, feel free to explore libraries like LangGraph/LangChain and FAISS, and vector databases like Pinecone and Weaviate.

If you made it all the way to the end, thank you so much. Feel free to let me know your thoughts in the comments, or visit my website to know more: shreyastaware.me

Until next time,
Shreyas
