<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Shreyas Taware</title>
    <description>The latest articles on DEV Community by Shreyas Taware (@shreyastaware).</description>
    <link>https://dev.to/shreyastaware</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F495994%2F9ef0178b-5e97-4cea-9b05-ce91e5ff80c1.jpg</url>
      <title>DEV Community: Shreyas Taware</title>
      <link>https://dev.to/shreyastaware</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/shreyastaware"/>
    <language>en</language>
    <item>
      <title>AI Engineering Explained</title>
      <dc:creator>Shreyas Taware</dc:creator>
      <pubDate>Tue, 13 Jan 2026 15:30:00 +0000</pubDate>
      <link>https://dev.to/shreyastaware/ai-engineering-explained-3036</link>
      <guid>https://dev.to/shreyastaware/ai-engineering-explained-3036</guid>
      <description>&lt;p&gt;"AI Engineer" is a new role that has emerged because of Artificial Intelligence platforms like ChatGPT, Gemini, and Claude.&lt;/p&gt;

&lt;p&gt;Therefore, it is essential that developers now master not only their core skills but also this emerging field called &lt;strong&gt;AI Engineering&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The aim of this blog is to cover the essential basics of AI Engineering so that beginners have a guiding path to dive deeper into the field.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conventional AI vs Generative AI
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fziql958r22lgy1zyzw3h.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fziql958r22lgy1zyzw3h.jpeg" alt="Conventional AI vs Generative AI Infographic" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conventional AI&lt;/strong&gt; is mostly about making decisions or predictions from data.&lt;/p&gt;

&lt;p&gt;You give it an input, and it gives you a very specific kind of output — a label, a number, or a yes/no answer.&lt;/p&gt;

&lt;p&gt;For example, it might tell you whether an email is spam, whether a loan should be approved, or what the price of a house might be. It’s intelligent, but it’s not creative. It’s essentially learning patterns so it can make better decisions.&lt;/p&gt;
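&lt;p&gt;A minimal sketch of that idea in Python (the keyword weights and threshold below are invented for illustration; a real model would learn them from labeled data):&lt;/p&gt;

```python
# "Conventional" predictive AI: a fixed input always maps to the same
# specific output (a label) - no content is generated.
def spam_score(email):
    # invented weights standing in for parameters a model would learn
    suspicious = {"winner": 2.0, "free": 1.5, "prize": 1.8}
    return sum(suspicious.get(word, 0.0) for word in email.lower().split())

def classify(email, threshold=2.5):
    return "spam" if spam_score(email) > threshold else "not spam"

print(classify("You are a winner of a free prize"))  # spam
print(classify("Meeting moved to 3pm"))              # not spam
```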

&lt;p&gt;&lt;strong&gt;Generative AI&lt;/strong&gt; creates entirely new data based on learned patterns. So instead of just predicting an outcome, it actually creates new content.&lt;/p&gt;

&lt;p&gt;You give it a prompt, and it generates something that didn’t exist before — like text, images, code, or even audio.&lt;/p&gt;

&lt;p&gt;Models like ChatGPT don’t just choose from predefined options; they generate responses word by word based on the patterns they’ve learned from massive amounts of data.&lt;/p&gt;

&lt;p&gt;Another important difference is &lt;strong&gt;determinism&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Traditional AI&lt;/strong&gt; systems usually give you the same output for the same input. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Generative AI&lt;/strong&gt; doesn’t — two prompts that look identical can still result in slightly different outputs, because the model is sampling from probabilities rather than following fixed rules.&lt;/p&gt;
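&lt;p&gt;This difference is easy to see in code. In the toy sketch below (the token probabilities are made up), a deterministic system always returns the highest-probability answer, while a generative sampler draws from the distribution, so repeated calls can differ:&lt;/p&gt;

```python
import random

# Toy illustration (not a real LLM): generative models sample the next
# token from a learned probability distribution instead of always
# returning one fixed answer, so identical prompts can diverge.
next_token_probs = {"cat": 0.5, "dog": 0.3, "bird": 0.2}

def sample_token(probs):
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return random.choices(tokens, weights=weights, k=1)[0]

# A deterministic ("conventional") system always returns the argmax;
# a generative sampler may return any token:
greedy = max(next_token_probs, key=next_token_probs.get)
print(greedy)                          # always "cat"
print(sample_token(next_token_probs))  # "cat", "dog", or "bird"
```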

&lt;h3&gt;
  
  
  Keyword Search vs Semantic Search vs AI Search
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2cdxt1iwu5h1ndq0xkwq.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2cdxt1iwu5h1ndq0xkwq.jpeg" alt="Keyword Search vs Semantic Search vs AI Search InfoGraphic" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Keyword Search&lt;/strong&gt; uses direct text matching, finding pages with the exact keywords you type.&lt;/p&gt;

&lt;p&gt;It is best for exact lookups, technical searches, or simple queries (e.g., "weather").&lt;/p&gt;

&lt;p&gt;Example: &lt;strong&gt;Twitter Search&lt;/strong&gt; (as of 2026) still uses traditional Keyword Search.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Semantic Search&lt;/strong&gt; understands the meaning, context, and intent behind your query (e.g., "Apple stock price" means the company, not the fruit). &lt;/p&gt;

&lt;p&gt;It uses AI, Natural Language Processing (NLP), and vector embeddings to grasp concepts and relationships.&lt;/p&gt;
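&lt;p&gt;A toy comparison of the two approaches (the 3-dimensional vectors below are made up for illustration; real embedding models produce hundreds of dimensions):&lt;/p&gt;

```python
import math

# Keyword search is exact string matching; semantic search compares
# vector embeddings that encode meaning.
docs = {
    "apple fruit nutrition": [0.9, 0.1, 0.0],
    "Apple Inc. stock price": [0.1, 0.9, 0.2],
}

def keyword_match(query, text):
    return all(word in text.lower().split() for word in query.lower().split())

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# "AAPL shares" shares no keywords with either document...
print(keyword_match("AAPL shares", "Apple Inc. stock price"))  # False

# ...but its (hypothetical) embedding is closest to the stock document.
query_vec = [0.2, 0.8, 0.3]
best = max(docs, key=lambda d: cosine_similarity(query_vec, docs[d]))
print(best)  # Apple Inc. stock price
```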

&lt;p&gt;Example: &lt;strong&gt;Google Search Engine&lt;/strong&gt; uses Semantic Search at its core, although it is shifting towards AI Search.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI Search&lt;/strong&gt; interprets complex natural language, provides direct answers, and learns from interactions, often incorporating semantic understanding as its core.&lt;/p&gt;

&lt;p&gt;It delivers conversational, context-aware, and personalized results, often generating direct answers.&lt;/p&gt;

&lt;p&gt;It builds upon semantic search using advanced AI models (LLMs) and techniques like Retrieval-Augmented Generation (RAG).&lt;/p&gt;

&lt;p&gt;Example: &lt;strong&gt;OpenAI ChatGPT&lt;/strong&gt;, &lt;strong&gt;Google Gemini&lt;/strong&gt;, &lt;strong&gt;Anthropic Claude&lt;/strong&gt;, &lt;strong&gt;xAI Grok&lt;/strong&gt; are a few of the notable ones. &lt;/p&gt;

&lt;h3&gt;
  
  
  Context vs Memory
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy9wnw1h7qs92gau00rtg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy9wnw1h7qs92gau00rtg.png" alt="Context vs Memory Infographic" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context&lt;/strong&gt; is the immediate, temporary information within a session that an AI uses for real-time understanding.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Memory&lt;/strong&gt;, by contrast, is the broader, persistent system (such as databases or user profiles) that stores long-term knowledge, past interactions, and facts. It allows continuity and personalized actions beyond a single session, and context is often dynamically retrieved from memory to enrich the current interaction.&lt;/p&gt;
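&lt;p&gt;A minimal sketch of the distinction (the class and field names are illustrative, not a real API): context lives only for the session, while memory persists across sessions:&lt;/p&gt;

```python
# "context" is temporary and per-session; "memory" is a persistent
# store (in production, a database or user profile) shared by sessions.
class Assistant:
    def __init__(self, memory):
        self.memory = memory   # persistent store
        self.context = []      # temporary, per-session window

    def chat(self, message):
        self.context.append(message)
        # enrich the current turn with facts retrieved from memory
        facts = self.memory.get("user_profile", "")
        return f"[context: {len(self.context)} msgs | memory: {facts}]"

memory = {"user_profile": "prefers Python"}
session1 = Assistant(memory)
print(session1.chat("hi"))     # [context: 1 msgs | memory: prefers Python]

session2 = Assistant(memory)   # new session: context resets, memory persists
print(len(session2.context))   # 0
```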

&lt;h3&gt;
  
  
  LLM vs AI Agent
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx8d4lf1y0sgjksft8ysx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx8d4lf1y0sgjksft8ysx.png" alt="LLM vs AI Agent Infographic" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;An &lt;strong&gt;LLM&lt;/strong&gt; (Large Language Model) is the "brain" that understands and generates text, while &lt;/p&gt;

&lt;p&gt;an &lt;strong&gt;AI Agent&lt;/strong&gt; is a system that uses an LLM as its core to take action, plan, use tools (APIs, code), and complete multi-step tasks autonomously, acting like an orchestrator with memory and decision-making loops, rather than just answering prompts. &lt;/p&gt;

&lt;p&gt;Think of an LLM as a smart assistant who can write a recipe (text output), and &lt;/p&gt;

&lt;p&gt;an Agent as a chef who reads the recipe, uses kitchen tools (APIs, software), and actually cooks the meal (completed task). &lt;/p&gt;
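&lt;p&gt;That analogy can be sketched in a few lines of Python. Here &lt;code&gt;llm()&lt;/code&gt; and the tools are crude stand-ins, not a real model or API:&lt;/p&gt;

```python
# An LLM only produces text; an agent wraps it with tools it can
# actually invoke to complete the task.
def llm(prompt):
    # stand-in for a model call: "decides" which tool fits the prompt
    return "search" if "look up" in prompt else "calculator"

tools = {
    "calculator": lambda: "42",
    "search": lambda: "top result",
}

def agent(prompt):
    tool_name = llm(prompt)     # the LLM is the brain...
    return tools[tool_name]()   # ...the agent takes the action

print(agent("look up the capital of France"))  # top result
print(agent("compute 6 times 7"))              # 42
```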

&lt;h3&gt;
  
  
  Vector Database
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq05egvxa9tvl4ieww7fl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq05egvxa9tvl4ieww7fl.png" alt="Vector Database Examples" width="800" height="476"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A &lt;u&gt;vector database&lt;/u&gt; stores and manages data as vector embeddings, which are numerical representations of unstructured data like text, images, and audio.&lt;/p&gt;

&lt;p&gt;They differ from traditional databases in a fundamental way.&lt;/p&gt;

&lt;p&gt;A &lt;u&gt;Traditional Database (SQL/NoSQL)&lt;/u&gt; is a system designed for the storage and retrieval of scalar data (text, numbers, booleans), optimized for &lt;strong&gt;deterministic&lt;/strong&gt; and &lt;strong&gt;exact-match queries&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;u&gt;Vector Databases&lt;/u&gt;, in contrast, are engineered for &lt;strong&gt;probabilistic retrieval&lt;/strong&gt; in high-dimensional vector space. They store unstructured data (text, images, audio) as dense vector embeddings generated by deep learning models (e.g., Transformers), and search over those embeddings by similarity rather than exact match.&lt;/p&gt;

&lt;p&gt;They are a critical component for modern AI applications, powering tools like recommendation engines, semantic search, and large language models (LLMs). &lt;/p&gt;

&lt;p&gt;Example Vector DBs include &lt;strong&gt;Pinecone, Chroma, Weaviate, Qdrant, pgvector&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  How it works
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb1mrrxjqzkd5kihjcob5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb1mrrxjqzkd5kihjcob5.png" alt="Vector Database Infographic" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vector Embeddings:&lt;/strong&gt; Machine learning models convert data into high-dimensional vectors (arrays of numbers) where each dimension represents a feature of the data. For example, an image of a car could be represented by dimensions encoding its color, number of doors, and size.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Indexing:&lt;/strong&gt; The vectors are organized into a special index within the database. This index groups similar vectors together, so items with similar characteristics are stored close to each other.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Similarity Search:&lt;/strong&gt; When a query is made (e.g., a user searches for a car image), the database converts the query (data received in the form of text, image, etc) into a numerical vector embedding. &lt;/p&gt;

&lt;p&gt;The index is then used to quickly find the "nearest neighbors" – the stored vectors most similar to the query vector.&lt;/p&gt;

&lt;p&gt;Similarity is measured in this high-dimensional space using distance metrics (like &lt;strong&gt;Cosine Similarity&lt;/strong&gt; or &lt;strong&gt;Euclidean distance&lt;/strong&gt;) to identify the closest items.&lt;/p&gt;

&lt;p&gt;To do this matching efficiently, libraries like &lt;a href="https://ai.meta.com/tools/faiss/" rel="noopener noreferrer"&gt;FAISS&lt;/a&gt; are used, which allow efficient similarity search and clustering of dense vectors. &lt;/p&gt;

&lt;p&gt;Example Algorithms include &lt;strong&gt;k-Nearest Neighbors (k-NN)&lt;/strong&gt; and more efficient &lt;strong&gt;Approximate Nearest Neighbor (ANN)&lt;/strong&gt; methods like &lt;strong&gt;Hierarchical Navigable Small World (HNSW)&lt;/strong&gt;&lt;/p&gt;
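&lt;p&gt;A brute-force k-NN lookup is easy to write by hand; it is the naive version of what an ANN index like HNSW accelerates. The vectors below are toy values:&lt;/p&gt;

```python
import math

# Brute-force k-nearest-neighbors: rank every stored vector by distance
# to the query. Real ANN indexes (e.g. HNSW) avoid scanning everything
# by organizing similar vectors close together.
vectors = {
    "red car":  [1.0, 0.2, 0.1],
    "blue car": [0.8, 0.4, 0.3],
    "banana":   [0.0, 1.0, 0.9],
}

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn(query, k):
    ranked = sorted(vectors, key=lambda name: euclidean(query, vectors[name]))
    return ranked[:k]

print(knn([0.95, 0.25, 0.15], k=2))  # ['red car', 'blue car']
```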

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; The database returns the most relevant results based on this similarity, rather than just matching keywords. &lt;/p&gt;

&lt;h3&gt;
  
  
  What is RAG?
&lt;/h3&gt;

&lt;p&gt;RAG stands for Retrieval-Augmented Generation. There are four steps in a RAG pipeline:&lt;/p&gt;

&lt;p&gt;1) &lt;strong&gt;Indexing&lt;/strong&gt;: This foundational step involves preparing the external knowledge base for efficient search and retrieval&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Data Sourcing&lt;/em&gt;&lt;/strong&gt;: Raw data (e.g., documents, web pages, database records) is crawled and collected from various data sources and made ready for ingestion.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Chunking&lt;/em&gt;&lt;/strong&gt;: The collected data is then ingested, parsed, and broken down into smaller, manageable pieces called "chunks". &lt;/p&gt;

&lt;p&gt;This process is crucial because LLMs have limits on how much text they can process at once (context window), and smaller chunks allow for more precise information retrieval.&lt;/p&gt;
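&lt;p&gt;A minimal fixed-size chunker with overlap looks like this (the size and overlap values are arbitrary picks; real pipelines tune chunk size to the embedding model and use case):&lt;/p&gt;

```python
# Split text into fixed-size, overlapping windows. Overlap keeps a
# sentence that straddles a boundary visible in both chunks.
def chunk_text(text, size=40, overlap=10):
    chunks = []
    step = size - overlap
    for start in range(0, len(text), step):
        piece = text[start:start + size]
        if piece:
            chunks.append(piece)
    return chunks

doc = "RAG systems split long documents into smaller overlapping chunks."
for c in chunk_text(doc):
    print(repr(c))
```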

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2tsdvdag45ksu9y3t7ky.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2tsdvdag45ksu9y3t7ky.png" alt="Indexing Infographic" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Embedding Generation&lt;/strong&gt;&lt;/em&gt;: Each text chunk is converted into a numerical vector called a vector embedding using an embedding model. These embeddings capture the semantic meaning of the text.&lt;/p&gt;

&lt;p&gt;Some example &lt;strong&gt;Embedding Models&lt;/strong&gt; are &lt;strong&gt;Word2Vec&lt;/strong&gt; for words, &lt;strong&gt;BERT&lt;/strong&gt; for context-aware text, and &lt;strong&gt;Sentence-BERT (SBERT)&lt;/strong&gt; for sentence similarity.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Vector Storage&lt;/strong&gt;&lt;/em&gt;: The resulting vector embeddings are stored in a specialized database, known as a vector database (or vector store), which is optimized for rapid similarity searches. &lt;/p&gt;

&lt;p&gt;2) &lt;strong&gt;Retrieval&lt;/strong&gt;: When a user submits a query, the RAG system searches the prepared knowledge base for relevant information. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Query Encoding&lt;/em&gt;&lt;/strong&gt;: The user's input query is also converted into a vector embedding using the same embedding model used during indexing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Similarity Search&lt;/em&gt;&lt;/strong&gt;: The query's vector is used to perform a similarity search within the vector database to find the top "k" most semantically similar data chunks. This process efficiently identifies the most pertinent information from the external source. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftvksjnaftjnruz9wwp1h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftvksjnaftjnruz9wwp1h.png" alt="Retrieval Augmented Generation Infographic" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;3) &lt;strong&gt;Augmentation&lt;/strong&gt;: The retrieved chunks of information are then used to create a more comprehensive prompt for the LLM. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Contextual Fusion&lt;/em&gt;&lt;/strong&gt;: The system takes the original user query and the retrieved documents and combines them into a single, structured prompt.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Prompt Engineering&lt;/em&gt;&lt;/strong&gt;: The prompt is engineered to provide clear instructions to the LLM, effectively telling it to use the provided context to answer the question. &lt;/p&gt;

&lt;p&gt;For example, the prompt might be structured as: "Context: [Retrieved Documents]. User Question: [Original Query]. Please provide an answer based solely on the context provided." &lt;/p&gt;
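&lt;p&gt;That fusion step is just string assembly. A sketch (the template wording follows the example above; real systems tune it carefully):&lt;/p&gt;

```python
# Augmentation: fuse retrieved chunks and the user query into one
# structured prompt for the LLM.
def build_prompt(query, retrieved_chunks):
    context = "\n".join(retrieved_chunks)
    return (
        f"Context: {context}\n"
        f"User Question: {query}\n"
        "Please provide an answer based solely on the context provided."
    )

prompt = build_prompt(
    "When was the library founded?",
    ["The city library was founded in 1901.", "It moved buildings in 1955."],
)
print(prompt)
```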

&lt;p&gt;4) &lt;strong&gt;Generation&lt;/strong&gt;: In the final step, the augmented prompt is sent to the Large Language Model to generate the final output. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;LLM Processing&lt;/em&gt;&lt;/strong&gt;: The LLM uses its inherent knowledge combined with the specific, relevant context provided in the augmented prompt to formulate an accurate and grounded response.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Response Delivery&lt;/em&gt;&lt;/strong&gt;: The system returns the generated, factually accurate, and context-aware answer to the user. This process helps mitigate common issues with standard LLMs like generating incorrect information or "hallucinations". &lt;/p&gt;
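&lt;p&gt;The four steps can be strung together in a toy pipeline. Here &lt;code&gt;embed()&lt;/code&gt; (simple word overlap) and &lt;code&gt;generate()&lt;/code&gt; are crude stand-ins for a real embedding model and a real LLM call:&lt;/p&gt;

```python
# Toy end-to-end RAG loop mirroring Indexing, Retrieval, Augmentation,
# and Generation. The "embedding" is just a set of normalized words.
def embed(text):
    return {word.strip(".,?") for word in text.lower().split()}

def similarity(a, b):
    # word-overlap (Jaccard) as a stand-in for cosine similarity
    return len(a.intersection(b)) / len(a.union(b))

# 1) Indexing: chunk the corpus and store each chunk's embedding
chunks = ["Paris is the capital of France.", "The Nile is a river in Africa."]
index = [(chunk, embed(chunk)) for chunk in chunks]

# 2) Retrieval: encode the query with the same model, find the best chunk
query = "What is the capital of France?"
qvec = embed(query)
best_chunk = max(index, key=lambda item: similarity(qvec, item[1]))[0]

# 3) Augmentation: fuse context and question into one prompt
prompt = f"Context: {best_chunk}\nQuestion: {query}"

# 4) Generation: a stub LLM that just echoes its grounded prompt
def generate(p):
    return "Answer grounded in: " + p

print(generate(prompt))
```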

&lt;h3&gt;
  
  
  What is an Agentic Workflow?
&lt;/h3&gt;

&lt;p&gt;An &lt;u&gt;agentic workflow&lt;/u&gt; is a dynamic, AI-driven process where autonomous agents plan, decide, and execute complex, multi-step tasks with minimal human intervention, adapting in real-time to achieve a goal.&lt;/p&gt;

&lt;p&gt;Unlike static workflows, they use AI's reasoning and tool-use capabilities to break down problems, select actions, and self-correct, enabling greater autonomy and efficiency in automated processes, from IT support to customer service&lt;/p&gt;

&lt;h3&gt;
  
  
  Langchain vs LangGraph vs LangSmith
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;LangChain&lt;/strong&gt;: The libraries (code) to build the app.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LangGraph&lt;/strong&gt;: The architecture (logic) to control complex agents and loops.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LangSmith&lt;/strong&gt;: The platform (dashboard) to test, debug, and monitor the app.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpo5uuk0huum7kilfb45n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpo5uuk0huum7kilfb45n.png" alt="Langchain versus LangGraph versus LangSmith Infographic" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LangChain&lt;/strong&gt; is a framework that simplifies building LLM applications by providing abstractions. It connects LLMs (like GPT-4) to other data sources and tools.&lt;/p&gt;

&lt;p&gt;&lt;u&gt;Core Concept&lt;/u&gt;: &lt;strong&gt;"Chains" (DAGs - Directed Acyclic Graphs)&lt;/strong&gt;. It is excellent for linear workflows where step A leads to step B, which leads to step C.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LangGraph&lt;/strong&gt; is a library built on top of LangChain specifically for building agents and stateful applications.&lt;/p&gt;

&lt;p&gt;&lt;u&gt;Core Concept&lt;/u&gt;: &lt;strong&gt;"Cyclic Graphs."&lt;/strong&gt; Unlike LangChain's linear chains, LangGraph allows loops. This enables an agent to try a task, fail, critique its own work, and try again (a loop) before finishing.&lt;/p&gt;
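&lt;p&gt;The loop pattern itself can be sketched in plain Python (no real LangGraph API is used here; &lt;code&gt;attempt()&lt;/code&gt; and &lt;code&gt;critique()&lt;/code&gt; are stand-ins for LLM and evaluator nodes):&lt;/p&gt;

```python
# Act, self-critique, and retry until the result passes a check - the
# cyclic structure that linear chains cannot express.
def attempt(task, hint):
    # stand-in for an LLM call; improves once it has feedback
    return f"{task} ({hint})" if hint else task

def critique(result):
    # stand-in for an evaluator node; demands the hint be applied
    return "add units" if "(add units)" not in result else None

def run_agent(task, max_loops=3):
    hint = ""
    result = task
    for _ in range(max_loops):
        result = attempt(task, hint)
        feedback = critique(result)
        if feedback is None:
            return result      # exit the cycle: work accepted
        hint = feedback        # loop back with the critique
    return result

print(run_agent("distance is 42"))  # distance is 42 (add units)
```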

&lt;p&gt;&lt;strong&gt;LangSmith&lt;/strong&gt; is a developer platform (cloud service) for observability, testing, and fine-tuning. It is not a code library you import to build logic; it is a dashboard you log into.&lt;/p&gt;

&lt;p&gt;&lt;u&gt;Core Concept&lt;/u&gt;: &lt;strong&gt;"Tracing and Evaluation."&lt;/strong&gt; LLMs are "black boxes"—it is hard to know why they failed. LangSmith records every step the AI took so you can inspect it.&lt;/p&gt;
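&lt;p&gt;A toy illustration of the tracing idea: record every step's inputs, output, and latency so a failing run can be inspected afterwards (this mimics the concept behind LangSmith, not its actual API):&lt;/p&gt;

```python
import functools
import time

# A global trace log that every decorated step appends to.
TRACE = []

def traced(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        TRACE.append({
            "step": fn.__name__,
            "inputs": args,
            "output": result,
            "seconds": time.perf_counter() - start,
        })
        return result
    return wrapper

@traced
def retrieve(query):
    return ["doc about " + query]

@traced
def generate(query, docs):
    return f"answer using {len(docs)} doc(s)"

generate("vector databases", retrieve("vector databases"))
print([t["step"] for t in TRACE])  # ['retrieve', 'generate']
```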




&lt;p&gt;And that is it: these are all the concepts you need to get started in the field of AI Engineering.&lt;/p&gt;

&lt;p&gt;To take your knowledge further, feel free to explore libraries like LangChain/LangGraph and FAISS, and vector databases like Pinecone and Weaviate.&lt;/p&gt;

&lt;p&gt;If you made it to the end, thank you so much. Feel free to share your thoughts in the comments, or visit my website to learn more: &lt;a href="https://shreyastaware.me/" rel="noopener noreferrer"&gt;shreyastaware.me&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Until next time,&lt;br&gt;
Shreyas&lt;/p&gt;

</description>
      <category>programming</category>
      <category>ai</category>
      <category>software</category>
      <category>interview</category>
    </item>
    <item>
      <title>Everything about Docker – Basics, MCPs, and more...</title>
      <dc:creator>Shreyas Taware</dc:creator>
      <pubDate>Fri, 03 Oct 2025 15:42:18 +0000</pubDate>
      <link>https://dev.to/shreyastaware/everything-about-docker-560i</link>
      <guid>https://dev.to/shreyastaware/everything-about-docker-560i</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fguoqq4ioyxmethd4lrra.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fguoqq4ioyxmethd4lrra.png" alt="An image which has Docker logo, along with the text " width="659" height="440"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you are new to Docker or to Docker's newly released agentic features, this guide will help you understand them in language as jargon-free as possible.&lt;/p&gt;

&lt;p&gt;So, what is Docker?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Docker&lt;/strong&gt; is a platform for developing, shipping, and running applications.&lt;/p&gt;

&lt;p&gt;There are three main concepts in Docker:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Container&lt;/strong&gt; - Think of it as a smaller version of a computer, specific to the hardware it was created on. If you know what a Virtual Machine (VM) is, a Container is just like that, except it's lightweight and shares resources with the host while keeping its processes separate.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Image&lt;/strong&gt; - A template built from a set of instructions; running it creates a container. You store those instructions in a file called a "Dockerfile".&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Registry&lt;/strong&gt; - It is a location for storing and sharing container images.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Examples include &lt;a href="https://hub.docker.com" rel="noopener noreferrer"&gt;Docker Hub&lt;/a&gt;, Amazon &lt;a href="https://aws.amazon.com/ecr/" rel="noopener noreferrer"&gt;ECR&lt;/a&gt;, Azure &lt;a href="https://azure.microsoft.com/en-in/products/container-registry" rel="noopener noreferrer"&gt;ACR&lt;/a&gt;, Google's &lt;a href="https://cloud.google.com/artifact-registry" rel="noopener noreferrer"&gt;GCR&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can also create your own registry locally or share it within a specific network, thus keeping it private to that network.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Important Concept Alert&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Whenever you create a Docker Container and run it, it serves a specific purpose: it might run a database, or keep a backend server running, or run a cron job.&lt;/p&gt;

&lt;p&gt;But let's say we create our database in one container, and our backend server is in another.&lt;/p&gt;

&lt;p&gt;Then we'll have to manually figure out the port connections from the database container to the backend one, thus adding more work and slowing down our deployment timelines.&lt;/p&gt;

&lt;p&gt;This is when Docker Compose comes in.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Docker Compose&lt;/strong&gt;: A command with which you can define multiple Docker containers and store their configurations in a single YAML file.&lt;/p&gt;

&lt;p&gt;This way all the containers get defined individually while "docker compose" takes care of the headache of connecting them.&lt;/p&gt;
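&lt;p&gt;For instance, a database and a backend server can be declared together in a single &lt;code&gt;compose.yaml&lt;/code&gt; (the service names and images below are illustrative):&lt;/p&gt;

```yaml
# compose.yaml - an illustrative two-service setup
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: example
  backend:
    image: my-backend:latest   # hypothetical application image
    depends_on:
      - db
    ports:
      - "8000:8000"
```

&lt;p&gt;A single &lt;code&gt;docker compose up&lt;/code&gt; then starts both containers on a shared network, where they can reach each other by service name.&lt;/p&gt;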

&lt;p&gt;–––&lt;/p&gt;

&lt;p&gt;Before moving on to advanced concepts in Docker, here are a few prerequisites:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Model Context Protocol (MCP)&lt;/strong&gt;: It is a standardized open-source protocol for connecting AI applications to external systems and data sources.&lt;/p&gt;

&lt;p&gt;Example - ChatGPT can access your Notion workspace using the Notion MCP server.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ask Gordon&lt;/strong&gt; - It is the newly released AI-powered assistant specifically for Docker Desktop &amp;amp; Docker CLI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Docker Model Runner (DMR)&lt;/strong&gt;: It lets you run, manage, and deploy AI models using Docker.&lt;/p&gt;

&lt;p&gt;–––&lt;/p&gt;

&lt;p&gt;Agentic workflows are created when AI tools are connected to disparate tools, and those connections require MCPs to orchestrate the agent's tasks.&lt;/p&gt;

&lt;p&gt;To orchestrate MCPs, we need to understand the MCP architecture.&lt;/p&gt;

&lt;p&gt;MCP follows a client-server architecture that enables standardized communication between AI applications and external tools.&lt;/p&gt;
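&lt;p&gt;The messages exchanged follow the JSON-RPC 2.0 format. Below is a hand-written illustration of what a tool-call exchange might look like (these dictionaries were not produced by a real MCP library, and the tool name and arguments are hypothetical):&lt;/p&gt;

```python
import json

# Illustrative MCP-style exchange: the client asks the server to run a
# tool; the server replies with the tool's result.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "get_issue", "arguments": {"repo": "octocat/hello"}},
}
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"content": [{"type": "text", "text": "Issue #1: example"}]},
}

# Messages travel over the wire as JSON text:
print(json.dumps(request)[:40])
```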

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvr46j2ecfub47jvqisak.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvr46j2ecfub47jvqisak.png" alt="MCP Client connecting MCP Server and in-turn to final data source (Github in this case)" width="578" height="274"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MCP Servers&lt;/strong&gt; - These are specialized programs that provide specific tools and capabilities to AI models through the Model Context Protocol.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MCP Clients&lt;/strong&gt; - These act as the bridge between AI applications and MCP servers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MCP Gateway&lt;/strong&gt; - It is Docker's open-source solution which connects MCP servers to MCP clients.&lt;/p&gt;

&lt;p&gt;We can connect one MCP client with multiple MCP servers, and conversely one MCP server with multiple MCP clients.&lt;/p&gt;

&lt;p&gt;Thus, MCP clients and MCP servers have a many-to-many relationship, and MCP Gateway is what connects these two. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MCP Catalog&lt;/strong&gt; - It is Docker's curated collection of available MCP servers.&lt;/p&gt;

&lt;p&gt;MCP Clients connect through MCP Gateway to access the cataloged servers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcnnafzla8d8eqr5vy9b9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcnnafzla8d8eqr5vy9b9.png" alt="An image depicting how IDEs can connect to MCP Servers and how MCP servers are connected using the MCP catalog" width="594" height="407"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MCP Toolkit&lt;/strong&gt; - A Docker Desktop feature that lets you set up, manage, and run containerized MCP servers and connect them to AI agents.&lt;/p&gt;

&lt;p&gt;Depending on the MCP server, the tools it provides might run within the same container as the server or in dedicated containers:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fexubtmrtpw652d522tep.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fexubtmrtpw652d522tep.png" alt="Single Container" width="548" height="228"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Below you can see multiple servers being spawned depending upon the operation:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcplbotnt6i7t1tk8132i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcplbotnt6i7t1tk8132i.png" alt="Separate Containers" width="672" height="307"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Docker Hub MCP server&lt;/strong&gt;: It is an MCP server that interfaces with Docker Hub APIs to make rich image metadata accessible to LLMs, thus enabling intelligent parsing and discovery of your Docker containers.&lt;/p&gt;

&lt;p&gt;And that is it: these are all the Docker concepts you need to get started tinkering with MCPs using your favourite code editors and AI tools.&lt;/p&gt;

&lt;p&gt;So, what are you waiting for?&lt;/p&gt;

&lt;p&gt;–––&lt;/p&gt;

&lt;p&gt;If you made it to the end, thank you! This is just my second blog on the Dev platform!&lt;/p&gt;

&lt;p&gt;If you liked this blog, please let me know in the comments!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fivf72wfrgyemjnzlh2yb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fivf72wfrgyemjnzlh2yb.png" alt="Meme" width="500" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Until next time,&lt;br&gt;
Shreyas&lt;/p&gt;

</description>
      <category>docker</category>
      <category>mcp</category>
    </item>
    <item>
      <title>14 CS fundamental questions to prepare for your next interview &amp; not sound like a vibe-coder!</title>
      <dc:creator>Shreyas Taware</dc:creator>
      <pubDate>Thu, 04 Sep 2025 16:36:45 +0000</pubDate>
      <link>https://dev.to/shreyastaware/14-cs-fundamental-questions-to-prepare-for-your-next-interview-not-sound-like-a-vibe-coder-47ho</link>
      <guid>https://dev.to/shreyastaware/14-cs-fundamental-questions-to-prepare-for-your-next-interview-not-sound-like-a-vibe-coder-47ho</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fug1d6sw8vmdrv4a6p9hm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fug1d6sw8vmdrv4a6p9hm.png" alt="On the left, a broke vibe-coder is show. On the right, a successful CS engineer working at a big silicon valley firm is shown." width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Optimizing Program Performance
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Is a switch statement always more efficient than a sequence of if-else statements?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;How much overhead is incurred by a function call?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Is a while loop more efficient than a for loop?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Are pointer references more efficient than array indexes?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Why does our loop run so much faster if we sum into a local variable instead of an argument that is passed by reference?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;How can a function run faster when we simply rearrange the parentheses in an arithmetic expression?&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Understanding link-time errors
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;What does it mean when the linker reports that it cannot resolve a reference?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;What is the difference between a static variable and a global variable?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;What happens if you define two global variables in different C files with the same name?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;What is the difference between a static library and a dynamic library?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Why does it matter what order we list libraries on the command line?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;And scariest of all, why do some linker-related errors not appear until run time?&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Avoiding security holes
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;What are &lt;em&gt;buffer overflow vulnerabilities&lt;/em&gt;?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;What is the program stack?&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>programming</category>
      <category>computerscience</category>
      <category>c</category>
    </item>
  </channel>
</rss>
