DEV Community: Ashutosh Piprode

Open-Source AI, Hugging Face, and the Building Blocks of Modern AI Development

Ashutosh Piprode — Tue, 02 Jun 2026 10:31:32 +0000

Open-source AI has made it much easier for developers to experiment with powerful models without building everything from scratch.

Today, we have access to platforms, libraries, and tools that allow us to run text models, audio models, image-generation models, and even large language models with just a few lines of code. One of the biggest names in this ecosystem is Hugging Face.

Hugging Face has become a central place for working with open-source AI models, datasets, and applications. But to use it properly, it is important to understand the ecosystem around it — models, datasets, pipelines, tokenizers, transformers, quantization, and tools like Google Colab.

This blog gives a simple overview of these concepts and how they fit together.

What is Hugging Face?

Hugging Face is an open-source AI platform that provides access to pre-trained models, datasets, and demo applications.

It has three major parts:

1. Models

Models are pre-trained AI systems that can perform specific tasks.

For example, there are models for:

Text generation
Sentiment analysis
Translation
Question answering
Image generation
Speech recognition
Code generation

Instead of training a model from scratch, developers can use these pre-trained models and build applications on top of them.

2. Datasets

Datasets are collections of data used to train, fine-tune, or evaluate models.

Hugging Face provides access to many public datasets for NLP, vision, audio, and other AI tasks.

3. Spaces

Spaces are demo applications hosted on Hugging Face.

They are often built using tools like Gradio or Streamlit and allow developers to showcase AI projects directly in the browser.

Hugging Face Libraries

Hugging Face is not just a website. It also provides Python libraries that make AI development easier.

Some of the most important libraries are:

Transformers

The transformers library is used to load and run pre-trained models.

It supports many model families and tasks, including text generation, classification, summarization, translation, question answering, speech recognition, and image-related tasks.

Datasets

The datasets library is used to load and process datasets efficiently.

It helps when working with training data, evaluation data, or custom datasets.

Hub

The Hugging Face Hub allows developers to access, upload, and share models, datasets, and applications.

Together, these libraries make it easier to build AI applications with less boilerplate code.

Why Google Colab is Useful for AI Development

One major challenge in AI development is hardware.

Many models require GPUs, and not every developer has a powerful machine. Google Colab helps solve this problem by providing a browser-based Python environment with access to free or paid GPUs.

Colab is useful for:

Running AI/ML notebooks
Testing Hugging Face models
Running GPU-based experiments
Training or fine-tuning smaller models
Trying image, audio, and text models without local setup

For beginners, Colab is especially useful because it removes a lot of installation and hardware-related friction.

Running AI Models with Pipelines

One of the easiest ways to use Hugging Face models is through pipelines.

A pipeline is a high-level API that combines multiple steps into one simple interface.

Usually, running a model involves:

Loading the tokenizer
Loading the model
Preparing the input
Running inference
Processing the output

A pipeline hides much of this complexity.

Example:

from transformers import pipeline

classifier = pipeline("sentiment-analysis")

result = classifier("Open-source AI is making development more accessible.")
print(result)

This can return an output showing whether the sentence is positive or negative.

Pipelines are available for many tasks, including:

Sentiment analysis
Text generation
Named Entity Recognition
Question answering
Summarization
Translation
Speech recognition
Image classification

This makes pipelines one of the best starting points for quickly testing AI capabilities.

Common NLP Tasks: Sentiment Analysis, NER, and Question Answering

Hugging Face models can be used for many practical NLP tasks.

Sentiment Analysis

Sentiment analysis detects whether a piece of text is positive, negative, or neutral.

It is commonly used in:

Product reviews
Customer feedback
Social media analysis
Brand monitoring

Named Entity Recognition

Named Entity Recognition, or NER, identifies important entities in text.

For example, it can detect:

Person names
Organizations
Locations
Dates
Skills
Products

NER is useful in resume parsing, document processing, search systems, and information extraction.

Question Answering

Question-answering models can extract answers from a given context.

For example, if a paragraph says that Google Colab provides GPU access, the model can answer:

Question: What does Google Colab provide?

Answer: GPU access.

This is useful for document assistants, search tools, and chatbot systems.

Audio Models: Whisper

Open-source AI is not limited to text.

Whisper is a speech recognition model used to convert audio into text.

It can be used for:

Meeting transcription
Podcast transcription
Subtitle generation
Voice assistants
Audio note-taking

A basic voice AI workflow can look like this:

User speech → Whisper → Text → LLM → Response

This is the foundation of many voice-based AI applications.

Image Generation with Stable Diffusion and FLUX

Image-generation models allow users to create images from text prompts.

Two popular examples are:

Stable Diffusion
FLUX

These models can be used for:

Content creation
Design
Concept art
Marketing visuals
Product mockups
Creative experiments

Because image-generation models can be resource-heavy, they are commonly run on GPUs using platforms like Google Colab.

What are Tokenizers?

Large language models do not directly understand raw text.

Before text is passed into a model, it is converted into smaller units called tokens. These tokens are then converted into numerical IDs.

This process is called tokenization.

A simple flow looks like this:

Text → Tokens → Token IDs → Model

Tokenizers usually provide two important methods:

encode()
decode()

encode() converts text into token IDs.

decode() converts token IDs back into readable text.

Tokenization matters because model input limits are measured in tokens, not words. When people say a model has an 8k, 32k, or 128k context window, they are talking about token capacity.

Special Tokens and Chat Templates

Some tokens have special meaning.

These are called special tokens.

They can represent things like:

Start of text
End of text
System message
User message
Assistant message

Chat models also use chat templates to structure conversations properly.

For example, a chat template helps the model understand which part of the input is the system instruction, which part is the user’s message, and where the assistant should respond.

Using the wrong chat template can reduce model performance because different models expect different input formats.

Why Different Tokenizers Matter

Different models use different tokenizers.

The same sentence may be split differently by LLaMA, DeepSeek, Qwen, or other model families.

This affects:

Token count
Speed
Context usage
Cost
Model behavior

For example, if one tokenizer converts a sentence into fewer tokens than another, it may use less context and run slightly more efficiently.

This becomes important when working with long prompts, documents, or retrieval-augmented generation systems.

Transformers: The Architecture Behind Modern LLMs

Transformers are the foundation of modern large language models.

The key idea behind transformers is attention.

Attention allows a model to focus on relevant tokens while processing input and generating output.

This is what helps models understand relationships between words, context, and meaning.

Transformers are used in:

Chatbots
Text generation
Translation
Summarization
Code generation
Multimodal AI systems

Most modern LLMs are based on transformer architecture.

Quantization: Making Models Smaller

AI models contain millions or billions of parameters.

These parameters are stored as numbers. Usually, they may be stored in formats like 32-bit or 16-bit precision.

Quantization reduces the precision of these numbers.

For example:

32-bit → 16-bit → 8-bit → 4-bit

The goal is to make models smaller and easier to run.

Benefits of quantization:

Lower memory usage
Faster inference
Easier deployment on limited hardware
Ability to run larger models on smaller GPUs

The trade-off is that extreme quantization may reduce output quality slightly. But in many practical cases, quantized models work well enough for real applications.

LLaMA-Style Model Architecture

LLaMA-style models follow the general transformer-based language model flow.

A simplified version looks like this:

Text → Tokens → Token IDs → Embeddings → Decoder Layers → Output

The important parts are:

Token Embeddings

Token IDs are converted into vectors called embeddings.

These embeddings help the model represent the meaning of tokens numerically.

Decoder Layers

Decoder layers process the input step by step and help the model generate the next token.

Attention

Attention helps the model decide which tokens are important in the current context.

Together, these parts allow the model to generate coherent and context-aware responses.

How These Concepts Connect

All these concepts are connected in the AI development workflow.

For example, if you are building a chatbot, the flow may look like this:

User input → Tokenizer → Model → Generated output → Decoding → Response

If you are building a voice assistant, the flow may become:

User speech → Whisper → Text → Tokenizer → LLM → Response

If you are building an image-generation tool:

Prompt → Text encoder/model → Diffusion model → Generated image

Platforms like Hugging Face and Google Colab make these workflows easier to experiment with and build upon.

Final Thoughts

Open-source AI has made powerful AI development more accessible than ever.

With platforms like Hugging Face, developers can use pre-trained models, datasets, and demo applications without starting from zero. With Google Colab, they can run experiments on GPUs without needing expensive local hardware.

But using these tools effectively requires understanding the basics behind them.

Concepts like tokenizers, pipelines, transformers, quantization, embeddings, and model architecture are not just theoretical terms. They directly affect how AI models are used, optimized, and deployed.

The more clearly we understand these building blocks, the better we can use open-source AI to build practical applications across text, audio, images, and automation.

Building Multi-Modal Chatbots with Tool Calling and Agentic AI Workflows

Ashutosh Piprode — Tue, 26 May 2026 09:23:25 +0000

Building Multi-Modal Chatbots with Tool Calling and Agentic AI Workflows

The ability of chatbots to interact with humans in a more natural and intuitive way has revolutionized the field of artificial intelligence. One of the key advancements in this area is the development of multi-modal chatbots that can leverage tool calling and agentic AI workflows to provide more accurate and reliable responses. In this article, we will explore the process of creating such chatbots and discuss the importance of using reasoning effort, routers, abstraction layers, and tool calling to build more powerful AI applications.

Introduction to Multi-Modal Chatbots

Multi-modal chatbots are AI systems that can interact with humans through multiple channels, such as text, voice, or visual interfaces. These chatbots use large language models (LLMs) to understand and respond to user input. The use of tool calling and agentic AI workflows allows these chatbots to go beyond simple text-based conversations and provide more accurate and reliable responses.

Key technologies involved:
- LLMs: Large language models that can understand and respond to user input.
- Groq: A fast LLM provider that can be used in chatbot responses, tool-calling workflows, and agentic AI systems.
- Routers: Systems that decide where a request should go.
- Abstraction layers: Layers that hide the complexity of different providers or APIs behind a common interface.

Understanding Reasoning Effort

Reasoning effort refers to how deeply a model thinks before answering a question. It is a crucial aspect of building multi-modal chatbots, as it controls the quality of responses.

Higher reasoning effort: Provides more accurate responses, but increases computational cost and response time.
Lower reasoning effort: Provides faster responses, but may compromise on accuracy.

Using Groq as a Fast LLM Provider

Groq is a fast LLM provider that can be used in chatbot responses, tool-calling workflows, and agentic AI systems. It provides fast inference and ease of use, making it an ideal choice for building multi-modal chatbots.

import groq

# Create a Groq client
client = groq.Client()

# Define a function to handle user input
def handle_input(input_text):
    # Use Groq to generate a response
    response = client.generate_text(input_text)
    return response

# Test the function
input_text = "Hello, how are you?"
response = handle_input(input_text)
print(response)

Routers and Abstraction Layers

Routers and abstraction layers are crucial components of building multi-modal chatbots. Routers decide where a request should go, while abstraction layers hide the complexity of different providers or APIs behind a common interface.

import routers
import abstraction_layers

# Define a router to direct user input to the appropriate module or function
router = routers.Router()

# Define an abstraction layer to hide the complexity of different providers or APIs
abstraction_layer = abstraction_layers.AbstractionLayer()

# Define a function to handle user input
def handle_input(input_text):
    # Use the router to direct the input to the appropriate module or function
    module = router.route(input_text)
    # Use the abstraction layer to hide the complexity of the provider or API
    response = abstraction_layer.call(module, input_text)
    return response

# Test the function
input_text = "Hello, how are you?"
response = handle_input(input_text)
print(response)

Tool Calling and Its Importance

Tool calling refers to the ability of LLMs to suggest the use of external tools, such as calculators or databases, to provide more accurate responses.

How tool calling works:
1. LLM suggests tool: The LLM suggests the use of an external tool.
2. Client code executes tool: The client code executes the tool and sends the result back to the LLM.
3. Result is sent back to LLM: The result is sent back to the LLM, which uses it to provide a more accurate response.

Agentic AI Workflows

Agentic AI workflows refer to systems where LLMs reason, decide steps, use tools, and work toward a goal.

import agentic_ai

# Define an agentic AI workflow
workflow = agentic_ai.Workflow()

# Define a function to handle user input
def handle_input(input_text):
    # Use the workflow to reason, decide steps, and use tools
    response = workflow.execute(input_text)
    return response

# Test the function
input_text = "Plan a study schedule for me"
response = handle_input(input_text)
print(response)

Conclusion

Building multi-modal chatbots with tool calling and agentic AI workflows is a complex task that requires a deep understanding of LLMs, routers, abstraction layers, and tool calling. By using reasoning effort, routers, abstraction layers, and tool calling, developers can build more powerful AI applications that provide more accurate and reliable responses.

Key Takeaways

Reasoning effort is crucial: For controlling the quality of responses.
Groq can be used as a fast LLM provider: For AI projects.
Routers and abstraction layers can simplify: The use of multiple models and providers.
Tool calling allows LLMs to suggest the use of external tools: To provide more accurate responses.
Agentic AI workflows can be used: To build more powerful AI applications.

Future Directions

The field of multi-modal chatbots is rapidly evolving, and there are many future directions that researchers and developers can explore. Some potential areas of research include:

Improving reasoning effort: Developing more efficient and effective methods for controlling reasoning effort.
Integrating multiple tools: Integrating multiple tools and services into agentic AI workflows.
Developing more advanced abstraction layers: Developing more advanced abstraction layers that can hide the complexity of different providers or APIs behind a common interface.

Introduction to LLMs for Developers: Tokens, Prompts, Context Windows, and First AI Apps

Ashutosh Piprode — Wed, 20 May 2026 05:28:39 +0000

Build Your First LLM Product: A Practical Guide for Developers

Reader Promise

In this article, we will take you through the process of building your first LLM (Large Language Model) product. We will cover the basics of LLMs, their types, and how to use them to create a real-world application. By the end of this article, you will have a solid understanding of LLMs and be able to build your own LLM product.

Introduction to LLMs

What are LLMs?

LLMs are a type of artificial intelligence model that can understand and generate human-like language. They come in several flavors:

Base models: Similar to autocomplete functionality, these models are better for fine-tuning to learn a new skill.
Chat/Instruct models: These models are chatbots that are good at making conversations.
Reasoning/Thinking models: These models are reasoning models that are good at problem-solving.
Hybrid models: A combination of Chat/Instruct models and Thinking/Reasoning models.

Example Use Cases for LLMs

Some examples of LLMs in action include:

Virtual assistants, such as Siri or Alexa
Chatbots on websites or social media platforms
Automated content generation, such as news articles or blog posts
Language translation apps

Key Concepts in LLMs

Synthesizing Information

LLMs can answer questions in depth with a structured, well-researched answer and often include a summary.

Fleshing out a Skeleton

LLMs can form a couple of notes, building out a well-crafted email, or a blog post, and iterating on it until perfect.

Coding

LLMs have the ability to write and debug code, making them a valuable resource for engineers.

How LLMs Work

Tokens and Context Window

LLMs use tokens, which are pieces of text that can be small or big, as inputs. The context window is the maximum amount of data that can be passed in a single request to an LLM.

Stateless Nature of LLMs

Every individual call to an LLM is stateless, meaning that we pass in the whole conversation so far to help it understand the entire context.

Input Types

LLMs are trained on mostly three types of inputs: Natural Languages like English, Markdowns, and JSON.

Prompting LLMs

One Shot Prompting

Giving the LLM an example of how the output should look like is called one shot prompting.

Mult-ishot Prompting

Giving multiple examples is called mult-ishot prompting.

Defining Agents

LLM Agents

An LLM agent can control the workflow, run tools in a loop to achieve goals, and have memory, planning, autonomy, and LLM orchestration via tools.

Building Your First LLM Product

Step 1: Choose an LLM Type

Choose the type of LLM that best fits your use case. Consider the following factors:

The complexity of the task you want the LLM to perform
The amount of data you have available for training
The level of customization you need

Step 2: Define Your Prompt

Define a clear and concise prompt that the LLM can understand. This includes:

Providing context for the task
Specifying the output format
Giving examples of the desired output

Step 3: Test and Refine

Test your LLM product and refine it as needed. This includes:

Evaluating the accuracy of the output
Adjusting the prompt or training data as needed
Continuously testing and refining the model

Takeaways

LLMs are powerful tools for generating human-like language
Choosing the right LLM type and defining a clear prompt are crucial for success
Testing and refining your LLM product is an ongoing process
LLMs can be used for a variety of tasks, including content generation, language translation, and coding

Example Code

import gradio as gr

# Define the LLM model
model = ...

# Define the prompt
prompt = "Write a short story about a character who..."

# Create a Gradio interface
demo = gr.Interface(
    fn=model,
    inputs="text",
    outputs="text",
    title="LLM Story Generator",
    description="Generate a short story using an LLM",
)

# Launch the interface
demo.launch()

Conclusion

Building your first LLM product can seem daunting, but with the right guidance, it can be a rewarding experience. By following the steps outlined in this article, you can create a practical and useful LLM product. Remember to choose the right LLM type, define a clear prompt, and test and refine your product continuously.

Important Considerations

Accuracy and Bias: The accuracy of LLMs can vary depending on the quality of the training data. LLMs can be biased if the training data is biased.
Ethical Concerns: The use of LLMs raises ethical concerns, such as the potential for job displacement and the need for transparency in decision-making.

Suggested DEV.to Tags

LLM
Large Language Model
AI
Machine Learning
Natural Language Processing
Gradio