ZyVOP

Posted on Jun 25 • Originally published at zyvop.com

I Thought AI Was Magic Until I Built My Own Model

#ai #machinelearning #python #beginners

When people start learning AI today, they usually jump straight into ChatGPT, Gemini, Claude, or other AI tools.

I almost did the same.

But I kept wondering:

How does an AI model actually learn?

Not how to call an API.

Not how to use an AI service.

How does a machine learn that "hello" is a greeting and "bye" means goodbye?

To answer that question, I decided to build the simplest AI model I could think of.

No APIs.

No GPUs.

No large language models.

Just Python and a small machine learning model.

By the end of this project, I understood:

What training data is
How text becomes numbers
How models make predictions
Why AI sometimes makes mistakes
Why data is often more important than code

Let's build it together.

Step 1: Create a Project Folder

Create a new directory for the project:

mkdir ai-learning
cd ai-learning

Step 2: Create a Virtual Environment

A virtual environment keeps project dependencies isolated.

Create one:

python3 -m venv venv

Activate it:

macOS / Linux

source venv/bin/activate

Windows

venv\Scripts\activate

If successful, you'll see:

(venv)

at the beginning of your terminal prompt.

Step 3: Install Required Libraries

We'll use a library called scikit-learn.

Install it:

pip install scikit-learn

Verify installation:

pip list

You should see, among other packages:

scikit-learn
numpy
scipy
joblib
threadpoolctl
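
As an extra check, a short Python snippet can confirm that the library imports from inside the active virtual environment. This is only a minimal sketch; the file name check_install.py is a suggestion, not part of the original project.

```python
# check_install.py (hypothetical file name)
# Confirms scikit-learn can be imported from the active environment.
import sklearn

version = sklearn.__version__
print("scikit-learn version:", version)
```

If this raises ModuleNotFoundError, the virtual environment is probably not activated in the current terminal.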

Step 4: Create Your First AI Program

Create a file called:

chatbot.py

Add the following code:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = [
    "hello",
    "hi",
    "hey",
    "hey there",
    "greetings",
    "good morning",
    "good evening",
    "good afternoon",
    "bye",
    "see you later",
    "what is the price",
    "how much does it cost",
    "pricing information",
]

labels = [
    "greeting",
    "greeting",
    "greeting",
    "greeting",
    "greeting",
    "greeting",
    "greeting",
    "greeting",
    "goodbye",
    "goodbye",
    "pricing",
    "pricing",
    "pricing",
]

At first glance, this looks like a simple list.

But this is actually your training data.

Understanding Training Data

Each message has a matching label.

For example:

Message	Label
hello	greeting
hi	greeting
bye	goodbye
pricing information	pricing

This is how we teach the model.

We're essentially saying:

If you see "hello", the correct answer is "greeting".

If you see "bye", the correct answer is "goodbye".

Machine learning starts with examples.

Lots of examples.
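
Because the model pairs messages and labels purely by position, it can help to check the two lists before training. The sketch below uses the four rows from the table above; the variable names pairs and label_counts are my own, chosen for illustration.

```python
from collections import Counter

texts = ["hello", "hi", "bye", "pricing information"]
labels = ["greeting", "greeting", "goodbye", "pricing"]

# Every message needs exactly one label, in the same position.
assert len(texts) == len(labels), "texts and labels must have the same length"

# Pair each message with its label, the way the model will see them.
pairs = list(zip(texts, labels))
for message, label in pairs:
    print(f"{message!r} -> {label}")

# Count the examples per label; very uneven counts can bias predictions.
label_counts = Counter(labels)
print(label_counts)
```

Running the same check on the full lists in chatbot.py shows that greetings outnumber the other two labels, which is worth keeping in mind later.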

Step 5: Convert Text into Numbers

This was the moment AI stopped feeling like magic.

Add:

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

print(X.toarray())
print(vectorizer.get_feature_names_out())

Run:

python chatbot.py

You'll see a matrix of word counts (one row per message), followed by the vocabulary, which looks similar to:

['bye' 'cost' 'good' 'hello' 'hi' 'price' ...]

Why?

Because computers don't understand words.

They understand numbers.

The vectorizer builds a vocabulary of every distinct word in the training texts, then turns each message into a row of counts, with one column per vocabulary word.

For example:

hello

might become:

[0 0 0 1 0 0 ...]

The computer isn't reading the word.

It's reading numbers.

That realization completely changed how I thought about AI.
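
To see what is going on underneath, here is a hand-written version of the same idea in plain Python. It is a simplified illustration, not scikit-learn's actual implementation; the function names tokenize, build_vocabulary, and to_vector are made up for this sketch. It lowercases the text and keeps words of two or more characters, which is how CountVectorizer behaves with its default settings.

```python
import re

def tokenize(text):
    # Lowercase the text and keep words of two or more characters.
    return re.findall(r"\b\w\w+\b", text.lower())

def build_vocabulary(documents):
    # Collect every distinct word and sort alphabetically.
    words = {word for doc in documents for word in tokenize(doc)}
    return sorted(words)

def to_vector(text, vocabulary):
    # Count how often each vocabulary word appears in the text.
    tokens = tokenize(text)
    return [tokens.count(word) for word in vocabulary]

sample_texts = ["hello", "hi", "bye", "good morning", "what is the price"]
vocabulary = build_vocabulary(sample_texts)

print(vocabulary)
print(to_vector("hello", vocabulary))
print(to_vector("good morning", vocabulary))
```

Each position in the vector belongs to one word in the vocabulary, so "good morning" gets a 1 in two positions and 0 everywhere else.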

Step 6: Train the Model

Now let's teach the machine.

Add:

model = LogisticRegression()

model.fit(X, labels)

The fit call trains the model.

During training, the model adjusts a weight for each word and label until its predictions match the training labels as closely as it can.

It learns relationships such as:

hello -> greeting
hi -> greeting
bye -> goodbye
price -> pricing

Not because we programmed those rules.

Because it discovered them from the examples.

Step 7: Make Predictions

Let's test it.

Add:

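while True:
    user_input = input("\nYou: ")

    # Type "exit" to leave the loop.
    if user_input.strip().lower() == "exit":
        break

    # Reuse the fitted vectorizer; transform() expects a list of strings.
    user_vector = vectorizer.transform([user_input])

    # predict() returns one label per input, so take the first.
    prediction = model.predict(user_vector)[0]

    print("Intent:", prediction)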

Run:

python chatbot.py

Example:

You: hello
Intent: greeting

You: bye
Intent: goodbye

You: pricing information
Intent: pricing

Congratulations.

You just built your first AI model.
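
The vectorizer and the model always run one after the other, so scikit-learn lets you bundle them into a single object with make_pipeline. The version below is an optional variation on the script above, not a replacement for it; the names intent_model and predict_intent are hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "hello", "hi", "hey", "hey there", "greetings",
    "good morning", "good evening", "good afternoon",
    "bye", "see you later",
    "what is the price", "how much does it cost", "pricing information",
]
labels = ["greeting"] * 8 + ["goodbye"] * 2 + ["pricing"] * 3

# The pipeline runs the vectorizer and then the classifier,
# so raw strings can be passed straight to fit() and predict().
intent_model = make_pipeline(CountVectorizer(), LogisticRegression())
intent_model.fit(texts, labels)

def predict_intent(message):
    # predict() expects a list of messages and returns one label per message.
    return intent_model.predict([message])[0]

print(predict_intent("hello"))
```

Wrapping the prediction in a function also makes it easy to try many inputs without typing them into the interactive loop.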

When My Model Got It Wrong

Then something interesting happened.

I typed:

pricing

The model replied:

greeting

That didn't make sense.

I thought my code was broken.

It wasn't.

The problem was my data.

I had trained the model using:

pricing information

but never:

pricing

by itself.

The model wasn't making a coding mistake.

It simply hadn't seen enough examples.

This taught me one of the most important lessons in machine learning:

Bad predictions are often caused by bad or incomplete data.

Looking at Probabilities

I wanted to understand how the model was making decisions.

So I added this line inside the loop, right after the prediction:

print(model.predict_proba(user_vector))

Example output (values are approximate; the real probabilities in each row add up to 1):

[[0.20 0.22 0.56]]

The columns follow the order of model.classes_, which was:

['goodbye', 'greeting', 'pricing']

Meaning:

goodbye = 20%
greeting = 22%
pricing = 56%

The model isn't certain.

It's making educated guesses based on probabilities.

The highest probability wins.
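
Reading a bare row of numbers gets confusing quickly, so it can help to print each probability next to its label. This is a small sketch on a reduced dataset (an assumption to keep it short); by_class and best are names I picked for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["hello", "hi", "bye", "see you later",
         "what is the price", "pricing information"]
labels = ["greeting", "greeting", "goodbye", "goodbye",
          "pricing", "pricing"]

vectorizer = CountVectorizer()
model = LogisticRegression().fit(vectorizer.fit_transform(texts), labels)

user_vector = vectorizer.transform(["price information"])
probabilities = model.predict_proba(user_vector)[0]

# model.classes_ lists the labels in the same order as the probability columns.
by_class = {str(name): float(p) for name, p in zip(model.classes_, probabilities)}
for class_name, probability in by_class.items():
    print(f"{class_name}: {probability:.0%}")

# The predicted label is the class with the highest probability.
best = max(by_class, key=by_class.get)
print("Predicted:", best)
```

The label printed at the end should agree with what model.predict returns for the same input.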

Improving the Dataset

To improve the model, I added more pricing examples to the lists, above the vectorizer code:

texts.extend([
    "pricing",
    "price",
    "cost",
    "subscription",
    "plan"
])

labels.extend([
    "pricing",
    "pricing",
    "pricing",
    "pricing",
    "pricing"
])

After rerunning the script, which refits the vectorizer and retrains the model on the longer lists, the predictions became much more accurate.

The code didn't change.

The algorithm didn't change.

Only the data improved.

The Biggest Lesson

Before building this project, AI felt mysterious.

After building it, I realized the core idea is surprisingly simple:

Training Data
        ↓
Text to Numbers
        ↓
Pattern Recognition
        ↓
Prediction

The models used by modern AI systems are far more advanced.

But the basic idea is still the same.

The most surprising lesson wasn't learning how to train a model.

It was discovering that when the model made a bad prediction, the problem was usually my data—not my code.

What I'm Learning Next

Now that I understand the basics, my next goals are:

Save and load trained models
Learn TF-IDF vectorization
Build a sentiment analysis model
Explore embeddings
Create a document search assistant
Learn PyTorch
Build a tiny GPT model from scratch
