Introduction
In software development, a common question is: Do more features generate more bugs?
Understanding this relationship can help teams better plan new features, prioritize fixes, and anticipate problems.
In this project, we'll show how to train a model to analyze a small dataset with two variables:
- Features → Number of features added
- Bugs → Number of errors detected
We'll use data visualization techniques with scatter plots and learn how to create and interpret a simple decision tree to classify cases into quadrants.
Defining the Dataset
Define the variables you'll use for training (dataset).
In this example, we use a simple dataset, based on a Cartesian plane of bugs x features:
This first step simply displays the dataset, without sorting it. We manually classified the cases and obtained the following result:
- Cases: 36
- Features: 21
- Bugs: 15
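To make this step concrete, here is a minimal sketch in Python of how such a dataset can be loaded and plotted as a scatter plot. The values below are placeholders, since the real data lives in the Colab notebook linked at the end; only the structure (one features count and one bugs count per case) comes from the article.

```python
# Minimal sketch: placeholder values with the same structure as the article's
# dataset (one "features" count and one "bugs" count per case).
import pandas as pd
import matplotlib.pyplot as plt

data = pd.DataFrame({
    "features": [1, 2, 2, 3, 4, 5, 6, 7],  # number of features added (placeholder)
    "bugs":     [1, 4, 2, 5, 1, 4, 2, 6],  # number of errors detected (placeholder)
})

# Scatter plot of the bugs x features Cartesian plane
plt.scatter(data["features"], data["bugs"])
plt.xlabel("Features")
plt.ylabel("Bugs")
plt.title("Bugs x Features")
plt.show()
```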
Setting Thresholds
To organize the data and prepare for decision tree training, we set thresholds.
- By drawing a vertical line at the value of features = 3.6 (approximately 4), we divide the dataset into Side A and Side B:
This generates quadrants A and B, with their respective case, feature, and bug counts:
- By drawing another horizontal line, we create two more quadrants (C and D), with their own data:
With this, we can choose any quadrant to explore and train a decision tree based on the features and bugs variables.
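As a rough illustration of this step, the sketch below splits the placeholder DataFrame from the previous snippet into quadrants. The features = 3.6 threshold comes from the article; the horizontal (bugs) threshold and the exact quadrant layout are assumptions made only for illustration.

```python
# Minimal sketch of the quadrant split. FEATURES_THRESHOLD comes from the
# article (3.6); BUGS_THRESHOLD and the quadrant layout are assumptions.
FEATURES_THRESHOLD = 3.6
BUGS_THRESHOLD = 3.0  # hypothetical value for the horizontal line

side_a = data[data["features"] <= FEATURES_THRESHOLD]  # Side A (left of the vertical line)
side_b = data[data["features"] > FEATURES_THRESHOLD]   # Side B (right of the vertical line)

quadrants = {
    "A": side_a[side_a["bugs"] > BUGS_THRESHOLD],   # upper-left (assumed layout)
    "B": side_b[side_b["bugs"] > BUGS_THRESHOLD],   # upper-right (assumed layout)
    "C": side_a[side_a["bugs"] <= BUGS_THRESHOLD],  # lower-left (assumed layout)
    "D": side_b[side_b["bugs"] <= BUGS_THRESHOLD],  # lower-right (assumed layout)
}

# Case, feature, and bug counts per quadrant
for name, quadrant in quadrants.items():
    print(f"Quadrant {name}: {len(quadrant)} cases, "
          f"{quadrant['features'].sum()} features, {quadrant['bugs'].sum()} bugs")
```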
Exploring Quadrants and Decision Trees
- Quadrant A (Side A): using the split at features = 3.6 (or rounded to 4), we can train the tree:
- Quadrant B (Side B, encompassing B and D): using the same split at features = 3.6 (or rounded), we can train the tree (see the sketch after this list):
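Below is a minimal sketch of how a decision tree could be trained on this kind of data with scikit-learn. It reuses the placeholder `data` and `quadrants` objects from the earlier sketches and labels each case with its quadrant; the tree for the real dataset is built in the Colab notebook linked below.

```python
# Minimal sketch: train a decision tree to classify cases into quadrants.
# Reuses the placeholder `data` and `quadrants` objects from the sketches above.
from sklearn.tree import DecisionTreeClassifier, export_text

# Label each case with the quadrant it falls into
labeled = pd.concat(
    [quadrant.assign(quadrant=name) for name, quadrant in quadrants.items()],
    ignore_index=True,
)

X = labeled[["features", "bugs"]]
y = labeled["quadrant"]

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

# The printed splits should roughly recover the chosen thresholds
print(export_text(tree, feature_names=["features", "bugs"]))
```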
The final result of the Cartesian plane with all quadrants:
Google Colab and Excalidraw
Since the drawing may contain errors, I also created a Google Colab notebook in Python that reads, cleans, and structures this dataset and builds its decision tree:
Open in Colab
Furthermore, the study design is available in Excalidraw:
Open in Excalidraw
Conclusion
This approach is a small step toward understanding how machine learning works through decision-tree logic.
If you've reached the end of this mini-article, be sure to follow me on GitHub and LinkedIn!
Happy studying!