
Naur

How to Study Machine Learning with Two Variables

Introduction

In software development, a common question is: Do more features generate more bugs?

Understanding this relationship can help teams better plan new features, prioritize fixes, and anticipate problems.

In this project, we'll show how to train a simple model to analyze a small dataset with two variables:

  • Features → Number of features added
  • Bugs → Number of errors detected

We'll use data visualization techniques with scatter plots and learn how to create and interpret a simple decision tree to classify cases into quadrants.


Defining the Dataset

Define the variables you'll use for training (dataset).

In this example, we use a simple dataset plotted on a Cartesian plane of bugs vs. features:

Cartesian plane

This first step simply displays the dataset, without any classification. We then classified the cases manually and obtained the following result:

Manual Classification

  • Cases: 36
  • Features: 21
  • Bugs: 15
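The counts above suggest that each of the 36 cases is labeled either as a feature or as a bug. The article's exact coordinates aren't listed, so here is a minimal stand-in dataset in Python with the same counts; all coordinates are made up for illustration:

```python
import random

# Hypothetical stand-in for the article's dataset: 36 cases on a
# bugs-by-features plane, 21 labeled "feature" and 15 labeled "bug".
# The real coordinates come from the article's drawing, not from here.
random.seed(42)
cases = (
    [(random.uniform(0, 8), random.uniform(0, 6), "feature") for _ in range(21)]
    + [(random.uniform(0, 8), random.uniform(0, 6), "bug") for _ in range(15)]
)

labels = [label for _, _, label in cases]
print(len(cases), labels.count("feature"), labels.count("bug"))  # 36 21 15
```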

Setting Thresholds

To organize the data and prepare for decision tree training, we set thresholds.

  1. By drawing a vertical line at the value of features = 3.6 (approximately 4), we divide the dataset into Side A and Side B:

Vertical border

This generates quadrants A and B, with their respective case, feature, and bug counts:

Quadrants A and B
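The vertical border can be sketched as a simple filter over the dataset. The coordinates below are again placeholder values; only the 3.6 threshold comes from the article:

```python
import random

# Placeholder coordinates; only the 3.6 border is from the article.
random.seed(42)
cases = [(random.uniform(0, 8), random.uniform(0, 6)) for _ in range(36)]

THRESHOLD = 3.6  # vertical border on the features axis (≈ 4)
side_a = [(f, b) for f, b in cases if f <= THRESHOLD]
side_b = [(f, b) for f, b in cases if f > THRESHOLD]

# Every case falls on exactly one side of the border.
print(len(side_a) + len(side_b))  # 36
```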

  2. By drawing a horizontal line on the bugs axis, we create two more quadrants (C and D), each with its own data:

Quadrants C and D
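Both borders together map every case to exactly one of the four quadrants. A sketch, assuming a horizontal border at bugs = 2.5 (the article doesn't state its exact value) and placeholder coordinates:

```python
import random

# Placeholder coordinates; the 2.5 bugs border is an assumption,
# the 3.6 features border is from the article.
random.seed(42)
cases = [(random.uniform(0, 8), random.uniform(0, 6)) for _ in range(36)]

F_BORDER = 3.6  # vertical border (features)
B_BORDER = 2.5  # horizontal border (bugs) -- assumed value

def quadrant(features, bugs):
    """Map a case to quadrant A, B, C, or D using both borders."""
    if features <= F_BORDER:
        return "A" if bugs <= B_BORDER else "C"
    return "B" if bugs <= B_BORDER else "D"

counts = {}
for f, b in cases:
    q = quadrant(f, b)
    counts[q] = counts.get(q, 0) + 1
print(sum(counts.values()))  # 36 -- each case lands in one quadrant
```

Which letter names which quadrant is a labeling choice; what matters is that the two thresholds partition the plane.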

With this, we can choose any quadrant to explore and train a decision tree based on the features and bugs variables.


Exploring Quadrants and Decision Trees

  • Quadrant A (Side A)

Quadrant A
Using the threshold features = 3.6 (or rounding it to 4), we can train the tree:

Quadrant A Tree
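A tree like the ones drawn above can also be trained with scikit-learn. This is a sketch, not the article's Colab code: the points and the bugs border (2.5) are assumptions, while the features = 3.6 border comes from the article. Given quadrant labels generated from those borders, a depth-2 tree recovers them:

```python
import random
from sklearn.tree import DecisionTreeClassifier, export_text

# Placeholder points; 3.6 is the article's border, 2.5 is assumed.
random.seed(0)
X = [[random.uniform(0, 8), random.uniform(0, 6)] for _ in range(36)]

def quadrant(f, b):
    """Label each point with its quadrant (A/B/C/D)."""
    if f <= 3.6:
        return "A" if b <= 2.5 else "C"
    return "B" if b <= 2.5 else "D"

y = [quadrant(f, b) for f, b in X]

# Depth 2 is enough: one split per axis reproduces the two borders.
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(clf, feature_names=["features", "bugs"]))
```

The printed rules should show learned thresholds close to 3.6 on `features` and 2.5 on `bugs`, which is exactly the quadrant drawing expressed as a tree.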

  • Quadrant B (Side B, encompassing B and D)

Quadrant B
Using the same threshold (features = 3.6, or rounded to 4), the tree can be trained:

Quadrant Tree B

The final result of the Cartesian plane with all quadrants:

Final Cartesian plane

Google Colab and Excalidraw

Since the hand-drawn diagram may contain errors, I also created a Google Colab notebook in Python that reads and cleans the dataset and builds its decision tree:
Open in Colab

Furthermore, the study design is available in Excalidraw:
Open in Excalidraw

Conclusion

This approach is a small step toward understanding how machine learning works in decision tree logic.

If you've reached the end of this mini-article, be sure to follow me on GitHub and LinkedIn!

Happy studying!

