Introduction
In software development, a common question is: Do more features generate more bugs?
Understanding this relationship can help teams better plan new features, prioritize fixes, and anticipate problems.
In this project, we'll show how to train a model to analyze a small dataset with two variables:
- Features → Number of features added
- Bugs → Number of errors detected
We'll use data visualization techniques with scatter plots and learn how to create and interpret a simple decision tree to classify cases into quadrants.
Defining the Dataset
Define the variables you'll use for training (dataset).
In this example, we use a simple dataset, based on a Cartesian plane of bugs x features:
This first step simply displays the dataset, without sorting it. We manually classified the cases and obtained the following result:
- Cases: 36
- Features: 21
- Bugs: 15
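To make this step concrete, here is a minimal sketch in Python of how such a dataset can be loaded and plotted as a scatter plot. The values below are placeholders, since the real data lives in the Colab notebook linked at the end; only the structure (one features count and one bugs count per case) comes from the article.

```python
# Minimal sketch: placeholder values with the same structure as the article's
# dataset (one "features" count and one "bugs" count per case).
import pandas as pd
import matplotlib.pyplot as plt

data = pd.DataFrame({
    "features": [1, 2, 2, 3, 4, 5, 6, 7],  # number of features added (placeholder)
    "bugs":     [1, 4, 2, 5, 1, 4, 2, 6],  # number of errors detected (placeholder)
})

# Scatter plot of the bugs x features Cartesian plane
plt.scatter(data["features"], data["bugs"])
plt.xlabel("Features")
plt.ylabel("Bugs")
plt.title("Bugs x Features")
plt.show()
```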
Setting Thresholds
To organize the data and prepare for decision tree training, we set thresholds.
- By drawing a vertical line at the value of features = 3.6 (approximately 4), we divide the dataset into Side A and Side B:
This generates quadrants A and B, with their respective case, feature, and bug counts:
- By drawing another horizontal line, we create two more quadrants (C and D), with their own data:
With this, we can choose any quadrant to explore and train a decision tree based on the features and bugs variables.
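As a rough illustration of this step, the sketch below splits the placeholder DataFrame from the previous snippet into quadrants. The features = 3.6 threshold comes from the article; the horizontal (bugs) threshold and the exact quadrant layout are assumptions made only for illustration.

```python
# Minimal sketch of the quadrant split. FEATURES_THRESHOLD comes from the
# article (3.6); BUGS_THRESHOLD and the quadrant layout are assumptions.
FEATURES_THRESHOLD = 3.6
BUGS_THRESHOLD = 3.0  # hypothetical value for the horizontal line

side_a = data[data["features"] <= FEATURES_THRESHOLD]  # Side A (left of the vertical line)
side_b = data[data["features"] > FEATURES_THRESHOLD]   # Side B (right of the vertical line)

quadrants = {
    "A": side_a[side_a["bugs"] > BUGS_THRESHOLD],   # upper-left (assumed layout)
    "B": side_b[side_b["bugs"] > BUGS_THRESHOLD],   # upper-right (assumed layout)
    "C": side_a[side_a["bugs"] <= BUGS_THRESHOLD],  # lower-left (assumed layout)
    "D": side_b[side_b["bugs"] <= BUGS_THRESHOLD],  # lower-right (assumed layout)
}

# Case, feature, and bug counts per quadrant
for name, quadrant in quadrants.items():
    print(f"Quadrant {name}: {len(quadrant)} cases, "
          f"{quadrant['features'].sum()} features, {quadrant['bugs'].sum()} bugs")
```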
Exploring Quadrants and Decision Trees
- Quadrant A (Side A): using the split at features = 3.6 (or rounded to 4), we can train the tree:
- Quadrant B (Side B, encompassing B and D): using the same split at features = 3.6 (or rounded), we can train the tree (see the sketch after this list):
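Below is a minimal sketch of how a decision tree could be trained on this kind of data with scikit-learn. It reuses the placeholder `data` and `quadrants` objects from the earlier sketches and labels each case with its quadrant; the tree for the real dataset is built in the Colab notebook linked below.

```python
# Minimal sketch: train a decision tree to classify cases into quadrants.
# Reuses the placeholder `data` and `quadrants` objects from the sketches above.
from sklearn.tree import DecisionTreeClassifier, export_text

# Label each case with the quadrant it falls into
labeled = pd.concat(
    [quadrant.assign(quadrant=name) for name, quadrant in quadrants.items()],
    ignore_index=True,
)

X = labeled[["features", "bugs"]]
y = labeled["quadrant"]

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

# The printed splits should roughly recover the chosen thresholds
print(export_text(tree, feature_names=["features", "bugs"]))
```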
The final result of the Cartesian plane with all quadrants:
Google Colab and Excalidraw
Since the drawing may contain errors, I also created a Google Colab notebook in Python that reads, cleans, and structures this dataset and builds its decision tree:
Open in Colab
Furthermore, the study design is available in Excalidraw:
Open in Excalidraw
Conclusion
This approach is a small step toward understanding how machine learning works through decision-tree logic.
If you've reached the end of this mini-article, be sure to follow me on GitHub and LinkedIn!
Happy studying!