DEV Community: Steve

Supervised Learning and Classification in Machine Learning: My Perspective

Steve — Sun, 24 Aug 2025 20:59:11 +0000

Understanding the concepts underpinning machine learning is critical as technological advances provide access to tools professionals can deploy to facilitate decision-making.

What is Supervised Learning?

Simply put, supervised learning involves teaching a model using labeled data for later use in predicting the labels for new data.
Labeled data is raw data in which each example is paired with an informative label, the known correct output for that example. The labels give the model the context it needs to learn the underlying patterns, so that it can predict the right output when new data is fed into it. The process of learning from these labeled examples is known as model training.

Understanding Classification in Supervised Learning

Supervised learning comprises two broad techniques: Classification and Regression.
Classification predicts which category or class a data point belongs to, unlike regression, which predicts a continuous numerical value.

Types of Classification with Examples

Binary classification - Sorts data into one of two categories or classes. For instance, email spam filtering ("Spam" or "Not Spam").
Multiclass classification - Sorts data into one of more than two classes or categories. For example, an image recognition model classifying images using labels such as bus, car, and motorcycle.
Multilabel classification - Assigns each data point one or more labels at the same time. For instance, content recommendation algorithms that tag a single song with multiple genres.
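The difference between the three types is easiest to see in the shape of the labels. Here is a small Python sketch with made-up labels (the genre names and the `to_indicator` helper are my own illustration, not from any library):

```python
# Hypothetical toy labels illustrating the three label shapes.

# Binary: each item gets exactly one of two classes.
binary_labels = ["spam", "not spam", "spam"]

# Multiclass: each item gets exactly one of three or more classes.
multiclass_labels = ["bus", "car", "motorcycle", "car"]

# Multilabel: each item gets a set of labels, possibly several at once.
multilabel_labels = [{"rock", "blues"}, {"jazz"}, {"pop", "rock", "dance"}]

# All genres seen in the data, in alphabetical order.
GENRES = sorted(set().union(*multilabel_labels))


def to_indicator(labels, classes):
    """Encode one item's label set as a 0/1 vector, one slot per class."""
    return [1 if c in labels else 0 for c in classes]


indicator_rows = [to_indicator(labels, GENRES) for labels in multilabel_labels]
```

A 0/1 row per item is a common way to feed multilabel targets to a model, since each column can then be treated as its own yes/no question.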

Popular classification algorithms include logistic regression, decision trees, random forests, and K-nearest neighbors (KNN). Each algorithm suits certain scenarios and kinds of data.
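To show what "learning from labeled data" looks like in practice, here is a minimal K-nearest neighbors sketch in plain Python. The data points and feature meanings are invented for illustration, and `knn_predict` is my own helper, not a library function. In a real project a library implementation would be the usual choice.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Pair every training point's distance to the query with its label, closest first.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical labeled data: (number of links, number of exclamation marks) per email.
train_points = [(0, 0), (1, 0), (0, 1), (7, 5), (8, 6), (6, 7)]
train_labels = ["not spam", "not spam", "not spam", "spam", "spam", "spam"]

prediction_a = knn_predict(train_points, train_labels, (1, 1))
prediction_b = knn_predict(train_points, train_labels, (7, 6))
```

KNN has no separate training step beyond storing the labeled examples, which makes it a convenient first algorithm for seeing how labels drive predictions.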

Notes on Effectively Applying Supervised Learning & My Perspective on Classification

Labeled input data gives the model a reference point for associating inputs with a predicted output. The model's predictions are therefore only as good as the data fed into it. A model trained on incomplete, biased, or incorrectly labeled data will yield unreliable results and cannot match the accuracy of one trained on clean data. Data cleaning and preprocessing help detect anomalies before training, which improves the model's reliability.

Learning about classification and its use cases in real-world applications of supervised learning has been enlightening. However, working with near-perfect datasets was a major challenge: a model fitted to overly clean data can overfit, and its predictions can lead to biased conclusions when they are used in decision-making. Practicing with metrics such as accuracy, precision, recall, F1 score, and the confusion matrix will be crucial going forward.
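For reference, the metrics mentioned above all come from the four cells of the confusion matrix. A small sketch in plain Python, using made-up predictions (the function names are my own):

```python
def confusion_counts(y_true, y_pred, positive):
    """Return (tp, fp, fn, tn) for a binary problem with the given positive class."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive):
    """Compute accuracy, precision, recall, and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical ground truth and predictions for eight emails.
y_true = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "ham"]
y_pred = ["spam", "spam", "ham", "spam", "ham", "ham", "ham", "ham"]

counts = confusion_counts(y_true, y_pred, positive="spam")
metrics = classification_metrics(y_true, y_pred, positive="spam")
```

Accuracy alone can look good on imbalanced data, which is why precision and recall are worth checking alongside it.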

Predicting Champions League Winner Using Python

Steve — Thu, 31 Jul 2025 20:53:46 +0000

Project Background

Venturing into Data Science these past few weeks has exposed me to various tools and concepts that can transform how we think about data, process it, and use insights to make decisions or draw conclusions about a specific variable. Seeing how application programming interfaces (APIs) work was enlightening, as it showed me the tools available for retrieving and making sense of data.

The week's task was to extract data from https://www.football-data.org/ to determine the probability of each team in the Premier League winning the title. The site's API is a goldmine for football enthusiasts (I don't consider myself one) looking to retrieve data and analyze matches and teams for various competitions across the major football leagues.

Libraries Imported

Tools utilized during this exercise included:

Python
Libraries:
pandas (data manipulation)
requests (API calls)
python-dotenv (secure API key handling)
matplotlib and seaborn (visualizations)
scipy.stats (probability distributions)

Data Extraction Process

The code kicked off with setting up the libraries:
import requests
import os
from dotenv import load_dotenv
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as stats

Connecting to the /v4/competitions/PL/standings?season=2024 endpoint using the API key:
# Fetch data from API
response = requests.get(url, headers=headers)

data = response.json()
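The snippet above uses `url` and `headers` without showing how they were built. A minimal sketch of one way to set them up follows. The host name, the `X-Auth-Token` header, and the `FOOTBALL_API_KEY` variable name are my assumptions here and should be checked against the football-data.org documentation. `build_standings_request` is a hypothetical helper, not part of any library.

```python
import os

# Assumed API host; the endpoint path is the one quoted in the post.
BASE_URL = "https://api.football-data.org"


def build_standings_request(api_key, competition="PL", season=2024):
    """Return the (url, headers) pair for a standings request."""
    url = f"{BASE_URL}/v4/competitions/{competition}/standings?season={season}"
    # Assumption: the API expects the key in an X-Auth-Token header.
    headers = {"X-Auth-Token": api_key}
    return url, headers


# FOOTBALL_API_KEY is a hypothetical name for the variable stored in the .env file.
api_key = os.environ.get("FOOTBALL_API_KEY", "")
url, headers = build_standings_request(api_key)

# With the requests library, the call itself would then look like:
#   response = requests.get(url, headers=headers, timeout=10)
#   response.raise_for_status()  # fail loudly on 4xx/5xx responses
#   data = response.json()
```

Checking the status code before calling `.json()` makes a bad key or a rate limit show up as a clear error instead of a confusing parsing failure later.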

Probability Formula

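def calculate_win_probability_poisson(wins, played, remaining):
    # Fall back to a default 30% win rate before any games are played
    if played == 0:
        win_rate = 0.3
    else:
        win_rate = wins / (played + 1e-6)
    # Expected final win total: wins so far plus projected wins in the remaining games
    expected_wins = win_rate * remaining + wins
    # Probability of finishing with more than 24 wins under a Poisson model
    prob = 1 - stats.poisson.cdf(24, expected_wins)
    return prob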
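To make the formula concrete: the function treats a team's final win count as Poisson-distributed around its expected wins and returns the chance of finishing with more than 24 wins. Below is a dependency-free sketch of the same calculation that spells out the Poisson CDF by hand. The example team records are made up, and the 24-win threshold is simply carried over from the code above.

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))


def win_probability(wins, played, remaining, threshold=24, default_rate=0.3):
    """Chance of ending the season with more than `threshold` wins."""
    # Same fallback as the original: use a default win rate before any games are played.
    win_rate = default_rate if played == 0 else wins / played
    expected_wins = wins + win_rate * remaining
    return 1 - poisson_cdf(threshold, expected_wins)


# Hypothetical records: (wins, games played, games remaining).
strong_team = win_probability(20, 30, 8)
perfect_team = win_probability(30, 30, 8)
season_start = win_probability(0, 0, 38)
```

This is a rough model: it scores each team against a fixed win threshold rather than against the other teams, so the resulting probabilities do not have to sum to one across the league.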

A Note on API Keys

A crucial insight gained from the exercise was the importance of using .env files to store API keys securely. The keys are loaded at runtime via python-dotenv, which keeps sensitive data out of the source code and therefore out of any public repository where the code is shared.
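A .env file is just a text file of KEY=VALUE lines. The sketch below is a simplified, standard-library illustration of the idea, not how python-dotenv is actually implemented. `read_env_file` and the `FOOTBALL_API_KEY` name are hypothetical.

```python
import os
import tempfile


def read_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Drop optional surrounding quotes around the value.
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Demo with a throwaway file so no real secret is involved.
with tempfile.TemporaryDirectory() as tmp:
    env_path = os.path.join(tmp, ".env")
    with open(env_path, "w", encoding="utf-8") as handle:
        handle.write("# local secrets\nFOOTBALL_API_KEY=dummy-key-123\n")
    settings = read_env_file(env_path)
```

In the project itself, calling `load_dotenv()` and then `os.getenv("FOOTBALL_API_KEY")` does this job. The file only stays private if `.env` is also listed in `.gitignore`.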

Application of Measures of Central Tendency in Data Science

Steve — Tue, 22 Jul 2025 21:02:28 +0000

The three common measures of central tendency are:

Mean
Mode
Median

1. Mean
The mean of a dataset is the average, derived by summing all values and then dividing by the number of values.

Use Case in Data Science:
The mean can be used to find indicators such as average consumer spending, average number of products sold, and average product cost to inform sales and marketing decisions.
Note: Outliers pull the mean toward them, so one should check the data for outliers before relying on the mean, since it may otherwise give a skewed picture.

2. Mode
The mode is the most frequently occurring value(s) in a dataset.

Use Case in Data Science:
The mode reveals patterns in data that a data scientist or analyst can use to draw conclusions about a certain element. For instance, the mode in a dataset of sales would reveal a business's best-selling product and help business intelligence analysts identify that product's contribution to turnover.

3. Median
This is the middle value of a dataset when the data is ordered, typically in ascending order. With an even number of values, it is the average of the two middle values.

Use Case in Data Science:
The median can help in evaluating elements such as household income and educational attainment in household surveys. Because it is less sensitive to extreme values than the mean, it gives a representative figure on which policy makers can base their planning and development decisions.
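The three measures can be computed with Python's built-in `statistics` module. The spending figures below are made up to show how a single outlier moves the mean while the median barely changes:

```python
import statistics

# Hypothetical customer spending; 400 is a deliberate outlier.
spend = [20, 22, 22, 25, 27, 30, 400]

mean_spend = statistics.mean(spend)      # pulled upward by the outlier
median_spend = statistics.median(spend)  # middle value of the sorted data
mode_spend = statistics.mode(spend)      # most frequent value

# The same data without the outlier, for comparison.
spend_clean = [value for value in spend if value != 400]
mean_clean = statistics.mean(spend_clean)
median_clean = statistics.median(spend_clean)

# multimode returns every value tied for most frequent (Python 3.8+).
best_sellers = statistics.multimode(["A", "B", "A", "B", "C"])
```

Removing one value changes the mean from 78 to about 24.3, while the median only moves from 25 to 23.5, which is why the median is preferred for skewed data such as income.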

Exploring Microsoft Excel's Features

Steve — Wed, 11 Jun 2025 12:57:38 +0000

We have all heard of the versatility of Microsoft Excel, whether from colleagues, friends, or acquaintances using this invaluable tool to make sense of raw data. But what exactly does Microsoft Excel do?
Microsoft Excel is a powerful tool that enables users to organize, analyze, and visualize data for themselves and for the audience that needs the insights. From simple calculations such as addition, subtraction, and multiplication on small datasets, through pivot tables, to complex financial modelling, Excel can help one unpack insights from data.

Excel offers real-world utility to professionals from diverse fields. Management can base their business decisions on Excel output data after analyzing sales trends, demand forecasts, and inventory levels for effective resource planning and utilization.
Excel helps accounting and finance professionals with financial reporting by preparing and unpacking financial statements and budgets. They can use the tool to make calculations and develop reports for top management and shareholders.
Excel is also useful to marketers for tracking key metrics at the core of marketing strategies. Customer engagement, ratings and satisfaction scores for different products are examples of elements marketers can analyze using Excel.

Since starting to learn Excel as a beginner, I have picked up different features that are important when handling data. I will examine the three that have stood out for me so far. These are:

Conditional Formatting: Conditional formatting highlights important values based on the user's needs. Examples include values within a certain range, the top and bottom values in a dataset, and duplicate values.
VLOOKUP: This function searches for a value in the first column of a table and returns the value from a chosen column in the same row.
IF Function: This function returns one of two values depending on whether a condition set by the user is met. One must specify the value Excel returns if the condition is true and another value if it is false.

Using Excel has enabled me to unpack different insights from data based on what I want to achieve. Using Excel may be daunting at first, but it became manageable once I understood how each function or feature works and tied it to my data analysis needs and expectations.
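For readers who also work in Python, the logic behind VLOOKUP (exact match) and IF can be sketched in a few lines. The product table and the helper names `vlookup` and `excel_if` are invented for illustration. This mirrors the behaviour of the Excel functions rather than reproducing them.

```python
# Hypothetical product table: (product_id, name, unit_price).
table = [
    ("P001", "Notebook", 3.50),
    ("P002", "Pen", 1.20),
    ("P003", "Stapler", 8.75),
]


def vlookup(lookup_value, rows, col_index):
    """Exact-match lookup, like VLOOKUP(value, table, col_index, FALSE).

    Searches the first column; col_index is 1-based, as in Excel.
    """
    for row in rows:
        if row[0] == lookup_value:
            return row[col_index - 1]
    return None  # Excel would show #N/A here


def excel_if(condition, value_if_true, value_if_false):
    """Return one of two values depending on the condition, like IF()."""
    return value_if_true if condition else value_if_false


price = vlookup("P003", table, 3)
label = excel_if(price > 5, "Premium", "Standard")
missing = vlookup("P999", table, 2)
```

The key detail carried over from Excel is that VLOOKUP only searches the first column of the table, so the lookup key has to sit there.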