Brian Rey

Posted on • Originally published at Medium

30 days of Data Science — Day 1: Regression problems

Data. We have tons of it, and each day we collect more and more. Owning it sounds like a cool thing, but without putting it to use, data loses its value over time. One awesome way of using data is to make predictions about the future. Sounds cool, right?

Machine learning to the rescue

Here is an article (written by me) explaining some aspects of machine learning and what it is.



In general, machine learning algorithms work by making predictions based on data. The machine learning algorithm looks at the data and tries to find patterns that connect the features (variables used to predict) to the labels (the variable we want to predict). The more data the algorithm has, the better it can learn these patterns.

This doesn’t happen by magic and there are different approaches to finding patterns in data, each of them with its own use case. We could divide them into:

Supervised Learning



Supervised learning is a type of machine learning that uses labeled data to train machine learning models. In labeled data, the output is already known. The model just needs to map the inputs to the respective outputs.

An example of supervised learning is training a system to identify the animal in an image by showing it many images, each labeled with the animal it contains.
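To make that concrete, here's a minimal sketch (assuming scikit-learn is installed; the animal measurements and labels below are invented purely for illustration) of how supervised learning maps inputs to known outputs:

```python
from sklearn.neighbors import KNeighborsClassifier

# Made-up numeric features describing animals, e.g. [weight_kg, height_cm]
X_train = [[4.0, 25], [5.5, 30], [300.0, 150], [450.0, 170]]
y_train = ["cat", "cat", "horse", "horse"]  # known labels: this is what makes it "supervised"

model = KNeighborsClassifier(n_neighbors=1)
model.fit(X_train, y_train)        # the model learns to map features to labels

print(model.predict([[5.0, 28]]))  # -> ['cat'] for an animal it hasn't seen before
```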

Unsupervised learning

Unsupervised learning is a type of machine learning that uses unlabeled data to train machines. Unlabeled data doesn’t have a fixed output variable. The model learns from the data, discovers the patterns and features in the data, and returns the output.

An example, based on the image above, could be a machine learning algorithm that, instead of classifying by known labels, separates inputs into groups whose members are as similar as possible.
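Here's a minimal sketch of that grouping idea (again assuming scikit-learn; the points are invented): a clustering algorithm such as k-means receives only inputs, no labels, and splits them into groups of similar items.

```python
from sklearn.cluster import KMeans

# Unlabeled 2-D points: there is no output column at all
X = [[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],   # one blob
     [8.0, 8.2], [7.9, 8.1], [8.3, 7.8]]   # another blob

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)  # e.g. [0 0 0 1 1 1] -- group ids the algorithm discovered by itself
```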

Reinforcement learning and transfer learning are other machine learning approaches, but to avoid adding too much noise, I'll omit them here.

It’s important to understand this difference, because we’ll build our definition of Simple Linear Regression on top of these concepts.


Regression enters the stage

What are regression problems?



Regression problems are problems where we try to make a prediction on a continuous scale.

Continuous variables are numeric variables that have an infinite number of values between any two values.

Examples could be predicting a company’s stock price, tomorrow’s temperature, or future sales based on historical data. Here, stock price, temperature, and sales are continuous variables, and we are trying to predict their values from given input variables such as past prices, man-hours used, and so on.
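To make the distinction concrete, here's a toy sketch (the numbers are invented, not real data) comparing a continuous target with a categorical one:

```python
import numpy as np

# Invented targets for the same five days of historical data
temperature_tomorrow = np.array([15.2, 18.0, 21.35, 24.1, 26.2])   # continuous: any value in between is valid
weather_tomorrow = np.array(["rain", "sun", "sun", "sun", "rain"])  # categorical: a classification target

print(temperature_tomorrow.dtype)  # float64 -- predicting this is a regression problem
print(weather_tomorrow.dtype)      # <U4     -- predicting this is a classification problem
```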

So, regression is…?

Regression is a method for understanding the relationship between independent variables or features and a dependent variable or outcome.

Outcomes can then be predicted once the relationship between independent and dependent variables has been estimated.

Regression is also a field of study in statistics that forms a key part of forecast models in machine learning. It’s used as an approach to predict continuous outcomes in predictive modeling, so it has utility in forecasting and predicting outcomes from data. Machine learning regression generally involves plotting a line of best fit through the data points. The distance between each point and the line is minimized to achieve the best-fit line.
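Here's a minimal sketch of that line-of-best-fit idea using NumPy (the numbers are invented; `np.polyfit` fits the line by minimizing the squared vertical distances to the points, i.e. ordinary least squares):

```python
import numpy as np

# Invented data: feature x and continuous outcome y
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Fit a degree-1 polynomial: y ≈ slope * x + intercept
slope, intercept = np.polyfit(x, y, 1)

predictions = slope * x + intercept
residuals = y - predictions      # the distances the fit tries to keep small
print(slope, intercept)          # ≈ 1.96 and ≈ 0.14 for this toy data
print(residuals)
```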

Alongside classification, regression is one of the main applications of supervised machine learning.

Classification is the categorization of objects based on learned features, whereas regression is the forecasting of continuous outcomes. Both are predictive modeling problems.

Supervised machine learning is integral as an approach in both cases because classification and regression models rely on labeled input and output training data. The features and output of the training data must be labeled so the model can understand the relationship.
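One way to see the parallel (a sketch assuming scikit-learn; the data is invented) is that both problems share the same fit/predict workflow on labeled data and differ only in whether the target is a category or a number:

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = [[1], [2], [3], [4]]   # the same invented feature for both tasks

# Classification: the target is a category
clf = DecisionTreeClassifier().fit(X, ["small", "small", "large", "large"])
print(clf.predict([[2.5]]))   # -> a class label

# Regression: the target is a continuous number
reg = DecisionTreeRegressor().fit(X, [1.1, 2.0, 3.2, 4.1])
print(reg.predict([[2.5]]))   # -> a numeric estimate
```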

What are regression models used for?

Common uses for machine learning regression models include:

  • Forecasting continuous outcomes like house prices, stock prices, or sales.
  • Predicting the success of future retail sales or marketing campaigns to ensure resources are used effectively.
  • Predicting customer or user trends, such as on streaming services or e-commerce websites.
  • Analyzing datasets to establish the relationships between variables and output.
  • Predicting interest rates or stock prices from a variety of factors.

Take special care of data

As with all supervised machine learning, special care should be taken to ensure the labeled training data is representative of the overall population. If the training data is not representative, the predictive model will overfit to data that doesn’t represent new and unseen examples.



This will (or might) result in inaccurate predictions once the model is deployed. Because regression analysis involves the relationships between features and outcomes, care should be taken to include the right selection of features too.
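A common way to catch this early is to hold out part of the labeled data and check the model on examples it never saw during training. Here's a minimal sketch assuming scikit-learn; the data is synthetic:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data: 100 samples, 1 feature, with a noisy linear relationship
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3 * X[:, 0] + rng.normal(0, 1, size=100)

# Hold out 25% of the data so it plays the role of "new and unseen" examples
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LinearRegression().fit(X_train, y_train)
print(r2_score(y_test, model.predict(X_test)))  # a big gap vs. the training score hints at overfitting
```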


Wrapping up

First of all, machine learning is a branch of artificial intelligence that allows the software to learn from data and make predictions based on that data. There are two main categories of machine learning: supervised and unsupervised learning.

Supervised learning is where the software is provided with labeled data, like learning to recognize cat photos from a set of labeled photos.
Unsupervised learning is where the software is given data but doesn’t know which category it falls into, like when you’re trying to make a Netflix recommendation for something to watch.
One of the most popular applications of machine learning is in the form of regressions. Regressions are a type of machine learning problem where you’re trying to figure out how one variable (the predictor variable) affects another variable (the outcome variable).


By reading (and writing, in my case) this article, you and I have taken an important first step in learning about machine learning, supervised and unsupervised learning, and regression. In the next few articles, we’re going to delve even further into these concepts, building toward a comprehensive understanding of how machine learning works.

So far, you’ve learned about the importance of data, the different types of learning algorithms, and the role of regression. By the end of this series, I hope we both have a solid grasp of the fundamentals of machine learning and are ready to start applying them to our real-world problems.


Thank you for reading! I hope that you have enjoyed learning about machine learning as much as I have. Keep up with the next articles.
