DEV Community

Beatriz Gomes


Fitting a linear regression model using numpy

Context

As a Python beginner, I try to learn from different sources and with different methods. I was reading a Brazilian blog (estatsite.com.br) and found an interesting post about the linear regression model, but it was written in R.

So I took its example and ported it to Python. As soon as I ran into problems with the sklearn package, I found a NumPy function instead: numpy.polyfit(). The official documentation is very clear and concise.

What do we want?

We want to check whether students get better grades when they receive more money, using a linear regression model.

A quick review of the official documentation

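According to it, numpy.polyfit() performs a least-squares polynomial fit. Consider a polynomial p of degree deg:

$$p(x) = p_0 x^{deg} + p_1 x^{deg-1} + \dots + p_{deg}$$

The function returns the vector of coefficients p that minimizes the squared error, ordered from the highest power to the lowest: deg, deg-1, ..., 0.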
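To make the coefficient ordering concrete, here is a minimal sketch on made-up points that lie exactly on the line y = 2x + 1. The data and the names x_demo, y_demo, coeffs, and y_hat are illustrative assumptions, not part of the original example.

```python
import numpy as np

# Illustrative data (an assumption, not from the post): points on y = 2x + 1
x_demo = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_demo = 2.0 * x_demo + 1.0

# deg=1 returns two coefficients, highest power first: [slope, intercept]
coeffs = np.polyfit(x_demo, y_demo, 1)
slope, intercept = coeffs

# np.polyval evaluates the polynomial using the same coefficient ordering
y_hat = np.polyval(coeffs, x_demo)

print("slope:", slope, "intercept:", intercept)
```

Because the first coefficient belongs to the highest power, unpacking a degree-1 fit as slope, intercept matches the usual y = m*x + b form.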

The general syntax is:

numpy.polyfit(x, y, deg, rcond=None, full=False, w=None, cov=False).

The parameters x, y, and deg are required; the others are optional.

The squared error that is minimized is:

$$E = \sum_{j=0}^{k} |p(x_j) - y_j|^2$$
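As a rough check of what this error means, the sketch below computes E by hand and compares it with the sum of squared residuals that numpy.polyfit() reports when called with full=True. The noisy data points are made up for illustration and are not from the original example.

```python
import numpy as np

# Illustrative noisy data (an assumption, not from the post)
x_demo = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_demo = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# full=True also returns diagnostics; the second item holds the
# sum of squared residuals of the least-squares fit
coeffs, residuals, rank, singular_values, rcond = np.polyfit(
    x_demo, y_demo, 1, full=True
)

# E = sum over j of |p(x_j) - y_j|^2, computed by hand
squared_error = np.sum((np.polyval(coeffs, x_demo) - y_demo) ** 2)

print("E by hand:     ", squared_error)
print("E from polyfit:", residuals[0])
```

The two printed values should agree up to floating-point rounding, and any other choice of coefficients would give a larger E on the same points.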

The curious reader should consult the official documentation for the details.

Code

Hypothesis: earning more money improves students' performance.

# importing manipulation lib to visualize data
import pandas as pd

# importing numerical processing lib
import numpy as np

# importing visualization lib
import matplotlib.pyplot as plt

# creating a dataframe
alunos = pd.DataFrame({
    'André': {'Mesada': 36.02, 'Nota_Testao': 48},
    'Joao': {'Mesada': 11.83, 'Nota_Testao': 25},
    'Bia': {'Mesada': 22.0, 'Nota_Testao': 43},
    'Ana': {'Mesada': 24.0, 'Nota_Testao': 39},
    'José': {'Mesada': 100.0, 'Nota_Testao': 60},
    'Vinicius': {'Mesada': 10.0, 'Nota_Testao': 40},
    'Tulio': {'Mesada': 20.0, 'Nota_Testao': 48},
    'Josué': {'Mesada': 25.0, 'Nota_Testao': 47},
    'Antonella': {'Mesada': 22.0, 'Nota_Testao': 43},
})

# transposing so that each student is a row, then displaying the dataframe
alunos = alunos.T
alunos

# converting the Series objects into numpy arrays
x = np.array(alunos['Nota_Testao'])
y = np.array(alunos['Mesada'])

The column Mesada (the allowance) is our y, the dependent variable. The column Nota_Testao (the test grade) is our x, the predictor.

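print('Independent variable: ', x)
print('Dependent variable: ', y)

# plotting the data points
plt.plot(x, y, 'o')

# fitting a degree-1 polynomial: m is the slope, b is the intercept
m, b = np.polyfit(x, y, 1)

# drawing the fitted line over the points
plt.plot(x, m * x + b)
plt.xlabel("Nota")
plt.ylabel("Mesada (R$)")
plt.show()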
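The hypothesis is about money influencing grades, so it may also be worth fitting the regression in that direction, with the grade as the dependent variable and the allowance as the predictor. The sketch below does that with the same values as the alunos DataFrame and adds an R² computation as a rough measure of fit. The names mesada, nota, predicted, and r_squared, and the R² step itself, are my own additions and not part of the original example.

```python
import numpy as np

# Values copied from the alunos DataFrame (same student order)
mesada = np.array([36.02, 11.83, 22.0, 24.0, 100.0, 10.0, 20.0, 25.0, 22.0])
nota = np.array([48, 25, 43, 39, 60, 40, 48, 47, 43], dtype=float)

# Grade as the dependent variable, allowance as the predictor
slope, intercept = np.polyfit(mesada, nota, 1)

# R^2 = 1 - SS_res / SS_tot: share of the grade variance explained by the line
predicted = slope * mesada + intercept
ss_res = np.sum((nota - predicted) ** 2)
ss_tot = np.sum((nota - nota.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print(f"nota = {slope:.3f} * mesada + {intercept:.3f}")
print(f"R^2 = {r_squared:.3f}")
```

With only nine students and one very large allowance (José, R$ 100), the fitted slope is sensitive to that single point, so the result should be read with caution.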

Thank you for reading. I'm open to suggestions, and I apologize for any mistakes. My GitHub is github.com/biangomes.
