
Eric Leung


Explanation for the notation of linear models

In the Elements of Statistical Learning book, the chapter on Supervised Learning describes the equation for a linear model in vector form (Eq 2.1) and in matrix form (Eq 2.2). Both are equivalent.

A vector is just another word for a one-dimensional array.

We can write a vector horizontally with parentheses, like

$$A = (1, 5, 7, 3, 2)$$

but we can also write it vertically, in a more matrix-like form with square brackets:

$$A = \begin{bmatrix} 1 \\ 5 \\ 7 \\ 3 \\ 2 \end{bmatrix}$$
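
If it helps to see this in code, here is a minimal sketch using NumPy (my choice of library for illustration; the book itself isn't tied to any language):

```python
import numpy as np

# A vector written "horizontally": a plain one-dimensional array
A = np.array([1, 5, 7, 3, 2])
print(A.shape)  # (5,)

# The same values written "vertically": a 5x1 column vector
A_col = A.reshape(-1, 1)
print(A_col.shape)  # (5, 1)
```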

Now we reach Equation 2.1 below.

$$\hat{Y} = \hat{\beta_0} + \sum_{j=1}^p X_j \hat{\beta_j}$$

This equation is a fancy way to write an equation you might see for linear regression, like this:

$$Y = -23 + 3x_1 + 3.6x_2$$
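
As a quick sketch of what that concrete equation does, here it is in NumPy with some made-up predictor values (the x values below are hypothetical; only the coefficients come from the example above):

```python
import numpy as np

# Coefficients from the example equation: Y = -23 + 3*x1 + 3.6*x2
beta_0 = -23.0
beta_hat = np.array([3.0, 3.6])

# Hypothetical predictor values, made up for illustration
x = np.array([2.0, 5.0])

# The prediction is the intercept plus the sum of each x_j times its coefficient
y_hat = beta_0 + x @ beta_hat
print(y_hat)  # -23 + 3*2 + 3.6*5 = 1.0
```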

Now going back to our Equation 2.1, what do these variables mean? Here's the equation again.

$$\textcolor{orange}{\hat{Y}} = \hat{\beta_0} + \sum_{j=1}^p X_j \hat{\beta_j}$$

First, $\hat{Y}$ is the dependent variable value, or output vector. It contains the values we want to predict. The caret symbol ^ here is called a "hat", and it marks a hypothetical value that comes from our model's prediction rather than an observed one. So we read $\hat{Y}$ as "Y hat".

$$\hat{Y} = \textcolor{orange}{\hat{\beta_0}} + \sum_{j=1}^p X_j \hat{\beta_j}$$

The next variable we run into is $\beta_0$. This is "the intercept, also known as the bias in machine learning."

On an X-Y plot like the ones you saw in grade school, this is where the line crosses the vertical y-axis. Another way to think about the intercept is as the baseline value of your dependent variable when all of your independent variable predictors are zero.
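
For example, if we set both predictors in the earlier example equation to zero, only the intercept is left:

$$Y = -23 + 3(0) + 3.6(0) = -23$$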

$$\hat{Y} = \hat{\beta_0} + \textcolor{orange}{\sum_{j=1}^p} X_j \hat{\beta_j}$$

Now, let's switch gears and talk about some notation. There is a large "E"-looking symbol with some numbers and letters attached, $\sum_{j=1}^p$. We call this symbol "sigma", and it is a fancy way to say "add these things up": start at $j = 1$, go up to $j = p$, and sum each term along the way.
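
To make the sigma concrete, here is a small sketch (in NumPy, with made-up values) showing that the summation is just a loop, which is the same thing as a dot product:

```python
import numpy as np

# Hypothetical values for p = 3 predictors and their estimated coefficients
x = np.array([2.0, 5.0, 1.0])          # X_1, X_2, X_3
beta_hat = np.array([3.0, 3.6, -0.5])  # beta_1, beta_2, beta_3

# The sigma written out as a loop: add up X_j * beta_j for j = 1 to p
total = 0.0
for j in range(len(x)):
    total += x[j] * beta_hat[j]

# The same sum as a dot product
assert np.isclose(total, x @ beta_hat)
print(total)  # 2*3 + 5*3.6 + 1*(-0.5) = 23.5
```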

What are we adding up, and how? The term $X_j \hat{\beta_j}$ is actually a matrix representation: for a single observation, the predictor values $X_j$ form a row of the matrix below, and each $\hat{\beta_j}$ is a single value in the coefficient column vector.

$$X \hat{\beta} = \begin{pmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,n} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m,1} & x_{m,2} & \cdots & x_{m,n} \end{pmatrix} \begin{pmatrix} \beta_{1} \\ \beta_{2} \\ \vdots \\ \beta_{n} \end{pmatrix}$$

Similarly, the intercept $\hat{\beta_0}$ gets added to every entry of that product, so you can think of it as a column vector that repeats the same value:

$$\begin{pmatrix} \hat{\beta_0} \\ \hat{\beta_0} \\ \vdots \\ \hat{\beta_0} \end{pmatrix}$$

Multiplying and adding everything together produces a single column vector, $\hat{Y}$.
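
Here is a minimal NumPy sketch of that matrix form. The coefficients come from the earlier example equation; the data matrix is made up for illustration:

```python
import numpy as np

# Hypothetical data: m = 4 observations (rows), n = 2 predictors (columns)
X = np.array([
    [2.0, 5.0],
    [0.0, 0.0],
    [1.0, 1.0],
    [4.0, 2.0],
])

# Coefficients from the example equation: Y = -23 + 3*x1 + 3.6*x2
beta_0 = -23.0
beta_hat = np.array([3.0, 3.6])

# X @ beta_hat is the matrix-vector product; beta_0 gets added to every row
y_hat = beta_0 + X @ beta_hat
print(y_hat)  # [  1.  -23.  -16.4  -3.8]
```

Each entry of `y_hat` is exactly what you would get by plugging that row's values into the scalar equation one at a time.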

We do all of this so that we can run an equation like

$$Y = -23 + 3x_1 + 3.6x_2$$

over and over again across multiple sets of values, which are encoded as rows in the matrix $X$ above.
