This is an introductory blog on the mathematics and statistics needed for data science, with a simple guide and free resources where you can learn all the math concepts.
We all know that statistics and math concepts are necessary for data science and machine learning; math is at the heart of both fields. But the question is: how much math or statistics is required, and where can you learn it for free?
Well, this article will walk you through some of those concepts, and I have shared a bunch of resources for learning mathematics for data science. These are the resources I used when I started my own data science journey, and I really recommend you go through them and keep them handy even while solving data science problems.
Mathematics for data science can be divided into four parts:
1) Statistics (Descriptive and Inferential)
2) Linear Algebra
3) Probability
4) Calculus and Optimization
1) Statistics:
I cannot imagine data science without the evergreen field of statistics and its applications across industries and research fields. Basically, statistical methods help us summarise quantitative data and get insights out of it. It is not easy to gain insights just by looking at raw numerical data, unless you are a math genius!
Topics about Descriptive Statistics:
1) Mean, Median, Mode
2) IQR, percentiles
3) Standard deviation and Variance
4) Normal Distribution
5) Z-statistics and T-statistics
6) Correlation and linear regression
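Several of the descriptive measures above are one function call away in Python's standard library. Here is a minimal sketch with made-up sample data (the numbers are purely illustrative):

```python
import statistics

# Hypothetical sample: daily sales counts for eight days
data = [12, 15, 11, 15, 18, 21, 15, 9]

mean = statistics.mean(data)          # central tendency
median = statistics.median(data)      # middle value, robust to outliers
mode = statistics.mode(data)          # most frequent value
stdev = statistics.stdev(data)        # sample standard deviation
variance = statistics.variance(data)  # stdev squared

print(mean, median, mode)  # → 14.5 15 15
```

Note that `statistics.stdev` and `statistics.variance` compute the *sample* versions (dividing by n − 1); use `pstdev`/`pvariance` for the population versions.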
Topics about Inferential Statistics:
1) Sampling distributions
2) Confidence intervals
3) Chi-square test
4) Advanced regression
5) ANOVA
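To make the idea of a confidence interval concrete, here is a rough sketch of a 95% interval for a sample mean using the normal approximation (the sample values and the z critical value 1.96 are assumptions for illustration):

```python
import math
import statistics

# Hypothetical sample of eight measurements
sample = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0]
n = len(sample)

mean = statistics.mean(sample)
se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean

z = 1.96  # critical value for 95% confidence (normal approximation)
lower, upper = mean - z * se, mean + z * se
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

For small samples you would normally use a t critical value instead of 1.96, which is exactly where the T-statistics topic above comes in.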
Resources to learn statistics:
https://www.khanacademy.org/math/statistics-probability
https://www.youtube.com/c/joshstarmer
Complete statistics by Krish Naik — https://www.youtube.com/watch?v=LZzq1zSL1bs
Algebra by Freecodecamp — https://www.youtube.com/watch?v=xxpc-HPKN28
2) Linear algebra:
Linear algebra is the branch of mathematics that studies systems of equations, in one, two, or many dimensions. It helps us work with numerical data and solve for relations between two or more variables by establishing equations between them.
Linear algebra has a wide range of applications, such as statistics and matrix calculations, linear regression equations, descriptive statistics, image data representation and transformation, Fourier series, graphs, and neural networks.
Machine learning algorithms like linear regression and logistic regression use linear algebra to solve for the target variable from the inputs/attributes or feature vectors given in the dataset.
Resources to learn linear algebra:
Essence of linear algebra by 3Blue1Brown — (https://www.youtube.com/watch?v=fNk_zzaMoSs&list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab)
Algebra by FreeCodeCamp — (https://www.youtube.com/watch?v=LwCRRUa8yTU)
3) Probability:
Oh! What to say about probability, it’s everywhere! We all think in terms of chances: what are the chances of something happening in a given event?
There are certain types of probability that we should focus on:
1) Independent events probability
2) Dependent events probability
3) Conditional probability
Based on these, we try to estimate the likelihood of various outcomes. Sometimes we want graphical representations of probable outcomes, which we call probability density functions or density curves.
Concepts of probability help us estimate the expected value of given variables, interpret the confusion matrix in classification algorithms, compute information entropy, weigh the evidence of particular attributes in naive Bayes classification, and even run hypothesis tests in statistics. There are many more use cases than mentioned here; we will see them based on the application in upcoming blogs.
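Conditional probability and the naive Bayes idea both rest on Bayes' theorem. Here is a minimal sketch with the classic medical-test example (all the probabilities are made-up assumptions for illustration):

```python
# Bayes' theorem: P(disease | positive) = P(positive | disease) * P(disease) / P(positive)
p_disease = 0.01              # prior: 1% of the population has the disease
p_pos_given_disease = 0.95    # test sensitivity
p_pos_given_healthy = 0.05    # false-positive rate

# Law of total probability: overall chance of a positive test
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(f"P(disease | positive) = {p_disease_given_pos:.3f}")
```

Even with a 95%-accurate test, the posterior probability here stays well under 20% because the disease is rare, a classic example of why conditional probability is worth learning carefully.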
Resource link: https://www.khanacademy.org/math/statistics-probability
4) Calculus and Optimization:
Optimization is a subfield of mathematics concerned with optimizing an output based on given input variables. Any dataset has various input variables, and during training, machine learning algorithms sometimes overestimate or underestimate the output variable; in some cases the function's predictions are biased on the given dataset. To estimate the output and fit the model to the data well, the algorithm optimizes over the training dataset, iterating again and again to increase accuracy.
Function optimization involves three elements: the input to the function (e.g. x), the objective function itself (e.g. f()), and the output from the function (e.g. cost).
Input (x): The input to the function to be evaluated, e.g. a candidate solution.
Function (f()): The objective function or target function that evaluates inputs.
Cost: The result of evaluating a candidate solution with the objective function, to be minimized or maximized.
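The three elements above come together in gradient descent, the workhorse optimizer behind most model training. Here is a minimal sketch that minimizes a toy objective f(x) = (x − 3)², a stand-in for a real training loss (the learning rate and iteration count are assumptions):

```python
# Gradient descent: repeatedly step the input x against the gradient
# of the objective, so the cost shrinks toward the minimum at x = 3.
def f(x):
    return (x - 3) ** 2      # objective function (the "cost")

def grad(x):
    return 2 * (x - 3)       # derivative f'(x), from calculus

x = 0.0                      # initial candidate solution (the input)
lr = 0.1                     # learning rate (step size)
for _ in range(100):
    x -= lr * grad(x)        # move downhill along the gradient

print(round(x, 4))           # → 3.0, where the cost is minimal
```

This is exactly why calculus matters for machine learning: the gradient tells the algorithm which direction reduces the cost on each iteration.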
Resources to learn Calculus:
The essence of calculus — (https://www.youtube.com/watch?v=WUvTyaaNkzM)
Mathematics for machine learning by Edureka — (https://www.youtube.com/watch?v=1VSZtNYMntM)
Machine learning mastery — (https://machinelearningmastery.com/start-here/#optimization)
How to learn math by Tina Huang — (https://www.youtube.com/watch?v=5wMl5FM2swo)
That’s a wrap for this introductory blog on mathematics and statistics for data science.
If you want to contact me, here’s my e-mail: avikumar.talaviya@gmail.com
You can connect with me over Twitter and LinkedIn
Check out my GitHub profile for Data science and machine learning projects