DATA SCIENCE vs MACHINE LEARNING

#python #numpy #pandas #life

Which one is geared toward what type of work?
Is there a big difference?

How much Calculus and Trig does each require?
What type of Math/Sci should be known?

Top comments (3)

Matt Curcio • Sep 23 '20

Usually, if you look up these terms, you will find some vague Venn diagram.
Something like:

{A.I. {Data Science {Machine Learning tools} } }

Over-generalizing for a moment, my (biased) opinion is:

Artificial Intelligence is using computers to solve problems.
Data Science is a methodology for putting data together to understand it more clearly or provide some 'story'.
Machine Learning is a tool that may be used in Data Science. It uses math, much of it very advanced, to allow a computer or person to better predict some future set of circumstances. Think of a regression line: it gives you an estimate for every point along the line segment, i.e., given X, what is Y?
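To make the "given X, what is Y" idea concrete, here is a minimal sketch, assuming NumPy is installed. The x/y values are made-up toy numbers, and `predict` is a hypothetical helper name; the line itself is fitted with `np.polyfit` (ordinary least squares, degree 1).

```python
import numpy as np

# Hypothetical toy data, made up for illustration: y is roughly 2*x + 1 plus noise
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 2.9, 5.2, 7.0, 8.8, 11.1])

# Fit a straight line y = slope * x + intercept by least squares
slope, intercept = np.polyfit(x, y, deg=1)

def predict(x_new):
    """Hypothetical helper: given X, read Y off the fitted line."""
    return slope * x_new + intercept

# The fitted line answers the question for any X in the range,
# not just the points we actually measured
for x_new in (2.5, 4.2):
    print(f"x = {x_new}: predicted y = {predict(x_new):.2f}")
```

Most "real" ML models are far more elaborate, but the workflow is the same: fit parameters to known data, then use them to estimate Y for an X you have not seen.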

Frankly, I find some book definitions pedantic and boring. Suffice it to say that Data Science uses math and code (sneakily called algorithms) to bring out your idea(s). Putting many ideas together will hopefully provide insight. Many people associate "storytelling" with Data Science: your investigation of the data should have a logical flow that helps users get more useful information out of their data than a single statistic or number. You can use M.L. to investigate your data, which could in turn help you bring to life the story you think is important to share.

How's that for long and convoluted, lol?
HTH

Bret • Sep 23 '20

Thank you! I’ve been using React/Vue and recently got started with Python and Django. That got me interested in NumPy/Pandas for opportunities that are more direct, in a way. In web design, styling can be anything, and JavaScript is kind of all over the place at times compared to Python.

Matt Curcio • Sep 23 '20

If you are interested, look into 'Exploratory Data Analysis' to start. As the name suggests, it is the first step in looking at your data. It usually consists of boxplots (think: median, quartiles, spread, and outliers), histograms (distributions, data spread, and outliers), and then some housekeeping (what your data consists of: {categorical, nominal, numerical, or continuous} or {character, list, etc.} data types).
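A minimal sketch of those first EDA steps in pandas, assuming pandas and NumPy are installed. The DataFrame and its columns (`city`, `temp_c`, `visitors`) are hypothetical toy data. To avoid needing a plotting library, it computes the numbers a boxplot and a histogram would draw instead of drawing them; the 1.5 × IQR cutoff is the common boxplot whisker convention, used here as an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset, made up for illustration only
df = pd.DataFrame({
    "city": ["A", "B", "A", "C", "B", "A", "C", "B"],
    "temp_c": [21.0, 23.5, 22.1, 19.8, 24.0, 21.7, 35.0, 23.1],
    "visitors": [120, 340, 150, 90, 310, 130, 100, 300],
})

# Housekeeping: what data types is the table made of?
print(df.dtypes)
# Mark the text column as categorical
df["city"] = df["city"].astype("category")

# Numerical summary: count, mean, std, min, quartiles, max
print(df.describe())

# Boxplot logic without drawing: quartiles, IQR, and the 1.5 * IQR outlier rule
q1, q3 = df["temp_c"].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = df[(df["temp_c"] < lower) | (df["temp_c"] > upper)]
print("outliers:\n", outliers)

# Histogram logic without drawing: counts per bin show distribution and spread
counts, edges = np.histogram(df["temp_c"], bins=4)
print(counts, edges)
```

Once matplotlib is available, `df.boxplot()` and `df.hist()` draw the same information as pictures.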

I like Roger Peng's book at leanpub.com/exdata. It's a free and easy intro; it uses R rather than Python, but you can get the main ideas easily.

For the next step after that, let me know and I can suggest more.