What Should I Learn in Python for Data Science?
I am assuming that you are a complete beginner in Python, so I will start the learning path from scratch. For convenience, I have divided the whole learning path into steps so that you can move forward one step at a time and reach your learning goal.
I will also mention resources for learning each topic, so this article can serve as a complete guide.
But before starting the learning path, I would like to discuss why Python is good for Data Science. You probably know this already, so I will not bore you with a detailed explanation. I will only explain why I like Python for Data Science.
Why Is Python Good for Data Science?
The most appealing quality of Python is that anyone who wants to learn it, even a beginner, can do so quickly and easily. Compared with some other languages used for data analysis, such as R, Python also tends to scale better.
Most importantly, Python has a wide variety of data analysis and data science libraries: pandas, NumPy, SciPy, statsmodels, and scikit-learn.
Python also has a huge community, so there are plenty of people who can help you when you get stuck.
These are the main reasons I prefer Python for data science.
Now, let’s move on to your question, “What Should I Learn in Python for Data Science?”, and start with the first step.
Step 1- Learn Python Basics
Start by learning the Python basics. When I say “Python basics”, you might be wondering which exact topics you have to learn.
Right?
Don’t worry, I am going to list the topics you need to learn in this step, so you will not be confused about what to cover.
Under Python basics, learn the following topics:
- Installing Python & Python Environment
- Numbers
- Strings
- Lists
- Standard Input and Output
- Basic commands in Python
- If-elif-else statements
- Loops
- Functions
- Variable Scope
- Dictionaries
- Sets
- Classes
- Methods & Attributes
- Modules & Packages
- List Comprehension
- Map, Filter, and Lambda
- Decorators
- File Handling
The list is long, but these topics are easy to grasp. You can learn them within a week if you plan your learning carefully.
Now that I have discussed the topics, it’s time to look at the resources for learning Python basics.
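To give a feel for several of these topics together, here is a small, self-contained sketch that touches lists, dictionaries, sets, a loop with an if-else statement, functions, a decorator, a list comprehension, `map` and `filter` with `lambda`, a class with an attribute and a method, and basic file handling. The names `log_call`, `Dataset`, and `numbers.txt` are hypothetical, made up for illustration; treat it as a quick preview rather than a complete lesson.

```python
from functools import wraps


# Hypothetical decorator: adds behavior (printing a message) around each call
def log_call(func):
    @wraps(func)  # keeps the original function's name and docstring
    def wrapper(*args, **kwargs):
        print(f"Calling {func.__name__}")
        return func(*args, **kwargs)
    return wrapper


# A plain function, decorated with the hypothetical log_call decorator
@log_call
def square(x):
    return x * x


# Hypothetical class with one attribute and one method
class Dataset:
    def __init__(self, values):
        self.values = values  # attribute

    def mean(self):  # method
        return sum(self.values) / len(self.values)


numbers = [1, 2, 3, 4, 5]                              # list
squares = [square(n) for n in numbers]                 # list comprehension
evens = list(filter(lambda n: n % 2 == 0, numbers))    # filter + lambda
doubled = list(map(lambda n: n * 2, numbers))          # map + lambda
unique = set([1, 1, 2, 3])                             # a set drops duplicates
ages = {"alice": 30, "bob": 25}                        # dictionary

# Loop over the dictionary with an if-else statement
for name, age in ages.items():
    if age >= 30:
        print(f"{name} is 30 or older")
    else:
        print(f"{name} is under 30")

# File handling: write the numbers to a file, then read them back
with open("numbers.txt", "w") as f:
    f.write("\n".join(str(n) for n in numbers))
with open("numbers.txt") as f:
    loaded = [int(line) for line in f]

print(squares, evens, doubled, unique)
print(Dataset(numbers).mean())
```

Each line maps to one item in the list above, so you can come back to it as you work through the topics one by one.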
- Python for Everybody Specialization - Coursera
- Introduction to Python Programming - Udacity (Free Course)
- The Python Tutorial - python.org
- Python Tutorial - MLTUT
- Introduction To Python Programming - Udemy, by Avinash Jain
Step 2- Learn Python Libraries for Data Science
Python has a rich set of libraries for data science tasks. In this step, you will learn about these libraries.
A library is a collection of pre-written functions and objects. You can import a library into your script to save time.
If you want to know the differences between Python modules, packages, libraries, and frameworks, see the article below:
Differences Between Python Modules, Packages, Libraries, and Frameworks
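Whichever term is used, you bring the code into your script with an `import` statement. A minimal sketch, assuming NumPy is already installed (for example with `pip install numpy`):

```python
import math                       # a module from Python's standard library
from collections import Counter   # one class imported from a standard-library module
import numpy as np                # a third-party library, imported under the conventional alias "np"

word_counts = Counter(["data", "science", "data"])
root = math.sqrt(16)
total = np.sum([1, 2, 3])

print(word_counts.most_common(1))  # [('data', 2)]
print(root, total)
```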
Python has the following libraries for data science:
NumPy- NumPy helps you perform fast numerical operations on arrays of data. Data does not always arrive in numeric form, so NumPy is also handy for converting values such as numeric strings into numeric arrays.
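A minimal NumPy sketch, assuming a small made-up list of numbers stored as strings (as often happens when values are read from a text file). It converts them to a floating-point array and then applies vectorized operations, which work on every element at once without a Python loop:

```python
import numpy as np

# Made-up values that arrived as strings
raw = ["3.5", "4.0", "2.5", "5.0"]
values = np.array(raw).astype(float)  # convert the strings to floating-point numbers

# Vectorized operations apply element-wise to the whole array
scaled = values * 10
centered = values - values.mean()

print(values)
print(scaled)
print(values.mean(), values.std())
```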
Pandas- pandas is an open-source data analysis and manipulation tool. With pandas, you can work with DataFrames, which are two-dimensional tables with labeled rows and columns, similar to a spreadsheet in Excel.
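A minimal sketch of common DataFrame operations on a small made-up sales table. With real data you would typically load the table from a file with `pd.read_csv` instead of building it by hand:

```python
import pandas as pd

# Made-up data: one row per sale
df = pd.DataFrame({
    "region": ["North", "South", "North", "East"],
    "sales": [250, 300, 150, 400],
})

print(df.head())                            # first rows, like peeking at a spreadsheet

high_sales = df[df["sales"] > 200]          # filter rows with a boolean condition
totals = df.groupby("region")["sales"].sum()  # total sales per region

print(high_sales)
print(totals)
```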
Matplotlib- Matplotlib lets you draw graphs and charts of your findings. Results can be hard to interpret in tabular form, so turning them into a graph is important, and Matplotlib helps you do that.
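A minimal bar-chart sketch using made-up monthly sales numbers. It selects the non-interactive `Agg` backend so the script runs without a display and saves the chart to a hypothetical file, `sales.png`; in a notebook you would usually display the figure instead.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend: render to files, no window needed
import matplotlib.pyplot as plt

# Made-up data for illustration
months = ["Jan", "Feb", "Mar"]
sales = [120, 150, 90]

fig, ax = plt.subplots()
ax.bar(months, sales)                          # one bar per month
ax.set_xlabel("Month")
ax.set_ylabel("Sales")
ax.set_title("Monthly sales (illustrative data)")
fig.savefig("sales.png")                       # write the chart to an image file
```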
Scikit-Learn- Scikit-Learn is one of the most popular machine learning libraries in Python. It provides many machine learning algorithms, along with modules for pre-processing, cross-validation, and more.
So, these are the four libraries you need to learn in this step. Now, let’s look at the resources for learning them.
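The sketch below shows the pre-processing and cross-validation pieces working together. It assumes the small Iris dataset that ships with scikit-learn and a logistic regression model, both chosen only for illustration. Putting `StandardScaler` and the model into one pipeline means the scaling is fit only on the training part of each fold, so no information leaks from the validation part.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Pre-processing (scaling) and the model chained into a single estimator
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 5-fold cross-validation: one accuracy score per fold
scores = cross_val_score(model, X, y, cv=5)

print(scores)
print(f"Mean accuracy: {scores.mean():.3f}")
```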
Resources for Learning Python Libraries
- NumPy Tutorial by freeCodeCamp
- Exploratory Data Analysis With Python and Pandas (Guided Project)
- Applied Data Science with Python Specialization by the University of Michigan
- NumPy user guide
- pandas documentation
- Matplotlib Guide
- scikit-learn Tutorial
Step 3- Learn Basic Statistics with Python
Statistical knowledge is essential for data science. It helps you understand your data and decide which algorithm suits a given problem.
Useful statistics topics include statistical tests, probability distributions, and maximum likelihood estimation. All of them come up regularly in data science.
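To make these ideas concrete, here is a small sketch using SciPy's `scipy.stats` module. It assumes simulated data: two groups drawn from normal distributions with made-up means. It estimates the parameters of a normal distribution for one group by maximum likelihood with `stats.norm.fit`, then runs a two-sample t-test with `stats.ttest_ind` to check whether the group means differ.

```python
import numpy as np
from scipy import stats

# Simulated data (made up for illustration), with a fixed seed for reproducibility
rng = np.random.default_rng(seed=0)
group_a = rng.normal(loc=50, scale=5, size=100)
group_b = rng.normal(loc=52, scale=5, size=100)

# Maximum likelihood estimates of the normal distribution's mean and standard deviation
mu_hat, sigma_hat = stats.norm.fit(group_a)

# Two-sample t-test: are the means of the two groups different?
result = stats.ttest_ind(group_a, group_b)

print(f"MLE mean={mu_hat:.2f}, std={sigma_hat:.2f}")
print(f"t={result.statistic:.2f}, p={result.pvalue:.4f}")
```

For a normal distribution, the maximum likelihood estimates are simply the sample mean and the sample standard deviation computed with `n` in the denominator, which is why they match `group_a.mean()` and `group_a.std()`.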
statsmodels is a popular Python library for building statistical models. It is built on top of NumPy and SciPy, works closely with pandas, and contains advanced functions for statistical testing and modeling.
Resources to learn Statistics with Python
- Practical Statistics - Udacity
- Statistics with Python Specialization - University of Michigan
- Fitting Statistical Models to Data with Python - Coursera
- Statistics Fundamentals with Python - DataCamp
- Learn Statistics with Python - Codecademy
Step 4- Learn to Access Databases
You should know how to store and manage your data in a database. Databases are usually queried with SQL, and it is also good to know how to connect to them from Python.
MySQLdb is an interface for connecting to a MySQL database server from Python. It implements the Python Database API v2.0 and is built on top of the MySQL C API.
PyMySQL is another option. It is also an interface for connecting to a MySQL database server from Python. It implements the Python Database API v2.0 and contains a pure-Python MySQL client library.
The goal of PyMySQL is to be a drop-in replacement for MySQLdb.
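Because MySQLdb and PyMySQL both implement the Python Database API v2.0 (PEP 249), working with them follows the same connect, cursor, execute, fetch pattern. The sketch below swaps in Python's built-in `sqlite3` module, which follows the same API, so it runs without a MySQL server. The `students` table and its rows are hypothetical.

```python
import sqlite3

# An in-memory SQLite database stands in for a MySQL server in this sketch
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE students (name TEXT, score REAL)")

# Parameterized queries: never build SQL strings by hand from user data
rows = [("Asha", 88.5), ("Ben", 72.0), ("Chen", 95.0)]
cur.executemany("INSERT INTO students (name, score) VALUES (?, ?)", rows)
conn.commit()

cur.execute("SELECT name, score FROM students WHERE score > ? ORDER BY score DESC", (80,))
top = cur.fetchall()
print(top)  # [('Chen', 95.0), ('Asha', 88.5)]

cur.close()
conn.close()
```

With PyMySQL, the connection line would instead look like `pymysql.connect(host=..., user=..., password=..., database=...)`, and placeholders are written as `%s` rather than `?`; check the PyMySQL documentation for the options your server needs.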
Resources to learn MySQLdb and PyMySQL
Step 5- Build Your First Machine Learning Model with scikit-learn
scikit-learn is an open-source machine learning library for Python. It comes with many useful machine learning algorithms built in and ready for you to use.
Now you need to experiment with different machine learning algorithms.
Pick a machine learning problem, get a dataset, apply different machine learning algorithms, and find out which algorithm gives the most accurate results.
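Here is a minimal sketch of that workflow. It assumes the small Iris dataset that ships with scikit-learn as the problem and three common classifiers as the candidates, all chosen only for illustration. It holds out 25% of the data for testing and compares test accuracy; on a real problem you would pick your own dataset, evaluation metric, and models.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Hold out 25% of the data; stratify keeps class proportions equal in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=42),
    "k_nearest_neighbors": KNeighborsClassifier(n_neighbors=5),
}

# Train each model on the training split and score it on the held-out split
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

# Print the models from most to least accurate
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

A single train/test split can be noisy on a small dataset, so cross-validation is a more reliable way to compare models once you are comfortable with the basics.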
Step 6- Practice, Practice, and Practice
At this step, you need to practice as much as you can. The best way to practice is to take part in competitions, which will make you even more proficient in data science.
When it comes to data science competitions, Kaggle is one of the most popular platforms. Kaggle hosts many competitions, so you can choose ones that match your knowledge level.
You can start with a beginner-level competition such as Titanic - Machine Learning from Disaster, and as you gain confidence, move on to more advanced competitions.
So, these are the steps to learn Python for Data Science. If you follow them and build the skills each step covers, you will have a solid foundation in Python for Data Science.
Some Other Free Resources
Python
- Corey Schafer
- Sentdex
Machine Learning with Maths, Statistics and Linear Algebra
Natural Language Processing
Deep Learning
Data Science Projects
Blogs that are freely available
- [Feature Engineering Playlist](https://github.com/aikho/awesome-feature-engineering)
- [Feature Selection Playlist](https://github.com/anujdutt9/Feature-Selection-for-Machine-Learning)
These are a few of the resources I use to learn data science, and most of them are available for free.
If you know of other useful resources, feel free to share them; they may help others learn.
Conclusion
I hope this answers your question, “What Should I Learn in Python for Data Science?” If you have any doubts or queries, feel free to ask me in the comment section.
Happy Learning!