DEV Community

viola kinya kithinji

The Ultimate Guide to Getting Started in Data Science.

Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from noisy, structured and unstructured data, and applies that knowledge and those actionable insights across a broad range of application domains.

I will handle this topic using the following steps:

-Download and install Python and a text editor.
-Learn the data science process.
-Attempt your first data science project.
-Statistical learning.
-Data structures and algorithms.
-Machine learning.
-Attempt an advanced project.
-Learn more Python.
-Deep learning.

Downloading and installing Python and text editors
As a data scientist, you first need to be conversant with Excel, because it is widely used for data cleaning and sorting. Then go to your browser, download Python, run the installer and complete the installation. There are different environments for writing Python code, for example PyCharm, Visual Studio and the IDLE editor that ships with Python. There are also notebook platforms such as Google Colab (online) and Jupyter Notebook.

Learn the data science process
Data science belongs to the fourth industrial revolution:
1st revolution - mechanization led by the steam engine.
2nd revolution - mass production driven by electricity and oil-based power.
3rd revolution - automated production supported by electronics and information technologies.
4th revolution - information technologies, the Internet of Things, artificial intelligence, big data, cloud and cyber-physical systems.

Data scientists - also known as data managers and statisticians. A data scientist takes data projects from end to end: they help store large amounts of data, create predictive modelling processes and present the findings.
Data engineers - also known as database engineers and data architects. They use computer science to process large datasets, and focus on coding, cleaning up datasets and implementing requests that come from data scientists.
Data analysts - they help people across the company understand specific queries using charts.


Data visualization is the graphical representation of information and data. Using visual elements like charts, graphs and maps, data visualization tools provide an accessible way to see and understand trends, outliers and patterns in data. In Python, this work is done with libraries like pandas, NumPy, seaborn and matplotlib.

First data project

My first project was on data visualization. I downloaded a mental health dataset from data.world, then sorted and cleaned the data. For data analysis and visualization I used Jupyter Notebook and Google Colab, with Python libraries like pandas and matplotlib. You can look for sample projects in areas you already know well and work with them, or try the challenges on HackerRank.
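The sort-and-clean step can be sketched with nothing but the standard library. The rows below are made up for illustration (they are not taken from the dataset mentioned above), and the column names `country` and `age` are assumptions. With pandas, the same steps would roughly map to `read_csv`, `dropna` and `sort_values`.

```python
import csv
import io

# Hypothetical sample data; a real project would read the downloaded file.
RAW_CSV = """country,age
Kenya,29
Uganda,
Kenya,41
Tanzania,35
"""


def load_clean_rows(text):
    """Parse CSV text, drop rows with a missing age, and sort by age."""
    reader = csv.DictReader(io.StringIO(text))
    cleaned = []
    for row in reader:
        age = row["age"].strip()
        if not age:  # cleaning: skip incomplete rows
            continue
        cleaned.append({"country": row["country"].strip(), "age": int(age)})
    # sorting: youngest respondent first
    return sorted(cleaned, key=lambda r: r["age"])


rows = load_clean_rows(RAW_CSV)
for row in rows:
    print(row["country"], row["age"])
```

Keeping the cleaning in one small function makes it easy to rerun when a fresh copy of the dataset is downloaded.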


There are relational and non-relational databases.
Relational database - structured to recognize relations between stored items of information.
Non-relational database - provides a mechanism for storing and retrieving data that is modeled in ways other than the tabular relations used in relational databases.
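To make the difference concrete, here is a small sketch that uses Python's built-in `sqlite3` module for the relational side and a plain JSON document for the non-relational side. The table names (`authors`, `posts`) and the sample values are hypothetical, and a real document store would have its own API; the JSON string only stands in for one.

```python
import json
import sqlite3

# Relational: rows live in tables, and a key links one table to another.
# Table and column names here are made up for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, "
    "author_id INTEGER REFERENCES authors(id), title TEXT)"
)
conn.execute("INSERT INTO authors (id, name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO posts (author_id, title) VALUES (1, 'Intro to data')")

# A JOIN follows the relation between the two tables.
joined = conn.execute(
    "SELECT authors.name, posts.title FROM posts "
    "JOIN authors ON authors.id = posts.author_id"
).fetchall()
print(joined)
conn.close()

# Non-relational (document style): one nested record holds everything,
# so no join is needed to read an author's posts.
document = {"name": "Ada", "posts": [{"title": "Intro to data"}]}
stored = json.dumps(document)   # stand-in for what a document store keeps
restored = json.loads(stored)
print(restored["posts"][0]["title"])
```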

Statistical learning

Python has a built-in library for descriptive statistics, the statistics module. It can be used if your datasets are not too large or if you can't rely on importing other libraries. NumPy is a third-party library for numerical computing, optimized for working with single- and multi-dimensional arrays. At this step you choose and get started with Python statistics libraries, calculate descriptive statistics, work with 2D data and visualize data.
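As a minimal sketch of descriptive statistics with the built-in `statistics` module, the snippet below summarizes a small list. The sample values are made up for illustration.

```python
import statistics

# Hypothetical sample values for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(values)        # arithmetic average
median = statistics.median(values)    # middle value of the sorted data
mode = statistics.mode(values)        # most frequent value
spread = statistics.pstdev(values)    # population standard deviation

print(mean, median, mode, spread)
```

For larger or multi-dimensional data, NumPy offers equivalents such as `numpy.mean` and `numpy.std` that operate on arrays.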

Data structures and Algorithms

Data structures - containers that organize and group data according to type. They differ based on mutability and order; mutability refers to the ability to change an object after creation. There are two kinds: built-in data structures (lists, tuples, sets and dictionaries) and user-defined data structures (for example stacks and queues built on arrays, together with their condition checks).
Algorithms - an algorithm is a sequence of steps executed by a computer that takes an input and transforms it into a target output.
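The ideas above can be sketched in a few lines. The class names `Stack` and `Queue` and the `binary_search` function are illustrative choices, not taken from any particular library. The stack sits on a Python list; for the queue, `collections.deque` is swapped in for a plain array because removing from the front of a list is slow. Binary search is used as one example of an algorithm that turns an input (a sorted list and a target) into an output (an index).

```python
from collections import deque

# Built-in structures differ in mutability and order.
scores = [3, 1, 2]             # list: ordered, mutable
scores.append(4)               # allowed, the list changes in place
point = (1, 2)                 # tuple: ordered, immutable
tuple_is_immutable = False
try:
    point[0] = 9               # tuples reject item assignment
except TypeError:
    tuple_is_immutable = True
tags = {"ml", "stats", "ml"}   # set: duplicates collapse to one entry
ages = {"amy": 30}             # dict: maps keys to values


class Stack:
    """User-defined last-in, first-out structure backed by a list."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        if not self._items:    # condition check: nothing to pop
            raise IndexError("pop from empty stack")
        return self._items.pop()


class Queue:
    """User-defined first-in, first-out structure backed by a deque."""

    def __init__(self):
        self._items = deque()

    def enqueue(self, item):
        self._items.append(item)

    def dequeue(self):
        if not self._items:    # condition check: nothing to dequeue
            raise IndexError("dequeue from empty queue")
        return self._items.popleft()


def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if absent."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1      # target can only be in the upper half
        else:
            high = mid - 1     # target can only be in the lower half
    return -1


stack = Stack()
stack.push("a")
stack.push("b")
print(stack.pop())             # last item pushed comes out first

queue = Queue()
queue.enqueue("a")
queue.enqueue("b")
print(queue.dequeue())         # first item enqueued comes out first

print(binary_search([1, 3, 5, 7], 5))
```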

Machine learning

Machine learning means making the computer learn from studying data and statistics. It is a step in the direction of artificial intelligence (AI). A machine learning program analyzes data and learns to predict the outcome. A first machine learning project involves installing Python, loading, summarizing and visualizing a dataset, evaluating some algorithms and making some predictions.
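A tiny sketch of the learn, evaluate and predict loop follows. It uses simple linear regression fitted by ordinary least squares as one example algorithm, written by hand so it runs without extra libraries; in practice a library such as scikit-learn would usually be used. The data (hours studied against exam score) and the function names are made up for illustration.

```python
# Hypothetical data: hours studied -> exam score.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [52.0, 54.0, 56.0, 58.0]
test_x = [5.0, 6.0]
test_y = [60.0, 62.0]


def fit_line(xs, ys):
    """Learn a slope and intercept from data by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned line to a new input."""
    slope, intercept = model
    return slope * x + intercept


def mean_absolute_error(model, xs, ys):
    """Evaluate the model on data it did not see during fitting."""
    errors = [abs(predict(model, x) - y) for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


model = fit_line(train_x, train_y)                 # learn from the data
error = mean_absolute_error(model, test_x, test_y)  # evaluate
prediction = predict(model, 7.0)                    # predict a new outcome
print(model, error, prediction)
```

Holding back `test_x` and `test_y` from the fitting step is what makes the evaluation meaningful: the model is scored on inputs it has not seen.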

Advanced project

By now you should be conversant with Python's built-in libraries and some external ones, and know how to install external libraries and work with them. Below are some projects whose source code you can look up and use for an advanced project:
-Digital clock GUI.
-Get desktop notifications with Python.
-Use your phone camera for computer vision.
-Music player GUI.
-Image converter GUI.
-Weight converter GUI.
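As a starting point for the weight converter, here is a sketch of the conversion logic only, with no GUI attached. The function names are hypothetical; a GUI toolkit such as Tkinter could call `convert` from a button handler.

```python
KG_PER_POUND = 0.45359237  # one pound in kilograms, by definition


def pounds_to_kg(pounds):
    """Convert a weight in pounds to kilograms."""
    return pounds * KG_PER_POUND


def kg_to_pounds(kilograms):
    """Convert a weight in kilograms to pounds."""
    return kilograms / KG_PER_POUND


def convert(value, from_unit):
    """Convert between 'kg' and 'lb'; return (converted value, new unit)."""
    if from_unit == "kg":
        return kg_to_pounds(value), "lb"
    if from_unit == "lb":
        return pounds_to_kg(value), "kg"
    raise ValueError(f"unknown unit: {from_unit!r}")


amount, unit = convert(70, "kg")
print(f"70 kg is about {amount:.1f} {unit}")
```

Keeping the arithmetic separate from the interface means the same functions can be tested on their own and reused in a command-line or web version.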

Learn more python

Python keeps evolving, so you have to keep coding and discovering new techniques. Programming languages demand consistency and commitment. Don't get too comfortable in one area; explore the whole language and try to understand every bit of it.

Deep Learning

Deep learning is a subset of machine learning that trains a computer to perform human-like tasks such as image identification and making predictions. It improves the ability to classify, recognize, detect and describe using data. Its role is to process both unlabeled and unstructured data. This learning method also creates more complex statistical models: with each new piece of data, the model becomes more complex.


That was a brief walk through the ultimate guide to getting started in data science. In your free time, go through the steps in depth and make sure you understand precisely what happens in each one. Your mindset and consistency are key.
