DEV Community

Naftal Rainer

The Ultimate Guide to Getting Started in Data Science.

Data Science is an interdisciplinary field whose foundations lie in mathematics, computer science and business.
This makes it a broad discipline, and one that has grown rapidly in recent years.
(Image: Interdisciplinary fields, by Towards Data Science)

The definition of data science has changed over time as technology has advanced, and it also varies from company to company.

There is no single, definite path into data science. All it takes is the desire to pursue it and the will to get started; after that, passion and commitment will drive you to success, which is the end goal of any learning activity.

(Image: Multi-device connectivity in the big data era, by promptcloud.com)

In the current data era, vast amounts of data are generated and shared across many devices and in many formats. Data science enables people to harness these data and extract insights from them, with one main objective: data-based decision making, that is, making decisions that are supported by data as evidence (the opposite being intuition-based decision making).

As a beginner in this field, it is easy to be confused about what to learn or where to begin. This is a major setback, especially for self-taught programmers and data scientists, who may waste a lot of time or resources before finding the right track. Worst of all, they may hit a dead end and give up, which need not happen when there is proper guidance and mentorship.
This article gives a basic roadmap for acquiring the skills that are most relevant in data science, as listed below:

1.) Mathematics and statistics.

Mathematics is said to be the backbone of modern civilization and a remarkably efficient source of new concepts and tools for understanding “reality”.
It applies to almost every dimension of daily life, and several areas of mathematics come in handy when manipulating data. At this stage, one needs to become familiar with the following:

  1. Linear Algebra Concepts: such as vectors and matrices, linear combinations, linear dependence and independence, matrix transformations, inverse and transpose of a matrix.

  2. Calculus Concepts: such as derivatives, integrals, differential equations and series. Most of these can be computed using a programming language, but it's good to know the math behind them.
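As a rough illustration of the linear algebra and calculus concepts above, here is a minimal sketch using NumPy (assumed to be installed; the vectors, matrices and functions are arbitrary examples chosen for this sketch). It builds a linear combination, takes a transpose and an inverse, checks linear dependence via matrix rank, applies a rotation as a matrix transformation, and approximates a derivative and an integral numerically.

```python
import numpy as np

# Linear combination of two vectors: 2*v1 + 3*v2
v1 = np.array([1.0, 0.0])
v2 = np.array([0.0, 1.0])
combo = 2 * v1 + 3 * v2

# Transpose and inverse of a matrix
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
A_T = A.T
A_inv = np.linalg.inv(A)

# A multiplied by its inverse should be (numerically) the identity matrix
identity_check = np.allclose(A @ A_inv, np.eye(2))

# Linear (in)dependence: the columns are independent iff rank == number of columns
B = np.array([[1.0, 2.0],
              [2.0, 4.0]])  # second column is 2 * the first column
rank_A = np.linalg.matrix_rank(A)  # full rank: independent columns
rank_B = np.linalg.matrix_rank(B)  # rank 1: dependent columns

# Matrix transformation: rotate the vector (1, 0) by 90 degrees
R = np.array([[0.0, -1.0],
              [1.0, 0.0]])
rotated = R @ np.array([1.0, 0.0])


def derivative(f, x, h=1e-5):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


# d/dx x**2 at x = 3 (exact value: 6)
slope = derivative(lambda x: x ** 2, 3.0)

# Integral of x**2 from 0 to 1 (exact value: 1/3) via the trapezoidal rule
xs = np.linspace(0.0, 1.0, 1001)
ys = xs ** 2
area = float(np.sum((ys[1:] + ys[:-1]) / 2 * np.diff(xs)))

print(combo, rank_A, rank_B, rotated, round(slope, 4), round(area, 4))
```

In practice you would rarely hand-roll the derivative or integral, but writing them out once shows the math the library functions hide.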

In statistics, basic concepts such as descriptive statistics (e.g. mean, median and mode) and distributions (e.g. normal, Poisson and chi-square), among others, are necessary for deriving insights from data. An understanding of variance and standard deviation is also important for confidence interval estimation and hypothesis testing. Probability concepts are necessary for quantifying how likely events are and how outcomes vary.
As with calculus, statistical computations can be handled using a programming language, but it's good to know the math behind them as well as how to interpret the key factors and figures.
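The descriptive statistics and the confidence interval mentioned above can be computed with Python's built-in `statistics` module. This is a sketch on a small made-up sample; it uses the normal (z) approximation for the 95% interval, which is a simplifying assumption, since a t-distribution is usually preferred for samples this small.

```python
import statistics
from statistics import NormalDist

# A small, made-up sample (e.g. daily sales figures), for illustration only
sample = [12, 15, 15, 18, 20, 22, 15, 19, 21, 23]

# Descriptive statistics
mean = statistics.mean(sample)
median = statistics.median(sample)
mode = statistics.mode(sample)
variance = statistics.variance(sample)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(sample)      # square root of the sample variance

# Approximate 95% confidence interval for the mean (normal/z approximation)
z = NormalDist().inv_cdf(0.975)         # roughly 1.96
margin = z * std_dev / len(sample) ** 0.5
ci = (mean - margin, mean + margin)

# Probability: chance that a value from N(mean, std_dev) exceeds 25
p_above_25 = 1 - NormalDist(mean, std_dev).cdf(25)

print(f"mean={mean}, median={median}, mode={mode}, sd={std_dev:.2f}")
print(f"95% CI for the mean: ({ci[0]:.2f}, {ci[1]:.2f}); P(X > 25) = {p_above_25:.3f}")
```

Interpreting the output matters as much as computing it: the interval describes uncertainty about the mean, not the spread of individual values.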

2.) Programming Skills.

These skills are applied in general-purpose programming languages, which are used extensively to implement various algorithms. Several programming languages are used for data science, such as Python, R and MATLAB.
Python is the most popular of these and is widely used for statistical modelling.
In data science, Python is used in four main areas:

  1. Data collection - Python facilitates acquiring the data to be used in data science projects; libraries include Selenium and Scrapy.
  2. Data analysis - Python is used to clean and transform data to get insights; libraries include NumPy and pandas.
  3. Data visualization - Python creates visualizations from data, such as bar plots and pie charts; libraries include Matplotlib and seaborn.
  4. Model building - this covers machine learning and the application of mathematical concepts, with libraries such as scikit-learn and TensorFlow.
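To make the data analysis step concrete, here is a minimal pandas sketch on a tiny made-up dataset: it removes a duplicate row, drops a missing value, converts text to numbers, and then aggregates. The column names and values are hypothetical, and pandas is assumed to be installed.

```python
import pandas as pd

# Made-up raw data with typical problems: a duplicate row,
# a missing value, and numbers stored as text
raw = pd.DataFrame({
    "region": ["North", "South", "South", "East", "North"],
    "sales": ["100", "250", "250", None, "175"],
})

# Clean: drop exact duplicate rows, drop rows without a sales figure,
# then convert the sales column from text to numbers
clean = (
    raw.drop_duplicates()
       .dropna(subset=["sales"])
       .assign(sales=lambda df: pd.to_numeric(df["sales"]))
)

# Transform: total sales per region, largest first
summary = clean.groupby("region")["sales"].sum().sort_values(ascending=False)
print(summary)
```

Chaining the cleaning steps keeps the raw data untouched, so you can always compare the cleaned result against what you started with.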

Databases.

Besides Python, there are languages used to perform operations on different kinds of databases.
Structured Query Language (SQL) requires familiarity with the core concepts of relational databases and queries, including knowledge of:

  • Data Definition Language (DDL) commands such as CREATE, ALTER, DROP, TRUNCATE and RENAME.
  • Data Manipulation Language (DML) commands such as SELECT, INSERT, UPDATE and DELETE.
  • Data Control Language (DCL) statements such as GRANT and REVOKE.
  • Transaction Control Language (TCL) commands such as COMMIT and ROLLBACK.

You should also know how to join tables.
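The SQL command groups above can be tried without installing a database server by using SQLite through Python's built-in `sqlite3` module. This sketch (the tables, names and amounts are made up) runs DDL, DML and TCL statements and finishes with a join. Note that SQLite does not support every command listed: it has no GRANT/REVOKE (DCL) or TRUNCATE, so those are not shown.

```python
import sqlite3

# In-memory SQLite database; nothing is written to disk
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: define and alter tables
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
cur.execute("ALTER TABLE customers ADD COLUMN city TEXT")

# DML: insert and update rows
cur.executemany(
    "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
    [(1, "Amina", "Nairobi"), (2, "Brian", "Kampala")],
)
cur.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(1, 120.0), (1, 80.0), (2, 50.0)],
)
cur.execute("UPDATE orders SET amount = 60.0 WHERE customer_id = 2")

# TCL: COMMIT makes the changes above permanent
conn.commit()

# TCL: a change that is rolled back never takes effect
cur.execute("DELETE FROM orders")
conn.rollback()

# DML SELECT with a JOIN: total order amount per customer
rows = cur.execute(
    """
    SELECT c.name, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY total DESC
    """
).fetchall()
print(rows)
conn.close()
```

The rollback step is the easiest way to see transactions in action: the DELETE runs without error, yet the join still finds every order.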

There are also non-relational databases, also referred to as NoSQL databases, which store information in a dynamic, non-normalized and more flexible manner. MongoDB, Redis, Aerospike and Couchbase are examples of non-relational databases.
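The difference between normalized relational rows and flexible documents can be illustrated with plain Python dictionaries. This is not a real NoSQL client; it only mimics the document model used by stores such as MongoDB and Couchbase (key-value stores such as Redis work differently), and all records are made up.

```python
import json

# Relational style: customers and orders live in separate, normalized tables
customers = [{"id": 1, "name": "Amina"}, {"id": 2, "name": "Brian"}]
orders = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 1, "amount": 80.0},
]

# Reading Amina's orders relationally means matching rows across tables (a join)
amina_id = next(c["id"] for c in customers if c["name"] == "Amina")
relational_total = sum(o["amount"] for o in orders if o["customer_id"] == amina_id)

# Document style: one self-contained, denormalized record per customer.
# Documents in the same collection do not need identical fields.
collection = [
    {"_id": 1, "name": "Amina", "orders": [{"amount": 120.0}, {"amount": 80.0}]},
    {"_id": 2, "name": "Brian", "email": "brian@example.com"},  # no orders, extra field
]

# Reading Amina's orders needs no join: they are embedded in her document
amina = next(doc for doc in collection if doc["name"] == "Amina")
document_total = sum(order["amount"] for order in amina.get("orders", []))

# Documents map naturally onto JSON
print(json.dumps(collection[1]))
```

The trade-off is that embedded data is duplicated if it is shared, which is exactly what normalization in relational databases avoids.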

Data Visualization.


Data visualization is important for communicating findings to people and for exploratory data analysis. Tools include Power BI, Tableau and Python packages such as seaborn.
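Since Matplotlib was listed among Python's visualization libraries earlier, here is a minimal sketch of the bar plot and pie chart mentioned there (Matplotlib is assumed to be installed; the figures are made up). seaborn builds on Matplotlib and offers higher-level plotting functions for the same purpose.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen so the script also runs without a display
import matplotlib.pyplot as plt

# Made-up sales totals per region, for illustration only
regions = ["North", "South", "East"]
sales = [275, 250, 90]

fig, (bar_ax, pie_ax) = plt.subplots(1, 2, figsize=(8, 3.5))

# Bar plot: compare magnitudes across categories
bar_ax.bar(regions, sales, color="steelblue")
bar_ax.set_title("Sales by region")
bar_ax.set_ylabel("Sales")

# Pie chart: show each region's share of the total
pie_ax.pie(sales, labels=regions, autopct="%1.0f%%")
pie_ax.set_title("Share of total sales")

fig.tight_layout()

# Save to an in-memory PNG; fig.savefig("sales.png") would write a file instead
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```

A bar plot is usually easier to read than a pie chart when categories have similar sizes, so it is worth trying both during exploratory analysis.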

Finally, there are many online resources and sites that provide a basic and credible foundation in data science, some paid and some free, e.g. DataCamp, Simplilearn, IBM Data Science and YouTube.

For mentorship, Data Science East Africa has delivered well.

Conclusion.

(Image: Peak Moments, by Scratch.mit.edu)
Remember to document every step you complete successfully, i.e. write articles, which helps with mastering concepts. And as you reach the peak of your journey, be the mentor you wish you had to someone else who is at the early stages.

I hope you found this encouraging. If you spot any errors in this article, please mention them in the comments. 🧑🏻‍💻

Hope you have a happy coding time! 👋 🌱👨‍💻👩‍💻
