DEV Community

Abhishek Iyer

Data Science Foundation: 1. EVERYTHING you NEED to know about DATA SCIENCE

An Introduction about myself

Hello! I am Abhishek Iyer, an Electronics Engineering graduate transitioning into the field of Data Science and Data Analytics. I've picked up quite a few things over the past couple of months that I believe will help you achieve your learning goals in this field. I intend to make these blogs as comprehensive and beginner-friendly as possible.

My motivation

Like a lot of beginners out there, I was once overwhelmed by the sheer amount of information available on the internet about Data Science. I am writing this series for everyone looking for structured documentation in one place.
I also believe that to learn a complicated topic in depth, I need to teach it. So here I am, teaching the things I've learnt.

The format and the Audience of the Series:

I will loosely follow the contents of the Data Science: ML specialist course and the book "Data Science from Scratch" by Joel Grus. I highly recommend this course to anyone starting off.
I will try to write each article in this series from the perspective of a beginner Python developer, so the series won't be suitable for absolute beginners with no programming experience.

Comments, feedback, and corrections on these posts are very welcome.

Now that the scene is set, let's dive in. Future articles will have less introduction and more content.

The Origin and History of Data Science

The popularity of Data Science and Data Analysis has grown rapidly over the past several years due to its applications in today's data-driven world. Although the meaning of the term "Data Science" has evolved over the years, the field was first conceptualized by statisticians who proposed merging applied statistics with computer science. John W. Tukey, an American mathematician, foresaw the emergence of data analysis as a field in his article "The Future of Data Analysis", published in 1962, well before the arrival of personal computers. During the 70s and 80s, the concepts of Data Science became more concrete, helped by the founding in 1985 of the International Federation of Classification Societies (IFCS), which focused on educating and training professionals in the theories behind what would later be called Data Science.
Data Science then emerged as a recognized and specialized field during the late 90s and early 2000s, when businesses realized that they would need specialized workers to collect and leverage enormous amounts of data to learn more about their customers and competitors. Connectivity increased during this time, which in turn increased communication and the amount of data that could be collected.
Tech giants like Google and Facebook needed to process ever larger amounts of data, which spurred the development of Hadoop (first released in 2006) and, four years later, Spark (open-sourced in 2010).
The applications of Data Science became ubiquitous in our lives once Machine Learning models were built into everyday products. We come across Data Science driven solutions dozens of times a day: spam detection, credit card fraud detection, and recommendation systems all use the concepts of Data Science. The field sees major developments every year, and people all around the world are making breakthroughs in leveraging data for the better.

Roles and Responsibilities

There are four distinct roles that come under the Data Science umbrella. Each role has its own pipeline.

  • Data Engineer: A data engineer is responsible for transforming raw, real-world data into a form that is useful and better suited for analysis.

  • Data Analyst: An analyst uses the data to communicate their findings using interactive dashboards and summary statistics. Power BI and Tableau are the most common tools used by data analysts.

  • Data Scientist: A data scientist tells stories with data and builds machine learning models.

  • ML/Operations Engineer: An ML/operations engineer deploys the model into the real world.
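
As a small illustration of the summary statistics a data analyst might report, here is a minimal sketch using only Python's standard library. The `daily_orders` values and the `summarize` helper are made up for this example; in practice an analyst would more likely work in a tool such as Power BI or Tableau, or with a data analysis library.

```python
import statistics

# Hypothetical daily order counts; the numbers are invented for illustration.
daily_orders = [12, 15, 11, 20, 18, 15, 14]


def summarize(values):
    """Return a few common summary statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


summary = summarize(daily_orders)
for name, value in summary.items():
    print(f"{name}: {value}")
```

A dashboard is essentially a visual, interactive version of a table like this one, refreshed as new data arrives.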

In practice, these roles and responsibilities overlap and are often interchangeable, depending on the size of the company, the team, and the data being processed. The main responsibilities of a data scientist include:

  • Data collection, storage and preprocessing

  • Data visualization and creation of KPI Dashboards

  • Experimentation and A/B testing

  • Statistical inference

  • Building ML models

  • Evaluating ML models

  • Deploying ML models

  • Monitoring how ML models perform
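
To make the experimentation and statistical inference items a little more concrete, here is a rough sketch of how an A/B test result could be checked with a two-proportion z-test in plain Python. The visitor and conversion counts and the function names are hypothetical, and this is only one of several tests that could be used for such a comparison.

```python
import math


def normal_cdf(x):
    """Cumulative distribution function of the standard normal distribution."""
    return (1 + math.erf(x / math.sqrt(2))) / 2


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare two conversion rates; return (z statistic, two-sided p-value)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    p_value = 2 * (1 - normal_cdf(abs(z)))
    return z, p_value


# Hypothetical experiment: variant A converts 200 of 1000 visitors,
# variant B converts 250 of 1000 visitors.
z, p_value = two_proportion_z_test(200, 1000, 250, 1000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value suggests the difference between the two variants is unlikely to be due to chance alone, although choosing the threshold and sample size is part of the experiment design.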

Skillsets to have as a Data Scientist

These are the most crucial skills (both technical and non-technical) that a data scientist can possess.

  • A deep understanding of how databases work

  • Data visualization abilities

  • Programming abilities

  • A strong statistical foundation

  • Business understanding

  • A strong communication ability

Resources

Here are a few resources if you want to read more on the different aspects of this article.

  1. The Evolution and Growth of Data Science
  2. Roles and responsibilities of a Data Scientist
  3. Data Science course
  4. Data Science from Scratch - First Principles with Python.
  5. Additionally, you can follow YouTubers and podcasts related to this field.
