DEV Community

Ethan
Intro to Data Science

What is Data Science?

Data science draws on statistics, the scientific method, artificial intelligence and machine learning, and data analysis to extract value from data. Data scientists also prepare the data: they clean it by removing unwanted variables, then aggregate and manipulate it so that machine learning algorithms can be run against it. These algorithms learn patterns from the data and use them to answer questions about it. Companies use data science to turn data into actionable intelligence for refining products and services. In medicine, for example, data science has been used to improve tests so that some diseases can be diagnosed earlier, which in turn improves treatment.
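To make the preparation step more concrete, here is a minimal sketch of dropping an unwanted variable and then aggregating what is left. The records, the column names (`internal_id`, `region`, `sales`), and both helper functions are made up for illustration; a real project would more likely use a data-frame library.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records; "internal_id" stands in for an unwanted variable.
records = [
    {"internal_id": 1, "region": "north", "sales": 120.0},
    {"internal_id": 2, "region": "south", "sales": 80.0},
    {"internal_id": 3, "region": "north", "sales": 100.0},
    {"internal_id": 4, "region": "south", "sales": 60.0},
]


def drop_columns(rows, unwanted):
    """Cleaning: return copies of the rows without the unwanted keys."""
    return [{k: v for k, v in row.items() if k not in unwanted} for row in rows]


def mean_by_group(rows, group_key, value_key):
    """Aggregation: average value_key for each distinct group_key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {group: mean(values) for group, values in groups.items()}


cleaned = drop_columns(records, {"internal_id"})
average_sales = mean_by_group(cleaned, "region", "sales")
print(average_sales)
```

The cleaned, aggregated result is the kind of input a machine learning algorithm would then be run against.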

Data Science Process

  • Business Understanding
    • Define a project and what its purpose is.
      • Types of questions a data scientist asks are: How much? Which category? Which group? Is this different? What would someone prefer?
  • Data Mining/Sourcing
    • If needed, collect data using a variety of methods.
  • Data Cleaning
    • Fix inconsistencies within data.
    • Handle missing values and placeholder values.
  • Data Exploration
    • Generate visualizations to better understand data.
    • Use the information gained to develop hypotheses.
  • Feature Engineering
    • Decide which features are worth using.
    • Transform the data to make it more meaningful than the original data.
  • Predictive Modeling
    • Train models.
    • Evaluate how well those models perform.
    • Generate predictions.
  • Data Visualization
    • Use predictions to generate visualizations.
    • Use visualizations to communicate with key stakeholders.
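The cleaning and predictive modeling steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the data set (hours studied versus exam score) is invented, `None` and `-1` are assumed to be the missing and placeholder markers, and a straight-line least-squares fit stands in for whatever model a real project would choose.

```python
from statistics import mean

# Hypothetical raw data: (hours studied, exam score).
# None marks a missing value; -1 is an assumed placeholder for "not recorded".
raw = [
    (1.0, 52.0), (2.0, 54.0), (None, 60.0), (3.0, 56.0),
    (4.0, -1), (5.0, 60.0), (6.0, 62.0),
]


def clean(rows):
    """Data cleaning: drop rows with missing or placeholder values."""
    return [
        (x, y) for x, y in rows
        if x is not None and y is not None and x >= 0 and y >= 0
    ]


def fit_line(rows):
    """Train: ordinary least squares for y = slope * x + intercept."""
    x_mean = mean(x for x, _ in rows)
    y_mean = mean(y for _, y in rows)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in rows)
    sxx = sum((x - x_mean) ** 2 for x, _ in rows)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def mean_absolute_error(rows, slope, intercept):
    """Evaluate: average absolute gap between actual and predicted y."""
    return mean(abs(y - (slope * x + intercept)) for x, y in rows)


rows = clean(raw)
train, test = rows[:4], rows[4:]           # simple positional hold-out split
slope, intercept = fit_line(train)         # train the model
error = mean_absolute_error(test, slope, intercept)  # evaluate on held-out data
prediction = slope * 7.0 + intercept       # generate a prediction for 7 hours
print(slope, intercept, error, prediction)
```

Evaluating on rows the model never saw during training is what makes the error figure meaningful; a real project would use a larger, shuffled split than the positional one shown here.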

Top comments (2)

Remco Boerma

Interesting, where would you draw the line between data scientists and data engineers?

Ethan

While they share many of the same skills, a data engineer's focus is building data architectures and complex data pipelines and writing complex queries so that the data is easier to access. A data scientist's focus is performing complex statistical analysis on the data, and data scientists are well versed in calculus and probability.