Daniel Ndukwe
Data Science: 21st Century’s ‘Sexiest Job’

In October 2012, data science was famously tagged the “sexiest job of the 21st century” in a Harvard Business Review article by Thomas H. Davenport and D.J. Patil. Fast forward to 2025, and the career has seen exponential growth in demand and appeal: top companies are driving revenue with data-driven applications, and undergraduates at leading universities and polytechnics are pivoting into data-related majors in growing numbers. But what even is data science?

Data science is the application of statistical and computational techniques to address, or gain valuable insight into, a real-world problem. As simple as that definition of the ‘sexiest job of the 21st century’ is, it packs a lot of complexity that may intimidate the average person. Despite this complexity, the field has grown even more appealing over the years, particularly with the advent of generative Artificial Intelligence, a niche within data science that uses machine learning and deep learning, via Natural Language Processing, to communicate like humans.

But what does the data science workflow look like, and what tools are used in this exciting career path? Let’s explore.

The data science workflow usually starts with understanding the business problem. For example, a company wants to find out which advertising platform, whether TV, social media, or traditional forms like print and billboards, generates the most traffic for its product, and which should therefore receive more of the advertising budget. Your understanding of this problem sets the tone for actionable insights and, ultimately, a solution.

The next step involves data collection. This can be done by collecting data from web sources through web scraping and API calls, or by on-site and manual collection, which is common for those working in the geospatial industry.
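As a minimal sketch of the web-scraping side, using only Python's standard library and a made-up HTML snippet in place of a page fetched from a live site, a small `HTMLParser` subclass can pull values out of markup like this:

```python
from html.parser import HTMLParser

# A toy product page; in practice this HTML would come from an HTTP
# request to a real site (or you would hit a JSON API instead).
PAGE = """
<ul>
  <li class="price">19.99</li>
  <li class="price">4.50</li>
</ul>
"""

class PriceScraper(HTMLParser):
    """Collects the text of every <li class="price"> element as a float."""
    def __init__(self):
        super().__init__()
        self.in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        if tag == "li" and ("class", "price") in attrs:
            self.in_price = True

    def handle_data(self, data):
        if self.in_price and data.strip():
            self.prices.append(float(data.strip()))
            self.in_price = False

scraper = PriceScraper()
scraper.feed(PAGE)
print(scraper.prices)
```

For real projects you would typically reach for libraries like `requests` and BeautifulSoup, but the mechanics are the same: fetch markup, walk its structure, keep the fields you need.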

Another step in the workflow is data processing, or data pre-processing. This is one of the most crucial steps in any data science project, because real-world data is messy: it contains errors, missing values, duplication and redundancy. The process of fixing these issues is usually called ‘data cleaning’.
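A tiny data-cleaning sketch with pandas, using an invented, deliberately messy dataset, shows all three fixes at once: dropping duplicates, imputing missing values, and normalising inconsistent text:

```python
import pandas as pd
import numpy as np

# A small, deliberately messy dataset
df = pd.DataFrame({
    "name": ["Ada", "Ada", "Grace", "Alan"],
    "age":  [36, 36, np.nan, 41],
    "city": ["  London", "  London", "New York", "london"],
})

df = df.drop_duplicates()                         # remove duplicated rows
df["age"] = df["age"].fillna(df["age"].median())  # impute missing ages
df["city"] = df["city"].str.strip().str.title()   # fix inconsistent text

print(df)
```

Real cleaning pipelines are longer, but they are built from exactly these kinds of column-wise operations.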

Furthermore, data exploration is another part of data processing. It involves checking for outliers, calculating the measures of central tendency (mean, median and mode), computing Pearson's correlation, and applying several visualisation techniques. Feature engineering, which simply means choosing (or constructing) the right features in the dataset to improve the performance of machine learning models, is also done as part of the process of transforming raw data into meaningful insights.
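To make this concrete, here is a small exploration sketch on hypothetical ad-spend numbers (echoing the advertising example above; the figures are invented): central tendency, Pearson correlation, and one engineered feature:

```python
import pandas as pd

# Hypothetical weekly ad spend vs. site traffic
df = pd.DataFrame({
    "tv_spend":     [10, 20, 30, 40, 50],
    "social_spend": [ 5, 25, 15, 45, 35],
    "traffic":      [12, 28, 33, 52, 55],
})

# Measures of central tendency for the target column
print(df["traffic"].mean(), df["traffic"].median())

# Pearson correlation of each spend channel with traffic
print(df.corr(method="pearson")["traffic"])

# A simple engineered feature: total spend across all channels
df["total_spend"] = df["tv_spend"] + df["social_spend"]
```

A strong correlation between a channel's spend and traffic is exactly the kind of insight that answers the budget question posed earlier.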

Machine learning, a subset of artificial intelligence, is a technique in which the machine learns from previous data to make predictions. Think of it as a student studying lots of past questions to learn the format and structure of a future exam. Machine learning teaches the computer, through your pre-processed data, to learn patterns so it can produce valuable predictions or insights. Knowledge of statistics, calculus and probability is key in this aspect.

The final stage of the workflow I’ll be talking about is visualisation. This is the process of turning data and numbers into charts and graphs that capture, at a glance, what the data looks like. Techniques such as bar charts, histograms, box plots and violin plots are usually used in this aspect.
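A quick Matplotlib sketch of one of these, a bar chart of the (made-up) advertising traffic from earlier, looks like this:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt

channels = ["TV", "Social", "Print", "Billboard"]
traffic = [420, 610, 150, 90]  # hypothetical visit counts

fig, ax = plt.subplots()
ax.bar(channels, traffic)
ax.set_xlabel("Advertising channel")
ax.set_ylabel("Visits generated")
ax.set_title("Traffic by channel")
fig.savefig("traffic_by_channel.png")
```

Swapping `ax.bar` for `ax.hist` or `ax.boxplot` gives the other chart types mentioned above.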

There are other aspects of the data science workflow that help to derive data-driven insights about a business problem. They include data storytelling, statistical learning, and presentation skills, among others.

For example, a typical first pass over the classic Titanic dataset (assuming the standard Kaggle columns) looks like this:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load the Titanic dataset
titanic = pd.read_csv('titanic.csv')

# Inspect the first few rows
titanic.head()

# Check the shape (rows, columns) of the dataset
titanic.shape

# Check the column types and non-null counts
titanic.info()

# Count the missing values in each column
titanic.isnull().sum()

# Select a few numeric features and drop rows with missing values
features = ['Pclass', 'Age', 'SibSp', 'Fare']
data = titanic.dropna(subset=features + ['Survived'])

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    data[features], data['Survived'], test_size=0.2, random_state=42)

# Fit a logistic regression model and check its accuracy
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
print(accuracy_score(y_test, model.predict(X_test)))

Tools Used for Data Science
Programming languages (Python, R, Julia, MATLAB)
ML and data libraries (Scikit-learn, Pandas, NumPy, SciPy, etc.)
Visualisation tools (Tableau, Power BI, Matplotlib, Seaborn)
Databases (SQL and NoSQL, e.g. MongoDB)
