What is Data Science?
According to Wikipedia, “Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from noisy, structured and unstructured data, and apply knowledge and actionable insights from data across a broad range of application domains.”
It further adds, “Data science is a "concept to unify statistics, data analysis, informatics, and their related methods" in order to "understand and analyse actual phenomena" with data.”
Who is a data scientist?
“A data scientist is someone who is able to lead a team of data analysts or data engineers to work towards solving a business problem, by collecting data and come out with relevant findings which can potentially help the company’s growth.” - https://thelead.io/ultimate-guide-data-science
Data engineers focus on making data available and accessible to data scientists by integrating various data sources and optimizing access to them, while data analysts are responsible for extracting insight from the data.
Lifecycle of Data Science
The core job of a data scientist is to address problems and build models that support better decisions on multifaceted business challenges. The data science process involves the following steps:
- Defining the Business Problem – This sets the main objective of the entire process, so as to satisfy external stakeholders
- Data Collection and Preparation
- Exploratory Data Analysis
- Model Building
- Model Optimization
- Model Deployment and Evaluation
Tools used in Data Science
Statistics and Probability – Basic to advanced knowledge of statistics and probability is required, from simple measures of central tendency (mean, median and mode) to complex multivariate probability distributions.
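As a small illustration of the measures of central tendency mentioned above, here is a minimal sketch using Python's standard `statistics` module. The `orders` list is made-up sample data, not taken from any real dataset.

```python
import statistics

# Hypothetical sample: daily order counts for one week (made-up values).
orders = [10, 12, 12, 14, 16, 18, 30]

mean_orders = statistics.mean(orders)      # arithmetic average
median_orders = statistics.median(orders)  # middle value after sorting
mode_orders = statistics.mode(orders)      # most frequent value

print(mean_orders, median_orders, mode_orders)
```

In this sample the single large value (30) pulls the mean above the median, which is one reason to look at more than one measure.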
Data Mining & Cleaning - How to collect, extract, query, clean, and aggregate data for analysis.
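The clean-and-aggregate step described above could look roughly like the sketch below. It uses only the Python standard library; the `raw` text, its `region` and `amount` columns, and the helper names `clean_rows` and `total_by_region` are all hypothetical and chosen for illustration. In practice a library such as pandas is often used for this kind of work.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw export: inconsistent casing, stray spaces, a missing amount.
raw = """region,amount
North, 100
north,250
South,
south ,75
"""

def clean_rows(text):
    """Yield (region, amount) pairs, skipping rows with no usable amount."""
    for row in csv.DictReader(io.StringIO(text)):
        region = row["region"].strip().title()   # normalize spacing and casing
        amount = (row["amount"] or "").strip()
        if not amount:
            continue  # drop rows where the amount is missing
        yield region, float(amount)

def total_by_region(text):
    """Aggregate the cleaned amounts per region."""
    totals = defaultdict(float)
    for region, amount in clean_rows(text):
        totals[region] += amount
    return dict(totals)

totals = total_by_region(raw)
print(totals)
```

Dropping rows with missing values is only one possible choice; depending on the analysis, filling them in or flagging them may be more appropriate.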
Exploratory Data Analysis - How to explore datasets using visual analytics and statistical analysis.
Data Visualization and Storytelling Skills - Data storytelling combines data with human communication to craft an engaging narrative that’s anchored by facts. It uses data visualization techniques (e.g., charts and images) to convey the meaning of the data in a way that’s compelling and relevant to the audience.
Modelling, Validation & Problem Solving - A model is a mathematical representation of the real world and is used in formulating hypotheses and trying to solve business problems. All models must be validated using test data in order to provide insight into real-world problems and solutions.
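To make the idea of validating a model on test data concrete, here is a minimal sketch under stated assumptions: the data is synthetic (roughly y = 3x + 2 plus noise), the model is a one-feature least-squares line, and the 80/20 split and helper names `fit_line` and `mean_squared_error` are illustrative choices rather than a prescribed method.

```python
import random

# Synthetic data, assumed for illustration: y is roughly 3*x + 2 plus noise.
random.seed(0)
xs = [float(x) for x in range(50)]
ys = [3.0 * x + 2.0 + random.gauss(0, 1.0) for x in xs]

# Hold out 20% of the points as test data the model never sees during fitting.
indices = list(range(len(xs)))
random.shuffle(indices)
split = int(0.8 * len(indices))
train_idx, test_idx = indices[:split], indices[split:]

def fit_line(x, y):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_squared_error(x, y, slope, intercept):
    """Average squared difference between predictions and actual values."""
    return sum((slope * a + intercept - b) ** 2 for a, b in zip(x, y)) / len(x)

# Fit on the training portion only, then score on the held-out portion.
slope, intercept = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
test_mse = mean_squared_error(
    [xs[i] for i in test_idx], [ys[i] for i in test_idx], slope, intercept
)
print(slope, intercept, test_mse)
```

A low error on the held-out points suggests the model generalizes beyond the data it was fitted on; a much higher test error than training error would be a sign of overfitting.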
Programming Language – Learn to program in languages commonly used in data science, such as R and Python. Other programming languages can be used, but these two are the most popular for data science. In addition, Python and R are supported by libraries and packages built for data science.
Communication - Communicate key findings and results by creating an effective presentation.
A.I. & Machine Learning - Artificial intelligence is a broad field with many subfields, one of which is machine learning.
Big Data Analysis - Big data is information characterized by one or more of the following: volume, velocity, variety and veracity.