"Data scientist: The person who is better at explaining the business implications of analytical results than any scientist, and better at the analytical science than any MBA." - Dr. Jennifer Priestley
Introduction
Data science courses are everywhere, but the subject is often approached from a technical point of view. This article introduces data science from a non-technical perspective to set you on the right path.
To put it metaphorically, data science is a marriage between applied statistics and computer science. Data scientists use a combination of statistical techniques and computer algorithms to find patterns within datasets, and then use their domain knowledge to interpret what those patterns mean and how they apply to real-world situations. The purpose is to gain insight for making decisions.
The number of people interested in the field of data science has increased tremendously over the past few years, thanks to numerous publications like this one on why it is the "Sexiest Job of the 21st Century".
This article aims to set readers on the right track as they step into the world of data science. The following section gives an overview of a data science career.
Overview of a Data Science Career
Before we dive deeper, let us briefly take a look at the top reasons why people choose to get started with a career in data science:
- The demand for data scientists is high
- Salaries are considerably higher than those of most workers
- Data scientists usually have the freedom to work in any location or any industry in the world
- Because of the shortage of data scientists, there is less competition and job hunting is easier
The data science industry has several varied roles that individuals can fit into: machine learning expert, data visualization expert, data scientist and data engineer are a few of the many roles you could go into. Depending on your work experience and background, getting into one role may be easier than getting into another. For example, a software developer would find it easier to move into a data engineering role.
While a career in data science may be interesting and available, prospective data scientists should consider leveling up their skills in statistics and programming before planning their next step. Now that you have an idea of what data science is and why people are choosing a career in it, the next section introduces one of the core components of machine learning: statistics.
Statistics as a requirement for Data Science
"The only relevant test of the validity of a hypothesis is comparison of its predictions with experience." - Milton Friedman
Statistics is a prerequisite for, and a core component of, machine learning. It helps you understand the underlying concepts that allow artificial intelligence to function. A foundation in statistics is crucial to finding insights in data and drawing conclusions from it. The concepts and techniques of statistics are widely used in data analytics for working with data.
I have listed a few topics that you need to know:
- Types of data
- Population and Sampling
- Probability
- Central tendency measures
- Measures of dispersion
- Variables and variable selection
- Different types of distributions
- Central limit theorem
- Hypothesis Testing
- Regression
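As a small taste of two topics from the list above, central tendency and dispersion, here is a minimal Python sketch that uses only the standard library statistics module. The sales figures are made up for illustration and do not come from any real dataset.

```python
import statistics

# Hypothetical sample: daily sales figures, invented for illustration only.
daily_sales = [12, 15, 11, 15, 40, 14, 13]

# Measures of central tendency
mean_sales = statistics.mean(daily_sales)      # arithmetic average
median_sales = statistics.median(daily_sales)  # middle value once sorted
mode_sales = statistics.mode(daily_sales)      # most frequent value

# Measure of dispersion
spread = statistics.stdev(daily_sales)         # sample standard deviation

print(f"mean={mean_sales:.2f} median={median_sales} mode={mode_sales}")
print(f"standard deviation={spread:.2f}")
```

Notice that the single large value (40) pulls the mean above the median, which is one reason it helps to look at more than one summary measure.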
There are a lot of free resources online that can quickly get you up to speed with the topics above, so dedicate some time to learning them. The following section discusses programming, another core component of data science.
Programming as a requirement for Data Science
One of the interesting parts of the field of data science is Machine Learning, which refers to a collection of techniques utilized by data scientists that allow computers to learn from data.
Coding is required to implement machine learning, and programmers who are competent with the implementation will have a strong grasp of how the algorithms (models) work and will find it easier to optimize them.
Machine learning is interesting in that the goal is to train computers to learn on their own.
Data scientists usually choose a programming language to work with. Packages written in that language are available to help you get the work done easily.
Python is one of the most popular programming languages for data science, and it comes with loads of packages and community support. There is hardly any data science project that you cannot implement with Python, so I recommend that you go ahead and learn it.
Other programming languages for data science include Julia, R, JavaScript, and SQL.
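To make the idea of learning from data a little more concrete, the sketch below fits a straight line to a handful of points using ordinary least squares, written in plain Python with no packages. The function name fit_line and the hours/scores data are hypothetical, chosen only for illustration; a real project would typically rely on an established package instead.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how x varies on its own
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: hours studied vs. exam score (made up).
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]

# "Learning": estimate the line from the examples
slope, intercept = fit_line(hours, scores)

# "Predicting": apply the learned line to an unseen input
predicted = slope * 6 + intercept
print(f"slope={slope}, intercept={intercept}, prediction for 6 hours={predicted}")
```

The program is never told the rule behind the scores; it estimates the slope and intercept from the examples and then uses them on a new input, which is the basic pattern behind many machine learning models.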
The utilization of statistical techniques and programming for data science will not be complete without domain knowledge, and I will briefly discuss this in the following section.
The need for Domain Knowledge
To put it simply, domain knowledge in this context refers to general background knowledge of the environment or field to which the methods of data science are being applied. When you are building predictive algorithms, understanding your dataset is of prime importance.
Data scientists spend a large share of their time, about 85 percent by some estimates, understanding and cleaning data. The effort pays off, because understanding your data early can save you a lot of time and resources.
Domain knowledge may not be absolutely necessary in most cases, but it gives you insight into your data, which in turn gives you an edge when modeling.
A data scientist with domain knowledge can translate that knowledge into programs and data that are specialized for a particular field, which makes the result highly valuable for end users.
So What Next?
Nobody started out really good; we all learned progressively on the job. You should not compare your beginning to someone who already has 10 years of experience. That would be unfair to your pursuit and to what you are building. Give yourself time to grow.
There are a good number of people who do not have passion for working with data, yet they end up pursuing a career in data science.
Go ahead and get started. Join online communities of data scientists and machine learning practitioners, such as Kaggle. Start with the beginner-level activities and scale up your learning from there: build, deploy, and get feedback!
Reap the benefits of the predictive power of your machine learning models by deploying them. This is one of the most important steps from a business perspective but also the least taught one.
So there you go, happy learning.
If you find this information useful, feel free to share it so other aspiring data scientists can benefit as well.
If you have any questions, feel free to reach out!