DEV Community

BettyES

At the start of your data science career? Need directions?

Several people, particularly from academia, have approached me asking how to get started with a career in data science. There are obviously many online resources on this topic; nevertheless, here is my take. Feel free to send feedback if this was helpful or if you have something to add.

To start off, I'd like to explore what makes a good data scientist. Of course, there are plenty of blogs on that topic [1-3].

In my opinion, essential skills include: good problem-solving skills (finding a solution, even if it is not perfect), project management skills (being able to see a project through from end to end), excellent communication with business/non-technical people, and the ability to work well in a team. These skills might have been acquired during an M.Sc., and most certainly during a PhD. However, some people have them naturally and may have demonstrated them in roles other than a degree. If you'd like to work on your soft skills, have a look at this list of book recommendations [4].

Regarding the basic technical skills (again, my personal recommendation): a coding language that you are comfortable with (focus on one to begin with; the most widely used in the field are Python and R), solid mathematical or statistical knowledge, a general idea of the techniques applied in data science, and some example projects to present during an interview (if you have no real business projects, it can be another data science project with a hypothetical link to a business context). Kaggle is a good place to start looking for datasets, or simply use Google's Dataset Search. There are plenty of other publicly available datasets as well, such as the one used in this video investigating the rise in knife crime in London.

Coding knowledge:

I strongly recommend practising your coding skills regularly (at least two days per week), for example by solving short tasks on Codewars, DataCamp, Exercism, etc.
Particularly interesting about Exercism are its mentored language tracks, where experienced programmers give you feedback on your coding solutions.

There are also plenty of courses, often free of charge, available on Udacity or Coursera to help you get started, e.g.:

Introduction to Python - Starting out in Python 3 (free)

Data Analysis with R - by Facebook (free)

If you ever get stuck, Stack Overflow is an absolute must for finding answers; even better, contribute yourself by posting questions or trying to help answer some. There is a high chance that a coding challenge will be part of your data science job interview, so make sure you are prepared. Some additional literature [5] might help, as might this blog post [6], which provides sample questions and advice.

Statistical knowledge:

Refresh your stats knowledge. A friend of mine recommended the book 'Think Stats' [7] as a good introduction to probability and statistics using Python. If R will be your language of choice, you could start with 'Introductory Statistics with R' [8]. There are plenty more, and again, you might also find helpful courses on Udacity, Coursera and the like.
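To give a flavour of the descriptive statistics and simple probability such books cover, here is a minimal sketch in Python using only the standard library's `statistics` module. The sample values are invented for illustration, and the code is not taken from 'Think Stats' or any other book mentioned here.

```python
import statistics

# Hypothetical sample, invented purely for illustration.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(sample)      # arithmetic mean
median = statistics.median(sample)  # average of the two middle values for even-length data
pstdev = statistics.pstdev(sample)  # population standard deviation (divides by n)
stdev = statistics.stdev(sample)    # sample standard deviation (divides by n - 1)

# Empirical probability that a value drawn at random from the sample exceeds 4.
prob_above_4 = sum(x > 4 for x in sample) / len(sample)

print(f"mean={mean}, median={median}, pstdev={pstdev}, stdev={stdev:.3f}")
print(f"P(X > 4) = {prob_above_4}")
```

The difference between `pstdev` and `stdev` (dividing by n versus n - 1) is the kind of detail an introductory statistics text will explain, and it is worth being able to justify in an interview.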

Data Science knowledge:

To boost your data science knowledge, have a look at Kaggle. Not only do they have lots of free datasets to play with and data science challenges you can enter, but they also have good tutorials for beginners. Another free, R-based course that I can highly recommend is 'Statistical Learning' [9] (main focus: regression and classification), which accompanies the free book 'An Introduction to Statistical Learning (with Applications in R)' [10]. Packt Publishing offers a good range of data science books focusing on Python [11] and R [12].
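Regression is one of the main topics of the 'Statistical Learning' course. As a rough illustration in Python (the course itself uses R), here is a minimal sketch of simple linear regression fitted by ordinary least squares, written without external libraries. The function names and toy data are hypothetical, chosen for illustration only.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) minimising the sum of squared residuals."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    slope = sxy / sxx
    # The least-squares line always passes through (x_mean, y_mean).
    intercept = y_mean - slope * x_mean
    return intercept, slope


def predict(intercept, slope, x):
    """Predicted y for a new x under the fitted line."""
    return intercept + slope * x


# Hypothetical toy data lying exactly on y = 1 + 2x.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
intercept, slope = fit_simple_linear_regression(xs, ys)
print(intercept, slope)               # 1.0 2.0
print(predict(intercept, slope, 10))  # 21.0
```

Writing the fit by hand once helps you understand what library functions do; on Python 3.10 or later, the standard library's `statistics.linear_regression` computes the same least-squares fit.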

Finally, if you are looking for free programming books, have a look at: https://github.com/EbookFoundation/free-programming-books/blob/master/free-programming-books.md#r

Sources:

  1. Blog: The essential skills and traits of an expert data scientist
  2. Blog: What makes a great data scientist
  3. Blog: What makes a good data scientist at a small company
  4. Blog: The importance of soft skills in data science: book recommendations
  5. Book: Cracking the Coding Interview, 6th edition. (2015) G. Laakmann McDowell. (heavy on algorithms)
  6. Blog: Notes and technical questions from interviewing as a data scientist in 2018
  7. Book: Think Stats: Probability and Statistics for Programmers, 2nd edition. (2014) A. B. Downey. O'Reilly
  8. Book: Introductory Statistics with R, 2nd edition. (2008) P. Dalgaard. Springer
  9. Course: Statistical Learning. (2016) T. Hastie & R. Tibshirani. Stanford University
  10. Book: An Introduction to Statistical Learning. (2017) G. James, D. Witten, T. Hastie & R. Tibshirani. Springer
  11. Book: Building Machine Learning Systems with Python, 3rd edition. (2018) W. Richert et al. Packt
  12. Book: Machine Learning with R, 3rd edition. (2019) B. Lantz. Packt

Top comments (3)

Shreyas Ragavan

It is also important to understand that a large portion of a data scientist's time is likely to be spent in wrangling data.

Hadley's R for Data Science (r4ds.had.co.nz/) is excellent as a starting point for learning R.

BettyES

Hey. Thanks for the recommendation. I think I will have a bit of a read in that one myself =)

Tej-Singh-Rana

For a beginner, is data science a good option?