
Waylon Walker


What is YOUR advice to NEW Data Scientists.


Top comments (6)

Helen Anderson • Edited
  • SQL, SQL, SQL. You may expect that enterprise organisations have Data Lakes full of perfectly organised data ready for you. The reality is you will find yourself pulling ad hoc datasets from relational databases. SQL is a must before you even start thinking about AI/ML/NLP/whatever.

  • Learn about ETL. Don't assume that because there is a Data Engineer on the team, they will drop everything each time you need a new dataset. Having at least an appreciation for what they do, and at best a few clues about how to cobble together a process to get what you need, will do wonders.

  • Appreciate you may be doing Data Analysis work in between big data science projects. Cleansing data and creating dashboards may be part of your role if the infrastructure and support aren't there for you to dive into a 100% Data Science focussed role.

  • Data is not perfect. The datasets you have been using in your training are (generally) cleansed, tidy and ready to go. In the real world, data may be entered by humans into free text fields, and who knows what you will find in there. If fields in a form aren't required, there's a pretty good chance humans won't bother filling them in, so no data for you. Things change all the time. A field in a database may have only been created recently, so if no backfilling has been done your historical records may be empty. I could go on all day but the message is ... data is messy, so don't feel hard done by if you spend days cleaning it up.

  • Data Science is a very new field in some organisations. In some cases, it isn't seen as an essential role so don't be surprised if your project gets cancelled when priorities change or budgets get pulled.

  • Data Science projects take time. You won't be working on a project for a week, then throwing it over the fence to be deployed. Some projects take months or even years to roll out.

  • Not everything needs code. You may think it will be simple to set up a data pipeline with Python, set it and forget it. But remember you need to maintain it or convince someone else to. Sometimes a tool is the best way to go even though you may want to crunch through the code.

  • At the same time ... sometimes the shiny new tool or library may not be the answer. Take time to consider the cost to set up, migrate data, refactor code, and how much effort it will take to onboard your team.
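To make the "data is messy" point above concrete, here is a minimal sketch of the kind of cleanup that tends to eat those days. Everything in it is made up for illustration: the sample rows, the field names, and the `COUNTRY_ALIASES` mapping are assumptions, not anything from a real system.

```python
from typing import Optional

# Hypothetical raw records, as they might come out of a form
# with optional free-text fields.
RAW_ROWS = [
    {"name": "  Alice ", "age": "34", "country": "nz"},
    {"name": "BOB", "age": "", "country": "New Zealand"},
    {"name": "carol", "age": "unknown", "country": None},
]

# Assumed mapping of free-text variants to one canonical value.
COUNTRY_ALIASES = {"nz": "New Zealand", "new zealand": "New Zealand"}


def clean_age(value: Optional[str]) -> Optional[int]:
    """Return the age as an int, or None when the field is blank or not a number."""
    if value is None:
        return None
    text = value.strip()
    return int(text) if text.isdigit() else None


def clean_country(value: Optional[str]) -> Optional[str]:
    """Normalise known spellings; keep unknown values as typed; None when empty."""
    if value is None or not value.strip():
        return None
    text = value.strip()
    return COUNTRY_ALIASES.get(text.lower(), text)


def clean_row(row: dict) -> dict:
    """Tidy one record without inventing values for missing fields."""
    return {
        "name": row["name"].strip().title(),
        "age": clean_age(row.get("age")),
        "country": clean_country(row.get("country")),
    }


cleaned = [clean_row(row) for row in RAW_ROWS]

# Count the gaps instead of hiding them, so you know how much data is really there.
missing_age = sum(1 for row in cleaned if row["age"] is None)
```

Note that missing values stay missing (`None`) rather than being filled with a guess; whether and how to impute them is a separate decision for each project.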

Waylon Walker • Edited

@helenanders26 you are on fire 🔥, there needs to be a 🦄 for comments like this!

After an intense year with a consulting firm in an AWS sandbox (where we didn't have database access), I am really starting to appreciate SQL again. Simple aggregations in the database can have a massive impact on pipeline performance!
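A rough sketch of what that looks like, using an in-memory SQLite database as a stand-in for a real one. The `orders` table and its rows are hypothetical, and on three rows there is obviously no speed difference; the point is only that the second query ships one row per customer instead of every row.

```python
import sqlite3

# In-memory SQLite database standing in for a real relational database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("acme", 10.0), ("acme", 15.0), ("globex", 7.5)],
)

# Option 1: pull every row into Python and aggregate there.
totals_in_python = {}
for customer, amount in conn.execute("SELECT customer, amount FROM orders"):
    totals_in_python[customer] = totals_in_python.get(customer, 0.0) + amount

# Option 2: let the database aggregate, so only one row per customer comes back.
totals_in_sql = dict(
    conn.execute(
        "SELECT customer, SUM(amount) FROM orders "
        "GROUP BY customer ORDER BY customer"
    )
)

conn.close()
```

Both dictionaries hold the same totals; with a large table, the `GROUP BY` version usually moves far less data through the pipeline.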

I can say I feel like I work at a very small, dare I say startup, inside of a big company. Many days I wear all hats, from analyst to DevOps, cloud engineer, and data engineer. It definitely pays to have a bit of skill everywhere in order to keep your projects moving. I love the idea of the T model.

I am really good/bad at solving every problem with code... It's my hammer

Saeed Ahmad

My advice is to:

  • Understand common data science and machine learning terms
  • Brush up your mathematical and statistical knowledge.
  • Know how to convert your understanding of mathematical relationships into code
  • Get domain knowledge about the field
  • Learn to explain things to all the stakeholders
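As one small illustration of turning a mathematical relationship into code, take the sample standard deviation, s = sqrt( sum((x_i - mean)^2) / (n - 1) ). The sketch below writes it out term by term and then checks it against Python's built-in `statistics.stdev`; the function name `sample_std` and the data are made up for the example.

```python
import math
import statistics


def sample_std(values):
    """Sample standard deviation, written to mirror the formula term by term."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n                                     # the mean
    squared_deviations = sum((x - mean) ** 2 for x in values)  # sum of (x_i - mean)^2
    return math.sqrt(squared_deviations / (n - 1))             # divide by n - 1, then root


# Made-up data for illustration only.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Checking a hand-written formula against a trusted library is a good habit.
matches_library = math.isclose(sample_std(data), statistics.stdev(data))
```

In real work you would normally call the library function directly; writing the formula out once is mainly a way to confirm you understand what it computes.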
ajeebkp23

I'm not a Data Scientist yet. Here are some links from my bookmark manager.

Hope you find some answers.

✨ Enoch Ojonimi Daniels ♛✮

It's nice to be excited about data science and the many benefits and opportunities it promises. However, you also need a good foundational knowledge of mathematics and statistics, because data science is more scientific than programmatic. That knowledge is necessary for a good grasp of the concepts in data science. So, endeavor to understand the mathematics before implementing the algorithms in code.

Pham Ngoc Huy

Learn Math Well.