DEV Community

Waylon Walker

What harmful habits do data scientists pick up over time?

Weigh in with your thoughts!

Inspired by @ben

Discussion (7)

Saeed Ahmad

In my opinion:

  • Building strong opinions about certain models
  • Not exploring data enough
  • Forcing the data to "cry", i.e., forcefully obtaining the results you want from the data
  • Relying too much on certain libraries
  • Thinking that every problem is a "data science" problem
Waylon Walker Author

Forcing the data to "cry", i.e., forcefully obtaining the results you want from the data

This one is so tough. I get it all the time: "Well, can't you look again? Try this very forceful twist on it." It can be a tough fight.

If there's no correlation, don't make one, and don't let leadership pressure you into believing there is one.
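
To make the point concrete, here is a minimal sketch of why "looking again" enough times is dangerous. It is not a real analysis: the data is pure random noise, and the function names, the 20-row sample size, and the 200 attempts are all assumptions chosen for illustration. It compares the correlation from a single attempt with the strongest one found after many attempts.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strongest_spurious_correlation(n_rows=20, n_attempts=200, seed=0):
    """Correlate many unrelated noise columns with a noise target.

    Returns (first_attempt, strongest_attempt) as absolute correlations.
    Every column is independent of the target by construction, so any
    "strong" result here is an artifact of trying many times.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    correlations = []
    for _ in range(n_attempts):
        noise = [rng.gauss(0, 1) for _ in range(n_rows)]
        correlations.append(abs(pearson(noise, target)))
    return correlations[0], max(correlations)


first, strongest = strongest_spurious_correlation()
print(f"first attempt: {first:.2f}, best of many attempts: {strongest:.2f}")
```

The best of many attempts will look far more convincing than a typical single attempt, even though nothing in the data is related to anything else. That is the result you get when you keep twisting until something shows up.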

Dominik Haitz

"If you torture the data long enough it will confess"

Waylon Walker Author

Not exploring data enough

Getting familiar with data takes time. Lean on the business experts as much as possible. Get datasets small enough that they are digestible by tools the business experts are familiar with.

Waylon Walker Author • Edited

Not using git 😭

Almost every day I see a file named something like my-amazing-report-final1a-b-24.xlsx.

If you are not using git, at least do yourself a favor and set up some versioning rules to follow every day. Always increment the version, and never use words like "final" or "new" (they will be outdated as fast as you write them).

I highly recommend git, but I understand that it's not for everyone. Just make it obvious to everyone which version is the latest.
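
As a rough sketch of what such a rule could look like without git, the snippet below always increments a numeric suffix. The `<name>-vNNN.<ext>` convention and the `next_version_name` helper are assumptions made up for this example, not an established standard.

```python
import re
from pathlib import Path

# Hypothetical convention: <stem>-v<number><suffix>, e.g. my-amazing-report-v003.xlsx
VERSION_RE = re.compile(r"^(?P<stem>.+)-v(?P<num>\d+)$")


def next_version_name(existing_names, stem, suffix):
    """Return the next versioned file name for `stem`.

    `existing_names` is any iterable of file names already in use.
    Names that do not follow the convention, or that belong to another
    report or file type, are ignored.
    """
    highest = 0
    for name in existing_names:
        path = Path(name)
        if path.suffix != suffix:
            continue
        match = VERSION_RE.match(path.stem)
        if match and match.group("stem") == stem:
            highest = max(highest, int(match.group("num")))
    # Zero-padding keeps the files sorted correctly in a file browser.
    return f"{stem}-v{highest + 1:03d}{suffix}"


files = [
    "my-amazing-report-v001.xlsx",
    "my-amazing-report-v002.xlsx",
    "my-amazing-report-final1a-b-24.xlsx",  # ignored: not in the convention
]
print(next_version_name(files, "my-amazing-report", ".xlsx"))
```

With a rule like this, the highest number is always the latest version, and nobody has to guess what "final" meant last Tuesday.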

Waylon Walker Author

Spending too much time working on a feature that was not needed.

In other words, not understanding the business.

Jan Peterka (he/him)

I would say this is more of a developer thing than a specifically data-science thing.
But other than that, I completely agree!