DEV Community

Discussion on: Welcome Thread - v49

Jon Lim

Hey Celine!

I'd be curious to know if there are any best practices for testing data science models. What would that look like?

Celine

I think writing unit tests for statement coverage, along with integration tests, is the most important thing. You want to make it really easy for yourself by writing a testing script that you can run every time you iterate on your model or on the functions you use for preprocessing, so that you know you haven’t broken anything and can trust that your code works.
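
As a rough illustration of that kind of testing script, here is a minimal sketch using Python's built-in `unittest` module. The `standardize` function is a hypothetical preprocessing step invented for this example, not something from the thread, and the test cases are just one plausible set of checks.

```python
import math
import unittest


def standardize(values):
    """Hypothetical preprocessing step: rescale to zero mean and unit variance."""
    if not values:
        raise ValueError("values must not be empty")
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(variance)
    if std == 0:
        # Constant column: return zeros instead of dividing by zero.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


class StandardizeTests(unittest.TestCase):
    def test_output_has_zero_mean_and_unit_variance(self):
        result = standardize([1.0, 2.0, 3.0, 4.0])
        mean = sum(result) / len(result)
        variance = sum((r - mean) ** 2 for r in result) / len(result)
        self.assertAlmostEqual(mean, 0.0)
        self.assertAlmostEqual(variance, 1.0)

    def test_constant_input_returns_zeros(self):
        self.assertEqual(standardize([5.0, 5.0, 5.0]), [0.0, 0.0, 0.0])

    def test_empty_input_raises(self):
        with self.assertRaises(ValueError):
            standardize([])


def run_tests():
    """Run the whole suite and return True only if every test passes."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(StandardizeTests)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

Calling `run_tests()` after each change to the preprocessing code gives a quick pass/fail signal. If the block is saved in its own file, running `python -m unittest` against that file should work as well.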

I’ll start writing some documentation to provide examples and make this concept easier to digest, but for now a quick Google search might help you (: Hope this helps!

Jon Lim

Right on - in your experience, is it something that happens with a lot of data science teams? (The writing of tests, I mean.)

I'm a weird convert towards testing, if it isn't immediately obvious hehe, but I'm always surprised at just how few tests can be found out there sometimes.

Celine

No, I don’t see a lot of testing in the data science world (: I agree, I think there could definitely be a lot more of it. It would make writing data models a lot easier to scale, instead of building code and then fixing models. But I feel like a lot of data scientists aren’t taught how to write good tests, and that’s why they’ve been able to survive without them. Are you a data scientist? Where did you learn how to test?