DEV Community

zyazyaG

4 Things Your Data Science Project Might Be Missing

In my still-short journey in the data science field, I’ve studied many projects by other beginners and professionals. This is mainly how I improve my skills and find out where I need to strengthen my knowledge. Looking at those works, I’ve noticed 4 shortcomings in how beginner data scientists structure their projects, and I want to go over them today.

1. No Comments

Most of the time when we look at someone else’s code, we get that weird facial expression called confusion.

A lack of well-written comments is one of the main weaknesses of a project. It is important to explain not only what you are doing, but also why you are doing it. Why?! If you want someone to look at your work and not skip over it, you need to clearly explain the thought process behind your project. For example, if you are using SMOTE in your project, explain that the reason behind it is class imbalance in the data.
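As a rough illustration of "what" versus "why" comments, here is a minimal sketch. To stay free of third-party packages it uses plain random oversampling as a stand-in for SMOTE, and the function name `oversample_minority` and the toy data are made up for this example.

```python
import random
from collections import Counter


def oversample_minority(rows, labels, seed=42):
    """Duplicate rows of the smaller classes until all classes are the same size.

    Note: this is simple random oversampling, used here as a stand-in for SMOTE.
    """
    # WHY: with imbalanced classes a model can look accurate just by always
    # predicting the majority class, so we balance the training data first.
    counts = Counter(labels)
    target = max(counts.values())

    # WHY: a fixed seed makes the resampled dataset reproducible for readers.
    rng = random.Random(seed)

    new_rows, new_labels = list(rows), list(labels)
    for label, count in counts.items():
        # WHAT: collect the existing rows that belong to this class.
        pool = [row for row, lab in zip(rows, labels) if lab == label]
        # WHY: sample with replacement, because a small class may have
        # fewer rows than the number of copies we need to add.
        extra = rng.choices(pool, k=target - count)
        new_rows.extend(extra)
        new_labels.extend([label] * len(extra))
    return new_rows, new_labels


# Toy data: four rows of class 0 and only one row of class 1.
rows = [[1.0], [2.0], [3.0], [4.0], [9.0]]
labels = [0, 0, 0, 0, 1]
balanced_rows, balanced_labels = oversample_minority(rows, labels)
```

The "what" comments describe a step, while the "why" comments record the decision behind it, which is the part a reader cannot recover from the code alone.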

2. No Visualizations

If I ask you to close your eyes and list all the continents, the first thing your brain would do is visualize the map. The human brain tends to comprehend and analyze visual information better. So, while working on a project, it is essential to show your findings or analysis through graphs and plots. That way the person looking at your work will be more engaged and interested, and may even spot things you have missed.
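For instance, a single bar chart can show a class imbalance faster than a table of counts. The sketch below assumes matplotlib is installed; the function name `plot_class_balance`, the output file name, and the toy labels are invented for illustration.

```python
from collections import Counter

import matplotlib

# Non-interactive backend, so the script also runs on machines without a display.
matplotlib.use("Agg")
import matplotlib.pyplot as plt


def plot_class_balance(labels, out_path="class_balance.png"):
    """Save a bar chart of how many rows each class has and return the counts."""
    counts = Counter(labels)
    classes = sorted(counts)

    fig, ax = plt.subplots(figsize=(5, 3))
    ax.bar([str(c) for c in classes], [counts[c] for c in classes])
    # Label the axes so the chart can be read without looking at the code.
    ax.set_xlabel("Class")
    ax.set_ylabel("Number of rows")
    ax.set_title("Class balance")
    fig.tight_layout()
    fig.savefig(out_path)
    plt.close(fig)  # free the figure once it is written to disk
    return counts
```

Calling `plot_class_balance([0, 0, 0, 0, 1])` would write `class_balance.png` to the current directory, with one tall bar and one short bar.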

3. No Virtual Environment

Imagine that someone liked your project and wanted to take a closer look at it or test it on their own. So they download it… but wait, for some reason your project is not working on their local machine. Most of the time that happens because some of the libraries or packages you used are missing from their system, or are installed in a different version. The best way to make sure strangers won’t have problems using your work is to create a virtual environment for your project and share the list of packages installed in it (for example, a requirements.txt file). Anyone can then recreate the same packages and dependencies on their machine, which makes it much easier for others to interact with your project.
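One possible way to do this from Python itself uses only the standard library modules `venv` and `importlib.metadata`. This is a sketch, not the only workflow: the helper names `create_project_env`, `pinned_requirements`, and `write_requirements` are hypothetical, and `pinned_requirements` is only a rough stand-in for what `pip freeze` reports.

```python
import venv
from importlib import metadata
from pathlib import Path


def create_project_env(env_dir=".venv", with_pip=True):
    """Create an isolated environment for the project in `env_dir`."""
    venv.create(env_dir, with_pip=with_pip)
    return Path(env_dir)


def pinned_requirements():
    """Return sorted 'name==version' lines for packages this interpreter can see."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        # Skip broken installs that are missing a name or a version.
        if name and version:
            lines.add(f"{name}=={version}")
    return sorted(lines, key=str.lower)


def write_requirements(path="requirements.txt"):
    """Write the pinned package list to `path` so others can reinstall it."""
    target = Path(path)
    target.write_text("\n".join(pinned_requirements()) + "\n")
    return target
```

The same result is usually reached on the command line with `python -m venv .venv` and, once the environment is activated, `pip freeze > requirements.txt`. Someone who downloads the project can then run `pip install -r requirements.txt` inside a fresh environment of their own. Note that `pinned_requirements` lists packages for whichever interpreter runs it, so it should be run with the project environment's Python.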

4. No User-Defined Functions

The last thing I want to mention is the lack of user-defined functions and classes. Most of the time, code snippets in data science projects repeat themselves across data cleaning, EDA, feature engineering, and modeling. Why not make your life and the lives of others easier?! If you notice that you are typing the same code over and over, try to build a reusable function or class for it. You’ll save time and make the code more organized and easier to understand.
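As a small sketch of the idea, a cleaning step that would otherwise be copy-pasted for every column can be wrapped in one function. The function name `fill_missing_with_median`, the column names, and the toy rows are made up for this example, and plain dictionaries are used instead of a DataFrame to keep it dependency-free.

```python
from statistics import median


def fill_missing_with_median(rows, column):
    """Return new rows where a missing (None) value in `column` is replaced
    by the median of the observed values in that column."""
    observed = [row[column] for row in rows if row[column] is not None]
    if not observed:
        raise ValueError(f"column {column!r} has no observed values")
    fill_value = median(observed)
    # Build new dicts instead of mutating the input, so the raw data stays intact.
    return [
        {**row, column: fill_value if row[column] is None else row[column]}
        for row in rows
    ]


passengers = [
    {"age": 22, "fare": 7.25},
    {"age": None, "fare": 71.28},
    {"age": 38, "fare": None},
]

# One loop replaces a separate copy of the cleaning code for each column.
for col in ("age", "fare"):
    passengers = fill_missing_with_median(passengers, col)
```

If the filling rule changes later (say, mean instead of median), it only has to change in one place.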

Keep these 4 things in mind, like I do when working on my projects. I hope you'll find these tips as useful as I do.
