I used to love Jupyter Notebooks, and I still think they are a wonderful tool for many tasks, like exploratory data analysis and presenting insights to colleagues nicely and easily. However, while they are great for data science some of the time, at other times they are a headache. Like any software tool, they have their downsides. Here are the five worst things about Jupyter Notebooks for data science:
Jupyter Notebooks are terrible for code versioning. The problem is that they are stored as JSON files: nested dictionaries that mix code, outputs, and metadata. When you try to diff two notebooks, you get a wall of noisy JSON rather than a readable code diff. This makes working in a team with several notebooks extremely tedious and difficult.
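A common workaround (tools like nbstripout automate this) is to strip outputs and execution counts before committing, so diffs only show code changes. A minimal sketch of the idea, using a hand-built notebook dictionary in the `.ipynb` JSON layout:

```python
import json

def strip_notebook(nb: dict) -> dict:
    """Remove outputs and execution counts so diffs show only code changes."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return nb

# A minimal notebook as Jupyter stores it on disk:
nb = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 42,
            "source": ["x = 1\n"],
            "outputs": [{"output_type": "execute_result",
                         "data": {"text/plain": ["1"]}}],
        }
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

clean = strip_notebook(json.loads(json.dumps(nb)))
print(clean["cells"][0]["outputs"])          # → []
print(clean["cells"][0]["execution_count"])  # → None
```

Even with stripping, cell-level JSON is still what gets versioned; it only makes the problem bearable, not solved.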
Jupyter Notebooks have a non-linear workflow: you can execute cells out of order, which can lead to confusion and errors. This is of course also one of Jupyter's big selling points, but it is mostly useful for early data analysis and exploration, and therefore ends up being a downside more often than not.
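To see why this bites, here is a toy model of a kernel (my own sketch, not how Jupyter is implemented): cells share one namespace, and the *run order*, not the cell order on screen, determines the result.

```python
# Three "cells" sharing one namespace, like a Jupyter kernel.
cells = {
    1: "x = 10",
    2: "y = x * 2",
    3: "x = 5",
}

def run(order):
    namespace = {}
    for i in order:
        exec(cells[i], namespace)
    return namespace["y"]

print(run([1, 2, 3]))  # top to bottom → 20
print(run([1, 3, 2]))  # cell 3 re-run before cell 2 → 10
```

The notebook looks identical in both cases; only the hidden execution history differs, which is exactly the kind of bug that is painful to track down.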
Jupyter is not well suited for running long, asynchronous tasks. This is because all cells in a notebook run in the same kernel, and the kernel executes one cell at a time. If one cell is running a long task, it blocks the execution of every other cell.
This can be a major problem when you're working with data that takes a long time to process, or when you're working with real-time data that needs to be updated regularly. In these cases, it can be much better to use a tool like Dask, which is designed for parallel computing.
Jupyter can be slow to start up, and it can be slow to execute code. This is because Jupyter is an interactive tool: it loads the entire notebook, including any stored outputs, into memory in order to provide its interactive features.
If you're working with large data sets or large notebooks, this can be a major problem. Jupyter is simply not designed for large data sets.
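Part of the bloat comes from outputs being embedded directly in the JSON file: a base64-encoded plot lives inside the `.ipynb` next to the one line of code that produced it. A hypothetical illustration (the "image" here is fake padding, just to show the size effect):

```python
import json

# One cell whose output is a stand-in for ~200 KB of base64 image data,
# the way Jupyter embeds plots directly in the .ipynb JSON.
fake_png = "A" * 200_000
nb = {
    "cells": [{
        "cell_type": "code",
        "execution_count": 1,
        "source": ["plot_something()\n"],  # hypothetical plotting call
        "outputs": [{"output_type": "display_data",
                     "data": {"image/png": fake_png}}],
    }],
    "metadata": {}, "nbformat": 4, "nbformat_minor": 5,
}

with_outputs = len(json.dumps(nb))
nb["cells"][0]["outputs"] = []
without_outputs = len(json.dumps(nb))

print(with_outputs, without_outputs)  # one line of code, >200 KB of file
```

Multiply that by a few dozen plots and the notebook itself, not just the data, becomes the thing that is slow to load.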
This is just my opinion, but not having linting and code styling warnings is a big downside for Jupyter. IDE features are simply too convenient: the ability to jump between function declarations, code styling, and other features make it a lesser developer experience compared to a full-fledged IDE.
Now, this is a bit of a lie, because I have been using Jupyter through PyCharm Professional; being able to use PyCharm's debugger in cells is often the best of both worlds.
It's often important to consider where computations are run. For code that's easy to put into Docker, deploying to a cloud solution is straightforward. For notebooks there are also good options, though they tend to lock you into specific solutions.
If you want to run Jupyter notebooks in the cloud, Amazon SageMaker and Kubeflow are both worth looking into.
In conclusion, Jupyter Notebooks are not the ideal tool for data science projects. They are ideal for prototyping, but for your own sanity, migrate away from them before writing serious production code.