DEV Community

Will Velida
Exploring built-in Jupyter Notebooks in Azure Cosmos DB

In May 2019, the Cosmos DB team announced the preview of running Jupyter notebooks inside Cosmos DB accounts. The feature became publicly available in September 2019 and is available for all Cosmos DB APIs. With this feature, we can use Jupyter notebooks to run interactive queries and to explore and visualize our data. We can even build machine learning models on our data in Cosmos DB!

What do I get out of this?

There are plenty of awesome benefits that we get from having Jupyter notebooks in our Cosmos DB accounts. These include:

  • Cool data visualizations that we can generate on our Cosmos DB data.
  • Easier code sharing. Ever tried to share Jupyter Notebook code on GitHub? It works, but with notebooks in Cosmos DB it’s more interactive and we can display the results within the Azure portal.
  • Our code is more interactive and we can embed user controls within our notebooks.
  • We can combine different types of content (documentation, text, images, animations and so on) within one document.
  • We can execute magic Cosmos commands in our notebooks! More on that later.

Sounds cool, how can I get this feature?

Creating a new Azure Cosmos DB account with notebook support is really straightforward:

In the Azure Portal, click on Create a resource and choose Azure Cosmos DB. On the Create Azure Cosmos DB Account page, make sure you choose Notebooks. Click on Review + Create and then Create.

Once you’ve created your account, you will find your notebooks workspace in the Data Explorer pane.

But I already have a Cosmos DB account! Do I have to destroy my existing one to use notebooks?

Not at all! You can enable the notebooks feature by doing the following:
Head to the Data Explorer pane in your Cosmos DB account and select Enable Notebooks. Click on Complete Setup and your account will be enabled to use notebooks!

OK, I’m all set up! Show me a demo!

Now that we have notebooks in our Cosmos DB account, let’s start working with them! I’ve created a container in my Cosmos DB account and populated it with data by following this demo. (This demo is one in a series of demos provided by the Cosmos DB Engineering team, and if you want to deep dive into all things Cosmos DB, I’d highly recommend it!)

We can install and use packages in our Cosmos DB notebooks just like in any other Jupyter notebook. Let’s import pandas by running the following command:

import pandas as pd
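Importing a package only works if it is already installed in the notebook's environment. As a minimal sketch (the helper name is_installed is hypothetical, and whether a given package ships with the Cosmos DB notebook kernel is not something this snippet assumes), we can check with the standard library before deciding whether a pip install is needed:

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be imported in this kernel."""
    return importlib.util.find_spec(package_name) is not None


# Check a couple of top-level packages; "json" is part of the standard library.
for name in ("pandas", "json"):
    if is_installed(name):
        print(name, "is available")
    else:
        print(name, "is missing; install it with pip before importing")
```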

Click Run, and shortly afterwards pandas is available in our notebook. Now let’s use one of those magic Cosmos commands to create a pandas DataFrame that we can work with. I have a container called CustomCollection, which is partitioned by /type. I want to create a DataFrame containing all the items that have a type of PurchaseFoodOrBeverage. We can do that by executing the following command:

%%sql --database EntertainmentDatabase --container CustomCollection --output df_foodOrBeverage
SELECT c.id, c.unitPrice, c.totalPrice, c.quantity, c.type FROM c WHERE c.type = "PurchaseFoodOrBeverage"

In this query, I’ve selected only our own (POCO) properties. SELECT * would also include the Cosmos DB system-generated properties, so I’ve excluded them for now. We can now view our DataFrame by executing the following command:

df_foodOrBeverage.head(10)

We should now see the following result (your data may differ, since Faker generates random data):
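As an aside on the system-generated properties mentioned earlier: if we did run SELECT *, each item would also carry fields such as _rid, _self, _etag, _attachments and _ts. A minimal sketch of stripping them from an item in plain Python follows; the helper name strip_system_properties and the sample item values are made up for illustration.

```python
# Property names that Cosmos DB adds to every stored item.
SYSTEM_PROPERTIES = {"_rid", "_self", "_etag", "_attachments", "_ts"}


def strip_system_properties(item):
    """Return a copy of the item without the system-generated properties."""
    return {key: value for key, value in item.items()
            if key not in SYSTEM_PROPERTIES}


# Hypothetical item shaped like a SELECT * result; the values are invented.
raw_item = {
    "id": "example-id",
    "type": "PurchaseFoodOrBeverage",
    "quantity": 2,
    "unitPrice": 4.0,
    "totalPrice": 8.0,
    "_rid": "example-rid",
    "_self": "example-self-link",
    "_etag": "example-etag",
    "_attachments": "attachments/",
    "_ts": 1570000000,
}

clean_item = strip_system_properties(raw_item)
print(sorted(clean_item))
```

Selecting only the needed properties in the query, as done above, avoids transferring these fields in the first place.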

Let’s finish off this quick demo by doing some basic visualization. Run the following commands:

pd.options.display.html.table_schema = True
pd.options.display.max_rows = None
df_foodOrBeverage.groupby("quantity").size()

This produces an nteract data explorer, which lets us filter and visualize our DataFrames. To enable it, we set table_schema to True, and set max_rows either to the number of rows we want to display or to None to show all the results.
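To see what the groupby call computes without needing a Cosmos DB account, here is a small sketch on made-up rows. The DataFrame df_sample and its values are hypothetical stand-ins for df_foodOrBeverage; only the column names come from the query above.

```python
import pandas as pd

# Hypothetical sample rows standing in for df_foodOrBeverage.
df_sample = pd.DataFrame({
    "id": ["a", "b", "c", "d", "e"],
    "quantity": [1, 2, 2, 3, 2],
    "unitPrice": [2.5, 4.0, 4.0, 1.5, 3.0],
})
df_sample["totalPrice"] = df_sample["quantity"] * df_sample["unitPrice"]

# Number of orders for each quantity value (same call as in the demo).
counts = df_sample.groupby("quantity").size()

# Total spend for each quantity value, as one further aggregation.
revenue = df_sample.groupby("quantity")["totalPrice"].sum()

print(counts)
print(revenue)
```

size() counts rows per group, while selecting a column and calling sum() aggregates its values within each group.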

There’s quite a range of things we can do with notebooks, such as using the built-in Python SDK for Cosmos DB and uploading JSON files to a specific Cosmos DB container. Check out the documentation to see the full range of things we can do in Jupyter Notebooks for Cosmos DB!
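As a rough idea of what the Python SDK route could look like, here is a minimal sketch of the same query the %%sql magic ran in the demo. It assumes the azure-cosmos v4 package and its query_items method on a container client; the helper name fetch_items_by_type is hypothetical, and the endpoint and key in the comments are placeholders rather than real values.

```python
def fetch_items_by_type(container, item_type):
    """Return the selected properties of every item with the given type.

    `container` is assumed to be an azure-cosmos container client, or any
    object exposing a compatible query_items method.
    """
    query = (
        "SELECT c.id, c.unitPrice, c.totalPrice, c.quantity, c.type "
        "FROM c WHERE c.type = @type"
    )
    items = container.query_items(
        query=query,
        # Pass the value as a parameter instead of building the string by hand.
        parameters=[{"name": "@type", "value": item_type}],
        # The container is partitioned by /type, so the query can be scoped
        # to a single partition.
        partition_key=item_type,
    )
    return list(items)


# Assumed wiring, left as comments because it needs a real account:
# from azure.cosmos import CosmosClient
# client = CosmosClient("<account-endpoint>", credential="<account-key>")
# container = (client.get_database_client("EntertainmentDatabase")
#                    .get_container_client("CustomCollection"))
# rows = fetch_items_by_type(container, "PurchaseFoodOrBeverage")
# df = pd.DataFrame(rows)
```

Because the function only relies on query_items, it can be exercised with a stub object before pointing it at a real container.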

Conclusion

Hopefully you now have a basic idea of what you can do in Cosmos DB Jupyter Notebooks. If you have any questions, please feel free to ask in the comments!
