<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Kevin White</title>
    <description>The latest articles on DEV Community by Kevin White (@kevindeepnote).</description>
    <link>https://dev.to/kevindeepnote</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1018541%2Fcdf6ae76-454b-440c-a6e4-6dc987ea569e.png</url>
      <title>DEV Community: Kevin White</title>
      <link>https://dev.to/kevindeepnote</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kevindeepnote"/>
    <language>en</language>
    <item>
      <title>How to speed up pandas with NumPy methods</title>
      <dc:creator>Kevin White</dc:creator>
      <pubDate>Tue, 28 Feb 2023 17:00:58 +0000</pubDate>
      <link>https://dev.to/deepnote/how-to-speed-up-pandas-with-numpy-methods-34pa</link>
      <guid>https://dev.to/deepnote/how-to-speed-up-pandas-with-numpy-methods-34pa</guid>
      <description>&lt;p&gt;You'd be hard-pressed to find a data scientist who doesn't use pandas for their day-to-day work, but sometimes it pays to go from pandas to NumPy.&lt;/p&gt;

&lt;p&gt;Pandas boasts great accessibility and a wide range of tools for small and medium-scale projects, but not all pandas methods are vectorized, so they miss out on the runtime savings vectorization provides. Take the &lt;code&gt;apply()&lt;/code&gt; method, for instance. It's essentially a glorified Python "for" loop, and it comes with severe speed constraints, especially on large data sets.&lt;/p&gt;

&lt;p&gt;On the bright side, you can speed up many pandas methods by pulling from NumPy. Let's look at how to speed up pandas with NumPy methods using a data notebook.&lt;/p&gt;

&lt;h2&gt;Use cases for using NumPy methods to speed up pandas&lt;/h2&gt;

&lt;p&gt;Runtime optimization is crucial in a business context. It directly impacts the performance of programs --- especially bigger, more complex ones.&lt;/p&gt;

&lt;p&gt;Just consider the importance of customer experience. If your company's application is powered by a program that runs faster and more efficiently, your end users are bound to be more satisfied. Hence, increased customer loyalty and lifetime value. On the other hand, a sluggish app runs the risk of sending customers to your competitors.&lt;/p&gt;

&lt;p&gt;There are also cost savings and scalability to consider. Optimizing the runtime of a program means it will require fewer resources to execute, helping to reduce costs. And as applications grow in size and complexity, runtime optimization becomes even more critical to making sure performance doesn't suffer.&lt;/p&gt;

&lt;p&gt;The issue with pandas is that although it supports vectorization, some of its operations don't use it. This is true for the &lt;code&gt;apply()&lt;/code&gt; method and for custom Python functions passed to &lt;code&gt;groupby()&lt;/code&gt;. In both cases, execution falls back to a native Python "for" loop, which slows pandas down.&lt;/p&gt;

&lt;p&gt;But NumPy can help improve the performance of pandas in several ways. For instance, if you're performing numerical operations, NumPy offers a suite of numerical functions, including element-wise operations and linear algebra. By using these functions instead of equivalent pandas methods, you can get a performance boost.&lt;/p&gt;

&lt;p&gt;Plus, since NumPy provides a high-performance array data structure that is optimized for numerical computations, you can often achieve faster computations when compared to pandas DataFrames. There's also the fact that NumPy arrays are stored in a contiguous block of memory, making them slightly more memory-efficient than DataFrames, which store data in a more complex structure and form.&lt;/p&gt;
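&lt;p&gt;As a rough sketch of the difference (using a hypothetical three-row DataFrame, not the article's data set), the same element-wise operation can run on a pandas Series or on the contiguous NumPy array underneath it, retrieved with &lt;code&gt;to_numpy()&lt;/code&gt;:&lt;/p&gt;

```python
import numpy as np
import pandas as pd

# Hypothetical mini-example: apply a 10% markup to a column of prices
df = pd.DataFrame({"price": [10.0, 20.0, 30.0]})

# Element-wise arithmetic on the pandas Series
series_result = df["price"] * 1.1

# The same operation on the underlying contiguous NumPy array
array_result = df["price"].to_numpy() * 1.1
```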

&lt;h2&gt;Using NumPy methods to speed up pandas in a data notebook&lt;/h2&gt;

&lt;p&gt;Pandas and NumPy work in both interactive Python environments and standalone Python scripts, so you can apply these runtime optimizations in either setting.&lt;/p&gt;

&lt;p&gt;For this tutorial, we'll explore how to go from pandas to NumPy methods in a Python data notebook.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;👉 Keep in mind that you can skip the setup process below if you're using a Deepnote notebook. Deepnote comes complete with the most popular &lt;a href="https://deepnote.com/docs/pre-installed-packages"&gt;data analytics and machine learning Python libraries&lt;/a&gt; already installed, so you can seamlessly import them into your project.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;Installing pandas &amp;amp; NumPy&lt;/h3&gt;

&lt;p&gt;To install pandas and NumPy in a Jupyter notebook, run the following command:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--zotQS9gH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/18enskv7cqfev4xdosjy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--zotQS9gH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/18enskv7cqfev4xdosjy.png" alt="Image description" width="880" height="133"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;Importing pandas &amp;amp; NumPy&lt;/h3&gt;

&lt;p&gt;Next, to use their methods, we must import them. The commonly used alias for pandas is &lt;code&gt;pd&lt;/code&gt; and for NumPy it's &lt;code&gt;np&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--mfRS_oov--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ptml461aek2jfkkjid9i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--mfRS_oov--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ptml461aek2jfkkjid9i.png" alt="Image description" width="880" height="130"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Pandas vs. NumPy runtime comparison&lt;/h2&gt;

&lt;p&gt;First, let's create a dummy data set for experimentation:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--AGRurUoX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/goc4nsx075iezz3t2f1e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--AGRurUoX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/goc4nsx075iezz3t2f1e.png" alt="Image description" width="880" height="130"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, let's create a NumPy array with two columns and 10⁸ rows. From this array, we create a pandas DataFrame using the &lt;code&gt;pd.DataFrame&lt;/code&gt; constructor.&lt;/p&gt;

&lt;p&gt;The first five rows are shown below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--yw_Bztoj--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/xq2y5pa2vurkf1kealy1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--yw_Bztoj--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/xq2y5pa2vurkf1kealy1.png" alt="Image description" width="880" height="295"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;Pandas apply() vs. NumPy where() for conditional columns&lt;/h3&gt;

&lt;p&gt;We mostly use the &lt;code&gt;apply()&lt;/code&gt; method to create conditional columns in pandas. The implementation is shown below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--WFPraJsD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/hao2bmz8sa61aehaguux.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--WFPraJsD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/hao2bmz8sa61aehaguux.png" alt="Image description" width="880" height="200"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We label each entry "Class A" if it is less than 45 and "Class B" otherwise.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--9kD6Lujo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/gan3v63tw80b206us8ym.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--9kD6Lujo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/gan3v63tw80b206us8ym.png" alt="Image description" width="880" height="134"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As discussed above, the &lt;code&gt;apply()&lt;/code&gt; method is basically a glorified "for" loop. As a result, it misses the whole point of vectorization.&lt;/p&gt;

&lt;p&gt;Using NumPy's &lt;a href="https://numpy.org/doc/stable/reference/generated/numpy.where.html"&gt;&lt;code&gt;where()&lt;/code&gt;&lt;/a&gt; method, we can create conditional columns while also having vectorization in place.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--yVUwYkNt--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/zawb2btyuvpf9geqbspg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--yVUwYkNt--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/zawb2btyuvpf9geqbspg.png" alt="Image description" width="880" height="200"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With this method, the condition is passed as the first argument, followed by the value to use where the condition is True (second argument) and where it is False (third argument). As you can see above, the &lt;code&gt;np.where()&lt;/code&gt; method is approximately five times faster.&lt;/p&gt;
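&lt;p&gt;Here is a minimal sketch of the two approaches side by side, assuming a placeholder column name and the article's threshold of 45:&lt;/p&gt;

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)
df = pd.DataFrame(rng.uniform(1, 100, size=(10_000, 2)), columns=["col1", "col2"])

# apply(): a Python-level loop over every entry of the column
df["class_apply"] = df["col1"].apply(lambda x: "Class A" if x < 45 else "Class B")

# np.where(): condition, value where True, value where False
df["class_where"] = np.where(df["col1"] < 45, "Class A", "Class B")
```

Both columns come out identical; the difference is purely in how the work is executed under the hood.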

&lt;h3&gt;Pandas apply() vs. NumPy select() for conditional columns&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;np.where()&lt;/code&gt; method allows us to choose between two results (i.e., it's for binary conditions). If there are multiple conditions, &lt;a href="https://numpy.org/doc/stable/reference/generated/numpy.select.html"&gt;&lt;code&gt;np.select()&lt;/code&gt;&lt;/a&gt; should be used.&lt;/p&gt;

&lt;p&gt;Suppose we have the following function:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--S9BfJmir--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/bnyf6rcq32uxksdenybf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--S9BfJmir--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/bnyf6rcq32uxksdenybf.png" alt="Image description" width="880" height="246"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With the &lt;code&gt;apply()&lt;/code&gt; method, we get the following runtime:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--b-WDntF9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/tktrczqsf9g6l0mrwrxf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--b-WDntF9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/tktrczqsf9g6l0mrwrxf.png" alt="Image description" width="880" height="130"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But if we use &lt;code&gt;np.select()&lt;/code&gt;, we get:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--j178oax6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ybn9q5fm9bmnn42j53dd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--j178oax6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ybn9q5fm9bmnn42j53dd.png" alt="Image description" width="880" height="223"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The conditions and the corresponding results are passed as the first two arguments. The last argument is the default result.&lt;/p&gt;

&lt;p&gt;You can see that the &lt;code&gt;np.select()&lt;/code&gt; method is five times faster than the &lt;code&gt;apply()&lt;/code&gt; method. This is because it's using vectorization, while the &lt;code&gt;apply()&lt;/code&gt; method is looping through the column to create a new column based on the conditions.&lt;/p&gt;
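&lt;p&gt;As a sketch of the &lt;code&gt;np.select()&lt;/code&gt; pattern (the article's exact function is only shown as a screenshot, so the thresholds and labels below are hypothetical):&lt;/p&gt;

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)
df = pd.DataFrame(rng.uniform(1, 100, size=(10_000, 2)), columns=["col1", "col2"])

# Hypothetical multi-way banding; conditions are evaluated in order,
# and the first matching condition wins
conditions = [df["col1"] < 25, df["col1"] < 50, df["col1"] < 75]
choices = ["Class A", "Class B", "Class C"]

# Rows matching no condition fall through to the default
df["class"] = np.select(conditions, choices, default="Class D")
```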

&lt;h3&gt;Pandas sort_values() vs. NumPy sort()&lt;/h3&gt;

&lt;p&gt;Sorting is a common operation to arrange data in a specific order. In pandas, we use the &lt;code&gt;df.sort_values()&lt;/code&gt; method to sort a DataFrame.&lt;/p&gt;

&lt;p&gt;With the earlier DataFrame, we get the following runtime:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--zCsYpkpx--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/lp9scleen6r2689qzy28.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--zCsYpkpx--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/lp9scleen6r2689qzy28.png" alt="Image description" width="880" height="128"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But NumPy also provides a sorting method. More specifically, it's the &lt;code&gt;np.sort()&lt;/code&gt; method:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--npWC_qr4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/xmklh1xv7iln3n2z4idl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--npWC_qr4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/xmklh1xv7iln3n2z4idl.png" alt="Image description" width="880" height="297"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Using NumPy, we first convert the column we want to sort into a NumPy array. Next, we sort the obtained array. Lastly, we create a new DataFrame from the sorted array and reorder the entries of the other column to match.&lt;/p&gt;

&lt;p&gt;The NumPy approach is roughly 33% faster than pandas.&lt;/p&gt;
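&lt;p&gt;One way to implement the steps described above is with &lt;code&gt;argsort()&lt;/code&gt;, which yields the sorting order so the second column can be reordered to match (column names here are placeholders):&lt;/p&gt;

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)
df = pd.DataFrame(rng.uniform(1, 100, size=(10_000, 2)), columns=["col1", "col2"])

# Pandas route
sorted_pd = df.sort_values("col1")

# NumPy route: argsort the target column, then reorder both columns
order = df["col1"].to_numpy().argsort()
sorted_np = pd.DataFrame({
    "col1": df["col1"].to_numpy()[order],
    "col2": df["col2"].to_numpy()[order],
})
```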

&lt;h2&gt;Best practices for runtime optimization with pandas &amp;amp; NumPy&lt;/h2&gt;

&lt;p&gt;Optimization is important, but it's also important to proceed with caution.&lt;/p&gt;

&lt;p&gt;It's always a good practice to profile your code first so you're aware of the bottlenecks. If you optimize your code before profiling it, you may end up optimizing the wrong parts, making it even slower.&lt;/p&gt;
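&lt;p&gt;A minimal way to profile a suspected hotspot is the standard library's &lt;code&gt;timeit&lt;/code&gt; module (in a notebook, you'd typically reach for the &lt;code&gt;%timeit&lt;/code&gt; magic instead). The data and threshold below are placeholders:&lt;/p&gt;

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.random.default_rng(seed=0).uniform(1, 100, size=(10_000, 2)),
    columns=["col1", "col2"],
)

# Measure before optimizing: is apply() actually a bottleneck here?
t_apply = timeit.timeit(
    lambda: df["col1"].apply(lambda x: "A" if x < 45 else "B"), number=20
)
t_where = timeit.timeit(lambda: np.where(df["col1"] < 45, "A", "B"), number=20)

print(f"apply(): {t_apply:.4f}s   np.where(): {t_where:.4f}s")
```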

&lt;p&gt;The biggest bottlenecks in pandas usually come from looping through a DataFrame. Thus, look for parts that can be replaced with vectorization.&lt;/p&gt;

&lt;p&gt;Additionally, getting support from external libraries can offer many benefits as well. They're often optimized for performance and can be faster than custom implementations.&lt;/p&gt;

&lt;p&gt;Lastly, ensure that you don't compromise the readability of your code for optimization. Optimizing your code can sometimes make it less readable, which makes it harder for other people to understand and maintain it.&lt;/p&gt;

&lt;p&gt;In general, you should only move forward with optimization if you have evidence from profiling that your code is slow and that optimization will lead to a significant performance boost.&lt;/p&gt;

&lt;h2&gt;Combine pandas &amp;amp; NumPy with Deepnote&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://deepnote.com/sign-up"&gt;Get started for free&lt;/a&gt; to explore, collaborate on, and share your data.&lt;/p&gt;

</description>
      <category>pandas</category>
      <category>numpy</category>
      <category>python</category>
      <category>jupyter</category>
    </item>
    <item>
      <title>The good, the bad, &amp; the ugly: how to share Jupyter notebooks</title>
      <dc:creator>Kevin White</dc:creator>
      <pubDate>Wed, 22 Feb 2023 13:53:18 +0000</pubDate>
      <link>https://dev.to/deepnote/the-good-the-bad-the-ugly-how-to-share-jupyter-notebooks-k6d</link>
      <guid>https://dev.to/deepnote/the-good-the-bad-the-ugly-how-to-share-jupyter-notebooks-k6d</guid>
      <description>&lt;p&gt;Why do data teams need to share Jupyter notebooks? Let us count the ways.&lt;/p&gt;

&lt;p&gt;Code reviews, team presentations, delivering insights to non-programmers, general hotdogging (we all need to show off now and then) --- the list goes on and on.&lt;/p&gt;

&lt;p&gt;But &lt;a href="https://deepnote.com/blog/jupyter-notebook-alternative"&gt;sharing a Jupyter notebook is problematic&lt;/a&gt; (i.e., painful). You end up jumping through annoying, time-consuming hoops just to share the results of your analysis. And the workarounds you use often defeat the purpose of sharing your notebook in the first place.&lt;/p&gt;

&lt;p&gt;As with most things in data work, there's a good, bad, and ugly way of getting it done. Let's look at the different ways you can share a Jupyter notebook and weigh the advantages and disadvantages of each approach.&lt;/p&gt;

&lt;h2&gt;What it means to share&lt;/h2&gt;

&lt;p&gt;Before we start comparing and contrasting different Jupyter notebook sharing options, let's be clear on what we mean by "share." There's a whole spectrum of sharing, ranging from "I literally just want your eyeballs on this output" to "Go ahead and edit my code."&lt;/p&gt;

&lt;p&gt;Sometimes you want people to run your notebook. Maybe you even want them to be able to toggle a few dropdown menus so they can actually explore the data. Other times it's more of a look-but-don't-touch situation.&lt;/p&gt;

&lt;p&gt;At the end of the day, we share to collaborate. And successful data collaboration is dependent on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Speed (our ability to share work quickly)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Reproducibility (our ability to duplicate it)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Interactivity (our ability to work together on it)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those are the criteria we'll use to evaluate different sharing options. So, without further ado, here are your choices for sharing a data notebook.&lt;/p&gt;

&lt;h2&gt;Ugly: the file option&lt;/h2&gt;

&lt;p&gt;Downloading and emailing IPYNB files --- the bane of many a data professional's existence.&lt;/p&gt;

&lt;p&gt;Since Jupyter files run on your local machine, you can't simply send someone a link to your notebook. Instead, you have to download the file (which takes way too long) and send it off to a teammate so they can fire it up on their machine (which also takes way too long).&lt;/p&gt;

&lt;p&gt;Unfortunately, none of your assets are included. Maybe you add your database password to help speed things along (the exact kind of behavior that keeps your security team up at night), but now your colleague has to set up their environment from scratch. Only after the necessary Python packages have been installed and the environment configuration is complete can they run your notebook.&lt;/p&gt;

&lt;p&gt;It's not exactly what you'd call high-speed --- but it's better than exporting your notebook as a static file or copying and pasting screenshots. With those options, reproducibility and interactivity go right out the window.&lt;/p&gt;

&lt;p&gt;Say you need to share your notebook with a non-technical stakeholder --- they're not interested in running it, just getting to the insights you uncovered. You download it as a PDF and send it over. Then they spot a mistake that needs to be fixed. Or they have a follow-up request not covered in your analysis. Or your results lead to another question that needs to be answered.&lt;/p&gt;

&lt;p&gt;No matter the issue, you'll have to go back to your notebook, rerun it, re-export it, resend it, and repeat until the stakeholder is satisfied. On and on and on. You (and your teammates) get stuck in a never-ending loop of busy work that makes exploring and collaborating on data a grind.&lt;/p&gt;

&lt;h2&gt;Bad: the view option&lt;/h2&gt;

&lt;p&gt;The challenge of sharing a Jupyter notebook is nothing new --- that's why there's a cottage industry built around making it easier to view them.&lt;/p&gt;

&lt;p&gt;GitHub repositories are a great way to organize static data notebooks and make them accessible to teammates, but therein lies the rub: They're static. Notebooks are rendered on GitHub, not run. Reproducing the work, collaborating in real time, commenting --- none of it is an option. Ditto for tools like nbviewer.&lt;/p&gt;

&lt;p&gt;There's also Binder, which claims to help you "turn a Git repo into a collection of interactive notebooks." But "interactive" is a stretch. Once you get past the time it takes to load a repository and the custom libraries you need, you realize notebooks are displayed in isolated environments.&lt;/p&gt;

&lt;p&gt;After you close that URL, the notebook goes poof. You may feel like you're sharing the real deal, but as soon as you want to actually do anything collaborative, it instantly falls apart. Tools like this are good for quickly reproducing a notebook, but interactivity is ultimately an illusion.&lt;/p&gt;

&lt;h2&gt;Good: the cloud option&lt;/h2&gt;

&lt;p&gt;Cloud-based technology allows you to share fully executable notebooks with a link --- no muss, no fuss.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--gPashRIj--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://media.graphassets.com/UgtSogJwQ6DMPU1BFOJt" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--gPashRIj--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://media.graphassets.com/UgtSogJwQ6DMPU1BFOJt" alt="invite-members.png" title="invite-members.png" width="880" height="496"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This isn't the same as JupyterHub, which is a DIY option that requires organizations to install and manage their own Jupyter notebook servers (i.e., you have to manage everything and deal with computing power and storage limits). Cloud-based notebooks are hosted for you, giving you an easy and scalable way to quickly share and reproduce projects.&lt;/p&gt;

&lt;p&gt;But keep in mind that not all cloud-based notebooks are created equal --- the ability to easily share your notebook doesn't always mean you can collaborate on it. Take Google Colab, for instance. You and your teammates can't share the same execution environment simultaneously or leave comments for each other. Each time a person makes an edit and saves the notebook, it reverts to their copy and destroys whatever their colleagues were working on.&lt;/p&gt;

&lt;p&gt;Then there's the question of permissions. Not all cloud-based notebooks have the same level of granularity when it comes to who can access notebooks and how they can use them (e.g., run a notebook but not alter the code).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--I05W8QD0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://media.graphassets.com/aazt9Q1rTF2OakFwOOsq" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--I05W8QD0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://media.graphassets.com/aazt9Q1rTF2OakFwOOsq" alt="access-controls.png" title="access-controls.png" width="880" height="497"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Your best bet is a cloud-based data notebook that's truly collaborative by design (hint: That's what we made).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--DCEJyZYk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://media.graphassets.com/poNLmd1T0Em2TRK0sOAJ" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--DCEJyZYk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://media.graphassets.com/poNLmd1T0Em2TRK0sOAJ" alt="sql_python.png" title="sql_python.png" width="880" height="498"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It looks like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Sharing the same environment with collaborators at the same time, complete with database connections and environment configuration&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Editing code with collaborators in real time and leaving comments for each other&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Assigning granular access levels to collaborators, from view-only to full code access and everything in between&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Giving collaborators a shared workspace where they can easily store, organize, and find their teammates' notebooks to view, work on, or duplicate a project&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Publishing shareable notebooks as articles, dashboards, and interactive apps with just a click to make sharing insights with stakeholders that much easier&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Sharing data notebooks seems like such a simple task, but the truth is it's as complicated as any machine learning model. Combining shared environments and access controls --- and spinning it all up in a package that's fast and accessible to people of all technical levels --- is not easy. Some sharing options do a passable imitation, but the devil's in the details.&lt;/p&gt;

&lt;p&gt;Data collaboration is a computational puzzle every team has to solve on its own. Next time you need to share a Jupyter notebook, consider a solution that's built for real teamwork.&lt;/p&gt;

&lt;h2&gt;&lt;strong&gt;Simplify sharing Jupyter notebooks with Deepnote&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://deepnote.com/sign-in"&gt;Get started for free&lt;/a&gt; to see how easy it is to share and collaborate on data notebooks.&lt;/p&gt;

</description>
      <category>jupyter</category>
      <category>notebooks</category>
      <category>collaboration</category>
    </item>
    <item>
      <title>3 data smells that mean you need a Jupyter notebook alternative</title>
      <dc:creator>Kevin White</dc:creator>
      <pubDate>Mon, 06 Feb 2023 18:39:24 +0000</pubDate>
      <link>https://dev.to/deepnote/3-data-smells-that-mean-you-need-a-jupyter-notebook-alternative-4llj</link>
      <guid>https://dev.to/deepnote/3-data-smells-that-mean-you-need-a-jupyter-notebook-alternative-4llj</guid>
      <description>&lt;p&gt;Jupyter notebooks are one of the world's most popular interactive computing tools (approximately &lt;a href="https://github.com/parente/nbestimate/blob/master/estimate.ipynb" rel="noopener noreferrer"&gt;10 million Jupyter notebooks on GitHub&lt;/a&gt; and counting), but that hasn't stopped businesses from seeking out Jupyter notebook alternatives.&lt;/p&gt;

&lt;p&gt;Why? Because despite Jupyter's omnipresence in the data science community, it simply isn't always the right tool for the job --- especially for data teams.&lt;/p&gt;

&lt;p&gt;We're all familiar with code smells, those pesky warning signs that all is not right with your source code. Let's look at three common circumstances --- what we call data smells --- that indicate it's time to seek out a replacement for your Jupyter notebook.&lt;/p&gt;

&lt;h2&gt;1. Your team wastes valuable time trying to reproduce analyses&lt;/h2&gt;

&lt;p&gt;The results of your analysis don't count if they can't be replicated, but that's not easy with a Jupyter notebook.&lt;/p&gt;

&lt;p&gt;Essential elements of your analysis, such as the environment and required files, are typically bound to your local computer. For your team to reliably reproduce it, they'll need everything --- not just the notebook file itself.&lt;/p&gt;

&lt;p&gt;Team members will have to tediously configure their environments to match yours and make sure they have exactly the same assets if they want to reproduce your analysis.&lt;/p&gt;

&lt;p&gt;Ditto for getting access to the same data sources. That process usually goes something like this: Your teammate hunts down the proper credentials, searches online for how to connect their notebook to a specific data source, reads through the documentation, installs the necessary Python packages, works through the command line, and so on.&lt;/p&gt;

&lt;p&gt;And when it's time for a colleague to duplicate the same setup? Back to step one. Those tight feedback loops you're after remain out of reach with out-of-the-box Jupyter.&lt;/p&gt;

&lt;p&gt;But that's not the case with today's cloud-based notebooks. These Jupyter notebook alternatives allow data teams to share the same execution environment, file system, and data connection simultaneously for real-time collaboration.&lt;/p&gt;

&lt;p&gt;The ability to &lt;a href="https://deepnote.com/blog/how-webflow-optimizes-data-workflows" rel="noopener noreferrer"&gt;run reproducible environments in the cloud&lt;/a&gt; is enough to turn skeptics into advocates.&lt;/p&gt;

&lt;p&gt;"I was personally skeptical about the performance of a collaborative, hosted platform, but when I used Deepnote to collaborate with a colleague and we were able to remotely troubleshoot and try different plots, I immediately saw its value," said Webflow's Senior Manager of Data Science &amp;amp; Analytics Allie Russell.&lt;/p&gt;

&lt;p&gt;Team members can connect to any data source, work in their browser, and&lt;a href="https://deepnote.com/customers/gusto" rel="noopener noreferrer"&gt; share connections and environments&lt;/a&gt; with teammates who have been granted the proper user permissions. It's as easy as adding your credentials or API keys, which are then encrypted and securely stored. What was once a frustratingly convoluted process now only takes a few clicks.&lt;/p&gt;

&lt;p&gt;"We'd had a lot of technical issues when trying to pair up on other notebooks during remote interviews," said Gusto's Product Analytics Lead Becca Carter. "Deepnote was incredibly easy to set up and allows us to start new notebooks in seconds."&lt;/p&gt;

&lt;h2&gt;2. Your team resorts to screenshots and PDFs to communicate findings&lt;/h2&gt;

&lt;p&gt;Data teams are tasked with uncovering business insights, validating them with their peers, and making them actionable for business stakeholders.&lt;/p&gt;

&lt;p&gt;In other words: Data collaboration is critical.&lt;/p&gt;

&lt;p&gt;But sharing insights isn't so straightforward with Jupyter notebooks. You're usually forced to grab screenshots and port them to a document or download notebook files as static PDFs. These quickly become outdated, and the whole process must be repeated any time there's a follow-up question or data refresh.&lt;/p&gt;

&lt;p&gt;But modern data notebooks can be shared via email invitation or simply by sharing a link --- no different from Google Docs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia.graphassets.com%2Fp7Xx6G6eTdWv8yKvSreP" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia.graphassets.com%2Fp7Xx6G6eTdWv8yKvSreP" title="share-project.png" alt="share-project.png" width="800" height="441"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This helps&lt;a href="https://deepnote.com/customers/vantai" rel="noopener noreferrer"&gt; support faster, more collaborative iteration&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;"Working in Deepnote is like code review and rapid prototyping at the same time, saving valuable time in the iteration cycles," said VantAI CTO Luca Naef. "But as opposed to code review via GitHub, you have direct access to the runtime and program state, which makes understanding complex models much easier and leads to much more spontaneous creative ideas."&lt;/p&gt;

&lt;p&gt;And when it's time to share your analysis with non-technical teammates, you can &lt;a href="https://deepnote.com/explore" rel="noopener noreferrer"&gt;publish notebooks as code-free articles or interactive applications&lt;/a&gt;. You and your teammates can tag one another and leave each other comments to collaborate in real time or asynchronously.&lt;/p&gt;

&lt;p&gt;It's this ability to better&lt;a href="https://deepnote.com/blog/how-slido-brings-a-product-mindset-to-analytics-engineering" rel="noopener noreferrer"&gt; facilitate collaboration with business stakeholders&lt;/a&gt; that drives many companies to look beyond Jupyter notebooks.&lt;/p&gt;

&lt;p&gt;"Since metrics require a lot of input from subject matter experts, data consumers, and business stakeholders to define and align on definitions, we needed a collaborative layer where we could get immediate feedback," said Slido's Head of Analytics Engineering Michal Koláček.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Your team struggles to work together on interrelated projects
&lt;/h2&gt;

&lt;p&gt;Data notebooks are synonymous with &lt;a href="https://deepnote.com/blog/exploratory-programming" rel="noopener noreferrer"&gt;exploratory programming&lt;/a&gt;, but quick experiments and prototypes shouldn't be used once and then forgotten. Projects often grow over time, changing hands and evolving alongside the needs of the business. This makes discoverability key.&lt;/p&gt;

&lt;p&gt;But Jupyter notebooks create silos. There's no single place where your analysis is stored, organized, version-controlled, and made easily accessible to the company.&lt;/p&gt;

&lt;p&gt;This is another area where modern notebooks excel. Teams can create workspaces where data science and analytics teams can share analysis (and narratives) with both technical team members and business stakeholders. Notebooks, articles, and applications can all be organized into a customizable folder structure that's tailored to how different teams and companies operate.&lt;/p&gt;

&lt;p&gt;These workspaces act as searchable databases that scale as teams and their projects grow, complete with granular permissions that dictate who can access which projects and what actions they're allowed to take, from viewing to commenting to editing. A project's entire history can be tracked and reviewed, and older versions of notebooks can be previewed and instantly restored.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia.graphassets.com%2F2HxdAqrXRoaUW1Ov87WJ" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia.graphassets.com%2F2HxdAqrXRoaUW1Ov87WJ" title="version-history.png" alt="version-history.png" width="800" height="347"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This allows data teams to ensure sensitive information stays secure while democratizing data access for both technical and non-technical teams. That's why one Deepnote customer --- a hedge fund with a large data science team --- sought out a Jupyter notebook alternative that would&lt;a href="https://deepnote.com/customers/hedge-fund" rel="noopener noreferrer"&gt; act as a living knowledge base&lt;/a&gt;, not a siloed tool.&lt;/p&gt;

&lt;p&gt;"Notebooks are often used as a quick prototyping tool, but we don't want to create one-off work," the customer said. "We want to invest in ideas that compound over time. Deepnote gives our team one place to create, store ideas, and build on top of the work of others. Visibility goes up over time."&lt;/p&gt;

&lt;p&gt;These data smells aren't deal-breakers for everyone. Plenty of Jupyter notebook users aren't operating in a traditional business environment. Perhaps your work is strictly solo, which makes collaboration challenges a moot point. Or maybe you see the technical limitations as a fair tradeoff for a free tool.&lt;/p&gt;

&lt;p&gt;But every busted workflow and lost second counts for business users. If you work on a team where time is of the essence and collaboration is the norm, consider these smells your cue to find a Jupyter notebook alternative.&lt;/p&gt;

&lt;h2&gt;
  
  
  Explore Deepnote as a Jupyter notebook alternative
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://deepnote.com/sign-up" rel="noopener noreferrer"&gt;Get started for free&lt;/a&gt; to see how Deepnote compares to a Jupyter notebook.&lt;/p&gt;

</description>
      <category>announcement</category>
      <category>devto</category>
      <category>web3</category>
      <category>blockchain</category>
    </item>
    <item>
      <title>How to query pandas DataFrames with SQL</title>
      <dc:creator>Kevin White</dc:creator>
      <pubDate>Wed, 01 Feb 2023 20:53:51 +0000</pubDate>
      <link>https://dev.to/deepnote/how-to-query-pandas-dataframes-with-sql-2m3o</link>
      <guid>https://dev.to/deepnote/how-to-query-pandas-dataframes-with-sql-2m3o</guid>
      <description>&lt;p&gt;&lt;a href="https://pandas.pydata.org/" rel="noopener noreferrer"&gt;Pandas&lt;/a&gt; is a go-to tool for tabular data management, processing, and analysis in Python, but sometimes you may want to go from pandas to SQL.&lt;/p&gt;

&lt;p&gt;Why? Perhaps you find pandas' syntax intimidating and less intuitive than SQL, which is more beginner-friendly. Or maybe you miss some of the functionalities SQL offers.&lt;/p&gt;

&lt;p&gt;The good news is you can work in Python and still use SQL on a tabular pandas DataFrame.&lt;/p&gt;

&lt;p&gt;Let's look at how to query a pandas DataFrame with SQL using either a Jupyter notebook or Deepnote.&lt;/p&gt;

&lt;h2&gt;
  
  
  Use cases for using SQL with pandas
&lt;/h2&gt;

&lt;p&gt;Pandas comes with many complex tabular data operations. And, since it exists in a Python environment, it can be coupled with lots of other powerful libraries, such as &lt;a href="https://pypi.org/project/requests/" rel="noopener noreferrer"&gt;Requests&lt;/a&gt; (for connecting to other APIs), &lt;a href="https://matplotlib.org/" rel="noopener noreferrer"&gt;Matplotlib&lt;/a&gt; (for plotting data), &lt;a href="https://keras.io/" rel="noopener noreferrer"&gt;Keras&lt;/a&gt; (for training machine learning models), and many more.&lt;/p&gt;

&lt;p&gt;But when it comes to talking to databases, it's not always as powerful as SQL.&lt;/p&gt;

&lt;p&gt;For example, an enterprise may have a massive database where many tables need to be joined together based on different conditions. Conditional (non-equi) joins like these aren't directly supported by pandas' &lt;code&gt;merge()&lt;/code&gt;; you'd typically have to cross-join and filter instead.&lt;/p&gt;

&lt;p&gt;Additionally, SQL allows you to enforce different data constraints --- such as unique, not null, primary key, etc. --- to make sure your data is consistent and accurate. You can't do that with pandas.&lt;/p&gt;
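&lt;p&gt;As a quick sketch of this point (using Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module and a hypothetical &lt;code&gt;users&lt;/code&gt; table), the database itself rejects any row that violates a declared constraint, something pandas won't enforce for you:&lt;/p&gt;

```python
import sqlite3

# Hypothetical users table with NOT NULL and UNIQUE constraints
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE
    )
""")
conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")

try:
    # A duplicate email violates the UNIQUE constraint
    conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```

&lt;p&gt;The second insert raises &lt;code&gt;sqlite3.IntegrityError&lt;/code&gt;, so inconsistent data never reaches the table.&lt;/p&gt;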

&lt;p&gt;And if you need to perform advanced aggregations of data (e.g., calculating running totals, moving averages, and percentiles), it can be challenging to write solutions for these in pandas.&lt;/p&gt;
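&lt;p&gt;For instance, a running total is a one-line window function in SQL. A minimal sketch, assuming a small made-up DataFrame copied into an in-memory SQLite database:&lt;/p&gt;

```python
import sqlite3

import pandas as pd

df = pd.DataFrame({"colA": ["A", "B", "A"], "colB": [1, 3, 5]})
conn = sqlite3.connect(":memory:")
df.to_sql("data", conn, index=False)

# SUM(...) OVER (ORDER BY ...) computes the running total
running = pd.read_sql(
    "SELECT colA, colB, "
    "SUM(colB) OVER (ORDER BY colB) AS running_total "
    "FROM data ORDER BY colB",
    conn,
)
print(running["running_total"].tolist())  # [1, 4, 9]
```

&lt;p&gt;The pandas equivalent here is a simple &lt;code&gt;cumsum()&lt;/code&gt;, but more elaborate windows (partitioned moving averages, percentiles) get verbose in pandas quickly.&lt;/p&gt;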

&lt;h2&gt;
  
  
  Using SQLAlchemy to query pandas DataFrames in a Jupyter notebook
&lt;/h2&gt;

&lt;p&gt;There are multiple ways to run SQL queries in a Jupyter notebook, but this tutorial will focus on using &lt;a href="https://www.sqlalchemy.org/" rel="noopener noreferrer"&gt;SQLAlchemy&lt;/a&gt; --- a Python library that provides an API for connecting to and interacting with different relational databases, including SQLite, MySQL, and PostgreSQL.&lt;/p&gt;

&lt;p&gt;Using SQLAlchemy, you can retrieve, manipulate, and analyze large sets of data using SQL syntax directly from a Jupyter notebook.&lt;/p&gt;

&lt;p&gt;Other options include the &lt;a href="https://pypi.org/project/pandasql/" rel="noopener noreferrer"&gt;PandaSQL&lt;/a&gt; library, but it's lacking in recent updates and &lt;a href="https://towardsdatascience.com/the-downsides-of-pandasql-that-no-one-talks-about-9b63c664bef4" rel="noopener noreferrer"&gt;has some significant run-time issues&lt;/a&gt; that make it less than ideal. There's also the &lt;a href="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.query.html" rel="noopener noreferrer"&gt;.query()&lt;/a&gt; method, but this is mostly limited to filtering DataFrames and doesn't provide the exact SQL syntax you may want.&lt;/p&gt;
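&lt;p&gt;To see that limitation, here's a minimal sketch of &lt;code&gt;.query()&lt;/code&gt; on a made-up DataFrame: it covers SQL-style row filtering, but little beyond that:&lt;/p&gt;

```python
import pandas as pd

df = pd.DataFrame({"colA": ["A", "B", "A"], "colB": [1, 3, 5]})

# .query() handles simple SQL-like filters...
subset = df.query("colA == 'A' and colB > 1")
print(subset["colB"].tolist())  # [5]

# ...but joins, aggregations, and window functions still
# require regular pandas method calls.
```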

&lt;p&gt;Compared to these options, SQLAlchemy is more straightforward and intuitive. The workflow below runs in an IPython environment, so all you need to get started is a Jupyter notebook.&lt;/p&gt;

&lt;h3&gt;
  
  
Installing SQLAlchemy
&lt;/h3&gt;

&lt;p&gt;To install SQLAlchemy, plus the &lt;code&gt;ipython-sql&lt;/code&gt; package that provides the &lt;code&gt;%sql&lt;/code&gt; magic used below, run the following command:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;pip install sqlalchemy ipython-sql&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
Importing SQLAlchemy
&lt;/h3&gt;

&lt;p&gt;To connect to a SQLite database, run the following commands inside Jupyter cells:&lt;/p&gt;
&lt;h4&gt;
  
  
  Step 1: Import SQLAlchemy
&lt;/h4&gt;

&lt;p&gt;&lt;code&gt;import sqlalchemy&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;
&lt;h4&gt;
  
  
  Step 2: Load the SQL extension
&lt;/h4&gt;

&lt;p&gt;&lt;code&gt;%load_ext sql&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;
&lt;h4&gt;
  
  
Step 3: Create a SQLite engine
&lt;/h4&gt;

&lt;p&gt;&lt;code&gt;engine = sqlalchemy.create_engine('sqlite:///mydatabase.db')&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;
&lt;h4&gt;
  
  
  Step 4: Connect to the SQLite database
&lt;/h4&gt;

&lt;p&gt;&lt;code&gt;%sql sqlite:///mydatabase.db&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;
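&lt;p&gt;Outside of IPython, the same four steps can be sketched with plain SQLAlchemy calls and no &lt;code&gt;%sql&lt;/code&gt; magic. Note that &lt;code&gt;sqlite://&lt;/code&gt; below is a truly in-memory database, whereas the &lt;code&gt;sqlite:///mydatabase.db&lt;/code&gt; URL above persists to a file:&lt;/p&gt;

```python
from sqlalchemy import create_engine, text

# "sqlite://" creates an in-memory SQLite database;
# "sqlite:///mydatabase.db" would persist to a file instead
engine = create_engine("sqlite://")

with engine.connect() as conn:
    greeting = conn.execute(text("SELECT 'hello' AS greeting")).scalar()
print(greeting)  # hello
```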
&lt;h3&gt;
  
  
  Running queries with SQLAlchemy
&lt;/h3&gt;

&lt;p&gt;After connecting to the database, you should store your data as tables. To do this, first create a dummy DataFrame:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas as pd

df = pd.DataFrame([["A",1,2],
                   ["B",3,4],
                   ["A",5,6],
                   ["C",7,8],
                   ["A",9,10]], columns = ["colA", "colB", "colC"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, store the DataFrame in the SQLite database:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;df.to_sql('data', con=engine, if_exists='replace')&lt;br&gt;
&lt;/code&gt;&lt;br&gt;
In the statement above, we created the table &lt;code&gt;data&lt;/code&gt; in the SQLite database. The &lt;code&gt;if_exists='replace'&lt;/code&gt; argument drops and recreates the table if it already exists.&lt;/p&gt;

&lt;p&gt;Lastly, we can query the &lt;code&gt;data&lt;/code&gt; table using SQL as follows:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;results = %sql SELECT * FROM data WHERE colA = 'A';&lt;br&gt;
&lt;/code&gt;&lt;br&gt;
If we view the results, we get:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;print(results)
+-------+------+------+------+
| index | colA | colB | colC |
+-------+------+------+------+
|   0   |  A   |  1   |  2   |
|   2   |  A   |  5   |  6   |
|   4   |  A   |  9   |  10  |
+-------+------+------+------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can also convert the results to a pandas DataFrame as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;results.DataFrame()

  index colA colB colC
0   0     A    1    2
1   2     A    5    6
2   4     A    9    10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
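&lt;p&gt;Alternatively, pandas' &lt;code&gt;read_sql()&lt;/code&gt; function runs the query and returns a DataFrame in a single step. A sketch that recreates the setup above with an in-memory engine:&lt;/p&gt;

```python
import pandas as pd
from sqlalchemy import create_engine

# Recreate the tutorial's table with an in-memory engine
engine = create_engine("sqlite://")
df = pd.DataFrame([["A", 1, 2], ["B", 3, 4], ["A", 5, 6]],
                  columns=["colA", "colB", "colC"])
df.to_sql("data", con=engine, if_exists="replace")

# read_sql returns a DataFrame directly, no %sql magic needed
out = pd.read_sql("SELECT * FROM data WHERE colA = 'A'", con=engine)
print(out["colB"].tolist())  # [1, 5]
```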



&lt;h2&gt;
  
  
  Using Deepnote to query pandas DataFrames with SQL
&lt;/h2&gt;

&lt;p&gt;Deepnote comes complete with SQL support for pandas DataFrames --- no downloading, installing, or importing required.&lt;/p&gt;

&lt;p&gt;To start querying a pandas DataFrame using SQL, create a DataFrame as follows:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff5ec9oqx5435kw4e4eix.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff5ec9oqx5435kw4e4eix.png" alt="Image description" width="800" height="224"&gt;&lt;/a&gt;&lt;br&gt;
Then create a SQL block:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia.graphassets.com%2FWlmwf7sBRcC02NSoGgCL" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia.graphassets.com%2FWlmwf7sBRcC02NSoGgCL" title="sql block create.gif" alt="sql block create.gif" width="1020" height="526"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can write any SQL query:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F02mec2c3ciy47ro0wir6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F02mec2c3ciy47ro0wir6.png" alt="Image description" width="800" height="391"&gt;&lt;/a&gt;&lt;br&gt;
Similar to storing the results in a variable in a Jupyter notebook, you can store the results in Deepnote as shown:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia.graphassets.com%2F1hP7hegvT2qoUpBPdOpo" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia.graphassets.com%2F1hP7hegvT2qoUpBPdOpo" title="df variable.gif" alt="df variable.gif" width="944" height="700"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Since Deepnote uses &lt;a href="https://github.com/sripathikrishnan/jinjasql" rel="noopener noreferrer"&gt;jinjasql&lt;/a&gt; templating, you can pass Python variables, functions, and control structures (e.g., "if" statements and "for" loops) into your SQL queries.&lt;/p&gt;

&lt;h2&gt;
  
  
  Best practices for using SQL with pandas
&lt;/h2&gt;

&lt;p&gt;Using SQL with pandas can make data analysis easier, but there are a few caveats.&lt;/p&gt;

&lt;p&gt;Switching to SQL may mean missing out on optimization techniques that could have been applied in Python. And remember that copying a DataFrame into a database with &lt;code&gt;to_sql()&lt;/code&gt; duplicates your data, and an in-memory SQLite database holds the entire data set in RAM, so make sure you have sufficient memory.&lt;/p&gt;

&lt;p&gt;Moreover, unlike pandas, which infers the data types by itself, SQL requires explicit specification when creating new tables. To make sure your data is stored and retrieved correctly, it's important to use the appropriate data types for the columns in your DataFrame.&lt;/p&gt;
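&lt;p&gt;With &lt;code&gt;to_sql()&lt;/code&gt;, you can pass explicit SQLAlchemy types through its &lt;code&gt;dtype&lt;/code&gt; argument instead of relying on inference. A minimal sketch with made-up columns:&lt;/p&gt;

```python
import pandas as pd
import sqlalchemy
from sqlalchemy import create_engine

engine = create_engine("sqlite://")  # in-memory database for this sketch
df = pd.DataFrame({"colA": ["A", "B"], "colB": [1.5, 2.5]})

# Map each column to an explicit SQL type rather than letting
# pandas infer one from the DataFrame's dtypes
df.to_sql("data", con=engine, if_exists="replace", index=False,
          dtype={"colA": sqlalchemy.types.Text(),
                 "colB": sqlalchemy.types.Float()})

print(pd.read_sql("SELECT * FROM data", con=engine).shape)  # (2, 2)
```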

&lt;p&gt;Lastly, keep in mind that SQL is a powerhouse for databases, not tabular DataFrames sitting in a Python environment. If you can complete your tasks using pandas, you may want to avoid leveraging SQL unnecessarily (unless it offers some major run-time benefits).&lt;/p&gt;

&lt;p&gt;So there you have it --- you're ready to query pandas DataFrames with SQL.&lt;/p&gt;

&lt;p&gt;As this tutorial demonstrated, using SQL with a Jupyter notebook requires multiple steps, which can be quite time-consuming. If you want to integrate SQL with an interactive Python environment minus the extra dependencies and effort, give Deepnote a try.&lt;/p&gt;

&lt;h2&gt;
  
  
  Combine pandas &amp;amp; SQL with Deepnote
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://deepnote.com/sign-up" rel="noopener noreferrer"&gt;Get started for free&lt;/a&gt; to explore, collaborate on, and share your data.&lt;/p&gt;

</description>
      <category>blockchain</category>
      <category>web3</category>
      <category>crypto</category>
      <category>offers</category>
    </item>
  </channel>
</rss>
