<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Lucas Miranda</title>
    <description>The latest articles on DEV Community by Lucas Miranda (@lucaslm).</description>
    <link>https://dev.to/lucaslm</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1106776%2Fd5d7f5c2-e092-4140-b83c-8469da9bdb87.jpeg</url>
      <title>DEV Community: Lucas Miranda</title>
      <link>https://dev.to/lucaslm</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/lucaslm"/>
    <language>en</language>
    <item>
      <title>Spark AI - Bringing Chat GPT to Data Engineering</title>
      <dc:creator>Lucas Miranda</dc:creator>
      <pubDate>Sat, 08 Jul 2023 23:15:49 +0000</pubDate>
      <link>https://dev.to/lucaslm/spark-ai-bringing-chat-gpt-to-data-engineering-2jd4</link>
      <guid>https://dev.to/lucaslm/spark-ai-bringing-chat-gpt-to-data-engineering-2jd4</guid>
      <description>&lt;p&gt;Chat GPT has brought a sea of possibilities with his huge capacity to understand human language. Since OpenAI opened GPT model through Rest API for developers, a lot of those possibilites started to become reality, like &lt;a href="https://blogs.microsoft.com/blog/2023/02/07/reinventing-search-with-a-new-ai-powered-microsoft-bing-and-edge-your-copilot-for-the-web/" rel="noopener noreferrer"&gt;Bing integrating GPT&lt;/a&gt; - an extension of Microsoft's search tool, or &lt;a href="https://github.com/Significant-Gravitas/Auto-GPT" rel="noopener noreferrer"&gt;Auto-GPT&lt;/a&gt; - "An experimental open-source attempt to make GPT-4 fully autonomous". &lt;br&gt;
And now - more precisely at June 29 2023, a new &lt;a href="https://www.databricks.com/blog/introducing-english-new-programming-language-apache-spark" rel="noopener noreferrer"&gt;post on Databricks' Blog&lt;/a&gt; has introduced pyspark-ai, "The English SDK for Apache Spark". It brings a nice API over our known &lt;a href="https://www.simplilearn.com/tutorials/pyspark-tutorial/pyspark-dataframe" rel="noopener noreferrer"&gt;PySpark DataFrames&lt;/a&gt; allowing us to load data from web (like a &lt;a href="https://www.geeksforgeeks.org/what-is-web-scraping-and-how-to-use-it/" rel="noopener noreferrer"&gt;web scraping&lt;/a&gt;) into a dataframe, perform transformations, run assertions about the data, describe and plot different views of the dataset. Everything with natural language. Let's see some examples (from the original article):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ingest data
&lt;/span&gt;&lt;span class="n"&gt;auto_df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;spark_ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_df&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://www.carpro.com/blog/full-year-2022-national-auto-sales-by-brand&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;auto_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;rank&lt;/th&gt;
&lt;th&gt;brand&lt;/th&gt;
&lt;th&gt;us_sales_2022&lt;/th&gt;
&lt;th&gt;sales_change_vs_2021&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Toyota&lt;/td&gt;
&lt;td&gt;1849751&lt;/td&gt;
&lt;td&gt;-9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Ford&lt;/td&gt;
&lt;td&gt;1767439&lt;/td&gt;
&lt;td&gt;-2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Chevrolet&lt;/td&gt;
&lt;td&gt;1502389&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Honda&lt;/td&gt;
&lt;td&gt;881201&lt;/td&gt;
&lt;td&gt;-33&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Hyundai&lt;/td&gt;
&lt;td&gt;724265&lt;/td&gt;
&lt;td&gt;-2&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# plot
&lt;/span&gt;&lt;span class="n"&gt;auto_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="c1"&gt;# with instructions
&lt;/span&gt;&lt;span class="n"&gt;auto_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pie chart for US sales market shares, show the top 5 brands and the sum of others&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsf9x5yceqwn9mxj16fh4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsf9x5yceqwn9mxj16fh4.png" alt="Pie Chart plot showing cars' US sales market shares, generated by AI"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# transformations
&lt;/span&gt;&lt;span class="n"&gt;auto_top_growth_df&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;auto_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;brand with the highest growth&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;auto_top_growth_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;brand&lt;/th&gt;
&lt;th&gt;us_sales_2022&lt;/th&gt;
&lt;th&gt;sales_change_vs_2021&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cadillac&lt;/td&gt;
&lt;td&gt;134726&lt;/td&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# validation
&lt;/span&gt;&lt;span class="n"&gt;auto_top_growth_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;verify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;expect sales change percentage to be between -100 to 100&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# outputs True
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;SparkAI also provides a cool API for UDFs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@spark_ai.udf&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;previous_years_sales&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;brand&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;current_year_sale&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sales_change_percentage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Calculate previous years sales from sales change percentage&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It looks amazing, right? If you want to give it a shot, I have built a CLI on top of pyspark-ai and you can run it interactively. Check it out: &lt;a href="https://github.com/lucas-lm/spark-ai-cli" rel="noopener noreferrer"&gt;https://github.com/lucas-lm/spark-ai-cli&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  PySparkAI CLI
&lt;/h2&gt;

&lt;p&gt;Let's suppose we want to find the 3 most starred repositories in the google topic on GitHub (&lt;a href="https://github.com/topics/google" rel="noopener noreferrer"&gt;https://github.com/topics/google&lt;/a&gt;). Using the PySpark AI CLI, we could run the following shell command to get this view:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pyspark-ai https://github.com/topics/google &lt;span class="nt"&gt;--transform&lt;/span&gt; &lt;span class="s2"&gt;"top 3 python repos with more stars"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The results below were produced by the command above, using gpt-3.5-turbo from OpenAI as our LLM.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl6cjeihkw0fe4tg23a07.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl6cjeihkw0fe4tg23a07.png" alt="CLI Output"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As we can see, it achieves satisfactory results, but there are some mistakes, like the wrong table name and the lowercase value in the filter when the values in the dataframe are in Title Case.&lt;br&gt;
As of today, pyspark-ai is still in early-stage development, so this kind of gap is expected. &lt;/p&gt;

&lt;p&gt;Nevertheless, it has great potential to become a tool for studying and exploring datasets.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note about pyspark-ai-cli:&lt;br&gt;
The plot feature is not supported because &lt;code&gt;pyspark-ai&lt;/code&gt; enforces plotly as its visualization library (in &lt;code&gt;spark_ai.plot&lt;/code&gt; function), which does not display any figure when running from a terminal (&lt;a href="https://github.com/plotly/plotly_express/issues/47" rel="noopener noreferrer"&gt;https://github.com/plotly/plotly_express/issues/47&lt;/a&gt;).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you want to get started with PySparkAI CLI, check the instructions in the &lt;a href="https://github.com/lucas-lm/spark-ai-cli" rel="noopener noreferrer"&gt;public repository&lt;/a&gt;. If you are more interested in the pyspark-ai features, check out its &lt;a href="https://github.com/databrickslabs/pyspark-ai" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  PySpark-AI under the hood
&lt;/h2&gt;

&lt;p&gt;If you take a quick look at the &lt;a href="https://github.com/databrickslabs/pyspark-ai" rel="noopener noreferrer"&gt;pyspark-ai source code&lt;/a&gt;, you will notice that it follows a pattern:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Your input to a method (transform, plot, verify etc.) is captured&lt;/li&gt;
&lt;li&gt;Your input is used to fill in a prompt template&lt;/li&gt;
&lt;li&gt;This prompt is processed by some LLM (commonly through the GPT REST API)&lt;/li&gt;
&lt;li&gt;The output of the LLM is parsed to extract the code blocks&lt;/li&gt;
&lt;li&gt;The code blocks are executed with Python's &lt;code&gt;exec&lt;/code&gt; function&lt;/li&gt;
&lt;li&gt;The relevant results of the execution are returned&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This high-level overview resembles the illustration given in the Databricks blog post (&lt;a href="https://www.databricks.com/blog/introducing-english-new-programming-language-apache-spark" rel="noopener noreferrer"&gt;https://www.databricks.com/blog/introducing-english-new-programming-language-apache-spark&lt;/a&gt;):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmo6er4u3iu73wbo875at.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmo6er4u3iu73wbo875at.png" alt="PySpark AI Diagram: english language is processed by a LLM and results in pyspark code. Source: Databricks Blog"&gt;&lt;/a&gt; &lt;/p&gt;
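
&lt;p&gt;The six steps listed above can be sketched in a few lines of plain Python. This is only an illustration of the pattern, not pyspark-ai's actual code: the prompt template, the &lt;code&gt;fake_llm&lt;/code&gt; stand-in (which returns a canned reply instead of calling a real model) and all function names are assumptions, and a list of dictionaries replaces the DataFrame so that the sketch runs without Spark.&lt;/p&gt;

```python
import re

FENCE = "`" * 3  # triple backtick, built indirectly to keep this listing readable

# Hypothetical prompt template (step 2); not the one used by pyspark-ai.
PROMPT_TEMPLATE = (
    "You are given a list of dicts named rows with keys {columns}.\n"
    "Write Python code that performs this task: {instruction}\n"
    "Store the final value in a variable named result."
)


def build_prompt(columns, instruction):
    """Steps 1 and 2: capture the user input and fill in the template."""
    return PROMPT_TEMPLATE.format(columns=", ".join(columns), instruction=instruction)


def fake_llm(prompt):
    """Step 3: stand-in for a real LLM call; always returns the same canned reply."""
    return (
        "Here is the code:\n"
        + FENCE + "python\n"
        + "result = sorted(rows, key=lambda r: r['growth'])[-1]\n"
        + FENCE + "\n"
    )


def extract_code_blocks(reply):
    """Step 4: pull the fenced code blocks out of the reply."""
    pattern = re.compile(FENCE + r"(?:python)?\n(.*?)" + FENCE, re.DOTALL)
    return pattern.findall(reply)


def run_generated_code(code_blocks, context):
    """Steps 5 and 6: execute the blocks and return the variable named result."""
    namespace = dict(context)
    for block in code_blocks:
        exec(block, namespace)  # runs the generated code without any review
    return namespace.get("result")


if __name__ == "__main__":
    rows = [
        {"brand": "Toyota", "growth": -9},
        {"brand": "Ford", "growth": -2},
        {"brand": "Cadillac", "growth": 14},
    ]
    prompt = build_prompt(["brand", "growth"], "brand with the highest growth")
    blocks = extract_code_blocks(fake_llm(prompt))
    print(run_generated_code(blocks, {"rows": rows}))
```

&lt;p&gt;Note that the &lt;code&gt;exec&lt;/code&gt; call runs whatever text the model returned, with no review step in between.&lt;/p&gt;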

&lt;h2&gt;
  
  
  The Downside
&lt;/h2&gt;

&lt;p&gt;Before talking about the cons of the "English SDK", we have to point out that the library lives under the &lt;a href="https://github.com/databrickslabs" rel="noopener noreferrer"&gt;databrickslabs&lt;/a&gt; organization on GitHub, which is a strong indication that it is experimental, is not meant to be treated as a reliable product and, of course, is not ready for production environments.&lt;/p&gt;

&lt;p&gt;What scares me the most about the approach embraced in PySpark-AI is that we do not have control over the code that is running. Even though we can see some logs to understand the generated code, we do not get the chance to assess that code before it runs.&lt;br&gt;
Even before we had generative AIs as advanced as today's, &lt;code&gt;exec&lt;/code&gt; and &lt;code&gt;eval&lt;/code&gt; were already functions to be avoided due to the &lt;a href="https://realpython.com/python-exec/#uncovering-and-minimizing-the-security-risks-behind-exec" rel="noopener noreferrer"&gt;inherent security risks&lt;/a&gt; they carry.&lt;/p&gt;
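
&lt;p&gt;One possible way to get a chance to assess generated code before running it is to inspect it first. The sketch below is hypothetical and not part of pyspark-ai: it parses a snippet with Python's &lt;code&gt;ast&lt;/code&gt; module and refuses to execute it if it contains imports or calls to a few risky built-ins. The list of forbidden names and the function names are arbitrary assumptions, and a check like this is easy to bypass, so it is a review aid rather than a sandbox.&lt;/p&gt;

```python
import ast

# Arbitrary, illustrative deny-list; a real policy would need far more thought.
FORBIDDEN_CALLS = {"exec", "eval", "open", "__import__", "compile"}


def find_risky_nodes(source):
    """Return human-readable findings for a generated code snippet."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            findings.append(f"line {node.lineno}: import statement")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in FORBIDDEN_CALLS:
                findings.append(f"line {node.lineno}: call to {node.func.id}()")
    return findings


def run_if_clean(source, namespace):
    """Execute the snippet only when the static check found nothing."""
    findings = find_risky_nodes(source)
    if findings:
        raise ValueError("refusing to run generated code: " + "; ".join(findings))
    exec(source, namespace)
    return namespace


if __name__ == "__main__":
    print(find_risky_nodes("import os\nos.remove('some_file')"))
    print(run_if_clean("total = 1 + 2", {})["total"])
```

&lt;p&gt;Even with such a gate, a human reading the generated code before it runs remains the more dependable safeguard.&lt;/p&gt;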

&lt;p&gt;Another consequence of this dynamic execution is unpredictability: we cannot trust that the generated code will always be the same for the same input. Relying on a third-party service to produce the output can also be problematic, because we may face instability and increased latency, among other undesirable situations. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwciiumhwm2rrid8bf2b2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwciiumhwm2rrid8bf2b2.png" alt="Execution failure"&gt;&lt;/a&gt;&lt;br&gt;
Code generated by GPT (model gpt-3.5-turbo) failing at runtime.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;PySpark-AI - or the "English SDK", as it is being introduced - brings an innovative design with a nice API for working with PySpark, covering a good variety of operations. &lt;/p&gt;

&lt;p&gt;It is easy to get started with and can be useful for beginners; non-technical users may feel more comfortable trying it as well. &lt;/p&gt;

&lt;p&gt;It is not so reliable, though. Even with future enhancements, I cannot see this kind of solution becoming safe and stable enough to be applied at scale or in a real-world production environment.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>chatgpt</category>
      <category>bigdata</category>
      <category>python</category>
    </item>
    <item>
      <title>Python Virtual Environments</title>
      <dc:creator>Lucas Miranda</dc:creator>
      <pubDate>Tue, 27 Jun 2023 00:20:47 +0000</pubDate>
      <link>https://dev.to/lucaslm/python-virtual-environments-4o5g</link>
      <guid>https://dev.to/lucaslm/python-virtual-environments-4o5g</guid>
      <description>&lt;p&gt;Packaging and modularity are great features present in every relevant programming language, as it allows us, developers, to easily reuse recurrent code snippets (the modules). &lt;/p&gt;

&lt;p&gt;In Python we can add packages as dependencies of our project by installing them with some package manager like pip:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;&lt;span class="nv"&gt;requests&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;2.30.0
pip &lt;span class="nb"&gt;install &lt;/span&gt;pytest pytest-cov
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the example we are installing the library &lt;code&gt;requests&lt;/code&gt; version 2.30.0 in the first line and in the second line we are installing the libraries &lt;code&gt;pytest&lt;/code&gt; and &lt;code&gt;pytest-cov&lt;/code&gt;. As we are not specifying any version for the packages pytest and pytest-cov, we will have the latest versions installed.&lt;/p&gt;

&lt;p&gt;That is pretty cool, but a problem arises when working on multiple projects simultaneously, sharing projects with others, or even just using the same machine to study: managing project-specific packages becomes challenging.&lt;/p&gt;

&lt;h2&gt;
  
  
  Downside of global installations (non virtual environments)
&lt;/h2&gt;

&lt;p&gt;Before we get started with virtual environments, let's picture one situation.&lt;br&gt;
Let's say you have started a new project, and you have to install &lt;code&gt;pandas&lt;/code&gt;, &lt;code&gt;requests&lt;/code&gt; and &lt;code&gt;jinja&lt;/code&gt; for it. You will simply run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;Jinja2 requests pandas
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After that, you are going to share this project with someone else, and of course, you will use git to version it and push it to a remote repository such as GitHub. Now, other people need to contribute to your project, but when they try to run it, all they see is an error message saying "no module named 'pandas'", which makes sense, since you are sharing only your source code in the repository and not your entire setup (python, packages installed, environment etc.).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa9nli2d7v93fc9am3dt1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa9nli2d7v93fc9am3dt1.png" alt="Screenshot showing error no module named pandas"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To solve this problem you could simply instruct the people working on your project to run &lt;code&gt;pip install Jinja2 requests pandas&lt;/code&gt;. But that does not completely resolve the problem. Some people might already have a different version of requests installed on their machines, which would not raise the same error as before, but could raise different errors related to incompatibilities between the versions used - and now you say "it works on my machine!"&lt;/p&gt;

&lt;p&gt;So you will need a way to share the exact same versions you are using in the current project with other people to make it work properly. Fortunately, &lt;code&gt;pip&lt;/code&gt; comes with a solution for that: the &lt;code&gt;freeze&lt;/code&gt; subcommand. With this command you can output all your dependencies (including the dependencies of your original dependencies, that is, the whole dependency graph) with their exact installed versions, and then you can save this output to a text file, which can be used to install the correct dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip freeze &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The above command is a common pattern in python to create a &lt;code&gt;requirements.txt&lt;/code&gt; file with the dependencies you have installed so far (it "freezes" them at the currently installed versions). Then, when you want to set up the project on a new machine, you can run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command will install everything you have in your &lt;code&gt;requirements.txt&lt;/code&gt; file. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Make sure to keep this file in the root of your source code for convenience.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Now we have a short command to install every single dependency and we just have to ensure that &lt;code&gt;requirements.txt&lt;/code&gt; is up-to-date by running &lt;code&gt;pip freeze &amp;gt; requirements.txt&lt;/code&gt; every time we add a new dependency to our project, right? Not exactly. We still have a problem.&lt;/p&gt;

&lt;p&gt;Remember when we first installed &lt;code&gt;requests 2.30&lt;/code&gt;, &lt;code&gt;pytest&lt;/code&gt; and &lt;code&gt;pytest-cov&lt;/code&gt; at the beginning of this article? Those libraries are not dependencies of the project we started later, but you will notice that your &lt;code&gt;requirements.txt&lt;/code&gt; still includes them (and their whole dependency tree). Take a look:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzgoe4g9vmj55gfqd7uqk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzgoe4g9vmj55gfqd7uqk.png" alt="Content of the requirements file highlighting requests and pytest dependencies"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This can be a problem, because it will install non-required packages for the project and it might even affect the way our dependencies will be resolved by &lt;code&gt;pip install&lt;/code&gt;. Furthermore, imagine the mess we will have when we need to work with different versions of the same package for distinct projects in the same machine...&lt;/p&gt;

&lt;p&gt;To overcome this problem, we can use python's virtual environments. &lt;/p&gt;

&lt;h2&gt;
  
  
  Python's venv
&lt;/h2&gt;

&lt;p&gt;A virtual environment in python - venv for short - is a way to isolate your project context (python interpreter, installed libraries etc.) from your global python configuration. In practice, the packages installed in a virtual environment do not conflict with packages installed globally on your machine. It is a best practice to have a separate venv for each project you are working on. Next we will see a step-by-step guide to using and understanding venv.&lt;/p&gt;

&lt;h3&gt;
  
  
  Create venv
&lt;/h3&gt;

&lt;p&gt;Python comes with a built-in module to create virtual environments. It is called &lt;code&gt;venv&lt;/code&gt;, and you can use it to create a new virtual environment like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python &lt;span class="nt"&gt;-m&lt;/span&gt; venv venv
&lt;span class="c"&gt;# note: in your system, python command may be under other name&lt;/span&gt;
&lt;span class="c"&gt;# like python3 or just py instead of python...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the above command, &lt;code&gt;python -m venv&lt;/code&gt; is the command to create a new virtual environment and the last &lt;code&gt;venv&lt;/code&gt; is just the name of your virtual environment. It will be used to create the local folder containing the python interpreter, libraries installed etc. It could be any other name, &lt;code&gt;venv&lt;/code&gt; is the most commonly used name though.&lt;/p&gt;
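
&lt;p&gt;The same module can also be called from Python code, which may help to see what the command does. The sketch below is only an illustration: the helper name &lt;code&gt;create_project_venv&lt;/code&gt; and the temporary project folder are assumptions made for the example. It creates the environment folder and lists what was generated inside it.&lt;/p&gt;

```python
import tempfile
import venv
from pathlib import Path


def create_project_venv(project_dir, name="venv"):
    """Create a virtual environment folder inside project_dir and return its path."""
    env_dir = Path(project_dir) / name
    # with_pip=False keeps this sketch fast; the command-line form
    # installs pip into the new environment by default.
    venv.create(env_dir, with_pip=False)
    return env_dir


if __name__ == "__main__":
    # A throwaway folder stands in for a real project directory.
    with tempfile.TemporaryDirectory() as project:
        env_dir = create_project_venv(project)
        print(sorted(entry.name for entry in env_dir.iterdir()))
```

&lt;p&gt;For everyday use the shell command remains the simpler option; the Python call is mostly useful in setup scripts.&lt;/p&gt;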

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv47w1sxqhdb4y9sj9fby.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv47w1sxqhdb4y9sj9fby.png" alt="Terminal with virtual environment creation command"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt;&lt;br&gt;
Do not worry about the folder &lt;code&gt;venv&lt;/code&gt; generated by this command.&lt;br&gt;
It will be managed by python and you will never have to touch it.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Your venv is now created, but it is not active yet, so you are still in your regular environment.&lt;/p&gt;

&lt;h3&gt;
  
  
  Activate venv
&lt;/h3&gt;

&lt;p&gt;Now that we have a venv for the project, we can activate it to take advantage of its benefits.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Linux and MacOS:&lt;/span&gt;
&lt;span class="nb"&gt;source &lt;/span&gt;venv/bin/activate

&lt;span class="c"&gt;# Windows&lt;/span&gt;
.&lt;span class="se"&gt;\v&lt;/span&gt;&lt;span class="nb"&gt;env&lt;/span&gt;&lt;span class="se"&gt;\S&lt;/span&gt;cripts&lt;span class="se"&gt;\a&lt;/span&gt;ctivate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You will notice &lt;code&gt;venv&lt;/code&gt; - or whatever name you gave to your venv - appears in green on your command line, indicating that your virtual environment was successfully activated and that you are now running in it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9m9h5ynp8bi9h7lj0v0u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9m9h5ynp8bi9h7lj0v0u.png" alt="Terminal after run the activate script"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Do not worry about your shell configuration, it remains mostly unmodified. Your python interpreter changes to the one contained in your venv though.&lt;/p&gt;
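
&lt;p&gt;If you are ever unsure which interpreter is in use, you can ask Python itself. The snippet below is a small sketch (the function names are made up for this example) that relies on &lt;code&gt;sys.prefix&lt;/code&gt; and &lt;code&gt;sys.base_prefix&lt;/code&gt; being different when the interpreter belongs to a venv.&lt;/p&gt;

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the venv folder while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def describe_interpreter():
    """Build a one-line summary of the interpreter path and its venv status."""
    status = "inside a venv" if in_virtualenv() else "using the base interpreter"
    return f"{sys.executable} ({status})"


if __name__ == "__main__":
    print(describe_interpreter())
```

&lt;p&gt;Running it before and after activating the venv should show the interpreter path changing.&lt;/p&gt;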

&lt;h3&gt;
  
  
  Working on venv
&lt;/h3&gt;

&lt;p&gt;The first thing to notice is that you no longer have access to the packages and modules that you installed globally. Trying to import pandas raises &lt;code&gt;ModuleNotFoundError&lt;/code&gt;, for example. It makes sense: we are in a brand new environment right now.&lt;/p&gt;

&lt;p&gt;Let's try to re-install the dependencies we need and freeze them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;Jinja2 requests pandas
pip freeze &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that you can play around with the &lt;code&gt;python&lt;/code&gt; and &lt;code&gt;pip&lt;/code&gt; commands just like we do outside of the virtual environment. The difference is that &lt;code&gt;pip&lt;/code&gt; and &lt;code&gt;python&lt;/code&gt; here come from the venv instead of your global installation.&lt;/p&gt;

&lt;p&gt;Look at the difference between the previous &lt;code&gt;requirements.txt&lt;/code&gt; (on the left) against the new one generated in the virtual environment (on the right):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo6hhjswavj6ujes8zruu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo6hhjswavj6ujes8zruu.png" alt="Comparison between the requirements file before and after virtual environment"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We went from 22 to 13 dependencies. This is a reduction of 9 dependencies, roughly 40% fewer than we had before!&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt;&lt;br&gt;
When you close your terminal or your IDE, your virtual environment will be deactivated.&lt;br&gt;
So you have to activate it every time you open it again.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Deactivate venv
&lt;/h3&gt;

&lt;p&gt;Whenever you need to deactivate your venv to get back to your "global environment", you can simply run the deactivate command.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;deactivate 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Further considerations
&lt;/h2&gt;

&lt;p&gt;Perfect, now we can work with virtual environments to have an isolated context for each project we are working on, but there are still a few last observations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Git
&lt;/h3&gt;

&lt;p&gt;The virtual environment directory can contain a lot of files and tends to be large. Also, the package manager might not work the same way on different machines, so you should add your &lt;code&gt;venv&lt;/code&gt; folder to the &lt;code&gt;.gitignore&lt;/code&gt; file when working with git. This way you keep only the source code in the remote repository, the &lt;code&gt;venv&lt;/code&gt; can be reproduced by anyone, and the package manager will do its best to install the dependencies in the right way for each machine. &lt;/p&gt;

&lt;h3&gt;
  
  
  CI/CD pipelines
&lt;/h3&gt;

&lt;p&gt;In CI/CD pipelines we usually already have a fully isolated environment, so there is no need to add commands to your scripts to create and activate a virtual environment before installing the dependencies. In these cases, you can go straight to the installation and execution of your python script.&lt;/p&gt;

&lt;h3&gt;
  
  
  VSCode Users
&lt;/h3&gt;

&lt;p&gt;There are some cool extensions for python in VSCode, but the essential one, in my opinion, is the Python extension from Microsoft.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs1izay6c4jukcko78ezc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs1izay6c4jukcko78ezc.png" alt="Python extension on VSCode extensions store"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This extension brings features like IntelliSense, debugging and so on.&lt;br&gt;
To take advantage of these features, you have to make sure that the currently selected Python interpreter is the one from your venv (assuming that you are working in a venv).&lt;/p&gt;

&lt;p&gt;Let's suppose I have added a new dependency to my project (only in the venv):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;matplotlib
pip freeze &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is what happens if we have NOT selected the correct Python interpreter:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8zo2ych7blhq7j8hec4s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8zo2ych7blhq7j8hec4s.png" alt="VSCode alerting matplotlib was not found"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note that we have matplotlib installed in our venv, but we still get a warning saying that matplotlib could not be resolved. Look at what we get if we try to run the script with the VSCode "run button":&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feoak9r5t2ra4ht2fubpc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feoak9r5t2ra4ht2fubpc.png" alt="VSCode finishing python script execution with error on run with play button"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If we run it from the terminal with &lt;code&gt;venv&lt;/code&gt; active, it works:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl1a265510z4q7f7qrheg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl1a265510z4q7f7qrheg.png" alt="Execution with success on run in command line with virtual environment active"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That means VSCode is using the wrong interpreter. Fixing it is very easy: click the Python interpreter indicator in the bottom right corner (marked in red in the illustration) and, in the panel that opens at the top, select the correct interpreter (you can identify the right one by its path, also marked in red).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4nj174s6jm58fm06ka3h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4nj174s6jm58fm06ka3h.png" alt="Selecting the correct python interpreter in VSCode config"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After that, the warnings will disappear, IntelliSense will work fine, and the run button will no longer result in an error:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwn0b1e4iuc9dt7gvxy4z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwn0b1e4iuc9dt7gvxy4z.png" alt="VSCode working fine after selecting the correct interpreter"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Notice also that the bottom right corner now shows your &lt;code&gt;venv&lt;/code&gt; interpreter.&lt;/p&gt;
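
&lt;p&gt;To double-check from inside Python which interpreter is actually running and whether it can see a given package, a small diagnostic like the one below may help. It is only a sketch using the standard library; the function name &lt;code&gt;describe_interpreter&lt;/code&gt; is an illustrative assumption, and &lt;code&gt;matplotlib&lt;/code&gt; is used as the default simply because it is the package installed in this example.&lt;/p&gt;

```python
import importlib.util
import sys


def describe_interpreter(package="matplotlib"):
    """Report which interpreter is running and whether it can find `package`."""
    return {
        "executable": sys.executable,  # path of the Python binary in use
        "prefix": sys.prefix,  # the venv directory when a venv interpreter is selected
        # find_spec returns None when a top-level package cannot be located
        "package_found": importlib.util.find_spec(package) is not None,
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

&lt;p&gt;If the run button and the terminal print different &lt;code&gt;executable&lt;/code&gt; paths, VSCode is not using your venv interpreter.&lt;/p&gt;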




&lt;p&gt;In conclusion, virtual environments are a powerful tool for managing dependencies and creating isolated project contexts in Python. They help ensure consistent installations and prevent conflicts between different projects.&lt;/p&gt;

&lt;p&gt;If you want to go deeper into virtual environments, also check out &lt;a href="https://realpython.com/python-virtual-environments-a-primer/" rel="noopener noreferrer"&gt;https://realpython.com/python-virtual-environments-a-primer/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>setup</category>
      <category>environment</category>
      <category>development</category>
    </item>
  </channel>
</rss>
