<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: orlando ramirez</title>
    <description>The latest articles on DEV Community by orlando ramirez (@oramirezperera).</description>
    <link>https://dev.to/oramirezperera</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F160655%2Feef6ca66-f468-47ef-a42b-2ef0f745235a.jpeg</url>
      <title>DEV Community: orlando ramirez</title>
      <link>https://dev.to/oramirezperera</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/oramirezperera"/>
    <language>en</language>
    <item>
      <title>Building an ELT Pipeline with Python and SQL Server: A Netflix Dataset Walkthrough</title>
      <dc:creator>orlando ramirez</dc:creator>
      <pubDate>Mon, 07 Apr 2025 16:03:56 +0000</pubDate>
      <link>https://dev.to/oramirezperera/building-an-elt-pipeline-with-python-and-sql-server-a-netflix-dataset-walkthrough-3hje</link>
      <guid>https://dev.to/oramirezperera/building-an-elt-pipeline-with-python-and-sql-server-a-netflix-dataset-walkthrough-3hje</guid>
      <description>&lt;p&gt;&lt;strong&gt;Hi! In today’s article, we’ll walk through a small ELT project, revisiting each step of the process and diving deep into the details.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;So, hop on and let’s get started!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fstatic.wikia.nocookie.net%2Fevangelion%2Fimages%2Fd%2Fdc%2FNGE01_30.png%2Frevision%2Flatest%2Fscale-to-width-down%2F1000%3Fcb%3D20190901081329" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fstatic.wikia.nocookie.net%2Fevangelion%2Fimages%2Fd%2Fdc%2FNGE01_30.png%2Frevision%2Flatest%2Fscale-to-width-down%2F1000%3Fcb%3D20190901081329" alt="Image of Misato Katsuragi picking up Shinji in a blue car" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Extract&lt;/h2&gt;

&lt;p&gt;As always, our first step is to &lt;strong&gt;extract&lt;/strong&gt; or find the data we want to work with. For this project, we’ll be using the &lt;strong&gt;Netflix Movies and TV Shows dataset&lt;/strong&gt;, which you can find on Kaggle.&lt;br&gt;
 Here’s the link: &lt;a href="https://www.kaggle.com/datasets/shivamb/netflix-shows" rel="noopener noreferrer"&gt;Netflix Movies and TV Shows&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;Load&lt;/h2&gt;

&lt;p&gt;Now, we move on to the &lt;strong&gt;load&lt;/strong&gt; step, where we import the dataset into a database. For this example, I’m using &lt;strong&gt;Microsoft SQL Server&lt;/strong&gt; running on Windows 11 via &lt;strong&gt;SQL Server Management Studio 20&lt;/strong&gt;. All scripts and code are being executed from &lt;strong&gt;Ubuntu 24.04&lt;/strong&gt; using &lt;strong&gt;WSL&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Although everything runs on the same computer, Windows and WSL are separate environments. To connect them, I set up communication over IP, which meant enabling TCP/IP and opening the necessary ports in SQL Server so it would accept connections from the Ubuntu WSL instance.&lt;/p&gt;
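&lt;p&gt;As a rough sketch, the connection string for that cross-environment setup might be built like this. Everything below (IP, credentials, database name) is a placeholder, and pyodbc with ODBC Driver 17 is assumed as the DBAPI:&lt;/p&gt;

```python
from urllib.parse import quote_plus


def get_connection() -> str:
    """Build a SQLAlchemy URL for SQL Server over pyodbc.

    Every value below is a placeholder; in the project these would
    live in an untracked secrets module.
    """
    user = "sa"
    password = quote_plus("my_p@ssword!")   # escape special characters
    host = "172.20.0.1"                     # Windows host IP as seen from WSL
    database = "netflix"
    driver = quote_plus("ODBC Driver 17 for SQL Server")
    return (f"mssql+pyodbc://{user}:{password}@{host}:1433/"
            f"{database}?driver={driver}")


print(get_connection())
```

&lt;p&gt;A URL in this shape can then be passed straight to &lt;code&gt;sqlalchemy.create_engine&lt;/code&gt;.&lt;/p&gt;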

&lt;p&gt;Then, I created a &lt;strong&gt;Python script&lt;/strong&gt; to load the CSV file (downloaded from Kaggle) into SQL Server.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sqlalchemy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;sal&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;secret&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;connection&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;./data/netflix_titles.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_engine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;connection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_connection&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_sql&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;netflix_raw&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;con&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;if_exists&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;replace&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The connection logic is separated into another Python file where I manage the database credentials and connection string.&lt;br&gt;
Once the script is executed, we can check the table created by Python. Here's the schema:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;dbo&lt;/span&gt;&lt;span class="p"&gt;].[&lt;/span&gt;&lt;span class="n"&gt;netflix_raw&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;show_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="k"&gt;max&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="k"&gt;type&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="k"&gt;max&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="k"&gt;max&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;director&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="k"&gt;max&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="k"&gt;cast&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="k"&gt;max&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;country&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="k"&gt;max&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;date_added&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="k"&gt;max&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;release_year&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;bigint&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;rating&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="k"&gt;max&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;duration&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="k"&gt;max&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;listed_in&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="k"&gt;max&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="k"&gt;max&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As we can see, the initial schema is decent, but by declaring every text column as &lt;code&gt;VARCHAR(MAX)&lt;/code&gt; we’re allocating more space than necessary. To optimize, we can use a Jupyter Notebook to inspect the actual maximum lengths in each column and adjust accordingly.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;max &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;show_id&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
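&lt;p&gt;Rather than running that line once per column, the same check can be looped over every text column. A small sketch, with a couple of sample rows standing in for the Kaggle CSV:&lt;/p&gt;

```python
import pandas as pd

# A few sample rows standing in for the Kaggle CSV.
df = pd.DataFrame({
    "show_id": ["s1", "s2"],
    "title": ["Dick Johnson Is Dead", "Blood & Water"],
})

# Longest value per text column -- a guide for right-sizing VARCHAR(n).
max_lengths = {col: int(df[col].str.len().max())
               for col in df.select_dtypes(include="object").columns}
print(max_lengths)
```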



&lt;p&gt;After checking all columns, we get a more appropriate schema. For instance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;release_year&lt;/code&gt; changes from &lt;code&gt;BIGINT&lt;/code&gt; to &lt;code&gt;INT&lt;/code&gt; since we’re only dealing with year values.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;title&lt;/code&gt; changes from &lt;code&gt;VARCHAR&lt;/code&gt; to &lt;code&gt;NVARCHAR&lt;/code&gt;, which allows us to store foreign-language titles properly.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At this point, we can also check for duplicates in the &lt;code&gt;show_id&lt;/code&gt; column, since we plan to use it as our &lt;code&gt;PRIMARY KEY&lt;/code&gt;. We confirm there are no duplicates, so we can safely assign it as such.&lt;/p&gt;
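&lt;p&gt;That duplicate check can be done in the same notebook. A minimal sketch with sample ids:&lt;/p&gt;

```python
import pandas as pd

# Sample ids standing in for the full show_id column.
df = pd.DataFrame({"show_id": ["s1", "s2", "s3"]})

# Any True here would rule out show_id as a PRIMARY KEY candidate.
has_duplicates = bool(df["show_id"].duplicated().any())
print(has_duplicates)
```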

&lt;p&gt;Here's the new schema:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpuo61ywn7omq4bkyr99b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpuo61ywn7omq4bkyr99b.png" alt="Image showing netflix_raw table new schema" width="308" height="563"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;show_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="k"&gt;type&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;nvarchar&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;director&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="mi"&gt;250&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="k"&gt;cast&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;country&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="mi"&gt;150&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;date_added&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;release_year&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;rating&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;duration&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;listed_in&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After that, we:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Drop the table.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Recreate it with the updated schema.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Update our Python script to change the &lt;code&gt;if_exists&lt;/code&gt; parameter to &lt;code&gt;'append'&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;./data/netflix_titles.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_engine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;connection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_connection&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_sql&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;netflix_raw&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;con&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;if_exists&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;append&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="4"&gt;
&lt;li&gt;Run the script again.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftc1fmwwftefk7a8dj2h8.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftc1fmwwftefk7a8dj2h8.gif" alt="Image of an old windows copying files from C: to D:" width="381" height="162"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now we're ready for the next step in our journey.&lt;/p&gt;

&lt;h2&gt;Transform&lt;/h2&gt;

&lt;p&gt;Next comes the &lt;strong&gt;transformation&lt;/strong&gt; step in our ELT pipeline.&lt;/p&gt;

&lt;p&gt;We start by checking for &lt;strong&gt;duplicate records&lt;/strong&gt; using the &lt;code&gt;title&lt;/code&gt; column. In doing so, we identify a few &lt;code&gt;show_id&lt;/code&gt;s that share the same title.&lt;/p&gt;

&lt;p&gt;These are the affected &lt;code&gt;show_id&lt;/code&gt;s:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;s2639
s8775
s7102
s4915
s5319
s5752
s6706
s304
s5034
s5096
s160
s7346
s8023
s1271
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
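&lt;p&gt;The same duplicate-title check can be sketched in pandas; the titles below are sample placeholders, not the real offending rows:&lt;/p&gt;

```python
import pandas as pd

# Sample rows; the real check runs over the full netflix_raw table.
df = pd.DataFrame({
    "show_id": ["s1", "s2", "s3"],
    "title": ["Love in a Puff", "Love in a Puff", "Esperando la carroza"],
})

# keep=False flags every row whose title appears more than once.
dupes = df[df["title"].duplicated(keep=False)]
print(dupes["show_id"].tolist())
```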



&lt;p&gt;On closer inspection, we notice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Some are TV shows and movies with the same name.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Others are remakes or reboots with different casts.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A couple are identical entries, with the only difference being the &lt;code&gt;date_added&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are the &lt;code&gt;show_id&lt;/code&gt; pairs that still have issues:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;s6706
s304
And
s160
s7346
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For these cases, we keep the latest version, assuming the older entry was removed and later re-added to the platform.&lt;/p&gt;

&lt;p&gt;We exclude the older versions and re-query the dataset.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dbo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;netflix_raw&lt;/span&gt;
&lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;show_id&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'s160'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'s304'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we analyze columns like &lt;code&gt;listed_in&lt;/code&gt;, &lt;code&gt;director&lt;/code&gt;, &lt;code&gt;country&lt;/code&gt;, and &lt;code&gt;cast&lt;/code&gt;. These contain &lt;strong&gt;comma-separated lists&lt;/strong&gt;, which are fine for production storage but not great for analysis.&lt;/p&gt;

&lt;p&gt;To make them more useful for analytics, we split them into &lt;strong&gt;separate tables&lt;/strong&gt;, creating &lt;strong&gt;one row per entry&lt;/strong&gt; (e.g., one director per row, tied to the corresponding &lt;code&gt;show_id&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;This is done using SQL queries like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;show_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="k"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;genres&lt;/span&gt;
&lt;span class="k"&gt;into&lt;/span&gt; &lt;span class="n"&gt;netflix_genres&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dbo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;netflix_raw&lt;/span&gt;
&lt;span class="k"&gt;cross&lt;/span&gt; &lt;span class="n"&gt;apply&lt;/span&gt; &lt;span class="n"&gt;string_split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;listed_in&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;','&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;show_id&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'s160'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'s304'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once done, we’ll have clean relational tables for &lt;code&gt;directors&lt;/code&gt;, &lt;code&gt;genres&lt;/code&gt;, &lt;code&gt;countries&lt;/code&gt;, and &lt;code&gt;casts&lt;/code&gt;.&lt;/p&gt;
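&lt;p&gt;The same split-into-rows transformation can be sketched in pandas with &lt;code&gt;str.split&lt;/code&gt; plus &lt;code&gt;explode&lt;/code&gt;, mirroring what &lt;code&gt;STRING_SPLIT&lt;/code&gt; and &lt;code&gt;TRIM&lt;/code&gt; do on the SQL side (sample rows only):&lt;/p&gt;

```python
import pandas as pd

# Sample rows standing in for netflix_raw.
df = pd.DataFrame({
    "show_id": ["s1", "s2"],
    "listed_in": ["Dramas, International Movies", "Documentaries"],
})

# One (show_id, genre) row per comma-separated entry.
genres = (df.assign(genre=df["listed_in"].str.split(","))
            .explode("genre")
            .assign(genre=lambda d: d["genre"].str.strip())
            [["show_id", "genre"]]
            .reset_index(drop=True))
print(genres["genre"].tolist())
```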

&lt;p&gt;Here's how our schema looks with these changes:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F52yr84v10tuwqzjbk82i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F52yr84v10tuwqzjbk82i.png" alt="Image of the new Database Schema with the new tables we created" width="800" height="414"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Handling Missing Values&lt;/h2&gt;

&lt;p&gt;We then check for &lt;strong&gt;null values&lt;/strong&gt;. For example, the &lt;code&gt;country&lt;/code&gt; column has several missing entries. There are multiple ways to handle this, depending on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The time you have.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The source and reliability of the data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Whether you understand &lt;strong&gt;why&lt;/strong&gt; the data is missing (this is crucial in production).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For this example, let’s try an approximation: if a &lt;strong&gt;director&lt;/strong&gt; has another movie with a known country, we’ll use that to fill in the missing country values.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;insert&lt;/span&gt; &lt;span class="k"&gt;into&lt;/span&gt; &lt;span class="n"&gt;netflix_countries&lt;/span&gt;
&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;show_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;country&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;netflix_raw&lt;/span&gt; &lt;span class="n"&gt;nr&lt;/span&gt;
&lt;span class="k"&gt;inner&lt;/span&gt; &lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;director&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;country&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dbo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;netflix_countries&lt;/span&gt; &lt;span class="n"&gt;nc&lt;/span&gt;
        &lt;span class="k"&gt;inner&lt;/span&gt; &lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;netflix_directors&lt;/span&gt; &lt;span class="n"&gt;nd&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;nc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;show_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;show_id&lt;/span&gt;
&lt;span class="k"&gt;group&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;director&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;country&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;  &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;nr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;director&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;director&lt;/span&gt;
&lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;nr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;country&lt;/span&gt; &lt;span class="k"&gt;is&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
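&lt;p&gt;The same director-based fill can be sketched in pandas. The director and country values below are sample placeholders:&lt;/p&gt;

```python
import pandas as pd

# Sample rows: s2 is missing its country but shares a director with s1.
df = pd.DataFrame({
    "show_id": ["s1", "s2"],
    "director": ["Ava DuVernay", "Ava DuVernay"],
    "country": ["United States", None],
})

# Map each director to a country taken from their other titles.
known = (df.dropna(subset=["country"])
           .drop_duplicates("director")
           .set_index("director")["country"])
df["country"] = df["country"].fillna(df["director"].map(known))
print(df["country"].tolist())
```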



&lt;p&gt;We also found 3 nulls in the &lt;code&gt;duration&lt;/code&gt; column. Interestingly, for &lt;code&gt;show_id&lt;/code&gt;s s5542, s5795, and s5814, the &lt;strong&gt;duration value is incorrectly placed in the &lt;code&gt;rating&lt;/code&gt; column&lt;/strong&gt;, likely a data formatting issue.&lt;/p&gt;

&lt;p&gt;We also notice that the &lt;code&gt;date_added&lt;/code&gt; column is stored as a &lt;strong&gt;VARCHAR&lt;/strong&gt;, but we need it as a proper &lt;strong&gt;DATE&lt;/strong&gt; type.&lt;/p&gt;

&lt;p&gt;To fix all of this, we:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Create a new &lt;strong&gt;intermediate table&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Eliminate columns we already separated into other tables.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Transform &lt;code&gt;date_added&lt;/code&gt; to a date type.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Fix the duration issue.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Remove duplicated entries.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
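&lt;p&gt;Two of those fixes, the date cast and the shifted duration, can be sketched in pandas as well. The dates below are sample values, not the real rows:&lt;/p&gt;

```python
import pandas as pd

# Sample rows: the first mimics a record whose duration landed in rating.
df = pd.DataFrame({
    "show_id": ["s5542", "s10"],
    "date_added": ["April 15, 2019", "July 1, 2021"],
    "rating": ["74 min", "TV-MA"],
    "duration": [None, "90 min"],
})

# date_added: VARCHAR -> real date.
df["date_added"] = pd.to_datetime(df["date_added"], format="%B %d, %Y")

# When duration is null, the value was shifted into rating.
df["duration"] = df["duration"].fillna(df["rating"])
print(df["duration"].tolist())
```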

&lt;p&gt;All of this is done with a SQL query like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;
&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;show_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
            &lt;span class="k"&gt;type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
            &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
            &lt;span class="k"&gt;cast&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;date_added&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;date_added&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;release_year&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;rating&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="k"&gt;when&lt;/span&gt; &lt;span class="n"&gt;duration&lt;/span&gt; &lt;span class="k"&gt;is&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt; 
            &lt;span class="k"&gt;then&lt;/span&gt; &lt;span class="n"&gt;rating&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;duration&lt;/span&gt;
            &lt;span class="k"&gt;end&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;duration&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
            &lt;span class="n"&gt;description&lt;/span&gt;
            &lt;span class="k"&gt;into&lt;/span&gt; &lt;span class="n"&gt;netflix_intermediate&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;netflix_raw&lt;/span&gt; 
&lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;show_id&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'s160'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'s304'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With this, we now have a &lt;strong&gt;clean, optimized intermediate table&lt;/strong&gt;, ready for insights and exploratory analysis.&lt;/p&gt;
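&lt;p&gt;As a quick sanity check, the &lt;code&gt;case when&lt;/code&gt; fix can be reproduced on a couple of toy rows. This sketch uses SQLite (the project itself uses SQL Server, where &lt;code&gt;select ... into&lt;/code&gt; replaces &lt;code&gt;create table as&lt;/code&gt;); the sample values are made up:&lt;/p&gt;

```python
import sqlite3

# Toy rows mimicking the issue: for show_id s5542 the duration value
# ("74 min") was loaded into the rating column, leaving duration null.
con = sqlite3.connect(":memory:")
con.execute("create table netflix_raw (show_id text, rating text, duration text)")
con.executemany("insert into netflix_raw values (?, ?, ?)", [
    ("s1",    "PG-13",  "90 min"),
    ("s5542", "74 min", None),
])

# Same case-when as the article's query (SQLite needs create table as
# instead of SQL Server's select ... into).
con.execute("""
    create table netflix_intermediate as
    select show_id,
           case when duration is null then rating else duration end as duration
    from netflix_raw
""")

rows = dict(con.execute("select show_id, duration from netflix_intermediate"))
print(rows)  # {'s1': '90 min', 's5542': '74 min'}
```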

&lt;p&gt;This is the final schema for now:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fos0c4lrfp9hajkqzbiyh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fos0c4lrfp9hajkqzbiyh.png" alt="Image of the final Schema with the changes on the columns of netflix_raw and the creation of all the other tables" width="800" height="438"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can go even further, like filling remaining nulls, building dashboards, and running deeper analytics. I’ll cover those in &lt;strong&gt;future articles&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Want to try this yourself? Download the dataset and follow along on your own!&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;This was a comprehensive walk-through of an ELT process, from raw data to a clean database schema ready for analytics.&lt;/p&gt;

&lt;p&gt;We extracted the data, loaded it into a database, transformed it for analytics, and prepped it for insight generation.&lt;/p&gt;

&lt;p&gt;In upcoming posts, I’ll focus on &lt;strong&gt;data visualization&lt;/strong&gt;, &lt;strong&gt;insight generation&lt;/strong&gt;, and maybe even using &lt;strong&gt;BI tools&lt;/strong&gt; to create dashboards from this dataset.&lt;/p&gt;

&lt;p&gt;Thanks for reading — stay tuned!&lt;/p&gt;

&lt;p&gt;Have questions about this process or want to share your own approach? Drop a comment or reach out. I love talking about data!&lt;/p&gt;

&lt;p&gt;In the next part, I’ll show you how to create stunning visualizations and dashboards using this cleaned dataset.&lt;/p&gt;

&lt;p&gt;Subscribe or follow me so you don’t miss it!&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/datasets/shivamb/netflix-shows" rel="noopener noreferrer"&gt;Netflix Movies and TV Shows Dataset – Kaggle&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.microsoft.com/en-us/sql-server/sql-server-downloads" rel="noopener noreferrer"&gt;Microsoft SQL Server Download&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://pandas.pydata.org/docs/" rel="noopener noreferrer"&gt;Pandas Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.sqlalchemy.org/en/20/" rel="noopener noreferrer"&gt;SQLAlchemy Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/oramirezperera/netflix_data_transformation" rel="noopener noreferrer"&gt;This project GitHub Repository&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=ZnQwO6V7pec" rel="noopener noreferrer"&gt;Netflix Data Cleaning and Analysis Project (Ankit Bansal)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>python</category>
      <category>sql</category>
      <category>elt</category>
    </item>
    <item>
      <title>OLAP vs OLTP: The war that is not meant to be</title>
      <dc:creator>orlando ramirez</dc:creator>
      <pubDate>Tue, 25 Mar 2025 20:06:12 +0000</pubDate>
      <link>https://dev.to/oramirezperera/olap-vs-oltp-the-war-that-is-not-meant-to-be-516b</link>
      <guid>https://dev.to/oramirezperera/olap-vs-oltp-the-war-that-is-not-meant-to-be-516b</guid>
      <description>&lt;p&gt;In previous articles, we covered what &lt;a href="https://dev.to/oramirezperera/oltp-explained-speed-integrity-and-high-availability-in-databases-3n67"&gt;OLTP&lt;/a&gt; and &lt;a href="https://dev.to/oramirezperera/olap-unlocking-the-power-of-analytical-data-processing-5edd"&gt;OLAP&lt;/a&gt; are, their benefits, key features, and use cases.&lt;br&gt;
This might make it seem like there's a 'fight' or 'battle' between these technologies, but that’s not the case.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia2.giphy.com%2Fmedia%2Fv1.Y2lkPTc5MGI3NjExbzk2bjlzYWpia2hjd3UxbjN1aDVlYWc4bGN0ZTF0MHVmN2J6YXNiciZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw%2Fu4th4weIsJfuE%2Fgiphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia2.giphy.com%2Fmedia%2Fv1.Y2lkPTc5MGI3NjExbzk2bjlzYWpia2hjd3UxbjN1aDVlYWc4bGN0ZTF0MHVmN2J6YXNiciZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw%2Fu4th4weIsJfuE%2Fgiphy.gif" alt="street fighter gif" width="500" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;OLAP and OLTP are different technologies, built for different use cases.&lt;br&gt;
For example, if you need to process fast transactions or anticipate high-frequency insert, update, or delete operations, OLTP is the right choice. On the other hand, if you need to analyze historical data, group information by customers, or extract insights, OLAP is the better option.&lt;/p&gt;
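&lt;p&gt;The split is easy to picture with two queries against the same toy database: the OLTP side records individual sales one row at a time, while the OLAP side scans the history and aggregates it. A minimal sketch with SQLite; the table and values are made up:&lt;/p&gt;

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table sales (sale_id integer primary key, customer text, amount integer)")

# OLTP-style workload: many small, fast writes, one sale at a time.
for customer, amount in [("ana", 20), ("ana", 5), ("luis", 12)]:
    con.execute("insert into sales (customer, amount) values (?, ?)", (customer, amount))

# OLAP-style workload: scan the whole history and aggregate per customer.
totals = dict(con.execute(
    "select customer, sum(amount) from sales group by customer"))
print(totals)  # total spent per customer
```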

&lt;p&gt;So, when do you need which one? That’s a great question. If you're a small business, you might think you can save money by skipping an OLAP system and querying the transactional database directly. That might work at first without issues, but as you grow you will run into problems: slow load times because your app is querying large amounts of data in real time, or worse, database crashes caused by complex queries that overload the system.&lt;/p&gt;

&lt;p&gt;That’s why it’s important to have a good data architecture and a plan to follow as you and your company, app, or service grow. OLAP and OLTP aren’t rivals; they were designed to fulfill different needs. Using them correctly will save you time, money, and headaches compared to relying on inefficient workarounds.&lt;/p&gt;

&lt;p&gt;A solid data architecture and a clear understanding of the data engineering lifecycle are essential at any stage of your business, whether you’re just starting or scaling up. Having the right tool for the right job will make life easier for you, your customers, and everyone involved. Scaling your data infrastructure alongside your code will improve performance, stability, and long-term success.&lt;/p&gt;

&lt;p&gt;How does your company handle transactional and analytical data? Let’s discuss your experiences in the comments!&lt;/p&gt;

&lt;p&gt;Do you think businesses should invest in OLAP early on, or wait until they scale? Share your thoughts!&lt;/p&gt;

&lt;p&gt;Have you ever faced performance issues because of querying transactional data for analytics? Let’s talk about best practices!&lt;/p&gt;

&lt;p&gt;Want to learn more about data architecture and best practices? Follow me for future insights!&lt;/p&gt;

</description>
      <category>olap</category>
      <category>oltp</category>
      <category>database</category>
      <category>dataengineering</category>
    </item>
    <item>
      <title>OLTP Explained: Speed, Integrity, and High Availability in Databases</title>
      <dc:creator>orlando ramirez</dc:creator>
      <pubDate>Thu, 13 Feb 2025 19:23:08 +0000</pubDate>
      <link>https://dev.to/oramirezperera/oltp-explained-speed-integrity-and-high-availability-in-databases-3n67</link>
      <guid>https://dev.to/oramirezperera/oltp-explained-speed-integrity-and-high-availability-in-databases-3n67</guid>
      <description>&lt;h2&gt;
  
  
  OLTP
&lt;/h2&gt;

&lt;p&gt;First things first, what does OLTP mean? OLTP stands for Online Transaction Processing, a type of database system that is very common in the tech industry. It’s mainly used because it allows real-time, accurate data processing for a large number of users.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl7cna1lib2box8xkeghj.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl7cna1lib2box8xkeghj.gif" alt="dog question" width="480" height="270"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The main difference between this and &lt;a href="https://dev.to/oramirezperera/olap-unlocking-the-power-of-analytical-data-processing-5edd"&gt;OLAP&lt;/a&gt; is that the latter is more focused on the analytical side of the data.&lt;/p&gt;

&lt;p&gt;OLTP is built to handle many requests and many users doing insertions, updates, and deletions in the database. Maintaining data integrity is critical, and one way to maintain it is through concurrency control: no two transactions can modify the same data at the same time. That way, two people querying the same information are guaranteed to get the same result.&lt;/p&gt;

&lt;p&gt;Additionally, OLTP uses indexed datasets to speed up searches and queries, and it relies on backups so the data is available all the time.&lt;/p&gt;

&lt;p&gt;It focuses on transaction speed, with everything designed to ensure the fastest response time between the user and the database.&lt;/p&gt;

&lt;p&gt;Some people say that OLAP is the evolution of OLTP, and in some ways it is, but it’s important to understand that they are different and have different goals: one is focused on transactions and speed, the other on analytics.&lt;/p&gt;

&lt;h2&gt;
  
  
  Transactions
&lt;/h2&gt;

&lt;p&gt;Don’t let the word transaction trick you. OLTP is heavily used in banks, e-commerce, ATMs, and airlines, so you may think the term refers to financial exchanges, but it actually refers to computational transactions: an atomic change of state in the database.&lt;/p&gt;

&lt;p&gt;Another important aspect is that transactions either succeed or fail as a whole; they can’t remain in an intermediate or pending state. For the database, the transaction either worked or it didn’t. Within your business rules an economic transaction can have a pending state, but in the data flow it can’t.&lt;/p&gt;
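&lt;p&gt;Atomicity is easy to see in code: either every statement in the transaction commits, or the whole thing is rolled back. A minimal sketch with SQLite (account names and balances are made up):&lt;/p&gt;

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""create table accounts (
    name text primary key,
    balance integer not null check (balance >= 0))""")
con.executemany("insert into accounts values (?, ?)",
                [("alice", 100), ("bob", 50)])
con.commit()

# Transfer more than alice has: the check constraint rejects the first
# update, and the rollback undoes the whole transfer.
try:
    with con:  # opens a transaction; commits on success, rolls back on error
        con.execute("update accounts set balance = balance - 200 where name = 'alice'")
        con.execute("update accounts set balance = balance + 200 where name = 'bob'")
except sqlite3.IntegrityError:
    pass  # the transaction failed as a whole

balances = dict(con.execute("select name, balance from accounts"))
print(balances)  # {'alice': 100, 'bob': 50} -- no intermediate state survives
```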

&lt;h2&gt;
  
  
  In other words, OLTP:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Processes large quantities of transactions.&lt;/li&gt;
&lt;li&gt;Multiple users can access the same data with data integrity.&lt;/li&gt;
&lt;li&gt;Rapid processes, usually measured in milliseconds.&lt;/li&gt;
&lt;li&gt;Uses indexed data sets.&lt;/li&gt;
&lt;li&gt;Has to be available all the time.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In another article I will discuss OLTP vs OLAP as technologies for data.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Have you worked with OLTP systems before? How do they impact your daily work? Share your thoughts in the comments!&lt;/li&gt;
&lt;li&gt;Want to dive deeper into OLAP vs. OLTP? Stay tuned for my next article where I compare both in detail!&lt;/li&gt;
&lt;li&gt;If you're interested in learning more about databases and data processing, follow me for more insights!&lt;/li&gt;
&lt;li&gt;Need help optimizing your OLTP database? Let’s connect and exchange ideas!&lt;/li&gt;
&lt;li&gt;Do you think OLTP is evolving with modern data needs? Let’s discuss it below!&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>dataengineering</category>
      <category>sql</category>
      <category>oltp</category>
    </item>
    <item>
      <title>OLAP: Unlocking the Power of Analytical Data Processing</title>
      <dc:creator>orlando ramirez</dc:creator>
      <pubDate>Tue, 04 Feb 2025 13:48:35 +0000</pubDate>
      <link>https://dev.to/oramirezperera/olap-unlocking-the-power-of-analytical-data-processing-5edd</link>
      <guid>https://dev.to/oramirezperera/olap-unlocking-the-power-of-analytical-data-processing-5edd</guid>
      <description>&lt;p&gt;Hearing the term OLAP out of the blue can be confusing, but once you understand its meaning and purpose, it becomes much easier to grasp. It is easier to get used to it brings a lot of options and brings new opportunities to get the most out of your data. That’s why I’m writing today about OLAP.&lt;/p&gt;

&lt;p&gt;The acronym OLAP means Online Analytical Processing, and the keyword here is Analytical; that’s the main difference with OLTP, which we will cover in another article. As the name says, OLAP is focused on the analytical side of your data, and its main idea is to optimize querying large quantities of data, reducing processing time and computational resource consumption. The result is saved time and money, and more data availability.&lt;/p&gt;

&lt;h2&gt;
  
  
  OLAP Cube
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx3n77gx529ry8iwm93n7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx3n77gx529ry8iwm93n7.png" alt="Rubik's cube" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One of the main things about OLAP technologies is the OLAP cube, a technique for analyzing data to look for insights. The cube is a multi-dimensional dataset where some of the data is summarized in a way that brings useful information for businesses.&lt;br&gt;
A central element in OLAP is having a fact table, arranged in a star or snowflake schema, that connects to other tables called dimensions.&lt;/p&gt;
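&lt;p&gt;A minimal star schema looks like one fact table pointing at its dimension tables. A sketch in SQLite with made-up table names:&lt;/p&gt;

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- dimension table: one row per calendar date
    create table dim_date (
        date_id   integer primary key,
        full_date text,
        day_name  text
    );

    -- fact table: one row per sale, pointing at the dimension
    create table fact_sales (
        sale_id integer primary key,
        date_id integer references dim_date(date_id),
        amount  integer
    );
""")

tables = {name for (name,) in con.execute(
    "select name from sqlite_master where type = 'table'")}
print(tables)  # contains 'dim_date' and 'fact_sales'
```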

&lt;p&gt;OLAP cubes are optimized for complex analytical queries, allowing users to retrieve insights quickly compared to traditional relational databases. Since data is pre-aggregated, queries that would take minutes or hours in a standard database can be executed in seconds.&lt;/p&gt;

&lt;h2&gt;
  
  
  Analytical Operations
&lt;/h2&gt;

&lt;p&gt;Another important thing in OLAP is the analytical operations you can perform. These are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Consolidation&lt;/li&gt;
&lt;li&gt;Drill-Down&lt;/li&gt;
&lt;li&gt;Slicing and Dicing&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Consolidation
&lt;/h2&gt;

&lt;p&gt;Also called roll-up, this operation aggregates data, making it easier to analyze trends and consume later.&lt;/p&gt;

&lt;h2&gt;
  
  
  Drill-Down
&lt;/h2&gt;

&lt;p&gt;Lets the user navigate from the summarized data down to the underlying details.&lt;/p&gt;

&lt;h2&gt;
  
  
  Slicing and Dicing
&lt;/h2&gt;

&lt;p&gt;Lets you select specific parts of the data in the OLAP cube (slicing) and look at the data from different viewpoints (dicing).&lt;/p&gt;

&lt;p&gt;A simple example of this technology, which is even in the Wikipedia article, is grouping all the sales of a store in a table that becomes your fact table, with an id column referencing another table containing the date and time of each sale, which would be our dimension table.&lt;br&gt;
By grouping the data this way, we can see all the sales in general and analyze them, but we can also drill down and group by periods, by days of the week, and so on, which lets us look for patterns, group the data in a way that is useful for us, or simply present it better in a data visualization tool.&lt;/p&gt;
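&lt;p&gt;That sales example can be sketched end to end: load a few sales into a fact table, join to the date dimension, and roll up by day of the week (SQLite again, with made-up data):&lt;/p&gt;

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table dim_date (date_id integer primary key, day_name text)")
con.execute("create table fact_sales (date_id integer, amount integer)")
con.executemany("insert into dim_date values (?, ?)", [(1, "Monday"), (2, "Tuesday")])
con.executemany("insert into fact_sales values (?, ?)", [(1, 10), (1, 15), (2, 7)])

# Roll-up (consolidation): aggregate individual sales per day of the week.
rollup = dict(con.execute("""
    select d.day_name, sum(f.amount)
    from fact_sales f
    join dim_date d on d.date_id = f.date_id
    group by d.day_name
"""))
print(rollup)  # {'Monday': 25, 'Tuesday': 7}
```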

&lt;p&gt;The way the data is stored in OLAP makes this kind of query, and retrieving that sales information, a lot faster and less processing-intensive than in an ordinary relational database. Another important thing to keep in mind while working with OLAP is to define up front all the relations you need between your fact table and your dimension tables.&lt;/p&gt;

&lt;p&gt;Another good thing about OLAP technologies is their better integration with BI tools.&lt;/p&gt;

&lt;p&gt;OLAP cubes are widely supported by Business Intelligence (BI) tools like Tableau, Power BI, and Pentaho. Their structured format enhances data visualization, making it easier to generate meaningful reports and dashboards. &lt;/p&gt;

&lt;p&gt;That being said, I really like OLAP technology and think it can be very useful for businesses; it’s a great evolution of OLTP, bringing benefits for your data visualizations and the final consumers of the data. OLAP is ideal for tracking historical data trends over time: businesses can compare sales, performance, or customer behavior across different periods, facilitating better forecasting and strategic planning.&lt;/p&gt;

&lt;p&gt;Have you worked with OLAP technologies before? Do you think investing in data preparation for OLAP is worthwhile?&lt;/p&gt;

&lt;p&gt;Bibliography:&lt;br&gt;
&lt;a href="https://www.youtube.com/watch?v=iw-5kFzIdgY" rel="noopener noreferrer"&gt;https://www.youtube.com/watch?v=iw-5kFzIdgY&lt;/a&gt;&lt;br&gt;
&lt;a href="https://en.wikipedia.org/wiki/Online_analytical_processing" rel="noopener noreferrer"&gt;https://en.wikipedia.org/wiki/Online_analytical_processing&lt;/a&gt;&lt;br&gt;
&lt;a href="https://es.wikipedia.org/wiki/Cubo_OLAP" rel="noopener noreferrer"&gt;https://es.wikipedia.org/wiki/Cubo_OLAP&lt;/a&gt;&lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>sql</category>
      <category>olap</category>
    </item>
    <item>
      <title>Know your tools</title>
      <dc:creator>orlando ramirez</dc:creator>
      <pubDate>Fri, 24 Jan 2025 15:59:47 +0000</pubDate>
      <link>https://dev.to/oramirezperera/know-your-tools-44oj</link>
      <guid>https://dev.to/oramirezperera/know-your-tools-44oj</guid>
      <description>&lt;p&gt;Yesterday I was doing some challenges in codewars and I got a challenge where I had to “Create a function with two arguments that will return an array of the first n multiples of x.” At first I did what I usually do in these cases, break down the problem into simple steps to solve it easier.&lt;br&gt;
The thing is even broken down I had to do a lot of steps including using a loop that I knew it will be using a lot of resources. If it is the only way and you can't avoid it then do it that way. But then I thought each database administrator or database system has its functions and different ways to get the job done.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fji5kt56h6llhx8aq31cc.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fji5kt56h6llhx8aq31cc.gif" alt="Thinking lady meme" width="504" height="322"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That’s when I got the idea. Codewars uses PostgreSQL as its database system for these challenges, so I started searching for functions that could help me here. After some research, I found the &lt;code&gt;generate_series&lt;/code&gt; function, which lets you create a series from a number or timestamp; it accepts a start number, a stop number, and a step. That’s when a really difficult problem became a really easy one.&lt;/p&gt;
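&lt;p&gt;In PostgreSQL the whole kata collapses into a single &lt;code&gt;generate_series(start, stop, step)&lt;/code&gt; call. Here is a sketch of the idea in Python, using a recursive CTE in SQLite as a stand-in, since the built-in &lt;code&gt;sqlite3&lt;/code&gt; module has no &lt;code&gt;generate_series&lt;/code&gt;:&lt;/p&gt;

```python
import sqlite3

def first_multiples(x, n):
    """Return the first n multiples of x (assumes x > 0).

    In PostgreSQL this is just: select generate_series(x, x * n, x);
    here a recursive CTE in SQLite stands in for it.
    """
    con = sqlite3.connect(":memory:")
    rows = con.execute("""
        with recursive multiples(v) as (
            select :x
            union all
            select v + :x from multiples where v < :x * :n
        )
        select v from multiples
    """, {"x": x, "n": n})
    return [v for (v,) in rows]

print(first_multiples(3, 4))  # [3, 6, 9, 12]
```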

&lt;p&gt;That’s why it’s important to know your tools and think about which is the best tool for the problem you have.&lt;br&gt;
If you use the wrong tool, you can make the problem even bigger or more time- and money-consuming; and if you don’t research the tool you are already using, you may miss an easier way to solve the thing you are facing.&lt;br&gt;
Practicing, learning, and continuously improving will make your life and your teammates’ lives easier, so my advice is to constantly read and keep up with the new features and advances in your tools.&lt;/p&gt;

&lt;p&gt;So, tell me, do you do katas on Codewars to practice?&lt;/p&gt;

&lt;p&gt;Did you face a problem like I did?&lt;/p&gt;

&lt;p&gt;Tell me if you have any other stories like mine with another language.&lt;/p&gt;

</description>
      <category>sql</category>
      <category>quickthoughts</category>
      <category>codewars</category>
      <category>postgres</category>
    </item>
    <item>
      <title>Tips for naming variables in Python</title>
      <dc:creator>orlando ramirez</dc:creator>
      <pubDate>Wed, 14 Jul 2021 23:26:35 +0000</pubDate>
      <link>https://dev.to/oramirezperera/tips-for-naming-variables-in-python-1je8</link>
      <guid>https://dev.to/oramirezperera/tips-for-naming-variables-in-python-1je8</guid>
      <description>&lt;h2&gt;
  
  
  Python
&lt;/h2&gt;

&lt;p&gt;Python is a &lt;a href="https://en.wikipedia.org/wiki/Dynamic_programming_language"&gt;dynamic programming language&lt;/a&gt;, and if you have been programming for some time, or are just starting, you’ve probably heard that naming variables is very important. It’s not only about giving long names to your variables: names have to be meaningful, telling you what the program is doing or where you are going to use them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Python conventions PEP8
&lt;/h2&gt;

&lt;p&gt;Python uses language conventions for many things, and one of them is naming variables. These conventions are grouped in the &lt;a href="https://www.python.org/dev/peps/pep-0008/"&gt;PEP 8&lt;/a&gt; Style Guide for Python Code, which says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“This document gives coding conventions for the Python code comprising the standard library in the main Python distribution.” &lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Reserved words or keywords
&lt;/h2&gt;

&lt;p&gt;There are words that you can’t use; they are called reserved keywords. These keywords can be listed inside Python by importing the &lt;code&gt;keyword&lt;/code&gt; module and printing the list &lt;code&gt;keyword.kwlist&lt;/code&gt;.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--tp-V4dlV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/thaw4jv8sic6nsw1b5s3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--tp-V4dlV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/thaw4jv8sic6nsw1b5s3.png" alt="example code of keyword.kwlist function"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--L0GlF9UN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/nkpvoc3vtwby44k0kh3z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--L0GlF9UN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/nkpvoc3vtwby44k0kh3z.png" alt="The return of the function kwlist a list of the python keywords"&gt;&lt;/a&gt;&lt;br&gt;
These are some of the keywords in Python:&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--_4E8qdHb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/u2qswboad3n13wbp1y3t.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--_4E8qdHb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/u2qswboad3n13wbp1y3t.jpg" alt="some of the Python keywords"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;None of these words can be used as variable names. You can check whether a word is a keyword by using the function &lt;code&gt;keyword.iskeyword()&lt;/code&gt; and passing the word inside the parentheses, for example:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--5pdaJ3v5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/o7utlbuyfj5utnrrfqfh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--5pdaJ3v5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/o7utlbuyfj5utnrrfqfh.png" alt="example code of the iskeyword function"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As in &lt;code&gt;keyword.iskeyword('else')&lt;/code&gt;, this returns a Boolean, in this case &lt;code&gt;True&lt;/code&gt;.&lt;/p&gt;
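&lt;p&gt;Both checks fit in a few lines using only the standard library:&lt;/p&gt;

```python
import keyword

# keyword.kwlist is a plain list attribute with every reserved word.
print(len(keyword.kwlist), "reserved keywords in this Python version")

# keyword.iskeyword() answers for a single word.
print(keyword.iskeyword("else"))      # True
print(keyword.iskeyword("greeting"))  # False
```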

&lt;h2&gt;
  
  
  Tips for naming variables in Python
&lt;/h2&gt;

&lt;p&gt;Some tips to naming variables in Python are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can use numbers in names, but make names as descriptive as you can.&lt;/li&gt;
&lt;li&gt;Don’t use special characters like @ or $.&lt;/li&gt;
&lt;li&gt;Variable names can’t start with numbers.&lt;/li&gt;
&lt;li&gt;To join words in variable names you can use camel case, but the PEP 8 convention is to use underscores (snake case), like &lt;code&gt;variable_name&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In Python you don’t need to declare explicitly that something is a variable, as in other languages, but when you write the name of a variable you can use some conventions to communicate what kind of variable it is.&lt;/p&gt;

&lt;p&gt;The variable types are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Public variables: these are the “normal” variables. You declare them by writing the variable name, an equals sign, and what you want to store. e.g. greeting = ‘hello’&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Private variables: variables that aren’t part of the public interface of the program, commonly seen in APIs. In Python you can still access a private variable at any time, but the leading underscore tells you that changing its content may break the program or cause malfunctions. e.g. _age = 20.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Constants: by convention written in all upper case; values that are not going to change over time. e.g. PI = 3.14159.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;“Super private” variables: totally private, or “please don’t touch me”, variables. These start with a double underscore (inside a class, Python even mangles the name to make accidental access harder). If you change anything inside these variables, you will probably break your whole program. e.g. __really_important_number = 15.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
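&lt;p&gt;The four conventions side by side (the names are just examples; note that double-underscore name mangling only kicks in inside a class):&lt;/p&gt;

```python
PI = 3.14159          # constant: all caps, not meant to change

greeting = "hello"    # public variable: a plain, descriptive name
_age = 20             # private by convention: outside code should not touch it

class Config:
    __token = "secret"   # double underscore: Python mangles the name

# Outside the class, __token is only reachable under its mangled name.
print(hasattr(Config, "__token"))  # False
print(Config._Config__token)       # secret
```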

&lt;p&gt;This was a quick summary of naming variables and variable types. Python is a great programming language and can be used in many fields. Tell me, have you ever seen any of these variable types? Do you write Python code? Which is your favorite programming language?&lt;/p&gt;

&lt;p&gt;A big thanks to ThisIsEngineering from Pexels for the cover photo.&lt;/p&gt;

</description>
      <category>python</category>
      <category>beginners</category>
    </item>
  </channel>
</rss>
