<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: GharamElhendy</title>
    <description>The latest articles on DEV Community by GharamElhendy (@gharamelhendy).</description>
    <link>https://dev.to/gharamelhendy</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F619679%2Fec81048f-e72e-4b7d-bfd9-b66f722e64ad.jpg</url>
      <title>DEV Community: GharamElhendy</title>
      <link>https://dev.to/gharamelhendy</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/gharamelhendy"/>
    <language>en</language>
    <item>
      <title>SQL Cheat Sheet #2</title>
      <dc:creator>GharamElhendy</dc:creator>
      <pubDate>Sat, 07 Oct 2023 18:27:40 +0000</pubDate>
      <link>https://dev.to/gharamelhendy/sql-cheat-sheet-2-1p2e</link>
      <guid>https://dev.to/gharamelhendy/sql-cheat-sheet-2-1p2e</guid>
      <description>&lt;h1&gt;
  
  
  Here are some SQL commands to jog your memory!
&lt;/h1&gt;

&lt;h3&gt;
  
  
  Return the number of records in a table
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT COUNT(*) FROM table_name
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Show the entire table
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT * FROM table_name
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Show a column within the table
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT column_name FROM table_name
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Select only the unique values within a column
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT DISTINCT column_name FROM table_name
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Return the number of unique values from a column within a table
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT COUNT(DISTINCT column_name)
FROM table_name
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Select all records where a certain value exists
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT * FROM table_name 
WHERE column_name = 'value'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Select all records excluding a certain value
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT * FROM table_name WHERE NOT column_name = 'value'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Select all records where a certain value exists in a column and another certain value exists in another column
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT * FROM table_name 
WHERE column_1 = 'string1' AND column_2 = 'string2'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Select all records where a column's value starts with a particular letter (here, the letter a)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT * FROM table_name WHERE column_name LIKE 'a%'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Select all records where a column's value ends with a particular letter (here, the letter a)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT * FROM table_name WHERE column_name LIKE '%a'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Select all records with values from a column alphabetically ranging from a certain value to another certain value
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT * FROM table_name WHERE column_name BETWEEN 'string1' AND 'string2'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Return records in a descending order
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT * FROM table_name ORDER BY column_name DESC
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Select records where a column matches any of several values
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT * FROM table_name
WHERE column_name IN ('value1','value2');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Insert records in your table
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INSERT INTO table_name VALUES ('first','second','third','nth')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;It's worth noting that the number of values you list after VALUES has to match the number of columns in the table you're inserting into, unless you name the target columns explicitly after the table name.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Change a record in a column to another value
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;UPDATE table_name SET column_name = 'new value' WHERE column_name = 'old value'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Remove records where a certain value exists
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;DELETE FROM table_name WHERE column_name = 'certain value'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
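
&lt;p&gt;To try several of these statements end to end, here is a minimal sketch that runs them from Python with the standard-library sqlite3 module. The people table, its columns, and the sample rows are hypothetical and exist only for illustration; other database engines may differ in details such as the case sensitivity of LIKE.&lt;/p&gt;

&lt;pre&gt;
```python
import sqlite3

# Hypothetical in-memory table used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, city TEXT)")

# INSERT: one value per column, passed as parameters.
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("amira", "Cairo"), ("bob", "Oslo"), ("ana", "Cairo")],
)

# COUNT(*) counts rows; COUNT(DISTINCT ...) counts unique values.
total = conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]
cities = conn.execute("SELECT COUNT(DISTINCT city) FROM people").fetchone()[0]

# LIKE 'a%' keeps names that start with the letter a.
starts_with_a = conn.execute(
    "SELECT name FROM people WHERE name LIKE 'a%' ORDER BY name DESC"
).fetchall()

# UPDATE and DELETE use WHERE to limit the affected rows.
conn.execute("UPDATE people SET city = 'Bergen' WHERE city = 'Oslo'")
conn.execute("DELETE FROM people WHERE name = 'ana'")
remaining = conn.execute("SELECT name, city FROM people ORDER BY name").fetchall()
conn.close()

print(total, cities, starts_with_a, remaining)
```
&lt;/pre&gt;

&lt;p&gt;Running UPDATE or DELETE without a WHERE clause would affect every row in the table, so it is worth testing the condition with a SELECT first.&lt;/p&gt;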



</description>
      <category>sql</category>
      <category>data</category>
      <category>dataanalysis</category>
      <category>database</category>
    </item>
    <item>
      <title>SQL Cheat Sheet #1</title>
      <dc:creator>GharamElhendy</dc:creator>
      <pubDate>Sat, 07 Oct 2023 18:21:31 +0000</pubDate>
      <link>https://dev.to/gharamelhendy/sql-cheat-sheet-1-510k</link>
      <guid>https://dev.to/gharamelhendy/sql-cheat-sheet-1-510k</guid>
      <description>&lt;p&gt;SELECT&lt;br&gt;
UPDATE&lt;br&gt;
DELETE&lt;br&gt;
INSERT INTO ... VALUES&lt;br&gt;
WHERE ... (=, &amp;gt;, &amp;lt;)&lt;/p&gt;

&lt;p&gt;CREATE DATABASE&lt;br&gt;
ALTER DATABASE&lt;br&gt;
CREATE TABLE&lt;br&gt;
ALTER TABLE&lt;br&gt;
DROP TABLE&lt;br&gt;
CREATE INDEX (To create a search key)&lt;br&gt;
DROP INDEX&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>data</category>
      <category>sql</category>
    </item>
    <item>
      <title>Using GroupBy to Investigate Data from a Certain Scope According to One or More Specific Attributes</title>
      <dc:creator>GharamElhendy</dc:creator>
      <pubDate>Mon, 23 Aug 2021 15:08:31 +0000</pubDate>
      <link>https://dev.to/gharamelhendy/using-groupby-to-investigate-data-relevant-to-one-or-more-specific-attributes-3709</link>
      <guid>https://dev.to/gharamelhendy/using-groupby-to-investigate-data-relevant-to-one-or-more-specific-attributes-3709</guid>
      <description>&lt;p&gt;In this scenario, we have a dataframe that's made up of multiple attributes and we want to find the means of some of those attributes but from the scope of one or two main attributes.&lt;/p&gt;

&lt;p&gt;For example, if we want to find the mean height in a population that consists of males and females with different age groups:&lt;/p&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;Bear in mind that my dataframe is called population_df and has attributes such as weight, height, and BMI, plus age and gender, which we will use to split the data during analysis.&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;
&lt;br&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Importing and Parsing
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="n"&gt;population_df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'investigation_data.csv'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  To view means relative to the age of the person:
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;population_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'age'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This shows the mean of every other numeric attribute for each distinct age. The ages themselves become the index, which appears as the first column of the output.&lt;/p&gt;

&lt;h2&gt;
  
  
  To view means relative to the age and then relative to the gender:
&lt;/h2&gt;

&lt;p&gt;So, to use multiple columns with groupby, we can do the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;population_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s"&gt;'age'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'gender'&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This shows the mean of each remaining attribute for every combination of age and gender.&lt;/p&gt;
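
&lt;p&gt;As a rough illustration of grouping by two columns, here is a minimal sketch on a tiny hand-made dataframe. The values are made up and stand in for investigation_data.csv, whose real contents are not shown in this post.&lt;/p&gt;

&lt;pre&gt;
```python
import pandas as pd

# Hypothetical stand-in for investigation_data.csv.
population_df = pd.DataFrame({
    "age": [20, 20, 30, 30],
    "gender": ["male", "female", "male", "male"],
    "height": [180.0, 165.0, 175.0, 185.0],
    "weight": [80.0, 60.0, 70.0, 90.0],
})

# One row per (age, gender) pair; each cell is the mean within that group.
means = population_df.groupby(["age", "gender"]).mean()
print(means)
```
&lt;/pre&gt;

&lt;p&gt;The two 30-year-old males are averaged into a single row, while each 20-year-old ends up in a group of their own.&lt;/p&gt;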

</description>
      <category>datascience</category>
      <category>analytics</category>
      <category>python</category>
      <category>pandas</category>
    </item>
    <item>
      <title>Renaming Columns for Mutable Operations</title>
      <dc:creator>GharamElhendy</dc:creator>
      <pubDate>Sun, 22 Aug 2021 11:16:24 +0000</pubDate>
      <link>https://dev.to/gharamelhendy/renaming-columns-for-mutable-operations-4c8e</link>
      <guid>https://dev.to/gharamelhendy/renaming-columns-for-mutable-operations-4c8e</guid>
      <description>&lt;p&gt;The simple syntax used to change a column name is like this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;This is with the following info in mind: &lt;br&gt;
the dataframe is named file_df and we want to rename its 9th column (position 8, since counting starts at 0)&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;
&lt;br&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;file_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;'new_column_name'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;However, this raises a TypeError because a pandas Index doesn't support mutable operations. That's why we have to use a different syntax for this modification, which is as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;file_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rename&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;'old-column-name'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s"&gt;'new_column_name'&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="n"&gt;inplace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  There's another method that's a little longer, and it goes as follows:
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;new_labels&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;new_labels&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;'new_column_name'&lt;/span&gt;
&lt;span class="n"&gt;file_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;new_labels&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Note: Using this method, we're reassigning the entire thing to a new list&lt;/em&gt;&lt;/p&gt;
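
&lt;p&gt;To see the difference in practice, here is a minimal sketch on a hypothetical two-column dataframe (the column names a and b are made up). It tries the item assignment, catches the resulting TypeError, and then renames the column with rename instead.&lt;/p&gt;

&lt;pre&gt;
```python
import pandas as pd

# Hypothetical dataframe with two columns.
file_df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Assigning to a single position of the Index raises TypeError.
try:
    file_df.columns[1] = "new_column_name"
    item_assignment_failed = False
except TypeError:
    item_assignment_failed = True

# rename returns a new dataframe unless inplace=True is passed.
renamed_df = file_df.rename(columns={"b": "new_column_name"})
print(item_assignment_failed, list(renamed_df.columns))
```
&lt;/pre&gt;

&lt;p&gt;Without inplace=True, file_df keeps its original column names and only renamed_df carries the new one.&lt;/p&gt;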

</description>
      <category>datascience</category>
      <category>analytics</category>
      <category>python</category>
      <category>pandas</category>
    </item>
    <item>
      <title>Appending Two DataFrames in Pandas</title>
      <dc:creator>GharamElhendy</dc:creator>
      <pubDate>Sun, 22 Aug 2021 11:04:15 +0000</pubDate>
      <link>https://dev.to/gharamelhendy/appending-two-dataframes-in-pandas-49p8</link>
      <guid>https://dev.to/gharamelhendy/appending-two-dataframes-in-pandas-49p8</guid>
      <description>&lt;p&gt;Let's say we have two dataframes with the same attributes and they share a physical attribute that may vary. We can combine both dataframes and differentiate using this physical attribute to make handling csv files easier.&lt;/p&gt;

&lt;p&gt;For example, if we have descriptions of colors of skin, hair, and eyes for two dataframes, one for males and one for females, we can add a column at the end that signifies whether the attributes in a certain row belong to a male or female, all in the same dataframe (population_df) instead of 2.&lt;/p&gt;

&lt;h2&gt;
  
  
  Importing and reading the files
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
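&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;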
&lt;span class="n"&gt;males_df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'males.csv'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;females_df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'females.csv'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Adding a last column with the attribute used for distinction
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;gender_male&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;repeat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'male'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;males_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;gender_female&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;repeat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'female'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;females_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;males_df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'gender'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;gender_male&lt;/span&gt;
&lt;span class="n"&gt;females_df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'gender'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;gender_female&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Appending the two dataframes
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
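&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# DataFrame.append was removed in pandas 2.0; pd.concat does the same job&lt;/span&gt;
&lt;span class="n"&gt;population_df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;concat&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;males_df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;females_df&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;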
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Saving the combined dataset with index=False so that the file isn't saved with an extra unnamed index column
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;population_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'Filename.csv'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Note: You can make sure that you've successfully appended your two dataframes by using .shape&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;population_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;If the row count (the first number in the tuple) is the sum of the row counts of the two dataframes, proceed with your work&lt;/em&gt;&lt;/p&gt;
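
&lt;p&gt;To show what index=False changes when saving, here is a minimal sketch that writes a small, made-up dataframe to in-memory buffers instead of real files (io.StringIO is used only so the example leaves nothing on disk) and reads both versions back.&lt;/p&gt;

&lt;pre&gt;
```python
import io
import pandas as pd

# Hypothetical combined dataframe.
population_df = pd.DataFrame({"hair": ["black", "blonde"], "gender": ["male", "female"]})

with_index = io.StringIO()
population_df.to_csv(with_index)                  # default: the index is written too
without_index = io.StringIO()
population_df.to_csv(without_index, index=False)  # the index is left out

# Reading both back shows the extra column only in the first case.
cols_with = list(pd.read_csv(io.StringIO(with_index.getvalue())).columns)
cols_without = list(pd.read_csv(io.StringIO(without_index.getvalue())).columns)
print(cols_with, cols_without)
```
&lt;/pre&gt;

&lt;p&gt;The version saved with the default settings comes back with an additional column named "Unnamed: 0", which is the unnamed column the heading above refers to.&lt;/p&gt;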

</description>
      <category>python</category>
      <category>datascience</category>
      <category>analytics</category>
    </item>
    <item>
      <title>How Relative Frequency Is Important for Predictions (Pandas)</title>
      <dc:creator>GharamElhendy</dc:creator>
      <pubDate>Fri, 14 May 2021 14:19:09 +0000</pubDate>
      <link>https://dev.to/gharamelhendy/how-relative-frequency-is-important-for-predictions-pandas-57me</link>
      <guid>https://dev.to/gharamelhendy/how-relative-frequency-is-important-for-predictions-pandas-57me</guid>
      <description>&lt;h1&gt;
  
  
  What is the relative frequency? How to find it? And what importance does it serve?
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;When analyzing data, sometimes knowing the value count of some of the entries in a certain column isn't enough to form an insight into what the data means. So, we obtain the relative frequency, which is the ratio of the frequency of a particular occurrence divided by the total number of occurrences.&lt;/p&gt;

&lt;p&gt;For example, if you can form meaningful insights from data 9 times out of 12, this means that this happens 75% of the time.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  How to find it in Pandas?
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;Instead of using this:&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'Column'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;value_counts&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;You should add normalize=True&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'Column'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;value_counts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;normalize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;The default is normalize=False, which returns the raw count of each value. With normalize=True, each count is divided by the total number of entries, which gives the relative frequency.&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The importance of Relative Frequency
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;When you obtain the relative frequency of something, you're better able to get an insight into the probability of its occurrence, which means that you can make better predictions.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;em&gt;Note: You can represent the probability distribution using histograms. And when representing the relative frequency using histograms, the heights would indicate the probability.&lt;/em&gt;&lt;/p&gt;
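
&lt;p&gt;The 9-out-of-12 example from the beginning of this post can be reproduced as a minimal sketch. The column contents are made up so that 9 of the 12 entries share the same value.&lt;/p&gt;

&lt;pre&gt;
```python
import pandas as pd

# Hypothetical column: 9 of the 12 entries are "yes".
df = pd.DataFrame({"Column": ["yes"] * 9 + ["no"] * 3})

counts = df["Column"].value_counts()                  # absolute frequencies
relative = df["Column"].value_counts(normalize=True)  # proportions that sum to 1
print(counts["yes"], relative["yes"])
```
&lt;/pre&gt;

&lt;p&gt;The count for "yes" is 9, while its relative frequency is 0.75, matching the 75% figure above.&lt;/p&gt;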

</description>
      <category>datascience</category>
      <category>analytics</category>
    </item>
    <item>
      <title>Adding Syntax Highlighting</title>
      <dc:creator>GharamElhendy</dc:creator>
      <pubDate>Fri, 14 May 2021 13:23:22 +0000</pubDate>
      <link>https://dev.to/gharamelhendy/adding-syntax-highlighting-45ho</link>
      <guid>https://dev.to/gharamelhendy/adding-syntax-highlighting-45ho</guid>
      <description>&lt;h2&gt;
  
  
  To add a code block with the corresponding syntax highlighting, wrap it in triple backticks (written as ''' below) followed by the language name
&lt;/h2&gt;

&lt;p&gt;'''(ProgrammingLanguage)&lt;br&gt;
Your code here&lt;br&gt;
'''&lt;/p&gt;

&lt;h2&gt;
  
  
  So, it would look like this when you're creating the post
&lt;/h2&gt;

&lt;pre&gt;


```python
print("Hello world")
```


&lt;/pre&gt;

&lt;h2&gt;
  
  
  And the final result would be this:
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Hello world"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>programming</category>
    </item>
    <item>
      <title>Data Wrangling Techniques</title>
      <dc:creator>GharamElhendy</dc:creator>
      <pubDate>Fri, 14 May 2021 13:06:16 +0000</pubDate>
      <link>https://dev.to/gharamelhendy/data-wrangling-techniques-random-documentation-3li5</link>
      <guid>https://dev.to/gharamelhendy/data-wrangling-techniques-random-documentation-3li5</guid>
      <description>&lt;h1&gt;
  
  
  To drop certain irrelevant columns when analyzing a dataset
&lt;/h1&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;drop&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s"&gt;'ColumnX'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'ColumnY'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'ColumnZ'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;inplace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  To count the number of values in a column
&lt;/h1&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'Column_name'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;value_counts&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  To change the value to numeric
&lt;/h1&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'Column_name'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to_numeric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'Column_name'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  To replace certain values in a column to other values of choice
&lt;/h1&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'Column_name'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'Column_name'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s"&gt;'Initial_value'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="s"&gt;'New_value'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
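
&lt;p&gt;Put together, the steps above might look like the following minimal sketch. The dataframe and its values are hypothetical, and the non-inplace form of drop is used here only so that each step is a plain assignment.&lt;/p&gt;

&lt;pre&gt;
```python
import pandas as pd

# Hypothetical dataframe; the numbers are stored as strings on purpose.
df = pd.DataFrame({
    "ColumnX": ["drop", "me"],
    "Column_name": ["1", "2"],
    "Label": ["Initial_value", "other"],
})

df = df.drop(["ColumnX"], axis=1)                     # drop an irrelevant column
df["Column_name"] = pd.to_numeric(df["Column_name"])  # strings become numbers
df["Label"] = df["Label"].replace(["Initial_value"], "New_value")
print(df.dtypes["Column_name"], list(df["Label"]))
```
&lt;/pre&gt;

&lt;p&gt;By default, pd.to_numeric raises an error if a value cannot be parsed as a number; passing errors="coerce" turns such values into NaN instead.&lt;/p&gt;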



</description>
      <category>python</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Transposing an Entire Table in Pandas</title>
      <dc:creator>GharamElhendy</dc:creator>
      <pubDate>Thu, 13 May 2021 09:40:10 +0000</pubDate>
      <link>https://dev.to/gharamelhendy/transposing-an-entire-table-in-pandas-5f9o</link>
      <guid>https://dev.to/gharamelhendy/transposing-an-entire-table-in-pandas-5f9o</guid>
      <description>&lt;h1&gt;
  
  
  Importing and parsing
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;import pandas as pd&lt;br&gt;
df = pd.read_csv('table.csv')&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;
&lt;br&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  To transpose
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;df = pd.read_csv('table.csv', skiprows=1, header=None).T&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;
&lt;br&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  You can check that it was transposed with all the columns and rows correctly by using the shape attribute:
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;df.shape&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;
&lt;br&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;em&gt;If it used to result, for example, in (10,5) and it became (5,10) after the transpose, then you're all done!&lt;/em&gt;&lt;/p&gt;
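
&lt;p&gt;The shape check can be tried on a small example. This is a minimal sketch in which a made-up CSV string stands in for table.csv, and the already-loaded dataframe is transposed with .T rather than re-read from the file.&lt;/p&gt;

&lt;pre&gt;
```python
import io
import pandas as pd

# Hypothetical CSV text standing in for table.csv: a header row plus 3 data rows.
csv_text = "name,score\na,1\nb,2\nc,3\n"

df = pd.read_csv(io.StringIO(csv_text))
transposed = df.T  # rows become columns and columns become rows
print(df.shape, transposed.shape)
```
&lt;/pre&gt;

&lt;p&gt;After the transpose, the original column names become the row index of the new dataframe.&lt;/p&gt;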

</description>
      <category>python</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Exploring Data in Pandas Using GroupBy</title>
      <dc:creator>GharamElhendy</dc:creator>
      <pubDate>Tue, 11 May 2021 04:05:38 +0000</pubDate>
      <link>https://dev.to/gharamelhendy/exploring-data-in-pandas-using-groupby-3n6a</link>
      <guid>https://dev.to/gharamelhendy/exploring-data-in-pandas-using-groupby-3n6a</guid>
      <description>&lt;h1&gt;
  
  
  Importing and Parsing
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;import pandas as pd&lt;br&gt;
df = pd.read_csv('file_name.csv')&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;
&lt;br&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  Checking the mean of all attributes (columns) in a data set
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;df.mean()&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;
&lt;br&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  Finding the mean of attributes when holding one attribute as an index
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;E.g.: Here, I want to find the mean of all the other attributes for each distinct value of the petal length attribute.&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;df.groupby('petal_length').mean()&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;
&lt;br&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  We can even add multiple entries to hold as an index
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;df.groupby(['petal_length', 'color']).mean()&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;
&lt;br&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  If we don't want the attributes we choose to become the index, we can use as_index=False:
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;df.groupby(['petal_length', 'color'], as_index=False).mean()&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;
&lt;br&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  And finally, if we are interested in only one attribute (column) we can index it as follows:
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;df.groupby(['petal_length', 'color'], as_index=False)['petal_width'].mean()&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;
&lt;br&gt;
&lt;/blockquote&gt;
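
&lt;p&gt;To make the effect of as_index visible, here is a minimal sketch that runs the same grouping with and without it. The flower measurements are made up for illustration.&lt;/p&gt;

&lt;pre&gt;
```python
import pandas as pd

# Hypothetical flower measurements.
df = pd.DataFrame({
    "petal_length": [1.0, 1.0, 2.0],
    "color": ["red", "red", "white"],
    "petal_width": [0.2, 0.4, 0.5],
})

indexed = df.groupby(["petal_length", "color"])["petal_width"].mean()
flat = df.groupby(["petal_length", "color"], as_index=False)["petal_width"].mean()

print(type(indexed).__name__)  # a Series keyed by the grouping columns
print(list(flat.columns))      # the grouping columns stay ordinary columns
```
&lt;/pre&gt;

&lt;p&gt;The flat form is usually easier to save to a file or merge with other dataframes, because the grouping keys remain regular columns.&lt;/p&gt;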

</description>
      <category>python</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Appending Data in Pandas (+Numpy) When Working with Two Data Sets That Have the Same Attributes (Columns)</title>
      <dc:creator>GharamElhendy</dc:creator>
      <pubDate>Sun, 09 May 2021 06:34:55 +0000</pubDate>
      <link>https://dev.to/gharamelhendy/appending-data-in-pandas-numpy-when-working-with-two-data-sets-that-have-the-same-attributes-columns-2hic</link>
      <guid>https://dev.to/gharamelhendy/appending-data-in-pandas-numpy-when-working-with-two-data-sets-that-have-the-same-attributes-columns-2hic</guid>
      <description>&lt;h1&gt;
  
  
  Explanation
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;Let's assume we have 2 sets of data in 2 different csv files. For example, data that includes petal length, width, thickness, etc. for a certain kind of flower that can be found in 2 colors. &lt;/p&gt;

&lt;p&gt;Let's assume that these 2 colors are red and white.&lt;/p&gt;

&lt;p&gt;While they're of different colors, any given flower would have the same attributes, except for the color. So, we can make that distinction in the data set that combines both.&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;
&lt;br&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Importing and Parsing
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;import numpy as np&lt;br&gt;
import pandas as pd&lt;/p&gt;

&lt;p&gt;red_df = pd.read_csv('red_flowers.csv')&lt;br&gt;
white_df = pd.read_csv('white_flowers.csv')&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;
&lt;br&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  We'll create a color array for each dataframe, with one entry per row
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;color_red = np.repeat('red', red_df.shape[0])&lt;br&gt;
color_white = np.repeat('white', white_df.shape[0])&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;
&lt;br&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Then, we'll add the arrays to both dataframes by setting a new column named "color" to the corresponding array.
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;red_df['color'] = color_red&lt;br&gt;
white_df['color'] = color_white&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;
&lt;br&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Then, we'll append the dataframes and view the file to make sure we've done it successfully.
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;flowers_df = pd.concat([red_df, white_df])&lt;br&gt;
flowers_df.head()&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;
&lt;br&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Then, we'll save the new combined dataframe to a .csv file.
&lt;/h2&gt;

&lt;p&gt;flowers_df.to_csv('flowers_edited.csv', index=False)&lt;/p&gt;

&lt;h2&gt;
  
  
  Finally, we can check the file to make sure the two datasets were combined successfully by making sure that the rows are the sum of the two sets and that the columns are the initial number of columns in each data set + 1.
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;flowers_df.shape&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;
&lt;br&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;em&gt;E.g.: If each data set had 12 columns, and the red flowers set had 55 rows while the white flowers set had 45 rows, then flowers_df.shape should result in:&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;(100, 13)&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;
&lt;br&gt;
&lt;/blockquote&gt;
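
&lt;p&gt;The (100, 13) result can be checked with placeholder data. This is a minimal sketch, assuming 12 made-up numeric columns named attr_0 to attr_11 in place of the real flower attributes. It combines the dataframes with pd.concat, since DataFrame.append is no longer available as of pandas 2.0.&lt;/p&gt;

&lt;pre&gt;
```python
import numpy as np
import pandas as pd

# Hypothetical data: 12 numeric columns, 55 red rows and 45 white rows.
columns = [f"attr_{i}" for i in range(12)]
red_df = pd.DataFrame(np.zeros((55, 12)), columns=columns)
white_df = pd.DataFrame(np.ones((45, 12)), columns=columns)

# One color label per row, as in the steps above.
red_df["color"] = np.repeat("red", red_df.shape[0])
white_df["color"] = np.repeat("white", white_df.shape[0])

# ignore_index=True renumbers the combined rows from 0 to 99.
flowers_df = pd.concat([red_df, white_df], ignore_index=True)
print(flowers_df.shape)
```
&lt;/pre&gt;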

</description>
      <category>python</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Renaming a Specific Column in Pandas</title>
      <dc:creator>GharamElhendy</dc:creator>
      <pubDate>Sun, 09 May 2021 05:23:33 +0000</pubDate>
      <link>https://dev.to/gharamelhendy/renaming-a-specific-column-in-pandas-o99</link>
      <guid>https://dev.to/gharamelhendy/renaming-a-specific-column-in-pandas-o99</guid>
      <description>&lt;h2&gt;
  
  
  Importing and Parsing
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;import pandas as pd&lt;br&gt;
df = pd.read_csv('file_name.csv')&lt;br&gt;
df.head()&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;
&lt;br&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Check the column's name you want to change, and then:
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;df.rename(columns={'old_name':'new_name'}, inplace=True)&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;
&lt;br&gt;
&lt;/blockquote&gt;

</description>
      <category>python</category>
      <category>datascience</category>
    </item>
  </channel>
</rss>
