DEV Community: GharamElhendy

SQL Cheat Sheet #2

GharamElhendy — Sat, 07 Oct 2023 18:27:40 +0000

Here are some SQL commands to jog your memory!

Return the number of records in a table

SELECT COUNT(*) FROM table_name

Show the entire table

SELECT * FROM table_name

Show a column within the table

SELECT column_name FROM table_name

Select only the unique values within a column

SELECT DISTINCT column_name FROM table_name

Return the number of unique values from a column within a table

SELECT COUNT(DISTINCT column_name)
FROM table_name
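If you'd like to try the counting queries above without setting up a database server, here is a minimal sketch using Python's built-in sqlite3 module. The pets table and its rows are made up purely for illustration.

```python
import sqlite3

# Hypothetical in-memory table, used only to try the queries out.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pets (name TEXT, species TEXT)")
conn.executemany(
    "INSERT INTO pets VALUES (?, ?)",
    [("Milo", "cat"), ("Luna", "cat"), ("Rex", "dog"), ("Bo", None)],
)

# COUNT(*) counts every row, including rows with NULLs.
total_rows = conn.execute("SELECT COUNT(*) FROM pets").fetchone()[0]

# COUNT(DISTINCT column) counts unique non-NULL values only.
unique_species = conn.execute(
    "SELECT COUNT(DISTINCT species) FROM pets"
).fetchone()[0]

conn.close()
print(total_rows, unique_species)
```

Note that the row with a missing species still counts toward COUNT(*), but NULL is not counted as a distinct value.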

Select all records where a certain value exists

SELECT * FROM table_name 
WHERE column_name = 'value'

Select all records excluding a certain value

SELECT * FROM table_name WHERE NOT column_name = 'value'

Select all records where a certain value exists in a column and another certain value exists in another column

SELECT * FROM table_name 
WHERE column_1 = 'string1' AND column_2 = 'string2'

Select all records that start with a particular letter from the alphabet (in this case, the letter = a)

SELECT * FROM table_name WHERE column_name LIKE 'a%'

Select all records that end with a particular letter from the alphabet (in this case, the letter = a)

SELECT * FROM table_name WHERE column_name LIKE '%a'

Select all records whose values in a column fall alphabetically between two values (both ends included)

SELECT * FROM table_name WHERE column_name BETWEEN 'string1' AND 'string2'
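To see how the LIKE and BETWEEN filters behave on text, here is a small sketch with Python's built-in sqlite3 module. The cities table is hypothetical, and other database engines may compare strings differently (for example, SQLite's LIKE ignores case for ASCII letters by default).

```python
import sqlite3

# Hypothetical table of lowercase city names, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (name TEXT)")
conn.executemany(
    "INSERT INTO cities VALUES (?)",
    [("athens",), ("alexandria",), ("berlin",), ("cairo",), ("dakar",)],
)

def names(query):
    """Run a query and return the first column of each row as a list."""
    return [row[0] for row in conn.execute(query)]

starts_with_a = names("SELECT name FROM cities WHERE name LIKE 'a%' ORDER BY name")
ends_with_a = names("SELECT name FROM cities WHERE name LIKE '%a' ORDER BY name")

# BETWEEN is inclusive, but 'dakar' sorts after the bare string 'd',
# so it falls outside the range.
in_range = names(
    "SELECT name FROM cities WHERE name BETWEEN 'b' AND 'd' ORDER BY name"
)

conn.close()
print(starts_with_a, ends_with_a, in_range)
```

The BETWEEN result is worth a second look: to include every name starting with d, the upper bound would need to sort after all of them (for example 'e').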

Return records in a descending order

SELECT * FROM table_name ORDER BY column_name DESC

Select records where a column matches any one of several values

SELECT * FROM table_name
WHERE column_name IN ('value1','value2');

Insert records in your table

INSERT INTO table_name VALUES ('first','second','third','nth')

It's worth noting that the number of values you supply after VALUES has to match the number of columns in the table you're inserting into, unless you list the target columns explicitly after the table name.
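Here is a short sketch of that column-count rule in action, again using Python's built-in sqlite3 module. The people table is an assumption made up for the example, and the exact error message will differ between database engines.

```python
import sqlite3

# Hypothetical three-column table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (first TEXT, last TEXT, city TEXT)")

# Three values for three columns: accepted.
conn.execute("INSERT INTO people VALUES ('Ada', 'Lovelace', 'London')")

# Two values for three columns: rejected by the database.
try:
    conn.execute("INSERT INTO people VALUES ('Alan', 'Turing')")
    mismatch_error = None
except sqlite3.Error as exc:
    mismatch_error = str(exc)

# Naming the target columns makes a shorter value list valid;
# the unnamed column (city) is left as NULL.
conn.execute("INSERT INTO people (first, last) VALUES ('Alan', 'Turing')")

rows = conn.execute(
    "SELECT first, last, city FROM people ORDER BY first"
).fetchall()
conn.close()
print(mismatch_error)
print(rows)
```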

Change a record in a column to another value

UPDATE table_name SET column_name = 'new value' WHERE column_name = 'old value'

Remove records where a certain value exists

DELETE FROM table_name WHERE column_name = 'certain value'
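UPDATE and DELETE change every row that matches the WHERE clause, so it helps to check how many rows were affected. A minimal sketch with Python's built-in sqlite3 module follows; the tasks table is hypothetical, and the ? placeholders are sqlite3's way of passing values safely instead of pasting them into the SQL string.

```python
import sqlite3

# Hypothetical table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (title TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO tasks VALUES (?, ?)",
    [("write", "todo"), ("review", "todo"), ("ship", "done")],
)

# Change every 'todo' status to 'doing'; rowcount reports the rows changed.
updated = conn.execute(
    "UPDATE tasks SET status = ? WHERE status = ?", ("doing", "todo")
).rowcount

# Remove every row whose status is 'done'.
deleted = conn.execute(
    "DELETE FROM tasks WHERE status = ?", ("done",)
).rowcount

remaining = conn.execute(
    "SELECT title, status FROM tasks ORDER BY title"
).fetchall()
conn.close()
print(updated, deleted, remaining)
```

Leaving out the WHERE clause would apply the UPDATE or DELETE to the whole table, so it's a good habit to run the same WHERE with a SELECT first.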

SQL Cheat Sheet #1

GharamElhendy — Sat, 07 Oct 2023 18:21:31 +0000

SELECT
UPDATE
DELETE
INSERT INTO ... VALUES
WHERE ... (with comparison operators such as =, >, <)

CREATE DATABASE
ALTER DATABASE
CREATE TABLE
ALTER TABLE
DROP TABLE
CREATE INDEX (To create a search key)
DROP INDEX
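As a rough sketch of how the table and index commands fit together, here they are run through Python's built-in sqlite3 module. The books table and the idx_books_author index are hypothetical names. CREATE DATABASE and ALTER DATABASE are left out because SQLite has no such statements: connecting to a file (or to memory) is what creates the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def object_names(kind):
    """List the names of schema objects of one kind ('table' or 'index')."""
    query = "SELECT name FROM sqlite_master WHERE type = ? ORDER BY name"
    return [row[0] for row in conn.execute(query, (kind,))]

# Hypothetical table and index names, for illustration only.
conn.execute("CREATE TABLE books (title TEXT)")
conn.execute("ALTER TABLE books ADD COLUMN author TEXT")
conn.execute("CREATE INDEX idx_books_author ON books (author)")
indexes_after_create = object_names("index")

conn.execute("DROP INDEX idx_books_author")
indexes_after_drop = object_names("index")

conn.execute("DROP TABLE books")
tables_after_drop = object_names("table")

conn.close()
print(indexes_after_create, indexes_after_drop, tables_after_drop)
```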

Using GroupBy to Investigate Data from a Certain Scope According to One or More Specific Attributes

GharamElhendy — Mon, 23 Aug 2021 15:08:31 +0000

In this scenario, we have a dataframe made up of multiple attributes, and we want to find the means of some of those attributes grouped by one or two main attributes.

For example, we might want to find the mean height in a population that consists of males and females of different age groups.

Bear in mind that my dataframe is called population_df, and it has attributes like (for example) weight, height, and BMI, plus age and gender, which we will use to split the data during analysis.

Importing and Parsing

import pandas as pd
population_df = pd.read_csv('investigation_data.csv')

To view means relative to the age of the person:

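# numeric_only=True skips text columns such as gender, which can't be averaged
population_df.groupby('age').mean(numeric_only=True)

This shows us the mean of every numeric attribute for all samples that share a certain age, with the age values shown in the first column (the index) of the result.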

To view means relative to the age and then relative to the gender, we can pass multiple columns to groupby as a list:

population_df.groupby(['age', 'gender']).mean()

This will show us the mean of the attributes according to age, and then gender.
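Here is a small sketch of both groupings on a made-up population. The four rows and their values are assumptions for illustration, not data from a real file.

```python
import pandas as pd

# Hypothetical stand-in for investigation_data.csv.
population_df = pd.DataFrame({
    "age": [20, 20, 30, 30],
    "gender": ["male", "female", "male", "female"],
    "height": [180.0, 165.0, 178.0, 162.0],
    "weight": [80.0, 60.0, 85.0, 65.0],
})

# One row per age; numeric_only=True leaves the text column (gender) out.
by_age = population_df.groupby("age").mean(numeric_only=True)

# One row per (age, gender) pair; both keys become the row index.
by_age_gender = population_df.groupby(["age", "gender"]).mean()

print(by_age)
print(by_age_gender)
```

In the first result, the mean height for age 20 averages the male and female rows together. In the second, each (age, gender) pair gets its own row.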

Renaming Columns for Mutable Operations

GharamElhendy — Sun, 22 Aug 2021 11:16:24 +0000

Say the dataframe is called file_df and we want to modify the 9th column's name. The simplest-looking syntax to change a column name would be this:

file_df.columns[8] = 'new_column_name'

However, a pandas Index is immutable, so this assignment raises a TypeError ("Index does not support mutable operations"). That's why we'll have to use a different syntax for this modification, which is as follows:

file_df.rename(columns={'old_column_name':'new_column_name'}, inplace=True)

There's another method that's a little longer, and it goes as follows:

new_labels = list(file_df.columns)
new_labels[8] = 'new_column_name'
file_df.columns = new_labels

Note: Using this method, we're replacing the entire set of column labels with a new list, not changing a single label in place.
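The three approaches above can be compared side by side on a tiny dataframe. This is only a sketch: the three-column file_df is made up, so the column being renamed is the 3rd (position 2) instead of the 9th.

```python
import pandas as pd

# Hypothetical three-column dataframe, for illustration only.
file_df = pd.DataFrame({"a": [1], "b": [2], "c": [3]})

# 1) Assigning into the Index directly fails, because an Index is immutable.
try:
    file_df.columns[2] = "new_column_name"
    index_error = None
except TypeError as exc:
    index_error = exc

# 2) rename() maps old labels to new ones; without inplace=True it
#    returns a renamed copy and leaves file_df untouched.
renamed_df = file_df.rename(columns={"c": "new_column_name"})

# 3) Build a new list of labels and assign the whole list back.
new_labels = list(file_df.columns)
new_labels[2] = "new_column_name"
file_df.columns = new_labels

print(type(index_error).__name__)
print(list(renamed_df.columns), list(file_df.columns))
```

rename() is usually the safer choice when you know the old name, since it doesn't depend on the column's position.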

Appending Two DataFrames in Pandas

GharamElhendy — Sun, 22 Aug 2021 11:04:15 +0000

Let's say we have two dataframes with the same attributes, and the only thing that separates them is a physical attribute (such as gender) that isn't stored as a column in either one. We can combine both dataframes and add a column for this attribute to tell the rows apart, which makes handling the csv files easier.

For example, if we have descriptions of colors of skin, hair, and eyes for two dataframes, one for males and one for females, we can add a column at the end that signifies whether the attributes in a certain row belong to a male or female, all in the same dataframe (population_df) instead of 2.

Importing and reading the files

import numpy as np
import pandas as pd
males_df = pd.read_csv('males.csv')
females_df = pd.read_csv('females.csv')

Adding a last column with the attribute used for distinction

gender_male = np.repeat('male', males_df.shape[0])
gender_female = np.repeat('female', females_df.shape[0])
males_df['gender'] = gender_male
females_df['gender'] = gender_female

Appending the two dataframes

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
population_df = pd.concat([males_df, females_df], ignore_index=True)

Saving the combined dataset with index=False so that the row index isn't written to the file as an unnamed column

population_df.to_csv('Filename.csv', index=False)

Note: You can make sure that you've successfully appended your two dataframes by using .shape

population_df.shape

If the number of rows is the sum of the two dataframes' row counts, proceed with your work
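Here is a compact sketch of the whole workflow on made-up data. Two things differ from the steps above and are my own assumptions: the dataframes are built inline instead of read from males.csv and females.csv, and the gender column is filled by assigning a single string (pandas repeats it for every row), which is a shortcut for the np.repeat approach.

```python
import pandas as pd

# Hypothetical stand-ins for males.csv and females.csv.
males_df = pd.DataFrame({"hair": ["black", "brown"], "eyes": ["brown", "green"]})
females_df = pd.DataFrame({"hair": ["blonde"], "eyes": ["blue"]})

# Assigning a scalar fills the whole column with that value.
males_df["gender"] = "male"
females_df["gender"] = "female"

# Stack the rows; ignore_index=True renumbers them 0..n-1.
population_df = pd.concat([males_df, females_df], ignore_index=True)

# With no path, to_csv returns the CSV text, which makes it easy to
# see what index=False changes in the header line.
header_without_index = population_df.to_csv(index=False).splitlines()[0]
header_with_index = population_df.to_csv().splitlines()[0]

print(population_df.shape)
print(header_without_index)
print(header_with_index)
```

The header written with the index starts with an empty field, and that empty name is what shows up as an unnamed column when the file is read back.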

How Relative Frequency Is Important for Predictions (Pandas)

GharamElhendy — Fri, 14 May 2021 14:19:09 +0000

What is the relative frequency? How do you find it? And why is it important?

When analyzing data, sometimes knowing the value count of some of the entries in a certain column isn't enough to form an insight into what the data means. So, we obtain the relative frequency, which is the ratio of the frequency of a particular occurrence to the total number of occurrences.

For example, if you can form meaningful insights from data 9 times out of 12, this means that this happens 75% of the time.

How to find it in Pandas?

Instead of using this:

df['Column'].value_counts()

You should add normalize=True

df['Column'].value_counts(normalize=True)

The default value of normalize is False, which returns the raw counts instead of the proportions.

The importance of Relative Frequency

When you obtain the relative frequency of something, you're better able to get an insight into the probability of its occurrence, which means that you can make better predictions.

Note: You can represent the probability distribution using histograms. And when representing the relative frequency using histograms, the bar heights approximate the probability of each value.
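The 9-out-of-12 example from earlier can be reproduced with a small sketch. The column contents ("useful" and "not useful") are made-up labels for illustration.

```python
import pandas as pd

# Hypothetical column: 9 entries of one kind and 3 of another (12 in total).
df = pd.DataFrame({"Column": ["useful"] * 9 + ["not useful"] * 3})

# Raw counts per value.
counts = df["Column"].value_counts()

# Proportions per value: each count divided by the total number of rows.
relative = df["Column"].value_counts(normalize=True)

print(counts)
print(relative)
```

The proportions always add up to 1, which is what lets us read them as estimated probabilities.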

Adding Syntax Highlighting

GharamElhendy — Fri, 14 May 2021 13:23:22 +0000

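To add a code block with the corresponding syntax highlighting, wrap your code in three backticks on each side and write the language name right after the opening ones. (The apostrophes below stand in for backticks so that the example isn't rendered as a code block.)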

'''(ProgrammingLanguage)
Your code here
'''

So, it would look like this when you're creating the post



```python
print("Hello world")
```

And the final result would be this:

print("Hello world")

Data Wrangling Techniques

GharamElhendy — Fri, 14 May 2021 13:06:16 +0000

To drop certain irrelevant columns when analyzing a dataset

df.drop(['ColumnX', 'ColumnY', 'ColumnZ'], axis=1, inplace=True)

To count the number of values in a column

df['Column_name'].value_counts()

To convert a column's values to a numeric type

df['Column_name'] = pd.to_numeric(df['Column_name'])

To replace certain values in a column with other values of choice

df['Column_name'] = df['Column_name'].replace(['Initial_value'], 'New_value')
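Here is how the four techniques look when chained on a tiny made-up dataframe. The column names and values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical dataset: price is stored as text, and notes is irrelevant.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "price": ["10", "20.5", "30"],
    "size": ["S", "M", "S"],
    "notes": ["x", "y", "z"],
})

# Drop the irrelevant column (axis=1 means columns).
df.drop(["notes"], axis=1, inplace=True)

# Convert the text prices to numbers so they can be summed or averaged.
df["price"] = pd.to_numeric(df["price"])

# Replace every 'S' with a more readable label.
df["size"] = df["size"].replace(["S"], "Small")

# Count how often each size occurs.
size_counts = df["size"].value_counts()

print(df.dtypes)
print(size_counts)
```

If a column contains entries that can't be parsed as numbers, pd.to_numeric raises an error by default. Passing errors='coerce' turns those entries into NaN instead.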

Transposing an Entire Table in Pandas

GharamElhendy — Thu, 13 May 2021 09:40:10 +0000

Importing and parsing

import pandas as pd
df = pd.read_csv('table.csv')

To transpose

df = pd.read_csv('table.csv', skiprows=1, header=None).T

You can check that it was transposed with all the columns and rows correctly by using the shape attribute:

df.shape

If the shape was, for example, (10, 5) before and it became (5, 10) after the transpose, then you're all done!
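To check the shape flip without needing a file on disk, here is a sketch that reads a tiny CSV from a string. The CSV text is made up, and the last line shows an alternative (plain .T on the normally parsed dataframe) that keeps the header names as row labels. That alternative is my own addition, not part of the steps above.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for table.csv.
csv_text = "a,b,c\n1,2,3\n4,5,6\n"

# Normal read: 2 data rows, 3 columns.
original = pd.read_csv(io.StringIO(csv_text))

# Skip the header row, read the data without column names, then transpose.
transposed = pd.read_csv(io.StringIO(csv_text), skiprows=1, header=None).T

# Alternative: transpose the normal read, so a, b, c become the row labels.
with_labels = original.T

print(original.shape, transposed.shape)
print(list(with_labels.index))
```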

Exploring Data in Pandas Using GroupBy

GharamElhendy — Tue, 11 May 2021 04:05:38 +0000

Importing and Parsing

import pandas as pd
df = pd.read_csv('file_name.csv')

Checking the mean of all attributes (columns) in a data set

# numeric_only=True skips text columns such as color
df.mean(numeric_only=True)

Finding the mean of attributes when holding one attribute as an index

For example: here, I want to find the mean of all the other attributes for each occurrence of the petal length attribute.

df.groupby('petal_length').mean(numeric_only=True)

We can even pass multiple attributes to hold as an index

df.groupby(['petal_length', 'color']).mean()

If we don't want the attributes we choose to be made into an index, we can use as_index=False:

df.groupby(['petal_length', 'color'], as_index=False).mean()

And finally, if we are interested in only one attribute (column), we can select it as follows:

df.groupby(['petal_length', 'color'], as_index=False)['petal_width'].mean()
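The difference as_index makes is easiest to see on a small example. The sketch below uses four made-up flowers; the values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical flower data, for illustration only.
df = pd.DataFrame({
    "petal_length": [1.0, 1.0, 2.0, 2.0],
    "color": ["red", "red", "white", "white"],
    "petal_width": [0.5, 1.5, 2.0, 3.0],
})

# Default: the grouping keys become the (multi-level) row index.
indexed = df.groupby(["petal_length", "color"]).mean()

# as_index=False: the grouping keys stay as ordinary columns.
flat = df.groupby(["petal_length", "color"], as_index=False).mean()

# Selecting one column before aggregating keeps only that attribute's mean.
one_col = df.groupby(["petal_length", "color"], as_index=False)["petal_width"].mean()

print(indexed)
print(flat)
print(one_col)
```

The flat version is usually handier when the result is going to be saved to a csv file or merged with another dataframe.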

Appending Data in Pandas (+Numpy) When Working with Two Data Sets That Have the Same Attributes (Columns)

GharamElhendy — Sun, 09 May 2021 06:34:55 +0000

Explanation

Let's assume we have 2 sets of data in 2 different csv files. For example, data that includes petal length, width, thickness, etc. for a certain kind of flower that can be found in 2 colors.

Let's assume that these 2 colors are red and white.

While they're of different colors, any given flower would have the same attributes, except for the color. So, we can make that distinction in the data set that combines both.

Importing and Parsing

import numpy as np
import pandas as pd

red_df = pd.read_csv('red_flowers.csv')
white_df = pd.read_csv('white_flowers.csv')

We'll create a color array for each dataframe, with one entry per row

color_red = np.repeat('red', red_df.shape[0])
color_white = np.repeat('white', white_df.shape[0])

Then, we'll add the arrays to both dataframes by setting a new column named "color" to the corresponding array.

red_df['color'] = color_red
white_df['color'] = color_white

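Then, we'll append the dataframes and view the first rows to make sure we've done it successfully.

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
flowers_df = pd.concat([red_df, white_df], ignore_index=True)
flowers_df.head()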

Then, we'll save the new combined dataframe to a .csv file.

flowers_df.to_csv('flowers_edited.csv', index=False)

Finally, we can check the file to make sure the two datasets were combined successfully by making sure that the rows are the sum of the two sets and that the columns are the initial number of columns in each data set + 1.

flowers_df.shape

For example: if each data set had 12 columns, and the red flowers set had 55 rows while the white flowers set had 45 rows, then the flowers_df.shape attribute should result in:

(100, 13)
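That shape check can be reproduced with placeholder data. In this sketch the 12 attribute columns (attr_0 to attr_11) and their values are made up; only the row and column counts match the example above.

```python
import numpy as np
import pandas as pd

# Hypothetical column names standing in for petal length, width, etc.
columns = [f"attr_{i}" for i in range(12)]

# Placeholder data: 55 red rows and 45 white rows, 12 columns each.
red_df = pd.DataFrame(np.zeros((55, 12)), columns=columns)
white_df = pd.DataFrame(np.ones((45, 12)), columns=columns)

# One color label per row, stored as a 13th column.
red_df["color"] = np.repeat("red", red_df.shape[0])
white_df["color"] = np.repeat("white", white_df.shape[0])

# Stack the two dataframes on top of each other.
flowers_df = pd.concat([red_df, white_df], ignore_index=True)

color_counts = flowers_df["color"].value_counts()
print(flowers_df.shape)
print(color_counts)
```

Counting the values in the new color column is an extra sanity check that every row kept its label.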

Renaming a Specific Column in Pandas

GharamElhendy — Sun, 09 May 2021 05:23:33 +0000

Importing and Parsing

import pandas as pd
df = pd.read_csv('file_name.csv')
df.head()

Check the name of the column you want to change, and then:

df.rename(columns={'old_name':'new_name'}, inplace=True)