Ata Seren

Posted on Mar 15, 2024 • Originally published at Medium

LeetCode Study Plan: Introduction to Pandas

#leetcode #pandas #python #learning

Hello everyone. I’m Ata, a computer science graduate currently interested in cybersecurity. I haven’t used LeetCode much since college, but recently I wanted to hone my coding skills and learn new concepts. To do that, I decided to work through LeetCode’s “study plans”.

What is a Study Plan?

LeetCode Study Plans are curated, categorized sets of LeetCode problems. A plan can focus on a specific area, such as JavaScript or SQL, or collect problems chosen to prepare for coding interviews. Each plan is also split into a schedule to make it easier to work through.

Introduction to Pandas Study Plan

This study plan on LeetCode aims to teach the basics of Pandas through 15 simple questions. I have used Pandas a few times, mostly for machine learning projects, but I usually just used the DataFrame object and rarely its functions. Therefore, I wanted to start with an easy plan about Pandas and learn its basics.

While solving the problems, I took some notes to reinforce my understanding of Pandas functions and their capabilities. This story is based on those notes. In it, I’ll name the questions, provide my answers, and explain the functions used in the solutions.

Note: I won’t delve into the details of the problems, as they are available in a better format and with examples on LeetCode.

Create a DataFrame from List

Q: Write a solution to create a DataFrame from a 2D list called student_data. This 2D list contains the IDs and ages of some students. The DataFrame should have two columns, student_id and age, and be in the same order as the original 2D list.

df = pd.DataFrame(student_data)
df.columns = ["student_id", "age"]
return df

This creates a DataFrame from the list and then sets its column names. Assigning a list to df.columns names the columns of a DataFrame.
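The column names can also be passed when the DataFrame is built. A minimal sketch, assuming a small student_data list whose values are made up for illustration:

```python
import pandas as pd

# Hypothetical sample input; the values are made up for illustration.
student_data = [[1, 15], [2, 11], [3, 11], [4, 20]]

# Passing columns= names the columns at construction time,
# so no separate df.columns assignment is needed.
df = pd.DataFrame(student_data, columns=["student_id", "age"])
print(df)
```

Both approaches should give the same result; this one just does it in a single step.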

Get the Size of a DataFrame

Q: Write a solution to calculate and display the number of rows and columns of players. Return the result as an array:
[number of rows, number of columns]

return [players.shape[0], players.shape[1]]

df.shape is a tuple holding the number of rows and columns of the DataFrame: (row_count, column_count).
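As a small illustration, here is a sketch with a made-up players table (the column names and values are assumptions, not the LeetCode data):

```python
import pandas as pd

# Hypothetical players table; names and values are made up.
players = pd.DataFrame({
    "player_id": [1, 2, 3],
    "name": ["Ann", "Bo", "Cy"],
    "age": [21, 25, 19],
})

# shape is an attribute (no parentheses) holding a (rows, columns) tuple.
rows, cols = players.shape
print([rows, cols])  # [3, 3]

# list() turns the tuple into the list format the problem asks for.
size = list(players.shape)
```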

Display the First Three Rows

Q: Write a solution to display the first 3 rows of this DataFrame.

return employees.head(3)

df.head(n) returns the first n rows of df.
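A short sketch of head and its counterpart tail, assuming a made-up employees table with seven rows:

```python
import pandas as pd

# Hypothetical table with seven rows; the ids are made up.
employees = pd.DataFrame({"employee_id": list(range(1, 8))})

first_three = employees.head(3)  # first 3 rows
default_head = employees.head()  # n defaults to 5 when omitted
last_two = employees.tail(2)     # tail(n) returns the last n rows
```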

Select Data

Q: Write a solution to select the name and age of the student with student_id = 101.

return students.loc[students['student_id'] == 101, ['name', 'age']]

In df.loc, the first argument selects the rows (here, a boolean condition), and the second argument is a list of the desired columns.

Here is an example with multiple conditions:

students.loc[(students['student_id'] == 101) & (students['name'] == "Ulysses"), ['name', 'age']]
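To see what the condition actually is, the sketch below builds the boolean mask separately. The table values are made up for illustration. Each condition needs its own parentheses because & and | bind more tightly than == in Python:

```python
import pandas as pd

# Hypothetical students table; values are made up for illustration.
students = pd.DataFrame({
    "student_id": [101, 53, 128],
    "name": ["Ulysses", "William", "Henry"],
    "age": [13, 10, 6],
})

# The condition is just a boolean Series, one True/False per row.
mask = students["student_id"] == 101
print(mask.tolist())  # [True, False, False]

# Rows where the mask is True, restricted to the listed columns.
selected = students.loc[mask, ["name", "age"]]

# & means AND and | means OR; wrap each condition in parentheses.
either = students.loc[
    (students["age"] < 10) | (students["student_id"] == 101),
    ["name"],
]
```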

Create a New Column

Q: A company plans to provide its employees with a bonus. Write a solution to create a new column named bonus that contains the doubled values of the salary column.

bonus = []
for s in employees["salary"]:
    bonus.append(s*2)

result = employees.assign(bonus=bonus)
return result

df.assign takes the new column’s name as a keyword and the values for that column, such as a list:

df.assign(column_name=[element1, element2, element3])

Note: In the “Modify Columns” problem below, I give some examples of modifying the values in a column directly on the DataFrame, rather than building the new values outside of it as I did in this problem.
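The loop is not strictly required. As an alternative sketch (not the solution I submitted), arithmetic on a column is applied element-wise, so the doubled column can be passed to df.assign directly. The employee values here are made up:

```python
import pandas as pd

# Hypothetical employees table; values are made up for illustration.
employees = pd.DataFrame({
    "name": ["Piper", "Grace"],
    "salary": [4548, 28150],
})

# Multiplying a column by 2 doubles every value in it,
# so no explicit loop or helper list is needed.
result = employees.assign(bonus=employees["salary"] * 2)

# assign returns a new DataFrame; the original is left unchanged.
print("bonus" in employees.columns)  # False
```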

Drop Duplicate Rows

Q: There are some duplicate rows in the DataFrame based on the email column. Write a solution to remove these duplicate rows and keep only the first occurrence.

df = customers.drop_duplicates(subset=['email'])
return df

df.drop_duplicates drops the rows that repeat the values in the given column or columns, keeping the first occurrence by default. You can pass multiple columns as below:

dedup_df = df.drop_duplicates(subset=['A', 'B'])
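Which occurrence survives is controlled by the keep parameter. A minimal sketch with made-up customer data:

```python
import pandas as pd

# Hypothetical customers table; values are made up for illustration.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "name": ["Ella", "David", "Zachary", "Alice"],
    "email": ["a@example.com", "b@example.com", "a@example.com", "c@example.com"],
})

# keep="first" is the default: the first row for each email survives.
first = customers.drop_duplicates(subset=["email"])

# keep="last" keeps the last row for each email instead.
last = customers.drop_duplicates(subset=["email"], keep="last")

# keep=False drops every row whose email appears more than once.
unique_only = customers.drop_duplicates(subset=["email"], keep=False)
```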

Drop Missing Data

Q: There are some rows having missing values in the name column. Write a solution to remove the rows with missing values.

return students.dropna(subset=['name'])

df.dropna drops the rows with missing values. In this question, the subset parameter limits the check to the name column, so a row is dropped only if its name is missing. df.dropna takes other parameters, such as how and thresh, to handle missing values in different ways.
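The sketch below compares a few of these options on a made-up students table (the values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical students table with gaps; values are made up.
students = pd.DataFrame({
    "student_id": [32, 217, 779],
    "name": ["Piper", None, "Georgia"],
    "age": [5.0, 19.0, None],
})

# Only the name column is checked.
by_name = students.dropna(subset=["name"])

# Default (how="any"): drop a row if any of its values is missing.
any_missing = students.dropna()

# how="all": drop a row only if all of its values are missing.
all_missing = students.dropna(how="all")
```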

Modify Columns

Q: A company intends to give its employees a pay rise. Write a solution to modify the salary column by multiplying each salary by 2.

employees.salary = employees.salary*2
return employees

The question asks to double the values of a column, so I accessed the column directly with df.column_name and doubled its values.

Here is an additional example:

import numpy as np

# Step 1: Select the column
age_column = df['age']

# Step 2: Apply a function to each value
def sqrt(x):
    return np.sqrt(x)
new_age_column = age_column.apply(sqrt)

# Step 3: Assign the new values back to the column
df['age'] = new_age_column

Calling apply on a column (a Series) applies a function to each value in that column.
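For simple numeric work, the same result can usually be had without apply. A small sketch on a made-up age column, comparing the two:

```python
import numpy as np
import pandas as pd

# Hypothetical data; the ages are made up so the roots come out whole.
df = pd.DataFrame({"age": [4, 9, 16]})

# apply calls the function once per value in the column.
via_apply = df["age"].apply(lambda x: x ** 0.5)

# NumPy functions accept the whole column at once.
via_numpy = np.sqrt(df["age"])

print(via_apply.tolist())  # [2.0, 3.0, 4.0]
```

The whole-column form is typically faster on large data, while apply remains handy for custom Python functions.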

Rename Columns

Q: Write a solution to rename the columns as follows:

id to student_id
first to first_name
last to last_name
age to age_in_years

return students.rename(columns = {'id':'student_id', 'first':'first_name', 'last':'last_name', 'age':'age_in_years'})

df.rename can change the names of the index or of the columns, as in this case. With the inplace=True parameter, df is modified in place instead of a new DataFrame being returned.
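A short sketch of that behaviour, using a one-row students table with made-up values:

```python
import pandas as pd

# Hypothetical one-row table; values are made up for illustration.
students = pd.DataFrame({"id": [1], "first": ["Mason"], "last": ["King"], "age": [6]})

# Only the columns named in the mapping are renamed.
renamed = students.rename(columns={"id": "student_id", "age": "age_in_years"})

# Without inplace=True, the original DataFrame keeps its old names.
print(list(students.columns))  # ['id', 'first', 'last', 'age']
```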

Change Data Type

Q: Write a solution to correct the errors: The grade column is stored as floats, convert it to integers.

return students.astype({'grade': int})

df.astype is used to change the data type of the DataFrame, either for specific columns or for all of them. Other approaches would also work for this question, such as calling apply on all elements of a column, or pd.to_numeric to convert non-numeric objects into numeric ones where possible.
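A minimal sketch of both functions, assuming made-up grades and a made-up raw Series for the pd.to_numeric part:

```python
import pandas as pd

# Hypothetical students table; values are made up for illustration.
students = pd.DataFrame({"name": ["Ava", "Kate"], "grade": [73.0, 87.0]})

# The dict maps column names to target types; other columns are untouched.
converted = students.astype({"grade": int})

# pd.to_numeric parses values such as strings into numbers.
# errors="coerce" turns anything unparseable into NaN instead of raising.
raw = pd.Series(["1", "2", "oops"])
parsed = pd.to_numeric(raw, errors="coerce")
```

Note that astype(int) raises an error if the column still contains missing values, so those need to be handled first.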

Fill Missing Data

Q: Write a solution to fill in the missing value as 0 in the quantity column.

products['quantity'] = products['quantity'].fillna(0)
return products

The question asks to fill missing data in a single column, which is why I operated on the quantity column. df.fillna(x) replaces all missing values with the given value x.

To achieve the same result, df.replace could be used too:

df['DataFrame Column'] = df['DataFrame Column'].replace(np.nan, 0)
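df.fillna also accepts a dict that maps column names to fill values, which limits the fill to those columns. A sketch with made-up product data:

```python
import numpy as np
import pandas as pd

# Hypothetical products table with gaps; values are made up.
products = pd.DataFrame({
    "name": ["Wristwatch", "WirelessEarbuds"],
    "quantity": [np.nan, 32],
    "price": [135, np.nan],
})

# Only quantity is filled; the missing price is left as it is.
filled = products.fillna({"quantity": 0})
```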

Reshape Data: Concatenate

Q: Write a solution to concatenate these two DataFrames vertically into one DataFrame.

return pd.concat([df1, df2], axis=0)

pd.concat can concatenate two or more DataFrames vertically (same columns, new rows) or horizontally (same rows, new columns): axis=0 is vertical and axis=1 is horizontal. Other than this, pd.merge and df.join can combine DataFrames on a key column or on the index, and older pandas versions also had df.append for vertical concatenation (it was deprecated in pandas 1.4 and removed in 2.0).

# Concatenation with pd.merge
result = pd.merge(df, df1, on='Courses', how='outer', suffixes=('_df1', '_df2')).fillna(0)

result['Fee'] = result['Fee_df1'] + result['Fee_df2']
result = result[['Courses', 'Fee']]

# Concatenation with df.join
result = df.join(df1)

# Concatenation with df.append (vertical only) was removed in pandas 2.0.
# On current pandas, pd.concat gives the same result:
result = pd.concat([df, df1], ignore_index=True)
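For pd.concat itself, the sketch below shows both axes and the effect of ignore_index, using two small made-up DataFrames:

```python
import pandas as pd

# Hypothetical inputs; values are made up for illustration.
df1 = pd.DataFrame({"student_id": [1, 2], "name": ["Mason", "Ava"]})
df2 = pd.DataFrame({"student_id": [3, 4], "name": ["Taylor", "Leo"]})

# Vertical: rows of df2 are placed under df1, original index labels are kept.
stacked = pd.concat([df1, df2], axis=0)
print(stacked.index.tolist())  # [0, 1, 0, 1]

# ignore_index=True renumbers the rows from 0.
renumbered = pd.concat([df1, df2], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]

# Horizontal: columns of df2 are placed next to df1, aligned on the index.
side_by_side = pd.concat([df1, df2], axis=1)
```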

Reshape Data: Pivot

Q: Write a solution to pivot the data so that each row represents temperatures for a specific month, and each city is a separate column.

return weather.pivot(index='month', columns='city', values='temperature')

df.pivot reshapes a DataFrame from long to wide format; here the input has 3 columns. One column supplies the new index, another supplies the new column names, and a third supplies the values that fill the table. The result is a more compact DataFrame that carries the same information.
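A minimal sketch on a made-up weather table (city names and temperatures are assumptions for illustration):

```python
import pandas as pd

# Hypothetical long-format table: one row per (city, month) pair.
weather = pd.DataFrame({
    "city": ["Jacksonville", "Jacksonville", "ElPaso", "ElPaso"],
    "month": ["January", "February", "January", "February"],
    "temperature": [13, 23, 20, 6],
})

# Months become the index, cities become the columns,
# and temperatures fill the cells.
wide = weather.pivot(index="month", columns="city", values="temperature")

# A single cell can now be looked up by month and city.
print(wide.loc["January", "ElPaso"])  # 20
```

df.pivot raises an error when the same index/column pair appears more than once; in that case df.pivot_table, which aggregates the duplicates, is the usual alternative.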

Reshape Data: Melt

Q: Write a solution to reshape the data so that each row represents sales data for a product in a specific quarter.

return pd.melt(report, id_vars=['product'], value_vars=['quarter_1', 'quarter_2', 'quarter_3', 'quarter_4'], var_name='quarter', value_name='sales')

pd.melt reshapes a DataFrame from wide to long format, which is often easier to process. In this problem, pd.melt gathers the values of multiple columns into a single column (sales), and the original column names become the values of a second column (quarter).
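The sketch below runs pd.melt on a made-up two-quarter report so the row order of the result is easy to follow:

```python
import pandas as pd

# Hypothetical wide-format report; values are made up for illustration.
report = pd.DataFrame({
    "product": ["Umbrella", "SleepingBag"],
    "quarter_1": [417, 800],
    "quarter_2": [224, 936],
})

long = pd.melt(
    report,
    id_vars=["product"],                    # kept as-is on every row
    value_vars=["quarter_1", "quarter_2"],  # columns to unpivot
    var_name="quarter",                     # holds the old column names
    value_name="sales",                     # holds the old cell values
)

# All quarter_1 rows come first, followed by all quarter_2 rows.
print(long["sales"].tolist())  # [417, 800, 224, 936]
```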

Method Chaining

Q: Write a solution to list the names of animals that weigh strictly more than 100 kilograms. Return the animals sorted by weight in descending order.

return animals[animals['weight'] > 100].sort_values(['weight'], ascending=False)[['name']]

Method chaining is an approach to data manipulation in which several operations are written as a single expression. Each operation returns a DataFrame or Series, so the next one can be called on its result with dot notation (or with indexing, as above).
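Long chains are often easier to read when wrapped in parentheses with one step per line. This is a formatting sketch of the same solution, run on made-up animal data:

```python
import pandas as pd

# Hypothetical animals table; values are made up for illustration.
animals = pd.DataFrame({
    "name": ["Tatiana", "Khaled", "Alex", "Jonathan"],
    "species": ["Snake", "Giraffe", "Leopard", "Monkey"],
    "age": [98, 50, 6, 45],
    "weight": [464, 41, 328, 463],
})

# Parentheses let the chain span several lines, one step per line.
heavy = (
    animals[animals["weight"] > 100]         # keep rows heavier than 100 kg
    .sort_values("weight", ascending=False)  # heaviest first
    [["name"]]                               # keep only the name column
)

print(heavy["name"].tolist())  # ['Tatiana', 'Jonathan', 'Alex']
```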

Well, that was the last problem in the plan. I hope you enjoyed reading and found it useful. I will share more stories when I finish other study plans on LeetCode. Thanks for reading!
