DEV Community

loading...
Cover image for Reading and Manipulating Your Dataset With Pandas (2)

Reading and Manipulating Your Dataset With Pandas (2)

orthymarjan profile image Marjan Ferdousi ・4 min read

Manipulation

Let's say you need to see only one column of your dataframe. To see the 'fixed acidity' column of our dataset, you need to write:

df['fixed acidity']
Enter fullscreen mode Exit fullscreen mode

Alt Text

If you add a condition to this column, for example, if you want to see the rows that has a fixed acidity higher than 9:

df[df['fixed acidity']>9]
Enter fullscreen mode Exit fullscreen mode

Alt Text

Sometimes you might need rows with multiple conditions added to columns:

df[(df['fixed acidity']>9) & (df['citric acid']>0.5)]
Enter fullscreen mode Exit fullscreen mode

Alt Text

If you need to find specific columns:

df.loc[:,['volatile acidity', 'chlorides']]
Enter fullscreen mode Exit fullscreen mode

Alt Text

You may want to add conditions with them too, for example, you may want to see the 'volatile acidity' and 'chlorides' content of those rows that have a 'fixed acidity' of 9.2:

df.loc[df['fixed acidity'] == 9.2, ['fixed acidity','volatile acidity', 'chlorides']]
Enter fullscreen mode Exit fullscreen mode

Alt Text

You can view the rows for specific indices (as discussed in the previous chapter) too, like this:

df.loc[0:3, ['volatile acidity', 'chlorides']]
Enter fullscreen mode Exit fullscreen mode

Alt Text

Now, if you want to locate a specific value, for example, the alchohol content of the wine of 0th row:

df['alcohol'].loc[0]
Enter fullscreen mode Exit fullscreen mode

and you will get a value of 9.4

You can find locate a row using its index too:

df.iloc[100]
Enter fullscreen mode Exit fullscreen mode

Alt Text

Now if you want to pinpoint a value within this, for example, the 1st attribute (volatile acidity in this case) of the 100th row, try:

df.iloc[100][1]
Enter fullscreen mode Exit fullscreen mode

and you will get 0.61 as expected.

You can locate specific consecutive rows and columns using this iloc command, for example, first three columns of 3rd to 7th row:

df.iloc[3:8, 0:3]
Enter fullscreen mode Exit fullscreen mode

Alt Text

and non consecutive rows and columns too:

df.iloc[[71, 122, 400], [0, 2]]
Enter fullscreen mode Exit fullscreen mode

Alt Text

What if you want to add a new column to your dataframe? Let's add a 'new column' containing the word 'hi' for all rows:

df['new column'] = 'hi'
df.head()
Enter fullscreen mode Exit fullscreen mode

Alt Text

Let's try changing the value of 'new column' of 0th index of the dataframe using iloc from 'hi' to 'bye':

df.iloc[0, df.columns.get_loc('new column')]= 'bye'
df.head()
Enter fullscreen mode Exit fullscreen mode

Alt Text

Now let's try to find the word starts with 'by' (that we just have added) and replace it with 'hello':

df['new column'].loc[df['new column'].str.startswith('by')] = 'hello'
df.head()
Enter fullscreen mode Exit fullscreen mode

Alt Text

You can also replace null values of your data using pandas. We do not have any null values here, so let's introduce a null value first. Let's replace the string 'hello' with null. To do so, we would need the numpy library.

import numpy as np
df['new column'].loc[df['new column'].str.startswith('hel')] = np.nan
df.head()
Enter fullscreen mode Exit fullscreen mode

Alt Text

To check the number of null values, you can use the isna() method like this:

df.isna().sum()
Enter fullscreen mode Exit fullscreen mode

Alt Text

This isna() method can also be used to locate the null value like this:

pd.isna(df.head())
Enter fullscreen mode Exit fullscreen mode

Alt Text

Let's replace the null value with 'hey'.

df.fillna(value='hey', inplace=True)
df.head()
Enter fullscreen mode Exit fullscreen mode

Alt Text

If you want to drop null values, use the dropna() method.

Now we will try to create a new dataframe using a loop, where one column of the new dataframe would look the same as the 'new column' of our dataframe df.

rows = []
for i in range(df.shape[0]):
     rows.append(['hi', 'bye'])
df_new = pd.DataFrame(rows, columns=["new column 2", "new column 3"])
df_new.iloc[0, df_new.columns.get_loc('new column 2')]= 'hey'
df_new.head()
Enter fullscreen mode Exit fullscreen mode

Alt Text

You can merge these two dataframes using their common attributes:

df_merged = df.merge(df_new, left_on='new column', right_on='new column 2')
df_merged.head()
Enter fullscreen mode Exit fullscreen mode

Alt Text

You can make necessary variations in your merging operations by dropping mismatched attributes, or by using a column with common name and so on.

You can also group your dataframes:

df.groupby(['volatile acidity', 'chlorides']).count().head()
Enter fullscreen mode Exit fullscreen mode

Alt Text

You can also group the dataframes using other attributes like sum.

When you are done with manipulation of your dataframes, you are ready to visualize your data.

Discussion (0)

pic
Editor guide