
Suraj Upadhyay


Pandas : Series and DataFrame

Pandas uses two basic data structures, DataFrame and Series, for storage and manipulation of data. Each can be thought of as a cross between a dictionary and a list. That is to say, the elements in pandas' data structures can be accessed by label (like a dictionary key) as well as by integer position (like a list index).

For the rest of the article, let's import the pandas library.

import pandas as pd

Series

A Series holds a one-dimensional array of values together with a few attributes: the index labels, the data type and a name for the whole array.
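To make those attributes concrete, here is a minimal sketch that builds a small Series and inspects each one. The temperature values and the variable name temps are made up for illustration and do not come from the rest of the article.

```python
import pandas as pd

# Hypothetical example data, used only to show the attributes of a Series.
temps = pd.Series([21.5, 23.0, 19.8],
                  index=["Mon", "Tue", "Wed"],
                  name="Temperature")

print(temps.index.tolist())  # ['Mon', 'Tue', 'Wed']
print(temps.dtype)           # float64
print(temps.name)            # Temperature
print(temps.ndim)            # 1, a Series is always one-dimensional
```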

To create a Series you will use the Series constructor pd.Series(), typically with two arguments. The first argument, data, passes an array of data values and the optional second argument, index, passes an array of index names. There is also a name keyword argument, which gives a name to the whole Series.

Here are some ways you can create a Series :

tweet = pd.Series(["My first tweet", 20, 4],
                  index=["message", "likes", "retweets"],
                  name="Tweet Data")

letters = pd.Series(['a', 'b', 'c', 'd'],
                    index=range(4))

ranking = pd.Series(["A. Einstein", "I. Newton",
                     "N. Tesla", "Heisenberg"],
                    index=[1, 2, 3, 4], name="Scientists")

countries = pd.Series(["India", "Russia", "Japan", "China"])

The output of print(tweet) will be :

message     My first tweet
likes                   20
retweets                 4
Name: Tweet Data, dtype: object

Notice how you can specify the index names and a name for the whole Series (which in this case is "Tweet Data").

Using these index names you can now access the individual elements in the Series. There are three ways to access an individual element: put the index name inside the square-bracket operator ('[]', as in a dictionary), use the index name with the dot operator (only if the name is a valid Python identifier and does not clash with an existing Series attribute or method), or use the zero-based integer position with iloc, much like Python list indexing. Plain tweet[2] on a Series with string labels is deprecated in recent pandas versions, so prefer iloc for positions.

print(tweet["message"]) # Output : My first tweet
print(tweet.likes)      # Output : 20
print(tweet.iloc[2])    # Output : 4
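Dot access is convenient, but it can quietly return an existing Series attribute instead of your element. The sketch below uses a made-up Series (the name stats and its labels are hypothetical) whose label "count" collides with the built-in Series.count() method.

```python
import pandas as pd

# Hypothetical data: the label "count" collides with the Series.count() method.
stats = pd.Series([3, 42], index=["count", "total"])

print(stats["count"])         # 3, the stored value
print(callable(stats.count))  # True: dot access returned the method instead
print(stats.total)            # 42, no collision for this label
print(stats.iloc[1])          # 42, access by integer position
```

Square brackets always look up the label, so they are the safer default when labels come from data you do not control.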

DataFrame

A DataFrame is a collection of Series that share a common index, or in other words many Series glued together as columns. For the sake of drawing parallels, you can think of it as a 2-dimensional array or matrix. However, it is more than an array or a matrix: every column can have its own data type, both rows and columns carry labels, and with a hierarchical index (MultiIndex) a DataFrame can even represent data with more than two dimensions.
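The "Series glued together" idea can be sketched directly with pd.concat. This is only an illustration that reuses the sales figures from the DataFrame example; the variable names years, laptops and mobiles are assumptions made for this sketch.

```python
import pandas as pd

years = ["2018", "2019", "2020"]
laptops = pd.Series([100, 110, 20], index=years, name="Laptops")
mobiles = pd.Series([30, 35, 4], index=years, name="Mobiles")

# Glue the two Series together column-wise; the shared index becomes the row labels
# and each Series name becomes a column name.
sales = pd.concat([laptops, mobiles], axis=1)

print(sales.shape)                             # (3, 2)
print(list(sales.columns))                     # ['Laptops', 'Mobiles']
print(isinstance(sales["Laptops"], pd.Series)) # True: each column is itself a Series
```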

To construct a DataFrame you are going to use, not very surprisingly, the DataFrame constructor pd.DataFrame(). The constructor accepts several arguments, but there are three you will normally use. The first argument, data, can be a dictionary in which each key is a column name and each value is a one-dimensional array holding that column. The index argument takes a list of row labels. The columns argument takes a list of column names; when the data is a dictionary it selects and orders the columns, so it is optional there.

E.g.

sales = pd.DataFrame({"Laptops" : [100, 110, 20],
                      "Mobiles" : [30, 35, 4],
                      "Earphones" : [150, 120, 40]},
                     # Use a list, not a set: a set has no defined order.
                     columns=["Laptops", "Mobiles", "Earphones"],
                     index=["2018", "2019", "2020"])
sales

Output :

      Laptops  Mobiles  Earphones
2018      100       30        150
2019      110       35        120
2020       20        4         40

You can play with the DataFrame constructor the same way we did with the Series constructor. You can access the individual columns of a DataFrame by indexing with column names either with the square-bracket operator or the dot operator.

print(sales["Mobiles"], '\n')
print(sales.Earphones)

Output :

2018    30
2019    35
2020     4
Name: Mobiles, dtype: int64

2018    150
2019    120
2020     40
Name: Earphones, dtype: int64

Now, to access an individual data element inside a DataFrame, you will use two indexers, namely, iloc and loc.

iloc : This indexes a DataFrame purely by zero-based integer position, the way you would index a nested Python list or a NumPy array. It works with square brackets and takes the row position first and the column position second.
That is,

sales.iloc[0, 2] # Output : 150. Earphone Sales in 2018
sales.iloc[0, 0] # Output : 100. Laptop Sales in 2018
sales.iloc[2, 2] # Output : 40.  Earphone Sales in 2020

Note : iloc only works with integer positions (single integers, slices, lists of integers or boolean arrays), never with labels.
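Besides single elements, iloc can pull out whole rows, columns or blocks by position. The following is a small sketch, assuming the sales DataFrame is built with its columns in the order Laptops, Mobiles, Earphones; the names first_row, last_column and block are introduced here only for illustration.

```python
import pandas as pd

sales = pd.DataFrame({"Laptops": [100, 110, 20],
                      "Mobiles": [30, 35, 4],
                      "Earphones": [150, 120, 40]},
                     index=["2018", "2019", "2020"])

first_row = sales.iloc[0]        # Series holding the 2018 row
last_column = sales.iloc[:, -1]  # Series holding the Earphones column
block = sales.iloc[0:2, 0:2]     # first two rows and first two columns

print(first_row["Mobiles"])      # 30
print(last_column.tolist())      # [150, 120, 40]
print(block.shape)               # (2, 2)
```

As with Python lists, the end of a positional slice is excluded, so 0:2 covers positions 0 and 1.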

loc : This accesses elements by label rather than by position. The syntax is very similar to numpy array indexing, if you are familiar with numpy arrays: a row selector and a column selector separated by a comma, each of which can be a single label, a list of labels or a label slice. Note that loc does not do positional lookups; an integer passed to loc is treated as a label, so it only works if the index actually contains that integer. That is :

sales.loc['2018', 'Mobiles']            # Output : 30
sales.loc['2020', 'Earphones']          # Output : 40
sales.loc[['2018', '2020'], 'Laptops'] 
# Output :
# 2018    100
# 2020     20
# Name: Laptops, dtype: int64
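Two details of loc are easy to trip over: label slices include both endpoints, and an integer given to loc is looked up as a label, not as a position. Here is a minimal sketch of both, rebuilding the sales DataFrame and the ranking Series from earlier so that it runs on its own; the name subset is introduced only for this example.

```python
import pandas as pd

sales = pd.DataFrame({"Laptops": [100, 110, 20],
                      "Mobiles": [30, 35, 4],
                      "Earphones": [150, 120, 40]},
                     index=["2018", "2019", "2020"])

# Label slices include BOTH endpoints, unlike positional slices.
subset = sales.loc["2018":"2019", "Laptops":"Mobiles"]
print(subset.shape)      # (2, 2)

ranking = pd.Series(["A. Einstein", "I. Newton", "N. Tesla", "Heisenberg"],
                    index=[1, 2, 3, 4], name="Scientists")

print(ranking.loc[1])    # A. Einstein, the element labelled 1
print(ranking.iloc[1])   # I. Newton, the element at position 1
```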

loc also accepts arrays of boolean values, but let's save that for another article.

Thanks for reading.

Regards,

Suraj Upadhyay.
