DEV Community

Cover image for Pivot: With Pandas and SQLite
Timothy Cummins
Timothy Cummins

Posted on

Pivot: With Pandas and SQLite

SQL has become one of the most popular querying languages, if not the most popular. So with my experiences in the language I thought I would write a mixed blog to create a pivot table with the use of SQLite and Pandas. Just for fun at the end I will show you how to use the Pivot operator in SQL as well.

Pivot

What Pivot allows you to do is to is convert rows to columns to create a sort of "long view". Right away you might wonder why that would be useful but it can come in handy when you want to view or present your results from your query in an easy to read fashion. With Pandas it as simple as using pivot_table and with SQL there is a slick Pivot operator as I mentioned above.

Grabbing Some Data

I will be taking some soccer data for todays post from Kaggle if you would like to follow along. Once that is downloaded we will want to import our packages and take our data out of our wonderful Downloads folder.

import sqlite3
import pandas as pd
import numpy as np
conn = sqlite3.connect('../../Downloads/database.sqlite')
cur = conn.cursor()
Enter fullscreen mode Exit fullscreen mode

If you are not familiar with sqlite, above we are just setting our connection to our data and our cur so we can create queries.

Querying Our Data

Now that we have our data and SQLite setup we can begin our queries. For our example I thought it would be fun to find out which teams had the most goals in home games so we could see who appreciates those hometown fans! To do this we are going to have to join our Match table and Team table together.

SELECT team_long_name, home_team_goal goals
FROM Match 
JOIN Team
ON Team.team_api_id = Match.home_team_api_id
Enter fullscreen mode Exit fullscreen mode

Since above we have both of those tables combined on the team_api_id we can pull the full team names and how many goals they scored in each match when they were the home team. Now to display our results we need to tell our cur to execute the function and then we can fetch the information as a Pandas DataFrame.

cur.execute("""
SELECT team_long_name,home_team_goal
FROM Match 
JOIN Team
ON Team.team_api_id = Match.home_team_api_id;
""")
df = pd.DataFrame(cur.fetchall())
df.columns = [x[0] for x in cur.description]
df
Enter fullscreen mode Exit fullscreen mode

Alt Text

You may have noticed that I did not sum the values for the total amount of goals yet, but just hold on Pandas pivot_tables has an option for that.

Changing to Pivot

So now that we have our columns we can go over what this whole thing is about, pivots. With Pandas creating a pivot table is very simple, all you need is the pivot_table function and then the following data:

Data: The DataFrame you want to use
Values: The information you want for your rows
Columns: What you want your new columns to be
Aggfunc: Any aggregate functions you want to pass on your values

It is a simple as that so let's plug it in and see what happens.

pd.pivot_table(df, columns = ['team'], values = ['goals','games'], aggfunc = np.sum)
Enter fullscreen mode Exit fullscreen mode

Alt Text

Our pivot table is complete but we have so many teams. To get the top 5 and reorder them we could use a combination of reindexing and iloc but, why do that when we have SQL syntax and can easily reorder with an order by and limit.

SELECT team_long_name team,SUM(home_team_goal) goals
FROM Match 
JOIN Team
ON Team.team_api_id = Match.home_team_api_id
GROUP BY team_long_name
ORDER BY goals DESC
LIMIT 5
Enter fullscreen mode Exit fullscreen mode

Alt Text

SQL Pivot

SQL also offers a Pivot Operator as I mentioned earlier, though sadly you cannot use it with SQLite. To use this operator we use the exact same query but with three differences.

  1. We need to make our query a subquery so that we can call upon it
  2. We need to call the subquery with our Pivot operator
  3. We need to select which values we want as our columns

So your final query should end up as:

SELECT * FROM  
(SELECT team_long_name team,SUM(home_team_goal) goals
FROM Match 
JOIN Team
ON Team.team_api_id = Match.home_team_api_id
GROUP BY team_long_name
ORDER BY goals DESC
LIMIT 5
)
AS Home_Team_Goals
PIVOT(
    goals
    FOR team IN ([Real Madrid CF],[FC Barcelona],[Celtic],[FC Bayern Munich],[PSV])
) AS TopHomePivotTable
Enter fullscreen mode Exit fullscreen mode

Conclusion

It is simple as that to create a pivot table out of your data. I apologize for not having an image for the Pivot using SQL syntax, but I am having trouble with my root password for MySQL and am having trouble resetting it through the terminal. If anyone has any tips or tricks on the reset besides from the MySQL webpage I would love to to hear.

Top comments (0)