DEV Community

Cover image for Flatten Pandas dataframe MultiIndex
Rolando Gómez Tabar
Rolando Gómez Tabar

Posted on • Edited on

4 1

Flatten Pandas dataframe MultiIndex

Context

A few days ago, I had to extract a data from Oracle Database using SQL, and then PIVOT a long set of values in a column, to multiple columns. The sintax to do this, requires to write every single value you want to be as a column, not one but twice to avoid the default quotes.

See reference
https://www.oracle.com/technical-resources/articles/database/sql-11g-pivot.html

There are a few options like pivoting using PIVOT XML, or even more recent to build a JSON column using JSON_ARRAYAGG and JSON_OBJECT to get dynamic columns or any value as an attribute, but still not so straightforward.

First try:

Lets try with a python notebook (I use VSCode notebook).
First import pandas:



import pandas as pd


Enter fullscreen mode Exit fullscreen mode

We will use a random free dataset:



# source file from https://www.kaggle.com/gregorut/videogamesales
df = pd.read_csv('./vgsales.csv')
df


Enter fullscreen mode Exit fullscreen mode

vgsales.csv

Using the function pivot_table we will transponse the values of the column Genre to be a column for every value in the dataset.



pivot_df = df.pivot_table(index=['Year', 'Publisher'], columns=['Genre'], values=['Global_Sales'], aggfunc=np.sum, fill_value=0)
pivot_df


Enter fullscreen mode Exit fullscreen mode

simple df.pivot_table

Now let's reset_index() to "try" flattening the indexes.



mi_pivot_df = pivot_df.reset_index()
mi_pivot_df


Enter fullscreen mode Exit fullscreen mode

pivot_df.reset_index
Ook, not sure, but, lets try to export to an Excel, with index=False to avoid showing the index column to the left.



pivot_df.to_excel('./global_sales_by_publishers_genres.xlsx', index=False)


Enter fullscreen mode Exit fullscreen mode

pivot_df.to_excel 1
So, it's not implemented. Although, it does work when you remove the index=False part. However it will show the index column to the left and the two column indexes, which I don't want to be there.

Second try

I searched the solution to flat the most, but no answer was full, complete enough, and free of errors. I collected all, and by try and error, got here, a working solution. I also works after using pivot function:



flat_index_pivot_df = pivot_df['Global_Sales'].reset_index()
flat_index_pivot_df


Enter fullscreen mode Exit fullscreen mode

df.reset_index 2

Now you can get a clean Excel Sheet, free of MultiIndex.



flat_index_pivot_df.to_excel('./global_sales_by_publishers_genres.xlsx', index=False)


Enter fullscreen mode Exit fullscreen mode

Do you know another solution?

AWS Security LIVE!

Join us for AWS Security LIVE!

Discover the future of cloud security. Tune in live for trends, tips, and solutions from AWS and AWS Partners.

Learn More

Top comments (1)

Collapse
 
chris1610 profile image
Chris Moffitt

Nice article. I wrote a small utility for pandas called sidetable which flattens multi-indexes and gives some additional options. It might be useful to you and your readers.

You can see it here - github.com/chris1610/sidetable#fla...

Billboard image

The Next Generation Developer Platform

Coherence is the first Platform-as-a-Service you can control. Unlike "black-box" platforms that are opinionated about the infra you can deploy, Coherence is powered by CNC, the open-source IaC framework, which offers limitless customization.

Learn more

👋 Kindness is contagious

Please leave a ❤️ or a friendly comment on this post if you found it helpful!

Okay