Context
A few days ago, I had to extract a data from Oracle Database using SQL, and then PIVOT
a long set of values in a column, to multiple columns. The sintax to do this, requires to write every single value you want to be as a column, not one but twice to avoid the default quotes.
See reference
https://www.oracle.com/technical-resources/articles/database/sql-11g-pivot.html
There are a few options like pivoting using PIVOT XML
, or even more recent to build a JSON column using JSON_ARRAYAGG
and JSON_OBJECT
to get dynamic columns or any value as an attribute, but still not so straightforward.
First try:
Lets try with a python notebook (I use VSCode notebook).
First import pandas:
import pandas as pd
We will use a random free dataset:
# source file from https://www.kaggle.com/gregorut/videogamesales
df = pd.read_csv('./vgsales.csv')
df
Using the function pivot_table
we will transponse the values of the column Genre to be a column for every value in the dataset.
pivot_df = df.pivot_table(index=['Year', 'Publisher'], columns=['Genre'], values=['Global_Sales'], aggfunc=np.sum, fill_value=0)
pivot_df
Now let's reset_index()
to "try" flattening the indexes.
mi_pivot_df = pivot_df.reset_index()
mi_pivot_df
Ook, not sure, but, lets try to export to an Excel, with index=False
to avoid showing the index column to the left.
pivot_df.to_excel('./global_sales_by_publishers_genres.xlsx', index=False)
So, it's not implemented. Although, it does work when you remove the index=False
part. However it will show the index column to the left and the two column indexes, which I don't want to be there.
Second try
I searched the solution to flat the most, but no answer was full, complete enough, and free of errors. I collected all, and by try and error, got here, a working solution. I also works after using pivot
function:
flat_index_pivot_df = pivot_df['Global_Sales'].reset_index()
flat_index_pivot_df
Now you can get a clean Excel Sheet, free of MultiIndex.
flat_index_pivot_df.to_excel('./global_sales_by_publishers_genres.xlsx', index=False)
Do you know another solution?
Top comments (1)
Nice article. I wrote a small utility for pandas called sidetable which flattens multi-indexes and gives some additional options. It might be useful to you and your readers.
You can see it here - github.com/chris1610/sidetable#fla...