DEV Community

wrighter

Posted on • Originally published at wrighters.io on

Basic Pandas: moving columns

Sometimes we want to change the order of a DataFrame's columns. There are a few ways to do this, depending on whether you are just viewing the columns, creating a new column, or moving one that already exists.

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame(np.random.rand(5,5), columns=['a', 'b', 'c', 'd', 'e'])
>>> df['max'] = df.max(axis=1)
>>>
>>> df
          a         b         c         d         e       max
0  0.067423  0.058920  0.999309  0.440547  0.572163  0.999309
1  0.384196  0.732857  0.138881  0.764242  0.096347  0.764242
2  0.900311  0.662776  0.223959  0.903363  0.349328  0.903363
3  0.988267  0.852733  0.913800  0.106388  0.864908  0.988267
4  0.830644  0.647775  0.596375  0.631442  0.907743  0.907743

First, let’s just review the basics. Without moving or dropping columns, we can view any column we want in any order by just selecting them.

>>> df['max']
0    0.999309
1    0.764242
2    0.903363
3    0.988267
4    0.907743
Name: max, dtype: float64
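Note that a single label inside [] returns a Series, as above, while wrapping that label in a list returns a one-column DataFrame. Here is a minimal sketch of the difference; it assumes a small deterministic frame built with np.arange in place of the random data, so the values are reproducible.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random frame used above
df = pd.DataFrame(np.arange(25).reshape(5, 5), columns=['a', 'b', 'c', 'd', 'e'])
df['max'] = df.max(axis=1)

single = df['max']    # a single label returns a Series
framed = df[['max']]  # a list containing one label returns a one-column DataFrame

print(type(single).__name__)  # Series
print(type(framed).__name__)  # DataFrame
```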

We can also select any set of columns, in any order, and even include the same column more than once.

>>> df[['d', 'a', 'max', 'b', 'd']]
          d         a       max         b         d
0  0.440547  0.067423  0.999309  0.058920  0.440547
1  0.764242  0.384196  0.764242  0.732857  0.764242
2  0.903363  0.900311  0.903363  0.662776  0.903363
3  0.106388  0.988267  0.988267  0.852733  0.106388
4  0.631442  0.830644  0.907743  0.647775  0.631442

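Assigning the selection back to our variable makes the reordering permanent. Note that any column left out of the list is dropped: c is not listed here, so it is gone from df in the rest of the examples.

>>> df = df[['d', 'a', 'b', 'max', 'e']]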
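The list above leaves out c, so that assignment drops it. If you only want to pull a few columns to the front and keep all the others in their current order, one option is to build the list from df.columns. This is a minimal sketch; the name front and the deterministic np.arange data are just illustrative choices.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(25).reshape(5, 5), columns=['a', 'b', 'c', 'd', 'e'])
df['max'] = df.max(axis=1)

front = ['d', 'a']  # columns to move to the front (illustrative choice)
# Keep every other column, in its existing order, after the front ones
rest = [col for col in df.columns if col not in front]
df = df[front + rest]

print(list(df.columns))  # ['d', 'a', 'b', 'c', 'e', 'max']
```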

Since the columns are just an Index, they can be converted to a list and manipulated, and the reindex method can then put the columns in the new order. Note that you don't want to simply assign the sorted names to df.columns: that won't move the columns, it will just rename them!

>>> df.reindex(columns=sorted(df.columns))
          a         b         d         e       max
0  0.067423  0.058920  0.440547  0.572163  0.999309
1  0.384196  0.732857  0.764242  0.096347  0.764242
2  0.900311  0.662776  0.903363  0.349328  0.903363
3  0.988267  0.852733  0.106388  0.864908  0.988267
4  0.830644  0.647775  0.631442  0.907743  0.907743
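To make the list manipulation and the renaming pitfall concrete, here is a small sketch on a tiny hand-built frame (the data is made up for illustration). It moves d to the end by editing a plain list and passing it to reindex, then shows that assigning labels to .columns only relabels the existing data.

```python
import pandas as pd

df = pd.DataFrame({'d': [1, 2], 'a': [3, 4], 'b': [5, 6]})

# Manipulate the column labels as a plain list, e.g. move 'd' to the end
cols = list(df.columns)
cols.remove('d')
cols.append('d')
moved = df.reindex(columns=cols)      # data moves together with its label

# Pitfall: assigning new labels to .columns relabels without moving data
renamed = df.copy()
renamed.columns = sorted(df.columns)  # ['a', 'b', 'd'] laid over the old d, a, b data

print(moved['d'].tolist())    # [1, 2]  (still the original 'd' data)
print(renamed['a'].tolist())  # [1, 2]  (the original 'd' data, now labelled 'a')
```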

Also, when you are first creating a column, you can use the insert method to place it exactly where you want it to appear. By default, adding a column with the [] operator puts it at the end.

>>> df.insert(3, "min", df.min(axis=1))
>>> df
          d         a         b       min       max         e
0  0.440547  0.067423  0.058920  0.058920  0.999309  0.572163
1  0.764242  0.384196  0.732857  0.096347  0.764242  0.096347
2  0.903363  0.900311  0.662776  0.349328  0.903363  0.349328
3  0.106388  0.988267  0.852733  0.106388  0.988267  0.864908
4  0.631442  0.830644  0.647775  0.631442  0.907743  0.907743
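Two details of insert are worth knowing: it modifies the DataFrame in place and returns None, and by default it refuses to add a label that already exists. The sketch below shows both on a small made-up frame; the duplicate check relies on pandas raising ValueError, which it does unless allow_duplicates=True is passed.

```python
import pandas as pd

df = pd.DataFrame({'d': [1, 2], 'a': [3, 4], 'b': [5, 6]})

# insert modifies df in place and returns None; loc=0 puts the column first
result = df.insert(0, 'min', df.min(axis=1))
print(result)            # None
print(list(df.columns))  # ['min', 'd', 'a', 'b']

# By default, inserting a label that already exists raises ValueError
raised = False
try:
    df.insert(1, 'min', 0)
except ValueError:
    raised = True
print(raised)            # True
```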

Finally, to move a column that already exists, you can pop it, then re-insert it at the new position. Popping a column removes it from the DataFrame and returns it as a Series.

>>> col_e = df.pop("e")
>>> df.insert(3, "e", col_e)
>>> df
          d         a         b         e       min       max
0  0.440547  0.067423  0.058920  0.572163  0.058920  0.999309
1  0.764242  0.384196  0.732857  0.096347  0.096347  0.764242
2  0.903363  0.900311  0.662776  0.349328  0.349328  0.903363
3  0.106388  0.988267  0.852733  0.864908  0.106388  0.988267
4  0.631442  0.830644  0.647775  0.907743  0.631442  0.907743
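If you move columns this way often, the pop-then-insert pattern can be wrapped in a small function. The name move_column below is hypothetical, not part of pandas; it is a minimal sketch that modifies the DataFrame in place, just like pop and insert do.

```python
import pandas as pd

def move_column(df: pd.DataFrame, col: str, loc: int) -> None:
    """Hypothetical helper: move column `col` to integer position `loc`, in place."""
    series = df.pop(col)         # remove the column and keep its data
    df.insert(loc, col, series)  # put it back at the requested position

df = pd.DataFrame({'d': [1, 2], 'a': [3, 4], 'b': [5, 6], 'e': [7, 8]})
move_column(df, 'e', 1)
print(list(df.columns))  # ['d', 'e', 'a', 'b']
```

Because pop runs first, loc counts positions among the remaining columns, after the moved column has been taken out.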

So as you can see, there are a number of ways to control the column ordering of your DataFrame: selecting columns, reindex, insert, and pop followed by insert.
