Espoir Murhabazi

Posted on Mar 19, 2018

How pandas read_clipboard method works

#python #pandas #codenewbie #blogmore

Last year I was chasing the Fanatic gold badge on Stack Overflow, a badge that recognizes important contributions from members of the community. It rewards users who visit the site on 100 consecutive days, and it is rarely awarded.

Although it is the easiest gold badge a user can get on the site, and I was longing to receive it, my streak was unfortunately reset after 89 consecutive days and I couldn’t earn the badge.

It was very painful for me, but I didn’t give up, and I started the new year with a resolution not to forget to check my SO profile every single day.

I do this not only to copy answers but also to learn new concepts and improve my programming skills by reading other people’s answers and answering some questions about Python, especially Flask and pandas questions.

I check unanswered questions on the site every single day after work, when my work becomes boring or when I feel unmotivated in order to increase my self-confidence!
And after 100 consecutive days, I finally earned it.

While answering pandas questions on the site, I discovered a useful function from the pandas library. It helps me be more productive, so I decided to write a blog post about it.

  pandas.read_clipboard()

This is very useful when you are trying to answer questions related to pandas, especially when a user has posted their dataframe in text format and you want to reproduce the same dataframe in your working environment.

To get the most out of this post, it’s best if you are familiar with Python and the pandas library. You might also want to check this tutorial or this one from pandas to get started. You also need a working Python environment to run the examples from this tutorial; my favorites are a Jupyter notebook or the IPython console.

History

The read_clipboard method was first introduced in version 0.13 of pandas in order to solve the problem of creating a dataframe from data copied to the clipboard.
When it was introduced, it was a handy way to take the contents of the clipboard buffer and pass them to the read_table method. It became popular when people started asking questions about pandas on Stack Overflow and including the content of their dataframe as text in the question. Now many experienced developers use it whenever they want to help a beginner who posts a problem on the site.

How It Works

As the method documentation says:

The read_clipboard method reads text from the clipboard, passes it to the read_table method, and returns a parsed DataFrame.

This method is very similar to the pandas read_csv and read_table methods, except that the data comes from the clipboard buffer instead of a CSV file.
First, you need some text that represents a dataframe. It’s important that the text is structured like a dataframe, with the data arranged in rows and columns.

After that, copy the data using Ctrl+C (or Cmd+C on a Mac), then call the pd.read_clipboard() method in your environment. Finally, you get a dataframe on which you can do every manipulation a dataframe supports.
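The clipboard is not available everywhere (for example on a remote server), so the parsing step can also be sketched on a plain string. This is only a sketch under a few assumptions: parse_like_clipboard is a made-up helper name, the sample text is invented, the default separator is taken to be the regular expression \s+, and read_csv is used in place of read_table (the two share the same parser and differ mainly in their default separator).

```python
import io

import pandas as pd


def parse_like_clipboard(text, sep=r"\s+", **kwargs):
    """Hypothetical stand-in for pd.read_clipboard() that takes the text directly."""
    # Wrap the text in a file-like buffer and hand it to the CSV parser,
    # forwarding any extra keyword arguments unchanged.
    return pd.read_csv(io.StringIO(text), sep=sep, **kwargs)


# Made-up clipboard contents: a header line followed by two rows.
df = parse_like_clipboard("name score\nann 1\nbob 2\n")
print(df)
```

Forwarding the keyword arguments is the part worth noticing: it is why parser options such as sep or names can be passed straight to the clipboard reader.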

An Example

Let’s take a look at how this works in a real-life example:
Suppose you have the following data and you want to create a dataframe from it:

    bar   foo
0    4     1
1    5     2
2    6     3

You just need to copy the data to your clipboard and type this method call in your environment or IPython console (don’t copy-paste it, type it; otherwise it will overwrite the data in your clipboard):

import pandas as pd 
pd.read_clipboard()

If everything works fine, the output in your console will show the same data as posted above.
You can also store the output in a variable by adjusting the code like this:

df = pd.read_clipboard()
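To check what ends up in df without touching the clipboard, the same text can be parsed from a string. This is a sketch that assumes read_csv with sep=r"\s+" behaves like the default clipboard parsing; copied_text is just an illustrative variable name. The header line has two labels while each row has three values, so the parser treats the unnamed first column as the index.

```python
import io

import pandas as pd

# Stand-in for the clipboard: the text of the example above.
copied_text = """\
    bar   foo
0    4     1
1    5     2
2    6     3
"""

# Split every line on runs of whitespace.
df = pd.read_csv(io.StringIO(copied_text), sep=r"\s+")

print(df.columns.tolist())  # column labels taken from the header line
print(df.index.tolist())    # row labels taken from the unnamed first column
print(df.dtypes)            # data types inferred for each column
```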

Another example is when you want to answer a question on Stack Overflow:
Look for a question tagged pandas here and check whether the user has posted the dataframe.

Or you can start with this question, copy the data shared, and type the following in your Python environment:

df = pd.read_clipboard()

If everything went well, you will get the following in your df:

You can see all the action in the animation below.

Pro tip:
As the method documentation says, this method passes the data to the read_table method, so you can use read_table parameters to make magic with it.

Here is an example:
Let’s suppose you have a question where the data posted comes from a CSV file:

1,Allen, Miss Elisabeth Walton,1st,29,female,1,1
2,Allison, Miss Helen Loraine,1st,2,female,0,1
3,Allison, Mr Hudson Joshua Creighton,1st,30,male,0,0
4,Allison, Mrs Hudson JC (Bessie Waldo Daniels),1st,25,female,0,1
5,Allison, Master Hudson Trevor,1st,0.92,male,1,0
6,Anderson, Mr Harry,1st,47,male,1,0
7,Andrews, Miss Kornelia Theodosia,1st,63,female,1,1
8,Andrews, Mr Thomas,1st,39,male,0,0
9,Appleton, Mrs Edward Dale (Charlotte Lamson),1st,58,female,1,1
10,Artagaveytia, Mr Ramon,1st,71,male,0,0
11,Astor, Colonel John Jacob,1st,47,male,0,0
12,Astor, Mrs John Jacob (Madeleine Talmadge Force),1st,19,female,1,1
13,Aubert, Mrs Leontine Pauline,1st,NA,female,1,1
14,Barkworth, Mr Algernon H,1st,NA,male,1,0
15,Baumann, Mr John D,1st,NA,male,0,0

Calling read_clipboard with default parameters on this data will throw an error.
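The failure can be reproduced without a clipboard. The sketch below assumes that read_csv with sep=r"\s+" fails in the same way as the default clipboard parsing; it uses only the first three rows of the data, and csv_text and error_message are illustrative names.

```python
import io

import pandas as pd

# First three rows of the data above, as they would sit in the clipboard.
csv_text = """\
1,Allen, Miss Elisabeth Walton,1st,29,female,1,1
2,Allison, Miss Helen Loraine,1st,2,female,0,1
3,Allison, Mr Hudson Joshua Creighton,1st,30,male,0,0
"""

try:
    # Splitting on whitespace cuts the rows at the spaces inside the names.
    pd.read_csv(io.StringIO(csv_text), sep=r"\s+")
    error_message = None
except pd.errors.ParserError as exc:
    error_message = str(exc)

print(error_message)
```

The third row contains one more space than the first two, so it splits into more fields than the parser expects, and that mismatch is what triggers the error.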
Let’s play with some arguments in order to get a working dataframe:

The first one is sep:

sep: the data separator. By default it is one or more whitespace characters (the regular expression \s+), but in this case we need a comma because our data is in CSV format.

The second one is:

names: an array-like, in most cases a list. Here is the description from the docs:

List of column names to use. If the file contains no header row, then you should explicitly pass header=None. Duplicates in this list will cause a UserWarning to be issued.

The third one is:

index_col: specifies which column, or list of columns, to use as the index.
Here is the description from the pandas docs:

Column to use as the row labels of the DataFrame. If a sequence is given, a MultiIndex is used. If you have a malformed file with delimiters at the end of each line, you might consider index_col=False to force pandas to not use the first column as the index (row names)
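The effect of the three parameters can be sketched one step at a time on a string instead of the clipboard. This assumes read_csv accepts them the same way the clipboard reader does. The eight labels are made up for this sketch, one per comma-separated field, and are not the list used in the article’s own call; csv_text, labels, and the step variables are illustrative names too.

```python
import io

import pandas as pd

# First three rows of the CSV data above.
csv_text = """\
1,Allen, Miss Elisabeth Walton,1st,29,female,1,1
2,Allison, Miss Helen Loraine,1st,2,female,0,1
3,Allison, Mr Hudson Joshua Creighton,1st,30,male,0,0
"""

# Hypothetical labels, one per comma-separated field.
labels = ["Id", "LastName", "Name", "PClass", "Age", "Sex", "Survived", "SexCode"]

# sep only: fields are split correctly, but the first record is consumed as the header.
step1 = pd.read_csv(io.StringIO(csv_text), sep=",")

# sep + names: header=None keeps the first record as data.
step2 = pd.read_csv(io.StringIO(csv_text), sep=",", header=None, names=labels)

# sep + names + index_col: the Id column becomes the row labels.
step3 = pd.read_csv(
    io.StringIO(csv_text), sep=",", header=None, names=labels, index_col=0
)

print(step1.shape, step2.shape, step3.shape)
print(step3.index.tolist())
```

Comparing the three shapes shows what each argument changes: the first call loses a row to the header, the second keeps every row, and the third moves one column into the index.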

Let’s pass those parameters to our read_clipboard call and get a working dataframe:

df = pd.read_clipboard(sep=',', 
                       index_col=0, 
                       names=['firstName', 
                              'Name',
                              'PClass',
                              'Age',
                              'Sex',
                              'Survived',
                              'SexCode'])

And here is the final dataframe as a result:

You can play with as many parameters as you want, depending on the situation you are facing and the data you are dealing with. You can find more here in the read_table documentation.
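As one more illustration, the names in this data carry a leading space after the comma. The parser option skipinitialspace drops it; the sketch below shows this with read_csv on a string, assuming the option is forwarded in the same way from the clipboard reader. The labels are again hypothetical.

```python
import io

import pandas as pd

# First two rows of the CSV data above.
csv_text = """\
1,Allen, Miss Elisabeth Walton,1st,29,female,1,1
2,Allison, Miss Helen Loraine,1st,2,female,0,1
"""

# Hypothetical labels, one per comma-separated field.
labels = ["Id", "LastName", "Name", "PClass", "Age", "Sex", "Survived", "SexCode"]

# Default behaviour: the space after each comma stays in the value.
raw = pd.read_csv(io.StringIO(csv_text), sep=",", header=None, names=labels)

# skipinitialspace=True strips the spaces that follow the delimiter.
clean = pd.read_csv(
    io.StringIO(csv_text), sep=",", header=None, names=labels, skipinitialspace=True
)

print(repr(raw.loc[0, "Name"]))    # ' Miss Elisabeth Walton'
print(repr(clean.loc[0, "Name"]))  # 'Miss Elisabeth Walton'
```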

Without the read_clipboard method, we can still create our own dataframe manually from structured text, but it is slow and error-prone, which makes it hard to reproduce the exact dataframe. With the help of the read_clipboard method, we can now create a dataframe directly from structured text data, which makes it easier to answer questions on Stack Overflow.

Conclusion
Now that you’ve learned about the read_clipboard method, take a look at the other IO (input/output) tools from pandas in the official documentation.

For more reading, check out this blog post from DataCamp, 10 minutes to pandas, and this feed with questions tagged pandas on Stack Overflow.

Special thanks to the CodeNewbie team for the useful advice and missions they gave us during the Blog More challenge.