Pandas #2: Opening files and using them as dataframes.

(Previous post about Pandas: Pandas #1: Working with data in Python and its main structures)

Previously we saw the basic Pandas objects and an introduction to creating them from lists and dictionaries and accessing single values. But what if you want to open files (like spreadsheets) and manipulate them as dataframes? Are we able to do this?!

Of course!!! Let's see:

Opening files

Pandas has many functions that save us from spending a lot of time coding our own ways to open files and turn them into dataframes. There are a lot of them, and the Pandas webpage lists the names of all the functions for opening files. Here I'll write about two of the most used:

Open a csv file

import pandas as pd

x = pd.read_csv('my_file.csv')

This is the simplest way to do this, but if needed there are some parameters we can pass to make our life easier. Let's suppose you have a tsv (tab-separated values) file instead of a csv (comma-separated values) file. To handle this, you can pass sep to read_csv and declare the separator used in the file (comma is the default value):

import pandas as pd

x = pd.read_csv('my_file.tsv', sep="\t")  # \t means tab
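To try sep without creating a file, here is a small sketch that reads tab-separated text from memory. The sample data and the use of io.StringIO as a stand-in for a real file are assumptions made just for this illustration.

```python
import io

import pandas as pd

# Hypothetical tab-separated content standing in for a real .tsv file
tsv_text = "name\tage\tcity\nAna\t31\tLisbon\nBruno\t27\tPorto\n"

# With sep="\t" each tab-separated field becomes its own column
df_tab = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# With the default separator (comma) each whole line ends up in one column
df_default = pd.read_csv(io.StringIO(tsv_text))

print(df_tab.shape)      # (2, 3)
print(df_default.shape)  # (2, 1)
```

If your dataframe comes out with a single wide column, a wrong separator is a likely cause.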

If you import your file just like in the examples above, the default behavior of read_csv is to infer that the first line of the file holds the names of the columns. You can pass header=None if your file doesn't have column names. If you want to set your own column names, you can use the parameter names and pass a list with them.

import pandas as pd

x = pd.read_csv('my_file.csv', header=None, names=['col1', 'col2', 'col3'])
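The difference between the default header inference and header=None is easy to see on a tiny example. This is only a sketch: the in-memory text below is made up and plays the role of a file without a header row.

```python
import io

import pandas as pd

# Hypothetical file content with no header row
csv_text = "1,2,3\n4,5,6\n"

# Default behavior: the first line is consumed as the column names
df_inferred = pd.read_csv(io.StringIO(csv_text))

# header=None keeps every line as data; names supplies the column labels
df_named = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    names=["col1", "col2", "col3"],
)

print(list(df_inferred.columns))  # ['1', '2', '3']
print(len(df_inferred))           # 1
print(list(df_named.columns))     # ['col1', 'col2', 'col3']
print(len(df_named))              # 2
```

Note how the default read silently loses the first row of data to the header.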

You can also choose a column to be used as the index of your dataframe. It's passed using index_col:

import pandas as pd

x = pd.read_csv('my_file.csv', header=None, names=['col1', 'col2', 'col3'],
                index_col='col2')
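Once a column becomes the index, it leaves the regular columns and can be used for label-based lookups. A minimal sketch, again with made-up in-memory data instead of a real file:

```python
import io

import pandas as pd

# Hypothetical file content: three columns, no header row
csv_text = "a,x,1.5\nb,y,2.5\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    names=["col1", "col2", "col3"],
    index_col="col2",  # col2 becomes the row labels
)

print(df.index.tolist())    # ['x', 'y']
print(list(df.columns))     # ['col1', 'col3']
print(df.loc["y", "col3"])  # 2.5
```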

There are many other parameters that you can use to open a csv file, but these ones are enough to start your analysis.

Open an excel file

Microsoft Windows is the most popular desktop OS and Microsoft Office is a well-known package (bringing Excel along with it). So, it's important to know how to open these files. Let's see:

import pandas as pd

x = pd.read_excel('my_file.xls', sheet_name='sheet2')

The most important parameters are the same as in the csv version, but this time we'll include one more, sheet_name. Here you can specify which sheet to open if your file has many of them. If you don't know the sheet names, you can pass an integer to select a sheet by position (remember that it starts at 0, which is also the default).

Well, that's it for now! My plan is to use this DEV blog as a notebook, saving things from the simplest and most elementary to the most elaborate. My intention is to keep information easy to access.

See ya!