DEV Community

Leonardo Damasceno Nascimento
Leonardo Damasceno Nascimento

Posted on

Getting a CSV from internet with Pandas

Hi everyone.
Yesterday, I came across a challenge while I applying my studies od Pandas library.

I usually use CSV files I download to do my analysis, but this time my data was in an online CSV file that was constantly updated by other people. So, at least for me, is pretty boring to have to download every time the file and use it on the code.

So, here I am to show you how I solved this problem and managed to collect the data directly from the internet:

Case 1: CSV directly from a link

I have gotten this link to download a CSV file from G Drive:
https://drive.google.com/uc?authuser=0&id=1UzlPy6CZQeAzDXhfc_2sHEyK_Jb50vJs&export=download

If you click will download a file of the price of the dollar to Real.

Code:
import pandas as pd
url from the file you wanna download
#url ='https://drive.google.com/uc?authuser=0&id=1UzlPy6CZQeAzDXhfc_2sHEyK_Jb50vJs&export=download'
#Creating a dataframe to get the file
price_df = pd.read_csv(url)
price_df 
Enter fullscreen mode Exit fullscreen mode

Pretty easy ;)

But, sometimes you don't be so easy. You will have to format the file.


Case 2: CSV in a request that needs to be handled

This link is a history of coffee price in Brazil I found on Google:
http://portalweb.cooxupe.com.br:8080/portal/precohistoricocafe_2.jsp

At the end of the page you will see a link to download the file:

If you copy the URL, you will get that link:
http://portalweb.cooxupe.com.br:8080/portal/precohistoricocafe_2.jsp?d-3496238-e=2&6578706f7274=1

Let´s apply the same code:

coffe_url = 'http://portalweb.cooxupe.com.br:8080/portal/precohistoricocafe_2.jsp;jsessionid=FDD4F9458CC173838FCDE86C02BF6349?d-3496238-e=2&6578706f7274=1'
coffe_df = pd.read_csv(coffe_url)
coffe_df
Enter fullscreen mode Exit fullscreen mode

And... Ops... Doesn't work.
Don't work because that link is not with the format like the other file of GDrive.

To solve that problem, we have to import two libraries:
Import requests
Import io

import requests
import io

coffe_url = 'http://portalweb.cooxupe.com.br:8080/portal/precohistoricocafe_2.jsp;jsessionid=FDD4F9458CC173838FCDE86C02BF6349?d-3496238-e=2&6578706f7274=1'
# Doing a request from website 
content_url = requests.get(coffe_url).content
file = io.StringIO(content_url.decode('latin1'))
coffe_df = pd.read_csv(file, sep='\t')
coffe_df
Enter fullscreen mode Exit fullscreen mode

And now we did it!!!

Notes: I use decode 'latin1', because the file has some special characters.
Note2: the separators of the file aren't a common separator. Is \t for some reason.
Note3: The file download like a text, the library io is to transform to file.
Text file before convert to file
You also can use the strings methods, but more simple use IO library

Top comments (0)