DEV Community

Getting data from the santabarbarahvaccontractor website without an API in 3 lines of Python code

This post shows how to save time and frustration when automating data collection from websites that do not offer an API.

Suppose that, while searching for the data your project needs, you stumble upon a web page like https://www.santabarbarahvaccontractor.com:

Here it is: all the data your project needs.

But what if the data you need is on a site that does not provide an API for retrieving it? You could, of course, spend several hours writing a scraper that fetches the data and converts it into the format your application needs.

But there is a simpler solution: the Pandas library and its built-in function read_html(), which extracts the tables from an HTML page.

import pandas as pd
# Fetch the page and parse every HTML table on it
tables = pd.read_html("https://www.santabarbarahvaccontractor.com/")
# Show the first table found
print(tables[0])

Yes, it really is that simple. Pandas finds the HTML tables on the page and returns them as a list of DataFrame objects, one per table, so tables[0] is the first table on the page.
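To make the return value concrete, here is a minimal sketch that runs read_html() on an in-memory HTML snippet instead of a live page. The snippet, its city names, and its numbers are invented for illustration and do not come from the site above. The sketch also assumes that an HTML parser such as lxml (or BeautifulSoup with html5lib) is installed, since read_html() relies on one.

```python
from io import StringIO

import pandas as pd

# Hypothetical page content with two tables, invented for illustration.
html = """
<html><body>
<table>
  <tr><th>City</th><th>Calls</th></tr>
  <tr><td>Goleta</td><td>12</td></tr>
  <tr><td>Ventura</td><td>7</td></tr>
</table>
<table>
  <tr><th>Service</th><th>Price</th></tr>
  <tr><td>Repair</td><td>150</td></tr>
</table>
</body></html>
"""

# read_html returns a list with one DataFrame per <table> it finds.
tables = pd.read_html(StringIO(html))
print(len(tables))
print(tables[0])

# match= keeps only the tables whose text matches the given string or regex.
price_tables = pd.read_html(StringIO(html), match="Price")
print(price_tables[0])
```

On a real page with many tables, the match argument is usually more robust than picking a table by its position in the list, because the position can change when the page layout changes.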

Now let's tell Pandas that the first row of the table (row zero, to be exact) contains the column headers, and ask it to parse the values in the date-and-time column into datetime objects.

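import pandas as pd
# The trailing comma unpacks the single table from the returned list
# (this raises ValueError if the page does not contain exactly one table).
calls_df, = pd.read_html("URL", header=0, parse_dates=["Call Date"])
print(calls_df)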
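Since "URL" above is only a placeholder, the following sketch shows what the same call might look like against a small in-memory table. The table contents (callers, dates, durations) are made up for illustration. Only the "Call Date" column name is taken from the example above.

```python
from io import StringIO

import pandas as pd

# Hypothetical call log, invented for illustration. The header is an
# ordinary <td> row, so header=0 is needed to turn it into column names.
html = """
<table>
  <tr><td>Call Date</td><td>Caller</td><td>Duration (min)</td></tr>
  <tr><td>2019-03-01 10:15:00</td><td>Alice</td><td>4</td></tr>
  <tr><td>2019-03-02 16:40:00</td><td>Bob</td><td>11</td></tr>
</table>
"""

# Unpack the only table and parse "Call Date" into datetime values.
(calls_df,) = pd.read_html(StringIO(html), header=0, parse_dates=["Call Date"])

print(calls_df.dtypes)

# With a datetime column, the .dt accessor becomes available.
print(calls_df["Call Date"].dt.day_name())
```

Parsing the dates up front means the column can be sorted, filtered by range, or resampled without a separate conversion step.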

Now you know how to use Python and Pandas to quickly pull data from almost any site that publishes it in HTML tables, without much effort.
