Microsoft's Power BI is a powerful tool for data analysis and data visualization. For all of it's shortcomings, one of its most appealing features is its accessibility; from the individual to corporate enterprise integration.
In the recent weeks, I've encountered a similar question accross forums and that is: How do I work with an API in Power BI? Hence, the purpose of this post!
There are a couple of ways to work with APIs in Power BI, but the method that I reach for most is Python's Request module. Below, we'll step through the process of installing the necessary Python packages, enabling Python in Power BI, and getting data from the US Federal Election Commission's API.
Installing the Necessary Python Packages
For the purposes of this post, I'm assuming you already have Python installed on your computer. If not, you can download the latest version from Python's website.
The two main Python packages you will need is Requests and Pandas. Both Requests and Pandas can be installed using pip install. You can install both libraries with the same pip install command by seperating the two with a space.
Next, you'll want to make sure your Power BI is pointed to the correct file path to run Python scripts. Open a new instance of Power BI and navigate to File and Options and settings.
Constructing the API Request
The next step is to construct the API request. I normally start writing my API request in a code editor before running it as a script in Power BI. Again, we'll be using the US Federal Election Commission's API for this GET request. It is an open API but you will need to create an account to obtain and API key, which you can do at the FEC website.
import requests
import pandas as pd
We start by importing the necessary libraries to our Python file. If you're interested in learning more about the respective libraries, you can find the full documentation at Requests and Pandas. It's convention to import pandas with the alias of pd to prevent having to write pandas everytime we want to use it.
Next, we'll assign our API key to a variable, key
. In most instances you can create an API request with just a couple lines of code.
key = 'some_key'
url = f'https://api.open.fec.gov/v1/candidates/?api_key={key}&page=1&sort_hide_null=false&sort_null_only=false&sort_nulls_last=false&per_page=100&sort=name'
r = requests.get(url).json()
That's it. That is all you need to make a GET request to an API. Most APIs are stored in the form of JSON files, that is why we need the .json()
method. Make sure to read the documentation for the API you're using. It will describe how the data will returned to you.
For this API call, we're going to write it a bit different. The FEC API results are paginated; meaning the results are returned as a per page. We're going to utilize a for
loop to retreive the first 500 results (100 per page, which you can see is a parameter in the url string).
The code above creates a function called api_call that accepts a parameter num
and performs the basic GET request from earlier, substituting the page number in the url string with the parameter that's passed into the api_call function. Below is what is returned when we make the GET request.
{
"api_version": "1.0",
"pagination": {
"per_page": 100,
"count": 44523,
"pages": 2227,
"page": 1
},
"results": [
{
"state": "US",
"candidate_status": "N",
"last_file_date": "2021-05-05",
"federal_funds_flag": false,
"load_date": "2021-07-27T06:29:31+00:00",
"party": "PAF",
"cycles": [
2022
],
We want the results
key, so we will store that in a DataFrame object. This is where Pandas comes into play. Power BI works best with Pandas' DataFrames. Within the api_call function, we store each request in a variable df
and return that variable.
Next, we'll create a for loop to loop through the first 5 pages and append each result to an empty list called frames. After the for loop has ran, we'll create a new variable to store the final DataFrame and use the .concat()
method and pass in the frames list. Using the .concat()
method without an axis
kwarg will append the DataFrames to one another.
We'll run df_final.info()
to see the results:
And...it worked! We can see we have 500 entries contained in this DataFrame.
Our last step is to take that code and paste it into Power BI.
In Power BI, click Get Data, and type Python as the data source. The Python script editor will open and you can paste your code into that editor and click okay.
After the script has run, we'll get the Power Query Navigator to select which data we want to load.
We'll Transform Data to open in the Power Query Editor and make any ETL adjustments needed.
And that is it. If you look at the source step in the Power Query editor, you'll see your Python script is stored. Which means each time you refresh the data, the script will run the API request again and produce the most up to date information.
Practice
For a fun project to go your skills, use the pagination
key in the oringial result and extract the count of total pages in the request. Use that as the top end of your for loop to retreive ALL the data.
Final Thoughts
I hope you've found this at least somewhat helpful. I love working with APIs, it's really want got me into coding in the first place.
Also, I don't write a lot, but I woould love to start doing this more. I'd love feedback on content and writing style, tips and tricks, whatever.
Happy coding!!
Top comments (1)
Thanks a lot for this post. I was hoping this might be adaptable to the Power BI service, but it looks like python in the service doesn't support the requests package? Kind of a bummer.