Bit Project

Public Data API as a Healthcare Tool

Vibha Shastry

APIs in Healthcare

In today’s world, data exchange underpins industries that rely on the Internet and thrive on acquiring data. In the healthcare industry, however, the exchange of health data is underinvested, which affects the quality of patient care and other health care services. For example, a patient may not be able to easily access their health records, or hospitals may not be able to transfer health records efficiently. Recently, application programming interfaces (APIs) have been touted as a tool that can streamline this exchange of data and, in turn, improve patient care.

APIs are essentially code that allows computer systems to interact with one another. They are closely tied to the concept of interoperability, the exchange of information between computer systems. In healthcare, we see interoperability in the form of electronic health records (EHRs). While such records can be effective in storing patient data, hospitals frequently keep multiple health records per patient, which complicates retrieving and sharing patient data. With APIs, multiple EHRs can be consolidated into one profile or system, which makes transferring patient data more convenient.

Because APIs are already widely used in the tech industry, they have many potential applications in healthcare. Clinical studies can address underrepresented groups with a greater variety of data. Greater accessibility to health data can improve our understanding of diseases and treatments in various populations. In addition, expanding the use of APIs in the healthcare system can make data more transportable through the internet. To illustrate how exactly APIs can be used, we will go through an example that involves accessing data using an API and R, a programming language for statistics and data analysis. For this activity, we will be using RStudio.

API Mini Tutorial

Suppose that you are conducting a clinical research study on COVID-19 cases in the United States. While you have data on the number of confirmed cases from your local hospitals, you would like to compare those to the nationwide number of cases, preferably by state. Here, we will use an open API, which is publicly available on the Internet. This open API has been retrieved from a COVID-19 API website: https://api.covid19api.com/live/country/usa/status/confirmed.

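In RStudio, we first need to load a package using the library() function. We will use a package called jsonlite, which parses JSON (JavaScript Object Notation) data into R objects so that it is more readable and organized. If jsonlite is not yet installed, run install.packages("jsonlite") once beforehand. JSON is a lightweight, language-independent text format for structuring data, and many web APIs return their results in it.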

> library(jsonlite)
> covid_data <- fromJSON("https://api.covid19api.com/live/country/usa/status/confirmed")
> print(covid_data)

We then want to be able to view the data in R. The fromJSON() function parses JSON data into an R object. To avoid having to repeat this call over and over again, we assign its result to a variable using the assignment operator (written “<-”). This also makes it easier to refer back to the data when we work with it later. In the code above, the variable is named “covid_data”.
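To see what fromJSON() does without depending on the live endpoint, here is a minimal sketch that parses a small hand-written JSON string. The JSON text is an assumption made up for illustration: it reuses field names and values from the sample output in this article, but it is not the actual response of the API.

```r
# install.packages("jsonlite")  # run once if jsonlite is not yet installed
library(jsonlite)

# Hand-written JSON for illustration only (not real API output)
json_text <- '[
  {"Country": "United States of America", "Province": "Michigan", "Confirmed": 24244},
  {"Country": "United States of America", "Province": "Colorado", "Confirmed": 7307}
]'

# fromJSON() turns an array of JSON objects into a data frame, one row per object
parsed <- fromJSON(json_text)

print(class(parsed))     # shows the type of R object that was created
print(parsed$Province)   # each JSON field becomes a column
```

fromJSON() accepts either a JSON string, as here, or a URL, as in the tutorial; in both cases the result can be stored in a variable and reused.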

If you type print(covid_data), the following statistics will show up in R. Here, we can see the number of confirmed cases by state (note that this is not the entire dataset, just a view of what the data looks like):

                   Country CountryCode                 Province City CityCode   Lat     Lon  Confirmed 
1  United States of America          US                 Michigan               43.33  -84.54     24244   
2  United States of America          US                 Colorado               39.06 -105.31      7307   
3  United States of America          US               Washington                47.4 -121.49     10530    
4  United States of America          US              Mississippi               32.74  -89.68      2781    
5  United States of America          US                Tennessee               35.75  -86.69      5508    
6  United States of America          US             South Dakota                44.3  -99.44       730      
7  United States of America          US             Pennsylvania               40.59  -77.21     22997   
8  United States of America          US              Puerto Rico               18.22  -66.59       897     
9  United States of America          US           Grand Princess               37.65 -122.67       103      
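Once the data is in a data frame, comparing states takes only a few lines of base R. The sketch below rebuilds a handful of rows from the output above by hand, so it runs without the API; the 10,000-case threshold is a hypothetical cutoff chosen only for illustration.

```r
# Rebuild a few rows from the sample output above (base R only, no API call)
covid_subset <- data.frame(
  Province  = c("Michigan", "Colorado", "Washington", "South Dakota", "Pennsylvania"),
  Confirmed = c(24244, 7307, 10530, 730, 22997),
  stringsAsFactors = FALSE
)

# Sort states from most to fewest confirmed cases
by_cases <- covid_subset[order(covid_subset$Confirmed, decreasing = TRUE), ]

# Keep only states above a hypothetical threshold of 10,000 cases
high_case_states <- by_cases[by_cases$Confirmed > 10000, ]
print(high_case_states)
```

The same indexing pattern, data[rows, ], is what the sampling step later in this tutorial relies on.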

What if we only wanted to look at a smaller dataset? That is, what if we wanted to randomly sample only 60% of the data we have? We first call the set.seed() function, which fixes the starting point of R's random number generator so that the same random 60% subset is drawn every time the code is run. We define the “population size” (the original data) with nrow(covid_data), which returns the number of rows. Then we define the “sample size” (our 60% subset) using round(), which rounds 60% of the population size to a whole number. To identify the sample drawn from our data, we set sample_id equal to sample(pop_size, sample_size).

The command sample(pop_size, sample_size) draws sample_size row numbers at random, without replacement, from the numbers 1 through pop_size, so the sample is 60% of the dataset (notice what we had set sample_size equal to). Indexing with covid_data[sample_id, ] then keeps only those rows, and the resulting sample can be viewed through print(covid_data_sample):

> set.seed(1342)
> pop_size <- nrow(covid_data)
> sample_size <- round(pop_size * 0.6)
> sample_id <- sample(pop_size, sample_size)
> covid_data_sample <- covid_data[sample_id, ]
> print(covid_data_sample)
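The role of set.seed() is easiest to see by drawing the same sample twice. The following is a minimal sketch under stated assumptions: demo_data is a made-up 10-row stand-in for covid_data, and draw_sample() is a hypothetical helper that wraps the steps above; neither is part of the original tutorial.

```r
# Stand-in for covid_data: a hypothetical data frame with 10 rows
demo_data <- data.frame(id = 1:10, value = (1:10) * 100)

# Hypothetical helper wrapping the sampling steps from the tutorial
draw_sample <- function(data, fraction, seed) {
  set.seed(seed)                               # fix the random number generator state
  pop_size <- nrow(data)                       # number of rows in the data
  sample_size <- round(pop_size * fraction)    # number of rows to keep
  sample_id <- sample(pop_size, sample_size)   # row numbers, drawn without replacement
  data[sample_id, ]
}

first_draw  <- draw_sample(demo_data, 0.6, seed = 1342)
second_draw <- draw_sample(demo_data, 0.6, seed = 1342)

print(nrow(first_draw))                     # 6 rows, i.e. 60% of 10
print(identical(first_draw, second_draw))   # TRUE: same seed, same sample
```

Using the same seed makes the analysis repeatable, which matters when results from a sample need to be checked or shared with collaborators.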

Oftentimes, datasets can be huge, so it is helpful to take smaller random samples and perform statistical analysis on them. APIs are helpful not only for retrieving such information but also for transferring and interpreting data efficiently. They can help create better information systems in healthcare, a sector where data is extremely valuable.

The Future of Digitizing Health Information

Besides multiple applications to patient care, APIs can have many potential roles in health research. Clinical studies, which involve human participants and aim to learn more about disease and other health-related topics, can also get access to a variety of data through healthcare APIs. Rather than requiring data to be downloaded directly from a database (which often results in bad formatting), APIs return data in a more readable and organized form. Through our mini tutorial, we got to see how data can be easily looked at through an API. Such an approach is not only useful in areas of clinical research; it will also be an important tool that could help make health data more accessible in the future.

