In today’s world, the exchange of data is a key element supporting industries that rely on the Internet and thrive on acquiring data. In the healthcare industry, however, the exchange of health data has received too little investment, which affects the quality of patient care and other healthcare services. For example, a patient may not be able to easily access their health records, or hospitals may not be able to transfer health records efficiently. Recently, application programming interfaces (APIs) have been touted as a tool that can streamline this exchange of data and, in turn, improve patient care.
APIs are essentially code that allows computer systems to interact with one another. They are closely tied to the concept of interoperability, the exchange of information between computer systems. In healthcare, we see interoperability in the form of electronic health records (EHRs). While such records can be effective for storing patient data, hospitals frequently keep multiple health records per patient, which complicates retrieving and sharing that data. With APIs, multiple EHRs can be consolidated into one profile or system, which makes transferring patient data more convenient.
Because APIs are already widely used across the tech industry, they have many potential applications in healthcare. Clinical studies can draw on a greater variety of data to address underrepresented groups. Greater accessibility to health data can improve our understanding of diseases and treatments in various populations. In addition, expanding the use of APIs in the healthcare system can make data more transportable through the internet. To illustrate exactly how APIs can be used, we will go through an example that involves accessing data using an API and R, a programming language for statistics and data analysis. For this activity, we will be using RStudio.
Suppose that you are conducting a clinical research study on COVID-19 cases in the United States. While you have data on the number of confirmed cases from your local hospitals, you would like to compare them with the nationwide number of cases, preferably by state. Here, we will use an open API, meaning one that is publicly available on the Internet. The endpoint comes from a COVID-19 API website: https://api.covid19api.com/live/country/usa/status/confirmed.
We first want to be able to view the data in R. The fromJSON() function, provided by the jsonlite package, parses JSON data into an R object. To avoid repeating this command every time we need the data, we store its result in a variable using the assignment operator (written “<-”). This also makes it easier to refer back to the data when we work with it later. Here, we create the variable “covid_data”:
> library(jsonlite)
> covid_data <- fromJSON("https://api.covid19api.com/live/country/usa/status/confirmed")
> print(covid_data)
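To see what fromJSON() does without calling the live endpoint, the following minimal sketch parses a small JSON string instead of a URL. It assumes the jsonlite package is installed and that the API returns a JSON array of records. The inline string and the names json_text and demo_data are illustrative only, with values copied from the sample output in this tutorial.

```r
library(jsonlite)  # provides fromJSON(); install.packages("jsonlite") if needed

# Assumption: a two-record JSON array shaped like the API response.
# The values are copied from the tutorial's sample output.
json_text <- '[
  {"Country": "United States of America", "CountryCode": "US",
   "Province": "Michigan", "Confirmed": 24244},
  {"Country": "United States of America", "CountryCode": "US",
   "Province": "Colorado", "Confirmed": 7307}
]'

# fromJSON() accepts a JSON string, a file path, or a URL.
# An array of records like this one is simplified into a data frame.
demo_data <- fromJSON(json_text)

# Inspect the column names and types of the parsed result.
str(demo_data)
```

Each JSON record becomes one row and each key becomes one column, which is why the result can be printed and subset like any other data frame in R.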
If you type print(covid_data), the following output will appear in R. Here, we can see the number of confirmed cases by state (note that this is not the entire dataset, just a preview of what the data look like):
   Country                   CountryCode  Province        City  CityCode    Lat      Lon  Confirmed
1  United States of America  US           Michigan                        43.33   -84.54      24244
2  United States of America  US           Colorado                        39.06  -105.31       7307
3  United States of America  US           Washington                       47.4  -121.49      10530
4  United States of America  US           Mississippi                     32.74   -89.68       2781
5  United States of America  US           Tennessee                       35.75   -86.69       5508
6  United States of America  US           South Dakota                     44.3   -99.44        730
7  United States of America  US           Pennsylvania                    40.59   -77.21      22997
8  United States of America  US           Puerto Rico                     18.22   -66.59        897
9  United States of America  US           Grand Princess                  37.65  -122.67        103
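Once the data are in a data frame, a by-state comparison can start from just the state name and the case count. The sketch below is one possible approach rather than part of the original workflow. It rebuilds the nine preview rows by hand, so the names covid_view and by_cases are hypothetical, and it sorts the rows by confirmed cases using base R only.

```r
# Assumption: a hand-built copy of the nine preview rows, keeping only
# the two columns needed for a by-state comparison.
covid_view <- data.frame(
  Province  = c("Michigan", "Colorado", "Washington", "Mississippi",
                "Tennessee", "South Dakota", "Pennsylvania",
                "Puerto Rico", "Grand Princess"),
  Confirmed = c(24244, 7307, 10530, 2781, 5508, 730, 22997, 897, 103),
  stringsAsFactors = FALSE
)

# order() returns row positions sorted by Confirmed, largest first;
# indexing with those positions reorders the rows.
by_cases <- covid_view[order(covid_view$Confirmed, decreasing = TRUE), ]

# Show the three entries with the most confirmed cases.
print(head(by_cases, 3))
```

With the real covid_data object, the same idea would apply to covid_data[, c("Province", "Confirmed")], assuming those column names match the API response.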
What if we only wanted to look at a smaller dataset, for instance a random sample of 60% of the data we have? We start with the set.seed() function, which fixes the starting point of R's random number generator so that the same "random" 60% subset is drawn every time the code is run. We define the “population size” (the number of records in the original data) with nrow(covid_data), which returns the number of rows. Then we define the “sample size” (60% of the population size) using round(), which rounds the result to a whole number. To identify which rows are drawn, we set sample_id equal to sample(pop_size, sample_size).
The command sample(pop_size, sample_size) randomly draws sample_size distinct row numbers from 1 to pop_size, so the sample is 60% of the dataset (notice what we set sample_size equal to). Indexing with covid_data[sample_id, ] keeps only those rows, and the resulting sample can be viewed with print(covid_data_sample):
> set.seed(1342)
> pop_size <- nrow(covid_data)
> sample_size <- round(pop_size * 0.6)
> sample_id <- sample(pop_size, sample_size)
> covid_data_sample <- covid_data[sample_id, ]
> print(covid_data_sample)
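The role of set.seed() is easiest to see by drawing the same sample twice. The following sketch uses a hypothetical nine-row stand-in for covid_data (the data frame demo and the names first_draw and second_draw are illustrative) and relies only on base R.

```r
# Hypothetical stand-in for covid_data: nine rows, like the preview.
demo <- data.frame(id = 1:9)

pop_size    <- nrow(demo)              # number of rows: 9
sample_size <- round(pop_size * 0.6)   # round(5.4) gives 5

# Draw row numbers after fixing the seed.
set.seed(1342)
first_draw <- sample(pop_size, sample_size)

# Reset the same seed and draw again.
set.seed(1342)
second_draw <- sample(pop_size, sample_size)

# Same seed, same sample.
print(identical(first_draw, second_draw))   # TRUE

# sample() draws without replacement by default, so no row repeats.
print(length(unique(first_draw)))           # 5
```

Without the call to set.seed(), each run would generally pick a different set of rows, which makes results harder to reproduce and share.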
Oftentimes, datasets can be huge, so it is helpful to take smaller random samples and perform statistical analysis on them. APIs make it possible to transfer such data efficiently and to bring it straight into tools where it can be interpreted. They can help create better information systems in healthcare, a sector where data is extremely valuable.
Besides their applications to patient care, APIs have many potential roles in health research. Clinical studies, which involve human participants in order to learn more about diseases and other health-related topics, can gain access to a wider variety of data through healthcare APIs. Rather than downloading data directly from a database (which often leaves the data poorly formatted), researchers can use APIs to receive data in a more readable and organized form. Through our mini tutorial, we saw how easily data can be viewed through an API. Such an approach is not only useful in clinical research; it will also be an important tool that could help make health data more accessible in the future.