Arun

Posted on Sep 6, 2021 • Edited on Sep 11, 2021

NIH Grants - Obtaining information on NIH Projects

#news #programming #api #nih

Covered in this:

Background
About This Post
Immediate Trigger For This Post
Obtaining Information On NIH Grants
Obtaining Results Using API
Limitations Using API
Providing Input To The API, Retrieving Output From The API
Using PowerShell To Send Request (input) And Receive And Process Response (output)
Interpreting Results
Coming Up Next...

Background

On March 11th, 2020, the World Health Organization officially declared COVID-19, the disease caused by SARS-CoV-2, a pandemic. Many administrations have since directed their energies towards restrictions and various mandates. This is not a political thread; there are other avenues for taking those matters up. Let us keep this post focused on technology: APIs, PowerShell, and related software topics.

About This Post

There has been a lot of talk about Gain of Function research and how NIH (a US entity funded by taxpayers) might have provided grants for various projects, some of which might have gone towards such research. As those conversations are political in nature, they fit better on other platforms. I doubt dev.to is the place for them.

The goal of this post is to explain how to obtain information on grants provided by NIH towards various projects. Specifically this post provides details on how to obtain information on specific projects which involve keywords, for example Furin.

Although this post touches on several ways of obtaining information from NIH, it is largely geared towards using the NIH APIs. I will cover a specific case that obtains a list of projects involving the keyword Furin using the NIH RePORTER API. You will be able to reuse the idea to obtain a list of projects that feature any other keyword, simply by replacing the text furin in the request with the keyword you wish to query.

General information on how to use NIH APIs for other purposes will be covered in a separate post. For example, you may want to obtain a list of projects awarded to a specific "science expert" (in NIH language, termed "Principal Investigator" or PI) using the NIH RePORTER API. I will create another post to explain different search conditions that may be useful to get a better picture of how NIH grants were awarded and to whom. So please stay tuned.

Alternatively, you may check my Twitter handle @arunschirps

This Twitter thread holds an elaborate list of NIH grants towards key players and organizations whose projects might be of interest to many:

In these changing times, where information is the first victim and struggles to reach people, sharing knowledge selflessly is a collective effort. Many people contribute towards this effort sincerely - they know they are pitted against media juggernauts and risk social isolation. Information on how NIH grants (could?) have been misused does not appear to be anything that the mainstream media is interested in researching or reporting. Social media can be used in a constructive way as well - not all that happens in that sphere is misinformation or disinformation. The Twitter thread highlighted above is a very small effort from my side. I obtained information regarding NIH grants as and when questionable research called for further analysis - sort of like "follow the money".

If I can find a way to obtain information on grants for various projects funded by the NIH, anyone can. I have been contemplating writing a post detailing how I did it. It just happened today.

Immediate Trigger For This Post

The immediate trigger for this post, sharing what I know about how to probe details on NIH funding, came from this tweet by @Bobby_Network

The quoted tweet from @Bobby_Network expanded the visibility for a tweet from Twitter User @TheSeeker268 who highlighted "references of furin protease cleavage sites in Chinese Coronavirus research papers"

This got me thinking - I became curious to check information on projects executed (and currently being executed) using NIH grants which involve the keyword Furin.

That check (into NIH's own funding) yielded many projects - about 464 projects at the time of this writing. I am not sure how many of those 464 could have expressed interest in Gain of Function research. The tweet below contains a screen capture of one of the projects and mentions the count returned by the NIH RePORTER API:

Exploring which of these projects may involve questionable (yes, a very light word indeed) scientific research requires skills different from software/programming skills. It is best for experts trained in the appropriate field to weigh in on it. However, obtaining information from NIH using (a) the RePORTER Website, (b) the ExPORTER database, and (c) most importantly the RePORTER API (all three covered here), and understanding the merits and demerits of each, is probably not a skill that experts in those fields possess. Their hands are already full making the best use of their expertise where they are trained to apply it.

Hence this post, which explains how to obtain information from NIH on projects that, for example, feature the keyword Furin. This is an effort to bridge the gap in skillset between experts in the two fields. A software developer may not understand the details (very terse on many projects) documented in the NIH RePORTER Website or ExPORTER database, or obtained using the RePORTER API. An expert who does understand them may not have the know-how to obtain the information without navigating through a huge number of links and painful searches.

Obtaining Information On NIH Grants

1. Using NIH RePORTER Website:

NIH provides a web based SaaS tool available at RePORTER Website

Users can search for anything - names of Principal Investigators, Organizations, Funding Organizations, free-form text search. While it provides a quick way to check on NIH grants, there are challenges using this. I will cover them in a separate post later.

2. Using NIH RePORTER ExPORTER Database:

NIH also provides a web based downloadable database at RePORTER ExPORTER

I haven't used this myself. I guess it provides the whole grants database downloadable in CSV or XML format. I assume the volume of data will be huge - not sure if it is a feasible option to do a quick round of checking using this method. Also, the data obtained this way, I imagine, would be stale, as this is a snapshot of information that is obtained at the moment of the download action.

3. Using NIH RePORTER API:

NIH also facilitates obtaining data using API - the RePORTER API

The RePORTER API provides a useful way to run a quick query with a set of conditions (criteria). It is pretty ReSTful, and its response is a string in a special format (JSON format, more about it in this section below).

This Service, supplemented with knowledge of platforms/frameworks like PowerShell, helped me in creating the content for the Twitter thread mentioned before.

You need not be restricted in your choice of frameworks/tools/utilities in your pursuit to get data using the RePORTER API. You can use any tool that can call a ReSTful API endpoint - PowerShell, the curl utility, Postman, et al. More about this below.

For extensive use - for a web or native app - I recommend using a full-fledged development environment that relies on your favorite framework, be it Python, .Net, or anything else that you are comfortable with.

The idea provided here, hopefully, will help anyone in pursuit of this. I will attempt to write separate post(s) on this topic when I get some time. Also, honestly, depending on the reach (cough)... so, follow me (wink).

Obtaining Results Using API

Now, let us spring into action. I will explain how to call NIH API and obtain results through one of these methods:

Using NIH API web page's Try it out option
curl utility
Invoke-RestMethod cmdlet in PowerShell

This section will mostly detail the commands and steps to obtain the result. Use any one of the three methods explained below.

The explanation of results will be taken up in the section Interpreting Results below.

1. Using NIH API Web Page's Try it out option:

ALERT
The images are captured at the time of this writing (so, "as is"), but let me know if you see changes in future

navigate to NIH's API documentation here
select a version (V2.0 is the latest)
click anywhere on the section for "Projects" - a blue bar, beginning with POST button towards the left
keep scrolling down, you should see "Try it out" section
click on button titled "Try it out"
you will notice it has changed to "Cancel" - that is ok
you will see, now, you can edit the payload area that exists below the text "Example Value | Schema"
the payload area is where you will provide the appropriate input (like, get me projects in fiscal year 2020, or projects under Fauci and so on) to the API
also, you will notice, now, there is a new blue "Execute" button
click on the payload area
remove all the text inside that textbox
enter this input into the payload area (below the text "Example Value | Schema")

{
  "criteria": {
     "advanced_text_search": {
        "operator": "and",
        "search_text": "furin",
        "search_field": "abstracttext"
     }
  },
  "offset": 0,
  "limit": 500,
  "sortField": "award_amount",
  "sortOrder": "desc"
}

Copy paste is recommended, if you are not familiar with the format here

click on the button "Execute"
after a few seconds, you will notice that the API has returned results below the "Server Response" area
if you prefer, you can click on "Download" button after a successful execution of the API (if the API execution is successful, you should see a 200 below the "Code" column under Server Response area)
at the time of this writing, the downloaded file is saved with a default extension of ".json" - it is possible that some browsers may end up blocking them

2. curl Utility:

curl \
   --header "accept: application/json" \
   --header "Content-Type: application/json" \
   --request POST \
   --data "{\"criteria\":{\"advanced_text_search\":{\"operator\":\"and\",\"search_text\":\"furin\",\"search_field\":\"abstracttext\"}},\"offset\":0,\"limit\":500,\"sortField\":\"award_amount\",\"sortOrder\":\"desc\"}" \
   --url "https://api.reporter.nih.gov/v2/projects/Search"

Copy paste is recommended, if you are not familiar with the format here.

The output can be saved into a text file by adding this option which saves the results into file nih_projects.json:

 --output nih_projects.json
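Once the response is saved to nih_projects.json, any JSON-aware tool can read it back. Here is a minimal sketch in Python (my pick for illustration only; nothing above requires it), assuming the file holds the unmodified API response with its "meta" and "results" nodes. The function name load_saved_response is made up for this example.

```python
import json
from pathlib import Path


def load_saved_response(path):
    """Read a saved API response and return (total, projects)."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    # "meta" carries the overall match count; "results" is the list of projects.
    total = data.get("meta", {}).get("total")
    projects = data.get("results", [])
    return total, projects


# Only runs if curl was used with --output nih_projects.json in this folder.
saved_file = Path("nih_projects.json")
if saved_file.exists():
    total, projects = load_saved_response(saved_file)
    print(f"{total} matching projects, {len(projects)} saved in this file")
```

If the two numbers differ, the search matched more projects than a single request returned.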

3. Invoke-RestMethod PowerShell cmdlet:

If you have PowerShell on your machine, skip right ahead to the code below. If you don't, running the code below comes with a pre-requisite for PowerShell. It is now available on all major platforms - latest version, at the time of this writing, is 7.1. The following links from Microsoft might be of help:

# Search NIH RePORTER for projects whose abstract text mentions "furin".
${nih_projects_furin} = Invoke-RestMethod `
    -Headers @{
        "Content-Type" = "application/json"
        "Accept"       = "application/json"
    } `
    -Method Post `
    -Body ( @{
        "criteria" = @{
            "advanced_text_search" = @{
                "operator"     = "and"
                "search_field" = "abstracttext"
                "search_text"  = "furin"
            }
        }
        "offset"     = 0
        "limit"      = 500
        "sort_field" = "award_amount"
        "sort_order" = "desc"
    } | ConvertTo-Json -Depth 3 ) `
    -Uri "https://api.reporter.nih.gov/v2/projects/Search"

Copy paste is recommended, if you are not familiar with the format here.
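For readers who prefer Python, the same request can be sketched with nothing but the standard library. This is a rough equivalent under my own assumptions, not an official client: the helper names build_payload and search_projects are made up, the endpoint is the one used in the commands above, and the sort keys follow the "sort_field"/"sort_order" spelling of the PowerShell request.

```python
import json
import urllib.request

# Endpoint taken from the curl and PowerShell commands in this post.
SEARCH_URL = "https://api.reporter.nih.gov/v2/projects/Search"


def build_payload(keyword, offset=0, limit=500):
    """Build the abstract-text search request for an arbitrary keyword."""
    return {
        "criteria": {
            "advanced_text_search": {
                "operator": "and",
                "search_field": "abstracttext",
                "search_text": keyword,
            }
        },
        "offset": offset,
        "limit": limit,
        "sort_field": "award_amount",
        "sort_order": "desc",
    }


def search_projects(payload, url=SEARCH_URL, timeout=60):
    """POST the payload as JSON and return the decoded response as a dict."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (makes a real network call, so it is left commented out):
# nih_projects_furin = search_projects(build_payload("furin"))
# print(nih_projects_furin["meta"]["total"])
```

To search for another keyword, pass it to build_payload in place of "furin".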

Limitations Using API

NIH has the final say on how the RePORTER API may be used. Their directive here has this recommendation:

... recommended that users post no more than one URL request per second and limit large jobs to either weekends or weekdays between 9:00 PM and 5:00 AM EST.
Failure to comply with this policy may require administrators to block your IP address from accessing the API service

the count of projects returned by the RePORTER API is restricted to 500 at a time, so if a search condition that you specified matches 700 projects, you will need to specify which of those 700 projects you want to see using these inputs along with "criteria": (a) "offset" - indicating the starting point in the resultset (note this is a zero-based #, so to start from the first project, you have to set it to 0) and (b) "limit" - to specify how many projects you want to retrieve from the API
continuing on this topic, if you want all 700, you will need to fire the API twice - the first time with the offset set to 0 and the limit set to 500, so that you get the first 500 projects, AND THEN a second time with an offset of 500 and a limit of 200 (or more - no harm specifying more)
the "offset" cannot be more than 9,999; in such cases, you will need to be a bit creative (can you tell how? wink)

The last three points are from my own experience. I have NOT found them documented in NIH's API documentation, though I may have missed it. If they are indeed undocumented, they may change in the future.
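The offset/limit arithmetic described above can be sketched in a few lines of Python. Treat this as an illustration under stated assumptions: the 500 cap and the 9,999 offset ceiling are the values observed in this post (not confirmed by NIH documentation), and fetch_page is a hypothetical function you would supply to perform one API call.

```python
import time

MAX_PAGE_SIZE = 500  # per-request cap described in this post
MAX_OFFSET = 9999    # ceiling observed by the author, not taken from NIH docs


def page_windows(total, page_size=MAX_PAGE_SIZE):
    """Return the (offset, limit) pairs needed to cover `total` projects."""
    windows = []
    offset = 0
    # Stops early at MAX_OFFSET, so very large resultsets are only partly covered.
    while offset < total and offset <= MAX_OFFSET:
        windows.append((offset, min(page_size, total - offset)))
        offset += page_size
    return windows


def fetch_all(fetch_page, total, pause_seconds=1.0, sleep=time.sleep):
    """Collect every page, pausing between calls (one request per second).

    `fetch_page(offset, limit)` is a caller-supplied, hypothetical function
    that performs one API call and returns that page's list of projects.
    """
    projects = []
    for index, (offset, limit) in enumerate(page_windows(total)):
        if index > 0:
            sleep(pause_seconds)  # be polite between consecutive requests
        projects.extend(fetch_page(offset, limit))
    return projects
```

For the 700-project case discussed above, page_windows(700) yields an offset of 0 with a limit of 500, followed by an offset of 500 with a limit of 200.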

Providing Input To The API, Retrieving Output From The API

The NIH API expects inputs (what you want to search for - fiscal year, project with ID, investigators and their projects, etc.) to be fed to its Search endpoint in a special string format called the JSON format. In return, its response is also a string in JSON format. I'll briefly explain this format with an e-Mail Service example below. Curious readers can check this section below.

The request format (input to the API) and response format (output that the API returns to you) are documented in the NIH's API documentation here.

As you navigate to that page, you may want to click on the "Select a version" dropdown (V2.0 is the latest). Click on the POST button under that. It should open up with the various kinds of search parameters (criteria) that you may want to pass. This is called the payload - that is, the input that you want to feed into the API. As you see there, the payload is a string in JSON format.

TIP
Images in this section above might be useful in getting a better handle of the explanation here

Likewise, you should be able to go to a "Try it out" as explained in the previous section. The results will be ready a few seconds after you click on the "Execute" button. This - that is the result/response from the API - you will notice, is also a string in JSON format.

ALERT:
The information below, attempting to explain JSON format, uses an over-simplified example for e-Mail Service. Please skip it, if you are knowledgeable about JSON format.

There is nothing fancy about a string in JSON format. Imagine how you log in to the inbox at your favorite email service. You are presented with a login page where you enter your inbox id into the user-id field, along with your password. Behind the scenes, the program at the email service server receives the two pieces of information (user-id, password) in a single package bundled into a key-value-pair format:

{
   "user-id": "arun",
   "password": "DontEvenTryIt!"
}

The email service receives, in this case, a package containing user-id and password. The email service knows the user-id should be accessible next to the special text - the key "user-id", and password in the subsequent field - the key "password". It plucks the values for both the keys and uses them in its algorithm to check against its database to validate if the person who provided them is indeed you. So, in this case, the data regarding credentials (values for the user-id and the password) is bundled into one package from your e-Mail client and sent over to the Server.

Communication between a requestor of data (the Client) and the processor of that request (the Server) is agreed upon as a contract beforehand - otherwise there is no way that the Server will know what the Client is asking for, or verify the identity of the Client if it needs to. One of the important aspects in arriving at a contract is the FORMAT in which the data is passed between them (to and from). This is where passing information as a string in JSON format comes in handy. There are other formats available as well. But it is a choice that the Service provider (Server) and the Client make, pre-defining the format in which the communication happens (or which formats are acceptable, if the two are versatile enough to accept more than one).

In the example for the simple e-Mail Service, we saw the information on "credential" package bundled with "user-id" and "password".

There are other examples for data with different pieces of information bundled into a package.

Person's Name:

{
   "first-name": "John",
   "middle-name": null,
   "last-name": "Doe"
}

US Address:

{
   "street-number": "1600",
   "address-line-1": "Pennsylvania Avenue NW",
   "address-line-2": null,
   "city": "Washington",
   "state": "DC",
   "zip": "20500"
}

Now imagine the complete mailing address.

Mailing Address:

{
   "user-name": {
      "first-name": "John",
      "middle-name": null,
      "last-name": "Doe"
   },
   "mailing-address": {
      "street-number": "1600",
      "address-line-1": "Pennsylvania Avenue NW",
      "address-line-2": null,
      "city": "Washington",
      "state": "DC",
      "zip": "20500"
   }
}

The mailing address example shows how a bundle of information can hold other bundles of information (two in the example above - the "name" bundle and the "address" bundle). You can visualize how this can quickly add up. But, as long as the requestor (the Client) and the provider (the Server) agree upon a pre-defined format, parsing even a complicated dataset is not a complicated thing.

This completes a primer for JSON format that will tide you through your quest to fire NIH APIs and understand the data you send to it and the data you receive from it.
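To see the "bundles inside bundles" idea in running code, here is a small sketch in Python (used purely for illustration) that turns the mailing address example into a string in JSON format and back again. It relies only on the standard json module; the variable names are my own.

```python
import json

# The mailing address bundle from above, as a Python dictionary.
mailing_address = {
    "user-name": {"first-name": "John", "middle-name": None, "last-name": "Doe"},
    "mailing-address": {
        "street-number": "1600",
        "address-line-1": "Pennsylvania Avenue NW",
        "address-line-2": None,
        "city": "Washington",
        "state": "DC",
        "zip": "20500",
    },
}

# Serialize: dictionary -> string in JSON format (None is written as null).
as_text = json.dumps(mailing_address, indent=3)

# Parse: string in JSON format -> dictionary, with the nested bundles intact.
parsed = json.loads(as_text)

print(parsed["user-name"]["last-name"])
print(parsed["mailing-address"]["city"])
```

The two print statements pluck values out of the inner bundles by key, which is what a Server does with the package it receives.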

Using PowerShell To Send Request (input) And Receive And Process Response (output)

ALERT:
The next section uses the term "HashTable" interchangeably with "PSCustomObject". This is to simplify the explanation below.

The explanation of the JSON format may have been put simply, but in order to work with a string in JSON format, we need a programming utility, or we need to write one (a program). Like I said before, this is where PowerShell comes in handy. The PowerShell platform makes it convenient to navigate this resultset. We now know that a string in JSON format is essentially a set of key-value pairs. The PowerShell platform (the .Net platform, essentially) has a data structure/datatype that stores data in key-value pairs - the HashTable (note the ALERT above). Because a string in JSON format is made of key-value pairs, such strings convert easily into a variable of type HashTable.

On the PowerShell platform, it is seamless to convert strings in JSON format into objects of type HashTable and vice-versa. The NIH API expects input (also termed the "request" to the API) as a string in JSON format - this is achieved using the ConvertTo-Json cmdlet that PowerShell provides. In return, the API provides output (also termed the "response" from the API) as a string in JSON format - this is converted automatically by the Invoke-RestMethod cmdlet into a HashTable type object.

You will be able to achieve the same using other programming platforms. But if you have a Windows OS, PowerShell is built in.

Like I said before, you can use any tool that can call a ReSTful API endpoint. My choice of PowerShell, in this case, is for convenience - PowerShell provides useful cmdlets like Measure-Object, Group-Object, Select-Object, Where-Object and so on that make obtaining and formatting data pretty easy.

For example, you might want to sum all the dollar amounts granted for the projects that are contained in the resultset. PowerShell has a Measure-Object cmdlet which helps you to find:

count
sum
max
min
average

from the list of projects returned by the API

You might want to retrieve projects funded in the Fiscal Year 2020. PowerShell has a Where-Object cmdlet which can help you filter the results from the resultset returned by the API.

You might want to group projects by Organization and then find the sum of dollar amounts granted to each Organization. PowerShell has a Group-Object cmdlet which can help you group results into various groups based on a specific property like Organization Name.
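If you are working outside PowerShell, the same three ideas (measure, filter, group) are easy to approximate. The Python sketch below is an illustration under assumptions: the sample records are made up to mimic the shape of a project item (they are not real NIH data), the helper names are hypothetical, and a missing or null award_amount is treated as 0.

```python
from collections import defaultdict

# Made-up sample records that mimic the shape of a project item;
# the organization names and amounts are illustrative, not real NIH data.
projects = [
    {"fiscal_year": 2020, "award_amount": 300, "organization": {"org_name": "ORG A"}},
    {"fiscal_year": 2019, "award_amount": 200, "organization": {"org_name": "ORG B"}},
    {"fiscal_year": 2020, "award_amount": 500, "organization": {"org_name": "ORG A"}},
]


def award_amounts(items):
    """Dollar amounts per project; a missing or null amount counts as 0."""
    return [item.get("award_amount") or 0 for item in items]


def measure(amounts):
    """Count, sum, max, min and average, in the spirit of Measure-Object."""
    amounts = list(amounts)
    if not amounts:
        return {"count": 0, "sum": 0, "max": None, "min": None, "average": None}
    return {
        "count": len(amounts),
        "sum": sum(amounts),
        "max": max(amounts),
        "min": min(amounts),
        "average": sum(amounts) / len(amounts),
    }


def in_fiscal_year(items, year):
    """Keep only projects of one fiscal year, in the spirit of Where-Object."""
    return [item for item in items if item.get("fiscal_year") == year]


def total_by_organization(items):
    """Sum of amounts per organization name, in the spirit of Group-Object."""
    totals = defaultdict(int)
    for item in items:
        name = (item.get("organization") or {}).get("org_name")
        totals[name] += item.get("award_amount") or 0
    return dict(totals)


print(measure(award_amounts(in_fiscal_year(projects, 2020))))
print(total_by_organization(projects))
```

With real data, you would pass the "results" list from the API response in place of the sample list.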

Interpreting Results

The API's response, as discussed before is a string in JSON format. It will be something like this:

{
   "meta": {
     "search_id": null,
     "total": 464,
     "offset": 0,
     "limit": 500,
     "sort_field": "award_amount",
     "sort_order": "desc",
     "sorted_by_relevance": true,
     "properties": {}
   },
   "results": [
   ]
}

Notice that the response has two nodes at its root - "meta" and "results". The "meta" node does not provide actual data on the projects matching the input criteria that we specified to the API - instead it contains meta information like the total number of projects that match the criteria, the limit, the sort order of the listing and so on. The "results" node contains the actual project details - as a LIST. The entire list is bound within square brackets. For simplicity, the above sample holds an empty result - hence you see nothing inside the square brackets. In reality, for a search resulting in a few projects, individual projects appear as comma-separated objects (each in JSON format). The format of one such object is given below.

Format/sample of a project item under results node is:

{
  "appl_id": 7592287,
  "subproject_id": null,
  "fiscal_year": 2007,
  "project_num": "1Z01AI000929-05",
  "project_serial_num": "AI000929",
  "organization": {
    "org_name": "NATIONAL INSTITUTE OF ALLERGY AND INFECTIOUS DISEASES",
    "city": null,
    "country": null,
    "org_city": null,
    "org_country": "UNITED STATES",
    "org_state": null,
    "org_state_name": null,
    "dept_type": null,
    "fips_country_code": null,
    "org_duns": null,
    "org_fips": null,
    "org_ipf_code": null,
    "org_zipcode": null,
    "external_org_id": 0
  },
  "award_type": "1",
  "activity_code": "Z01",
  "award_amount": 5625818,
  "is_active": false,
  "project_num_split": {
    "appl_type_code": "1",
    "activity_code": "Z01",
    "ic_code": "AI",
    "serial_num": "000929",
    "support_year": "05",
    "full_support_year": "05",
    "suffix_code": ""
  },
  "principal_investigators": [
    {
      "profile_id": 7841119,
      "first_name": "Stephen",
      "middle_name": "H",
      "last_name": "Leppla",
      "is_contact_pi": true,
      "full_name": "Stephen H Leppla",
      "title": null,
      "email": null
    }
  ],
  "contact_pi_name": "LEPPLA, STEPHEN H",
  "program_officers": [],
  "agency_ic_admin": {
    "code": "AI",
    "abbreviation": "NIAID",
    "name": "National Institute of Allergy and Infectious Diseases"
  },
  "agency_ic_fundings": [
    {
      "fy": 2007,
      "code": "AI",
      "name": "National Institute of Allergy and Infectious Diseases",
      "abbreviation": "NIAID",
      "total_cost": 5625818.0
    }
  ],
  "cong_dist": null,
  "spending_categories": null,
  "project_start_date": null,
  "project_end_date": null,
  "organization_type": {
    "name": null,
    "code": null,
    "is_other": false
  },
  "full_foa": null,
  "full_study_section": null,
  "award_notice_date": null,
  "is_new": false,
  "mechanism_code_dc": "IM",
  "core_project_num": "Z01AI000929",
  "terms": "Furin gene; Toxin ; Vaccines ; Virulence ; ...filtered list...;",
  "pref_terms": "Animal Model;Animals;Anthrax Vaccines;Anthrax disease;Antigens;Bacterial Toxins;Bacteriophages;Binding;Biochemistry;Cell Surface Receptors;vaccine evaluation;...filtered list...;",
  "abstract_text": "Anthrax toxin protective antigen protein (PA, 83 kDa) binds to receptors ...filtered text...",
  "project_title": "Vaccines and Therapeutics for Anthrax",
  "phr_text": null,
  "spending_categories_desc": null,
  "agency_code": "NIH",
  "covid_response": null,
  "arra_funded": "N",
  "budget_start": null,
  "budget_end": null,
  "cfda_code": null,
  "funding_mechanism": "Intramural Research",
  "direct_cost_amt": null,
  "indirect_cost_amt": null,
  "project_detail_url": "https://reporter.nih.gov/project-details/7592287"
}

Notice that, earlier, when we fired off the API using the Invoke-RestMethod cmdlet in PowerShell, we saved the response into a variable named "nih_projects_furin". In PowerShell, a variable is referenced with a leading $; this post writes it as ${nih_projects_furin}, with the name enclosed in braces, which is similar to how a variable is accessed in the bash shell (many things in PowerShell mirror what you do in bash). The braces are optional for a simple name like this one, so $nih_projects_furin works as well.

As we discussed before, the response from API has two nodes - a "meta" node and a "results" node.

In PowerShell, we can explore the "meta" node returned by the API using the "nih_projects_furin" variable's "meta" property. This can be done using the code below:

${nih_projects_furin}.meta

You will observe an output similar to this:

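If you have run the API using the other two methods described above (the Try it out option on the NIH API web page OR the curl utility), you will notice that the "meta" node in the response looks like the one below instead. The sort settings come back as null and "ASC", most likely because those two payloads spell the keys as "sortField" and "sortOrder", whereas the PowerShell request uses "sort_field" and "sort_order", which the API echoes back in the "meta" node shown earlier: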

  "meta": {
    "search_id": null,
    "total": 464,
    "offset": 0,
    "limit": 500,
    "sort_field": null,
    "sort_order": "ASC",
    "sorted_by_relevance": true,
    "properties": {}
  }

Likewise, you can see the "results" node as well - it will be huge, hence I am not posting it here. As you know by now, the "results" node in the response holds a list of projects. To access, let us say, the first project in it using PowerShell, use this self-explanatory code:

${nih_projects_furin}.results[0]

Notice that the items in the list are indexed starting with 0.

Also, the output on this will closely match the response string describing the project node, referenced here.

The "results" node (which is a list, as explained many times) also holds information of the total # of elements in the list. You can obtain that using:

${nih_projects_furin}.results.Count

In this case, since the total project count (464) is less than 500 (and we asked for up to 500 projects starting from the first one), this number should match the "total" that you see under the "meta" node (which was pasted before)
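The same exploration works in any language that can parse JSON. The Python sketch below is only an illustration: it uses a heavily trimmed response in the shape shown earlier (a handful of fields, one project), and is_complete is a made-up helper that compares the number of projects received against the "total" in the "meta" node.

```python
import json

# Heavily trimmed response in the shape shown earlier; only a few fields kept.
response_text = """
{
  "meta": {"total": 464, "offset": 0, "limit": 500},
  "results": [
    {"appl_id": 7592287, "fiscal_year": 2007, "award_amount": 5625818,
     "project_title": "Vaccines and Therapeutics for Anthrax"}
  ]
}
"""

nih_projects_furin = json.loads(response_text)

meta = nih_projects_furin["meta"]        # like ${nih_projects_furin}.meta
results = nih_projects_furin["results"]  # like ${nih_projects_furin}.results
first = results[0]                       # lists are indexed starting with 0


def is_complete(response):
    """True when this one response already holds every matching project."""
    return len(response["results"]) == response["meta"]["total"]


print(meta["total"], len(results), first["project_title"])
print(is_complete(nih_projects_furin))
```

Because the sample keeps only one of the 464 projects, is_complete reports False here; with the full response from the API the two counts should agree.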

Coming Up Next

Processing results from API using PowerShell
Use of Measure-Object, Group-Object, Where-Object PowerShell cmdlets
Displaying information in a tabular format
