Scraping Subsidy and Government Grant Databases
Government subsidies and grants represent billions in funding, but finding relevant opportunities means navigating dozens of fragmented databases. Let's build a scraper that aggregates grant data into a searchable pipeline.
Key Data Sources
- grants.gov — US federal grants (API available)
- USAspending.gov — Federal spending data (API)
- EU Open Data Portal — European funding
- State-level portals — Vary by state
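Each of these sources returns different field names, so it pays to normalize everything into one record shape early. A minimal sketch using a hypothetical schema (the field names here are our own choice, not any API's):

```python
from dataclasses import dataclass, asdict

@dataclass
class GrantRecord:
    """Common shape for grants pulled from any source (hypothetical schema)."""
    source: str        # e.g. "grants.gov", "usaspending", "state:CA"
    title: str
    agency: str = ""
    amount: float = 0.0
    deadline: str = ""  # ISO date string when the source provides one

# Example: normalizing one grants.gov hit into the common shape
rec = GrantRecord(source="grants.gov", title="AI Research Grant", amount=500000.0)
print(asdict(rec))
```

Downstream code (deduplication, CSV export, alerting) then only has to understand one schema.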
Setting Up
```bash
pip install requests beautifulsoup4 pandas schedule
```
Grants.gov API
```python
import requests

GRANTS_URL = "https://apply07.grants.gov/grantsws/rest/opportunities/search/"

def search_grants(keyword, page=1):
    """Search grants.gov for forecasted and posted opportunities."""
    payload = {
        "keyword": keyword,
        "oppStatuses": "forecasted|posted",
        "rows": 25,
        "startRecord": (page - 1) * 25,
    }
    resp = requests.post(GRANTS_URL, json=payload,
                         headers={"Content-Type": "application/json"})
    resp.raise_for_status()
    data = resp.json()

    opportunities = []
    for opp in data.get("oppHits", []):
        opportunities.append({
            "title": opp.get("title", ""),
            "agency": opp.get("agencyCode", ""),
            "opp_number": opp.get("number", ""),
            "close_date": opp.get("closeDate", ""),
            # Award fields can come back as strings or be missing; coerce to float
            "award_ceiling": float(opp.get("awardCeiling") or 0),
            "award_floor": float(opp.get("awardFloor") or 0),
        })
    return opportunities, data.get("hitCount", 0)

grants, total = search_grants("artificial intelligence")
print(f"Found {total} AI-related grants")
for g in grants[:5]:
    print(f"  {g['title'][:60]} - ${g['award_ceiling']:,.0f}")
```
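Each response is capped at 25 rows (the `rows` value above), so collecting every hit means paging through `startRecord` offsets until `hitCount` is exhausted. A sketch of a paging helper; `fake_fetch` is a stand-in for `search_grants` so the example runs without network access:

```python
import math

def paginate(fetch, keyword, page_size=25, max_pages=10):
    """Collect all results by calling fetch(keyword, page=N) page by page.

    `fetch` follows the same (results, total) contract as search_grants.
    """
    results, total = fetch(keyword, page=1)
    results = list(results)
    pages = min(math.ceil(total / page_size), max_pages)  # cap requests
    for page in range(2, pages + 1):
        batch, _ = fetch(keyword, page=page)
        results.extend(batch)
    return results

# Demo with a fake fetcher standing in for search_grants (no network needed):
def fake_fetch(keyword, page=1):
    total = 60  # pretend the API reports 60 hits
    start = (page - 1) * 25
    return [{"opp_number": str(i)} for i in range(start, min(start + 25, total))], total

all_hits = paginate(fake_fetch, "ai")
print(len(all_hits))  # 60
```

In production, pass `search_grants` as the `fetch` argument and add a short delay between pages.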
USAspending.gov API
```python
USA_SPENDING = "https://api.usaspending.gov/api/v2"

def search_spending(keyword, limit=50):
    payload = {
        "filters": {
            "keywords": [keyword],
            # The endpoint requires award_type_codes; 02-05 are grant types
            "award_type_codes": ["02", "03", "04", "05"],
            "time_period": [{"start_date": "2025-01-01", "end_date": "2026-12-31"}],
        },
        # spending_by_award also requires an explicit field list
        "fields": ["Award ID", "Recipient Name", "Award Amount"],
        "limit": limit,
        "page": 1,
    }
    resp = requests.post(f"{USA_SPENDING}/search/spending_by_award/", json=payload)
    resp.raise_for_status()
    return resp.json().get("results", [])

awards = search_spending("machine learning")
for a in awards[:5]:
    print(f"  {a.get('Recipient Name', 'N/A')}: ${a.get('Award Amount', 0):,.0f}")
```
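Once results are in hand, a quick pandas aggregation shows where the money concentrates. The rows below are illustrative sample data mimicking the column names `spending_by_award` returns:

```python
import pandas as pd

# Sample rows in the shape the API returns (values are made up for illustration)
awards = [
    {"Recipient Name": "Acme Labs", "Award Amount": 1_200_000},
    {"Recipient Name": "Beta Corp", "Award Amount": 300_000},
    {"Recipient Name": "Acme Labs", "Award Amount": 500_000},
]

df = pd.DataFrame(awards)
# Total funding per recipient, largest first
totals = df.groupby("Recipient Name")["Award Amount"].sum().sort_values(ascending=False)
print(totals)
```

The same pattern works on real responses, since the API returns one dict per award.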
Scraping State-Level Portals
Many state grant portals lack APIs. Use ScraperAPI for JavaScript-heavy sites:
```python
import requests
from bs4 import BeautifulSoup

def scrape_state_grants(state_url):
    """Render a JS-heavy portal via ScraperAPI, then parse the listings."""
    params = {
        "api_key": "YOUR_SCRAPERAPI_KEY",  # replace with your own key
        "url": state_url,
        "render": "true",
        "wait_for_selector": ".grant-listing",
    }
    resp = requests.get("https://api.scraperapi.com", params=params)
    soup = BeautifulSoup(resp.text, "html.parser")

    grants = []
    # The selectors below are examples; adjust them to the portal's actual markup
    for item in soup.select(".grant-listing"):
        title = item.select_one(".title")
        amount = item.select_one(".amount")
        deadline = item.select_one(".deadline")
        if title:
            grants.append({
                "title": title.get_text(strip=True),
                "amount": amount.get_text(strip=True) if amount else "N/A",
                "deadline": deadline.get_text(strip=True) if deadline else "N/A",
            })
    return grants
```
Building an Alert System
```python
import time
import pandas as pd
import schedule

def check_new_grants():
    keywords = ["AI", "machine learning", "data science", "cybersecurity"]
    all_grants = []
    for kw in keywords:
        grants, _ = search_grants(kw)
        all_grants.extend(grants)

    df = pd.DataFrame(all_grants).drop_duplicates(subset="opp_number")
    df.to_csv("grants_latest.csv", index=False)
    print(f"Updated: {len(df)} unique grants found")

schedule.every().day.at("08:00").do(check_new_grants)

# schedule only registers jobs; a loop must keep the process alive to run them
while True:
    schedule.run_pending()
    time.sleep(60)
```
Scale with ThorData proxies and monitor with ScrapeOps.
Key Takeaways
- Federal APIs (grants.gov, USAspending) provide structured grant data
- State portals often need JavaScript rendering for scraping
- Automated monitoring catches new opportunities early
- Combining federal and state data creates comprehensive views
Government data is public by law. Respect rate limits and use data responsibly.
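One way to respect rate limits is a small client-side throttle called before each request. This is a minimal sketch with an arbitrary default rate, not tied to any particular API's published limits; the injectable `clock` and `sleep` parameters just make it easy to test:

```python
import time

class Throttle:
    """Client-side rate limiter: allow at most `rate` calls per `per` seconds."""

    def __init__(self, rate=5, per=1.0, clock=time.monotonic, sleep=time.sleep):
        self.interval = per / rate  # minimum spacing between calls
        self.clock = clock
        self.sleep = sleep
        self.next_ok = clock()      # earliest time the next call is allowed

    def wait(self):
        """Block until the next call is permitted, then reserve the next slot."""
        now = self.clock()
        if now < self.next_ok:
            self.sleep(self.next_ok - now)
        self.next_ok = max(now, self.next_ok) + self.interval

# Usage: call throttle.wait() immediately before each API request
throttle = Throttle(rate=2, per=1.0)  # at most 2 requests per second
```

Wrapping every `requests.post` call with `throttle.wait()` keeps the scraper polite even when looping over many keywords or pages.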