Government procurement is a $13 trillion global market. Companies that find relevant tenders first gain a massive competitive advantage. Here's how to build a procurement data scraper.
## Why Scrape Procurement Data?
Government tender portals are fragmented -- each country, state, and municipality runs its own platform. No single API covers everything. Scraping consolidates opportunities into one pipeline.
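Because each portal differs only in its URL and CSS selectors, a small per-portal registry keeps the configuration in one place and makes adding coverage a data change rather than a code change. A minimal sketch — the portal names, URLs, and selectors here are illustrative placeholders, not real portal markup:

```python
# Hypothetical portal registry; all selectors are placeholders for illustration.
PORTAL_REGISTRY = {
    "sam_gov": {
        "url": "https://sam.gov/search/?index=opp",
        "selectors": {"container": ".opportunity-result", "title": ".opportunity-title"},
    },
    "state_portal": {
        "url": "https://procurement.example.gov/tenders",  # placeholder URL
        "selectors": {"container": ".tender-row", "title": ".tender-name"},
    },
}

def portals_with_selector(registry, key):
    """Return the portal names whose selector config defines the given key."""
    return [name for name, cfg in registry.items() if key in cfg["selectors"]]
```

A scraper can then iterate over the registry instead of hard-coding one portal per method.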
## Building the Scraper
```bash
pip install requests beautifulsoup4 pandas schedule
```
```python
import requests
from bs4 import BeautifulSoup
import pandas as pd
from datetime import datetime


class TenderScraper:
    def __init__(self, proxy_api_key):
        self.api_key = proxy_api_key
        self.tenders = []

    def scrape_portal(self, url, selectors):
        # Route through ScraperAPI; passing the target via params keeps it URL-encoded.
        resp = requests.get(
            "http://api.scraperapi.com",
            params={"api_key": self.api_key, "url": url},
            timeout=30,
        )
        soup = BeautifulSoup(resp.text, "html.parser")
        results = []
        for item in soup.select(selectors["container"]):
            title_el = item.select_one(selectors["title"])
            deadline_el = item.select_one(selectors["deadline"])
            # Optional fields: select_one("") raises an error, so only
            # query selectors that are actually configured.
            value_el = item.select_one(selectors["value"]) if "value" in selectors else None
            category_el = item.select_one(selectors["category"]) if "category" in selectors else None
            if title_el:
                results.append({
                    "title": title_el.text.strip(),
                    "deadline": deadline_el.text.strip() if deadline_el else "N/A",
                    "value": value_el.text.strip() if value_el else "N/A",
                    "category": category_el.text.strip() if category_el else "N/A",
                    "source_url": url,
                    "scraped_at": datetime.now().isoformat(),
                })
        return results

    def scrape_sam_gov(self):
        url = "https://sam.gov/search/?index=opp&sort=-modifiedDate&page=1"
        selectors = {
            "container": ".opportunity-result",
            "title": ".opportunity-title",
            "deadline": ".response-date",
            "value": ".award-amount",
        }
        return self.scrape_portal(url, selectors)

    def filter_by_keywords(self, tenders, keywords):
        return [t for t in tenders if any(
            kw.lower() in t["title"].lower() for kw in keywords
        )]


# Usage
scraper = TenderScraper("YOUR_SCRAPERAPI_KEY")
tenders = scraper.scrape_sam_gov()
it_tenders = scraper.filter_by_keywords(tenders, ["software", "IT", "cloud", "data"])
df = pd.DataFrame(it_tenders)
print(f"Found {len(it_tenders)} relevant IT tenders")
```
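Scraped deadlines come back as raw strings in portal-specific formats, so before sorting or filtering by date you would normalize them. A hedged sketch that tries a few common formats — the candidate format list is an assumption, since real portals vary:

```python
from datetime import datetime

# Assumed candidate formats; extend per portal as you discover them.
DEADLINE_FORMATS = ["%m/%d/%Y", "%Y-%m-%d", "%b %d, %Y"]

def parse_deadline(text):
    """Try each known date format in turn; return None if nothing matches."""
    for fmt in DEADLINE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    return None
```

Returning `None` for unparseable values (including the "N/A" placeholder) lets downstream code skip them explicitly instead of crashing mid-pipeline.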
## Automating Daily Checks
```python
import json
import time
import schedule


def daily_tender_scan():
    scraper = TenderScraper("YOUR_KEY")
    keywords = ["software development", "cloud services", "data analytics"]
    tenders = scraper.scrape_sam_gov()
    relevant = scraper.filter_by_keywords(tenders, keywords)
    with open(f"tenders_{datetime.now().strftime('%Y%m%d')}.json", "w") as f:
        json.dump(relevant, f, indent=2)
    if relevant:
        notify_team(relevant)  # notify_team: plug in your own email/Slack alerting


schedule.every().day.at("07:00").do(daily_tender_scan)

# schedule only registers jobs; a loop must actually run them.
while True:
    schedule.run_pending()
    time.sleep(60)
```
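A daily scan will mostly re-fetch the same open tenders, so alerting on every run gets noisy fast. One way to notify only on genuinely new opportunities is to persist a set of already-seen `(title, source_url)` pairs between runs — a sketch, where the state-file name and the choice of key are assumptions:

```python
import json
import os

SEEN_FILE = "seen_tenders.json"  # assumed state file between daily runs

def load_seen(path=SEEN_FILE):
    """Load previously seen (title, source_url) pairs, or an empty set."""
    if os.path.exists(path):
        with open(path) as f:
            return {tuple(pair) for pair in json.load(f)}
    return set()

def select_new(tenders, seen):
    """Return only tenders whose (title, source_url) pair hasn't been seen."""
    return [t for t in tenders if (t["title"], t["source_url"]) not in seen]

def save_seen(tenders, seen, path=SEEN_FILE):
    """Merge this run's tenders into the seen set and persist it."""
    seen |= {(t["title"], t["source_url"]) for t in tenders}
    with open(path, "w") as f:
        json.dump([list(pair) for pair in seen], f)
```

Inside `daily_tender_scan`, you would call `select_new` before `notify_team` so the team sees only the day's additions.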
## Scaling Infrastructure
Government portals often use heavy JavaScript. ScraperAPI handles JS rendering automatically. For portals behind geographic restrictions, ThorData provides country-specific residential proxies. Monitor pipeline health with ScrapeOps.
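With ScraperAPI, JavaScript rendering is requested per call via the `render` query parameter; passing the target URL through `params` also keeps it properly percent-encoded. A minimal sketch, assuming a valid API key:

```python
import requests

def fetch_rendered(api_key, url):
    """Fetch a JS-heavy page through ScraperAPI with rendering enabled."""
    return requests.get(
        "http://api.scraperapi.com",
        params={"api_key": api_key, "url": url, "render": "true"},
        timeout=60,  # rendered requests take longer than plain fetches
    )
```

Rendered requests cost more API credits, so it is worth enabling `render` only for the portals that actually need it.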
## Legal Notes
Government procurement data is public by design -- transparency in public spending is a legal requirement in most jurisdictions. Always respect rate limits and robots.txt, but the data itself is meant to be accessible.
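Respecting robots.txt can be automated rather than left as a manual check. The standard library's `urllib.robotparser` parses the file and answers per-path queries — a sketch, assuming the portal serves a conventional robots.txt and using a hypothetical agent name:

```python
from urllib.robotparser import RobotFileParser

def is_allowed(robots_txt, url, agent="tender-scraper"):
    """Parse robots.txt text and check whether our agent may fetch the URL."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(agent, url)
```

Calling this once per portal at startup, and skipping disallowed paths, keeps the pipeline on the right side of each site's stated policy.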
## Conclusion
A consolidated tender scraping pipeline turns a fragmented market into a structured competitive advantage. Start with the portals most relevant to your industry, automate daily checks, and expand coverage over time.