Scraping Tender and Government Procurement Data

Government procurement is a $13 trillion global market. Companies that find relevant tenders first gain a massive competitive advantage. Here's how to build a procurement data scraper.

Why Scrape Procurement Data?

Government tender portals are fragmented -- each country, state, and municipality runs its own platform. No single API covers everything. Scraping consolidates opportunities into one pipeline.

Building the Scraper

pip install requests beautifulsoup4 pandas schedule
import requests
from bs4 import BeautifulSoup
import pandas as pd
from datetime import datetime
from urllib.parse import quote_plus

class TenderScraper:
    def __init__(self, proxy_api_key):
        self.api_key = proxy_api_key
        self.tenders = []

    def scrape_portal(self, url, selectors, render=False):
        # Percent-encode the target URL so its own query string isn't
        # swallowed by the proxy API's parameters
        proxy_url = f"http://api.scraperapi.com?api_key={self.api_key}&url={quote_plus(url)}"
        if render:
            proxy_url += "&render=true"  # ask the proxy to execute JavaScript first
        resp = requests.get(proxy_url, timeout=30)
        resp.raise_for_status()
        soup = BeautifulSoup(resp.text, "html.parser")

        results = []
        for item in soup.select(selectors["container"]):
            title_el = item.select_one(selectors["title"])
            deadline_el = item.select_one(selectors["deadline"])
            # select_one("") raises on an empty selector, so only query
            # optional fields when a selector was actually provided
            value_el = item.select_one(selectors["value"]) if "value" in selectors else None
            category_el = item.select_one(selectors["category"]) if "category" in selectors else None

            if title_el:
                results.append({
                    "title": title_el.text.strip(),
                    "deadline": deadline_el.text.strip() if deadline_el else "N/A",
                    "value": value_el.text.strip() if value_el else "N/A",
                    "category": category_el.text.strip() if category_el else "N/A",
                    "source_url": url,
                    "scraped_at": datetime.now().isoformat()
                })
        return results

    def scrape_sam_gov(self):
        url = "https://sam.gov/search/?index=opp&sort=-modifiedDate&page=1"
        # CSS selectors must be kept in sync with the portal's markup --
        # verify them in your browser's inspector before relying on them
        selectors = {
            "container": ".opportunity-result",
            "title": ".opportunity-title",
            "deadline": ".response-date",
            "value": ".award-amount"
        }
        # sam.gov renders results client-side, so request JS rendering
        return self.scrape_portal(url, selectors, render=True)

    def filter_by_keywords(self, tenders, keywords):
        # case-insensitive match of any keyword against the tender title
        return [t for t in tenders if any(
            kw.lower() in t["title"].lower() for kw in keywords
        )]

# Usage
scraper = TenderScraper("YOUR_SCRAPERAPI_KEY")
tenders = scraper.scrape_sam_gov()
it_tenders = scraper.filter_by_keywords(tenders, ["software", "IT", "cloud", "data"])
df = pd.DataFrame(it_tenders)
print(f"Found {len(it_tenders)} relevant IT tenders")
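Because scrape_portal takes a plain selectors dict, covering more portals is mostly configuration. Here's a minimal sketch of that idea -- the portal URL and selectors below are illustrative placeholders, not real markup, so fill them in after inspecting each portal:

# Hypothetical portal registry: one entry per tender portal you cover
PORTALS = [
    {
        "url": "https://tenders.example-state.gov/opportunities",
        "selectors": {
            "container": ".tender-row",
            "title": ".tender-title",
            "deadline": ".tender-deadline",
        },
        "render": False,  # set True for JS-heavy portals
    },
]

all_tenders = []
for portal in PORTALS:
    all_tenders.extend(
        scraper.scrape_portal(portal["url"], portal["selectors"], render=portal["render"])
    )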

Automating Daily Checks

import schedule
import json
import time

def daily_tender_scan():
    scraper = TenderScraper("YOUR_KEY")
    keywords = ["software development", "cloud services", "data analytics"]
    tenders = scraper.scrape_sam_gov()
    relevant = scraper.filter_by_keywords(tenders, keywords)

    with open(f"tenders_{datetime.now().strftime('%Y%m%d')}.json", "w") as f:
        json.dump(relevant, f, indent=2)

    if relevant:
        notify_team(relevant)  # implement however your team works: email, Slack webhook, etc.

schedule.every().day.at("07:00").do(daily_tender_scan)

# schedule only queues jobs -- the process must stay alive and poll for them
while True:
    schedule.run_pending()
    time.sleep(60)
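Daily snapshots pile up fast, and you only want alerts for tenders you haven't seen. A minimal deduplication sketch -- seen_titles.json is a hypothetical state file this function maintains itself; call it on relevant before notify_team:

import json
import os

def only_new(tenders, state_file="seen_titles.json"):
    # load the titles we've already alerted on
    seen = set()
    if os.path.exists(state_file):
        with open(state_file) as f:
            seen = set(json.load(f))
    fresh = [t for t in tenders if t["title"] not in seen]
    # persist the updated set for the next run
    with open(state_file, "w") as f:
        json.dump(sorted(seen | {t["title"] for t in fresh}), f)
    return fresh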

Scaling Infrastructure

Government portals are often JavaScript-heavy single-page apps, which is why the scraper above passes render=true to ScraperAPI. For portals behind geographic restrictions, ThorData provides country-specific residential proxies. Monitor pipeline health with ScrapeOps.
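Routing requests through a residential proxy is just a proxies dict in requests. A sketch with placeholder endpoint and credentials -- substitute whatever gateway your provider (ThorData or otherwise) issues you:

import requests

# placeholder gateway; the real host, port, and credentials come from your provider
proxies = {
    "http": "http://USERNAME:PASSWORD@gateway.example-provider.com:8000",
    "https": "http://USERNAME:PASSWORD@gateway.example-provider.com:8000",
}
resp = requests.get("https://portal.example.gov/tenders", proxies=proxies, timeout=30)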

Legal Notes

Government procurement data is public by design -- transparency in public spending is a legal requirement in most jurisdictions. Always respect rate limits and robots.txt, but the data itself is meant to be accessible.
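Checking robots.txt takes a few lines with the standard library. A minimal sketch -- the user agent string and delay are arbitrary examples:

import time
from urllib.robotparser import RobotFileParser

def allowed_by_robots(base_url, path, user_agent="TenderScraperBot"):
    # fetch and parse the portal's robots.txt, then query it for our path
    rp = RobotFileParser()
    rp.set_url(f"{base_url}/robots.txt")
    rp.read()
    return rp.can_fetch(user_agent, f"{base_url}{path}")

if allowed_by_robots("https://sam.gov", "/search/"):
    time.sleep(2)  # a fixed delay between requests keeps you under rate limits
    # ...proceed with the request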

Conclusion

A consolidated tender scraping pipeline turns a fragmented market into a structured competitive advantage. Start with the portals most relevant to your industry, automate daily checks, and expand coverage over time.
