Sam Gale

Posted on May 22

Building a Daily Google News API Monitor in Python

#automation #googlenewsapi #newsapi #python

I wanted a small, local tool that would search the news for brand mentions. I didn't want to pay over $100 a month for one, so I decided to build my own.

What I built searches the news through a Google News API every morning for a list of keywords, runs each result through an LLM for sentiment and a one-sentence summary, saves everything to SQLite, and sends me the results on Slack and in a web app.

The whole project came out to about 1,000 lines of Python across ten files. It is a Flask app with a SQLite database and a single HTML dashboard.

Here's how it's wired together, with the code that matters from each layer.

Repo: google-news-monitor on GitHub. Install instructions at the bottom.

The pipeline

The whole tool is one pipeline:

keyword → Google News API → OpenAI enrichment → SQLite → (dashboard | REST | CLI | Slack)

Every interface (the dashboard form, the REST API, the CLI, the daily cron) ends up calling the same process_keyword() function. Here is the entire core loop, from monitor/pipeline.py:

def process_keyword(keyword, num=30, when="1d", gl=None):
    keyword = keyword.strip()
    fetched = search.fetch_google_news(keyword, num=num, when=when, gl=gl)

    new_count = 0
    for art in fetched:
        if not art.get("url"):
            continue
        ai_result = ai.enrich_article(keyword, art.get("title") or "",
                                      art.get("snippet") or "")
        row = {
            "keyword": keyword,
            "title": art.get("title") or "",  # title is NOT NULL in the schema
            "url": art["url"],
            "source": art.get("source"),
            "snippet": art.get("snippet"),
            "published_at": art.get("published_at"),
            "sentiment": ai_result["sentiment"],
            "ai_summary": ai_result["summary"],
        }
        article_id = db.save_article(row)
        if article_id is not None:
            new_count += 1
            alerts.check_article(keyword, row, article_id)

    return {"keyword": keyword, "fetched": len(fetched), "new": new_count}

Fetch, enrich, save, alert. That is the whole tool, minus the interfaces wrapped around it.

Fetching from the Google News API

I'm using SearchApi.io as the entry point to the Google News API.
One issue I ran into is that Google News matching is loose. Search for "niche company 1" and half the results are about the similarly named niche company 2, which has nothing to do with you. So there's a per-article flag for whether your keyword actually appears in the article body, with a toggle to hide everything else.
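
The flag itself isn't shown in this post. A minimal sketch of such a check, assuming it is applied to whatever text is available for the article (here, title and snippet); the name keyword_in_text is hypothetical:

```python
import re


def keyword_in_text(keyword, *texts):
    """Return True if the keyword appears as a whole phrase in any of the texts."""
    keyword = (keyword or "").strip()
    if not keyword:
        return False
    # The lookarounds act as word boundaries, so "niche company 1"
    # does not match inside "niche company 10".
    pattern = re.compile(
        r"(?<!\w)" + re.escape(keyword) + r"(?!\w)", re.IGNORECASE
    )
    # Skip missing fields (None or empty strings).
    return any(pattern.search(t) for t in texts if t)
```

Matching on whole phrases rather than plain substrings is a design choice: it is stricter, so it will miss variants such as possessives written without a separator, but it avoids the near-miss names that caused the problem in the first place.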

From monitor/search.py:

SEARCHAPI_URL = "https://www.searchapi.io/api/v1/search"

def fetch_google_news(keyword, num=30, when=None, gl=None):
    params = {
        "engine": "google_news",
        "q": keyword.strip(),
        "nfpr": 1,                 # turn off "did you mean..."
        "num": num,
        "api_key": os.environ["SEARCHAPI_KEY"],
    }
    if when:
        params["when"] = when      # 1h, 1d, 7d, 1m, 1y
    if gl:
        params["gl"] = gl.lower()  # 2-letter country code

    resp = requests.get(SEARCHAPI_URL, params=params, timeout=30)
    resp.raise_for_status()
    data = resp.json()

    articles = []
    for item in data.get("organic_results", []) or []:
        articles.append({
            "title": item.get("title"),
            "url": item.get("link"),
            "source": item.get("source", {}).get("name"),
            "snippet": item.get("snippet"),
            "published_at": parse_date(item.get("date")),
        })
    return articles

One thing the Google News API will trip you up on: the date field arrives as free-form strings like "1 week ago", "May 30, 2023", or "Yesterday". Not ISO timestamps. If you store those verbatim, your SQL filters will silently break and your charts will sort "May 30" alphabetically next to "2026-05-14". I wrote a small parser (monitor/dates.py) that normalizes everything to YYYY-MM-DD on the way in.
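
The real monitor/dates.py isn't reproduced here. As a rough sketch of what such a normalizer could look like, assuming only the three shapes mentioned above (relative phrases, "Yesterday", and English month-name dates); the `today` parameter is an addition for testability, not something from the repo:

```python
import re
from datetime import date, datetime, timedelta

# Approximate day counts per unit; months and years are rounded,
# and minutes/hours are treated as "today".
_UNIT_DAYS = {"minute": 0, "hour": 0, "day": 1, "week": 7, "month": 30, "year": 365}
_ABSOLUTE_FORMATS = ("%b %d, %Y", "%B %d, %Y", "%Y-%m-%d")


def parse_date(raw, today=None):
    """Normalize a free-form date string to YYYY-MM-DD, or None if unparseable."""
    if not raw:
        return None
    today = today or date.today()
    text = raw.strip().lower()
    if text == "today":
        return today.isoformat()
    if text == "yesterday":
        return (today - timedelta(days=1)).isoformat()
    # Relative forms such as "1 week ago" or "3 hours ago".
    m = re.fullmatch(r"(\d+)\s+(minute|hour|day|week|month|year)s?\s+ago", text)
    if m:
        days = int(m.group(1)) * _UNIT_DAYS[m.group(2)]
        return (today - timedelta(days=days)).isoformat()
    # Absolute forms such as "May 30, 2023".
    for fmt in _ABSOLUTE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None
```

Returning None for anything unrecognized keeps bad strings out of the published_at column, so SQL date filters only ever see ISO dates or NULL.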

OpenAI enrichment with a JSON guardrail

For every article I want two things: a sentiment label (positive, negative, neutral) and a one-sentence summary. The trick is to force OpenAI to return parseable JSON so the database ingestion never sees free-form text.

From monitor/ai.py:

ARTICLE_SYSTEM_PROMPT = (
    "You are a media-monitoring analyst. For each article you receive, "
    "classify the sentiment toward the tracked brand/keyword and write a "
    "one-sentence summary. You MUST return a single JSON object - no prose, "
    "no markdown, no code fences. "
    'Schema: {"sentiment": "positive"|"negative"|"neutral", '
    '"summary": "<one sentence>"}. Never include any other keys."
)

def enrich_article(keyword, title, snippet):
    resp = openai_client().chat.completions.create(
        model=os.environ.get("OPENAI_MODEL", "gpt-4o-mini"),
        response_format={"type": "json_object"},   # enforce JSON
        messages=[
            {"role": "system", "content": ARTICLE_SYSTEM_PROMPT},
            {"role": "user", "content":
                f"Tracked keyword: {keyword}\n"
                f"Article title: {title}\n"
                f"Article snippet: {snippet or '(no snippet)'}"},
        ],
        temperature=0.2,
    )
    data = json.loads(resp.choices[0].message.content or "{}")
    sentiment = (data.get("sentiment") or "neutral").lower()
    if sentiment not in {"positive", "negative", "neutral"}:
        sentiment = "neutral"
    return {"sentiment": sentiment, "summary": (data.get("summary") or "").strip()}

response_format={"type": "json_object"} forces the model to emit valid JSON. The system prompt also redundantly says "no prose, no markdown, no code fences" because models sometimes ignore the format flag anyway. Belt and braces.
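
If you want a third layer of protection, the parsing step can be made tolerant of a bad reply instead of raising. This is a sketch, not code from the repo; parse_enrichment is a hypothetical helper that could replace the json.loads() line in enrich_article():

```python
import json

ALLOWED_SENTIMENTS = {"positive", "negative", "neutral"}
_FENCE = "`" * 3  # a markdown code fence, built this way to keep the example readable


def parse_enrichment(raw):
    """Turn a raw model reply into {"sentiment", "summary"} without raising."""
    text = (raw or "").strip()
    # Strip a markdown fence if the model added one despite the instructions.
    if text.startswith(_FENCE):
        text = text.strip("`").strip()
        if text.lower().startswith("json"):
            text = text[4:]
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    sentiment = str(data.get("sentiment") or "neutral").lower()
    if sentiment not in ALLOWED_SENTIMENTS:
        sentiment = "neutral"
    return {"sentiment": sentiment, "summary": str(data.get("summary") or "").strip()}
```

Falling back to neutral with an empty summary means one malformed reply costs you one article's enrichment rather than aborting the whole keyword run.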

There's also a summarize_period(keyword, articles) function that takes the full article list for a time window and writes a paragraph-long narrative summary. Same JSON-only pattern. That's what powers the "AI period summary" block at the top of the dashboard report.

SQLite with self-healing schema

I wanted zero setup steps. No flask db upgrade, no SQL files to apply by hand. So the database creates itself on first connection:

SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    keyword TEXT NOT NULL,
    title TEXT NOT NULL,
    url TEXT NOT NULL,
    source TEXT,
    snippet TEXT,
    published_at TEXT,
    sentiment TEXT,
    ai_summary TEXT,
    fetched_at TEXT NOT NULL DEFAULT (datetime('now')),
    UNIQUE(keyword, url)
);
-- ...more tables for keywords, alerts, period summaries
"""

@contextmanager
def connect(path=DB_PATH):
    first_run = not os.path.exists(path)
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row
    if first_run:
        conn.executescript(SCHEMA)
        conn.commit()
    try:
        yield conn
        conn.commit()
    finally:
        conn.close()

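UNIQUE(keyword, url) does the deduplication. If you re-search the same keyword, articles you already have don't get re-saved. Note that process_keyword() as shown above calls ai.enrich_article() before db.save_article(), so the constraint alone does not prevent a repeat OpenAI call; avoiding that charge means checking for an existing (keyword, url) row before enriching.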
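
To make the deduplication concrete, here is a minimal sketch of how the save step and a pre-enrichment check could work on top of that constraint. It uses a cut-down table, and both functions take an explicit connection, which is an assumption for the example (the post only shows db.save_article(row)); article_exists is a hypothetical helper:

```python
import sqlite3

# Reduced version of the articles table, just enough to show the constraint.
DEMO_SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    keyword TEXT NOT NULL,
    title TEXT NOT NULL,
    url TEXT NOT NULL,
    UNIQUE(keyword, url)
);
"""


def article_exists(conn, keyword, url):
    """True if this (keyword, url) pair is already stored."""
    row = conn.execute(
        "SELECT 1 FROM articles WHERE keyword = ? AND url = ?", (keyword, url)
    ).fetchone()
    return row is not None


def save_article(conn, row):
    """Insert the row; return its id, or None if it was a duplicate."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO articles (keyword, title, url) "
        "VALUES (:keyword, :title, :url)",
        row,
    )
    # rowcount is 0 when the UNIQUE constraint caused the insert to be ignored.
    return cur.lastrowid if cur.rowcount == 1 else None
```

Calling article_exists() before the enrichment step is what actually skips the OpenAI request for articles already in the database; INSERT OR IGNORE then acts as the safety net if two runs race each other.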

Three interfaces, one core

The dashboard, the REST API, and the CLI all sit on top of the same pipeline.process_keyword(). Each one is small.

The REST blueprint lives in monitor/api.py:

@bp.post("/api/search")
def run_search():
    data = request.get_json(silent=True) or {}
    keyword = (data.get("keyword") or "").strip()
    if not keyword:
        return jsonify({"status": "error", "message": "keyword required"}), 400
    result = pipeline.process_keyword(keyword,
                                       num=int(data.get("num") or 30),
                                       when=data.get("when") or None,
                                       gl=data.get("gl") or None)
    return jsonify({"status": "ok", "result": result})

The CLI is in cli.py:

@cli.command()
@click.argument("keyword")
@click.option("--when", default="1d")
@click.option("--num", default=50)
@click.option("--gl", default=None)
def search(keyword, when, num, gl):
    result = pipeline.process_keyword(keyword, num=num, when=when, gl=gl)
    click.echo(json.dumps(result, indent=2))

The dashboard is one HTML file (templates/dashboard.html) using Tailwind via CDN and Chart.js for the volume chart. There is no build step. Forms POST to the same endpoints.

The GET endpoints auto-fetch when the DB is empty for that keyword. A single URL is enough to spin up a fresh monitor for a new term, which makes it easy to plug into your own scripts or hand to an AI agent that needs to know what the press is saying about a brand.

REST API

All endpoints return JSON. The server runs on 127.0.0.1:5000 by default.

Method	Path	Purpose
GET	`/healthz`	Liveness check
GET	`/api/keywords`	List tracked keywords
POST	`/api/keywords`	Add a keyword
DELETE	`/api/keywords/<keyword>`	Stop tracking
POST	`/api/search`	Run the pipeline once (fetch + enrich + save)
POST	`/api/cron/run`	Run the daily job immediately
GET	`/api/report/<keyword>`	Full report, every article in the period
GET	`/api/matches/<keyword>`	Same payload, filtered to keyword matches only
GET	`/api/analytics/<keyword>`	Sentiment totals plus bucketed volume for the chart
GET	`/api/alerts`	Recent breaking-news alerts
GET	`/api/settings`	Current settings (keys are masked)
POST	`/api/settings`	Update keys, model, Slack webhook
POST	`/api/settings/test-slack`	Send a test Slack message

The report and matches endpoints auto-fetch on first use. If there is nothing in the database for that keyword yet, the pipeline runs first and the report comes back populated. Subsequent calls are instant. Pass ?fetch=true to force a refresh.

Query parameters for /api/report and /api/matches:

period: daily, weekly, monthly, or all (default: all for matches, weekly for report)
fetch: true to force a fresh fetch even when data already exists
num: max results from Google News when auto-fetching (default 50)
when: 1h, 1d, 7d, 1m, 1y (default: any time)
gl: 2-letter country code, e.g. us, gb, de

Example: a single request is enough to spin up a fresh monitor for a brand-new keyword:

GET http://127.0.0.1:5000/api/matches/n8n?period=all
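
If you are calling this from your own script, a small client along these lines should be enough. It is a sketch using only the standard library; the function names are made up for the example, and since the response schema isn't documented here, it simply returns whatever JSON comes back:

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:5000"  # default address from this post


def build_matches_url(keyword, base_url=BASE_URL, **params):
    """Build the /api/matches URL, dropping parameters that are None."""
    query = {k: v for k, v in params.items() if v is not None}
    # Percent-encode the keyword so spaces and slashes survive in the path.
    url = f"{base_url}/api/matches/{quote(keyword, safe='')}"
    return f"{url}?{urlencode(query)}" if query else url


def fetch_matches(keyword, **params):
    """Call the endpoint and return the parsed JSON body."""
    # The first call for a new keyword runs the whole pipeline, so allow time.
    with urlopen(build_matches_url(keyword, **params), timeout=120) as resp:
        return json.load(resp)
```

For example, fetch_matches("n8n", period="all") issues the request shown above, and fetch_matches("n8n", fetch="true", when="7d") forces a refresh. Pass fetch as the string "true", since that is the value the query parameter expects.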

Daily cron, in-process

APScheduler runs the daily job inside the same Flask process, so there is no system cron to configure and no separate worker to deploy.

def start_scheduler():
    hour = int(os.environ.get("DAILY_CRON_HOUR", "8"))
    minute = int(os.environ.get("DAILY_CRON_MINUTE", "0"))
    sched = BackgroundScheduler(daemon=True)
    sched.add_job(pipeline.run_all_monitored, trigger="cron",
                  hour=hour, minute=minute, id="daily_monitor",
                  replace_existing=True)
    sched.start()

run_all_monitored() walks the list of keywords flagged with monitored=1 in the database and runs the full pipeline for each.

Alerts

The alerts module checks every freshly-saved article against a list of risk phrases (lawsuit, breach, outage, scandal, …), checks if OpenAI returned negative sentiment, and posts a formatted message to Slack if either trips. It also tracks a 14-day rolling baseline and fires a separate alert if today's article volume is 3x that baseline.

RISK_PHRASES = ["lawsuit", "sued", "investigation", "breach", "hack",
                "outage", "scandal", "fired", "resigns", "bankruptcy", "recall"]

def check_article(keyword, article, article_id):
    # title/snippet keys can be present but None, so "or" is safer than a .get() default
    haystack = ((article.get("title") or "") + " " + (article.get("snippet") or "")).lower()
    matched = [p for p in RISK_PHRASES if p in haystack]
    reasons = []
    if matched:
        reasons.append(f"risk phrase: {', '.join(matched)}")
    if article.get("sentiment", "").lower() == "negative":
        reasons.append("negative sentiment")
    if reasons:
        db.save_alert(keyword, article_id, "; ".join(reasons))
        send_slack(format_slack(keyword, reasons, article))
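
The volume-spike check isn't shown above. A minimal sketch of the idea, assuming the baseline is the mean daily article count over the previous 14 days; the function name and the min_baseline floor are assumptions added for the example, not the repo's code:

```python
from statistics import mean


def is_volume_spike(daily_counts, today_count, factor=3.0, min_baseline=1.0):
    """Return True if today's count is at least `factor` times the baseline.

    daily_counts: article counts for previous days, oldest first;
    only the last 14 entries are used.
    """
    window = list(daily_counts)[-14:]
    if not window:
        return False  # no history yet, so nothing to compare against
    # The floor stops a normally silent keyword from alerting on 1 or 2 articles.
    baseline = max(mean(window), min_baseline)
    return today_count >= factor * baseline
```

Without some floor, a keyword that usually gets zero coverage would trip the alert on its first article, which is probably noise rather than a spike.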

Install it

git clone https://github.com/SamJale/Google-News-Monitor-API.git
cd Google-News-Monitor-API
pip install -r requirements.txt
python app.py

A browser tab opens at http://127.0.0.1:5000/. Add your SearchApi.io and OpenAI keys through the Settings button in the UI (or edit .env directly).

If you want to drive it from the terminal:

python cli.py add "anthropic"                       # start tracking
python cli.py search "anthropic" --when 7d --num 50 # one-shot
python cli.py report "anthropic" --period weekly    # see the saved data
python cli.py cron                                  # run the daily job now

Things I would change if I were building it again

The OpenAI enrichment runs serially. For a keyword that returns 50 articles, that's 50 sequential API calls. Easy win: parallelize with asyncio or a thread pool.
The data.db file lives in the project root. Probably should default to ~/.google-news-monitor/data.db for cleaner installs.
No retry logic on transient API errors. SearchApi.io and OpenAI both occasionally return 500 errors. Add exponential backoff.
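
As a starting point for the first and third items, here is a rough sketch of a retry wrapper with exponential backoff and a thread-pool version of the enrichment loop. Both helper names are hypothetical, and `enrich` stands in for a function with the same arguments as ai.enrich_article():

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def with_retries(fn, *args, attempts=4, base_delay=1.0,
                 retry_on=(Exception,), sleep=time.sleep, **kwargs):
    """Call fn, retrying on failure with exponential backoff plus jitter."""
    for attempt in range(attempts):
        try:
            return fn(*args, **kwargs)
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts, surface the last error
            # Waits roughly 1s, 2s, 4s, ... with a little random jitter.
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay / 2))


def enrich_all(enrich, keyword, articles, max_workers=8):
    """Enrich articles concurrently; results come back in input order."""
    def work(art):
        return with_retries(
            enrich, keyword, art.get("title") or "", art.get("snippet") or ""
        )

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(work, articles))
```

In practice you would narrow retry_on to the transient errors you expect (for example requests.exceptions.RequestException for the search call) rather than retrying every exception. Keeping the SQLite writes on the calling thread and parallelizing only the network-bound enrichment avoids sharing one connection across threads.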

If you build any of these, send a PR.

If you want to see what the running app looks like, screenshots are on the GitHub README. It is MIT licensed, runs entirely on your machine, and has no telemetry or sign-up.

Disclosure: I work at SearchApi.io, which is the Google News data source this tool uses. Worth saying upfront before anyone digs.

DEV Community