<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sundeep Anand</title>
    <description>The latest articles on DEV Community by Sundeep Anand (@sdp5).</description>
    <link>https://dev.to/sdp5</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3625158%2Fc512d2c4-57b3-4036-9c1c-ea6121aca667.jpg</url>
      <title>DEV Community: Sundeep Anand</title>
      <link>https://dev.to/sdp5</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sdp5"/>
    <language>en</language>
    <item>
      <title>Building GreenGovRAG</title>
      <dc:creator>Sundeep Anand</dc:creator>
      <pubDate>Sun, 23 Nov 2025 05:02:16 +0000</pubDate>
      <link>https://dev.to/sdp5/building-greengovrag-31ba</link>
      <guid>https://dev.to/sdp5/building-greengovrag-31ba</guid>
      <description>&lt;h2&gt;
  
  
  An Open Source AI Assistant for Australian Environmental Compliance
&lt;/h2&gt;

&lt;p&gt;If you've ever tried to navigate environmental regulations in Australia, you'll know the pain: federal EPBC Act, state EPAs, local council planning schemes, emissions reporting frameworks - all scattered across PDFs, government portals, and legislation websites.&lt;/p&gt;

&lt;p&gt;We're building &lt;strong&gt;&lt;a href="https://greengovrag.sundeep.id.au/" rel="noopener noreferrer"&gt;GreenGovRAG&lt;/a&gt;&lt;/strong&gt; to change that. (&lt;a href="https://github.com/sdp5/green-gov-rag" rel="noopener noreferrer"&gt;source code&lt;/a&gt;)&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: Fragmented Regulatory Knowledge
&lt;/h2&gt;

&lt;p&gt;Environmental consultants, planners, and ESG analysts spend &lt;strong&gt;days or weeks&lt;/strong&gt; searching for answers to questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Does this wind farm project need federal EPBC Act approval?&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;What's the biodiversity offset policy for councils in South Australia?&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Can I clear native vegetation near Murray Bridge, SA?&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;How do I report Scope 3 emissions in Victoria?&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The information exists - it's public, it's authoritative - but it's &lt;strong&gt;impossibly fragmented&lt;/strong&gt;. Government portals are siloed by jurisdiction. Commercial legal databases such as LexisNexis are expensive ($10k-100k/year). General-purpose chatbots such as ChatGPT can hallucinate and often lack precise citations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;There had to be a better way.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solution: RAG + Geospatial Intelligence
&lt;/h2&gt;

&lt;p&gt;GreenGovRAG is a &lt;strong&gt;Retrieval-Augmented Generation (RAG) system&lt;/strong&gt; purpose-built for Australian environmental and planning regulations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Core Features
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Natural Language Queries&lt;/strong&gt;&lt;br&gt;
Ask questions in plain English and get answers with verifiable citations to official sources.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Location-Aware Filtering&lt;/strong&gt;&lt;br&gt;
Filter by state, LGA (Local Government Area), or region using geospatial intelligence.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hybrid Search&lt;/strong&gt;&lt;br&gt;
Combines BM25 keyword matching with vector similarity for precise retrieval.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-Jurisdictional Coverage&lt;/strong&gt;&lt;br&gt;
Federal (EPBC Act), State (SA/NSW/VIC legislation), Local (council planning schemes), and Emissions (CER, NGER) in one unified system.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-LLM Support&lt;/strong&gt;&lt;br&gt;
Works with OpenAI (GPT-4o, GPT-4o-mini), Anthropic (Claude 3), AWS Bedrock, and Azure OpenAI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cloud-Agnostic Deployment&lt;/strong&gt;&lt;br&gt;
Deploy on AWS, Azure, or run locally with Docker.&lt;/p&gt;
&lt;h2&gt;
  
  
  Technical Architecture: A Production RAG System
&lt;/h2&gt;

&lt;p&gt;This isn't a tutorial project. It's a &lt;strong&gt;production-ready system&lt;/strong&gt; with real ETL pipelines, monitoring, and cloud deployment. Here's how it works:&lt;/p&gt;
&lt;h3&gt;
  
  
  1. Document Ingestion &amp;amp; ETL
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Federal: EPBC Act (environment.gov.au)&lt;/li&gt;
&lt;li&gt;State: SA/NSW/VIC legislation, EPA guidelines&lt;/li&gt;
&lt;li&gt;Local: Council planning schemes&lt;/li&gt;
&lt;li&gt;Emissions: CER emissions data, NGER reports&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Pipeline:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Development:&lt;/strong&gt; Apache Airflow (local UI for testing)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Production:&lt;/strong&gt; GitHub Actions (scheduled daily runs)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Processing:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PDF parsing (PyMuPDF, layout-aware chunking)&lt;/li&gt;
&lt;li&gt;HTML scraping (BeautifulSoup)&lt;/li&gt;
&lt;li&gt;Metadata tagging with LLM (auto-extracts jurisdiction, topics, regulatory hierarchy)&lt;/li&gt;
&lt;li&gt;Storage: PostgreSQL with pgvector, AWS S3/Azure Blob&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  2. Text Chunking &amp;amp; Embeddings
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Semantic chunking with regulatory context preservation
&lt;/span&gt;&lt;span class="n"&gt;CHUNK_SIZE_RANGE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;     &lt;span class="c1"&gt;# min, max tokens per chunk&lt;/span&gt;
&lt;span class="n"&gt;CHUNK_OVERLAP_RANGE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# min, max tokens shared by adjacent chunks&lt;/span&gt;
&lt;span class="n"&gt;EMBEDDING_MODEL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sentence-transformers/all-MiniLM-L6-v2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Each chunk preserves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Jurisdiction (Federal/State/Local)&lt;/li&gt;
&lt;li&gt;Document type (Legislation, Guideline, Planning Scheme)&lt;/li&gt;
&lt;li&gt;LGA (Local Government Area)&lt;/li&gt;
&lt;li&gt;Regulatory hierarchy (Act → Regulation → Policy)&lt;/li&gt;
&lt;/ul&gt;
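&lt;p&gt;As a rough illustration of how overlapping chunks can carry this metadata, here is a minimal sketch. It is not the project's chunker: the &lt;code&gt;Chunk&lt;/code&gt; class, the &lt;code&gt;chunk_document&lt;/code&gt; helper, the concrete size and overlap values, and the whitespace "tokenizer" are all assumptions made for the example.&lt;/p&gt;

```python
from dataclasses import dataclass, field

CHUNK_SIZE = 800      # assumed value inside the 500-1000 token range
CHUNK_OVERLAP = 150   # assumed value inside the 100-200 token range


@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)


def chunk_document(text, metadata, size=CHUNK_SIZE, overlap=CHUNK_OVERLAP):
    """Split text into overlapping windows; every chunk inherits the document metadata."""
    if overlap not in range(size):
        raise ValueError("overlap must be non-negative and smaller than size")
    tokens = text.split()  # whitespace split stands in for a real tokenizer
    if not tokens:
        return []
    step = size - overlap
    chunks = []
    for start in range(0, max(len(tokens) - overlap, 1), step):
        window = tokens[start:start + size]
        # Copy the document-level metadata and record the chunk position
        chunks.append(Chunk(" ".join(window), dict(metadata, chunk_index=len(chunks))))
    return chunks


doc_metadata = {
    "jurisdiction": "State",
    "document_type": "Guideline",
    "lga": "Rural City of Murray Bridge",
    "hierarchy": "Policy",
}
for chunk in chunk_document("one two three four five six seven", doc_metadata, size=4, overlap=1):
    print(chunk.metadata["chunk_index"], chunk.text)
```

&lt;p&gt;Because each chunk copies the document-level fields, a later filter on jurisdiction or LGA can run on chunks directly without looking up the parent document.&lt;/p&gt;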
&lt;h3&gt;
  
  
  3. Vector Store &amp;amp; Hybrid Search
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Vector Stores:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;FAISS&lt;/strong&gt; (local development)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Qdrant&lt;/strong&gt; (production - faster for large datasets)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Search Strategy:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Hybrid search: BM25 + vector similarity
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lga_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# 1. BM25 keyword search (handle exact terminology)
&lt;/span&gt;    &lt;span class="n"&gt;keyword_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;bm25_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

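    &lt;span class="c1"&gt;# 2. Vector similarity search (semantic matching)
    # embedding_model is assumed to be the configured sentence-transformers encoder
&lt;/span&gt;    &lt;span class="n"&gt;query_embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embedding_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;vector_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vector_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;similarity_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;query_embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;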
        &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nb"&gt;filter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lga&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;lga_name&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;lga_name&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# 3. Combine with reciprocal rank fusion
&lt;/span&gt;    &lt;span class="n"&gt;merged_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;reciprocal_rank_fusion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;keyword_results&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;vector_results&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;merged_results&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
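&lt;p&gt;The fusion step is only named in the snippet above, so here is a minimal sketch of reciprocal rank fusion over lists of document IDs. The document IDs are made up for the example, and &lt;code&gt;k=60&lt;/code&gt; is the conventional smoothing constant rather than a value confirmed by the project.&lt;/p&gt;

```python
from collections import defaultdict


def reciprocal_rank_fusion(*rankings, k=60):
    """Fuse ranked lists of document IDs: score(d) = sum of 1 / (k + rank) over lists."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from the keyword and vector retrievers
keyword_results = ["epbc-s18", "sa-nv-act-s27", "nger-guide"]
vector_results = ["sa-nv-act-s27", "council-plan-4", "epbc-s18"]
print(reciprocal_rank_fusion(keyword_results, vector_results))
```

&lt;p&gt;Because the method uses only ranks, the BM25 scores and the vector similarities never need to be normalised onto a common scale, and documents that appear in both lists rise to the top.&lt;/p&gt;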



&lt;h3&gt;
  
  
  4. Geospatial Intelligence
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Location NER (Named Entity Recognition):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Extract Australian locations from queries
&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Can I clear vegetation near Murray Bridge, SA?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;locations&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;location_ner&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extract&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# =&amp;gt; [{"text": "Murray Bridge", "state": "SA", "lga": "Rural City of Murray Bridge"}]
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
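&lt;p&gt;The internals of &lt;code&gt;location_ner&lt;/code&gt; are not shown here. As a much simpler stand-in for a trained NER model, the sketch below matches place names from a lookup table (a gazetteer). The &lt;code&gt;GAZETTEER&lt;/code&gt; contents and the &lt;code&gt;extract_locations&lt;/code&gt; function are hypothetical.&lt;/p&gt;

```python
import re

# Tiny hypothetical gazetteer; a real one would cover every LGA and common aliases
GAZETTEER = {
    "murray bridge": {"state": "SA", "lga": "Rural City of Murray Bridge"},
}


def extract_locations(query):
    """Return gazetteer entries whose place name appears as whole words in the query."""
    lowered = query.lower()
    found = []
    for name, info in GAZETTEER.items():
        if re.search(rf"\b{re.escape(name)}\b", lowered):
            found.append({"text": name.title(), **info})
    return found


print(extract_locations("Can I clear vegetation near Murray Bridge, SA?"))
```

&lt;p&gt;A lookup like this cannot disambiguate place names shared between states, which is one reason to prefer a proper NER step combined with the state hint in the query.&lt;/p&gt;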



&lt;p&gt;&lt;strong&gt;LGA Filtering:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GeoJSON boundaries for all 537 Australian LGAs&lt;/li&gt;
&lt;li&gt;Buffer zone queries (planned feature)&lt;/li&gt;
&lt;li&gt;Spatial intersection for multi-LGA queries&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  5. Response Generation with Trust Scoring
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Response enhancement with citation verification
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;retrieved_docs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# 1. LLM generates initial response
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;retrieved_docs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# 2. Trust score calculation
&lt;/span&gt;    &lt;span class="n"&gt;trust_score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;calculate_trust_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;retrieved_docs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;factors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;citation_precision&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;      &lt;span class="c1"&gt;# Citations match sources?
&lt;/span&gt;            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;regulatory_hierarchy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;# Cites appropriate authority level?
&lt;/span&gt;            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;recency&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                 &lt;span class="c1"&gt;# Documents up-to-date?
&lt;/span&gt;            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;jurisdiction_alignment&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;   &lt;span class="c1"&gt;# Correct jurisdiction?
&lt;/span&gt;        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# 3. Citation verification
&lt;/span&gt;    &lt;span class="n"&gt;verified_citations&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;verify_citations&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;retrieved_docs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;answer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;trust_score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;trust_score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;citations&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;verified_citations&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sources&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;retrieved_docs&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
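&lt;p&gt;How the four factors are combined is not spelled out above. One plausible approach is a weighted average of per-factor scores, sketched below. The weights and the &lt;code&gt;combine_trust_factors&lt;/code&gt; helper are assumptions for illustration, not the project's actual formula.&lt;/p&gt;

```python
# Hypothetical weights, chosen only so that they sum to 1.0
TRUST_WEIGHTS = {
    "citation_precision": 0.4,
    "regulatory_hierarchy": 0.2,
    "recency": 0.2,
    "jurisdiction_alignment": 0.2,
}


def combine_trust_factors(factor_scores, weights=TRUST_WEIGHTS):
    """Weighted average of per-factor scores, each clamped to the range 0..1."""
    missing = set(weights) - set(factor_scores)
    if missing:
        raise ValueError(f"Missing factor scores: {sorted(missing)}")
    total = sum(weights.values())
    weighted = sum(
        weights[name] * min(max(factor_scores[name], 0.0), 1.0) for name in weights
    )
    return round(weighted / total, 3)


print(combine_trust_factors({
    "citation_precision": 0.9,
    "regulatory_hierarchy": 1.0,
    "recency": 0.5,
    "jurisdiction_alignment": 1.0,
}))
```

&lt;p&gt;Keeping the per-factor scores alongside the combined value lets a consultant see why an answer scored low, for example an outdated source rather than a mismatched citation.&lt;/p&gt;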



&lt;h2&gt;
  
  
  Tech Stack
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Backend:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.12&lt;/li&gt;
&lt;li&gt;FastAPI (async API with auto-docs)&lt;/li&gt;
&lt;li&gt;SQLModel (type-safe ORM)&lt;/li&gt;
&lt;li&gt;LangChain (RAG orchestration)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Database:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PostgreSQL with pgvector&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Vector Stores:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;FAISS (development)&lt;/li&gt;
&lt;li&gt;Qdrant (production)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Embeddings:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;sentence-transformers/all-MiniLM-L6-v2 (default, configurable)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;LLM Providers:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OpenAI (GPT-4o, GPT-4o-mini)&lt;/li&gt;
&lt;li&gt;Anthropic (Claude 3)&lt;/li&gt;
&lt;li&gt;AWS Bedrock&lt;/li&gt;
&lt;li&gt;Azure OpenAI&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Frontend:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;React + TypeScript (work in progress)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Deployment:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Docker + Docker Compose&lt;/li&gt;
&lt;li&gt;AWS: ECS Fargate, RDS PostgreSQL, EC2 Spot (Qdrant), CloudFront, S3&lt;/li&gt;
&lt;li&gt;Azure: Container Apps, PostgreSQL Flexible Server, Blob Storage&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Deployment Architecture (AWS)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────┐
│   CloudFront    │  (CDN + frontend)
└────────┬────────┘
         │
┌────────▼────────┐
│  API Gateway    │  (HTTP API)
└────────┬────────┘
         │
┌────────▼────────┐
│   ECS Fargate   │  (Backend API)
│   (2 tasks)     │
└────┬─────┬──────┘
     │     │
     │     └──────────┐
     │                │
┌────▼─────┐   ┌─────▼────────┐
│   RDS    │   │  EC2 Spot    │
│PostgreSQL│   │   (Qdrant)   │
└──────────┘   └──────────────┘
     │
┌────▼─────┐
│    S3    │  (Document storage)
└──────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Plugin Architecture: Easy Contributions
&lt;/h2&gt;

&lt;p&gt;One of the best parts of this project is the &lt;strong&gt;plugin system&lt;/strong&gt; for document sources. Adding a new regulation source is straightforward:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example: Adding QLD Vegetation Management Guidelines&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# backend/green_gov_rag/etl/sources/qld_vegetation.py
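&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;bs4&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BeautifulSoup&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;green_gov_rag.etl.sources.base&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseDocumentSource&lt;/span&gt;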

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;QLDVegetationScraper&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseDocumentSource&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="c1"&gt;# Scrape QLD government portal
&lt;/span&gt;        &lt;span class="n"&gt;docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="n"&gt;soup&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BeautifulSoup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;html.parser&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;link&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;soup&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find_all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;class_&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;document-link&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extract_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;link&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;href&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
                &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;QLD Vegetation Management&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;jurisdiction&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;State&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;state&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;QLD&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;link&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;href&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;docs&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;validate_config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;required&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="c1"&gt;# Report only the keys that are actually absent&lt;/span&gt;
        &lt;span class="n"&gt;missing&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;required&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;missing&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Missing config: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;missing&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Register in config:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# backend/configs/documents_config.yml&lt;/span&gt;
&lt;span class="na"&gt;sources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;qld_vegetation&lt;/span&gt;
    &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://www.qld.gov.au/environment/vegetation-management&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;That's it!&lt;/strong&gt; The ETL pipeline auto-discovers the plugin and starts ingesting documents.&lt;/p&gt;
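&lt;p&gt;The discovery mechanism itself is not shown in this post. One common way to get this behaviour in Python is a class registry filled by &lt;code&gt;__init_subclass__&lt;/code&gt;, sketched below. The stand-in &lt;code&gt;BaseDocumentSource&lt;/code&gt;, the &lt;code&gt;source_type&lt;/code&gt; attribute, &lt;code&gt;SOURCE_REGISTRY&lt;/code&gt;, and &lt;code&gt;build_sources&lt;/code&gt; are all hypothetical and may differ from the real code base.&lt;/p&gt;

```python
SOURCE_REGISTRY = {}  # maps a config "type" string to a source class


class BaseDocumentSource:
    """Stand-in base class; the project's real one lives in etl/sources/base.py."""

    source_type = None  # subclasses set this to the "type" used in the YAML config

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if cls.source_type:
            SOURCE_REGISTRY[cls.source_type] = cls  # register at class definition

    def __init__(self, config):
        self.config = config
        self.validate_config()

    def validate_config(self):
        pass


class QLDVegetationScraper(BaseDocumentSource):
    source_type = "qld_vegetation"

    def validate_config(self):
        if "url" not in self.config:
            raise ValueError("Missing config: url")


def build_sources(entries):
    """Instantiate every enabled source listed in the parsed config."""
    return [
        SOURCE_REGISTRY[entry["type"]](entry.get("config", {}))
        for entry in entries
        if entry.get("enabled", True)
    ]


# Equivalent of the parsed "sources" list from documents_config.yml
entries = [{
    "type": "qld_vegetation",
    "enabled": True,
    "config": {"url": "https://www.qld.gov.au/environment/vegetation-management"},
}]
print([type(source).__name__ for source in build_sources(entries)])
```

&lt;p&gt;With this pattern a contributor only has to define the subclass in a module that gets imported; no central list of sources needs editing.&lt;/p&gt;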

&lt;h2&gt;
  
  
  Real-World Use Cases
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Environmental Impact Assessment Pre-screening
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;User:&lt;/strong&gt; Environmental consultant&lt;br&gt;
&lt;strong&gt;Query:&lt;/strong&gt; &lt;em&gt;"Do I need an environmental impact assessment to build a solar farm in regional NSW?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GreenGovRAG Output:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Summarizes relevant sections from NSW planning portal and EPBC Act&lt;/li&gt;
&lt;li&gt;Explains exemption criteria and thresholds&lt;/li&gt;
&lt;li&gt;Provides citations to official sources&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  2. Native Vegetation Clearing Rules
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;User:&lt;/strong&gt; Landowner in rural SA&lt;br&gt;
&lt;strong&gt;Query:&lt;/strong&gt; &lt;em&gt;"Can I clear native vegetation near Murray Bridge, SA?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GreenGovRAG Output:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Retrieves SA Government vegetation clearance policies&lt;/li&gt;
&lt;li&gt;Filters to Rural City of Murray Bridge LGA&lt;/li&gt;
&lt;li&gt;Returns allowed/disallowed activities and buffer zones&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  3. Emissions Reporting Compliance
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;User:&lt;/strong&gt; Sustainability advisor&lt;br&gt;
&lt;strong&gt;Query:&lt;/strong&gt; &lt;em&gt;"Which emissions standards apply to industrial zones in Greater Sydney?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GreenGovRAG Output:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Points to NSW EPA and federal requirements&lt;/li&gt;
&lt;li&gt;Suggests offsets or sustainable alternatives&lt;/li&gt;
&lt;li&gt;Links to energy incentive schemes&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Challenges &amp;amp; Lessons Learned
&lt;/h2&gt;
&lt;h3&gt;
  
  
  1. PDF Parsing is Hard
&lt;/h3&gt;

&lt;p&gt;Government PDFs come in every flavor: scanned images, multi-column layouts, tables, footnotes. I tried multiple libraries:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;PyMuPDF:&lt;/strong&gt; Fast but struggles with complex layouts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;pdfplumber:&lt;/strong&gt; Better table extraction but slower&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;llmsherpa (LayoutPDFReader):&lt;/strong&gt; Best for hierarchical documents but requires an external service&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; Hybrid approach - detect layout type, route to appropriate parser.&lt;/p&gt;
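&lt;p&gt;A minimal sketch of what such routing could look like follows. The &lt;code&gt;LayoutStats&lt;/code&gt; features and the decision rules are assumptions made for the example; only the three parser names come from the list above, and the function returns a name rather than calling any parser.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class LayoutStats:
    """Hypothetical layout features gathered in a cheap first pass over the PDF."""

    table_count: int
    column_count: int


def choose_parser(stats):
    """Return the name of the parser to use for a document with these features."""
    if stats.table_count:
        return "pdfplumber"  # better table extraction, at the cost of speed
    if stats.column_count != 1:
        return "llmsherpa"   # layout-aware reader for complex, hierarchical pages
    return "pymupdf"         # fast path for simple single-column text


print(choose_parser(LayoutStats(table_count=0, column_count=2)))
```

&lt;p&gt;Keeping the decision in one small function makes the rules easy to test and to adjust as new document types turn up.&lt;/p&gt;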
&lt;h3&gt;
  
  
  2. Metadata is Everything
&lt;/h3&gt;

&lt;p&gt;Early versions chunked documents naively (500 tokens, no context). This led to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Chunks without jurisdiction information&lt;/li&gt;
&lt;li&gt;Federal rules mixed with local bylaws&lt;/li&gt;
&lt;li&gt;No regulatory hierarchy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; LLM-based metadata tagging during ingestion:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;metadata&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extract_metadata&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;schema&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;jurisdiction&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Federal&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;State&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Local&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;state&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NSW&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;VIC&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;QLD&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SA&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;WA&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TAS&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ACT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lga&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;document_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Act&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Regulation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Policy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Guideline&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;topics&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Trust Scores Matter
&lt;/h3&gt;

&lt;p&gt;Users (especially consultants) need to &lt;strong&gt;trust&lt;/strong&gt; the answers for compliance reporting. I implemented trust scoring based on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Citation precision (does the answer match the source?)&lt;/li&gt;
&lt;li&gt;Regulatory hierarchy (citing Acts over guidelines)&lt;/li&gt;
&lt;li&gt;Recency (prioritize recent amendments)&lt;/li&gt;
&lt;li&gt;Jurisdiction alignment (federal &amp;gt; state &amp;gt; local)&lt;/li&gt;
&lt;/ul&gt;
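
&lt;p&gt;To make the idea concrete, here is a minimal sketch of how these four signals could be combined into one score. It is not the project's actual implementation: the weights, the rank tables, the &lt;code&gt;Source&lt;/code&gt; fields, and the 10-year recency horizon are all assumptions picked for illustration.&lt;/p&gt;

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical rank tables: a higher value means a more authoritative source.
HIERARCHY_RANK = {"Act": 1.0, "Regulation": 0.75, "Policy": 0.5, "Guideline": 0.25}
JURISDICTION_RANK = {"Federal": 1.0, "State": 0.75, "Local": 0.5}

# Hypothetical weights; they sum to 1 so the final score stays within 0..1.
WEIGHTS = {"citation": 0.4, "hierarchy": 0.25, "recency": 0.2, "jurisdiction": 0.15}


def clamp01(value: float) -> float:
    """Clamp a value into the range 0..1."""
    return min(1.0, max(0.0, value))


@dataclass
class Source:
    document_type: str         # e.g. "Act", "Regulation", "Policy", "Guideline"
    jurisdiction: str          # e.g. "Federal", "State", "Local"
    last_amended: date
    citation_precision: float  # 0..1, how closely the answer matches the cited text


def recency_score(last_amended: date, today: date, horizon_years: float = 10.0) -> float:
    """Linear decay from 1.0 (amended today) to 0.0 (horizon_years old or older)."""
    age_years = (today - last_amended).days / 365.25
    return clamp01(1.0 - age_years / horizon_years)


def trust_score(source: Source, today: date) -> float:
    """Weighted sum of the four signals; unknown categories contribute 0."""
    parts = {
        "citation": clamp01(source.citation_precision),
        "hierarchy": HIERARCHY_RANK.get(source.document_type, 0.0),
        "recency": recency_score(source.last_amended, today),
        "jurisdiction": JURISDICTION_RANK.get(source.jurisdiction, 0.0),
    }
    return sum(WEIGHTS[name] * value for name, value in parts.items())
```

&lt;p&gt;With this weighting, an Act scores higher than a Guideline when every other signal is equal. In practice the weights would need tuning against feedback from domain experts.&lt;/p&gt;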

&lt;h3&gt;
  
  
  4. GitHub Actions &amp;gt; Airflow for Production ETL
&lt;/h3&gt;

&lt;p&gt;I initially used Airflow for all ETL. It's great for local development (nice UI), but &lt;strong&gt;overkill for production&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Resource-heavy (needs a separate EC2 instance)&lt;/li&gt;
&lt;li&gt;Complex deployment&lt;/li&gt;
&lt;li&gt;Over-engineered for daily batch jobs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Switched to GitHub Actions&lt;/strong&gt; for production:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scheduled workflow (runs daily at 2 AM UTC)&lt;/li&gt;
&lt;li&gt;No infrastructure cost&lt;/li&gt;
&lt;li&gt;Easier to maintain&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I kept Airflow for local development only (Docker Compose with &lt;code&gt;--profile dev&lt;/code&gt;).&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Immediate priorities:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Complete React frontend (query interface, LGA map selection)&lt;/li&gt;
&lt;li&gt;Expand coverage to all Australian states/territories&lt;/li&gt;
&lt;li&gt;Real-time regulatory change monitoring (web scraping + notifications)&lt;/li&gt;
&lt;li&gt;Interactive compliance checklist generator&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Future enhancements:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;User authentication (OAuth2)&lt;/li&gt;
&lt;li&gt;Parcel-level geospatial queries (buffer zones, overlays)&lt;/li&gt;
&lt;li&gt;Multilingual support (Mandarin, Arabic for multicultural communities)&lt;/li&gt;
&lt;li&gt;Integration with government APIs (planning portals, LGA systems)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How You Can Help
&lt;/h2&gt;

&lt;p&gt;GreenGovRAG is &lt;strong&gt;open source and community-driven&lt;/strong&gt;. Ways to contribute:&lt;/p&gt;

&lt;h3&gt;
  
  
  For Developers
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Add new document source plugins (VIC, QLD, WA, TAS, NT regulations)&lt;/li&gt;
&lt;li&gt;Improve frontend UI/UX (React, TypeScript)&lt;/li&gt;
&lt;li&gt;Write integration tests&lt;/li&gt;
&lt;li&gt;Optimize vector search performance&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  For Domain Experts
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Validate query results and provide feedback&lt;/li&gt;
&lt;li&gt;Add your state/council's regulations to the ETL pipeline&lt;/li&gt;
&lt;li&gt;Suggest new use cases and features&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  For Advocates
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Share with planners, ESG analysts, and researchers&lt;/li&gt;
&lt;li&gt;Present at meetups or conferences&lt;/li&gt;
&lt;li&gt;Provide feedback via GitHub issues&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;GitHub Repo:&lt;/strong&gt; &lt;a href="https://github.com/sdp5/green-gov-rag" rel="noopener noreferrer"&gt;github.com/sdp5/green-gov-rag&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It Out
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Using Docker (Recommended)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/sdp5/green-gov-rag.git
&lt;span class="nb"&gt;cd &lt;/span&gt;green-gov-rag/deploy/docker
&lt;span class="nb"&gt;cp&lt;/span&gt; .env.example .env
&lt;span class="c"&gt;# Edit .env with your API keys (OpenAI, Anthropic, or AWS Bedrock)&lt;/span&gt;
docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Access:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Backend API: &lt;a href="http://localhost:8000/docs" rel="noopener noreferrer"&gt;http://localhost:8000/docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Frontend: &lt;a href="http://localhost:3000" rel="noopener noreferrer"&gt;http://localhost:3000&lt;/a&gt; (WIP)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Query Example (via API)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:8000/api/query &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "query": "Do I need an EIA for a solar farm in regional NSW?",
    "lga_name": "Dubbo Regional"
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
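
&lt;p&gt;The same request can be made from Python with only the standard library. This is a sketch: the endpoint and JSON fields are copied from the curl command above, while the function names &lt;code&gt;build_query_request&lt;/code&gt; and &lt;code&gt;ask&lt;/code&gt;, the timeout, and the assumption that the backend replies with a JSON object are mine.&lt;/p&gt;

```python
import json
from urllib.request import Request, urlopen


def build_query_request(query: str, lga_name: str,
                        base_url: str = "http://localhost:8000") -> Request:
    """Build the POST request without sending it."""
    payload = json.dumps({"query": query, "lga_name": lga_name}).encode("utf-8")
    return Request(
        f"{base_url}/api/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(query: str, lga_name: str, timeout: float = 30.0) -> dict:
    """Send the query to a locally running backend and decode the reply.

    Assumes the backend answers with a JSON object.
    """
    request = build_query_request(query, lga_name)
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

&lt;p&gt;With the Docker stack running, &lt;code&gt;ask("Do I need an EIA for a solar farm in regional NSW?", "Dubbo Regional")&lt;/code&gt; sends the same payload as the curl example.&lt;/p&gt;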



&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Building GreenGovRAG has been a journey in production RAG engineering: from ETL pipelines that don't break, to trust scoring systems that consultants can rely on, to geospatial filtering that makes sense for Australian regulations.&lt;/p&gt;

&lt;p&gt;The goal isn't to replace human expertise - it's to &lt;strong&gt;make that expertise more efficient&lt;/strong&gt;. Environmental consultants still need to verify answers, but now they can do in &lt;strong&gt;seconds&lt;/strong&gt; what used to take &lt;strong&gt;days&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If you're working on RAG systems, navigating regulatory compliance, or just interested in civic tech - I'd love to hear from you.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Let's make environmental compliance faster, smarter, and more accessible.&lt;/strong&gt; 🌏&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>llm</category>
      <category>datascience</category>
    </item>
  </channel>
</rss>
