DEV Community

Temitope

LucideCrawl: AI-Powered Web Ingestion and Phishing Detection API Built on Xano

Xano AI-Powered Backend Challenge: Public API Submission

This is a submission for the Xano AI-Powered Backend Challenge: Production-Ready Public API

What I Built

I built LucideCrawl, a production-ready public API that lets developers safely ingest and analyze web content at scale while protecting users from phishing, scams, and malicious sites.

LucideCrawl provides four core capabilities, all implemented entirely in Xano with robust authentication, per-user rate limiting, usage tracking, and audit logging:

  • Phishing & Safety Detection – AI-powered, real-time evaluation of URLs to detect scams, brand impersonation, urgency tactics, and other security risks.
  • Ask Questions About Web Pages – Extract clean content and answer natural language questions grounded in the page content. Perfect for RAG, summarization, or compliance checks.
  • Sitemap-Based Bulk Ingestion – Crawl all pages from a sitemap.xml with include/exclude path filtering.
  • Full Website Crawl – Depth-controlled crawling of entire websites with domain/path rules, delivering structured, clean data.

LucideCrawl is ideal for:

  • Browser extensions and email tools needing instant phishing detection
  • AI agents requiring safe, grounded web data
  • Knowledge platforms building search indexes or SEO audits
  • Security teams monitoring brand impersonation

All core logic, authentication, API key management, and rate limiting are built natively in Xano.


API Documentation

Base URL: https://xmmh-djbw-xefx.n7e.xano.io/api:x9tl6bvx

Authentication:

  • All endpoints require an x-api-key header.
  • API keys are generated upon signup and displayed only once. Users can manage them in their account.
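
As a minimal client-side sketch of the authentication scheme above, the Python below builds a POST request that carries the x-api-key header. It uses only the standard library. The helper names build_request and call_api and the 30-second default timeout are illustrative assumptions, not part of the LucideCrawl API.

```python
import json
import urllib.request

# Base URL as given in this documentation.
BASE_URL = "https://xmmh-djbw-xefx.n7e.xano.io/api:x9tl6bvx"


def build_request(endpoint: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST request.

    Hypothetical helper: the name and signature are illustrative only.
    """
    return urllib.request.Request(
        url=f"{BASE_URL}/{endpoint.lstrip('/')}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def call_api(endpoint: str, payload: dict, api_key: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    req = build_request(endpoint, payload, api_key)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

A call would then look like call_api("trust_scan", {"url": "https://example.com"}, "sk_your_key_here"). Deep crawls may need a much larger timeout than the default assumed here.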

Rate Limits (monthly, per user):

Plan         trust_scan   ask_the_page   load_sitemap   site_crawl
Free                 10             50              5            2
Pro                 100          5,000             50           20
Enterprise        1,000         50,000            500          200
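
To make the quota arithmetic concrete, here is a small Python sketch that encodes the limits above and derives a usage object from them. The PLAN_LIMITS dictionary and the usage_object function are hypothetical names for illustration. The real logic lives in Xano and may be structured differently.

```python
# Monthly per-user limits, copied from the table above.
PLAN_LIMITS = {
    "free": {"trust_scan": 10, "ask_the_page": 50, "load_sitemap": 5, "site_crawl": 2},
    "pro": {"trust_scan": 100, "ask_the_page": 5000, "load_sitemap": 50, "site_crawl": 20},
    "enterprise": {"trust_scan": 1000, "ask_the_page": 50000, "load_sitemap": 500, "site_crawl": 200},
}


def usage_object(plan: str, endpoint: str, used: int, month: str) -> dict:
    """Return month, used, limit, and remaining quota for one endpoint.

    Hypothetical helper; the field names mirror the usage object in the demo response.
    """
    limit = PLAN_LIMITS[plan][endpoint]
    return {
        "month": month,
        "used": used,
        "limit": limit,
        "remaining": max(limit - used, 0),  # never report a negative quota
    }
```

For example, a Pro user who has made 12 trust_scan calls in a month has 100 - 12 = 88 calls remaining.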

Core Endpoints:

  1. POST /trust_scan – AI-powered URL safety scan
    Input: { "url": "https://example.com" }
    Returns: safety_score, safety_label, confidence_level, phishing_category, impersonated_brand, detected_threats, risk_factors, details, and user_action_recommendation.

  2. POST /ask_the_page – Answer questions about a web page
    Input: { "url": "...", "question": "..." }
    Returns: Grounded AI answer with metadata.

  3. POST /load_sitemap – Bulk page ingestion from sitemap.xml
    Input: { "sitemap_url": "...", "include_paths": [...], "exclude_paths": [...] }
    Returns: Array of structured page data.

  4. POST /site_crawl – Depth-first crawl of a website
    Input:

   {
     "base_url": "https://example.com",
     "page_limit": 100,
     "crawl_depth": 3,
     "include_subdomains": false,
     "follow_external_links": false,
     "include_paths": ["/blog/", "/docs/"],
     "exclude_paths": ["/login", "/checkout"]
   }

Returns: Array of crawled pages in clean, structured format.

Each response includes a usage object with monthly consumption and remaining quota.
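
The sketch below illustrates how the site_crawl parameters could interact. It walks an in-memory link graph instead of fetching pages, so it runs offline. Several things here are assumptions rather than documented behavior: include_paths and exclude_paths are treated as path prefixes, exclusions take priority, the starting URL is always visited, and traversal is breadth-first for simplicity (the endpoint's actual order may differ). It corresponds to include_subdomains and follow_external_links both being false.

```python
from collections import deque
from urllib.parse import urlparse


def path_allowed(url, include_paths, exclude_paths):
    """Assumed semantics: prefix match, with exclusions taking priority."""
    path = urlparse(url).path or "/"
    if any(path.startswith(prefix) for prefix in exclude_paths):
        return False
    # An empty include list is assumed to mean "allow everything".
    return not include_paths or any(path.startswith(prefix) for prefix in include_paths)


def plan_crawl(base_url, links, crawl_depth, page_limit, include_paths=(), exclude_paths=()):
    """Return the URLs a depth-limited crawl would visit, in visiting order.

    `links` is a hypothetical dict mapping each URL to its outgoing links.
    """
    base_host = urlparse(base_url).netloc
    seen = {base_url}
    queue = deque([(base_url, 0)])
    visited = []
    while queue and len(visited) < page_limit:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= crawl_depth:
            continue  # depth budget spent: do not follow this page's links
        for target in links.get(url, []):
            if target in seen or urlparse(target).netloc != base_host:
                continue  # already queued, or a different host
            if not path_allowed(target, include_paths, exclude_paths):
                continue
            seen.add(target)
            queue.append((target, depth + 1))
    return visited
```

In this sketch page_limit caps the total number of pages visited, while crawl_depth caps how many links away from base_url the walk may go.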


Demo

Example: Phishing Detection

curl -X POST https://xmmh-djbw-xefx.n7e.xano.io/api:x9tl6bvx/trust_scan \
  -H "x-api-key: sk_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://paypal-security-update-2025.com/login"}'

Response (simplified):

{
  "success": true,
  "data": {
    "safety_score": 0.08,
    "safety_label": "Danger",
    "confidence_level": "high",
    "phishing_category": "financial",
    "impersonated_brand": "PayPal",
    "detected_threats": ["URGENT_ACTION_REQUIRED", "FAKE_LOGIN_FORM"],
    "details": "This page mimics PayPal's login interface and uses urgency tactics to steal credentials.",
    "user_action_recommendation": "Do not enter any information. Close immediately."
  },
  "usage": {
    "month": "2025-12",
    "used": 12,
    "limit": 100,
    "remaining": 88
  }
}
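
A client such as a browser extension might turn this response into a block/allow decision along the lines of the Python sketch below. The should_block function and the 0.5 threshold are illustrative assumptions. It also assumes that lower safety_score values mean higher risk, which matches the example above (0.08 paired with "Danger") but is not stated explicitly in the API documentation.

```python
def should_block(response: dict, threshold: float = 0.5) -> bool:
    """Decide whether to block navigation based on a trust_scan response.

    Hypothetical helper; the threshold is an assumed cutoff, not an API value.
    """
    if not response.get("success"):
        # Assumption: fail closed when the scan itself did not succeed.
        return True
    data = response.get("data", {})
    if data.get("safety_label") == "Danger":
        return True
    # Assumption: a missing score is treated as maximally risky (0.0).
    return data.get("safety_score", 0.0) < threshold
```

Failing closed is a design choice for security-sensitive clients. A tool that prefers availability could return False when the scan fails and show a warning instead.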

The AI Prompt I Used

  • “Create a secure Xano API endpoint with API key authentication, per-user rate limiting, usage logging, and history tracking.”
  • “Build a trust_scan endpoint with structured AI output: safety_score, safety_label, confidence, threats, and recommendations.”
  • “Generate endpoints for page Q&A, sitemap ingestion, and full site crawling with consistent security and usage patterns.”

The AI produced strong foundations that I then refined for production readiness.


How I Refined the AI-Generated Code

Key improvements:

  • Robust header handling: Case-insensitive x-api-key detection
  • Per-endpoint rate limiting: Separate usage tracking for each endpoint
  • Atomic usage counting: Prevents accidental overcharging on failed requests
  • Comprehensive history tables: For auditing and dashboard support
  • Plan-based dynamic limits: Free, Pro, Enterprise
  • Long operation timeouts: Up to 600s for deep crawls
  • Consistent response format: Always { success, data, usage }

These refinements ensure fairness, transparency, and a developer-friendly API.
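
Three of these refinements can be sketched in a few lines of Python: case-insensitive header lookup, not charging for failed requests, and the fixed { success, data, usage } envelope. This is an illustration under stated assumptions, not the Xano function stack itself. The names get_api_key and handle and the shape of the error payload are hypothetical, and the sketch does not show atomicity, which would depend on how the database applies the counter update.

```python
from typing import Callable, Optional


def get_api_key(headers: dict) -> Optional[str]:
    """Find the x-api-key header regardless of how its name is cased."""
    for name, value in headers.items():
        if name.lower() == "x-api-key":
            return value
    return None


def handle(headers: dict, used: int, limit: int, process: Callable[[], dict], month: str) -> dict:
    """Check the key, check the quota, run the work, and charge only on success."""

    def envelope(success: bool, data: dict, used_now: int) -> dict:
        usage = {"month": month, "used": used_now, "limit": limit,
                 "remaining": max(limit - used_now, 0)}
        return {"success": success, "data": data, "usage": usage}

    if get_api_key(headers) is None:
        return envelope(False, {"error": "missing API key"}, used)
    if used >= limit:
        return envelope(False, {"error": "monthly quota exceeded"}, used)
    try:
        data = process()
    except Exception as exc:
        # Failed request: the usage counter is left unchanged.
        return envelope(False, {"error": str(exc)}, used)
    return envelope(True, data, used + 1)
```

Because every branch returns through the same envelope function, callers can always read success and usage without checking which path was taken.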


My Experience with Xano

Xano made it possible to build a fully featured, secure, public-facing API in days.

Highlights:

  • Visual function stack for complex workflows (auth → rate limit → process → log → respond)
  • Instant testing and real-time logs
  • Powerful api.request integration
  • Native authentication and database support

LucideCrawl is now live, helping developers build safer, smarter web-powered applications.
