Ioan G. Istrate

Posted on • Originally published at blog.tripvento.com

Zero Public Ports: How I Secured my B2B API Against 10k Scraper Requests

I built security for "no one's looking." The open web is "everyone's looking." This is the post-mortem of closing that gap.

The thesis: identity + perimeter shift + observability beats reactive blocking.

TL;DR: Profile traffic first, then move your perimeter to the edge. I added request logging with traffic-source tagging, migrated to DB-backed API keys with tiered responses, enforced monthly quotas and burst throttles, blocked cloud-ASN scraper traffic at Cloudflare, whitelisted valid API paths, then closed the last big hole with a Cloudflare Tunnel that leaves zero public HTTP ports. A scheduled anomaly detector watches for abuse patterns that slip through prevention.


Two weeks after deploying Tripvento's API to a DigitalOcean droplet, I opened Django Admin and found 10,000 requests from a cluster of AWS EC2 IPs hammering my rankings endpoint with a public-facing key. A vulnerability scanner from a different IP had been probing for .env files, secrets.json, debug endpoints, and a bunch of fintech routes that don't exist on my server. Somebody had found me.

The good news? The scraper only got the top 10 results per query. My default pagination served as an unintentional safety margin, because they'd have needed to paginate to get everything, and apparently they didn't bother. Conservative defaults are your first line of defense when a system is under-documented.

The bad news? My API was wide open on ports 80 and 443, serving traffic directly through Nginx with no WAF, no tunnel, and authentication that amounted to a single API key checked against environment variables.

Here's every layer of defense I built over the next two weeks, in the order I built them, including the two times I accidentally blocked my own infrastructure. Think of it as four phases: observe, identify, constrain, then remove the attack surface entirely.


Layer 1: See Everything First — The Request Logger

You can't defend what you can't see. Before blocking anything, I needed to know who was hitting what, how fast, and from where.

I added a model that captures every API request, the key that made it, the endpoint, the client IP, status code, response time, and a classification field for traffic source:

from django.db import models


class APIRequestLog(models.Model):
    key = models.ForeignKey('APICredential', on_delete=models.SET_NULL, null=True, blank=True)
    path = models.CharField(max_length=255)
    client_ip = models.GenericIPAddressField(null=True)
    method = models.CharField(max_length=10, default='GET')
    status = models.IntegerField(null=True)
    latency_ms = models.IntegerField(null=True)
    traffic_source = models.CharField(max_length=30, blank=True, default='')
    timestamp = models.DateTimeField(auto_now_add=True)

    class Meta:
        indexes = [
            models.Index(fields=['key', 'timestamp']),
            models.Index(fields=['client_ip', 'timestamp']),
            models.Index(fields=['timestamp']),
        ]

The indexes matter because without them, admin queries on tens of thousands of rows crawl. The traffic_source field was a later addition that turned out to be critical: it classifies internal traffic by function so the abuse detector doesn't flag your own systems.

The middleware that populates it extracts the real client IP through the proxy chain. If you're behind Cloudflare and Nginx, the IP your Django app sees is the proxy IP, not the actual client. You need to read the right headers in the right order:

# Extract real IP through the proxy chain
ip = (
    request.META.get('HTTP_<YOUR_CDN_REAL_IP_HEADER>') or
    request.META.get('HTTP_X_FORWARDED_FOR', '').split(',')[0].strip() or
    request.META.get('HTTP_X_REAL_IP') or
    request.META.get('REMOTE_ADDR')
)

The specific header name depends on your CDN — Cloudflare, AWS CloudFront, and Fastly all use different ones. Check your provider's docs. The important thing is: read the CDN's real-IP header first, fall back to X-Forwarded-For (take only the first hop), then X-Real-IP, then REMOTE_ADDR as last resort.

Critical safety note: only trust these headers if your origin isn't publicly reachable or you restrict inbound traffic to known proxy IPs. If your server accepts direct connections, an attacker can send a spoofed X-Forwarded-For: 1.2.3.4 and bypass your IP-based throttles and logging entirely. I fix this later with the tunnel and firewall (Layers 5-6), but if your origin is still public, use Nginx's real_ip module with set_real_ip_from restricted to your CDN's IP ranges.
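If your origin must stay publicly reachable for now, that real_ip fallback looks roughly like this (Cloudflare shown as the CDN; the two ranges are examples only, pull the full, current list from Cloudflare's published IP ranges):

```nginx
# Trust CF-Connecting-IP only when the connection actually came from Cloudflare.
# Repeat set_real_ip_from for every published Cloudflare range (abbreviated here).
set_real_ip_from 173.245.48.0/20;
set_real_ip_from 103.21.244.0/22;
real_ip_header CF-Connecting-IP;
```

With this in place, Nginx rewrites $remote_addr to the real client IP only for connections arriving from trusted proxy ranges; a spoofed header from a direct connection is ignored.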

That header-reading snippet looks simple but caused a real problem early on. Before I configured Nginx to forward the right headers, every request showed 127.0.0.1 as the source IP. I was completely blind to who was actually hitting the API.

The Nginx fix — forward the CDN's real client IP header through to your app:

location / {
    proxy_pass http://127.0.0.1:8000;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $http_<cdn_real_ip_header>;
    proxy_set_header X-Forwarded-For $http_<cdn_real_ip_header>;
    proxy_set_header X-Forwarded-Proto $http_x_forwarded_proto;
}

The key insight: use your CDN's header variable instead of $remote_addr. When you're behind a CDN proxy, $remote_addr is the CDN's edge IP, not the client's. Without this, your throttling treats all CDN traffic as one user and your logs are useless. (Note: Nginx converts HTTP header hyphens to underscores in variable names, so a header like CF-Connecting-IP becomes $http_cf_connecting_ip.) In a mixed or publicly reachable origin setup, you should append to X-Forwarded-For using $proxy_add_x_forwarded_for instead of overwriting it. In my case, the tunnel and firewall guarantee a single trusted hop, so overwriting is safe.

For internal traffic, the middleware classifies by custom headers or request properties to separate your own services from customer traffic:

traffic_source = ''
if is_internal_key:
    if is_mcp_request(request):
        traffic_source = 'ai_agent'
    elif is_warmup_request(request):
        traffic_source = 'cache_warmer'
    elif is_seo_path(request.path):
        traffic_source = 'seo_pipeline'
    else:
        traffic_source = 'internal'

How you detect each source is up to you: custom User-Agent strings, specific headers, URL path patterns. The point is to tag them. If you log everything into one bucket, external scrapers, your own AI agent, your cache warmers, and your SEO pipeline all blur together, and your abuse detection becomes useless because 90% of your traffic is yourself.
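As a concrete (hypothetical) example of those detection helpers, classification can be a pure function over the request headers. The header names and User-Agent strings below are conventions you'd define for your own internal services, not anything the post specifies:

```python
# Illustration only: header names and UA prefixes are hypothetical conventions
# you would set on your own internal services.
def classify_internal_traffic(headers: dict) -> str:
    """Tag internal requests by function so abuse detection can skip them."""
    ua = headers.get("User-Agent", "")
    if headers.get("X-MCP-Client"):                  # set by the MCP server
        return "ai_agent"
    if ua.startswith("tripvento-cache-warmer"):      # custom warmer UA
        return "cache_warmer"
    if headers.get("X-SEO-Pipeline"):                # set by the SEO jobs
        return "seo_pipeline"
    return "internal"
```

Keeping it a pure function over a dict makes it trivial to unit-test without spinning up a Django request.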


Layer 2: API Key Authentication — Identity as Infrastructure

The original auth was three environment variables compared in an if/elif:

if api_key == internal_key:
    tier = 'internal'
elif api_key == paying_key:
    tier = 'paid'
elif api_key == public_key:
    tier = 'public'

This doesn't scale. You can't rotate keys without redeploying. You can't track per customer usage. You can't revoke a compromised key without taking down every customer on that tier.

I moved to database backed keys with a tier system. Each key has a tier, monthly usage tracking, an active flag for instant revocation, and per key CORS origins:

from datetime import date

from django.db import models


class APICredential(models.Model):
    TIER_CHOICES = [
        ('free', 'Free'),
        ('pro', 'Pro'),
        ('business', 'Business'),
        ('internal', 'Internal'),
    ]

    # Stores the SHA-256 hash of the key, never the raw key
    secret = models.CharField(max_length=64, unique=True, db_index=True)
    tier = models.CharField(max_length=20, choices=TIER_CHOICES)
    label = models.CharField(max_length=100)
    is_active = models.BooleanField(default=True)
    request_count = models.IntegerField(default=0)
    period_start = models.DateField(default=date.today)
    allowed_origins = models.JSONField(default=list, blank=True)

Hash your keys like passwords. I store a SHA-256 hash of each key in the database, not the raw key. When a request comes in, the auth class hashes the provided key and looks up the hash. If the database leaks (SQL injection, an exposed backup, whatever), the attacker gets hashes, not live credentials. The tradeoff is that you can only show the raw key once, at creation time. Customers who lose their key need a rotation, not a lookup. Same pattern as Stripe, AWS, and every serious API key system.

If you use Stripe webhooks for key provisioning, hashing creates a race condition. Stripe fires the webhook the instant payment completes, often before the browser even redirects to your thank-you page. If the webhook creates and hashes the key, the thank-you page finds an existing key but can only show the hash, not the raw key. The fix is to not let the webhook create keys. Let the thank-you page be the sole provisioner: it's the only code path that can display the raw key to the customer before hashing. The webhook becomes a logging safety net: if the thank-you page never fires (customer closed the tab), you see it in the logs and provision manually.

Tiers define both monthly request caps and per minute burst limits. Free tier gets a low cap with a tight burst. Paid tiers scale up. Internal keys get unlimited monthly but still have burst limits because even your own infrastructure shouldn't be able to accidentally DDoS your API.
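The shape of that tier table, with made-up numbers since the post doesn't publish Tripvento's real limits:

```python
# Hypothetical numbers; substitute your own pricing tiers.
TIER_LIMITS = {
    #  tier        (monthly cap, burst per minute)
    "free":      (1_000,   10),
    "pro":       (50_000,  60),
    "business":  (250_000, 120),
    "internal":  (None,    300),   # unlimited monthly, still burst-limited
}

def limits_for(tier: str) -> tuple:
    """Unknown tiers fall back to the most restrictive limits."""
    return TIER_LIMITS.get(tier, TIER_LIMITS["free"])
```

Falling back to the free tier for unknown strings means a typo in a key's tier field fails closed instead of open.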

The authentication class does a single DB lookup and passes the key object downstream so the throttle and middleware don't need redundant queries:

from rest_framework import authentication, exceptions


class KeyAuthentication(authentication.BaseAuthentication):
    def authenticate(self, request):
        raw_key = (
            request.META.get('HTTP_X_API_KEY') or
            request.query_params.get('api_key')
        )

        if not raw_key:
            return None

        try:
            key_hash = hash_credential(raw_key)
            credential = APICredential.objects.get(secret=key_hash, is_active=True)
        except APICredential.DoesNotExist:
            raise exceptions.AuthenticationFailed('Invalid API Key')

        return (
            KeyUser(tier=credential.tier),
            {'tier': credential.tier, 'credential': credential}
        )

Returning None from authenticate() in DRF means "no authentication attempted," not "anonymous but allowed." Endpoints that require a key enforce it at the permission layer, a separate APIKeyPermission class checks whether authentication succeeded and returns 403 if not.

Key generation uses a random hex token with a short prefix unique to your service. The prefix is cosmetic but useful because when you see one in a log or environment variable, you know immediately it's yours versus some other credential.
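A minimal sketch of generation plus hashing, assuming a hypothetical `tv_` prefix and the SHA-256 scheme described above:

```python
import hashlib
import secrets

PREFIX = "tv_"  # hypothetical service prefix; purely cosmetic but instantly recognizable

def generate_api_key() -> tuple:
    """Return (raw_key, sha256_hash). Show raw_key once, store only the hash."""
    raw_key = PREFIX + secrets.token_hex(24)
    return raw_key, hashlib.sha256(raw_key.encode()).hexdigest()

def hash_credential(raw_key: str) -> str:
    """Hash an incoming key the same way for the DB lookup."""
    return hashlib.sha256(raw_key.encode()).hexdigest()
```

`secrets.token_hex` is the right source here (cryptographic randomness), and the 64-character hex digest matches the `max_length=64` on the model's secret field.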

Separate what each tier can see. This is a data exfiltration control, not just an API design choice. My public-facing keys and paying clients hit serializers that return clean, documented fields. Internal keys hit richer serializers with raw signal data and metadata I need for building programmatic pages. Same endpoints, different response shapes selected by tier. If someone reverse engineers the public API, they see the documented shape. The internal fields that power my infrastructure never leave the server on a client request. Internal keys are never distributed to clients (obviously); they're used exclusively server-to-server between my own infrastructure, and are additionally restricted by origin.
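The idea reduces to field filtering. A minimal sketch with hypothetical field names, using a dict filter in place of the actual DRF serializer classes:

```python
# Hypothetical field sets; the real implementation selects DRF serializer
# classes per tier, but the effect is the same.
PUBLIC_FIELDS = {"hotel_id", "name", "score", "rank"}
INTERNAL_FIELDS = PUBLIC_FIELDS | {"raw_signals", "pipeline_metadata"}

def shape_response(record: dict, tier: str) -> dict:
    """Strip internal-only fields before anything leaves the server."""
    allowed = INTERNAL_FIELDS if tier == "internal" else PUBLIC_FIELDS
    return {k: v for k, v in record.items() if k in allowed}
```

The key property: the filter runs server-side, so even a compromised public key can never see `raw_signals`.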


Layer 3: Two Layer Throttling — Monthly Quotas + Burst Protection

Rate limiting needs two dimensions: monthly quotas (business logic) and per minute bursts (abuse protection). They serve different purposes and fail differently.

Monthly throttling is DB backed. It checks the key's usage count against its tier limit and increments atomically. I use select_for_update() to prevent race conditions on concurrent requests:

from rest_framework.throttling import BaseThrottle


class MonthlyQuotaThrottle(BaseThrottle):
    def allow_request(self, request, view):
        # request.auth is None when no key was presented
        credential = request.auth.get('credential') if request.auth else None
        if not credential:
            return True

        allowed, usage_info = credential.check_and_increment()

        if usage_info:
            # Attach for X-RateLimit-* response headers
            request._rate_limit_info = usage_info

        return allowed

The usage counter resets lazily: rather than a cron resetting all counters at a fixed time, each key's counter resets on its own schedule. This avoids a thundering herd of resets hitting your database simultaneously.
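The lazy-reset decision, pulled out as plain Python so it's testable in isolation. This is a sketch of the logic, not the actual check_and_increment (field names mirror the model above; a `None` limit means unlimited):

```python
from datetime import date

def rolled_over(period_start: date, today: date) -> bool:
    """Reset happens lazily: only when this key's next request arrives in a
    new month, not via a global cron."""
    return (today.year, today.month) != (period_start.year, period_start.month)

def apply_request(count: int, period_start: date, today: date, monthly_limit):
    """Return (allowed, new_count, new_period_start) for one request."""
    if rolled_over(period_start, today):
        count, period_start = 0, today.replace(day=1)
    if monthly_limit is not None and count >= monthly_limit:
        return False, count, period_start
    return True, count + 1, period_start
```

In the real code this runs inside the select_for_update() transaction so the read-check-increment is atomic.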

Scaling note: select_for_update() locks a DB row on every request. That's fine at my current traffic, but at millions of requests it becomes a bottleneck and if you naïvely add row locks in multiple code paths, you can deadlock. Keep the locking in one place, always lock the same table in the same order, and wrap it in a tight transaction. The next step is moving the counter to Redis with INCR (atomic, no locking, sub millisecond) and syncing back to Postgres periodically for billing accuracy. For now, the DB lock is simpler and correct.

Burst throttling is cache backed (Redis) for speed:

from django.core.cache import cache
from rest_framework.throttling import BaseThrottle


class BurstThrottle(BaseThrottle):
    def get_cache_key(self, request):
        # Key off the DB primary key, not the raw API key string
        credential = request.auth.get('credential') if request.auth else None
        if credential:
            return f"burst:{credential.pk}"
        # Fall back to client IP for unauthenticated requests
        return f"burst:ip:{get_client_ip(request)}"

    def allow_request(self, request, view):
        limit = self.get_tier_limit(request)
        key = self.get_cache_key(request)
        current = cache.get(key, 0)

        if current >= limit:
            return False

        try:
            cache.incr(key)
        except ValueError:
            # Key missing or expired: start a fresh 60-second window
            cache.set(key, 1, 60)

        return True

A subtle bug I hit early: the burst cache key originally used the first N characters of the raw API key string. If your keys share a prefix, you get cache key collisions and separate keys share burst counters. Switching to the database primary key made each key completely independent.

One more thing: cache backend behavior varies. The incr/ValueError pattern above works with Django's Redis and Memcached backends, but other backends may behave differently on missing keys. If you're using Redis directly, the cleaner pattern is INCR (which auto-creates the key) followed by EXPIRE on the first increment. Test with your actual backend. This approach is good enough for abuse protection, not billing accuracy; a slight over-allowance under a race condition is acceptable.
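To make the fixed-window behavior concrete, here's an in-memory stand-in for the INCR-plus-EXPIRE pattern. Illustration only, not a Redis client; the injected clock exists purely so the windowing is testable:

```python
import time

class FixedWindowLimiter:
    """In-memory analogue of the Redis INCR + EXPIRE pattern.
    Good enough for abuse protection, not billing accuracy."""

    def __init__(self, limit: int, window: int = 60, clock=time.monotonic):
        self.limit, self.window, self.clock = limit, window, clock
        self._buckets = {}  # key -> (window_start, count)

    def allow(self, key: str) -> bool:
        now = self.clock()
        start, count = self._buckets.get(key, (now, 0))
        if now - start >= self.window:        # window expired: EXPIRE analogue
            start, count = now, 0
        if count >= self.limit:
            return False
        self._buckets[key] = (start, count + 1)  # INCR analogue
        return True
```

Note the same billing caveat applies: a fixed window allows up to 2x the limit across a window boundary, which is fine for abuse protection.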


Layer 4: Cloudflare WAF — Block the Clouds

The 10k scraper requests came from AWS EC2 IPs. Most legitimate API consumers don't call your API from disposable cloud instances without coordination; they call from their own servers, which have their own ASNs. Bots and scrapers rent cheap cloud VMs.

How I found which ASNs to block: the request logger showed me. I could filter by status code, sort by IP frequency, and see exactly who was hammering the API. A quick ASN lookup (bgp.tools or ipinfo.io) on the top offending IPs told me they were all cloud infrastructure. The bulk scraper was AWS. The vulnerability scanner probing for .env files and fintech endpoints? OVH, a French hosting provider (AS16276). Once you see the pattern, you block the ASN instead of playing whack-a-mole with individual IPs.

Two WAF rules:

Rule 1: Block cloud infrastructure ASNs

In Cloudflare's WAF custom rules, you can match on ip.src.asnum. The expression is straightforward; you just chain the ASNs with or:

(ip.src.asnum eq <AWS_ASN>) or (ip.src.asnum eq <GCP_ASN>) or (ip.src.asnum eq <AZURE_ASN>)

Major providers to consider blocking: AWS has multiple ASNs for different regions and legacy services. Google Cloud has a primary and secondary ASN. Azure has its own. And don't forget the budget hosting providers: OVH, Hetzner, DigitalOcean (yes, you might need to block your own cloud provider's ASN if scrapers are renting boxes there).

You can find any IP's ASN at bgp.tools: just paste the IP and it shows the network. Build your block list from what your request logger tells you, not from a generic list.

Warning: Some cloud provider ASNs also cover legitimate services. Google's cloud ASN covers Googlebot. I haven't had indexing issues, but monitor Search Console if you add broad ASN blocks. More importantly, some of your legitimate customers might call your API from AWS or GCP instances: their integration servers, their Lambda functions, their Cloud Run services. ASN blocking is brutally effective against commodity scrapers, but it's a product decision, not a pure security win. If you start onboarding B2B customers, you may need to whitelist specific IPs or move to a more targeted approach.

Funny mistake #1: This rule blocked my own Vercel cache warmers. Vercel runs on AWS. I deployed the rule, saw my cache warming jobs start failing, and had to scramble to figure out why. The fix was adjusting rule priority, but I ended up solving this differently with the tunnel (Layer 5).

Rule 2: API endpoint whitelist

Instead of blocking bad paths, I only allow known good ones. The WAF rule blocks any request to my API hostname where the path doesn't match my whitelist of valid endpoints.

I'm not going to share the exact expression because that's literally my API surface area, but the approach is: list every valid path prefix your API serves, and block everything else. In Cloudflare's expression language, this looks like a compound rule matching http.host and using starts_with() on http.request.uri.path with not logic: if the path doesn't start with any of your known prefixes, block it.

This is what killed the vulnerability scanner. All those probes to .env, secrets.json, debug endpoints, and fintech routes now get blocked at the edge before they ever reach my server.

Funny mistake #2: I turned on Cloudflare's Bot Fight Mode thinking it would help. It killed my MCP server communication. Cloudflare classified my MCP server's HTTP requests as bot traffic and started serving CAPTCHAs to programmatic API calls. Turned that off immediately.


Layer 5: Cloudflare Tunnel — Zero Public Ports

This is the force multiplier, the single change that delivered a 10x security improvement for 1x effort. It essentially eliminated the need for a complex firewall because I moved the perimeter from my server to Cloudflare's edge. Instead of exposing HTTP ports to the internet, I set up a Cloudflare Tunnel (cloudflared) that creates an outbound-only connection from my server to Cloudflare's edge.

The concept is simple: instead of Cloudflare connecting to your server (which requires open ports), your server connects out to Cloudflare and holds that connection open. Cloudflare routes incoming requests back through the established tunnel. Your server never accepts inbound connections.

The config points your hostname at your local app server, and a catch all returns 404 for anything else:

# /etc/cloudflared/config.yml
tunnel: <your-tunnel-id>
credentials-file: /path/to/credentials.json

ingress:
  - hostname: api.yourdomain.com
    service: http://127.0.0.1:8000
  - service: http_status:404

The traffic flow becomes:

Internet → Cloudflare Edge → Cloudflare Tunnel (outbound) → Your App (localhost:8000)

No inbound connections. No public ports. If someone port scans your server's IP, they find nothing.

The tunnel is powerful because you pair it with closing inbound ports. A tunnel alone doesn't help if your origin is still publicly reachable because attackers will just bypass Cloudflare and hit your IP directly. And even with zero public ports, you still need outbound controls, OS patching, and least privilege access. The tunnel eliminates the biggest attack surface, but it's not a substitute for everything else.

Setup is four commands: create the tunnel, route your DNS to it, install as a systemd service, enable and start. Before enabling the tunnel, delete your existing A record in DNS; the tunnel creates a CNAME that points to Cloudflare's tunnel infrastructure instead.
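Roughly, those four commands look like this (tunnel name and hostname are placeholders; check Cloudflare's tunnel docs for current flags and defaults):

```shell
cloudflared tunnel create my-api-tunnel                          # 1. create the tunnel
cloudflared tunnel route dns my-api-tunnel api.yourdomain.com    # 2. point DNS at it
sudo cloudflared service install                                 # 3. install as a systemd service
sudo systemctl enable --now cloudflared                          # 4. enable and start
```

The service install step reads the config at /etc/cloudflared/config.yml shown above.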


Layer 6: Firewall — Lock It Down

With the tunnel handling all HTTP traffic, your web-facing ports are unnecessary. Drop them:

sudo ufw delete allow 80
sudo ufw delete allow 443
sudo ufw reload

After the tunnel is confirmed working, your firewall should only allow what's strictly necessary for server administration. Everything else is closed. Your app server listens on localhost only, reachable through the tunnel but invisible from outside.

I also burned the old IP address. Since the server's IP was in every scanner's target list from the weeks it was publicly exposed, I requested a new IP from DigitalOcean. The old address is dead, the new one has zero public facing services.

While rotating the IP, I also rotated every credential that had touched the old server: database passwords, API keys, SSH keys, Django secret key. If any of those had been exfiltrated during the weeks the server was exposed (unlikely, but possible), the rotated credentials make them worthless. Treat an IP rotation as a full credential rotation: if the address is compromised enough to burn, assume everything on that box might be too.


Layer 6.5: Don't Publish Your Attack Surface

This one is easy to miss. Django REST Framework with drf-spectacular auto-generates Swagger/Redoc documentation from your viewsets. By default, it documents everything — including endpoints you don't want public.

My Swagger docs were exposing internal endpoint structures, webhook URLs, and stats endpoints. Anyone with the docs URL could see the complete API surface area.

The fix: use drf-spectacular's @extend_schema(exclude=True) on any viewset or endpoint you don't want in public docs. Internal infrastructure, webhooks, and admin facing endpoints get excluded entirely. The public docs show only what a paying customer needs to integrate.

Alternatively, serve docs behind authentication so only logged-in users can see the full API schema. But exclusion is simpler: if a customer doesn't need to call it, it shouldn't be in their docs.

You can also configure drf-spectacular with different SPECTACULAR_SETTINGS per environment; disable the Swagger/Redoc UI entirely in production while still generating the OpenAPI schema for internal CI/CD tools and testing.


Layer 7: The Anomaly Detector

All the layers above are preventive. The anomaly detector is reactive: it runs on a schedule and analyzes request log patterns looking for three things:

1. IP level abuse on public keys. Any single IP making an unusually high number of requests on a free tier key in a short window gets flagged. This catches scrapers who found a public key and are harvesting data.

2. Fast burn on paid keys. If a key burns through a large percentage of its monthly quota in a single day, something is wrong: either a bug in the customer's integration, a leaked key, or intentional abuse. Flag it before the customer hits their limit and calls support.

3. High error rates. Any key with a disproportionate number of 4xx/5xx responses in a short window. A legitimate integration has low error rates. A scanner probing random endpoints has very high error rates. This catches the vulnerability scanners that somehow got a valid key.

The implementation is a Django management command that queries the request log table with time windowed aggregations. Here's the general shape:

# Concept — not the actual implementation
from datetime import timedelta

from django.db.models import Count
from django.utils import timezone

THRESHOLD = 500            # example values; tune to your traffic
BURN_RATE_THRESHOLD = 0.5

one_hour_ago = timezone.now() - timedelta(hours=1)
one_day_ago = timezone.now() - timedelta(days=1)

# Flag IPs with abnormal request volume on public keys
suspicious_ips = (
    APIRequestLog.objects
    .filter(key__tier='free', timestamp__gte=one_hour_ago)
    .values('client_ip')
    .annotate(total=Count('id'))
    .filter(total__gte=THRESHOLD)
)

# Flag keys burning quota too fast
for key in active_keys:
    daily_usage = APIRequestLog.objects.filter(
        key=key, timestamp__gte=one_day_ago
    ).count()
    if daily_usage >= key.monthly_limit * BURN_RATE_THRESHOLD:
        flag(key, 'fast_burn')
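The third check, error rates, follows the same shape. Here it is as plain Python over (key, status) pairs with illustrative thresholds, rather than the actual ORM query:

```python
from collections import defaultdict

def high_error_keys(rows, min_requests=50, max_error_ratio=0.5):
    """rows: iterable of (key_id, status). Flag keys whose share of 4xx/5xx
    responses is disproportionate -- a scanner probing random endpoints
    looks exactly like this, while a real integration has low error rates."""
    totals, errors = defaultdict(int), defaultdict(int)
    for key_id, status in rows:
        totals[key_id] += 1
        if status >= 400:
            errors[key_id] += 1
    return {
        k for k, n in totals.items()
        if n >= min_requests and errors[k] / n >= max_error_ratio
    }
```

The min_requests floor matters: without it, a key with one request and one 404 gets flagged every run.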

It runs on a cron schedule. One critical gotcha: if you're activating a Python virtualenv in your cron command, make sure you set SHELL=/bin/bash at the top of your crontab. Without it, cron uses /bin/sh which doesn't support source, and your jobs silently fail. I spent an embarrassing amount of time debugging that one.
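For example (the management command name, paths, and schedule here are hypothetical):

```
SHELL=/bin/bash
# Hourly run; without SHELL=/bin/bash, /bin/sh chokes on `source` and the job silently fails.
0 * * * * source /srv/app/venv/bin/activate && python /srv/app/manage.py detect_anomalies >> /var/log/anomaly.log 2>&1
```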

What this doesn't do yet: auto revoke keys or send alerts. Right now it writes to a log file. In the near future the plan is to wire up email or Slack notifications. For now, this is honest about where the system ends.


Bonus: The Admin Tarpit

Django's /admin/ is a well known attack vector. Every scanner probes it. Instead of just blocking it at the WAF (which I do), I had an idea for a little revenge:

import time
from django.http import StreamingHttpResponse

def admin_tarpit(request):
    def slow_bleed():
        while True:
            yield b" "
            time.sleep(10)

    return StreamingHttpResponse(slow_bleed(), content_type="text/plain")

Move the real admin to a secret URL. Put this tarpit at /admin/. Any scanner that hits it gets a connection that never closes: it receives one byte every 10 seconds, tying up their resources instead of yours. Most scanners will hang for minutes before timing out.

Important caveat: don't run this in Django. If you're on Gunicorn or uWSGI with a fixed worker pool, a few dozen concurrent scanners hitting the tarpit will exhaust your workers and take down the actual API. Move tarpit logic to the edge: an Nginx limit_req with a trickle response, a Cloudflare Worker, or a dedicated lightweight process. Let the edge absorb the slow connections so your Django workers stay focused on serving real requests. Consider this defensive friction, not a security control: it wastes attacker resources, but it doesn't protect anything on its own.

You could also go the honeypot route: serve a fake login page at /admin/ and log every credential pair that gets submitted. Now your scanner is giving you intelligence instead of the other way around.


The Full Stack

Here's every layer, bottom to top:

| Layer | What | Why |
| --- | --- | --- |
| Firewall | Minimal open ports | Nothing to connect to |
| Cloudflare Tunnel | Outbound-only connection | Server is invisible to port scans |
| Cloudflare WAF | ASN blocking + endpoint whitelist | Cloud scrapers and vuln scanners die at the edge |
| Nginx | CDN header forwarding, dot-file blocking | Real IPs reach Django, .env probes get 444 |
| Django Auth | DB-backed API keys with tiers | Every request has an identity |
| Serializer Separation | Different response shapes per tier | Internal fields never leak to client requests |
| Monthly Throttle | DB-backed per-key quotas | Business logic enforcement |
| Burst Throttle | Cache-backed per-minute limits | Abuse protection |
| Request Logger | Every API call logged with source classification | Visibility into everything |
| Anomaly Detector | Scheduled job analyzing log patterns | Catches what the preventive layers miss |
| Docs Lockdown | Exclude internal endpoints from Swagger | Don't hand attackers your API map |
| Credential Rotation | Rotate IPs, passwords, keys together | Burn the old, start clean |
| Security Headers | HSTS, X-Content-Type-Options, X-Frame-Options | Free points on vendor security audits |

What I Learned

Start with logging, not blocking. My first instinct was to block the scrapers. But without logs, I would have blocked them and then had no idea whether the blocking worked, or what else was hitting me. The request logger was the single highest-value piece, and everything else was informed by what it showed me.

Your own infrastructure is your first adversary. I blocked my Vercel cache warmers with the ASN rule. I killed my MCP server with Bot Fight Mode. Both times I was scrambling to figure out why things broke. Test your security rules against your own traffic first.

Conservative defaults are a safety margin. The scraper got 10 results per request because that's my default page size. If I'd been returning unbounded results, they'd have gotten everything in one call. Set PAGE_SIZE conservatively: it's not just UX, it limits data exfiltration per request. When your system is under-documented and under-defended, your defaults are doing the defending.

Separate your traffic sources. The source classification field on request logs was a late addition but turned out to be the most useful one. Without it, my anomaly detector would flag my own cache warmer (which makes thousands of requests per hour) as abuse every single run.

You're securing against the internet, not against targeted attacks. The vulnerability scanner wasn't targeting Tripvento, it was probing for fintech endpoints on every IP in its range. The scraper was harvesting any open API it could find. This is background radiation. The defenses don't need to be exotic; they just need to exist.

Don't forget security headers. For a headless API it's less critical than a frontend, but Strict-Transport-Security, X-Content-Type-Options: nosniff, and X-Frame-Options: DENY are low-hanging fruit. They cost nothing to add, they pass automated security audits, and some customers will check for them during vendor evaluation. Add them in middleware or at the Nginx layer and forget about them.
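At the Nginx layer, that's three add_header lines (the values shown are the common hardening defaults):

```nginx
# `always` emits the headers on error responses too, not just 2xx/3xx.
add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
add_header X-Content-Type-Options "nosniff" always;
add_header X-Frame-Options "DENY" always;
```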


What's Next

This stack is solid for an initial hardening of the API. Here's what I'll build when the threat model changes:

  • Scoped keys — per endpoint or per resource permissions beyond just tier level access.

  • Signed internal requests — HMAC with timestamps for server-to-server traffic, replacing raw API keys for internal communication.

  • Anomaly alerting — Slack or email notifications instead of a log file nobody checks.

  • Audit log integrity — periodic export to append only object storage so logs can't be tampered with post breach.


This is part 4 of the Building Tripvento series. Part 1 covered deleting 55M rows to scale the database. Part 2 covered the multi-LLM self-healing data pipeline. Part 3 covered the Django performance audit. Next up: how I built a content factory that generates destination guides at scale. Bonus: Part 0, why I'm building Tripvento.

I'm Ioan Istrate, founder of Tripvento — a hotel ranking API that scores properties against 14 traveler personas using geospatial intelligence and semantic AI. Previously worked on ranking systems at U.S. News & World Report. If you want to talk about Django performance, security, or API design, let's connect on LinkedIn.
