One Tuesday about six months ago, my multi-region cron job finished its 4 AM pull and the logs lit up with 47 timeouts against streaming-platform metadata endpoints. Curl from my workstation worked fine. Curl from the same VLAN as the cron host hung. The difference was a Pi-hole instance my partner had spun up two days earlier: it was blocking telemetry CNAMEs that one of the platform SDKs uses for metadata fan-out. So the API replies that did come back looked successful, but half the catalogue data was missing.
That was the day I ripped out Pi-hole and put NextDNS on the router instead. Six months later, my discovery pipeline is faster, my home network has fewer ads, and my cron jobs are no longer collateral damage. This post is the boring infrastructure story of how I got there — what worked, what blew up, and the PHP and Python glue that kept the catalogue honest across eight geo regions.
Why router-level DNS beats per-device blocking
The browser-extension approach (uBlock, etc.) works on the page you are looking at. Pi-hole and AdGuard Home work on the LAN, but they require you to operate a box. NextDNS sits in the middle: it is a hosted resolver that speaks DoH, DoT, and plain UDP, and the configuration lives in a web dashboard. The router points all clients at it.
The advantages for a discovery backend like the one I run:
- One config follows me across networks. Because the profile is hosted, the same rules apply on my home Wi-Fi and, on devices that also have the profile configured directly, when I am tethering off my phone.
- The dashboard exposes a CSV/JSON log API. I can pipe that into SQLite and join it with my own crawler logs to see which ad domains overlap with the legitimate CDN domains my crawler hits.
- Allowlists are first-class. When my cron host needs to resolve a flagged telemetry CNAME for legitimate reasons, I add the domain to the profile rather than maintaining a fork of a hosts file.
- DNS-over-HTTPS at the router means the upstream ISP cannot see the queries even from the dumbest IoT device on my network.
The disadvantage: NextDNS is a SaaS, and DNS is on the critical path. I run it with a local fallback (Unbound on the router) so when NextDNS has an incident — and they have had two in the time I have used them — clients fail open to a normal resolver instead of dropping all traffic.
What I run the router on
The router is a six-year-old NanoPi R2S flashed with OpenWrt 23.05. It has 1 GB of RAM, two gigabit ports, and idles at 2 W. Total cost: about $50, including the metal case.
For NextDNS specifically, OpenWrt has a first-class package — nextdns — that runs a tiny local proxy and forwards everything over DoH. The relevant bits of /etc/config/nextdns look like this:
config nextdns
    option enabled '1'
    option profile '4f8c2a'
    option report_client_info '1'
    option auto_activate '1'
    option hardened_privacy '0'
    option cache_size '10MB'
    option max_ttl '3600'
    option max_inactivity '15'
    option use_https '1'
    option setup_router '1'
The setup_router flag is the magic one: it rewrites dnsmasq's upstream resolvers and patches the DHCP advertisements so every client gets the local proxy as its DNS server. No per-device config needed.
To prove this is actually being used, I run a quick check on every new device:
dig trendvidstream.com +short
curl -s https://test.nextdns.io | jq .status
If status is ok and the profile ID matches, the device is using NextDNS. If it is not, the device is bypassing the router. Android's Private DNS is a common culprit; Chromecast hardcodes 8.8.8.8 and ignores DHCP entirely — more on that later.
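The same check can be scripted when there are many devices to verify. The following is a minimal Python sketch, not an official client: the `status` field comes from the curl command above, while the `profile` key name is an assumption about the test endpoint's JSON and should be confirmed against a real response. The function names are hypothetical.

```python
import json
import urllib.request

TEST_URL = "https://test.nextdns.io"  # same endpoint as the curl check above


def uses_profile(payload: dict, expected_profile: str) -> bool:
    """Return True when the payload reports status "ok" and the expected profile.

    The "profile" key is an assumed field name; adjust it if the real
    response uses a different one.
    """
    return (
        payload.get("status") == "ok"
        and payload.get("profile") == expected_profile
    )


def check_device(expected_profile: str, timeout: float = 5.0) -> bool:
    """Fetch the test endpoint from this device and evaluate the response."""
    with urllib.request.urlopen(TEST_URL, timeout=timeout) as resp:
        return uses_profile(json.load(resp), expected_profile)
```

Calling `check_device("4f8c2a")` on a device returns False when the device is resolving through something other than the router's proxy.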
Building a custom denylist from real video discovery traffic
This is where it stops being a generic 'I installed NextDNS' post. My discovery backend hits roughly 14,000 distinct hostnames a day across the eight regions I crawl. About 4% of those are pure ad/tracking domains that get served up alongside real metadata in JSON-LD blobs. NextDNS's default blocklists (NextDNS Ads, OISD Big, hagezi-pro) catch most of them, but they also catch a long tail of CDNs that some platforms use for thumbnail fan-out.
So I built a feedback loop. Every night, a PHP 8.4 cron joins the NextDNS query log (pulled via their REST API) against my own crawler's SQLite FTS5 index of fetched URLs. Anything blocked by NextDNS that also appears in a successful crawl payload gets surfaced for review.
Here is the puller. It paginates through the log API and dumps everything into a local SQLite table:
<?php
declare(strict_types=1);

const NEXTDNS_PROFILE = '4f8c2a';
const DB_PATH = __DIR__ . '/data/nextdns.db';

$apiKey = getenv('NEXTDNS_API_KEY') ?: exit("missing NEXTDNS_API_KEY\n");

/**
 * Convert a log timestamp to Unix seconds. The log API may return an
 * ISO 8601 string rather than epoch milliseconds, so accept both.
 */
function toUnixSeconds(int|float|string $ts): int
{
    if (is_numeric($ts)) {
        return intdiv((int) $ts, 1000);
    }
    $parsed = strtotime($ts);
    if ($parsed === false) {
        throw new RuntimeException("unparseable timestamp: $ts");
    }
    return $parsed;
}

$db = new PDO('sqlite:' . DB_PATH);
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec(<<<SQL
    CREATE TABLE IF NOT EXISTS dns_log (
        ts          INTEGER NOT NULL,
        domain      TEXT NOT NULL,
        root_domain TEXT NOT NULL,
        status      TEXT NOT NULL,
        reasons     TEXT,
        client_ip   TEXT,
        PRIMARY KEY (ts, domain, client_ip)
    );
    CREATE INDEX IF NOT EXISTS idx_log_root ON dns_log(root_domain);
    CREATE INDEX IF NOT EXISTS idx_log_status ON dns_log(status);
SQL);

// Prepare once, outside the pagination loop.
$stmt = $db->prepare(
    'INSERT OR IGNORE INTO dns_log
         (ts, domain, root_domain, status, reasons, client_ip)
     VALUES (:ts, :d, :rd, :s, :r, :ip)'
);

$cursor = null;
do {
    $url = 'https://api.nextdns.io/profiles/' . NEXTDNS_PROFILE . '/logs?limit=1000';
    if ($cursor !== null) {
        $url .= '&cursor=' . urlencode($cursor);
    }

    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_HTTPHEADER     => ['X-Api-Key: ' . $apiKey],
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 30,
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        fwrite(STDERR, 'curl: ' . curl_error($ch) . "\n");
        exit(1);
    }
    $httpCode = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
    curl_close($ch);

    // A 401 or 429 still returns JSON, so fail loudly instead of importing nothing.
    if ($httpCode !== 200) {
        fwrite(STDERR, "HTTP $httpCode: $body\n");
        exit(1);
    }

    $payload = json_decode($body, true, flags: JSON_THROW_ON_ERROR);
    $rows    = $payload['data'] ?? [];
    $cursor  = $payload['meta']['pagination']['cursor'] ?? null;

    $db->beginTransaction();
    foreach ($rows as $row) {
        $stmt->execute([
            ':ts' => toUnixSeconds($row['timestamp']),
            ':d'  => $row['domain'],
            ':rd' => $row['root'] ?? $row['domain'],
            ':s'  => $row['status'],
            ':r'  => json_encode($row['reasons'] ?? []),
            // SQLite treats NULLs in a primary key as distinct, so store ''
            // to keep INSERT OR IGNORE deduplicating rows without a client IP.
            ':ip' => $row['clientIp'] ?? '',
        ]);
    }
    $db->commit();

    echo 'Imported ' . count($rows) . ' rows, cursor=' . ($cursor ?? '<end>') . "\n";
} while ($cursor !== null);
That dumps about 1.2 GB into SQLite a month. With WAL mode and the two indexes above, queries against 30 days of data return in single-digit milliseconds.
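WAL is a per-database setting that has to be switched on explicitly, and it then persists in the database file. A minimal sketch with Python's `sqlite3` module, assuming the `dns_log` schema described above; the function names and the 30-day default are illustrative, not part of the original pipeline.

```python
import sqlite3


def open_log_db(path: str) -> sqlite3.Connection:
    """Open the DNS log database and make sure it is in WAL mode."""
    conn = sqlite3.connect(path)
    # The PRAGMA returns the journal mode that is actually in effect.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode.lower() != "wal":
        raise RuntimeError(f"could not enable WAL, journal_mode is {mode!r}")
    return conn


def blocked_counts(conn: sqlite3.Connection, days: int = 30) -> list[tuple[str, int]]:
    """Count blocked queries per root domain over the last `days` days."""
    return conn.execute(
        """
        SELECT root_domain, COUNT(*) AS hits
        FROM dns_log
        WHERE status = 'blocked'
          AND ts > CAST(strftime('%s', 'now', ?) AS INTEGER)
        GROUP BY root_domain
        ORDER BY hits DESC
        """,
        (f"-{days} days",),
    ).fetchall()
```

Note that an in-memory database cannot use WAL, so `open_log_db` only makes sense with a file path.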
The second half of the loop is the reconciliation. My crawler stores every fetched URL in an FTS5 table called crawler_urls. I want to find blocked DNS names whose root domain shows up in legitimate crawler payloads — those are false positives I need to allowlist.
<?php
declare(strict_types=1);

$db = new PDO('sqlite:' . __DIR__ . '/data/nextdns.db');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec("ATTACH DATABASE '" . __DIR__ . "/data/crawler.db' AS c");

// The hostname is wrapped in double quotes inside the MATCH string because
// FTS5 treats dots and hyphens as query syntax in unquoted terms.
$sql = <<<SQL
    WITH blocked AS (
        SELECT root_domain, COUNT(*) AS hits
        FROM dns_log
        WHERE status = 'blocked'
          AND ts > strftime('%s', 'now', '-7 days')
        GROUP BY root_domain
        HAVING hits > 5
    )
    SELECT b.root_domain, b.hits, COUNT(u.url) AS crawler_hits
    FROM blocked b
    JOIN c.crawler_urls u
      ON u.crawler_urls MATCH 'host:"' || b.root_domain || '"'
    WHERE u.fetched_ok = 1
    GROUP BY b.root_domain
    ORDER BY crawler_hits DESC
    LIMIT 50;
SQL;

foreach ($db->query($sql) as $row) {
    printf(
        "%-40s blocked=%-6d crawled=%-6d\n",
        $row['root_domain'],
        $row['hits'],
        $row['crawler_hits']
    );
}
When I ran this for the first time after three weeks of data, I found seven domains in the top 50 that needed allowlisting — all CDNs that some platforms use for image fan-out and which OISD had categorized as trackers because they also serve telemetry. Two of them were causing the timeout cascade I described at the top of this post.
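One detail of the MATCH clause is worth spelling out: FTS5 only accepts letters, digits, underscores, and non-ASCII characters in an unquoted query term, so a hostname has to be passed as a double-quoted phrase. A small Python helper as a sketch; the `host` column name follows the query above, and the function name is hypothetical.

```python
def fts5_host_query(domain: str, column: str = "host") -> str:
    """Build an FTS5 column-filtered phrase query for a hostname.

    Double quotes inside an FTS5 string are escaped by doubling them.
    """
    escaped = domain.strip().lower().replace('"', '""')
    return f'{column}:"{escaped}"'
```

With the default unicode61 tokenizer, the quoted hostname is matched as a sequence of tokens split on the dots, so `cdn.example.com` matches wherever those three tokens appear consecutively in the `host` column.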
Pushing the allowlist back to NextDNS
NextDNS exposes a write API for the allowlist. Once I have reviewed the output of the reconciliation query, I push the approved domains with a small Python script:
#!/usr/bin/env python3
"""Push a reviewed allowlist to a NextDNS profile.

Usage: ./push_allow.py < reviewed.txt
"""
import json
import os
import sys
import time
from urllib import error, request

PROFILE = "4f8c2a"
API = f"https://api.nextdns.io/profiles/{PROFILE}/allowlist"
KEY = os.environ["NEXTDNS_API_KEY"]


def push(domain: str) -> tuple[int, str]:
    """POST one domain and return (HTTP status, response body)."""
    payload = json.dumps({"id": domain, "active": True}).encode()
    req = request.Request(
        API,
        method="POST",
        data=payload,
        headers={
            "X-Api-Key": KEY,
            "Content-Type": "application/json",
        },
    )
    try:
        with request.urlopen(req, timeout=10) as r:
            return r.status, r.read().decode()
    except error.HTTPError as e:
        return e.code, e.read().decode()
    except (error.URLError, TimeoutError) as e:
        # Network-level failure (DNS, refused connection, timeout): report as status 0.
        return 0, str(e)


def main() -> int:
    failures = 0
    for raw in sys.stdin:
        domain = raw.strip()
        if not domain or domain.startswith("#"):
            continue
        code, body = push(domain)
        # 409 means the entry already exists, which is fine for an idempotent push.
        if code in (200, 201, 409):
            print(f"ok {domain}")
        else:
            print(f"FAIL {domain} {code} {body}", file=sys.stderr)
            failures += 1
        time.sleep(0.2)  # do not hammer the API
    return 1 if failures else 0


if __name__ == "__main__":
    sys.exit(main())
I keep the reviewed list under version control in the same repo as my crawler. Diffs against last week's snapshot get a one-line commit message and the script runs out of a Makefile target. No web UI clicking — it is all reproducible. The same repo carries my FTP deploy automation, so a single make ship pushes the crawler code to all four origins and reconciles the allowlist against the upstream profile.
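The weekly diff against the previous snapshot can be computed with plain set arithmetic. A minimal sketch, assuming both snapshots use the one-domain-per-line format with `#` comments that the push script reads; the function names are hypothetical.

```python
def load_domains(text: str) -> set[str]:
    """Parse an allowlist file body, skipping blank lines and '#' comments."""
    return {
        line.strip().lower()
        for line in text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    }


def diff_allowlists(previous: str, current: str) -> tuple[list[str], list[str]]:
    """Return (added, removed) domains between two snapshots, each sorted."""
    old, new = load_domains(previous), load_domains(current)
    return sorted(new - old), sorted(old - new)
```

Only the `added` list needs to go through the push script; the `removed` list is the part that deserves a second look before anything is deleted upstream.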
Measuring impact across eight regions
My crawler fetches a fixed set of discovery probe URLs from eight regional proxies on a 4-hour cadence. Each probe records two things: time to first byte for the metadata endpoint, and number of redirects before the canonical URL. After cutting over from Pi-hole to router-NextDNS, the median TTFB on the worst region (KR) dropped from 410 ms to 280 ms — not because DNS got faster, but because the platform's lookup chain no longer hit blocked CNAMEs that triggered retry loops.
A small Go program reads my probe table and emits a Prometheus-style summary so I can graph the change over time:
package main

import (
	"database/sql"
	"fmt"
	"log"
	"os"

	_ "modernc.org/sqlite"
)

type regionStat struct {
	Region   string
	Samples  int
	P50      float64
	P95      float64
	Failures int
}

func main() {
	db, err := sql.Open("sqlite", os.Getenv("PROBE_DB"))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Rank the last 24 hours of probes per region, then pick the rows that
	// sit at the p50 and p95 positions. Ranking inside the CTE keeps the
	// percentiles restricted to the same time window as the sample count.
	rows, err := db.Query(`
		WITH recent AS (
			SELECT region, ttfb_ms, status,
			       ROW_NUMBER() OVER (PARTITION BY region ORDER BY ttfb_ms) AS rn,
			       COUNT(*)     OVER (PARTITION BY region)                   AS n
			FROM probes
			WHERE ts > CAST(strftime('%s','now','-24 hours') AS INTEGER)
		)
		SELECT region,
		       COUNT(*),
		       MAX(CASE WHEN rn = n / 2 + 1        THEN ttfb_ms END) AS p50,
		       MAX(CASE WHEN rn = n * 95 / 100 + 1 THEN ttfb_ms END) AS p95,
		       SUM(CASE WHEN status >= 500 OR status = 0 THEN 1 ELSE 0 END)
		FROM recent
		GROUP BY region
		ORDER BY region
	`)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	for rows.Next() {
		var s regionStat
		if err := rows.Scan(&s.Region, &s.Samples, &s.P50, &s.P95, &s.Failures); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("# region=%s n=%d failures=%d\n", s.Region, s.Samples, s.Failures)
		fmt.Printf("probe_ttfb_ms{region=%q,quantile=\"0.5\"} %.0f\n", s.Region, s.P50)
		fmt.Printf("probe_ttfb_ms{region=%q,quantile=\"0.95\"} %.0f\n", s.Region, s.P95)
	}
	if err := rows.Err(); err != nil {
		log.Fatal(err)
	}
}
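The percentile rule the query relies on is simple: sort the samples and take the value at zero-based position n * p / 100, using integer division. A Python sketch of the same rule, handy for spot-checking the exported numbers against a handful of raw samples; the function name is hypothetical.

```python
def percentile_by_offset(samples: list[float], pct: int) -> float:
    """Return the value at sorted offset n * pct // 100 (zero-based)."""
    if not samples:
        raise ValueError("no samples")
    if not 0 <= pct < 100:
        raise ValueError("pct must be in [0, 100)")
    ordered = sorted(samples)
    return ordered[len(ordered) * pct // 100]
```

For 100 samples this picks the 51st value for p50 and the 96th for p95, so with very small sample counts the two can land on the same row.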
The numbers I care about most:
- P50 TTFB across regions: down 18% on average, with KR and BR seeing the biggest drops because their platform endpoints chain through the most ad CNAMEs.
- Cron job runtime: the nightly catalogue refresh dropped from a 22-minute median to 14 minutes. Most of that win is fewer per-request retries.
- FTP deploy chain: my deploys ship to four origins over FTP, and the deploy host's resolver is now also pointing at the router's NextDNS proxy. Deploys have not gotten faster (FTP is what it is), but they no longer occasionally stall when the deploy host tries to resolve an analytics blob a platform plugin pings during build.
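As a worked check on the figures quoted in this section, the relative drop is (before - after) / before. A small Python helper with a hypothetical name, applied to the KR TTFB and the cron runtime:

```python
def pct_drop(before: float, after: float) -> float:
    """Relative decrease from `before` to `after`, in percent."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100


print(round(pct_drop(410, 280), 1))  # KR median TTFB in ms -> 31.7
print(round(pct_drop(22, 14), 1))    # nightly refresh in minutes -> 36.4
```

So the worst region improved by roughly a third, well above the 18% cross-region average, and the nightly refresh got about 36% shorter.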
Gotchas I hit so you do not have to
Chromecast and some smart TVs hardcode 8.8.8.8. They will completely ignore the router's DHCP-advertised DNS. The fix is a DNAT rule that redirects outbound port 53 traffic to the router itself. OpenWrt 22.03 and later use firewall4 (nftables) by default, so the iptables commands below assume the iptables-nft compatibility package is installed and that the router sits at the default 192.168.1.1. On OpenWrt:
iptables -t nat -A PREROUTING -i br-lan -p udp --dport 53 \
    ! -d 192.168.1.1 -j DNAT --to-destination 192.168.1.1
iptables -t nat -A PREROUTING -i br-lan -p tcp --dport 53 \
    ! -d 192.168.1.1 -j DNAT --to-destination 192.168.1.1
This catches plain UDP/TCP DNS. It does not catch DoH/DoT, which most smart TVs do not use anyway — but if you have a device that does, you will need to block port 853 and the IPs of major DoH endpoints, or just accept that one device is bypassing your filtering.
NextDNS cache TTLs interact badly with very short upstream TTLs. I had a probe target with a 30-second TTL that got cached for the full 60 seconds of NextDNS's default minimum. Setting min_ttl to 0 in the profile fixed it. Worth checking if you do anything DNS-time-sensitive.
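The effect is easiest to see as a clamp on the upstream TTL. The sketch below is a simplified model, not NextDNS's actual cache logic: the 60-second floor is the default minimum mentioned above, the 3600-second ceiling is the max_ttl from the router config, and the function name is hypothetical.

```python
def effective_ttl(upstream_ttl: int, min_ttl: int = 60, max_ttl: int = 3600) -> int:
    """Clamp an upstream TTL (seconds) into the range [min_ttl, max_ttl]."""
    if min_ttl > max_ttl:
        raise ValueError("min_ttl must not exceed max_ttl")
    return max(min_ttl, min(upstream_ttl, max_ttl))


print(effective_ttl(30))             # 60: the floor doubles a 30-second record
print(effective_ttl(30, min_ttl=0))  # 30: the upstream TTL is honoured
```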
Allowlist entries are exact-match by default. If you allowlist cdn.example.com, it does not cover img.cdn.example.com. The web UI lets you toggle 'include subdomains' per entry, but the API requires the wildcard field, which is undocumented at the time of writing. Setting wildcard: true in the POST body works.
Do not allowlist root domains casually. I made the mistake of adding googleapis.com to my allowlist to unblock a specific Maps API call from one client, and discovered a week later that I had silently un-blocked a pile of telemetry endpoints across every device on the network. Allowlist the specific subdomain.
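Both allowlist gotchas come down to how an entry is matched against a hostname. The Python sketch below models that matching under stated assumptions: it is not NextDNS's implementation, the `include_subdomains` field stands in for the per-entry toggle, and `telemetry.googleapis.com` is a made-up hostname used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AllowEntry:
    domain: str
    include_subdomains: bool = False


def is_allowed(host: str, entries: list[AllowEntry]) -> bool:
    """Return True if any entry covers `host` exactly or as a parent domain."""
    host = host.rstrip(".").lower()
    for entry in entries:
        domain = entry.domain.rstrip(".").lower()
        if host == domain:
            return True
        # Match on a label boundary so "notcdn.example.com" is not covered.
        if entry.include_subdomains and host.endswith("." + domain):
            return True
    return False


exact = [AllowEntry("cdn.example.com")]
broad = [AllowEntry("googleapis.com", include_subdomains=True)]
print(is_allowed("img.cdn.example.com", exact))       # False
print(is_allowed("telemetry.googleapis.com", broad))  # True
```

The second result is the failure mode described above: one broad entry quietly covers every hostname beneath it.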
Set up a failover. I run Unbound on the router as a secondary upstream. If NextDNS's proxy returns SERVFAIL or times out for more than 5 seconds, dnsmasq falls through to Unbound. The cost is that a small handful of queries during a NextDNS outage will resolve unfiltered, but that beats the alternative of a fully dead network.
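The fall-through order can be modelled in a few lines. This is a sketch of the decision logic only, not how dnsmasq implements it: each resolver is a plain callable that returns an address, returns None for a SERVFAIL-style failure, or raises on timeout, and all names are hypothetical.

```python
from typing import Callable, Optional, Sequence

# A resolver takes a hostname and returns an IP string, or None on failure.
Resolver = Callable[[str], Optional[str]]


def resolve_with_failover(
    name: str, resolvers: Sequence[Resolver]
) -> tuple[Optional[str], int]:
    """Try resolvers in order; return (answer, index of the resolver used).

    Returns (None, -1) when every resolver fails.
    """
    for index, resolver in enumerate(resolvers):
        try:
            answer = resolver(name)
        except OSError:  # includes TimeoutError
            continue
        if answer is not None:
            return answer, index
    return None, -1
```

An index greater than zero in the result means the query was answered by the unfiltered fallback, which is a useful thing to count during an outage.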
What I would do differently
If I were starting from zero today, I would skip the router-bundled nextdns package and run the upstream nextdns Linux binary as a sidecar with its setup-router option enabled. The OpenWrt package is fine, but it lags upstream by 3-6 months on bug fixes. Running the binary directly under procd lets me upgrade independently.
I would also build the reconciliation pipeline earlier. The first two months I just used the default blocklists and waited for users (i.e. my partner) to complain. Building the SQLite join from day one would have caught the seven CDN false positives in a week instead of in three weeks of head-scratching.
Conclusion
Putting NextDNS on the router is the highest-leverage piece of home-network infrastructure I have added in the last two years. The DNS layer is the right place to block ads because it covers every device and every browser that resolves through the router, including the ones with no extension support. For a discovery backend like TrendVidStream, it is also the right place to surface false positives, because the same logs that show what got blocked can be joined against the crawler's own URL table to find the CDNs that need an explicit allow.
The whole setup is about 200 lines of PHP and Python, a 50-line Go binary for metrics, and a NextDNS subscription that costs less than a coffee a month. The discovery pipeline runs faster, the home network is quieter, and when something does break, the logs make the fix obvious. That is a good week's work.