<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: SweepBase</title>
    <description>The latest articles on DEV Community by SweepBase (@sweepbase).</description>
    <link>https://dev.to/sweepbase</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3886467%2F93a893fd-d4b9-474b-a682-cf8587a48db0.png</url>
      <title>DEV Community: SweepBase</title>
      <link>https://dev.to/sweepbase</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sweepbase"/>
    <language>en</language>
    <item>
      <title>What I learned scraping 141 crypto cardholder agreements</title>
      <dc:creator>SweepBase</dc:creator>
      <pubDate>Wed, 20 May 2026 19:25:17 +0000</pubDate>
      <link>https://dev.to/sweepbase/what-i-learned-scraping-141-crypto-cardholder-agreements-11lg</link>
      <guid>https://dev.to/sweepbase/what-i-learned-scraping-141-crypto-cardholder-agreements-11lg</guid>
      <description>&lt;p&gt;On 3 February 2026, three unrelated crypto cards — CEX.IO Card, Trustee Plus, and IN1 — stopped processing payments on the same day. They had no parent in common. They were not hacked. None of the front-end brands had failed. The only thing they shared was a Polish payment-institution whose license had been revoked twelve days earlier by KNF.&lt;/p&gt;

&lt;p&gt;That was the prompt to start a dataset. The question was simple: how many other crypto cards share an underlying issuer that almost no user has ever heard of? Answering it required reading roughly 141 cardholder agreements.&lt;/p&gt;

&lt;p&gt;This post is about what that data collection actually looked like — the scraper choices, the failure modes, and what surprised me about the structure of "publicly available" legal documents on the web.&lt;/p&gt;

&lt;h2&gt;
  
  
  The architecture, in two paragraphs
&lt;/h2&gt;

&lt;p&gt;Most crypto companies don't directly issue payment cards. They rent the right to issue cards from a principal member of Visa or Mastercard. That principal, usually a small or mid-size bank or e-money institution, is the BIN sponsor. The first six digits of the card number (eight on newer BIN ranges) identify the sponsor. The brand on the front of the card is a separate company, the program manager, which contracts with both the sponsor and the user.&lt;/p&gt;

&lt;p&gt;Three layers, one of them visible. The sponsor is the layer the regulator can actually shut down. When the regulator does, every program manager on that sponsor's BIN goes dark on the same day. From the user's perspective there is no warning, because the user never signed up with the sponsor.&lt;/p&gt;

&lt;p&gt;If that sounds familiar — Stripe acquired Bridge in 2025, Coinbase Card runs on Pathward (not Marqeta, which is just the processor), Gnosis Pay runs on Monavate — it is the same pattern at scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  The scrape
&lt;/h2&gt;

&lt;p&gt;The first plan was naive: a Playwright job that visited each card's &lt;code&gt;/legal&lt;/code&gt; or &lt;code&gt;/terms&lt;/code&gt; URL, extracted text, and ran a regex for the phrase "issued by [BANK NAME]". This worked for about a third of the dataset.&lt;/p&gt;

&lt;p&gt;The other two-thirds failed in interesting ways:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cardholder agreement is a PDF generated only after KYC&lt;/strong&gt;. About a dozen cards. The static T&amp;amp;C is a marketing summary; the legally binding agreement is generated at application time with a Lambda. You can't fetch it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Sponsor name is in an appendix, not the first paragraph&lt;/strong&gt;. A regex that scans the first 500 words misses it. Some cardholder agreements bury "issued by ___" inside a chargeback procedures section, sometimes thousands of words in.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Sponsor disclosure was deleted&lt;/strong&gt;. A handful of cards used to name their sponsor and quietly removed it after the Union54 BIN suspension in 2022. The Wayback Machine still has the old version. The current page doesn't.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The page is rendered client-side via a wallet SDK that won't run in headless Chrome&lt;/strong&gt;. Two cards. Solved by switching to a real Chrome instance with the wallet extension pre-installed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The card's website doesn't include a cardholder agreement at all&lt;/strong&gt;. Around 25 cards. The agreement exists somewhere — there must be a paper trail because Visa or Mastercard requires one — but the public-facing site doesn't link to it.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
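
&lt;p&gt;Failure mode (2) is the easiest to illustrate in code. What follows is a minimal sketch, not the scraper behind the dataset: the helper name &lt;code&gt;findIssuerMentions&lt;/code&gt;, the pattern, and the sample text are all assumptions, and real agreements use many more phrasings than "issued by". The point is only that the scan covers the whole document and records how deep each mention sits.&lt;/p&gt;

```typescript
// Hypothetical helper: collect every "issued by ..." mention in the full text
// instead of only looking at the opening paragraphs.
interface IssuerMention {
  name: string;       // captured issuer name, trimmed
  wordOffset: number; // number of words that precede the match
}

function findIssuerMentions(text: string): IssuerMention[] {
  // Assumed pattern: "issued by" followed by a capitalised name that ends at
  // the first comma, semicolon, opening parenthesis, or full stop.
  const pattern = /issued by\s+([A-Z][^,;(.]*)/g;
  const mentions: IssuerMention[] = [];
  let match: RegExpExecArray | null = pattern.exec(text);
  while (match !== null) {
    const before = text.slice(0, match.index).trim();
    const wordOffset = before === "" ? 0 : before.split(/\s+/).length;
    mentions.push({ name: match[1].trim(), wordOffset: wordOffset });
    match = pattern.exec(text);
  }
  return mentions;
}

// Made-up agreement text: the issuer is named only after a long preamble.
let preamble = "";
for (let i = 0; i !== 100; i += 1) {
  preamble += "Chargebacks are handled under the scheme rules. ";
}
const sample = preamble + "This card is issued by Example Bank Ltd, a principal member.";
console.log(findIssuerMentions(sample));
```

&lt;p&gt;On this sample the single mention sits 703 words in, so a scan limited to the first 500 words would never see it. The pattern is case-sensitive and will truncate names that contain a full stop, so treat any hit as a candidate for manual review rather than a final answer.&lt;/p&gt;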

&lt;p&gt;For (5), the only reliable signal is the BIN itself. If you can find a forum post or a press release with someone's card number prefix, you can look up which member that prefix is registered to and infer the sponsor. The signal is noisy, but it's better than nothing.&lt;/p&gt;

&lt;h2&gt;
  
  
  What ended up in the dataset
&lt;/h2&gt;

&lt;p&gt;After two passes (one scraped, one manual cross-check), each card got one of four confidence labels:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;HIGH (~79 cards)&lt;/strong&gt;: sponsor name verbatim from a publicly fetched T&amp;amp;C, on a date recorded with the row.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MEDIUM (~34 cards)&lt;/strong&gt;: sponsor named in an older snapshot, press release, or regulator filing — but the current public page doesn't repeat it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CIRCUMSTANTIAL (~25 cards)&lt;/strong&gt;: inferred from program-manager naming or industry partnerships. Treated as an upper-bound estimate, not fact.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;UNKNOWN (~3 cards)&lt;/strong&gt;: best guess, flagged for follow-up.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you've built data products before, this part will be familiar. The interesting wrinkle is that the legal disclosure regime varies wildly by jurisdiction. US and EU cards almost always name the sponsor verbatim. APAC programs frequently do not. African and LatAm cards have actively &lt;em&gt;removed&lt;/em&gt; the disclosure since 2022, because Union54's BIN suspension that year created a contagion risk — if the regulator suspends your sponsor for someone else's fraud, you want to keep your customer association with the sponsor quiet.&lt;/p&gt;

&lt;p&gt;That asymmetry — disclosure norms diverging across regions — is itself a structural fact about the market. It is not a dataset cleanliness problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the data shows once you have it
&lt;/h2&gt;

&lt;p&gt;Globally, the Herfindahl-Hirschman Index (HHI) across all 141 cards is around 400 to 500, well inside the range the US DOJ treats as "unconcentrated." That number is misleading. Once you split by region and product type, which is the actual choice a user faces when picking a card, the picture inverts.&lt;/p&gt;

&lt;p&gt;US self-custody stablecoin cards: HHI around 5,000 to 6,300 depending on how you count circumstantial attribution. Two banks (Third National in Tennessee, Lead Bank in Missouri) anchor roughly two-thirds of issuance. EU/UK self-custody is even more concentrated: a single sponsor (Monavate, owned by Exodus via Baanx since 1 May 2026) anchors most of the segment.&lt;/p&gt;
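
&lt;p&gt;For readers who have not computed one before: the Herfindahl-Hirschman Index is the sum of the squared market shares, with shares expressed in percentage points, so it runs from near 0 (many tiny players) to 10,000 (one player). The sketch below assumes share is measured as "number of cards per sponsor", which is one possible weighting and not necessarily the one used in the write-up. The function name &lt;code&gt;herfindahlIndex&lt;/code&gt; and the sponsor labels are invented for illustration.&lt;/p&gt;

```typescript
// Hypothetical input: one sponsor label per card in a segment.
function herfindahlIndex(sponsors: string[]): number {
  // Count how many cards each sponsor backs.
  const counts: { [sponsor: string]: number } = {};
  sponsors.forEach(function (sponsor) {
    counts[sponsor] = (counts[sponsor] || 0) + 1;
  });

  // Sum the squared percentage shares.
  let hhi = 0;
  Object.keys(counts).forEach(function (sponsor) {
    const sharePercent = (counts[sponsor] * 100) / sponsors.length;
    hhi += sharePercent * sharePercent;
  });
  return hhi;
}

// Made-up segment of ten cards: shares of 40, 30, 20 and 10 percent.
const segment = ["A", "A", "A", "A", "B", "B", "B", "C", "C", "D"];
console.log(herfindahlIndex(segment)); // 40*40 + 30*30 + 20*20 + 10*10 = 3000
```

&lt;p&gt;Running the same function once over the whole catalog and once per region and product type is what separates a reassuring global number from a concentrated segment number.&lt;/p&gt;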

&lt;p&gt;If you want to look at the per-card data, the methodology, or the per-row source URLs, the dataset is at &lt;a href="https://sweepbase.net/dataset" rel="noopener noreferrer"&gt;sweepbase.net/dataset&lt;/a&gt; and the full write-up of the concentration findings is at &lt;a href="https://sweepbase.net/research/bin-sponsor-concentration-2026" rel="noopener noreferrer"&gt;sweepbase.net/research/bin-sponsor-concentration-2026&lt;/a&gt;. Both are CC-BY for academic and journalistic use.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd build differently next time
&lt;/h2&gt;

&lt;p&gt;Three concrete things, for anyone trying to build this kind of dataset:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Don't trust a single fetch.&lt;/strong&gt; Spot-audit on the day of publication. Of 32 cards I re-checked on 16 May 2026, only 14 were verbatim re-verifiable. The rest had been edited since the original scrape. The dataset now schedules quarterly re-fetches.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Track which jurisdiction's regulator can shut each sponsor down.&lt;/strong&gt; Most public BIN datasets are jurisdiction-blind. For risk analysis, that's a critical missing column. KNF can shut down a Polish sponsor in twelve days. The FCA cannot. Knowing which is which changes the risk-weighting of each card.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Distinguish sponsor from processor from program manager.&lt;/strong&gt; The single most common error in casual coverage of crypto cards — repeated in trade press for years — is calling Marqeta the "issuer" of Coinbase Card. Marqeta is the processor. The actual sponsor (Pathward) doesn't appear in 95% of articles about the card. Different roles, different regulators, different failure modes.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;Most "crypto card competition" coverage treats the front-of-card brands as substitutable when, behind the scenes, two different brands are often two skins on the same regulated entity. That doesn't matter — until the regulator pulls the sponsor's license, and three programs go dark on a Tuesday.&lt;/p&gt;

&lt;p&gt;The dataset is open. Corrections welcome.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://sweepbase.net/research/bin-sponsor-concentration-2026" rel="noopener noreferrer"&gt;Sweepbase Research&lt;/a&gt;. I run &lt;a href="https://sweepbase.net" rel="noopener noreferrer"&gt;Sweepbase&lt;/a&gt;, an independent crypto-card comparison and research project tracking 141 active cards across regions, networks, and BIN sponsors.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webscraping</category>
      <category>fintech</category>
      <category>crypto</category>
      <category>dataengineering</category>
    </item>
    <item>
      <title>How I Maintained an Awesome-List of 136 Crypto Cards as a CI-Linted Dataset</title>
      <dc:creator>SweepBase</dc:creator>
      <pubDate>Thu, 07 May 2026 08:29:34 +0000</pubDate>
      <link>https://dev.to/sweepbase/how-i-maintained-an-awesome-list-of-136-crypto-cards-as-a-ci-linted-dataset-1pp6</link>
      <guid>https://dev.to/sweepbase/how-i-maintained-an-awesome-list-of-136-crypto-cards-as-a-ci-linted-dataset-1pp6</guid>
      <description>&lt;p&gt;Last month I open-sourced &lt;a href="https://github.com/mbtrilla/awesome-crypto-cards" rel="noopener noreferrer"&gt;awesome-crypto-cards&lt;/a&gt; — a curated list of 136 crypto debit and credit cards. This post is about the boring infrastructure: why I run awesome-lint in CI, how I keep the list synced with the dataset behind sweepbase.net, and where I underestimated effort.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why a flat README, not a database
&lt;/h2&gt;

&lt;p&gt;The list lives as a single README.md. No JSON, no YAML, no static site. People who land on a GitHub awesome-list expect to scan markdown, not click into an interactive viewer.&lt;/p&gt;

&lt;p&gt;Trade-offs I accepted: no programmatic queries, no filtering UI, no auto-generated content.&lt;/p&gt;

&lt;p&gt;Trade-offs I avoided: an extra build step, broken links from generator bugs, and the friction of "wait, where do I edit this?"&lt;/p&gt;

&lt;h2&gt;
  
  
  The awesome-lint CI
&lt;/h2&gt;

&lt;p&gt;Every push runs awesome-lint via GitHub Actions. It catches:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Duplicate URLs (you'd be surprised)&lt;/li&gt;
&lt;li&gt;Links missing https://&lt;/li&gt;
&lt;li&gt;Markdown formatting that breaks GitHub's renderer&lt;/li&gt;
&lt;li&gt;Broken anchor references in the contents section&lt;/li&gt;
&lt;li&gt;Categories that don't sort alphabetically
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# .github/workflows/main.yml&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Awesome Lint&lt;/span&gt;
&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;pull_request&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;lint&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/setup-node@v4&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;node-version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;22'&lt;/span&gt; &lt;span class="pi"&gt;}&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm ci&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npx awesome-lint&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The lint config is the strictest version (no-emoji). I keep it that way because the goal is acceptance into other awesome-list registries down the line, and they reject any list that fails their own awesome-lint pass.&lt;/p&gt;

&lt;h2&gt;
  
  
  Keeping it synced with the source
&lt;/h2&gt;

&lt;p&gt;The dataset behind sweepbase.net is a CSV of 141 rows. Five of those are pre-launch products (waitlist, "in development" custody, "TBA" network) — the README rule is "shipping only," so the README count is 136.&lt;/p&gt;

&lt;p&gt;The diff between CSV and README runs as a small Node script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;csvNames&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cards&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Card Service&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;()));&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;readmeNames&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Set&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;re&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sr"&gt;/- &lt;/span&gt;&lt;span class="se"&gt;\[([^\]]&lt;/span&gt;&lt;span class="sr"&gt;+&lt;/span&gt;&lt;span class="se"&gt;)\]\(&lt;/span&gt;&lt;span class="sr"&gt;https:&lt;/span&gt;&lt;span class="se"&gt;\/\/&lt;/span&gt;&lt;span class="sr"&gt;sweepbase&lt;/span&gt;&lt;span class="se"&gt;\.&lt;/span&gt;&lt;span class="sr"&gt;net&lt;/span&gt;&lt;span class="se"&gt;\/&lt;/span&gt;&lt;span class="sr"&gt;cards&lt;/span&gt;&lt;span class="se"&gt;\/&lt;/span&gt;&lt;span class="sr"&gt;/g&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;readme&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nx"&gt;readmeNames&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;inCsvNotReadme&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[...&lt;/span&gt;&lt;span class="nx"&gt;csvNames&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;n&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;readmeNames&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;has&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;n&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each time I add a card to the dataset, this tells me what's missing in the README, and I add it manually. Manual is fine because it's once a week at most.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I underestimated
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Alphabetical filter sections.&lt;/strong&gt; Each region/custody/use-case section repeats card names. Adding one new card means editing 4-5 lists. I have a script in mind but haven't built it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The "Related Lists" section.&lt;/strong&gt; The other awesome-lists in the crypto/defi space are mostly stale (2-3 years since update). Including them feels honest but reduces the list's perceived freshness.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Star farming.&lt;/strong&gt; Two-week organic plan, 23 days later, 1 star. Reality check: the list needs distribution, not just existence.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you're building an awesome-list, the lint+CI part is fast. The interesting work is keeping it honest as the underlying space changes.&lt;/p&gt;

&lt;p&gt;Repo: &lt;a href="https://github.com/mbtrilla/awesome-crypto-cards" rel="noopener noreferrer"&gt;https://github.com/mbtrilla/awesome-crypto-cards&lt;/a&gt;&lt;/p&gt;

</description>
      <category>github</category>
      <category>opensource</category>
      <category>showdev</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Three months of running a Next.js aggregator on a CSV: what broke and what did not</title>
      <dc:creator>SweepBase</dc:creator>
      <pubDate>Wed, 06 May 2026 12:22:11 +0000</pubDate>
      <link>https://dev.to/sweepbase/three-months-of-running-a-nextjs-aggregator-on-a-csv-what-broke-and-what-did-not-e59</link>
      <guid>https://dev.to/sweepbase/three-months-of-running-a-nextjs-aggregator-on-a-csv-what-broke-and-what-did-not-e59</guid>
      <description>&lt;p&gt;I shipped a 141-row crypto card comparison site on a public CSV instead of a database back in February, and I want to write down what I have learned three months in. The earlier posts covered why I picked CSV (&lt;a href="https://dev.to/sweepbasecards/my-nextjs-15-aggregator-runs-on-a-csv-file-instead-of-a-database-16h5"&gt;why a CSV beats a database for this&lt;/a&gt;) and what I would do differently on the architecture side (&lt;a href="https://dev.to/sweepbasecards/what-i-learned-shipping-a-nextjs-15-csv-side-project-37po"&gt;six lessons-learned from shipping a Next.js 15 + CSV side project&lt;/a&gt;). This is the operational version.&lt;/p&gt;

&lt;h2&gt;
  
  
  What broke
&lt;/h2&gt;

&lt;p&gt;The ISR cache stayed stale longer than I expected. Setting &lt;code&gt;revalidate = 86400&lt;/code&gt; on card detail pages felt safe in dev. In production, when I edited the CSV and pushed, the new content took up to 24 hours to surface on cold pages because Vercel only revalidates on traffic. I added a &lt;code&gt;/api/revalidate&lt;/code&gt; webhook that I hit from a small script after every CSV change. That fixed the lag, but it adds a step I forget half the time.&lt;/p&gt;

&lt;p&gt;PapaParse parsing in a Server Component blew up once when a column contained a comma inside quoted text and the quoting was wrong. Zod validation caught the malformed row, but I had 20 minutes of "is the entire site broken" panic before I read my own logs. Lesson: always log the failing row before throwing.&lt;/p&gt;
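
&lt;p&gt;A rough sketch of that lesson, written without PapaParse or Zod so it stays self-contained. Plain hand-written checks stand in for the schema here, and the names &lt;code&gt;RawRow&lt;/code&gt;, &lt;code&gt;checkRow&lt;/code&gt;, &lt;code&gt;validateRows&lt;/code&gt; and the &lt;code&gt;FX Margin&lt;/code&gt; column are assumptions for illustration, not the site's actual code.&lt;/p&gt;

```typescript
// One parsed CSV row: column name mapped to the raw string value.
interface RawRow {
  [column: string]: string;
}

// Hypothetical checks standing in for a real schema validator such as Zod.
function checkRow(row: RawRow): string[] {
  const problems: string[] = [];
  const service = row["Card Service"];
  if (service === undefined || service.trim() === "") {
    problems.push("Card Service is empty");
  }
  const fxMargin = row["FX Margin"]; // assumed column name
  if (fxMargin === undefined || fxMargin.trim() === "" || isNaN(Number(fxMargin))) {
    problems.push("FX Margin is not a number");
  }
  return problems;
}

function validateRows(rows: RawRow[]): RawRow[] {
  rows.forEach(function (row, index) {
    const problems = checkRow(row);
    if (problems.length !== 0) {
      const rowNumber = index + 1; // 1-based data row, header not counted
      // Log the offending row first, so the cause is in the logs even if
      // the thrown error is truncated or swallowed further up.
      console.error("Invalid CSV row", rowNumber, JSON.stringify(row), problems);
      throw new Error("CSV row " + rowNumber + " failed validation: " + problems.join("; "));
    }
  });
  return rows;
}

// Made-up rows: the second one has a malformed number.
const exampleRows: RawRow[] = [
  { "Card Service": "Example Card", "FX Margin": "1.5" },
  { "Card Service": "Broken Card", "FX Margin": "1,5 percent" },
];
try {
  validateRows(exampleRows);
} catch (error) {
  console.log("Build would stop here:", String(error));
}
```

&lt;p&gt;The row number and raw content end up in the log line as well as in the error message, which is the part that would have saved the 20 minutes.&lt;/p&gt;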

&lt;p&gt;Image proxy started rate-limiting. I serve card images via &lt;code&gt;/api/image-proxy&lt;/code&gt; with a 7-day cache. About six weeks in, I noticed Google Drive started throttling requests from Vercel egress IPs. Cache hit rate dropped, latency went up. I now host all new card images locally as &lt;code&gt;.webp&lt;/code&gt; and only fall back to Drive for legacy entries.&lt;/p&gt;

&lt;h2&gt;
  
  
  What did not break
&lt;/h2&gt;

&lt;p&gt;The catalog itself. 141 rows in a CSV is below any threshold where you actually need a database. Greps are instant in CI, the file diffs cleanly in PRs, and contributors can read it without a SQL client. I have not regretted this once.&lt;/p&gt;

&lt;p&gt;Filter functions as predicates. Every category on the site is a single function &lt;code&gt;(card: Card) =&amp;gt; boolean&lt;/code&gt; in one file. When I needed to add a new category (Brazil, USDC, self-custody), it was a one-line export. Reading &lt;a href="https://sweepbasenotes.blogspot.com/2026/05/why-i-keep-building-these-comparison.html" rel="noopener noreferrer"&gt;a meta post on the editorial layer of a comparison site&lt;/a&gt; made me realize this was the architectural choice that made the most editorial work feel cheap.&lt;/p&gt;

&lt;p&gt;Zod schemas as the source of truth. Card type, validation, defaults all in one place. I have refactored the card model three times now and the migration was always trivial because the schema was the contract.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I would copy on a new project
&lt;/h2&gt;

&lt;p&gt;Start with a CSV. Move to a database only when you have evidence the CSV is the bottleneck. For three months of traffic and 141 rows, mine never was.&lt;/p&gt;

&lt;p&gt;If you want the live result, &lt;a href="https://sweepbase.net" rel="noopener noreferrer"&gt;the site is at sweepbase.net&lt;/a&gt; and the &lt;a href="https://telegra.ph/How-I-picked-the-metric-to-compare-139-crypto-cards-on-05-03" rel="noopener noreferrer"&gt;comparison methodology piece&lt;/a&gt; is on Telegraph. There is also a &lt;a href="https://cryptocardnotes.wordpress.com/2026/05/03/what-i-would-ask-a-crypto-card-founder-if-they-pitched-me-a-launch/" rel="noopener noreferrer"&gt;follow-up note on the founder-pitch lens&lt;/a&gt; that complements this operational view.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>database</category>
      <category>nextjs</category>
      <category>sideprojects</category>
    </item>
    <item>
      <title>What I learned shipping a Next.js 15 + CSV side project</title>
      <dc:creator>SweepBase</dc:creator>
      <pubDate>Thu, 30 Apr 2026 10:48:50 +0000</pubDate>
      <link>https://dev.to/sweepbase/what-i-learned-shipping-a-nextjs-15-csv-side-project-37po</link>
      <guid>https://dev.to/sweepbase/what-i-learned-shipping-a-nextjs-15-csv-side-project-37po</guid>
      <description>&lt;p&gt;I shipped a small side project this year: &lt;a href="https://sweepbase.net" rel="noopener noreferrer"&gt;sweepbase.net&lt;/a&gt;, a comparison site for crypto debit and credit cards. 139 cards, no DB, the whole dataset is one CSV file in the repo.&lt;/p&gt;

&lt;p&gt;Here are the things I'd actually tell another dev about it.&lt;/p&gt;

&lt;h2&gt;
  
  
  CSV beats a DB more often than people admit
&lt;/h2&gt;

&lt;p&gt;The whole catalog is &lt;code&gt;data.csv&lt;/code&gt;, parsed at boot, validated with Zod. Reads outnumber writes by something like 10,000 to 1, and most "writes" are me fixing a number once a month.&lt;/p&gt;

&lt;p&gt;For that load profile, a database is theatre. CSV in a public repo gives me:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One source of truth, version controlled&lt;/li&gt;
&lt;li&gt;Diff-able commits when I change a number&lt;/li&gt;
&lt;li&gt;No admin UI to build&lt;/li&gt;
&lt;li&gt;An auditable timeline anybody can inspect&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When somebody asks "why did you change Crypto.com APY", I link the commit. That answer is more reassuring than any dashboard.&lt;/p&gt;

&lt;h2&gt;
  
  
  Zod earns its rent
&lt;/h2&gt;

&lt;p&gt;Zod's schema does double duty: it validates at boot, and it generates the TypeScript type via &lt;code&gt;z.infer&lt;/code&gt;. One source for shape, no drift between runtime and compile time.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;CardSchema&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;fxMargin&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;number&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;atmFee&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;number&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;Card&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;infer&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;typeof&lt;/span&gt; &lt;span class="nx"&gt;CardSchema&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If a row in the CSV is malformed, the build fails. I never ship broken data without knowing.&lt;/p&gt;

&lt;h2&gt;
  
  
  ISR is the right default for content sites
&lt;/h2&gt;

&lt;p&gt;Next.js 15.1 App Router with &lt;code&gt;revalidate: 3600&lt;/code&gt; on every page. The data changes a few times a week. There is no reason to re-render on every request. Lighthouse stays at 100 across the catalog because the rendered HTML is essentially static, and the framework refreshes it every hour.&lt;/p&gt;

&lt;p&gt;I had to fight the urge to reach for SSR or client-side fetching. Neither belongs here.&lt;/p&gt;

&lt;h2&gt;
  
  
  React.cache() is underrated
&lt;/h2&gt;

&lt;p&gt;Multiple components in a single page render call the same &lt;code&gt;getCards()&lt;/code&gt; function. Without &lt;code&gt;React.cache()&lt;/code&gt;, the CSV gets parsed once per call site. Wrapped in &lt;code&gt;React.cache()&lt;/code&gt;, it parses once per request. Easy 10x latency win that I almost missed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Filters as predicates beats SQL for small data
&lt;/h2&gt;

&lt;p&gt;37 category pages (USA, no-KYC, self-custody, travel, and so on), all rendered from the same Server Component. The category-specific logic lives in &lt;code&gt;lib/filters.ts&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;isSelfCustody&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;card&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Card&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;card&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;custody&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;self&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;isUSACompatible&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;card&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Card&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;card&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;regions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;USA&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Adding a new category page is a 6-line PR: filter, slug, name. No migration, no index to remember.&lt;/p&gt;
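
&lt;p&gt;To make "filter, slug, name" concrete, here is one way such a registry could look. It is a sketch under assumptions: the &lt;code&gt;Category&lt;/code&gt; shape, the &lt;code&gt;categories&lt;/code&gt; array, the &lt;code&gt;cardsForCategory&lt;/code&gt; helper, and the simplified &lt;code&gt;Card&lt;/code&gt; interface are invented for illustration and are not copied from the repo.&lt;/p&gt;

```typescript
// Simplified stand-in for the Card type; in the real project the type is
// inferred from the Zod schema.
interface Card {
  service: string;
  custody: string;
  regions: string[];
}

// Hypothetical registry entry: one per category page.
interface Category {
  slug: string;
  name: string;
  filter(card: Card): boolean;
}

function isSelfCustody(card: Card): boolean {
  return card.custody === "self";
}

function isUSACompatible(card: Card): boolean {
  return card.regions.indexOf("USA") !== -1;
}

// Adding a category means adding one predicate and one entry here.
const categories: Category[] = [
  { slug: "self-custody", name: "Self-custody cards", filter: isSelfCustody },
  { slug: "usa", name: "Cards available in the USA", filter: isUSACompatible },
];

// Resolve a slug to the cards that belong on that page.
function cardsForCategory(slug: string, cards: Card[]): Card[] {
  const matches = categories.filter(function (category) {
    return category.slug === slug;
  });
  if (matches.length === 0) {
    return []; // unknown slug: the page layer can turn this into a 404
  }
  const category = matches[0];
  return cards.filter(function (card) {
    return category.filter(card);
  });
}

// Made-up cards for illustration.
const sampleCards: Card[] = [
  { service: "Example Card A", custody: "self", regions: ["USA", "EU"] },
  { service: "Example Card B", custody: "custodial", regions: ["EU"] },
];
console.log(cardsForCategory("usa", sampleCards));
```

&lt;p&gt;A single Server Component can then take the slug from the route, call a helper like this, and render whatever comes back, which is why a new page stays a small PR.&lt;/p&gt;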

&lt;h2&gt;
  
  
  What I would do differently
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Start the public CSV from day one. I used Notion for the first month and lost a week porting it.&lt;/li&gt;
&lt;li&gt;Set up Sentry before shipping, not after the first ghost bug report.&lt;/li&gt;
&lt;li&gt;Write the report-error button in week 1. Real user reports caught more bad data than my own auditing.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Where to look
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Live: &lt;a href="https://sweepbase.net" rel="noopener noreferrer"&gt;sweepbase.net&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Dataset: &lt;a href="https://sweepbase.net/datasets/data.csv" rel="noopener noreferrer"&gt;/datasets/data.csv&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Calculator: &lt;a href="https://sweepbase.net/calculator" rel="noopener noreferrer"&gt;/calculator&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to see the schema or argue with one of my ratings, both are public. The CSV is the source of truth.&lt;/p&gt;

</description>
      <category>nextjs</category>
      <category>typescript</category>
      <category>webdev</category>
      <category>sideprojects</category>
    </item>
  </channel>
</rss>
