MobilityData maintains the MobilityDatabase, an open repository of transit data used by developers and agencies worldwide. Our pages were public. Our data was open. And none of it was properly visible to search engines.
This is the story of how we migrated from a React SPA to Next.js, what we gained beyond performance numbers, and what surprised us along the way.
The Problem: Invisible Public Data
MobilityData is a global non-profit dedicated to making transportation systems interoperable through open data standards. We maintain the MobilityDatabase, an open catalog of over 6,000 GTFS, GTFS Realtime, and GBFS feeds across 99+ countries, used by developers, transit agencies, and researchers to build tools and services that help people get around without driving alone.
Our application was a standard React single-page application. It worked well for users who landed on it directly, but it had a fundamental problem: when Google crawled our pages, it would do its best to parse the JavaScript, but had a low success rate. We ended up with many incorrectly parsed pages that Google classified as "low quality" and excluded from search results. For a platform whose entire purpose is making transit data discoverable and accessible, this was working directly against our mission.
Source: Google Search Console. Pages were being indexed, but not with the correct content.
The Results Of The Migration
Before diving into the how, here's what the migration actually delivered:
Improved Performance: Core Web Vitals
Our methodology was simple. The old and new applications were both tested under the same conditions: a 3-run average score using Lighthouse in the same environment.
Lighthouse scores of the landing page, on desktop (left), and mobile (right). The landing page content loads over 1 second faster on desktop and over 5 seconds faster on mobile.
Lighthouse scores of a sample feed detail page, on desktop (left), and mobile (right). The feed detail page content loads over 1 second faster on desktop and over 3 seconds faster on mobile.
Search Visibility
Currently, we're tracking any indexing changes in Google Search Console and plan to share detailed metrics in a follow-up post once we have enough data. But early results are already clear.
More telling than raw index counts is what Google actually displays. Searching site:mobilitydatabase.org returned roughly 15 pages before the migration. One week after the Next.js launch, that number jumped to ~100 and is still climbing.
Open Graph Previews
When someone shared a link to our platform on Slack, LinkedIn, or Twitter, the preview was either blank or generic. Now, shared URLs render with proper titles and descriptions.
The first URL shows the old SPA; the beta URL shows the new Next.js version.
LLM Discoverability: An Under-appreciated Benefit
This result surprised us the most, and it’s also the one we think matters increasingly going forward.
An increasing share of how developers discover tools, APIs, and data sources is through AI assistants — ChatGPT, Claude, Perplexity, and others. These systems rely on being able to read and understand web content. LLM crawlers are less forgiving than Google's crawler and will refuse to visit pages that require JavaScript to render.
After the migration, we tested how LLMs interact with our pages:
If an LLM cannot read a URL, it can't parse the data or recommend it as a source. That meant we had lost a discovery channel, and one that is only going to grow. Server-side rendering isn't just about SEO anymore; it's about making your content accessible to the next generation of information retrieval systems.
How We Approached the Migration
We evaluated the migration as both a rendering architecture change and an opportunity to address accumulated tech debt. The two goals reinforced each other: moving to Next.js forced us to reconsider patterns that had calcified in the SPA.
Our approach was to drop the entire existing application into a Next.js catch-all route. This let us keep running the old stack (Redux, React Router, Firebase) while gradually adding new SSR pages through the App Router, which take precedence over the catch-all as we added them.
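In practice the route tree looks roughly like this (a sketch, not our exact file layout). Next.js matches more specific routes before the optional catch-all, so each new page automatically shadows its legacy counterpart:

```
app/
├── about/page.tsx          # new SSG page, matched first
├── feeds/[id]/page.tsx     # new SSR/ISR page, matched first
└── [[...slug]]/page.tsx    # optional catch-all rendering the legacy SPA
```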
But this process introduced its own problems. Running two routers in parallel caused navigation sync issues. URLs would update but page content wouldn't, and navigating from new pages back to legacy ones introduced performance delays.
The lesson: gradual migration is viable, but it introduces tech debt that is easy to forget once the immediate pages are working.
Reducing Redux's Role
The core issue was `<PersistGate/>`, redux-persist's standard hydration gate. It blocks rendering until the persisted store rehydrates, meaning every page paid a performance cost, even those that never touched the store.
For components that relied heavily on the store, wrapping them was straightforward. The harder call was the Header component, which lived on every SSR page and contained a logout action. We chose to render these pages without waiting for store initialization, accepting that logout wouldn't work during the brief window before hydration. Faster LCP for every user was worth the edge case.
For isolated features like Feeds Search, we replaced Redux with SWR, significantly reducing boilerplate by colocating data fetching with the components that needed it.
Technical Details
Firebase Authentication for SSR
The SPA used Firebase's client-side library to generate authentication tokens for our API. Moving to SSR meant server-side API calls needed user identity, but passing Firebase's long-lived refresh token in a cookie was too risky, and even the 1-hour Firebase ID token contained more data than our server actually needed.
Instead, we created a custom short-lived JWT (HS256, 1-hour TTL) containing only the user's uid and email, stored in an httpOnly cookie. This minimizes the data exposed if a cookie is ever compromised. Server-side, this JWT is decoded and passed to Firebase Admin SDK to generate the token needed for API calls. Client-side calls still use Firebase's standard auth flow.
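The token shape can be sketched as follows. This is illustrative only: the function names are ours for this example, and production code should use a vetted JWT library (such as jose or jsonwebtoken) rather than hand-rolling HS256.

```typescript
import { createHmac } from "node:crypto";

// Minimal HS256 JWT sketch. The token carries only uid and email,
// with a 1-hour TTL, so a compromised cookie exposes very little.

interface SessionClaims {
  uid: string;
  email: string;
  iat: number;
  exp: number;
}

const b64url = (buf: Buffer): string => buf.toString("base64url");

function signSessionToken(
  uid: string,
  email: string,
  secret: string,
  ttlSec = 3600, // 1-hour TTL
): string {
  const header = b64url(Buffer.from(JSON.stringify({ alg: "HS256", typ: "JWT" })));
  const now = Math.floor(Date.now() / 1000);
  const payload = b64url(
    Buffer.from(JSON.stringify({ uid, email, iat: now, exp: now + ttlSec })),
  );
  // HMAC-SHA256 over "header.payload" is what makes this an HS256 JWT.
  const sig = b64url(createHmac("sha256", secret).update(`${header}.${payload}`).digest());
  return `${header}.${payload}.${sig}`;
}

function verifySessionToken(token: string, secret: string): SessionClaims | null {
  const [header, payload, sig] = token.split(".");
  if (!header || !payload || !sig) return null;
  const expected = b64url(createHmac("sha256", secret).update(`${header}.${payload}`).digest());
  if (sig !== expected) return null; // signature mismatch
  const claims = JSON.parse(Buffer.from(payload, "base64url").toString()) as SessionClaims;
  if (claims.exp <= Math.floor(Date.now() / 1000)) return null; // expired
  return claims;
}
```

Server-side, the decoded `uid` is all that's needed to hand off to the Firebase Admin SDK for API token generation.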
Two parallel auth paths add complexity, but it keeps server-side authentication secure without exposing long-lived credentials or unnecessary user data.
Caching Strategies — Gains and Limits
Server-side rendering is inherently more expensive than hosting static files for a default React SPA. An effective caching strategy keeps costs down while delivering real performance gains.
Static Pages
Landing, About, and Contact were straightforward wins using SSG (Static Site Generation). Because the new architecture eliminated the startup delays of the old app (Redux store initialization, Firebase auth checks), these pages now load nearly instantly from the CDN.
However, there is one complication: every page includes a Header component that changes based on authentication status. Partial Pre-rendering would have been the ideal solution: instantly display the main content, then stream in the header. We discovered PPR too late to rework the architecture (more on that later in this article). Instead, the Header is a client component that picks up auth state during hydration, which is what causes the brief "Login" → "Account" flicker on first visit. Since it only occurs on initial page load, it's a low-priority fix, but something we plan to address.
Feed Detail Pages
These had the most to gain from caching, but also the most complexity. The feed detail page renders different content depending on authentication: non-authenticated, authenticated, and authenticated-admin. A single caching strategy couldn't cover all three.
Our solution was to create two internal routes pointing to the same public URL, with middleware reading the auth cookie to route users to the correct version. We use public routes with header-level protection rather than Next.js route groups, because our catch-all route (which supports the legacy application) created conflicts with route groups.
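The routing decision itself is simple; a sketch of the idea, with hypothetical internal paths and helper names (the real middleware would verify the auth cookie's JWT and rewrite the request with `NextResponse.rewrite`):

```typescript
// Hypothetical sketch of the middleware's routing decision. The internal
// paths and the AuthState type are illustrative, not our real identifiers.

type AuthState = "anonymous" | "authenticated" | "admin";

// Both public URLs map to one of two internal routes: a fully ISR-cached
// version for anonymous visitors, and a dynamic version for signed-in users.
function resolveInternalRoute(feedId: string, auth: AuthState): string {
  return auth === "anonymous"
    ? `/_internal/feeds-public/${feedId}` // safe to cache: identical for everyone
    : `/_internal/feeds-auth/${feedId}`; // rendered per request
}

// In middleware.ts this would be used roughly like:
//   const auth = cookieToAuthState(request.cookies.get("session"));
//   return NextResponse.rewrite(
//     new URL(resolveInternalRoute(feedId, auth), request.url));
```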
Non-Authenticated
All non-authenticated users see identical content, making the entire page safe to cache with ISR (Incremental Static Regeneration). We chose ISR over build-time SSG because with 6,000+ feeds, building them all at deploy time isn't practical.
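A minimal sketch of how a feed detail page can opt into ISR instead of build-time SSG (the path and revalidation window are illustrative; as described under Revalidation, our production setup refreshes the cache via a daily cron job):

```typescript
// app/feeds/[id]/page.tsx (illustrative path): ISR configuration sketch.

// Time-based ISR: regenerate a cached page at most once per day.
export const revalidate = 86400;

// Allow feed IDs that weren't known at build time to render on demand.
export const dynamicParams = true;

// Returning an empty list means nothing is pre-built at deploy time:
// each of the 6,000+ feed pages is generated lazily on its first
// request and then served from cache.
export async function generateStaticParams() {
  return [];
}
```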
Authenticated
Because content is tied to a specific user ID, page-level caching isn't practical. Our current approach uses unstable_cache to briefly cache API responses on the server, giving returning users a faster experience without bloating the cache with per-user entries.
This is a deliberate starting point, not the final answer. Our caching maturity path looks like this:
1. `unstable_cache` (current) — fast to ship, limited hit rate
2. Client-side caching with SWR (next) — better UX for returning users with no additional server cost
3. Role-based ISR (ideal) — with defined roles like `authenticated` and `authenticated-admin`, we could ISR-cache these pages the same way we do for non-authenticated users, dramatically improving hit rates
We held off on step 3 because our role system is still early-stage. Each step unlocks better performance without requiring infrastructure we haven't built yet.
Revalidation
For our MVP release, a daily cron job revalidates the cache for all feeds, giving our ISR pages a ~34% cache hit rate. The next step is an on-demand revalidation endpoint that triggers whenever a feed is updated. This would extend cache lifetimes to a maximum of two weeks, significantly improving hit rates.
Overall, the application sits at a ~91% cache hit rate. At this level, the majority of traffic is served from the edge without hitting the origin. This keeps our compute costs closer to what you'd expect from a static SPA than a fully server-rendered application. There's room to grow, but for a first release we're satisfied with it.
Measuring Performance Improvements
Early in our migration, performance data was misleadingly optimistic. Next.js prefetching made navigations feel instant, but it masked underlying page weight. On our search page, pre-loading 20 detail pages (each triggering 2–3 API calls) created a "request storm" that strained our backend. We eventually disabled prefetching on the search page (prefetch={false}) to balance perceived speed with server load.
To find a reliable baseline, we moved away from generic scores and focused on Largest Contentful Paint (LCP) as our primary metric. We refined our toolkit to separate synthetic noise from real-world behavior:
| Tool | What it Caught | Why we used it |
|---|---|---|
| JS-Off Check | SSR failures / PersistGate issues | If a page is blank with JS disabled, it's invisible to search engines. This was the catalyst for moving data fetching to the server. |
| Browser Performance Tab | Unused CSS & "Request Storms" | Our most reliable source of truth. It revealed actual load behavior and identified unneeded CSS chunks that Lighthouse missed. |
| Screaming Frog | Crawler visibility | We used this to crawl the site specifically to verify what search engines and screen readers see without executing client-side logic. |
| "Generated At" Timestamp | ISR/Cache verification | A value in the HTML that proved ISR was actually serving a cached version rather than re-rendering on every hit. |
| Lighthouse | General health (Directional) | We found scores inconsistent between runs for precise benchmarking, treating them as a guide rather than a final grade. |
Our AI-Assisted Workflow
We started the migration in January 2026 using Copilot with Claude Sonnet 4.5 and Opus 4.5 to plan the architecture and validate a transition plan. What we didn't account for was a timing gap:
- Next.js 16 with Partial Pre-rendering released: October 2025
- Sonnet 4.5 training cutoff: July 2025
- Opus 4.5 training cutoff: May 2025
Neither model had seen Next.js 16 documentation. Given our limited engineering resources and no prior Next.js experience on the team, choosing Next.js 15 would have been the safer call: a version the AI tools actually knew well, letting us move faster with more reliable assistance. Once I realized the gap, I started feeding relevant documentation links directly into prompts, which helped but didn't fully close the knowledge deficit.
For the day-to-day workflow, we split tasks by complexity. Large architectural changes: route restructuring, layout migrations, state management refactoring, went to Opus, which handled the full scope of changes and their cascading effects better. Contained tasks: component conversion, boilerplate generation, data fetching logic, went to Sonnet, which was faster and more cost-effective for focused work.
AI was excellent at accelerating the repetitive parts of migration. Where it required close supervision was refactoring cleanup: removing Redux or restructuring files would often leave dead code, unused imports, or orphaned utilities that needed manual review.
The PPR Miss
Because we weren't familiar with Next.js before this migration, we relied on AI research to establish the baseline architecture. The tools guided me toward a solid SSR + ISR setup, but never surfaced Partial Pre-rendering, the one feature that would have cleanly solved several problems we worked around manually: its ability to serve a static shell while streaming in dynamic components addresses both the Header flicker on static pages and the caching constraints on feed detail pages.
By the time we discovered PPR through reading the documentation, the architecture was already set. It's on the roadmap to be revisited for a future iteration.
The takeaway: AI tools are most useful when you already have enough domain knowledge to evaluate what they're not telling you.
Why We Chose Vercel Over Cloud Run
We weighed Vercel against Google Cloud Run for our Next.js hosting. While both are robust, they optimize for different priorities: control versus velocity.
Cloud Run offered lower direct costs and more infrastructure control, but it would have required significant engineering hours to configure the CI/CD pipeline, Dockerize the environment, and manage the CDN layer. Vercel, built by the creators of Next.js, offered a zero-config experience that aligned with our lean team's needs.
The deciding factors for Vercel were:
- Native Preview Environments: Automatic, unique URLs for every PR. A workflow our team already relied on.
- Integrated CDN & Caching: Next.js features like Incremental Static Regeneration (ISR) work out of the box, allowing us to hit a 91% cache rate without manual configuration.
- Zero Infrastructure Overhead: No Dockerfiles or container orchestration; we simply push code and it deploys.
- Cold Starts: Vercel optimizes for JavaScript/Node.js using a warming layer, which keeps cold starts in the 300ms–800ms range. For Cloud Run, spinning up a Next.js container typically takes 2–5 seconds.
- Edge Middleware: Vercel runs middleware on the edge CDN. In our case, where middleware routes users to cached or dynamic feed detail pages, this was a significant advantage.
The Trade-off: We estimated Vercel would cost $75–$100/month (based on dev seats and traffic), compared to roughly $10–$35/month on Cloud Run. However, for a small team where engineering time is the most expensive resource, paying a premium for a platform that "just works" was a clear win for our roadmap.
What We'd Do Differently
Read the docs before asking the AI. Spending a day with the Next.js documentation end-to-end before writing any code would have surfaced PPR early enough to build around it. When you're new to a framework, it's tempting to let AI tools guide the learning, but they can only teach you what they know.
Assess AI tools' knowledge boundaries upfront. Asking the model about a recent feature and checking the response would have revealed the Next.js 16 gap immediately instead of mid-migration.
Break the first PR into smaller pieces. The initial migration PR was massive. Smaller, incremental PRs would have been easier to review, easier to roll back, and would have shown clearer progression to the team. Link to the Initial PR
What This Unlocked
Our pages are indexable, which directly affects whether transit agencies and developers find the MobilityDatabase through organic search, whether AI assistants surface us when someone asks about open transit data, and whether a shared link actually communicates what it points to.
The mobile experience got meaningfully faster, which matters more than analytics suggest. First impressions at conferences and demos happen on phones, not on the monitors we develop on.
The codebase is leaner. Removing unnecessary Redux, modernizing the auth flow, and adopting Next.js conventions gave us a foundation that's easier to build on.
Next up: on-demand cache revalidation to push hit rates higher, SEO keyword tuning to capitalize on our new visibility, adopting Partial Pre-rendering to further close the gap between static and dynamic performance, and continued work on our backlog of transit data tools.
About MobilityData
MobilityData is a global non-profit maintaining the development of open data standards that power transit and shared mobility apps worldwide. We support specs like GTFS and GBFS, working with agencies, companies, and developers to make mobility data more usable and consistent.











