DEV Community

Marcc Atayde

Technical SEO Audit Checklist for Modern Web Applications: What Crawlers Actually See

You shipped a beautiful web application. Clean code, smooth UX, fast on your machine. Then you check Google Search Console and realize your pages are barely indexed, your structured data is throwing errors, and half your canonical tags are pointing to the wrong URLs. Sound familiar?

Technical SEO is the unsexy foundation that either unlocks or blocks all the content work you do on top of it. This audit checklist is built for developers — not marketers — so we'll go deep on the implementation details, not just the theory.

1. Crawlability and Indexation

Before anything else, you need to verify that Googlebot can actually find and read your pages.

robots.txt

Your robots.txt lives at the root of your domain. A common mistake in Laravel apps is accidentally blocking crawlers in production because someone copied a staging config.

User-agent: *
Disallow: /admin/
Disallow: /api/
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml

Verify it at https://yourdomain.com/robots.txt and test specific URLs using Google Search Console's URL Inspection tool.
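The same rules can also be smoke-tested offline before a deploy. The sketch below uses Python's standard `urllib.robotparser` against the example rules above; `ROBOTS_TXT` and `is_crawlable` are illustrative names, not part of any tool mentioned here. Python's parser applies rules in file order (first match wins), which is not identical to Google's most-specific-match behaviour, so treat it as a rough check rather than a replacement for URL Inspection.

```python
from urllib.robotparser import RobotFileParser

# Illustrative copy of the rules shown above (yourdomain.com is a placeholder).
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /api/
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/blog", "/admin/users", "/api/v1/posts"):
        url = f"https://yourdomain.com{path}"
        print(path, "allowed" if is_crawlable(ROBOTS_TXT, url) else "blocked")
```

Running a check like this in CI against the production robots.txt is one way to catch a staging config that disallows everything.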

XML Sitemap

Your sitemap should include all canonical, indexable URLs — nothing behind auth walls, nothing with noindex. In Laravel, the spatie/laravel-sitemap package makes this straightforward:

use Spatie\Sitemap\Sitemap;
use Spatie\Sitemap\Tags\Url;

Sitemap::create()
    ->add(
        Url::create('/blog')
            ->setLastModificationDate(now())
            ->setChangeFrequency(Url::CHANGE_FREQUENCY_DAILY)
            ->setPriority(0.8)
    )
    ->writeToFile(public_path('sitemap.xml'));

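Don't just generate it once. Regenerate it as part of your deployment pipeline, or wrap the generation in an Artisan command and register it with Laravel's task scheduler (the scheduler your cron entry triggers through php artisan schedule:run).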
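A generated sitemap is worth sanity-checking, because stray http:// or www URLs in it contradict your canonical setup. The following is a minimal sketch using only Python's standard library; the function names, the sample document, and the assumption that https://yourdomain.com is the canonical origin are all illustrative.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Made-up sample; in practice, read the text of your generated sitemap.xml.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://yourdomain.com/blog</loc></url>
  <url><loc>http://www.yourdomain.com/about</loc></url>
</urlset>"""


def sitemap_urls(xml_text: str) -> list[str]:
    """Return every <loc> value from a sitemap document."""
    root = ET.fromstring(xml_text)
    locs = root.findall("sm:url/sm:loc", SITEMAP_NS)
    return [loc.text.strip() for loc in locs if loc.text]


def off_canonical(urls: list[str], scheme: str = "https",
                  host: str = "yourdomain.com") -> list[str]:
    """Return the URLs that are not on the canonical scheme and host."""
    bad = []
    for url in urls:
        parts = urlparse(url)
        if parts.scheme != scheme or parts.netloc != host:
            bad.append(url)
    return bad


if __name__ == "__main__":
    for url in off_canonical(sitemap_urls(SAMPLE)):
        print("not canonical:", url)
```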

2. Canonical Tags and Duplicate Content

Duplicate content is one of the most common technical SEO issues, especially in e-commerce and CMS-driven apps. URL variations like ?ref=newsletter, ?sort=price, or trailing slash inconsistencies all create duplicate signals.

Every page needs a self-referencing canonical

<link rel="canonical" href="https://yourdomain.com/products/running-shoes" />

In Laravel Blade, centralise this:

<link rel="canonical" href="{{ $canonical ?? url()->current() }}" />

Then in your controllers or Livewire components, explicitly set the canonical when needed — especially for paginated pages, filtered product listings, or tag archives.
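To make the normalisation rule concrete, here is a small language-neutral sketch in Python of what "the canonical form of a URL" can mean: drop parameters that don't change content and settle on one trailing-slash convention. The `IGNORED_PARAMS` list and the choice to strip trailing slashes are assumptions for illustration; your app may reasonably decide differently.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of parameters that never select different content.
IGNORED_PARAMS = {"ref", "sort", "utm_source", "utm_medium", "utm_campaign"}


def canonical_url(url: str) -> str:
    """Return url with ignored parameters, fragment and trailing slash removed."""
    parts = urlsplit(url)
    # Keep only parameters that select different content (e.g. ?page=2).
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS
    ]
    # Normalise the trailing slash, leaving the site root "/" alone.
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, urlencode(kept), ""))


if __name__ == "__main__":
    print(canonical_url("https://yourdomain.com/products/running-shoes/?ref=newsletter&sort=price"))
    print(canonical_url("https://yourdomain.com/blog?page=2&ref=newsletter"))
```

Whatever rule you choose, the point is that it lives in one place and every template uses it.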

HTTP vs HTTPS, WWW vs non-WWW

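Pick one scheme and one host (for example https:// without www) and 301-redirect every other variant to it. Check your .htaccess or Nginx config. This should be handled at the server level, not just in Laravel's middleware, so the redirect happens before the request ever reaches PHP.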

3. Structured Data (Schema Markup)

Structured data doesn't guarantee rich results, but it does help Google understand your content. For a web app, the relevant schemas are usually Article, Product, FAQPage, BreadcrumbList, and LocalBusiness.

@push('head')
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": @json($post->title),
  "datePublished": "{{ $post->published_at->toIso8601String() }}",
  "author": {
    "@type": "Person",
    "name": @json($post->author->name)
  },
  "image": @json($post->og_image_url)
}
</script>
@endpush

Validate everything using Google's Rich Results Test and the Schema Markup Validator.
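Broken JSON-LD often comes from template output that is not JSON-escaped (a double quote in a title is enough), so it can help to catch it before the page reaches Google's tools. The sketch below, in standard-library Python, pulls every application/ld+json block out of rendered HTML and reports blocks that fail to parse or lack a few keys. The `required` tuple is an assumption chosen for the Article example, not Google's list of required properties.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buffer = None  # list of text chunks while inside a JSON-LD script

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            self.blocks.append("".join(self._buffer))
            self._buffer = None


def audit_json_ld(html, required=("@context", "@type", "headline")):
    """Return a list of problems found in the page's JSON-LD; empty means none."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    extractor.close()
    problems = []
    for index, raw in enumerate(extractor.blocks):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            problems.append(f"block {index}: invalid JSON ({exc.msg})")
            continue
        # Only top-level objects are checked for keys in this sketch.
        missing = [key for key in required if key not in data] if isinstance(data, dict) else []
        if missing:
            problems.append(f"block {index}: missing {', '.join(missing)}")
    return problems
```

Feed it the rendered HTML of a few representative pages (for instance from a feature test) and fail the build if the returned list is not empty.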

4. Core Web Vitals and Page Experience Signals

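Google's page experience signals include the Core Web Vitals: LCP, INP (which replaced FID in March 2024), and CLS. These are measurable and fixable, and Google's ranking systems use them, although relevance still outweighs page experience.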

  • LCP (Largest Contentful Paint): Should be at or below 2.5s. Preload your hero image with <link rel="preload" as="image"> and lazy-load images below the fold, but never lazy-load the LCP element itself.
  • INP (Interaction to Next Paint): Should be at or below 200ms. Heavy JavaScript blocking the main thread is the usual culprit. Audit your JS bundle: Alpine.js stays lean, but watch for third-party scripts.
  • CLS (Cumulative Layout Shift): Should be at or below 0.1. Always set explicit width and height on images and iframes, and reserve space for async-loaded UI elements.

Run npx lighthouse https://yourdomain.com --view locally for a quick diagnostic.
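Google publishes two boundaries for each Core Web Vital, "good" and "poor", and assesses them at the 75th percentile of real page loads. A minimal Python sketch for rating a measured value against those boundaries follows; the `rate` helper and the `THRESHOLDS` table layout are illustrative, and the values you pass in are assumed to come from your own field or lab data.

```python
# (good, poor) boundaries for each Core Web Vital.
THRESHOLDS = {
    "LCP": (2.5, 4.0),   # seconds
    "INP": (200, 500),   # milliseconds
    "CLS": (0.1, 0.25),  # unitless layout-shift score
}


def rate(metric: str, value: float) -> str:
    """Classify a measurement as good, needs improvement, or poor."""
    good, poor = THRESHOLDS[metric]
    if value <= good:
        return "good"
    if value <= poor:
        return "needs improvement"
    return "poor"


if __name__ == "__main__":
    # Hypothetical measurements for one page.
    for metric, value in {"LCP": 3.1, "INP": 180, "CLS": 0.31}.items():
        print(f"{metric}: {value} -> {rate(metric, value)}")
```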

5. Metadata Completeness

Every page needs a unique, descriptive <title> and <meta name="description">. The description is not a direct ranking signal, but together with the title it shapes your search snippet and therefore your click-through rate, which does matter.

<title>{{ $page->seo_title ?? $page->title . ' | ' . config('app.name') }}</title>
<meta name="description" content="{{ $page->meta_description ?? $page->excerpt }}" />

Also audit your Open Graph and Twitter Card tags — these control how your pages look when shared:

<meta property="og:title" content="{{ $page->og_title ?? $page->title }}" />
<meta property="og:image" content="{{ $page->og_image ?? asset('images/default-og.jpg') }}" />
<meta property="og:type" content="website" />

As a rule of thumb, keep titles under about 60 characters and descriptions under about 155, since search results truncate longer snippets. Use a spreadsheet to audit them at scale: export your URLs and titles via a crawler like Screaming Frog.
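Once the crawl export exists, the length and uniqueness checks are easy to script instead of eyeballing. This is a minimal Python sketch; the `pages` structure (a list of dicts with `url`, `title` and `description` keys) is an assumed shape for rows from such an export, and the limits are the rule-of-thumb numbers from this section.

```python
from collections import Counter

TITLE_LIMIT = 60
DESCRIPTION_LIMIT = 155


def audit_metadata(pages):
    """Return a list of issues for missing, overlong or duplicated metadata."""
    title_counts = Counter(page["title"] for page in pages if page["title"])
    issues = []
    for page in pages:
        url, title, description = page["url"], page["title"], page["description"]
        if not title:
            issues.append(f"{url}: missing title")
        else:
            if len(title) > TITLE_LIMIT:
                issues.append(f"{url}: title is {len(title)} chars")
            if title_counts[title] > 1:
                issues.append(f"{url}: duplicate title")
        if not description:
            issues.append(f"{url}: missing description")
        elif len(description) > DESCRIPTION_LIMIT:
            issues.append(f"{url}: description is {len(description)} chars")
    return issues


if __name__ == "__main__":
    # Hypothetical rows from a crawl export.
    rows = [
        {"url": "/", "title": "Home", "description": "Welcome to the shop."},
        {"url": "/about", "title": "Home", "description": ""},
    ]
    print("\n".join(audit_metadata(rows)))
```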

6. Mobile and Internationalisation

Mobile-First Indexing

Google uses the mobile version of your pages for indexing and ranking. Test with Chrome DevTools in mobile emulation and verify your responsive breakpoints aren't hiding critical content behind JavaScript toggles.

hreflang for Multi-Language Apps

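If you're running a multi-language Laravel app, hreflang tells Google which version to serve for which locale. Each language version should list every alternate, including itself, and the annotations must be reciprocal or Google may ignore them: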

<link rel="alternate" hreflang="en" href="https://yourdomain.com/en/about" />
<link rel="alternate" hreflang="ar" href="https://yourdomain.com/ar/about" />
<link rel="alternate" hreflang="x-default" href="https://yourdomain.com/en/about" />

This is particularly relevant for businesses operating in multilingual markets — something the team at HanzWeb.ae encounters regularly when building regional web applications for clients across the UAE and MENA.
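A frequent hreflang mistake is a missing return link: page A lists page B as an alternate, but B does not list A, and Google may then ignore the pair. The check is simple enough to sketch in Python. The `pages` mapping (each URL mapped to its hreflang-to-href annotations) is an assumed input you would build from a crawl, and `missing_return_links` is an illustrative name.

```python
def missing_return_links(pages):
    """Return sorted (source, target) pairs where target does not link back.

    pages maps each URL to a dict of {hreflang: href} found on that URL.
    """
    missing = set()
    for source, alternates in pages.items():
        for target in set(alternates.values()):
            if target == source:
                continue  # the self-reference needs no return link
            if source not in pages.get(target, {}).values():
                missing.add((source, target))
    return sorted(missing)


if __name__ == "__main__":
    en = "https://yourdomain.com/en/about"
    ar = "https://yourdomain.com/ar/about"
    # Hypothetical crawl result: the Arabic page forgot to list the English one.
    crawl = {
        en: {"en": en, "ar": ar, "x-default": en},
        ar: {"ar": ar},
    }
    for source, target in missing_return_links(crawl):
        print(f"{target} does not link back to {source}")
```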

7. HTTPS, Security Headers, and URL Structure

  • Ensure all internal links use HTTPS. Mixed content warnings can affect crawl behaviour.
  • Use descriptive, hyphenated slugs: /blog/technical-seo-audit not /blog?id=87
  • Avoid deep URL nesting beyond three levels
  • Return proper HTTP status codes: 404 for missing pages, 410 for intentionally deleted content, 301 for permanent redirects

Check your redirect chains: a 301 that hits another 301 before reaching the destination wastes crawl budget and slows down every visitor who follows it. Point each redirect straight at the final URL.
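Chains are easiest to spot when the redirects are treated as a simple map from source URL to target URL. The Python sketch below follows such a map and returns every URL visited, so anything longer than two entries is a chain worth flattening. The `redirects` dict is an assumed input (for example, exported from a crawler or assembled from your server config), and `redirect_chain` is an illustrative helper.

```python
def redirect_chain(start, redirects, max_hops=10):
    """Follow start through the redirect map and return every URL visited.

    Stops at the first URL with no redirect, when a loop is detected,
    or after max_hops redirects.
    """
    chain = [start]
    while chain[-1] in redirects and len(chain) <= max_hops:
        target = redirects[chain[-1]]
        looped = target in chain
        chain.append(target)
        if looped:
            break  # the last entry repeats an earlier URL
    return chain


if __name__ == "__main__":
    # Hypothetical redirect map: scheme upgrade first, then a renamed slug.
    redirects = {
        "http://yourdomain.com/old-post": "https://yourdomain.com/old-post",
        "https://yourdomain.com/old-post": "https://yourdomain.com/blog/new-post",
    }
    chain = redirect_chain("http://yourdomain.com/old-post", redirects)
    print(f"{len(chain) - 1} hops:", " -> ".join(chain))
```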

8. Log File Analysis (Underused but Powerful)

Server logs tell you exactly what Googlebot is crawling and how often. Tools like Screaming Frog Log File Analyser or even a simple grep on your Nginx/Apache logs can reveal:

  • Pages being crawled but not indexed (cross-check the crawled URLs against Search Console's indexing report)
  • Soft 404s (pages returning 200 but showing empty content)
  • Crawl budget being wasted on paginated parameter URLs

grep 'Googlebot' /var/log/nginx/access.log | awk '{print $7}' | sort | uniq -c | sort -rn | head -20

This gives you the 20 paths Googlebot requests most often (field 7 is the request path in the default combined log format). If it's hitting /api/ endpoints or admin routes, fix your robots.txt immediately.
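If you want to extend the one-liner (for example, to break results down by status code), the same count is easy to reproduce in a script. Below is a minimal Python sketch that assumes the default combined log format; the regular expression and the `googlebot_hits` name are illustrative, and the sample line is made up.

```python
import re
from collections import Counter

# Combined log format: ... "METHOD /path HTTP/x" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines, top=20):
    """Return the most requested paths for lines whose user agent names Googlebot."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and "Googlebot" in match["agent"]:
            counts[match["path"]] += 1
    return counts.most_common(top)


if __name__ == "__main__":
    sample = [
        '66.249.66.1 - - [10/Mar/2025:06:25:14 +0000] "GET /blog HTTP/1.1" 200 5120 "-" '
        '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    ]
    for path, hits in googlebot_hits(sample):
        print(hits, path)
```

Keep in mind that the user-agent string can be spoofed, so for anything beyond a quick look, verify that the requesting IPs really belong to Google (a reverse DNS lookup is the usual method).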

Putting It All Together

Technical SEO isn't a one-time task — it's an ongoing audit practice. The checklist above covers the highest-impact areas, but the real discipline is building these checks into your development workflow rather than treating them as an afterthought post-launch.

Set up a quarterly crawl with Screaming Frog, monitor Search Console weekly for coverage errors, and make structured data and canonical logic part of your page templates from day one. The applications that rank consistently aren't the ones with the cleverest content strategy — they're the ones with a technically sound foundation that search engines can trust.
