APIVerve

Posted on • Originally published at blog.apiverve.com

What Your User Agent String Reveals

Every HTTP request your browser makes includes a header that tells the server who's asking. It looks something like this:

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36

That string — the user agent — is a mess of legacy tokens, browser names, and version numbers. It claims to be Mozilla. It claims to be Safari. It's actually Chrome.

Despite its chaotic appearance, the user agent contains valuable information hiding in plain sight. Understanding it helps you build better analytics, debug more effectively, and detect bots trying to scrape your site.

Why It Looks Like That

The user agent string is an artifact of browser wars and backwards compatibility hacks accumulated over 30 years.

In the early web, Netscape Navigator was dominant. Its user agent started with "Mozilla" (Netscape's internal codename). When sites wanted to serve advanced features, they checked for "Mozilla" in the user agent.

Then Internet Explorer arrived. It needed sites to treat it like Netscape, so it called itself "Mozilla-compatible." When Opera arrived, it pretended to be IE pretending to be Mozilla. When Safari arrived, it claimed to be "like Gecko" (the Netscape rendering engine) even though it used WebKit. When Chrome arrived, it claimed compatibility with everything.

The result: every modern browser claims to be every other browser. Chrome says it's Mozilla, AppleWebKit, Gecko-like, Chrome, AND Safari. This is intentional — it's the accumulated weight of 30 years of "if you don't pretend to be everyone else, some website won't work right."

You cannot parse this string with simple substring matching. The tokens don't mean what they appear to mean. You need a parser that understands the history and extracts the actual information.
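A quick sketch of why substring matching misleads, using the Chrome user agent from the top of the post:

```javascript
// A real Chrome-on-Windows user agent string
const ua = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
  '(KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36';

// Naive substring checks all "match" — and most are misleading
console.log(ua.includes('Mozilla')); // true, but it's not Netscape
console.log(ua.includes('Safari'));  // true, but it's not Safari
console.log(ua.includes('Chrome'));  // true — the one honest token

// Even "Chrome" isn't safe on its own: Edge's user agent
// contains both "Chrome" and "Edg/", so order of checks matters.
```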

What Information Is Actually There

Despite the chaos, a properly parsed user agent reveals:

Browser and version. Not "Safari" from the string, but the actual browser. Chrome 122, Firefox 123, Safari 17.3, etc.

Operating system and version. Windows 10, macOS 14, iOS 17, Android 14, Linux, etc.

Device type. Desktop, mobile phone, tablet, TV, game console, etc.

Rendering engine. Blink (Chrome/Edge), Gecko (Firefox), WebKit (Safari), etc.

Bot indicators. Known crawlers identify themselves. Googlebot, Bingbot, and others announce who they are.

A parsed user agent might return:

{
  "browser": {
    "name": "Chrome",
    "version": "122.0.0.0"
  },
  "os": {
    "name": "Windows",
    "version": "10"
  },
  "isMobile": false,
  "isBot": false
}

That's actually useful data, extracted from the cryptic original string.

What You Can Do With It

Analytics segmentation. What browsers do your users actually use? What percentage are mobile? Should you still support IE11? (Probably not.) How much traffic comes from iOS Safari versus Android Chrome?

User agent data answers these questions. You can segment your analytics by browser, by OS, by device type. This informs development priorities, testing focus, and feature decisions.
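As a minimal sketch, segmentation is just aggregation over already-parsed records (the field names here are illustrative, echoing the JSON shape shown earlier):

```javascript
// Hypothetical parsed visit records, e.g. output of a UA-parsing step
const visits = [
  { browser: 'Chrome', os: 'Windows', isMobile: false },
  { browser: 'Safari', os: 'iOS', isMobile: true },
  { browser: 'Chrome', os: 'Android', isMobile: true },
  { browser: 'Firefox', os: 'Linux', isMobile: false },
];

// Tally visits by browser
const byBrowser = visits.reduce((acc, v) => {
  acc[v.browser] = (acc[v.browser] || 0) + 1;
  return acc;
}, {});

// Fraction of traffic that is mobile
const mobileShare = visits.filter(v => v.isMobile).length / visits.length;
```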

Conditional experiences. Serve WebP images to browsers that support them. Show the app download banner only on mobile web. Use modern CSS features where supported, fallbacks where not.

User agent tells you what capabilities to expect. It's not as reliable as feature detection (more on that later), but it's useful for server-side decisions before the page even loads.
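A server-side decision might look like this sketch — the regex is deliberately rough (a proper parser handles far more cases), and the function name is illustrative:

```javascript
// Rough server-side check: should this request get the mobile app banner?
// "Mobi" covers most mobile browsers; Android tablets and desktop-mode
// requests are known blind spots of this approach.
function shouldShowAppBanner(uaString = '') {
  return /Mobi|Android|iPhone/i.test(uaString);
}
```

Because this runs before any HTML is sent, the page can be rendered with or without the banner instead of hiding it with client-side JavaScript after load.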

Debugging context. When a user reports "it's broken on my phone," you need more detail. User agent data tells you they're on iOS 15 Safari, which helps you reproduce the issue.

Log user agents with error reports. When you're investigating a bug, knowing "this only happens on Chrome 118 on Android" dramatically narrows your search.

Bot detection. Legitimate crawlers identify themselves. Googlebot says it's Googlebot. Bingbot says it's Bingbot. Your analytics should probably exclude these from "real user" counts.

Suspicious scrapers often have missing, unusual, or outdated user agents. A user agent from Internet Explorer 6 hitting your modern web app is probably not a real user. Neither is an empty user agent string.

Bot Detection and the User Agent

The line between "bot" and "user" is blurrier than you might think.

Clearly bots: Googlebot, Bingbot, Slurp (Yahoo), DuckDuckBot, etc. These announce themselves clearly and are usually welcome. They're how you get search engine traffic.

Probably bots: Requests with user agent strings like Python-urllib/3.9 or Go-http-client/1.1. These are programmatic requests, usually scrapers, APIs, or automated tools. Whether you want them depends on what they're doing.

Suspicious: Missing user agents, very short user agents, or user agents that claim to be browsers from 10 years ago. Real users don't usually have these. Bots that forgot to set a realistic user agent do.

Hard to tell: Sophisticated bots that use real browser user agents. A scraper running in a real Chrome instance has a perfect Chrome user agent. You can't distinguish it from a human on user agent alone.

This is why user agent is one signal among many for bot detection. Combine it with behavior analysis (how fast do they browse? do they execute JavaScript? do they move the mouse?), IP reputation, and other signals.
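The four buckets above can be sketched as a simple classifier — the patterns are illustrative, not exhaustive, and real bot detection layers this with the behavioral signals just mentioned:

```javascript
// Rough classifier mirroring the categories above. Pattern lists are
// illustrative samples, not a complete bot database.
function classifyUserAgent(ua) {
  if (!ua || ua.trim() === '') return 'suspicious';        // missing UA
  if (/Googlebot|Bingbot|DuckDuckBot|Slurp/i.test(ua)) return 'known-bot';
  if (/python-urllib|go-http-client|curl|wget/i.test(ua)) return 'programmatic';
  if (/MSIE [1-6]\./.test(ua)) return 'suspicious';        // decade-old browser
  return 'probably-human';                                 // includes sophisticated bots
}
```

Note the last category: a scraper driving real Chrome lands in `probably-human`, which is exactly why this signal can't stand alone.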

What It Can't Do

User agent detection has real limitations. Understanding them prevents misuse.

Security decisions: Never. User agents are trivially spoofed. Any client can send any user agent. If you're using user agent for access control, authentication, or security, you're making a mistake.

Feature detection: User agent tells you what browser someone claims to be using. It doesn't tell you what features are actually available in their specific browser version on their specific device.

The right approach for feature detection is to actually detect the feature:

// Don't: Assume based on user agent
if (userAgent.includes('Chrome')) {
  useWebPImages();
}

// Do: Actually check for support
const canvas = document.createElement('canvas');
if (canvas.toDataURL('image/webp').startsWith('data:image/webp')) {
  useWebPImages();
}

Feature detection beats user agent detection for deciding what code paths to execute. User agent is for analytics, debugging, and rough optimization hints — not capability detection.

Reliable mobile detection: Most mobile detection based on user agent works, but edge cases abound. Tablets identify differently. Desktop browsers can be set to mobile mode. Mobile browsers can request desktop sites. Don't assume mobile detection is 100% accurate.

The Privacy Dimension

User agents are a fingerprinting vector. Combined with other signals (screen size, installed fonts, timezone, etc.), the user agent contributes to a unique fingerprint that can track users across sites without cookies.

This is why browsers are reducing the information they send. Chrome's User-Agent Reduction initiative progressively removes details from the user agent string. Instead of revealing the exact browser version and OS version, newer Chrome versions send less specific information.

The replacement is User-Agent Client Hints, a more structured and privacy-respecting way to request device information. Instead of broadcasting everything in every request, the server asks for what it needs, and the browser can control what it reveals.
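On the server, the Client Hints flow has two halves: opt in via the `Accept-CH` response header, then read the structured `Sec-CH-UA-*` request headers on subsequent requests. A sketch of the reading half (header names are real; the function is illustrative):

```javascript
// 1. The server opts in on an earlier response:
//    Accept-CH: Sec-CH-UA, Sec-CH-UA-Mobile, Sec-CH-UA-Platform
// 2. Supporting browsers then send structured headers, read here from a
//    Node-style lowercase header map:
function readClientHints(headers) {
  return {
    brands: headers['sec-ch-ua'],                // e.g. '"Chromium";v="122", ...'
    isMobile: headers['sec-ch-ua-mobile'] === '?1',
    platform: (headers['sec-ch-ua-platform'] || '').replace(/"/g, ''),
  };
}
```

Unlike the legacy user agent, high-entropy details (exact platform version, device model) are only sent if the server explicitly requests them, which is the privacy win.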

If you're building something that relies on detailed user agent parsing, be aware that the detail available is decreasing over time. Plan for a future where you get less information by default.

Practical Implementation

When you need to parse user agents, use an API rather than rolling your own regex:

async function analyzeUserAgent(uaString) {
  const response = await fetch('https://api.apiverve.com/v1/useragentparser', {
    method: 'POST',
    headers: {
      'x-api-key': API_KEY,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ ua: uaString })
  });

  if (!response.ok) {
    throw new Error(`User agent parsing failed: ${response.status}`);
  }

  const { data } = await response.json();
  return {
    browser: data.browser.name,
    browserVersion: data.browser.version,
    os: data.os.name,
    osVersion: data.os.version,
    isMobile: data.isMobile,
    isBot: data.isBot
  };
}

The API handles the complexity of parsing the chaotic user agent format and returns clean, structured data.

For high-volume applications (logging every request), consider:

  • Caching parsed results by user agent string (there are millions of user agents, but you'll see the same ones repeatedly)
  • Parsing asynchronously / in batches rather than inline
  • Storing raw user agent strings and parsing later for analytics
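The first point — caching by the raw string — can be sketched in a few lines. Here `parseUserAgent` is a stand-in for whatever parser or API call you actually use:

```javascript
// Memoize parsing by the raw UA string: the same strings recur constantly,
// so repeated lookups skip the expensive parse entirely.
const uaCache = new Map();

function parseUserAgentCached(uaString, parseUserAgent) {
  if (uaCache.has(uaString)) return uaCache.get(uaString);
  const parsed = parseUserAgent(uaString);
  uaCache.set(uaString, parsed);
  return parsed;
}
```

In production you'd bound the cache (LRU eviction) so an attacker sending millions of unique strings can't exhaust memory.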

User Agent in Logging

Including user agent in your logs is valuable for debugging but requires thought:

Log it with errors. When something breaks, knowing the user's browser helps reproduce the issue.

Log it with significant events. Login events, purchases, sign-ups. If you need to investigate suspicious activity, user agent helps identify device patterns.

Don't log it with everything. Every request generates logs. Storing full user agent strings everywhere gets expensive.

Consider storage efficiency. User agent strings are often 200+ characters. For high-volume logging, consider storing a normalized version or hash with lookup table.

The Future

User agent strings are a historical accident that became infrastructure. They'll be with us for a long time, but their role is shifting.

Client Hints will gradually replace user agent for capability detection. Privacy requirements will reduce the specificity of user agent strings. But the basic concept — clients identifying themselves to servers — isn't going away.

If you're building user agent detection today, build it flexibly. Handle missing or reduced information gracefully. Don't depend on specific formatting that might change.

The user agent is a window into who's visiting your site. It's not a perfect window, and it's getting more frosted over time, but it's still valuable for analytics, debugging, and understanding your audience.


Parse user agents accurately with the User Agent Parser API. Detect crawlers and bots with the Bot Detector API. Understand who's visiting your site.


