DEV Community

Vincent Ge


You're parsing URLs wrong.

There are some things you should never build yourself. Not because they're difficult, but because they're time-consuming and full of gotchas.

One of these things is URL parsing.

Try implementing your own URL parsing

Raise your hand if you've done this before ✋

const baseUrl = "https://example.com";
const endpoint = "api";
const resourceId = "123";

const fullUrl = `${baseUrl}/${endpoint}/${resourceId}`;
console.log(fullUrl); // Output: https://example.com/api/123

But you're not a barbarian, so maybe you abstract this into a function:

function joinUrlParts(...parts) {
    return parts.map(part => part.trim()).join('/');
}

// Example usage:
const baseUrl = "https://example.com";
const endpoint = "api";
const resourceId = "123";

const fullUrl = joinUrlParts(baseUrl, endpoint, resourceId);
console.log(fullUrl); // Output: https://example.com/api/123

Well, that's certainly better, but it breaks with a single stray slash:

// Example usage that breaks:
const baseUrl = "https://example.com";
const endpoint = "api/";
const resourceId = "123";

const fullUrl = joinUrlParts(baseUrl, endpoint, resourceId);
console.log(fullUrl); // Output: https://example.com/api//123

So maybe you try again and sanitize your input:

function joinUrlParts(...parts) {
    // Trim leading and trailing slashes from each part
    const sanitizedParts = parts.map(part => part.trim().replace(/^\/+|\/+$/g, ''));
    // Join the sanitized parts with slashes
    return sanitizedParts.join('/');
}

This looks bulletproof, right? There can't be that many edge cases!

Well, yes, but actually no.


You still have to support some interesting use cases:

  • What about joining https://example.com/cats and ../dogs/corgie?
  • This doesn't understand fragments, like https://example.com/dogs/corgie#origins.
  • This doesn't percent-encode characters like é, which has to become %C3%A9 in http://www.example.com/d%C3%A9monstration.html.
  • This doesn't handle query parameters, like https://some.site/?id=123.
  • This doesn't let you parse URLs at all, only build them.

The list goes on. You CAN implement what
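For comparison, here is a rough sketch of how the built-in URL constructor copes with each case from the list above. It uses only the standard URL API (the optional base argument, pathname, hash, and searchParams); the variable names are just for illustration.

```javascript
// Resolve a relative path against a base URL, the same way a browser resolves links.
const resolved = new URL('../dogs/corgie', 'https://example.com/cats');
console.log(resolved.href); // https://example.com/dogs/corgie

// Fragments are parsed out instead of being treated as part of the path.
const withHash = new URL('https://example.com/dogs/corgie#origins');
console.log(withHash.pathname); // /dogs/corgie
console.log(withHash.hash); // #origins

// Non-ASCII characters in the path are percent-encoded for you.
const encoded = new URL('http://www.example.com/démonstration.html');
console.log(encoded.pathname); // /d%C3%A9monstration.html

// Query parameters are exposed through a URLSearchParams object.
const withQuery = new URL('https://some.site/?id=123');
console.log(withQuery.searchParams.get('id')); // 123
```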

But why does this matter?

If you write modern JavaScript, you may already know about the URL object from the Web APIs.

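But JavaScript didn't always have a good built-in way to construct and parse URLs. The URL object isn't actually part of the ECMAScript specification; it's defined by the WHATWG URL Standard, and browsers and Node.js adopted it gradually over the 2010s.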

There are still lots of older videos and blogs that parse URLs with all kinds of fancy magic, like "JavaScript - How to Get URL Parameters - Parse URL Variables", and some fun workarounds, like "The lazy man's URL parsing".
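Most of those tutorials exist to pull query parameters out of a URL string. With the URL API this becomes a lookup on searchParams. A short sketch; the tag parameter is made up for illustration:

```javascript
const url = new URL('https://some.site/?id=123&tag=a&tag=b');

// get() returns the first value for a name, or null if the name is absent.
console.log(url.searchParams.get('id')); // 123
console.log(url.searchParams.get('missing')); // null

// getAll() returns every value of a repeated parameter.
console.log(url.searchParams.getAll('tag')); // ['a', 'b']

// Object.fromEntries() gives a plain object, but for repeated names the last value wins.
const params = Object.fromEntries(url.searchParams);
console.log(params); // { id: '123', tag: 'b' }
```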

When I wrote my first few JavaScript projects in 2016, I parsed and built URLs with all kinds of giant loops and regexes. The solutions were hard to read at best, and extremely buggy at worst.

Then, in later projects, I switched to Node.js's path.join() method. Until this week, I assumed that was still the way to go. I even tried importing browserify's path implementation into the browser.
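One reason path.join() is a poor fit for URLs: it's built for file system paths, so it normalizes repeated slashes, including the two in https://. A minimal demonstration, assuming Node.js with CommonJS require(), and using path.posix so the result doesn't depend on the operating system:

```javascript
const path = require('path');

// path.posix.join() normalizes the joined string, collapsing "//" into "/".
const joined = path.posix.join('https://example.com', 'api', '123');
console.log(joined); // https:/example.com/api/123 (the scheme is now broken)

// The URL constructor understands the scheme and host, so nothing gets collapsed.
const viaUrl = new URL('api/123', 'https://example.com');
console.log(viaUrl.href); // https://example.com/api/123
```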

The modern JS way to handle URLs

If you've been living under a rock like me, here's a quick overview of the URL object from the Web APIs.

// Create a new URL object
const url = new URL('https://example.com/path?param1=value1&param2=value2#section');

// Parse the URL
console.log('Host:', url.host); // Host: example.com
console.log('Path:', url.pathname); // Path: /path
console.log('Search Params:', url.searchParams.toString()); // Search Params: param1=value1&param2=value2
console.log('Hash:', url.hash); // Hash: #section

// Update parts of the URL
url.protocol = 'https';
url.host = 'new.example.com';
url.searchParams.set('param3', 'value3');
url.hash = '#updated-section';

// Recreate the URL
const rebuiltUrl = new URL(url.href);

// Update the URL object directly
rebuiltUrl.searchParams.set('param2', 'updatedValue');

// Print the updated URL
console.log('Rebuilt URL:', rebuiltUrl.href); // Rebuilt URL: https://new.example.com/path?param1=value1&param2=updatedValue&param3=value3#updated-section
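The overview above doesn't cover the original problem from the top of this post: joining a base URL with path segments. A sketch of how that might look with the constructor's second (base) argument follows. The one gotcha is that resolution follows the same rules as relative links in a web page, so whether the base ends in a slash changes the result.

```javascript
// The trailing slash on the base marks "api" as a directory, so "123" is appended.
const withSlash = new URL('123', 'https://example.com/api/');
console.log(withSlash.href); // https://example.com/api/123

// Without it, "123" replaces the last segment, just like a relative link in HTML.
const withoutSlash = new URL('123', 'https://example.com/api');
console.log(withoutSlash.href); // https://example.com/123

// A leading slash on the relative part resolves from the host's root instead.
const rootRelative = new URL('/123', 'https://example.com/api/');
console.log(rootRelative.href); // https://example.com/123

// Add query parameters through searchParams so they are escaped for you.
withSlash.searchParams.set('q', 'cats & dogs');
console.log(withSlash.href); // https://example.com/api/123?q=cats+%26+dogs
```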

Browser compatibility

All reasonably recent browsers support the URL API. Some newer methods, such as the static URL.canParse(), may not be available everywhere, but the basic usage is consistent. Find the full browser compatibility table here.

If you must support ancient browsers, the core-js project provides a polyfill.
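If a newer method like URL.canParse() is the only gap, one option short of a polyfill is to feature-detect it and fall back to the constructor, which throws a TypeError on invalid input. A minimal sketch; isValidUrl is a hypothetical helper name, not part of the API:

```javascript
// Hypothetical helper: true if `input` parses as a URL, optionally relative to `base`.
function isValidUrl(input, base) {
  // Prefer the static method where the runtime provides it.
  if (typeof URL.canParse === 'function') {
    return URL.canParse(input, base);
  }
  // Older runtimes: the constructor throws a TypeError for invalid input.
  try {
    new URL(input, base);
    return true;
  } catch {
    return false;
  }
}

console.log(isValidUrl('https://example.com')); // true
console.log(isValidUrl('not a url')); // false
console.log(isValidUrl('/api/123', 'https://example.com')); // true
```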

Bottom line

Please use native Web APIs, and avoid building one-off utility classes for common tasks. JavaScript is constantly changing: I wrote my first lines of JS in 2016, and I still catch myself leaning on old, outdated information.

If you have a moment, browse through MDN's Web API docs. I guarantee you'll find something that solves a problem you once had to build your own solution for.

A fun aside

I still hate how lacking the JavaScript Date object's interface is.

Try finding the date 5 days in the past, or checking whether two events happened 3 days apart.
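For the record, here is roughly what those two tasks take with the built-in Date object. This sketch works in UTC to sidestep daylight-saving shifts, and the dates are made up:

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// "5 days in the past": copy the date, then move the day of the month back.
// setUTCDate() rolls over month and year boundaries automatically.
const now = new Date(Date.UTC(2024, 2, 3)); // 2024-03-03 (months are 0-based)
const fiveDaysAgo = new Date(now);
fiveDaysAgo.setUTCDate(fiveDaysAgo.getUTCDate() - 5);
console.log(fiveDaysAgo.toISOString()); // 2024-02-27T00:00:00.000Z

// "3 days apart": subtracting two Dates yields a difference in milliseconds.
const eventA = new Date(Date.UTC(2024, 0, 10, 9, 0));
const eventB = new Date(Date.UTC(2024, 0, 13, 9, 0));
const daysApart = Math.abs(eventB - eventA) / MS_PER_DAY;
console.log(daysApart === 3); // true
```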

The fact that libraries like moment.js or day.js still need to exist in 2024 bothers me a lot.

More fun stuff

Come chat with me in these places:
