DEV Community

Matt Kenefick


Regex: Fix duplicate slashes without affecting protocol

Let’s say you want to fix a URL that has duplicate slashes somewhere in its path.

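A plain string replace or a simple regex would also “fix” the double slash that follows the protocol (the `//` in `https://`). We can avoid that with a negative lookbehind, `(?<!:)`, which only lets a run of slashes match when it is not immediately preceded by a colon.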
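To see the difference, here is a quick comparison in JavaScript. The example URL is made up for illustration: it has duplicate slashes in the path as well as the legitimate `//` after the protocol.

```javascript
// Hypothetical example URL: duplicate slashes in the path, plus the
// legitimate "//" that follows the protocol.
const example = "https://example.com//blog///posts//regex";

// Naive approach: collapses every run of slashes, including the one after "https:".
const naive = example.replace(/\/+/g, "/");

// Negative lookbehind: a run of slashes is only matched when the character
// right before it is not ":", so "https://" survives.
const fixed = example.replace(/(?<!:)\/+/g, "/");

console.log(naive); // https:/example.com/blog/posts/regex
console.log(fixed); // https://example.com/blog/posts/regex
```

Strictly speaking, the lookbehind version still matches the second slash of `https://` (a run of one slash preceded by a slash), but replacing `/` with `/` leaves it unchanged.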


### For PHP:

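```php
$url = ''; // the URL to clean up
// (?<!:) is a negative lookbehind: a run of slashes is only replaced
// when it is not immediately preceded by a colon, as in "https://".
$str = preg_replace('#(?<!:)/+#im', '/', $url);
```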

### For JavaScript:

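```javascript
let url = ''; // the URL to clean up
// Same pattern as the PHP version; the slash is escaped because / delimits the regex literal.
url.replaceAll(/(?<!:)\/+/gm, '/');
// ""
```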


Top comments (2)

Josh Cheek

`replaceAll` is really new and, e.g., isn't available in my version of Node (14.16), but the normal `replace` works fine with the `/g` flag.
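A minimal sketch of that point, assuming a Node (or browser) runtime and a made-up example URL: `replace` with a `/g` regex replaces every match on any runtime, and `replaceAll`, where it exists, gives the same result. One extra detail worth knowing: `replaceAll` requires the `g` flag and throws a `TypeError` when given a non-global regex.

```javascript
const url = "https://example.com//a///b"; // hypothetical example URL
const pattern = /(?<!:)\/+/g;

// Works on runtimes without replaceAll (e.g. Node 14): the g flag makes replace global.
const viaReplace = url.replace(pattern, "/");
console.log(viaReplace); // https://example.com/a/b

// Only try replaceAll where the runtime actually provides it.
if (typeof String.prototype.replaceAll === "function") {
  console.log(url.replaceAll(pattern, "/") === viaReplace); // true

  // Without the g flag, replaceAll rejects the regex.
  try {
    url.replaceAll(/(?<!:)\/+/, "/");
  } catch (err) {
    console.log(err instanceof TypeError); // true
  }
}
```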

It's usually best to use libraries for things that have well-specified structures. Libraries actually parse the string according to the spec and make sure everything is valid. I haven't done this in JS before, but from the docs it seems like it should be this (note that I don't have a Windows machine to test it on; I assume `path.posix` should handle that, but I haven't verified it):

```javascript
const path = require("path")

// Parse the URL, normalize only its pathname, then serialize it again.
function normalizePath(urlString) {
  const url = new URL(urlString)
  url.pathname = path.posix.normalize(url.pathname)
  return url.toString()
}
```

Here, I've given it a pretty wonky-looking path, but that path is valid. Yeah, you can apparently have colons in the path 🤷 (see `pchar` in RFC 3986).
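A self-contained sketch comparing the two approaches on a made-up path that contains colons. `normalizePath` is copied from the comment above; `regexFix` is a hypothetical wrapper around the post's regex. `path.posix` is Node's POSIX implementation of the `path` methods on every platform, so the separator is `/` regardless of the host OS. The regex version leaves `c:///` as `c://`, because the lookbehind refuses to touch a run of slashes that starts right after a colon, while the parser-based version normalizes the whole path.

```javascript
const path = require("path");

// Copy of normalizePath from the comment above: parse the URL, normalize
// only its pathname, then serialize it again.
function normalizePath(urlString) {
  const url = new URL(urlString);
  url.pathname = path.posix.normalize(url.pathname);
  return url.toString();
}

// Hypothetical wrapper around the regex from the post.
function regexFix(urlString) {
  return urlString.replace(/(?<!:)\/+/g, "/");
}

// Made-up URL: colons are legal inside path segments.
const wonky = "https://example.com//a:b//c:///d";

console.log(regexFix(wonky));      // https://example.com/a:b/c://d
console.log(normalizePath(wonky)); // https://example.com/a:b/c:/d
```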

All that said, you actually can parse a URI with a regex, but it's a bit of a chore. E.g., Ruby's standard library does it:

```shell
$ ruby -r uri -e 'p URI::RFC3986_Parser::RFC3986_URI' | wc -c

$ ruby -r uri -e 'p URI::RFC3986_Parser::RFC3986_URI'
```
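Ruby's `RFC3986_URI` is a full, validating pattern. For the narrower job in this post, a much shorter regex can be enough: RFC 3986, Appendix B gives one that splits a URI reference into scheme, authority, path, query and fragment without validating them. The sketch below assumes you only want to collapse slashes in the path component; `collapsePathSlashes` is a hypothetical helper name and the input URL is made up. Unlike the lookbehind, this leaves the query and fragment untouched and does collapse slashes that follow a colon inside the path.

```javascript
// Regular expression from RFC 3986, Appendix B: it splits a URI reference
// into its components but does not check that they are valid.
const URI_PARTS = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

// Hypothetical helper: collapse runs of slashes in the path component only,
// leaving scheme, authority, query and fragment exactly as they were.
function collapsePathSlashes(uri) {
  const m = URI_PARTS.exec(uri); // always matches: every part can match the empty string
  const scheme = m[1] ?? "";     // e.g. "https:"
  const authority = m[3] ?? "";  // e.g. "//example.com"
  const path = m[5].replace(/\/+/g, "/");
  const query = m[6] ?? "";      // includes the leading "?"
  const fragment = m[8] ?? "";   // includes the leading "#"
  return scheme + authority + path + query + fragment;
}

const input = "https://example.com//a:b//c:///d?next=https://x.org//y#//top";
console.log(collapsePathSlashes(input));
// https://example.com/a:b/c:/d?next=https://x.org//y#//top
```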
grahamthedev

Simple but effective! Don't you need it to be `url = url.replaceAll` for it to work in JS, though?
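On that question: JavaScript strings are immutable, so `replace` and `replaceAll` return a new string and leave the original alone. A minimal sketch with a made-up URL; it uses `replace` with `/g` so it also runs on Node 14, and `replaceAll` behaves the same way with respect to its return value.

```javascript
let url = "https://example.com//a//b"; // hypothetical example URL

// The return value is discarded here, so url itself is unchanged.
url.replace(/(?<!:)\/+/g, "/");
console.log(url); // https://example.com//a//b

// Assign the result back to keep the cleaned-up string.
url = url.replace(/(?<!:)\/+/g, "/");
console.log(url); // https://example.com/a/b
```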