DEV Community

Discussion on: Daily Challenge #89 - Extract domain name from URL

Collapse
 
willsmart profile image
willsmart • Edited

One in JS

hostName is based on a quick reading of the spec and should cope with usernames and ports. Uniform_Resource_Identifier on wikipedia

salientSubdomain is the human-readable part of a domain name host as reqd. It just clips off www's and TLD's, with a little complication to handle non-matching strings cleanly.

const hostName = url => /^(?:[^:]+:\/\/(?:[^@\/?]+@)?([^:\/?]+))?/.exec(url)[1]
const salientSubdomain = url => /^(?:(?:www\.)?(.+?)(?:\.[a-zA-Z]+)?)?$/.exec(domainName(url)||'')[1]

const testUrls = [
"https://twitter.com/explore",
"https://github.com/thepracticaldev/dev.to",
"https://www.youtube.com",
"https://will:p4ssw0rd@google.com?q=cybersecurity",
"https://a.b.c.d:8080",
"mailto:mr@willsm.art",
"http://192.168.1.60:3000/home"
];

console.log(testUrls.map(hostName))
console.log(testUrls.map(salientSubdomain))

output:

[ "twitter.com", "github.com", "www.youtube.com", "google.com", "a.b.c.d", undefined, "192.168.1.60" ]
[ "twitter", "github", "youtube", "google", "a.b.c", undefined, "192.168.1.60" ]

The hostname extractor regex looks fairly funky but isn't too bad if you break it down into parts...
Regular expression visualization
(vis by Debuggex, which rocks)