loading...

Daily Challenge #89 - Extract domain name from URL

thepracticaldev profile image dev.to staff ・1 min read

Write a function that, when given a URL as a string, returns only the domain name as a string.

domainName(https://twitter.com/explore) == "twitter"
domainName(https://github.com/thepracticaldev/dev.to) == "github"
domainName(https://www.youtube.com) == "youtube"

Good luck!


Want to propose a challenge idea for a future post? Email yo+challenge@dev.to with your suggestions!

Discussion

pic
Editor guide
Collapse
zerquix18 profile image
I'm Luis! \^-^/

Javascript!

function domainName(domain) {
  const a = document.createElement('a');
  a.href = domain;
  const { hostname } = a;
  const hostSplit = hostname.split('.');
  hostSplit.pop();
  if (hostSplit.length > 1) {
    hostSplit.shift();
  }
  return hostSplit.join();
}

domainName('https://twitter.com/explore') == "twitter"
domainName('https://github.com/thepracticaldev/dev.to') == "github"
domainName('https://www.youtube.com') == "youtube"
Collapse
ashawe profile image
Harsh Saglani

Can you please explain what

const { hostname } = a;

does and how?

Collapse
ap13p profile image
Afief S

It has the same effect as:
const hostname = a.hostname

Collapse
midasxiv profile image
Midas/XIV

That's called destructuring assignment. "a" probably is an object which has the hostname property so that assignment extracts hostname.
I'm just writin about this 😅. Hopefully that'll help you.

Collapse
erezwanderman profile image
erezwanderman

Actually the domain name is the whole thing - "twitter.com", "github.com", "youtube.com". What do you want to get for something like "a.b.c.ac.il"?

Collapse
eyemyth profile image
Jay Thompson

Even "com" or whatever other TLD is a domain name. The most specific domain name in "www dot youtube dot com" is "www".

(edited because def dot to removed the www from my youtube URL.)

Collapse
savagepixie profile image
SavagePixie

JavaScript one-liner

const domainName = url => url.replace(/https?:\/\/(?:www\.)?/, "").split(".")[0]
Collapse
aminnairi profile image
Amin

Haskell

Note: Assuming the string will always start with either http:// or https:// or nothing but not supporting any other protocols (by lazyness).

Note2: This won't work for the URIs that have a dot in their name, still working on it.

Note3: Working all good but the end is a mess haha...

import Control.Arrow
import Data.List (isPrefixOf)

removeProtocol :: String -> String
removeProtocol "" = ""
removeProtocol string@(firstCharacter : rest)
    | isPrefixOf https string = removeProtocol (drop (length https) string)
    | isPrefixOf http string = removeProtocol (drop (length http) string)
    | otherwise = firstCharacter : removeProtocol rest
    where
        https = "https://"
        http = "http://"

countDots :: String -> Int
countDots =
    filter (== '.') >>> length

dropUntilWithDot :: String -> String
dropUntilWithDot =
    dropWhile (/= '.') >>> drop 1 

domainName :: String -> String
domainName url 
    | countDots url == 1 = takeWhile (/= '.') urlWithoutProtocol
    | otherwise = takeWhile (/= '.') $ iterate dropUntilWithDot url !! (countDots (takeWhile (/= '/') urlWithoutProtocol) - 1)
    where urlWithoutProtocol = removeProtocol url

main :: IO ()
main = do
    print $ domainName "domain.com"                                 -- domain
    print $ domainName "http://domain.com"                          -- domain
    print $ domainName "https://domain.com"                         -- domain
    print $ domainName "api.domain.com"                             -- domain
    print $ domainName "http://api.domain.com"                      -- domain
    print $ domainName "https://api.domain.com"                     -- domain
    print $ domainName "dev.api.domain.com"                         -- domain
    print $ domainName "http://dev.api.domain.com"                  -- domain
    print $ domainName "https://dev.api.domain.com"                 -- domain
    print $ domainName "https://dev.api.domain.com/something.cool"  -- domain

Try it online

Hosted on Repl.it.

Collapse
larisho profile image
Gab

Have you tried it using a .co.uk TLD? :D

Collapse
aminnairi profile image
Amin

No I didn't and I assume it will fail for this particular case. Do you have any way on improving this one?

Thread Thread
larisho profile image
Gab

Unfortunately, I think the only way to improve on it is to use the list of all TLDs to find how much of the end of the domain is TLD.

Collapse
erezwanderman profile image
erezwanderman

Javascript, using npm package "tldjs"

const { parse } = require("tldjs");

const regexify = str => {
  return str.replace(/[|\\{}()[\]^$+*?.]/g, "\\$&");
};

const getDomain = url => {
  const parseResult = parse(url);
  console.log(parseResult);
  if (parseResult.domain) {
    return parseResult.domain.replace(
      RegExp(regexify("." + parseResult.publicSuffix) + "$"),
      ""
    );
  } else {
    return parseResult.hostname;
  }
};

Try it: codesandbox.io/s/affectionate-dawn...

Collapse
zizzyzizzy profile image
ZizzyZizzy

Jesus, NO! This is what's wrong with development today. This is equivalent to killing a fly with a Sherman tank.

Please, please, for the love of God and companies everywhere that are sick of obfuscated, confusing unmanageable, unmaintainable and insecure code, please look at the one line pure Javascript code above this answer.

There is absolutely no reason on earth to include a library with hundreds of lines of code to perform a simple operation.

Collapse
erezwanderman profile image
erezwanderman

Your comment is what's wrong with development today. Taking the shortest solution that seems to somehow solve the vague requirements and declaring it solved and secure.
Each of the JS solutions will fail for one of these test URLs: 'a.b.c.ac.il/', 'news.com.au/', 'youtube.com'.
Except for mine.

Collapse
ashawe profile image
Harsh Saglani

O(N) approach:

#include<iostream>
#include <string>
using namespace std;
string domainName(string url)
{
    string domain = "";
    bool flag = true;
    for(int i = 0; i < url.size(); i++)
    {
        if(flag)
        {
            if(url[i] == '/')
            {
                flag = false;
                i+=2;
                if(url[i] == 'w')
                    i = i+3;
                else
                    i--;
            }
            continue;
        }
        else if(url[i] == '.')
            return domain;
        domain += url[i];
    }
    return domain;
}
int main()
{
    string url;
    cin >> url;
    cout << domainName(url) << endl;
    return 0;
}

Naive approach:

#include<iostream>
#include <string>
using namespace std;
string domainName(string url)
{
    int x = url.find("www");
    if(x==string::npos)
    {
        x = url.find("//");
        x+=2;
    }
    else
        x+=4;
    return url.substr(x,url.find(".com")-x);
}
int main()
{
    string url;
    cin >> url;
    cout << domainName(url) << endl;
    return 0;
}
Collapse
kvharish profile image
K.V.Harish

My solution in js

const domainName = (url) => {
  const match = url.match(/:\/\/(www[0-9]?\.)?(.[^/:]+)/i);
  return (match && match.length > 2 && typeof match[2] === 'string' && match[2].length) ? match[2].split('.')[0] : null;
}
Collapse
hexangel616 profile image
Emilie Gervais

JavaScript:

const domainName = url => {
    let hostName = new URL(url).hostname;
    let domain = hostName;
    if (hostName != null) {
        let str = hostName.split('.').reverse();
        if (str != null && str.length > 1) {
            domain = str[1] + '.' + str[0];
            if (hostName.indexOf(/[^/]+((?=\/)|$)/g) != -2 && str.length > 2) {
                domain = str[2] + '.' + domain;
            }
        }
    }
    return domain.split('.')[0];
}

considering subdomains & second-level domains (ie. '.bc.ca')

Collapse
mauamy profile image
mauamy

Python one-liner:

def domain_name(url):
    return url.split("/")[2].split(".")[-2]
Collapse
willsmart profile image
willsmart

One in JS

hostName is based on a quick reading of the spec and should cope with usernames and ports. Uniform_Resource_Identifier on wikipedia

salientSubdomain is the human-readable part of a domain name host as reqd. It just clips off www's and TLD's, with a little complication to handle non-matching strings cleanly.

const hostName = url => /^(?:[^:]+:\/\/(?:[^@\/?]+@)?([^:\/?]+))?/.exec(url)[1]
const salientSubdomain = url => /^(?:(?:www\.)?(.+?)(?:\.[a-zA-Z]+)?)?$/.exec(domainName(url)||'')[1]

const testUrls = [
"https://twitter.com/explore",
"https://github.com/thepracticaldev/dev.to",
"https://www.youtube.com",
"https://will:p4ssw0rd@google.com?q=cybersecurity",
"https://a.b.c.d:8080",
"mailto:mr@willsm.art",
"http://192.168.1.60:3000/home"
];

console.log(testUrls.map(hostName))
console.log(testUrls.map(salientSubdomain))

output:

[ "twitter.com", "github.com", "www.youtube.com", "google.com", "a.b.c.d", undefined, "192.168.1.60" ]
[ "twitter", "github", "youtube", "google", "a.b.c", undefined, "192.168.1.60" ]

The hostname extractor regex looks fairly funky but isn't too bad if you break it down into parts...
Regular expression visualization
(vis by Debuggex, which rocks)

Collapse
kjhank profile image
Krzysztof Hankiewicz

Quick & dirty ugly chain if you don't want to research regex:

const domainName = (domain) => domain.split('://')[1].split('/')[0].includes('www.') ? domain.split('://')[1].split('/')[0].split('www.')[1].split('.')[0] : domain.split('://')[1].split('/')[0].split('.')[0]
Collapse
ap13p profile image
Afief S

Oneline with javascript

const domainName = d => new URL(d).hostname.split(".").shift()
Collapse
saikiran profile image
Sai Kiran

No regex needed

 function domainName(url){
     return url.split("/")[2].split(".").slice(-2)[0]
 }
Collapse
akwetey profile image
Jonathan Akwetey

it got to be in my fav JavaScript!!

Alt text of image