By the way, I would use the regexp /(\w\S*)+/g. It allows for apostrophes and dashes, and doesn't break a URL into separate words.
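As a rough sketch of how that regexp could be used (the function name countWords is just an illustrative choice, not something from the question), counting the matches gives the word count:

```javascript
// Minimal sketch: count "words" using the pattern suggested above.
// countWords is a hypothetical helper name chosen for this example.
function countWords(text) {
  // Each match starts at a word character (\w) and runs to the next whitespace.
  const matches = text.match(/(\w\S*)+/g);
  // String.prototype.match returns null when nothing matches.
  return matches ? matches.length : 0;
}

const sample = "It's a load-bearing wall, see https://example.com/a-b?c=d for details.";
console.log(countWords(sample)); // 8
console.log(countWords(""));     // 0
```

Note that trailing punctuation stays attached to a token ("wall," is one match), which is fine for counting. A free-standing dash is not counted because it contains no word character. Since \S* already consumes the rest of each token, the outer group and + should not change what gets matched, so /\w\S*/g ought to behave the same.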

Whatever regex you choose, the result is only an approximate word count.

For instance, take "it's": do we count this as 1 or 2 words? It is written as a single word, but technically it is 2, since it is shorthand for "it is". Apostrophes can also mark possession, as in "Bob's cat": do we count "Bob's" as one word? The same character can mean different things in different contexts.

Hyphens can be used to make compound words such as vice-president, twenty-one, rock-hard, and load-bearing. Should all of these be counted the same way (as 1 or 2 words)? Some compound words, such as headache, have no hyphen at all.
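To make the ambiguity concrete, here is a small comparison of two patterns on the examples above. It is only an illustration: countWith is a hypothetical helper, and /\w+/g is included as an assumed alternative that splits on apostrophes and hyphens.

```javascript
// Sketch: the same text yields different counts depending on the pattern.
// countWith is a hypothetical helper name used only for this comparison.
function countWith(pattern, text) {
  const matches = text.match(pattern);
  return matches ? matches.length : 0;
}

const samples = ["it's", "Bob's cat", "vice-president", "twenty-one", "headache"];

for (const sample of samples) {
  const keepJoined = countWith(/(\w\S*)+/g, sample); // apostrophes and hyphens stay inside a word
  const splitApart = countWith(/\w+/g, sample);      // apostrophes and hyphens separate words
  console.log(sample, keepJoined, splitApart);
}
// it's 1 2
// Bob's cat 2 3
// vice-president 1 2
// twenty-one 1 2
// headache 1 1
```

Neither column is "right": each pattern simply applies one fixed rule to every apostrophe and hyphen, regardless of what it means in context.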

Whichever regex you use, a fully consistent interpretation is probably not possible. You would need natural language processing to make the count more accurate and consistent.

