var plainText = content.replace(/<[^>]*>/g, '');
Top comments (9)
why so complicated, use textContent
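A minimal sketch of the textContent approach, assuming a browser environment where DOMParser is available. The helper name htmlToPlainText is hypothetical and does not come from the post:

```javascript
// Hypothetical helper; assumes a browser (DOMParser is not a Node.js global).
function htmlToPlainText(html) {
  // Let the browser's HTML parser build a detached document from the string.
  const doc = new DOMParser().parseFromString(html, 'text/html');
  // textContent concatenates the text nodes, so entities like &amp; come back decoded.
  return doc.body.textContent || '';
}

// Only run the demo where DOMParser actually exists.
if (typeof DOMParser !== 'undefined') {
  console.log(htmlToPlainText('<p>Fish &amp; chips</p>')); // Fish & chips
}
```

Because a real parser does the work, attribute values that contain ">" and encoded entities are handled without special cases.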
To be honest it is all about this line
.replace(/<[^>]*>/g, '');
It's really not a good idea to parse HTML with regex...
Look at the following sample:
HTML entities cause problems...
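To illustrate the entity problem, here is a small sketch with a made-up input string (not the commenter's own sample). The regex removes the tags but leaves the entities encoded:

```javascript
// stripTags is a hypothetical name wrapping the regex from the post.
const stripTags = (html) => html.replace(/<[^>]*>/g, '');

// Illustrative input: "&amp;" and "&lt;" are HTML entities for "&" and "<".
const entityResult = stripTags('<p>Fish &amp; chips &lt;3</p>');

console.log(entityResult); // Fish &amp; chips &lt;3
```

The tags are gone, but a reader would expect "Fish & chips <3", so a separate decoding step would still be needed.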
Don't parse HTML with Regex. Golden Rule of Regex on the web. HTML is a context-free language, not a regular one. As such, a finite automaton, which is what a regular expression compiles to, isn't going to suffice for all its intricacies.
Extended Regex mitigates some of those issues, but in JS in particular, .textContent is the superior choice.
Doesn't work. For example:
'<a href="abc>xyz" />'.replace(/<[^>]*>/g, '');
-> 'xyz" />'
Don't use regex to parse HTML. It may work in most cases, but you'll be caught flat-footed by corner cases.
The topic is more complex than expected. There are leading spaces, comments, etc.
Text content alone is not enough.
The following example also shows the regex problem.
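As an illustration of the points about comments and leading spaces, here is a sketch with a hypothetical input (the commenter's original sample is not reproduced here). A comment that contains ">" ends the match too early:

```javascript
// Illustrative input: an HTML comment containing ">" plus leading spaces in the text.
const commentInput = '<!-- if a > b --><p>  Hello</p>';

// Same regex as in the post.
const commentResult = commentInput.replace(/<[^>]*>/g, '');

// The first match stops at the ">" inside the comment, so the comment's tail leaks out.
console.log(JSON.stringify(commentResult)); // " b -->  Hello"
```

The leftover " b -->" and the untrimmed leading spaces show that the result needs more than tag stripping to be usable as plain text.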
Nice, never thought about that one :)
The other day I was following a React tutorial and I found that you could do something like this in React (in case you're interested):
"dangerouslySetInnerHTML" is a React prop, and you can use it when you're receiving text that contains HTML, like the following:
(To be fair, web crawling and the like may in fact require parsing HTML like that, but at that point actually parsing it is the preferable solution.)
But what about when I need to remove it anyway?