DEV Community

Surjeet Bhadauriya


One-liner: remove HTML tags from a string

var plainText = content.replace(/<[^>]*>/g, '');

Refer: https://stackoverflow.com/questions/51195143/is-there-a-way-to-remove-html-tags-from-a-string-in-javascript/51195294#51195294
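As a quick sanity check, the one-liner can be wrapped in a small function and run on simple, well-formed markup. The name `stripTags` and the sample string are made up for illustration; this is a minimal sketch, not a general-purpose HTML sanitizer.

```javascript
// Hypothetical helper wrapping the one-liner from the post.
function stripTags(content) {
  // Remove every run that starts with "<" and ends at the next ">".
  return content.replace(/<[^>]*>/g, '');
}

const sample = '<p>Hello <b>World</b>!</p>';
console.log(stripTags(sample)); // Hello World!
```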

Top comments (9)

Frank Wisniewski • Edited

Why so complicated? Use textContent.

<!DOCTYPE html>
<html lang=de>
  <meta charset=UTF-8>
  <title>delete tags</title>
  <div id="myContainer">
    <h1>myHeader</h1>
    <p>Lorem ipsum dolor sit amet, consectetur adipisicing elit. Incidunt, vitae.</p>
    <p>Lorem ipsum dolor sit amet, consectetur adipisicing elit. Incidunt, vitae.</p>
  </div>
  <script>
  "use strict";
  let myContainerText = myContainer.textContent,
      plainText = myContainer.innerHTML.replace(/<[^>]*>/g, '');
  console.log(plainText===myContainerText) // true
  </script>
 
Surjeet Bhadauriya

To be honest, it is all about this line:

.replace(/<[^>]*>/g, '');

Frank Wisniewski • Edited

It's really not a good idea to parse HTML with regex...

Look at the following sample:

<!DOCTYPE html>
<html lang=de>
  <meta charset=UTF-8>
  <title>delete tags</title>
    <div id="c">
      <p>This is a &lt;H1&gt; tag</p>
    </div>
  <script>
  "use strict";
  const extractPlainText = str => 
    new DOMParser()
      .parseFromString(str, "text/html")
      .documentElement.textContent
  console.log(
    extractPlainText(c.innerHTML)
  )
  // This is a <H1> tag
  console.log(
    c.innerHTML.replace(/<[^>]*>/g, '')
  )
  //     This is a &lt;H1&gt; tag
  </script>

HTML entities cause problems...
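The entity problem can also be reproduced without a DOM, for example in Node.js. In the sketch below, `stripTags` and `decodeBasicEntities` are hypothetical helpers (the decoder only knows three entities), used to show that neither order of "strip" and "decode" is safe with plain string replacement.

```javascript
// Hypothetical helper: the regex one-liner under discussion.
const stripTags = html => html.replace(/<[^>]*>/g, '');

// Hypothetical decoder covering only &lt;, &gt; and &amp; (decoded last
// so that "&amp;lt;" is not decoded twice).
const decodeBasicEntities = text =>
  text.replace(/&lt;/g, '<').replace(/&gt;/g, '>').replace(/&amp;/g, '&');

const input = '<p>This is a &lt;H1&gt; tag</p>';

// Stripping alone leaves the entities encoded.
console.log(stripTags(input)); // This is a &lt;H1&gt; tag

// Stripping first, then decoding, gives the intended text.
console.log(decodeBasicEntities(stripTags(input))); // This is a <H1> tag

// Decoding first turns the entity into something the regex treats as a tag.
console.log(JSON.stringify(stripTags(decodeBasicEntities(input)))); // "This is a  tag"
```

A real parser avoids the ordering question, because it decodes entities only in text nodes.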

FJones

Don't parse HTML with regex. That is the golden rule of regex on the web. HTML is a context-free language, not a regular one, so a finite automaton cannot handle all of its intricacies.
Extended regex features mitigate some of those issues, but in JS in particular, .textContent is the superior choice.

Randall

Doesn't work. For example:

'<a href="abc>xyz" />'.replace(/<[^>]*>/g, ''); // -> 'xyz" />'

Don't use regex to parse HTML. It may work in most cases, but you'll be caught flat-footed by corner cases.
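To illustrate how quickly the corner cases pile up, here is a rough sketch of a character scanner that tracks quoted attribute values, so a `>` inside quotes does not end the tag. The function name `stripTagsQuoteAware` is hypothetical, and this is not a real HTML parser: it still treats every `<` as the start of a tag and knows nothing about comments, `<script>` or `<style>` content, or entities.

```javascript
// Hypothetical quote-aware tag stripper: a sketch, not a full HTML parser.
function stripTagsQuoteAware(html) {
  let out = '';
  let inTag = false; // true while between "<" and the closing ">"
  let quote = null;  // the active quote character inside a tag, if any

  for (const ch of html) {
    if (!inTag) {
      if (ch === '<') inTag = true; // assume every "<" opens a tag
      else out += ch;               // plain text is copied through
    } else if (quote) {
      if (ch === quote) quote = null; // end of the quoted attribute value
    } else if (ch === '"' || ch === "'") {
      quote = ch;                     // start of a quoted attribute value
    } else if (ch === '>') {
      inTag = false;                  // unquoted ">" really ends the tag
    }
  }
  return out;
}

console.log(JSON.stringify(stripTagsQuoteAware('<a href="abc>xyz" />'))); // ""
console.log(stripTagsQuoteAware('<p>Hello <b>World</b>!</p>'));           // Hello World!
```

Each additional case (comments, raw-text elements, malformed markup) needs more state, which is why the thread recommends an actual parser.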

Frank Wisniewski • Edited

The topic is more complex than expected: there is leading whitespace, there are comments, and so on.
textContent alone is not enough.
The following example also shows the regex problem.

<!DOCTYPE html>
<html lang=de>
  <meta charset=UTF-8>
  <title>delete tags</title>
    <div id="c">
      <p>vote for</p>
      <!-- >Trump not for-->
      <p>Biden</p>
    </div>
  <script>
  "use strict";
  const extractPlainText = str => 
    new DOMParser()
      .parseFromString( str, "text/html" )
      .documentElement.textContent
      .split( '\n' )
      .map( el => el.trim() )
      .filter( x => x.length > 0 )
      .join( '\n' )

  console.log(
    extractPlainText(c.innerHTML)
  )
// vote for
// Biden

  console.log(
    c.innerHTML.replace(/<[^>]*>/g, '')
  )
//     vote for 
//     Trump not for-->
//     Biden
  </script>
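The comment problem can be reproduced outside the browser as well. The sketch below uses two hypothetical helpers, `stripTags` and `stripComments`, to show the regex cutting the comment at its first `>`, and one possible patch: removing comments before tags. It is still regex-based, so the other limitations raised in this thread apply.

```javascript
// Hypothetical helper: the regex one-liner under discussion.
const stripTags = html => html.replace(/<[^>]*>/g, '');

// Hypothetical helper: remove <!-- ... --> comments (non-greedy, may span lines).
const stripComments = html => html.replace(/<!--[\s\S]*?-->/g, '');

const input = '<p>vote for</p>\n<!-- >Trump not for-->\n<p>Biden</p>';

// The tag regex stops at the ">" inside the comment and leaks the rest.
console.log(JSON.stringify(stripTags(input)));
// "vote for\nTrump not for-->\nBiden"

// Removing comments first avoids this particular leak.
console.log(JSON.stringify(stripTags(stripComments(input))));
// "vote for\n\nBiden"
```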
 
NunoA21 • Edited

Nice, never thought about that one :)
The other day I was following a React tutorial and found that you can do something like this in React (in case you're interested):

<p dangerouslySetInnerHTML={{ __html: details.instructions }} />

"dangerouslySetInnerHTML" is a React prop that you can use when you receive text containing HTML markup, like the following:

{
    "text": "<p> Hello <b>World</b>!</p>"
}
 
FJones

(To be fair, web crawling and the like may in fact require parsing HTML like that, but at that point actually parsing it is the preferable solution.)

Damian Żygadło

But what if I need to remove it anyway?