
Daniel Rotter

Posted on • Originally published at danielrotter.at

Finding all HTML tags in a project that are not self-closed

I am currently upgrading an existing Vue project from version 2 to 3, which involves
quite a few breaking changes. I don't want to go into the details,
but at one point it was useful to find all usages of a certain Vue component that were not self-closed. In this
specific case, it was a base-input component. The following cases were of interest to me:

<base-input value="Some text"></base-input>
<base-input disabled>Some text in a slot</base-input>

However, the following were not:

<base-input value="Some text" />
<base-input disabled />

There were quite a few occurrences of this component in the project, so just searching for base-input was
not going to cut it for me. Instead, I decided to use a regular expression (regex) with
ripgrep. Once installed, ripgrep provides the rg command line tool.

The following solution worked for my use case:

rg --multiline '<base-input[^>]*[^/]>'

Let's break it down:

  1. The --multiline flag makes sure that the pattern is also matched across multiple lines, i.e. the match can contain line breaks.
  2. The <base-input is searched for literally, i.e. as this exact character sequence.
  3. [^>]* matches an arbitrary number (including zero, that's what * stands for) of characters other than >.
  4. After that, [^/] requires one character that is not a /. A / in this position would indicate a self-closing tag.
  5. Finally, the > finishes the tag.
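
The breakdown above can be checked with a small script. This is only a sketch: it uses Python's re module as a stand-in for ripgrep's regex engine, and the helper name is_not_self_closed as well as the multi-line sample are my own additions for illustration. In Python a negated class such as [^>] also matches line breaks, which roughly mirrors what --multiline enables in rg.

```python
import re

# Same pattern as in the rg command above (Python's `re` is an assumption
# here, used instead of ripgrep so the check can run anywhere).
PATTERN = re.compile(r"<base-input[^>]*[^/]>")


def is_not_self_closed(snippet: str) -> bool:
    """Return True if the snippet contains a base-input tag the pattern finds."""
    return PATTERN.search(snippet) is not None


# Snippet -> whether the pattern is expected to match it.
samples = {
    '<base-input value="Some text"></base-input>': True,
    "<base-input disabled>Some text in a slot</base-input>": True,
    '<base-input value="Some text" />': False,
    "<base-input disabled />": False,
    # Hypothetical sample: opening tag spread over several lines.
    '<base-input\n  value="Some text"\n></base-input>': True,
}

for snippet, expected in samples.items():
    assert is_not_self_closed(snippet) == expected
    print(f"{'match   ' if expected else 'no match'}  {snippet!r}")
```

The self-closed variants are skipped because the only > in them is directly preceded by /, which [^/] refuses to match.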

Although this works for the above examples, it is not a universal solution to the problem. For instance, it does not
handle the following cases correctly:

<base-input></base-input>
<base-input value="Some > text" />

The first line will not be matched, because [^/] must consume at least one character between the <base-input
literal and the closing >. Fortunately, that was not a problem for me, since I knew that using that component without
attributes does not make any sense, so I could ignore that case.

The second line is matched although it shouldn't be, because the > within the quotes is taken as the end of the tag.
This results in a false positive, but that was also fine for me since this rarely occurred in the code
base.
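
Both limitations can be reproduced in a few lines. Again, this is a sketch that uses Python's re module instead of ripgrep. The RELAXED pattern is a hypothetical variation that is not part of the solution above: it makes the attribute part optional so that a tag without attributes is found as well, but it does nothing about the quoted > case.

```python
import re

# Pattern from the rg command above.
ORIGINAL = re.compile(r"<base-input[^>]*[^/]>")
# Hypothetical variation (an assumption, not the author's solution):
# the whole "attributes ending in a non-slash" part becomes optional.
RELAXED = re.compile(r"<base-input(?:[^>]*[^/])?>")

no_attributes = "<base-input></base-input>"
quoted_gt = '<base-input value="Some > text" />'

# Limitation 1: a tag without attributes is missed by the original pattern.
assert ORIGINAL.search(no_attributes) is None
assert RELAXED.search(no_attributes).group(0) == "<base-input>"

# Limitation 2: the > inside the quotes is taken as the end of the tag,
# so the self-closed tag is reported as a false positive.
false_positive = ORIGINAL.search(quoted_gt)
assert false_positive.group(0) == '<base-input value="Some >'
assert RELAXED.search(quoted_gt) is not None  # the variation does not fix this

# The variation still skips ordinary self-closed tags.
assert RELAXED.search("<base-input disabled />") is None
assert RELAXED.search("<base-input />") is None
```

Handling the quoted > properly would mean teaching the pattern about attribute quoting, at which point a real HTML or template parser is probably the better tool.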

Unfortunately, it is not possible to write a full HTML parser using regular expressions alone. So many people
ask about this on Stack Overflow that they've decided to make it part of their regular expressions FAQ. But that
should not stop you from using regular expressions for quick one-off tasks, such as finding some occurrences in a big
code base, as long as you know the limitations and how they might affect the results.
