DEV Community

Waylon Walker
Waylon Walker

Posted on โ€ข Originally published at waylonwalker.com

2 1

Find all Headings with BeautifulSoup

BeautifulSoup is a DOM like library for python. It's quite useful to manipulate html. Here is an example to find_all html headings. I stole the regex from stack overflow, but who doesn't.

Make an example

sample.html

Lets make a sample.html file with the following contents. It mainly has some headings, <h1> and <h2> tags that I want to be able to find.

<!DOCTYPE html>
<html lang="en">
  <body>
    <h1>hello</h1>
    <p>this is a paragraph</p>
    <h2>second heading</h2>
    <p>this is also a paragraph</p>
    <h2>third heading</h2>
    <p>this is the last paragraph</p>

  </body>
</html>
Enter fullscreen mode Exit fullscreen mode

Get the headings with BeautifulSoup

Lets import our packages, read in our sample.html using pathlib and find all headings using BeautifulSoup.

from bs4 import BeautifulSoup from pathlib import Path

soup = BeautifulSoup(Path('sample.html').read_text(), features="lxml") headings = soup.find_all(re.compile("^h[1-6]$"))
Enter fullscreen mode Exit fullscreen mode

And what we get is a list of bs4.element.Tag's.

>> print(headings)
[<h1>hello</h1>, <h2>second heading</h2>, <h2>third heading</h2>]
Enter fullscreen mode Exit fullscreen mode

I recently added a heading_link plugin to markata, you might notice the
๐Ÿ”—'s next to each heading on this page, that is powered by this exact
technique.

Hostinger image

Get n8n VPS hosting 3x cheaper than a cloud solution

Get fast, easy, secure n8n VPS hosting from $4.99/mo at Hostinger. Automate any workflow using a pre-installed n8n application and no-code customization.

Start now

Top comments (0)

Sentry image

See why 4M developers consider Sentry, โ€œnot bad.โ€

Fixing code doesnโ€™t have to be the worst part of your day. Learn how Sentry can help.

Learn more

๐Ÿ‘‹ Kindness is contagious

Please leave a โค๏ธ or a friendly comment on this post if you found it helpful!

Okay