DEV Community

Waylon Walker
Waylon Walker

Posted on • Originally published at waylonwalker.com

2 1

Find all Headings with BeautifulSoup

BeautifulSoup is a DOM like library for python. It's quite useful to manipulate html. Here is an example to find_all html headings. I stole the regex from stack overflow, but who doesn't.

Make an example

sample.html

Lets make a sample.html file with the following contents. It mainly has some headings, <h1> and <h2> tags that I want to be able to find.

<!DOCTYPE html>
<html lang="en">
  <body>
    <h1>hello</h1>
    <p>this is a paragraph</p>
    <h2>second heading</h2>
    <p>this is also a paragraph</p>
    <h2>third heading</h2>
    <p>this is the last paragraph</p>

  </body>
</html>
Enter fullscreen mode Exit fullscreen mode

Get the headings with BeautifulSoup

Lets import our packages, read in our sample.html using pathlib and find all headings using BeautifulSoup.

from bs4 import BeautifulSoup from pathlib import Path

soup = BeautifulSoup(Path('sample.html').read_text(), features="lxml") headings = soup.find_all(re.compile("^h[1-6]$"))
Enter fullscreen mode Exit fullscreen mode

And what we get is a list of bs4.element.Tag's.

>> print(headings)
[<h1>hello</h1>, <h2>second heading</h2>, <h2>third heading</h2>]
Enter fullscreen mode Exit fullscreen mode

I recently added a heading_link plugin to markata, you might notice the
🔗's next to each heading on this page, that is powered by this exact
technique.

Top comments (0)

Image of Datadog

The Essential Toolkit for Front-end Developers

Take a user-centric approach to front-end monitoring that evolves alongside increasingly complex frameworks and single-page applications.

Get The Kit

👋 Kindness is contagious

Discover a treasure trove of wisdom within this insightful piece, highly respected in the nurturing DEV Community enviroment. Developers, whether novice or expert, are encouraged to participate and add to our shared knowledge basin.

A simple "thank you" can illuminate someone's day. Express your appreciation in the comments section!

On DEV, sharing ideas smoothens our journey and strengthens our community ties. Learn something useful? Offering a quick thanks to the author is deeply appreciated.

Okay