Waylon Walker

Posted on • Originally published at waylonwalker.com

Using a Python Markdown AST to Find All Paragraphs

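While looking for a way to automatically generate descriptions for pages, I stumbled onto the Markdown AST that the commonmark package exposes in Python. It lets me walk a Markdown page and pull out only the paragraph text, skipping headings and code fences. One caveat: paragraphs nested inside blockquotes and list items are still paragraph nodes in the AST, so leaving those out takes an extra check on the node's parent.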

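import commonmark

parser = commonmark.Parser()
ast = parser.parse(p.content)  # p.content is the page's Markdown source

paragraphs = ''
for node, entering in ast.walker():
    # Container nodes are yielded on entry and on exit; only act on entry.
    if entering and node.t == "paragraph":
        paragraphs += " "
        # literal is None when the paragraph starts with non-text (e.g. emphasis)
        paragraphs += node.first_child.literal or ""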

It's also super fast. Previously I was rendering to HTML and using BeautifulSoup to get only the paragraphs. Using the commonmark AST was about 5x faster on my site.
