Siddhesh Surve

Posted on Jan 22

🕸️ I Just Deleted My Scraper Boilerplate: Meet the "One-Liner" Crawler

#python #opensource #productivity #dataengineering

Stop writing for loops to crawl websites. Seriously. It’s 2026, and we just got the upgrade XPath was missing for 20 years.

If you have ever built a web scraper, you know the Drill of Pain™:

Write a Python script.
Import requests and BeautifulSoup.
Fetch a URL.
Parse the HTML.
Write a loop to find links.
Recursively call the function (and accidentally DDoS the site because you forgot a time.sleep).

It’s imperative, it’s messy, and it’s hard to read 6 months later.

But a new open-source tool just dropped on GitHub, and it is effectively "SQL for the Web."

Meet wxpath.

🤯 The Paradigm Shift: Crawling Inside the Query

Usually, XPath is just a selector language. You download the page first, then you use XPath to pick the title.

wxpath changes the rules. It adds a custom url(...) operator directly into the XPath language. This means your selector doesn't just find data—it travels to new pages to get it.

The Old Way (Python + Requests + Parsel)

# 🤮 The Boilerplate Nightmare
import requests
from parsel import Selector

def crawl(url):
    response = requests.get(url)
    sel = Selector(text=response.text)

    # Extract data
    title = sel.xpath("//h1/text()").get()

    # Find next link and repeat...
    next_page = sel.xpath("//a[@class='next']/@href").get()
    if next_page:
        crawl(next_page) # Hope you handle recursion depth!

The `wxpath` Way

# 😍 The "One-Liner" (Declarative)
import wxpath

# Fetch the URL, find the link, FETCH THAT LINK, and return the title
query = "url('https://site.com')/url(//a[@class='next']/@href)//h1/text()"

results = wxpath.query(query)

Do you see that? The logic of navigation is embedded in the query.

⚡ Feature Spotlight: The "Deep Crawl" Operator (`///`)

The killer feature isn't just fetching one page. It's the /// operator.

In standard XPath, // means "search anywhere in this document."
In wxpath, /// means "crawl anywhere in this network."

If you want to crawl all links on a Wikipedia page, and then for each of those pages, extract the title and the first paragraph, you don't need a loop. You need a map.

Scraping a Knowledge Graph in one expression:

url('https://en.wikipedia.org/wiki/Web_scraping')
///url(//div[@id='bodyContent']//a/@href)  /map {                                     'title': //h1/text(),
    'summary': (//p)[1]/text(),
    'source_url': base-uri(.)
}

This single expression:

Fetches the seed URL.
Finds all links in the body.
Concurrently crawls those links.
Returns a clean JSON-like object (Map) for each result.

🛠️ Why This is "Viral" Tech

We are moving away from Imperative Coding (telling the computer how to do it) to Declarative Coding (telling the computer what you want).

We saw this with React (UI). We saw this with GraphQL (APIs). Now we are seeing it with wxpath (Scraping).

Key Features that make it production-ready:

XPath 3.1 Support: It supports modern Maps and Arrays (/map{...}), so your output is ready for JSON serialization immediately.
Async Under the Hood: It uses aiohttp to fetch pages concurrently, even though the query looks linear.
Politeness Built-in: It respects robots.txt and handles rate limiting automatically, so you don't become "that guy" who crashes a server.

🚀 How to Try It

It’s a Python library, so installation is standard:

pip install wxpath

Then run the CLI to test a query instantly:

wxpath "url('https://news.ycombinator.com')//a[@class='storylink']/text()"

🔮 The Verdict

Is this the end of Scrapy? Probably not for massive, enterprise-scale crawls. But for the 90% of use cases where you just need to "grab data from these linked pages," wxpath is a cheat code.

It turns a 50-line script into a 3-line query. That is the kind of efficiency we live for.

Star the repo while it's hot:
👉 github.com/rodricios/wxpath

DEV Community

🕸️ I Just Deleted My Scraper Boilerplate: Meet the "One-Liner" Crawler

🤯 The Paradigm Shift: Crawling Inside the Query

The Old Way (Python + Requests + Parsel)

The `wxpath` Way

⚡ Feature Spotlight: The "Deep Crawl" Operator (`///`)

🛠️ Why This is "Viral" Tech

Key Features that make it production-ready:

🚀 How to Try It

🔮 The Verdict

Top comments (0)

🤯 The Paradigm Shift: Crawling Inside the Query

The Old Way (Python + Requests + Parsel)

The wxpath Way

⚡ Feature Spotlight: The "Deep Crawl" Operator (///)

🛠️ Why This is "Viral" Tech

Key Features that make it production-ready:

🚀 How to Try It

🔮 The Verdict

The `wxpath` Way

⚡ Feature Spotlight: The "Deep Crawl" Operator (`///`)