DEV Community: Mohammad Raziei

Phasma: I Brought PhantomJS Back from the Dead (and It Runs with Just `pip install`)

Mohammad Raziei — Thu, 25 Jun 2026 23:13:04 +0000

In 2018, the PhantomJS maintainer posted a short message on GitHub:

"I am stepping down as maintainer. Slimer.js is mostly unmaintained. It's time for us to move on."

Headless Chrome had arrived. Everyone moved on. PhantomJS was officially dead.

But here's the thing nobody talked about: Chrome still isn't a Python package.

The Problem: Browser Automation Should Be `pip install`

Python has a philosophy — batteries included. You pip install requests and you have HTTP. You pip install numpy and you have numerical computing. Everything arrives as a package, versioned, isolated, reproducible.

So why does browser automation still require you to leave Python's ecosystem entirely?

playwright install downloads a 300MB Chromium binary outside your virtualenv
Puppeteer requires Node.js and npm
Selenium requires a separately installed browser and a matching ChromeDriver

These tools are powerful — but they're not Pythonic. They break the contract: one environment, one install, everything works.

I wanted something different. I wanted to write:

pip install phasma

And have a fully working headless browser. No apt. No npm. No system packages. No setup steps outside of pip. Just Python.

That's phasma — and PhantomJS, it turns out, was the perfect engine to build it on. A self-contained binary with a full WebKit engine that ships inside the wheel. The ecosystem had abandoned it, but the binary never stopped working.

What Is Phasma?

Phasma is a Python package that wraps PhantomJS with a modern, Playwright-like async API. The PhantomJS binary ships inside the wheel — no system packages, no Node.js, no Chromium, nothing. Just:

pip install phasma

And it works. On Linux, macOS, Windows. In Docker containers with no internet access. On HPC clusters. On that one server where the sysadmin says no.

A note on the name: In Star Wars, Captain Phasma is a chrome-armored stormtrooper who everyone thought was dead after The Force Awakens — and then came back in The Last Jedi. It felt like the right name for a package that resurrects a browser engine everyone declared dead. Also, PhantomJS is literally a phantom. It was right there.

The Real Use Case: Running DOM-dependent JavaScript Anywhere

Here's the concrete problem I kept running into. You have a JavaScript library that needs a real DOM to work. Maybe it's a charting library, a diagram renderer, a scraper, or just some legacy code that uses document.querySelector.

You want to run it from Python. Your options:

A real example from my own work: I needed to run Mermaid.js — the diagram-as-code library — from Python to generate SVG/PNG diagrams. Mermaid.js uses the DOM heavily. The official CLI (mmdc) requires Node.js. So I built mmdc for Python on top of phasma:

import asyncio
import phasma
from phasma.svg import SvgRenderer

MERMAID_DIAGRAM = """
graph TD
    A[pip install phasma] --> B[Import in Python]
    B --> C[Launch Browser]
    C --> D[Run any JS with DOM]
    D --> E[Get results back]
"""

async def render_mermaid(diagram_code: str) -> bytes:
    # Inject Mermaid.js from CDN, render the diagram, export as PNG
    async with phasma.Browser() as browser:
        page = await browser.new_page()
        html = f"""
        <html>
        <body>
            <div class="mermaid">{diagram_code}</div>
            <script src="https://cdn.jsdelivr.net/npm/mermaid/dist/mermaid.min.js"></script>
            <script>mermaid.initialize({{ startOnLoad: true }});</script>
        </body>
        </html>"""
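        # load the HTML from a temp file (same approach as the general pattern
        # later in this post), since goto() is given a URL everywhere else
        import tempfile
        from pathlib import Path
        with tempfile.NamedTemporaryFile(mode="w", suffix=".html", delete=False) as f:
            f.write(html)
        await page.goto(Path(f.name).as_uri())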
        await page.wait_for_selector(".mermaid svg", timeout=5000)
        svg = await page.inner_html(".mermaid")

    async with SvgRenderer() as r:
        return await r.to_png(svg)

No Node.js. No npm install mermaid. Just Python.

How It Works Under the Hood

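The old approach (every existing PhantomJS wrapper) spawned a fresh PhantomJS process for every single operation, so each call paid the full startup cost again.

Phasma works differently. When you call launch(), it starts one persistent PhantomJS process that runs a tiny HTTP server, and every later call is just a request to that already-running process.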

The result: ~9ms per operation instead of seconds. The same pattern Selenium WebDriver has used for years — I just applied it to PhantomJS.
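To make the pattern concrete, here is a minimal stdlib-only sketch of a start-once, talk-over-HTTP worker. It is not phasma's actual protocol: the `CommandHandler` class, the `send_command` helper, and the JSON command shape are all made up for illustration, and a background thread stands in for the PhantomJS process.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class CommandHandler(BaseHTTPRequestHandler):
    """Stand-in for the long-lived worker: answers one JSON command per POST."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        command = json.loads(self.rfile.read(length))
        # A real worker would drive the browser here; this one just echoes.
        body = json.dumps({"op": command["op"], "ok": True}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def send_command(port: int, op: str) -> dict:
    """One cheap HTTP round trip to the already-running worker."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("POST", "/", body=json.dumps({"op": op}).encode("utf-8"))
        return json.loads(conn.getresponse().read())
    finally:
        conn.close()


# Pay the startup cost once (port 0 asks the OS for any free port) ...
server = HTTPServer(("127.0.0.1", 0), CommandHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

# ... then every later operation is only a request to the same worker.
replies = [send_command(port, op) for op in ("goto", "evaluate", "screenshot")]

server.shutdown()
server.server_close()
```

The point of the design is that the expensive part (starting the worker) happens once, while each operation costs only a local HTTP round trip.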

The API

Phasma's API is deliberately close to Playwright so there's almost no learning curve:

import asyncio
import phasma

async def main():
    browser = await phasma.launch()
    try:
        page = await browser.new_page()

        # navigate
        await page.goto("https://example.com")

        # extract data
        title = await page.evaluate("document.title")
        heading = await page.text_content("h1")
        links = await page.evaluate("""
            Array.from(document.querySelectorAll('a'))
                 .map(a => ({ text: a.textContent, href: a.href }))
        """)

        # interact
        await page.fill("#search", "phasma")
        await page.click("#submit")
        await page.wait_for_selector(".results")

        # output
        await page.screenshot("shot.png")
        await page.pdf("page.pdf")

    finally:
        await browser.close()

asyncio.run(main())

SVG Rendering — Also Batteries Included

One thing I needed constantly: converting SVG files to PNG/PDF without installing Inkscape or cairosvg. Phasma handles this too, with a dedicated SvgRenderer class:

import asyncio
from phasma.svg import SvgRenderer

async def main():
    async with SvgRenderer() as r:
        # basic conversion
        png = await r.to_png("diagram.svg")
        jpg = await r.to_jpeg("<svg ...>...</svg>")
        pdf = await r.to_pdf("chart.svg")

        # scale up for high-resolution export
        png_2x = await r.to_png("icon.svg", scale=2.0)   # 2× resolution
        png_4x = await r.to_png("icon.svg", scale=4.0)   # 4× resolution

        # PDF that fits the SVG exactly, with no A4 borders
        pdf_fit = await r.to_pdf("diagram.svg")          # paper = SVG dimensions

        # or standard paper size
        pdf_a4 = await r.to_pdf("diagram.svg", pdf_format="A4")

asyncio.run(main())

One SvgRenderer = one PhantomJS process reused for all conversions. Batch-convert 100 SVGs and PhantomJS starts exactly once.

Running Any DOM-dependent JS: The General Pattern

This is the core superpower. Any JavaScript that needs a browser environment:

import asyncio
import tempfile
from pathlib import Path
import phasma

# Your JS library that needs window, document, etc.
YOUR_LIBRARY_JS = """
    // imagine this is Chart.js, D3, Mermaid, or any DOM library
    var result = someLibraryThatNeedsDOM.process(inputData);
    result  // last expression is returned
"""

async def run_js_with_dom(js_code: str, html_context: str = "<html><body></body></html>"):
    browser = await phasma.launch()
    try:
        page = await browser.new_page()

        # set up the HTML context your JS needs; leave the `with` block before
        # navigating so the file is flushed and closed when the page loads it
        with tempfile.NamedTemporaryFile(mode='w', suffix='.html', delete=False) as f:
            f.write(html_context)
        await page.goto(Path(f.name).as_uri())

        # inject and run your library
        return await page.evaluate(js_code)
    finally:
        await browser.close()

No Node.js. No Chrome. No system dependencies. Just Python and a pip install.

Benchmark

Time in milliseconds (lower is better):

Operation	Old approach (per-process)	Phasma (persistent session)
goto()	2500	9
10× ops	25000	90
100× ops	250000	900

The difference is entirely startup cost. With phasma, PhantomJS starts once and stays alive.
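The arithmetic behind those benchmark figures is simple enough to write down. This is a rough cost model, not a measurement: the two constants are the figures quoted in the benchmark, and the `startup_ms` parameter is a hypothetical knob added here in case you want to include the one-time launch cost (the benchmark figures leave it out).

```python
# Figures quoted in the benchmark above, in milliseconds.
SPAWN_PER_CALL_MS = 2500   # old approach: a fresh PhantomJS process per operation
PERSISTENT_OP_MS = 9       # phasma: one round trip to the already-running process


def per_process_total(n_ops: int) -> int:
    """Every operation pays the full process startup again."""
    return n_ops * SPAWN_PER_CALL_MS


def persistent_total(n_ops: int, startup_ms: int = 0) -> int:
    """Startup is paid once; 0 by default to match the quoted figures."""
    return startup_ms + n_ops * PERSISTENT_OP_MS


rows = [(n, per_process_total(n), persistent_total(n)) for n in (1, 10, 100)]
for n, old, new in rows:
    print(f"{n:>3} ops: {old:>7} ms vs {new:>4} ms  ({old / new:.0f}x)")
```

Even if you charge the persistent session a full 2500 ms launch up front, it comes out ahead from the second operation onward.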

Installation & Quick Start

pip install phasma

That's it. The PhantomJS binary is bundled in the wheel.

import asyncio
import phasma

async def main():
    browser = await phasma.launch()
    page = await browser.new_page()
    await page.goto("https://example.com")
    print(await page.evaluate("document.title"))
    await browser.close()

asyncio.run(main())

Who Is This For?

You're on a server where apt / yum is locked
You're building a minimal Docker image and don't want to pull in Chrome (~300MB)
You need to run DOM-dependent JavaScript from Python without Node.js
You want to convert SVG files without system dependencies
You need a headless browser that arrives via pip and nothing else

Playwright and Puppeteer are better tools for modern web automation — if you can install them. Phasma exists for the cases where you can't.

pygixml 0.10.0 released — A Faster, Smarter XML Parser for Python

Mohammad Raziei — Sat, 11 Apr 2026 16:12:28 +0000

XML parsing in Python has had three choices for over a decade: ElementTree (slow but built-in), lxml (fast but heavy), and minidom (don't). I wanted something that sits at the intersection of speed, simplicity, and a small install footprint.

That's what pygixml is — a Cython wrapper around pugixml, one of the fastest C++ XML parsers in existence.

Version 0.10.0 just dropped, and it's the most significant release so far. Let's walk through what's new.

The Numbers (50 iterations, 5 000 elements)

Library	Avg Time	Speedup vs ElementTree
pygixml	0.0009 s	9.2× faster
lxml	0.0041 s	2.0× faster
ElementTree	0.0083 s	1.0× (baseline)

Memory usage tells a similar story: pygixml uses 0.67 MB at 5 000 elements vs ElementTree's 4.84 MB. And the installed package is just 0.45 MB, vs lxml's 5.48 MB, according to the pip-size report.

If you care about these numbers, the full benchmark suite covers 6 XML sizes (100 to 10 000 elements) and is included in the repo. Run it yourself:

git clone https://github.com/MohammadRaziei/pygixml.git
cd pygixml

cmake -B build
cmake --build build --target run_full_benchmarks
# or
pip install .
python benchmarks/full_benchmark.py

What's New in 0.10.0

1. `children()` — Iterate Direct Children (or All Descendants)

Before 0.10.0, iterating over an element's children required manual sibling walking:

# The old way — walk siblings manually
child = node.first_child()
while child:
    if child.name == "student":
        process(child)
    child = child.next_sibling

Now you get a clean Pythonic iterator:

# Direct children only (default)
for child in node.children():
    process(child)

# All descendants in depth-first order
for descendant in node.children(recursive=True):
    process(descendant)

Text, comment, and processing-instruction nodes are automatically skipped — you only get element nodes.
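If you're coming from the standard library, the closest ElementTree equivalents look roughly like this. It's a comparison sketch rather than pygixml code, and the sample document is made up.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<class>"
    "<student><name>Ada</name></student>"
    "<student><name>Linus</name></student>"
    "</class>"
)

# Direct children only: iterating an Element yields its child elements.
direct = [child.tag for child in root]

# All descendants in document order: iter() also yields the element itself,
# so it is filtered out here to match "descendants only".
descendants = [el.tag for el in root.iter() if el is not root]
```

ElementTree keeps text in `.text` and `.tail` attributes instead of separate nodes, so both loops only ever see elements.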

2. `text()` — Recursive Text Extraction with Configurable Joins

Getting text out of an XML element shouldn't require walking the tree yourself. text() collects all text and CDATA nodes from the subtree and joins them:

doc = pygixml.parse_string("""
<article>
    <p>Hello <b>world</b>! This is <i>rich</i> text.</p>
</article>
""")
p = doc.root.child("p")

p.text()                     # "Hello\nworld!\nThis is\nrich\ntext."
p.text(recursive=False)      # "Hello "  (direct text only)
p.text(join=" ")             # "Hello world! This is rich text."

For simple cases where you just want the first child's text, child_value("tag") is still there and is slightly faster.
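For comparison, this is roughly how the same extraction looks with the standard library's ElementTree. It's only a point of reference for readers who know that API, not a description of how pygixml implements text().

```python
import xml.etree.ElementTree as ET

article = ET.fromstring(
    "<article><p>Hello <b>world</b>! This is <i>rich</i> text.</p></article>"
)
p = article.find("p")

direct_text = p.text                  # text before the first child element
all_text = "".join(p.itertext())      # every text fragment in the subtree
```

Here `direct_text` is "Hello " and `all_text` is "Hello world! This is rich text.", which lines up with the recursive=False and join=" " results above.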

3. `element.value = "text"` — Finally, This Works

Element nodes in pugixml don't store text directly — they contain child text nodes. In 0.10.0, setting .value on an element automatically creates or replaces that text child:

doc = pygixml.parse_string("<root><item/></root>")
item = doc.root.child("item")

item.value = "Hello"
print(item.value)   # "Hello"  ✅
print(item.text())  # "Hello"  ✅
# XML: <item>Hello</item>

# Replaces existing text
item.value = "World"
print(item.value)   # "World"

And reading back: element.value now returns the first text child's value (or None if there's no text), so set and get are symmetric.
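The "elements contain child text nodes" model is easy to see with the standard library's minidom, which exposes the same DOM shape. This sketch shows by hand the create-or-replace step that the paragraph above describes; it uses minidom only as an illustration and says nothing about pygixml's internals.

```python
from xml.dom import minidom

doc = minidom.parseString("<root><item/></root>")
item = doc.documentElement.getElementsByTagName("item")[0]

before = item.toxml()                       # the element has no children yet

# "Setting the value" means creating a text node under the element ...
item.appendChild(doc.createTextNode("Hello"))
after_set = item.toxml()

# ... and replacing it means changing that text child's data.
item.firstChild.data = "World"
after_replace = item.toxml()
```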

4. `from_mem_id_unsafe()` — O(1) Node Lookup

This is the most powerful — and most dangerous — feature in 0.10.0.

Every XMLNode exposes a mem_id property: a unique numeric identifier derived from the node's internal address. You can use it to reconstruct a node later:

# Fast: O(1), direct pointer cast
node = pygixml.XMLNode.from_mem_id_unsafe(node_id)

# Safe but O(n): walks the tree
node = root.find_mem_id(node_id)

The difference is O(1) vs O(n). But from_mem_id_unsafe treats the identifier as a raw pointer — if the document was freed or the node deleted, using it will cause a segfault.

When to use it: only in performance-critical paths where you've profiled and confirmed that find_mem_id's tree walk is a bottleneck. For most code, find_mem_id is the right choice.
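A loose analogy in plain CPython may help with the trade-off. The sketch below is not pygixml code: it uses `id()` (an object's memory address in CPython) as a stand-in for mem_id, a list walk as the safe O(n) lookup, and a `ctypes` cast as the fast, unsafe O(1) lookup. The `Payload` class and `registry` list are made up for the example.

```python
import ctypes


class Payload:
    def __init__(self, label):
        self.label = label


original = Payload("node-42")
address = id(original)  # CPython detail: id() is the object's memory address

# Safe but O(n): search the live objects for a matching identifier.
registry = [Payload("a"), original, Payload("b")]
found = next((obj for obj in registry if id(obj) == address), None)

# Fast but unsafe: reinterpret the integer as a pointer to a Python object.
# This is only valid while `original` is still alive; with a stale address
# the interpreter can crash instead of raising an exception.
restored = ctypes.cast(address, ctypes.py_object).value
```

The safe lookup can say "not found"; the cast cannot, which is the same reason a stale mem_id is dangerous.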

Because a mem_id is a plain integer, it is hashable, which makes it a convenient key for dictionary-based caching.

Why aren't `XMLNode` objects hashable?

You might wonder why you can't just do cache[node] = data. This is intentional: XMLNode objects are mutable. You can rename them, change their content, add children, and so on. In Python, mutable objects generally shouldn't be hashable, because a hash or equality check based on their contents would change the moment you modify them, and dictionary lookups would quietly stop working. Using mem_id as the key makes the contract explicit: the integer is stable and hashable, while the node wrapper is transient.
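Here's a toy illustration of that rule in pure Python. The `Node` class and its `node_id` attribute are hypothetical stand-ins (for XMLNode and mem_id respectively); the sketch only shows the general Python behavior, not pygixml's implementation.

```python
class Node:
    """Toy mutable node. Defining __eq__ without __hash__ makes it unhashable."""

    def __init__(self, node_id: int, name: str):
        self.node_id = node_id  # hypothetical stand-in for mem_id
        self.name = name

    def __eq__(self, other):
        return isinstance(other, Node) and self.name == other.name


node = Node(node_id=1001, name="student")

# Using the mutable object itself as a key is rejected by Python.
try:
    {node: "metadata"}
    hashable = True
except TypeError:
    hashable = False

# Keying by the stable integer works, and survives later mutation.
cache = {node.node_id: {"seen": True}}
node.name = "pupil"
still_found = node.node_id in cache
```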

Using nodes in dictionaries (the right way)

# Store node data by mem_id (a stable, hashable integer)
cache = {}
for node in doc:
    cache[node.mem_id] = {
        "xpath": node.xpath,
        "depth": node.xpath.count("/"),
    }

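# Later, reconstruct the node (O(1) but unsafe).
# Only do this while `doc` is alive and the nodes have not been deleted:
# a stale mem_id is a dangling pointer, and the check below cannot detect that.
for mem_id, metadata in cache.items():
    node = pygixml.XMLNode.from_mem_id_unsafe(mem_id)
    if node:  # guards against an empty node only
        process(node, metadata)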

For safety, use find_mem_id (O(n) but returns None for deleted nodes):

node = root.find_mem_id(mem_id)
if node:
    process(node)

5. `xpath` Property — Generate Absolute XPath to Any Node

doc = pygixml.parse_string("<root><book><title>Gatsby</title></book></root>")
title = doc.root.child("book").child("title")

print(title.xpath)  # /root[1]/book[1]/title[1]

This is a custom O(depth) algorithm that walks from the node up to the root, counting same-name siblings to produce accurate positional predicates. pugixml doesn't provide this natively — it's pygixml's own addition.
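The walk-up-and-count idea can be sketched with the standard library's ElementTree. This is an illustration of the technique as described above, not pygixml's code; `absolute_xpath` is a hypothetical helper. One difference worth flagging: ElementTree has no parent pointers, so this version first builds a child-to-parent map, which costs O(n) before the O(depth) walk.

```python
import xml.etree.ElementTree as ET


def absolute_xpath(root, target):
    """Build /a[1]/b[2]/... by walking from target up to root."""
    # ElementTree elements don't know their parent, so index them first.
    parent_of = {child: parent for parent in root.iter() for child in parent}

    steps = []
    node = target
    while node is not None:
        parent = parent_of.get(node)
        if parent is None:
            position = 1  # the root element has no siblings
        else:
            # Count only siblings with the same tag, 1-based as in XPath.
            same_name = [sib for sib in parent if sib.tag == node.tag]
            position = next(i for i, sib in enumerate(same_name, 1) if sib is node)
        steps.append(f"{node.tag}[{position}]")
        node = parent
    return "/" + "/".join(reversed(steps))


doc_root = ET.fromstring(
    "<root><book><title>Gatsby</title></book><book><title>1984</title></book></root>"
)
titles = doc_root.findall("book/title")
first_path = absolute_xpath(doc_root, titles[0])
second_path = absolute_xpath(doc_root, titles[1])
```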

6. `xml` Property — One-Liner XML Serialization

node.xml  # same as node.to_string() with 2-space indent

7. `ParseFlags` Enum

All 18 pugixml parse flags are now available as a proper IntFlag enum:

# Fastest parse — skip escapes, EOL normalization, whitespace
doc = pygixml.parse_string(xml, pygixml.ParseFlags.MINIMAL)

# Combine specific flags
flags = pygixml.ParseFlags.COMMENTS | pygixml.ParseFlags.CDATA
doc = pygixml.parse_string(xml, flags)
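If you haven't used IntFlag before, this is how flag combination behaves in general. The `DemoParseFlags` enum below is hypothetical: the member names echo the ones above, but the numeric values are made up and are not pugixml's.

```python
from enum import IntFlag


class DemoParseFlags(IntFlag):
    """Hypothetical flags; names echo the article, values are invented."""
    COMMENTS = 1
    CDATA = 2
    ESCAPES = 4


flags = DemoParseFlags.COMMENTS | DemoParseFlags.CDATA

has_cdata = DemoParseFlags.CDATA in flags       # membership test on a combination
has_escapes = DemoParseFlags.ESCAPES in flags
as_int = int(flags)                             # the plain integer a C API would receive
without_comments = flags & ~DemoParseFlags.COMMENTS
```

Because IntFlag members are real integers, a combined value can be handed straight to a C or C++ layer that expects a bitmask.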

8. Python 3.6–3.13 Support

pygixml works with every Python from 3.6 through 3.13. .pyi stub generation via stubgen-pyx is only enabled on Python 3.9+ (where the package is available), so older versions still build fine — just without type stubs.

Full Feature Summary

Feature	pygixml	lxml	ElementTree
Parse speed (5K elements)	0.0009 s	0.0041 s	0.0083 s
Memory (5K elements)	0.67 MB	0.67 MB	4.84 MB
Runtime Dependencies	0	libxml2, libxslt	None (stdlib)
Package size	0.45 MB	5.48 MB	built-in
XPath 1.0	✅ full	✅ full	❌ limited
XSLT	❌	✅	❌
Schema validation	❌	✅	❌
`children()` iterator	✅	❌	❌
`text()` recursive	✅	❌	❌
`element.value = "text"`	✅	❌	❌
`xpath` property	✅	❌	❌
`mem_id` caching	✅	❌	❌

Installation

pip install pygixml

Zero Runtime Dependencies

This is a huge advantage that often gets overlooked.

lxml depends on system libraries (libxml2, libxslt). If those have security vulnerabilities or version conflicts, your environment breaks.
pygixml bundles pugixml directly into the Python extension.

It has zero runtime dependencies. No libxml, no external binaries, no transitive dependency chains. Just a single install that works.

Pre-compiled wheels are available for Windows, Linux, and macOS.

If this project helps you, a star on GitHub goes a long way. Thanks for reading.

How to Parse XML Fast in 2026 (Python)

Mohammad Raziei — Wed, 08 Apr 2026 23:53:13 +0000

JSON won the internet. We all know that. But XML never left — it just moved
into the places where reliability matters more than trendiness.

If you work with Maven configs, Android manifests, Office Open XML (.docx/.xlsx),
SVG, RSS feeds, DocBook, SOAP services, or any enterprise integration layer, you're
still parsing XML. And in 2026, there's no excuse for it being slow.

The Problem with XML Parsing in Python

Python's standard library ships with xml.etree.ElementTree. It works. It's
fine for small files. But the moment your XML grows beyond a few hundred
elements, ElementTree becomes a bottleneck — because it builds a full Python
object for every single node, attribute, and text node in the tree.

The usual answer is lxml, which wraps libxml2 in C. It's fast and
feature-rich. But it's also a 5.5 MB install with a heavy dependency chain,
and its Python bindings add overhead on every call.

So what if you want the fastest possible parse, a tiny footprint, and a
clean Python API?

That's the question that led me to build pygixml —
a Cython wrapper around pugixml, one of the fastest
C++ XML parsers in existence.

Let me show you the numbers first, then we'll get into the code.

The Numbers

Here's what happens when you parse a 5,000-element XML document with the
three most common Python XML libraries:

Library	Parse Time	Speedup vs ElementTree
pygixml	0.0009 s	8.6× faster
lxml	0.0041 s	1.9× faster
ElementTree	0.0076 s	1.0× (baseline)

And memory usage during the same parse:

Library	Peak Memory
pygixml	0.67 MB
lxml	0.67 MB
ElementTree	4.84 MB

ElementTree uses 7× more memory because it materializes every node as a
full Python object. pygixml and lxml stay in C/C++ land until you
explicitly access data.

The installed package size tells its own story:

Package	Size
pygixml	0.43 MB
lxml	5.48 MB

That's more than a 12× difference. If you're building a Docker image, Lambda function,
or anything where size matters, it adds up.

All benchmarks run on the same machine with time.perf_counter() across 5
warmed-up iterations. You can reproduce them yourself — the code is in the
benchmarks/ directory.
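For a feel of what such a measurement involves, here is a minimal harness in the same spirit: time.perf_counter() for timing after a warm-up, tracemalloc for peak memory. It only measures ElementTree so it runs with the standard library alone, and it is not the script from the repository; `make_xml`, `benchmark_parse`, and the shape of the test document are assumptions made for this sketch.

```python
import time
import tracemalloc
import xml.etree.ElementTree as ET


def make_xml(n_items: int) -> str:
    """Generate a synthetic document with n_items <item> elements."""
    items = "".join(
        f'<item id="{i}"><name>Item {i}</name></item>' for i in range(n_items)
    )
    return f"<root>{items}</root>"


def benchmark_parse(xml_text: str, iterations: int = 5, warmup: int = 2) -> dict:
    # Warm-up runs are discarded so caches and allocators settle first.
    for _ in range(warmup):
        ET.fromstring(xml_text)

    timings = []
    for _ in range(iterations):
        start = time.perf_counter()
        ET.fromstring(xml_text)
        timings.append(time.perf_counter() - start)

    # Memory is measured in a separate pass so tracing overhead
    # does not distort the timings.
    tracemalloc.start()
    tree = ET.fromstring(xml_text)
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    return {
        "avg_s": sum(timings) / len(timings),
        "min_s": min(timings),
        "peak_mb": peak_bytes / (1024 * 1024),
    }


result = benchmark_parse(make_xml(5000))
print(result)
```

Absolute numbers from a sketch like this will differ from the tables here, since they depend on the machine and on the document shape.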

How pygixml Works Under the Hood

Here's the architecture:

Three things make this fast:

No Python object per node — the entire parsed tree lives in C++ memory. pygixml only creates a Python wrapper when you explicitly access a node.
Zero-copy Cython bridge — data doesn't get copied between C++ and Python. Strings are encoded in-place.
pugixml's custom allocator — pugixml uses a block-based memory pool instead of per-node malloc, which means fewer syscalls and better cache locality.
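The first point, wrappers created only on access, can be illustrated with a toy in pure Python. This is an analogy, not pygixml's implementation: `FlatTree` and `NodeView` are hypothetical names, and two Python lists stand in for the tree that pygixml keeps in C++ memory.

```python
class FlatTree:
    """Whole tree in two parallel lists; a stand-in for memory owned by C++."""

    def __init__(self):
        self.names = []      # names[i] is the tag of node i
        self.parents = []    # parents[i] is the parent's index, -1 for the root
        self.wrappers_created = 0

    def add(self, name, parent=-1):
        self.names.append(name)
        self.parents.append(parent)
        return len(self.names) - 1

    def node(self, index):
        # A wrapper object comes into existence only when someone asks for it.
        self.wrappers_created += 1
        return NodeView(self, index)


class NodeView:
    """Thin handle created on demand; it holds an index, not a copy of the data."""
    __slots__ = ("_tree", "_index")

    def __init__(self, tree, index):
        self._tree = tree
        self._index = index

    @property
    def name(self):
        return self._tree.names[self._index]


tree = FlatTree()
root_index = tree.add("library")
for _ in range(1000):
    tree.add("book", parent=root_index)

wrappers_after_build = tree.wrappers_created    # nothing wrapped yet
root = tree.node(root_index)
wrappers_after_access = tree.wrappers_created   # exactly one wrapper now exists
```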

Getting Started

pip install pygixml

One dependency-free install, 430 KB.

Parsing XML

import pygixml

xml = """
<library>
    <book id="1" category="fiction">
        <title>The Great Gatsby</title>
        <author>F. Scott Fitzgerald</author>
        <year>1925</year>
    </book>
    <book id="2" category="fiction">
        <title>1984</title>
        <author>George Orwell</author>
        <year>1949</year>
    </book>
</library>
"""

doc = pygixml.parse_string(xml)
root = doc.root

# Access children
book = root.child("book")
print(book.name)                      # book
print(book.attribute("id").value)     # 1
print(book.child("title").text())     # The Great Gatsby

The API is deliberately simple. Properties for simple access
(node.name, node.value, node.type), methods for operations that take
arguments (node.child(name), node.text()). No surprises.

XPath Queries

This is where pygixml really shines. pugixml's XPath engine is fast,
standards-compliant (XPath 1.0), and fully exposed:

# All fiction books
fiction = root.select_nodes("book[@category='fiction']")
print(f"Found {len(fiction)} fiction books")

# Single match
match = root.select_node("book[@id='2']")
if match:
    print(match.node.child("title").text())   # 1984

# Pre-compile for repeated use
query = pygixml.XPathQuery("book[year > 1950]")
recent = query.evaluate_node_set(root)

# Scalar evaluations (the sample XML has <year> but no <price>)
avg_year = pygixml.XPathQuery(
    "sum(book/year) div count(book)"
).evaluate_number(root)
print(f"Average year: {avg_year:.0f}")

has_orwell = pygixml.XPathQuery(
    "book[author='George Orwell']"
).evaluate_boolean(root)
print(f"Has Orwell: {has_orwell}")

Creating XML

doc = pygixml.XMLDocument()
root = doc.append_child("catalog")
item = root.append_child("product")
item.append_child("name").set_value("Laptop")
item.append_child("price").set_value("999.99")

doc.save_file("catalog.xml")

Modifying XML

doc = pygixml.parse_string("<person><name>John</name></person>")
root = doc.root

root.child("name").set_value("Jane")
root.child("name").name = "full_name"
root.append_child("age").set_value("30")

print(root.xml)
# <person>
#   <full_name>Jane</full_name>
#   <age>30</age>
# </person>

Performance Tuning: Parse Flags

Here's a feature most Python XML libraries don't expose: parse flags.
pygixml gives you a ParseFlags enum with 18 options to control exactly
how pugixml processes your input.

# Fastest possible parse — skip everything optional
doc = pygixml.parse_string(xml, pygixml.ParseFlags.MINIMAL)

# Pick exactly what you need
flags = pygixml.ParseFlags.COMMENTS | pygixml.ParseFlags.CDATA
doc = pygixml.parse_string(xml, flags)

ParseFlags.MINIMAL skips escape processing, EOL normalization, and
attribute whitespace conversion. On real-world XML with lots of escaped
content (&amp;, &lt;, etc.), this can give you a noticeable speed boost
over the default.

Which Library Should You Use?

	pygixml	lxml	ElementTree
Parse speed	Fastest	Fast	Slowest
Memory	Low	Low	High (7×)
Package size	0.43 MB	5.48 MB	Built-in
XPath	1.0	1.0	Limited
XSLT	No	Yes	No
Schema validation	No	Yes	No
Dependencies	None	libxml2, libxslt	None

The Full Benchmark

If you want to run the numbers yourself:

git clone https://github.com/MohammadRaziei/pygixml.git
cd pygixml

The project uses CMake for its build system, so benchmarks are built-in targets:

# Configure the build directory once
cmake -B build

# Full suite: parsing (6 sizes), memory, package size
cmake --build build --target run_full_benchmarks

# Legacy parsing-only benchmark
cmake --build build --target run_benchmarks

# Or directly with Python
python benchmarks/full_benchmark.py

Here's the actual output from a recent run:

=====================================================================
PARSING PERFORMANCE
=====================================================================
    Size | Library      |    Avg (s) |    Min (s) |  Speedup vs ET
----------------------------------------------------------------------
     100 | pygixml      |   0.000008 |   0.000008 |          14.4x
     100 | lxml         |   0.000094 |   0.000088 |           1.2x
     100 | elementtree  |   0.000112 |   0.000108 |           1.0x
----------------------------------------------------------------------
     500 | pygixml      |   0.000097 |   0.000096 |           5.8x
     500 | lxml         |   0.000394 |   0.000385 |           1.4x
     500 | elementtree  |   0.000558 |   0.000542 |           1.0x
----------------------------------------------------------------------
    1000 | pygixml      |   0.000147 |   0.000143 |           7.8x
    1000 | lxml         |   0.001127 |   0.001052 |           1.0x
    1000 | elementtree  |   0.001146 |   0.001114 |           1.0x
----------------------------------------------------------------------
    5000 | pygixml      |   0.000883 |   0.000880 |           8.6x
    5000 | lxml         |   0.004108 |   0.003907 |           1.9x
    5000 | elementtree  |   0.007614 |   0.006634 |           1.0x
----------------------------------------------------------------------
   10000 | pygixml      |   0.001649 |   0.001635 |           9.8x
   10000 | lxml         |   0.009095 |   0.008174 |           1.8x
   10000 | elementtree  |   0.016108 |   0.013917 |           1.0x
----------------------------------------------------------------------

Memory usage (tracemalloc peak):

Size	pygixml	lxml	ElementTree
1 000	0.13 MB	0.13 MB	1.01 MB
5 000	0.67 MB	0.67 MB	4.84 MB
10 000	1.34 MB	1.34 MB	9.68 MB

Package size:

Package	Size
pygixml	0.43 MB
lxml	5.48 MB

Wrap-Up

XML isn't going anywhere. The tools we use to process it matter more than
we think — especially when that XML is on the critical path of a request,
a batch job, or a data pipeline.

pygixml brings one of the fastest C++ XML parsers to Python with minimal
friction. Same API patterns you already know. Same XPath you already use.
Just faster.

If you try it out, I'd love to hear about your use case. And if the project
helps you, a star on GitHub
goes a long way.

Have a different XML parsing strategy? Drop it in the comments — I'm
always looking for better approaches.

Introducing pip-size: See the Real Cost of Python Packages

Mohammad Raziei — Wed, 08 Apr 2026 20:10:59 +0000

Why Package Size Matters More Than You Think

Every day, thousands of Python packages are uploaded to PyPI. Many of us check the wheel size before installing and think "oh, it's lightweight!" — but that's just the tip of the iceberg.

A package might only be 50 KB on its own, but when you install it, you could be pulling in hundreds of megabytes of transitive dependencies. The package advertises itself as "lightweight," but what your users actually download is something entirely different.

This is exactly the problem pip-size solves.

What is pip-size?

pip-size calculates the real download size of PyPI packages — including all their dependencies — without actually downloading anything. It uses the PyPI JSON API to resolve the entire dependency tree and shows you the full picture before you run pip install.
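To show the general idea, here is a simplified sketch of summing sizes over a dependency tree from PyPI-style JSON metadata. It is not pip-size's implementation. `FAKE_INDEX` is a hypothetical in-memory stand-in for responses from https://pypi.org/pypi/NAME/json (which do carry `info.requires_dist` and per-file `size` fields), the byte counts are invented round numbers, and the resolver ignores version constraints and environment markers.

```python
import re

# Hypothetical stand-in for PyPI JSON API responses; sizes are made up.
FAKE_INDEX = {
    "requests": {
        "info": {"requires_dist": [
            "idna<4,>=2.5",
            "urllib3<3,>=1.21.1",
            "certifi>=2017.4.17",
            "charset_normalizer<4,>=2",
            'PySocks!=1.5.7,>=1.5.6; extra == "socks"',
        ]},
        "urls": [{"packagetype": "bdist_wheel", "size": 64_000}],
    },
    "idna": {"info": {"requires_dist": None},
             "urls": [{"packagetype": "bdist_wheel", "size": 70_000}]},
    "urllib3": {"info": {"requires_dist": None},
                "urls": [{"packagetype": "bdist_wheel", "size": 130_000}]},
    "certifi": {"info": {"requires_dist": None},
                "urls": [{"packagetype": "bdist_wheel", "size": 150_000}]},
    "charset-normalizer": {"info": {"requires_dist": None},
                           "urls": [{"packagetype": "bdist_wheel", "size": 210_000}]},
}


def normalize(name: str) -> str:
    """Normalize a project name the way PyPI does (PEP 503)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(requirement: str) -> str:
    """Pull the project name off the front of a requirement string."""
    return re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", requirement).group(0)


def wheel_size(metadata: dict) -> int:
    """Size of the first wheel listed, or 0 if the release has none."""
    sizes = [f["size"] for f in metadata["urls"] if f["packagetype"] == "bdist_wheel"]
    return sizes[0] if sizes else 0


def total_size(name: str, fetch, seen=None) -> int:
    """Sum the package's size and, recursively, its required dependencies'."""
    seen = set() if seen is None else seen
    name = normalize(name)
    if name in seen:          # count shared dependencies only once
        return 0
    seen.add(name)

    metadata = fetch(name)
    size = wheel_size(metadata)
    for requirement in metadata["info"]["requires_dist"] or []:
        if re.search(r"extra\s*==", requirement):
            continue          # optional extras are not installed by default
        size += total_size(requirement_name(requirement), fetch, seen)
    return size


total = total_size("requests", FAKE_INDEX.__getitem__)
```

Swapping `FAKE_INDEX.__getitem__` for a function that fetches and decodes the real JSON would be the next step, along with proper marker and version handling.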

Quick Example

$ pip-size requests
🔍 Resolving 'requests'...
  ✓ requests==2.32.5  →  requests-2.32.5-py3-none-any.whl
    ✓ urllib3==2.3.0  →  urllib3-2.3.0-py3-none-any.whl
    ✓ charset-normalizer==3.4.1  →  charset_normalizer-3.4.1-py3-none-any.whl
    ✓ certifi==2025.1.31  →  certifi-2025.1.31-py3-none-any.whl
    ✓ idna==3.10  →  idna-3.10-py3-none-any.whl
  requests==2.32.5  63.2 KB  (total: 834.8 KB)
  ├── urllib3==2.3.0  341.8 KB
  ├── charset-normalizer==3.4.1  204.8 KB
  ├── certifi==2025.1.31  164.0 KB
  └── idna==3.10  61.4 KB

See? requests itself is only 63.2 KB, but the total cost is 834.8 KB — over 13x more than the package alone!

Why This Matters

1. Fair Comparison Between Alternatives

Want to compare httpx vs requests vs aiohttp? Don't just look at their individual sizes — compare the full dependency tree:

pip-size httpx
pip-size requests
pip-size aiohttp

Now you can make an informed decision based on what users will actually download.

2. Audit Your Own Packages

If you maintain a package, you might be surprised what your "lightweight" library is actually shipping. Run:

pip-size your-package

3. Spot Heavy Dependencies

Ever wondered why a simple CLI tool pulls in 200 MB? pip-size shows you exactly which dependency is responsible for the bulk of the size.

4. CI Automation

Use --quiet or --bytes to integrate size checks into your CI pipeline:

pip-size mypackage --quiet
# Output: 1234567
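A CI gate built on that output could look something like the sketch below. It assumes, as the example above suggests, that --quiet prints a single integer byte count; `parse_size`, `measure_bytes`, `within_budget`, and the budget figure are hypothetical names and values chosen for illustration.

```python
import subprocess


def parse_size(output: str) -> int:
    """Parse the byte count, assuming the output is one integer."""
    return int(output.strip())


def measure_bytes(package: str) -> int:
    """Run pip-size for one package and return its total size in bytes."""
    completed = subprocess.run(
        ["pip-size", package, "--quiet"],
        capture_output=True,
        text=True,
        check=True,   # raise if pip-size itself fails
    )
    return parse_size(completed.stdout)


def within_budget(size_bytes: int, budget_bytes: int) -> bool:
    return size_bytes <= budget_bytes


# Offline demonstration using the sample output shown above.
sample_size = parse_size("1234567\n")
sample_ok = within_budget(sample_size, budget_bytes=2_000_000)
```

In a pipeline you would call `measure_bytes("mypackage")` and exit with a non-zero status when `within_budget` returns False.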

Installation

pip install pip-size

Key Features

Zero downloads — uses PyPI JSON API only
Full dependency tree — includes all transitive dependencies
Extras support — see how requests[security] affects size
Proxy support — works with HTTP, SOCKS4, and SOCKS5 proxies
Caching — 24-hour cache to avoid repeated API calls
JSON output — integrate with your own tools

The Bigger Picture

We often obsess over code performance, but install size is an overlooked dimension of developer experience. Every megabyte you force users to download:

Slows down CI/CD pipelines
Increases container image sizes
Wastes bandwidth, especially in regions with limited connectivity
Frustrates users on slow connections

pip-size is my small step toward raising awareness about this issue. I hope it helps you make better decisions when choosing dependencies — and when publishing your own packages.

Give it a try and let me know what you think!

GitHub: mohammadraziei/pip-size

Why I Built pip-size: A Story About Obsession with Performance

Mohammad Raziei — Mon, 06 Apr 2026 21:07:01 +0000

It Started with a Simple Question

"How fast is it?"

That's the question I always ask when I write a Python package. Not "does it work?" — because obviously it works. The real question is: how fast is it compared to what already exists?

I've been building high-performance Python libraries for years. Libraries like:

yyaml, pygixml, serin, ctoon, novasvg, liburlparser

And the results? In many cases, 20x to 100x faster than the mainstream alternatives.

I have the benchmarks to prove it. I've spent countless hours profiling, optimizing, and benchmarking. I know exactly how fast my code runs.

But there was one question I couldn't answer easily:

"How big is it?"

The Problem Nobody Talks About

When you compare Python packages, everyone talks about:

Features
API simplicity
Community support
GitHub stars

But nobody talks about download size. And that's a problem.

Here's why: a package might be "lightweight" in source code, but its dependencies tell a different story.

Let me give you a real example. A few months ago, I was comparing HTTP libraries:

requests==2.33.1  63.4 KB  (total: 620.4 KB)
httpx==0.28.1  71.8 KB  (total: 560.0 KB)
aiohttp==3.13.5  1.7 MB  (total: 2.6 MB)

The package itself is small. But the total size tells a different story.

Now imagine you're choosing between two libraries:

Library A: 50 KB package, but pulls in 500 KB of dependencies
Library B: 200 KB package, but zero dependencies

Which one is really "lighter"?

That's the question I wanted to answer. But there was no tool to do it.

The Search for a Solution

I searched for existing tools. I found:

pip show — shows installed package size, but only for what's already installed
pip download — downloads everything to measure it (wasteful!)
Various size calculators — none of them considered the full dependency tree

The problem? You have to install the package to see its size. That's insane!

I wanted to know the size before installing. I wanted to see the full picture — the package plus every dependency, transitively.

So I did what any developer would do: I built it myself.

Introducing pip-size

pip-size calculates the real download size of PyPI packages and their dependencies. Zero downloads. No pip subprocess. Pure PyPI JSON API.

pip install pip-size

pip-size requests

🔍 Resolving 'requests'...
  ✓ requests==2.33.1  →  requests-2.33.1-py3-none-any.whl
    ✓ idna==3.11  →  idna-3.11-py3-none-any.whl
    ✓ certifi==2026.2.25  →  certifi-2026.2.25-py3-none-any.whl
    ✓ charset_normalizer==3.4.7  →  charset_normalizer-3.4.7-py3-none-any.whl
    ✓ urllib3==2.6.3  →  urllib3-2.6.3-py3-none-any.whl
  requests==2.33.1  63.4 KB  (total: 620.4 KB)
  ├── idna==3.11  69.3 KB
  ├── certifi==2026.2.25  150.1 KB
  ├── charset_normalizer==3.4.7  209.0 KB
  └── urllib3==2.6.3  128.5 KB

Now you can see:

The package size (63.4 KB)
The total size including all dependencies (620.4 KB)
The breakdown of each dependency

Features

Full dependency tree — see every transitive dependency
Extras support — check requests[security] or fastapi[standard]
JSON output — integrate with scripts
Proxy support — for restricted networks
Caching — 24-hour cache to avoid repeated requests

Why This Matters

When I'm developing high-performance libraries, size matters for several reasons:

1. Deployment

If you're shipping to edge devices, every megabyte counts. A library that claims to be "lightweight" but pulls in 500 MB of dependencies is not lightweight — it's a liability.

2. Cold Starts

In serverless environments (AWS Lambda, Google Cloud Functions), cold start time correlates with package size. Smaller packages = faster cold starts.

3. CI/CD

Smaller packages mean faster pip installs in your CI pipeline. Over hundreds of builds, this adds up.

4. User Trust

As a package maintainer, I want to be transparent about what I'm shipping. If my package is 100 KB but pulls in 50 MB of dependencies, users deserve to know.

The Bigger Picture

Building pip-size made me realize something: we've been comparing packages wrong.

When we see "package X is 50 KB" and "package Y is 200 KB," we assume X is lighter. But that's only half the story.

The real cost of a package is:

package size + size of all dependencies + size of their dependencies + ...

That's what pip-size reveals.
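To make that sum concrete, here is a minimal sketch that totals a dependency tree recursively. The `Node` class and `total_size` function are hypothetical names chosen for illustration, not pip-size internals; the sizes are copied from the requests output earlier in this post.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One package in a dependency tree (hypothetical structure)."""
    name: str
    size_kb: float
    deps: List["Node"] = field(default_factory=list)


def total_size(node: Node) -> float:
    """Own size plus the total size of every dependency, recursively."""
    return node.size_kb + sum(total_size(dep) for dep in node.deps)


# Sizes in KB, taken from the `pip-size requests` output above.
requests_tree = Node("requests", 63.4, [
    Node("idna", 69.3),
    Node("certifi", 150.1),
    Node("charset_normalizer", 209.0),
    Node("urllib3", 128.5),
])

print(f"{total_size(requests_tree):.1f} KB")
```

This lands within 0.1 KB of the 620.4 KB that pip-size reported; the small gap is presumably rounding, since the inputs here are already rounded to one decimal. Note that a naive recursion like this counts a package twice if two branches share it, so a real tool has to deduplicate.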

What's Next

I'm continuing to improve pip-size. Some ideas:

Compare multiple packages side-by-side
Show size trends over time
Integrate with dependency security tools
Add "size budget" warnings for CI

If you have ideas or want to contribute, the repo is open: github.com/mohammadraziei/pip-size

Final Thoughts

I've spent years optimizing for speed. Now I'm obsessed with size too.

Because at the end of the day, performance isn't just about how fast code runs — it's about how efficiently it reaches your users.

Have you ever been surprised by a package's hidden size? Let me know in the comments!

Links:

GitHub: github.com/mohammadraziei/pip-size
PyPI: pypi.org/project/pip-size

The Real Size of AI Frameworks: A Wake-Up Call

Mohammad Raziei — Mon, 06 Apr 2026 18:56:05 +0000

You Think You Know What You're Installing

When someone says "just install PyTorch," you probably think "how bad can it be?" It's a deep learning library, right? A few hundred megabytes, maybe?

Think again.

I built pip-size to expose the hidden cost of Python packages. And what I found in the AI ecosystem is... shocking.

The Numbers Don't Lie

I ran pip-size on the most popular AI frameworks. Here are the results:

Framework	Package Size	Total (with deps)
torch	506.0 MB	2.5 GB 🤯
tensorflow	545.9 MB	611.9 MB
paddlepaddle	185.8 MB	212.1 MB
jax	3.0 MB	137.1 MB
onnxruntime	16.4 MB	39.5 MB
transformers	9.8 MB	38.4 MB
keras	1.6 MB	29.5 MB

The PyTorch Surprise

Here's what happens when you pip install torch:

torch==2.11.0  506.0 MB  (total: 2.5 GB)
├── nvidia-cudnn-cu13==9.19.0.56  349.1 MB
├── nvidia-cublas==13.3.0.5  384.6 MB  [extra: cublas]
├── nvidia-nccl-cu13==2.28.9  187.4 MB
├── triton==3.6.0  179.5 MB
├── nvidia-cusparse==12.7.9.17  143.9 MB  [extra: cusparse]
├── nvidia-cusparselt-cu13==0.8.0  162.0 MB
├── nvidia-curand==10.4.2.51  57.1 MB  [extra: curand]
├── nvidia-cusolver==12.1.0.51  192.4 MB  [extra: cusolver]
└── ... (more CUDA libs)

2.5 GB. For a "simple" deep learning library.

The package itself is 506 MB, but CUDA dependencies add another ~2 GB. This is why your Docker images are huge. This is why your CI takes forever. This is why you need a 100GB disk just to do machine learning.
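As a quick sanity check on those numbers, this small sketch sums only the dependencies that are visible in the tree above. The figures are copied from that output; the elided "more CUDA libs" entries are not included because their sizes are not shown.

```python
# Sizes in MB, copied from the pip-size output above (listed entries only).
listed_deps_mb = {
    "nvidia-cudnn-cu13": 349.1,
    "nvidia-cublas": 384.6,
    "nvidia-nccl-cu13": 187.4,
    "triton": 179.5,
    "nvidia-cusparse": 143.9,
    "nvidia-cusparselt-cu13": 162.0,
    "nvidia-curand": 57.1,
    "nvidia-cusolver": 192.4,
}
torch_mb = 506.0

listed_total = sum(listed_deps_mb.values())
print(f"listed dependencies: {listed_total:.1f} MB")
print(f"torch + listed:      {torch_mb + listed_total:.1f} MB")
```

The eight listed entries alone come to roughly 1.6 GB, which is more than three times the size of torch itself. The remainder of the 2.5 GB total sits in the entries the tree elides.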

TensorFlow: The Heavy Champion

TensorFlow isn't far behind:

tensorflow==2.21.0  545.9 MB  (total: 611.9 MB)
├── keras==3.14.0  1.6 MB  (total: 8.3 MB)
│   └── ml-dtypes==0.5.4  4.8 MB
├── numpy==2.4.4  16.1 MB
├── h5py==3.14.0  4.3 MB
└── grpcio==1.80.0  6.5 MB

612 MB total. Keras helps (it's now bundled), but TensorFlow still brings a lot of baggage.

JAX: The Lightweight Contender?

JAX looks small at first glance — just 3 MB! But look closer:

jax==0.9.2  3.0 MB  (total: 137.1 MB)
├── jaxlib==0.9.2  79.4 MB
├── scipy==1.17.1  33.7 MB
└── numpy==2.4.4  16.1 MB

137 MB when you count everything. Still smaller than PyTorch and TensorFlow, but not "lightweight" by any means.

The Hidden Gems

ONNX Runtime: Only 39.5 MB

If you're deploying models and don't need the full training stack, ONNX Runtime is surprisingly compact:

onnxruntime==1.24.4  16.4 MB  (total: 39.5 MB)
├── numpy==2.4.4  16.1 MB
└── sympy==1.14.0  6.0 MB

That's 65x smaller than PyTorch. For inference, this is a game-changer.

Keras: Just 29.5 MB

Keras (the standalone version, not bundled with TensorFlow) is the lightest option:

keras==3.14.0  1.6 MB  (total: 29.5 MB)
├── numpy==2.4.4  16.1 MB
├── h5py==3.16.0  4.8 MB
└── ml-dtypes==0.5.4  4.8 MB

Perfect for when you want something simple without the enterprise overhead.

What This Means for You

1. Docker Images

If you're shipping PyTorch in a Docker image, plan for at least 3 GB. TensorFlow? 700 MB. ONNX Runtime? 50 MB.

Choose wisely based on your deployment constraints.

2. CI/CD

Every pip install torch in your CI pipeline costs time and bandwidth. Consider:

Caching wheels
Using lighter alternatives for testing
Installing only what's needed

3. Local Development

That "quick experiment" with PyTorch? It's 2.5 GB. Maybe JAX at 137 MB is enough for your use case.

Conclusion

The AI ecosystem is massive — literally. Before you pip install your next ML library, know what you're getting into.

Use pip-size to see the full picture:

pip install pip-size
pip-size torch
pip-size tensorflow
pip-size jax

Your disk space will thank you.

Links:

GitHub: github.com/mohammadraziei/pip-size
PyPI: pypi.org/project/pip-size

What's the biggest package surprise you've encountered? Let me know in the comments!

How I Discovered the Hidden Cost of "Lightweight" Python Packages

Mohammad Raziei — Mon, 06 Apr 2026 18:28:08 +0000

The "It's Just a Small Library" Trap

We've all been there. You find a Python package that promises to solve your problem with minimal overhead. The README says "lightweight," the GitHub stars look good, and the developer swears it's "just a few kilobytes."

So you install it, run your project, and wonder why your Docker image grew by 200MB.

What happened?

The package is small. But its dependencies aren't. And those dependencies have dependencies. And those... you get the idea.

The Moment I Realized Something Was Missing

I was comparing HTTP libraries for a new project. requests is popular, but everyone says it's "heavy." Then I found a library that claimed to be a "lightweight alternative."

But something in my gut said "let me check." So I built pip-size — a tool that calculates the real download size of PyPI packages and their dependencies, using only the PyPI JSON API. No downloads. No pip subprocess. Just data.

Install it:

pip install pip-size

Compare HTTP libraries fairly:

pip-size requests
pip-size httpx
pip-size aiohttp

The results might surprise you:

Package	Package Size	Total (with deps)
requests	63.4 KB	620.4 KB
httpx	71.8 KB	560.0 KB
aiohttp	1.7 MB	2.6 MB

httpx is often marketed as a "modern" alternative to requests, but the total size is almost identical! Meanwhile, aiohttp is over 4x larger — which makes sense since it's a full async framework, not just a client.

The Flask vs FastAPI Myth

Here's where it gets interesting. Flask is often called "lightweight" while FastAPI is labeled as "heavy." Let's verify:

pip-size flask
pip-size fastapi

Results:

Framework	Package Size	Total (with deps)
Flask	101.0 KB	606.2 KB
FastAPI	115.0 KB	2.9 MB

Flask is indeed smaller — about 5x smaller than FastAPI when you count everything.

But here's the nuance: FastAPI's size comes from pydantic (2.4 MB), which brings powerful data validation and automatic API documentation. You're not just getting a web framework — you're getting a complete API solution.

So "lightweight" depends on what you need. If you want simplicity and control, Flask wins. If you want automatic docs, validation, and type hints, FastAPI's "weight" is a feature, not a bug.

Real-World Use Cases

1. Compare Alternatives Fairly

pip-size httpx
pip-size requests
pip-size aiohttp

Now you can compare apples to apples — not just the package size, but the entire dependency tree.

2. Audit Your Own Packages

pip-size mypackage

See what you're actually shipping to your users. Sometimes you'll be surprised.

3. Spot the Heavy Culprit

When your project grows unexpectedly, run pip-size on your dependencies. You'll find which one is dragging in the bulk of the weight.

4. Understand Optional Extras

pip-size "requests[security]"
pip-size "fastapi[standard]"

See exactly how much each extra adds over the base package.

Why This Matters

In a world where:

Docker images need to be small
CI/CD pipelines need to be fast
Bandwidth isn't free (especially in developing countries)
Cold starts in serverless matter

Knowing the real cost of a dependency before you install it isn't a luxury — it's a necessity.

Wrapping Up

pip-size is open source (MIT license) and available on PyPI. It uses the PyPI JSON API, caches responses for 24 hours, and supports proxies if you need them.

Next time you see a package advertised as "lightweight," run pip-size first. Your future self (and your users) will thank you.

Have you ever been surprised by a package's hidden dependencies? Let me know in the comments!

Links:

GitHub: github.com/mohammadraziei/pip-size
PyPI: pypi.org/project/pip-size

Stop Writing Boilerplate Wrappers for C++ Bindings — Meet polybind

Mohammad Raziei — Sun, 05 Apr 2026 21:24:43 +0000

If you've spent time writing Python bindings for a C++ library with template
classes, you know the pattern. You expose Box<int32_t> as Box_int32,
Box<double> as Box_float64, and then you spend an afternoon writing the
same dispatch logic in Python to pretend they're one class. And then you do
it again for Matrix, Tensor, Pair, and every other template in the
library.

This post is about why that happens, and how one command fixes it.

The root of the problem

C++ templates don't exist at runtime — they're resolved at compile time.
When you use nanobind, pybind11, or Cython to expose a Box<T>, the binding
layer has no generic T to offer Python. You register each specialisation
separately:

// nanobind
nb::class_<Box<int32_t>>(m, "_Box__int32")
    .def(nb::init<int32_t>())
    .def("value", &Box<int32_t>::value);

nb::class_<Box<double>>(m, "_Box__float64")
    .def(nb::init<double>())
    .def("value", &Box<double>::value);

So Python gets two completely separate classes. isinstance, type(), and
every type-check in your codebase sees them as unrelated:

b = _mylib._Box__int32(10)
isinstance(b, _mylib._Box__float64)  # False
type(b) is _mylib._Box__int32        # True

The usual fix is a hand-written dispatcher:

_MAP = {int: _mylib._Box__int32, float: _mylib._Box__float64}

class Box:
    def __new__(cls, val):
        return _MAP[type(val)](val)

This breaks type(b) is Box, loses docstrings, kills IDE autocomplete, and
needs to be written again for every template class in the project.

Multi-parametric templates make it worse

When you have Pair<T1, T2>, you now need a two-dimensional dispatch table.
Pair__float64__int32, Pair__int32__int64, maybe more combinations. The
hand-written approach becomes a maintenance problem very quickly.

The polybind approach

polybind solves this by reading the .pyi stub your binding tool already
produces and generating the wrapper for you.

The naming convention is intentional: use double underscores to separate
template parameters in your class names, following numpy scalar type names:

_Box__int32           →  Box<int32_t>
_Box__float64         →  Box<double>
_Pair__float64__int32 →  Pair<double, int32_t>

Then run:

python -m nanobind.stubgen -m _mylib -o _mylib.pyi
polybind _mylib.pyi

That's it. mylib.py is written and ready to import.
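The naming convention is simple enough to parse mechanically. The helper below is a hypothetical illustration of that idea, not polybind's actual parser, and it ignores edge cases such as a `str_` suffix followed by another parameter.

```python
from typing import Tuple


def split_variant_name(class_name: str) -> Tuple[str, Tuple[str, ...]]:
    """Split '_Pair__float64__int32' into ('Pair', ('float64', 'int32'))."""
    # Drop the leading underscore, then split on the double-underscore separator.
    base, *suffixes = class_name.lstrip("_").split("__")
    return base, tuple(suffixes)


print(split_variant_name("_Box__int32"))            # ('Box', ('int32',))
print(split_variant_name("_Pair__float64__int32"))  # ('Pair', ('float64', 'int32'))
```

Grouping the parsed names by their base gives one public wrapper per template, with the suffix tuples as the keys of its dispatch table.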

What you get

from mylib import Box, Pair

# single-type: auto-detect from argument
b_int   = Box(42)
b_float = Box(3.14)
b_str   = Box("hello")

type(b_int) is Box          # True  ✅
isinstance(b_float, Box)    # True  ✅
b_int.value()               # 42    ✅

# multi-type: auto-detect from both arguments
p = Pair(3.14, 5)
p.first()                   # 3.14
p.second()                  # 5
type(p) is Pair             # True

# explicit dtype when auto-detect isn't enough
Box(1, dtypes=["float64"])
Pair(1, 2, dtypes=["int32", "int64"])

# partial dict — specify what matters, rest is auto
Pair(1.0, 2, dtypes={"first": "float64"})

# subscript to get the raw C++ class
Box["int32"]                # → _mylib._Box__int32
Pair[("float64", "int32")]  # → _mylib._Pair__float64__int32

How the type map works

The generated wrapper stores a map keyed by suffix tuples:

_type_map_box: ClassVar[Dict[tuple, type]] = {
    ('int32',):   _mylib._Box__int32,
    ('float64',): _mylib._Box__float64,
    ('str_',):    _mylib._Box__str_,
}

_type_map_pair: ClassVar[Dict[tuple, type]] = {
    ('float64', 'int32'): _mylib._Pair__float64__int32,
    ('int32',   'int64'): _mylib._Pair__int32__int64,
}

__new__ maps argument types to suffix tuples via _NUMPY_TYPE_MAP and
looks them up. For Pair(3.14, 5):

type(3.14).__name__ == 'float'  →  suffix 'float64'
type(5).__name__    == 'int'    →  suffix 'int32'
key = ('float64', 'int32')      →  _Pair__float64__int32

Binding-method agnostic

polybind never imports your C++ module at generation time. It only reads the
.pyi stub — plain text that every binding tool can produce:

Tool	Stub command
nanobind	`python -m nanobind.stubgen -m _mylib`
pybind11	`pybind11-stubgen _mylib`
Cython	stubgen via mypy

Switch tools tomorrow — the polybind command stays the same.

What else is preserved

Beyond dispatch, polybind also:

Reproduces @staticmethod, @classmethod, @property decorators from the stub — returning wrapper instances, not raw C++ objects
Carries docstrings through and rewrites variant class names (_Box__int32 → Box)
Generates full type annotations using typing.Union for method signatures
Accepts np.dtype objects in the dtypes argument if numpy is installed
Registers all C++ classes as virtual subclasses of the wrapper via ABC.register(), so isinstance(raw_cpp_obj, Box) is also True

A note on when dtypes is required

polybind infers template types from constructor arguments by matching Python
type annotations. If a template parameter isn't represented in the
constructor (a tag-dispatch pattern, for example), auto-detection isn't
possible. The generated wrapper will raise a clear TypeError at runtime
asking for an explicit dtypes list.

This is a deliberate design choice: fail loudly at construction time rather
than silently select the wrong variant.

Getting started

pip install polybind
polybind _mylib.pyi          # generates mylib.py
polybind _mylib.pyi --dry-run  # preview without writing

Source and docs: github.com/mohammadraziei/polybind

Feedback is very welcome, especially from projects using less common binding
tools or unusual template patterns. Open an issue with a sample .pyi and
I'll make sure it's handled correctly.

When Constraints Build Tools

Mohammad Raziei — Sun, 05 Apr 2026 00:52:08 +0000

The office network had rules. Strict ones.

No apt-get. No brew. No npm. No downloading binaries from the internet. If it wasn't on PyPI, it didn't exist. The IT policy was clear, the firewall was clearer, and the list of exceptions was empty.

I had one job: automate the documentation pipeline. Diagrams, architecture charts, flow diagrams — all written in Mermaid, all living as .mmd files in the repo, all needing to be rendered to SVG on every build. Simple enough, in theory.

The first thing I found was mermaid-cli. The official tool. Maintained by the Mermaid team themselves. I opened the installation docs, and the first line was:

npm install -g @mermaid-js/mermaid-cli

Closed the tab.

I kept searching. There was a Python package — mermaid-cli on PyPI. I felt a small rush of hope. I ran pip install. It installed. I ran it.

It printed:

playwright install chromium

Of course. Under the hood, it needed a browser. And installing a browser meant downloading a binary from the internet, outside of PyPI, which the network blocked. Even if it hadn't — I didn't want a browser. A browser meant hundreds of megabytes of dependency for what was, at its core, a text-to-SVG conversion.

The hope disappeared.

I sat with the problem for a while.

What does Mermaid.js actually need? I read the source. It needs a DOM. Not a full browser with tabs and network requests and a GPU process — just a DOM. document.createElement. querySelector. CSS computed styles. The ability to measure text. That's it.

The reason everyone reaches for a browser is that browsers are where DOMs live. But a DOM and a browser aren't the same thing.

I remembered PhantomJS.

Most people think PhantomJS is dead. And for what it was originally built for (web scraping, UI testing, automated screenshots of modern sites) it is. Headless Chrome displaced it for those use cases, development was suspended in 2018, and the project hasn't had a release since.

But PhantomJS is, underneath all of that, a self-contained WebKit binary. It has a JavaScript engine. It has a real DOM. And it ships as a single executable file — no installation, no system dependencies, no apt-get required.

More importantly: it was on PyPI. Wrapped, bundled, ready to pip install.

The question was whether I could build something thin and clean on top of it. Not a web scraping tool. Not a browser automation framework. Just: run this JavaScript file, give it a DOM, capture what it prints to stdout.

That was phasma.

The first version was small. Almost embarrassingly small. A Python class that started a PhantomJS subprocess, wrote a JS file to a temp directory, ran it, and captured the output. No async, no fancy API, no browser context abstraction. Just driver.exec.

from phasma.driver import exec as run_js
output = run_js("render_diagram.js", capture_output=True)

I pointed it at Mermaid.js. Wrote a small script that loaded the library, created a DOM element, called mermaid.render(), and printed the SVG to stdout.

It worked.

The whole thing — PhantomJS starting up, loading Mermaid, rendering the diagram, printing SVG — took about 800 milliseconds. For a CI pipeline that ran once per push, that was completely acceptable.

mmdc was maybe two hundred lines of Python on top of that. Read the .mmd file. Pass the content to phasma. Capture the SVG. Write it to disk. Done.

pip install mmdc
mmdc --input architecture.mmd --output architecture.svg

No Node.js. No npm. No browser. No apt-get. Just pip — the one thing the network allowed.
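The shape of that pipeline (read the file, render, write the result) fits in a few lines. This is only a sketch under stated assumptions: `convert` and `fake_render` are hypothetical names, and the renderer is passed in as a plain callable so the example runs without PhantomJS. In mmdc the rendering step goes through phasma.

```python
import tempfile
from pathlib import Path
from typing import Callable


def convert(mmd_path: Path, svg_path: Path, render: Callable[[str], str]) -> None:
    """Read Mermaid source, hand it to a renderer, write the SVG to disk."""
    source = mmd_path.read_text(encoding="utf-8")
    svg = render(source)  # the real tool would call into phasma here
    svg_path.write_text(svg, encoding="utf-8")


def fake_render(source: str) -> str:
    """Placeholder renderer so the sketch runs offline."""
    return f"<svg><!-- {len(source)} chars of Mermaid --></svg>"


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "architecture.mmd"
    out = Path(tmp) / "architecture.svg"
    src.write_text("graph TD\n  A --> B\n", encoding="utf-8")
    convert(src, out, fake_render)
    result = out.read_text(encoding="utf-8")

print(result)
```

Keeping the renderer behind a single callable is what makes the rest of the tool small: everything outside that one call is ordinary file handling.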

There's a version of this story where I found a better solution. Where someone had already built the right thing and I just hadn't searched hard enough. Where the constraint turned out to be navigable with an existing tool.

That version didn't happen.

What happened instead is that the constraint — only PyPI, nothing else — pushed me into a corner narrow enough that the only way out was to build something. And the thing I built turned out to be useful beyond the original problem.

People use mmdc now in Docker containers where they don't want a browser. In CI pipelines where Node.js isn't available. In air-gapped environments where the internet doesn't exist. The constraint that created the tool turns out to be a constraint a lot of people have.

phasma grew a little after that. A Playwright-inspired async API got added — not because mmdc needed it, but because the lower layer was interesting enough to build on. That part is still rough around the edges, still needs work, still has edge cases that aren't handled cleanly. It's the part of the project that's most alive, and most in need of people who want to dig into Python async internals. The door is open.

But the core — driver.exec, a bundled PhantomJS binary, a DOM you can use from Python with nothing but pip — that part works. It works because it had to.

The firewall never had to open. The diagrams appeared in the documentation. The pipeline ran.

The constraint didn't block the solution — it was the solution.

Links:

phasma on GitHub — if the async API interests you, PRs are open
mmdc on GitHub
phasma on PyPI · mmdc on PyPI

I Needed to Run Mermaid.js in Python. So I Built Two Libraries.

Mohammad Raziei — Sat, 04 Apr 2026 23:09:59 +0000

It started with a single line in a requirements doc:

"Diagrams should be auto-generated as part of the build pipeline."

Simple enough, right? I was building a documentation automation tool in Python. The diagrams were written in Mermaid — clean, text-based, version-controlled. All I needed was to convert .mmd files to SVG during the build.

I looked up the standard way to do it.

npm install -g @mermaid-js/mermaid-cli

I stared at that line for a moment. This was a Python project. A pure Python project. And now I needed Node.js, npm, and — I kept reading — Chromium running headlessly in the background, just to turn a text file into an SVG.

I closed the tab.

The Search for Alternatives

Surely someone had solved this already. I started digging through PyPI.

The Python packages that claimed to render Mermaid either called out to the npm tool under the hood (so you still needed Node.js), hit a third-party API (so you needed internet access and an API key), or just... generated the Mermaid syntax and left the rendering to you.

None of them actually rendered diagrams. Locally. In Python. Without external dependencies.

I went deeper. What does Mermaid.js actually need to render? I read through the source. It needs a real DOM — document.createElement, CSS computed styles, SVG measurement APIs. It's not just parsing text; it's doing real browser-level layout to figure out where nodes go.

That's why everyone reaches for a browser. Mermaid genuinely needs one.

The Realization

At some point I remembered PhantomJS.

Most people think of PhantomJS as a dead project, and for web scraping and UI testing, it is. Headless Chrome, and later Playwright, took over those use cases. But PhantomJS is, at its core, a self-contained WebKit binary with a full DOM implementation. Development stopped in 2018, but it also hasn't needed a new release for what I needed it for. It's frozen in time, which for a reproducible build environment is actually a feature.

The question was: could I build a clean Python interface around it that would let me inject Mermaid.js into PhantomJS and capture the SVG output?

I started building phasma.

Building Phasma

The first design goal was zero setup. If you had to install PhantomJS separately, I hadn't actually solved the original problem. So phasma bundles the PhantomJS binary directly — it ships with the package, across Windows, Linux, and macOS.

pip install phasma
# That's it. PhantomJS is included.

The core of phasma is driver.exec — a way to run JavaScript files directly through the bundled PhantomJS binary, with full DOM and WebKit support:

from phasma.driver import exec as phantomjs_exec

# Run any JS file with full PhantomJS/WebKit environment
result = phantomjs_exec("my_script.js", capture_output=True)

This was actually all that mmdc needed. I just needed to run a JS file that loaded Mermaid, rendered a diagram, and printed the SVG. The driver.exec interface handled it cleanly.

But while I was at it, I kept going.

The Playwright-like API

Once the core worked, the API felt obvious: make it look like Playwright. If you've used modern browser automation in Python, Playwright's API is the gold standard. Clean async, familiar method names, intuitive page/browser hierarchy.

import asyncio
from phasma import launch

async def main():
    browser = await launch()
    page = await browser.new_page()

    await page.goto("https://example.com")
    title = await page.text_content("h1")
    await page.screenshot(path="capture.png")
    await page.pdf(path="page.pdf")

    await browser.close()

asyncio.run(main())

The async implementation still has rough edges — this is an area where contributions are genuinely welcome. If you're comfortable with Python async internals and want to help bring the Playwright-like API to full stability, the repo is open and PRs are very much appreciated.

Then Building mmdc

With phasma working, building mmdc took surprisingly little code. The hard problem was already solved.

from mmdc import MermaidConverter
from pathlib import Path

converter = MermaidConverter()
converter.to_svg("graph TD\n  A --> B --> C", output_file=Path("diagram.svg"))
converter.to_png("graph TD\n  A --> B --> C", output_file=Path("diagram.png"))
converter.to_pdf("graph TD\n  A --> B --> C", output_file=Path("diagram.pdf"))

No Node.js. No npm. No Chromium. Just:

pip install mmdc

The CLI mirrors the official mermaid-cli syntax, so if you're already familiar with it, switching is trivial:

mmdc --input diagram.mmd --output diagram.svg
mmdc --input diagram.mmd --output diagram.png --timeout 60

What I Actually Shipped

Two packages, one problem:

phasma — a Python interface for PhantomJS with a bundled binary, driver.exec for direct JS execution, and a Playwright-inspired async API (in active development). For anyone who needs to run JavaScript with real DOM support inside Python.

mmdc — a Mermaid diagram converter built on top of phasma. Converts .mmd files to SVG, PNG, and PDF. Fully offline, no system dependencies beyond pip install mmdc.

Both are on PyPI. Both are MIT licensed. And both genuinely do what they claim.

The Honest Part

PhantomJS is old. It doesn't support most of ES2015 (ES6), let alone ES2020+. Its async story required careful handling. And the Playwright-like API in phasma is still maturing: the sync paths are solid, but the full async implementation needs more work and testing.

But for the original problem — render Mermaid diagrams from Python with zero external dependencies — it works. Reliably. On every platform.

If you're building documentation tooling, CI pipelines, or any Python project that needs diagrams without Node.js, give mmdc a try. And if you're interested in the lower-level plumbing — running arbitrary JavaScript with a real DOM inside Python — phasma is the piece you want.

⭐ If either of these saves you from writing npm install in a Python project, a star on GitHub goes a long way.

And if you want to help stabilize the async Playwright-like API in phasma — PRs are open and very welcome.

Your Package Is Not As Lightweight As You Think

Mohammad Raziei — Sat, 04 Apr 2026 19:34:35 +0000

There's a claim you've probably seen in a README before:

"Zero dependencies. Lightweight. Minimal footprint."

It sounds great. But most of the time, it's only half the story.

The Weight You Don't See

When you run pip install some-package, you're not installing one thing. You're installing that package plus every library it depends on, plus every library those libraries depend on. The size printed on PyPI is just the tip of the iceberg.

This matters a lot in constrained environments: embedded systems, Docker containers you want to keep slim, serverless functions with cold start sensitivity, CI pipelines that install fresh on every run, or HPC clusters where storage quotas are real and network bandwidth costs time.

And yet, almost no one measures this before claiming their package is "lightweight."

A Real Example: XML Parsing in Python

I was building pygixml, a Python binding for the pugixml C++ library, aimed at high-performance XML parsing. At some point, I claimed it was lighter than the alternatives.

But lighter compared to what, exactly? And measured how?

I wrote a small tool called pip-size to find out. It queries the PyPI JSON API and calculates the real download size of a package — the wheel file itself — along with the complete transitive dependency tree. No downloads, no installs, no guesswork.

Here's what it showed for the three main Python XML parsing libraries:

$ pip-size pygixml
  pygixml==0.6.0  167.3 KB

$ pip-size pugixml
  pugixml==0.7.0  375.1 KB

$ pip-size lxml
  lxml==6.0.2  5.0 MB

The comparison holds up: pygixml is about 2.2× lighter than pugixml and roughly 30× lighter than lxml. In this case none of the three have significant Python-level dependencies, so the package itself is the story.

But that's not always the case.

When Dependencies Change Everything

Let me show you a more dramatic scenario. Imagine you're choosing an HTTP client for a minimal service:

$ pip-size httpie
  httpie==3.2.4  119.2 KB  (total: 4.1 MB)
  ├── requests==2.32.5  63.2 KB  (total: 834.8 KB)
  │   ├── urllib3==2.3.0  341.8 KB
  │   ├── charset-normalizer==3.4.1  204.8 KB
  │   ├── certifi==2025.1.31  164.0 KB
  │   └── idna==3.10  61.4 KB
  ├── rich==13.9.4  238.1 KB  (total: 1.2 MB)
  │   ├── markdown-it-py==3.0.0  87.3 KB
  │   └── pygments==2.19.1  4.4 MB
  └── ...

The package itself is 119 KB. Its total footprint is 4.1 MB. That's a 34× multiplier hidden behind a single pip install.

This is not a criticism of httpie — it's a fully-featured CLI tool and those dependencies are justified. The point is that the number on the PyPI page is almost never the number that matters.

The Fairness Problem

Here's the thing that bothered me when I started thinking about this:

If library A claims to be "lightweight" and library B doesn't make that claim, but A pulls in 800 KB of dependencies while B pulls in 200 KB — who's actually lighter?

The "lightweight" claim is often made based on the package's own size, or the number of dependencies, rather than the actual bytes that land on disk. Neither of those is a fair measure.

A fair comparison looks at the full dependency tree. And that's what pip-size is designed to make easy — no installation required, just a quick query against PyPI's public API.
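As a rough idea of what such a query involves, the sketch below reads wheel sizes out of a PyPI JSON response. The endpoint `https://pypi.org/pypi/{package}/json` and the `urls`, `filename`, `size`, and `packagetype` fields are part of PyPI's public API; the function names and the sample data are made up for illustration, and this is not pip-size's own code.

```python
import json
from urllib.request import urlopen


def fetch_metadata(package: str) -> dict:
    """Download the JSON metadata for the latest release of a package."""
    with urlopen(f"https://pypi.org/pypi/{package}/json", timeout=10) as resp:
        return json.load(resp)


def wheel_sizes(metadata: dict) -> dict:
    """Map each wheel filename in the release to its size in bytes."""
    return {
        f["filename"]: f["size"]
        for f in metadata["urls"]
        if f["packagetype"] == "bdist_wheel"
    }


# Made-up sample in the shape of the API response, so this runs offline.
sample = {
    "urls": [
        {"filename": "demo-1.0-cp311-cp311-manylinux_2_17_x86_64.whl",
         "packagetype": "bdist_wheel", "size": 5_000_000},
        {"filename": "demo-1.0-cp311-cp311-win_amd64.whl",
         "packagetype": "bdist_wheel", "size": 3_800_000},
        {"filename": "demo-1.0.tar.gz", "packagetype": "sdist", "size": 4_200_000},
    ]
}

for filename, size in wheel_sizes(sample).items():
    print(f"{filename}: {size / 1024:.1f} KB")
```

Swapping `sample` for `fetch_metadata("requests")` runs the same logic against the live index. A release with compiled code usually lists several wheels of different sizes, so the number depends on which wheel matches your platform.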

How pip-size Works

The tool uses PyPI's JSON API (https://pypi.org/pypi/{package}/json) to:

Resolve the correct version based on your specifier
Select the right wheel for your platform, using the same priority logic as pip itself
Walk the requires_dist metadata to find all dependencies
Resolve each dependency recursively, in concurrent BFS layers
Report the size at every level of the tree
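The layered walk in the steps above can be sketched as follows. This is a simplified, sequential illustration, not pip-size's implementation: `FAKE_INDEX`, `required_names`, and `resolve_sizes` are hypothetical, version specifiers and environment markers are ignored, and the real tool resolves each layer concurrently.

```python
import re
from typing import Callable, Dict, List

# Hypothetical stand-in for PyPI metadata: wheel size in bytes plus requires_dist.
FAKE_INDEX: Dict[str, dict] = {
    "app": {"size": 100,
            "requires_dist": ["lib-a>=1.0", "lib-b", 'extra-only; extra == "dev"']},
    "lib-a": {"size": 200, "requires_dist": ["lib-c"]},
    "lib-b": {"size": 300, "requires_dist": ["lib-c"]},
    "lib-c": {"size": 400, "requires_dist": None},
}


def required_names(requires_dist) -> List[str]:
    """Extract dependency names, skipping anything gated behind an extra."""
    names = []
    for req in requires_dist or []:
        spec, _, marker = req.partition(";")
        if "extra" in marker:
            continue  # only installed when the extra is requested
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", spec.strip())
        if match:
            names.append(match.group(0).lower())
    return names


def resolve_sizes(root: str, fetch: Callable[[str], dict]) -> Dict[str, int]:
    """Breadth-first walk of the dependency graph; each package counted once."""
    sizes: Dict[str, int] = {}
    layer = [root]
    while layer:
        next_layer: List[str] = []
        for name in layer:
            if name in sizes:
                continue  # already resolved via another branch
            meta = fetch(name)
            sizes[name] = meta["size"]
            next_layer.extend(required_names(meta["requires_dist"]))
        layer = next_layer
    return sizes


sizes = resolve_sizes("app", FAKE_INDEX.__getitem__)
print(sizes, sum(sizes.values()))
```

In this toy index `lib-c` is reachable through both `lib-a` and `lib-b` but is counted once, so the total is 1000 bytes. Because each layer only depends on the one before it, the fetches within a layer are independent, which is what makes concurrent resolution possible.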

The output is a tree where every intermediate node shows both its own size and the total weight of its subtree — so you can see at a glance which dependency is responsible for the bulk of the footprint.

  fastapi==0.115.12  276.3 KB  (total: 1.1 MB)
  ├── starlette==0.46.1  254.0 KB  (total: 481.2 KB)
  │   └── anyio==4.9.0  227.2 KB
  ├── pydantic==2.11.3  440.5 KB  (total: 1.6 MB)  ← here's your culprit
  │   ├── pydantic-core==2.33.1  1.8 MB
  │   └── ...
  └── ...

When to Use This

A few concrete situations where this kind of measurement is useful:

Before publishing a library. If you're telling users your library is lightweight, measure it. Run pip-size your-package and check whether the claim survives contact with reality.

Choosing between alternatives. pip-size requests vs pip-size httpx vs pip-size aiohttp gives you a side-by-side cost comparison without installing anything.

Auditing a project's dependencies. pip-size your-project before a Docker build tells you where the size is coming from and which dependency is worth optimizing.

CI size budgets. The --quiet --bytes flags output a raw number, which you can compare against a threshold in a shell script or GitHub Action.

SIZE=$(pip-size my-package --quiet --bytes)
if [ "$SIZE" -gt 5000000 ]; then
  echo "Package exceeds 5 MB size budget"
  exit 1
fi

Optional Dependencies

One more thing worth mentioning: pip-size handles optional dependencies (extras) correctly.

By default, it only includes dependencies that are always required — the same ones pip would install for a plain pip install package. If you want to see the cost of enabling specific extras:

$ pip-size "requests[security]"
  requests==2.32.5  63.2 KB  (total: 1.2 MB)
  ├── urllib3==2.3.0  341.8 KB
  ├── ...
  ├── cryptography==44.0.3  518.2 KB  [extra: security]
  └── pyOpenSSL==25.0.0  112.4 KB    [extra: security]

Or if you want the absolute worst-case footprint — every optional dependency across the entire tree — use --all-extras.

Conclusion

The size shown on a PyPI package page is that package's size. The size that actually matters is the size of everything it brings with it.

Before claiming a package is lightweight, measure it. Before choosing between libraries, compare their full footprint. Before shipping a container, know what's in it.

pip-size is available on PyPI:

pip install pip-size

Source: github.com/mohammadraziei/pip-size

The numbers in this article were obtained on Python 3.11 / Linux x86_64. Sizes vary by platform and Python version because pip selects different wheels.

🎉 Big News for Python Developers & Mermaid Fans: "mmdc" Makes Mermaid Diagrams Easy as Python! 🚀

Mohammad Raziei — Thu, 08 Jan 2026 09:02:36 +0000

If you love Mermaid diagrams (flowcharts, sequence diagrams, Gantt charts, pie charts, and more) but have ever felt stuck because you had to install Node.js, npm, browsers, or other system tools just to generate diagram files, today is your day!
Say hello to mmdc, the Python‑native Mermaid diagram converter that finally lets you generate beautiful diagrams straight from Python, with no external installs, no system packages, and no extra runtime hassles! 🙌

🧠 First — What Is Mermaid?

Mermaid is an open‑source diagramming tool that lets you define diagrams using simple, text‑based syntax — very similar to Markdown — and render them into real diagrams.
You write plain text like:

graph TD
  A --> B
  B --> C

…and Mermaid turns it into a visual flowchart you can embed in docs, wikis, blogs, or technical writing. It’s fast to learn, version‑control friendly, and integrates with many tools.

Mermaid has become super popular in documentation, engineering teams, and developer blogs precisely because diagrams become code: no GUI drag‑and‑drop tools, no files to manage manually, just text that lives with your project.

🌟 Why mmdc Is a Game Changer

Traditionally, if you wanted to convert Mermaid into SVG, PNG, or PDF, you needed:

✔ Node.js
✔ npm
✔ Mermaid CLI
✔ Browsers or headless workers
✔ Extra system tools

That always felt like overkill for something as simple as turning text into a diagram.

mmdc changes all that. It’s a Python‑native solution, installable with a single pip install, and it works without installing ANY external tools like system packages or browsers.

It builds on Phasma, which drives a PhantomJS binary bundled inside the package to render Mermaid code into real diagram outputs, so you never have to install anything else yourself. This makes it a good fit for Python environments, automation, docs pipelines, and CI/CD workflows.

🚀 Installation

Just run:

pip install mmdc

That’s all — you’re ready to go! No Node.js, npm, apt installs, or browsers required.

🌈 Use It from the Command Line

Convert a simple Mermaid file to SVG:

mmdc --input my_diagram.mmd --output my_diagram.svg

Create PNG or PDF just by specifying the extension:

mmdc --input my_diagram.mmd --output my_diagram.png
mmdc --input my_diagram.mmd --output my_diagram.pdf

Perfect for automated doc builds, static site generators, or even blog pipelines!
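For a docs build with many diagrams, the same command can be generated per file. The sketch below assumes only the `--input` and `--output` flags shown above and that the output format follows the file extension. The function names and directory layout are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def build_commands(src_dir: Path, out_dir: Path, fmt: str = "svg") -> List[List[str]]:
    """Build one mmdc command line per .mmd file found in src_dir."""
    commands = []
    for source in sorted(src_dir.glob("*.mmd")):
        target = out_dir / (source.stem + "." + fmt)
        commands.append(["mmdc", "--input", str(source), "--output", str(target)])
    return commands


def convert_all(src_dir: Path, out_dir: Path, fmt: str = "svg") -> None:
    """Run every generated command, stopping at the first failure."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for command in build_commands(src_dir, out_dir, fmt):
        subprocess.run(command, check=True)
```

Keeping command construction separate from execution means the file-to-output mapping can be checked in a dry run before anything is rendered.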

🐍 Use It in Python Too

Want to generate diagrams right inside your Python code? No problem:

from mmdc import MermaidConverter

converter = MermaidConverter()

mermaid_text = """
graph TD
    A[Start] --> B{Is it cool?}
    B -->|Yes| C[Love it!]
    B ---->|No| D[Try again]
"""

converter.to_svg(mermaid_text, output_file="cool_diagram.svg")

Simple, powerful, and integrates cleanly with Python applications, docs generators, notebook workflows, and automation scripts!
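One place this fits is a docs generator that pulls Mermaid blocks out of Markdown files and renders each one. The sketch below covers only the extraction step and takes the renderer as a callable, so it makes no assumptions about mmdc beyond the `to_svg(text, output_file=...)` call shown above. `extract_mermaid` and `render_all` are hypothetical helper names.

```python
import re
from typing import Callable, List

FENCE = "`" * 3  # a Markdown code fence: three backticks

# Matches a fenced block whose info string is "mermaid" and captures its body.
MERMAID_BLOCK = re.compile(
    r"^" + FENCE + r"mermaid[ \t]*\n(.*?)^" + FENCE + r"[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def extract_mermaid(markdown: str) -> List[str]:
    """Return the body of every mermaid fenced block, in document order."""
    return [match.group(1) for match in MERMAID_BLOCK.finditer(markdown)]


def render_all(markdown: str, render: Callable[[str, str], None],
               prefix: str = "diagram") -> List[str]:
    """Call render(text, output_file) for each block; return the file names."""
    names = []
    for index, text in enumerate(extract_mermaid(markdown), start=1):
        name = f"{prefix}_{index}.svg"
        render(text, name)
        names.append(name)
    return names
```

With the converter from the example above, the renderer could be passed as `lambda text, path: converter.to_svg(text, output_file=path)`.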

📊 Example Mermaid Code Snippets

Here are a few Mermaid diagrams you can try:

📈 Flowchart

graph LR
    A[Idea] --> B[Develop]
    B --> C[Test]
    C --> D[Deploy]

🔁 Simple Loop

flowchart TD
    Start --> Process
    Process --> Review
    Review -->|OK| End
    Review -->|Fix| Process

⏱️ Sequence Diagram

sequenceDiagram
    Alice->>Bob: Hello Bob!
    Bob-->>Alice: Hi Alice!
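Because these diagrams are plain text, they can also be generated from data instead of written by hand. As a small sketch, the hypothetical helper below turns an edge list into flowchart text in the same shape as the Simple Loop example above. It uses only the `flowchart`, `-->`, and `-->|label|` syntax already shown.

```python
from typing import Iterable, Optional, Tuple

Edge = Tuple[str, str, Optional[str]]  # (source, target, optional label)


def edges_to_mermaid(edges: Iterable[Edge], direction: str = "TD") -> str:
    """Render an edge list as Mermaid flowchart text."""
    lines = [f"flowchart {direction}"]
    for source, target, label in edges:
        # Labeled edges use the -->|label| form; unlabeled ones use a plain arrow.
        arrow = f"-->|{label}|" if label else "-->"
        lines.append(f"    {source} {arrow} {target}")
    return "\n".join(lines) + "\n"
```

The resulting string can be written to a .mmd file or handed to a converter, which keeps the diagram in sync with whatever data produced it.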

💡 Why This Matters

🧩 No external setup: Python devs finally get Mermaid without any extra installs.
🛠 Fits docs automation: Great for Sphinx, MkDocs, Jupyter notebooks, and CI/CD.
📦 Python‑centric workflows: Treat diagrams as first‑class parts of your codebase.

🎉 Wrap Up

If you’ve ever wanted a clean, Python‑only way to generate Mermaid diagrams, mmdc is huge news. It brings a beloved text‑based diagramming approach straight into the Python ecosystem — all with a single pip install.

Now diagrams truly can be code first — versioned, automated, lightweight, and beautiful — without the weight of external toolchains. 💥
