Generate Blazing-Fast Ad-Hoc Python Functions From Declarative Rules

#python #performance #etl #dataengineering

“I just need to turn this list of dicts into that list of dicts—why am I writing 200 lines of nested loops?”

If you’re a Python dev who keeps rewriting the same for loops to transform dicts, lists, or CSV streams, read on.

Today I’ll show you how to replace those loops with convtools, a small library that turns declarative Python expressions into plain Python functions: the code is generated and compiled once, then reused on every call.

pip install convtools

The 30-second pitch

You have 10K JSON blobs with ugly external keys and you need them renamed now.
This is the entire solution:

from convtools import conversion as c

data = [
    {"external key1": 1, "external key2": 2, "external key3": 3}
] * 10000

config = {
    "key1": "external key1",
    "key2": "external key2",
    "key3": "external key3",
}

# BEFORE: 1.33 ms ± 6.8 μs
results = []
for item in data:
    new_item = {k: item[v] for k, v in config.items()}
    results.append(new_item)

# AFTER (prepare once): 46.5 μs ± 133 ns
converter = c.list_comp(
    {k: c.item(v) for k, v in config.items()}
).gen_converter()

# AFTER (run many times): 801 μs ± 5.56 μs, about 1.66x faster;
# the same ratio holds for 10x more data
converter(data)

# What convtools actually compiles for you
# ────────────────────────────────────────
def _converter(data_):
    try:
        return [
            {
                "key1": _i["external key1"],
                "key2": _i["external key2"],
                "key3": _i["external key3"],
            }
            for _i in data_
        ]
    except __exceptions_to_dump_sources:
        __convtools__code_storage.dump_sources()
        raise

That’s it.
No pandas dependency, no per-row interpretation of the config at call time, no hidden global state: just a single generated Python function.
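
If you are wondering how a config dict can become a function at all, here is a minimal sketch of the general technique: build source text from the config, compile it once, then call the result as often as you like. This is not convtools' actual code generator; `build_renamer` is a hypothetical helper written for this post, and it uses only the standard library.

```python
def build_renamer(config):
    """Hypothetical helper: turn a {new_key: old_key} mapping into a function.

    Illustrates the generate-then-compile idea only; not convtools' real code.
    """
    # repr() quotes the keys so they can be embedded safely in source text
    fields = ", ".join(
        f"{new!r}: _i[{old!r}]" for new, old in config.items()
    )
    source = (
        "def _converter(data_):\n"
        f"    return [{{{fields}}} for _i in data_]\n"
    )
    namespace = {}
    # Compile once; the resulting function never touches `config` again
    exec(compile(source, "<generated>", "exec"), namespace)
    return namespace["_converter"]


config = {"key1": "external key1", "key2": "external key2"}
rename = build_renamer(config)
result = rename([{"external key1": 1, "external key2": 2}])
```

The speedup in the benchmark above plausibly comes from the same place: the key names are baked into the generated source, so the inner loop no longer iterates over `config.items()` for every row.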

⚡ More 30-second snacks

CSV ➜ grouped JSON:

from convtools import conversion as c
from convtools.contrib.tables import Table

converter = c.group_by(c.item("country")).aggregate(
    {
        "country": c.item("country"),
        # CSV cells arrive as strings, so cast before summing
        "total": c.ReduceFuncs.Sum(c.item("sales").as_type(float)),
    }
).gen_converter()
stream = Table.from_csv("big.csv", header=True).into_iter_rows(dict)
converter(stream)
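
For comparison, this is roughly the hand-written loop that such a converter replaces, sketched with the standard library only. The inline CSV text and the `group_sales` name are made up for illustration; a real run would read `big.csv` instead.

```python
import csv
import io


def group_sales(rows):
    """Sum `sales` per `country`, keeping countries in first-seen order."""
    totals = {}
    for row in rows:
        country = row["country"]
        # CSV cells are strings, so cast before adding
        totals[country] = totals.get(country, 0.0) + float(row["sales"])
    return [{"country": k, "total": v} for k, v in totals.items()]


# Stand-in for the real file, so the sketch runs as-is
sample = io.StringIO("country,sales\nUS,10\nDE,5\nUS,2.5\n")
grouped = group_sales(csv.DictReader(sample))
```

A loop like this is also a handy reference implementation when you want to check a declarative converter against known-good output.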

pull a nested value out of a dict:

c.item("user", "profile", "name").as_type(str)
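
In plain Python, that expression corresponds roughly to chained indexing followed by a cast. The `get_path` helper below is a hypothetical name used only to spell the idea out.

```python
def get_path(data, *keys):
    """Follow `keys` one level at a time, like data[k1][k2][k3]."""
    for key in keys:
        data = data[key]
    return data


record = {"user": {"profile": {"name": "Ada"}}}
name = str(get_path(record, "user", "profile", "name"))
```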

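left join two sequences:

c.join(
    c.item(0),  # left sequence: first element of the input pair
    c.item(1),  # right sequence: second element of the input pair
    c.and_(
        c.LEFT.attr("id") == c.RIGHT.attr("id"),  # join key
        c.RIGHT.attr("other") <= 1000.0,  # extra filter on the right side
    ),
    how="left",  # keep left items that have no match
)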
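
To make the join condition concrete, here is a naive nested-loop version in plain Python. It assumes the result is a sequence of `(left, right)` pairs with `None` standing in for a missing right item; treat that output shape, and the `left_join` name, as assumptions of this sketch and not as a statement about convtools internals.

```python
from types import SimpleNamespace


def left_join(left, right):
    """Left join on `id`, accepting only right items with other <= 1000.0."""
    pairs = []
    for left_item in left:
        matches = [
            right_item
            for right_item in right
            if left_item.id == right_item.id and right_item.other <= 1000.0
        ]
        if matches:
            pairs.extend((left_item, right_item) for right_item in matches)
        else:
            # No acceptable match: keep the left item anyway
            pairs.append((left_item, None))
    return pairs


left = [SimpleNamespace(id=1), SimpleNamespace(id=2)]
right = [
    SimpleNamespace(id=1, other=5.0),
    SimpleNamespace(id=2, other=5000.0),  # fails the `other` filter
]
joined = left_join(left, right)
```

This version is quadratic in the input sizes, which is exactly the kind of loop worth handing to a library once the data grows.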