DEV Community

Cover image for 8 small data transforms I don't want to write as shell glue anymore
Gaëtan Montury
Gaëtan Montury

Posted on

8 small data transforms I don't want to write as shell glue anymore

🦀🐍 Practical fimod examples for small CI/config data transforms with a Python-like taste.

In the first article,
I wrote about why I built fimod: I kept seeing tiny python3 -c scripts in CI pipelines, doing just enough JSON/YAML/CSV work to be useful, and just enough shell quoting to become annoying.

This article is more concrete.

Here are 8 small transforms that I do not want to keep rewriting as shell glue, ad-hoc Python snippets, or copied scripts between repositories.

fimod is not trying to replace jq or yq. Those are excellent tools. I see fimod more as a small data-shaping Swiss army knife: a few megabytes, Python-like expressions, common structured formats, reusable molds, and a controlled execution surface.

📚 Website/docs: https://pytgaen.github.io/fimod/

⭐ Repository: https://github.com/pytgaen/fimod


1. Extract one value for a shell script

Sometimes a CI job only needs one value from a structured file.

{"name":"demo","version":"1.2.3"}
Enter fullscreen mode Exit fullscreen mode

With fimod:

fimod s -i package.json \
  -e 'data["version"]' \
  --output-format txt
Enter fullscreen mode Exit fullscreen mode

Output:

1.2.3
Enter fullscreen mode Exit fullscreen mode

That makes it easy to use in shell:

VERSION=$(fimod s -i package.json -e 'data["version"]' --output-format txt)
Enter fullscreen mode Exit fullscreen mode

No JSON boilerplate, no quotes around the string, no temporary script.


2. Validate a config file in CI

For boolean checks, --check suppresses stdout and maps truthiness to the exit code.

fimod s -i deploy.yaml \
  -e 'all(k in data for k in ["image", "replicas", "port"])' \
  --check
Enter fullscreen mode Exit fullscreen mode

For clearer error messages, use gk_assert:

fimod s -i deploy.yaml -e '
def transform(data, **_):
    gk_assert("image" in data, "missing image")
    gk_assert("replicas" in data, "missing replicas")
    gk_assert("port" in data, "missing port")
    return True
' --check
Enter fullscreen mode Exit fullscreen mode

On failure, the error messages go to stderr and the process exits non-zero — the shape CI expects.


3. Read an HTTP API directly

For small API transforms, fimod can read HTTPS URLs directly:

fimod s -i https://api.github.com/repos/pytgaen/fimod \
  -e '{"name": data["name"], "stars": data["stargazers_count"]}'
Enter fullscreen mode Exit fullscreen mode

Example output:

{
  "name": "fimod",
  "stars": 4
}
Enter fullscreen mode Exit fullscreen mode

This is not meant to replace a full HTTP client in an application. But for CI metadata, release scripts, or quick API-to-config glue, it avoids another curl | jq | sed chain.


4. Extract named regex captures

fimod injects regex helpers into every transform. re_search returns structured data, including named captures.

echo '{"tag":"release-v2.4.1"}' \
  | fimod s \
      -e 're_search(r"(?P<major>[0-9]+)\.(?P<minor>[0-9]+)", data["tag"])["named"]'
Enter fullscreen mode Exit fullscreen mode

Output:

{
  "major": "2",
  "minor": "4"
}
Enter fullscreen mode Exit fullscreen mode

Under the hood, the regex helpers use Rust's fancy-regex crate, with PCRE-like features such as lookahead, lookbehind, backreferences, and named captures.


5. Flatten nested fields to CSV

APIs often return nested JSON, while the next step wants a flat CSV artifact.

[
  {"name":"Alice","email":"alice@example.com","address":{"city":"Paris"}},
  {"name":"Bob","email":"bob@example.com","address":{}}
]
Enter fullscreen mode Exit fullscreen mode
fimod s -i users.json \
  -e '[{"name": u["name"], "email": u["email"], "city": dp_get(u, "address.city", "unknown")} for u in data]' \
  -o contacts.csv
Enter fullscreen mode Exit fullscreen mode

Output:

name,email,city
Alice,alice@example.com,Paris
Bob,bob@example.com,unknown
Enter fullscreen mode Exit fullscreen mode

dp_get avoids a small pile of defensive nested dict.get(...) calls.


6. Hash sensitive fields before exporting

For anonymized fixtures or safer artifacts, hashing is built in:

fimod s -i people.csv \
  -e '[{**row, "email": hs_sha256(row["email"])} for row in data]' \
  -o people-anon.csv
Enter fullscreen mode Exit fullscreen mode

Input:

name,email
Alice,alice@example.com
Bob,bob@example.com
Enter fullscreen mode Exit fullscreen mode

Output shape:

name,email
Alice,ff8d9819fc0e12bf0d24892e45987e249a28dce836a85cad60e28eaaa8c6d976
Bob,5ff860bf1190596c7188ab851db691f0f3169c453936e9e1eba2f9a47f7a0018
Enter fullscreen mode Exit fullscreen mode

The point is not that hashing is hard in Python. It is not. The point is that in a small CI transform, I do not want to import and wire another script just for that.


7. Generate text/config with MiniJinja

fimod is not limited to data-to-data transforms. It can also render text from structured data using MiniJinja-powered helpers.

echo '{"app":"api","host":"localhost","port":8080}' \
  | fimod s --output-format txt \
      -e 'tpl_render_str("APP={{ app }}\nURL=http://{{ host }}:{{ port }}\n", data)'
Enter fullscreen mode Exit fullscreen mode

Output:

APP=api
URL=http://localhost:8080
Enter fullscreen mode Exit fullscreen mode

This opens the door to .env files, Markdown snippets, Dockerfiles, small config files, or release notes generated from structured data.

For larger templates, fimod also supports rendering template files from directory molds with tpl_render_from_mold(...).


8. Reuse a mold from a registry

One-liners are nice, but the more interesting part is sharing transforms.

A mold is a reusable transform script. A registry lets you call molds by name with @name instead of copying scripts between repositories.

For example, with the example registry configured, you can call the shared pick_fields mold by name:

fimod s -i contacts.csv \
  -m @pick_fields \
  --arg fields=name,email \
  -o contacts-public.json
Enter fullscreen mode Exit fullscreen mode

This works because fimod resolves @pick_fields from a configured registry. In a fresh environment, you can configure the default registry once, or point this command at the official example registry explicitly with FIMOD_REGISTRY=https://github.com/pytgaen/fimod/tree/main/molds.

Input:

name,email,role
Alice,alice@example.com,admin
Bob,bob@example.com,user
Enter fullscreen mode Exit fullscreen mode

Output:

[
  {
    "name": "Alice",
    "email": "alice@example.com"
  },
  {
    "name": "Bob",
    "email": "bob@example.com"
  }
]
Enter fullscreen mode Exit fullscreen mode

In a team, that registry can be your own Git repository. That is where fimod becomes more than a one-liner tool: reviewed transforms can be reused across projects without copy-pasting another tiny Python script everywhere.


So when would I use this?

Fimod is a great fit for CI/config data plumbing: one small but powerful binary that can read many structured formats, pull input from HTTP when the data lives behind an API, and reuse molds when a transform should become shared project knowledge.

The registry and sandbox matter for that last part. They make sharing a transform less about copying a script around, and more about resolving known code from a known place, with a controlled execution surface.

I would reach for it when I need a small, explicit data transform in a pipeline:

  • extract one value;
  • validate a config;
  • reshape JSON/YAML/CSV/TOML;
  • read a small API response;
  • hash or mask fields;
  • generate a tiny text artifact;
  • reuse a shared transform as a mold.

That is where I want fimod to shine: boring CI/config transforms with less glue.

If you try one of these examples, I would start with a small file you already manipulate in a script today.

  • ⭐ If the idea looks useful, a GitHub star helps the project be discovered.
  • 💬 If you have a quick reaction, a comment here on dev.to is perfect.
  • 🐛 If an example breaks or the docs are unclear, a short GitHub issue with the command you tried is more than enough.

Top comments (0)