DEV Community: Teske Systemtechnik

AWS Cost Optimization

Teske Systemtechnik — Tue, 16 Jun 2026 15:58:30 +0000

65% AWS cost reduction ($3,850 → $1,330 / month) via safe legacy decommissioning, zero downtime for production systems.

The challenge

Blender Networks Inc., an advertising agency based in Bedford, Canada, had a bloated AWS bill. Root cause: a legacy, proprietary ad-serving platform that nobody seriously used anymore, yet which was still burning compute, load balancers and storage. Goal: decommission it cleanly without the production WordPress sites or email infrastructure (MX) going down for even a second.

The approach

Log analysis, not guesswork. Deep dive into ALB access logs to separate legitimate traffic from noise, uncovered hidden dependencies between the old EKS cluster and live production and cleanly decoupled them.
8 "zombie ALBs" eliminated. Load balancers that served no real traffic anymore, just magnets for bot scans (PHPUnit exploits, credential stuffing). Expensive noise with no business value. Shut down.
EKS cluster phased down. Controlled shutdown of the entire Kubernetes cluster including worker nodes; orphaned target groups and load balancers removed.
Storage cleanup. 1.4 TB of obsolete EBS snapshots and system logs deleted. 1 TB of historical RDS backups moved to Glacier Deep Archive via an S3 lifecycle rule.
EC2 right-sizing. Once bot traffic and legacy overhead were gone, the main server could be safely scaled down, the final push on compute cost.
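
The Glacier Deep Archive move from the storage cleanup step is typically expressed as an S3 lifecycle configuration. Below is a minimal sketch of such a rule as a Python dict, in the shape that boto3's put_bucket_lifecycle_configuration accepts. The bucket name, prefix, rule ID and the 30-day threshold are assumptions for illustration; the write-up does not state the real values.

```python
import json

# Hypothetical values: the real bucket, prefix and age threshold are not given in the write-up.
BUCKET = "example-rds-backup-archive"
PREFIX = "rds-backups/"
DAYS_BEFORE_ARCHIVE = 30


def deep_archive_rule(prefix: str, days: int) -> dict:
    """Build a lifecycle configuration that moves objects under `prefix` to Glacier Deep Archive."""
    return {
        "Rules": [
            {
                "ID": "archive-historical-backups",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [{"Days": days, "StorageClass": "DEEP_ARCHIVE"}],
            }
        ]
    }


lifecycle = deep_archive_rule(PREFIX, DAYS_BEFORE_ARCHIVE)
print(json.dumps(lifecycle, indent=2))

# With boto3 installed and credentials configured, the rule could be applied like this:
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket=BUCKET, LifecycleConfiguration=lifecycle
# )
```

Keeping the rule as plain data makes it easy to review before it touches a production bucket.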

Before / after: what drove the old bill and the steps that brought it down.

The result

−65% AWS cost, from ~$3,850 to $1,330 per month.
Zero downtime for production WordPress and email throughout the entire decommissioning.
1.4 TB storage freed, 1 TB migrated to cold storage.
8 attack vectors eliminated via zombie-ALB shutdown.

Book Lister AI

Teske Systemtechnik — Tue, 16 Jun 2026 15:58:16 +0000

Desktop app that scans used books in under 30 seconds, extracts data via Gemini vision, live-prices, and lists on eBay, +400% throughput via computer vision and GenAI.

The challenge

In the used-book trade the bottleneck isn't sales, it's data entry. Per book, staff used to spend 3 to 5 minutes photographing, transcribing (title, author, ISBN), researching prices, SEO-optimising, and uploading. At thousands of books per month that's enormous labour cost, before a single euro is earned.

The solution: hardware meets AI

An end-to-end pipeline that ties the physical scan process to multimodal AI and live APIs:

Smart scanning. The book sits on a mat calibrated with ArUco markers. The webcam corrects perspective in real time, physically measures dimensions (for automatic shipping classes) and scans the barcode.
AI data extraction. Two high-resolution scans (cover + back) go to Gemini 2.5 Flash. A strict JSON schema extracts title, author, publisher, year.
Automatic pricing. Cross-checks against the Google Books API, queries the eBay Browse API for live competitor listings, and calculates a competitive price with profit-margin protection.
Background upload. One operator click, a background worker pushes the listing live via the eBay Trading API while the next book is already being scanned.

Engineering highlights & fail-safe architecture

Reliability was the core focus, because the app runs in warehouse operations where downtime directly costs money:

Trust-but-verify on AI data. Since LLMs occasionally hallucinate ISBNs, the architecture treats AI output as a hypothesis only. The ISBN is forcibly validated against the hardware barcode scan and a Google Books fuzzy match (thefuzz). Bad data is blocked before it corrupts the listing.
Hybrid computer vision. Dual barcode-decoding system (ZBar for clean, zxing-cpp for damaged codes), maximum recognition rates even on old, scratched books.
Thread-safe capture pipeline. To prevent Windows STATUS_HEAP_CORRUPTION crashes from competing camera restarts: strict VideoCapture ownership inside a dedicated, watchdog-monitored capture thread.
Zero-touch database migrations. SQLite in WAL mode with automatic schema migration at app start. Every migration locked in by an explicit pytest suite. Updates roll without customer intervention.
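
The trust-but-verify step on AI-extracted ISBNs can be sketched roughly as follows. This is an illustration under assumptions, not the app's code: it handles ISBN-13 only, the function names and the 0.8 similarity threshold are hypothetical, and the standard library's difflib stands in for thefuzz so that the example has no third-party dependency.

```python
from difflib import SequenceMatcher
from typing import Optional


def normalize(isbn: str) -> str:
    """Keep digits only, so '978-0-306-40615-7' and '9780306406157' compare equal."""
    return "".join(ch for ch in isbn if ch.isdigit())


def isbn13_checksum_ok(isbn: str) -> bool:
    """ISBN-13 rule: digits weighted 1,3,1,3,... must sum to a multiple of 10."""
    digits = normalize(isbn)
    if len(digits) != 13:
        return False
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0


def verify_isbn(
    ai_isbn: str,
    barcode_isbn: Optional[str],
    ai_title: str,
    catalogue_title: str,
    min_ratio: float = 0.8,  # hypothetical threshold
) -> bool:
    """Treat the AI output as a hypothesis: accept it only if every check agrees."""
    if not isbn13_checksum_ok(ai_isbn):
        return False
    if barcode_isbn is not None and normalize(barcode_isbn) != normalize(ai_isbn):
        return False
    ratio = SequenceMatcher(None, ai_title.lower(), catalogue_title.lower()).ratio()
    return ratio >= min_ratio


print(verify_isbn("978-0-306-40615-7", "9780306406157", "Some Title", "Some title"))
print(verify_isbn("9780306406158", None, "Some Title", "Some Title"))
```

The first call passes all three checks; the second fails the checksum, which is the kind of slightly-off ISBN a vision model may produce.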

The result

+400% throughput. From 3–5 minutes per book to under 30 seconds.
Cold start → first frame: 2–4 seconds.
Scan → price: 4–6 seconds.
~6,450 LOC production code, locked in by ~2,720 LOC of unit tests (260+ pytest tests) + GitHub Actions CI.
Deployment: 165 MB monolithic PyInstaller executable, double-click, done.

Multi-Process Browser Automation Framework

Teske Systemtechnik — Tue, 16 Jun 2026 15:57:56 +0000

17k LOC Python framework for parallel, fault-tolerant browser workflows, race-safe worker coordination, cross-process crash bridge, per-phase timeouts, and full operator UX through Streamlit.

The challenge

A private client needed a permanently operational backend for browser-based automation workflows. The requirements were engineering-first from day one, not feature-first:

Multiple parallel browser sessions, cleanly isolated from each other.
Subprocess architecture, a crash in one session must not take others down with it, and a hung workflow must not block the entire process tree.
Full observability, every phase transition logs its status; every crash carries a unique phase marker.
Operator UX through a dashboard rather than the CLI, the end user is the client, not the developer.
100 % type hinting, clear layer separation, full pytest setup with GitHub Actions CI from day one.

That's not trivial on Windows: parallel Chrome instances are a minefield of race conditions (port collisions on dynamic CDP allocation, profile locks in the user-data-dir, zombie processes on Streamlit restart). And a subprocess that dies before its own crash handler can even run means, without a protection mechanism, a silently lost error report: exactly the class of bug that stays undetected in production for six weeks.

The approach

The result is a 17,461 LOC Python codebase across 25 cleanly modularized files, fully type-annotated, organized into three clearly decoupled layers:

Presentation → Streamlit UI
Control → Scheduler + CLI orchestrator
Execution → Browser workers (as subprocesses)

Cross-layer communication runs exclusively through SQLite and atomically written JSON files; no worker ever imports another.
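
A minimal sketch of what "atomically written JSON files" can look like in Python: write to a temporary file in the same directory, flush it to disk, then swap it into place with os.replace(). The function name write_json_atomic and the file names are hypothetical; only the temp-then-replace technique is taken from the text.

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomic(target: Path, payload: dict) -> None:
    """Write `payload` so that readers see either the old or the new file, never a partial one."""
    target.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the target directory so the final rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=target.parent, prefix=target.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(payload, fh)
            fh.flush()
            os.fsync(fh.fileno())
        # os.replace overwrites an existing target on both POSIX and Windows.
        os.replace(tmp_name, target)
    except BaseException:
        # Do not leave a stray temp file behind if anything failed before the swap.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise


with tempfile.TemporaryDirectory() as tmp:
    status_file = Path(tmp) / "job_42_status.json"  # hypothetical file name
    write_json_atomic(status_file, {"phase": "1/6 login", "ok": True})
    write_json_atomic(status_file, {"phase": "4/6 add_step", "ok": True})
    latest = json.loads(status_file.read_text(encoding="utf-8"))
    leftovers = sorted(p.name for p in Path(tmp).iterdir())

print(latest, leftovers)
```

After both writes the directory contains only the final file, so a reader polling it never has to handle a truncated document.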

Race-safe multi-worker coordination. With N parallel asyncio tasks, workers share an asyncio.Lock-based round-robin: only one worker at a time runs the expensive discovery step, the others wait at the lock and pick up the result from a shared dict. Halves outgoing output without losing speed and avoids all workers duplicating the same operation in parallel.
Best-result aggregation with coordinated cancellation. As soon as one worker hits the target result, an asyncio.Event fires and a watchdog task calls task.cancel() on all sibling tasks. Clean CancelledError propagation instead of polling. Additionally, a class-global _completed_jobs set suppresses late reports from the cancelled tasks, no notification spam, even when 10 siblings simultaneously walk their cleanup paths.
Subprocess isolation at Windows level. Each worker gets a fully isolated Chrome instance: race-safe port allocation via socket bind (_PortLock holds the port reserved until Chrome takes it over, no TOCTOU race), its own user-data-dir (chrome_<uuid>), its own crash dump path, its own CDP session. No shared resources, no lock conflicts between parallel sessions, no leaking browser state.
CDP-based auth configuration. Instead of a classic Manifest V2 browser extension, auth configuration runs directly through Chrome DevTools Protocol via Fetch.authRequired. Auth events propagate automatically onto popup pages via a context.on("page", …) handler. A lean class replaces the traditional extension workaround with significantly less surface area.
Cross-process crash file bridge. Workers run as subprocesses, spawned by the scheduler via subprocess.Popen. On a crash, the subprocess writes a structured JSON file to data/crashes/job_<id>_<ts>.json AND attempts a direct Telegram notification in parallel. The scheduler reads the file back after subprocess exit, deduplicates against the already-sent notification, fills in missing reports, or quietly cleans up the file when everything was already reported. A global sys.excepthook as last line of defence guarantees: no crash gets lost, even if the subprocess dies so early that its own crash handler never runs.
Per-phase timeouts with live phase tracking. Every workflow phase runs inside a dedicated asyncio.timeout() block; every phase updates a central context object with its current sub-step. On a crash, the Telegram report says exactly which phase of which worker failed, not "somewhere in main()" but "4/6 add_step: concrete UI element X". Debugging time drops from "first scan the logs" to "jump straight to the function".
Typed error hierarchy + swarm deduplication. A dedicated exception class per failure mode (ProxyError, NavigateError, SessionExpiredError, …), each with its own recovery policy (close the browser vs. leave it open, retry with a different proxy, hard fail). When 10 workers crash in parallel with the same root cause, report_grouped_errors groups the messages by (exception type, first stack frame) and sends a single aggregated Telegram message with worker IDs and affected phases, no 10 redundant pings.
SQLite with WAL + BEGIN IMMEDIATE for race safety. Counters and state in the tables are updated in read-modify-write transactions. With N parallel workers incrementing a counter simultaneously, naive read-then-UPDATE logic would increment by 1 instead of N; BEGIN IMMEDIATE takes the write lock at the start of the transaction, so the updates are serialized and the race is prevented at the SQLite level before it ever reaches Python. Plus WAL mode + 64 MB cache + 256 MB mmap for read performance under parallel write pressure. Auto-migrations on first connect, with pytest tests verifying every migration individually.
Atomic file IPC. All inter-process state files (job status snapshots, live state, run reports) are written atomically: write to .tmp, then rename over the target. No worker ever reads a half-written JSON file, even under concurrent access from multiple subprocesses. This relies on POSIX rename semantics and works on Windows too via Path.replace(), which overwrites an existing target where a plain os.rename() would fail.
Date-versioned logs. logs/<DD-MM-YYYY>/{chrome,traces,screenshots,…}/, every phase of every worker produces clearly attributed artefacts (Chrome stdout, Playwright trace, failure screenshot). On a production issue, ls logs/23-03-2026/screenshots/ finds the exact failure phase of every affected worker in five seconds, plus the full Playwright trace ready to replay in the browser trace viewer.
Streamlit operations UI with Windows hard cleanup. Multi-page dashboard with service lifecycle (start/stop/restart of all subsystems), DB CRUD, live logs, EN/IT localization across 300+ string pairs. Streamlit is finicky on Windows, process-tree cleanup is not guaranteed on shutdown, children become zombies. Solved via a Windows Job Object with JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE: every subprocess gets assigned to the job handle on spawn, and on Streamlit exit the OS automatically terminates all children cascadingly. Works even on hard-kill via Task Manager, no orphaned browser processes left behind.
Centralized Telegram reporter. A single ErrorReporter class as the single entry point for all notifications. Fire-and-forget by contract: never throws an exception, never blocks longer than the HTTP roundtrip, fails silently on connection errors (Windows 10054 ConnectionReset, …) and retries with a fresh session. Direct connection without proxy (session.trust_env = False), so system proxy vars don't silently kill reports, plus global suppression logic against notification storms.

Engineering highlights & fail-safe architecture

Reliability was non-negotiable, since the system runs unattended 24/7 and the end user is not a developer:

9 pytest test suites with GitHub Actions CI. Database migrations (idempotent, runnable multiple times), error reporter (260+ tests including suppression logic and crash-file roundtrip), coordination patterns, proxy layer, shared helpers, config resolution, all validated automatically on every push against Ubuntu Python 3.13. Migration bugs get caught before deploy, not at runtime.
Strict layer separation without circular imports. Presentation layer imports only the control layer; control layer imports only the execution layer + utils. Every subprocess can be brought up standalone, without Streamlit even being installed, relevant for CI runs and debugging sessions without UI overhead.
Singleton path resolution. A PATHS singleton class with auto-root detection (walks up looking for marker files like .git, .env, requirements.txt) and automatic directory creation on property access. Code called from any working directory consistently finds the same absolute paths, not a single os.path.join(os.path.dirname(__file__), …) in the entire codebase.
Dataclass-first domain model. All workflow inputs and outputs are dataclasses with type hints, validation logic, and clean from_dict/to_dict roundtrips. Clean interfaces between layers, IDE autocomplete works, and refactorings surface as type-checker errors instead of runtime AttributeErrors.
Test-first for IPC-critical components. Crash file bridge, suppression gate, and aggregator logic are the riskiest spots, they run exactly when everything else is broken. The test suite is correspondingly dense: a custom _clear_* fixture pattern resets class-global state between tests, every edge case (subprocess crashed BEFORE crash-file write, crash file without Telegram flag, Telegram after crash file, both in parallel) has an explicit test case.
Dependency-injection layer for tests. Every external dependency (DB, Telegram, proxy provider, filesystem) sits behind a thin interface that can be swapped for an in-memory equivalent in test mode. The SQLite tests, however, run against a real SQLite DB in pytest's tmp_path, not a mock, migration bugs would systematically not be detected by mocks.
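
The root detection behind the path singleton described above can be sketched as follows. The marker files and the data/crashes and logs directories come from the text; the class name Paths, the function name find_project_root and the property names are assumptions, and the sketch builds an instance explicitly rather than exposing a module-level singleton.

```python
import tempfile
from pathlib import Path

MARKERS = (".git", ".env", "requirements.txt")


def find_project_root(start: Path, markers: tuple = MARKERS) -> Path:
    """Walk upwards from `start` until a directory contains one of the marker files."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if any((candidate / marker).exists() for marker in markers):
            return candidate
    raise FileNotFoundError(f"no project root found above {start}")


class Paths:
    """Resolves well-known directories relative to the root and creates them on access."""

    def __init__(self, root: Path) -> None:
        self._root = root

    def _ensure(self, *parts: str) -> Path:
        path = self._root.joinpath(*parts)
        path.mkdir(parents=True, exist_ok=True)
        return path

    @property
    def logs(self) -> Path:
        return self._ensure("logs")

    @property
    def crashes(self) -> Path:
        return self._ensure("data", "crashes")


with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp).resolve() / "project"
    deep = project / "src" / "pkg" / "workers"
    deep.mkdir(parents=True)
    (project / "requirements.txt").write_text("", encoding="utf-8")

    found_root = find_project_root(deep)
    crash_dir = Paths(found_root).crashes  # directory is created by the property access
    root_matches = found_root == project
    crash_dir_created = crash_dir.is_dir()
    crash_dir_relative = crash_dir.relative_to(project).as_posix()

print(root_matches, crash_dir_created, crash_dir_relative)
```

Starting the search at the calling file's own directory is what makes the result independent of the current working directory.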

Volume breakdown of the entire codebase across Presentation, Control, Execution Core, and Utility layers, the Execution Core (8,228 LOC) dominates visually as the largest block, while the utility layer shows modularity across 11 small helper modules.

The result

17,461 LOC of production Python across 25 cleanly modularized files, clean layer separation, no circular imports, every stage standalone runnable.
260+ pytest tests with GitHub Actions CI on every push, migration bugs, IPC race conditions, and suppression logic all get caught before deploy.
Cross-process crash bridge, no crash gets lost, even when a subprocess dies before its own crash handler.
Race-safe coordination across N parallel browser workers on Windows, no port collisions, no profile locks, no zombie processes on shutdown.
Full operator UX through Streamlit, the end client toggles services with one click, sees live status and live logs, without ever touching the CLI.
Modularly extensible, new workflow types are a new module against the existing coordination and reporting infrastructure, without touching the core.

Legacy-DB Reverse Engineering & Migration

Teske Systemtechnik — Tue, 16 Jun 2026 15:57:44 +0000

1.47 million parts liberated from a 1.2 GB password-protected manufacturer database and migrated into the client's new system, incl. 82,076 converted exploded-view drawings, validated to zero rule violations, fully auditable.

The challenge

The client runs the after-market spare-parts business for an international Tier-1 construction-machinery manufacturer and holds a valid license for that manufacturer's maintenance database, a 1.2 GB password-protected MS Access file (.mdb) from the early 2000s. The lawfully acquired data needs to migrate into the client's new ERP / inventory system. Three walls in the way.

First: the database's original configuration parameters, specifically the access credentials, got lost internally over the years, and the legacy manufacturer tool will open the DB in the background but exports no raw data.

Second: the schema. Around 30 interlocked tables with cryptic column names, n:m relations between catalogues and sales models, part names spread across three tables plus a language field, none of it readable without structural analysis.

Third: roughly 82,000 exploded-view drawings in the obscure DjVu format of the early 2000s, which the new system doesn't render.

And all of it at a scale (~1.9 million raw rows) where Excel would silently truncate at 1,048,576 rows, effectively swallowing half the machinery fleet. The job: tear down all three walls and migrate the full, licensed dataset cleanly into the new system.

The approach

Forensic recovery of lost credentials. Rather than treating the legacy maintenance tool as a black box, the client's installation environment is forensically analysed: a targeted scan across the entire local install directory (.exe, .dll, .ini, .cfg, .xml) walks the configuration files and runtime artifacts looking for the classic indicators of persisted connection parameters, Jet OLEDB strings, PWD/UID references, .mdw paths, with a context window around every match. The configuration leftovers reconstruct the original credentials, which the client owns by virtue of his license anyway. From that point on, the DB is readable directly at the data layer via pyodbc + Microsoft Access Driver.
Structural schema reconstruction. The DB has no docs, no ER diagrams, only table names and column codes. A schema scanner pulls 10-row samples plus full column headers from each table and surfaces the relationships between the ~30 tables, which table holds replacement numbers, which holds image file IDs, where the language variants of part names live. That's the foundation for the later single-source join. Result: three laterally scattered sources for replacement-part numbers (one live mapping table plus two master-data tables with historical replacements), two for image files (with fallback chain), and one language-filtered source for English plain-text names.
One-shot cold storage in SQLite. The entire Access DB is cloned in a single take into a local SQLite file, every column as TEXT (the safest defence against the inconsistent typing of the source), in 10k chunks via fetchmany. Benefit: from then on every analysis runs locally, any number of times, without ODBC overhead, the 1.2 GB millstone becomes a 400 MB read-only source decoupled from the migration pipeline.
Multi-table join as single source of truth. A central query bundles all the schema knowledge into one statement. The parts catalogue joins with the catalogue-to-model mapping table and the sales-model table (because maintenance catalogues are named differently internally than the end-product models), the figures table (group / subgroup / image file via COALESCE as a fallback chain), the language-filtered plain-text-name table, and a CTE that condenses all replacement-part relationships per part into a comma-separated list with GROUP_CONCAT. Defensive LEFT JOINs with IFNULL catch empty key fields, the target system needs a complete tuple per row, not a sparse one.
Self-healing part names. The raw data often had only the placeholder "PART", pure alphanumeric codes, or junk like "(OPTIONAL)" in the "part name" field, values any modern ERP / shop system would flag as garbage immediately. Solution: an in-RAM dictionary built from the English-language master-data table maps part numbers to correct plain-text names. Heuristics (min length 3, not pure code string, not "PART" placeholder) decide when to look up, on a hit, the name gets healed, otherwise the record is marked UNKNOWN_NAME_REQUIRES_CHECK and hard-filtered in a later stage. The healing step alone rescues tens of thousands of rows from the trash filter.
DjVu → JPG pipeline with multithreading. ~82,000 exploded-view drawings sat in the obscure DjVu format, a format the new system can't render and for which there's no modern standard library. Pipeline: ddjvu (DjVuLibre) converts each file into a temporary PDF, PyMuPDF (fitz) renders the first page at 150 DPI in greyscale as JPG (massive disk-space win without legibility loss), the temp PDF is deleted immediately. ThreadPoolExecutor runs with os.cpu_count() workers. Idempotent skip logic (existing targets get skipped) makes the run re-runnable, a crash at file 50,000 doesn't cost 50,000 reconversions.
System-aware hyperlinks. Instead of bare paths, every image cell is written as =HYPERLINK("…\<path>.jpg", "<name>.jpg"), a format the target system understands directly as a clickable image link. A recursive index of all JPGs (filename → relative path) resolves the images before write; not-found images get defensively flagged as MISSING_JPG: <stem> rather than silently polluting the column or aborting the migration.
Iterative cleanup stages with audit trail. Raw → v2 (hyperlinks injected) → Pristine_v2 (UNKNOWN names + manually blocked regional special variants removed) → Pristine_v3 (string cleanup: semicolons, leading special chars, double whitespace) → Final_Delivered (replacement-part numbers re-loaded from both master-data sources, deduplicated) → Perfect (three last problem rows with polluted SubGroup removed). Every stage is its own file and its own script, debuggable, reproducible, with a clear audit trail. When someone later asks "why isn't row X in the new system?", there's an answer.
Strict-mode validation against the target schema. A dedicated validator checks the final CSV against a 5-rule schema contract that mirrors 1:1 what the target system accepts: header (exactly 10 columns in fixed order), required fields (catalogue + part number set), no illegal control characters (regex \x00-\x1f), part-number format (no parentheses), part-name format (uppercase, no leading special character, no UNKNOWN_NAME), Image_File format (=HYPERLINK + .jpg or explicit MISSING_JPG). Result after several iterations: 1,473,210 rows, zero rule violations. Every violation lands in a log file with row number + catalogue + part number + reason, surgical, not "something is broken".
Trust by transparency. Two separate reports ship alongside the data. Coverage report: all 386 master catalogues from the source DB with mapping to sales model and row count in the final CSV, the client sees, per catalogue, whether and how many parts were migrated into the new system. Trash analysis: all 470,276 filtered-out rows with reason, missing English name, manually blocked regional special variant, part number under four characters, other cleanup filter. The client gets not just the migration but every single filter decision documented.
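
The name-healing heuristics from the list above can be sketched in a few lines. The three rules (minimum length 3, not a pure code string, not the "PART" placeholder) and the UNKNOWN_NAME_REQUIRES_CHECK marker come from the text. The exact definition of "code string" used here (no spaces and at least one digit), the treatment of "(OPTIONAL)" as a placeholder, and the sample part numbers are assumptions.

```python
import re

UNKNOWN = "UNKNOWN_NAME_REQUIRES_CHECK"
PLACEHOLDERS = {"PART", "(OPTIONAL)"}  # "(OPTIONAL)" as a placeholder is an assumption
# Assumption: a "pure code string" has no spaces and contains at least one digit.
CODE_LIKE = re.compile(r"^[A-Z0-9./-]*\d[A-Z0-9./-]*$")


def needs_healing(raw_name: str) -> bool:
    """Decide whether the raw name is too poor to ship and should be looked up."""
    cleaned = raw_name.strip().upper()
    return (
        len(cleaned) < 3
        or cleaned in PLACEHOLDERS
        or CODE_LIKE.match(cleaned) is not None
    )


def heal_name(part_number: str, raw_name: str, master_names: dict) -> str:
    """Return a usable name, or the UNKNOWN marker for the later hard filter."""
    if not needs_healing(raw_name):
        return raw_name.strip()
    return master_names.get(part_number, UNKNOWN)


# Hypothetical master data: part number -> English plain-text name.
master = {"123-456": "HYDRAULIC PUMP"}

print(heal_name("123-456", "PART", master))
print(heal_name("999-000", "PART", master))
print(heal_name("123-456", "OIL FILTER", master))
```

A plain dict lookup keeps the healing step cheap enough to run over every row in a single pass.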

The diagram shows the full migration flow: from forensic credential recovery through the opened 1.2 GB MS Access source database, the SQLite clone, the central multi-table join query, and the parallel DjVu-to-JPG conversion pipeline, all the way to hyperlink injection and the strict-mode validation against the new target system's schema.

The result

Vendor lock-in lifted. A 1.2 GB legacy database the client had legally licensed but could no longer practically use was forensically opened, its schema structurally reconstructed, and the full dataset migrated into his ownership, without further dependence on the legacy maintenance tool.
1,473,210 migrated parts records across 339 sales models (from 386 master catalogues including n:1 mappings, where one catalogue covers several model variants), ready for ingestion into the new system.
~82,000 original exploded-view drawings converted from old DjVu format into modern JPG and referenced via =HYPERLINK, defensively flagged where conversion failed, rather than blocking delivery.
Zero rule violations in the final CSV against a 5-rule strict-mode validation contract, the target system's acceptance criterion was binary, the result is binary.
1.2 GB legacy MDB → 400 MB SQLite (cold storage) → 274 MB CSV (migration delivery). Full data sovereignty beyond the proprietary manufacturer tool, in a format every modern ERP, shop, or BI system understands.
Full audit trail over every single one of the 470,276 dropped raw rows plus over each of the 386 catalogues, not a black-box ETL but a migration contract the client can trace row by row.
Excel hard limit dodged. The CSV stays streamable, the new system ingests sequentially, nothing gets silently cut off at 1,048,576 rows the way a naive Excel re-save would have done.

Microsoft Shopping Feed Pipeline

Teske Systemtechnik — Tue, 16 Jun 2026 15:57:31 +0000

Fully automated daily sync of high-volume affiliate feeds (Connexity, Shopping24) into Microsoft Merchant Center, OOM-safe, chunked upload, 100 % compliant.

The challenge

Blender Networks Inc. runs a large price-comparison portal whose monetization is built entirely on Microsoft Advertising Product Listing Ads (PLAs). The previous in-house feed solution was unstable and had to be replaced.

The complication came from the heterogeneous third-party sources: Connexity delivers zipped JSON bundles via an API index, Shopping24 (S24) provides a CSV master file plus fragment updates over FTP. Both formats had to be flawlessly translated into Microsoft Merchant Center's strictly defined TSV schema. Every format error, every "item drop" = direct revenue loss.

Additional difficulty: Connexity data routinely exceeds 2 GB per publisher account, the mapped output runs into double-digit gigabytes, and the full daily upload into Microsoft Merchant Center sits at around 200 GB. Naive in-memory processing was off the table, the pipeline had to stay OOM-safe even on modest hardware.

The approach

Three heterogeneous sources, one unified pipeline. The architecture follows a strict three-stage model, Ingest → Map → Upload, where each affiliate network sits behind the same interface but internally taps its own stack (see pipeline diagram below).

Streaming-first ingestion. A Python pipeline using ijson parses the zipped JSON bundles item-by-item directly off the stream instead of deserializing the whole document into RAM. Memory usage stays constant regardless of feed size.
Disk-backed deduplication. A SQLite table with PRAGMA tuning handles cross-account deduplication on disk. Multi-million-item feeds stay clean without blowing up the heap, the dedup state survives even on abort, ready for inspection.
Strict mapping to the MS spec. Every source record gets deterministically mapped onto the required id/title/description/link/image_link/price/... schema. Validation logic filters unusable brand strings (purely numeric, too long, too many words), checks GTINs for valid lengths (8/12/13/14) and sets identifier_exists consistently, no more "partial identifier" warnings.
15 GB chunking & chunk-aware upload. Mapping outputs are auto-split at 15 GB into _0.txt, _1.txt, … to match the Microsoft upload limit. The uploader detects these chunks via pattern matching and numbers them remotely correctly (TipDigest_US_0.txt, TipDigest_US_1.txt, …).
High-speed SFTP via LFTP. Instead of Python SFTP libraries (paramiko), the system LFTP binary is driven with enlarged TCP socket buffer and a reconnect strategy. Significantly faster and more stable than any Python implementation on multi-GB files.
Multi-account orchestration. Multiple Connexity publishers and multiple Merchant Center accounts (US/DE) are processed in parallel; each account writes to its own output path and is correctly routed via a store-mapping config.
Resilient daily sync. A cron lockfile prevents overlapping double-runs. Pipeline stages can be toggled individually (--skip-ingest, --skip-map, --skip-upload) for targeted re-runs after partial failures, without re-pulling the entire 2 GB download.
Proactive error handling & diagnostics. Hybrid logging (Rich console with emojis for humans + RotatingFileHandler for the machine), per-stage execution-report table, per-account isolation. A single truncated JSON stream doesn't stop the overall run, errors get logged locally and the run continues with the rest.
Modular architecture. Each affiliate network lives in its own module (ingest_*.py + mapper_*.py) behind a unified pipeline interface. A third source (Kelkoo) was added without touching Connexity or S24, proof the abstraction holds.
Acceptance criterion. The hard acceptance bar (5+ consecutive days of error-free automation) was passed on the first attempt, secured by clean module boundaries and consistent validation layering.
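
The chunking step from the list above can be sketched as a small rolling writer. The _0.txt, _1.txt naming and the TipDigest_US stem come from the text; the class name ChunkedWriter, the choice to repeat the header in every chunk, and the tiny 64-byte limit used for the demo are assumptions. A real run would pass the 15 GB limit as max_bytes.

```python
import tempfile
from pathlib import Path


class ChunkedWriter:
    """Writes rows to <stem>_0.txt, <stem>_1.txt, ... and starts a new file before exceeding max_bytes."""

    def __init__(self, directory: Path, stem: str, max_bytes: int, header: str) -> None:
        self.directory = directory
        self.stem = stem
        self.max_bytes = max_bytes
        self.header = header.encode("utf-8")
        self.paths = []
        self._fh = None
        self._written = 0

    def _open_next(self) -> None:
        self.close()
        path = self.directory / f"{self.stem}_{len(self.paths)}.txt"
        self._fh = path.open("wb")
        self._fh.write(self.header)  # assumption: every chunk carries its own header
        self._written = len(self.header)
        self.paths.append(path)

    def write_row(self, row: str) -> None:
        data = row.encode("utf-8")
        chunk_has_rows = self._written > len(self.header)
        # Roll over only when the chunk already holds rows, so no header-only file is produced.
        if self._fh is None or (self._written + len(data) > self.max_bytes and chunk_has_rows):
            self._open_next()
        self._fh.write(data)
        self._written += len(data)

    def close(self) -> None:
        if self._fh is not None:
            self._fh.close()
            self._fh = None


with tempfile.TemporaryDirectory() as tmp:
    rows = [f"sku-{i}\tExample item {i}\n" for i in range(10)]  # hypothetical feed rows
    writer = ChunkedWriter(Path(tmp), "TipDigest_US", max_bytes=64, header="id\ttitle\n")
    for row in rows:
        writer.write_row(row)
    writer.close()

    chunk_names = [p.name for p in writer.paths]
    chunk_sizes = [p.stat().st_size for p in writer.paths]
    rows_read = [
        line
        for p in writer.paths
        for line in p.read_text(encoding="utf-8").splitlines(keepends=True)[1:]
    ]

print(chunk_names, chunk_sizes)
```

Counting encoded bytes rather than characters matters here, because the upload limit applies to file size and product titles often contain non-ASCII text.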

The diagram shows the full data flow: from heterogeneous source systems (zipped JSON streams, REST API, FTP dumps), through the OOM-safe ingest layer, the validation-driven mapping with 15 GB chunking, all the way to the chunk-aware LFTP upload into Microsoft Merchant Center.

The result

100 % feed compliance, no more disapprovals from format errors.
Zero errors in the daily sync, acceptance criterion passed first try.
ROI protected, no revenue loss from broken feed updates.
OOM-safe on multi-GB feeds, pipeline runs on modest hardware without memory pressure.
Modularly extensible, new affiliate networks integrate without touching the core.

Math Engine, eval()-free expression interpreter for Python

Teske Systemtechnik — Tue, 16 Jun 2026 15:57:16 +0000

A safe evaluation engine for mathematical expressions, built from scratch: tokenizer, recursive-descent parser, AST, linear equation solver and a type-safe output system, entirely without Python's eval(). Live on PyPI, 399 tests, 90% coverage, green across five Python versions.

The challenge

The obvious way to evaluate an expression like 3 + 4 * 2 in Python is a single line: eval("3 + 4 * 2"). That very line is the problem. eval() executes arbitrary Python code, a string disguised as numeric input such as __import__('os').system('rm -rf …') runs without complaint. For any application that takes expressions from a file, a form field, an API or a configuration string, eval() is therefore a direct code-execution vector, not a calculator.

The second, quieter defect is correctness. eval() and Python's float compute in binary: 0.1 + 0.2 yields 0.30000000000000004, 1/3 is truncated, large integers tip over into scientific notation. For a calculator, a financial formula or an educational context, that is not "almost right", it is wrong.

The third defect is diagnostics. Hand eval() a broken expression and you get a Python traceback at an internal line number, not the spot in the input string where the problem sits. For a tool that processes end-user input, that is useless.

The task, then: a complete evaluation engine from scratch that (1) never executes foreign code, (2) computes exactly rather than binary-approximately, (3) pinpoints every error to the exact character, and (4) does all of that at library quality, tested, documented, versioned and installable from PyPI. Not a weekend parser, but an engine with the discipline of a small compiler.

The implementation

eval()-free by construction

The entire library never calls Python's eval(), exec() or compile() anywhere, this is not an after-the-fact filter but the architecture itself. Input strings pass through a closed pipeline (Input → Tokenizer → Parser → Evaluator/Solver → Formatter → Output Converter), whose alphabet is a finite set of numbers, operators, parentheses and a whitelist of function names. At worst, an attacker-controlled string can trigger a typed MathError, never code execution. Even the single place that parses a user-supplied data structure uses the safe ast.literal_eval, which accepts literals only.
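
To illustrate why ast.literal_eval is the safe choice for that one data-structure input, here is a minimal sketch. The wrapper name parse_literal and the example inputs are hypothetical; the behaviour shown is that of the standard library function itself.

```python
import ast


def parse_literal(text: str):
    """Accept Python literals only (numbers, strings, lists, dicts, ...); reject everything else."""
    try:
        return ast.literal_eval(text)
    except (ValueError, TypeError, SyntaxError, MemoryError, RecursionError) as exc:
        raise ValueError(f"not a plain literal: {text!r}") from exc


# A data structure made of literals is parsed into real Python objects.
variables = parse_literal("{'x': 1, 'y': 2.5}")
print(variables)

# A function call is not a literal, so it is rejected without ever being executed.
try:
    parse_literal("__import__('os').system('echo pwned')")
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

The call expression is parsed into a syntax tree and refused at that stage, so the attacker-controlled code never runs.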

Recursive-descent parser with a 10-level precedence chain

Operator precedence is not hacked in via regex or a shunting-yard table, but encoded structurally as ten nested parser closures, each with exactly one precedence level: from parse_gleichung (=) through bitwise operators, shift operations, sum and term, down to parse_power (**) and parse_factor. Left- vs. right-associativity falls out of the structure: whatever consumes in a loop is left-associative (a - b - c = (a - b) - c); parse_power recurses to the right and makes ** correctly right-associative. A deliberate decision: ^ is bitwise XOR, not exponentiation, exactly as in C and Python.
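
How associativity "falls out of the structure" is easiest to see in a reduced version. The sketch below has only four levels instead of ten, evaluates directly instead of building an AST, and omits unary minus, variables and the bitwise levels. parse_power and parse_factor are names from the text; parse_sum, parse_term, tokenize and evaluate are assumed names.

```python
import re
from decimal import Decimal

TOKEN = re.compile(r"\s*(\*\*|\d+(?:\.\d+)?|[-+*/()])")


def tokenize(text: str) -> list:
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at column {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(text: str) -> Decimal:
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = peek()
        if token is None:
            raise ValueError("unexpected end of input")
        pos += 1
        return token

    def parse_sum():  # loop => left-associative
        value = parse_term()
        while peek() in ("+", "-"):
            if take() == "+":
                value += parse_term()
            else:
                value -= parse_term()
        return value

    def parse_term():  # loop => left-associative
        value = parse_power()
        while peek() in ("*", "/"):
            if take() == "*":
                value *= parse_power()
            else:
                value /= parse_power()
        return value

    def parse_power():  # recursion on the right => right-associative
        base = parse_factor()
        if peek() == "**":
            take()
            return base ** parse_power()
        return base

    def parse_factor():
        token = take()
        if token == "(":
            value = parse_sum()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        if token[0].isdigit():
            return Decimal(token)
        raise ValueError(f"unexpected token {token!r}")

    result = parse_sum()
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return result


print(evaluate("10 - 4 - 3"))   # (10 - 4) - 3
print(evaluate("2 ** 3 ** 2"))  # 2 ** (3 ** 2)
```

Each function only calls the next tighter level, which is exactly how precedence is encoded without any table.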

Decimal precision with dynamic scaling

Every number is a decimal.Decimal from the tokenizer through to the output, never a float, which is why 0.1 + 0.2 is exactly 0.3. The precision of the Decimal context is determined anew for each calculation (between 100 and 10,000 digits, depending on the input), plus a hard input ceiling of 20,000 digits. The point: a long result is never silently truncated, and a short one never wastes memory. This is exactly the class of correctness that float-based calculators quietly lose.
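A per-calculation precision can be set with decimal.localcontext. The sketch below uses the three limits quoted above, but the scaling rule in choose_precision is an assumption made up for illustration; the source does not say how the engine derives its precision.

```python
from decimal import Decimal, localcontext

MIN_PRECISION = 100
MAX_PRECISION = 10_000
MAX_INPUT_DIGITS = 20_000


def choose_precision(*operands):
    """Hypothetical heuristic: scale with the longest operand, then clamp."""
    longest = max(len(op.replace(".", "").lstrip("-")) for op in operands)
    if longest > MAX_INPUT_DIGITS:
        raise ValueError("input exceeds the digit limit")
    return max(MIN_PRECISION, min(MAX_PRECISION, 2 * longest))


def add(a, b):
    # The context, and with it the precision, lives only for this calculation.
    with localcontext() as ctx:
        ctx.prec = choose_precision(a, b)
        return Decimal(a) + Decimal(b)


big = "1" + "0" * 150
print(add("0.1", "0.2"))          # 0.3
print(len(str(add(big, "1"))))    # 151
```

With the default 28-digit context, the second sum would be rounded and lose its trailing 1; with the scaled precision every digit survives.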

Character-exact error positioning

Alongside the token list, the tokenizer keeps a span list: for each token a (start_col, end_col, original_text) triple. Every AST node and every MathError carries position_start / position_end. The payoff: an error does not say "syntax error somewhere", it points at the exact character. This bookkeeping is the reason the engine is debuggable across an API. Via a single setting (readable_error), the same position info switches between two contracts: typed exceptions for the library, a visual diagnostic with a ^ pointer under the faulty column for the console.
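The span bookkeeping can be sketched as follows. Everything here is a simplified stand-in: PositionedError plays the role of a MathError, render_error imitates the console view, and the exclusive end column is an assumption, since the source does not say whether end_col is inclusive.

```python
import re

TOKEN_RE = re.compile(r"\d+(?:\.\d+)?|\*\*|[-+*/()=^]")


class PositionedError(Exception):
    """Hypothetical stand-in for a MathError carrying a column range."""

    def __init__(self, message, position_start, position_end):
        super().__init__(message)
        self.position_start = position_start
        self.position_end = position_end


def tokenize(text):
    """Return the token list and a parallel list of (start, end, text) spans."""
    tokens, spans = [], []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise PositionedError("unexpected character", pos, pos + 1)
        tokens.append(match.group())
        spans.append((match.start(), match.end(), match.group()))
        pos = match.end()
    return tokens, spans


def render_error(text, error):
    """Console-style view: the input, a caret line, then the message."""
    width = max(1, error.position_end - error.position_start)
    return f"{text}\n{' ' * error.position_start}{'^' * width}\n{error}"


tokens, spans = tokenize("12 + 3.5")
print(spans)  # [(0, 2, '12'), (3, 4, '+'), (5, 8, '3.5')]

try:
    tokenize("2 + $")
except PositionedError as error:
    print(render_error("2 + $", error))
    # 2 + $
    #     ^
    # unexpected character
```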

Typed, catalogued error system

A base class MathError plus exactly seven domain subclasses, backed by a catalogue of 78 unique four-digit error codes across nine families. The digits are structured: first digit = family, second = component, the rest = sequence number. Code 3008 therefore means "Calculator family, core parser, more than one '.' in a number". These codes are deliberately never renumbered; they are a contract toward the UI and external log parsers. The public calculate() function wraps the whole pipeline in a layered except block, so that no raw ZeroDivisionError or ValueError ever reaches the caller. Everything lands typed in the MathError hierarchy.
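The code structure and the wrapping idea can be sketched in a few lines. Only the MathError name and code 3008 come from the text above; CalculationError, split_code, safe_divide and the placeholder code 9999 are hypothetical and not part of the real catalogue.

```python
from decimal import Decimal


class MathError(Exception):
    """Base class: every error carries a stable four-digit code."""

    def __init__(self, message, code):
        super().__init__(message)
        self.code = code


class CalculationError(MathError):
    """Hypothetical domain subclass, for illustration only."""


def split_code(code):
    """Split a code into (family, component, sequence number)."""
    text = f"{code:04d}"
    return int(text[0]), int(text[1]), int(text[2:])


PLACEHOLDER_CODE = 9999  # made up; not a code from the real catalogue


def safe_divide(a, b):
    """Translate the raw arithmetic exception into the typed hierarchy."""
    try:
        return Decimal(a) / Decimal(b)
    except ZeroDivisionError as exc:
        # decimal.DivisionByZero is a subclass of ZeroDivisionError.
        raise CalculationError("division by zero", PLACEHOLDER_CODE) from exc


print(split_code(3008))  # (3, 0, 8)

try:
    safe_divide("1", "0")
except MathError as error:
    print(type(error).__name__, error.code)  # CalculationError 9999
```

Callers then need a single except MathError clause, and log parsers can key on the numeric code rather than on message text.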

More than a calculator

Two further capabilities sit on the same AST. If an expression contains an = and a variable, the engine solves the linear equation symbolically: each node returns a (factor, constant) pair, and the solver brings both sides into the form A·x + B = C·x + D and computes x = (D − B) / (A − C). Non-linearity is caught structurally (variable·variable, variable in the denominator, variable in the exponent), and degenerate cases are named cleanly ("No Solution", "Inf. Solutions"). On top of that comes a programmer's-calculator mode with fixed word width (8/16/32/64 bit), two's complement and bitwise operators, so that 127 + 1 in 8-bit signed mode correctly overflows to -128. A prefix-driven output system (dec:, int:, hex: …) determines the Python return type and refuses lossy conversions instead of silently truncating.
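Both ideas fit in a short sketch. The tuple AST, the function names and the use of ValueError are assumptions for illustration; the real engine works on its own AST nodes and raises its typed errors instead.

```python
from decimal import Decimal


def linearize(node):
    """Return (factor, constant) such that node == factor * x + constant."""
    if node == "x":
        return Decimal(1), Decimal(0)
    if not isinstance(node, tuple):
        return Decimal(0), Decimal(node)
    op, left, right = node
    lf, lc = linearize(left)
    rf, rc = linearize(right)
    if op == "+":
        return lf + rf, lc + rc
    if op == "-":
        return lf - rf, lc - rc
    if op == "*":
        if lf != 0 and rf != 0:
            raise ValueError("non-linear: variable times variable")
        return lf * rc + rf * lc, lc * rc
    if op == "/":
        if rf != 0:
            raise ValueError("non-linear: variable in the denominator")
        return lf / rc, lc / rc
    raise ValueError(f"unsupported operator {op!r}")


def solve(left, right):
    """Solve A*x + B = C*x + D for x, naming the degenerate cases."""
    a, b = linearize(left)
    c, d = linearize(right)
    if a == c:
        return "Inf. Solutions" if b == d else "No Solution"
    return (d - b) / (a - c)


def wrap_signed(value, bits):
    """Reduce an integer to a signed two's-complement word of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >= 1 << (bits - 1) else value


# 2*x + 3 = 11
print(solve(("+", ("*", 2, "x"), 3), 11))   # 4
# x + 1 = x + 2
print(solve(("+", "x", 1), ("+", "x", 2)))  # No Solution
print(wrap_signed(127 + 1, 8))              # -128
```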

Engineering highlights & test discipline

Reliability was not a feature here but the reason for being: a safe engine you cannot trust is useless.

399 pytest tests, 90% coverage. The suite was grown from 234 to 399 tests, coverage raised from 69% to 90%. A dedicated helper assert_error_location(expr, code, start, end) checks not only that an expression fails, but that it fails with the exact error code at the exact character position. The position data is itself part of the test contract.
CI matrix across five Python versions. GitHub Actions runs the full suite on every push and pull request against Python 3.8, 3.9, 3.10, 3.11 and 3.12; the coverage report goes to Codecov. Dead and work-in-progress code is honestly excluded from coverage rather than padding the number.
Clean layering, broken cycles. Clearly separated modules (calculator / utility / cli / plugins); circular imports are resolved via deliberately deferred imports. Every class and function carries a docstring; a standalone DOCUMENTATION.md captures the architecture, the full API, parser internals and the complete error-code catalogue.
Library quality on delivery. Pure-Python wheel, three console entry points, exactly two runtime dependencies (rich, prompt_toolkit). The interactive REPL offers persistent history and tab completion. Six minor releases (0.1.0 → 0.6.7) in roughly five months, following Semantic Versioning throughout.
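A position-pinning test helper of the kind described in the first item could look like the sketch below. Only the signature assert_error_location(expr, code, start, end) and code 3008 come from the text above; toy_calculate is a stand-in for the engine, and reporting the second dot as the error column is an assumption.

```python
import re


class MathError(Exception):
    def __init__(self, message, code, position_start, position_end):
        super().__init__(message)
        self.code = code
        self.position_start = position_start
        self.position_end = position_end


def toy_calculate(expr):
    """Toy stand-in for the engine: only detects a second '.' in a number."""
    for match in re.finditer(r"[\d.]+", expr):
        first_dot = match.group().find(".")
        second_dot = match.group().find(".", first_dot + 1)
        if first_dot != -1 and second_dot != -1:
            column = match.start() + second_dot
            raise MathError("more than one '.' in a number", 3008, column, column + 1)
    return expr


def assert_error_location(expr, code, start, end):
    """Fail unless expr raises MathError with this code at this exact span."""
    try:
        toy_calculate(expr)
    except MathError as error:
        assert error.code == code
        assert (error.position_start, error.position_end) == (start, end)
    else:
        raise AssertionError(f"{expr!r} did not raise a MathError")


# The second '.' in "2.3.4" sits at column 7 of the input.
assert_error_location("1 + 2.3.4", 3008, 7, 8)
```

A test written this way breaks as soon as a refactoring shifts a reported column by one, which is what makes the position data part of the contract.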

The result

Live on PyPI as math-engine, installable via pip install math-engine, MIT-licensed, pure-Python wheel for Python 3.8+, with three console commands out of the box.
eval()-free by construction. Closed input alphabet; the worst case for a hostile input is a typed error, never code execution.
399 tests, 90% coverage, green across five Python versions (3.8–3.12) on every push, with test cases that pin exact error codes to exact character positions.
Exact Decimal arithmetic with adaptive precision (100 … 10,000 digits) and a 20,000-digit input limit: no silent float drift, no silent truncation.
Character-exact diagnostics: 78 error codes across nine families, an eight-class typed exception hierarchy, position_start / position_end on every error.
Roughly 4,200 LOC of production code in cleanly layered modules, backed by ~2,400 LOC of tests, plus full technical documentation and a catalogued error system.