DEV Community: billkhiz

I built a machine-readable UK Chart of Accounts for Python (because one didn't exist)

billkhiz — Wed, 01 Apr 2026 16:19:39 +0000

What it does
Quick example
Why VAT treatments matter
The LLM use case
HMRC box mappings

If you've ever tried to build accounting software for the UK market, you've hit the same wall I did: there's no clean, machine-readable UK Chart of Accounts available on PyPI.

US-centric ones exist. Plenty of them. But UK accounting has different categories, VAT treatments, and HMRC-specific codes that don't map neatly onto American standards.

So rejoice UK accountants, I built one.

What it does

uk-chart-of-accounts is a Python library with 166 standard UK nominal codes - the numbered category system every UK business uses to classify transactions. Each code includes:

Account type with double-entry rules (does a debit increase or decrease this account?)
VAT treatment (standard 20%, reduced 5%, zero-rated, exempt, or outside scope)
HMRC box mappings (CT600, VAT Return, FPS/RTI, EPS, CIS)
Tags for searchable grouping (motor, payroll, premises, etc.)
Descriptions on complex codes explaining the nuances

Zero dependencies. Pure Python. Works with Python 3.10+.

Quick example

from uk_coa import ChartOfAccounts, VatRate

coa = ChartOfAccounts()

# Look up any nominal code
account = coa.get(7602)
account.name          # "Accountancy Fees"
account.vat           # VatRate.STANDARD
account.vat_rate_pct  # 0.20
account.debit_increase  # True

# Search
coa.search("insurance")       # All accounts with "insurance" in the name
coa.by_vat(VatRate.EXEMPT)    # All VAT-exempt accounts
coa.by_tag("motor")           # All motor-related accounts

# Export for LLM prompts
context = coa.to_prompt_context()

Why VAT treatments matter

Getting this wrong means incorrect VAT returns. This library has every code's VAT treatment pre-set, including nuances like residential vs. commercial rent.

Different expenses have different treatments:

Treatment	Rate	Examples
Standard	20%	Most business expenses
Reduced	5%	Domestic energy
Zero-rated	0%	Books, children's clothes
Exempt	-	Insurance, bank charges, Royal Mail postage
Outside scope	-	Wages, taxes, depreciation

This is the part most non-UK developers get wrong. UK VAT isn't just "add 20%".
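To make the table concrete, here is a minimal standalone sketch of splitting a VAT-inclusive amount into net and VAT for each treatment. It does not use the uk-chart-of-accounts API: the `VAT_RATES` dictionary and `split_gross` function are hypothetical names made up for this illustration, and the rounding rule (half up to the penny) is an assumption.

```python
from decimal import Decimal, ROUND_HALF_UP

# Rates from the table above. None means no VAT is charged at all
# (exempt or outside scope), which is not the same as a 0% rate.
VAT_RATES = {
    "standard": Decimal("0.20"),
    "reduced": Decimal("0.05"),
    "zero": Decimal("0.00"),
    "exempt": None,
    "outside_scope": None,
}


def split_gross(gross: Decimal, treatment: str) -> tuple[Decimal, Decimal]:
    """Split a VAT-inclusive amount into (net, vat) for a given treatment."""
    rate = VAT_RATES[treatment]
    if rate is None:
        return gross, Decimal("0.00")
    # Assumption: round the net figure half up to the nearest penny.
    net = (gross / (1 + rate)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return net, gross - net


print(split_gross(Decimal("120.00"), "standard"))  # net 100.00, VAT 20.00
print(split_gross(Decimal("105.00"), "reduced"))   # net 100.00, VAT 5.00
print(split_gross(Decimal("50.00"), "exempt"))     # net 50.00, VAT 0.00
```

Keeping exempt and outside-scope as `None` rather than zero preserves the distinction the table draws: a zero-rated sale still belongs on a VAT return, while an outside-scope item does not.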

The LLM use case

If you're building AI-powered bookkeeping tools, the to_prompt_context() method formats the entire chart as structured text you can inject into an LLM prompt:

# Feed the chart to an LLM for transaction categorisation
from uk_coa import AccountType

context = coa.to_prompt_context(types=[AccountType.OVERHEAD])
prompt = f"""Given this chart of accounts:
{context}

Categorise this transaction: "TESCO 15.40 GBP"
"""

This gives the model the full code structure, names, and VAT treatments without you having to maintain prompt templates.

HMRC box mappings

Each code references the HMRC form and box it feeds into:

corp_tax = coa.get(2110)
corp_tax.hmrc_box  # "CT600 Box 86"

entertainment = coa.get(7403)
entertainment.hmrc_box  # "CT600 Box 46"
entertainment.description
# "VAT on business entertainment is blocked from input tax
#  recovery (HMRC VAT Notice 700/65). Disallowable for
#  corporation tax - must be added back on CT600."

Mappings cover CT600, VAT Return (Boxes 1-9), FPS/RTI, EPS, and CIS returns.

Install

pip install uk-chart-of-accounts

Background

I'm a finance professional who builds AI tools for UK accounting. This library came from extracting the reference data layer of a larger bookkeeping automation project. The codes, VAT treatments, and HMRC mappings are standard public knowledge - I've just packaged them in a way that's actually usable in code.

Star the repo or open a PR

billkhiz-bit / uk-chart-of-accounts

Machine-readable UK Chart of Accounts for Python. 166 nominal codes with VAT treatments and HMRC mappings.

uk-chart-of-accounts

Machine-readable UK Chart of Accounts for Python. 166 standard nominal codes with account types, VAT treatments, and HMRC box mappings.

Install

pip install uk-chart-of-accounts

Quick start

from uk_coa import ChartOfAccounts, AccountType, VatRate
coa = ChartOfAccounts()

# Look up by code
account = coa.get(7602)
account.name          # "Accountancy Fees"
account.type          # AccountType.OVERHEAD
account.vat           # VatRate.STANDARD
account.vat_rate_pct  # 0.20
account.debit_increase  # True (expenses increase on debit side)

# Search
coa.search("insurance")           # All accounts with "insurance" in the name
coa.by_type(AccountType.INCOME)   # All income accounts
coa.by_vat(VatRate.EXEMPT)        # All VAT-exempt accounts
coa.by_tag("motor")               # All motor-related accounts
coa.code_range(7000, 7012)        # Payroll overheads

# Convenience
coa.expenses()

…


I built an AI bookkeeping agent that reached the AWS semifinals from 10,000+ entries

billkhiz — Mon, 30 Mar 2026 14:06:31 +0000

The architecture
The categorisation engine
Few-shot learning that actually improves over time
Handling real-world bank statements
Batched processing with concurrency control
Double-entry done right
What I learned
The numbers

Every month, I sit down with bank statements from multiple clients and manually assign each transaction to the correct nominal code — a process called transaction categorisation.

It takes hours. There are 166 standard UK nominal codes, five VAT rate categories, and endless edge cases. "AMAZON MARKETPLACE" could be office supplies, stock purchases, or a personal expense depending on the client. Multiply that across hundreds of transactions per client, per month, and you start to understand why 75% of CPAs are expected to retire in the next decade with fewer graduates replacing them.

So I built LedgerAgent - an AI-powered bookkeeping agent that categorises bank transactions automatically using Amazon Bedrock. It reached the semifinals of the AWS 10,000 AIdeas competition (top ~1,000 from over 10,000 entries) in the EMEA Commercial Solutions category.

Here's how it works under the hood.

The architecture

Stack: React 19 + Express + 8 AWS services (Bedrock, DynamoDB, S3, SQS, Lambda, API Gateway, EventBridge, Cognito)

The services fit together like this:

Browser (React 19)
    │
    ├── Cognito JWT auth
    │
Express Server (port 3001)
    │
    ├── Amazon Bedrock ──── Claude 3.5 Haiku (categorisation)
    │                       Claude 3.5 Sonnet (receipt OCR)
    ├── DynamoDB ─────────── Client vault (transactions, learned patterns)
    ├── S3 ───────────────── File storage (uploads, receipts, backups)
    ├── SQS ──────────────── Async job queue (large batches)
    │     │
    │     └── Lambda ─────── Serverless batch processor
    │
    ├── API Gateway ──────── REST endpoint for job status
    └── EventBridge ──────── Daily DynamoDB → S3 backup

The frontend is React 19 with Vite and Tailwind. The backend is Express running on Node.js 20. All AI inference runs through Amazon Bedrock - Claude 3.5 Haiku for transaction categorisation (fast and cheap) and Claude 3.5 Sonnet for receipt OCR (multimodal image understanding).

The key design decision was using DynamoDB as a persistent "vault" for each client. Every accounting practice manages multiple clients, and each client has their own transaction history, confirmed categorisations, and learned patterns. DynamoDB's pay-per-request billing keeps this economical, because I'm not paying for idle capacity between categorisation runs.

The categorisation engine

The core of LedgerAgent is the chartOfAccounts.mjs service. It loads two data files at startup:

nominal_codes.json — 166 UK standard accounting codes (from 1001 Fixed Assets through to 9999 Suspense)
global_rules.json — 365 vendor-to-category mapping rules built from my experience coding thousands of real transactions

The system prompt establishes a UK bookkeeper persona with the full code reference. When a transaction comes in, the buildUserMessage function constructs the prompt:

// Conceptual flow — simplified
function buildUserMessage(transaction, confirmedExamples) {
  // 1. Transaction details (date, description, amount)
  // 2. Any previously confirmed categorisations for this client
  //    injected as few-shot examples
  // 3. Request structured JSON response with
  //    account_code, account_name, confidence, reasoning

  // The full prompt includes the 166 UK nominal codes
  // and 365 vendor-to-category rules as system context
}

Bedrock returns a structured JSON response with the nominal code, account name, a confidence level (high, medium, or low), and a reasoning string explaining the decision. The confidence scoring was essential - it tells me which transactions I can trust and which need manual review.
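As a rough sketch of what handling that reply could look like, the snippet below parses and validates the four fields named above. It is written in Python rather than the project's JavaScript, and `Categorisation`, `parse_categorisation`, and `needs_review` are hypothetical names for this illustration, not LedgerAgent's actual code.

```python
import json
from dataclasses import dataclass

VALID_CONFIDENCE = {"high", "medium", "low"}


@dataclass
class Categorisation:
    account_code: int
    account_name: str
    confidence: str
    reasoning: str


def parse_categorisation(raw: str, valid_codes: set[int]) -> Categorisation:
    """Parse the model's JSON reply and reject anything malformed."""
    data = json.loads(raw)
    code = int(data["account_code"])
    if code not in valid_codes:
        raise ValueError(f"unknown nominal code: {code}")
    confidence = str(data["confidence"]).lower()
    if confidence not in VALID_CONFIDENCE:
        raise ValueError(f"unexpected confidence: {confidence!r}")
    return Categorisation(
        code, str(data["account_name"]), confidence, str(data["reasoning"])
    )


def needs_review(result: Categorisation) -> bool:
    """Assumption: only low-confidence results go to manual review."""
    return result.confidence == "low"


reply = (
    '{"account_code": 7502, "account_name": "Stationery & Printing", '
    '"confidence": "high", "reasoning": "Office supplies vendor"}'
)
result = parse_categorisation(reply, valid_codes={7502, 7901})
print(result.account_code, needs_review(result))
```

Checking the returned code against the known chart catches the case where the model invents a code that does not exist.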

Few-shot learning that actually improves over time

This is the part I'm most proud of. When I review a categorisation and confirm it's correct (or manually correct it), that decision gets saved to the client's confirmedExamples array in DynamoDB:

// Conceptual flow — the key insight is per-client learning
// When a user confirms "AMAZON MARKETPLACE → 7502 Stationery",
// that decision is stored against the client in DynamoDB.
//
// Next time we categorise for that client, confirmed examples
// are injected into the prompt as few-shot context.
//
// Max 50 examples per client, deduplicated by description.
// This means a retail client and a tech consultancy categorise
// the same vendor differently — because their confirmed
// examples are different.

The next time I categorise transactions for that same client, those confirmed examples are injected into the Bedrock prompt as few-shot context. The model sees: "Last time you saw AMAZON MARKETPLACE for this client, it was coded to 7502 Stationery & Printing."

This creates a per-client learning loop. A retail client's Amazon purchases get categorised differently from a tech consultancy's Amazon purchases - because the confirmed examples are client-specific. After confirming 20-30 transactions, accuracy jumps noticeably because the model has real context about how this particular business operates.
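The bookkeeping behind that loop can be sketched in a few lines. This is a Python illustration rather than the project's JavaScript. The 50-example cap and deduplication by description come from the description above, while the function names, the case-insensitive match, and dropping the oldest example first are assumptions.

```python
MAX_EXAMPLES = 50  # cap stated above


def add_confirmed_example(
    examples: list[dict], description: str, code: int, name: str
) -> list[dict]:
    """Return a new list with this confirmation added, newest last."""
    key = description.strip().upper()
    # Deduplicate by description: a newer confirmation replaces the older one.
    kept = [e for e in examples if e["description"].strip().upper() != key]
    kept.append(
        {"description": description.strip(), "account_code": code, "account_name": name}
    )
    # Assumption: when over the cap, the oldest examples are dropped first.
    return kept[-MAX_EXAMPLES:]


def render_few_shot(examples: list[dict]) -> str:
    """Format confirmed examples as few-shot lines for the prompt."""
    lines = [
        f'"{e["description"]}" -> {e["account_code"]} {e["account_name"]}'
        for e in examples
    ]
    return "Previously confirmed for this client:\n" + "\n".join(lines)


examples = add_confirmed_example([], "AMAZON MARKETPLACE", 7502, "Stationery & Printing")
print(render_few_shot(examples))
```

Because each client has its own list, the same vendor description can map to different codes for different clients without any conflict.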

Handling real-world bank statements

UK bank CSVs are a mess. Every bank uses different column names, different date formats, and different ways of representing debits and credits. The csvParser.mjs service handles this with intelligent column detection:

// Simplified from csvParser.mjs
function detectColumns(headers) {
  const map = {};
  headers.forEach((h, i) => {
    const lower = h.toLowerCase().trim();
    // "date" alone already covers "trans date" and "value date"
    if (/date|posted/.test(lower)) map.date = i;
    if (/description|narrative|details|memo|payee/.test(lower)) map.desc = i;
    if (/^(amount|value|sum|total)$/.test(lower)) map.amount = i;
    // \b keeps "dr" and "cr" from matching inside other words:
    // an unanchored /cr/ also matches "description"
    if (/debit|\bdr\b|money.*out|paid.*out/.test(lower)) map.debit = i;
    if (/credit|\bcr\b|money.*in|paid.*in/.test(lower)) map.credit = i;
  });
  return map;
}

It handles three different amount formats: a single amount column (negative for debits), separate debit and credit columns, and amounts with pound signs and comma formatting. This means I can upload a Lloyds statement, a Barclays statement, and an HSBC statement without any manual configuration.
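The three amount formats can be normalised into one signed value. The sketch below shows one way to do it, in Python rather than the project's JavaScript. The `parse_money` and `signed_amount` helpers and the row keys (`amount`, `debit`, `credit`) are hypothetical, and formats such as bracketed negatives are not handled.

```python
from decimal import Decimal


def parse_money(text: str) -> Decimal:
    """Turn '£1,234.56' or '-45.00' into a Decimal; blank cells become 0."""
    cleaned = text.replace("£", "").replace(",", "").strip()
    return Decimal(cleaned) if cleaned else Decimal("0")


def signed_amount(row: dict) -> Decimal:
    """Return one signed amount: negative for money out, positive for money in."""
    if row.get("amount", "").strip():
        # Single amount column, already signed (negative for debits).
        return parse_money(row["amount"])
    # Separate debit and credit columns: credits count in, debits count out.
    return parse_money(row.get("credit", "")) - parse_money(row.get("debit", ""))


print(signed_amount({"amount": "-£45.00"}))
print(signed_amount({"debit": "£1,234.56", "credit": ""}))
print(signed_amount({"debit": "", "credit": "250.00"}))
```

Using `Decimal` rather than floats avoids rounding drift when hundreds of amounts are later summed for a trial balance.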

Batched processing with concurrency control

For large bank statements (100+ transactions), hitting Bedrock sequentially would take minutes. LedgerAgent uses a parallel worker pool with concurrency of 3:

// Conceptual flow — concurrency-controlled batch processing
// Transactions are processed in parallel chunks (concurrency of 3)
// to balance speed against Bedrock rate limits.
//
// For batches over 100 transactions, the async pipeline kicks in:
// Express → SQS queue → Lambda picks up job → Bedrock AI → DynamoDB
// Frontend polls API Gateway for completion status.

For even larger batches, the async pipeline kicks in - transactions get sent to SQS, picked up by a Lambda function, processed against Bedrock, and results are written back to DynamoDB. The frontend polls for completion via API Gateway.
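A worker pool capped at three concurrent calls can be sketched with a semaphore. This is a Python `asyncio` illustration of the idea, not the project's Node.js implementation. `categorise_all` and `fake_categorise` are hypothetical names, and the fake stands in for a real Bedrock call.

```python
import asyncio


async def categorise_all(transactions, categorise, concurrency=3):
    """Run `categorise` over all transactions, at most `concurrency` at a time."""
    semaphore = asyncio.Semaphore(concurrency)

    async def worker(tx):
        async with semaphore:
            return await categorise(tx)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(worker(tx) for tx in transactions))


async def fake_categorise(tx):
    """Stand-in for a model call; sleeps briefly and returns a fixed code."""
    await asyncio.sleep(0.01)
    return {"description": tx, "account_code": 7502}


results = asyncio.run(categorise_all(["A", "B", "C", "D"], fake_categorise))
print(len(results))
```

The cap trades throughput for staying under provider rate limits, so raising it only helps if the account's quota allows it.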

Double-entry done right

One thing that surprised me during development: most "AI bookkeeping" demos I've seen online produce a single-entry list of categorised transactions. That's not bookkeeping - it's just labelling. Real bookkeeping requires double-entry, where every transaction creates two ledger entries that must balance.

In LedgerAgent, the bank account (nominal code 1200) acts as the contra account for every transaction:

Transaction type	Bank account (1200)	Categorised account
Money out	Credit	Debit
Money in	Debit	Credit

The trial balance splits automatically at the code 4000 boundary - codes below 4000 go on the Balance Sheet (assets, liabilities, equity), codes 4000 and above go on the Profit & Loss (income, expenses). Total debits must always equal total credits.

This sounds basic to anyone with accounting training, but getting an AI system to consistently produce balanced double-entry output required careful prompt engineering and validation logic.
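A minimal sketch of that validation, assuming signed amounts (negative for money out) and using Python rather than the project's JavaScript, might look like this. The function names are hypothetical. The bank code 1200 and the 4000 boundary come from the text above.

```python
from decimal import Decimal

BANK_CODE = 1200     # bank account acts as the contra account
PNL_BOUNDARY = 4000  # below: Balance Sheet; at or above: Profit & Loss


def post(amount: Decimal, account_code: int) -> list[dict]:
    """Create the two ledger lines for one bank transaction.

    `amount` is signed: negative for money out, positive for money in.
    """
    value = abs(amount)
    zero = Decimal("0")
    if amount < 0:  # money out: credit the bank, debit the categorised account
        return [
            {"code": BANK_CODE, "debit": zero, "credit": value},
            {"code": account_code, "debit": value, "credit": zero},
        ]
    # money in: debit the bank, credit the categorised account
    return [
        {"code": BANK_CODE, "debit": value, "credit": zero},
        {"code": account_code, "debit": zero, "credit": value},
    ]


def is_balanced(ledger: list[dict]) -> bool:
    """Total debits must equal total credits."""
    return sum(e["debit"] for e in ledger) == sum(e["credit"] for e in ledger)


def statement_for(code: int) -> str:
    return "Balance Sheet" if code < PNL_BOUNDARY else "Profit & Loss"


ledger = post(Decimal("-15.40"), 7502) + post(Decimal("200.00"), 4000)
print(is_balanced(ledger), statement_for(1200), statement_for(7502))
```

Because each transaction is posted as a matched pair, the ledger balances by construction, and `is_balanced` acts as a cheap guard against entries added any other way.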

What I learned

Key takeaways:

Domain knowledge is the moat - not the AI wrapper
Few-shot learning beats fine-tuning when per-client variation is high
Confidence scoring changes the entire review workflow

Domain knowledge is the moat. The 166 nominal codes, 365 vendor rules, VAT rate handling, and double-entry logic aren't things you can prompt-engineer from scratch. They come from years of sitting with bank statements. Any developer can connect to Bedrock — few can tell you that a Deliveroo transaction for a sole trader should be coded to 7901 (Staff Welfare) not 7400 (Travel & Subsistence) unless it was a client entertainment expense, in which case it's 7601 (Entertaining).

Few-shot learning beats fine-tuning for this use case. I considered fine-tuning a model on accounting data, but the per-client variation is too high. A retail business and a tech consultancy categorise the same vendors completely differently. Dynamic few-shot context from confirmed examples handles this naturally.

Confidence scoring changes the workflow. Without confidence scores, you'd have to review every single categorisation. With them, I can filter to "low confidence" transactions and review only the 10-15% that genuinely need human judgement. The rest can be confirmed in bulk.

The numbers

166 UK nominal codes mapped
365 vendor-to-category rules
5,860 lines of code across 39 source files
8 AWS services integrated
Top ~1,000 from 10,000+ entries in AWS AIdeas

LedgerAgent is currently a tool I use for my own practice, but I'm planning to open it up to other small accountancy firms. If you're an accountant drowning in manual transaction categorisation, or a developer building fintech tools, I'd like to hear from you.

Connect with me on X/Twitter to discuss AI in Fintech!

If you're interested in the code or want to connect, check out the repository and my profile:

Check out my GitHub Profile

Built with React 19, Express, Amazon Bedrock (Claude 3.5 Haiku + Sonnet), DynamoDB, S3, SQS, Lambda, API Gateway, EventBridge, and Cognito.

I Applied Anthropic's Internal Skills Playbook to My Projects - Here's What Changed

billkhiz — Wed, 18 Mar 2026 17:25:36 +0000

@trq212 recently published "Lessons from Building Claude Code: How We Use Skills", Anthropic's internal playbook for how they build and use Skills in Claude Code.

I don't come from a software engineering background. I work in accounting, and I use Claude Code across projects ranging from bookkeeping automation and AWS Lambda (which I used for the AWS 10,000 AIdeas competition) to backends and mobile applications (I was very excited to publish my first application to the Google and iOS stores :D). Skills sounded useful, but I never knew how to implement or structure them properly.

I spent an afternoon applying every recommendation from the article to my own setup across 5 active projects in Claude Code. The following is what I built, what worked, and what I recommend you use for your own projects.

Let's start with this: Skills are "just markdown files"

Surprisingly, they're not. A skill is a folder. It can contain scripts, config files, reference data, templates, anything Claude might need.

Here's what my AWS debugging skill now looks like:

aws-debug/
    SKILL.md: main instructions
    config.json: which AWS profile to use per project
    references/
        services.md: maps my projects to Lambda functions, log groups, common errors

The key thing here: Claude sees the SKILL.md when I say "check the logs." But it only opens references/services.md when it actually needs to look up a specific log group or service mapping. So it's only loaded when relevant.

The 6 skills that changed my workflow:

/gotcha, the skill that improves all other skills

This one is my favourite. Every time Claude makes a mistake, I can type:

/gotcha Claude forgot --profile flightmap

And it figures out which skill that belongs to, opens the file, and adds it to the Gotchas section. It's quick and simple.

The original article makes the point that the Gotchas section is the most valuable part of any skill. I completely agree. The problem is that nobody goes back and updates their skills after writing them. /gotcha fixes that.

/careful, on-demand production safety

An on-demand hook that blocks destructive commands (rm -rf, DROP TABLE, git push --force, AWS deletes) for the current session only.

As I mentioned, I come from a non-technical background, so this is incredibly useful. It's not ideal to have on all the time, but it's something you should switch on before pushing anything live or going anywhere near production.
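The core of such a guard is a check on the command text. The sketch below is a guess at what that check might look like, not the actual skill: the patterns are illustrative and far from exhaustive, `is_destructive` is a hypothetical name, and how the check is wired into a session hook is not shown.

```python
import re

# Illustrative patterns only; a real list would need to be broader.
DESTRUCTIVE_PATTERNS = [
    r"\brm\s+-[a-zA-Z]*r[a-zA-Z]*f|\brm\s+-[a-zA-Z]*f[a-zA-Z]*r",  # rm -rf / rm -fr
    r"\bdrop\s+table\b",                                           # SQL DROP TABLE
    r"\bgit\s+push\b.*(--force|-f\b)",                             # forced push
    r"\baws\s.*\b(delete|terminate)-[\w-]+",                       # aws ... delete-*
    r"\baws\s.*\bs3\s+(rm|rb)\b",                                  # aws s3 rm / rb
]


def is_destructive(command: str) -> bool:
    """Return True if the command matches any known destructive pattern."""
    return any(re.search(p, command, re.IGNORECASE) for p in DESTRUCTIVE_PATTERNS)


for cmd in ["rm -rf build/", "git push origin main", "DROP TABLE users;"]:
    print(cmd, "->", "BLOCK" if is_destructive(cmd) else "allow")
```

A pattern list like this errs on the side of blocking, which suits a guard that is only switched on for risky sessions.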

/aws-debug, a debugging runbook

Instead of me manually going into CloudWatch every time something breaks, this skill walks Claude through a proper investigation. It checks the logs, looks for cold start timeouts, missing env vars, and permission errors, and then writes up a structured report.

I put a config.json in there with my AWS profiles for each project. That way Claude never forgets which --profile flag to use. That was literally one of the first gotchas I wrote into the skill because it kept getting it wrong.

/bookkeeping-verify, domain-specific verification

This is where it gets interesting for non-engineers. I categorise bank transactions for accounting clients, hundreds of them per company, into Sage 50 nominal codes. After I'm done categorising, this skill runs through everything and checks: did I miss any? Is the same payee showing up under different categories? Any duplicates? Any codes that don't exist?

There's a reference file in there (references/categories.md) with all the valid categories and a table of common mistakes. Things like PayPal fees ending up under General Expenses when they should be Bank Charges, or HMRC payments going to the wrong nominal code or account. Claude reads that file during verification but it's not loaded the rest of the time.
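The four checks listed above (missed transactions, inconsistent payees, duplicates, non-existent codes) are mechanical enough to sketch in code. The skill itself is a markdown checklist that Claude follows, so this Python version is only an illustration of the same logic. The record fields (`date`, `payee`, `amount`, `code`) and the sample codes are assumptions.

```python
from collections import Counter, defaultdict


def verify(transactions: list[dict], valid_codes: set[int]) -> dict[str, list]:
    """Run the four verification checks and return the findings per check."""
    uncategorised = [t for t in transactions if t.get("code") is None]
    invalid = [
        t for t in transactions
        if t.get("code") is not None and t["code"] not in valid_codes
    ]

    # Same payee coded to more than one category (case-insensitive match).
    codes_by_payee = defaultdict(set)
    for t in transactions:
        if t.get("code") is not None:
            codes_by_payee[t["payee"].strip().upper()].add(t["code"])
    inconsistent = sorted(p for p, codes in codes_by_payee.items() if len(codes) > 1)

    # Assumption: identical date, payee, and amount flags a likely duplicate.
    counts = Counter((t["date"], t["payee"], t["amount"]) for t in transactions)
    duplicates = [key for key, n in counts.items() if n > 1]

    return {
        "uncategorised": uncategorised,
        "invalid_codes": invalid,
        "inconsistent_payees": inconsistent,
        "duplicates": duplicates,
    }


sample = [
    {"date": "2026-03-01", "payee": "PayPal", "amount": "-1.20", "code": 7901},
    {"date": "2026-03-02", "payee": "PAYPAL", "amount": "-0.80", "code": 7502},
    {"date": "2026-03-03", "payee": "Tesco", "amount": "-15.40", "code": None},
    {"date": "2026-03-03", "payee": "Tesco", "amount": "-15.40", "code": None},
    {"date": "2026-03-04", "payee": "HMRC", "amount": "-500.00", "code": 1234},
]
report = verify(sample, valid_codes={7502, 7901, 2110})
print({name: len(found) for name, found in report.items()})
```

Flagged duplicates still need a human look, since two genuine purchases can share a date, payee, and amount.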

This is a good example of skills going beyond pure software engineering. If your work has any kind of quality checklist, you can turn it into a skill.

/explain, making code accessible

I built this one because I'm not from a technical background (have I mentioned this before?) and sometimes I want Claude to explain what a piece of code does in plain English, either using analogies or starting with the big picture before the details.

If you're learning as you go or working with code outside your comfort zone, something like this is worth building.

Enhanced /preflight with progressive disclosure

I already had a preflight skill that runs before every commit. But it was a single file trying to cover security, frontend, backend, and accessibility. Now it looks like:

preflight/
    SKILL.md: the main orchestrator (reads sub-files as needed)
    checks/
        security.md: detailed security checklist + project-specific notes
        frontend.md: accessibility, performance, design consistency
        backend.md: API safety, Lambda gotchas, Python patterns

Claude reads the main file and then only pulls in the relevant checks file based on what kind of project it's in. Working on a React app? It reads frontend.md. Working on a Lambda backend? It reads backend.md. The rest stays out of the way.

A few things I have since learned:

Write descriptions for Claude, not for you

The description field at the top of your SKILL.md isn't a summary. It's what Claude uses to decide whether to trigger the skill. So write it like a trigger condition.

Bad: "AWS debugging tool"

Good: "Debug AWS Lambda errors, API Gateway issues, or CloudWatch anomalies. Use when the user reports a Lambda failure, 5xx error, timeout, or says 'check the logs'"

Wire skills into your project CLAUDE.md

The skill knows how to do something. Your project's CLAUDE.md tells Claude when to do it.

In my bookkeeping project I added:
Run /bookkeeping-verify after categorising any company

In my AWS projects:
Run /careful before touching live AWS resources
Use /aws-debug when Lambda errors occur

The skills work in any project, but each project's CLAUDE.md gives them context.

Use config files for stuff that changes per project

If your skill needs something specific to you or your project (AWS profile, a channel name, a database), put a config.json in the skill folder. Claude reads it when the skill runs.

Global skills vs project overrides

I put reusable skills in ~/.claude/skills/ so they're available everywhere. Some projects also have their own skills in .claude/skills/ that override the global ones. Right now I've got 13 global skills and 6 project-level overrides across 4 projects.

If you're starting from scratch

Start with two skills: /preflight and /gotcha. Preflight stops you committing broken code. Gotcha captures mistakes so they don't happen twice. Build everything else from there.

Don't try to write a perfect skill on day one. Anthropic's own team says their best skills started as a few lines and one gotcha, and got better over time because people kept adding to them. That tracks with my experience too.

Use folders. The moment your skill has more than a couple of sections of reference material, split it into sub-files. Claude only reads what it needs.

Don't write stuff Claude already knows. Focus on things that are specific to you: your project requirements, your conventions, the edge cases that keep tripping you up.

Always launch Claude from your project directory. I know this sounds obvious and, to be honest, I still forget to do this myself. When Claude starts in the right place, it picks up your project CLAUDE.md and all your project-level skills. Start from the wrong directory and none of that loads.

And if you're not an engineer, that's fine. If your work has repeatable processes or domain knowledge that Claude doesn't have by default, it works the same way. My bookkeeping verification skill has nothing to do with code. It's just a quality checklist for accounting data.

How long did this take?

One afternoon to set everything up. After that it's just /gotcha whenever something goes wrong. The skills keep getting better on their own.

My full setup

~/.claude/skills/ (13 global skills)

audit: weekly codebase audit
aws-debug: AWS debugging runbook (+config, +references)
bookkeeping-verify: transaction verification (+references)
careful: on-demand prod safety hook
docs: documentation generation
explain: plain English code explainer
gotcha: capture mistakes into skill gotchas
migrate: dependency upgrades
perf: performance analysis
preflight: pre-commit gates (+checks/ sub-files)
refactor: code cleanup
release: generic release pipeline
test: test generation

Plus 6 project-level overrides across 4 projects for specific release and preflight workflows.

Thank you very much to @trq212 for publishing the original article. Went from "I should probably sort out my skills at some point" to actually having a setup I am very happy with (for now :D).

What skills have you built? I would be keen to see what others are doing with this.

Building Heritage Keeper: A Gemini Live Agent for Family Story Preservation

billkhiz — Mon, 16 Mar 2026 21:36:00 +0000

How I used the Gemini Live API with native audio, function calling, and Google Search grounding to build an AI agent that turns family conversations into illustrated timelines.

This article was created for the purposes of entering the Gemini Live Agent Challenge hackathon. #GeminiLiveAgentChallenge

The Problem

My grandmother came to London from Jamaica in the 1950s. She had stories about Brixton in the Windrush era, about what things cost, about neighbours and churches and dance halls. Most of those stories were never written down.

This is true for nearly every family. The stories exist in the memories of older generations - rich, vivid, emotional - but they're never preserved. Traditional approaches (family tree software, memoir-writing tools) feel like work. They require forms, dates, and data entry. Nobody wants to do that.

What if preserving family history was as easy as having a conversation?

The Solution: Heritage Keeper

Heritage Keeper is a voice-first AI agent built on the Gemini Live API. You simply talk about your memories, and the agent:

Listens via real-time audio streaming
Extracts names, dates, places, and relationships
Saves each memory as a timeline entry
Finds historical photographs from Wikimedia Commons
Builds a family tree from the people you mention
Grounds historical facts using Google Search
Adds context - cost of living, daily life, world events

No forms. No data entry. Just talk.

Architecture

The browser captures microphone audio as PCM 16-bit at 16kHz and streams it over a WebSocket to an Express server on Google Cloud Run. The server maintains a bidirectional session with the Gemini Live API using the Google GenAI SDK. Gemini responds with native audio (24kHz) and function calls.

The flow looks like this:

Browser (React 19) communicates via WebSocket with PCM audio and JSON messages to the Express Server on Cloud Run, which connects to the Gemini Live API (gemini-2.5-flash-native-audio). The agent has access to 5 function-calling tools, Google Search grounding, and the Wikimedia Commons API.

The Five Tools

I designed five function-calling tools that the agent uses autonomously:

save_story - Extracts year, title, summary, location, Then/Now descriptions, cost of living, daily life, events, and photo search queries
search_photos - Queries Wikimedia Commons for historical photographs with bitmap-only filtering
add_family_member - Adds a person to the family tree with generation number and relationship
get_family_tree - Retrieves the current tree (so the agent knows who's already been mentioned)
get_timeline - Retrieves saved stories (so the agent can reference previous memories)

The agent decides when to call each tool based on the conversation. When you say "my grandmother came to London in 1955", it calls save_story AND add_family_member AND search_photos - all autonomously.

Google Search Grounding

One of the most impactful additions was enabling Google Search grounding alongside function calling. This means when the agent generates historical facts about 1950s Brixton, it can verify them against Google Search results. The grounding sources are stored per story and displayed as clickable links - so users can verify the facts themselves.

This transforms AI-generated context from "maybe true" to "verifiably true."

Lessons Learned

1. Thought Parts Need Filtering

The gemini-2.5-flash-native-audio model includes internal reasoning ("thought" parts) in its responses. Without filtering, users see the model's chain-of-thought ("Interpreting 'Funny Bob'... I'm hesitant to categorise this..."). The fix was checking each response part and only forwarding actual responses, not internal reasoning. A small code change with massive UX impact.
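The filter itself is small. Here is a sketch in Python using plain dictionaries as stand-ins for the SDK's response parts. It assumes each part carries a boolean `thought` flag alongside its `text`, and `visible_text` is a hypothetical helper name rather than the project's code.

```python
def visible_text(parts: list[dict]) -> str:
    """Join the text of parts that are not flagged as internal reasoning."""
    return "".join(
        p["text"]
        for p in parts
        if p.get("text") and not p.get("thought", False)
    )


parts = [
    {"text": "Interpreting 'Funny Bob'...", "thought": True},
    {"text": "I've saved that memory about Bob."},
]
print(visible_text(parts))
```

Treating a missing flag as "not a thought" keeps ordinary responses flowing if the model omits the field.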

2. The Message Format Varies

The SDK's onmessage callback can pass different message formats - a LiveServerMessage, a MessageEvent, or even a JSON string. My parser needed to handle all three cases, with a graceful fallback for raw audio binary data that would otherwise crash the JSON parser.
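The same defensive parsing can be illustrated in Python, although the real handler runs in JavaScript. The sketch assumes three input shapes: an already-parsed object, an event-style wrapper with a `data` attribute, and a JSON string. Raw bytes are passed over rather than parsed. `normalise_message` is a hypothetical name.

```python
import json


def normalise_message(message):
    """Return a dict for JSON-like messages, or None for raw audio or junk."""
    if isinstance(message, (bytes, bytearray)):
        return None  # raw audio: never hand this to the JSON parser
    if isinstance(message, dict):
        return message  # already parsed
    data = getattr(message, "data", None)  # event-style wrapper
    if data is not None:
        return normalise_message(data)
    if isinstance(message, str):
        try:
            parsed = json.loads(message)
        except json.JSONDecodeError:
            return None
        return parsed if isinstance(parsed, dict) else None
    return None


print(normalise_message('{"type": "text", "value": "hello"}'))
print(normalise_message(b"\x00\x01\x02"))
```

Returning `None` instead of raising lets the caller route unparseable payloads to the audio path without a try/except at every call site.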

3. Cost of Living > Music Trivia

I initially included "popular music" and "film/TV" as cultural context. But for family heritage, knowing that "a house cost £2,500 and the weekly wage was £15" is far more powerful than knowing what song was number one. It grounds the story in lived reality.

4. Auto-Reconnect Is Essential for Live APIs

WebSocket connections to the Gemini Live API can drop (Cloud Run timeouts, network blips). Exponential backoff reconnection (1s, 2s, 4s) keeps the experience seamless.
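The 1s, 2s, 4s schedule can be expressed as a small generator plus a retry loop. This is a Python sketch of the pattern rather than the project's code. `connect` is any callable that raises `ConnectionError` on failure, and giving up after three retries is an assumption.

```python
import time


def backoff_delays(base: float = 1.0, factor: float = 2.0, attempts: int = 3):
    """Yield exponentially growing delays: 1s, 2s, 4s by default."""
    for attempt in range(attempts):
        yield base * factor ** attempt


def connect_with_retry(connect, sleep=time.sleep, retries: int = 3):
    """Try once immediately, then once after each backoff delay."""
    last_error = None
    for delay in [None, *backoff_delays(attempts=retries)]:
        if delay is not None:
            sleep(delay)
        try:
            return connect()
        except ConnectionError as exc:
            last_error = exc
    raise last_error  # every attempt failed


print(list(backoff_delays()))
```

Passing `sleep` as a parameter keeps the loop testable without real waiting. Adding random jitter to each delay is a common refinement when many clients reconnect at once.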

5. Voice Commands for Family Trees

Users want to build family trees quickly by voice - "Bob is my father", "Elena is Bob's mother." The agent needed specific instructions to handle these short commands with just an add_family_member call, without trying to create a full story entry.

What's Next

Heritage Keeper is a prototype built for the Gemini Live Agent Challenge. The natural evolution is:

User accounts with Firestore persistence
Family collaboration - multiple members contributing to the same timeline
Genealogy API integration for data enrichment
Mobile app for recording stories on the go

The core insight remains: the best tool for preserving family history is a good conversation partner.

Heritage Keeper was built for the Gemini Live Agent Challenge 2026. Try it at heritage-keeper-87502328327.us-central1.run.app. View the source on GitHub.
