Why I built codesize: enforcing function length limits with an AST

#automation #codequality #showdev #tooling

Every team I have worked on had some version of the rule: "keep functions short." It shows up in style guides, code review comments, and onboarding docs. It almost never shows up in CI.

When it does get automated, the tooling usually reaches for wc -l on the whole file. That is a rough proxy at best. A 300-line file might contain five short, readable functions and a big block of comments. A 150-line file might contain one function that does the work of three. File length and function length are different problems.

With the advent of AI coding tools, these constraints can be even more difficult to enforce.

I built codesize to check the thing that actually matters: how long each function is.

Why existing tools fall short

cloc and similar tools count lines of code, but at the file level. Linters can flag long functions, but they are language-specific — you need a different one for every language in a polyglot repo, each with its own config format and output schema.

What I wanted was a single binary that could scan a mixed-language project and produce a uniform report: which files exceed their limit, which functions exceed their limit, sorted so the worst offenders come first.

Using tree-sitter to find function boundaries

codesize uses tree-sitter to parse each source file and walk the AST to find function boundaries. Instead of counting lines in a text buffer, it counts the lines that belong to each function, from the first line of its signature to the last line of its body.

This handles a number of cases that trip up simpler approaches: blank lines inside a function body, comments interleaved with code, string literals that happen to contain braces, and nested functions. Arrow functions in JavaScript and TypeScript are counted as functions. Constructors in Java are counted. Nested functions in Rust and Python are each counted independently.
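
To make the idea concrete, here is a minimal sketch of measuring function length from a syntax tree. It is not how codesize is implemented: it uses Python's built-in ast module as a stand-in for tree-sitter, it only understands Python source, and function_lengths is a name invented for this example.

```python
import ast


def function_lengths(source: str) -> list[tuple[str, int]]:
    """Return (name, line_count) for every function, nested ones included."""
    tree = ast.parse(source)
    results = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # lineno is the `def` line; end_lineno is the last line of the body.
            results.append((node.name, node.end_lineno - node.lineno + 1))
    return results


SAMPLE = '''\
def outer():
    # a comment line

    def inner():
        return "{ not a real brace }"

    return inner
'''

lengths = dict(function_lengths(SAMPLE))
print(lengths)
```

The blank line, the comment, and the brace inside the string literal do not affect the count, because the boundaries come from the parser rather than from scanning text. In this sketch the span of outer includes the lines of the nested inner.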

Ten languages have built-in grammars: Rust, TypeScript, JavaScript, Python, Go, Java, C, C++, Swift, and Lua. For any other language you can add an extension mapping in config and still get file-level enforcement — you just won't get per-function analysis until a grammar is available.

What the output looks like

Results go to a CSV file (or stdout with --stdout). Six columns:

language	exception	function	codefile	lines	limit
Rust	function	build_report	src/scanner.rs	95	80
Python	file		src/legacy/monolith.py	450	300

The exception column is either file (the whole file is over the limit) or function (a specific function is). For file-level violations, function is empty. Rows are sorted by language, then by line count descending, so the worst offenders are at the top.

This format was a deliberate choice. CSV goes everywhere: spreadsheets, GitHub issue imports, Jira, Linear, a shell pipeline. The intent is not to fail a build on day one — it is to generate a list of violations you can work through over time, treating function length as technical debt to be retired gradually rather than a gate that blocks you immediately.
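
As an example of the kind of downstream processing this enables, the sketch below re-ranks a report by how far each row is over its limit, as a percentage. It assumes the report is comma-separated with the header row shown above; worst_first and the over_pct field are hypothetical names for this example, not part of codesize.

```python
import csv
import io

# Assumed on-disk form of the two sample rows above (comma-separated).
REPORT = """\
language,exception,function,codefile,lines,limit
Rust,function,build_report,src/scanner.rs,95,80
Python,file,,src/legacy/monolith.py,450,300
"""


def worst_first(report_text: str) -> list[dict]:
    """Parse a report and sort rows by percentage over the limit, descending."""
    rows = list(csv.DictReader(io.StringIO(report_text)))
    for row in rows:
        row["lines"] = int(row["lines"])
        row["limit"] = int(row["limit"])
        # Hypothetical derived column: how far over the limit, in percent.
        row["over_pct"] = round(100 * (row["lines"] - row["limit"]) / row["limit"])
    return sorted(rows, key=lambda r: r["over_pct"], reverse=True)


ranked = worst_first(REPORT)
for row in ranked:
    print(row["codefile"], row["function"] or "(whole file)", f'+{row["over_pct"]}%')
```

The same few lines could just as easily feed an issue tracker import or a weekly summary; the point is that nothing here needs to know which language the violation came from.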

Configuration

Limits are per-language and fully configurable in a TOML file at ~/.config/codesize/config.toml:

[limits.Rust]
file     = 500
function = 80

[limits.Python]
function = 50   # leave file limit at the default 300

You can also add languages that have no built-in grammar:

[languages]
".rb" = "Ruby"

[limits.Ruby]
file     = 300
function = 30

When you are onboarding an existing codebase with a lot of violations, the --tolerance flag lets you start with headroom and tighten the limits over time:

# Report only functions more than 20% over the limit
codesize --tolerance 20 --gitignore

Once the backlog is clear, drop the tolerance and the limits become exact.
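
The arithmetic behind a percentage tolerance can be sketched as follows. This is my reading of "more than 20% over the limit", with a strict comparison assumed; is_reported is a hypothetical helper, not a codesize function.

```python
def is_reported(lines: int, limit: int, tolerance_pct: int = 0) -> bool:
    """True when `lines` is more than `tolerance_pct` percent over `limit`.

    Integer arithmetic avoids floating-point rounding at the boundary.
    Assumes a strict comparison: exactly at the threshold is not reported.
    """
    return lines * 100 > limit * (100 + tolerance_pct)


# The 95-line function from the sample report, against a limit of 80:
print(is_reported(95, 80))       # no tolerance: over the limit
print(is_reported(95, 80, 20))   # threshold becomes 96 lines: within headroom
print(is_reported(97, 80, 20))   # past the 96-line threshold
```

With a limit of 80 and a tolerance of 20, the effective threshold is 96 lines, so the 95-line build_report from the sample output would drop off the report until the tolerance is lowered.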

CI integration

For GitHub Actions, there is a companion action that installs and runs codesize with no setup beyond a checkout:

# .github/workflows/codesize.yml
name: Code size check
on: [push, pull_request]
jobs:
  codesize:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: ChrisGVE/codesize-action@v1.0.0

The --fail flag makes codesize exit with status 1 when violations are found, which is what you want for a blocking CI check. Without it, the tool always exits 0 and just writes the report — useful for the gradual rollout approach.

Try it

Install via Homebrew (macOS and Linux):

brew install ChrisGVE/tap/codesize

Or from crates.io:

cargo install codesize

Shell completions for zsh, bash, and fish are included. When installed via Homebrew they are set up automatically; otherwise run codesize init <shell> to generate them.

Source, issues, and the full CLI reference are at github.com/ChrisGVE/codesize. The companion GitHub Action is at ChrisGVE/codesize-action.