git log --pretty is a tiny DSL: building a zero-dep CHANGELOG generator
A ~400-line TypeScript CLI that turns conventional commits into a CHANGELOG.md fragment. No runtime dependencies: the only thing it needs is git on PATH. Along the way, a slightly underappreciated feature of git log: its --pretty=format: string is a tiny DSL that you can safely parse from any language.
Every time I start a new project I hit the same branch point. Do I reach for standard-version? semantic-release? changesets? They all work. They all do more than I want. They create tags, bump versions, publish to npm, open PRs, hook into CI, and drag in a tree of dependencies to do it. What I actually need, 90% of the time, is:
"Give me the markdown for everything between the last tag and HEAD so I can paste it into a GitHub release or a CHANGELOG."
That's it. No version math. No publish. No PR bot. Just the bullet list.
So I wrote changelog-gen. It's one binary, zero runtime npm deps, and about 400 lines of TypeScript. This article is the design walkthrough and the thing I wish I'd understood earlier: git log --pretty=format: is a small DSL, and shelling out to git is almost always cleaner than pulling in a git binding.
📦 GitHub: https://github.com/sen-ltd/changelog-gen
The shape of the problem
Conventional commits (conventionalcommits.org) define a tiny grammar on the commit subject line:
<type>(<optional scope>)<optional !>: <description>
Examples:
feat: add greeting command
fix(parser): handle empty input
feat(api)!: rename v2 endpoints
docs: clarify --since default
The ! marks a breaking change. A body footer like BREAKING CHANGE: ... also marks it as breaking. And then there are footer trailers like Closes #42, Fixes GH-7 that link the commit to an issue.
A changelog generator has exactly three jobs:
- Read the commits in a range from git.
- Parse the conventional grammar on each subject.
- Group them into sections (Features, Bug Fixes, Breaking Changes, …) and render.
None of those are hard. The interesting question is how you do step 1 — and whether you pull in a library or shell out to the git binary.
Why not simple-git / nodegit / isomorphic-git?
A few reasons I talked myself out of each.
- nodegit is native bindings to libgit2. Installs are painful (prebuilt or compile), and it's a tree of deps the minute you npm install it.
- simple-git wraps the git CLI but still pulls in transitive dependencies, and you're paying for abstraction you don't need.
- isomorphic-git is a real JS implementation of git. It's impressive, but it's also hundreds of kilobytes of JavaScript to do something the git binary already does perfectly.
For a CLI whose entire point is "be a small thing you can alias to npm run changelog", adding "dependencies": {...} undermines the pitch. And every user already has git on PATH — if they didn't, they couldn't have made the commits in the first place.
So: shell out.
Interlude: git log --pretty as a DSL
Here's the thing I didn't fully appreciate until I wrote this tool. The --pretty=format: string that git accepts is a real little DSL. You write a format string with %-prefixed tokens, and git emits exactly those fields for each commit, in your chosen order, with whatever literal text you interleave.
The tokens you care about for a changelog:
| Token | Meaning |
|---|---|
| %H | Full commit hash |
| %h | Abbreviated hash |
| %an | Author name |
| %s | Subject (first line) |
| %b | Body (everything after the subject) |
| %cn | Committer name |
| %ci | Committer date, ISO 8601-like (%cI is the strict form) |
So --pretty=format:"%H %s" produces one line per commit with <full-hash> <subject>. Easy.
The snag is that %b can contain newlines. And %s can, in weird repos, contain quote characters. You can't just split on \n to get records, because any multi-line body will break your parser.
The fix is to pick your own separators, ones that basically never appear in real commit messages, and use them to delimit fields and records. I went with two control characters that ASCII itself defines as information separators for exactly this purpose (Unicode inherits them unchanged):
- \x1f (Unit Separator): between fields within a record.
- \x1e (Record Separator): between records.
So my format string becomes:
const FIELD_SEP = '\x1f';
const RECORD_SEP = '\x1e';
const PRETTY_FORMAT =
  `%H${FIELD_SEP}%h${FIELD_SEP}%an${FIELD_SEP}%s${FIELD_SEP}%b${RECORD_SEP}`;
And the invocation is a boring spawn:
import { spawn } from 'node:child_process';

export async function runGitLog(opts: GitLogOptions) {
  // With no --since, log everything reachable from `until`.
  const range = opts.since ? `${opts.since}..${opts.until}` : opts.until;
  const args = [
    'log',
    `--pretty=format:${PRETTY_FORMAT}`,
    '--no-color',
    range,
  ];
  // `run` is a small promise wrapper around spawn that resolves with stdout.
  const stdout = await run('git', args, opts.cwd);
  return { commits: parseLog(stdout) };
}
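The run helper called above is not shown here. A minimal sketch of what such a wrapper could look like, assuming it takes a command, an argument array, and an optional working directory, and resolves with the collected stdout. The name and signature are inferred from the call site, not taken from the repository:

```typescript
import { spawn } from 'node:child_process';

// Hypothetical helper: signature inferred from `run('git', args, opts.cwd)`.
export function run(cmd: string, args: string[], cwd?: string): Promise<string> {
  return new Promise((resolve, reject) => {
    // No shell is involved, so the arguments reach the child process untouched.
    const child = spawn(cmd, args, { cwd, stdio: ['ignore', 'pipe', 'pipe'] });
    const out: Buffer[] = [];
    const err: Buffer[] = [];

    // Collect raw chunks and decode once at the end, so a multi-byte
    // UTF-8 character split across two chunks is not corrupted.
    child.stdout.on('data', (chunk: Buffer) => out.push(chunk));
    child.stderr.on('data', (chunk: Buffer) => err.push(chunk));

    // Fires when the binary cannot be started at all (for example ENOENT).
    child.on('error', reject);

    child.on('close', (code) => {
      if (code === 0) {
        resolve(Buffer.concat(out).toString('utf8'));
      } else {
        const message = Buffer.concat(err).toString('utf8').trim();
        reject(new Error(`${cmd} exited with code ${code}: ${message}`));
      }
    });
  });
}
```

Passing the arguments as an array keeps the format string, control characters included, away from any shell quoting rules.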
Parsing is .split(RECORD_SEP) then .split(FIELD_SEP) on each chunk. No state machine, no stream parser, nothing. The only real footgun is that format: has separator semantics: git log inserts a \n between commits, so every record after the first would start with a stray newline, and the final \x1e leaves an empty trailing chunk. Strip the last separator and split on \x1e plus an optional \n, and both problems go away.
export function parseLog(stdout: string): RawCommit[] {
  // Drop the final record separator (and a trailing newline, if there is one).
  const trimmed = stdout.replace(new RegExp(`${RECORD_SEP}\\n?$`), '');
  if (trimmed.length === 0) return [];
  // git puts a newline between records, so swallow it with the separator.
  const records = trimmed.split(new RegExp(`${RECORD_SEP}\\n?`));
  return records
    .filter((r) => r.length > 0)
    .map((r) => {
      const [hash, shortHash, author, subject, body] = r.split(FIELD_SEP);
      return { hash, shortHash, author, subject, body };
    });
}
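To see why the separators matter, here is a small worked example on a hand-built string that mimics what git would emit for two commits, one of them with a multi-line body. The sample hashes, authors, and messages are invented, and the parser is restated in compact form only so the snippet runs on its own:

```typescript
const FIELD_SEP = '\x1f';
const RECORD_SEP = '\x1e';

interface RawCommit {
  hash: string;
  shortHash: string;
  author: string;
  subject: string;
  body: string;
}

// Compact restatement of the parser above, with defaults for missing fields.
function parseLog(stdout: string): RawCommit[] {
  return stdout
    .split(new RegExp(`${RECORD_SEP}\\n?`))
    .filter((r) => r.length > 0)
    .map((r) => {
      const [hash = '', shortHash = '', author = '', subject = '', body = ''] =
        r.split(FIELD_SEP);
      return { hash, shortHash, author, subject, body };
    });
}

// Invented sample data: two records, with a newline between them.
const first = [
  'aaaa1111',
  'aaaa111',
  'Ada',
  'feat(api)!: rename v2 endpoints',
  'BREAKING CHANGE: /v2/users is now /v2/accounts\n\nCloses #42\n',
].join(FIELD_SEP);
const second = [
  'bbbb2222',
  'bbbb222',
  'Lin',
  'fix(parser): handle empty input',
  '',
].join(FIELD_SEP);
const sample = first + RECORD_SEP + '\n' + second + RECORD_SEP;

// Splitting on newlines tears the first body apart.
console.log(sample.split('\n').length); // 5 pieces for 2 commits

// Splitting on the record separator keeps each commit whole.
const commits = parseLog(sample);
console.log(commits.length); // 2
console.log(JSON.stringify(commits[0]?.body));
```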
That's the entire "git integration" layer. One spawn, one format string, two splits. No deps.
Parsing the conventional grammar
The parser is a single regex on the subject plus a scan of the body for BREAKING CHANGE: and issue references.
const HEADER_RE =
  /^(?<type>[a-zA-Z]+)(?:\((?<scope>[^)]+)\))?(?<bang>!)?: (?<desc>.+)$/;
const BREAKING_RE =
  /(?:^|\n)BREAKING[- ]CHANGE:\s*(.+?)(?:\n\n|$)/s;
const REF_RE =
  /(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?|ref[s]?)\s+([^\s,.]+#\d+|#\d+|GH-\d+)/gi;
That's 80% of the work, and it handles every commit I've ever written. The interesting choices:
- The header regex is anchored and greedy on the description, and the type only admits letters, so the first ": " ends the header and everything after it is the description. That means feat: add colon: to output parses correctly: type=feat, desc=add colon: to output.
- Unknown types fall back to other, not to an error. If someone's commit says wip: half done, we'd rather surface it under "Other" than crash.
- Breaking detection has two paths: the ! in the header or a BREAKING CHANGE: footer in the body. Both are legitimate per the spec and real projects use both.
- Issue references are deduplicated. If a commit says "closes GH-10 and also fixes #10" (weirdly common in squash-merged PRs), both normalize to #10 and get counted once.
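Putting the three regexes and the choices above together, a parse function could look roughly like the sketch below. The ParsedCommit shape, the KNOWN_TYPES list, and the decision to keep the whole subject when the header does not match are assumptions made for illustration, not the tool's confirmed internals:

```typescript
const HEADER_RE =
  /^(?<type>[a-zA-Z]+)(?:\((?<scope>[^)]+)\))?(?<bang>!)?: (?<desc>.+)$/;
const BREAKING_RE = /(?:^|\n)BREAKING[- ]CHANGE:\s*(.+?)(?:\n\n|$)/s;
const REF_RE =
  /(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?|ref[s]?)\s+([^\s,.]+#\d+|#\d+|GH-\d+)/gi;

// Hypothetical result shape.
interface ParsedCommit {
  type: string;
  scope: string | null;
  description: string;
  breaking: boolean;
  breakingNote: string | null;
  refs: string[];
}

// Assumed list of recognised types; anything else becomes "other".
const KNOWN_TYPES = new Set([
  'feat', 'fix', 'docs', 'perf', 'refactor',
  'style', 'test', 'build', 'ci', 'chore', 'revert',
]);

function parseCommit(subject: string, body: string): ParsedCommit {
  const groups = HEADER_RE.exec(subject)?.groups;
  const rawType = groups?.type?.toLowerCase() ?? '';
  const footer = BREAKING_RE.exec(body);

  // A Set deduplicates references once GH-10 has been normalised to #10.
  const refs = new Set<string>();
  for (const match of body.matchAll(REF_RE)) {
    const raw = match[1];
    if (raw) refs.add(raw.replace(/^GH-/i, '#'));
  }

  return {
    type: KNOWN_TYPES.has(rawType) ? rawType : 'other',
    scope: groups?.scope ?? null,
    // Assumption: a non-conventional subject is kept whole.
    description: groups?.desc ?? subject,
    // Either path marks the commit as breaking.
    breaking: groups?.bang === '!' || footer !== null,
    breakingNote: footer?.[1]?.trim() ?? null,
    refs: Array.from(refs),
  };
}

const example = parseCommit(
  'feat(api)!: rename v2 endpoints',
  'closes GH-10 and also fixes #10',
);
console.log(example.type, example.scope, example.breaking, example.refs);
```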
Grouping is where opinion lives
Once you have parsed commits, the grouper decides which types go to which section — and, critically, which types get dropped. This is where I disagree the most with standard-version defaults, so this is where I get to have taste:
const DROPPED_TYPES = new Set(['chore', 'ci', 'build', 'style', 'test', 'refactor']);
A release reader does not care that you bumped a GitHub Action, reformatted with Prettier, or added 30 tests. Those commits are real work and they belong in history, but they don't belong in the changelog. If you don't agree — fine, delete two lines.
The other opinion: breaking commits go in the Breaking Changes section AND in their native section. If feat!: rename v2 endpoints is a breaking change, it's still a feature, and a reader scrolling through "Features" shouldn't miss it because it happens to be breaking. That's one of the annoyances I have with the tools that try to be clever here.
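Both opinions fit in a short grouping function. The sketch below is one way to express them. Only the breaking section id and the section titles named earlier are taken from this article; the other ids, the ordering, and the choice to list a breaking commit even when its type is otherwise dropped are assumptions:

```typescript
interface Commit {
  type: string;
  description: string;
  breaking: boolean;
}

interface Section {
  id: string;
  title: string;
  commits: Commit[];
}

const DROPPED_TYPES = new Set(['chore', 'ci', 'build', 'style', 'test', 'refactor']);

// Assumed section ids, titles, and display order.
const SECTION_ORDER: [string, string][] = [
  ['breaking', 'Breaking Changes'],
  ['feat', 'Features'],
  ['fix', 'Bug Fixes'],
  ['perf', 'Performance'],
  ['docs', 'Documentation'],
  ['other', 'Other'],
];
const NATIVE_IDS = new Set(SECTION_ORDER.map(([id]) => id).filter((id) => id !== 'breaking'));

function groupCommits(commits: Commit[]): Section[] {
  const buckets = new Map<string, Commit[]>();
  const push = (id: string, commit: Commit): void => {
    const list = buckets.get(id) ?? [];
    list.push(commit);
    buckets.set(id, list);
  };

  for (const commit of commits) {
    // Breaking commits are listed under Breaking Changes...
    if (commit.breaking) push('breaking', commit);
    // ...and, unless their type is dropped, in their native section too.
    if (DROPPED_TYPES.has(commit.type)) continue;
    push(NATIVE_IDS.has(commit.type) ? commit.type : 'other', commit);
  }

  // Emit only non-empty sections, in a fixed order, for deterministic output.
  return SECTION_ORDER
    .filter(([id]) => buckets.has(id))
    .map(([id, title]) => ({ id, title, commits: buckets.get(id) ?? [] }));
}

const grouped = groupCommits([
  { type: 'feat', description: 'rename v2 endpoints', breaking: true },
  { type: 'fix', description: 'handle empty input', breaking: false },
  { type: 'chore', description: 'bump action versions', breaking: false },
]);
console.log(grouped.map((s) => `${s.title}: ${s.commits.length}`));
```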
Output formats
Three formatters: markdown, json, plain. The JSON one exists specifically to be piped into further tooling — imagine a release script that wants to pull out the breaking changes and the issue refs separately:
changelog-gen --format json | jq '.sections[] | select(.id == "breaking")'
The plain formatter is for email and chatops where markdown asterisks look like garbage.
Here's the core markdown renderer, roughly:
function formatMarkdown(sections: Section[], opts: FormatOptions): string {
  if (sections.length === 0) {
    return '## Changelog\n\n_No notable changes._\n';
  }
  const out: string[] = ['## Changelog\n'];
  for (const section of sections) {
    out.push(`### ${section.title}\n`);
    for (const c of section.commits) {
      const scope = c.scope ? `**${c.scope}:** ` : '';
      const refs = c.refs.length > 0 ? ' (' + c.refs.join(', ') + ')' : '';
      out.push(`- ${scope}${c.description} (${c.shortHash})${refs}`);
    }
    out.push('');
  }
  return out.join('\n').replace(/\n+$/, '\n');
}
Short hash in parens, scope as a bold prefix, issue refs after, author optional. That's the format I'd write by hand if I were composing a release note, which is the bar: could I paste this into GitHub Releases and not cringe?
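The plain formatter mentioned earlier can follow the same shape with the markdown punctuation removed. This is a sketch only: the exact layout (upper-cased section titles, indented hyphen bullets) and the type names are invented for illustration and may not match what the tool prints:

```typescript
interface PlainCommit {
  scope: string | null;
  description: string;
  shortHash: string;
  refs: string[];
}

interface PlainSection {
  title: string;
  commits: PlainCommit[];
}

// Hypothetical plain-text renderer: no asterisks, no heading markers.
function formatPlain(sections: PlainSection[]): string {
  if (sections.length === 0) {
    return 'Changelog\n\nNo notable changes.\n';
  }
  const out: string[] = ['Changelog', ''];
  for (const section of sections) {
    out.push(section.title.toUpperCase());
    for (const c of section.commits) {
      const scope = c.scope ? `${c.scope}: ` : '';
      const refs = c.refs.length > 0 ? ` (${c.refs.join(', ')})` : '';
      out.push(`  - ${scope}${c.description} (${c.shortHash})${refs}`);
    }
    out.push('');
  }
  // Collapse trailing blank lines into a single final newline.
  return out.join('\n').replace(/\n+$/, '\n');
}

const plain = formatPlain([
  {
    title: 'Features',
    commits: [
      { scope: 'api', description: 'rename v2 endpoints', shortHash: 'abc1234', refs: ['#10'] },
    ],
  },
]);
console.log(plain);
```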
Tradeoffs I'm comfortable with
Things changelog-gen does not do, on purpose:
- No version bump. It does not read or write package.json. If you want SemVer inference from commits, that's semantic-release's specialty, so use it.
- No tag creation. Tag when and how you want. A lot of teams want humans in that loop anyway.
- No Co-Authored-By trailer parsing. Authors come from the commit author, not from trailers. Adding trailer parsing is a 10-line change I'll do if someone opens an issue, but YAGNI until then.
- No streaming for huge ranges. Everything is held in memory. For the "since last tag" ranges this tool is designed for, you'd need a 100k-commit release to notice.
Things it does well:
- Zero runtime deps. The only dependency is the git binary, which is obviously already there.
- Deterministic output. Given the same range, you get bit-identical markdown.
- Tests against real git. The test suite creates temporary git repositories with child_process, makes real commits, and asserts on the real output. No mocked git layer: if git log changes tomorrow, the tests will tell me.
Try it in 30 seconds
docker build -t changelog-gen https://github.com/sen-ltd/changelog-gen.git
docker run --rm -v "$PWD":/work -w /work changelog-gen --since v1.0.0
The image is node:20-alpine with git installed, ~150 MB total. Mount your repo, pass a --since, get markdown on stdout.
Or if you'd rather not deal with Docker:
git clone https://github.com/sen-ltd/changelog-gen
cd changelog-gen && npm install && npm run build
node dist/main.js --help
The meta-point
I've been building a lot of small CLIs recently and the pattern that keeps winning is the same: shell out to the obvious underlying binary, parse its structured output, do the interesting work in your language. The underlying binary (git, tar, jq, ffmpeg) has been battle-tested for decades. Your code gets to be 300 lines instead of 3,000. Users don't need a native build step. You can write honest tests against real data. And on install day, nobody has to wonder whether npm rebuild is going to take four minutes.
git log --pretty is a particularly nice DSL because git has been careful about the format tokens: they're stable across versions, documented in git help log, and portable. If you've only ever used git log for the default pretty output, try git log --pretty=format:'%h %s' once. The whole commit history suddenly becomes structured data you can pipe.
The whole thing is ~400 lines of TypeScript, 38 tests, a 150 MB Docker image, and zero npm runtime dependencies. Alias it to npm run changelog, paste the output into your next release, move on.
