Most code formatters want to own your style. They have opinions about brace placement, line length, trailing commas, and a hundred other things you never asked for. Sometimes you just want the indentation fixed. Tabs consistent, nesting correct, content untouched.
That was the goal for Eshu, and it is exactly what the tool does. It reads source code line by line, tracks nesting depth through a state machine, rewrites the leading whitespace, and leaves everything else alone. The entire engine is written in C, exposed to Perl through XS, and ships with a CLI that can check, diff, or fix files in place. The distribution also includes a Vim plugin, since Vim is my editor of choice.
Eight languages, one tool
C | Perl | XS | XML | HTML | CSS | JavaScript | POD
Each language gets its own scanner with its own state machine. They share a common architecture (track depth, emit indentation, advance), but each state machine handles the constructs that make its language awkward to indent.
Perl has heredocs, quoted constructs (qw(), qq{}, s///), and embedded POD. JavaScript has template literals with nested ${} interpolation. XS files switch between C and Perl conventions at the MODULE = boundary. HTML has void elements and verbatim zones inside <pre> and <script>. Each of these needs specific handling, and getting any of them wrong means corrupting content visually.
Why C?
Indentation fixing is embarrassingly linear. You read a line, update state, emit the line with new leading whitespace, repeat. There is no tree to build, no AST to walk, no multipass resolution. A single-pass scanner in C processes a large codebase in milliseconds.
The engine is implemented entirely in standalone C header files -- ten of them, roughly 3,600 lines total. They have no Perl dependencies. No SV*, no croak(), no interpreter context. Just stdlib.h, string.h, and ctype.h. This means they can be reused from any C program or language that can bind them, not just my Perl XS modules.
include/
    eshu.h         Core types, config, buffer, language enum
    eshu_c.h       C scanner
    eshu_pl.h      Perl scanner (heredoc, regex, qw, POD)
    eshu_xs.h      XS dual-mode scanner
    eshu_xml.h     XML/HTML scanner
    eshu_css.h     CSS scanner
    eshu_js.h      JavaScript scanner (template literals)
    eshu_pod.h     POD scanner
    eshu_file.h    File I/O, directory walking, binary detection
    eshu_diff.h    Unified diff generation
How the scanner works
Every language scanner follows the same pattern. For each line of input:
pre-adjust depth --> emit indent --> copy content --> post-adjust depth
A closing brace on a line means you dedent before emitting that line. An opening brace means you indent the next line. The scanner maintains a state enum to know whether it is inside a string, a comment, a heredoc, a regex, or regular code. State transitions happen character by character within the scan function; depth changes happen at line boundaries.
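To make the four steps concrete, here is a minimal sketch of the per-line loop for a brace-delimited language. It is not Eshu's code: reindent_braces is a hypothetical name, it always indents with tabs, and it counts every brace it sees, so it has none of the string, comment, or heredoc state that the real scanners carry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Eshu: reindent brace-nested text with
 * one tab per level. Every { and } counts, so braces inside strings or
 * comments would confuse it; the real scanners track state to avoid that. */
static size_t reindent_braces(const char *src, char *out, size_t cap)
{
    size_t n = 0;
    int depth = 0;

    assert(src != NULL && out != NULL && cap > 0);

    while (*src != '\0') {
        const char *eol = strchr(src, '\n');
        const char *end = eol ? eol : src + strlen(src);
        const char *p = src;
        const char *q;
        int line_depth, i;

        /* Drop the existing leading whitespace. */
        while (p < end && (*p == ' ' || *p == '\t')) p++;

        /* Pre-adjust: a line that starts with a closer is dedented itself. */
        line_depth = depth;
        if (p < end && *p == '}' && line_depth > 0) line_depth--;

        /* Emit indent (none for blank lines), then copy content untouched. */
        if (p < end)
            for (i = 0; i < line_depth && n + 1 < cap; i++) out[n++] = '\t';
        for (q = p; q < end && n + 1 < cap; q++) out[n++] = *q;
        if (eol != NULL && n + 1 < cap) out[n++] = '\n';

        /* Post-adjust: braces on this line set the depth of the next one. */
        for (q = p; q < end; q++) {
            if (*q == '{') depth++;
            else if (*q == '}' && depth > 0) depth--;
        }

        src = eol ? eol + 1 : end;
    }
    out[n] = '\0';
    return n;
}
```

Feeding the output back through the same function should return it unchanged, which is the property the idempotency tests rely on.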
The Perl scanner, for example, tracks 14 distinct states, including regular code, double-quoted strings, single-quoted strings, regex, heredoc (both standard and indented), qw, qq, q, POD, line comments, and block comments. It detects heredoc terminators, remembers whether the variant is indented (<<~EOF), buffers the body verbatim, and resumes normal scanning after the terminator.
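The heredoc handling can be sketched roughly as two small checks: one that spots an opener and records its tag and variant, and one that recognises the terminator line. Both function names are hypothetical, and the sketch makes simplifying assumptions: it only looks at the first << on a line, and it does not know whether that << sits inside a string or comment.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: find a heredoc opener such as <<EOT, <<~EOT,
 * <<"EOT" or <<'EOT'. Returns 1 and fills tag/indented on success. */
static int heredoc_tag(const char *line, char *tag, size_t cap, int *indented)
{
    const char *p;
    char quote = 0;
    size_t n = 0;

    assert(line != NULL && tag != NULL && indented != NULL && cap > 0);

    p = strstr(line, "<<");
    if (p == NULL) return 0;
    p += 2;

    *indented = (*p == '~');            /* <<~ allows an indented terminator */
    if (*indented) p++;

    if (*p == '"' || *p == '\'') quote = *p++;
    else if (!(isalpha((unsigned char)*p) || *p == '_')) return 0; /* e.g. 1 << 3 */

    while (*p != '\0' && n + 1 < cap) {
        if (quote ? (*p == quote) : !(isalnum((unsigned char)*p) || *p == '_'))
            break;
        tag[n++] = *p++;
    }
    tag[n] = '\0';
    if (quote && *p != quote) return 0; /* unterminated or truncated quoted tag */
    return n > 0;
}

/* Does this line end the heredoc? Leading whitespace is only allowed
 * before the tag for the indented (<<~) variant. */
static int is_terminator(const char *line, const char *tag, int indented)
{
    size_t len;

    assert(line != NULL && tag != NULL);
    len = strlen(tag);
    if (indented)
        while (*line == ' ' || *line == '\t') line++;
    return strncmp(line, tag, len) == 0 &&
           (line[len] == '\0' || line[len] == '\n');
}
```

Every line between the opener and the terminator would be copied through untouched, with no depth tracking at all.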
The hard parts
Perl: Is / division or regex?
The classic Perl parsing problem. Eshu tracks whether the previous meaningful token was a value (a variable, a closing bracket, a number) or an operator. If it was a value, / is division. Otherwise, it opens a regex. This is the same heuristic that syntax highlighters use, and it covers real-world code well.
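A rough sketch of that heuristic, under my own simplifying assumptions: instead of carrying a "previous token" flag through the scan, it looks backwards from the slash. A number, a sigil-prefixed variable, or a closing bracket counts as a value; anything else, including barewords such as split, counts as an operator. The function name is hypothetical, and the bareword rule will misjudge some code (a method call without parentheses followed by a division, for instance).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical sketch: returns 1 if the '/' at line[pos] reads as division,
 * 0 if it should open a regex. Looks only at the preceding token. */
static int slash_is_division(const char *line, size_t pos)
{
    size_t end = pos, start;
    unsigned char c;

    assert(line != NULL && line[pos] == '/');

    /* Skip whitespace between the previous token and the slash. */
    while (end > 0 && (line[end - 1] == ' ' || line[end - 1] == '\t')) end--;
    if (end == 0) return 0;                 /* nothing before it: regex */

    c = (unsigned char)line[end - 1];
    if (c == ')' || c == ']' || c == '}') return 1;   /* closing bracket */
    if (!(isalnum(c) || c == '_')) return 0;          /* operator or punctuation */

    /* Walk back to the start of the word. */
    start = end;
    while (start > 0 &&
           (isalnum((unsigned char)line[start - 1]) || line[start - 1] == '_'))
        start--;

    if (isdigit((unsigned char)line[start])) return 1;  /* numeric literal */

    /* A word is a value only if a sigil makes it a variable. */
    return start > 0 &&
           (line[start - 1] == '$' || line[start - 1] == '@' ||
            line[start - 1] == '%');
}
```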
XS: Two languages in one file
An XS file is C code at the top and a Perl/C hybrid below MODULE =. Eshu detects the boundary and switches scanners. Below the boundary, it tracks XSUB blocks (each new function declaration resets depth), labels like CODE:, OUTPUT:, and INIT:, and special cases like BOOT: sections that use shallower indentation.
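The two line-level checks this relies on might look something like the sketch below. The function names are hypothetical, and the label rule (an all-uppercase word followed by a single colon) is my assumption about what is sufficient to tell CODE: apart from a C label or a Perl package name.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical sketch: does this line start the XS half of the file?
 * Matches "MODULE = Foo" and "MODULE=Foo" at the start of a line. */
static int is_module_line(const char *line)
{
    assert(line != NULL);
    if (strncmp(line, "MODULE", 6) != 0) return 0;
    line += 6;
    while (*line == ' ' || *line == '\t') line++;
    return *line == '=';
}

/* Hypothetical sketch: is this an XS section label such as CODE: or
 * OUTPUT:? Lowercase C labels (default:) and package names (Foo::bar)
 * are rejected. */
static int is_xs_label(const char *line)
{
    const char *p = line;

    assert(line != NULL);
    while (*p == ' ' || *p == '\t') p++;
    if (!isupper((unsigned char)*p)) return 0;
    while (isupper((unsigned char)*p) || *p == '_') p++;
    if (*p != ':') return 0;
    return p[1] != ':';
}
```

A scanner built on these would run the C rules until is_module_line first returns true, then switch to the XS rules for the rest of the file.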
JavaScript: Template literals with nested interpolation
A backtick string in JavaScript can contain ${expr}, and that expression can contain braces, function calls, even another template literal. Eshu maintains a depth counter for interpolation braces so it knows when the } closes the interpolation versus when it closes a block inside the interpolation.
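The brace counting can be sketched as a function that skips over one template literal and recurses when it meets another inside an interpolation. The name skip_template is hypothetical, and the sketch is deliberately incomplete: it does not handle ordinary quoted strings or comments inside ${...}, so a brace inside such a string would throw the count off.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: s[i] must be a backtick. Returns the index just
 * past the matching closing backtick, or -1 if the literal never closes. */
static long skip_template(const char *s, long i)
{
    assert(s != NULL && i >= 0 && s[i] == '`');
    i++;

    while (s[i] != '\0') {
        if (s[i] == '\\' && s[i + 1] != '\0') { i += 2; continue; } /* escape */
        if (s[i] == '`') return i + 1;                               /* closed */

        if (s[i] == '$' && s[i + 1] == '{') {
            int braces = 1;          /* depth inside this interpolation */
            i += 2;
            while (s[i] != '\0' && braces > 0) {
                if (s[i] == '`') {   /* nested template literal */
                    i = skip_template(s, i);
                    if (i < 0) return -1;
                    continue;
                }
                if (s[i] == '{') braces++;
                else if (s[i] == '}') braces--;
                i++;
            }
            if (braces > 0) return -1;   /* interpolation never closed */
            continue;
        }
        i++;
    }
    return -1;
}
```

Only the } that brings the counter back to zero ends the interpolation; every other } belongs to a block or object literal inside it.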
HTML: Script blocks need JavaScript rules
Content inside <script> tags is JavaScript, not HTML. Eshu collects the entire script block, passes it through the JavaScript scanner for reindentation, then splices it back at the correct HTML depth. The HTML scanner also recognises void elements (<br>, <img>, etc.), which never increase nesting depth.
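The void element check is the simplest of these to illustrate. The sketch below uses the void element list from the HTML Living Standard; the function names are hypothetical, and whether Eshu's own list matches this one exactly is an assumption on my part.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive comparison of two tag names. */
static int name_eq_nocase(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) return 0;
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Hypothetical sketch: void elements have no closing tag, so they must
 * not open a new nesting level. */
static int is_void_element(const char *name)
{
    static const char *const voids[] = {
        "area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"
    };
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof voids / sizeof voids[0]; i++)
        if (name_eq_nocase(name, voids[i])) return 1;
    return 0;
}

/* Depth contributed by an opening tag: 0 for void or self-closed tags,
 * 1 for everything else. */
static int open_tag_depth_delta(const char *name, int self_closed)
{
    return (self_closed || is_void_element(name)) ? 0 : 1;
}
```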
The CLI
Eshu ships with a command-line tool that supports the three modes you actually need:
# Preview what would change
eshu --diff lib/
# Check in CI (exit 1 if anything needs fixing)
eshu --check lib/ t/
# Fix in place
eshu --fix lib/
By default, nothing is modified. You have to explicitly ask for --fix. Language is detected from file extensions, but can be overridden with --lang. You can filter files with --exclude and --include (regex patterns), choose tabs or spaces, set the indent width, and even restrict processing to a line range within a file.
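Extension-based detection is essentially a lookup table. A minimal sketch follows, with the caveat that the enum, the function name, and most of the table are my assumptions; the only mappings taken from the plugin description are .pm and .pl for Perl, .xs for XS, and .html for HTML.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical enum; Eshu's real language enum lives in eshu.h. */
typedef enum {
    LANG_UNKNOWN, LANG_C, LANG_PERL, LANG_XS, LANG_XML,
    LANG_HTML, LANG_CSS, LANG_JS, LANG_POD
} lang_t;

/* Hypothetical sketch: pick a language from the final file extension. */
static lang_t lang_from_path(const char *path)
{
    static const struct { const char *ext; lang_t lang; } map[] = {
        { ".pl", LANG_PERL }, { ".pm", LANG_PERL }, { ".xs", LANG_XS },
        { ".html", LANG_HTML },
        /* The entries below are assumed, not confirmed. */
        { ".c", LANG_C }, { ".h", LANG_C }, { ".xml", LANG_XML },
        { ".css", LANG_CSS }, { ".js", LANG_JS }, { ".pod", LANG_POD }
    };
    const char *dot;
    size_t i;

    assert(path != NULL);
    dot = strrchr(path, '.');
    if (dot == NULL) return LANG_UNKNOWN;

    for (i = 0; i < sizeof map / sizeof map[0]; i++)
        if (strcmp(dot, map[i].ext) == 0) return map[i].lang;
    return LANG_UNKNOWN;
}
```

Anything that comes back as unknown is where an explicit --lang override earns its keep.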
Directory processing is recursive by default, skips binary files (detected by sampling the first 8KB for NUL bytes), respects a 1MB size limit, and follows file symlinks but not directory symlinks.
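The NUL-byte sampling is easy to sketch. The version below reads up to 8KB from the start of a file and reports whether any byte is zero; the function names and the -1 return for unreadable files are my own choices, not Eshu's interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Does this sample contain a NUL byte? Text files normally do not. */
static int buffer_looks_binary(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    return len > 0 && memchr(buf, 0, len) != NULL;
}

/* Hypothetical sketch: sample the first 8KB of a file.
 * Returns 1 for binary, 0 for text, -1 if the file cannot be opened. */
static int file_looks_binary(const char *path)
{
    unsigned char buf[8192];
    size_t n;
    FILE *fp;

    assert(path != NULL);
    fp = fopen(path, "rb");
    if (fp == NULL) return -1;

    n = fread(buf, 1, sizeof buf, fp);
    fclose(fp);
    return buffer_looks_binary(buf, n);
}
```

Sampling a fixed prefix keeps the check cheap on large files, at the cost of missing a NUL byte that first appears later in the file.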
The Perl API
Everything the CLI does is available programmatically:
use Eshu;
use feature 'say';    # say() is used below

# Fix a string
my $fixed = Eshu->indent_pl($source, spaces => 4);

# Auto-detect language and process
my $result = Eshu->indent_file('lib/App.pm',
    fix  => 1,
    diff => 1,
);
say $result->{diff} if $result->{status} eq 'changed';

# Process an entire directory
my $report = Eshu->indent_dir('lib/',
    fix       => 1,
    recursive => 1,
    exclude   => [qr/\.bak$/],
);
say "$report->{files_changed} files fixed";
Each language also has a direct method: indent_c, indent_xs, indent_xml, indent_html, indent_css, indent_js, indent_pod.
Idempotent by design
Running Eshu twice produces the same result as running it once. This is not just a goal, it is tested. The test suite includes real-world Perl examples, verifies that processing them does not crash, and asserts that a second pass produces identical output.
This matters for CI integration. If eshu --check passes, you know that running eshu --fix would be a no-op. There is no oscillation, no cascading reformats, no "fix the fix" loops.
What it does not do
Eshu does not reformat code. It does not move braces, break long lines, sort imports, add or remove semicolons, or have opinions about blank lines. It touches leading whitespace and nothing else. Diffs are clean: every changed line shows only the whitespace prefix changing.
This is a deliberate constraint. A tool that only fixes indentation is a tool you can run on any codebase without fear. It will not start a style war. It will not produce a 10,000 line diff that buries your actual changes. It will make the nesting visible and get out of the way.
Vim integration
Eshu ships with a Vim plugin in the distribution. It pipes the current buffer through eshu and replaces the content in place, with automatic language detection and cursor position preservation.
Installation
The easiest way is with Vim 8+ native packages:
mkdir -p ~/.vim/pack/eshu/start
ln -s /path/to/Eshu/vim ~/.vim/pack/eshu/start/eshu
For Neovim:
mkdir -p ~/.local/share/nvim/site/pack/eshu/start
ln -s /path/to/Eshu/vim ~/.local/share/nvim/site/pack/eshu/start/eshu
Or if you prefer vim-plug:
Plug '/path/to/Eshu/vim'
Or just source it directly in your .vimrc:
source /path/to/Eshu/vim/plugin/eshu.vim
Usage
Once loaded, you get two commands and a default keybinding:
| Command | Mode | What it does |
|---|---|---|
| :EshuFix | Normal | Fix indentation for the entire file |
| :EshuFixRange | Visual | Fix indentation for the selected lines |
| \ef | Normal | Fix entire file (default <Leader>ef mapping) |
| \ef | Visual | Fix selected lines (default <Leader>ef mapping) |
The plugin detects the language from the file extension -- .pm and .pl map to Perl, .xs to XS, .html to HTML, and so on. If the eshu binary is not in your $PATH, point the plugin at it:
let g:eshu_cmd = '/path/to/Eshu/bin/eshu'
To disable the default mappings and use your own:
let g:eshu_no_mappings = 1
nnoremap <silent> <F6> :EshuFix<CR>
vnoremap <silent> <F6> :EshuFixRange<CR>
Eshu is available on CPAN under the Artistic License 2.0.