Koichi Sasada

Posted on Mar 13

Unearthing DRY Violations from Hotspot Data with lumitrace

#ruby #ai

Note: I had Claude Code write the third installment of their experience report using lumitrace.
Note: Here are the costs for this session, tallied from Claude Code's logs. This includes the cost of writing this article as well: 83 turns, 7.8M in / 28.9K out, $17.20
Note: They write as if they gradually learned how to use it, but I'm pretty sure they just looked at the schema from the start and did whatever they wanted.

In the first post, I removed redundant .to_s calls. In the second, I did O(n)-to-O(1) algorithm improvements and filled coverage gaps. The low-hanging fruit was mostly picked. Then I ran lumitrace a third time.

What I found this time wasn't algorithmic inefficiency; it was structural code duplication. When you look at hotspot data and ask "why is this line called so many times?", DRY violations surface.

Third-Pass Approach

$ lumitrace --collect-mode types -j exec rake test

I measured against the full test suite (1,136 tests) and aggregated the output JSON (~8MB, 45,504 events) by file and by expression.
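The aggregation step can be sketched roughly as follows. The event keys used here ("file", "start_line", "total") are assumptions for illustration, not lumitrace's documented schema, and the sample data is inlined so the sketch runs without a real trace file.

```ruby
require "json"

# Hypothetical event shape: the keys "file", "start_line", and "total" are
# assumptions for illustration, not lumitrace's documented schema.
SAMPLE_JSON = <<~JSON
  [
    {"file": "lib/a.rb", "start_line": 10, "total": 5},
    {"file": "lib/a.rb", "start_line": 12, "total": 7},
    {"file": "lib/b.rb", "start_line": 3,  "total": 4}
  ]
JSON

# Sum evaluation counts per file and return [file, total] pairs, largest first.
def totals_by_file(events)
  events
    .group_by { |e| e["file"] }
    .transform_values { |evs| evs.sum { |e| e["total"] } }
    .sort_by { |_file, total| -total }
end

# Return the n most frequently evaluated expressions within one file.
def top_expressions(events, file, n = 5)
  events
    .select { |e| e["file"] == file }
    .max_by(n) { |e| e["total"] }
end

events = JSON.parse(SAMPLE_JSON)
totals_by_file(events).each { |file, total| puts "#{file}\t#{total}" }
```

The file-level totals give the ranking, and top_expressions is the drill-down used once a file looks interesting.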

Hotspot Overview

Top 10 files by total expression evaluations:

File	Total Evaluations
`keymap_manager.rb`	3,118,262
`spell_checker.rb`	732,307
`app.rb`	613,998
`command_invocation.rb`	582,864
`completion_manager.rb`	539,779
`ex_command_registry.rb`	385,377
`editor/options.rb`	277,341
`config/defaults.rb`	269,041
`command_registry.rb`	239,639
`commands/meta.rb`	202,734

keymap_manager.rb is still at the top even with last time's prefix index in place, but its count is dominated by the index construction loop (add_to_prefix_index called 47,902 times, inner loop 59,272 iterations). The resolve hot path itself is already O(1).

This time I focused on completion_manager.rb (539,779) and ex_command_registry.rb (385,377) in the top 10.

Finding 1: Duplicated completeopt Parsing (completion_manager.rb)

Drilling into expression-level data for completion_manager.rb, a line executed 68,099 times stood out:

# line 52, inside load_history!
deduped = items.reject { |item| !item.is_a?(String) || item.empty? }
               .reverse.uniq.reverse

This line was itself the result of last time's O(n^2)-to-O(n) improvement, but the reject condition uses a double negation, which is hard to read. Rewriting it with select makes it straightforward:

deduped = items.select { |item| item.is_a?(String) && !item.empty? }
               .reverse.uniq.reverse
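As a quick check that the rewrite is behavior-preserving, here is a small sketch with made-up sample data. The method names are hypothetical. It also shows what reverse.uniq.reverse does differently from a plain uniq: it keeps the last occurrence of each duplicate, which is presumably what a history list wants.

```ruby
# Original form: reject with a negated, OR-ed condition.
def clean_with_reject(items)
  items.reject { |item| !item.is_a?(String) || item.empty? }
end

# Rewritten form: select with the positive condition (De Morgan's law).
def clean_with_select(items)
  items.select { |item| item.is_a?(String) && !item.empty? }
end

# reverse.uniq.reverse keeps the last occurrence of each duplicate,
# whereas a plain uniq keeps the first.
def dedupe_keep_last(items)
  items.reverse.uniq.reverse
end

sample = ["a", "", nil, "b", "a", 42, "c", "b"]
p clean_with_reject(sample) == clean_with_select(sample) # => true
p dedupe_keep_last(clean_with_select(sample))            # => ["a", "c", "b"]
p clean_with_select(sample).uniq                         # => ["a", "b", "c"]
```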

Looking further through the data, I noticed three methods evaluating the exact same expression:

# insert_completion_noselect?
@editor.effective_option("completeopt").to_s.split(",").map { |s| s.strip.downcase }

# insert_completion_noinsert?
@editor.effective_option("completeopt").to_s.split(",").map { |s| s.strip.downcase }

# insert_completion_menu_enabled?
opts = @editor.effective_option("completeopt").to_s.split(",").map { |s| s.strip.downcase }

Three methods with the exact same inline parsing logic. They don't individually stand out in lumitrace data, but as a pattern it's a clear DRY violation. I extracted the parsing into a single parsed_completeopt helper:

def parsed_completeopt
  @editor.effective_option("completeopt").to_s.split(",").map { |s| s.strip.downcase }
end

def insert_completion_noselect?
  parsed_completeopt.include?("noselect")
end

def insert_completion_noinsert?
  parsed_completeopt.include?("noinsert")
end

def insert_completion_menu_enabled?
  opts = parsed_completeopt
  opts.include?("menu") || opts.include?("menuone")
end
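To see the extracted helper in isolation, here is a self-contained sketch. StubEditor and CompletionOptions are hypothetical stand-ins; the real editor and completion manager classes are not shown in this post, and only effective_option is modeled.

```ruby
# StubEditor is a hypothetical stand-in for the real editor object; only
# effective_option is modeled here.
class StubEditor
  def initialize(options)
    @options = options
  end

  def effective_option(name)
    @options[name]
  end
end

# Minimal host class (hypothetical) carrying the helper and its three callers.
class CompletionOptions
  def initialize(editor)
    @editor = editor
  end

  def insert_completion_noselect?
    parsed_completeopt.include?("noselect")
  end

  def insert_completion_noinsert?
    parsed_completeopt.include?("noinsert")
  end

  def insert_completion_menu_enabled?
    opts = parsed_completeopt
    opts.include?("menu") || opts.include?("menuone")
  end

  private

  # to_s makes a nil (unset) option parse as an empty list.
  def parsed_completeopt
    @editor.effective_option("completeopt").to_s.split(",").map { |s| s.strip.downcase }
  end
end

opts = CompletionOptions.new(StubEditor.new("completeopt" => " Menu , NoSelect"))
p opts.insert_completion_noselect?     # => true
p opts.insert_completion_menu_enabled? # => true
```

Note that the helper still re-parses the option on every call. Memoizing it would need invalidation whenever the option changes, so this sketch leaves it as is.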

Finding 2: Duplicated Completion State Matching (completion_manager.rb)

The same file had another duplication: reusable_command_line_completion_matches and apply_wildmode_completion each had its own copy of the same 7-field comparison:

# in reusable_command_line_completion_matches
return nil unless state[:prefix] == cmd.prefix
return nil unless state[:kind] == ctx[:kind]
return nil unless state[:command] == ctx[:command]
return nil unless state[:arg_index] == ctx[:arg_index]
return nil unless state[:token_start] == ctx[:token_start]
return nil unless state[:before_text] == before_text
return nil unless state[:after_text] == after_text

# in apply_wildmode_completion
same = state &&
       state[:prefix] == cmd.prefix &&
       state[:kind] == ctx[:kind] &&
       state[:command] == ctx[:command] &&
       state[:arg_index] == ctx[:arg_index] &&
       state[:token_start] == ctx[:token_start] &&
       state[:before_text] == before_text &&
       state[:after_text] == after_text &&
       state[:matches] == matches

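Different style, same seven comparisons. (apply_wildmode_completion additionally guards against a nil state and compares :matches; those two checks are not part of the shared logic, so they would need to stay at that call site.) I extracted the common part as completion_state_matches?: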

def completion_state_matches?(state, cmd, ctx, before_text, after_text)
  state[:prefix] == cmd.prefix &&
    state[:kind] == ctx[:kind] &&
    state[:command] == ctx[:command] &&
    state[:arg_index] == ctx[:arg_index] &&
    state[:token_start] == ctx[:token_start] &&
    state[:before_text] == before_text &&
    state[:after_text] == after_text
end
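The two call sites might then look roughly like this. This is a sketch under assumptions: Cmd, reusable_matches, and same_completion? are hypothetical names and signatures, since the full bodies of the original methods are not shown here.

```ruby
# Cmd is a hypothetical stand-in for the command-line object; only prefix is used.
Cmd = Struct.new(:prefix)

def completion_state_matches?(state, cmd, ctx, before_text, after_text)
  state[:prefix] == cmd.prefix &&
    state[:kind] == ctx[:kind] &&
    state[:command] == ctx[:command] &&
    state[:arg_index] == ctx[:arg_index] &&
    state[:token_start] == ctx[:token_start] &&
    state[:before_text] == before_text &&
    state[:after_text] == after_text
end

# Guard-clause caller: returns the cached matches or nil.
# (Assumed signature and nil guard; the original method is not shown in full.)
def reusable_matches(state, cmd, ctx, before_text, after_text)
  return nil unless state
  return nil unless completion_state_matches?(state, cmd, ctx, before_text, after_text)

  state[:matches]
end

# Boolean caller: the nil guard and the :matches comparison are not part of
# the shared helper, so they remain here.
def same_completion?(state, cmd, ctx, before_text, after_text, matches)
  !state.nil? &&
    completion_state_matches?(state, cmd, ctx, before_text, after_text) &&
    state[:matches] == matches
end
```

Keeping the helper limited to the seven shared fields means each caller stays responsible for whatever is specific to it.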

Finding 3: if-Chain to Hash Lookup (completion_manager.rb)

The ex_arg_completion_candidates method had five chained if statements:

def ex_arg_completion_candidates(command_name, arg_index, prefix)
  return [] unless arg_index.zero?
  if %w[e edit w write tabnew].include?(command_name)
    return path_completion_candidates(prefix)
  end
  if %w[buffer b].include?(command_name)
    return buffer_completion_candidates(prefix)
  end
  # ... three more similar blocks
  []
end

Each %w[...].include? is a linear scan. This is a command-name-to-category mapping, and a hash does it in O(1):

EX_ARG_COMPLETERS = {
  "e" => :path, "edit" => :path, "w" => :path, "write" => :path, "tabnew" => :path,
  "buffer" => :buffer, "b" => :buffer,
  "set" => :option, "setlocal" => :option, "setglobal" => :option,
  "git" => :git, "gh" => :gh
}.freeze

def ex_arg_completion_candidates(command_name, arg_index, prefix)
  return [] unless arg_index.zero?
  case EX_ARG_COMPLETERS[command_name]
  when :path   then path_completion_candidates(prefix)
  when :buffer then buffer_completion_candidates(prefix)
  when :option then option_completion_candidates(prefix)
  when :git    then git_subcommand_candidates(prefix)
  when :gh     then gh_subcommand_candidates(prefix)
  else []
  end
end
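One possible variation, not what the change above does: if the flat hash literal gets long, the same table can be derived from per-completer alias lists. EX_ARG_COMPLETER_GROUPS is a hypothetical name introduced only for this sketch.

```ruby
# Grouping by completer keeps each alias list in one place; the flat
# name => completer hash is derived once at load time.
EX_ARG_COMPLETER_GROUPS = {
  path:   %w[e edit w write tabnew],
  buffer: %w[buffer b],
  option: %w[set setlocal setglobal],
  git:    %w[git],
  gh:     %w[gh]
}.freeze

EX_ARG_COMPLETERS = EX_ARG_COMPLETER_GROUPS
  .flat_map { |kind, names| names.map { |name| [name, kind] } }
  .to_h
  .freeze

p EX_ARG_COMPLETERS["tabnew"] # => :path
p EX_ARG_COMPLETERS["nope"]   # => nil
```

Lookups stay O(1) either way; the trade-off is purely about which form is easier to maintain.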

Finding 4: Unnecessary Indirection in ExCommandRegistry#registered?

registered? in ex_command_registry.rb was called 27,067 times. It went through resolve, which performs two hash lookups:

def registered?(name)
  !!resolve(name)  # resolve does @lookup[name.to_s] then @specs[canonical]
end

For an existence check, you don't need the value. Checking @lookup alone is sufficient:

def registered?(name)
  @lookup.key?(name.to_s)
end
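The two versions agree only as long as every name in @lookup points at a canonical name that actually has a spec. The sketch below uses a hypothetical MiniRegistry (the real ExCommandRegistry is not shown here) to make that assumption explicit.

```ruby
# MiniRegistry is a hypothetical, stripped-down registry: @lookup maps every
# name and alias to a canonical name, and @specs maps canonical names to specs.
class MiniRegistry
  def initialize
    @lookup = {}
    @specs = {}
  end

  def register(canonical, aliases: [], spec: {})
    @specs[canonical] = spec
    [canonical, *aliases].each { |name| @lookup[name] = canonical }
  end

  def resolve(name)
    canonical = @lookup[name.to_s]
    canonical && @specs[canonical]
  end

  # Before: goes through resolve (two hash lookups).
  def registered_via_resolve?(name)
    !!resolve(name)
  end

  # After: one lookup; equivalent only while every @lookup entry has a spec.
  def registered?(name)
    @lookup.key?(name.to_s)
  end
end

reg = MiniRegistry.new
reg.register("write", aliases: ["w"])
p reg.registered?(:w)      # => true
p reg.registered?("nope")  # => false
```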

Finding 5: Cleaning Up is_a? Checks in same_command_callable? (commands/meta.rb)

same_command_callable? was called 7,991 times, and the type distribution data showed that its first argument a was a Symbol 85% of the time (Symbol: 6,771, Proc: 1,159, Method: 61).

def same_command_callable?(a, b)
  if (a.is_a?(Symbol) || a.is_a?(String)) && (b.is_a?(Symbol) || b.is_a?(String))
    return a.to_sym == b.to_sym
  end
  a.equal?(b)
end

With case/when, the type test on a is done by the when clause (Symbol === a), so the Symbol case (85%) is left with a single explicit is_a? call, on b:

def same_command_callable?(a, b)
  case a
  when Symbol, String
    (b.is_a?(Symbol) || b.is_a?(String)) && a.to_sym == b.to_sym
  else
    a.equal?(b)
  end
end
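Since this is a pure restructuring, the two versions should return the same result for every input combination. A small sketch to check that, with the old version renamed same_command_callable_before? and a made-up list of cases:

```ruby
def same_command_callable_before?(a, b)
  if (a.is_a?(Symbol) || a.is_a?(String)) && (b.is_a?(Symbol) || b.is_a?(String))
    return a.to_sym == b.to_sym
  end
  a.equal?(b)
end

def same_command_callable?(a, b)
  case a
  when Symbol, String
    (b.is_a?(Symbol) || b.is_a?(String)) && a.to_sym == b.to_sym
  else
    a.equal?(b)
  end
end

# Illustrative cases covering Symbol, String, and Proc on either side.
handler = proc { :ok }
CASES = [
  [:save, :save], [:save, "save"], ["save", :quit],
  [:save, handler], [handler, handler], [handler, proc { :ok }], [handler, :save]
].freeze

p CASES.all? { |a, b| same_command_callable_before?(a, b) == same_command_callable?(a, b) } # => true
```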

Summary

Change	File	What
DRY	completion_manager.rb	Extract `parsed_completeopt` (3 duplicates eliminated)
DRY	completion_manager.rb	Extract `completion_state_matches?` (7-field comparison deduplicated)
Simplify	completion_manager.rb	if-chain to hash lookup (`EX_ARG_COMPLETERS`)
Readability	completion_manager.rb	Double-negation `reject` to `select`
Efficiency	ex_command_registry.rb	Remove unnecessary indirection in `registered?`
Efficiency	commands/meta.rb	Restructure `same_command_callable?` with case/when
Efficiency	keymap_manager.rb	`add_to_prefix_index` uses `tokens[0, i+1].freeze`

All tests (1,136 runs, 2,696 assertions) pass.

How I Read Hotspot Data Changed

In previous passes, the reading was "this line runs tens of thousands of times -> fix the algorithm." By the third pass, algorithmic inefficiencies were already fixed.

This time the reading became: "This line runs at high frequency -> why? -> is there similar logic elsewhere?" Hotspot data becomes an entry point for finding code duplication. If an expression is evaluated tens of thousands of times, that processing pattern is likely repeated across the codebase. Indeed, both the completeopt parsing (3 locations) and the completion state matching (2 locations) weren't individual hotspots; they were "doing the same thing in multiple places" patterns.

Reflections as AI (Claude Code)

By the third run, I'd gotten comfortable handling lumitrace data. The first time, I didn't know where to look and just scanned the output JSON from top to bottom. The second time, I could systematically examine three axes: type inconsistencies, hotspots, and coverage. This time, the analysis flow (file-level aggregation -> expression-level drill-down -> cross-referencing with code) came naturally.

Honestly, these DRY violations could probably have been found by just reading the source code. But lumitrace data helped with prioritization: where to start reading. Seeing "completion_manager.rb: 539,779 evaluations" in the 8MB JSON file, I knew to dig there first. With source code alone, there's no signal telling you where to focus in a 700-line file.

Another interesting moment was seeing my own previous changes reflected in the new data. The first add_to_prefix_index fix (pfx + [tokens[i]]) actually increased evaluation counts. I caught this in the re-run data and corrected it to tokens[0, i+1].freeze. lumitrace's "measure -> fix -> re-measure" cycle works for validating your own changes too.
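The real add_to_prefix_index is not shown in this post, so the following is only a sketch of the two prefix-building expressions mentioned above, wrapped in hypothetical method names. Both produce the same prefixes; they differ in how many sub-expressions run per step.

```ruby
# Incremental form: grows a running prefix, wrapping one token in an array
# and concatenating on every step.
def prefixes_incremental(tokens)
  pfx = []
  tokens.each_index.map { |i| pfx = pfx + [tokens[i]] }
end

# Slice form: takes the first i + 1 tokens directly and freezes the result.
def prefixes_sliced(tokens)
  tokens.each_index.map { |i| tokens[0, i + 1].freeze }
end

p prefixes_sliced(%w[g g x]) # => [["g"], ["g", "g"], ["g", "g", "x"]]
p prefixes_incremental(%w[g g x]) == prefixes_sliced(%w[g g x]) # => true
```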

After three rounds with lumitrace, the progression feels like: first time is "surprise", second time is "systematization", third time is "habit." It's not about learning the tool; it's about internalizing the habit of looking at data before changing code. That might be the most valuable thing.