DEV Community

Valentino Stoll
Valentino Stoll

Posted on • Originally published at blog.codenamev.com on

4

The code behind GitHub's new Code Navigation

GitHub recently announced two new code navigation features:

  1. Jump to definition – quickly view the source for any given method.
  2. Find all references – Identify all references of a function within the same repo.

The code behind GitHub's new Code Navigation

I have been waiting a long time for these features to surface, and best of all, the code behind all this is open source!

From the top-level, they're using a Haskell library called Semantic. Semantic can parse, analyze, and compare source code. As of the last update to this post, it supports Ruby, JavaScript, TypeScript, Python, Go, PHP, Java, JSON, JSX, Haskell, and Markdown.

For a full list of examples, check out the Semantic usage docs. It comes with a great CLI that allows you to:

Generate parse trees

$ semantic parse test.A.py
(Statements
  (Annotation
    (Function
      (Identifier)
      (Identifier)
      (Return
        (Identifier)))
    (Empty))
  (Call
    (Identifier)
    (Call
      (Identifier)
      (TextElement)
      (Empty))
    (Empty)))
Enter fullscreen mode Exit fullscreen mode
Generate an abstract syntax tree (AST)

Generate syntax aware diffs

$ semantic diff test.A.py test.B.py
(Statements
  (Annotation
    (Function
    { (Identifier)
    ->(Identifier) }
      (Identifier)
      (Return
        (Identifier)))
    (Empty))
  (Call
    (Identifier)
    (Call
    { (Identifier)
    ->(Identifier) }
      (TextElement)
      (Empty))
    (Empty)))
Enter fullscreen mode Exit fullscreen mode

Display import call graphs

semantic graph main.py | dot -Tsvg > main.html && open main.html
Enter fullscreen mode Exit fullscreen mode

The code behind GitHub's new Code Navigation

Display call graphs

semantic graph --calls main.py | dot -Tsvg > main.html && open main.html
Enter fullscreen mode Exit fullscreen mode

The code behind GitHub's new Code Navigation

Making Sense of Your Code

The code parsing is done with the tree-sitter library which appears to be C with a handful of language bindings (the cli is in Rust!).

[Tree-sitter] can build a concrete syntax tree for a source file and efficiently update the syntax tree as the source file is edited

Languages are supported by writing language-specific parsers. These parsers primarily consist of two parts:

  1. Grammar – contains various rules for how the language is formatted
  2. Lexical Analysis – patterning individual character groups (for lexing).

Example Grammar for parsing simple ruby methods from tree-sitter-ruby:

{
  "name": "ruby",
  "word": "identifier",
  "rules": {
    "method": {
      "type": "SEQ",
      "members": [
        {
          "type": "STRING",
          "value": "def"
        },
        {
          "type": "SYMBOL",
          "name": "_method_rest"
        }
      ]
    }
  }
}
Enter fullscreen mode Exit fullscreen mode
https://github.com/tree-sitter/tree-sitter-ruby/blob/master/src/grammar.json

Tree-sitter has an awesome Playground where you can explore the parse trees it generates for any given code.

When you feed code to the Semantic library, it generates parse trees using tree-sitter, and then uses the language's grammar (e.g. from tree-sitter-ruby) to parse the parse trees and map AST nodes (many of which are already available from the library itself for: literals, expressions, statements, and types).

While they hope to drop this step in the future, once the AST nodes are mapped, a second round of parsing is done to wrap them into data structures the library can comprehend (for more on this, see the "Assignment" documentation).

There are quite a few more things that happen to analyze and compare source code, but I'll leave the explanation it to their great documentation on program analysis.

Thank you GitHub ❤️ Keep up the solid work 😄

Heroku

Simplify your DevOps and maximize your time.

Since 2007, Heroku has been the go-to platform for developers as it monitors uptime, performance, and infrastructure concerns, allowing you to focus on writing code.

Learn More

Top comments (0)

Billboard image

The Next Generation Developer Platform

Coherence is the first Platform-as-a-Service you can control. Unlike "black-box" platforms that are opinionated about the infra you can deploy, Coherence is powered by CNC, the open-source IaC framework, which offers limitless customization.

Learn more

👋 Kindness is contagious

Engage with a sea of insights in this enlightening article, highly esteemed within the encouraging DEV Community. Programmers of every skill level are invited to participate and enrich our shared knowledge.

A simple "thank you" can uplift someone's spirits. Express your appreciation in the comments section!

On DEV, sharing knowledge smooths our journey and strengthens our community bonds. Found this useful? A brief thank you to the author can mean a lot.

Okay