German Robayo

Posted on Aug 24, 2020

Jump to definition on source code

#haskell #dhall #gsoc

Introduction

Hi there! It's been a while since my last post and a cool feature that I've been working on recently it's roughly done, and it's really cool!

The feature is jump-to-definition (I'll abbreviate that to JTD in the rest of the post), and it allows rendered source code to be more navigatable.

TL;DR

To see some examples of this feature, you can visit the following links:

Prelude.JSON.renderAs https://hydra.dhall-lang.org/build/70743/download/1/docs/JSON/renderAs.dhall.html
Every file with the JumpTo prefix on this list https://hydra.dhall-lang.org/build/70746/download/1/docs/

Jump to definition

The main purpose of using a documentation generator for your projects (in any programming language) is to make it easy for your users to learn how to use your library/framework/package/whatsoever. Simple things such as analyzing source-code comments (as dhall-docs already does) are good enough, but we can always add more features that make code discoverable and easier to read.

Jump to definition is a feature that most documentation generators tools have. Let's use haddock as an example: when haddock generates the documentation of a Haskell packages it generates HTML for:

The source-code comments that the user marked as documentation.
The actual source code of the package.

If we use the dhall package, for example:

This is the generated documentation for Dhall.Core (which re-exports Dhall.Syntax) https://hackage.haskell.org/package/dhall-1.34.0/docs/Dhall-Core.html;
This is how haddock renders the source code for the Dhall.Syntax module https://hackage.haskell.org/package/dhall-1.34.0/docs/src/Dhall.Syntax.html

If you open the rendered source code for Dhall.Syntax, you will see that:

It is rendered using syntax-highlighting
It's full of links on the data-types and function names, which suggests you that it's clickable and it allows you to share the implementation of anything inside that module. For instance, you can see here the definition of the Expr data type: the Dhall AST.

We want to add the same for dhall-docs: take a file and do static analysis to see where each name was defined, highlight it correctly when the user hovers it and jump to the definition of that name when the user clicks it.

Implementation

The Dhall AST

The Expr data-type from the dhall package contains a special constructor useful to get the source-span of a parsed expression:

data Expr s a =
    ...
    | Note s (Expr s a)
    ...

(you can see its haddock here)

The purpose of Note is to create useful parse and type-errors by underlining the problematic expression. The Dhall parser outputs an Expr Src Import (explanations of Src and Import can be found on Expr haddock) where each meaningful expression is wrapped on a Note constructor. Srcs contains metadata information about the parsed expression, such as source text and line and column numbers. For example, if you have the following expression:

1 + 2

it will be parsed to something (roughly) like this:

Note (source code info of the "1 + 2" expression)
  (NaturalPlus -- for the `+` operator
    (Note (source code info for the "1") (NaturalLit 1))
    (Note (source code info for the "1") (NaturalLit 2))
  )

Besides Note, there are other constructors that preserve extra-information about the source code. It is mostly used by the dhall format and dhall lint commands.

Code renderer

Now the question is, how can I manipulate/traverse the Expr in a way that lets me generates HTML for the following?

The declaration & use of names (variables)
Relative imports
The rest of the code, that should not be affected at all.

I first started making this simple data-type:

-- we will come to the ??? later
data NameDecl = NameDecl Src Text ???

data SourceCodeType
    = ImportExpr Dhall.Import
    | NameUse NameDecl
    | NameDeclaration NameDecl

data SourceCodeFragment = SourceCodeFragment Src SourceCodeType

You can think of each SourceCodeFragment as a subsequent section of the source code. Similar to a token (but not exactly that, though). Each one is discriminated by its SourceCodeType, which carries the information of how the code should be rendered in the HTML.

Now we need to transform an Expr Src Import to a [SourceCodeFragment] and that's where the fragments function comes:

fragments :: Expr Src Import -> [SourceCodeFragment]

The implementation of the fragments function is quite large to fit in this post (110 lines, to be exact) but what it does is run a simple context-analysis on the AST to generate SourceCodeFragment whenever a name was declared and used. It is clever enough to know when a name was not declared, and when a name was unused and showing no highlight on its HTML (i.e. not generating a SourceCodeFragment for it).

All the work is done in the infer function, declared inside fragments:

-- explanation for ??? will come up later
infer :: Dhall.Context NameDecl -> Expr Src Import -> Writer [SourceCodeFragment] ???

Dhall.Context is a context table that allows us to insert and query for names when traversing an AST. We insert names when it makes sense to (i.e. let-bindings, record fields, and lambda parameters.

All of this work allowed dhall-docs to generate jump-to-definition for the following:

Let-bindings
Lambda argument names
Record (literals and types) field names
Imports (relative and remote imports)

You can see here for a demo. The test cases are named with the feature that is being shown there.

What about the `???` on the code?

let-binding and lambda argument names share the same JTD logic:

Calculate the source span position of the binding name
Traverse the sibling expressions on both Let and Lam adding the name to the context accordingly.

However, adding support for record field is slightly different. We want to highlight the names where fields in selector-expressions were defined. In the following example we want to highlight foo when the user hovers the foo in the selector-expression a.foo:

let a = { foo = 2 }
in  a.foo

Pushing the foo label to the context isn't the right solution because handling the exact expression to remote the foo label would be complicated.

What we instead do, is to use another data-type:

data JtdInfo
    = RecordFields (Set.Set NameDecl)
    | NoInfo

??? corresponds to JtdInfo. You can think of JtdInfo as extra information recovered from the static-analysis that could aid in more precise generated JTD. Replacing ??? on the above snippets, we got:

data NameDecl = NameDecl Src Text JtdInfo

infer :: Dhall.Context NameDecl -> Expr Src Import -> Writer [SourceCodeFragment] JtdInfo

Now each NameDecl has an associated JtdInfo, and every expression returns a JtdInfo from infer. Now we can pattern match against the Field constructor to correctly setup JTD:

Field e (FieldSelection (Just Src{srcEnd=posStart}) label (Just Src{srcStart=posEnd})) -> do
  fields <- do
    dhallType <- infer context e
    case dhallType of
      NoInfo -> return mempty
      RecordFields s -> return $ Set.toList s

  let src = makeSrcForLabel posStart posEnd label
  let match (NameDecl _ l _) = l == label
  case filter match fields of
    x@(NameDecl _ _ t) : _ -> do
      Writer.tell [SourceCodeFragment src (NameUse x)]
      return t
    _ -> return NoInfo

We setup JTD only if it makes sense. Thankfully pattern matching helps us elegantly handle all cases.

To end

This was the last feature that I developed as part of my GSoC project. Keep in tune to read my last report about a summary of what the whole project and a story about all my experience through.

Thanks for reading!

DEV Community

Jump to definition on source code

Introduction

TL;DR

Jump to definition

Implementation

The Dhall AST

Code renderer

What about the `???` on the code?

To end

Oldest comments (0)

Introduction

TL;DR

Jump to definition

Implementation

The Dhall AST

Code renderer

What about the ??? on the code?

To end

What about the `???` on the code?