German Robayo

Posted on Jul 26, 2020

Record keys prefix whitespace recovering

#haskell #gsoc #dhall

Introduction

Hi folks! I haven't posted here since two weeks ago, so I'd like to give you an update on the progress of my #GSoC project.

The main subject of this post is the recovering of prefix whitespace on Record keys, which will allow users to document record types and literals' keys.

First problem

Haskell's Dhall's parser treats both block and single-line comments as whitespace, which ignores since Dhall's is whitespace insensitive.

A limitation of this implementation is that we cannot fully reconstruct a Dhall file with the consumed whitespace and therefore the comments. This has been reported in some old issues on the repository. In particular, dhall format removes comments after formatting the files

dhall-format removes comments #145

psibi posted on Sep 29, 2017

Example file:

--- This is dhall file
{ foo = "jax" }

The Output it gives:

{ foo = "jax" }

View on GitHub

Due to the size of the project, some simple solutions for this have been made. The parser preserves:

The Header whitespace before the first non-whitespace character of the file. This is what dhall-docs supports at this moment.
Whitespaces in-between let-bindings components, as highlighted by this haddock comment

... but no more expressions have been handled yet because it's a little onerous to code and it introduces a lot of backward-incompatible changes.

Two weeks ago, I started to add support to Record keys prefix on both record literals and types. The whole work is on the following PR

feat(comments): support prefix comments on Record's key-value pairs. #1908

german1608 posted on Jul 10, 2020

This PR tries to add support to record field comments both on formatting and dhall-docs documentation extension. Specifically the following ones:

{
  -- before each record field
  a : Text
  -- after each record field
, ...
}

I updated the Record's Expr constructor from:

Record (Map Text (Expr s a))

Record (Map Text (RecordField s a))

where RecordField stores the last of the expression on the right side of the : and the prefix and suffix comments. Although the prefix comment is annotating the label field, the best place to save that comment is there.

However, I'm having a problem with unsafeSubExpressions, because it can't traverse the Maybe s I put on RecordField. A quick solution for that is to change those RecordField's Maybe s' to Nothing, although that means we loose that information when traversing.

I'll keep this PR as a draft to get your opinion in that matter.

View on GitHub

I made this because:

It will be another element that dhall-docs will document for
It will help to fix the issue with dhall format removing comments
To take more knowledge of the whole dhall codebase.

In the following sections, I'll describe how the task was made.

New data constructors

The Dhall's AST constructors for Record literals and types are the following:

data Expr s a
    = ...
    -- | > Record       [(k1, t1), (k2, t2)]        ~  { k1 : t1, k2 : t1 }
    | Record    (Map Text (Expr s a))
    -- | > RecordLit    [(k1, v1), (k2, v2)]        ~  { k1 = v1, k2 = v2 }
    | RecordLit (Map Text (Expr s a))
    ...

They don't contain any property that will allow us to recover whitespace. So the first thing we need to do is to modify the constructors to include them. I ended up writing the following data-type:

data RecordField s a = RecordField
    { recordFieldSrc  :: Maybe s
    , recordFieldValue :: Expr s a
    } deriving (Data, Eq, Foldable, Functor, Generic, Lift, NFData, Ord, Show, Traversable)

And modify the Expr s a constructors like this:

data Expr s a
    =
    -- | > Record [ (k1, RecordField _ t1)          ~  { k1 : t1, k2 : t1 }
    --   >        , (k2, RecordField _ t2)
    --   >        ]
    | Record    (Map Text (RecordField s a))
    -- | > RecordLit [ (k1, RecordField _ v1)       ~  { k1 = v1, k2 = v2 }
    --   >           , (k2, RecordField _ v2)
    --   >           ]
    | RecordLit (Map Text (RecordField s a))
    ...

The captured whitespace is the prefix of the label. Although the Map on the constructor maps labels to values, the less cumbersome place to save that was on the RecordField data-type. This will allow us to map the following:

{ -- my single-line before `a`
    a = 1
}

To something (roughly) like this:

RecordLit [("a", RecordField "-- my single-line before `a`" (NaturalLit 1)]

Note that we don't save suffix whitespace right after each record value because the Dhall's parser consumes them and there is no pretty way that doesn't involve modifying a lot the parser to fix that. Thankfully, places like the following:

a {- A -} : {- B -} B

will be easy to recover in the future and add value in the context of #145. I didn't add support for those yet since I feel that people won't write a lot of useful documentation on that place, and I wanted to finish this as quickly as possible to start other tasks more directly related to my project.

After modifying those records, the cumbersome part of the task came in: tracking the type-checking errors on the whole project was exhausting. The good thing is that it wasn't difficult, and it helped me learn more about the whole project, which is good :)

Parser modifications

Dhall's parser is located at the Dhall.Parser.Expressions module. It is built using megaparsec parser combinators. The relevant sections of the code are here on alternative04:

alternative04 = (do
    _openBrace

    src0 <- src whitespace
    mComma <- optional _comma

    -- `src1` corresponds to the prefix whitespace of the first key-value
    -- pair. This is done to avoid using `try` to recover the consumed
    -- whitespace when the comma is not consumed
    src1 <- case mComma of
        Nothing -> return src0
        Just _ -> src whitespace

    a <- recordTypeOrLiteral src1

    _closeBrace

    return a ) <?> "literal"

and here:

recordTypeOrLiteral firstSrc0 =
        choice
            [ emptyRecordLiteral
            , nonEmptyRecordTypeOrLiteral firstSrc0
            , emptyRecordType
            ]

On the first snippet note that we parse the prefix whitespace of the first key-value pair of the record, and pass it to the recordTypeOrLiteral combinators. That allows us to share that whitespace over all possible choices on recordTypeOrLiteral efficiently. If we didn't do that, we would recur to the try function from megaparsec that backtracks on parse failures, which increases the benchmark numbers.

By doing all this work, the prefix whitespace of the record was preserved. You can check out that by using the following example:

{ -- I'm recovered
    a = 1,

    {- me too! -}
    b = 2,
}

...on dhall haskell-syntax-tree --noted

To end

This was a hard job. I feel great that I'm contributing not only to my particular project but the whole Dhall's Haskell's implementation.

I'd like to thank my mentors Simon and Gabriel for their help with this PR not only by giving me suggestions and guidelines but on the exhausting job of reviewing that huge PR.

Thanks for reading!