DEV Community

German Robayo

Comment's format specification

Introduction

Hi everyone! In this post, I'll summarize what I did this week on my GSoC project: defining and implementing a specification for the dhall-docs comment format.

Why do we need a spec?

For two particular reasons.

Distinguish between internal and documentation comments

In many programming languages it is really common to write comments that you don't want rendered by the language's documentation generator. A simple way to support this is to require "markers" on documentation comments. Haddock, for instance, requires that you use | as a marker in both single-line and block comments:

-- | Valid haddock comment

-- ignored haddock comment

{-| Valid haddock comment -}

{- Ignored haddock comment -}

Javadoc does the same by requiring Javadoc comments to start with /**.

For the same reason, we need to be able to distinguish between internal and documentation comments in dhall-docs.

Indentation issues

We are going to use Markdown as the markup language for the documentation generator, specifically a flavor of Markdown named CommonMark. Markdown, unlike Dhall, is sensitive to indentation. The following Markdown document:


first column
    4 columns later

will render "first column" in a normal paragraph, and "4 columns later" as an Indented Code Block. I invite you to give it a try.

In the beginning, since dhall-docs only supported header comments, we used the first column of the line as the base of indentation. That means that something like this:

{-
foo
bar
baz
-}

will be rendered as a normal paragraph (CommonMark treats the line breaks between consecutive lines as soft breaks inside a single paragraph), whereas:

{-
     foo
     bar
     baz
-}

will render all three lines as an indented code block. But this doesn't scale well once we support other documentation comments (such as those on record fields). Take this example:

        {
            {-
            should this be indented???
            -}
            foo = bar
        }

The indentation of the comment's content is no longer clear, and forcing users to write this case like this:

        {
            {-
should this be indented???
            -}
            foo = bar
        }

is awful and was never an option.

Final specification

Rewriting the final specification here would make no sense, but you can read it here. Block comments were heavily based on Dhall's multiline strings, so that they feel familiar to anyone who already knows that part of the language. You can read the specification for multiline strings here. Single-line comments, on the other hand, were something completely new, or at least I couldn't find anything similar in Dhall's language design that could help, but I think they will be comfortable for users.
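To picture what "based on multiline strings" means for indentation, here is a minimal sketch of the general idea: remove the indentation shared by every non-blank line, so the Markdown renderer sees the content starting at the first column. The function name stripCommonIndent is hypothetical, it only counts spaces, and it is not the dhall-docs implementation; the linked specification is the authority on the exact rules.

```haskell
import Data.Char (isSpace)

-- Hypothetical helper (not taken from dhall-docs): drop the smallest
-- indentation shared by all non-blank lines. Only spaces are counted.
stripCommonIndent :: [String] -> [String]
stripCommonIndent ls = map (drop shared) ls
  where
    -- Indentation widths of the lines that contain something visible.
    indents = [length (takeWhile (== ' ') l) | l <- ls, not (all isSpace l)]
    -- Blank-only input has nothing to strip.
    shared  = if null indents then 0 else minimum indents
```

Under this sketch, the content of the record-field comment above ends up flush with the first column however deeply the comment itself is nested, while any extra indentation relative to the shallowest line is preserved.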

To give you a summary:

  • Block comments start with {-| followed by a newline, e.g.

    {-|
    foo
    bar -}
    
  • Single-line comments can span several lines, but the first line must start with --| followed by one space, and every following line must start with -- followed by two spaces. The lines also need to be vertically aligned, e.g.

    --| foo
    --  bar
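The single-line rule can be sketched as a small prefix check. This is only an illustration: parseSingleLineDoc is a hypothetical name, the input is assumed to be the comment lines with their leading indentation already removed, and the vertical-alignment check and any other cases covered by the real specification are left out.

```haskell
import Data.List (stripPrefix)

-- Hypothetical helper (not taken from dhall-docs): return the documentation
-- text of a group of single-line comments, or Nothing if a prefix is wrong.
parseSingleLineDoc :: [String] -> Maybe [String]
parseSingleLineDoc []             = Nothing
parseSingleLineDoc (first : rest) =
  -- The first line needs "--| "; every later line needs "--" plus two spaces.
  (:) <$> stripPrefix "--| " first <*> traverse (stripPrefix "--  ") rest
```

Because both prefixes are four characters wide, the text of every line starts at the same column once the prefix is removed.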
    

Implementation

I have to admit: text manipulation is really hard and messy for me. My first implementation of the specification was really awful and cumbersome. One of my mentors gave me this post with some tips on type-driven design, and I applied them to the implementation. I highly recommend reading that post.

I defined the following data-type:

{-# LANGUAGE DataKinds         #-}
{-# LANGUAGE KindSignatures    #-}

import Data.List.NonEmpty (NonEmpty)
import Data.Text (Text)
import Text.Megaparsec (SourcePos)

data CommentType = DhallDocsComment | MarkedComment | RawComment

type ListOfSingleLineComments = NonEmpty (SourcePos, Text)

data DhallComment (a :: CommentType)
    = BlockComment Text
    | SingleLineComments ListOfSingleLineComments
    deriving Show

newtype DhallDocsText = DhallDocsText Text

Note the language extensions I'm using (disclaimer: I'm still learning Haskell, and specifically the use of language extensions, some of which require some math background. I apologize if I say something wrong. If you want to correct my terminology, please leave a comment on the post):

  • DataKinds allows us to extend Haskell's kind system by promoting our data types to kinds (and their constructors to types).
  • KindSignatures allows the a :: CommentType syntax. In that example it means "I accept any type of kind CommentType", i.e. DhallDocsComment, MarkedComment or RawComment.

That allows us to have these types of comments:

DhallComment RawComment
DhallComment MarkedComment
DhallComment DhallDocsComment

and you know for sure what kind of text each of those types stores. This is something I love about Haskell: the type system itself helps you write correct programs.

After that, all that was left was writing some functions to map between each possible DhallComment:

-- checks if a DhallComment has the `|` marker
parseMarkedComment
    :: DhallComment 'RawComment
    -> Maybe (DhallComment 'MarkedComment)

-- checks that a MarkedComment is valid against the `dhall-docs` spec
parseDhallDocsComment
    :: DhallComment 'MarkedComment
    -> Either CommentParseError (DhallComment 'DhallDocsComment)

-- Manipulates the comment's text. Since the Comment
-- is a 'DhallDocsComment it should never fail
parseDhallDocsText
    :: DhallComment 'DhallDocsComment
    -> DhallDocsText
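To show how such a pipeline type-checks, here is a reduced, self-contained sketch using only base. Stage, Comment, markBlockComment and markedBody are toy names invented for this illustration, with String in place of Text; they are not the definitions from dhall-docs.

```haskell
{-# LANGUAGE DataKinds      #-}
{-# LANGUAGE KindSignatures #-}

import Data.List (stripPrefix)

-- Toy stand-ins for the post's types; every name here is illustrative only.
data Stage = Raw | Marked

-- The parameter `s` is a phantom: it appears in the type, not in the value.
newtype Comment (s :: Stage) = Comment String

-- Only a raw block comment that carries the `|` marker becomes a marked one.
markBlockComment :: Comment 'Raw -> Maybe (Comment 'Marked)
markBlockComment (Comment t) = Comment <$> stripPrefix "{-|" t

-- Accepts marked comments only; passing a `Comment 'Raw` is a type error.
markedBody :: Comment 'Marked -> String
markedBody (Comment t) = t
```

The point of the design is that markedBody cannot be called on a comment that has not gone through markBlockComment first, so the check cannot be skipped by accident.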

To ensure that the implementation works properly, the dhall-docs test setup was extended with several unit test cases for it, which you can see here.

Examples of using this new spec

Recently a PR was opened to modify the header comments of Dhall's Prelude files. You can see the PR here:

Use `dhall-docs` comment format for Prelude #1045

... so that the comment headers are included in the generated documentation

To end

I have one month left to finish the project, and some cool core features are still missing. As soon as I finish them I'll post here so you can check them out. I have to say: I'm really excited to finish this project :)

Thanks for reading!
