German Robayo

Posted on Jun 28, 2020

Document generation & Rendered Source Code

#haskell #dhall #gsoc

TL;DR

If you only want to see my current progress, visit https://hydra.dhall-lang.org/build/63538/download/1/docs/

Motivation

What is a documentation generator? If you're familiar with other programming languages such as Java or Haskell, you may know of tools that analyze your source code, extract useful information from your comments, and display it in a markup language such as HTML.

In Haskell, that tool is Haddock. It analyzes your source code, searching for comment annotations on your data types and functions, and reports them in a nice way using HTML.

For instance, the following function declaration:

-- | Function description
haddockExample
  :: a -- ^ input description
  -> b -- ^ output description

will be rendered in the generated HTML with its description and the per-argument comments shown alongside the signature.

The generated HTML can then be uploaded to any host and served from there. This is useful if you're a package maintainer and want to let your consumers know how to use your packages.

The main goal of this #GSoC project is to build a similar tool for the Dhall configuration language. I set some milestones for the project, and this post will focus on the following:

Generate readable documentation with a nice UI/UX from a Dhall package. A Dhall package is essentially (at this moment) a folder with several Dhall files, ending with the .dhall extension.
Add rendered source code (first iteration). Haddock does it. For each Haskell module, it will create:
- The HTML documentation
- An HTML page with the rendered source code

Documentation generator

So far, I have developed the dhall-docs executable, which takes the following flags:

--input, which is a relative or absolute path to the Dhall package.
--output-link, which is a symlink (defaulting to ./docs) to the generated documentation.
--package-name, which is the package name used in HTML titles and in the path where the generated documentation is actually saved.

The tool traverses the whole --input directory recursively, searching for all the files that end in .dhall, and parses them. If a file fails to parse as a Dhall expression, it won't be included.
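As a rough illustration of that traversal, here is a minimal sketch using the directory and filepath packages. The names isDhallFile and findDhallFiles are hypothetical and this is not the actual dhall-docs code; the parsing step that discards invalid files is left out.

```haskell
import Control.Monad (forM)
import System.Directory (doesDirectoryExist, listDirectory)
import System.FilePath (takeExtension, (</>))

-- | True when the path ends with the .dhall extension.
-- (Hypothetical helper, not part of dhall-docs.)
isDhallFile :: FilePath -> Bool
isDhallFile path = takeExtension path == ".dhall"

-- | Recursively collect every .dhall file below the given directory.
-- (Hypothetical helper; a real tool would also try to parse each file
-- and drop the ones that are not valid Dhall expressions.)
findDhallFiles :: FilePath -> IO [FilePath]
findDhallFiles dir = do
    entries <- listDirectory dir
    found <- forM entries $ \entry -> do
        let path = dir </> entry
        isDir <- doesDirectoryExist path
        if isDir
            then findDhallFiles path              -- descend into subpackages
            else pure [path | isDhallFile path]   -- keep only .dhall files
    pure (concat found)
```

Calling findDhallFiles on the package root would return the candidate files, which could then be handed to the Dhall parser one by one.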

Structure of the generated documentation

In each directory of the generated documentation, an index.html file is generated, listing the subpackages (the directories in it) and the exported Dhall files at that level.

If we visit the page for a .dhall file, we see its documentation followed by its source code.

The following is the actual source file that the tool took as part of its input for such a page:

{-
`subtract m n` computes `n - m`, truncating to `0` if `m > n`
-}
let subtract
    : Natural → Natural → Natural
    = Natural/subtract

let example0 = assert : subtract 1 2 ≡ 1

let example1 = assert : subtract 1 1 ≡ 0

let example2 = assert : subtract 2 1 ≡ 0

let property0 = λ(n : Natural) → assert : subtract 0 n ≡ n

let property1 = λ(n : Natural) → assert : subtract n 0 ≡ 0

let property2 = λ(n : Natural) → assert : subtract n n ≡ 0

in  subtract

As you may notice, the Documentation header on the generated HTML corresponds to the header comment, and the actual source code is the rest of the file. The header is written in Markdown; dhall-docs uses mmark as its Markdown parser and preprocessor.
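To make the header/body split concrete, here is a simplified sketch that separates a leading block comment from the rest of a file. The function splitHeader is hypothetical and works on plain strings; it is not how dhall-docs does it, and it ignores line comments (--) entirely. It does track nesting, since Dhall block comments can be nested.

```haskell
import Data.Char (isSpace)
import Data.List (isPrefixOf)

-- | Split a source file into (header comment contents, rest of the file).
-- Returns Nothing when the file does not start with a block comment or
-- when that comment is never closed. (Hypothetical, simplified helper.)
splitHeader :: String -> Maybe (String, String)
splitHeader source =
    case dropWhile isSpace source of
        '{' : '-' : rest -> go (1 :: Int) "" rest
        _                -> Nothing
  where
    -- depth counts open "{-" markers; acc holds the comment text reversed
    go depth acc input
        | "{-" `isPrefixOf` input =
            go (depth + 1) ('-' : '{' : acc) (drop 2 input)
        | "-}" `isPrefixOf` input =
            if depth == 1
                then Just (reverse acc, drop 2 input)
                else go (depth - 1) ('}' : '-' : acc) (drop 2 input)
        | c : cs <- input = go depth (c : acc) cs
        | otherwise = Nothing
```

For example, splitHeader "{- doc -}\nlet x = 1 in x" evaluates to Just (" doc ", "\nlet x = 1 in x"): the first component would go to the Markdown renderer and the second to the source code renderer.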

This is the first iteration of the work. I plan to expand the places where annotation comments can go, such as record type labels.

Here is the list of PRs involved in this task:

dhall-haskell#1833 introduced the repository skeleton.
dhall-haskell#1845 was the first attempt to generate this documentation, without any CSS and parsing the header without using a Markdown preprocessor.
dhall-haskell#1848 improved the CSS rules.
dhall-haskell#1863 parsed the header Markdown contents and rendered them on the HTML page.
dhall-haskell#1871 added a small CI/CD configuration to generate sample documentation. This was my hardest task since I didn't know any Nix; it feels good to have actually accomplished it.
dhall-haskell#1876 stored the generated documentation under $XDG_DATA_HOME/dhall-docs, following the XDG specification. Documentation is stored at $XDG_DATA_HOME/dhall-docs/${SHA256_OF_DOCS}-${PACKAGE-NAME}, which makes it content-addressable.
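The path layout from the last item can be sketched as follows. Both function names are hypothetical, and the SHA256 digest is taken as an already-computed string because hashing needs a library outside base. Only getXdgDirectory and (</>) are real library functions here.

```haskell
import System.Directory (XdgDirectory (XdgData), getXdgDirectory)
import System.FilePath ((</>))

-- | Build <data home>/dhall-docs/<sha256>-<package name>.
-- (Hypothetical helper; the digest is assumed to be computed elsewhere.)
docsStorePath :: FilePath -> String -> String -> FilePath
docsStorePath dataHome sha256 packageName =
    dataHome </> "dhall-docs" </> (sha256 <> "-" <> packageName)

-- | Same layout, but resolving the data directory through the XDG rules.
resolveStorePath :: String -> String -> IO FilePath
resolveStorePath sha256 packageName = do
    base <- getXdgDirectory XdgData "dhall-docs"
    pure (base </> (sha256 <> "-" <> packageName))
```

Because the digest is part of the directory name, generating the same documentation twice yields the same path, while any change in the output lands in a new directory.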

All of that work was really ad hoc, so I won't add any implementation details: you can see them in the PRs. The next section was way more interesting to implement, so please keep reading :)

Rendered Source Code (first iteration)

In the previous section I showed a first iteration of the rendered source code. There were several ways of doing this task, and the one I was about to start with was traversing the Dhall AST to generate Html (). In FP terms, I would write a catamorphism. In non-FP terms, I would write a mapper.

But this was going to involve a lot of lines of code, and it would repeat part of what the Dhall.Pretty module of the dhall package already does, i.e. define formatting rules for the AST elements and tokens.
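To show what that first option looks like, here is a toy sketch over a made-up three-constructor AST. It is not the Dhall AST; every name and CSS class in it is an assumption for illustration. Note how each case has to decide spacing, keywords and classes by hand, which is exactly the kind of rule a pretty-printer already encodes.

```haskell
-- | A tiny, hypothetical expression type (not Dhall's Expr).
data Expr
    = NaturalLit Integer
    | Var String
    | Let String Expr Expr

-- | Wrap some text in a span with the given (assumed) CSS class.
spanOf :: String -> String -> String
spanOf cls body = "<span class=\"" ++ cls ++ "\">" ++ body ++ "</span>"

-- | Direct traversal of the AST producing HTML as a String.
-- Variable names are assumed to be plain identifiers, so no escaping is done.
exprToHtml :: Expr -> String
exprToHtml expr = case expr of
    NaturalLit n -> spanOf "dhall-literal" (show n)
    Var name     -> name
    Let name value body ->
        spanOf "dhall-keyword" "let" ++ " " ++ name ++ " = "
            ++ exprToHtml value ++ " "
            ++ spanOf "dhall-keyword" "in" ++ " " ++ exprToHtml body
```

With the real Dhall AST there would be one such case per constructor, each duplicating layout decisions that Dhall.Pretty already makes.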

Under the hood, the Dhall.Pretty module uses the prettyprinter package. Its core consists of the following functions and ADT:

data Ann
  = Keyword     -- ^ Used for syntactic keywords
  | Syntax      -- ^ Syntax punctuation such as commas, parenthesis, and braces
  | Label       -- ^ Record labels
  | Literal     -- ^ Literals such as integers and strings
  | Builtin     -- ^ Builtin types and values
  | Operator    -- ^ Operators
  deriving Show

-- Create a `Doc Ann` from a Dhall expression,
-- annotating elements using our syntactic rules
prettyExpr :: Pretty a => Expr s a -> Doc Ann

-- SimpleDocStream can be later rendered as `Text` on
-- a terminal
layout :: Doc ann -> Pretty.SimpleDocStream ann

This module contained basically everything I had to do, and repeating code is bad! The prettyprinter package description says:

A prettyprinter/text rendering engine. Easy to use, well-documented, ANSI terminal backend exists, HTML backend is trivial to implement, no name clashes, Text-based, extensible.

so I thought: "man, there should be a way to generate Html () from a dhall expression using this module".

And there was a way. In the package documentation, they recommend using SimpleDocTree instead of SimpleDocStream to render HTML-like output, and the package itself exports a utility to do the conversion: treeForm. Traversing the SimpleDocTree ADT made all this work possible in the following function:

{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import Dhall.Core (Expr, Import)
import Dhall.Pretty (Ann (..))
import Dhall.Src (Src)
import Lucid

import qualified Data.Text
import qualified Data.Text.Prettyprint.Doc.Render.Util.SimpleDocTree as Pretty
import qualified Dhall.Pretty

-- | Render a Dhall expression as HTML, reusing the annotations
-- that Dhall.Pretty attaches to each syntactic element
exprToHtml :: Expr Src Import -> Html ()
exprToHtml expr = renderTree prettyTree
  where
    -- Pretty-print the expression, then convert the stream into a tree
    prettyTree = Pretty.treeForm
        $ Dhall.Pretty.layout
        $ Dhall.Pretty.prettyExpr expr

    textSpaces :: Int -> Text
    textSpaces n = Data.Text.replicate n (Data.Text.singleton ' ')

    renderTree :: Pretty.SimpleDocTree Ann -> Html ()
    renderTree sds = case sds of
        Pretty.STEmpty -> return ()
        Pretty.STChar c -> toHtml $ Data.Text.singleton c
        Pretty.STText _ t -> toHtml t
        -- A line break followed by the indentation of the next line
        Pretty.STLine i -> br_ [] >> toHtml (textSpaces i)
        Pretty.STAnn ann content -> encloseInTagFor ann (renderTree content)
        Pretty.STConcat contents -> foldMap renderTree contents

    -- Wrap annotated content in a <span> with a class derived from the annotation
    encloseInTagFor :: Ann -> Html () -> Html ()
    encloseInTagFor ann = span_ [class_ classForAnn]
      where
        classForAnn = "dhall-" <> case ann of
            Keyword -> "keyword"
            -- omitted for brevity

This is similar to the first option: it transforms the Dhall AST (Expr s a) into Html (). The difference is that we don't have to worry about classifying the syntactic elements: that logic stays in the Dhall.Pretty module, and this function only creates the Html () from its output.
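The same idea can be tried without any dependencies. The sketch below uses a toy Tree type that only mimics the shape of prettyprinter's SimpleDocTree and renders to a plain String instead of Lucid's Html (). The type, the function names and the class names other than "dhall-keyword" are assumptions made for illustration.

```haskell
-- | A subset of annotation kinds, standing in for Dhall.Pretty's Ann.
data Ann = Keyword | Literal | Operator
    deriving Show

-- | A toy stand-in for SimpleDocTree (hypothetical, simplified).
data Tree
    = TText String     -- plain text
    | TLine Int        -- line break followed by n spaces of indentation
    | TAnn Ann Tree    -- annotated subtree
    | TConcat [Tree]   -- sequence of subtrees

-- | Escape the characters that are special in HTML text.
escape :: String -> String
escape = concatMap esc
  where
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '&' = "&amp;"
    esc '"' = "&quot;"
    esc c   = [c]

-- | CSS class for an annotation; only "dhall-keyword" comes from the post.
classFor :: Ann -> String
classFor ann = "dhall-" ++ case ann of
    Keyword  -> "keyword"
    Literal  -> "literal"
    Operator -> "operator"

-- | Render the tree: the renderer only knows about annotations,
-- not about the syntax of the language that produced them.
render :: Tree -> String
render tree = case tree of
    TText t      -> escape t
    TLine n      -> "<br>" ++ replicate n ' '
    TAnn a inner -> "<span class=\"" ++ classFor a ++ "\">" ++ render inner ++ "</span>"
    TConcat ts   -> concatMap render ts
```

For instance, render (TConcat [TAnn Keyword (TText "let"), TText " x = ", TAnn Literal (TText "1")]) produces <span class="dhall-keyword">let</span> x = <span class="dhall-literal">1</span>. The renderer never inspects what kind of expression it is printing, which is why the real function stays so small.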

The PR that introduced that change is dhall-haskell#1892.

Neat! A lot of value in fewer than 160 lines of code.

The things I learnt along the way

Of course, I've improved my Haskell, specifically my knowledge of the package ecosystem. One of the things that overwhelmed (and sometimes annoyed) me is how package versions are resolved. Since we have to ensure our project works on several GHC versions with several package versions, we have to be really sure about the version of any package that we add.

This was difficult in the first two weeks, since I had to do a bit of research about packages to render HTML and parse Markdown, and fight against the CI/CD pipeline whenever a version error occurred.

Thankfully, I now understand better how it works. I don't think I'll add more packages in the future of the project, but I now know how to tackle that kind of issue.

Another thing I learned a little about during the project was Nix. In short, it's a functional package manager. Fun fact: I see that a lot of people who enter the functional programming world tend to use only tools that follow that paradigm. Every time I searched for something about Nix, the example was a Haskell project.

If you made it this far

Thanks for reading!
