
A modular OCaml project structure

Yawar Amin ・ 4 min read

Thanks to new efforts in the OCaml ecosystem like ReasonML, BuckleScript, and Dune (formerly JBuilder), quite a few new OCaml projects are being started nowadays. The BuckleScript and Dune build systems provide automatic project-level namespacing, which is quite convenient. However, I feel that some of their defaults give up nicer features in a project, like having a toplevel module that can be documented with OCaml's built-in documentation comments. [EDIT: after using Dune for a bit, I found that you can definitely have a toplevel module in the project. It's just that each 'index' module (see below) lives at the same level in the filesystem hierarchy as its child modules.]

In this post I describe a project structure that places OCaml's modules at the forefront. It involves a little legwork, but as we'll see, not more than in equivalent OCaml or JavaScript projects, and it pays off in a big way in readability and documentation.

Suppose you have a project https://github.com/myname/myproject, with the following main code files: add.ml, subtract.ml, multiply.ml, and divide.ml, plus their corresponding .mli interface files (or .re/.rei files if you use Reason syntax). I suggest the following layout for the code files:

myproject/
  src/
    Myname__Myproject.ml
    Myname__Myproject/
      Myname__Myproject__Add.ml
      Myname__Myproject__Add.mli
      Myname__Myproject__Subtract.ml
      Myname__Myproject__Subtract.mli
      Myname__Myproject__Multiply.ml
      Myname__Myproject__Multiply.mli
      Myname__Myproject__Divide.ml
      Myname__Myproject__Divide.mli

... and so on for other nested modules. The key is to have a single toplevel module Myname__Myproject.ml under the src/ directory, corresponding to the name of the project. It provides the following benefits:

  • a namespace for the project (because OCaml doesn't provide one by default)
  • a unified toplevel for the project code documentation
  • an entry-point and index module for your project as a whole

This toplevel module can be named anything, really; people have (e.g.) used Myproject.ml for convenience. I highly recommend Myname__Myproject, however, in order to namespace it more strongly with your username and avoid conflicts.

Why Myname__Myproject instead of Myname.Myproject

In OCaml, existing modules aren't extensible. This means that you can't release a project containing a Myname module and then later release a project containing another Myname module and use both the projects together. This is inconvenient but understandable if you look at it from OCaml's point of view (i.e. people shouldn't be able to mess with existing modules).

The workaround, Myname__Myproject, is ultimately not too cumbersome, in my opinion.

The toplevel module should contain...

  • toplevel code documentation for the project as a whole
  • nested module aliases to the project's exported modules so that users can refer to them with dot-notation
  • code documentation for the module aliases so that the docs appear for the corresponding modules

For example:

(** [Myname__Myproject.ml] - this is the toplevel module documentation. *)

(** Module-level documentation for the [Add] module *)
module Add = Myname__Myproject__Add

(** Module-level documentation for the [Subtract] module *)
module Subtract = Myname__Myproject__Subtract

(** Module-level documentation for the [Multiply] module *)
module Multiply = Myname__Myproject__Multiply

(** Module-level documentation for the [Divide] module *)
module Divide = Myname__Myproject__Divide

Now, users of your library or app can refer to your modules using convenient dot-notation and auto-complete: Myname__Myproject.Add, etc. They can alias the long toplevel module name for ease of use: module Proj = Myname__Myproject. And they can access the entire project (code and documentation) through a convenient single point of entry.
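As a rough single-file sketch of the consumer's view: the inline struct definitions below stand in for the separate .ml files of the layout, and the functions add and subtract are hypothetical, since this post doesn't say what the modules actually contain.

```ocaml
(* Stand-ins for Myname__Myproject__Add.ml and
   Myname__Myproject__Subtract.ml. The functions [add] and [subtract]
   are hypothetical; they only exist to make the sketch runnable. *)
module Myname__Myproject__Add = struct
  let add a b = a + b
end

module Myname__Myproject__Subtract = struct
  let subtract a b = a - b
end

(* Stand-in for the toplevel index file Myname__Myproject.ml. *)
module Myname__Myproject = struct
  module Add = Myname__Myproject__Add
  module Subtract = Myname__Myproject__Subtract
end

(* Consumer code: alias the long toplevel name once, then use
   dot-notation to reach the exported modules. *)
module Proj = Myname__Myproject

let three = Proj.Add.add 1 2
let four = Proj.Subtract.subtract 7 3

let () = Printf.printf "%d %d\n" three four
```

In a real project each struct would be its own file, and only the last few lines would live in the consumer's code.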

Exported modules vs not

Earlier I mentioned that you can provide aliases in the toplevel module for the modules you want to export. This is more of a convention than a guarantee: if you have, for example, a file Myname__Myproject__Sqrt.ml in your source tree, the module Myname__Myproject__Sqrt will still be visible to users of your library. OCaml doesn't really have a way to hide file modules. [EDIT: this is also not quite true; see my next post about this.] But by not listing it in your toplevel module, you do provide a strong hint that it's project-internal.

This might seem like a limitation but keep in mind that in mainstream languages like Java you can't hide public classes from consuming packages either (at least, not unless you use the new Java modules feature).
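To make the convention concrete, here is a small single-file sketch. The inline structs stand in for separate files, and the functions isqrt and divide are hypothetical. Sqrt is left out of the index module, yet it stays reachable by its full underscored name.

```ocaml
(* Stand-in for Myname__Myproject__Sqrt.ml, a project-internal module.
   [isqrt] is a hypothetical helper: integer square root by linear
   search, kept deliberately simple. *)
module Myname__Myproject__Sqrt = struct
  let isqrt n =
    let rec go i = if (i + 1) * (i + 1) > n then i else go (i + 1) in
    go 0
end

(* Stand-in for Myname__Myproject__Divide.ml; [divide] is hypothetical. *)
module Myname__Myproject__Divide = struct
  let divide a b = a / b
end

(* Stand-in for the index file Myname__Myproject.ml. Only Divide is
   listed; leaving Sqrt out is the hint that it is internal. *)
module Myname__Myproject = struct
  module Divide = Myname__Myproject__Divide
end

(* Exported module, reached through the index. *)
let via_index = Myname__Myproject.Divide.divide 12 4

(* The unlisted module is not hidden: its full name still resolves. *)
let via_full_name = Myname__Myproject__Sqrt.isqrt 17
```

Nothing stops a consumer from writing the second form; the index module only documents which names are meant to be used.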

Aliased module cyclic dependencies

Aliased modules in the same path can't refer to each other with dot-notation. For example, in Myname__Myproject__Multiply.ml, you can't call Myname__Myproject.Add.whatever. The problem is that Myname__Myproject refers to Myname__Myproject__Multiply (because it aliases it). If the latter also referred to the former, that would create a cyclic dependency, and OCaml doesn't support cyclic dependencies across separate files.

The solution is to use the full (underscored version) module name from modules in the same directory root.
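A minimal sketch of that workaround, again with inline structs standing in for the separate files and with hypothetical add and multiply functions. A single file can't reproduce the cross-file cycle itself, so the comment marks where it would arise in the real layout.

```ocaml
(* Stand-in for Myname__Myproject__Add.ml; [add] is hypothetical. *)
module Myname__Myproject__Add = struct
  let add a b = a + b
end

(* Stand-in for Myname__Myproject__Multiply.ml; [multiply] is
   hypothetical. It refers to its sibling by the full underscored name.
   In the real multi-file layout, writing Myname__Myproject.Add.add here
   would make this file depend on Myname__Myproject.ml, which already
   depends on this file through its alias. *)
module Myname__Myproject__Multiply = struct
  let multiply a b =
    (* Repeated addition, only to show one sibling calling another. *)
    let rec go acc n =
      if n = 0 then acc else go (Myname__Myproject__Add.add acc a) (n - 1)
    in
    if b >= 0 then go 0 b else 0 - go 0 (-b)
end

(* Stand-in for the index file Myname__Myproject.ml: it may alias both
   modules, because neither of them refers back to it. *)
module Myname__Myproject = struct
  module Add = Myname__Myproject__Add
  module Multiply = Myname__Myproject__Multiply
end

let twelve = Myname__Myproject.Multiply.multiply 3 4
```

Code outside the project is unaffected and can keep using the dot-notation through the index module.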

Overall

I believe this custom work is worth it overall, because it provides a better experience for users (and readers, including Future You). OCaml projects have historically dumped all the source code into a single directory because OCaml doesn't namespace by directory. The layout I suggest here exposes the complexity of your project in smaller chunks, with a more easily digestible entry-point.

My hope is that in the future the OCaml ecosystem will organize itself more around modules rather than packages (e.g. https://opam.ocaml.org/packages/ , https://redex.github.io/ ) as the standard searchable units of code reuse. OCaml modules have certain benefits, like automatic compile-time compatibility checking, that make them quite suited to taking the role of 'packages' from other ecosystems.

Searchable module indexes that work by indexing 'exported' modules (i.e. aliased modules reachable from the toplevel module) would do away with the effort of having to search for packages in the ecosystem to find specific modules that we need.

Another benefit I haven't actually mentioned yet is that this structure is very amenable to a documentation toolchain that understands OCaml doc comments. For example, I have an OCaml project with generated documentation that I hope to blog about in the near future.

Posted by Yawar Amin (@yawaramin). Programming languages enthusiast. Author of Learn Type Driven Development: https://www.packtpub.com/application-development/learn-type-driven-development
