Thanks to new efforts in the OCaml ecosystem like ReasonML, BuckleScript, and Dune (formerly JBuilder), quite a few new OCaml projects are being started nowadays. The BuckleScript and Dune build systems provide automatic project-level namespacing, which is quite convenient. However, they set some defaults that I feel give up some nicer features of a project, like having a toplevel module that can be documented with OCaml's built-in documentation comments. [EDIT: after using Dune for a bit, I found that you can definitely have a toplevel module in the project. It's just that each 'index' module (see below) lives at the same level in the filesystem hierarchy as its children modules.]
Suppose you have a project https://github.com/myname/myproject, with main code files such as divide.ml, plus their corresponding interface files (.mli, or .rei). I suggest the following layout for the code files:
    myproject/
      src/
        Myname_Myproject.ml
        Myname_Myproject/
          Myname_Myproject_Add.ml
          Myname_Myproject_Add.mli
          Myname_Myproject_Subtract.ml
          Myname_Myproject_Subtract.mli
          Myname_Myproject_Multiply.ml
          Myname_Myproject_Multiply.mli
          Myname_Myproject_Divide.ml
          Myname_Myproject_Divide.mli
... and so on for other nested modules. The key is to have a single toplevel module Myname_Myproject.ml under the src/ directory, corresponding to the name of the project. It provides the following benefits:
- a namespace for the project (because OCaml doesn't provide one by default)
- a unified toplevel for the project code documentation
- an entry-point and index module for your project as a whole
This toplevel module can be named anything, really; people have (e.g.) used Myproject.ml for convenience. I highly recommend Myname_Myproject, however, in order to namespace it more strongly with your username and avoid conflicts.
In OCaml, existing modules aren't extensible. This means that you can't release a project containing a Myname module and then later release a project containing another Myname module and use both projects together. This is inconvenient but understandable if you look at it from OCaml's point of view (i.e. people shouldn't be able to mess with existing modules).
Using a longer, prefixed module name such as Myname_Myproject is ultimately not too cumbersome, in my opinion.
The toplevel module itself contains:
- toplevel code documentation for the project as a whole
- nested module aliases to the project's exported modules so that users can refer to them with dot-notation
- code documentation for the module aliases so that the docs appear for the corresponding modules
    (** [Myname_Myproject.ml] - this is the toplevel module documentation. *)

    (** Module-level documentation for the [Add] module *)
    module Add = Myname_Myproject_Add

    (** Module-level documentation for the [Subtract] module *)
    module Subtract = Myname_Myproject_Subtract

    (** Module-level documentation for the [Multiply] module *)
    module Multiply = Myname_Myproject_Multiply

    (** Module-level documentation for the [Divide] module *)
    module Divide = Myname_Myproject_Divide
Now, users of your library or app can refer to your modules using convenient dot-notation and auto-complete: Myname_Myproject.Add, etc. They can alias the long toplevel module name for ease of use: module Proj = Myname_Myproject. And they can access the entire project (code and documentation) through a convenient single point of entry.
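As a rough sketch of what this looks like from the consumer's side, the block below models the separate files as nested modules in a single file, purely so that it runs standalone. The functions add and subtract are hypothetical placeholders; the post doesn't specify what the real modules contain.

```ocaml
(* Stand-ins for Myname_Myproject_Add.ml and Myname_Myproject_Subtract.ml.
   In a real project each would be its own file; the function bodies are
   hypothetical and exist only for illustration. *)
module Myname_Myproject_Add = struct
  let add x y = x + y
end

module Myname_Myproject_Subtract = struct
  let subtract x y = x - y
end

(* Stand-in for Myname_Myproject.ml: the toplevel index module, which
   only re-exports the project's modules under short names. *)
module Myname_Myproject = struct
  module Add = Myname_Myproject_Add
  module Subtract = Myname_Myproject_Subtract
end

(* Consumer code: alias the long toplevel name once, then use dot-notation. *)
module Proj = Myname_Myproject

let five = Proj.Add.add 2 3
let one = Proj.Subtract.subtract 3 2
```

Because the aliases are ordinary module aliases, Proj.Add.add and Myname_Myproject_Add.add are the same function; the short path is only a more convenient way to reach it.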
Earlier I mentioned that you can provide aliases in the toplevel module for the modules you want to export. This is more of a convention; if you have, for example, a file Myname_Myproject_Sqrt.ml in your source tree, the module Myname_Myproject_Sqrt will be visible to users of your library. OCaml doesn't really have a way to hide file modules. [EDIT: this is also not quite true; see my next post about this.] But by not listing it in your toplevel module, you do provide a strong hint that it's project-internal.
This might seem like a limitation but keep in mind that in mainstream languages like Java you can't hide public classes from consuming packages either (at least, not unless you use the new Java modules feature).
Aliased modules in the same path can't refer to each other with dot-notation. For example, in Myname_Myproject_Multiply.ml, you can't call Myname_Myproject.Add.whatever. The problem is that Myname_Myproject already refers to Myname_Myproject_Multiply (because it aliases it). If the latter also referred to the former, that would create a cyclic dependency, and OCaml doesn't support cyclic dependencies across separate files.
The solution is to use the full, underscored module name (e.g. Myname_Myproject_Add) when referring to sibling modules under the same directory root.
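A minimal sketch of this workaround follows, again modelling the files as nested modules in one file so that it can be run on its own. The functions add and double are hypothetical and not part of the project described above.

```ocaml
(* Stand-in for Myname_Myproject_Add.ml (hypothetical contents). *)
module Myname_Myproject_Add = struct
  let add x y = x + y
end

(* Stand-in for Myname_Myproject_Multiply.ml. It reaches its sibling through
   the full underscored name. Writing Myname_Myproject.Add.add here instead
   would make this file depend on the toplevel module, which in turn
   depends on this file. *)
module Myname_Myproject_Multiply = struct
  let double x = Myname_Myproject_Add.add x x
end

(* Stand-in for Myname_Myproject.ml: depends on both siblings, and neither
   sibling depends on it, so the dependency graph stays acyclic. *)
module Myname_Myproject = struct
  module Add = Myname_Myproject_Add
  module Multiply = Myname_Myproject_Multiply
end

let eight = Myname_Myproject.Multiply.double 4
```

Only code inside the project needs the long names; consumers still go through the toplevel module and its short aliases.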
I believe this custom work is worth it overall, because it provides a better experience for users (and readers, including Future You). OCaml projects have historically dumped all the source code into a single directory because OCaml doesn't namespace by directory. The layout I suggest here exposes the complexity of your project in smaller chunks, with a more easily-digestible entry-point.
My hope is that in the future the OCaml ecosystem will organize itself more around modules rather than packages (e.g. https://opam.ocaml.org/packages/ , https://redex.github.io/ ) as the standard searchable units of code reuse. OCaml modules have certain benefits, like automatic compile-time compatibility checking, that make them quite suited to taking the role of 'packages' from other ecosystems.
Searchable module indexes that work by indexing 'exported' modules (i.e. aliased modules reachable from the toplevel module) would do away with the effort of having to search for packages in the ecosystem to find specific modules that we need.
Another benefit I haven't actually mentioned yet is that this structure is very amenable to a documentation toolchain that understands OCaml doc comments. For example, I have an OCaml project with generated documentation that I hope to blog about in the near future.