From time to time the question comes up in the OCaml/ReasonML communities:
Why do I need two files–an implementation file and an interface file–to define a module?
In this post I will try to answer this question.
(I know this is not always the exact question, but bear with me for a moment here.)
You may not need interface files
First of all–you may not need two files! You need an interface file if you want to hide some parts of the implementation. That's it. If you want to expose the entire module implementation–which you sometimes do for good reasons–then don't write an interface file.
When do you want to expose the entire implementation? For me, the following cases make sense:
- The module is a 'namespacing' module which simply re-exports other modules
- The module just defines interfaces and functors which create modules
- The module really doesn't contain anything worth hiding
Note, however, that if you expose an implementation without an interface, OCaml may not infer type parameter names as 'pretty' as you would like. E.g.,
let context request = ...
(* val context : 'a request -> 'a *)
You might want to give it a more descriptive type, e.g. 'ctx request -> 'ctx. Interfaces are useful for that as well.
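Here is a minimal sketch of that idea. The `Request` module and its `make` function are hypothetical names I made up for illustration, and the signature is written inline (where a `.mli` file would normally go) so the example is self-contained:

```ocaml
(* Hypothetical module; the inline signature plays the role of a .mli file. *)
module Request : sig
  type 'ctx request

  val make : 'ctx -> 'ctx request

  (* The interface picks the name 'ctx; inference alone would print 'a. *)
  val context : 'ctx request -> 'ctx
end = struct
  type 'ctx request = { ctx : 'ctx }

  let make ctx = { ctx }
  let context request = request.ctx
end

let () =
  let r = Request.make "user-42" in
  print_endline (Request.context r)
```

The implementation is unchanged; only the signature decides which type variable names consumers see in documentation and tooling.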
Why interface files
So sometimes you do need a good old-fashioned interface file, if you want to:
- Hide some irrelevant implementation details
- Disallow creating certain types without using your provided functions that guarantee correctness
- And document the above properly
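To make the first two points concrete, here is a small sketch. `Username`, `of_string`, and the 32-character limit are all assumptions invented for this example; the signature is inline rather than in a separate `.mli` so the snippet runs on its own:

```ocaml
(* Hypothetical module: a username can only be built through [of_string]. *)
module Username : sig
  type t (* abstract: consumers cannot see that it is a string *)

  val of_string : string -> t option
  val to_string : t -> string
end = struct
  type t = string

  (* Internal helper; hidden because the signature does not mention it. *)
  let is_valid s = String.length s > 0 && String.length s <= 32

  let of_string s = if is_valid s then Some s else None
  let to_string t = t
end

let () =
  match Username.of_string "" with
  | None -> print_endline "rejected empty username"
  | Some u -> print_endline (Username.to_string u)
```

Because `t` is abstract, code outside the module cannot pass a raw string where a `Username.t` is expected, so every value of that type has gone through the validation.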
But let's do a deeper dive into why specifically the interface/implementation file approach was chosen for OCaml/Reason. It goes back to OCaml's roots, and the source of its module interface/implementation design: Modula-2. Modula-2 was a descendant of Pascal, a teaching programming language which found application in industry with Turbo Pascal, Delphi, and others. Incidentally, the creator of Delphi, who is also the creator of C#, then went on to create TypeScript (which we will come to later in this post).
A dive into Modula-2
Anyway, Modula-2 had one big difference from Pascal–modules (as you may have guessed from the name):
Modules are the most important feature of Modula-2 over its predecessor Pascal making it very important for you to understand what they are and how they work. Fortunately for you, there are not too many things to learn about them and after you master them you will find many uses for them as you develop programs, and especially large programs.
Modula-2 came with a clever way to develop programs in a segregated way that would optimize their build times. The key concepts were:
- Module implementations and interfaces (which it called a 'definition') should be separated
- A module implementation M1 that used another module M2 would depend only on the interface of M2, not the implementation
- Rebuilding the implementation of M2 without touching its interface would thus have no effect on the build of its dependents, like M1
As you can see, this scheme would ensure that, as programs got bigger and bigger, build times should stay as minimal as possible, if module interfaces are well-designed. In fact this is exactly what the documentation says:
Why all of this trouble?
It may not seem to be worth all of the extra trouble that the Modula-2 compiler and linker go through to do this checking but it is important for a large program. The information used in the definition part of the module is the type of information that should be well defined in the design stages of a programming project, and if well done, very few or no changes should be required during the coding phase of the project. Therefore it is expected that recompiling several definition modules should not happen very often. On the other hand, during the coding and debugging phase of the project, it is expected that many changes will be required in the implementation parts of the modules. Modula-2 allows this and still maintains very strong type checking across module boundaries to aid in detecting sometimes very subtle coding errors.
The above paragraph should be interpreted as a warning to you. If you find that you are constantly recompiling modules due to changes in the definition modules, you should have spent more time in the software design.
The 'checking' that it is talking about is explained earlier on that page. Basically, when an interface is compiled, one of its outputs is a 'key' that uniquely identifies the built interface. That key is then checked against the key that each dependent module recorded when it was built. Think of it as a 'content-addressable hash' of the interface, used for strong typechecking during link time.
Back to OCaml
In fact this is almost exactly the same technique that OCaml uses to check modules during link time:
When OCaml builds a module, it takes a[n] MD5 hash over the interface and some of the internals. It also stores in the [dependent] module the MD5 hashes of any modules that it depends upon. At link time the MD5 hashes are compared, and the link is only allowed to proceed if they match.
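The idea can be sketched as a toy model. This is not how the compiler is implemented: the real digest is taken over the compiled interface rather than source text, and `interface_key`, `recorded_key`, and `can_link` are names I made up. It only shows the shape of the check, using the standard library's `Digest` module (which computes MD5):

```ocaml
(* Toy model only: hash some interface text and use the hash as its key. *)
let interface_key (interface_text : string) : string =
  Digest.to_hex (Digest.string interface_text)

(* The key a dependent module recorded when it was built. *)
let recorded_key = interface_key "val context : 'ctx request -> 'ctx"

(* Linking proceeds only if the provider's current key matches the recorded one. *)
let can_link ~current_interface =
  String.equal (interface_key current_interface) recorded_key

let () =
  (* Same interface: the keys match. *)
  assert (can_link ~current_interface:"val context : 'ctx request -> 'ctx");
  (* Changed interface: the keys differ, so the link is refused. *)
  assert (not (can_link ~current_interface:"val context : 'ctx request -> 'ctx option"))
```

If you want to look at the real digests, the `ocamlobjinfo` tool that ships with OCaml lists the interfaces a compiled file imports along with their hashes.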
As you can see, significant parts of OCaml's module design and linker behaviour are copied straight from Modula-2. Here's what the creator of OCaml has to say about it:
The “one compilation unit = one .mli interface file + one .ml implementation file” design goes back to Caml Light and was taken from Modula-2, Wirth’s excellent Pascal successor. As previously mentioned, it works great with separate compilation and parallel “make”.
...
OCaml combines this Modula-2 approach to compilation units with a Standard ML-like language of modules, featuring nested structures, functors, and multiple views of structures. The latter is a rarely-used but cool feature whereas a given structure can be constrained by several signatures to give different views on the implementation, e.g. one with full type abstraction for use by the general public and another with more transparent types for use by “friend” modules. Again, and perhaps even more than in Modula-2, it makes sense to separate structures (implementations) from signatures (interfaces) that control how much of the implementation is visible.
- Xavier Leroy (that whole thread is a goldmine by the way)
Prof. Leroy here also hints at a really cool technique that's available in the ML language family: modules can conform to multiple interfaces and expose different interfaces to different consumers. In other words–a module can wear many masks, depending on who is using it. I'll explore this technique more in a future post. Anyway, back to the drudgery of writing interfaces...
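As a quick taste of the 'many masks' idea, here is a minimal sketch. All of the names (`Counter_impl`, `PUBLIC`, `FRIEND`, and so on) are hypothetical; the point is only that one structure can be constrained by two signatures:

```ocaml
(* One implementation, left unconstrained. *)
module Counter_impl = struct
  type t = int

  let zero = 0
  let incr n = n + 1
  let to_int n = n
end

(* Public view: [t] is abstract. *)
module type PUBLIC = sig
  type t

  val zero : t
  val incr : t -> t
  val to_int : t -> int
end

(* "Friend" view: same operations, but the representation is visible. *)
module type FRIEND = PUBLIC with type t = int

module Counter : PUBLIC = Counter_impl
module Counter_friend : FRIEND = Counter_impl

let () =
  (* Through the friend view, a plain int is accepted as a counter. *)
  Printf.printf "%d\n" (Counter_friend.incr 41);
  (* Through the public view, only the provided functions can be used. *)
  Printf.printf "%d\n" Counter.(to_int (incr zero))
```

Writing `Counter.incr 41` would be a type error, because `Counter.t` is abstract in the public view.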
Interfaces in other languages
It turns out that other languages have been using separate interfaces (to lesser or greater extents), sometimes for a long time! For example:
C
The first example that springs to mind is the venerable C. It uses header (.h) files to declare types, functions, and other program components. These header files then get included by implementation files/compilation units (.c) at compile time, and the compiled object files are combined at link time. C also allows separate compilation when the implementation changes and the header file doesn't. In fact in the C world a standard way of distributing software is shipping libraries with header files to declare the public API, along with compiled object files to actually link against.
Java
In the Java world the standard advice is, 'Program to the interface, not the implementation'. This adds flexibility to the program, allowing easier change down the road when requirements change: https://softwareengineering.stackexchange.com/questions/150045/what-is-the-point-of-having-every-service-class-have-an-interface
In Java, writing an interface and then an implementing class (or more) is the equivalent of writing an interface file and then an implementation file (or more) in OCaml. You can see from the Stack Exchange thread above that newcomers sometimes don't get the point of repeating the API in the interface and then the implementation, and the 'grizzled old veterans' induct them.
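For comparison, here is roughly what 'program to the interface' can look like on the OCaml side, using a module type and a functor instead of a Java interface and classes. This is only a sketch; `GREETER`, `English`, `French`, and `Welcome` are made-up names:

```ocaml
(* Hypothetical interface, playing the role of a Java interface. *)
module type GREETER = sig
  val greet : string -> string
end

(* Two implementations of the same interface. *)
module English : GREETER = struct
  let greet name = "Hello, " ^ name
end

module French : GREETER = struct
  let greet name = "Bonjour, " ^ name
end

(* The consumer is written against GREETER only, not a specific implementation. *)
module Welcome (G : GREETER) = struct
  let message name = G.greet name ^ "!"
end

module Welcome_en = Welcome (English)
module Welcome_fr = Welcome (French)

let () =
  print_endline (Welcome_en.message "Ada");
  print_endline (Welcome_fr.message "Ada")
```

Swapping the implementation means changing one functor application; `Welcome` itself never has to change.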
TypeScript, Python, Ruby
In the gradually-typed languages we see cropping up today, interface files are very much in vogue, to 'ascribe' static types to dynamically-typed libraries:
- TypeScript declaration files (.d.ts)
- Python stub files (.pyi)
- Ruby interface files (.rbi)
You can see that OCaml/Reason's interface file precedent is copied almost wholesale.
Facing up to interfaces
Interfaces are a critical part of OCaml/Reason. In fact, they are almost universally recognized as a good idea and are spreading across more and more languages. Once you settle in to the OCaml mindset, you may find yourself enjoying the craft of software construction they enable!
Top comments (5)
Great post. I think it's also worth mentioning another benefit to interface files: they let the compiler do more optimizations. When the compiler knows that certain values are only internal to a module, it can compile them more efficiently.
And related to that, the compiler can also give you more warnings about unused values. That's especially useful if you have a large module with lots of internal helper functions and you want to clean them up.
Do you know if there's a way of telling the compiler which interface file to use depending on the environment? For example:
This is one of the things I miss most from Rust
Yes–apply the signature to the module at its point of use, not its point of definition. This keeps the module definition as general as possible but you can expose different facets of the module to different consumers.
Personally I don't recommend that, however. I think that unit tests should test using the exact same interface that's exposed in production. They shouldn't become deeply coupled to implementation details, because that leads to fragile tests.
Ha, ReWeb will have to wait a bit more :-) for the modules post, I am still mulling the idea in my head.
Thank you! I'm enjoying building it up piece by piece. Please let me know if you have any questions.