Renato Alencar

Why Should You Learn OCaml?

When I decided to learn OCaml, one of the first talks I watched was Yaron Minsky's 'Why OCaml?', where he presents some of the reasons Jane Street chose OCaml as the language to build almost everything within the company. What is interesting is that I chose OCaml for very specific, well-thought-out reasons, and they have kept holding true with time and experience with the language.

Functional programming

A few years ago I started focusing a lot on functional programming, which I believe to be the best way of writing software. Although one can use functional programming elements in languages like Python and Ruby, their idioms are still designed around an object-oriented approach, which often makes functional code in those languages very hard to understand, especially when the two styles are mixed.

Python and Clojure

Python was the first language I could say I had mastered in depth, back in 2012. Since then I have had a certain love for simplicity, and a lot of Python's simplicity comes from the influence of Lisp.

Clojure, being functional and inheriting its syntax from Lisp, easily caught my attention, and I spent a lot of time focused on the idioms of the language. But something essential was missing: a good type system (types and Clojure are a very deep rabbit hole). There was also the dependency on the JVM (which is actually a good thing in the domains where Clojure is mostly used). Moreover, the tooling is reasonably complicated, there is a lack of documentation, and the community tends to be very consulting- and enterprise-focused.

Haskell

I tried Haskell, and although it's an interesting language, there are a few things to consider. The whole tooling tends to be heavyweight; for example, the Docker image is 700 MB (it was more than 1 GB when I first tried). Moreover, there's a certain overhead in doing more practical things, such as handling state, which is easier in Clojure, where you can just use an atom.

Enter OCaml

OCaml does things a little differently. Its ML-family syntax, together with the ability to run side effects without much ceremony, helps those who are learning and makes experimentation and a more incremental design easier.

You don't need to understand the IO monad to write a simple Hello World, but you can still run everything inside a monad if that's what you want.

Haskell:

main :: IO ()
main = putStrLn "Hello world"

OCaml:

let () = print_endline "Hello world"

print_endline is a direct call; there are no extra abstractions and little to no overhead in understanding a single line of code.

If I have the OCaml tooling installed locally, I can compile a .ml file with a simple ocamlc call, without having to create a whole project just to run a small experiment, as I would in Clojure. Babashka solves this problem for Clojure, but it adds an extra burden for newcomers.
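
As a minimal sketch of that kind of throwaway experiment, the file below keeps state in a ref cell from the standard library, roughly where Clojure would reach for an atom. The file name counter.ml and the next_id function are hypothetical, chosen only for illustration. Assuming the OCaml tooling is on the path, it should build with the single ocamlc call shown in the leading comment.

```ocaml
(* counter.ml (hypothetical file name).
   Build and run, assuming ocamlc is installed:
     ocamlc -o counter counter.ml && ./counter *)

(* A mutable counter held in a ref cell; no monad is needed to update it. *)
let counter = ref 0

(* Increment the counter and return its new value. *)
let next_id () =
  incr counter;
  !counter

let () =
  let a = next_id () in
  let b = next_id () in
  Printf.printf "%d %d\n" a b
```

The side effect lives in an ordinary function, so nothing in the rest of the program has to change shape to accommodate it.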

(Good) Types

A lot of people don't like types, and I personally think TypeScript creates some hardship if you're just trying to build something very simple. Besides that, I don't find its type system very pleasant: structural subtyping produces error messages that are very hard to read and understand.

Moreover, there's another problem: having to always declare types if you want the compiler to verify what you want.

In Python, for instance, a type checker treats n and the return type of the following unannotated function as Any:

def fib(n):
    if n < 2: return n
    return fib(n - 1) + fib(n - 2)
# def fib(n: Any) -> Any

But the following OCaml code has its types inferred with no further problems:

let rec fib n =
    if n < 2 then n
    else fib (n - 1) + fib (n - 2)
(* val fib : int -> int = <fun> *)
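
Inference is not limited to integers. As a small sketch (apply_twice and pair_with_length are made-up names for this example), none of the definitions below carries an annotation, and the comments show the types the compiler should report for each:

```ocaml
(* No annotations anywhere: every type below is inferred by the compiler. *)

let rec fib n =
  if n < 2 then n
  else fib (n - 1) + fib (n - 2)
(* val fib : int -> int *)

(* Polymorphic: works for any type 'a, as long as f maps 'a to 'a. *)
let apply_twice f x = f (f x)
(* val apply_twice : ('a -> 'a) -> 'a -> 'a *)

(* String.length pins the argument to string and the second field to int. *)
let pair_with_length s = (s, String.length s)
(* val pair_with_length : string -> string * int *)
```

Passing a string to fib, for example, is rejected at compile time even though no type was ever written down.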

Errors become obvious

Functional languages in general make it more obvious when code probably shouldn't have been written the way it was. A good type system makes it even more obvious.

For example, serializers should be pure functions and shouldn't query the database, for obvious performance and maintenance reasons. Besides, hardly anyone would expect a serializer to be making a database query.

In OCaml production systems we usually use Lwt, which is basically a promise system for asynchronous IO and concurrency. The Lwt.t type is a monad, and monads have the property of "infecting" everything they touch. Suppose your serializer has the following signature, converting a value of a given type t into a string (probably JSON) to be sent to another service:

type t

val serialize : t -> string

If you need to convert some information to a string, this is the signature your serializer should have. However, if you need to make a database query inside the serializer, for whatever reason, the function is forced to have another signature:

val serialize : t -> string Lwt.t

This makes it obvious that serialize is doing something it shouldn't.
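
To see the signature change in a self-contained way, here is a sketch that swaps Lwt for a deliberately trivial stand-in module called Promise. This module is an assumption made so the example compiles with the standard library only; it is not Lwt's API, and real code would use Lwt.t and its bind. The record fields and fetch_email are also invented for illustration.

```ocaml
(* Stand-in for Lwt: a trivial promise type so the example needs no
   external library. Real code would use Lwt.t instead. *)
module Promise : sig
  type 'a t
  val return : 'a -> 'a t
  val bind : 'a t -> ('a -> 'b t) -> 'b t
  val run : 'a t -> 'a
end = struct
  type 'a t = unit -> 'a
  let return x () = x
  let bind p f () = f (p ()) ()
  let run p = p ()
end

type t = { id : int; name : string }

(* Hypothetical database lookup: anything doing IO returns a promise. *)
let fetch_email (u : t) : string Promise.t =
  Promise.return (u.name ^ "@example.com")

(* Pure serializer: t -> string *)
let serialize (u : t) : string =
  Printf.sprintf "{\"id\":%d,\"name\":\"%s\"}" u.id u.name

(* Once the serializer calls fetch_email, the promise leaks into its type:
   t -> string Promise.t *)
let serialize_with_email (u : t) : string Promise.t =
  Promise.bind (fetch_email u) (fun email ->
    Promise.return
      (Printf.sprintf "{\"id\":%d,\"name\":\"%s\",\"email\":\"%s\"}"
         u.id u.name email))
```

Because Promise.t is abstract, there is no way to get a plain string out of serialize_with_email without going through the promise, so every caller sees that IO is involved.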

Performance

Another important factor is performance. OCaml has very predictable performance: once you explore a little of how the compiler works and how your code ends up as assembly on a given architecture, you realize you can predict fairly easily how your code will actually be executed.

The OCaml compiler is well known for emitting very efficient code, which is impressive for a language that offers the level of abstraction OCaml does.

Fibonacci is a classic example. Look at how efficient the final emitted code is:

let rec fib n a b =
  if n < 2 then b
  else fib (n - 1) b (a + b)

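OCaml uses the least significant bit to differentiate between integers and pointers, which lets it operate on unboxed integers (an integer n is stored as 2n + 1, which is why the constant 2 shows up as 5 below). So the code is a little different from what you might expect: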

camlExample__fib_268:
        subq    $8, %rsp
.L101:
        cmpq    (%r14), %r15
        jbe     .L102
.L103:
        cmpq    $5, %rax
        jge     .L100
        movq    %rdi, %rax
        addq    $8, %rsp
        ret
.L100:
        leaq    -1(%rbx,%rdi), %rsi
        addq    $-2, %rax
        movq    %rdi, %rbx
        movq    %rsi, %rdi
        jmp     .L101
.L102:
        call    caml_call_gc@PLT
.L104:
        jmp     .L103

Observe a few details:

  • All operations are done directly in registers.
  • The recursion is converted into a loop efficiently.
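  • The registers used are the ones the arguments arrive in (n in %rax, a in %rbx, b in %rdi), following OCaml's native calling convention on x86-64 rather than the System V C convention, so nothing has to be moved around at the call boundary.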
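
The odd-looking constants in the listing follow from the integer tagging. As a rough model written in OCaml itself (tag, untag, tagged_add and tagged_pred are illustrative helpers, not compiler functions), the arithmetic the assembly performs on tagged words can be sketched like this:

```ocaml
(* Model of the tagged representation: integer n is stored as the machine
   word 2n + 1, so the lowest bit is always 1 for integers. *)
let tag n = (n lsl 1) lor 1
let untag w = w asr 1

(* Adding two tagged words and subtracting 1 gives the tagged sum,
   which is what `leaq -1(%rbx,%rdi), %rsi` computes for a + b. *)
let tagged_add wa wb = wa + wb - 1

(* Subtracting 1 from n means subtracting 2 from its tagged word,
   matching `addq $-2, %rax`. *)
let tagged_pred w = w - 2

let () =
  (* `n < 2` becomes a comparison against the tagged 2, hence `cmpq $5`. *)
  Printf.printf "tag 2 = %d\n" (tag 2);
  Printf.printf "3 + 4 = %d\n" (untag (tagged_add (tag 3) (tag 4)));
  Printf.printf "pred 10 = %d\n" (untag (tagged_pred (tag 10)))
```

This is only a model of the encoding; the point is that addition and subtraction need at most one extra constant, so no untagging step appears in the loop.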

Ecosystem

The ecosystem is very accessible if you know English. The community Discourse instance is quite active, and the people who work on the libraries and the language almost always answer questions on the forum. That is very cool, since you get a direct perspective from the people who build these things rather than only from other users of the language or library.

Where to begin?
