For some time now, I have been maintaining an OCaml library called dream-html. This library is primarily intended to render correctly constructed HTML, SVG, and MathML. Recently, I added the ability to render well-formed XML markup, which has slightly different rules than HTML. For example, in HTML, if you want to write an empty div tag, you write <div></div>. Under the rules of XML, you could also write <div/>, i.e. a self-closing tag. HTML5, however, has no concept of self-closing tags!
So by having the library take care of these subtle but crucial details, you can just concentrate on writing code that generates the markup. Of course, this has many other advantages too, but in this post I will just look at XML.
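To make the difference concrete, here is a minimal sketch that renders the same empty div under both sets of rules. It uses the pure-html package introduced later in this post and assumes that to_string renders with HTML rules while to_xml renders with XML rules; the names empty_div, as_html and as_xml are only illustrative.

```ocaml
(* Requires the pure-html package: opam install pure-html *)
open Pure_html

(* An empty div, built with the predefined HTML tag. *)
let empty_div = HTML.div [] []

(* Assumption: to_string renders under HTML rules, so the element keeps
   explicit opening and closing tags. *)
let as_html = to_string empty_div

(* Assumption: to_xml renders under XML rules, so an empty element may be
   written as a self-closing tag. *)
let as_xml = to_xml empty_div

let () =
  print_endline as_html;
  print_endline as_xml
```

The node value is the same in both cases; only the renderer decides which syntax rules apply.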
It turns out that we often need to serialize data into XML format, for storage or communication purposes. There are a few packages in the OCaml ecosystem which handle XML, but I think dream-html now does it surprisingly well. Let's take a look.
But first, a small clarification about the dream-html package itself. Recently I split it up into two packages:
- pure-html has all the functionality needed to write valid HTML and XML
- dream-html has all of the above, plus some integration with the Dream web framework for ease of use.
As you might imagine, the reason for the split was to allow using the HTML/XML functionality of the package without having to pull in the entire Dream dependency cone, which is quite large, especially if you are already using a different dependency cone. So pure-html depends only on the uri package, which helps construct correct URI strings.
To start using it, just install:
opam install pure-html
And add to your dune file:
(libraries pure-html)
Now, let's look at an example of how you can use it to construct XML. Suppose you have the following type:
type person = {
  name : string;
  email : string;
}
And you need to serialize it to XML like this:
<person name="Bob" email="bob@example.com"/>
Let's write a serializer using the pure-html package:
open Pure_html
let person_xml =
  let person = std_tag "person"
  and name = string_attr "name"
  and email = string_attr "email" in
  fun { name = n; email = e } -> person [name "%s" n; email "%s" e] []
Let's test it out:
$ utop -require pure-html
# open Pure_html;;
# let pp = pp_xml ~header:true;;
val pp : Format.formatter -> node -> unit = <fun>
# #install_printer pp;;
# type person = {
    name : string;
    email : string;
  };;
type person = { name : string; email : string; }
# let person_xml =
    let person = std_tag "person"
    and name = string_attr "name"
    and email = string_attr "email" in
    fun { name = n; email = e } -> person [name "%s" n; email "%s" e] [];;
val person_xml : person -> node = <fun>
# person_xml { name = "Bob"; email = "bob@example.com" };;
- : node =
<?xml version="1.0" encoding="UTF-8"?>
<person
  name="Bob"
  email="bob@example.com" />
OK cool, so our person record is serialized in this specific way. But what if we need to serialize it like this:
<person>
  <name>Bob</name>
  <email>bob@example.com</email>
</person>
After all, this is a common way of formatting records in XML. Let's write the serializer in this style:
let person_xml =
  let person = std_tag "person"
  and name = std_tag "name"
  and email = std_tag "email" in
  fun { name = n; email = e } ->
    person [] [
      name [] [txt "%s" n];
      email [] [txt "%s" e];
    ]
Let's try it out:
# let person_xml =
    let person = std_tag "person"
    and name = std_tag "name"
    and email = std_tag "email" in
    fun { name = n; email = e } ->
      person [] [
        name [] [txt "%s" n];
        email [] [txt "%s" e];
      ];;
val person_xml : person -> node = <fun>
# person_xml { name = "Bob"; email = "bob@example.com" };;
- : node =
<?xml version="1.0" encoding="UTF-8"?>
<person><name>Bob</name><email>bob@example.com</email></person>
Looks good! Let's examine the functions from the pure-html package used here to achieve this.
std_tag
This function lets us define a custom tag: let person = std_tag "person". Note that it's trivial to add a namespace: let person = std_tag "my:person".
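As a rough sketch of the namespace case, the snippet below builds an element with a prefixed tag and a prefixed attribute. The my prefix comes from the text above; the xmlns:my declaration, its URI, and the value Bob are placeholders made up for illustration, and the namespace declaration is treated as just another string attribute.

```ocaml
(* Requires the pure-html package: opam install pure-html *)
open Pure_html

(* "my" is the example prefix; the namespace URI below is a made-up
   placeholder, not a real namespace. *)
let my_person = std_tag "my:person"
let my_name = string_attr "my:name"
let xmlns_my = string_attr "xmlns:my"

let namespaced_person =
  my_person [xmlns_my "%s" "https://example.com/ns/my"; my_name "%s" "Bob"] []

(* Render to a string with XML rules and print it. *)
let namespaced_xml = to_xml namespaced_person
let () = print_endline namespaced_xml
```

Note that the library is only given the names as strings here; nothing in this sketch checks that the prefix is actually declared, so keeping the xmlns declaration in sync is up to the caller.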
string_attr
This allows us to define a custom attribute which takes a string payload: let name = string_attr "name". Again, it's easy to add a namespace: let name = string_attr "my:name".
There are other attribute definition functions which allow int payloads and so on. See the package documentation for details.
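A minimal sketch of an integer-valued attribute, assuming the package's int_attr function takes the attribute name and returns a function from int to attribute (the package documentation has the exact signature). The item record, its fields, and the sample values are hypothetical.

```ocaml
(* Requires the pure-html package: opam install pure-html *)
open Pure_html

(* Hypothetical record, used only to illustrate an int payload. *)
type item = { sku : string; quantity : int }

let item_xml =
  let item = std_tag "item"
  and sku = string_attr "sku"
  (* Assumption: int_attr "quantity" yields a function of type int -> attr,
     so it takes the value directly rather than a format string. *)
  and quantity = int_attr "quantity" in
  fun { sku = s; quantity = q } -> item [sku "%s" s; quantity q] []

(* Render one sample record to a string with XML rules and print it. *)
let item_string = to_xml (item_xml { sku = "A-100"; quantity = 3 })
let () = print_endline item_string
```

Unlike string_attr, the integer attribute in this sketch needs no format string, which keeps the call site short for numeric fields.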
pp_xml
This allows us to define a printer which renders XML correctly according to its syntactic rules:
let pp = pp_xml ~header:true
The optional header argument lets us specify whether to print the XML header. In many serialization cases, we do want it.
There's also a similar function which, instead of defining a printer, just converts the constructed node into a string directly: to_xml.
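Here is a small sketch of to_xml in use, serializing a list of records straight to a string. It reuses the attribute-style serializer from earlier in the post; the people wrapper element and the second sample record are hypothetical additions for illustration.

```ocaml
(* Requires the pure-html package: opam install pure-html *)
open Pure_html

type person = { name : string; email : string }

(* Same attribute-style serializer as earlier in the post. *)
let person_xml =
  let person = std_tag "person"
  and name = string_attr "name"
  and email = string_attr "email" in
  fun { name = n; email = e } -> person [name "%s" n; email "%s" e] []

(* Hypothetical wrapper element holding one child per record. *)
let people_xml people = std_tag "people" [] (List.map person_xml people)

(* to_xml returns the rendered markup as a plain string. *)
let people_string =
  to_xml
    (people_xml
       [ { name = "Bob"; email = "bob@example.com" };
         { name = "Alice"; email = "alice@example.com" } ])

let () = print_endline people_string
```

Getting a string back is convenient when the XML needs to go into a file, a database column, or an HTTP response body rather than through a Format printer.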
Conclusion
With these basic functions, it's possible to precisely control how the serialized XML looks. Note that dream-html and pure-html support only serialization of data into XML format, not deserialization, i.e. parsing XML. For that, there are other packages!