This is the summary of my experience participating in Google Summer of Code 2021, where I worked with (and for) the Dhall community to create a new package to translate bidirectionally between Dhall and CSV.
The Problem
Dhall is a configuration language that can save you the tedious work of creating long and repetitive configuration files without becoming a Turing-complete programming language. However, it is of no use if you need the data in another format because the tool you are using doesn't understand Dhall. For that reason, Dhall provides some packages to translate bidirectionally between common configuration languages like JSON and YAML.
As CSV is one of the most popular formats to store data, it was thought that it would be a great addition to the Dhall environment to provide a way to translate back and forth between Dhall and CSV. The original idea of providing a Dhall-CSV package belongs to Adrian Sieber, who posted issue #1165 on the Dhall language repository. There lies the original discussion on how to go about making the dhall-csv package.
My contribution
For my GSoC project, I built from scratch the dhall-csv package on the Dhall Haskell implementation GitHub repository. Said package provides two executables, dhall-to-csv (which converts Dhall files into CSV files) and csv-to-dhall (which converts CSV files into Dhall files). It also provides Haskell libraries with the functions that translate bidirectionally between Dhall and CSV.
Through the 10 weeks of the coding period, I first created a skeleton for the dhall-csv package and then built the features incrementally from there. The main features were done in the following order:
Create a skeleton for the dhall-csv package, ensuring it can build with the tools used in the rest of the project and integrating said package into the infrastructure of the project (#2214)
Create a testing tool using the Tasty Golden Tests framework. The main goal is to only have to provide the input and expected output files in a data/ sub-directory, thus helping the development process (#2225)
Basic conversion from Dhall to CSV. Functions to translate from Dhall to CSV, but only supporting a subset of all Dhall types (#2226)
Basic conversion from CSV to Dhall. Functions to translate from CSV to Dhall, but translating all CSV fields to Dhall Text (#2234)
Add command line options to both executables for a better user experience. Provide both executables (dhall-to-csv and csv-to-dhall) with command line options similar to the options provided by other executables in the project (#2242)
Add an option to translate CSV files with no header. This was discussed in issue #2241 (#2253)
CSV to Dhall conversion providing a Dhall schema with the expected type of the output Dhall expression. This is how conversion to Dhall was handled at first in other packages like dhall-json (#2260)
Tests for the dhall-csv package. Add the missing types that should be supported by dhall-to-csv (#2277)
Reference documentation for the dhall-csv package. Use Haddocks to document all the exported functions. Create meaningful error messages to provide a better user experience (#2279)
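To make the difference between the basic CSV to Dhall conversion and the schema-based one more concrete, here is a small sketch written as a single Dhall expression. The CSV contents, the field names and the Row type are all made up for illustration, and the values only show the shape one would expect from each approach; they are not actual output of csv-to-dhall.

```dhall
-- Illustrative only: the data, field names and `Row` type are hypothetical.
-- Imagined source CSV (with a header line):
--   name,age
--   Alice,30
--   Bob,25

-- Basic conversion: every CSV field becomes Dhall Text.
let asText =
      [ { name = "Alice", age = "30" }, { name = "Bob", age = "25" } ]

-- Schema-based conversion: the expected type says `age` is a Natural.
let Row = { name : Text, age : Natural }

let typed =
      [ { name = "Alice", age = 30 }, { name = "Bob", age = 25 } ] : List Row

in  { asText = asText, typed = typed }
```

The schema is simply the type of the expression you want back, which is why providing it lets the conversion produce something more precise than Text for every field.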
Besides these Pull Requests concerning the dhall-csv package, I also refactored some repeated code in the dhall-yaml and dhall-json packages, which was later used by dhall-csv. I pointed out the issue in #2233 and merged the changes in PR #2235.
I discussed the problems I encountered with my mentors through weekly meetings but also with the community through GitHub Issues.
All the source code can be found here. There is no official release with the dhall-csv package included yet, but it will be included in the next release of dhall-haskell, which should come in the following days.
Pending Work
My original proposal was to create Dhall bindings to TOML, but that project fell into the great hands of Julio Grillo, while I got to work on Dhall bindings to CSV. Because of this I won't be comparing the goals in the proposal vs. the actual work done. I will just list some things that are present in other similar packages like dhall-json but missing in dhall-csv and that would be nice to have.
The things that are still missing in the dhall-csv package are:
Type inference when converting from CSV to Dhall. In my original proposal I thought this would be the easy part and it would be hard to allow providing a schema with the expected type. It turned out to be quite the opposite and I had no time to do it.
Option to translate unions with the name of the alternative (for non-empty alternatives).
Tutorial for the dhall-csv package.
I will try to keep working on these things over the following days after the end of the program.
Final thoughts
When I first got the email that my proposal was accepted I couldn't believe it. I had tried to get into Google Summer of Code two times before but didn't find a project that resonated with me and ended up without a complete proposal by the deadline. This year I decided I couldn't let go of this project I really liked and I put my best into the proposal... And it paid off. I was so excited (and scared) when I got accepted; I had never gotten this far before.
The fear eventually wore off, because the Dhall community (and especially my mentors) was just wonderful to me, always giving me advice and taking every opportunity to teach and learn. The excitement only grew bigger each week: seeing my project take form and getting feedback from the community was thrilling.
I knew about Open Source before but I had never worked on an Open Source project. From the inside, I found it particularly fascinating how a bunch of complete strangers can get together with a common goal and bring a project as complex as Dhall to life. It was awesome to see everyone discussing, teaching and learning with a good attitude all the time.
As for the Haskell experience, most of what I had done with it were university projects and I didn't get too far into the language. At the time I was still scared of Monads and dos. They were still a bit mystical to me. So you can imagine my confusion when I cloned the dhall-haskell repository and found pragmas everywhere and some pattern synonyms.
Along the way (and with the help of my mentors) I got way more comfortable with the language. I learned way more than I expected just by reading the other parts of the code and trying to really understand what was happening there. I learned some best practices that make for some really beautiful code. And if I ever failed to understand something, my mentors and the rest of the community were always happy to help.
Furthermore, I learned about documentation and how important it is to document your code correctly. I still have a long way to go in this department but I'm surely way ahead of where I was 10 weeks ago.
I would have wanted to blog more about the project but I have always struggled a bit with writing.
I plan to continue working with the Dhall community in the future. In part because I want to finish the missing features for dhall-csv as a personal challenge, but also because I think it is a great project and I want to keep adding to it.
Acknowledgments
To my mentors Gabriella and Simon, thanks for all the feedback, the weekly meetings and the good vibes. It was a hard road but thanks to you it was also a really fun one. I wish you both all the best and I hope you know how much I appreciate your guidance through these last 10 weeks.
To my friend German, for helping me with my proposal and for being another mentor to me all the way through the project.
To my teammate Julio, for sharing the same struggles on both our projects.
To the Dhall community for being so cool.
To Haskell.org, for hosting the project and giving Dhall an additional slot for the program, so that Julio and I could both be part of it.
And to Google, for doing the GSoC program every year and giving students like me the opportunity to work with (and learn from) great and talented people all across the world.
Thanks to all for this amazing experience.
Top comments (1)
The release that contains the dhall-csv package is already published and you can find it here!