<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Bruno Rodrigues</title>
    <description>The latest articles on DEV Community by Bruno Rodrigues (@brodrigues).</description>
    <link>https://dev.to/brodrigues</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1039375%2F8b2c3406-39f8-4f7e-b6a5-0d58f266913a.jpeg</url>
      <title>DEV Community: Bruno Rodrigues</title>
      <link>https://dev.to/brodrigues</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/brodrigues"/>
    <language>en</language>
    <item>
      <title>Reproducible data science with Nix, part 3 -- frictionless {plumber} api deployments with Nix</title>
      <dc:creator>Bruno Rodrigues</dc:creator>
      <pubDate>Wed, 02 Aug 2023 07:52:31 +0000</pubDate>
      <link>https://dev.to/brodrigues/reproducible-data-science-with-nix-part-3-frictionless-plumber-api-deployments-with-nix-1c8d</link>
      <guid>https://dev.to/brodrigues/reproducible-data-science-with-nix-part-3-frictionless-plumber-api-deployments-with-nix-1c8d</guid>
      <description>&lt;p&gt;This is the third post in a series of posts about Nix. Disclaimer: I’m a super
beginner with Nix. So this series of blog posts is more akin to notes that I’m
taking while learning than a super detailed tutorial. So if you’re a Nix expert
and read something stupid in here, that’s normal. This post is going to focus on
R (obviously) but the ideas are applicable to any programming language.&lt;/p&gt;
&lt;p&gt;This blog post is part tutorial on creating an api using the &lt;code&gt;{plumber}&lt;/code&gt; R
package, part an illustration of how Nix makes developing and deploying a
breeze.&lt;/p&gt;

&lt;h2&gt;Part 1: getting it to work locally&lt;/h2&gt;
&lt;p&gt;So in &lt;a href="https://www.brodrigues.co/blog/2023-07-13-nix_for_r_part1/"&gt;part 1&lt;/a&gt; I
explained what Nix was and how you could use it to build reproducible
development environments. In &lt;a href="https://www.brodrigues.co/blog/2023-07-19-nix_for_r_part2/"&gt;part
2&lt;/a&gt; I talked about
running a &lt;code&gt;{targets}&lt;/code&gt; pipeline in a reproducible environment set up with Nix,
and in this blog post I’ll talk about how I made an api using &lt;code&gt;{plumber}&lt;/code&gt; and how
Nix made going from my development environment to the production environment (on
Digital Ocean) simpler than it has ever been for me. Originally I wanted to focus on interactive
work using Nix, but that will very likely be for part 4, maybe even part 5 (yes, I
really have a lot to write about).&lt;/p&gt;
&lt;p&gt;Let me just first explain what &lt;code&gt;{plumber}&lt;/code&gt; is before continuing. I already
talked about &lt;code&gt;{plumber}&lt;/code&gt;
&lt;a href="https://www.brodrigues.co/blog/2021-06-04-own_knit_server/"&gt;here&lt;/a&gt;, but in
summary, &lt;code&gt;{plumber}&lt;/code&gt; allows you to build an api. What is an api? Essentially a
service that you can call in different ways and which returns something to you.
For example, you could send a Word document to this api and get back the same
document converted to PDF. Or you could send some English text and get back a
translation. Or you could send some data and get a prediction from a machine
learning model. It doesn’t matter: what’s important is that apis completely
abstract the programming language that is being used to compute whatever should
be computed. With &lt;code&gt;{plumber}&lt;/code&gt;, you can create such services using R. This is
pretty awesome, because it means that whatever it is you can make with R, you
could build a service around it and make it available to anyone. Of course you
need a server that actually has R installed and that gets and processes the
requests it receives, and this is where the problems start. And by problems I
mean THE single biggest problem that you have to deal with whenever you develop
something on your computer, and then have to make it work somewhere else:
deployment. If you’ve never had to deal with deployments you might not understand why
it’s so hard. I certainly didn’t really get it until I wanted to deploy my
first Shiny app, many moons ago. And this is especially true whenever you don’t
want to use any “off the shelf” services like &lt;em&gt;shinyapps.io&lt;/em&gt;. In the &lt;a href="https://www.brodrigues.co/blog/2021-06-04-own_knit_server/"&gt;blog post
I mentioned above&lt;/a&gt;,
I used Docker to deploy the api. But Docker, while an amazing tool, is also
quite heavy to deal with. Nix offers an alternative to Docker which I think you
should know and think about. Let me try to convince you.&lt;/p&gt;
&lt;p&gt;So let’s make a little &lt;code&gt;{plumber}&lt;/code&gt; api and deploy that in the cloud. For this, I’m
using Digital Ocean, but any other service that allows you to spin up a virtual
machine (VM) with Ubuntu on it will do. If you don’t have a Digital Ocean
account, you can use my &lt;a href="https://m.do.co/c/b68adc727710"&gt;referral link&lt;/a&gt; to get
200$ in credit for 60 days, more than enough to experiment. A VM serving a
&lt;code&gt;{plumber}&lt;/code&gt; api needs at least 1 gig of RAM, and the cheapest one with 1 gig of
RAM is 6$ a month (if you spend 25$ of that credit, I’ll get 25$ too, so don’t
hesitate to experiment, you’ll be doing me a solid as well).&lt;/p&gt;
&lt;p&gt;I won’t explain what my api does, this doesn’t really matter for this blog post.
But I’ll have to explain it in a future blog post, because it’s related to a
package I’m working on, called &lt;a href="https://github.com/b-rodrigues/rix"&gt;{rix}&lt;/a&gt; which
I’m writing to ease the process of building reproducible environments for R
using Nix. So for this blog post, let’s make something very simple: let’s take
the classic machine learning task of predicting survival of the passengers of
the Titanic (which was not that long ago in the news again…) and make a
service out of it.&lt;/p&gt;
&lt;p&gt;What’s going to happen is this: users will make a request to the api giving some
basic info about themselves, a simple ML model (I’ll go with logistic regression
and call it “machine learning” just to make the statisticians reading this
seethe lmao) is going to use this info to compute a prediction, and the result
will be returned to the user. Now to answer a question
that comes up often when I explain this stuff: &lt;em&gt;why not use Shiny? Users can
enter their data and get a prediction and there’s a nice UI and everything?!&lt;/em&gt;.
Well yes, but it depends on what it is you actually want to do. An api is useful
mostly in situations where you need that request to be made by another machine
and then that machine will do something else with that prediction it got back.
It could be as simple as showing it in a nice interface, or maybe the machine
that made the request will then use that prediction and insert it somewhere for
archiving for example. So think of it this way: use an api when machines need to
interact with other machines, a Shiny app for when humans need to interact with
a machine.&lt;/p&gt;
&lt;p&gt;Ok so first, because I’m using Nix, I’ll create an environment that will contain
everything I need to build this api. I’m doing that in the simplest way
possible, by specifying an R version and the packages I need inside a
file called &lt;code&gt;default.nix&lt;/code&gt;. Writing this file can be daunting if you’re
not familiar with Nix, so I’ve developed a package called &lt;code&gt;{rix}&lt;/code&gt; to write these files
for you. Calling this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;rix::rix(r_ver = "4.2.2",
         r_pkgs = c("plumber", "tidymodels"),
         other_pkgs = NULL,
         git_pkgs = NULL,
         ide = "other",
         path = "titanic_api/", # you might need to create this folder
         overwrite = TRUE)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;generates this file for me:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# This file was generated by the {rix} R package on Sat Jul 29 15:50:41 2023
# It uses nixpkgs' revision 8ad5e8132c5dcf977e308e7bf5517cc6cc0bf7d8 for reproducibility purposes
# which will install R version 4.2.2
# Report any issues to https://github.com/b-rodrigues/rix
{ pkgs ? import (fetchTarball "https://github.com/NixOS/nixpkgs/archive/8ad5e8132c5dcf977e308e7bf5517cc6cc0bf7d8.tar.gz") {} }:

  with pkgs;

  let
  my-r = rWrapper.override {
    packages = with rPackages; [
      plumber tidymodels
    ];
  };
  in
  mkShell {
    buildInputs = [
      my-r
      ];
  }&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(for posterity’s sake: this is using &lt;a href="https://github.com/b-rodrigues/rix/tree/935fb194b38adfb085a5bda9ebe5dc5bb504f2cb"&gt;this version of
{rix}&lt;/a&gt;.
Also, if you want to learn more about &lt;code&gt;{rix}&lt;/code&gt; take a look at its
&lt;a href="https://b-rodrigues.github.io/rix/"&gt;website&lt;/a&gt;. It’s still in very early
development, so comments and PRs are more than welcome!)&lt;/p&gt;
&lt;p&gt;To build my api I’ll have to have &lt;code&gt;{plumber}&lt;/code&gt; installed. I also install the
&lt;code&gt;{tidymodels}&lt;/code&gt; package. I actually don’t need &lt;code&gt;{tidymodels}&lt;/code&gt; for what I’m doing
(base R can fit logistic regressions just fine), but the reason I’m installing
it is to mimic a “real-world example” as closely as possible (a project with some
dependencies).&lt;/p&gt;
&lt;p&gt;When I called &lt;code&gt;rix::rix()&lt;/code&gt; to generate the &lt;code&gt;default.nix&lt;/code&gt; file, I specified that
I wanted R version 4.2.2 (because let’s say that this is the version I need.
It’s also possible to get the current version of R by passing “current” to
&lt;code&gt;r_ver&lt;/code&gt;). You don’t see any reference to this version of R in the &lt;code&gt;default.nix&lt;/code&gt;
file, but this is the version that will get installed because it’s the version
that comes with that particular revision of the &lt;code&gt;nixpkgs&lt;/code&gt; repository:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;"https://github.com/NixOS/nixpkgs/archive/8ad5e8132c5dcf977e308e7bf5517cc6cc0bf7d8.tar.gz"&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This url downloads that particular revision of &lt;code&gt;nixpkgs&lt;/code&gt; containing R version
4.2.2. &lt;code&gt;{rix}&lt;/code&gt; finds the right revision for you (using &lt;a href="https://lazamar.co.uk/nix-versions/?channel=nixpkgs-unstable&amp;amp;package=r"&gt;this handy
service&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;While &lt;code&gt;{rix}&lt;/code&gt; doesn’t require your system to have Nix installed, if you want to
continue you’ll have to install Nix. To install Nix, I recommend you don’t use
the official installer, even if it’s quite simple to use. Instead, the
&lt;a href="https://zero-to-nix.com/start/install"&gt;Determinate Systems&lt;/a&gt; installer seems
better to me. On Windows, you will need to enable WSL2. An alternative is to run
all of this inside a Docker container (more on this later; if you’re thinking
something along the lines of &lt;em&gt;isn’t the purpose of Nix to not have to use
Docker?&lt;/em&gt;, see you in the conclusion). Once you have Nix up and running, go
inside the &lt;code&gt;titanic_api/&lt;/code&gt; folder (which contains the &lt;code&gt;default.nix&lt;/code&gt; file above)
and run the following command inside a terminal:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;nix-build&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This will build the environment according to the instructions in the
&lt;code&gt;default.nix&lt;/code&gt; file. Depending on what you want/need, this can take some time.
Once the environment is done building, you can “enter” into it by typing:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;nix-shell&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now this is where you would use this environment to work on your api. As I
stated above, I’ll discuss interactive work using a Nix environment in a future
blog post. Leave the terminal with this Nix shell open and create an empty text
file next to &lt;code&gt;default.nix&lt;/code&gt;, call it &lt;code&gt;titanic_api.R&lt;/code&gt; and put this in there
using any text editor of your choice:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;#* Would you have survived the Titanic sinking?
#* @param sex Character. "male" or "female"
#* @param age Integer. Your age.
#* @get /prediction
function(sex, age) {

  trained_logreg &amp;lt;- readRDS("trained_logreg.rds")

  dataset &amp;lt;- data.frame(sex = sex, age = as.numeric(age))

  parsnip::predict.model_fit(trained_logreg,
                             new_data = dataset)

}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This script is a &lt;code&gt;{plumber}&lt;/code&gt; api. It’s a simple function that uses an already
&lt;em&gt;trained&lt;/em&gt; logistic regression (lol) by loading it into its scope using the
&lt;code&gt;readRDS()&lt;/code&gt; function. It then returns a prediction. The script that I wrote to
train the model is this one:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;library(parsnip)

titanic_raw &amp;lt;- read.csv("https://web.stanford.edu/class/archive/cs/cs109/cs109.1166/stuff/titanic.csv")

titanic &amp;lt;- titanic_raw |&amp;gt;
  subset(select = c(Survived,
                    Sex,
                    Age))

names(titanic) &amp;lt;- c("survived", "sex", "age")

titanic$survived = as.factor(titanic$survived)

logreg_spec &amp;lt;- logistic_reg() |&amp;gt;
  set_engine("glm")

trained_logreg &amp;lt;- logreg_spec |&amp;gt;
  fit(survived ~ ., data = titanic)

saveRDS(trained_logreg, "trained_logreg.rds")&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you’re familiar with this Titanic prediction task, you will have noticed that
the script above is completely stupid. I only kept two variables to fit the
logistic regression. But the reason I did this is because this blog post is not
about fitting models, but about apis. So bear with me. Anyways, once you’ve run
the script above to generate the file &lt;code&gt;trained_logreg.rds&lt;/code&gt; containing the
trained model, you can locally test the api using &lt;code&gt;{plumber}&lt;/code&gt;. Go back to the
terminal that is running your Nix shell, and now type &lt;code&gt;R&lt;/code&gt; to start R in that
session. You can then run your api inside that session using:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;plumber::pr("titanic_api.R") |&amp;gt;
  plumber::pr_run(port = 8000)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Open your web browser and visit
&lt;a href="http://localhost:8000/__docs__/"&gt;http://localhost:8000/__docs__/&lt;/a&gt;
to see the Swagger interface to your api (Swagger is a nice little tool
that makes testing your apis way easier).&lt;/p&gt;

&lt;p&gt;&lt;a href="/img/swagger_plumber.png" class="article-body-image-wrapper"&gt;&lt;img src="/img/swagger_plumber.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Using Swagger you can try out your api: click on (1), then on (2). You can enter
some mock data in (3) and (4) and then run the computation by clicking on
“Execute” (5). You’ll see the result in (7). (6) gives you a &lt;code&gt;curl&lt;/code&gt; command to
run exactly this example from a terminal. Congrats, your &lt;code&gt;{plumber}&lt;/code&gt; api is
running on your computer! Now we need to deploy it online and make it available to
the world.&lt;/p&gt;


&lt;h2&gt;Deploying your api&lt;/h2&gt;
&lt;p&gt;So if you have a Digital Ocean account log in (and if you don’t, use my
&lt;a href="https://m.do.co/c/b68adc727710"&gt;referral link&lt;/a&gt; to get 200$ to test things out)
and click on the top-right corner on the “Create” button, and then select “Droplet”
(a fancy name for a VM):&lt;/p&gt;

&lt;p&gt;&lt;a href="/img/digital_ocean_1.png" class="article-body-image-wrapper"&gt;&lt;img src="/img/digital_ocean_1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the next screen, select the region closest to you and then select Ubuntu as
the operating system, “Regular” for the CPU options, and then the 4$ (or the 6$, it doesn’t matter at this stage) a month Droplet. We will need to upgrade it immediately after having created it in order to actually build the environment. This is because building the environment requires some more RAM than what the 6$ option offers, but starting from the cheapest option ensures that we
will then be able to downsize back to it, after the build process is done.&lt;/p&gt;

&lt;p&gt;&lt;a href="/img/digital_ocean_2.png" class="article-body-image-wrapper"&gt;&lt;img src="/img/digital_ocean_2.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next comes how you want to authenticate to your VM. There are two options, one
using an SSH key, another using a password. If you’re already using Git, you can
use the same SSH key. Click on “New SSH Key” and paste the public key in the box
(you should find the key under &lt;code&gt;~/.ssh/id_rsa.pub&lt;/code&gt; if you’re using Linux). If
you’re not using Git and have no idea what SSH keys are, my first piece of
advice is to start using Git and then to generate an SSH key and login using it.
This is much more secure than a password. Finally, click on “Create Droplet”.
This will start building your VM. Once the Droplet is done building, you can
check out its IP address:&lt;/p&gt;

&lt;p&gt;&lt;a href="/img/digital_ocean_3.png" class="article-body-image-wrapper"&gt;&lt;img src="/img/digital_ocean_3.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let’s immediately resize the Droplet to a larger size. As I said before,
this is only required to build our production environment using Nix. Once
the build is done, we can downsize again to the cheapest Droplet:&lt;/p&gt;

&lt;p&gt;&lt;a href="/img/digital_ocean_4.png" class="article-body-image-wrapper"&gt;&lt;img src="/img/digital_ocean_4.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Choose a Droplet with 2 gigs of RAM to be on the safe side, and also enable the
reserved IP option (this is a static IP that will never change):&lt;/p&gt;

&lt;p&gt;&lt;a href="/img/digital_ocean_5.png" class="article-body-image-wrapper"&gt;&lt;img src="/img/digital_ocean_5.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Finally, turn your Droplet back on: it’s time to log in to it using SSH.&lt;/p&gt;
&lt;p&gt;Open a terminal on your computer and connect to your Droplet using SSH (starting
now, &lt;code&gt;user@local_computer&lt;/code&gt; refers to a terminal opened on your computer and
&lt;code&gt;root@droplet&lt;/code&gt; to an active ssh session inside your Droplet):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;user@local_computer &amp;gt; ssh root@IP_ADDRESS_OF_YOUR_DROPLET&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;and add a folder that will contain the project’s files:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;root@droplet &amp;gt; mkdir titanic_api&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Great, let’s now copy our files to the Droplet using &lt;code&gt;scp&lt;/code&gt;. Open a terminal on
your computer, and navigate to where the &lt;code&gt;default.nix&lt;/code&gt; file is. If you prefer
doing this graphically, you can use Filezilla. Run the following command to
copy the &lt;code&gt;default.nix&lt;/code&gt; file to the Droplet:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;user@local_computer &amp;gt; scp default.nix root@IP_ADDRESS_OF_YOUR_DROPLET:/root/titanic_api/&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now go back to the terminal that is logged into your Droplet. We now need to
install Nix. For this, follow the instructions from the &lt;a href="https://zero-to-nix.com/start/install"&gt;Determinate
Systems&lt;/a&gt; installer, and run this line in
the Droplet:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;root@droplet &amp;gt; curl --proto '=https' --tlsv1.2 -sSf -L https://install.determinate.systems/nix | sh -s -- install&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Pay attention to the final message once the installation is done:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Nix was installed successfully!
To get started using Nix, open a new shell or run `. /nix/var/nix/profiles/default/etc/profile.d/nix-daemon.sh`&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So run &lt;code&gt;. /nix/var/nix/profiles/default/etc/profile.d/nix-daemon.sh&lt;/code&gt; to make
the Nix commands available in your current shell. Ok so now comes the magic of Nix. You can now build the exact
same environment that you used to build the api on your computer in this
Droplet. Simply go to the &lt;code&gt;titanic_api/&lt;/code&gt; folder and run &lt;code&gt;nix-build&lt;/code&gt; for the build process to start. I don’t really
know how to describe how easy and awesome this is. You may be thinking &lt;em&gt;well
installing R and a couple of packages is not that hard&lt;/em&gt;, but let me remind you
that we are using a Droplet that is running Ubuntu, which is likely NOT the
operating system that you are running. Maybe you are on Windows, maybe you are
on macOS, or maybe you’re running another Linux distribution. Whatever it is
you’re using, it will be different from that Droplet. Even if you’re running
Ubuntu on your computer, chances are that you’ve changed the CRAN repositories
from the default Ubuntu ones to the Posit ones, or maybe you’re using
&lt;a href="https://github.com/eddelbuettel/r2u"&gt;r2u&lt;/a&gt;. The chances that you will
have the exact same environment in that Droplet as the one running on your
computer are basically 0. And if you’re already familiar with Docker, I think
that you will admit that this is much, much easier than dockerizing your
&lt;code&gt;{plumber}&lt;/code&gt; api. If you don’t agree, please shoot me an
&lt;a href="mailto:bruno@brodrigues.co"&gt;email&lt;/a&gt; and tell me why, I’m honestly curious. Also,
let me stress again that if you needed to install a package like &lt;code&gt;{xlsx}&lt;/code&gt; that
requires Java to be installed, Nix would install the right version of Java for
you.&lt;/p&gt;
&lt;p&gt;Once the environment is done building, you can downsize your Droplet. Go back to
your Digital Ocean account, select that Droplet and choose “Resize Droplet”, and
go back to the 6$ a month plan.&lt;/p&gt;
&lt;p&gt;Once the Droplet is back up, copy the trained model &lt;code&gt;trained_logreg.rds&lt;/code&gt; and
the api file, &lt;code&gt;titanic_api.R&lt;/code&gt;, from your computer to the &lt;code&gt;/root/titanic_api/&lt;/code&gt; folder of the Droplet using &lt;code&gt;scp&lt;/code&gt; or Filezilla. It’s time
to run the api. To do so, the obvious way would be simply to start an R session
and to execute the code to run the api. However, if something happens and the R
session dies, the api won’t restart. Instead, I’m using a CRON job and a
utility called &lt;code&gt;run-one&lt;/code&gt;. This utility, pre-installed in Ubuntu, runs one (1)
script at a time, and ensures that only one instance of said script is running.
So by putting this in a CRON job (CRON is a scheduler, so it executes a script
as often as you specify), &lt;code&gt;run-one&lt;/code&gt; will try to run the script. If it’s still
running, nothing happens, if the script is not running, it runs it.&lt;/p&gt;
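&lt;p&gt;To make the “only one instance” idea concrete, here is a minimal sketch of a single-instance guard in plain shell. This is not how &lt;code&gt;run-one&lt;/code&gt; is implemented: the &lt;code&gt;run_single&lt;/code&gt; function and the lock directory are hypothetical names made up for illustration, and the sketch only relies on the fact that &lt;code&gt;mkdir&lt;/code&gt; fails when the directory already exists.&lt;/p&gt;

```shell
#!/bin/bash
# Hypothetical helper, not part of run-one: run a command only if
# nobody else currently holds the lock directory.
run_single() {
  lockdir="$1"
  shift
  # mkdir either creates the directory or fails because it already exists
  # (in the second case it also prints its own error message)
  if mkdir "$lockdir"; then
    status=0
    "$@" || status=$?
    rmdir "$lockdir"
    return "$status"
  else
    echo "already running, doing nothing"
    return 1
  fi
}

workdir=$(mktemp -d)

# First call: the lock is free, so the command runs
run_single "$workdir/lock" echo "api started"

# Simulate an instance that is still running by holding the lock
mkdir "$workdir/lock"
run_single "$workdir/lock" echo "api started" || echo "second call skipped"

# Clean up
rmdir "$workdir/lock"
rmdir "$workdir"
```

&lt;p&gt;The first call runs the command, the second one refuses because the lock is taken, which is the behaviour the CRON job relies on.&lt;/p&gt;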
&lt;p&gt;So go back to your local computer, and create a new text file, call it
&lt;code&gt;run_api.sh&lt;/code&gt; and write the following text in it:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;#!/bin/bash
# Start the api inside the Nix environment; if it ever exits,
# wait 10 seconds and start it again
while true
do
  nix-shell /root/titanic_api/default.nix --run "Rscript -e 'plumber::pr_run(plumber::pr(\"/root/titanic_api/titanic_api.R\"), host = \"0.0.0.0\", port=80)'"
  sleep 10
done&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;then copy this to your VM using &lt;code&gt;scp&lt;/code&gt; or Filezilla, to
&lt;code&gt;/root/titanic_api/run_api.sh&lt;/code&gt;. Then SSH back into your Droplet, go to where
the script is using &lt;code&gt;cd&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;root@droplet &amp;gt; cd /root/titanic_api/&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;and make the script executable:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;root@droplet &amp;gt; chmod +x run_api.sh&lt;/code&gt;&lt;/pre&gt;
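&lt;p&gt;The &lt;code&gt;while true&lt;/code&gt; loop in &lt;code&gt;run_api.sh&lt;/code&gt; simply starts the api again 10 seconds after it exits. The same pattern can be tried out without R or Nix using the bounded sketch below; &lt;code&gt;restart_loop&lt;/code&gt; and its arguments are hypothetical names, and &lt;code&gt;false&lt;/code&gt; stands in for an api that crashes immediately.&lt;/p&gt;

```shell
#!/bin/bash
# Hypothetical helper: start a command again each time it exits,
# but give up after max_runs runs (run_api.sh loops forever instead).
restart_loop() {
  max_runs="$1"
  delay="$2"
  shift 2
  runs=0
  while [ "$runs" -lt "$max_runs" ]; do
    status=0
    "$@" || status=$?
    runs=$((runs + 1))
    echo "run $runs of $max_runs exited with status $status"
    sleep "$delay"
  done
}

# false always fails, like an api that dies right after starting
restart_loop 3 0 false
```

&lt;p&gt;Each line of output corresponds to one restart; in the real script the delay is 10 seconds and there is no upper bound on the number of runs.&lt;/p&gt;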
&lt;p&gt;We’re almost done. Now, let’s edit the &lt;code&gt;crontab&lt;/code&gt;, to specify that we want
this script to be executed every hour using &lt;code&gt;run-one&lt;/code&gt; (so if it’s running,
nothing happens, if it died, it gets restarted). To edit the &lt;code&gt;crontab&lt;/code&gt;,
type &lt;code&gt;crontab -e&lt;/code&gt; and select the editor you’re most comfortable with. If
you have no idea, select the first option, &lt;code&gt;nano&lt;/code&gt;. Using your keyboard
keys, navigate all the way down and type:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;*/60 * * * * run-one /root/titanic_api/run_api.sh&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;save the file by typing &lt;code&gt;CTRL-X&lt;/code&gt;, and then type &lt;code&gt;Y&lt;/code&gt; when asked &lt;code&gt;Save modified buffer?&lt;/code&gt;, and then type the &lt;code&gt;ENTER&lt;/code&gt; key when prompted for &lt;code&gt;File name to write&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;We are now ready to start the api. Make sure CRON restarts by running:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;root@droplet &amp;gt; service cron reload&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;and then run the script using &lt;code&gt;run-one&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;root@droplet &amp;gt; run-one /root/titanic_api/run_api.sh &amp;amp;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;run-one&lt;/code&gt; will now run the script and will ensure that only one instance of the
script is running (the &lt;code&gt;&amp;amp;&lt;/code&gt; character at the end means “run this in the
background”). If the R process dies, the loop inside the script starts it again after 10 seconds, and if the script itself dies, CRON will start a new instance
of the script within the hour. We can now call our api using this &lt;code&gt;curl&lt;/code&gt; command:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;user@local_computer &amp;gt; curl -X GET "http://IP_ADDRESS_OF_YOUR_DROPLET/prediction?sex=female&amp;amp;age=45" -H "accept: */*"&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you don’t have &lt;code&gt;curl&lt;/code&gt; installed, you can use &lt;a href="https://reqbin.com/curl"&gt;this
webservice&lt;/a&gt;. You should see this answer:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[{
    ".pred_class": "1"
}]&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I’ll leave my Droplet running for a few days after I post this, so if you
want to try it out, run this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl -X GET "http://142.93.164.182/prediction?sex=female&amp;amp;age=45" -H "accept: */*"&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The answer is in the JSON format, and can be ingested by some other script
which can then process it further.&lt;/p&gt;
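&lt;p&gt;As a minimal sketch of such a script, the shell snippet below pulls the predicted class out of a response like the one shown above. The &lt;code&gt;extract_pred_class&lt;/code&gt; function is a hypothetical helper, the response is hard-coded instead of fetched with &lt;code&gt;curl&lt;/code&gt;, and reading class 1 as “survived” assumes the usual coding of the &lt;code&gt;Survived&lt;/code&gt; column. For anything serious a real JSON parser is the better choice; &lt;code&gt;sed&lt;/code&gt; is only used here to avoid extra dependencies.&lt;/p&gt;

```shell
#!/bin/bash
# Hypothetical helper: print the value of ".pred_class" found in a JSON string.
# This is plain pattern matching with sed, not a real JSON parser.
extract_pred_class() {
  printf '%s\n' "$1" | tr -d '\n' | sed -n 's/.*"\.pred_class": *"\([^"]*\)".*/\1/p'
}

# Hard-coded copy of the answer returned by the api
response='[{ ".pred_class": "1" }]'

label=$(extract_pred_class "$response")

# Assumption: class 1 means the passenger survived
if [ "$label" = "1" ]; then
  echo "predicted: survived"
else
  echo "predicted: did not survive"
fi
```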


&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;This was a long blog post. While it is part of my Nix series of blog posts, I
almost didn’t talk about Nix itself, and this is actually the neat part. Nix made
something that is usually difficult trivially simple. Without Nix, the
alternative would be to bundle the api with all its dependencies and an R
interpreter using Docker or install everything by hand on the server. But the
issue with Docker is that it’s not necessarily much easier than Nix, and you
still have to make sure building the image is reproducible. So you have to make
sure to use an image that ships with the right version of R and use &lt;code&gt;{renv}&lt;/code&gt; to
restore your packages. If you have system-level dependencies that are required,
you also have to deal with those. Nix takes care of all of this for you, so that
you can focus on all the other aspects of deployment, which take the bulk of the
effort and time.&lt;/p&gt;
&lt;p&gt;In the post I mentioned that you could also run Nix inside a Docker container.
If you are already invested in Docker, Nix is still useful because you can use
base NixOS images (NixOS is a Linux distribution that uses Nix as its package
manager) or you could install Nix inside an Ubuntu image and then benefit from
the reproducibility offered by Nix. Simply add &lt;code&gt;RUN nix-build&lt;/code&gt; to your
Dockerfile, and everything you need gets installed. You can even use Nix to
build Docker images instead of writing a Dockerfile. The possibilities are
endless!&lt;/p&gt;
&lt;p&gt;Now, before you start building apis using R, you may want to read this blog post
&lt;a href="https://matthewrkaye.com/posts/2023-06-29-lessons-learned-from-running-r-in-production/lessons-learned-from-running-r-in-production.html"&gt;here&lt;/a&gt;
as well. I found it quite interesting: it discusses the shortcomings of using
R to build apis like I showed you here, which I think you need to know. If you
have needs like those of the author of that blog post, then maybe R and &lt;code&gt;{plumber}&lt;/code&gt; are not
the right solution for you.&lt;/p&gt;
&lt;p&gt;Next time, in part 4, I’ll either finally discuss how to do interactive work
using a Nix environment, or I’ll discuss my package, &lt;code&gt;{rix}&lt;/code&gt; in more detail.
We’ll see!&lt;/p&gt;

</description>
      <category>nix</category>
      <category>r</category>
      <category>datascience</category>
      <category>api</category>
    </item>
    <item>
      <title>Reproducible data science with Nix, part 2 -- running {targets} pipelines with Nix</title>
      <dc:creator>Bruno Rodrigues</dc:creator>
      <pubDate>Thu, 20 Jul 2023 09:10:00 +0000</pubDate>
      <link>https://dev.to/brodrigues/reproducible-data-science-with-nix-part-2-running-targets-pipelines-with-nix-g17</link>
      <guid>https://dev.to/brodrigues/reproducible-data-science-with-nix-part-2-running-targets-pipelines-with-nix-g17</guid>
      <description>&lt;p&gt;This is the second post in a series of posts about Nix. Disclaimer: I’m a super
beginner with Nix. So this series of blog posts is more akin to notes that I’m
taking while learning than a super detailed tutorial. So if you’re a Nix expert
and read something stupid in here, that’s normal. This post is going to focus on
R (obviously) but the ideas are applicable to any programming language.&lt;/p&gt;
&lt;p&gt;So in &lt;a href="https://www.brodrigues.co/blog/2023-07-13-nix_for_r_part1/"&gt;part 1&lt;/a&gt; I
explained what Nix was and how you could use it to build reproducible
development environments. Now, let’s go into more details and actually set up
some environments and run a &lt;code&gt;{targets}&lt;/code&gt; pipeline using it.&lt;/p&gt;
&lt;p&gt;Obviously the first thing you should do is install Nix. A lot of what I’m
showing here comes from &lt;a href="https://nix.dev/tutorials/"&gt;nix.dev&lt;/a&gt;, so if you want
to install Nix, then look at the instructions
&lt;a href="https://nix.dev/tutorials/install-nix"&gt;here&lt;/a&gt;. If you’re using Windows, you’ll
have to have WSL2 installed. If you don’t want to install Nix just yet, you can
also play around with a NixOS Docker image. NixOS is a Linux distribution that
uses the concepts of Nix for managing the whole operating system, and obviously
comes with the Nix package manager installed. But if you’re using Nix inside
Docker you won’t be able to work interactively with graphical applications like
RStudio, due to how Docker works (but more on working interactively with IDEs in
part 3 of this series, which I’m already drafting).&lt;/p&gt;
&lt;p&gt;Assuming you have Nix installed, you should be able to run the following command
in a terminal:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;nix-shell -p sl&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This will launch a Nix shell with the &lt;code&gt;sl&lt;/code&gt; package installed. Because &lt;code&gt;sl&lt;/code&gt; is
not yet available on your system, it’ll get installed on the fly, and you will get “dropped” into a
Nix shell:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[nix-shell:~]$&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can now run &lt;code&gt;sl&lt;/code&gt; and marvel at what it does (I won’t spoil it for you). You can quit
the Nix shell by typing &lt;code&gt;exit&lt;/code&gt; and you’ll go back to your usual terminal. If you
now try to run &lt;code&gt;sl&lt;/code&gt; it won’t work (unless you installed it on your daily machine).
So if you need to go back to that Nix shell and rerun &lt;code&gt;sl&lt;/code&gt;, simply rerun:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;nix-shell -p sl&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This time you’ll be dropped into the Nix shell immediately and can run &lt;code&gt;sl&lt;/code&gt;.
So if you need to use R, simply run the following:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;nix-shell -p R&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;and you’ll be dropped in a Nix shell with R. This version of R will be different
from the one potentially already installed on your system, and it won’t have
access to any R packages that you might have installed. This is because Nix
environments are isolated from the rest of your system (well, not quite, but
again, more on this in part 3). So you’d need to add packages as well (exit the
Nix shell and run this command to add packages):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;nix-shell -p R rPackages.dplyr rPackages.janitor&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can now start R in that Nix shell and load the &lt;code&gt;{dplyr}&lt;/code&gt; and &lt;code&gt;{janitor}&lt;/code&gt;
packages. You might be wondering how I knew that I needed to type
&lt;code&gt;rPackages.dplyr&lt;/code&gt; to install &lt;code&gt;{dplyr}&lt;/code&gt;. You can look for this information
&lt;a href="https://search.nixos.org/packages"&gt;online&lt;/a&gt;. By the way, if a package uses the
&lt;code&gt;.&lt;/code&gt; character in its name, you should replace that &lt;code&gt;.&lt;/code&gt; character with &lt;code&gt;_&lt;/code&gt;, so to
install &lt;code&gt;{data.table}&lt;/code&gt; write &lt;code&gt;rPackages.data_table&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;So that’s nice and dandy, but not quite what we want. Instead, what we want is
to be able to declare what we need in terms of packages, dependencies, etc,
inside a file, and have Nix build an environment according to these
specifications which we can then use for our daily needs. To do so, we need to
write a so-called &lt;code&gt;default.nix&lt;/code&gt; file. This is what such a file looks like:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;{ pkgs ? import (fetchTarball "https://github.com/NixOS/nixpkgs/archive/e11142026e2cef35ea52c9205703823df225c947.tar.gz") {} }:

with pkgs;

let
  my-pkgs = rWrapper.override {
    packages = with rPackages; [dplyr ggplot2 R];
  };
in
mkShell {
  buildInputs = [my-pkgs];
}&lt;/code&gt;&lt;/pre&gt;
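&lt;p&gt;The naming rule for packages with a dot in their name applies inside such a
file too. As a minimal sketch (the package selection is only an illustration,
not taken from a real project), this variant of the file above adds
&lt;code&gt;{data.table}&lt;/code&gt; through its &lt;code&gt;data_table&lt;/code&gt; attribute:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;{ pkgs ? import (fetchTarball "https://github.com/NixOS/nixpkgs/archive/e11142026e2cef35ea52c9205703823df225c947.tar.gz") {} }:

with pkgs;

let
  my-pkgs = rWrapper.override {
    # {data.table} is exposed as data_table: the dot is replaced by an underscore
    packages = with rPackages; [dplyr janitor data_table];
  };
in
mkShell {
  buildInputs = [my-pkgs];
}&lt;/code&gt;&lt;/pre&gt;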
&lt;p&gt;I won’t discuss the intricate details of writing such a file just yet, because
it’ll take too much time and I’ll be repeating what you can find on the
&lt;a href="https://nix.dev/"&gt;Nix.dev&lt;/a&gt; website. I’ll give some pointers though. But for
now, let’s assume that we already have such a &lt;code&gt;default.nix&lt;/code&gt; file that we defined
for our project, and see how we can use it to run a &lt;code&gt;{targets}&lt;/code&gt; pipeline. I’ll
explain how I write such files afterwards.&lt;/p&gt;

&lt;h2&gt;Running a {targets} pipeline using Nix&lt;/h2&gt;
&lt;p&gt;Let’s say I have this, more complex, &lt;code&gt;default.nix&lt;/code&gt; file:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;{ pkgs ? import (fetchTarball "https://github.com/NixOS/nixpkgs/archive/8ad5e8132c5dcf977e308e7bf5517cc6cc0bf7d8.tar.gz") {} }:

with pkgs;

let
  my-pkgs = rWrapper.override {
    packages = with rPackages; [
      targets
      tarchetypes
      rmarkdown
      (buildRPackage {
        name = "housing";
        src = fetchgit {
          url = "https://github.com/rap4all/housing/";
          branchName = "fusen";
          rev = "1c860959310b80e67c41f7bbdc3e84cef00df18e";
          sha256 = "sha256-s4KGtfKQ7hL0sfDhGb4BpBpspfefBN6hf+XlslqyEn4=";
        };
        propagatedBuildInputs = [
          dplyr
          ggplot2
          janitor
          purrr
          readxl
          rlang
          rvest
          stringr
          tidyr
        ];
      })
    ];
  };
in
mkShell {
  buildInputs = [my-pkgs];
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So the file above defines an environment that contains all the required packages
to run a pipeline that you can find on &lt;a href="https://github.com/b-rodrigues/nix_targets_pipeline"&gt;this Github
repository&lt;/a&gt;. What’s
interesting is that I need to install a package that’s only been released on
Github, the &lt;code&gt;{housing}&lt;/code&gt; package that I wrote for the &lt;a href="https://raps-with-r.dev/packages.html"&gt;purposes of my
book&lt;/a&gt;, and I can do so in that file as
well, using the &lt;code&gt;fetchgit()&lt;/code&gt; function. Nix has many such functions, called
&lt;em&gt;fetchers&lt;/em&gt; that simplify the process of downloading files from the internet, see
&lt;a href="https://ryantm.github.io/nixpkgs/builders/fetchers/"&gt;here&lt;/a&gt;. This function takes
some self-explanatory inputs as arguments, and two other arguments that might
not be that self-explanatory: &lt;code&gt;rev&lt;/code&gt; and &lt;code&gt;sha256&lt;/code&gt;. &lt;code&gt;rev&lt;/code&gt; is actually the commit
on the Github repository. This commit is the one that I want to use for this
particular project. So if I keep working on this package, then building an
environment with this &lt;code&gt;default.nix&lt;/code&gt; will always pull the source code as it was
at that particular commit. &lt;code&gt;sha256&lt;/code&gt; is the hash of the downloaded repository. It
makes sure that the files weren’t tampered with. How did I obtain that? Well,
the simplest way is to set it to the empty string &lt;code&gt;""&lt;/code&gt; and then try to build the
environment. This error message will pop up:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;error: hash mismatch in fixed-output derivation '/nix/store/449zx4p6x0yijym14q3jslg55kihzw66-housing-1c86095.drv':
         specified: sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=
            got:    sha256-s4KGtfKQ7hL0sfDhGb4BpBpspfefBN6hf+XlslqyEn4=&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So simply copy the hash from the last line, and rebuild! Then if in the future
something happens to the files, you’ll know. Another interesting input is
&lt;code&gt;propagatedBuildInputs&lt;/code&gt;. These are simply the dependencies of the &lt;code&gt;{housing}&lt;/code&gt;
package. To find them, see the &lt;code&gt;Imports:&lt;/code&gt; section of the
&lt;a href="https://github.com/rap4all/housing/blob/fusen/DESCRIPTION"&gt;DESCRIPTION&lt;/a&gt; file.
There’s also the &lt;code&gt;fetchFromGitHub&lt;/code&gt; fetcher that I could have used, but unlike
&lt;code&gt;fetchgit&lt;/code&gt;, it is not possible to specify the branch name we want to use. Since
here I wanted to get the code from the branch called &lt;code&gt;fusen&lt;/code&gt;, I had to use
&lt;code&gt;fetchgit&lt;/code&gt;. The last thing I want to explain is the very first line:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;{ pkgs ? import (fetchTarball "https://github.com/NixOS/nixpkgs/archive/8ad5e8132c5dcf977e308e7bf5517cc6cc0bf7d8.tar.gz") {} }:&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In particular the url. This url points to a specific release of &lt;code&gt;nixpkgs&lt;/code&gt;, that
ships the required version of R for this project, R version 4.2.2. How did I
find this release of &lt;code&gt;nixpkgs&lt;/code&gt;? There’s a handy service for that
&lt;a href="https://lazamar.co.uk/nix-versions/?channel=nixpkgs-unstable&amp;amp;package=r"&gt;here&lt;/a&gt;.
So using this service, I get the right commit hash for the release that installs
R version 4.2.2.&lt;/p&gt;
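&lt;p&gt;If you want to check that a pinned revision really ships the R version you
looked up, a throwaway file that only provides R is enough. This is just a
sketch, assuming the same &lt;code&gt;nixpkgs&lt;/code&gt; revision as above and a file saved as
&lt;code&gt;default.nix&lt;/code&gt; in an otherwise empty folder:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;{ pkgs ? import (fetchTarball "https://github.com/NixOS/nixpkgs/archive/8ad5e8132c5dcf977e308e7bf5517cc6cc0bf7d8.tar.gz") {} }:

# Throwaway environment: only R, no extra packages
pkgs.mkShell {
  buildInputs = [ pkgs.R ];
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Dropping into it with &lt;code&gt;nix-shell&lt;/code&gt; and running &lt;code&gt;R --version&lt;/code&gt; should then
report the version you searched for. That’s only a sanity check though; the
environment we actually care about is the one defined by the bigger
&lt;code&gt;default.nix&lt;/code&gt; file from above.&lt;/p&gt;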
&lt;p&gt;Ok, but before building the environment defined by this file, let me just say
that I know what you’re thinking. Probably something along the lines of: &lt;em&gt;damn
it Bruno, this looks complicated and why should I care? Let me just use
{renv}!!&lt;/em&gt; and I’m not going to lie, writing the above file from scratch didn’t
take me long in typing, but it took me long in reading. I had to read quite a
lot (look at &lt;a href="https://www.brodrigues.co/blog/2023-07-13-nix_for_r_part1/"&gt;part
1&lt;/a&gt; for some nice
references) before being comfortable enough to write it. But I’ll just say this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;continue reading, because I hope to convince you that Nix is really worth the effort&lt;/li&gt;
&lt;li&gt;I’m working on a package that will help R users generate &lt;code&gt;default.nix&lt;/code&gt; files like the one from above with minimal effort (more on this at the end of the blog post)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you’re following along, instead of typing this file, you can clone
this &lt;a href="https://github.com/b-rodrigues/nix_targets_pipeline"&gt;repository&lt;/a&gt;.
This repository contains the &lt;code&gt;default.nix&lt;/code&gt; file from above, and a &lt;code&gt;{targets}&lt;/code&gt;
pipeline that I will run in that environment.&lt;/p&gt;
&lt;p&gt;Ok, so now let’s build the environment by running &lt;code&gt;nix-build&lt;/code&gt; inside a terminal
in the folder that contains this file. It should take a bit of time, because
many of the packages will need to be built from source. But they &lt;strong&gt;will&lt;/strong&gt; get
built. Then, you can drop into a Nix shell using &lt;code&gt;nix-shell&lt;/code&gt; and then type R,
which will start the R session in that environment. You can then simply run
&lt;code&gt;targets::tar_make()&lt;/code&gt;, and you’ll see the file &lt;code&gt;analyse.html&lt;/code&gt; appear, which is
the output of the &lt;code&gt;{targets}&lt;/code&gt; pipeline.&lt;/p&gt;
&lt;p&gt;Before continuing, let me just make you realize three things:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;we just ran a targets pipeline with all the needed dependencies which include not only package dependencies, but the right version of R (version 4.2.2) as well, and all required system dependencies;&lt;/li&gt;
&lt;li&gt;we did so WITHOUT using any containerization tool like Docker;&lt;/li&gt;
&lt;li&gt;the whole thing is &lt;strong&gt;completely&lt;/strong&gt; reproducible; the exact same packages will forever be installed, regardless of &lt;em&gt;when&lt;/em&gt; we build this environment, because I’m using a particular release of &lt;code&gt;nixpkgs&lt;/code&gt; (8ad5e8132c5dcf977e308e7bf5517cc6cc0bf7d8) so each piece of software this release of &lt;code&gt;nixpkgs&lt;/code&gt; provides is going to stay constant.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;And I need to stress &lt;em&gt;completely reproducible&lt;/em&gt;. Because using {renv}+Docker,
while providing a very nice solution, still has some issues. First of all, with
Docker, the underlying operating system (often Ubuntu) evolves and changes
through time. So lower level dependencies might change. And at some point in the
future, that version of Ubuntu will not be supported anymore. So it won’t be
possible to rebuild the image, because it won’t be possible to download any
software into it. So either we build our Docker image and really need to make
sure to keep it forever, or we need to port our pipeline to newer versions of
Ubuntu, without any guarantee that it’s going to work exactly the same. Also, by
defining &lt;code&gt;Dockerfile&lt;/code&gt;s that build upon &lt;code&gt;Dockerfile&lt;/code&gt;s that build upon
&lt;code&gt;Dockerfile&lt;/code&gt;s, it’s difficult to know what is actually installed in a particular
image. This situation can of course be avoided by writing &lt;code&gt;Dockerfile&lt;/code&gt;s in such
a way that they don’t rely on any other &lt;code&gt;Dockerfile&lt;/code&gt;, but that’s also a lot of
effort. Now don’t get me wrong: I’m not saying Docker should be canceled. I
still think that it has its place and that it’s perfectly fine to use it (I’ll
take a project that uses &lt;code&gt;{renv}&lt;/code&gt;+Docker any day over one that doesn’t!). But
you should be aware of alternative ways of running pipelines in a reproducible
way, and Nix is such a way.&lt;/p&gt;
&lt;p&gt;Going back to our pipeline, we could also run the pipeline with this command:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;nix-shell /path/to/default.nix --run "Rscript -e 'setwd(\"/path/to\");targets::tar_make()'"&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;but it’s a bit of a mouthful. What you could do instead is run the pipeline
each time you drop into the Nix shell by adding a so-called &lt;code&gt;shellHook&lt;/code&gt;. For
this, we need to change the &lt;code&gt;default.nix&lt;/code&gt; file again. Add these lines in the
&lt;code&gt;mkShell&lt;/code&gt; function:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;...
mkShell {
  buildInputs = [my-pkgs];
  shellHook = ''
     Rscript -e "targets::tar_make()"
  '';
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now, each time you drop into the Nix shell in the folder containing that
&lt;code&gt;default.nix&lt;/code&gt; file, &lt;code&gt;targets::tar_make()&lt;/code&gt; gets automatically executed. You can
then inspect the results.&lt;/p&gt;
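&lt;p&gt;One thing to keep in mind is that the hook runs every single time the shell
starts. If that bothers you, a guard can help. Here is a sketch of one, assuming
the pipeline script uses the default name &lt;code&gt;_targets.R&lt;/code&gt; and that you start
&lt;code&gt;nix-shell&lt;/code&gt; from the project folder (the guard itself is my own illustration,
not something the repository ships):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# ... same let block as before ...
mkShell {
  buildInputs = [my-pkgs];
  shellHook = ''
    # Only run the pipeline if a {targets} script is present in the current folder
    if [ -f _targets.R ]; then
      Rscript -e "targets::tar_make()"
    fi
  '';
}&lt;/code&gt;&lt;/pre&gt;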
&lt;p&gt;In the next blog post, I’ll show how we can use that environment with IDEs like
RStudio, VS Code and Emacs to work interactively. But first, let me quickly talk
about a package I’ve been working on to ease the process of writing
&lt;code&gt;default.nix&lt;/code&gt; files.&lt;/p&gt;


&lt;h2&gt;Rix: Reproducible Environments with Nix&lt;/h2&gt;
&lt;p&gt;I wrote a very early, experimental package called &lt;code&gt;{rix}&lt;/code&gt; which will help write
these &lt;code&gt;default.nix&lt;/code&gt; files for us. &lt;code&gt;{rix}&lt;/code&gt; is an R package that hopefully will
make R users want to try out Nix for their development purposes. It aims to
mimic the workflow of &lt;code&gt;{renv}&lt;/code&gt;, or to be more exact, the workflow of what Python
users do when starting a new project. Usually what they do is create a
completely fresh environment using &lt;code&gt;pyenv&lt;/code&gt; (or another similar tool). Using
&lt;code&gt;pyenv&lt;/code&gt;, Python developers can install a per project version of Python and
Python packages, but unlike Nix, won’t install system-level dependencies as
well.&lt;/p&gt;
&lt;p&gt;If you want to install &lt;code&gt;{rix}&lt;/code&gt;, run the following line in an R session:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;devtools::install_github("b-rodrigues/rix")&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can then use the &lt;code&gt;rix()&lt;/code&gt; function to create a &lt;code&gt;default.nix&lt;/code&gt; file like so:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;rix::rix(r_ver = "current",
         pkgs = c("dplyr", "janitor"),
         ide = "rstudio",
         path = ".")&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This will create a &lt;code&gt;default.nix&lt;/code&gt; file that Nix can use to build an environment
that includes the current versions of R, &lt;code&gt;{dplyr}&lt;/code&gt; and &lt;code&gt;{janitor}&lt;/code&gt;, and RStudio
as well. Yes you read that right: you need to have a per-project RStudio
installation. The reason is that RStudio modifies environment variables and so
your “locally” installed RStudio would not find the R version installed with
Nix. This is not the case with other IDEs like VS Code or Emacs. If you
want to have an environment with another version of R, simply run:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;rix::rix(r_ver = "4.2.1",
         pkgs = c("dplyr", "janitor"),
         ide = "rstudio",
         path = ".")&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;and you’ll get an environment with R version 4.2.1. To see which versions are
available, you can run &lt;code&gt;rix::available_r()&lt;/code&gt;. Learn more about &lt;code&gt;{rix}&lt;/code&gt; on its
&lt;a href="https://b-rodrigues.github.io/rix/"&gt;website&lt;/a&gt;. It’s in very early stages, and
doesn’t handle packages that have only been released on Github, yet. And the
interface might change. I’m thinking of making it possible to list the packages
in a yaml file and then have &lt;code&gt;rix()&lt;/code&gt; generate the &lt;code&gt;default.nix&lt;/code&gt; file from the
yaml file. This might be cleaner. There is already something like this called
&lt;a href="https://github.com/luispedro/nixml/tree/main"&gt;Nixml&lt;/a&gt;, so maybe I don’t even
need to rewrite anything!&lt;/p&gt;
&lt;p&gt;But I’ll discuss this in more detail next time, where I’ll explain how you can
use development environments built with Nix using an IDE.&lt;/p&gt;


&lt;h2&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;The great &lt;a href="https://nix.dev/tutorials/install-nix"&gt;Nix.dev&lt;/a&gt; tutorials.&lt;/li&gt;
&lt;li&gt;This &lt;a href="https://rgoswami.me/posts/rethinking-r-nix/"&gt;blog post: Statistical Rethinking and Nix&lt;/a&gt;, which I referenced in part 1 as well; it helped me install my &lt;code&gt;{housing}&lt;/code&gt; package from Github.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/luispedro/nixml/tree/main"&gt;Nixml&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>datascience</category>
      <category>nix</category>
      <category>rstats</category>
    </item>
    <item>
      <title>Reproducible data science with Nix, part 1 -- what is Nix</title>
      <dc:creator>Bruno Rodrigues</dc:creator>
      <pubDate>Thu, 20 Jul 2023 09:08:09 +0000</pubDate>
      <link>https://dev.to/brodrigues/reproducible-data-science-with-nix-part-1-what-is-nix-3gg3</link>
      <guid>https://dev.to/brodrigues/reproducible-data-science-with-nix-part-1-what-is-nix-3gg3</guid>
      <description>&lt;p&gt;This is the first of a (hopefully) series of posts about Nix. Disclaimer: I’m a
super beginner with Nix. So this series of blog posts is more akin to notes that
I’m taking while learning than a super detailed tutorial. So if you’re a Nix
expert and read something stupid in here, that’s normal. This post is going to
focus on R (obviously) but the ideas are applicable to any programming language.&lt;/p&gt;
&lt;p&gt;To ensure that a project is reproducible you need to deal with at least four
things:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Make sure that the required/correct version of R (or any other language) is installed;&lt;/li&gt;
&lt;li&gt;Make sure that the required versions of packages are installed;&lt;/li&gt;
&lt;li&gt;Make sure that system dependencies are installed (for example, you’d need a working Java installation to install the &lt;code&gt;{rJava}&lt;/code&gt; R package on Linux);&lt;/li&gt;
&lt;li&gt;Make sure that you can install all of this for the hardware you have on hand.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For the first three bullet points, the consensus seems to be a mixture of Docker
to deal with system dependencies, &lt;code&gt;{renv}&lt;/code&gt; for the packages (or &lt;code&gt;{groundhog}&lt;/code&gt;,
or a fixed CRAN snapshot like those &lt;a href="https://packagemanager.posit.co/__docs__/user/get-repo-url/#ui-frozen-urls"&gt;Posit
provides&lt;/a&gt;)
and the &lt;a href="https://github.com/r-lib/rig"&gt;R installation manager&lt;/a&gt; to install the
correct version of R (unless you use a Docker image as base that already ships
the required version by default). As for the last point, the only way out is to
be able to compile the software for the target architecture. There are a lot of
moving pieces, and a lot that you need to know, and I even wrote a whole 522-page
&lt;a href="https://raps-with-r.dev/"&gt;book about all of this&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;But it turns out that this is not the only solution. Docker + &lt;code&gt;{renv}&lt;/code&gt; (or some
other way to deal with packages) is likely the most popular way to ensure
reproducibility of your projects, but there are other tools to achieve this. One
such tool is called Nix.&lt;/p&gt;
&lt;p&gt;Nix is a package manager for Linux distributions, macOS and apparently it even
works on Windows if you enable WSL2. What’s a package manager? If you’re not a
Linux user, you may not be aware. Let me explain it this way: in R, if you want
to install a package to provide some functionality not included with a vanilla
installation of R, you’d run this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;install.packages("dplyr")&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It turns out that Linux distributions, like Ubuntu for example, work in a
similar way, but for software that you’d usually install using an installer (at
least on Windows). For example you could install Firefox on Ubuntu using:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sudo apt-get install firefox&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(there are also graphical interfaces that make this process “more user-friendly”).
In Linux jargon, &lt;code&gt;packages&lt;/code&gt; are simply what normies call software (or I guess
it’s all “apps” these days). These packages get downloaded from so-called
repositories (think of CRAN, the repository of R packages) but for any type of
software that you might need to make your computer work: web browsers, office
suites, multimedia software and so on.&lt;/p&gt;
&lt;p&gt;So Nix is just another package manager that you can use to install software.&lt;/p&gt;
&lt;p&gt;But what interests us is not using Nix to install Firefox, but instead to
install R and the R packages that we require for our analysis (or any other
programming language that we need). But why use Nix instead of the usual ways to
install software on our operating systems?&lt;/p&gt;
&lt;p&gt;The first thing that you should know is that Nix’s repository, &lt;code&gt;nixpkgs&lt;/code&gt;, is
huge. Humongously huge. As I’m writing these lines, &lt;a href="https://search.nixos.org/packages"&gt;there are more than 80’000
pieces of software available&lt;/a&gt;, and the
&lt;em&gt;entirety of CRAN&lt;/em&gt; is also available through &lt;code&gt;nixpkgs&lt;/code&gt;. So instead of installing
R as you usually do and then use &lt;code&gt;install.packages()&lt;/code&gt; to install packages, you
could use Nix to handle everything. But still, why use Nix at all?&lt;/p&gt;
&lt;p&gt;Nix has an interesting feature: using Nix, it is possible to install software in
(relatively) isolated environments. So using Nix, you can install as many
versions of R and R packages as you need. Suppose that you start working on a
new project. As you start the project, with Nix, you would install a
project-specific version of R and R packages that you would only use for that
particular project. If you switch projects, you’d switch versions of R and R
packages. If you are familiar with &lt;code&gt;{renv}&lt;/code&gt;, you should see that this is exactly
the same thing: the difference is that not only will you have a project-specific
library of R packages, you will also have a project-specific R version. So if
you start a project now, you’d have R version 4.2.3 installed (the latest
version available in &lt;code&gt;nixpkgs&lt;/code&gt; but not the latest version available, more on
this later), with the accompanying versions of R packages, for as long as the
project lives (which can be a long time). If you start a project next year, then
that project will have its own R, maybe R version 4.4.2 or something like that,
and the set of required R packages that would be current at that time. This is
because Nix always installs the software that you need in separate, (isolated)
environments on your computer. So you can define an environment for one specific
project.&lt;/p&gt;
&lt;p&gt;But Nix goes even further: not only can you install R and R packages using
Nix in isolated, project-specific environments, Nix even installs the required
system dependencies. So for example if I need &lt;code&gt;{rJava}&lt;/code&gt;, Nix will make sure to
install the correct version of Java as well, always in that project-specific
environment (so if you already have some Java version installed on your system, there
won’t be any interference).&lt;/p&gt;
&lt;p&gt;What’s also pretty awesome, is that you can use a specific version of &lt;code&gt;nixpkgs&lt;/code&gt;
to &lt;em&gt;always&lt;/em&gt; get &lt;em&gt;exactly&lt;/em&gt; the same versions of &lt;strong&gt;all&lt;/strong&gt; the software whenever you
build that environment to run your project’s code. The environment gets defined
in a simple plain-text file, and anyone using that file to build the environment
will get exactly, byte by byte, the same environment as you when you initially
started the project. And this also regardless of the operating system that is
used.&lt;/p&gt;
&lt;p&gt;So let me illustrate this. After &lt;a href="https://nix.dev/tutorials/install-nix"&gt;installing
Nix&lt;/a&gt;, I can define an environment by
writing a file called &lt;code&gt;default.nix&lt;/code&gt; that looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;{ pkgs ? import (fetchTarball "https://github.com/NixOS/nixpkgs/archive/e11142026e2cef35ea52c9205703823df225c947.tar.gz") {} }:

with pkgs;

let
  my-pkgs = rWrapper.override {
    packages = with rPackages; [ dplyr ggplot2 R];
  };
in
mkShell {
  buildInputs = [my-pkgs];
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now this certainly looks complicated! And it is. The entry cost to Nix is quite
high, because, actually, Nix is more than a package manager. It is also a
programming language, and this programming language gets used to configure
environments. I won’t go too much into detail, but you’ll see in the first line
that I’m using a specific version of &lt;code&gt;nixpkgs&lt;/code&gt; that gets downloaded directly
from Github. This means that installing software with that
specific version of &lt;code&gt;nixpkgs&lt;/code&gt; will always install the same versions. This is
what ensures that R and R packages are versioned. Basically, by using a specific
version of &lt;code&gt;nixpkgs&lt;/code&gt;, I pin all the versions of all the software that this
particular version of &lt;code&gt;nixpkgs&lt;/code&gt; will &lt;em&gt;ever&lt;/em&gt; install. I then define a variable called
&lt;code&gt;my-pkgs&lt;/code&gt; which lists the packages I want to install (&lt;code&gt;{dplyr}&lt;/code&gt;, &lt;code&gt;{ggplot2}&lt;/code&gt; and
&lt;code&gt;R&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;By the way, this may look like it would take a lot of time to install because,
after all, you need to install R, R packages and underlying system dependencies,
but thankfully there is an online cache of binaries that gets automatically used
by Nix (&lt;a href="https://cache.nixos.org/"&gt;cache.nixos.org&lt;/a&gt;) for fast installations. If
binaries are not available, sources get compiled.&lt;/p&gt;
&lt;p&gt;I can now create an environment with these exact specifications using (in the
directory where &lt;code&gt;default.nix&lt;/code&gt; is):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;nix-build&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;or I could use the R version from this environment to run some arbitrary code:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;nix-shell /home/renv/default.nix --run "Rscript -e 'sessionInfo()'" &amp;gt;&amp;gt; /home/renv/sessionInfo.txt&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(assuming my &lt;code&gt;default.nix&lt;/code&gt; file is available in the &lt;code&gt;/home/renv/&lt;/code&gt; directory).
This would build the environment on the fly and run &lt;code&gt;sessionInfo()&lt;/code&gt; inside of
it. Here are the contents of this &lt;code&gt;sessionInfo.txt&lt;/code&gt; file:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;R version 4.2.3 (2023-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)

Matrix products: default
BLAS/LAPACK: /nix/store/pbfs53rcnrzgjiaajf7xvwrfqq385ykv-blas-3/lib/libblas.so.3

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_4.2.3&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This looks like any other output of the &lt;code&gt;sessionInfo()&lt;/code&gt; function, but there is
something quite unusual: the &lt;code&gt;BLAS/LAPACK&lt;/code&gt; line:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;BLAS/LAPACK: /nix/store/pbfs53rcnrzgjiaajf7xvwrfqq385ykv-blas-3/lib/libblas.so.3&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;BLAS is a library that R uses for linear algebra, matrix multiplication and
vector operations. R usually ships with its own version of BLAS and LAPACK, but
it’s also possible to use external ones. Here, we see that the path to the
shared object &lt;code&gt;libblas.so.3&lt;/code&gt; is somewhere in &lt;code&gt;/nix/store/....&lt;/code&gt;. &lt;code&gt;/nix/store/&lt;/code&gt; is
where all the software gets installed. The long chain of seemingly random
characters is a hash, essentially the unique identifier of that particular
version of BLAS. This means that unlike Docker, if you’re using Nix you are also
certain that these types of dependencies, that may have an impact on your
results, also get handled properly, and that the exact same version you used
will keep getting installed in the future. Docker images also evolve, and even
if you use an LTS release of Ubuntu as a base, the underlying system packages
will evolve through time as well. And there will be a point in time where this
release will be abandoned (LTS releases receive 5 years of support), so if you
need to rebuild a Docker image based on an LTS that doesn’t get supported
anymore, you’re out of luck.&lt;/p&gt;
&lt;p&gt;If you don’t want to install Nix just yet on your computer, you should know that
there’s also a complete operating system called NixOS, that uses Nix as its
package manager, and that there are Docker images that use NixOS as a base. So
this means that you could use such an image and then build the environment (that
is 100% completely reproducible) inside and run a container that will always
produce the same output. To see an example of this, check out this &lt;a href="https://github.com/b-rodrigues/nix_experiments/tree/master"&gt;Github
repo&lt;/a&gt;. I’m writing a
Dockerfile as I usually do, but actually I could even use Nix to define the
Docker image for me, it’s that powerful!&lt;/p&gt;
&lt;p&gt;Nix seems like a very powerful tool to me. But there are some “issues”:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;As I stated above, the entry cost is quite high, because Nix is not “just a tool”, it’s a complete programming language that can even run pipelines, so you could technically even replace something like &lt;code&gt;{targets}&lt;/code&gt; with it;&lt;/li&gt;
&lt;li&gt;If you need to install specific versions of R packages, that are not pinned to dates, then Nix is not for you. Nix will always create a coherent environment with R and R packages that go together for a particular release of &lt;code&gt;nixpkgs&lt;/code&gt;. If for some reason you need a very old version of &lt;code&gt;{ggplot2}&lt;/code&gt; but a much more recent version of &lt;code&gt;{dplyr}&lt;/code&gt;, using Nix won’t make this any easier than other methods;&lt;/li&gt;
&lt;li&gt;There is no easy way (afaik) to find the version of &lt;code&gt;nixpkgs&lt;/code&gt; that you need to download to find the version of R that you may need; &lt;strong&gt;UPDATE&lt;/strong&gt;: turns out that there is such a &lt;a href="https://lazamar.co.uk/nix-versions/?channel=nixpkgs-unstable&amp;amp;package=r"&gt;simple tool&lt;/a&gt;, thanks to @shane@hachyderm.io for telling me!&lt;/li&gt;
&lt;li&gt;R packages (and I guess others for other programming languages as well) that are available on the stable channel of &lt;code&gt;nixpkgs&lt;/code&gt; lag a bit behind their counterparts on CRAN. These usually all get updated whenever there’s a new release of R. Currently, however, R is at version 4.2.3 on the stable branch of &lt;code&gt;nixpkgs&lt;/code&gt;, when it should be at version 4.3.1. This can sometimes happen due to various reasons (there are actual human beings behind this that volunteer their time and they also have a life). There is however an “unstable” &lt;code&gt;nixpkgs&lt;/code&gt; channel that contains bleeding edge versions of R packages (and R itself) if you really need the latest versions of packages (don’t worry about the “unstable” label, from my understanding this simply means that packages have not been thoroughly tested yet, but are still pretty much rock-solid);&lt;/li&gt;
&lt;li&gt;If you need something that is not on CRAN (or Bioconductor) then it’s still possible to use Nix to install these packages, but you’ll have to perform some manual configuration.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I will keep exploring Nix, and this is essentially my todo:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;using my environment that I installed with Nix to work interactively;&lt;/li&gt;
&lt;li&gt;write some tool that lets me specify an R version, a list of packages and it generates a &lt;code&gt;default.nix&lt;/code&gt; file automagically (ideally it should also deal with packages only available on Github);&lt;/li&gt;
&lt;li&gt;????&lt;/li&gt;
&lt;li&gt;Profit!&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Resources&lt;/h3&gt;
&lt;p&gt;Here are some of the resources I’ve been using:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://nix.dev/tutorials/first-steps/towards-reproducibility-pinning-nixpkgs#pinning-nixpkgs"&gt;nix.dev tutorials&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://nix-tutorial.gitlabpages.inria.fr/nix-tutorial/installation.html"&gt;INRIA’s Nix tutorial&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://nixos.org/guides/nix-pills/"&gt;Nix pills&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/nix-community/nix-data-science"&gt;Nix for Data Science&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://christitus.com/nixos-explained/"&gt;NixOS explained&lt;/a&gt;: NixOS is an entire Linux distribution that uses Nix as its package manager.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://rgoswami.me/posts/nix-r-devtools/"&gt;Blog post: Nix with R and devtools&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://rgoswami.me/posts/rethinking-r-nix/"&gt;Blog post: Statistical Rethinking and Nix&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://lazamar.github.io/download-specific-package-version-with-nix/"&gt;Blog post: Searching and installing old versions of Nix packages&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;h3&gt;Thanks&lt;/h3&gt;
&lt;p&gt;Many thanks to &lt;a href="https://github.com/jbedo"&gt;Justin Bedő&lt;/a&gt;, maintainer of the R
package for Nix, for answering all my questions on Nix!&lt;/p&gt;
&lt;p&gt;
Hope you enjoyed! If you found this blog post useful, you might want to follow
me on &lt;a href="https://fosstodon.org/@brodriguesco"&gt;Mastodon&lt;/a&gt; or &lt;a href="https://www.twitter.com/brodriguesco"&gt;twitter&lt;/a&gt; for blog post updates and
&lt;a href="https://www.buymeacoffee.com/brodriguesco"&gt;buy me an espresso&lt;/a&gt; or &lt;a href="https://www.paypal.me/brodriguesco"&gt;paypal.me&lt;/a&gt;, or buy my &lt;a href="https://www.brodrigues.co/about/books/"&gt;ebooks&lt;/a&gt;.
You can also watch my videos on &lt;a href="https://www.youtube.com/c/BrunoRodrigues1988/"&gt;youtube&lt;/a&gt;.
So much content for you to consoom!
&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>nix</category>
      <category>rstats</category>
    </item>
    <item>
      <title>Software engineering techniques that non-programmers who write a lot of code can benefit from — the DRY WIT approach</title>
      <dc:creator>Bruno Rodrigues</dc:creator>
      <pubDate>Tue, 07 Mar 2023 21:06:50 +0000</pubDate>
      <link>https://dev.to/brodrigues/software-engineering-techniques-that-non-programmers-who-write-a-lot-of-code-can-benefit-from-the-dry-wit-approach-1ek3</link>
      <guid>https://dev.to/brodrigues/software-engineering-techniques-that-non-programmers-who-write-a-lot-of-code-can-benefit-from-the-dry-wit-approach-1ek3</guid>
      <description>&lt;p&gt;Data scientists, statisticians, analysts, researchers, and many other
professionals write &lt;em&gt;a lot of code&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Not only do they write a lot of code, but they must also read and review a lot
of code as well. They either work in teams and need to review each other’s code,
or need to be able to reproduce results from past projects, be it for peer
review or auditing purposes. And yet, they never, or very rarely, get taught
the tools and techniques that would make the process of writing, collaborating,
reviewing and reproducing projects possible.&lt;/p&gt;

&lt;p&gt;Which is truly unfortunate because software engineers face the same challenges
and solved them decades ago. Software engineers developed a set of project
management techniques and tools that non-programmers who write a lot of code
could benefit from as well.&lt;/p&gt;

&lt;p&gt;These tools and techniques can be used right from the start of a project at a
minimal cost, such that the analysis is well-tested, well-documented,
trustworthy and reproducible &lt;em&gt;by design&lt;/em&gt;. Projects are going to be reproducible
simply because they were engineered, from the start, to be reproducible.&lt;/p&gt;

&lt;p&gt;But all these tools, frameworks and techniques boil down to two acronyms that I
like to keep in my head at all times:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;DRY: Don’t Repeat Yourself;&lt;/li&gt;
&lt;li&gt;WIT: Write IT down.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;DRY WIT: by systematically avoiding repeating yourself and
by writing everything down, projects become well-tested, well-documented,
trustworthy and reproducible by design. Why is that?&lt;/p&gt;

&lt;h2&gt;DRY: Don’t Repeat Yourself&lt;/h2&gt;

&lt;p&gt;Let’s start with DRY: what does it mean not to repeat oneself? It means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;using functions instead of copy-and-pasting bits of code here and there;&lt;/li&gt;
&lt;li&gt;using literate programming, to avoid having to copy and paste graphs and tables into
Word or PDF documents;&lt;/li&gt;
&lt;li&gt;treating code as data and making use of templating.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The most widely used programming languages for data science/statistics, Python and R,
both have first-class functions. This means that functions can be manipulated like
any other object. So something like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Reduce(`+`, 1:100)&lt;/code&gt;&lt;/pre&gt;

&lt;pre&gt;&lt;code&gt;## [1] 5050&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;where the function &lt;code&gt;`+`()&lt;/code&gt; gets used as an argument of the higher-order &lt;code&gt;Reduce()&lt;/code&gt;
function is absolutely valid (and so is Python’s equivalent &lt;code&gt;reduce&lt;/code&gt; from
&lt;code&gt;functools&lt;/code&gt;) and avoids having to write a for-loop, with the accumulator variable
and index bookkeeping that come with it.
Generally speaking, the functional programming paradigm lends itself very
naturally to data analysis tasks, and in my opinion data scientists and
statisticians would benefit a lot from adopting this paradigm.&lt;/p&gt;

&lt;p&gt;Literate programming is another tool that needs to be in the toolbox of
any person analysing data. This is because at the end of the day, the results
of an analysis need to be in some form of document. Without literate programming,
this is how you would draft reports:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--CFukDBn7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/b-rodrigues/rap4all/blob/master/images/report_draft_loop.png%3Fraw%3Dtrue" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--CFukDBn7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/b-rodrigues/rap4all/blob/master/images/report_draft_loop.png%3Fraw%3Dtrue" width="880" height="361"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But with literate programming, this is what this loop looks like:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--1QqcuTFT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/b-rodrigues/rap4all/blob/master/images/md_draft_loop.png%3Fraw%3Dtrue" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--1QqcuTFT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://github.com/b-rodrigues/rap4all/blob/master/images/md_draft_loop.png%3Fraw%3Dtrue" width="880" height="245"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://quarto.org/"&gt;Quarto&lt;/a&gt; is the latest open-source scientific and technical
publishing system that leverages Pandoc and supports R, Python, Julia and
ObservableJs right out of the box.&lt;/p&gt;

&lt;p&gt;Below is a little Quarto Hello World:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;---
format: pdf
---

In this example we embed parts of the examples from the
\texttt{kruskal.test} help page into a LaTeX document:

```{r}
data(airquality)
kruskal.test(Ozone ~ Month, data = airquality)
```

which shows that the location parameter of the Ozone
distribution varies significantly from month to month.
Finally we include a boxplot of the data:

```{r, echo = FALSE}
boxplot(Ozone ~ Month, data = airquality)
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Compiling this document results in the following:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--RnhGgQUG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://raw.githubusercontent.com/b-rodrigues/rap4all/master/images/hello_rmd.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--RnhGgQUG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://raw.githubusercontent.com/b-rodrigues/rap4all/master/images/hello_rmd.PNG" width="826" height="824"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Example from Leisch’s 2002 paper.&lt;/p&gt;




&lt;p&gt;Of course, you could use Python code chunks instead of R, you could also compile
this document to Word, or HTML, or anything else really. By combining code and
prose, the process of data analysis gets streamlined and we don’t need to repeat
ourselves by copying and pasting images and tables into Word documents.&lt;/p&gt;

&lt;p&gt;Finally, treating code as data is also quite useful. This means that it is
possible to compute on the language itself. This is a more advanced topic, but
definitely worth the effort. As an illustration, consider the following R toy example:&lt;/p&gt;

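&lt;pre&gt;&lt;code&gt;show_and_eval &amp;lt;- function(f, ...){
  # Capture the unevaluated argument and turn it into a string, e.g. "sqrt"
  f &amp;lt;- deparse(substitute(f))
  # Collect the remaining arguments in a (possibly named) list
  dots &amp;lt;- list(...)
  message("Evaluating: ", f, "() with arguments: ", deparse(dots))
  # Call the function by name with the collected arguments
  do.call(f, dots)
}&lt;/code&gt;&lt;/pre&gt;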

&lt;p&gt;Running this function does the following:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;show_and_eval(sqrt, 2)&lt;/code&gt;&lt;/pre&gt;

&lt;pre&gt;&lt;code&gt;## Evaluating: sqrt() with arguments: list(2)&lt;/code&gt;&lt;/pre&gt;

&lt;pre&gt;&lt;code&gt;## [1] 1.414214&lt;/code&gt;&lt;/pre&gt;

&lt;pre&gt;&lt;code&gt;show_and_eval(mean, x = c(NA, 1, 2))&lt;/code&gt;&lt;/pre&gt;

&lt;pre&gt;&lt;code&gt;## Evaluating: mean() with arguments: list(x = c(NA, 1, 2))&lt;/code&gt;&lt;/pre&gt;

&lt;pre&gt;&lt;code&gt;## [1] NA&lt;/code&gt;&lt;/pre&gt;

&lt;pre&gt;&lt;code&gt;show_and_eval(mean, x = c(NA, 1, 2), na.rm = TRUE)&lt;/code&gt;&lt;/pre&gt;

&lt;pre&gt;&lt;code&gt;## Evaluating: mean() with arguments: list(x = c(NA, 1, 2), na.rm = TRUE)&lt;/code&gt;&lt;/pre&gt;

&lt;pre&gt;&lt;code&gt;## [1] 1.5&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This is incredibly useful when writing packages (to know more about these
techniques in the R programming language, read the chapter &lt;em&gt;Metaprogramming&lt;/em&gt; from
&lt;a href="https://adv-r.hadley.nz/metaprogramming.html"&gt;Advanced R&lt;/a&gt;).&lt;/p&gt;


&lt;br&gt;


&lt;h2&gt;WIT: Write IT down&lt;/h2&gt;

&lt;p&gt;Now on to the WIT bit: &lt;em&gt;write it down&lt;/em&gt;. You’ve just written a function. To see if
it works correctly, you test it in the interactive console. You execute the
test, see that it works, and move on. But wait! What you just did is called a
unit test. Instead of typing that in the console and then never using it
again, write it down in a script. Now you’ve got a unit test for that function
that you can execute each time you update that function’s code, and make sure
that it keeps working as expected. There are many unit testing frameworks that
can help you write unit tests consistently and run them automatically.&lt;/p&gt;
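
&lt;p&gt;As a minimal sketch of what this could look like in base R: the function
&lt;code&gt;to_celsius()&lt;/code&gt; below is a hypothetical example (it does not come from any
package), and the checks are the ones you might have typed into the console,
only now saved in a script. I use &lt;code&gt;stopifnot()&lt;/code&gt; to keep the sketch free of
dependencies; a dedicated testing framework gives nicer reports, but the idea
stays the same:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Hypothetical function under test: converts Fahrenheit to Celsius
to_celsius &amp;lt;- function(fahrenheit) {
  (fahrenheit - 32) * 5 / 9
}

# The checks you would have typed in the console, written down.
# stopifnot() throws an error as soon as one of them is not TRUE.
test_to_celsius &amp;lt;- function() {
  stopifnot(
    isTRUE(all.equal(to_celsius(32), 0)),
    isTRUE(all.equal(to_celsius(212), 100)),
    isTRUE(all.equal(to_celsius(c(32, 212)), c(0, 100))),
    is.na(to_celsius(NA))
  )
  invisible(TRUE)
}

# Run this each time to_celsius() changes
test_to_celsius()&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If someone later edits &lt;code&gt;to_celsius()&lt;/code&gt; and breaks it, running the script stops
with an error instead of letting the mistake slip through silently.&lt;/p&gt;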

&lt;p&gt;Documentation: write it down! How does the function work? What are its inputs?
Its outputs? What else should the user know to make it work? Very often,
documentation is but a series of comments in your scripts. That’s already nice,
but using literate programming, you could also turn these comments into proper
documentation. You could use &lt;em&gt;docstrings&lt;/em&gt; in Python or &lt;code&gt;{roxygen2}&lt;/code&gt; style
comments in R.&lt;/p&gt;
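
&lt;p&gt;For R, such comments could look like the sketch below. The function
&lt;code&gt;rescale01()&lt;/code&gt; is a hypothetical example written for this illustration; the
lines starting with &lt;code&gt;#'&lt;/code&gt; are &lt;code&gt;{roxygen2}&lt;/code&gt; style comments that answer the
questions above (what does it do, what goes in, what comes out):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;#' Rescale a numeric vector to the unit interval
#'
#' @param x A numeric vector.
#' @param na.rm Should missing values be ignored when computing the
#'   range? Defaults to TRUE.
#' @return A numeric vector of the same length as x, with values
#'   between 0 and 1.
#' @examples
#' rescale01(c(1, 2, 3))
#' @export
rescale01 &amp;lt;- function(x, na.rm = TRUE) {
  rng &amp;lt;- range(x, na.rm = na.rm)
  (x - rng[1]) / (rng[2] - rng[1])
}

rescale01(c(1, 2, 3))&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;On their own these are just comments, so the script runs as usual. Once the
function lives in a package, &lt;code&gt;{roxygen2}&lt;/code&gt; can turn them into a proper help page.&lt;/p&gt;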

&lt;p&gt;Another classic: you correct some data manually in the raw dataset (very often a
&lt;code&gt;.csv&lt;/code&gt; or &lt;code&gt;.xlsx&lt;/code&gt; file). For example, when dealing with data on people, sex is
sometimes “M” or “F”, sometimes “Male” or “Female”, sometimes “1” or “0”. You
spot a couple of inconsistencies and decide to &lt;em&gt;quickly&lt;/em&gt; correct them by hand.
Maybe only 3 men were coded as “Male” so you simply erase the “ale” and go on
with your project. Stop!&lt;/p&gt;

&lt;p&gt;Write it down!&lt;/p&gt;

&lt;p&gt;Write a couple of lines of code that do the replacement for you. Not only will
this leave a trace, it will ensure that when you get an update to that data in
the future you don’t have to remember to change it by hand.&lt;/p&gt;
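
&lt;p&gt;Here is a minimal sketch of such a replacement in base R. The data frame
&lt;code&gt;raw&lt;/code&gt; and the lookup table &lt;code&gt;sex_codes&lt;/code&gt; are made up for the illustration;
in a real project the lookup table would list whatever codings you actually
find in your data:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Hypothetical raw data with inconsistent codings of the same variable
raw &amp;lt;- data.frame(
  id  = 1:6,
  sex = c("M", "Male", "F", "Female", "M", "F"),
  stringsAsFactors = FALSE
)

# Lookup table: every coding found in the raw data, mapped to one standard
sex_codes &amp;lt;- c(M = "M", Male = "M", F = "F", Female = "F")

# Recode in a copy, so the raw data is never modified
clean &amp;lt;- raw
clean$sex &amp;lt;- unname(sex_codes[raw$sex])

# Fail loudly if an update of the data brings a coding that is not in the table
stopifnot(!anyNA(clean$sex))

table(clean$sex)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The lookup table doubles as documentation of every correction that was made,
and the final check means a new, unexpected coding stops the script instead of
quietly turning into a missing value.&lt;/p&gt;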

&lt;p&gt;You should aim at completely eliminating any required manual intervention when
building your project. A project that can be fully run by a machine is easier to
debug, can be scheduled, and can be iterated on very quickly.&lt;/p&gt;

&lt;p&gt;Something else that you should write down, or rather, let another tool do it for
you: how you collaborate with your teammates. For this, you should be using
Git. Who changed what part of what function when? If the project’s code is
versioned, Git writes it down for you. You want to experiment with a new
feature? Write it down by creating a new branch and going nuts. There’s something
wrong in the code? Write it down as an issue on your versioning platform (usually
GitHub).&lt;/p&gt;

&lt;p&gt;There are many more topics that we disciples of the data could learn from
software engineers. I’m currently working on a free ebook that you can read
&lt;a href="https://raps-with-r.dev/"&gt;here&lt;/a&gt; that teaches these techniques. If this post
whetted your appetite, give the book a go!&lt;/p&gt;




</description>
      <category>beginners</category>
      <category>datascience</category>
      <category>reproducibility</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
