<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jérôme Parent-Lévesque</title>
    <description>The latest articles on DEV Community by Jérôme Parent-Lévesque (@jeromepl).</description>
    <link>https://dev.to/jeromepl</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F647949%2F82811ac7-d139-4e04-95c1-ebc363023a0d.jpeg</url>
      <title>DEV Community: Jérôme Parent-Lévesque</title>
      <link>https://dev.to/jeromepl</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jeromepl"/>
    <language>en</language>
    <item>
      <title>How to safely rename STI models in Rails</title>
      <dc:creator>Jérôme Parent-Lévesque</dc:creator>
      <pubDate>Fri, 18 Aug 2023 18:39:32 +0000</pubDate>
      <link>https://dev.to/potloc/how-to-safely-rename-sti-models-in-rails-25lf</link>
      <guid>https://dev.to/potloc/how-to-safely-rename-sti-models-in-rails-25lf</guid>
      <description>&lt;p&gt;In Rails, Single Table Inheritance (STI) models store their full model name (including any module namespaces) in a &lt;code&gt;type&lt;/code&gt; column. ActiveRecord uses this column to determine which model to instantiate when loading a record from the database. This means that renaming such a model isn't as easy as changing the class name; it also requires a data migration to update the values stored in &lt;code&gt;type&lt;/code&gt;. So how can we do this safely in a live production environment?&lt;/p&gt;




&lt;p&gt;This is a challenge that we recently ran into at Potloc while modularizing our codebase. This involved namespacing all of our models under &lt;a href="https://github.com/rubyatscale/packs-rails"&gt;packs&lt;/a&gt;, which meant that the &lt;code&gt;type&lt;/code&gt; values of our STI models also had to be updated.&lt;/p&gt;

&lt;p&gt;Last year, Shopify Engineering published &lt;a href="https://shopify.engineering/changing-polymorphic-type-rails"&gt;a blog post&lt;/a&gt; about this same issue (albeit for polymorphic models) in which they suggest changing entirely what is stored as &lt;code&gt;type&lt;/code&gt; in the database. However, they mention that:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Our solution adds complexity. It’s probably not worth it for most use cases&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And this was indeed how we felt for our use case. We wanted to perform this rename in a way that would have &lt;strong&gt;no impact&lt;/strong&gt; on the way Rails works, all with zero downtime.&lt;/p&gt;

&lt;h1&gt;
  
  
  The Solution
&lt;/h1&gt;

&lt;p&gt;Let's jump right into the final solution for those who don't need all the details and just want a quick step-by-step guide!&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;In a first deployment:

&lt;ul&gt;
&lt;li&gt;Rename the model to whatever you need&lt;/li&gt;
&lt;li&gt;Create, using the old model name, a new model &lt;em&gt;that inherits from the renamed model&lt;/em&gt; but that is otherwise empty&lt;/li&gt;
&lt;li&gt;Remove all uses of the old model in the codebase&lt;/li&gt;
&lt;li&gt;Make sure that everywhere the &lt;code&gt;type&lt;/code&gt; name was being used (whether as a raw string or through &lt;a href="https://api.rubyonrails.org/v7.0.1/classes/ActiveRecord/Inheritance/ClassMethods.html#method-i-sti_name"&gt;&lt;code&gt;#sti_name&lt;/code&gt;&lt;/a&gt;), both the new and old type names are now supported&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Migrate the data in the &lt;code&gt;type&lt;/code&gt; column of all database records to reflect the new model name&lt;/li&gt;
&lt;li&gt;In a final deployment, remove the deprecated classes and old &lt;code&gt;type&lt;/code&gt; names used in the codebase&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Step 1: Renaming the model
&lt;/h2&gt;

&lt;p&gt;To help navigate these steps, let's use a simple example:&lt;br&gt;
Your team is currently modularizing the codebase and wants to create a new pack for their aerospace 🚀 division. You are therefore tasked with moving an STI model named &lt;code&gt;Rocket&lt;/code&gt; (say this model sits under a base &lt;code&gt;Vehicle&lt;/code&gt; model and a &lt;code&gt;vehicles&lt;/code&gt; database table) into a new namespace: &lt;code&gt;Aerospace::Rocket&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;You can start by renaming the model directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="c1"&gt;# models/aerospace/rocket.rb&lt;/span&gt;
&lt;span class="k"&gt;module&lt;/span&gt; &lt;span class="nn"&gt;Aerospace&lt;/span&gt;
  &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Rocket&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;Vehicle&lt;/span&gt;
    &lt;span class="c1"&gt;# ...&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, here comes the neat trick: We will create a sub-type of &lt;code&gt;Aerospace::Rocket&lt;/code&gt; using the old model name:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="c1"&gt;# models/rocket.rb&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Rocket&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;Aerospace&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Rocket&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice that this model is completely empty. In fact, we shouldn't use it &lt;em&gt;anywhere&lt;/em&gt; in the codebase (except for its &lt;code&gt;#sti_name&lt;/code&gt;; we'll come back to that later).&lt;/p&gt;

&lt;p&gt;This is not by accident. It turns out that ActiveRecord, under the hood, will use the &lt;code&gt;sti_name&lt;/code&gt; of the current model, &lt;strong&gt;as well as the &lt;code&gt;sti_name&lt;/code&gt; of any child models&lt;/strong&gt; when querying records!&lt;br&gt;
This means that by making the old model name inherit from the new one, we get for free the following behaviour:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="no"&gt;Aerospace&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Rocket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_sql&lt;/span&gt;
&lt;span class="c1"&gt;# =&amp;gt; SELECT * FROM vehicles WHERE type IN ('Aerospace::Rocket', 'Rocket');&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
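
&lt;p&gt;To build intuition for this behaviour, here is a minimal plain-Ruby sketch (no Rails involved) of a class that lists its own name plus the names of all of its subclasses. The &lt;code&gt;sti_names&lt;/code&gt; and &lt;code&gt;type_condition&lt;/code&gt; helpers are hypothetical stand-ins written for this illustration; they are not ActiveRecord's actual implementation.&lt;/p&gt;

```ruby
# Plain-Ruby sketch of the idea, not ActiveRecord's real code.
# `sti_names` and `type_condition` are hypothetical helpers.
class Vehicle
  # Remember each direct subclass as it gets defined.
  def self.inherited(subclass)
    super
    (@children ||= []).push(subclass)
  end

  def self.children
    @children || []
  end

  # Own name first, then every descendant's name (depth-first).
  def self.sti_names
    [name] + children.flat_map { |child| child.sti_names }
  end

  # Builds the kind of condition shown in the SQL above.
  def self.type_condition
    quoted = sti_names.map { |type_name| "'#{type_name}'" }
    "type IN (#{quoted.join(', ')})"
  end
end

# Class.new(Parent) creates a subclass of Parent; assigning it to a
# constant gives the class its name.
module Aerospace
  Rocket = Class.new(Vehicle)
end
Rocket = Class.new(Aerospace::Rocket)

puts Aerospace::Rocket.type_condition
puts Rocket.type_condition
```

&lt;p&gt;Running it prints &lt;code&gt;type IN ('Aerospace::Rocket', 'Rocket')&lt;/code&gt; for the renamed class and &lt;code&gt;type IN ('Rocket')&lt;/code&gt; for the legacy one, which is why the old class should no longer be queried directly.&lt;/p&gt;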



&lt;p&gt;This will therefore pave the way for us to then run a data migration that changes all &lt;code&gt;Rocket&lt;/code&gt; types stored in the database to &lt;code&gt;Aerospace::Rocket&lt;/code&gt; without breaking anything! 🎉&lt;br&gt;
But before we do that, we have to take care of a couple more cases.&lt;/p&gt;

&lt;p&gt;First, we want all newly created records to use the new type name. This simply means replacing all uses of &lt;code&gt;Rocket&lt;/code&gt; with &lt;code&gt;Aerospace::Rocket&lt;/code&gt; in the codebase.&lt;/p&gt;

&lt;p&gt;Second, if this model's &lt;code&gt;#sti_name&lt;/code&gt; or its raw string ("Rocket") was used anywhere (for example in ActiveRecord queries), we now have to make sure to support both the new and the old names.&lt;br&gt;
In a typical ActiveRecord query, this might look something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="c1"&gt;# From:&lt;/span&gt;
&lt;span class="n"&gt;fleet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;vehicles&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;type: &lt;/span&gt;&lt;span class="no"&gt;Rocket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sti_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# To:&lt;/span&gt;
&lt;span class="n"&gt;fleet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;vehicles&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;type: &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="no"&gt;Aerospace&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Rocket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sti_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;Rocket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sti_name&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="c1"&gt;# Or, better yet:&lt;/span&gt;
&lt;span class="no"&gt;Aerospace&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Rocket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;fleet: &lt;/span&gt;&lt;span class="n"&gt;fleet&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;However, there may be other places in your code that use &lt;code&gt;#sti_name&lt;/code&gt; in a different way. You'll need to look at each of these individually. For example, since at Potloc we are using GraphQL and have some &lt;code&gt;Enum&lt;/code&gt; types defined for STI models, we had to make sure that both possible &lt;code&gt;type&lt;/code&gt; values would coerce to the same enum value that is sent back from the API.&lt;/p&gt;
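
&lt;p&gt;As a rough illustration of that last point, here is a plain-Ruby sketch in which both &lt;code&gt;type&lt;/code&gt; values resolve to the same API-facing value. The &lt;code&gt;VEHICLE_KIND_BY_TYPE&lt;/code&gt; table, the &lt;code&gt;vehicle_kind&lt;/code&gt; helper and the &lt;code&gt;"ROCKET"&lt;/code&gt; enum value are assumptions made up for this example, not the actual GraphQL code used at Potloc.&lt;/p&gt;

```ruby
# Hypothetical lookup table: during the transition, the old and the new
# type names must map to the same enum value.
VEHICLE_KIND_BY_TYPE = {
  "Aerospace::Rocket" => "ROCKET",
  "Rocket" => "ROCKET", # legacy name, to be removed in Step 3
}.freeze

# Returns the enum value for a stored `type`, raising KeyError on
# unknown types so that a missed rename is noticed early.
def vehicle_kind(type)
  VEHICLE_KIND_BY_TYPE.fetch(type)
end

puts vehicle_kind("Rocket")            # ROCKET
puts vehicle_kind("Aerospace::Rocket") # ROCKET
```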

&lt;h2&gt;
  
  
  Step 2: The data migration
&lt;/h2&gt;

&lt;p&gt;That was the hard part! After step 1 is deployed, the rest is pretty much just business-as-usual when working in a continuous deployment environment.&lt;/p&gt;

&lt;p&gt;In this step, we need to rename all old type names stored in the database to the new one. We can achieve this with a data migration (a good guide for this is the &lt;a href="https://github.com/ankane/strong_migrations#backfilling-data"&gt;strong_migrations gem readme&lt;/a&gt;).&lt;br&gt;
Note that this step may vary depending on how your team chooses to run data migrations, but no matter the approach, the following command (or an equivalent) needs to be run in the production environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="no"&gt;Vehicle&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;type: &lt;/span&gt;&lt;span class="no"&gt;Rocket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sti_name&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;update_all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;type: &lt;/span&gt;&lt;span class="no"&gt;Aerospace&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Rocket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sti_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 3: Cleanup
&lt;/h2&gt;

&lt;p&gt;We should now be at a point where no records in the database are using the old &lt;code&gt;sti_name&lt;/code&gt; anymore and any newly created records are all stored using the new name as &lt;code&gt;type&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;We can therefore clean up everything!&lt;/p&gt;

&lt;p&gt;First, we can remove the old &lt;code&gt;Rocket&lt;/code&gt; model (the one that was empty and inherited from &lt;code&gt;Aerospace::Rocket&lt;/code&gt;).&lt;br&gt;
And finally, we can remove any special logic we added in Step 1 to support both &lt;code&gt;Rocket.sti_name&lt;/code&gt; and &lt;code&gt;Aerospace::Rocket.sti_name&lt;/code&gt;, so that only the latter is supported.&lt;/p&gt;

&lt;p&gt;And that's it! Migration complete! 🔥&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;It took a few steps, but by leveraging the Rails mechanism that fetches database records matching the &lt;code&gt;#sti_name&lt;/code&gt; of a model or of any of its children, we were able to rename our &lt;code&gt;Rocket&lt;/code&gt; model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;without any downtime, and&lt;/li&gt;
&lt;li&gt;without any changes to Rails' handling of STI models&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Additionally, although this blog post didn't cover it, a similar process can also be used for renaming models used in Polymorphic associations. This might be the subject of a future article.&lt;/p&gt;

&lt;p&gt;Hopefully this guide can help you to easily rename STI models, especially when it comes to modularization of your large Rails monoliths (something we can strongly recommend after a few months of trying &lt;a href="https://github.com/rubyatscale"&gt;&lt;code&gt;packs-rails&lt;/code&gt;&lt;/a&gt; internally)!&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Interested in what we do at Potloc? Come join us! &lt;a href="https://jobs.lever.co/Potloc?team=Engineering"&gt;We are hiring&lt;/a&gt; 🚀&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ruby</category>
      <category>rails</category>
    </item>
    <item>
      <title>Automatic "Ready for Review" GitHub Action</title>
      <dc:creator>Jérôme Parent-Lévesque</dc:creator>
      <pubDate>Fri, 01 Apr 2022 18:40:47 +0000</pubDate>
      <link>https://dev.to/potloc/automatic-ready-for-review-github-action-5eb6</link>
      <guid>https://dev.to/potloc/automatic-ready-for-review-github-action-5eb6</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;TLDR&lt;/em&gt;: We wanted a GitHub Action to automatically assign reviewers and mark a draft pull request as "Ready for review" after our test suite passes. The final code can be found in &lt;a href="https://gist.github.com/jeromepl/02e70f3ea4a4e8103da6f96f14eb213c" rel="noopener noreferrer"&gt;this gist here&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;At Potloc, our continuous integration process involves, among other things, a GitHub workflow running on each &lt;code&gt;push&lt;/code&gt; that tests the code against our full test suite. This check must pass for a pull request to be merged.&lt;/p&gt;

&lt;p&gt;Our test suite has gotten to a size where it is difficult to run on a personal computer in a reasonable amount of time, hence our developers usually rely on this GitHub workflow to run the full test suite.&lt;/p&gt;

&lt;p&gt;The process looks something like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Push code for a new feature&lt;/li&gt;
&lt;li&gt;Create a new Pull Request in "Draft" mode&lt;/li&gt;
&lt;li&gt;Wait for all the tests to pass&lt;/li&gt;
&lt;li&gt;Mark the Pull Request as "Ready for review" and assign reviewers&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Note that we consider it good practice to wait until tests pass before assigning reviewers, to avoid notifying them only to realize that more changes are necessary.&lt;/p&gt;

&lt;p&gt;In practice, we have an in-house tool to help us automate most of these tasks through the &lt;a href="https://cli.github.com/" rel="noopener noreferrer"&gt;GitHub CLI&lt;/a&gt;, but for a long time we didn't have a way to automatically mark a pull request as "Ready for review" when all the tests passed, meaning we had to wait and periodically check the status of each of our PRs.&lt;/p&gt;

&lt;p&gt;Inspired by Artur Dryomov's excellent post on &lt;a href="https://arturdryomov.dev/posts/auto-github-pull-requests/" rel="noopener noreferrer"&gt;Autonomous GitHub Pull Requests&lt;/a&gt;, we set out to create a GitHub Action to help us automate this.&lt;/p&gt;

&lt;h2&gt;
  
  
  Solution:
&lt;/h2&gt;

&lt;p&gt;At the moment of creating the draft pull request, we want to be able to specify what to do in the event that all tests pass.&lt;/p&gt;

&lt;p&gt;To achieve this, we will use a label named &lt;code&gt;autoready&lt;/code&gt; that we can put on our pull requests to signify that this PR should be automatically marked as "Ready for review" when all tests pass.&lt;/p&gt;

&lt;p&gt;In addition, we want to be able to automatically assign reviewers when that happens. For that, we will be using a specific comment format that looks like this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;autoready-reviewers: reviewer1,reviewer2,organization/team1&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Our workflow will automatically detect comments like this and assign each of the listed individual or team reviewers.&lt;/p&gt;
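
&lt;p&gt;The workflow described in this post does this in Bash. Purely to illustrate the parsing rule, here is a small Ruby sketch. The &lt;code&gt;parse_reviewers&lt;/code&gt; helper is hypothetical, and treating entries that contain a &lt;code&gt;/&lt;/code&gt; as teams is an assumption based on the &lt;code&gt;organization/team1&lt;/code&gt; example above.&lt;/p&gt;

```ruby
# Hypothetical helper: extracts reviewers from a comment body such as
# "autoready-reviewers: reviewer1,reviewer2,organization/team1".
# Assumption: entries containing a "/" are teams, all others are users.
def parse_reviewers(comment)
  match = comment.strip.match(/\Aautoready-reviewers:\s*(.+)\z/)
  return nil if match.nil? # not an autoready comment

  entries = match[1].split(",").map { |entry| entry.strip }
  entries = entries.reject { |entry| entry.empty? }
  teams, users = entries.partition { |entry| entry.include?("/") }
  { users: users, teams: teams }
end

p parse_reviewers("autoready-reviewers: reviewer1,reviewer2,organization/team1")
p parse_reviewers("Looks good to me!")
```

&lt;p&gt;Comments that don't follow the format yield &lt;code&gt;nil&lt;/code&gt;, so they can simply be skipped.&lt;/p&gt;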

&lt;h2&gt;
  
  
  Implementation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  GitHub Workflow configuration
&lt;/h3&gt;

&lt;p&gt;Our workflow should run after each run of our &lt;code&gt;Test&lt;/code&gt; workflow and use its output status to determine whether or not to mark the pull request as "Ready for review".&lt;br&gt;
&lt;code&gt;.github/workflows/ready_for_review.yml&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;

&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Ready For Review&lt;/span&gt;
&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;workflow_run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;workflows&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Test"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;branches-ignore&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;main&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;types&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;completed&lt;/span&gt;
&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;mark_as_ready_for_review&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;self-hosted&lt;/span&gt;
    &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ github.event.workflow_run.conclusion == 'success' }}&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Checkout Code&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v3&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Mark as Ready for Review&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;bash .github/workflows/mark_as_ready_for_review.sh "${{ secrets.ACCESS_TOKEN }}" "${{ join(github.event.workflow_run.pull_requests.*.number) }}"&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This will run our custom script &lt;code&gt;mark_as_ready_for_review.sh&lt;/code&gt; after each &lt;strong&gt;successful&lt;/strong&gt; run of the &lt;code&gt;Test&lt;/code&gt; workflow.&lt;/p&gt;

&lt;p&gt;Some noteworthy points:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We need the &lt;code&gt;Checkout Code&lt;/code&gt; action to get the latest version of this &lt;code&gt;mark_as_ready_for_review.sh&lt;/code&gt; script.&lt;/li&gt;
&lt;li&gt;Our script takes a couple of arguments as input:

&lt;ol&gt;
&lt;li&gt;A &lt;a href="https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token" rel="noopener noreferrer"&gt;GitHub access token&lt;/a&gt; of the "user" on behalf of whom we will be performing these automatic actions. In our case, we have a dedicated bot account for this. We store this value in a &lt;a href="https://docs.github.com/en/actions/security-guides/encrypted-secrets" rel="noopener noreferrer"&gt;GitHub secret&lt;/a&gt; &lt;code&gt;secrets.ACCESS_TOKEN&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;A comma-separated list of the numbers of all pull requests associated with this workflow run. Since a workflow run is attached to a particular commit hash, it is possible that multiple PRs have that same commit hash as &lt;code&gt;HEAD&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Bash Script
&lt;/h3&gt;

&lt;p&gt;Here is the script dissected and explained (scroll to the bottom for the full script):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-euo&lt;/span&gt; pipefail &lt;span class="c"&gt;# Exit on errors, unset variables, and failures anywhere in a pipeline&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Our inputs and constants:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

&lt;span class="nv"&gt;TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;1&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nv"&gt;PR_NUMBERS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;2&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nv"&gt;LABEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"autoready"&lt;/span&gt; &lt;span class="c"&gt;# the name of the 'label' on the PR used to detect whether or not this script should run&lt;/span&gt;
&lt;span class="nv"&gt;REPO&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your-repository"&lt;/span&gt; &lt;span class="c"&gt;# the name of your repository on GitHub&lt;/span&gt;
&lt;span class="nv"&gt;ORGANIZATION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"potloc"&lt;/span&gt; &lt;span class="c"&gt;# the name of your GitHub organization or user to which the repository belongs&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Then, we want to repeat the whole thing for as many pull requests as have been passed as input:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

&lt;span class="c"&gt;# Split the numbers string (comma-delimited)&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;pr_number &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$PR_NUMBERS&lt;/span&gt; | &lt;span class="nb"&gt;tr&lt;/span&gt; &lt;span class="s2"&gt;","&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Fetch the labels from the pull request. We will also need the Node ID of the PR to use GitHub's GraphQL API in a later step, so we also grab this at the same time.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

&lt;span class="c"&gt;# Get the node_id (and labels) from the PR number&lt;/span&gt;
&lt;span class="c"&gt;# - https://docs.github.com/en/graphql/guides/using-global-node-ids&lt;/span&gt;
&lt;span class="c"&gt;# - https://docs.github.com/en/rest/reference/pulls#get-a-pull-request&lt;/span&gt;
&lt;span class="nv"&gt;out&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;curl &lt;span class="se"&gt;\&lt;/span&gt;
        &lt;span class="nt"&gt;--fail&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
        &lt;span class="nt"&gt;--silent&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
        &lt;span class="nt"&gt;--show-error&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
        &lt;span class="nt"&gt;--header&lt;/span&gt; &lt;span class="s2"&gt;"Accept: application/vnd.github.v3+json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
        &lt;span class="nt"&gt;--header&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: token &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;TOKEN&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
        &lt;span class="nt"&gt;--request&lt;/span&gt; &lt;span class="s2"&gt;"GET"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
        &lt;span class="nt"&gt;--url&lt;/span&gt; &lt;span class="s2"&gt;"https://api.github.com/repos/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ORGANIZATION&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;REPO&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/pulls/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;pr_number&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
      &lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;node_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.node_id'&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="nv"&gt;$out&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;contains_label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;jq &lt;span class="s2"&gt;"any(.labels[].name == &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;LABEL&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;; .)"&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="nv"&gt;$out&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;comments_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s2"&gt;".comments_url"&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="nv"&gt;$out&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Check if the PR contains the label we want&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$contains_label&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"true"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt;
  &lt;span class="c"&gt;# Continued below&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;Note that we use &lt;a href="https://stedolan.github.io/jq/" rel="noopener noreferrer"&gt;&lt;code&gt;jq&lt;/code&gt;&lt;/a&gt; to simplify parsing of the JSON body returned by the GitHub API. This needs to be installed on the workers that will run this Workflow.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If the label exists on the PR, then we can mark it as "Ready for review". This operation only exists in GitHub's &lt;a href="https://docs.github.com/en/graphql" rel="noopener noreferrer"&gt;GraphQL API&lt;/a&gt;, hence the different request. This is where we make use of the previously retrieved &lt;code&gt;node_id&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

&lt;span class="c"&gt;# Mark the PR as ready for review&lt;/span&gt;
curl &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--fail&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--silent&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--show-error&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--header&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--header&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: token &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;TOKEN&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--request&lt;/span&gt; &lt;span class="s2"&gt;"POST"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--data&lt;/span&gt; &lt;span class="s2"&gt;"{ &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;query&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;mutation { markPullRequestReadyForReview(input: { pullRequestId: &lt;/span&gt;&lt;span class="se"&gt;\\\"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;node_id&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\\\"&lt;/span&gt;&lt;span class="s2"&gt; }) { pullRequest { id } } }&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; }"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--url&lt;/span&gt; https://api.github.com/graphql


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Delete the label to prevent this script from running again for this PR:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

&lt;span class="c"&gt;# Remove the label&lt;/span&gt;
curl &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--request&lt;/span&gt; &lt;span class="s2"&gt;"DELETE"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--header&lt;/span&gt; &lt;span class="s2"&gt;"Accept: application/vnd.github.v3+json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--header&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: token &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;TOKEN&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--url&lt;/span&gt; &lt;span class="s2"&gt;"https://api.github.com/repos/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ORGANIZATION&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;REPO&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/issues/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;pr_number&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/labels/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;LABEL&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Finally, we want to find which reviewers to assign to this PR. To do this, we fetch all comments on the PR and use a regex to find a comment matching the &lt;code&gt;autoready-reviewers:&lt;/code&gt; format we defined:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

&lt;span class="c"&gt;# Get the comments on the PR&lt;/span&gt;
&lt;span class="nv"&gt;comments_out&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;curl &lt;span class="se"&gt;\&lt;/span&gt;
                &lt;span class="nt"&gt;--fail&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
                &lt;span class="nt"&gt;--silent&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
                &lt;span class="nt"&gt;--show-error&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
                &lt;span class="nt"&gt;--header&lt;/span&gt; &lt;span class="s2"&gt;"Accept: application/vnd.github.v3+json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
                &lt;span class="nt"&gt;--header&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: token &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;TOKEN&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
                &lt;span class="nt"&gt;--request&lt;/span&gt; &lt;span class="s2"&gt;"GET"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
                &lt;span class="nt"&gt;--url&lt;/span&gt; &lt;span class="nv"&gt;$comments_url&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Look for a comment matching the 'autoready-reviewers: ' pattern&lt;/span&gt;
&lt;span class="c"&gt;# If found, assign the mentioned reviewers to review this PR&lt;/span&gt;
jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s2"&gt;".[].body"&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="nv"&gt;$comments_out&lt;/span&gt; | &lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="nv"&gt;IFS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;''&lt;/span&gt; &lt;span class="nb"&gt;read &lt;/span&gt;comment&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  if&lt;/span&gt; &lt;span class="o"&gt;[[&lt;/span&gt; &lt;span class="nv"&gt;$comment&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;~ autoready-reviewers:[[:space:]]&lt;span class="o"&gt;([&lt;/span&gt;a-zA-Z0-9,&lt;span class="se"&gt;\-\/&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;+&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;]]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nv"&gt;all_reviewers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;BASH_REMATCH&lt;/span&gt;&lt;span class="p"&gt;[1]&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="c"&gt;# Get the first matching group of the regex (the comma-separated list of reviewers)&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Using this list of reviewers, we differentiate between teams (e.g. &lt;code&gt;potloc/devs&lt;/code&gt;) and individuals to assign by looking for the &lt;code&gt;/&lt;/code&gt; character:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

&lt;span class="c"&gt;# Split the reviewers between teams and individuals&lt;/span&gt;
&lt;span class="nv"&gt;reviewers_array&lt;/span&gt;&lt;span class="o"&gt;=()&lt;/span&gt;
&lt;span class="nv"&gt;team_reviewers_array&lt;/span&gt;&lt;span class="o"&gt;=()&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;reviewer &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$all_reviewers&lt;/span&gt; | &lt;span class="nb"&gt;tr&lt;/span&gt; &lt;span class="s2"&gt;","&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  if&lt;/span&gt; &lt;span class="o"&gt;[[&lt;/span&gt; &lt;span class="nv"&gt;$reviewer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;~ &lt;span class="o"&gt;[&lt;/span&gt;a-zA-Z0-9,&lt;span class="se"&gt;\-&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;+&lt;span class="se"&gt;\/&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;a-zA-Z0-9,&lt;span class="se"&gt;\-&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;+ &lt;span class="o"&gt;]]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt;
    &lt;span class="c"&gt;# In the case of a team reviewer, only take the part of the username after the '/':&lt;/span&gt;
    &lt;span class="nv"&gt;slug_array&lt;/span&gt;&lt;span class="o"&gt;=(&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;reviewer&lt;/span&gt;&lt;span class="p"&gt;//\// &lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="nv"&gt;team_slug&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;slug_array&lt;/span&gt;&lt;span class="p"&gt;[1]&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;
    team_reviewers_array+&lt;span class="o"&gt;=(&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="nv"&gt;$team_slug&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;else
    &lt;/span&gt;reviewers_array+&lt;span class="o"&gt;=(&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="nv"&gt;$reviewer&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;fi
done&lt;/span&gt;

&lt;span class="c"&gt;# Join the array elements into a single comma-separated string:&lt;/span&gt;
&lt;span class="nv"&gt;reviewers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nv"&gt;IFS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;, &lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;reviewers_array&lt;/span&gt;&lt;span class="p"&gt;[*]&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;team_reviewers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nv"&gt;IFS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;, &lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;team_reviewers_array&lt;/span&gt;&lt;span class="p"&gt;[*]&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The very last step is to make the API call to assign these individuals and teams as reviewers to the PR:&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

&lt;span class="c"&gt;# Assign reviewers&lt;/span&gt;
curl &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--fail&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--silent&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--show-error&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; /dev/null &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--header&lt;/span&gt; &lt;span class="s2"&gt;"Accept: application/vnd.github.v3+json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--header&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: token &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;TOKEN&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--request&lt;/span&gt; &lt;span class="s2"&gt;"POST"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--url&lt;/span&gt; &lt;span class="s2"&gt;"https://api.github.com/repos/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ORGANIZATION&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;REPO&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/pulls/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;pr_number&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/requested_reviewers"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--data&lt;/span&gt; &lt;span class="s2"&gt;"{&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;reviewers&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;:[&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;reviewers&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;], &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;team_reviewers&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;:[&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;team_reviewers&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;]}"&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;And that is it! Now, to use this tool we can put the &lt;code&gt;autoready&lt;/code&gt; label on a draft pull request and write a comment in the form &lt;code&gt;autoready-reviewers: reviewer1,reviewer2,organization/team1&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;In practice, at Potloc, a small in-house helper tool performs these steps for us, using the &lt;a href="https://cli.github.com/" rel="noopener noreferrer"&gt;GitHub CLI&lt;/a&gt; and &lt;a href="https://github.com/piotrmurach/tty-prompt" rel="noopener noreferrer"&gt;&lt;code&gt;tty-prompt&lt;/code&gt;&lt;/a&gt; to ease the selection of reviewers/teams and the formatting of this comment.&lt;/p&gt;

&lt;p&gt;And this is what it looks like on GitHub's interface!&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F36rnqut08bzusrjv7tzw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F36rnqut08bzusrjv7tzw.png" alt="GitHub interface flow"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;&lt;em&gt;Interested in what we do at Potloc? Come join us! &lt;a href="https://jobs.lever.co/Potloc?team=Product%20and%20Dev" rel="noopener noreferrer"&gt;We are hiring&lt;/a&gt;&lt;/em&gt; 🚀&lt;/p&gt;

&lt;p&gt;Full code:&lt;br&gt;
&lt;a href="https://gist.github.com/jeromepl/02e70f3ea4a4e8103da6f96f14eb213c" rel="noopener noreferrer"&gt;https://gist.github.com/jeromepl/02e70f3ea4a4e8103da6f96f14eb213c&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devops</category>
      <category>github</category>
    </item>
    <item>
      <title>Optimally Taking Out Extra Survey Respondents</title>
      <dc:creator>Jérôme Parent-Lévesque</dc:creator>
      <pubDate>Tue, 05 Oct 2021 17:54:34 +0000</pubDate>
      <link>https://dev.to/potloc/optimally-taking-out-extra-survey-respondents-c30</link>
      <guid>https://dev.to/potloc/optimally-taking-out-extra-survey-respondents-c30</guid>
      <description>&lt;p&gt;Sometimes when analysing the results of a survey, one needs to remove some respondents from their sample. This is something we do fairly commonly at &lt;a href="https://www.potloc.com/"&gt;Potloc&lt;/a&gt; in order to obtain a more representative sample of the target population in our surveys. In other words, we use this as a way of performing &lt;em&gt;stratified sampling&lt;/em&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;We use a system of &lt;em&gt;quotas&lt;/em&gt; to keep track of every agreement on the respondent sample we make with our clients. We have three types of quotas, each corresponding to a different way of assessing whether their target is met or not.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;To match targets &lt;em&gt;exactly&lt;/em&gt; (as we want for stratified sampling) we use one of these quota types: &lt;strong&gt;the &lt;code&gt;strict&lt;/code&gt; quota type&lt;/strong&gt;. For example, our clients might want &lt;em&gt;exactly&lt;/em&gt; 50 respondents who work as electricians. Whether we are one respondent short of 50 or one over in this category, the quota is not met.&lt;/p&gt;

&lt;p&gt;The second type of quota we use is &lt;strong&gt;the &lt;code&gt;minimum&lt;/code&gt; type&lt;/strong&gt;. This type, as the name suggests, simply indicates that we must have at least as many respondents of a specific category as the target number.&lt;/p&gt;

&lt;p&gt;The third and final type of quota is &lt;strong&gt;the &lt;code&gt;weighted&lt;/code&gt; type&lt;/strong&gt;. As we often use a weighting process to obtain a more representative sample of our population, we make sure to communicate with our clients where survey responses may be weighted. This communication in turn gets converted into quotas of type &lt;code&gt;weighted&lt;/code&gt; which behave similarly to &lt;code&gt;minimum&lt;/code&gt; quotas, but with more flexibility. The targets don't need to be matched exactly and will instead be achieved through an independent weighting process (don't worry, this will be explained in more detail in Step 2 below). The "minimum" for this type of quota is (arbitrarily) set to 50% of the target as a way to limit the scale of the weights (this way weights should rarely be more than 2).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ux2a8ojg41hpavjpxah.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ux2a8ojg41hpavjpxah.png" alt="quotas" width="800" height="512"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The image above shows an example of a combination of quotas we could have. Here, we want to end up with a minimum of one respondent who has a cat, exactly one respondent who is a doctor, and we want &lt;strong&gt;after weighting&lt;/strong&gt; to have one &lt;em&gt;effective&lt;/em&gt; respondent whose name contains the letter 'a' and two &lt;em&gt;effective&lt;/em&gt; respondents whose name is shorter than 7 letters.&lt;/p&gt;

&lt;p&gt;Imagine now that we have received the following responses to our survey:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvksz4qd8lg4mtn2bjwlr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvksz4qd8lg4mtn2bjwlr.png" alt="respondents" width="800" height="426"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this initial state, we have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1 respondent who has a cat (Alice)&lt;/li&gt;
&lt;li&gt;2 doctors (Bob and Catherine)&lt;/li&gt;
&lt;li&gt;2 respondents whose name contains the letter 'a' (Alice and Catherine)&lt;/li&gt;
&lt;li&gt;2 respondents whose name is shorter than 7 letters (Alice and Bob)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The minimum quota is therefore satisfied (but Alice cannot be removed without breaking it) and there is one doctor too many. For both weighted quotas, we have at least 50% of the target number of respondents. As we will see later, the weighted quotas will be useful in determining the optimal respondents to take out.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;We will keep referring to these quotas and respondents throughout this article to provide a practical example of how we select respondents to take out.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We set out to find the &lt;em&gt;optimal&lt;/em&gt; selection of respondents to take out given a set of quotas such as this one. Below is the full step-by-step explanation of the algorithm we use to perform this and an example of how it is applied to this fictional set of quotas and respondents.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1
&lt;/h2&gt;

&lt;p&gt;We first identified that by determining which respondents belonged to which quotas, we could split the respondents into 3 different categories:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Respondents that cannot be taken out&lt;/strong&gt; are respondents that belong to quotas for which the target is not exceeded. For example, our &lt;code&gt;minimum&lt;/code&gt; quota "has a cat" has a target of 1 and only Alice fits into this category. Therefore, Alice cannot be taken out as otherwise the "has a cat" quota would be broken. The same goes for quotas with fewer respondents than the target, for example if the target was to have 2 respondents who own a cat.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Respondents that should be taken out as a priority&lt;/strong&gt; are respondents that belong specifically to a &lt;code&gt;strict&lt;/code&gt; quota for which the target is exceeded. Since for this type of quota we want to end up with exactly the target number of respondents, we have to take out respondents belonging to this quota until that target is matched. This group takes priority over the &lt;em&gt;Respondents that cannot be taken out&lt;/em&gt; as we prioritise taking out respondents in exceeded strict quotas until those quotas are satisfied.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Respondents that may be taken out&lt;/strong&gt; are all remaining respondents. These respondents may or may not belong to any quota. If they do, then that quota's target has to be exceeded — otherwise they would be in the &lt;em&gt;Respondents that cannot be taken out&lt;/em&gt; category. Note that these respondents logically cannot belong to any &lt;code&gt;strict&lt;/code&gt; quota since those belonging to this type of quota must fit in one of the first 2 categories.&lt;/p&gt;

&lt;p&gt;Going back to our example, our three respondents would fall into the following groups:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdt49f1dpj3ka5vaga5uh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdt49f1dpj3ka5vaga5uh.png" alt="buckets" width="800" height="325"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;The &lt;em&gt;respondents that should be taken out as a priority&lt;/em&gt; group includes both doctors (Bob and Catherine) as there is one too many to satisfy the strict quota. Alice cannot be taken out because she is the only respondent who has a cat. The last group is empty as all respondents already belong to other groups.&lt;/p&gt;
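
&lt;p&gt;As a rough illustration, the bucketing rules above could be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a reference implementation: the &lt;code&gt;Quota&lt;/code&gt; class, the &lt;code&gt;categorise&lt;/code&gt; function and the bucket names are hypothetical, and the 50% floor for &lt;code&gt;weighted&lt;/code&gt; quotas follows the description of that quota type given earlier.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quota:
    name: str
    kind: str           # "strict", "minimum" or "weighted"
    target: int
    members: frozenset  # names of the respondents matching this quota


def floor_of(quota):
    # Assumption: a weighted quota only needs 50% of its target before weighting.
    return quota.target / 2 if quota.kind == "weighted" else quota.target


def categorise(respondents, quotas):
    remaining = set(respondents)
    counts = {q.name: len(q.members.intersection(remaining)) for q in quotas}
    buckets = {"priority": [], "locked": [], "optional": []}
    for name in respondents:
        mine = [q for q in quotas if name in q.members]
        if any(q.kind == "strict" and counts[q.name] > q.target for q in mine):
            buckets["priority"].append(name)  # belongs to an exceeded strict quota
        elif any(not counts[q.name] > floor_of(q) for q in mine):
            buckets["locked"].append(name)    # removing them would break a quota
        else:
            buckets["optional"].append(name)  # may be taken out
    return buckets


quotas = [
    Quota("has a cat", "minimum", 1, frozenset({"Alice"})),
    Quota("is a doctor", "strict", 1, frozenset({"Bob", "Catherine"})),
    Quota("name contains 'a'", "weighted", 1, frozenset({"Alice", "Catherine"})),
    Quota("name shorter than 7 letters", "weighted", 2, frozenset({"Alice", "Bob"})),
]
buckets = categorise(["Alice", "Bob", "Catherine"], quotas)
print(buckets)
# {'priority': ['Bob', 'Catherine'], 'locked': ['Alice'], 'optional': []}
```

&lt;p&gt;The exceeded &lt;code&gt;strict&lt;/code&gt; quota is checked first, which mirrors the rule that the priority group wins over the group that cannot be taken out.&lt;/p&gt;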

&lt;h2&gt;
  
  
  Step 2
&lt;/h2&gt;

&lt;p&gt;Now that we have a categorisation of each respondent, we are almost ready to start taking out respondents. However, since our objective is to &lt;em&gt;optimally&lt;/em&gt; take out respondents, we need to compute one more piece of data related to &lt;code&gt;weighted&lt;/code&gt;-type quotas.&lt;/p&gt;

&lt;p&gt;First, we need to define a bit better what we mean by &lt;em&gt;optimally&lt;/em&gt; here.&lt;br&gt;
Our survey results are usually calculated on &lt;em&gt;weighted&lt;/em&gt; data in order to better match the target population demographics. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;In other words, as part of our survey workflow, we compute a weight for each respondent and use it as a multiplicative factor to scale the "importance" of each survey response. This is a process called &lt;em&gt;weighting&lt;/em&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The weights can be interpreted as a measure of the quality of our respondent sample by looking at their distribution. The further away from the value of 1 the weights are, the worse the quality. Indeed, a small weight indicates that we have too many similar respondents and a large weight indicates that we are missing respondents with similar characteristics.&lt;br&gt;
For more details on the weighting process I invite you to read &lt;a href="https://dev.to/potloc/generalized-raking-for-survey-weighting-2d1d"&gt;my previous blog post on &lt;em&gt;Generalized Raking&lt;/em&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Thus, when taking out extra respondents, we would like to ensure that our weighting quality will be unaffected. &lt;strong&gt;This is the key to our notion of &lt;em&gt;optimality&lt;/em&gt; — we want not only to satisfy all quotas but also to obtain the highest possible weighting quality as a result.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To achieve this, we compute weights for each respondent based on the targets of the &lt;code&gt;weighted&lt;/code&gt; quotas. Using a &lt;a href="https://dev.to/potloc/generalized-raking-for-survey-weighting-2d1d"&gt;raked weighting algorithm&lt;/a&gt;, we use all weighted quota numbers as "targets" to obtain respondent weights.&lt;/p&gt;

&lt;p&gt;In our example, we obtain the following weights by using the targets from the two weighted quotas:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxr3r0k2xyw44ea3e1wbm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxr3r0k2xyw44ea3e1wbm.png" alt="weights" width="800" height="429"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Notice that by multiplying each respondent's (numerical) answer by their weight, the weighted quota targets are matched perfectly! The count of respondents whose name contains the letter 'a' becomes 1 (from 2) as it is now the sum of 0.59 and 0.41. Meanwhile, the count of respondents whose name is shorter than 7 letters stays 2, although the weight of each respondent differs.&lt;/p&gt;
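
&lt;p&gt;This arithmetic can be checked with a few lines of Python. The weights are the ones from the example (0.59 for Alice, 1.41 for Bob, 0.41 for Catherine); the &lt;code&gt;effective_count&lt;/code&gt; helper is a hypothetical name used only for this illustration.&lt;/p&gt;

```python
weights = {"Alice": 0.59, "Bob": 1.41, "Catherine": 0.41}

quota_members = {
    "name contains 'a'": ["Alice", "Catherine"],
    "name shorter than 7 letters": ["Alice", "Bob"],
}


def effective_count(members, weights):
    # The weighted count of a quota is the sum of its members' weights.
    return sum(weights[name] for name in members)


effective = {
    quota: round(effective_count(members, weights), 2)
    for quota, members in quota_members.items()
}
print(effective)
```

&lt;p&gt;The two sums come out at 1 and 2, which are exactly the targets of the two &lt;code&gt;weighted&lt;/code&gt; quotas.&lt;/p&gt;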

&lt;p&gt;In the next step, we will be removing respondents with the smallest weights first whenever we cannot decide who to take out!&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3
&lt;/h2&gt;

&lt;p&gt;Each of our respondents now belongs to one of the 3 categories presented in Step 1 and has a weight resulting from the raked weighting computation from Step 2. It is now time to start taking out respondents.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The core of the strategy here is to take out respondents &lt;strong&gt;one-by-one&lt;/strong&gt;. After each respondent that is taken out, our quotas and weighting need to be updated, meaning that steps 1 and 2 need to be performed again! We therefore perform this step in a loop where in each iteration we recompute the first 2 steps before choosing and taking out 1 respondent.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This respondent is chosen according to the following priority list:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Select a pool of respondents to pick from:

&lt;ul&gt;
&lt;li&gt;If there are any respondents in the &lt;em&gt;Respondents that should be taken out as a priority&lt;/em&gt; category, then limit our selection to this group only&lt;/li&gt;
&lt;li&gt;Otherwise, if there are any &lt;em&gt;Respondents that may be taken out&lt;/em&gt;, select this group&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;From this pool, select the optimal respondent to be taken out:

&lt;ul&gt;
&lt;li&gt;The &lt;em&gt;optimal&lt;/em&gt; respondent corresponds to the respondent with the &lt;strong&gt;smallest&lt;/strong&gt; weight, since a small weight indicates that we have many similar respondents&lt;/li&gt;
&lt;li&gt;In the case of a tie, or if there are no &lt;code&gt;weighted&lt;/code&gt; quotas, we remove the last respondent to have answered the survey. (&lt;em&gt;Note: the statistically correct thing to do here would be to remove a random respondent from the pool, but we choose this approach as it is deterministic: we can re-run the algorithm and the selected respondents will be the same. Additionally, this replicates the behaviour of traditional sampling tools that have quota-stops&lt;/em&gt;)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;Take out the selected respondent and repeat from Step 1 until all &lt;code&gt;strict&lt;/code&gt; quotas are met!&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here's how this would play out in our fictional example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We have 2 respondents in the &lt;em&gt;Respondents that should be taken out as a priority&lt;/em&gt; category (Bob and Catherine) and thus only these respondents are taken into consideration&lt;/li&gt;
&lt;li&gt;To determine who to remove from these two respondents, we take a look at their data:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5saox2kwrjoxu683v5hp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5saox2kwrjoxu683v5hp.png" alt="decision-respondents" width="800" height="311"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Since Catherine has the smallest weight (0.41 vs. 1.41), she is taken out. Intuitively, this makes sense as we had one respondent too many whose name contains the letter 'a' to satisfy the weighted quota without even applying a weighting.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We are now left with one respondent whose name contains the letter 'a' and two respondents whose name is shorter than 7 letters, meaning that our final weights will be exactly 1 — the optimal value for weights!&lt;/p&gt;

&lt;p&gt;Additionally, now that respondent "Catherine" has been taken out, all of our quotas are satisfied and we can stop the algorithm here.&lt;/p&gt;
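
&lt;p&gt;The selection rule from this step can be sketched as a small Python function. This is an illustration under assumptions rather than a definitive implementation: &lt;code&gt;pick_respondent&lt;/code&gt; and the bucket names are hypothetical, and the order in which the three respondents answered is assumed for the sake of the tie-break.&lt;/p&gt;

```python
def pick_respondent(buckets, weights, answer_order):
    # Priority respondents first, otherwise the ones that may be taken out.
    pool = buckets["priority"] or buckets["optional"]
    if not pool:
        return None  # nobody can be taken out
    # Smallest weight wins; on a tie, the last respondent to have answered.
    return min(
        pool,
        key=lambda name: (weights.get(name, 1.0), -answer_order.index(name)),
    )


buckets = {"priority": ["Bob", "Catherine"], "locked": ["Alice"], "optional": []}
weights = {"Alice": 0.59, "Bob": 1.41, "Catherine": 0.41}
answer_order = ["Alice", "Bob", "Catherine"]  # assumed order of answers

chosen = pick_respondent(buckets, weights, answer_order)
print(chosen)  # Catherine
```

&lt;p&gt;In the full loop, the buckets and weights would be recomputed after each removal before calling this function again. When there are no &lt;code&gt;weighted&lt;/code&gt; quotas, passing an empty &lt;code&gt;weights&lt;/code&gt; dictionary makes every respondent tie at 1.0, so the most recent respondent in the pool is picked.&lt;/p&gt;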

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Using this process, we are able to remove respondents so as to match our quotas as closely as we can, while also leading to a better survey weighting. Indeed, since we always remove respondents with the smallest weights, our weighting gets progressively better as the minimum weight gets closer and closer to 1 (the optimal value). This means that the final data presented for this survey, after the weighting step, will be more representative of the target population, a win for both Potloc and our clients!&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Interested in what we do at Potloc? Come join us! &lt;a href="https://jobs.lever.co/Potloc?team=Product%20and%20Dev"&gt;We are hiring&lt;/a&gt; 🚀&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Appendix - The case of multiple overlapping &lt;code&gt;strict&lt;/code&gt; quotas
&lt;/h3&gt;

&lt;p&gt;We might sometimes have respondents that correspond to multiple different &lt;code&gt;strict&lt;/code&gt; quotas. In this scenario, it is more complicated to select the optimal respondents to take out, as it is not always obvious what the smallest possible set of respondents is that needs to be removed in order to satisfy all such quotas. For example, we might take out a respondent (let's call them respondent A) because they belong to a &lt;code&gt;strict&lt;/code&gt; quota that has exceeded its target. We might then take out other respondents that also belong to this quota because they belong to another &lt;code&gt;strict&lt;/code&gt; quota that was exceeded as well. If the first quota had by then met its target exactly, these later removals push it below its target. In this scenario, respondent A can be reinstated, as the &lt;code&gt;strict&lt;/code&gt; quota they belonged to is now under its target.&lt;/p&gt;

&lt;p&gt;To alleviate this problem while avoiding the complicated and expensive decision processes from the field of operations research, we employ two mechanisms.&lt;/p&gt;

&lt;p&gt;First, we try to more optimally pick which &lt;code&gt;strict&lt;/code&gt; quota respondents to take out first. To do this, we also consider the following factors:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Whether the respondent can be disqualified or not (if all of its quotas are exceeded)&lt;/li&gt;
&lt;li&gt;The number of exceeded &lt;code&gt;strict&lt;/code&gt; quotas a respondent is a part of (more = higher priority)&lt;/li&gt;
&lt;li&gt;The total number of &lt;code&gt;strict&lt;/code&gt; quotas a respondent is a part of (fewer = higher priority)

&lt;ul&gt;
&lt;li&gt;This is used to minimise the impact of taking out respondents on other quotas&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;The minimum difference between the current count and the target count of &lt;code&gt;strict&lt;/code&gt; quotas that are exceeding their target (bigger = higher priority, as there is more room to remove respondents)&lt;/li&gt;
&lt;/ul&gt;
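
&lt;p&gt;One way to picture these factors is as a sort key. The sketch below is only an assumption about how they might be combined (applied in the order listed, each one breaking ties in the previous one, with removable respondents first); the &lt;code&gt;Candidate&lt;/code&gt; fields, the &lt;code&gt;removal_order&lt;/code&gt; function and the sample values are hypothetical.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    removable: bool       # all of the respondent's quotas are exceeded
    exceeded_strict: int  # number of exceeded strict quotas they belong to
    total_strict: int     # total number of strict quotas they belong to
    min_excess: int       # smallest (count - target) among those exceeded quotas


def removal_order(candidates):
    # Assumed lexicographic ordering: removable first, then more exceeded
    # strict quotas, then fewer strict quotas overall, then a bigger excess.
    return sorted(
        candidates,
        key=lambda c: (not c.removable, -c.exceeded_strict, c.total_strict, -c.min_excess),
    )


candidates = [
    Candidate("r1", True, 2, 2, 1),
    Candidate("r2", True, 1, 1, 3),
    Candidate("r3", False, 2, 3, 2),
    Candidate("r4", True, 2, 2, 2),
]
ordered_names = [c.name for c in removal_order(candidates)]
print(ordered_names)  # ['r4', 'r1', 'r2', 'r3']
```

&lt;p&gt;The respondent weight from Step 2 would then only be needed to break any ties that remain after these factors.&lt;/p&gt;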

&lt;p&gt;Second, we add a final step at the end of the process in which we &lt;em&gt;restore&lt;/em&gt; respondents that can be restored without breaking any quota. This solves the issue highlighted in the example above.&lt;/p&gt;

&lt;p&gt;Using these two approximations we are able to get a result that is close to optimal, for a minimal cost.&lt;/p&gt;

</description>
      <category>survey</category>
      <category>sampling</category>
      <category>statistics</category>
    </item>
    <item>
      <title>Generalized Raking for Survey Weighting</title>
      <dc:creator>Jérôme Parent-Lévesque</dc:creator>
      <pubDate>Tue, 29 Jun 2021 15:28:44 +0000</pubDate>
      <link>https://dev.to/potloc/generalized-raking-for-survey-weighting-2d1d</link>
      <guid>https://dev.to/potloc/generalized-raking-for-survey-weighting-2d1d</guid>
      <description>&lt;p&gt;In the world of surveys, it is very common that our acquired responses need to be weighted in order to achieve a sample that is &lt;em&gt;representative&lt;/em&gt; of some target population. This process of &lt;em&gt;weighting&lt;/em&gt; simply consists of assigning a &lt;em&gt;weight&lt;/em&gt; (a.k.a. &lt;em&gt;factor&lt;/em&gt;) to each respondent, and calculating all survey results as a weighted sum of respondents.&lt;/p&gt;

&lt;p&gt;For example, we might have surveyed 100 male respondents and 150 female respondents but were targeting a male / female ratio of 48% / 52%. In this simple case, we could achieve the target ratio by weighting the male responses by a factor of &lt;code&gt;0.48 / (100 / (100 + 150)) = 1.2&lt;/code&gt; and weighting the female responses by &lt;code&gt;0.52 / (150 / (100 + 150)) = 0.867&lt;/code&gt;.&lt;br&gt;
The technical term for this method of computing weights is &lt;em&gt;Post-Stratification&lt;/em&gt;.&lt;/p&gt;
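
&lt;p&gt;As a quick check of the arithmetic above, here is a minimal sketch in Python with &lt;code&gt;numpy&lt;/code&gt;. The variable names are ours and purely illustrative; the weights should come out at roughly 1.2 and 0.867:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np

# Observed sample counts and target shares from the example above.
counts = np.array([100.0, 150.0])   # male, female respondents
targets = np.array([0.48, 0.52])    # desired male / female shares

# Post-stratification: target share divided by observed share.
observed = counts / counts.sum()
weights = targets / observed

# Weighted shares after applying the weights (should match the targets).
weighted_counts = weights * counts
weighted_shares = weighted_counts / weighted_counts.sum()

print(weights)
print(weighted_shares)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;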

&lt;p&gt;However, in a more complex scenario, where we have &lt;em&gt;many&lt;/em&gt; different measurable demographic targets, how can we determine weights for all the survey respondents?&lt;/p&gt;
&lt;h2&gt;
  
  
  Raking
&lt;/h2&gt;

&lt;p&gt;At Potloc, it is very common that our clients desire survey populations matching a lot of such targets. For example, we might have targets looking like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;42% male&lt;/li&gt;
&lt;li&gt;58% female&lt;/li&gt;
&lt;li&gt;20% students&lt;/li&gt;
&lt;li&gt;80% non-students&lt;/li&gt;
&lt;li&gt;15% dog owners&lt;/li&gt;
&lt;li&gt;...&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this setting, weights cannot be calculated using a simple ratio as in the male/female example shown above. Here, we instead need to rely on more involved algorithms, notably a process called &lt;strong&gt;&lt;em&gt;Raking&lt;/em&gt;&lt;/strong&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  Iterative Proportional Fitting
&lt;/h3&gt;

&lt;p&gt;One common approach to finding weights that satisfy our demographic targets is &lt;em&gt;Iterative Proportional Fitting&lt;/em&gt;. Typically in the industry, when the term "raking" is used, it refers to this algorithm. In this method, the weights of the respondents are computed &lt;strong&gt;for a single target at a time&lt;/strong&gt; using Post-Stratification. By cycling through the targets in turn and repeating the whole cycle a few times, the weights end up converging to values that satisfy our targets.&lt;/p&gt;
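
&lt;p&gt;To make the idea concrete, here is a minimal sketch of Iterative Proportional Fitting in Python with &lt;code&gt;numpy&lt;/code&gt;. The function name &lt;code&gt;ipf&lt;/code&gt;, the fixed number of rounds and the toy data are our own assumptions for illustration, and the sketch assumes every target category contains at least one respondent:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np

def ipf(X, T, n_rounds=50):
    # X: (num_respondents, num_targets) binary membership matrix
    # T: target count for each category, in the same column order as X
    w = np.ones(X.shape[0])
    for _ in range(n_rounds):
        for j in range(X.shape[1]):
            members = X[:, j] == 1
            # Post-stratify on category j alone: rescale its members
            # so that their weights sum to the target T[j].
            w[members] *= T[j] / w[members].sum()
    return w

# Toy sample of 10 respondents.
# Columns: male, female, student, non-student.
X = np.array(
    [[1, 0, 1, 0]] * 2    # male students
    + [[1, 0, 0, 1]] * 2  # male non-students
    + [[0, 1, 1, 0]] * 1  # female students
    + [[0, 1, 0, 1]] * 5  # female non-students
)
# 42% male, 58% female, 20% students, 80% non-students of 10 respondents.
T = np.array([4.2, 5.8, 2.0, 8.0])

w = ipf(X, T)
print(X.T @ w)  # weighted counts per category, to compare against T
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Each inner step fixes one category and slightly disturbs the others, which is why the cycle has to be repeated until the weighted counts settle.&lt;/p&gt;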

&lt;p&gt;Great! Problem solved!&lt;/p&gt;

&lt;p&gt;...but what if we could do even better? 🤔&lt;/p&gt;
&lt;h3&gt;
  
  
  Generalized Raking
&lt;/h3&gt;

&lt;p&gt;Beyond satisfying the demographic targets, the most desirable property for the weights is that they should be as close as possible to &lt;em&gt;1&lt;/em&gt;. Indeed, very large weights mean that those respondents' responses will count for a lot more than the "average" respondent in our survey results. For example, a respondent with a weight of 10 will count for 10 times more than the average respondent, and 100 times more than a respondent with a weight of 0.1. Similarly, small weights mean that some responses will have very little impact on the final results.&lt;/p&gt;

&lt;p&gt;Unfortunately, Iterative Proportional Fitting does nothing to encourage weights to be close to &lt;em&gt;1&lt;/em&gt;, which leads to sub-optimal weights. This is where &lt;strong&gt;&lt;em&gt;Generalized Raking&lt;/em&gt;&lt;/strong&gt;, an algorithm introduced by &lt;em&gt;Deville et al.&lt;/em&gt; (1992), comes into play.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Note:&lt;/em&gt; This is where we get into the more mathematical part of this blog post 🤓. Don't care about this part? No worries! Simply skip to the next section!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The authors of this paper formulated the weighting problem as a constrained optimization problem where the objective is for the weights to be as close to one as possible and where the constraint is that the targets are matched. Mathematically, it looks like this:&lt;/p&gt;

&lt;p&gt;

&lt;/p&gt;
&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;arg min⁡w  G(w)    s.t.  XTw=TG(x)=x(log⁡(x)−1)+1
 \argmin_{w} \; G(w) \;\; \text{s.t.} \; X^T w = T \newline
 G(x) = x(\log(x) - 1) + 1
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mop op-limits"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;w&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="mop"&gt;&lt;span class="mord mathrm"&gt;arg&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord mathrm"&gt;min&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;G&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord mathnormal"&gt;w&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord text"&gt;&lt;span class="mord"&gt;s.t.&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;X&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;T&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;w&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span 
class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;T&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace newline"&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;G&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord mathnormal"&gt;x&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;x&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mop"&gt;lo&lt;span&gt;g&lt;/span&gt;&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord mathnormal"&gt;x&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;−&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;1&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;+&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;


&lt;p&gt;where 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;G(x)G(x)&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;G&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord mathnormal"&gt;x&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 is the &lt;em&gt;raking&lt;/em&gt; function which encourages weights to be close to &lt;em&gt;1&lt;/em&gt;, 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;ww&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;w&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 is the vector of weights, 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;TT&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;T&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 is the vector of targets (in absolute numbers, not percentages) and 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;XX&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;X&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 is the 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;(numRespondents×numTargets)(\text{numRespondents} \times \text{numTargets})&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord text"&gt;&lt;span class="mord"&gt;numRespondents&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;×&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord text"&gt;&lt;span class="mord"&gt;numTargets&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 matrix of responses. The matrix 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;XX&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;X&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 is binary where cells are filled with a '1' if the respondent belongs to the target category and '0' otherwise.&lt;/p&gt;
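
&lt;p&gt;As a rough illustration of these definitions, the matrix and target vector could be assembled as follows in Python with &lt;code&gt;numpy&lt;/code&gt;. The respondent records, the category order and the target shares are made up for this sketch:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np

# Hypothetical respondent records.
respondents = [
    {"gender": "male", "student": True},
    {"gender": "female", "student": False},
    {"gender": "female", "student": True},
    {"gender": "male", "student": False},
    {"gender": "female", "student": False},
]

# One membership test per target category, in the order used for T.
categories = [
    lambda r: r["gender"] == "male",
    lambda r: r["gender"] == "female",
    lambda r: r["student"],
    lambda r: not r["student"],
]

# Binary (numRespondents x numTargets) matrix of responses.
X = np.array([[int(belongs(r)) for belongs in categories] for r in respondents])

# Targets as absolute numbers: target share times number of respondents.
shares = np.array([0.42, 0.58, 0.20, 0.80])
T = shares * len(respondents)

# With all weights equal to 1, the weighted counts are the raw counts.
raw_counts = X.T @ np.ones(len(respondents))
print(X.shape, T, raw_counts)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;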

&lt;p&gt;In other words, this is saying that we want to optimize the weights to be as close to 1 as possible while satisfying the target constraints. This is achieved by minimizing the function 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;G(x)G(x)&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;G&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord mathnormal"&gt;x&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 which looks like this (notice the global minimum at 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;x=1x=1&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;x&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
!):&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3nsmhkkonuiknwa4h8dj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3nsmhkkonuiknwa4h8dj.png" alt="image" width="457" height="285"&gt;&lt;/a&gt; &lt;/p&gt;
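
&lt;p&gt;The shape of this function can also be checked numerically. The short sketch below (the sample points are arbitrary) evaluates the raking function at a few weights: it is exactly 0 at a weight of 1 and grows on either side, reaching about 0.67 at a weight of 0.1 and about 14.03 at a weight of 10:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np

def G(x):
    # Raking function from the formula above: x * (log(x) - 1) + 1
    return x * (np.log(x) - 1) + 1

xs = np.array([0.1, 0.5, 1.0, 2.0, 10.0])
values = G(xs)

for x, value in zip(xs, values):
    print(x, round(float(value), 4))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;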
&lt;h2&gt;
  
  
  The Generalized Raking Algorithm
&lt;/h2&gt;

&lt;p&gt;While it is possible to solve this optimization problem using general methods such as &lt;a href="https://docs.scipy.org/doc/scipy/reference/optimize.minimize-slsqp.html" rel="noopener noreferrer"&gt;Sequential Least Squares Programming&lt;/a&gt;, the authors of Generalized Raking have devised a more efficient and robust algorithm for this specific problem:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Initialize variables

&lt;ul&gt;
&lt;li&gt;A 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;(numRespondents×1)(\text{numRespondents} \times 1)&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord text"&gt;&lt;span class="mord"&gt;numRespondents&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;×&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;1&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 vector 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;ww&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;w&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 to ones&lt;/li&gt;
&lt;li&gt;A 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;(numTargets×1)(\text{numTargets} \times 1)&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord text"&gt;&lt;span class="mord"&gt;numTargets&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;×&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;1&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 vector 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;λ\lambda&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;λ&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 to zeros&lt;/li&gt;
&lt;li&gt;A 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;(numRespondents×numRespondents)(\text{numRespondents} \times \text{numRespondents})&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord text"&gt;&lt;span class="mord"&gt;numRespondents&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;×&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord text"&gt;&lt;span class="mord"&gt;numRespondents&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 square matrix 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;HH&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;H&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 to the Identity matrix&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;While the weights have not converged, repeat:

&lt;ol&gt;
&lt;li&gt;
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;λ=λ+(XTHX)−1(T−XTw)\lambda = \lambda + (X^T H X)^{-1} (T - X^T w)&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;λ&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;λ&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;+&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;X&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;T&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;H&lt;/span&gt;&lt;span class="mord mathnormal"&gt;X&lt;/span&gt;&lt;span class="mclose"&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mtight"&gt;−&lt;/span&gt;&lt;span class="mord mtight"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord mathnormal"&gt;T&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;−&lt;/span&gt;&lt;span 
class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;X&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;T&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;w&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;w=G−1(Xλ)w = G^{-1}(X \lambda)&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;w&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;G&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mtight"&gt;−&lt;/span&gt;&lt;span class="mord mtight"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord mathnormal"&gt;X&lt;/span&gt;&lt;span class="mord mathnormal"&gt;λ&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;H=diag(G−1′(Xλ))H = \text{diag}({G^{-1}}'(X \lambda))&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;H&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord text"&gt;&lt;span class="mord"&gt;diag&lt;/span&gt;&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord"&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;G&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mtight"&gt;−&lt;/span&gt;&lt;span class="mord mtight"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mtight"&gt;′&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord mathnormal"&gt;X&lt;/span&gt;&lt;span class="mord mathnormal"&gt;λ&lt;/span&gt;&lt;span class="mclose"&gt;))&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here, 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;G−1(x)G^{-1}(x)&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;G&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mtight"&gt;−&lt;/span&gt;&lt;span class="mord mtight"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord mathnormal"&gt;x&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 is the inverse of the derivative of the raking function, i.e. 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;exe^x&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;e&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;x&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
.&lt;br&gt;

&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;G−1′{G^{-1}}'&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord"&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;G&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mtight"&gt;−&lt;/span&gt;&lt;span class="mord mtight"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mtight"&gt;′&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 is its derivative, in this case also 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;exe^x&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;e&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;x&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
.&lt;/p&gt;

&lt;p&gt;The final value of 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;ww&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;w&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 corresponds to the weighting factors we are looking for!&lt;/p&gt;
&lt;h2&gt;
  
  
  Implementation
&lt;/h2&gt;

&lt;p&gt;While there are many implementations of this algorithm in R, we were not able to find one in Ruby that could play well with our codebase and be easily maintainable.&lt;br&gt;
We therefore decided to write our own and to share it here for anyone looking for something similar. We started with an implementation in Python using the popular &lt;code&gt;numpy&lt;/code&gt; library:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;raking_inverse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;d_raking_inverse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;graking&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_steps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tolerance&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1e-6&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
  &lt;span class="c1"&gt;# Based on algo in (Deville et al., 1992) explained in detail on page 37 in
&lt;/span&gt;  &lt;span class="c1"&gt;# https://orca.cf.ac.uk/109727/1/2018daviesgpphd.pdf
&lt;/span&gt;
  &lt;span class="c1"&gt;# Initialize variables - Step 1
&lt;/span&gt;  &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;
  &lt;span class="n"&gt;L&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;zeros&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# Lagrange multipliers (lambda)
&lt;/span&gt;  &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ones&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# Our weights (will get progressively updated)
&lt;/span&gt;  &lt;span class="n"&gt;H&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;eye&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;success&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;

  &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_steps&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;L&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;linalg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pinv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;H&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt; &lt;span class="c1"&gt;# Step 2.1
&lt;/span&gt;    &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;raking_inverse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;L&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="c1"&gt;# Step 2.2
&lt;/span&gt;    &lt;span class="n"&gt;H&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;diag&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;d_raking_inverse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;L&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt; &lt;span class="c1"&gt;# Step 2.3
&lt;/span&gt;
    &lt;span class="c1"&gt;# Termination condition:
&lt;/span&gt;    &lt;span class="n"&gt;loss&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;loss&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;tolerance&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;success&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
        &lt;span class="k"&gt;break&lt;/span&gt;

  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;success&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Did not converge&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Ruby Implementation
&lt;/h3&gt;

&lt;p&gt;After validating the algorithm in Python, we replicated it in Ruby. For this, we needed an equivalent to &lt;code&gt;numpy&lt;/code&gt;, which we found in &lt;a href="https://github.com/ruby-numo" rel="noopener noreferrer"&gt;Numo&lt;/a&gt;. Numo is an awesome library for vector and matrix operations, and its &lt;a href="https://github.com/ruby-numo/numo-linalg" rel="noopener noreferrer"&gt;&lt;code&gt;linalg&lt;/code&gt;&lt;/a&gt; sub-library was perfect for us since we needed to compute a matrix pseudo-inverse. This allowed us to translate the code to Ruby almost line by line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="nb"&gt;require&lt;/span&gt; &lt;span class="s2"&gt;"numo/narray"&lt;/span&gt;
&lt;span class="nb"&gt;require&lt;/span&gt; &lt;span class="s2"&gt;"numo/linalg"&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;raking_inverse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="no"&gt;Numo&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;NMath&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;d_raking_inverse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="no"&gt;Numo&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;NMath&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;graking&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;max_steps: &lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;tolerance: &lt;/span&gt;&lt;span class="mf"&gt;1e-6&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="c1"&gt;# Based on algo in (Deville et al., 1992) explained in detail on page 37 in&lt;/span&gt;
  &lt;span class="c1"&gt;# https://orca.cf.ac.uk/109727/1/2018daviesgpphd.pdf&lt;/span&gt;
  &lt;span class="c1"&gt;# x is the n x m matrix, t the vector of m targets.&lt;/span&gt;

  &lt;span class="c1"&gt;# Initialize variables - Step 1&lt;/span&gt;
  &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;shape&lt;/span&gt;
  &lt;span class="n"&gt;lambdas&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Numo&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;DFloat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;zeros&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Numo&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;DFloat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ones&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;h_matrix_diagonal&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Numo&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;DFloat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ones&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="n"&gt;success&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kp"&gt;false&lt;/span&gt;

  &lt;span class="n"&gt;max_steps&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;times&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="n"&gt;lambdas&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="no"&gt;Numo&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Linalg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pinv&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;transpose&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;h_matrix_diagonal&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;dot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;dot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;transpose&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="c1"&gt;# Step 2.1&lt;/span&gt;
    &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;raking_inverse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lambdas&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="c1"&gt;# Step 2.2&lt;/span&gt;
    &lt;span class="n"&gt;h_matrix_diagonal&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;d_raking_inverse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lambdas&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="c1"&gt;# Step 2.3&lt;/span&gt;

    &lt;span class="c1"&gt;# Termination condition:&lt;/span&gt;
    &lt;span class="n"&gt;loss&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;transpose&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;abs&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;loss&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;tolerance&lt;/span&gt;
      &lt;span class="n"&gt;success&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kp"&gt;true&lt;/span&gt;
      &lt;span class="k"&gt;break&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="no"&gt;StandardError&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"Did not converge"&lt;/span&gt; &lt;span class="k"&gt;unless&lt;/span&gt; &lt;span class="n"&gt;success&lt;/span&gt;
  &lt;span class="n"&gt;w&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You may have noticed that the code doesn't exactly match the algorithm described above, notably in steps 2.1 and 2.3. This is because we found it vastly faster with Numo to store the sparse matrix 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;HH&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;H&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 as a flat vector &lt;code&gt;h_matrix_diagonal&lt;/code&gt;, since it only contains non-zero values on the diagonal. As a result, the product 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;XTHX^T H&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;X&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;T&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;H&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 can be rewritten as &lt;code&gt;X.transpose * h_matrix_diagonal&lt;/code&gt;, making use of Numo's implicit broadcasting.&lt;/p&gt;
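
&lt;p&gt;To see why this shortcut is valid, here is a minimal sketch using plain Ruby arrays instead of Numo, so that it runs without extra gems. The helper names &lt;code&gt;mat_mul&lt;/code&gt;, &lt;code&gt;diag&lt;/code&gt; and &lt;code&gt;scale_by_diagonal&lt;/code&gt; are made up for this illustration; the last one mimics what broadcasting does when a matrix is multiplied element-wise by a vector:&lt;/p&gt;

```ruby
# Hypothetical helpers; plain Ruby arrays stand in for Numo arrays.

# Multiply two matrices given as arrays of rows.
def mat_mul(a, b)
  b_columns = b.transpose
  a.map { |row| b_columns.map { |col| row.zip(col).sum { |p, q| p * q } } }
end

# Build the full n x n diagonal matrix H from its diagonal values.
def diag(values)
  n = values.size
  Array.new(n) { |i| Array.new(n) { |j| i == j ? values[i] : 0.0 } }
end

# Scale each row of the transposed matrix element-wise by the diagonal
# vector, which is the effect of broadcasting a matrix against a vector.
def scale_by_diagonal(x_transposed, h_diagonal)
  x_transposed.map { |row| row.zip(h_diagonal).map { |value, h| value * h } }
end

x = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]] # n = 3 rows, m = 2 columns
h_diagonal = [2.0, 3.0, 4.0]

full_product = mat_mul(x.transpose, diag(h_diagonal))
broadcast_product = scale_by_diagonal(x.transpose, h_diagonal)

puts full_product == broadcast_product
```

&lt;p&gt;Both approaches give the same matrix, but the second one never builds the n x n matrix, which is mostly zeros.&lt;/p&gt;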

&lt;p&gt;In practice, we optimize this code a bit further by exiting early whenever possible (for example, if the loss becomes &lt;code&gt;NaN&lt;/code&gt;) and by accepting an optional initial value for the vector &lt;code&gt;lambdas&lt;/code&gt; when we believe we have a better starting point than the default.&lt;/p&gt;
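
&lt;p&gt;As a rough illustration of these two tweaks, the sketch below specializes the loop to a single constraint so that it fits in plain Ruby without Numo. The method name &lt;code&gt;graking_single&lt;/code&gt;, the &lt;code&gt;initial_lambda&lt;/code&gt; keyword and the choice to raise on &lt;code&gt;NaN&lt;/code&gt; are assumptions made for this example and may differ from our production code:&lt;/p&gt;

```ruby
# Illustrative single-constraint variant of the loop above, in plain Ruby.
# x: array of n values (one column of the matrix), target: desired weighted total.
def graking_single(x, target, initial_lambda: 0.0, max_steps: 500, tolerance: 1e-6)
  lambda_value = initial_lambda # warm start when a good guess is available
  weights = x.map { |xi| Math.exp(xi * lambda_value) }

  max_steps.times do
    # With exp as the inverse function, the diagonal of H equals the weights.
    curvature = x.zip(weights).sum { |xi, wi| xi * xi * wi }
    residual = target - x.zip(weights).sum { |xi, wi| xi * wi }

    lambda_value += residual / curvature                  # Step 2.1
    weights = x.map { |xi| Math.exp(xi * lambda_value) }  # Steps 2.2 and 2.3

    loss = (target - x.zip(weights).sum { |xi, wi| xi * wi }).abs / target
    # Early exit: once the loss is NaN, more iterations cannot recover.
    raise StandardError, "Diverged: loss is NaN" if loss.nan?
    return weights if (tolerance - loss).positive? # i.e. loss is below tolerance
  end

  raise StandardError, "Did not converge"
end

x = [1.0, 1.0, 1.0, 1.0]
cold_start = graking_single(x, 8.0)
warm_start = graking_single(x, 8.0, initial_lambda: Math.log(2.0))
```

&lt;p&gt;Both calls should return weights close to 2 for every respondent. The warm start simply begins next to the solution, so it needs fewer iterations.&lt;/p&gt;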

&lt;p&gt;With these few lines of code, we are now able to support complex survey weighting scenarios while having all of our code in our beautiful Ruby monolith 🎉&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Interested in what we do at Potloc? Come join us! &lt;a href="https://jobs.lever.co/Potloc?team=Engineering" rel="noopener noreferrer"&gt;We are hiring&lt;/a&gt; 🚀&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://orca.cf.ac.uk/109727/1/2018daviesgpphd.pdf" rel="noopener noreferrer"&gt;Gareth Davies' excellent PhD Thesis on the subject&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jstor.org/stable/2290793" rel="noopener noreferrer"&gt;Deville et al.'s 1992 &lt;em&gt;Generalized Raking Procedures&lt;/em&gt; paper&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ruby</category>
      <category>python</category>
      <category>statistics</category>
    </item>
  </channel>
</rss>
