<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Michael Chaney</title>
    <description>The latest articles on DEV Community by Michael Chaney (@mdchaney).</description>
    <link>https://dev.to/mdchaney</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F584888%2Fc883bbc3-b394-4931-9af9-882362fa5857.jpeg</url>
      <title>DEV Community: Michael Chaney</title>
      <link>https://dev.to/mdchaney</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mdchaney"/>
    <language>en</language>
    <item>
      <title>Handling X -&gt; Y -&gt; X Relationships in Rails</title>
      <dc:creator>Michael Chaney</dc:creator>
      <pubDate>Wed, 28 May 2025 02:18:28 +0000</pubDate>
      <link>https://dev.to/mdchaney/handling-x-y-x-relationships-in-rails-16k0</link>
      <guid>https://dev.to/mdchaney/handling-x-y-x-relationships-in-rails-16k0</guid>
      <description>&lt;p&gt;I'm creating another publishing database, and this time I have to get the relationships between publishers correct.  Up until now, I've been able to cheat a bit as the publishers that I work with tend to be in the film and TV music business and have pretty simple publishing needs.  But it's time to really model this properly.&lt;/p&gt;

&lt;p&gt;Publishers can make deals with other publishers.  There are two deals that we're specifically interested in:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Subpublishing&lt;/li&gt;
&lt;li&gt;Administration&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;There's no need to get too far into the specifics of these deals, but in general one publisher is the "assignor" and the other is the "assignee" (or "acquiror").  The agreement itself has information as well: minimally an agreement number (assigned by a performing rights organization), some sort of royalty share percentage, and a territory(s) covered by this agreement.&lt;/p&gt;

&lt;p&gt;So, the basic form of this is Publisher -&amp;gt; SubpublishingAgreement -&amp;gt; Publisher, or Publisher -&amp;gt; AdministrationAgreement -&amp;gt; Publisher.  Data-wise, these agreements are practically the same so I'll simply talk about SubpublishingAgreement.&lt;/p&gt;

&lt;p&gt;In the world of Ruby on Rails, we might use models like these:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Publisher&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;ApplicationRecord&lt;/span&gt;
  &lt;span class="n"&gt;has_many&lt;/span&gt; &lt;span class="ss"&gt;:subpublishing_agreements_as_assignor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="ss"&gt;class_name: &lt;/span&gt;&lt;span class="s2"&gt;"SubpublishingAgreement"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="ss"&gt;foreign_key: :assignor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="ss"&gt;inverse_of: :assignor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="ss"&gt;dependent: :destroy&lt;/span&gt;
  &lt;span class="n"&gt;has_many&lt;/span&gt; &lt;span class="ss"&gt;:subpublishing_agreements_as_assignee&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="ss"&gt;class_name: &lt;/span&gt;&lt;span class="s2"&gt;"SubpublishingAgreement"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="ss"&gt;foreign_key: :assignee_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="ss"&gt;inverse_of: :assignee&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="ss"&gt;dependent: :destroy&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SubpublishingAgreement&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;ApplicationRecord&lt;/span&gt;
  &lt;span class="n"&gt;belongs_to&lt;/span&gt; &lt;span class="ss"&gt;:assignor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="ss"&gt;class_name: &lt;/span&gt;&lt;span class="s1"&gt;'Publisher'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="ss"&gt;foreign_key: :assignor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="ss"&gt;inverse_of: :subpublishing_agreements_as_assignor&lt;/span&gt;
  &lt;span class="n"&gt;belongs_to&lt;/span&gt; &lt;span class="ss"&gt;:assignee&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="ss"&gt;class_name: &lt;/span&gt;&lt;span class="s1"&gt;'Publisher'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="ss"&gt;foreign_key: :assignee_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="ss"&gt;inverse_of: :subpublishing_agreements_as_assignee&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are a couple of important aspects of this to keep in mind as we program it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;This isn't a general graph where there might be loops.  Every relationship should be part of a tree with the base being the original publisher.&lt;/li&gt;
&lt;li&gt;There can be many such relationships.  It's quite possible for a publisher to have an administrator or set of administrators as well as a few subpublishers, and the subpublishers themselves may have administrators.  Each administrator and subpublisher would have a territory or set of territories that they handle.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This picture taken from the Common Works Registration functional specifications gives an idea of how this works in practice:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frfr1zxkuw5exdk6wyr8s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frfr1zxkuw5exdk6wyr8s.png" alt="Publishing Hierarchy" width="800" height="552"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This can be modeled pretty easily as shown above.  If I want to know what subpublishers a publisher has, for instance, I can simply pull them from the SubpublishingAgreement records or use a "has_many...through" relationship.&lt;/p&gt;

&lt;p&gt;From a RESTful standpoint, we have a problem.  Let's say I'm looking at a publisher record and want to know what subpublishing agreements the publisher is a party to. The issue is that the publisher may be the assignor or the assignee.&lt;/p&gt;

&lt;p&gt;Normally, I'd build a RESTful resource path like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/publishers/1/subpublishing_agreements
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But that path doesn't really capture the complete picture.  Are we looking for subpublishing agreements where Publisher 1 is the assignor, assignee, or perhaps either?&lt;/p&gt;

&lt;p&gt;There are a couple of options.  The first is to add the role into the path part for agreements:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/publishers/1/subpublishing_agreements_as_assignor
/publishers/2/subpublishing_agreements_as_assignee
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's readable and makes sense, but it's a bear to add to Rails routing.&lt;/p&gt;

&lt;p&gt;How about this?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/assignors/1/subpublishing_agreements
/assignees/2/subpublishing_agreements
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This puts the role into the path up one level.  But, it's also a bear to add to Rails routing.&lt;/p&gt;

&lt;p&gt;Let's talk about that first one.  It's nice that the nomenclature matches that from the model. But, this is a bit ugly in terms of routing.  My goal is to have a basic resource-based route.  But here's the issue: the later part of the path dictates what the earlier part references.  Let me explain that.  In this route:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/publishers/1/subpublishing_agreements_as_assignor
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I can see that Publisher 1 is the assignor.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/publishers/2/subpublishing_agreements_as_assignee
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And in this route, Publisher 2 is the assignee.  But I can't know that from the first part of the route.&lt;/p&gt;

&lt;p&gt;Ugh.&lt;/p&gt;

&lt;p&gt;Let's go down this path and see where it leads.&lt;/p&gt;

&lt;p&gt;First, how do we make routes for this?  I want for my "publisher_id" to be called "assignor_id" or "assignee_id" in the params.  I'm not proud of this, but it works:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;    &lt;span class="c1"&gt;# Nested routes under publishers for role-specific agreements&lt;/span&gt;
    &lt;span class="n"&gt;scope&lt;/span&gt; &lt;span class="s2"&gt;"/publishers"&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
      &lt;span class="c1"&gt;# Agreements where the publisher is the assignor&lt;/span&gt;
      &lt;span class="n"&gt;scope&lt;/span&gt; &lt;span class="s2"&gt;":assignor_id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;as: :publisher&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
        &lt;span class="n"&gt;resources&lt;/span&gt; &lt;span class="ss"&gt;:subpublishing_agreements&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                  &lt;span class="ss"&gt;controller: &lt;/span&gt;&lt;span class="s2"&gt;"subpublishing_agreements"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                  &lt;span class="ss"&gt;as: :subpublishing_agreements_as_assignor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                  &lt;span class="ss"&gt;path: :subpublishing_agreements_as_assignor&lt;/span&gt;
      &lt;span class="k"&gt;end&lt;/span&gt;

      &lt;span class="c1"&gt;# Agreements where the publisher is the assignee&lt;/span&gt;
      &lt;span class="n"&gt;scope&lt;/span&gt; &lt;span class="s2"&gt;":assignee_id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;as: :publisher&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
        &lt;span class="n"&gt;resources&lt;/span&gt; &lt;span class="ss"&gt;:subpublishing_agreements&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                  &lt;span class="ss"&gt;controller: &lt;/span&gt;&lt;span class="s2"&gt;"subpublishing_agreements"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                  &lt;span class="ss"&gt;as: :subpublishing_agreements_as_assignee&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                  &lt;span class="ss"&gt;path: :subpublishing_agreements_as_assignee&lt;/span&gt;
      &lt;span class="k"&gt;end&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It's difficult to really tell what this is doing, but it creates the routes shown above.&lt;/p&gt;

&lt;p&gt;There is one major problem, though.  Let's say I want to put this inside another resource.  I do, because this is a multi-tenanted application and the full path will be something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/libraries/1/publishers/1/subpublishing_agreements_as_assignor/1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;(note that I personally have no problem with long paths like this that give details of the entire relationship)&lt;/p&gt;

&lt;p&gt;Normally, we would have a path like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/libraries/1/publishers/1/subpublishing_agreements/1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Again, the issue here is that "subpublishing_agreements" is ambiguous.  But let's consider it for a minute.  If I go this route, then in the controller I need to handle the fact that the publisher might be the assignor or the assignee:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_publisher&lt;/span&gt;
  &lt;span class="vi"&gt;@publisher&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Publisher&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;:publisher_id&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;index&lt;/span&gt;
  &lt;span class="vi"&gt;@subpublishing_agreements&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="vi"&gt;@publisher&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;subpublishing_agreements_as_assignor&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="vi"&gt;@publisher&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;subpublishing_agreements_as_assignee&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Well, that's one way to do it, but note that &lt;code&gt;@subpublishing_agreements&lt;/code&gt; is an array in this case, not an ActiveRecord relation.  There's no way to build on it, sort it, whatever you might want to do in the database.  It's possible to do whatever you want in Ruby, but if you try to add &lt;code&gt;.order(:starts_on)&lt;/code&gt; or something like that it'll fail.&lt;/p&gt;

&lt;p&gt;You can make it into a regular relationship like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;  &lt;span class="vi"&gt;@subpublishing_agreements&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;SubpublishingAgreement&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"? in (assignor_id, assignee_id)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="vi"&gt;@publisher&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's possible, but really not normal.&lt;/p&gt;

&lt;p&gt;Related to that:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;  &lt;span class="vi"&gt;@subpublishing_agreements&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;SubpublishingAgreement&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;assignor_id: &lt;/span&gt;&lt;span class="vi"&gt;@publisher&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;id&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;or&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;SubpublishingAgreement&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;assignee_id: &lt;/span&gt;&lt;span class="vi"&gt;@publisher&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;id&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Technically, you can turn that into a single "where" and put your "or" inside the SQL snippet:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;  &lt;span class="vi"&gt;@subpublishing_agreements&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;SubpublishingAgreement&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"assignor_id = :publisher_id or assignee_id = :publisher_id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;publisher_id: &lt;/span&gt;&lt;span class="vi"&gt;@publisher&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, this works, kind of.  When I create multi-tenanted applications, I always restrict my queries back to the main organization model - &lt;code&gt;library&lt;/code&gt; in this case.  I can still do that, just not in the way that I'd like.  One great thing about this is that I have a relation that I can build on:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;  &lt;span class="vi"&gt;@subpublishing_agreement&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="vi"&gt;@subpublishing_agreements&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;:subpublishing_agreement_id&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The other issue arises in displaying this data.  Normally, I'd show this list of subpublishing agreements in a simple table, but there would be no need to put the parent datum in this table since it's the same for all and can be shown above.  But in this scenario the parent datum may be in one of two different places.  In that case, I can either create two tables - agreements as assignor and agreements as assignee, sort them based on this type, or at least put both data fields on each row.&lt;/p&gt;

&lt;p&gt;To get back to our way of handling it, we change the routes to show which role the "publisher" in the path is taking on.  This allows us to use some of the Rails tools at our disposal for creating routes, but it also breaks a lot of it.&lt;/p&gt;

&lt;p&gt;A big problem that I face is that I want to be able to have the library at the front of the path.  Each account may have multiple libraries, so that's important.  But with the scopes in the routes as above, putting that code within a &lt;code&gt;resources :libraries do...end&lt;/code&gt; block ends up with paths that we don't want:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;publisher_library_subpublishing_agreements_as_assignee
/publishers/:assignee_id/libraries/:library_id/subpublishing_agreements_as_assignee/:id(.:format)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's right - the &lt;code&gt;library&lt;/code&gt; ends up after the scoping.  Argh.&lt;/p&gt;

&lt;p&gt;But, let's explore this, anyway.&lt;/p&gt;

&lt;p&gt;When I create controllers like this, I like to use a helper that I wrote called &lt;code&gt;filled_path&lt;/code&gt;.  It allows me to pretty easily reuse a lot of code across different controller end points and at different paths.&lt;/p&gt;

&lt;p&gt;For instance, just looking at this particular sort of data model as an administrator I might want to view all of the publishers or just the publishers for a particular library.  It's also the case that an ordinary user may access their libraries in a similar manner, but in a path prefixed with "/my".&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/admin/publishers/1
/admin/libraries/1/publishers/1
/my/libraries/1/publishers/1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this case, there's no reason for a regular user to access a publisher outside of their library-specific path.  But, for an administrator, that might be optional.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;  &lt;span class="n"&gt;namespace&lt;/span&gt; &lt;span class="ss"&gt;:my&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="n"&gt;resources&lt;/span&gt; &lt;span class="ss"&gt;:libraries&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
      &lt;span class="n"&gt;resources&lt;/span&gt; &lt;span class="ss"&gt;:publishers&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="n"&gt;namespace&lt;/span&gt; &lt;span class="ss"&gt;:admin&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="n"&gt;resources&lt;/span&gt; &lt;span class="ss"&gt;:publishers&lt;/span&gt;
    &lt;span class="n"&gt;resources&lt;/span&gt; &lt;span class="ss"&gt;:libraries&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
      &lt;span class="n"&gt;resources&lt;/span&gt; &lt;span class="ss"&gt;:publishers&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This allows an administrator to access publishers either directly or under the libraries path.  This works out well because Rails path helpers accept an array with the path items.  For instance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:admin&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="vi"&gt;@library&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="vi"&gt;@publisher&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:my&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="vi"&gt;@library&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="vi"&gt;@publisher&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The only difference in code is the first piece.  There are a few tricks to know here.  If you want to get the "edit" path, for instance, then the action must be prepended to the array:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:edit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:my&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="vi"&gt;@library&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="vi"&gt;@publisher&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And if you want to see the &lt;code&gt;subpublishing_agreements&lt;/code&gt; for a publisher, you would append the relevant path item:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:my&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="vi"&gt;@library&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="vi"&gt;@publisher&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:subpublishing_agreements&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;My &lt;code&gt;filled_path&lt;/code&gt; helper constructs those arrays, and you can use such an array for any path in Rails, including in links or in form actions.  I'll talk a little more about filled_path later.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;  &lt;span class="n"&gt;form_for&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="ss"&gt;:my&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="vi"&gt;@library&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="vi"&gt;@publisher&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="o"&gt;...&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That works, until you break it with scopes.&lt;/p&gt;

&lt;p&gt;The fundamental issue is that scopes will allow you to construct a path, but instead of an object for &lt;code&gt;subpublishing_agreement&lt;/code&gt; you have to use the plain id.  Let me show you a bit of the controller.  Again, this is for administrative access so tying this to a particular library doesn't matter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Admin::SubpublishingAgreementsController&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;Admin&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;ApplicationController&lt;/span&gt;
  &lt;span class="n"&gt;before_action&lt;/span&gt; &lt;span class="ss"&gt;:set_library&lt;/span&gt;
  &lt;span class="n"&gt;before_action&lt;/span&gt; &lt;span class="ss"&gt;:set_assignor_or_assignee&lt;/span&gt;
  &lt;span class="n"&gt;before_action&lt;/span&gt; &lt;span class="ss"&gt;:figure_out_path_filling&lt;/span&gt;
  &lt;span class="n"&gt;before_action&lt;/span&gt; &lt;span class="ss"&gt;:set_collection&lt;/span&gt;
  &lt;span class="n"&gt;before_action&lt;/span&gt; &lt;span class="ss"&gt;:set_subpublishing_agreement&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;only: &lt;/span&gt;&lt;span class="sx"&gt;%i[ show edit update destroy ]&lt;/span&gt;

&lt;span class="o"&gt;...&lt;/span&gt;

  &lt;span class="kp"&gt;private&lt;/span&gt;
    &lt;span class="c1"&gt;# This sets @publisher as well as either @assignor or @assignee&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;set_assignor_or_assignee&lt;/span&gt;
      &lt;span class="vi"&gt;@publisher&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:assignor_id&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;present?&lt;/span&gt;
          &lt;span class="vi"&gt;@assignor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Publisher&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:assignor_id&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="k"&gt;elsif&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:assignee_id&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;present?&lt;/span&gt;
          &lt;span class="vi"&gt;@assignee&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Publisher&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:assignee_id&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="k"&gt;end&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;set_collection&lt;/span&gt;
      &lt;span class="vi"&gt;@collection&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="vi"&gt;@assignor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;present?&lt;/span&gt;
          &lt;span class="vi"&gt;@assignor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;subpublishing_agreements_as_assignor&lt;/span&gt;
        &lt;span class="k"&gt;elsif&lt;/span&gt; &lt;span class="vi"&gt;@assignee&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;present?&lt;/span&gt;
          &lt;span class="vi"&gt;@assignee&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;subpublishing_agreements_as_assignee&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;
          &lt;span class="no"&gt;SubpublishingAgreement&lt;/span&gt;
        &lt;span class="k"&gt;end&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;set_subpublishing_agreement&lt;/span&gt;
      &lt;span class="vi"&gt;@subpublishing_agreement&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="vi"&gt;@collection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;:id&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;figure_out_path_filling&lt;/span&gt;
      &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:assignor_id&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;present?&lt;/span&gt;
        &lt;span class="vi"&gt;@subpub_agreement_path_piece&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="ss"&gt;:subpublishing_agreements_as_assignor&lt;/span&gt;
      &lt;span class="k"&gt;elsif&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:assignee_id&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;present?&lt;/span&gt;
        &lt;span class="vi"&gt;@subpub_agreement_path_piece&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="ss"&gt;:subpublishing_agreements_as_assignee&lt;/span&gt;
      &lt;span class="k"&gt;else&lt;/span&gt;
        &lt;span class="vi"&gt;@subpub_agreement_path_piece&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="ss"&gt;:subpublishing_agreements&lt;/span&gt;
      &lt;span class="k"&gt;end&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;sub_pub_path_helper&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;subpub_agreement&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kp"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="vi"&gt;@publisher&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;present?&lt;/span&gt;
        &lt;span class="n"&gt;general_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:admin&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="vi"&gt;@publisher&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="vi"&gt;@subpub_agreement_path_piece&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_sym&lt;/span&gt;
        &lt;span class="k"&gt;when&lt;/span&gt; &lt;span class="ss"&gt;:new&lt;/span&gt;
          &lt;span class="n"&gt;polymorphic_path&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="ss"&gt;:new&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;general_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;when&lt;/span&gt; &lt;span class="ss"&gt;:edit&lt;/span&gt;
          &lt;span class="n"&gt;polymorphic_path&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="ss"&gt;:edit&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;general_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;id: &lt;/span&gt;&lt;span class="n"&gt;subpub_agreement&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;when&lt;/span&gt; &lt;span class="ss"&gt;:show&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:update&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:destroy&lt;/span&gt;
          &lt;span class="n"&gt;polymorphic_path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;general_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;id: &lt;/span&gt;&lt;span class="n"&gt;subpub_agreement&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;when&lt;/span&gt; &lt;span class="ss"&gt;:index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:create&lt;/span&gt;
          &lt;span class="n"&gt;polymorphic_path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;general_path&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:index&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;
          &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="s2"&gt;"Unknown action &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;end&lt;/span&gt;
      &lt;span class="k"&gt;end&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What a mess.  The real action here is in &lt;code&gt;sub_pub_path_helper&lt;/code&gt;.  Imagine that I have this path:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/admin/publishers/1/subpublishing_agreements_as_assignor/2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this case, the params will have the &lt;code&gt;assignor_id&lt;/code&gt; key.  This means that &lt;code&gt;@publisher&lt;/code&gt; and &lt;code&gt;@assignor&lt;/code&gt; will be set to Publisher 1 and the SubpublishingAgreement will be 2.  &lt;code&gt;@subpub_agreement_path_piece&lt;/code&gt; will be set to &lt;code&gt;:subpublishing_agreements_as_assignor&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Now, in &lt;code&gt;sub_pub_path_helper&lt;/code&gt; the &lt;code&gt;general_path&lt;/code&gt; will be:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:admin&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="vi"&gt;@publisher&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;subpublishing_agreements_as_assignor&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you wish to create the "edit" path for this item, the helper will prepend &lt;code&gt;:edit&lt;/code&gt; to the array, and also pass &lt;code&gt;id: 2&lt;/code&gt; to &lt;code&gt;polymorphic_path&lt;/code&gt;.  In the end, you'll get:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/admin/publishers/1/subpublishing_agreements_as_assignor/2/edit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's a lot of work to basically add "/edit" to the path.  The form looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;form_with&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;model: &lt;/span&gt;&lt;span class="n"&gt;subpublishing_agreement&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;url: &lt;/span&gt;&lt;span class="n"&gt;sub_pub_path_helper&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;subpublishing_agreement&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new_record?&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="ss"&gt;:create&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="ss"&gt;:update&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;subpublishing_agreement&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;form&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Yes, we have to decide which URL to provide based on the record status.  That &lt;code&gt;sub_pub_path_helper&lt;/code&gt; ends up everywhere.  Simply put, every URL has to use that helper or include all of that logic.  It's a bit of a mess.  As stated earlier, it's also difficult if I want to include another resource (e.g. "library") at the front of the path.&lt;/p&gt;

&lt;p&gt;Let's try something else.&lt;/p&gt;

&lt;p&gt;Here's a piece of a route file to attempt another approach:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;  &lt;span class="n"&gt;namespace&lt;/span&gt; &lt;span class="ss"&gt;:admin&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="n"&gt;resources&lt;/span&gt; &lt;span class="ss"&gt;:libraries&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
      &lt;span class="n"&gt;resources&lt;/span&gt; &lt;span class="ss"&gt;:publishers&lt;/span&gt;

      &lt;span class="n"&gt;resources&lt;/span&gt; &lt;span class="ss"&gt;:publishers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;path: &lt;/span&gt;&lt;span class="s1"&gt;'assignors'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;as: :assignors&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;only: &lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
        &lt;span class="n"&gt;resources&lt;/span&gt; &lt;span class="ss"&gt;:subpublishing_agreements&lt;/span&gt;
      &lt;span class="k"&gt;end&lt;/span&gt;

      &lt;span class="n"&gt;resources&lt;/span&gt; &lt;span class="ss"&gt;:publishers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;path: &lt;/span&gt;&lt;span class="s1"&gt;'assignees'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;as: :assignees&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;only: &lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
        &lt;span class="n"&gt;resources&lt;/span&gt; &lt;span class="ss"&gt;:subpublishing_agreements&lt;/span&gt;
      &lt;span class="k"&gt;end&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This has potential.  Just showing the paths for the generic &lt;code&gt;show&lt;/code&gt; method:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;     admin_library_assignor_subpublishing_agreement GET    /admin/libraries/:library_id/assignors/:assignor_id/subpublishing_agreements/:id(.:format)        admin/subpublishing_agreements#show
     admin_library_assignee_subpublishing_agreement GET    /admin/libraries/:library_id/assignees/:assignee_id/subpublishing_agreements/:id(.:format)        admin/subpublishing_agreements#show
              admin_library_subpublishing_agreement GET    /admin/libraries/:library_id/subpublishing_agreements/:id(.:format)                               admin/subpublishing_agreements#show

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There's some potential here.  This allows us to make a path like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/admin/libraries/1/assignors/5/subpublishing_agreements/2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This would be Library 1, Publisher 5, and SubpublishingAgreement 2 with the assumption that Publisher 5 would be the assignor for that particular agreement.&lt;/p&gt;

&lt;p&gt;But, how do we specify such a path using helpers?&lt;/p&gt;

&lt;p&gt;It turns out that &lt;code&gt;polymorphic_path&lt;/code&gt; can handle these with a little care.  Take the above as an example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;  &lt;span class="n"&gt;polymorphic_path&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="ss"&gt;:admin&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;Library&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="ss"&gt;:assignor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;SubpublishingAgreement&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt; &lt;span class="ss"&gt;assignor_id: &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's not optimal, but it's pretty easy to get it to work.  This is my &lt;code&gt;filled_path&lt;/code&gt; helper:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;module&lt;/span&gt; &lt;span class="nn"&gt;FilledPathHelper&lt;/span&gt;
  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;set_path_filling&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="vi"&gt;@path_filling&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compact&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;set_path_options&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt;
    &lt;span class="vi"&gt;@path_options&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compact&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;final_filled_path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pre&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pre&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="p"&gt;[])&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="vi"&gt;@path_filling&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="p"&gt;[])&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="no"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;post&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="p"&gt;[])&lt;/span&gt;
    &lt;span class="n"&gt;merged_options&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="vi"&gt;@path_options&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="p"&gt;{}).&lt;/span&gt;&lt;span class="nf"&gt;merge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;merged_options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;empty?&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;polymorphic_path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;merged_options&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;filled_path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;normalize&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;arg&lt;/span&gt;
      &lt;span class="k"&gt;when&lt;/span&gt; &lt;span class="no"&gt;String&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt; &lt;span class="n"&gt;arg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_sym&lt;/span&gt;
      &lt;span class="k"&gt;when&lt;/span&gt; &lt;span class="no"&gt;Array&lt;/span&gt;  &lt;span class="k"&gt;then&lt;/span&gt; &lt;span class="n"&gt;arg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;normalize&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;arg&lt;/span&gt;
      &lt;span class="k"&gt;end&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;normalize&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;
    &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="no"&gt;Array&lt;/span&gt; &lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="n"&gt;final_filled_path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kp"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flatten&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="no"&gt;ApplicationRecord&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="n"&gt;final_filled_path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kp"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="no"&gt;Symbol&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;post&lt;/span&gt; &lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="n"&gt;final_filled_path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kp"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="no"&gt;Symbol&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;pre&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;post&lt;/span&gt; &lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="n"&gt;final_filled_path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pre&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flatten&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
      &lt;span class="n"&gt;final_filled_path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kp"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kp"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;
      &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="no"&gt;ArgumentError&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"Invalid arguments to filled_path"&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;filled_url&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;polymorphic_url&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filled_path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Normally, you use it as such:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Admin::BooksController&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;Admin&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;ApplicationController&lt;/span&gt;
  &lt;span class="n"&gt;before_action&lt;/span&gt; &lt;span class="ss"&gt;:set_library&lt;/span&gt;
  &lt;span class="n"&gt;before_action&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;set_path_filling&lt;/span&gt; &lt;span class="ss"&gt;:admin&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="vi"&gt;@library&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="n"&gt;before_action&lt;/span&gt; &lt;span class="ss"&gt;:set_collection&lt;/span&gt;
  &lt;span class="n"&gt;before_action&lt;/span&gt; &lt;span class="ss"&gt;:set_book&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;only: &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="ss"&gt;:show&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:edit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:update&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:destroy&lt;/span&gt; &lt;span class="p"&gt;]&lt;/span&gt;

  &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

  &lt;span class="nf"&gt;private&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;set_library&lt;/span&gt;
   &lt;span class="vi"&gt;@library&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Library&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:library_id&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;set_collection&lt;/span&gt;
    &lt;span class="vi"&gt;@collection&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="vi"&gt;@library&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;books&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;set_book&lt;/span&gt;
    &lt;span class="vi"&gt;@book&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="vi"&gt;@collection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:id&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is assuming that a Book belongs to a Library.  A typical URL might be:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/admin/libraries/2/books/5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To link to a particular book:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;link_to&lt;/span&gt; &lt;span class="vi"&gt;@book&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filled_path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="vi"&gt;@book&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Is the equivalent of:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;link_to&lt;/span&gt; &lt;span class="vi"&gt;@book&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:admin&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="vi"&gt;@library&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="vi"&gt;@book&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="c1"&gt;# or&lt;/span&gt;
&lt;span class="n"&gt;link_to&lt;/span&gt; &lt;span class="vi"&gt;@book&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;admin_library_book_path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="vi"&gt;@library&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="vi"&gt;@book&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The nice thing about &lt;code&gt;filled_path&lt;/code&gt; is that you can use your mostly same code in a different path by simply changing the initial path filling.&lt;/p&gt;

&lt;p&gt;But it requires a little more fun when using it with these paths, as we have to get the &lt;code&gt;assignor_id&lt;/code&gt; or &lt;code&gt;assignee_id&lt;/code&gt; into the path.  And to do that we have to also set some path options.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Admin::SubpublishingAgreementsController&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;Admin&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;ApplicationController&lt;/span&gt;
  &lt;span class="n"&gt;before_action&lt;/span&gt; &lt;span class="ss"&gt;:set_library&lt;/span&gt;
  &lt;span class="n"&gt;before_action&lt;/span&gt; &lt;span class="ss"&gt;:set_publisher_context&lt;/span&gt;
  &lt;span class="n"&gt;before_action&lt;/span&gt; &lt;span class="ss"&gt;:set_subpublishing_agreement&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;only: &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:show&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:edit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:update&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:destroy&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

  &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

  &lt;span class="nf"&gt;private&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;set_library&lt;/span&gt;
      &lt;span class="vi"&gt;@library&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Library&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:library_id&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;set_publisher_context&lt;/span&gt;
      &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:assignor_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="vi"&gt;@assignor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="vi"&gt;@library&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;publishers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:assignor_id&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="n"&gt;set_path_filling&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;:admin&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="vi"&gt;@library&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:assignor&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;set_path_options&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;assignor_id: &lt;/span&gt;&lt;span class="vi"&gt;@assignor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="k"&gt;elsif&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:assignee_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="vi"&gt;@assignee&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="vi"&gt;@library&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;publishers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:assignee_id&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="n"&gt;set_path_filling&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;:admin&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="vi"&gt;@library&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:assignee&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;set_path_options&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;assignee_id: &lt;/span&gt;&lt;span class="vi"&gt;@assignee&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="k"&gt;end&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;set_subpublishing_agreement&lt;/span&gt;
      &lt;span class="vi"&gt;@subpublishing_agreement&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="vi"&gt;@assignor&lt;/span&gt;
          &lt;span class="vi"&gt;@assignor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;subpublishing_agreements_as_assignor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:id&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;
          &lt;span class="vi"&gt;@assignee&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;subpublishing_agreements_as_assignee&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:id&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="k"&gt;end&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With this in place, I can use &lt;code&gt;filled_path&lt;/code&gt; to handle all path generation to automatically include the assignor or assignee id.&lt;/p&gt;

&lt;p&gt;This allows me to create paths easily for the subpublishers functionality, and it's pretty easy to dip in from the publishers' "show" page:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;  &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="sx"&gt;%= link_to "Subpublishing Agreements as Assignor", polymorphic_path([:admin, @publisher.library, :assignor, :subpublishing_agreements], :assignor_id =&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="vi"&gt;@publisher&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;%&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;
  &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="sx"&gt;%= link_to "Subpublishing Agreements as Assignee", polymorphic_path([:admin, @publisher.library, :assignee, :subpublishing_agreements], :assignee_id =&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="vi"&gt;@publisher&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;%&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;
  &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="sx"&gt;%= link_to "Edit", filled_path(:edit, @publisher) %&amp;gt; |
  &amp;lt;%=&lt;/span&gt; &lt;span class="n"&gt;link_to&lt;/span&gt; &lt;span class="s2"&gt;"Back to publishers"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filled_path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;:publishers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;%&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The reason I go to these lengths is to keep the context.  If I ask for the subpublishing agreements where a certain publisher is the assignor, I can see those agreements and see that the publisher in the path is the assignor.  If I add an agreement, the assignor is already set, so I just have to choose an assignee on the form.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  &amp;lt;div class="row"&amp;gt;
    &amp;lt;div class="col-md-6"&amp;gt;
      &amp;lt;div class="mb-3"&amp;gt;
        &amp;lt;%= form.label :assignor_id, class: "form-label" %&amp;gt;
        &amp;lt;% if @assignor.present? %&amp;gt;
          &amp;lt;%= form.text_field :assignor_id, value: @assignor.name, class: "form-control", disabled: true %&amp;gt;
        &amp;lt;% else -%&amp;gt;
          &amp;lt;%= form.collection_select :assignor_id, @library.publishers.order(:name), :id, :name, { prompt: true }, { class: "form-select" }  %&amp;gt;
        &amp;lt;% end -%&amp;gt;
      &amp;lt;/div&amp;gt;
    &amp;lt;/div&amp;gt;

    &amp;lt;div class="col-md-6"&amp;gt;
      &amp;lt;div class="mb-3"&amp;gt;
        &amp;lt;%= form.label :assignee_id, class: "form-label" %&amp;gt;
        &amp;lt;% if @assignee.present? %&amp;gt;
          &amp;lt;%= form.text_field :assignee_id, value: @assignee.name, class: "form-control", disabled: true %&amp;gt;
        &amp;lt;% else -%&amp;gt;
          &amp;lt;%= form.collection_select :assignee_id, @library.publishers.order(:name), :id, :name, { prompt: true }, { class: "form-select" }  %&amp;gt;
        &amp;lt;% end -%&amp;gt;
      &amp;lt;/div&amp;gt;
    &amp;lt;/div&amp;gt;
  &amp;lt;/div&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This context allows you to put together user workflows that are intuitive:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Choose a publisher&lt;/li&gt;
&lt;li&gt;View subpublishing agreements as the assignor&lt;/li&gt;
&lt;li&gt;Create a new agreement&lt;/li&gt;
&lt;li&gt;Automatically get redirected back to subpublishing agreements as assignor&lt;/li&gt;
&lt;li&gt;Return to the publisher's main page&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Wrap Up
&lt;/h2&gt;

&lt;p&gt;This particular pattern isn't very common, but knowing how to put it together can definitely make an easy to maintain application that's also easy for your users.&lt;/p&gt;

&lt;p&gt;Feel free to ask questions in the comments.&lt;/p&gt;

</description>
      <category>rails</category>
      <category>ruby</category>
      <category>database</category>
    </item>
    <item>
      <title>COBOL, Dates, May 20, 1875, and Disinformation</title>
      <dc:creator>Michael Chaney</dc:creator>
      <pubDate>Wed, 19 Feb 2025 22:33:00 +0000</pubDate>
      <link>https://dev.to/mdchaney/cobol-dates-may-20-1875-and-disinformation-5ggh</link>
      <guid>https://dev.to/mdchaney/cobol-dates-may-20-1875-and-disinformation-5ggh</guid>
      <description>&lt;p&gt;Let me start at the end for those who don't want to read the whole thing.  COBOL does not have a "default date" or any kind of default value for a missing date.  COBOL doesn't even have a "date" data type.  ISO8601 is an interchange format for dates.  "May 20, 1875" has no prominence in that standard or any other date standard.&lt;/p&gt;

&lt;p&gt;That's the short version.  How did we get here?&lt;/p&gt;

&lt;p&gt;Elon Musk mentioned in a press briefing that there were many names in the Social Security database of people who weren't marked as dead but were 150 years old.  Clearly, this is bad data.  At the time, he provided no other information, so speculation abounded on the internet.&lt;/p&gt;

&lt;p&gt;In the middle of all of it, this appeared on threads:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ffy2gp18jmdfx6m767i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ffy2gp18jmdfx6m767i.png" alt="Image description" width="724" height="488"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.threads.net/@ashmore_glenn/post/DGDfmj6TsZS" rel="noopener noreferrer"&gt;https://www.threads.net/@ashmore_glenn/post/DGDfmj6TsZS&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It's interesting that he starts off making the claim that "in COBOL, if a date is missing the program defaults to 1875.  e.g. 2025-1875=150".  This claim isn't just false, it's false in multiple ways:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;There is no "date" data type in COBOL.  Dates are stored however the programmer wants, but usually numeric character strings&lt;/li&gt;
&lt;li&gt;There's no "default" date, even if there were such a data type&lt;/li&gt;
&lt;li&gt;Even if there were a default, 1875 would be a bizarre choice&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;His initial post has over 20,000 "likes" and numerous shares as of this writing.  The followup - included in the picture above - has him backpedaling hard and pointing out that they were "taught to never leave a null date" and "I assume some group at IBM or elsewhere decided that May 20,1875, the date of the Convention on the Meter, would be the standard filler."  That got 10 likes.  Welcome to the internet.&lt;/p&gt;

&lt;p&gt;So, later on in the same thread he gets a lot of push back from other programmers, and then we find this with 1000+ likes:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgihtq3wn355dgkyx1aie.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgihtq3wn355dgkyx1aie.png" alt="Image description" width="709" height="185"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I'll put the whole thing here:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Correct. The original ISO standard for a default reference date was May 20, 1875 - the date the international standards and metrics treaty was signed. Geeky to be sure but that is coders for you. This standard was changed in 2019.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;As you've probably guessed, that's not correct, either.&lt;/p&gt;

&lt;p&gt;But this was added to the lore and we had some disinformation with running shoes on.&lt;/p&gt;

&lt;h2&gt;
  
  
  ISO8601
&lt;/h2&gt;

&lt;p&gt;In the ISO8601:2004 standard, "May 20, 1875" is mentioned as a "reference date" in a section discussing the Gregorian calendar.&lt;/p&gt;

&lt;p&gt;For those of you unaware, the Gregorian calendar is almost certainly the calendar that you're using.  The "Julian calendar" is still used in some contexts in the west, mostly among Orthodox churches.  You most likely see this as "Orthodox Christmas" and "Orthodox Easter" on your (Gregorian) calendar.&lt;/p&gt;

&lt;p&gt;Without getting into all the specifics, Julius Caesar adopted what's now known as the "Julian calendar" in 46 BC.  It has 365.25 days in a year, so every four years a leap year is added.  That's pretty close.&lt;/p&gt;

&lt;p&gt;Unfortunately, "pretty close" doesn't work over the course of centuries, and people determined that the real number of days in a year is slightly less than 365.25.  So, in 1582 Pope Gregory XIII established a new calendar system that's pretty similar, but to get closer to the number of days in the year the leap year is skipped every hundred years, unless the year is a multiple of "400".&lt;/p&gt;

&lt;p&gt;In order to "reset" the calendar, October 4, 1582 (Julian) was followed by October 15, 1582 (Gregorian), which was October 5, 1582 (Julian).&lt;/p&gt;

&lt;p&gt;The Julian calendar will continue to drift apart from the Gregorian calendar.  The starting offset in 1582 was 10 days.  The Gregorian calendar skipped leap years in 1700, 1800, and 1900, but the Julian didn't skip those.  As of 2025, there are 13 days between the two.&lt;/p&gt;

&lt;p&gt;So, Christmas is December 25 in both calendar systems.  But those aren't the same day.  "December 25, 2024" in the Julian calendar is "January 7, 2025" Gregorian.  That's why "Orthodox Christmas" is celebrated in what most of us call "January".  Inside the Julian calendar, it's still December.&lt;/p&gt;

&lt;p&gt;In the ISO8601 specification, 2004 edition, section 3.2.1 discusses the Gregorian calendar as it is considered the "standard" calendar and the one that ISO8601 assumes.  In this discussion of the Gregorian calendar, it is mentioned:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The Gregorian calendar has a reference point that assigns 20 May 1875 to the calendar day that the "Convention du Mètre" was signed in Paris.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is a statement about the Gregorian calendar, not ISO8601.  It is saying that the date that the "Meter Convention" was signed in Paris is "May 20, 1875" on the Gregorian calendar, as an example.&lt;/p&gt;

&lt;p&gt;Equally valid statements:&lt;/p&gt;

&lt;p&gt;"The Julian calendar has a reference point that assigns 8 May 1875 to the calendar day that the 'Convention du Mètre' was signed in Paris."&lt;/p&gt;

&lt;p&gt;Even more:&lt;/p&gt;

&lt;p&gt;"The Gregorian calendar has a reference point that assigned 29 Aug 1958 to the calendar day that Michael Joseph Jackson was born."&lt;/p&gt;

&lt;p&gt;That's not a statement about ISO8601 - it's a statement about a calendar system.  It was removed from the 2019 update to ISO8601.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.loc.gov/standards/datetime/iso-tc154-wg5_n0038_iso_wd_8601-1_2016-02-16.pdf" rel="noopener noreferrer"&gt;https://www.loc.gov/standards/datetime/iso-tc154-wg5_n0038_iso_wd_8601-1_2016-02-16.pdf&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Due to what was shown on threads, this morphed into a full-scale disinformation campaign that usually disparaged Elon Musk or his "DOGE" team as hapless idiots who don't know about COBOL and thus don't know that "May 20, 1875 is the epoch in COBOL".&lt;/p&gt;

&lt;p&gt;Ah, the epoch.  What's an "epoch"?&lt;/p&gt;

&lt;p&gt;Let's back up a bit.  The ISO8601 standard for exact dates is pretty simple:&lt;/p&gt;

&lt;p&gt;yyyy-mm-dd (dashes optional)&lt;/p&gt;

&lt;p&gt;That's the entire standard for an exact date.  Four-digit year, two-digit month, and two-digit day of month, each with leading zeroes as applicable.  So, "May 20, 1875" is "1875-05-20".  "May 19, 1875" is "1875-05-19".  "October 15, 1582" is "1582-10-15".  "October 10, 1582" - ha! fooled you there, that date technically doesn't exist.  The Gregorian calendar didn't exist then, so we use the "proleptic" version which just numbers the days as if they did exist.  So, "October 10, 1582" Gregorian would be "September 30, 1582" Julian, and we would still use "1582-10-10".  Anyway....&lt;/p&gt;

&lt;p&gt;I use ISO8601 all the time.  It's the format used by HTML form fields for dates and times.  It's a date/time interchange format, and it's useful because a date/time value includes all information to accurately specify the exact date and time of an event, even if someone in a different time zone consumes the data being produced or if daylight savings time is different.  It's also great for dates as they can be sorted easily.&lt;/p&gt;

&lt;h2&gt;
  
  
  Epochs
&lt;/h2&gt;

&lt;p&gt;An "epoch" in computer lore is basically when a system of timekeeping starts.  Year "1" is the first year in our Gregorian calendar system.  Times before that are just "BC".  There is no year "0", which is a bit odd, but exact dates that long ago typically don't matter.&lt;/p&gt;

&lt;p&gt;In the Unix operating system, the "epoch" is January 1, 1970 at 12:00:00 AM.  Later, that was determined to be "UTC".  A standard Unix timestamp is 32 bits, but it's considered to be a "signed" number so 31 bits are accessible.  That means that we can count up to 2,147,483,647 seconds past January 1, 1970 in this system, which puts us at 2038 when the number "overflows" and resets to a negative value.&lt;/p&gt;

&lt;p&gt;Put another way, with this system we can specify a date and time within about 68 years either way from January 1, 1970.  Here, I show what this means using the Ruby programming language:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;3.3.6 :001 &amp;gt; Time.at(0).utc
 =&amp;gt; 1970-01-01 00:00:00 UTC
3.3.6 :002 &amp;gt; Time.at(2**31-1).utc
 =&amp;gt; 2038-01-19 03:14:07 UTC
3.3.6 :003 &amp;gt; Time.at(-2**31).utc
 =&amp;gt; 1901-12-13 20:45:52 UTC
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As I type this, we are at 1,740,000,100 seconds since the epoch.&lt;/p&gt;

&lt;p&gt;The system that I just described has a couple of features that we can see as relevant:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;It only works within a fairly narrow range of years&lt;/li&gt;
&lt;li&gt;A timestamp of "0" will be obvious since it's always the same value&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;When I'm working with systems and I see the date/time is "January 1, 1970 12:00:00am UTC" or "December 31, 1969 6:00:00pm CST" I know right away that I have a timestamp that is 0.  This is a pretty common bug form - we forget to initialize a value, it ends up being 0, so the date that shows up is January 1, 1970 or December 31, 1969.&lt;/p&gt;

&lt;p&gt;There are other epochs.  A bunch of others.  One example is a timestamp for a file in MS-DOS, which starts in 1980.&lt;/p&gt;

&lt;p&gt;Now, in ISO8601 the specification for an exact date is just the four-digit year, two-digit month, and two-digit day of month.  There's no "epoch" other than, basically, year "1" or something like that.  In this system, May 20, 1875 has no significance.  That's why the line was removed from the standard in the 2019 edition.&lt;/p&gt;

&lt;h2&gt;
  
  
  Disinformation Multiplies
&lt;/h2&gt;

&lt;p&gt;Musk countered the disinformation by showing the counts by age bracket:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://x.com/elonmusk/status/1891350795452654076" rel="noopener noreferrer"&gt;https://x.com/elonmusk/status/1891350795452654076&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjg1ahd8cuapnxsuv66fz.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjg1ahd8cuapnxsuv66fz.jpg" alt="Image description" width="376" height="854"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;By this point, the disinformation was "if Musk and company were competent they would have noticed the huge spike at 150 years old and known that was important."  As the data shows, there is no spike at 150 but rather the downward slope past 70 that one would expect with this data.  And it ends not long after 150, not surprising since those are the first Social Security recipients.  There are a couple of outliers on there as well.&lt;/p&gt;

&lt;p&gt;It was too late to stop the disinformation campaign.  It was repeated by Rachel Maddow on MSNBC, and shows up in other places.  For instance, it's repeated by this "Politifact" "fact-check":&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.politifact.com/article/2025/feb/17/are-150-year-old-americans-receiving-social-securi/" rel="noopener noreferrer"&gt;https://www.politifact.com/article/2025/feb/17/are-150-year-old-americans-receiving-social-securi/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This guy repeated it on X and got quite a few views:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://x.com/toshiHQ/status/1889928670887739902" rel="noopener noreferrer"&gt;https://x.com/toshiHQ/status/1889928670887739902&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It showed up in this Wired article:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://t.co/79FN6dSvBO" rel="noopener noreferrer"&gt;https://t.co/79FN6dSvBO&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note that the guy who wrote this - David Gilbert - supposedly covers "disinformation".  Yeah, he covered it.  By repeating it.&lt;/p&gt;

&lt;p&gt;Here it shows up in a yahoo!news article:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.yahoo.com/news/trump-press-secretary-hit-embarrassing-191941730.html" rel="noopener noreferrer"&gt;https://www.yahoo.com/news/trump-press-secretary-hit-embarrassing-191941730.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And this article from someone on Daily Kos:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.dailykos.com/stories/2025/2/14/2303889/-Nope-There-are-no-150-year-olds-on-Social-Security-It-s-COBOL" rel="noopener noreferrer"&gt;https://www.dailykos.com/stories/2025/2/14/2303889/-Nope-There-are-no-150-year-olds-on-Social-Security-It-s-COBOL&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This disinformation is showing up almost exclusively on left-wing media and "fact checkers", and usually includes digs at Elon Musk and the "young" DOGE employees.&lt;/p&gt;

&lt;h2&gt;
  
  
  COBOL
&lt;/h2&gt;

&lt;p&gt;So, how does COBOL store dates?  That's up to the programmer, and I'm sure there are many, many different ways that dates have been stored.&lt;/p&gt;

&lt;p&gt;I pulled out my "Structured COBOL, Pseudocode Edition" by Shelly, Cashman, and Forsythe, 1986 edition.  ISBN 0-87835-196-5.  This is pre-Y2K, and it's mind-blowing to me that they weren't planning for Y2K yet.&lt;/p&gt;

&lt;p&gt;On page 6.32, "Defining the Date Work Area":&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;On most computers, the current date is stored in computer memory as a 6 character field (Year, Month, Day).  In the sample program, the date, which will be printed in the report heading, is obtained from computer memory and is placed in a work area within the Working Storage Section of the Data Division.  The work area and the method used in the Procedure Division of the sample program to place the date in the work area are shown in Figure 6-48.&lt;/p&gt;

&lt;p&gt;In Figure 6-48, the DATE-WORK field is defined on line 006080.  The DATE-WORK field is then subdivided into three 2-digit numeric fields - the YEAR-WORK field, the MONTH-WORK field, and the DAY-WORK field.  The date is six characters in length and is stored in YYMMDD format (i.e. - January 25, 1987 would be stored as 870125).&lt;/p&gt;

&lt;p&gt;The date is obtained from computer memory using the Accept statement.  When the Accept statement is executed, the reserved work DATE identifies that the current date is to be copied from the area in main computer memory where it is stored by the operating system to the field DATE-WORK which has been defined in the program.&lt;/p&gt;

&lt;p&gt;The Accept statement can also be used to retrieve the day of the year and the time of day.  The day of the year is returned in a Julian date format.  The first two numeric characters are the year and the next three numeric characters are the day of the year.  Thus, the value returned for January 25, 1987 would be 87025.  The time is returned as a two-digit numeric hour, a two-digit numeric minute, and a two-digit numeric second, and a two-digit numeric hundredths of a second.  Thus, the time 9:15 a.m. would be returned as 09150000.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Figure 6-48, Data Division:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;006070 01  WORK-AREAS
006080     05  DATE-WORK
006090         10  YEAR-WORK       PIC 99
006100         10  MONTH-WORK      PIC 99
006110         10  DAY-WORK        PIC 99
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Procedure Division:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;014070     ACCEPT DATE-WORK FROM DATE
014080     MOVE MONTH-WORK TO MONTH-HEADING
014090     MOVE DAY-WORK TO DAY-HEADING
014100     MOVE YEAR-WORK TO YEAR-HEADING
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For those of you wondering how we had the Y2K bug, there it is in all its glory.  This was written 14 years before 2000, and in the world of COBOL nobody apparently thought to point out that this was going to cause problems very, very soon.&lt;/p&gt;

&lt;p&gt;The part of the book shown above is the entire treatise on handling dates (times aren't even mentioned except in that section) out of hundreds of pages.  They show two possible date formats: "YYMMDD" or "YYDDD".  COBOL generally stored data like this as characters, so the standard date format would be six separate characters.&lt;/p&gt;

&lt;p&gt;Contrast that with a Unix timestamp which is a binary format that uses the space of four characters total and is able to specify more than 130 years of dates and times.  The YYMMDD format can specify only 100 years of dates, and uses six character spaces.  In modern terms, that's 48 bits to store around 36,525 days, which handily fits into 16 bits.&lt;/p&gt;

&lt;p&gt;But the nice thing about YYMMDD is that there is no endianness to worry about, no epoch, no real calculation to determine the date.  Just add a couple more Ys and you have ISO8601.  That's why it's a great interchange format.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;But, "May 20, 1875" has nothing to do with ISO8601, COBOL, or basically anything else other than the day that the Meter Convention was signed in Paris.&lt;/p&gt;

&lt;p&gt;It was interesting to see a disinformation campaign take off like this, and I think at this point those involved have moved on to something else.  But this will live on because of the internet and all the newly-minted COBOL date experts expounding on how Musk and company don't know something that's just &lt;em&gt;so obvious&lt;/em&gt;.  Sigh.&lt;/p&gt;

&lt;p&gt;Never change, internet.  Never change.&lt;/p&gt;

</description>
      <category>development</category>
    </item>
    <item>
      <title>Understanding Hexadecimal Numbers</title>
      <dc:creator>Michael Chaney</dc:creator>
      <pubDate>Thu, 30 Jan 2025 21:24:20 +0000</pubDate>
      <link>https://dev.to/mdchaney/understanding-hexadecimal-numbers-4ohk</link>
      <guid>https://dev.to/mdchaney/understanding-hexadecimal-numbers-4ohk</guid>
      <description>&lt;p&gt;I'm going old-school here, but some of these basics are overlooked nowadays.  This is not a post for my Ruby on Rails friends, but rather for folks who may be learning low-level programming.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Hex?
&lt;/h2&gt;

&lt;p&gt;Hexadecimal ("hex" from now on) is just a shorthand binary notation.  We don't use normal decimal numbers because they don't line up with their binary equivalents.&lt;/p&gt;

&lt;p&gt;All modern computers have a word size that is 8 times a power of 2.  Simply stated, it will be 8, 16, 32, or 64 bits.  There have been computers with larger word sizes, and there have been computers with weird word sizes.&lt;/p&gt;

&lt;p&gt;The DEC PDP-6 was a 36-bit computer.  I have worked on a CDC 830 which used a 60-bit word size.  Each of these had the capability of storing 6-bit characters in their words.&lt;/p&gt;

&lt;p&gt;There's something interesting about those word sizes - they also have "3" as a factor.&lt;/p&gt;

&lt;p&gt;Back when I was a kid, octal was more popular than hex when handling binary data.  For a 36-bit computer, each word is 12 octal digits and those line up nicely with the bits.  Things get messy when dealing with modern word sizes and octal.  Why?&lt;/p&gt;

&lt;p&gt;Consider what "all bits on" looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Binary: 1111 1111
Octal: 377
Binary: 1111 1111 1111 1111
Octal: 177777
Binary: 1111 1111 1111 1111 1111 1111 1111 1111
Octal: 37777777777
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Hopefully you can see the issue here.  It's not obvious what octal digits line up to what binary digits.&lt;/p&gt;

&lt;p&gt;You'll notice that I spaced out the bits into groups of four bits each.  8 bits is a byte, and 4 bits is a nybble (pronounced as "nibble").  This is common in any literature to make things more readable.  And, luckily, each nybble corresponds to a hex digit.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hex to the Rescue
&lt;/h2&gt;

&lt;p&gt;Hexadecimal is just a base-16 number system.  We have our normal 10 digits in base 10 - 0 through 9 - so for hex we just add a-f to represent the numbers 10-15 in decimal.  Hex is really easy:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Decimal&lt;/th&gt;
&lt;th&gt;Hex&lt;/th&gt;
&lt;th&gt;Binary&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0001&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;0010&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;0011&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;0100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;0101&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;0110&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;0111&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;1000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;1001&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;A&lt;/td&gt;
&lt;td&gt;1010&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;B&lt;/td&gt;
&lt;td&gt;1011&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;C&lt;/td&gt;
&lt;td&gt;1100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;td&gt;D&lt;/td&gt;
&lt;td&gt;1101&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;E&lt;/td&gt;
&lt;td&gt;1110&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;F&lt;/td&gt;
&lt;td&gt;1111&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;It's identical to decimal until we get to "10", and we start using those letters.  "F" is all four bits on, "0" is all four bits off.&lt;/p&gt;

&lt;p&gt;So, again, hex is just a way to write binary that's shorter and often more readable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Memorizing Bit Patterns
&lt;/h2&gt;

&lt;p&gt;You can think of each hex digit as the corresponding bit pattern, and if you're going to do any low-level programming you really want to be able to have the above table memorized.  Here's how I did it.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Comments&lt;/th&gt;
&lt;th&gt;Binary&lt;/th&gt;
&lt;th&gt;Hex&lt;/th&gt;
&lt;th&gt;Decimal&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;No bits&lt;/td&gt;
&lt;td&gt;0000&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;All bits&lt;/td&gt;
&lt;td&gt;1111&lt;/td&gt;
&lt;td&gt;F&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;One bit&lt;/td&gt;
&lt;td&gt;0001&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;0010&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;0100&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;1000&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Two bits - outer&lt;/td&gt;
&lt;td&gt;1001&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Two bits - inner&lt;/td&gt;
&lt;td&gt;0110&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Two bits - ends&lt;/td&gt;
&lt;td&gt;1100&lt;/td&gt;
&lt;td&gt;C&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;0011&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Two bits - spaced&lt;/td&gt;
&lt;td&gt;1010&lt;/td&gt;
&lt;td&gt;A&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;0101&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Three bits - ends&lt;/td&gt;
&lt;td&gt;1110&lt;/td&gt;
&lt;td&gt;E&lt;/td&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;0111&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Three bits - space&lt;/td&gt;
&lt;td&gt;1011&lt;/td&gt;
&lt;td&gt;B&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;1101&lt;/td&gt;
&lt;td&gt;D&lt;/td&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That's it.  Remember those patterns and you'll be able to "see" the bits and vice versa.&lt;/p&gt;

&lt;h2&gt;
  
  
  Advanced Hex
&lt;/h2&gt;

&lt;p&gt;One other thing - now and then you need to know certain numbers in decimal.  You should have powers of two memorized up to 65,536, along with the corresponding 2^x-1.  You should also know the factors of 16 up to 16*16.  For instance, if I see "C0" I know that's 192 decimal because "12 * 16" is 192 in decimal.&lt;/p&gt;

&lt;p&gt;This isn't for everybody.  If you're a web programmer it's unlikely that you'll need to care about any of this.&lt;/p&gt;

&lt;p&gt;On the other hand, if you're reading spec sheets for a piece of hardware, you'll need every bit of this.&lt;/p&gt;

&lt;p&gt;Enjoy, and ask questions below.&lt;/p&gt;

</description>
      <category>hex</category>
      <category>assembly</category>
    </item>
    <item>
      <title>Matching Fuzzy Data</title>
      <dc:creator>Michael Chaney</dc:creator>
      <pubDate>Mon, 27 Jan 2025 21:55:21 +0000</pubDate>
      <link>https://dev.to/mdchaney/matching-fuzzy-data-1ek1</link>
      <guid>https://dev.to/mdchaney/matching-fuzzy-data-1ek1</guid>
      <description>&lt;p&gt;I do a lot of work in the music industry.  One thing about the music industry - and really every industry - is that there's a lot of junk data out there.&lt;/p&gt;

&lt;p&gt;Now, there are various ways that this manifests in the music industry.  The obvious one is what I'll cover today - composition titles and associated filenames.  Then we have composer names.  And publisher names.  This isn't even getting into the realm of metadata.&lt;/p&gt;

&lt;p&gt;There are lots of places where we need to match up pieces of data that might not match with a simple string compare.  There are also issues of missing data - audio files might not be present or lines may be missing from the sheet.&lt;/p&gt;

&lt;p&gt;Managing and ingesting data of this kind requires fuzzy matching as part of a process that alerts the user to all of the ways that the data is problematic.&lt;/p&gt;

&lt;p&gt;In Ruby, I use a gem called "&lt;a href="https://github.com/kiyoka/fuzzy-string-match" rel="noopener noreferrer"&gt;fuzzy-string-match&lt;/a&gt;" which implements what's known as the &lt;a href="https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance" rel="noopener noreferrer"&gt;Jaro-Winkler algorithm&lt;/a&gt;.  (An alternative is the &lt;a href="https://en.wikipedia.org/wiki/Levenshtein_distance" rel="noopener noreferrer"&gt;Levenshtein distance algorithm&lt;/a&gt;.)&lt;/p&gt;

&lt;p&gt;As I said before, I'll cover the concept of composition titles and filenames.  It's very common that I'll receive a group of files along with a spreadsheet of metadata.  The trick is to match the lines in the spreadsheet with the associated file path.&lt;/p&gt;

&lt;p&gt;Right now, I'm dealing with a dataset where the file path looks like this:&lt;/p&gt;

&lt;p&gt;country/{sometimes genre/}album/filename.[mp3|wav]&lt;/p&gt;

&lt;p&gt;It turns out that the "genre" - if present - may or may not match what's in the spreadsheet, so it'll have to be ignored.&lt;/p&gt;

&lt;p&gt;The album column contains an "album number" consisting of a few letters to identify the publisher along with a serial number plus the title, so it's guaranteed to have some uniqueness.&lt;/p&gt;

&lt;p&gt;The filename, though.  May God help me.  The filename might be any number of things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The composition title&lt;/li&gt;
&lt;li&gt;The album number/name plus the composition title&lt;/li&gt;
&lt;li&gt;The album number/name, the track number, and the composition title&lt;/li&gt;
&lt;li&gt;The track number and composition title&lt;/li&gt;
&lt;li&gt;(I'm not making this up) The album number/name, the track number, the track number again, and the composition title&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For #5, it looks like the composition title came from somewhere that had the track number already tacked on.  This belief is supported by some of those also have an extension of ".mp3.mp3".&lt;/p&gt;

&lt;p&gt;I would also point out that the various components of this filename have delimiters between them, typically some combination of dashes, spaces, and underscores.&lt;/p&gt;

&lt;p&gt;I'm not going to embarrass my client if for no other reason than their data is normal.  It's no worse than any number of other datasets that I've encountered.  So consider this filepath which has some stuff changed:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Canada/ZZZ 001_Very Nice Music Vol. 1/ZZZ 001_Very Nice Music Vol. 1 - 10 - 10_Fighter.mp3.mp3&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;So,&lt;br&gt;
country: &lt;code&gt;Canada&lt;/code&gt;&lt;br&gt;
album: &lt;code&gt;ZZZ 001_Very Nice Music Vol. 1&lt;/code&gt;&lt;br&gt;
track number: &lt;code&gt;10&lt;/code&gt;&lt;br&gt;
composition title: &lt;code&gt;Fighter&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Here's another one:&lt;br&gt;
&lt;code&gt;Mexico/Rock/YYY 001_Rock Vol. 1/YYY 001_Rock_9_9_ Love You Much.mp3&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Note that in this one the album designation is missing "Vol. 1" in the filename.  Plus we have an extraneous space.  What can you trust here?&lt;/p&gt;

&lt;p&gt;If you're thinking I should just dig out the album name and track number - good idea.  Except, records still might not match up.  It's important to me to have a bit of a double-check to make sure the titles are correct(ish) as well.&lt;/p&gt;

&lt;p&gt;The general way to accomplish this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create likely possible file paths for each row in the spreadsheet.&lt;/li&gt;
&lt;li&gt;Use the fuzzy matcher to determine if it's really close to one of the file paths&lt;/li&gt;
&lt;li&gt;If so, match it up and take that file path out of the pool&lt;/li&gt;
&lt;li&gt;Output a new sheet with a new "full file path" column&lt;/li&gt;
&lt;li&gt;Output separately a list of unmatched file paths&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In this list, the second item is the meat of the algorithm. It's also where we can get the most gains from a correct implementation.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Jaro-Winkler Algorithm
&lt;/h2&gt;

&lt;p&gt;Let's talk about the Jaro-Winkler algorithm.  To make it easy, let's look at examples using irb.&lt;/p&gt;

&lt;p&gt;The implementation of the Jaro-Winkler algorithm in Ruby is a simple function call.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ irb
3.3.6 :001 &amp;gt; require 'fuzzystringmatch'
 =&amp;gt; true
3.3.6 :002 &amp;gt; fuzzy = FuzzyStringMatch::JaroWinkler.create(:pure)
 =&amp;gt; #&amp;lt;FuzzyStringMatch::JaroWinklerPure:0x000000011d0c0120&amp;gt;
3.3.6 :003 &amp;gt; fuzzy.getDistance('string 1', 'string 2')
 =&amp;gt; 0.975
3.3.6 :004 &amp;gt; fuzzy.getDistance('the same thing', 'the same thing')
 =&amp;gt; 1.0
3.3.6 :005 &amp;gt; fuzzy.getDistance('a string like this', 'something different')
 =&amp;gt; 0.5467836257309941
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can see here that the value "1.0" means equality, and that number diminishes to "0.0" as the strings diverge.  I would note that the linked article shows the opposite, which makes sense given that it's a "distance", but as long as it's consistent we can work with whatever we get.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;3.3.6 :006 &amp;gt; fuzzy.getDistance('monday', 'MONDAY')
 =&amp;gt; 0.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Ack.  It's case-sensitive.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;3.3.6 :007 &amp;gt; fuzzy.getDistance('a new day', 'a_new_day')
 =&amp;gt; 0.8666666666666666
3.3.6 :008 &amp;gt; fuzzy.getDistance('a new day', 'a  new  day')
 =&amp;gt; 0.8898071625344353
3.3.6 :009 &amp;gt; fuzzy.getDistance('a new day', 'a   new   day')
 =&amp;gt; 0.8505369274600044
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It cares about whitespace and non-alphanumeric characters as well.  But there's something to be noticed here - adding more spaces doesn't make the score drop by much.&lt;/p&gt;

&lt;p&gt;Just from what I've shown here, we can come up with a few ideas to make the filename matches more likely:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Remove all file extensions from the filenames, including duplicate extensions&lt;/li&gt;
&lt;li&gt;Force the file path to lowercase&lt;/li&gt;
&lt;li&gt;Remove all non-alphanumeric characters&lt;/li&gt;
&lt;li&gt;Compress multiple spaces down to a single space&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Let's then consider this file path:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Mexico/Rock/YYY 001_Rock Vol. 1/YYY 001_Rock_9_9_ Love You Much.mp3&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;First, I can cut it up into four pieces:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Mexico&lt;/code&gt;&lt;br&gt;
&lt;code&gt;Rock&lt;/code&gt;&lt;br&gt;
&lt;code&gt;YYY 001_Rock Vol. 1&lt;/code&gt;&lt;br&gt;
&lt;code&gt;YYY 001_Rock_9_9_ Love You Much.mp3&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;I'll ignore the genre.  Let's munge the album name:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;yyy 001 rock vol 1&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;And let's do the same for the filename, removing the extension as well:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;yyy 001 rock 9 9 love you much&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Now, the row in the spreadsheet will show that the album name is:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;YYY 001_Rock Vol. 1&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;That'll then turn into the same thing we see above:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;yyy 001 rock vol 1&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;There are two issues in the filename.  First, the "Vol. 1" is missing from the album designation.  Second, the track number is duplicated.&lt;/p&gt;

&lt;p&gt;Let's see how those things affect the scoring.  The ideal filename from the data would be &lt;code&gt;YYY 001_Rock Vol. 1_9_Love You Much.mp3&lt;/code&gt; given the established pattern.  Let's munge that (&lt;code&gt;yyy 001 rock vol 1 9 love you much&lt;/code&gt;) and compare:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;3.3.6 :010 &amp;gt; fuzzy.getDistance('yyy 001 rock vol 1 9 love you much', 'yyy 001 rock 9 9 love you much')
 =&amp;gt; 0.9202640894085828
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So, it comes in at 0.92, which is close.  Is it close enough?&lt;/p&gt;

&lt;p&gt;A naive algorithm might have a cutoff value that is considered "close enough".  But if we do a complete cartesian product and sort by the fuzzy distance (descending) we'll get all of the best matches at the top.  This can be an expensive operation, and it might make sense to look for exact matches before running the fuzzy matcher.  I won't optimize it here, but you get the idea.&lt;/p&gt;

&lt;h2&gt;
  
  
  Munging Strings For Better Matching
&lt;/h2&gt;

&lt;p&gt;Taking our example, we can run a set of mungers against the two strings, compare all possible pairs, and extract the highest score as the official score for those two strings.  Let's look at how to do this in Ruby.&lt;/p&gt;

&lt;p&gt;First, what are our "two strings".  On one side, we have the file path.  On the other side, we have the expected file path that will be generated from the spreadsheet data.  In reality, there will be a few different "expected" file paths since there are a few ways those are generated.&lt;/p&gt;

&lt;p&gt;Let's create a simple set of mungers. We will munge the country, album name, and filename items separately.  All will go through the basic munging:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Force to lowercase&lt;/li&gt;
&lt;li&gt;Change all non-alphanumeric characters to spaces&lt;/li&gt;
&lt;li&gt;Collapse multiple contiguous spaces to one single space&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Filenames will have to be treated differently.  Just looking at the examples shown:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Remove "vol x", where "x" is a number&lt;/li&gt;
&lt;li&gt;Change " x x " to " x " - in other words, if a number is duplicated remove one of them.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;There's a potential pitfall here.  If we have "vol. 1" and track "1" there'll likely be a duplicated number.  For those mungers, though, we'll try all combinations - including none, so the original filename will still be in the list.&lt;/p&gt;

&lt;p&gt;Let's start with our simple mungers (I'm going to skip the new Ruby syntaxes to make this work back into the Ruby 2 series):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;downcase_munger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;lambda&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;downcase&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;alphanumeric_filter_munger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;lambda&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gsub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/\P{Alnum}/&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;' '&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;spaces_munger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;lambda&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gsub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/ +/&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;' '&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And our specific filename mungers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;remove_vol_munger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;lambda&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/ vol \d+ /&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;' '&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;remove_dup_track_number_munger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;lambda&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/ (\d+) (\1) /&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;' \\1 '&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, we also want to remove the extensions from the filenames, knowing that they might be duplicated (or replicated):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;remove_ext_munger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;lambda&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/(\.(mp3|wav))+$/i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We're going to use a fun trick to apply every combination of mungers to our strings.  Note that &lt;code&gt;downcase_munger&lt;/code&gt;, &lt;code&gt;alphanumeric_filter_munger&lt;/code&gt;, and &lt;code&gt;spaces_munger&lt;/code&gt; are always applied to our strings at the start, with &lt;code&gt;spaces_munger&lt;/code&gt; applied again at the end. But our other two mungers for filenames need to be applied in all combinations.  Since there are only two, this is very easy: none, both, and each individually.  I have code that uses up to seven mungers, so generalizing this makes sense.&lt;/p&gt;

&lt;p&gt;Ruby has an awesomely simple way to do this with the &lt;code&gt;combination&lt;/code&gt; method for arrays:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;3.3.6 :001 &amp;gt; [1,2,3,4].combination(3).to_a
 =&amp;gt; [[1, 2, 3], [1, 2, 4], [1, 3, 4], [2, 3, 4]]
3.3.6 :002 &amp;gt; [1,2,3,4].combination(2).to_a
 =&amp;gt; [[1, 2], [1, 3], [1, 4], [2, 3], [2, 4], [3, 4]]
3.3.6 :003 &amp;gt; [1,2].combination(2).to_a
 =&amp;gt; [[1, 2]]
3.3.6 :004 &amp;gt; [1,2].combination(1).to_a
 =&amp;gt; [[1], [2]]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As you can see, given an array it returns all possible combinations of "n" elements.&lt;/p&gt;

&lt;p&gt;What if those elements are mungers?&lt;/p&gt;

&lt;p&gt;In Ruby, starting in version 2.5 of the language, it's possible to easily &lt;a href="https://medium.com/rubycademy/function-composition-in-ruby-d9ca64f65abb" rel="noopener noreferrer"&gt;compose functions&lt;/a&gt; using the "&lt;a href="https://docs.ruby-lang.org/en/3.3/Proc.html#method-i-3E-3E" rel="noopener noreferrer"&gt;&amp;gt;&amp;gt;&lt;/a&gt;" or "&lt;a href="https://docs.ruby-lang.org/en/3.3/Proc.html#method-i-3C-3C" rel="noopener noreferrer"&gt;&amp;lt;&amp;lt;&lt;/a&gt;" operator.&lt;/p&gt;

&lt;p&gt;As an aside, I'm going to assume that my mungers can be called in any order, meaning we only need to worry about "combinations" and not "permutations".  In reality order does matter, but it doesn't matter enough for me to bother with it.&lt;/p&gt;

&lt;p&gt;Here are our steps so far:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;For the spreadsheet: come up with a list of all expected file paths given the country, album name, track number, and composition title.&lt;/li&gt;
&lt;li&gt;For the file paths: remove the genre if applicable, apply the general mungers to the country and album name, then run all mungers against the filename.  Reassemble the paths.&lt;/li&gt;
&lt;li&gt;Run the cartesian product of expected file paths and file paths.  Since there will be multiple items for each spreadsheet line and each file path, we'll again run a cartesian product and take only the closest match.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We will create an array that has these elements:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;spreadsheet line&lt;/li&gt;
&lt;li&gt;full file path&lt;/li&gt;
&lt;li&gt;Jaro-Winkler score&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Scores below a certain threshold will have to be ignored.  We'll figure that out later.&lt;/p&gt;

&lt;h2&gt;
  
  
  The General Algorithm
&lt;/h2&gt;

&lt;p&gt;So, let's talk about the spreadsheet lines.  Here is the data for the file paths that we've referenced above:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;country&lt;/th&gt;
&lt;th&gt;album&lt;/th&gt;
&lt;th&gt;track_number&lt;/th&gt;
&lt;th&gt;title&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Canada&lt;/td&gt;
&lt;td&gt;ZZZ 001_Very Nice Music Vol. 1&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;Fighter&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mexico&lt;/td&gt;
&lt;td&gt;YYY 001_Rock Vol. 1&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;Love You Much&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Here are all of our combinations of these data items:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;title - "%s"&lt;/li&gt;
&lt;li&gt;The album plus the title - "%s - %s"&lt;/li&gt;
&lt;li&gt;The album, track_number, and title - "%s - %d - %s"&lt;/li&gt;
&lt;li&gt;The track_number and title - "%d - %s"&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I can ignore the duplicated track number as the mungers will handle that.  Let's see how this works.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;downcase_munger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;lambda&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;downcase&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;alphanumeric_filter_munger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;lambda&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gsub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/\P{Alnum}/&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;' '&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;spaces_munger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;lambda&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gsub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/ +/&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;' '&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;general_munger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;downcase_munger&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;alphanumeric_filter_munger&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;spaces_munger&lt;/span&gt;

&lt;span class="n"&gt;filename_patterns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="s2"&gt;"%&amp;lt;title&amp;gt;s"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="s2"&gt;"%&amp;lt;album&amp;gt;s - %&amp;lt;title&amp;gt;s"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="s2"&gt;"%&amp;lt;album&amp;gt;s - %&amp;lt;track_number&amp;gt;d - %&amp;lt;title&amp;gt;s"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="s2"&gt;"%&amp;lt;track_number&amp;gt;d - %&amp;lt;title&amp;gt;s"&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="c1"&gt;# Note that the keys need to be symbols&lt;/span&gt;
&lt;span class="n"&gt;csv_rows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="ss"&gt;country: &lt;/span&gt;&lt;span class="s2"&gt;"Canada"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;album: &lt;/span&gt;&lt;span class="s2"&gt;"ZZZ 001_Very Nice Music Vol. 1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="ss"&gt;track_number: &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;title: &lt;/span&gt;&lt;span class="s2"&gt;"Fighter"&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="ss"&gt;country: &lt;/span&gt;&lt;span class="s2"&gt;"Mexico"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;album: &lt;/span&gt;&lt;span class="s2"&gt;"YYY 001_Rock Vol. 1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="ss"&gt;track_number: &lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;title: &lt;/span&gt;&lt;span class="s2"&gt;"Love You Much"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With all of that in place, we can get a group of possible (expected) filenames for each csv row:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;add_possible_filenames&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;lambda&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;csv_rows&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
  &lt;span class="n"&gt;csv_rows&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;each&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;csv_row&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
    &lt;span class="n"&gt;country&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;general_munger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt; &lt;span class="n"&gt;csv_row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:country&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;album&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;general_munger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt; &lt;span class="n"&gt;csv_row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:album&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;path_prefix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;country&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;album&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/"&lt;/span&gt;
    &lt;span class="n"&gt;csv_row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:possible_filenames&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;filename_patterns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
      &lt;span class="n"&gt;general_munger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt; &lt;span class="nb"&gt;sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;csv_row&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;path_prefix&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;3.3.6 :067 &amp;gt; add_possible_filenames.call csv_row
 =&amp;gt;
[{:country=&amp;gt;"Canada",
  :album=&amp;gt;"001_Very Nice Music Vol. 1",
  :track_number=&amp;gt;10,
  :title=&amp;gt;"Fighter",
  :possible_filenames=&amp;gt;
   ["canada/zzz 001 very nice music vol 1/fighter",
    "canada/zzz 001 very nice music vol 1/zzz 001 very nice music vol 1 fighter",
    "canada/zzz 001 very nice music vol 1/zzz 001 very nice music vol 1 10 fighter",
    "canada/zzz 001 very nice music vol 1/10 fighter"]},
 {:country=&amp;gt;"Mexico",
  :album=&amp;gt;"YYY 001_Rock Vol. 1",
  :track_number=&amp;gt;9,
  :title=&amp;gt;"Love You Much",
  :possible_filenames=&amp;gt;
   ["mexico/yyy 001 rock vol 1/love you much",
    "mexico/yyy 001 rock vol 1/yyy 001 rock vol 1 love you much",
    "mexico/yyy 001 rock vol 1/yyy 001 rock vol 1 9 love you much",
    "mexico/yyy 001 rock vol 1/9 love you much"]}]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I placed the possible filenames in the csv_rows under "possible_filenames", and they're already generally munged.&lt;/p&gt;

&lt;p&gt;Now, let's look at the filepaths:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;filename_list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="s2"&gt;"Mexico/Rock/YYY 001_Rock Vol. 1/YYY 001_Rock_9_9_ Love You Much.mp3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="s2"&gt;"Canada/ZZZ 001_Very Nice Music Vol. 1/ZZZ 001_Very Nice Music Vol. 1 - 10 - 10_Fighter.mp3.mp3"&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;filename_mungers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="n"&gt;remove_vol_munger&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;remove_dup_track_number_munger&lt;/span&gt; &lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;apply_filename_mungers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;lambda&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
  &lt;span class="n"&gt;munged_filenames&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt; &lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
  &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filename_mungers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;length&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
    &lt;span class="n"&gt;filename_mungers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;combination&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;to_a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;munger_combo&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
      &lt;span class="n"&gt;munger_combo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;inject&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;:&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flatten&lt;/span&gt;
  &lt;span class="n"&gt;munged_filenames&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;spaces_munger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="p"&gt;}.&lt;/span&gt;&lt;span class="nf"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uniq&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="n"&gt;create_filename_lists&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;lambda&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;filename_list&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
  &lt;span class="n"&gt;filename_list&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
    &lt;span class="n"&gt;path_pieces&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"/"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;path_pieces&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;delete_at&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;path_pieces&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;size&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;
    &lt;span class="n"&gt;country&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;general_munger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt; &lt;span class="n"&gt;path_pieces&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;album&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;general_munger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt; &lt;span class="n"&gt;path_pieces&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;filename&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;remove_ext_munger&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;general_munger&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt; &lt;span class="n"&gt;path_pieces&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;filenames&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;apply_filename_mungers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filenames&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;country&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;album&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, to apply this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;3.3.6 :075 &amp;gt; file_path_variations = create_filename_lists.call filename_list
 =&amp;gt;
[["Mexico/Rock/YYY 001_Rock Vol. 1/YYY 001_Rock_9_9_ Love You Much.mp3",
  ["mexico/yyy 001 rock vol 1/yyy 001 rock 9 9 love you much",
   "mexico/yyy 001 rock vol 1/yyy 001 rock 9 love you much"]],
 ["Canada/ZZZ 001_Very Nice Music Vol. 1/ZZZ 001_Very Nice Music Vol. 1 - 10 - 10_Fighter.mp3.mp3",
  ["canada/zzz 001 very nice music vol 1/zzz 001 very nice music 10 10 fighter",
   "canada/zzz 001 very nice music vol 1/zzz 001 very nice music 10 fighter",
   "canada/zzz 001 very nice music vol 1/zzz 001 very nice music vol 1 10 10 fighter",
   "canada/zzz 001 very nice music vol 1/zzz 001 very nice music vol 1 10 fighter"]]]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here we have an array where the first element is the file path on disk and the second element is a subarray with a few different variations, based on the mungers.  Again, there may be any number of mungers depending on the data.&lt;/p&gt;

&lt;p&gt;At this point, we want to combine everything.  For each spreadsheet row and each file path we want to get the Jaro-Winkler distance between all variations and capture the closest (highest, in our case).&lt;/p&gt;

&lt;p&gt;If you're using a language other than Ruby or JavaScript you might want to have a slightly different data structure than what I'm doing here.  These languages use references, meaning 1) data isn't copied and 2) equality can be guaranteed by checking the actual object reference.&lt;/p&gt;

&lt;p&gt;However, note that this will grow exponentially as the data set grows, so it works great for hundreds and maybe thousands of records.  Beyond that you'll likely need to start working on optimization.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;compute_cartesian_product_array&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;lambda&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;csv_rows&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;file_path_variations&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
  &lt;span class="n"&gt;csv_rows&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;csv_row&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
    &lt;span class="n"&gt;file_path_variations&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;variations&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
      &lt;span class="n"&gt;closest&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;csv_row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:possible_filenames&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;possible_filename&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
        &lt;span class="n"&gt;variations&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;variation&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
          &lt;span class="n"&gt;fuzzy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getDistance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;possible_filename&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;variation&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;end&lt;/span&gt;
      &lt;span class="k"&gt;end&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flatten&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;
      &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;csv_row&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;closest&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flatten&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Just with two items we have a four element array:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="p"&gt;[[{&lt;/span&gt;&lt;span class="ss"&gt;:country&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"Canada"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="ss"&gt;:album&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"ZZZ 001_Very Nice Music Vol. 1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="ss"&gt;:track_number&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="ss"&gt;:title&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"Fighter"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="ss"&gt;:possible_filenames&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"canada/zzz 001 very nice music vol 1/fighter"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="s2"&gt;"canada/zzz 001 very nice music vol 1/zzz 001 very nice music vol 1 fighter"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="s2"&gt;"canada/zzz 001 very nice music vol 1/zzz 001 very nice music vol 1 10 fighter"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="s2"&gt;"canada/zzz 001 very nice music vol 1/10 fighter"&lt;/span&gt;&lt;span class="p"&gt;]},&lt;/span&gt;
  &lt;span class="s2"&gt;"Mexico/Rock/YYY 001_Rock Vol. 1/YYY 001_Rock_9_9_ Love You Much.mp3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="mf"&gt;0.6296653796653796&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
 &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="ss"&gt;:country&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"Canada"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="ss"&gt;:album&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"ZZZ 001_Very Nice Music Vol. 1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="ss"&gt;:track_number&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="ss"&gt;:title&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"Fighter"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="ss"&gt;:possible_filenames&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"canada/zzz 001 very nice music vol 1/fighter"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="s2"&gt;"canada/zzz 001 very nice music vol 1/zzz 001 very nice music vol 1 fighter"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="s2"&gt;"canada/zzz 001 very nice music vol 1/zzz 001 very nice music vol 1 10 fighter"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="s2"&gt;"canada/zzz 001 very nice music vol 1/10 fighter"&lt;/span&gt;&lt;span class="p"&gt;]},&lt;/span&gt;
  &lt;span class="s2"&gt;"Canada/ZZZ 001_Very Nice Music Vol. 1/ZZZ 001_Very Nice Music Vol. 1 - 10 - 10_Fighter.mp3.mp3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
 &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="ss"&gt;:country&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"Mexico"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="ss"&gt;:album&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"YYY 001_Rock Vol. 1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="ss"&gt;:track_number&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="ss"&gt;:title&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"Love You Much"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="ss"&gt;:possible_filenames&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"mexico/yyy 001 rock vol 1/love you much"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="s2"&gt;"mexico/yyy 001 rock vol 1/yyy 001 rock vol 1 love you much"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="s2"&gt;"mexico/yyy 001 rock vol 1/yyy 001 rock vol 1 9 love you much"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="s2"&gt;"mexico/yyy 001 rock vol 1/9 love you much"&lt;/span&gt;&lt;span class="p"&gt;]},&lt;/span&gt;
  &lt;span class="s2"&gt;"Mexico/Rock/YYY 001_Rock Vol. 1/YYY 001_Rock_9_9_ Love You Much.mp3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="mf"&gt;0.9804808984852584&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
 &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="ss"&gt;:country&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"Mexico"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="ss"&gt;:album&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"YYY 001_Rock Vol. 1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="ss"&gt;:track_number&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="ss"&gt;:title&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"Love You Much"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="ss"&gt;:possible_filenames&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"mexico/yyy 001 rock vol 1/love you much"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="s2"&gt;"mexico/yyy 001 rock vol 1/yyy 001 rock vol 1 love you much"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="s2"&gt;"mexico/yyy 001 rock vol 1/yyy 001 rock vol 1 9 love you much"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="s2"&gt;"mexico/yyy 001 rock vol 1/9 love you much"&lt;/span&gt;&lt;span class="p"&gt;]},&lt;/span&gt;
  &lt;span class="s2"&gt;"Canada/ZZZ 001_Very Nice Music Vol. 1/ZZZ 001_Very Nice Music Vol. 1 - 10 - 10_Fighter.mp3.mp3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="mf"&gt;0.6403194506642782&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, we can process this array to get the likely matches, starting with the most likely and descending.  At each step, we will remove "used" items from the cartesian_product_array - based on both the "csv_row" and the "filepath".  The array will quickly shrink as this action is destructive.  Again, this is unoptimized and there are many obvious optimizations to speed this up.&lt;/p&gt;

&lt;p&gt;As I've written this, everything will be "consumed".  What that means is that when we finish whichever set is smaller (csv rows or file paths) will have a match for every element.  Some of those matches may be incorrect simply because there was no match and it grabbed the closest.  You can see above that absolutely wrong items still scored above 0.6.  We probably need to program in a cutoff of .8 or .9, which we can work on next.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Naive Algorithm
&lt;/h2&gt;

&lt;p&gt;Let's do the naive algorithm first.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;cartesian_product_array&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort_by!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="ss"&gt;:last&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will put the most likely matches at the end.  We can grab the last element in this array and assume it's a match.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;3.3.6 :313 &amp;gt; cartesian_product_array.last
 =&amp;gt;
[{:country=&amp;gt;"Canada",
  :album=&amp;gt;"ZZZ 001_Very Nice Music Vol. 1",
  :track_number=&amp;gt;10,
  :title=&amp;gt;"Fighter",
  :possible_filenames=&amp;gt;
   ["canada/zzz 001 very nice music vol 1/fighter",
    "canada/zzz 001 very nice music vol 1/zzz 001 very nice music vol 1 fighter",
    "canada/zzz 001 very nice music vol 1/zzz 001 very nice music vol 1 10 fighter",
    "canada/zzz 001 very nice music vol 1/10 fighter"]},
 "Canada/ZZZ 001_Very Nice Music Vol. 1/ZZZ 001_Very Nice Music Vol. 1 - 10 - 10_Fighter.mp3.mp3",
 1.0]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is a nice match - the score is "1.0" meaning it was an exact match.  We need to remove every instance of this csv_row and every instance of this file path from our array.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;3.3.6 :317 &amp;gt; current_match = cartesian_product_array.last
3.3.6 :318 &amp;gt; cartesian_product_array.reject! { |row|
3.3.6 :319 &amp;gt;   row[0] == current_match[0] || row[1] == current_match[1]
3.3.6 :320 &amp;gt; }
 =&amp;gt;
[[{:country=&amp;gt;"Mexico",
   :album=&amp;gt;"YYY 001_Rock Vol. 1",
   :track_number=&amp;gt;9,
   :title=&amp;gt;"Love You Much",
   :possible_filenames=&amp;gt;
    ["mexico/yyy 001 rock vol 1/love you much",
     "mexico/yyy 001 rock vol 1/yyy 001 rock vol 1 love you much",
     "mexico/yyy 001 rock vol 1/yyy 001 rock vol 1 9 love you much",
     "mexico/yyy 001 rock vol 1/9 love you much"]},
  "Mexico/Rock/YYY 001_Rock Vol. 1/YYY 001_Rock_9_9_ Love You Much.mp3",
  0.9804808984852584]]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That knocked out three lines!  This algorithm continues to speed up as it goes along.  At this point, we save the current_match and do this again, and we'll have both matches.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;3.3.6 :321 &amp;gt; current_match = cartesian_product_array.last
 =&amp;gt;
[{:country=&amp;gt;"Mexico",
...
3.3.6 :322 &amp;gt; cartesian_product_array.reject! { |row|
3.3.6 :323 &amp;gt;   row[0] == current_match[0] || row[1] == current_match[1]
3.3.6 :324 &amp;gt; }
 =&amp;gt; []
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's turn that into a function.  I'll just add the matched file path to the csv_row.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;add_file_path_to_csv_rows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;lambda&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;cartesian_product_array&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
  &lt;span class="n"&gt;cartesian_product_array&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort_by!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="ss"&gt;:last&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;cartesian_product_array&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;any?&lt;/span&gt;
    &lt;span class="n"&gt;current_match&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cartesian_product_array&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;last&lt;/span&gt;
    &lt;span class="n"&gt;cartesian_product_array&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reject!&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
      &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;current_match&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;current_match&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;current_match&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="ss"&gt;:matched_file_path&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;current_match&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run that, and it'll add the "matched_file_path" to the csv_rows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="ss"&gt;:country&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"Canada"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:album&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"ZZZ 001_Very Nice Music Vol. 1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:track_number&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:title&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"Fighter"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:possible_filenames&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
   &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"canada/zzz 001 very nice music vol 1/fighter"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;"canada/zzz 001 very nice music vol 1/zzz 001 very nice music vol 1 fighter"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;"canada/zzz 001 very nice music vol 1/zzz 001 very nice music vol 1 10 fighter"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;"canada/zzz 001 very nice music vol 1/10 fighter"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="ss"&gt;:matched_file_path&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
   &lt;span class="s2"&gt;"Canada/ZZZ 001_Very Nice Music Vol. 1/ZZZ 001_Very Nice Music Vol. 1 - 10 - 10_Fighter.mp3.mp3"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
 &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="ss"&gt;:country&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"Mexico"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:album&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"YYY 001_Rock Vol. 1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:track_number&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:title&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"Love You Much"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:possible_filenames&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
   &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"mexico/yyy 001 rock vol 1/love you much"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;"mexico/yyy 001 rock vol 1/yyy 001 rock vol 1 love you much"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;"mexico/yyy 001 rock vol 1/yyy 001 rock vol 1 9 love you much"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;"mexico/yyy 001 rock vol 1/9 love you much"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="ss"&gt;:matched_file_path&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
   &lt;span class="s2"&gt;"Mexico/Rock/YYY 001_Rock Vol. 1/YYY 001_Rock_9_9_ Love You Much.mp3"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  More Complicated Data
&lt;/h2&gt;

&lt;p&gt;Okay, let's be frank.  That example is a little sterile given that we knew up front that our two files would match the two lines in the sheet.  Real life is never so nice.  I mean, it just isn't.  So given that, how can we make this work when stuff doesn't match?  Let's add some crap data and see what happens.&lt;/p&gt;

&lt;p&gt;Let's add an extraneous file path first:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;filename_list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="s2"&gt;"Mexico/Rock/YYY 001_Rock Vol. 1/YYY 001_Rock_9_9_ Love You Much.mp3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="s2"&gt;"Canada/ZZZ 001_Very Nice Music Vol. 1/ZZZ 001_Very Nice Music Vol. 1 - 10 - 10_Fighter.mp3.mp3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="s2"&gt;"USA/AAA 001_I Rock, You Don't/AAA 001_I Rock, You Don't - 1 - I Rock, You Don't.mp3"&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With that, we can compute a new set of file path variations, and I'll also reset my csv_rows.&lt;/p&gt;

&lt;p&gt;Let's get a new cartesian product array:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="p"&gt;[[{&lt;/span&gt;&lt;span class="ss"&gt;:country&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"Canada"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="ss"&gt;:album&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"ZZZ 001_Very Nice Music Vol. 1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="ss"&gt;:track_number&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="ss"&gt;:title&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"Fighter"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="ss"&gt;:possible_filenames&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"canada/zzz 001 very nice music vol 1/fighter"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="s2"&gt;"canada/zzz 001 very nice music vol 1/zzz 001 very nice music vol 1 fighter"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="s2"&gt;"canada/zzz 001 very nice music vol 1/zzz 001 very nice music vol 1 10 fighter"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="s2"&gt;"canada/zzz 001 very nice music vol 1/10 fighter"&lt;/span&gt;&lt;span class="p"&gt;]},&lt;/span&gt;
  &lt;span class="s2"&gt;"Mexico/Rock/YYY 001_Rock Vol. 1/YYY 001_Rock_9_9_ Love You Much.mp3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="mf"&gt;0.6296653796653796&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
 &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="ss"&gt;:country&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"Canada"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="ss"&gt;:album&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"ZZZ 001_Very Nice Music Vol. 1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="ss"&gt;:track_number&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="ss"&gt;:title&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"Fighter"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="ss"&gt;:possible_filenames&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"canada/zzz 001 very nice music vol 1/fighter"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="s2"&gt;"canada/zzz 001 very nice music vol 1/zzz 001 very nice music vol 1 fighter"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="s2"&gt;"canada/zzz 001 very nice music vol 1/zzz 001 very nice music vol 1 10 fighter"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="s2"&gt;"canada/zzz 001 very nice music vol 1/10 fighter"&lt;/span&gt;&lt;span class="p"&gt;]},&lt;/span&gt;
  &lt;span class="s2"&gt;"Canada/ZZZ 001_Very Nice Music Vol. 1/ZZZ 001_Very Nice Music Vol. 1 - 10 - 10_Fighter.mp3.mp3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
 &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="ss"&gt;:country&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"Canada"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="ss"&gt;:album&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"ZZZ 001_Very Nice Music Vol. 1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="ss"&gt;:track_number&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="ss"&gt;:title&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"Fighter"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="ss"&gt;:possible_filenames&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"canada/zzz 001 very nice music vol 1/fighter"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="s2"&gt;"canada/zzz 001 very nice music vol 1/zzz 001 very nice music vol 1 fighter"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="s2"&gt;"canada/zzz 001 very nice music vol 1/zzz 001 very nice music vol 1 10 fighter"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="s2"&gt;"canada/zzz 001 very nice music vol 1/10 fighter"&lt;/span&gt;&lt;span class="p"&gt;]},&lt;/span&gt;
  &lt;span class="s2"&gt;"USA/AAA 001_I Rock You Don't/AAA 001_I Rock You Don't - 1 - I Rock You Don't.mp3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="mf"&gt;0.6125451577579237&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
 &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="ss"&gt;:country&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"Mexico"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="ss"&gt;:album&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"YYY 001_Rock Vol. 1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="ss"&gt;:track_number&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="ss"&gt;:title&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"Love You Much"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="ss"&gt;:possible_filenames&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"mexico/yyy 001 rock vol 1/love you much"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="s2"&gt;"mexico/yyy 001 rock vol 1/yyy 001 rock vol 1 love you much"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="s2"&gt;"mexico/yyy 001 rock vol 1/yyy 001 rock vol 1 9 love you much"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="s2"&gt;"mexico/yyy 001 rock vol 1/9 love you much"&lt;/span&gt;&lt;span class="p"&gt;]},&lt;/span&gt;
  &lt;span class="s2"&gt;"Mexico/Rock/YYY 001_Rock Vol. 1/YYY 001_Rock_9_9_ Love You Much.mp3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="mf"&gt;0.9804808984852584&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
 &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="ss"&gt;:country&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"Mexico"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="ss"&gt;:album&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"YYY 001_Rock Vol. 1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="ss"&gt;:track_number&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="ss"&gt;:title&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"Love You Much"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="ss"&gt;:possible_filenames&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"mexico/yyy 001 rock vol 1/love you much"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="s2"&gt;"mexico/yyy 001 rock vol 1/yyy 001 rock vol 1 love you much"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="s2"&gt;"mexico/yyy 001 rock vol 1/yyy 001 rock vol 1 9 love you much"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="s2"&gt;"mexico/yyy 001 rock vol 1/9 love you much"&lt;/span&gt;&lt;span class="p"&gt;]},&lt;/span&gt;
  &lt;span class="s2"&gt;"Canada/ZZZ 001_Very Nice Music Vol. 1/ZZZ 001_Very Nice Music Vol. 1 - 10 - 10_Fighter.mp3.mp3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="mf"&gt;0.6403194506642782&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
 &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="ss"&gt;:country&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"Mexico"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="ss"&gt;:album&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"YYY 001_Rock Vol. 1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="ss"&gt;:track_number&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="ss"&gt;:title&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"Love You Much"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="ss"&gt;:possible_filenames&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"mexico/yyy 001 rock vol 1/love you much"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="s2"&gt;"mexico/yyy 001 rock vol 1/yyy 001 rock vol 1 love you much"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="s2"&gt;"mexico/yyy 001 rock vol 1/yyy 001 rock vol 1 9 love you much"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="s2"&gt;"mexico/yyy 001 rock vol 1/9 love you much"&lt;/span&gt;&lt;span class="p"&gt;]},&lt;/span&gt;
  &lt;span class="s2"&gt;"USA/AAA 001_I Rock You Don't/AAA 001_I Rock You Don't - 1 - I Rock You Don't.mp3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="mf"&gt;0.6289747064137309&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The "I Rock, You Don't" file isn't matching anything.  If I now run &lt;code&gt;add_file_path_to_csv_rows&lt;/code&gt;, I get the same result.  Now, we can find missing file paths:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;find_missing_file_paths&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;lambda&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;csv_rows&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filename_list&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
  &lt;span class="n"&gt;filename_list&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;csv_rows&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:matched_file_path&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}.&lt;/span&gt;&lt;span class="nf"&gt;compact&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;3.3.6 :428 &amp;gt; find_missing_file_paths.call(csv_rows, filename_list)
 =&amp;gt; ["USA/AAA 001_I Rock You Don't/AAA 001_I Rock You Don't - 1 - I Rock You Don't.mp3"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, what happens if we have an extra row in the CSV file?  Let's start out with the same number of rows, but items that clearly don't match.  I'll leave "I Rock, You Don't" in the list of files, and we'll add another line to the CSV file:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;country&lt;/th&gt;
&lt;th&gt;album&lt;/th&gt;
&lt;th&gt;track_number&lt;/th&gt;
&lt;th&gt;title&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Canada&lt;/td&gt;
&lt;td&gt;ZZZ 001_Very Nice Music Vol. 1&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;Fighter&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mexico&lt;/td&gt;
&lt;td&gt;YYY 001_Rock Vol. 1&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;Love You Much&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;UK&lt;/td&gt;
&lt;td&gt;UUU 003_Folk Vol. 3&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Serious Sunday&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The first two will match up, and we'll see "Serious Sunday" get matched with "I Rock, You Don't".  Let's modify the matcher to store the JW distance in csv_rows as well:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;add_file_path_to_csv_rows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;lambda&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;cartesian_product_array&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
  &lt;span class="n"&gt;cartesian_product_array&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort_by!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="ss"&gt;:last&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;cartesian_product_array&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;any?&lt;/span&gt;
    &lt;span class="n"&gt;current_match&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cartesian_product_array&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;last&lt;/span&gt;
    &lt;span class="n"&gt;cartesian_product_array&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reject!&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
      &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;current_match&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;current_match&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;current_match&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="ss"&gt;:matched_file_path&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;current_match&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;current_match&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="ss"&gt;:matched_file_score&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;current_match&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With that, I'll run everything:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;3.3.6 :451 &amp;gt; add_possible_filenames.call csv_rows;
3.3.6 :452 &amp;gt; cartesian_product_array = compute_cartesian_product_array.call(csv_rows, file_path_variations);
3.3.6 :453 &amp;gt; add_file_path_to_csv_rows.call cartesian_product_array;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the output (note that "possible_filenames" have been removed):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="ss"&gt;:country&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"Canada"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:album&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"ZZZ 001_Very Nice Music Vol. 1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:track_number&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:title&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"Fighter"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:matched_file_path&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
   &lt;span class="s2"&gt;"Canada/ZZZ 001_Very Nice Music Vol. 1/ZZZ 001_Very Nice Music Vol. 1 - 10 - 10_Fighter.mp3.mp3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:matched_file_score&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
 &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="ss"&gt;:country&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"Mexico"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:album&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"YYY 001_Rock Vol. 1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:track_number&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:title&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"Love You Much"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:matched_file_path&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
   &lt;span class="s2"&gt;"Mexico/Rock/YYY 001_Rock Vol. 1/YYY 001_Rock_9_9_ Love You Much.mp3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:matched_file_score&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="mf"&gt;0.9804808984852584&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
 &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="ss"&gt;:country&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"UK"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:album&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"UUU 003_Folk Vol. 3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:track_number&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:title&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"Serious Sunday"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:matched_file_path&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
   &lt;span class="s2"&gt;"USA/AAA 001_I Rock You Don't/AAA 001_I Rock You Don't - 1 - I Rock You Don't.mp3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:matched_file_score&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="mf"&gt;0.5716374269005847&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So, you can see that "I Rock, You Don't" and "Serious Sunday" matched, but the score is 0.57.  We can get around this by ignoring low scores, we just have to define what "low" is.  Ultimately, that's something to decide with a trial and error approach.  For now, I think 0.9 is reasonable.&lt;/p&gt;

&lt;p&gt;Let's take this one step further and see what happens if we add another CSV row:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;country&lt;/th&gt;
&lt;th&gt;album&lt;/th&gt;
&lt;th&gt;track_number&lt;/th&gt;
&lt;th&gt;title&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Canada&lt;/td&gt;
&lt;td&gt;ZZZ 001_Very Nice Music Vol. 1&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;Fighter&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mexico&lt;/td&gt;
&lt;td&gt;YYY 001_Rock Vol. 1&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;Love You Much&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;UK&lt;/td&gt;
&lt;td&gt;UUU 003_Folk Vol. 3&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Serious Sunday&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;USA&lt;/td&gt;
&lt;td&gt;AAA 002_Country&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Let's Go Out&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;At this point, we now have two items in the CSV file and one file which are orphans.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="ss"&gt;:country&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"Canada"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:album&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"ZZZ 001_Very Nice Music Vol. 1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:track_number&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:title&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"Fighter"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:matched_file_path&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
   &lt;span class="s2"&gt;"Canada/ZZZ 001_Very Nice Music Vol. 1/ZZZ 001_Very Nice Music Vol. 1 - 10 - 10_Fighter.mp3.mp3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:matched_file_score&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
 &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="ss"&gt;:country&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"Mexico"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:album&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"YYY 001_Rock Vol. 1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:track_number&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:title&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"Love You Much"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:matched_file_path&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
   &lt;span class="s2"&gt;"Mexico/Rock/YYY 001_Rock Vol. 1/YYY 001_Rock_9_9_ Love You Much.mp3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:matched_file_score&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="mf"&gt;0.9804808984852584&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
 &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="ss"&gt;:country&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"UK"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:album&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"UUU 003_Folk Vol. 3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:track_number&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:title&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"Serious Sunday"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
 &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="ss"&gt;:country&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"USA"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:album&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"AAA 002_Country"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:track_number&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:title&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"Let's Go Out"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:matched_file_path&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
   &lt;span class="s2"&gt;"USA/AAA 001_I Rock You Don't/AAA 001_I Rock You Don't - 1 - I Rock You Don't.mp3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:matched_file_score&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="mf"&gt;0.6940492321589883&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can see that "Serious Sunday" didn't get matched, and "Let's Go Out" got matched to "I Rock, You Don't" with a score of 0.69.&lt;/p&gt;

&lt;h2&gt;
  
  
  A More Sophisticated Algorithm
&lt;/h2&gt;

&lt;p&gt;We have a couple of places to cut out the low scores, but earlier in the process is better since there's no reason to even drag those around.  I will note here that if you have every reason to believe that there will be 1:1 matching you might want to leave the low scores in.  But for what we're doing here - we won't miss them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;compute_cartesian_product_array&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;lambda&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;csv_rows&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;file_path_variations&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
  &lt;span class="n"&gt;csv_rows&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;csv_row&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
    &lt;span class="n"&gt;file_path_variations&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;variations&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
      &lt;span class="n"&gt;closest&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;csv_row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:possible_filenames&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;possible_filename&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
        &lt;span class="n"&gt;variations&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;variation&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
          &lt;span class="n"&gt;fuzzy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getDistance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;possible_filename&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;variation&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;end&lt;/span&gt;
      &lt;span class="k"&gt;end&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flatten&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;
      &lt;span class="n"&gt;closest&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.9&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;csv_row&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;closest&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kp"&gt;nil&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compact&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flatten&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="ss"&gt;:country&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"Canada"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:album&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"ZZZ 001_Very Nice Music Vol. 1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:track_number&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:title&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"Fighter"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:matched_file_path&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
   &lt;span class="s2"&gt;"Canada/ZZZ 001_Very Nice Music Vol. 1/ZZZ 001_Very Nice Music Vol. 1 - 10 - 10_Fighter.mp3.mp3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:matched_file_score&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
 &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="ss"&gt;:country&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"Mexico"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:album&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"YYY 001_Rock Vol. 1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:track_number&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:title&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"Love You Much"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:matched_file_path&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
   &lt;span class="s2"&gt;"Mexico/Rock/YYY 001_Rock Vol. 1/YYY 001_Rock_9_9_ Love You Much.mp3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:matched_file_score&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="mf"&gt;0.9804808984852584&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
 &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="ss"&gt;:country&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"UK"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:album&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"UUU 003_Folk Vol. 3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:track_number&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:title&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"Serious Sunday"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
 &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="ss"&gt;:country&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"USA"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:album&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"AAA 002_Country"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:track_number&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;:title&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="s2"&gt;"Let's Go Out"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And, of course, we now have a missing file path (really, a missing CSV entry):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"USA/AAA 001_I Rock You Don't/AAA 001_I Rock You Don't - 1 - I Rock You Don't.mp3"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Wrap Up
&lt;/h2&gt;

&lt;p&gt;There are a lot of options of how to handle this in your UI/UX.  I like to show the matches and make anything that's of lower quality red to draw attention to it.  It's often the case that I'll do this with a DropBox or Box directory, so the user has to find the directory and then be shown a matching list of files.  If things don't match properly, they can rename them, add the missing files, whatever is appropriate, and then try matching again.&lt;/p&gt;

&lt;p&gt;While this algorithm time increases exponentially as data is added, cutting off low-quality matches up front eases a lot of that pain.  It still should be fine for even thousands of rows as long as you have enough memory.&lt;/p&gt;

&lt;p&gt;Feel free to ask questions below.&lt;/p&gt;

&lt;h2&gt;
  
  
  Addendum
&lt;/h2&gt;

&lt;p&gt;I wanted to make an addendum on this since I didn't really go into detail about why I do the matching in this manner.  I'm doing a cartesian product of two sets, getting the distance, and then matching individually starting at the best matches and working my way down.&lt;/p&gt;

&lt;p&gt;Why not just determine what's "close enough" and assume those are matches?&lt;/p&gt;

&lt;p&gt;A lot of the data that I work with has really close names.  Even the data set referenced above has this issue, but some other data sets even more so.&lt;/p&gt;

&lt;p&gt;It's common for me to get a set of files with names like these (note that this is real data except for the title):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;I Rock, You Don't - full.wav
I Rock, You Don't - full no drums.wav
I Rock, You Don't - under.wav
I Rock, You Don't - under no drums.wav
I Rock, You Don't - bass stem.wav
I Rock, You Don't - brass stem.wav
I Rock, You Don't - drums stem.wav
I Rock, You Don't - fx stem.wav
I Rock, You Don't - keys stem.wav
I Rock, You Don't - strings stem.wav
I Rock, You Don't - synth drones stem.wav
I Rock, You Don't - synth pulse stem.wav
I Rock, You Don't - sting.wav
I Rock, You Don't - 30.wav
I Rock, You Don't - 60.wav
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I'll direct your attention to the "bass" and "brass" stems.  Check this out:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;3.3.6 :003 &amp;gt; fuzzy.getDistance("I Rock, You Don't - bass stem.wav", "I Rock, You Don't - brass stem.wav")
 =&amp;gt; 0.9962514417531718
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Those are &lt;em&gt;really&lt;/em&gt; close, and if you're doing a really simple "take anything over .95" type algorithm you'll have problems.&lt;/p&gt;

&lt;p&gt;I also can't just take exact matches, because it's so common with these filenames to have "drum stem" instead of "drums stem", etc.  Plus typos.&lt;/p&gt;

&lt;p&gt;The idea is to find the closest match first, and then mark those items on both sides as "matched" and remove them from the pool.  By doing that, it's extremely unlikely to have issues such as "bass stem" and "brass stem" getting mixed up.&lt;/p&gt;

</description>
      <category>ruby</category>
      <category>devlopment</category>
      <category>data</category>
    </item>
    <item>
      <title>Replacing Cron Jobs With Active Jobs in Rails</title>
      <dc:creator>Michael Chaney</dc:creator>
      <pubDate>Wed, 18 Dec 2024 22:11:01 +0000</pubDate>
      <link>https://dev.to/mdchaney/replacing-cron-jobs-with-active-jobs-in-rails-3h4i</link>
      <guid>https://dev.to/mdchaney/replacing-cron-jobs-with-active-jobs-in-rails-3h4i</guid>
      <description>&lt;p&gt;For years - decades, actually - I've been using the rails runner to handle recurring jobs via cron.  This works, and the output of the job is emailed to me so I can keep track of what's going on.&lt;/p&gt;

&lt;p&gt;But with Rails 8 out and Solid Queue being a contender, it's time to change things up a bit and move the cron jobs to Solid Queue as recurring jobs.  But, I don't want to lose the ability to view the output easily.  I can capture the output and email it to myself.&lt;/p&gt;

&lt;p&gt;Hmm, even better, I can capture the output and keep a log.  And with AI tools I can create an agent to watch those logs for me and alert me to something that looks like trouble.&lt;/p&gt;

&lt;p&gt;This is going to rock.&lt;/p&gt;

&lt;p&gt;This particular project has a rather robust crontab with 16 entries.  most of them look something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# new autocomplete stuff at 7:02AM each morning
2 7 * * *   /bin/bash -l -c 'rvm 3.2.3 --quiet &amp;amp;&amp;amp; cd $APP_HOME &amp;amp;&amp;amp; bundle exec rails runner batch/autocomplete_load.rb'

# Daily CWR ack processing
57 15 * * * /bin/bash -l -c 'rvm 3.2.3 --quiet &amp;amp;&amp;amp; cd $APP_HOME &amp;amp;&amp;amp; bundle exec rails runner CwrDestination.download_and_process_all_ack_files'

# Weekly batch scripts
15 0 * * 1  /bin/bash -l -c 'rvm 3.2.3 --quiet &amp;amp;&amp;amp; cd $APP_HOME &amp;amp;&amp;amp; bundle exec rails runner Track.make_renewals'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As you can see, these aren't implemented as ActiveJobs.  Instead, most are class methods on my models which implement effectively procedural job code, while others with no obvious home are simply scripts.  I can move them to jobs, and likely will at some point.  But, for now, I can make these work in Solid Queue as recurring jobs, anyway.  The script will be moved to a Job, but the others will simply run as "commands".&lt;/p&gt;

&lt;h3&gt;
  
  
  Getting Started With Solid Queue
&lt;/h3&gt;

&lt;p&gt;We have to start by adding the &lt;a href="https://github.com/rails/solid_queue" rel="noopener noreferrer"&gt;Solid Queue&lt;/a&gt; gem to our bundle, installing it, then running the installer to set up the initial config files for our application:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;bundle add solid_queue
bin/rails solid_queue:install
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are two config files generated - &lt;code&gt;config/queue.yml&lt;/code&gt; and &lt;code&gt;config/recurring.yml&lt;/code&gt;. Additionally, &lt;code&gt;db/queue_schema.rb&lt;/code&gt; is generated, along with the &lt;code&gt;bin/jobs&lt;/code&gt; script.&lt;/p&gt;

&lt;p&gt;(See the &lt;a href="https://github.com/rails/solid_queue/blob/main/lib/generators/solid_queue/install/install_generator.rb" rel="noopener noreferrer"&gt;install_generator.rb&lt;/a&gt; file for more information)&lt;/p&gt;

&lt;p&gt;It also changes your &lt;code&gt;config/environments/production.rb&lt;/code&gt; file in two ways. First, it sets &lt;code&gt;config.active_job.queue_adapter&lt;/code&gt; to ":solid_queue", and it adds this line as well:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;  &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;solid_queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connects_to&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="ss"&gt;database: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="ss"&gt;writing: :queue&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That line assumes that Solid Queue will use a separate database referenced in your &lt;code&gt;config/databases.yml&lt;/code&gt; as "queue".  The Solid Queue docs show how to modify your file to accommodate this.&lt;/p&gt;

&lt;p&gt;I'm using Postgres and it's all on one server, so setting up a separate database doesn't make sense.  I simply commented that line out, causing Solid Queue to use the primary database.&lt;/p&gt;

&lt;h3&gt;
  
  
  An Aside - The Issue When Using SQL Schema Format
&lt;/h3&gt;

&lt;p&gt;As an aside, there's an issue that will arise here if you're going to have a separate database for Solid Queue and you're using the &lt;code&gt;sql&lt;/code&gt; schema format.  Solid Queue creates the "queue_schema.rb", which is the reference for the database in the &lt;code&gt;config/database.yml&lt;/code&gt; file ("queue") plus "_schema.rb".&lt;/p&gt;

&lt;p&gt;As the example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;production&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;primary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;*default&lt;/span&gt;
    &lt;span class="na"&gt;database&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;storage/production.sqlite3&lt;/span&gt;
  &lt;span class="na"&gt;queue&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;*default&lt;/span&gt;
    &lt;span class="na"&gt;database&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;storage/production_queue.sqlite3&lt;/span&gt;
    &lt;span class="na"&gt;migrations_paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;db/queue_migrate&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I'm taking this aside to warn you that if you are using the &lt;code&gt;sql&lt;/code&gt; schema format the &lt;code&gt;queue_schema.rb&lt;/code&gt; file will be ignored when you try to set up the queue database.  In that case, you need to turn it into a migration and let the &lt;code&gt;db:setup&lt;/code&gt; task create the SQL schema file for you.&lt;/p&gt;

&lt;h3&gt;
  
  
  Back To Setup
&lt;/h3&gt;

&lt;p&gt;Since I'm using the same database, I won't have a separate schema file and migration directory for Solid Queue.  So, I created a migration called &lt;code&gt;add_solid_queue&lt;/code&gt;, and for its &lt;code&gt;change&lt;/code&gt; method I added all the code from the &lt;code&gt;db/queue_schema.rb&lt;/code&gt; file, which I then removed.&lt;/p&gt;

&lt;p&gt;At this point, I just have to run &lt;code&gt;bin/rails db:migrate&lt;/code&gt; to create the Solid Queue tables.&lt;/p&gt;

&lt;h3&gt;
  
  
  The queue.yml Config File
&lt;/h3&gt;

&lt;p&gt;The runner allows for a very robust configuration, but I have a fairly simple application.  I need three workers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A general worker for sending email and doing quick jobs in the "default" queue&lt;/li&gt;
&lt;li&gt;A worker for handling audio encoding and importing - I just want to handle one of these at a time&lt;/li&gt;
&lt;li&gt;A worker to handle the old crontab stuff - again, one at a time&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;My polling interval is set to 1 second for most of these as there's no reason to beat on the database for stuff that's going to take more time to run.&lt;/p&gt;

&lt;p&gt;Because the &lt;code&gt;config/queue.yml&lt;/code&gt; and &lt;code&gt;config/recurring.yml&lt;/code&gt; files are sectioned by env, I add them to the git repo.  Not necessary for what I'm doing, but makes life easier.&lt;/p&gt;

&lt;p&gt;Here's my &lt;code&gt;queue.yml&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;default&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nl"&gt;&amp;amp;default&lt;/span&gt;
  &lt;span class="na"&gt;dispatchers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;polling_interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
      &lt;span class="na"&gt;batch_size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;500&lt;/span&gt;
  &lt;span class="na"&gt;workers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;queues&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;*"&lt;/span&gt;
      &lt;span class="na"&gt;threads&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
      &lt;span class="na"&gt;processes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
      &lt;span class="na"&gt;polling_interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0.1&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;queues&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;encoding"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;import"&lt;/span&gt; &lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;threads&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
      &lt;span class="na"&gt;processes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
      &lt;span class="na"&gt;polling_interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;queues&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cron"&lt;/span&gt; &lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;threads&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
      &lt;span class="na"&gt;processes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
      &lt;span class="na"&gt;polling_interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;

&lt;span class="na"&gt;development&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;*default&lt;/span&gt;

&lt;span class="na"&gt;test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;*default&lt;/span&gt;

&lt;span class="na"&gt;production&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;*default&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is probably not useful for you, but I wanted you to see what I've done.&lt;/p&gt;

&lt;h3&gt;
  
  
  The recurring.yml Config File
&lt;/h3&gt;

&lt;p&gt;Here's where the real fun is: the &lt;code&gt;recurring.yml&lt;/code&gt; file.  Taking the three items shown from the crontab, here's what I have in &lt;code&gt;recurring.yml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;production&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;autocomplete_load&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AutocompleteLoadJob&lt;/span&gt;
    &lt;span class="na"&gt;queue&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cron&lt;/span&gt;
    &lt;span class="na"&gt;schedule&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;every&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;day&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;at&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;7:02am'&lt;/span&gt;
  &lt;span class="na"&gt;cwr_process_ack_files&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CwrDestination.download_and_process_all_ack_files"&lt;/span&gt;
    &lt;span class="na"&gt;queue&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cron&lt;/span&gt;
    &lt;span class="na"&gt;schedule&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;every&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;day&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;at&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;7:10am'&lt;/span&gt;
  &lt;span class="na"&gt;make_renewals&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Track.make_renewals"&lt;/span&gt;
    &lt;span class="na"&gt;queue&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cron&lt;/span&gt;
    &lt;span class="na"&gt;schedule&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;every&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;monday&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;at&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;12:15am'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For some items, the "schedule" is straight from the crontab.  For instance, I have one that runs at 5:15AM on the first day of each quarter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;15 5 1 1,4,7,10 *   /bin/bash ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I just put that crontab day/time string directly in the "schedule" key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;  &lt;span class="na"&gt;quarterly_term_emails&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Track.quarterly_term_emails"&lt;/span&gt;
    &lt;span class="na"&gt;queue&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cron&lt;/span&gt;
    &lt;span class="na"&gt;schedule&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;15&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;5&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;1,4,7,10&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Logging
&lt;/h3&gt;

&lt;p&gt;All of my crontab entries just log to stdout, but that means the output will end up in a log file somewhere.  That's fine, but ultimately I want something like I had before where each one was emailed to me.&lt;/p&gt;

&lt;p&gt;Better than email is to simply log to a database table so I can look back over these later, plus keep track of errors easily.&lt;/p&gt;

&lt;p&gt;To do this, I first create a new database table and associated model called &lt;code&gt;JobLog&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CreateJobLogs&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;ActiveRecord&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Migration&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;8.0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;change&lt;/span&gt;
    &lt;span class="n"&gt;create_table&lt;/span&gt; &lt;span class="ss"&gt;:job_logs&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
      &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt; &lt;span class="ss"&gt;:job_class&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;null: &lt;/span&gt;&lt;span class="kp"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;index: &lt;/span&gt;&lt;span class="kp"&gt;true&lt;/span&gt;
      &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid&lt;/span&gt; &lt;span class="ss"&gt;:job_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;null: &lt;/span&gt;&lt;span class="kp"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;index: &lt;/span&gt;&lt;span class="kp"&gt;true&lt;/span&gt;
      &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt; &lt;span class="ss"&gt;:output&lt;/span&gt;
      &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;datetime&lt;/span&gt; &lt;span class="ss"&gt;:started_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;null: &lt;/span&gt;&lt;span class="kp"&gt;false&lt;/span&gt;
      &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;datetime&lt;/span&gt; &lt;span class="ss"&gt;:finished_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;null: &lt;/span&gt;&lt;span class="kp"&gt;false&lt;/span&gt;
      &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;boolean&lt;/span&gt; &lt;span class="ss"&gt;:success&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;null: &lt;/span&gt;&lt;span class="kp"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;default: &lt;/span&gt;&lt;span class="kp"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;index: &lt;/span&gt;&lt;span class="kp"&gt;true&lt;/span&gt;
      &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;boolean&lt;/span&gt; &lt;span class="ss"&gt;:needs_attention&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;null: &lt;/span&gt;&lt;span class="kp"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;default: &lt;/span&gt;&lt;span class="kp"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;index: &lt;/span&gt;&lt;span class="kp"&gt;true&lt;/span&gt;
      &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt; &lt;span class="ss"&gt;:error_details&lt;/span&gt;

      &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;timestamps&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;

    &lt;span class="n"&gt;add_index&lt;/span&gt; &lt;span class="ss"&gt;:job_logs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:created_at&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And, the model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;JobLog&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;ApplicationRecord&lt;/span&gt;
  &lt;span class="n"&gt;scope&lt;/span&gt; &lt;span class="ss"&gt;:recent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;created_at: :desc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="n"&gt;scope&lt;/span&gt; &lt;span class="ss"&gt;:for_job&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;job_class&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;job_class: &lt;/span&gt;&lt;span class="n"&gt;job_class&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="n"&gt;scope&lt;/span&gt; &lt;span class="ss"&gt;:needs_attention&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;needs_attention: &lt;/span&gt;&lt;span class="kp"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="n"&gt;scope&lt;/span&gt; &lt;span class="ss"&gt;:problematic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"needs_attention = ? OR success = ?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kp"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kp"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;failed?&lt;/span&gt;
    &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;success&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;duration&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;finished_at&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;started_at&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="n"&gt;validates&lt;/span&gt; &lt;span class="ss"&gt;:job_class&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:job_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;presence: &lt;/span&gt;&lt;span class="kp"&gt;true&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are two flags here of interest: &lt;code&gt;success&lt;/code&gt; and &lt;code&gt;needs_attention&lt;/code&gt;.  If the script fails, we can set success to "false" and log the error information.  But it's also possible that the script won't "fail" but will find a problem.  In that case, we can flag this in code so that it'll be called out to the administrator later.  It might make sense to also send an email in these cases, and doing so would be an easy addition.&lt;/p&gt;

&lt;p&gt;Of course, I also created a simple view for my admin dashboard where I can see these as they complete.&lt;/p&gt;

&lt;p&gt;To do all of this, I created a concern which I can include in my jobs.  It lives in &lt;code&gt;app/jobs/concerns/job_logging.rb&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;module&lt;/span&gt; &lt;span class="nn"&gt;JobLogging&lt;/span&gt;
  &lt;span class="kp"&gt;extend&lt;/span&gt; &lt;span class="no"&gt;ActiveSupport&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Concern&lt;/span&gt;

  &lt;span class="n"&gt;included&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="n"&gt;around_perform&lt;/span&gt; &lt;span class="ss"&gt;:capture_job_output&lt;/span&gt;
    &lt;span class="nb"&gt;attr_accessor&lt;/span&gt; &lt;span class="ss"&gt;:current_job_log&lt;/span&gt;
    &lt;span class="nb"&gt;attr_accessor&lt;/span&gt; &lt;span class="ss"&gt;:job_class&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;flag_for_attention&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kp"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;
      &lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;[NEEDS ATTENTION] &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;
    &lt;span class="vi"&gt;@needs_attention&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kp"&gt;true&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;alt_job_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;job_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="vi"&gt;@job_class&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;job_name&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="kp"&gt;private&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;capture_job_output&lt;/span&gt;
    &lt;span class="n"&gt;original_stdout&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="vg"&gt;$stdout&lt;/span&gt;
    &lt;span class="n"&gt;captured_output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;StringIO&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;
    &lt;span class="n"&gt;start_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;current&lt;/span&gt;
    &lt;span class="n"&gt;success&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kp"&gt;true&lt;/span&gt;
    &lt;span class="n"&gt;error_details&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kp"&gt;nil&lt;/span&gt;
    &lt;span class="vi"&gt;@needs_attention&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kp"&gt;false&lt;/span&gt;

    &lt;span class="vg"&gt;$stdout&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;captured_output&lt;/span&gt;
    &lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"Job started at: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;start_time&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;begin&lt;/span&gt;
      &lt;span class="k"&gt;yield&lt;/span&gt;
    &lt;span class="k"&gt;rescue&lt;/span&gt; &lt;span class="no"&gt;StandardError&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;
      &lt;span class="n"&gt;success&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kp"&gt;false&lt;/span&gt;
      &lt;span class="n"&gt;error_details&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;generate_error_details&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="n"&gt;error_details&lt;/span&gt;
      &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;
    &lt;span class="k"&gt;ensure&lt;/span&gt;
      &lt;span class="n"&gt;end_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;current&lt;/span&gt;
      &lt;span class="n"&gt;duration&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;end_time&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start_time&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;Job finished at: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;end_time&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
      &lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"Total duration: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;duration&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; seconds"&lt;/span&gt;

      &lt;span class="vg"&gt;$stdout&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;original_stdout&lt;/span&gt;
      &lt;span class="n"&gt;captured_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;
      &lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;captured_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;

      &lt;span class="no"&gt;Rails&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

      &lt;span class="c1"&gt;# Save to database&lt;/span&gt;
      &lt;span class="nb"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;current_job_log&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;JobLog&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="ss"&gt;job_class: &lt;/span&gt;&lt;span class="vi"&gt;@job_class&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nb"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;class&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="ss"&gt;job_id: &lt;/span&gt;&lt;span class="n"&gt;job_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="ss"&gt;output: &lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="ss"&gt;started_at: &lt;/span&gt;&lt;span class="n"&gt;start_time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="ss"&gt;finished_at: &lt;/span&gt;&lt;span class="n"&gt;end_time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="ss"&gt;success: &lt;/span&gt;&lt;span class="n"&gt;success&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="ss"&gt;error_details: &lt;/span&gt;&lt;span class="n"&gt;error_details&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="ss"&gt;needs_attention: &lt;/span&gt;&lt;span class="vi"&gt;@needs_attention&lt;/span&gt;
      &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_error_details&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;&amp;lt;&amp;lt;-&lt;/span&gt;&lt;span class="no"&gt;ERROR_DETAILS&lt;/span&gt;&lt;span class="sh"&gt;

ERROR DETAILS
============
Error class: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;class&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;
Error message: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;message&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;
Backtrace:
&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;backtrace&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;take&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;
&lt;/span&gt;&lt;span class="no"&gt;    ERROR_DETAILS&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;(note that I simplified this from an earlier version)&lt;/p&gt;

&lt;p&gt;I use the CustomWriter class in here because I want the output to go to the database log as well as the Rails logger.  With this, I can create a superclass called "LoggedJob":&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;LoggedJob&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;ApplicationJob&lt;/span&gt;
  &lt;span class="kp"&gt;include&lt;/span&gt; &lt;span class="no"&gt;JobLogging&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And then simply use it as the superclass for any jobs to be logged:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AutocompleteLoadJob&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;LoggedJob&lt;/span&gt;
  &lt;span class="n"&gt;queue_as&lt;/span&gt; &lt;span class="ss"&gt;:cron&lt;/span&gt;
  &lt;span class="o"&gt;...&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Handling Commands
&lt;/h3&gt;

&lt;p&gt;We still need to handle the recurring jobs that are "commands".  The issue is that they won't be logged to our job_logs table automatically, but it's pretty easy to hook them up.&lt;/p&gt;

&lt;p&gt;First, we need to create a new superclass just for those, and it'll in turn be based on the class already used for recurring commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RecurringLoggedJob&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;SolidQueue&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;RecurringJob&lt;/span&gt;
  &lt;span class="kp"&gt;include&lt;/span&gt; &lt;span class="no"&gt;JobLogging&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;perform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;alt_job_name&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;first&lt;/span&gt;
    &lt;span class="k"&gt;super&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that I'm using &lt;code&gt;alt_job_name&lt;/code&gt; - that was added specifically for these jobs.  Since the job logger uses the job name by default, all of these end up showing up as "RecurringJob" unless we do something different.  In this case, we just take the command that's passed in and log it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Administration
&lt;/h3&gt;

&lt;p&gt;I'm going to copy in the controller and views here.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Admin::JobLogsController&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;Admin&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;ApplicationController&lt;/span&gt;
  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;index&lt;/span&gt;
    &lt;span class="vi"&gt;@job_logs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;JobLog&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;recent&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:job_class&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;present?&lt;/span&gt;
      &lt;span class="vi"&gt;@job_logs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="vi"&gt;@job_logs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;for_job&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:job_class&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;

    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:filter&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;when&lt;/span&gt; &lt;span class="s1"&gt;'attention'&lt;/span&gt;
      &lt;span class="vi"&gt;@job_logs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="vi"&gt;@job_logs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;needs_attention&lt;/span&gt;
    &lt;span class="k"&gt;when&lt;/span&gt; &lt;span class="s1"&gt;'problematic'&lt;/span&gt;
      &lt;span class="vi"&gt;@job_logs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="vi"&gt;@job_logs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;problematic&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;

    &lt;span class="vi"&gt;@job_logs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;add_pagination&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="vi"&gt;@job_logs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;show&lt;/span&gt;
    &lt;span class="vi"&gt;@job_log&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;JobLog&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:id&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight erb"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;h1&amp;gt;&lt;/span&gt;Job Logs&lt;span class="nt"&gt;&amp;lt;/h1&amp;gt;&lt;/span&gt;

&lt;span class="cp"&gt;&amp;lt;%=&lt;/span&gt; &lt;span class="n"&gt;form_tag&lt;/span&gt; &lt;span class="n"&gt;admin_job_logs_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;method: :get&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;class: &lt;/span&gt;&lt;span class="s1"&gt;'mb-4'&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="cp"&gt;%&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;div&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"row"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;div&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"col"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="cp"&gt;&amp;lt;%=&lt;/span&gt; &lt;span class="n"&gt;select_tag&lt;/span&gt; &lt;span class="ss"&gt;:job_class&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                     &lt;span class="n"&gt;options_for_select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;JobLog&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;distinct&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pluck&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;:job_class&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:job_class&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt; 
                     &lt;span class="ss"&gt;prompt: &lt;/span&gt;&lt;span class="s2"&gt;"All Jobs"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                     &lt;span class="ss"&gt;class: &lt;/span&gt;&lt;span class="s1"&gt;'form-select'&lt;/span&gt; &lt;span class="cp"&gt;%&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/div&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;div&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"col"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="cp"&gt;&amp;lt;%=&lt;/span&gt; &lt;span class="n"&gt;select_tag&lt;/span&gt; &lt;span class="ss"&gt;:filter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                     &lt;span class="n"&gt;options_for_select&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
                       &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'All'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                       &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'Needs Attention'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'attention'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                       &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'Problematic (Failed or Needs Attention)'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'problematic'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                     &lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:filter&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
                     &lt;span class="ss"&gt;class: &lt;/span&gt;&lt;span class="s1"&gt;'form-select'&lt;/span&gt; &lt;span class="cp"&gt;%&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/div&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;/div&amp;gt;&lt;/span&gt;
  &lt;span class="cp"&gt;&amp;lt;%=&lt;/span&gt; &lt;span class="n"&gt;submit_tag&lt;/span&gt; &lt;span class="s2"&gt;"Filter"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;class: &lt;/span&gt;&lt;span class="s1"&gt;'btn btn-primary my-2'&lt;/span&gt; &lt;span class="cp"&gt;%&amp;gt;&lt;/span&gt;
&lt;span class="cp"&gt;&amp;lt;%&lt;/span&gt; &lt;span class="k"&gt;end&lt;/span&gt; &lt;span class="cp"&gt;%&amp;gt;&lt;/span&gt;

&lt;span class="nt"&gt;&amp;lt;table&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"table table-striped"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;thead&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;tr&amp;gt;&lt;/span&gt;
      &lt;span class="nt"&gt;&amp;lt;th&amp;gt;&lt;/span&gt;Job&lt;span class="nt"&gt;&amp;lt;/th&amp;gt;&lt;/span&gt;
      &lt;span class="nt"&gt;&amp;lt;th&amp;gt;&lt;/span&gt;Started&lt;span class="nt"&gt;&amp;lt;/th&amp;gt;&lt;/span&gt;
      &lt;span class="nt"&gt;&amp;lt;th&amp;gt;&lt;/span&gt;Duration&lt;span class="nt"&gt;&amp;lt;/th&amp;gt;&lt;/span&gt;
      &lt;span class="nt"&gt;&amp;lt;th&amp;gt;&lt;/span&gt;Status&lt;span class="nt"&gt;&amp;lt;/th&amp;gt;&lt;/span&gt;
      &lt;span class="nt"&gt;&amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/tr&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;/thead&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;tbody&amp;gt;&lt;/span&gt;
    &lt;span class="cp"&gt;&amp;lt;%&lt;/span&gt; &lt;span class="vi"&gt;@job_logs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;each&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="cp"&gt;%&amp;gt;&lt;/span&gt;
      &lt;span class="nt"&gt;&amp;lt;tr&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="cp"&gt;&amp;lt;%=&lt;/span&gt; &lt;span class="s1"&gt;'table-warning'&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;needs_attention&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;success&lt;/span&gt; &lt;span class="cp"&gt;%&amp;gt;&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;td&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;&amp;lt;%=&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;job_class&lt;/span&gt; &lt;span class="cp"&gt;%&amp;gt;&lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;/td&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;td&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;&amp;lt;%=&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;started_at&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'%Y-%m-%d %l:%M:%S%p %Z'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="cp"&gt;%&amp;gt;&lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;/td&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;td&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"text-end"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;&amp;lt;%=&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;duration&lt;/span&gt; &lt;span class="cp"&gt;%&amp;gt;&lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;/td&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;td&amp;gt;&lt;/span&gt;
          &lt;span class="cp"&gt;&amp;lt;%&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;success&lt;/span&gt; &lt;span class="cp"&gt;%&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;span&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"badge bg-danger"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;Failed&lt;span class="nt"&gt;&amp;lt;/span&amp;gt;&lt;/span&gt;
          &lt;span class="cp"&gt;&amp;lt;%&lt;/span&gt; &lt;span class="k"&gt;elsif&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;needs_attention&lt;/span&gt; &lt;span class="cp"&gt;%&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;span&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"badge bg-warning text-dark"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;Needs Attention&lt;span class="nt"&gt;&amp;lt;/span&amp;gt;&lt;/span&gt;
          &lt;span class="cp"&gt;&amp;lt;%&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="cp"&gt;%&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;span&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"badge bg-success"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;Success&lt;span class="nt"&gt;&amp;lt;/span&amp;gt;&lt;/span&gt;
          &lt;span class="cp"&gt;&amp;lt;%&lt;/span&gt; &lt;span class="k"&gt;end&lt;/span&gt; &lt;span class="cp"&gt;%&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;/td&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;td&amp;gt;&lt;/span&gt;
          &lt;span class="cp"&gt;&amp;lt;%=&lt;/span&gt; &lt;span class="n"&gt;link_to&lt;/span&gt; &lt;span class="s2"&gt;"View"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;admin_job_log_path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="ss"&gt;class: &lt;/span&gt;&lt;span class="s1"&gt;'btn btn-sm btn-primary'&lt;/span&gt; &lt;span class="cp"&gt;%&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;/td&amp;gt;&lt;/span&gt;
      &lt;span class="nt"&gt;&amp;lt;/tr&amp;gt;&lt;/span&gt;
    &lt;span class="cp"&gt;&amp;lt;%&lt;/span&gt; &lt;span class="k"&gt;end&lt;/span&gt; &lt;span class="cp"&gt;%&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;/tbody&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/table&amp;gt;&lt;/span&gt;

&lt;span class="cp"&gt;&amp;lt;%=&lt;/span&gt; &lt;span class="n"&gt;will_paginate&lt;/span&gt; &lt;span class="vi"&gt;@job_logs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;renderer: &lt;/span&gt;&lt;span class="no"&gt;WillPaginate&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;ActionView&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Bootstrap4LinkRenderer&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="vi"&gt;@job_logs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;respond_to?&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;:total_pages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="cp"&gt;-%&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight erb"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;div&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"job-log"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;h1&amp;gt;&lt;/span&gt;Job Log: &lt;span class="cp"&gt;&amp;lt;%=&lt;/span&gt; &lt;span class="vi"&gt;@job_log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;job_class&lt;/span&gt; &lt;span class="cp"&gt;%&amp;gt;&lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;/h1&amp;gt;&lt;/span&gt;

  &lt;span class="nt"&gt;&amp;lt;div&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"card mb-4"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;div&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"card-body"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="nt"&gt;&amp;lt;dl&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;dt&amp;gt;&lt;/span&gt;Job ID&lt;span class="nt"&gt;&amp;lt;/dt&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;dd&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;&amp;lt;%=&lt;/span&gt; &lt;span class="vi"&gt;@job_log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;job_id&lt;/span&gt; &lt;span class="cp"&gt;%&amp;gt;&lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;/dd&amp;gt;&lt;/span&gt;

        &lt;span class="nt"&gt;&amp;lt;dt&amp;gt;&lt;/span&gt;Started At&lt;span class="nt"&gt;&amp;lt;/dt&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;dd&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;&amp;lt;%=&lt;/span&gt; &lt;span class="vi"&gt;@job_log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;started_at&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'%Y-%m-%d %l:%M:%S%p %Z'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="cp"&gt;%&amp;gt;&lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;/dd&amp;gt;&lt;/span&gt;

        &lt;span class="nt"&gt;&amp;lt;dt&amp;gt;&lt;/span&gt;Duration&lt;span class="nt"&gt;&amp;lt;/dt&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;dd&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;&amp;lt;%=&lt;/span&gt; &lt;span class="vi"&gt;@job_log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;duration&lt;/span&gt; &lt;span class="cp"&gt;%&amp;gt;&lt;/span&gt; seconds&lt;span class="nt"&gt;&amp;lt;/dd&amp;gt;&lt;/span&gt;

        &lt;span class="nt"&gt;&amp;lt;dt&amp;gt;&lt;/span&gt;Status&lt;span class="nt"&gt;&amp;lt;/dt&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;dd&amp;gt;&lt;/span&gt;
          &lt;span class="cp"&gt;&amp;lt;%&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="vi"&gt;@job_log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;success&lt;/span&gt; &lt;span class="cp"&gt;%&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;span&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"badge bg-success"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;Success&lt;span class="nt"&gt;&amp;lt;/span&amp;gt;&lt;/span&gt;
          &lt;span class="cp"&gt;&amp;lt;%&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="cp"&gt;%&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;span&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"badge bg-danger"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;Failed&lt;span class="nt"&gt;&amp;lt;/span&amp;gt;&lt;/span&gt;
          &lt;span class="cp"&gt;&amp;lt;%&lt;/span&gt; &lt;span class="k"&gt;end&lt;/span&gt; &lt;span class="cp"&gt;%&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;/dd&amp;gt;&lt;/span&gt;
      &lt;span class="nt"&gt;&amp;lt;/dl&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/div&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;/div&amp;gt;&lt;/span&gt;

  &lt;span class="nt"&gt;&amp;lt;div&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"card"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;div&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"card-header"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
      Output
    &lt;span class="nt"&gt;&amp;lt;/div&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;div&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"card-body"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="nt"&gt;&amp;lt;pre&amp;gt;&amp;lt;code&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;&amp;lt;%=&lt;/span&gt; &lt;span class="vi"&gt;@job_log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;output&lt;/span&gt; &lt;span class="cp"&gt;%&amp;gt;&lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/div&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;/div&amp;gt;&lt;/span&gt;

  &lt;span class="cp"&gt;&amp;lt;%&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="vi"&gt;@job_log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error_details&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;present?&lt;/span&gt; &lt;span class="cp"&gt;%&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;div&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"card mt-4"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="nt"&gt;&amp;lt;div&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"card-header text-white bg-danger"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
        Error Details
      &lt;span class="nt"&gt;&amp;lt;/div&amp;gt;&lt;/span&gt;
      &lt;span class="nt"&gt;&amp;lt;div&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"card-body"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;pre&amp;gt;&amp;lt;code&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;&amp;lt;%=&lt;/span&gt; &lt;span class="vi"&gt;@job_log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error_details&lt;/span&gt; &lt;span class="cp"&gt;%&amp;gt;&lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;&lt;/span&gt;
      &lt;span class="nt"&gt;&amp;lt;/div&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/div&amp;gt;&lt;/span&gt;
  &lt;span class="cp"&gt;&amp;lt;%&lt;/span&gt; &lt;span class="k"&gt;end&lt;/span&gt; &lt;span class="cp"&gt;%&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/div&amp;gt;&lt;/span&gt;

&lt;span class="cp"&gt;&amp;lt;%=&lt;/span&gt; &lt;span class="n"&gt;link_to&lt;/span&gt; &lt;span class="s1"&gt;'Back'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;admin_job_logs_path&lt;/span&gt; &lt;span class="cp"&gt;%&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  In Production
&lt;/h3&gt;

&lt;p&gt;You'll also have to make this work in production.  I'm using Capistrano to deploy, and this simple tutorial from &lt;a href="https://www.zolkos.com/" rel="noopener noreferrer"&gt;Rob Zolkos&lt;/a&gt; shows how to set it up:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.zolkos.com/2024/02/21/how-i-deploy-solid-queue-with-capistrano" rel="noopener noreferrer"&gt;https://www.zolkos.com/2024/02/21/how-i-deploy-solid-queue-with-capistrano&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you're using kamal or another deployment method it will be different.&lt;/p&gt;

&lt;h3&gt;
  
  
  Summary
&lt;/h3&gt;

&lt;p&gt;Moving from old style cron to recurring jobs with Solid Queue is a great way to keep your entire application in one place.  I used to have to worry about the crontab when moving servers and such, and I got too many emails every day.  This takes care of them.  I have also created a simple job to check every day and email me any jobs where "success" is false or "needs_attention" is true.&lt;/p&gt;

&lt;p&gt;After I get a few weeks of these I'll set up a simple agent using an LLM that'll view the logs and alert me when something seems off.&lt;/p&gt;

&lt;p&gt;Solid Queue also helped me remove the redis dependency, which is important moving forward.&lt;/p&gt;

&lt;p&gt;Let me know below if you have questions, or reach out on X.&lt;/p&gt;

</description>
      <category>rails</category>
      <category>ruby</category>
    </item>
    <item>
      <title>Moving From auto_strip_attributes to normalizes</title>
      <dc:creator>Michael Chaney</dc:creator>
      <pubDate>Sat, 31 Aug 2024 16:05:44 +0000</pubDate>
      <link>https://dev.to/mdchaney/moving-from-autostripattributes-to-normalizes-dp6</link>
      <guid>https://dev.to/mdchaney/moving-from-autostripattributes-to-normalizes-dp6</guid>
      <description>&lt;p&gt;I ran across the excellent "&lt;a href="https://github.com/holli/auto_strip_attributes" rel="noopener noreferrer"&gt;auto_strip_attributes&lt;/a&gt;" gem years ago, and it's been a staple gem that gets added to my Gemfile right off the bat when making a new project.  I even have a standard initializer that adds some extra functionality to it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="no"&gt;AutoStripAttributes&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setup&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="n"&gt;set_filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;fix_curly_quotes: &lt;/span&gt;&lt;span class="kp"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
    &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;blank?&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;respond_to?&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;:gsub&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gsub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/[\u201c\u201d]/&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'"'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;gsub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/[\u2018\u2019]/&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'\''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
  &lt;span class="n"&gt;set_filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;downcase: &lt;/span&gt;&lt;span class="kp"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
    &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;blank?&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;respond_to?&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;:downcase&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;downcase&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
  &lt;span class="n"&gt;set_filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;unicode_normalize: &lt;/span&gt;&lt;span class="kp"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
    &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;blank?&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;respond_to?&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;:unicode_normalize&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;unicode_normalize&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These should be self-explanatory, but just in case:&lt;/p&gt;

&lt;p&gt;fix_curly_quotes - Changes Unicode curly double quotes “” to a standard ASCII quotation mark ", and changes Unicode curly single quotes ‘’ to a standard ASCII apostrophe.&lt;/p&gt;

&lt;p&gt;downcase - Force text to lowercase.&lt;/p&gt;

&lt;p&gt;unicode_normalize - Normalizes Unicode characters to the "nfc" form.&lt;/p&gt;

&lt;p&gt;This has been an awesome tool in the tool belt for years now.  With the advent of Rails 7.1, we now have "normalizes" built in to ActiveModel.  It takes the place of auto_strip_attributes, but there are some differences that require your attention if you're changing from one to the other.&lt;/p&gt;

&lt;p&gt;Here's a standard setup for auto_strip_attributes that is pulled from my code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Track&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;ApplicationRecord&lt;/span&gt;
  &lt;span class="n"&gt;auto_strip_attributes&lt;/span&gt; &lt;span class="ss"&gt;:title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:original_title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:artist&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;squish: &lt;/span&gt;&lt;span class="kp"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;nullify: &lt;/span&gt;&lt;span class="kp"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;fix_curly_quotes: &lt;/span&gt;&lt;span class="kp"&gt;true&lt;/span&gt;

  &lt;span class="n"&gt;auto_strip_attributes&lt;/span&gt; &lt;span class="ss"&gt;:lyric_notes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;nullify: &lt;/span&gt;&lt;span class="kp"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;fix_curly_quotes: &lt;/span&gt;&lt;span class="kp"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;squish: &lt;/span&gt;&lt;span class="kp"&gt;true&lt;/span&gt;

  &lt;span class="n"&gt;auto_strip_attributes&lt;/span&gt; &lt;span class="ss"&gt;:lyrics&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;nullify: &lt;/span&gt;&lt;span class="kp"&gt;true&lt;/span&gt;
  &lt;span class="o"&gt;...&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is all normal stuff.  In general, you don't want to "squish" multi-line text blocks and if a field can be nullified then you add "nullify: true" to leave it as "null" if there is no value.&lt;/p&gt;

&lt;p&gt;One serious "gotcha" here is that there is no checking for bad filter names.  If you misspell a filter name it will be silently ignored.&lt;/p&gt;

&lt;p&gt;The code for this gem is really simple, and you can view it &lt;a href="https://github.com/holli/auto_strip_attributes/blob/master/lib/auto_strip_attributes.rb" rel="noopener noreferrer"&gt;here&lt;/a&gt;.  It creates a "before_validation" callback to apply the filters right before validation.&lt;/p&gt;

&lt;p&gt;Keep that in mind.&lt;/p&gt;

&lt;p&gt;Starting in Rails 7.1 we have "&lt;a href="https://github.com/rails/rails/blob/5c0b7496ab32c25c80f6d1bdc8b32ec6f75ce1e4/activerecord/lib/active_record/normalization.rb" rel="noopener noreferrer"&gt;normalizes&lt;/a&gt;" as a declaration to make in our models.  It provides the same functionality, but in a different manner.&lt;/p&gt;

&lt;p&gt;From a different project, also with a "Track" model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Track&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;ApplicationRecord&lt;/span&gt;
  &lt;span class="n"&gt;normalizes&lt;/span&gt; &lt;span class="ss"&gt;:title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;with: &lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;unicode_normalize&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="n"&gt;normalizes&lt;/span&gt; &lt;span class="ss"&gt;:description&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;with: &lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;unicode_normalize&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="n"&gt;normalizes&lt;/span&gt; &lt;span class="ss"&gt;:lyrics&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;with: &lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;unicode_normalize&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="n"&gt;normalizes&lt;/span&gt; &lt;span class="ss"&gt;:lyrics_chorus_only&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;with: &lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;unicode_normalize&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="o"&gt;...&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is pretty similar, but there are differences:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;There are no "built-in" normalizers.  You can munge the string with any methods on string or create a lambda/proc that'll do what you need.&lt;/li&gt;
&lt;li&gt;There are no "default" normalizers.  auto_strip_attributes strips by default, "normalizes" has no defaults.&lt;/li&gt;
&lt;li&gt;By default, nils are ignored, so you don't have to worry about getting something other than a string.  However, you can use the "apply_to_nil" option to force your normalizer to be called even with nil values.&lt;/li&gt;
&lt;li&gt;Each attribute has its own declaration.  This is interesting because the model class has a method "normalize_value_for" which accepts an attribute name plus a value, allowing you to "normalize" a random string as if it were the named attribute.&lt;/li&gt;
&lt;li&gt;Big one: the normalizer is applied when the attribute is assigned a value.&lt;/li&gt;
&lt;/ol&gt;

&lt;h1&gt;
  
  
  5 has an interesting side effect which is useful.  Consider this code:
&lt;/h1&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Artist&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;ApplicationRecord&lt;/span&gt;
  &lt;span class="n"&gt;normalizes&lt;/span&gt; &lt;span class="ss"&gt;:name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;with: &lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;name&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nb"&gt;name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;downcase&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;unicode_normalize&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="no"&gt;Artist&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;name: &lt;/span&gt;&lt;span class="s2"&gt;"Beyoncé"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At this point, we have a record where the name is "beyoncé".  What does this code do?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;beyonce&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Artist&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find_or_initialize_by&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;name: &lt;/span&gt;&lt;span class="s2"&gt;" Beyoncé "&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the big difference between the two.  If you're using "auto_strip_attributes" here you'll get a new record.  Using "normalizes" - the existing record will be found instead as it'll run " Beyoncé " through the normalizer before doing the find.&lt;/p&gt;

&lt;p&gt;This isn't a gotcha, though.  It's very desired functionality, as it allows you to perform this function without duplicating somehow the normalization code.&lt;/p&gt;

&lt;p&gt;These are both great ways to accomplish normalizing your attributes, which is a very important aspect of bringing data from outside into your database.  I am moving to "normalizes" for all new code simply because it's one less gem, but I still have lots of code using "auto_strip_attributes".&lt;/p&gt;

</description>
      <category>ruby</category>
      <category>rails</category>
    </item>
    <item>
      <title>Adding A Confirmation Interstitial On Create in REST</title>
      <dc:creator>Michael Chaney</dc:creator>
      <pubDate>Wed, 24 Jul 2024 04:31:23 +0000</pubDate>
      <link>https://dev.to/mdchaney/adding-a-confirmation-interstitial-on-create-in-rest-4an0</link>
      <guid>https://dev.to/mdchaney/adding-a-confirmation-interstitial-on-create-in-rest-4an0</guid>
      <description>&lt;p&gt;On X, Tom Rossi asked about handling a confirmation page on a form while conforming to RESTful routes.&lt;/p&gt;

&lt;p&gt;&lt;iframe class="tweet-embed" id="tweet-1814001018923004250-819" src="https://platform.twitter.com/embed/Tweet.html?id=1814001018923004250"&gt;
&lt;/iframe&gt;

  // Detect dark theme
  var iframe = document.getElementById('tweet-1814001018923004250-819');
  if (document.body.className.includes('dark-theme')) {
    iframe.src = "https://platform.twitter.com/embed/Tweet.html?id=1814001018923004250&amp;amp;theme=dark"
  }



&lt;/p&gt;

&lt;p&gt;This is an excellent question and one which I've answered in a couple of different ways in various projects.  But, I'm a bit of a REST snob these days and I want any solution to be as RESTy as possible.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's UnRESTy About It?
&lt;/h3&gt;

&lt;p&gt;Well, that's a good question.  Let's take the kind of straightforward method of doing this and consider it in the context of REST and Ruby on Rails, with a simple "todo list" application.&lt;/p&gt;

&lt;p&gt;The TodoList application has a single model called "Todo".  Each todo record has a varchar field called "body" and a boolean field called "finished".  Very simple.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CreateTodos&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;ActiveRecord&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Migration&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;7.1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;change&lt;/span&gt;
    &lt;span class="n"&gt;create_table&lt;/span&gt; &lt;span class="ss"&gt;:todos&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
      &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt; &lt;span class="ss"&gt;:body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;limit: &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;null: &lt;/span&gt;&lt;span class="kp"&gt;false&lt;/span&gt;
      &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;boolean&lt;/span&gt; &lt;span class="ss"&gt;:finished&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;null: &lt;/span&gt;&lt;span class="kp"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;default: &lt;/span&gt;&lt;span class="kp"&gt;false&lt;/span&gt;

      &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;timestamps&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The model:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Todo&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;ApplicationRecord&lt;/span&gt;
  &lt;span class="n"&gt;normalizes&lt;/span&gt; &lt;span class="ss"&gt;:body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;with: &lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="n"&gt;validates&lt;/span&gt; &lt;span class="ss"&gt;:body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;presence: &lt;/span&gt;&lt;span class="kp"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;length: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="ss"&gt;maximum: &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;With the standard scaffold, we can now create, list, examine, edit, and delete todos.&lt;/p&gt;

&lt;p&gt;If we hit the route &lt;code&gt;/todos/new&lt;/code&gt;, we'll get a form which we can fill in to create a new item.  When we submit the form - which is a POST request to the &lt;code&gt;/todos&lt;/code&gt; route, the item is checked for validity and created if it's valid.  This is standard REST.&lt;/p&gt;

&lt;p&gt;How do you add a confirmation page in there?&lt;/p&gt;

&lt;p&gt;This is where it gets unRESTy.  Let's consider the flow with the interstitial in place:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;User hits the "new" route, gets the form&lt;/li&gt;
&lt;li&gt;User fills out the form, hits "Save"&lt;/li&gt;
&lt;li&gt;If information doesn't pass validation, the user is shown the form again along with error information so they can correct it.&lt;/li&gt;
&lt;li&gt;If the information passes validation, the user is shown the information again, and given the choice to a) Really Save or b) Make Changes&lt;/li&gt;
&lt;li&gt;If "Make Changes", go back to the form with the information filled in to allow changes&lt;/li&gt;
&lt;li&gt;If "Really Save", save the information to the database&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;One aspect of this to consider is that the user wants to see any validation errors at step 3.  This brings up an issue because we then have to validate the data somehow after step 2 but before it's actually saved.  With RoR, this requires a bit of thought because it's easier to wait until step 6.  But users will complain that your application is moronic if they don't see errors until that late in the game.  I might agree with them.&lt;/p&gt;

&lt;p&gt;There are multiple ways to handle this.  I have created a simple Rails app to demonstrate all of them:&lt;/p&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev.to%2Fassets%2Fgithub-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/mdchaney" rel="noopener noreferrer"&gt;
        mdchaney
      &lt;/a&gt; / &lt;a href="https://github.com/mdchaney/todo_with_confirmation" rel="noopener noreferrer"&gt;
        todo_with_confirmation
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      This is a sample rails app for a todo app with confirmation
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;TODO - Basic&lt;/h1&gt;

&lt;/div&gt;

&lt;p&gt;This ia a simple todo application which is discussed in this article:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/mdchaney/adding-a-confirmation-interstitial-on-create-in-rest-4an0" rel="nofollow"&gt;https://dev.to/mdchaney/adding-a-confirmation-interstitial-on-create-in-rest-4an0&lt;/a&gt;&lt;/p&gt;

&lt;/div&gt;
&lt;br&gt;
&lt;br&gt;
  &lt;/div&gt;
&lt;br&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/mdchaney/todo_with_confirmation" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;br&gt;
&lt;/div&gt;
&lt;br&gt;


&lt;h4&gt;
  
  
  The "please" parameter
&lt;/h4&gt;

&lt;p&gt;Add a "please" parameter to the form, and only perform the "create" action if it is present. This allows you to show the information again and stick it all in hidden fields along with your "please" parameter.  Then, the user can hit "Really Save" and it'll hit the same create route and actually save it.&lt;/p&gt;

&lt;p&gt;This works (I know, I did it 20 something years ago) but it's unRESTy and makes me want to shower after I write it.  The "create" route may or may not create the item depending on the presence of the "please" parameter.&lt;/p&gt;

&lt;p&gt;A challenge with this method is that after you have the data payload it's required to use the HTTP post method to make sure your data is moved forward continuously.  That means that if they hit "Make Changes" you need to either a) allow the "new" route to accept a post with some data or b) add a different route.  In the sample app I use a "redo" route that accepts a post.&lt;/p&gt;

&lt;p&gt;If you think about it in terms of an API, the whole thing is kind of silly because the client can simply send the "please" parameter the first time and not worry about confirmation, anyway.  More on this later.&lt;/p&gt;

&lt;h4&gt;
  
  
  Hit A Different Route For the Interstitial
&lt;/h4&gt;

&lt;p&gt;This is RESTier than the former solution.  The idea here is that the form doesn't just POST to the &lt;code&gt;/todos&lt;/code&gt; route (which is "create" in RESTland).  Instead, it posts to another route - such as &lt;code&gt;/todos/confirm&lt;/code&gt; - which validates the data and handles steps 3 and 4.  Then, if the user is happy with the data, they can hit "Really Save" and the data is posted to the real "create" route.  Note that this still requires the extra "redo" route as well.&lt;/p&gt;

&lt;p&gt;This is definitely the RESTier way to do it.  But one issue that this and the former method have in common is that the data is changeable and the user still has the capability of doing silly stuff.  They're not constrained by the interface.&lt;/p&gt;

&lt;h4&gt;
  
  
  Hit A Different Route For the Interstitial, and Sign the Data
&lt;/h4&gt;

&lt;p&gt;This fixes the problems with the second method.  In this method, the data from the form is encrypted and signed, then stored in a hidden field.  The data is shown to the user, but the only data that is transmitted back to the server on the "create" route is an encrypted (or just signed) blob that can be used to recreate the data.&lt;/p&gt;

&lt;p&gt;This strikes me as less RESTy, and the reason is that the I feel (yes, this is purely subjective) that the data that is sent to the "create" route should be the data for the record in form data or JSON format.&lt;/p&gt;

&lt;p&gt;There is another way that gets around that issue, and that is to simply create a cryptographic signature on the back end in step 4 which is sent back to the confirmation form along with the field data, then verified before the item is created.  That's a little more work as you have to make sure to create the signature in a repeatable manner.&lt;/p&gt;

&lt;p&gt;So, this is a reasonable way to do it, but it's still not as RESTy as I'd like.  But I think it's RESTy enough.&lt;/p&gt;

&lt;p&gt;I know I sound a bit anal about making sure the data wasn't changed, but this really is a UI issue that we're handling between the front end and the back end.  The back end ultimately decides whether to accept a set of data or not based on its validations, and I personally believe that the UI part that makes this tricky is handling the fact that after the back end vouches for a set of data and says "it validated" we want to make sure that's the set of data that is later sent to it to be saved.  It'll help prevent some weird errors.&lt;/p&gt;

&lt;p&gt;Going back to the API - if you're creating something like this using React then the front end has to really handle everything.  In that case, you can think about the discrete components easier:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The front end sends a set of data to the back end for validation&lt;/li&gt;
&lt;li&gt;If it's not valid, a set of errors are returned&lt;/li&gt;
&lt;li&gt;If it is valid, a signature is sent to the front end that can be sent along with the final "create" to make it save&lt;/li&gt;
&lt;li&gt;When the user hits "Really Save", the signature is sent along with the form data and the database saves the information.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Alternately, this also works if the server just sends the binary blob representing the marshalled/encrypted/signed data.&lt;/p&gt;

&lt;p&gt;But, isn't this just the "please" parameter all over again?  No.  The reason it isn't is that the "please" parameter can be easily faked and sent to the server at any time.  This method forces the front end to first ask the back end "is this data valid?", and then send the proof of validity (the signature) back along with the data so the server can trust that it validated it before.&lt;/p&gt;

&lt;h3&gt;
  
  
  How To Accomplish This
&lt;/h3&gt;

&lt;p&gt;Again, for me the easiest way to handle this is the binary blob.  I use this pattern in multiple places, and sometimes it's not for confirmation directly.  I have some multi-step forms that require some data to be entered, and then saved across the next page which might do something completely different.  It's again important to ensure that after the server has vouched for the data that it get the exact same data next time.&lt;/p&gt;

&lt;p&gt;Another common place to use this is in ordering.  I don't want to create the order in the database until payment is secured.  And I don't want the user to be able to edit the form and get free items or something like that.&lt;/p&gt;

&lt;h3&gt;
  
  
  Using MessageEncryptor
&lt;/h3&gt;

&lt;p&gt;Rails provides &lt;code&gt;ActiveSupport::MessageEncryptor&lt;/code&gt; to help encrypt and sign.  The basic framework is this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Generate a key based on your secret key base&lt;/li&gt;
&lt;li&gt;Generate a secret based on a salt value that you provide&lt;/li&gt;
&lt;li&gt;Use this to encrypt and sign your value&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Later:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Use the same "secret" to decrypt and verify the signature
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TodoWithBlob&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;ApplicationRecord&lt;/span&gt;
  &lt;span class="n"&gt;normalizes&lt;/span&gt; &lt;span class="ss"&gt;:body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;with: &lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nc"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encryptor&lt;/span&gt;
    &lt;span class="vi"&gt;@encryptor&lt;/span&gt; &lt;span class="o"&gt;||=&lt;/span&gt;
      &lt;span class="k"&gt;begin&lt;/span&gt;
        &lt;span class="n"&gt;salt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"very salty salt"&lt;/span&gt;
        &lt;span class="n"&gt;key_gen&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;ActiveSupport&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;KeyGenerator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Rails&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;application&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;credentials&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;secret_key_base&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;iterations: &lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;secret&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;key_gen&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_key&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;salt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="vi"&gt;@encryptor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;ActiveSupport&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;MessageEncryptor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;secret&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="k"&gt;end&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;as_json_string&lt;/span&gt;
    &lt;span class="n"&gt;attributes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;except&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"created_at"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"updated_at"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;to_json&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;as_encrypted_blob&lt;/span&gt;
    &lt;span class="nb"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;class&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encryptor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encrypt_and_sign&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;as_json_string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nc"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encrypted_blob_to_json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;encrypted_blob&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="no"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;encryptor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decrypt_and_verify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;encrypted_blob&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nc"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_encrypted_blob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;encrypted_blob&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;encrypted_blob_to_json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;encrypted_blob&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="n"&gt;validates&lt;/span&gt; &lt;span class="ss"&gt;:body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;presence: &lt;/span&gt;&lt;span class="kp"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;length: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="ss"&gt;maximum: &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I also use this basic pattern when handling orders that need to be validated but not saved in the database until the payment has processed. &lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/hopsoft/universalid" rel="noopener noreferrer"&gt;Universal ID gem&lt;/a&gt; will be usable here as well when signing is added to it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Summary
&lt;/h3&gt;

&lt;p&gt;We've looked at some various ways to try to keep our app as RESTy as possible while providing a decent UI to the user.  The UI shows the user any errors immediately at form submission and assumes that when they confirm their input on the next page it will be valid.  I strongly recommend one of the latter two options (signature or signed blob) to ensure that the data isn't tampered with intentionally or accidentally between validation on the server and ultimately saving it.&lt;/p&gt;

&lt;p&gt;When I get time I'll use a similar technique to make a multi-page form.&lt;/p&gt;

</description>
      <category>rails</category>
      <category>ruby</category>
      <category>restapi</category>
    </item>
    <item>
      <title>Considerations for Unicode and Searching</title>
      <dc:creator>Michael Chaney</dc:creator>
      <pubDate>Thu, 04 Jul 2024 23:06:41 +0000</pubDate>
      <link>https://dev.to/mdchaney/considerations-for-unicode-and-searching-jo4</link>
      <guid>https://dev.to/mdchaney/considerations-for-unicode-and-searching-jo4</guid>
      <description>&lt;p&gt;In previous posts here and on Twitter (now X) I've written about using Unicode and UTF-8 in general application development.  In this article I'm going to expand a bit on how to handle searching when Unicode is present.&lt;/p&gt;

&lt;p&gt;To quickly summarize using Unicode in modern application development - it's probably something you won't even have to think about if you're using a modern high-level language.  Strings are almost universally assumed to be in UTF-8 format, so everything just works.  The biggest problem is when importing data from outside - there's still a ton of code out there that is writing using older 8-bit encodings (generally Latin-1/ISO-8859-1 in the western world) so it's best to check your data and force it to UTF-8 before processing.&lt;/p&gt;

&lt;p&gt;With that out of the way, let's consider "Beyoncé".  Why?  For purposes of this discussion simply because that name has a &lt;a href="https://www.compart.com/en/unicode/U+00E9"&gt;Latin Small Letter E with Acute&lt;/a&gt; character - also known as "eacute" - in it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;michael-chaneys-computer-2:~ mdchaney&lt;span class="nv"&gt;$ &lt;/span&gt;hexdump &lt;span class="nt"&gt;-C&lt;/span&gt;
Beyoncé
00000000  42 65 79 6f 6e 63 c3 a9  0a                       |Beyonc...|
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this hex dump, "c3 a9" is the two-byte UTF-8 encoding for unicode character E9.  In a text file, on this web site's database, etc. that will be stored as the two bytes that you see above.&lt;/p&gt;

&lt;p&gt;Here's the issue, though.  If I'm searching for "Beyoncé" - how do I even get a funky accent thing above the "e"?  I'll tell you how if you're on a Mac - press "option e", let them both up, then press the vowel over which you wish to place the acute symbol.  In other words, option-e, then e.  Beyoncé.&lt;/p&gt;

&lt;p&gt;So, that's it, right?&lt;/p&gt;

&lt;p&gt;Of course not.  Let's say I have a music database (I really do have a bunch of them) and Beyoncé is in there.  I want people to be able to search for her name and find her music.  The problem?  I'll tell you what the problem is - a bunch of Neanderthals using the internet that don't know that little Mac trick that I learned 2 minutes ago about typing an e-with-an-acute-mark from my keyboard.&lt;/p&gt;

&lt;p&gt;Okay, I'll confess.  Even I would just type in "beyonce" if I was searching for her music.  Not even a capital B, much less the funky acute mark over the e.  Google has been so good for so long that we expect that search to just work because it always has at Google.&lt;/p&gt;

&lt;p&gt;So, how do we handle this when we write the search code?&lt;/p&gt;

&lt;p&gt;Well, what are we using for search?&lt;/p&gt;

&lt;p&gt;I want to spend the brunt of this article talking about how to do this in Postgres, partly because it's a little more difficult there.  But let me start in &lt;a href="https://solr.apache.org/"&gt;Apache Solr&lt;/a&gt;, which is where I first worked on these issues.&lt;/p&gt;

&lt;h2&gt;
  
  
  Apache Solr Configuration
&lt;/h2&gt;

&lt;p&gt;Solr is based on the &lt;a href="https://lucene.apache.org/"&gt;Lucene&lt;/a&gt; library, as is &lt;a href="https://www.elastic.co/"&gt;Elastic&lt;/a&gt;.  I mention this because these two search engines provide most private web site search capabilities on the internet.  I started using Lucene around 25 years ago in Perl - it works well.&lt;/p&gt;

&lt;p&gt;In Apache Solr, there's a filter that can be added to the config for a search index called the "&lt;a href="https://github.com/Orbiter/lucene-solr/blob/trunk/modules/analysis/common/src/java/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilter.java"&gt;ASCIIFoldingFilter&lt;/a&gt;".  The code is a sight to behold, as it contains the logic for turning "Beyoncé" into "Beyonce".  There's another case folder to further turn that into "beyonce".&lt;/p&gt;

&lt;p&gt;Go ahead and check out the source code linked above.  I would direct your attention specifically to line 434 which matches "é", aka "latin small letter e with acute".  If you scroll all the way to line 474 you'll find that the "é" along with 40 other friends gets mapped to a plain old "e".&lt;/p&gt;

&lt;p&gt;You've probably caught on even though I haven't explicitly stated it yet: for purposes of searching we have to get our text into the simplest form possible both for storage as well as for searching.  After we munge both sets of data in the same way, we end up with this being stored:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;beyonce
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And any of these terms matching:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Beyoncé
beyoncé
Beyonce
beyonce
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Oddly enough, a bunch of things map to the plain old "e", so:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Beyoncǝ
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;also works.  If you want to &lt;em&gt;really&lt;/em&gt; get nuts, Imagine all the things that map to any of those letters.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ⓑǝẙǾɳꜾᶔ
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you run that string through the solr.ASCIIFoldingFilter, you'll get "beyonce" out the other side.&lt;/p&gt;

&lt;p&gt;I'm not saying that so that you can do it, although nobody's stopping you.  The point is that there are many characters that map to any single ASCII character in this scenario.  They will be stored in plain ASCII where possible.&lt;/p&gt;

&lt;p&gt;There are plenty of Unicode characters that &lt;em&gt;won't&lt;/em&gt; be changed to anything else, by the way.  Emojis are a common group of characters that'll go straight through.  Chinese characters (regardless of what the multi-lingual genius at the tattoo parlor tells you) don't actually map to ASCII characters and vice versa.  Same with Korean, Thai, Arabic, etc.  In fact, the vast majority of Unicode characters are untouched by ASCIIFoldingFilter.&lt;/p&gt;

&lt;p&gt;But, for standard text from the Western World (The Americas and Europe) the characters will map down to the closest ASCII equivalent.  Accents, umlauts, little circles - all those things get dropped for indexing.&lt;/p&gt;

&lt;p&gt;Apache Solr is a great search engine and I've used it extensively for well over a decade.  It handles all of this text munging extremely well.  But I also use PostgreSQL and it has a very robust full-text search engine built right in that might be good for your project, especially if you're already using it as your RDBMS data store.&lt;/p&gt;

&lt;h2&gt;
  
  
  Configuring PostgreSQL For Search
&lt;/h2&gt;

&lt;p&gt;I love Postgres and use it just about everywhere.  But the full-text search functionality requires a little bit of work to handle text in the manner that you want it to.  Thankfully, their documentation is great, AI assistants such as ChatGPT and Claude seem to know everything about this subject, and Postgres itself comes with some awesome tools to make testing a breeze.&lt;/p&gt;

&lt;p&gt;Solr comes with various filters that you can use to munge your input text into the output text. Postgres has fewer available options, but the default setup is pretty good for English text.  This includes a standard "stopword" dictionary to keep common words out of your index, implement case folding, and word stemming.&lt;/p&gt;

&lt;p&gt;Relevant to our discussion today, it also comes with the "unaccent" dictionary to handle fixing "Beyoncé", and "mañana", and whatever else you might come up with.  But it's not included by default, so you'll have to do a little work.&lt;/p&gt;

&lt;p&gt;Let's first see how it handles all of this by default.&lt;/p&gt;

&lt;p&gt;Postgres includes the excellent "ts_debug" function which allows you to see exactly what it thinks about your text.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;img3_dev=&amp;gt; select * from ts_debug('Beyoncé is coming mañana');
   alias   |    description    |  token  |  dictionaries  |  dictionary  |  lexemes  
-----------+-------------------+---------+----------------+--------------+-----------
 word      | Word, all letters | Beyoncé | {english_stem} | english_stem | {beyoncé}
 blank     | Space symbols     |         | {}             |              | 
 asciiword | Word, all ASCII   | is      | {english_stem} | english_stem | {}
 blank     | Space symbols     |         | {}             |              | 
 asciiword | Word, all ASCII   | coming  | {english_stem} | english_stem | {come}
 blank     | Space symbols     |         | {}             |              | 
 word      | Word, all letters | mañana  | {english_stem} | english_stem | {mañana}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This shows the default parser in action.  "is" is a stop word, so it gets kicked out.  "coming" is run through the default stemmer and is normalized as "come".  But "Beyoncé" and "mañana" pass through unchanged save for being downcased.&lt;/p&gt;

&lt;p&gt;Now, one can make the case that letters like "é" and "ñ" don't occur in regular English so this is the proper functionality.  I'm handling searches where names are a common element in the search, and accented characters are much more common in names.  Even if we restrict ourselves to the United States where English is the most common language, clearly Spanish is the second most common and French is quite common as well.&lt;/p&gt;

&lt;p&gt;It's not just Beyoncé - Michael Bublé has an e-with-acute in his name as well.  And many others.&lt;/p&gt;

&lt;p&gt;Just searching Google for "singers with accents in their names" brings up a litany of issues from programmers handling this situation.  Look at this one, which also seems to be based on lack of Unicode normalization:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.discogs.com/forum/thread/826212"&gt;https://www.discogs.com/forum/thread/826212&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's someone talking about this problem in 2009:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://musicmachinery.com/2009/04/10/removing-accents-in-artist-names/"&gt;https://musicmachinery.com/2009/04/10/removing-accents-in-artist-names/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Anyway, the point is that even if you're handling straight English text, removing diacritics is important for name matching.  And since the English language doesn't have diacritics, it won't really hurt to turn those letters into their straight ASCII equivalent since they won't exist outside of names or some non-English words.&lt;/p&gt;

&lt;p&gt;We have to create the "unaccent" extension in our database, create the new text search configuration, and finally modify it to add "unaccent" to the mix.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
img3_dev=&amp;gt; CREATE EXTENSION IF NOT EXISTS unaccent;
CREATE EXTENSION
img3_dev=&amp;gt; CREATE TEXT SEARCH CONFIGURATION music_search ( COPY = english );
CREATE TEXT SEARCH CONFIGURATION
img3_dev=&amp;gt; ALTER TEXT SEARCH CONFIGURATION music_search
  ALTER MAPPING FOR hword, hword_part, word
  WITH unaccent, english_stem;
ALTER TEXT SEARCH CONFIGURATION
img3_dev=&amp;gt; select * from ts_debug('music_search', 'Beyoncé is coming mañana');
   alias   |    description    |  token  |      dictionaries       |  dictionary  |  lexemes  
-----------+-------------------+---------+-------------------------+--------------+-----------
 word      | Word, all letters | Beyoncé | {unaccent,english_stem} | unaccent     | {Beyonce}
 blank     | Space symbols     |         | {}                      |              | 
 asciiword | Word, all ASCII   | is      | {english_stem}          | english_stem | {}
 blank     | Space symbols     |         | {}                      |              | 
 asciiword | Word, all ASCII   | coming  | {english_stem}          | english_stem | {come}
 blank     | Space symbols     |         | {}                      |              | 
 word      | Word, all letters | mañana  | {unaccent,english_stem} | unaccent     | {manana}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that ts_debug isn't perfect.  If you run this through &lt;code&gt;to_tsvector&lt;/code&gt;, you get a slightly different result:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;img3_dev=&amp;gt; select to_tsvector('music_search', 'Beyoncé is coming mañana');
          to_tsvector           
--------------------------------
 'beyonc':1 'come':3 'manana':4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that the stemming algorithm turns "beyonce" into "beyonc", and that's fine.  The words will match.&lt;/p&gt;

&lt;p&gt;To show you how this works, consider the following queries where I use the standard built-in "english" search configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;img3_dev=&amp;gt; select to_tsvector('english', 'Beyoncé is coming mañana') @@  
    to_tsquery('english', 'beyoncé');
 ?column? 
----------
 t
(1 row)

img3_dev=&amp;gt; select to_tsvector('english', 'Beyoncé is coming mañana') @@  
    to_tsquery('english', 'beyonce');
 ?column? 
----------
 f
(1 row)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If we search for "beyoncé", it's a match, but if we search for "beyonce" there's no match.&lt;/p&gt;

&lt;p&gt;Switching to our new "music_search" configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;img3_dev=&amp;gt; select to_tsvector('music_search', 'Beyoncé is coming mañana') @@  
    to_tsquery('music_search', 'beyonce');
 ?column? 
----------
 t
(1 row)

img3_dev=&amp;gt; select to_tsvector('music_search', 'Beyoncé is coming mañana') @@  
    to_tsquery('music_search', 'beyoncé');
 ?column? 
----------
 t
(1 row)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There it is - works either way.&lt;/p&gt;

&lt;p&gt;I would also note that the "unaccent" dictionary in Postgres has fewer mappings than the ASCIIFoldingFactory in Solr.  But, quite a few are still there:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;img3_dev=&amp;gt; select to_tsvector('music_search', 'Beyoncé is coming mañana') @@ 
    to_tsquery('music_search', 'Ƃⅇẙỗɳcᶔ');
 ?column? 
----------
 t
(1 row)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you want to see the dictionary, Postgres has these stored under your "share" directory, basically ".../share/postgresqlxx/tsearch_data", where "xx" is the version number.  Depending on the installation, this will likely be under "/usr", but it might be under "/usr/local" or wherever Postgres is installed.&lt;/p&gt;

&lt;p&gt;You can view the "unaccent.rules" file to see the list.&lt;/p&gt;

&lt;p&gt;As an aside, if you want to gꝏf around (see what I did there?) with it and come up with weird ways to spell "Beyoncé" so it'll match, use "grep":&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;michael-chaneys-computer-2:dl5 mdchaney&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="s1"&gt;'o$'&lt;/span&gt; /opt/local/share/postgresql16/tsearch_data/unaccent.rules 
ò  o
ó  o
ô  o
õ  o
ö  o
ø  o
ō  o
ŏ  o
ő  o
ơ  o
ǒ  o
ǫ  o
ǭ  o
ȍ  o
ȏ  o
ȫ  o
ȭ  o
ȯ  o
ȱ  o
ṍ o
ṏ o
ṑ o
ṓ o
ọ o
ỏ o
ố o
ồ o
ổ o
ỗ o
ộ o
ớ o
ờ o
ở o
ỡ o
ợ o
℅ c/o
№ No
ℴ o
ⱺ o
ꜵ ao
ꝋ o
ꝍ o
ꝏ oo
ｏ o
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That shows you all the characters that will map to "o".  Do that for all the letters, choose your favs, copy/paste, you get the idea.&lt;/p&gt;

&lt;h2&gt;
  
  
  Outtro
&lt;/h2&gt;

&lt;p&gt;There's a lot to configuring a full-text search application on Postgres, but if you're going to search names this is pretty much required.  And it's not difficult.&lt;/p&gt;

&lt;p&gt;Feel free to hit me up in comments here or on X (@MichaelDChaney) for any further ideas that you might need help with.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>unicode</category>
    </item>
    <item>
      <title>Making Common Table Expression SQL More Railsy</title>
      <dc:creator>Michael Chaney</dc:creator>
      <pubDate>Sun, 30 Jun 2024 22:53:39 +0000</pubDate>
      <link>https://dev.to/mdchaney/making-common-table-expression-sql-more-railsy-363j</link>
      <guid>https://dev.to/mdchaney/making-common-table-expression-sql-more-railsy-363j</guid>
      <description>&lt;p&gt;In our &lt;a href="https://dev.to/mdchaney/inserting-and-selecting-new-records-one-query-2m4"&gt;last episode&lt;/a&gt;, I talked about using the common table expression  syntax to make a query run much faster and allow me to insert and query the new records at the same time.&lt;/p&gt;

&lt;p&gt;Starting in Rails 7.1, it's now possible to add common table expressions to ActiveRecord relations.  I can rewrite the query to use some of the good parts of ActiveRecord and hopefully make the code a little more readable.&lt;/p&gt;

&lt;p&gt;As a reminder, I'm working with these three tables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RawRoyaltyRecord&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;ApplicationRecord&lt;/span&gt;
  &lt;span class="n"&gt;belongs_to&lt;/span&gt; &lt;span class="ss"&gt;:royalty_input_batch_partial&lt;/span&gt;
  &lt;span class="n"&gt;belongs_to&lt;/span&gt; &lt;span class="ss"&gt;:track&lt;/span&gt;
  &lt;span class="n"&gt;has_many&lt;/span&gt; &lt;span class="ss"&gt;:raw_royalty_records_sales&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RoyaltyInputBatchPartial&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;ApplicationRecord&lt;/span&gt;
  &lt;span class="n"&gt;belongs_to&lt;/span&gt; &lt;span class="ss"&gt;:pro&lt;/span&gt;
  &lt;span class="n"&gt;has_many&lt;/span&gt; &lt;span class="ss"&gt;:raw_royalty_records&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RawRoyaltyRecordsSale&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;ApplicationRecord&lt;/span&gt;
  &lt;span class="n"&gt;belongs_to&lt;/span&gt; &lt;span class="ss"&gt;:raw_royalty_record&lt;/span&gt;
  &lt;span class="n"&gt;belongs_to&lt;/span&gt; &lt;span class="ss"&gt;:sale&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Ultimately, this is the big SQL query that inserts the new records and returns them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;eligible_records&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="c1"&gt;-- This gets a list of existing track_id, customer, and sale_ids&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;track_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;LOWER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;lower_customer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rrrs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sale_id&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;raw_royalty_records&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;
    &lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;royalty_input_batch_partials&lt;/span&gt; &lt;span class="n"&gt;ribp&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;ribp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;royalty_input_batch_partial_id&lt;/span&gt;
    &lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;raw_royalty_records_sales&lt;/span&gt; &lt;span class="n"&gt;rrrs&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;rrrs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;raw_royalty_record_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;ribp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pro_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;960&lt;/span&gt;
      &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;track_id&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="n"&gt;inserted_records&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;raw_royalty_records_sales&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_royalty_record_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sale_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;updated_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;rr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;er&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sale_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;raw_royalty_records&lt;/span&gt; &lt;span class="n"&gt;rr&lt;/span&gt;
      &lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;royalty_input_batch_partials&lt;/span&gt; &lt;span class="n"&gt;ribp&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;rr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;royalty_input_batch_partial_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ribp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;
      &lt;span class="k"&gt;LEFT&lt;/span&gt; &lt;span class="k"&gt;OUTER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;raw_royalty_records_sales&lt;/span&gt; &lt;span class="n"&gt;rrs&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;rrs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;raw_royalty_record_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;
      &lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;eligible_records&lt;/span&gt; &lt;span class="n"&gt;er&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;er&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;track_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;track_id&lt;/span&gt; &lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="n"&gt;er&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lower_customer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;LOWER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;ribp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pro_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;960&lt;/span&gt;
      &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;rrs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
      &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;rr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;track_id&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
    &lt;span class="n"&gt;RETURNING&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;ir&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;raw_royalty_record_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ir&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sale_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;track_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;inserted_records&lt;/span&gt; &lt;span class="n"&gt;ir&lt;/span&gt; &lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;raw_royalty_records&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;
         &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ir&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;raw_royalty_record_id&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The only piece of information that I need to pass in there is the pro_id, which is "960" in the example.  It would be easy to use to use &lt;code&gt;select_all&lt;/code&gt; in my Ruby code to accomplish this, but using ActiveRecord will be fun.&lt;/p&gt;

&lt;p&gt;There is, however, one interesting limitation with ActiveRecord.  I have to somehow base this entire query on a table if I want to be able to use the various methods to build the query.  It's not a problem here because our actual query pulls from &lt;code&gt;raw_royalty_records&lt;/code&gt;.  I just need to add a join with the CTE &lt;code&gt;inserted_records&lt;/code&gt; from above.&lt;/p&gt;

&lt;p&gt;One other issue is that the CTEs need to be somehow built.  The first one is pretty straightforward.  But the second one is a weird insert and all that, I'm going to be creating some hand-coded SQL there or something.&lt;/p&gt;

&lt;p&gt;Putting together a query like this requires starting from the end. Here's the final query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;ir&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;raw_royalty_record_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ir&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sale_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;track_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;inserted_records&lt;/span&gt; &lt;span class="n"&gt;ir&lt;/span&gt; &lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;raw_royalty_records&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;
         &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ir&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;raw_royalty_record_id&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's change that around so that we're pulling from &lt;code&gt;raw_royalty_records&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ir&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sale_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;track_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;raw_royalty_records&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt; &lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;inserted_records&lt;/span&gt; &lt;span class="n"&gt;it&lt;/span&gt;
  &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ir&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;raw_royalty_record_id&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, we can use ActiveRecord to build this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;RawRoyaltyRecord&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;joins&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"INNER JOIN inserted_records on inserted_records.raw_royalty_record_id = raw_royalty_records.id"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Of course, that's invalid as-is because "inserted_records" isn't a table.  But if I look at the generated SQL using the &lt;code&gt;to_sql&lt;/code&gt; method, we're on the right track:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="nv"&gt;"raw_royalty_records"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="nv"&gt;"raw_royalty_records"&lt;/span&gt; &lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;inserted_records&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;inserted_records&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;raw_royalty_record_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;raw_royalty_records&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So, we need to add our CTEs to this query, and we do that using the &lt;code&gt;with&lt;/code&gt; method.  We can add them both at the same time, but I'm going to do one at a time because they're different and we need to look at those differences.&lt;/p&gt;

&lt;p&gt;Here's the first CTE:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;eligible_records&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;track_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;LOWER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;lower_customer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rrrs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sale_id&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;raw_royalty_records&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;
    &lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;royalty_input_batch_partials&lt;/span&gt; &lt;span class="n"&gt;ribp&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;ribp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;royalty_input_batch_partial_id&lt;/span&gt;
    &lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;raw_royalty_records_sales&lt;/span&gt; &lt;span class="n"&gt;rrrs&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;rrrs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;raw_royalty_record_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;ribp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pro_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;960&lt;/span&gt;
      &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;track_id&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Turns out, we can use standard ActiveRecord to put this together.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;   &lt;span class="n"&gt;eligible_records_query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
     &lt;span class="no"&gt;RawRoyaltyRecord&lt;/span&gt;
       &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;joins&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;:royalty_input_batch_partial&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
       &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"royalty_input_batch_partials.pro_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;960&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
       &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;joins&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;:raw_royalty_records_sales&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
       &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"raw_royalty_records.track_id is not null"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
       &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;:track_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"LOWER(raw_royalty_records.customer) AS lower_customer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"raw_royalty_records_sales.sale_id"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
       &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;distinct&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With that, we can add it as a CTE to our query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;  &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;with&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;eligible_records: &lt;/span&gt;&lt;span class="n"&gt;eligible_records_query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It's actually really cool to be able to see this work in parts, and common table expressions built this way allow us to easily break the query up and test the individual parts.&lt;/p&gt;

&lt;p&gt;The query above stands on its own.  But how can we test it as a CTE?&lt;/p&gt;

&lt;p&gt;Our original giant query had two CTEs as you may recall, with the second one being the insert.  But we can pull it apart and use the &lt;code&gt;select&lt;/code&gt; part of it to see this query working as a CTE.&lt;/p&gt;

&lt;p&gt;But, first, let's make it really simple.  Here's the simplest way you can test a CTE:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;eligible_records&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="c1"&gt;-- This gets a list of existing track_id, customer, and sale_ids&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;track_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;LOWER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;lower_customer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rrrs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sale_id&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;raw_royalty_records&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;
    &lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;royalty_input_batch_partials&lt;/span&gt; &lt;span class="n"&gt;ribp&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;ribp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;royalty_input_batch_partial_id&lt;/span&gt;
    &lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;raw_royalty_records_sales&lt;/span&gt; &lt;span class="n"&gt;rrrs&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;rrrs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;raw_royalty_record_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;ribp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pro_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;960&lt;/span&gt;
      &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;track_id&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;eligible_records&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;How do we rubyize this?  Not easily, because &lt;code&gt;eligible_records&lt;/code&gt; doesn't exist outside of this query.  But let's revisit the original giant query, specifically the second CTE:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;inserted_records&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;raw_royalty_records_sales&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_royalty_record_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sale_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;updated_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;rr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;er&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sale_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;raw_royalty_records&lt;/span&gt; &lt;span class="n"&gt;rr&lt;/span&gt;
      &lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;royalty_input_batch_partials&lt;/span&gt; &lt;span class="n"&gt;ribp&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;rr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;royalty_input_batch_partial_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ribp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;
      &lt;span class="k"&gt;LEFT&lt;/span&gt; &lt;span class="k"&gt;OUTER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;raw_royalty_records_sales&lt;/span&gt; &lt;span class="n"&gt;rrs&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;rrs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;raw_royalty_record_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;
      &lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;eligible_records&lt;/span&gt; &lt;span class="n"&gt;er&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;er&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;track_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;track_id&lt;/span&gt; &lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="n"&gt;er&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lower_customer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;LOWER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;ribp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pro_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;960&lt;/span&gt;
      &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;rrs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
      &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;rr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;track_id&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
    &lt;span class="n"&gt;RETURNING&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's chop out the &lt;code&gt;select&lt;/code&gt; statement and use it as the primary query to find some &lt;code&gt;raw_royalty_records&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;eligible_records&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="c1"&gt;-- This gets a list of existing track_id, customer, and sale_ids&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;track_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;LOWER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;lower_customer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rrrs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sale_id&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;raw_royalty_records&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;
    &lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;royalty_input_batch_partials&lt;/span&gt; &lt;span class="n"&gt;ribp&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;ribp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;royalty_input_batch_partial_id&lt;/span&gt;
    &lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;raw_royalty_records_sales&lt;/span&gt; &lt;span class="n"&gt;rrrs&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;rrrs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;raw_royalty_record_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;ribp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pro_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;960&lt;/span&gt;
      &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;track_id&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;er&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sale_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;raw_royalty_records&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;
      &lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;royalty_input_batch_partials&lt;/span&gt; &lt;span class="n"&gt;ribp&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;royalty_input_batch_partial_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ribp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;
      &lt;span class="k"&gt;LEFT&lt;/span&gt; &lt;span class="k"&gt;OUTER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;raw_royalty_records_sales&lt;/span&gt; &lt;span class="n"&gt;rrs&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;rrs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;raw_royalty_record_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;
      &lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;eligible_records&lt;/span&gt; &lt;span class="n"&gt;er&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;er&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;track_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;track_id&lt;/span&gt; &lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="n"&gt;er&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lower_customer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;LOWER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;ribp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pro_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;960&lt;/span&gt;
      &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;rrs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
      &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;track_id&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's a simplified version of the original giant query which will give us a list of &lt;code&gt;raw_royalty_record&lt;/code&gt; ids along with &lt;code&gt;sales&lt;/code&gt; ids.  While we're at it, we'll grab the customer as well so we can see what's matching up.&lt;/p&gt;

&lt;p&gt;Now, with the query in this shape, it's time to Rubyize it.  We already have the CTE Rubyized, let's work on the main query.&lt;/p&gt;

&lt;p&gt;With &lt;code&gt;ActiveRecord&lt;/code&gt;, we have to start from a model.  For this, we're pulling in ids from two different tables, but &lt;code&gt;raw_royalty_records&lt;/code&gt; is the primary.  I'll base this on &lt;code&gt;RawRoyaltyRecord&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="c1"&gt;# See above for "eligible_records_query"&lt;/span&gt;
&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;RawRoyaltyRecord&lt;/span&gt;
&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;joins&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;:royalty_input_batch_partial&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"royalty_input_batch_partials.pro_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;960&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;joins&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"INNER JOIN eligible_records
  ON eligible_records.track_id = raw_royalty_records.track_id
    AND eligible_records.lower_customer = LOWER(raw_royalty_records.customer)"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;left_joins&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;:raw_royalty_records_sales&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"raw_royalty_records_sales.id is null"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"raw_royalty_records.track_id is not null"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;with&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;eligible_records: &lt;/span&gt;&lt;span class="n"&gt;eligible_records_query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"raw_royalty_records.id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"eligible_records.sale_id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"raw_royalty_records.customer"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;distinct&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At this point, there's an argument that can be made that the Ruby code is more complicated than the SQL.  And, in a way, it is.  But it also is safe since we're bringing in a possibly dangerous parameter (the &lt;code&gt;pro_id&lt;/code&gt; of "960) and still takes care of a lot of the grunt work of joining tables and all that.  I think that it's easier to read.  But those are opinions, nothing more.&lt;/p&gt;

&lt;p&gt;Back to the original issue - How do I automatically add these records in a CTE and then get them back out?  Is it worth the trouble?&lt;/p&gt;

&lt;p&gt;Let's figure out how to do it first.  The issue that we have is that the main query above needs to be changed:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;remove "customer" as it won't be needed&lt;/li&gt;
&lt;li&gt;add created_at and updated_at&lt;/li&gt;
&lt;li&gt;remove the now-extraneous &lt;code&gt;with&lt;/code&gt; as I'll keep them both at top level&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;At that point, it'll be ready to insert into &lt;code&gt;raw_royalty_records_sales&lt;/code&gt;.  The problem is that it's difficult to turn the select into an insert using standard &lt;code&gt;ActiveRecord&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Again, here's the part of the query that performs the &lt;code&gt;insert&lt;/code&gt; and &lt;code&gt;select&lt;/code&gt; as a common table expression:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;    &lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;raw_royalty_records_sales&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_royalty_record_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sale_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;updated_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;rr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;er&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sale_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;raw_royalty_records&lt;/span&gt; &lt;span class="n"&gt;rr&lt;/span&gt;
      &lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;royalty_input_batch_partials&lt;/span&gt; &lt;span class="n"&gt;ribp&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;rr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;royalty_input_batch_partial_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ribp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;
      &lt;span class="k"&gt;LEFT&lt;/span&gt; &lt;span class="k"&gt;OUTER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;raw_royalty_records_sales&lt;/span&gt; &lt;span class="n"&gt;rrs&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;rrs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;raw_royalty_record_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;
      &lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;eligible_records&lt;/span&gt; &lt;span class="n"&gt;er&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;er&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;track_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;track_id&lt;/span&gt; &lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="n"&gt;er&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lower_customer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;LOWER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;ribp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pro_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;960&lt;/span&gt;
      &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;rrs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
      &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;rr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;track_id&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
    &lt;span class="n"&gt;RETURNING&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is basically the same &lt;code&gt;select&lt;/code&gt; as above, only it's in the CTE.  We can make this work.&lt;/p&gt;

&lt;p&gt;Let's start with the final query.  Ultimately, I'm going to pull in this information, which I'm just using to log the transaction:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ir&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sale_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;track_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;raw_royalty_records&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt; &lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;inserted_records&lt;/span&gt; &lt;span class="n"&gt;it&lt;/span&gt;
  &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ir&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;raw_royalty_record_id&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a denormalized list of sales, tracks, and customers from the &lt;code&gt;raw_royalty_records&lt;/code&gt; and friends tables (it's possible for one &lt;code&gt;raw_royalty_record&lt;/code&gt; to be tied to multiple sales because the reporting doesn't give enough information sometimes).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;select_for_insert_query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;RawRoyaltyRecord&lt;/span&gt;
&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;joins&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;:royalty_input_batch_partial&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"royalty_input_batch_partials.pro_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;960&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;joins&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"INNER JOIN eligible_records
  ON eligible_records.track_id = raw_royalty_records.track_id
    AND eligible_records.lower_customer = LOWER(raw_royalty_records.customer)"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;left_joins&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;:raw_royalty_records_sales&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"raw_royalty_records_sales.id is null"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"raw_royalty_records.track_id is not null"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"raw_royalty_records.id AS raw_royalty_record_id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"eligible_records.sale_id AS sale_id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"CURRENT_TIMESTAMP AS created_at"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"CURRENT_TIMESTAMP AS updated_at"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;distinct&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is maybe a little ugly, but I can take the SQL from that, wrap it with &lt;code&gt;INSERT INTO...RETURNING *&lt;/code&gt;, and I've got a CTE.&lt;/p&gt;

&lt;p&gt;One note here - the CTE cannot just be a string.  In this case, we'll just have to wrap the string in &lt;code&gt;Arel.sql&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;insert_query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Arel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="s2"&gt;"INSERT INTO raw_royalty_records_sales (raw_royalty_record_id, sale_id, created_at, updated_at)&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
  &lt;span class="n"&gt;select_for_insert_query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_sql&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
  &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;RETURNING *"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With that, I can add both CTEs to a basic query.  I'll call that one "inserted_records".&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;RawRoyaltyRecord&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;joins&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"
  INNER JOIN inserted_records
    ON inserted_records.raw_royalty_record_id = raw_royalty_records.id
  "&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"raw_royalty_records.id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="s2"&gt;"inserted_records.sale_id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="s2"&gt;"raw_royalty_records.track_id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="s2"&gt;"raw_royalty_records.customer"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;distinct&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;with&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
         &lt;span class="ss"&gt;eligible_records: &lt;/span&gt;&lt;span class="n"&gt;eligible_records_query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="ss"&gt;inserted_records: &lt;/span&gt;&lt;span class="n"&gt;insert_query&lt;/span&gt;
      &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And with that, this monstrosity works.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="nv"&gt;"eligible_records"&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="nv"&gt;"raw_royalty_records"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"track_id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;LOWER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_royalty_records&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;lower_customer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;"raw_royalty_records_sales"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"sale_id"&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="nv"&gt;"raw_royalty_records"&lt;/span&gt; &lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="nv"&gt;"royalty_input_batch_partials"&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="nv"&gt;"royalty_input_batch_partials"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"id"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;"raw_royalty_records"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"royalty_input_batch_partial_id"&lt;/span&gt; &lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="nv"&gt;"raw_royalty_records_sales"&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="nv"&gt;"raw_royalty_records_sales"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"raw_royalty_record_id"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;"raw_royalty_records"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"id"&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="nv"&gt;"royalty_input_batch_partials"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"pro_id"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;960&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_royalty_records&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;track_id&lt;/span&gt; &lt;span class="k"&gt;is&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="nv"&gt;"inserted_records"&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;raw_royalty_records_sales&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_royalty_record_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sale_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;updated_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="nv"&gt;"eligible_records"&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="nv"&gt;"raw_royalty_records"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"track_id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;LOWER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_royalty_records&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;lower_customer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;"raw_royalty_records_sales"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"sale_id"&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="nv"&gt;"raw_royalty_records"&lt;/span&gt; &lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="nv"&gt;"royalty_input_batch_partials"&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="nv"&gt;"royalty_input_batch_partials"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"id"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;"raw_royalty_records"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"royalty_input_batch_partial_id"&lt;/span&gt; &lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="nv"&gt;"raw_royalty_records_sales"&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="nv"&gt;"raw_royalty_records_sales"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"raw_royalty_record_id"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;"raw_royalty_records"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"id"&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="nv"&gt;"royalty_input_batch_partials"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"pro_id"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;960&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_royalty_records&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;track_id&lt;/span&gt; &lt;span class="k"&gt;is&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;raw_royalty_records&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;raw_royalty_record_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;eligible_records&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sale_id&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;sale_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;updated_at&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="nv"&gt;"raw_royalty_records"&lt;/span&gt; &lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="nv"&gt;"royalty_input_batch_partials"&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="nv"&gt;"royalty_input_batch_partials"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"id"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;"raw_royalty_records"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"royalty_input_batch_partial_id"&lt;/span&gt; &lt;span class="k"&gt;LEFT&lt;/span&gt; &lt;span class="k"&gt;OUTER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="nv"&gt;"raw_royalty_records_sales"&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="nv"&gt;"raw_royalty_records_sales"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"raw_royalty_record_id"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;"raw_royalty_records"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"id"&lt;/span&gt; &lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;eligible_records&lt;/span&gt;
  &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;eligible_records&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;track_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;raw_royalty_records&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;track_id&lt;/span&gt;
    &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;eligible_records&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lower_customer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;LOWER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_royalty_records&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="nv"&gt;"royalty_input_batch_partials"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"pro_id"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;960&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_royalty_records_sales&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;is&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_royalty_records&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;track_id&lt;/span&gt; &lt;span class="k"&gt;is&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;RETURNING&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="nv"&gt;"raw_royalty_records"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;"inserted_records"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"sale_id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;"raw_royalty_records"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"track_id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;"raw_royalty_records"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"customer"&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="nv"&gt;"raw_royalty_records"&lt;/span&gt; &lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;inserted_records&lt;/span&gt;
    &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;inserted_records&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;raw_royalty_record_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;raw_royalty_records&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But is it worth it?  I can short-cut this and return the &lt;code&gt;raw_royalty_records_sales&lt;/code&gt;, or I can do it the way I've done it above.  All I want the records for is to log them, and my preference is to log them in the format:&lt;/p&gt;

&lt;p&gt;track id (and maybe title)&lt;br&gt;
customer&lt;br&gt;
list of new raw_royalty_record ids&lt;br&gt;
list of sale ids (and maybe the customer info attached to the sales for confirmation)&lt;/p&gt;

&lt;p&gt;Given the denormalized nature of the data, I'm still going to have to do some magic in Ruby to get this log format.  I could simply use &lt;code&gt;pluck&lt;/code&gt; to pull the somewhat raw data out and handle it myself.&lt;/p&gt;

&lt;p&gt;The cool thing about using the railsy way of building these queries is that I can swap &lt;code&gt;raw_royalty_records&lt;/code&gt; out for &lt;code&gt;raw_royalty_records_sales&lt;/code&gt; easily:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;RawRoyaltyRecordsSale&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"inserted_records"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;joins&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"INNER JOIN raw_royalty_records ON raw_royalty_records.id=inserted_records.raw_royalty_record_id"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"inserted_records.id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="s2"&gt;"inserted_records.raw_royalty_record_id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="s2"&gt;"inserted_records.sale_id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="s2"&gt;"raw_royalty_records.track_id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="s2"&gt;"raw_royalty_records.customer"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;distinct&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;with&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
         &lt;span class="ss"&gt;eligible_records: &lt;/span&gt;&lt;span class="n"&gt;eligible_records_query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="ss"&gt;inserted_records: &lt;/span&gt;&lt;span class="n"&gt;insert_query&lt;/span&gt;
      &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One issue with this is that - regardless of how I do it - &lt;code&gt;find_each&lt;/code&gt; is not going to work here.  So I have to be realistic about the amount of data that I'm loading in.  With the PRO that I'm testing with  I have around 4000 created records.  Not too many, definitely more than I want to risk an N+1 with.  Plus, it's difficult to say how this may progress in the future as the database grows.&lt;/p&gt;

&lt;p&gt;Anyway, that's how to use a common table expression to insert new records while simultaneously instantiating objects based on them.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>rails</category>
    </item>
    <item>
      <title>Inserting and Selecting New Records - One Query</title>
      <dc:creator>Michael Chaney</dc:creator>
      <pubDate>Fri, 28 Jun 2024 23:16:44 +0000</pubDate>
      <link>https://dev.to/mdchaney/inserting-and-selecting-new-records-one-query-2m4</link>
      <guid>https://dev.to/mdchaney/inserting-and-selecting-new-records-one-query-2m4</guid>
      <description>&lt;p&gt;Just found something very interesting in PostgreSQL, thanks to Claude.&lt;/p&gt;

&lt;p&gt;I have a really nasty query on some huge tables.  Here's the basic layout that we're concerned with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RawRoyaltyRecord&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;ApplicationRecord&lt;/span&gt;
  &lt;span class="n"&gt;belongs_to&lt;/span&gt; &lt;span class="ss"&gt;:royalty_input_batch_partial&lt;/span&gt;
  &lt;span class="n"&gt;belongs_to&lt;/span&gt; &lt;span class="ss"&gt;:track&lt;/span&gt;
  &lt;span class="n"&gt;has_many&lt;/span&gt; &lt;span class="ss"&gt;:raw_royalty_records_sales&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RoyaltyInputBatchPartial&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;ApplicationRecord&lt;/span&gt;
  &lt;span class="n"&gt;belongs_to&lt;/span&gt; &lt;span class="ss"&gt;:pro&lt;/span&gt;
  &lt;span class="n"&gt;has_many&lt;/span&gt; &lt;span class="ss"&gt;:raw_royalty_records&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RawRoyaltyRecordsSale&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;ApplicationRecord&lt;/span&gt;
  &lt;span class="n"&gt;belongs_to&lt;/span&gt; &lt;span class="ss"&gt;:raw_royalty_record&lt;/span&gt;
  &lt;span class="n"&gt;belongs_to&lt;/span&gt; &lt;span class="ss"&gt;:sale&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Don't worry about "Pro" and "Sale" - we just need their IDs.&lt;/p&gt;

&lt;p&gt;The concept is this.  We get spreadsheets in every quarter or so with new raw_royalty_records.  I have to match them to sales, which can be a bit tricky.  Because of this trickiness, the simplest way to match them is to look at previously matched raw_royalty_records from the same Pro (through the &lt;code&gt;royalty_input_batch_partials&lt;/code&gt; table) that have matching &lt;code&gt;track_id&lt;/code&gt; and &lt;code&gt;customer&lt;/code&gt; values.&lt;/p&gt;

&lt;p&gt;Note that the &lt;code&gt;customer&lt;/code&gt; value is stripped of extraneous leading/trailing spaces before being saved the in the database, and we have an index on &lt;code&gt;lower(customer)&lt;/code&gt; to allow us to search quickly for them.&lt;/p&gt;

&lt;p&gt;This thing was dog slow.  It started out fine, but as we've added millions of records it got slower.&lt;/p&gt;

&lt;p&gt;The way this worked was that we'd go get a list of prior matched raw_royalty_records and create a hash to look up the sale_ids based on the customer and track_id:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;    &lt;span class="vi"&gt;@pro&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Pro&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:pro_id&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="vi"&gt;@raw_royalty_records&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="vi"&gt;@pro&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raw_royalty_records&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sans_sales&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"track_id is not null"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;:raw_royalty_records_sales&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;track: :library&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="vi"&gt;@previously_matched_raw_royalty_records&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
      &lt;span class="vi"&gt;@pro&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raw_royalty_records&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;with_sales&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"track_id is not null"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;:customer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:track_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:id&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;distinct&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;:sales&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;each_with_object&lt;/span&gt;&lt;span class="p"&gt;({})&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;hash&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
      &lt;span class="nb"&gt;hash&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;track_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;||=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
      &lt;span class="nb"&gt;hash&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;track_id&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;customer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;downcase&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sales&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;

    &lt;span class="c1"&gt;# Later on, looping through @raw_royalty_records&lt;/span&gt;
        &lt;span class="vi"&gt;@matches&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;track_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;||=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
        &lt;span class="n"&gt;previously_matched&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="vi"&gt;@previously_matched_raw_royalty_records&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;track_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;customer_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;previously_matched&lt;/span&gt;
          &lt;span class="vi"&gt;@matches&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;track_id&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;customer_key&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="ss"&gt;match_type: &lt;/span&gt;&lt;span class="s1"&gt;'previously_matched'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;sales: &lt;/span&gt;&lt;span class="n"&gt;previously_matched&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
          &lt;span class="c1"&gt;#break&lt;/span&gt;
        &lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Not the most beautiful code I've written, but you get the idea.&lt;/p&gt;

&lt;p&gt;That worked great when there were a couple hundred thousand records, but now we have a hundred and something thousand unmatched records, and many matched records.  Looking in the database, for one particular PRO that I'm testing with has around 287,000 records total, with 259,000 not connected to a sale.  So, our query there was pulling in 259,000 records total along with some associated records.  That requires CPU and memory.  Way too much.&lt;/p&gt;

&lt;p&gt;But I realized quickly that I can match up the records in an SQL query.  It's ugly, but using a common table expression takes a lot of  the complexity out and works well because it needs to compute certain expressions only a single time.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;eligible_records&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;track_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;LOWER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;lower_customer&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;raw_royalty_records&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;
    &lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;royalty_input_batch_partials&lt;/span&gt; &lt;span class="n"&gt;ribp&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;ribp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;royalty_input_batch_partial_id&lt;/span&gt;
    &lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;raw_royalty_records_sales&lt;/span&gt; &lt;span class="n"&gt;rrrs&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;rrrs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;raw_royalty_record_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;ribp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pro_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;960&lt;/span&gt;
      &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;track_id&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;rr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;raw_royalty_records&lt;/span&gt; &lt;span class="n"&gt;rr&lt;/span&gt;
&lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;royalty_input_batch_partials&lt;/span&gt; &lt;span class="n"&gt;ribp&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;rr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;royalty_input_batch_partial_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ribp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;
&lt;span class="k"&gt;LEFT&lt;/span&gt; &lt;span class="k"&gt;OUTER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;raw_royalty_records_sales&lt;/span&gt; &lt;span class="n"&gt;rrs&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;rrs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;raw_royalty_record_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;ribp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pro_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;960&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;rrs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;rr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;track_id&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
      &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;eligible_records&lt;/span&gt; &lt;span class="n"&gt;er&lt;/span&gt;
      &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;er&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;track_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;track_id&lt;/span&gt;
        &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;er&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lower_customer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;LOWER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's awesome, but what I really want to do is just create the new records in &lt;code&gt;raw_royalty_records_sales&lt;/code&gt;.  But I also want to get a list of what was just created.  Technically, I could do that by saving the highest id in &lt;code&gt;raw_royalty_records_sales&lt;/code&gt;, then looking at the row count.  This thing shouldn't run overlapping (famous last words), but I'm not going to rely on such silliness.&lt;/p&gt;

&lt;p&gt;This is where common table expressions shine.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;eligible_records&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="c1"&gt;-- This gets a list of existing track_id, customer, and sale_ids&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;track_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;LOWER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;lower_customer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rrrs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sale_id&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;raw_royalty_records&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;
    &lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;royalty_input_batch_partials&lt;/span&gt; &lt;span class="n"&gt;ribp&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;ribp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;royalty_input_batch_partial_id&lt;/span&gt;
    &lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;raw_royalty_records_sales&lt;/span&gt; &lt;span class="n"&gt;rrrs&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;rrrs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;raw_royalty_record_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;ribp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pro_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;960&lt;/span&gt;
      &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;track_id&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="n"&gt;inserted_records&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;raw_royalty_records_sales&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_royalty_record_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sale_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;updated_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;rr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;er&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sale_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;raw_royalty_records&lt;/span&gt; &lt;span class="n"&gt;rr&lt;/span&gt;
&lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;royalty_input_batch_partials&lt;/span&gt; &lt;span class="n"&gt;ribp&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;rr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;royalty_input_batch_partial_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ribp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;
&lt;span class="k"&gt;LEFT&lt;/span&gt; &lt;span class="k"&gt;OUTER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;raw_royalty_records_sales&lt;/span&gt; &lt;span class="n"&gt;rrs&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;rrs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;raw_royalty_record_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;
&lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;eligible_records&lt;/span&gt; &lt;span class="n"&gt;er&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;er&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;track_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;track_id&lt;/span&gt; &lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="n"&gt;er&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lower_customer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;LOWER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;ribp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pro_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;960&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;rrs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;rr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;track_id&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="n"&gt;RETURNING&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;ir&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;raw_royalty_record_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ir&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sale_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;track_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;inserted_records&lt;/span&gt; &lt;span class="n"&gt;ir&lt;/span&gt; &lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;raw_royalty_records&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;
         &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;rrr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ir&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;raw_royalty_record_id&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's a very similar query, but now it's using &lt;code&gt;INSERT INTO...RETURNING *&lt;/code&gt; - in a CTE.  So the actual query that is run selects from the items that were just inserted, joins them with &lt;code&gt;raw_royalty_records&lt;/code&gt; record to get the &lt;code&gt;track_id&lt;/code&gt; and &lt;code&gt;customer&lt;/code&gt;, and returns the records in a normalized fashion.&lt;/p&gt;

&lt;p&gt;I have 3448 &lt;code&gt;raw_royalty_records&lt;/code&gt; records that can match up like this.  Queries that took multiple seconds now take around a second, including the insert.  This is a huge win.&lt;/p&gt;

&lt;p&gt;Unfortunately, I'm not really using a lot of the cool Railsy ActiveRecord goodies to make this more readable, but that's what good documentation covers.  I've sped this up by a couple of orders of magnitude and made it use 1% of the memory of the original.  This is a huge win.  I can handle an ugly query with those numbers.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>rails</category>
    </item>
    <item>
      <title>Refactoring fix_encoding</title>
      <dc:creator>Michael Chaney</dc:creator>
      <pubDate>Tue, 25 Jun 2024 16:10:21 +0000</pubDate>
      <link>https://dev.to/mdchaney/refactoring-fixencoding-1d34</link>
      <guid>https://dev.to/mdchaney/refactoring-fixencoding-1d34</guid>
      <description>&lt;p&gt;I've been writing about Unicode on Twitter over the last week, and specifically handling Unicode in Ruby.  Ruby has robust Unicode support, along with robust support for the older code pages.&lt;/p&gt;

&lt;p&gt;In my work in the music publishing industry I have to write code to process all manner of spreadsheets, typically in the form of CSV files.  CSV files can be a crap shoot in terms of encoding.  Thankfully, everything I've had to deal with up to now has been either Unicode or Latin-1 (ISO-8859-1) or the Windows-1252 variant.&lt;/p&gt;

&lt;p&gt;I created a piece of code some years back to handle the issue of determining the encoding of a file and coercing the bytes into a standard Unicode format, specifically UTF-8.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;module&lt;/span&gt; &lt;span class="nn"&gt;FixEncoding&lt;/span&gt;
  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nc"&gt;FixEncoding&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fix_encoding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# The "b" method returns a copied string with encoding ASCII-8BIT&lt;/span&gt;
    &lt;span class="n"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;b&lt;/span&gt;
    &lt;span class="c1"&gt;# Strip UTF-8 BOM if it's at start of file&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=~&lt;/span&gt; &lt;span class="sr"&gt;/\A\xEF\xBB\xBF/n&lt;/span&gt;
      &lt;span class="n"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gsub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/\A\xEF\xBB\xBF/n&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=~&lt;/span&gt; &lt;span class="sr"&gt;/([\xc0-\xff][\x80-\xbf]{1,3})+/n&lt;/span&gt;
      &lt;span class="c1"&gt;# String has actual UTF-8 characters&lt;/span&gt;
      &lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;force_encoding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'UTF-8'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;elsif&lt;/span&gt; &lt;span class="n"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=~&lt;/span&gt; &lt;span class="sr"&gt;/[\x80-\xff]/n&lt;/span&gt;
      &lt;span class="c1"&gt;# Get rid of Microsoft stupid quotes&lt;/span&gt;
      &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=~&lt;/span&gt; &lt;span class="sr"&gt;/[\x82\x8b\x91\x92\x9b\xb4\x84\x93\x94]/n&lt;/span&gt;
        &lt;span class="n"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\x82\x8b\x91\x92\x9b\xb4\x84\x93\x94&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"''''''&lt;/span&gt;&lt;span class="se"&gt;\"\"\"&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="k"&gt;end&lt;/span&gt;
      &lt;span class="c1"&gt;# There was no UTF-8, but there are high characters.  Assume to&lt;/span&gt;
      &lt;span class="c1"&gt;# be Latin-1, and then convert to UTF-8&lt;/span&gt;
      &lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;force_encoding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'ISO-8859-1'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'UTF-8'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;
      &lt;span class="c1"&gt;# No high characters, just mark as UTF-8&lt;/span&gt;
      &lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;force_encoding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'UTF-8'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There it is in all its glory.  I realized after looking at it that it's in not great shape.  I'm going to refactor it and talk about my decisions.&lt;/p&gt;

&lt;p&gt;There are a few things that stick out:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;I'm making extensive use of &lt;code&gt;=~&lt;/code&gt; instead of using the &lt;code&gt;String#match&lt;/code&gt;.  In Ruby, &lt;code&gt;=~&lt;/code&gt; causes a performance hit due to the fact that it sets various globals (a la Perl) after the match.&lt;/li&gt;
&lt;li&gt;I'm using regular expressions where I don't need to - specifically when checking for the BOM (Unicode byte order mark) at the start of the string.  Some of these strings are many megabytes, so there can be performance gains.&lt;/li&gt;
&lt;li&gt;I realized that I'm using a regular expression to check for high (128 and above) characters.  Ruby has &lt;code&gt;String#ascii_only?&lt;/code&gt; to do that.&lt;/li&gt;
&lt;li&gt;The logic can be changed around to handle the faster cases first.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;So, let's first talk about what this does.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;    &lt;span class="c1"&gt;# The "b" method returns a copied string with encoding ASCII-8BIT&lt;/span&gt;
    &lt;span class="n"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;b&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I'm telling you right there - this gets a copy of the string with the encoding set to ASCII-8BIT.  That's basically "no encoding", which is what we want.  The string is mostly a string of boring bytes, where the collation order is the character code and there are 26 upper and lowercase letters.  This is what Ruby essentially used in version 1.8.&lt;/p&gt;

&lt;p&gt;With the string in this encoding, we can look at individual bytes regardless of whether they're part of a UTF-8 set.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;    &lt;span class="c1"&gt;# Strip UTF-8 BOM if it's at start of file&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=~&lt;/span&gt; &lt;span class="sr"&gt;/\A\xEF\xBB\xBF/n&lt;/span&gt;
      &lt;span class="n"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gsub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/\A\xEF\xBB\xBF/n&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;(note that the "n" flag on the regular expression is a Rubyism that makes the regular expression have the ASCII-8BIT encoding)&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://en.wikipedia.org/wiki/Byte_order_mark"&gt;Unicode Byte Order Mark&lt;/a&gt; can optionally occur at the start of a file.  Microsoft software adds these, and some other software has no idea what they are.  If you understand &lt;a href="https://en.wikipedia.org/wiki/UTF-8"&gt;UTF-8 encoding&lt;/a&gt; you can see that this BOM is really character FEFF, which oddly is a &lt;a href="https://www.compart.com/en/unicode/U+FEFF"&gt;zero-width non-breaking space&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The BOM isn't needed, and you'll notice that I do nothing but remove it.  That's because I've received files that have a BOM at the start, and Latin-1 characters later on in the same file.  There's no reason to "believe" the BOM.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=~&lt;/span&gt; &lt;span class="sr"&gt;/([\xc0-\xff][\x80-\xbf]{1,3})+/n&lt;/span&gt;
      &lt;span class="c1"&gt;# String has actual UTF-8 characters&lt;/span&gt;
      &lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;force_encoding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'UTF-8'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we're getting to the meat of it.  That regexp will find real UTF-8 characters in the binary byte stream.  It'll really find the first one, but that's all I care about.  I could make that regexp more precise, although it's of limited value to do so.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://x.com/MichaelDChaney/status/1804357474155466869"&gt;In this tweet&lt;/a&gt; I cover the  format of a UTF-8 character in-depth.  Here are the basics, though.  a UTF-8 character will be of one of these forms:&lt;/p&gt;

&lt;p&gt;0xxxxxxx&lt;br&gt;
110xxxxx 10xxxxxx&lt;br&gt;
1110xxxx 10xxxxxx 10xxxxxx&lt;br&gt;
11110xxx 10xxxxxx 10xxxxxx 10xxxxxx&lt;/p&gt;

&lt;p&gt;The first form is just ASCII.  Any ordinal value above 127 will always occupy two, three, or four bytes in UTF-8 and will be one of the forms shown above.  The first character will always be in the ranges (in hex) C0-DF, E0-EF, or F0-F7.  The subsequent characters will always be in the range 80-BF.  Having a character in the first range that's not followed by a character in the 80-BF range would be invalid, and having a character in the 80-BF range that's not preceded by one of the first characters or another 80-BF is also not valid.&lt;/p&gt;

&lt;p&gt;In the referenced tweet I include a large regexp that will determine if a given string is fully valid as UTF-8:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=~&lt;/span&gt; &lt;span class="sr"&gt;/\A(?:\xef\xbb\xbf)?
  (?:
  (?:[\x00-\x7f]) |
  (?:[\xc0-\xdf][\x80-\xbf]) |
  (?:[\xe0-\xef][\x80-\xbf]{2}) |
  (?:[\xf0-\xf7][\x80-\xbf]{3})
  )*
\z/nx&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's a beauty, but it's overkill for what I'm doing here. I'll assume that if there's a single valid UTF-8 character then the string is UTF-8.  I'm willing to take that risk.&lt;/p&gt;

&lt;p&gt;The only change that I see is that the first character should match &lt;code&gt;[\xc0-\xf7]&lt;/code&gt; instead of &lt;code&gt;[\xc0-\xff]&lt;/code&gt; (note the final "7" in the former).  I can also use "match" here to speed it up.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;    &lt;span class="k"&gt;elsif&lt;/span&gt; &lt;span class="n"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=~&lt;/span&gt; &lt;span class="sr"&gt;/[\x80-\xff]/n&lt;/span&gt;
      &lt;span class="c1"&gt;# Get rid of Microsoft stupid quotes&lt;/span&gt;
      &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=~&lt;/span&gt; &lt;span class="sr"&gt;/[\x82\x8b\x91\x92\x9b\xb4\x84\x93\x94]/n&lt;/span&gt;
        &lt;span class="n"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\x82\x8b\x91\x92\x9b\xb4\x84\x93\x94&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"''''''&lt;/span&gt;&lt;span class="se"&gt;\"\"\"&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="k"&gt;end&lt;/span&gt;
      &lt;span class="c1"&gt;# There was no UTF-8, but there are high characters.  Assume to&lt;/span&gt;
      &lt;span class="c1"&gt;# be Latin-1, and then convert to UTF-8&lt;/span&gt;
      &lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;force_encoding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'ISO-8859-1'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'UTF-8'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Okay, lots going on here.  We first check if the string has any "high characters", defined as a character code greater than 127.  Put another way - the high bit is set.  Standard ASCII goes from 0 to 127.  When I was a kid the high bit was often used as a parity bit, which isn't needed now.  For most applications.  I'm sure someone's still using a parity bit.&lt;/p&gt;

&lt;p&gt;We've already ruled out this being a UTF-8 string, so if there are high characters we're in either &lt;a href="https://en.wikipedia.org/wiki/ISO/IEC_8859-1"&gt;Latin-1&lt;/a&gt; or its inbred cousin &lt;a href="https://en.wikipedia.org/wiki/Windows-1252"&gt;Windows-1252&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In Latin-1 the character codes 80-9F were reserved as extended "control codes", kind of mirroring the control code concept in the first 32 ASCII characters.  I'm not sure they were ever used as such, and interestingly the character table in Wikipedia simply shows them as "undefined".&lt;/p&gt;

&lt;p&gt;Microsoft and Apple both had an idea of what to put in that range, and this caused calamity 20+ years ago as a text file that looked great on Windows or Mac would be full of weird question marks when viewed elsewhere.&lt;/p&gt;

&lt;p&gt;Microsoft referred to this "feature" as "smart quotes", so we usually referred to them as "stupid quotes" (as a side note, if you think I'm the only one who refers to them as such Github Copilot knew what to do when I created the "has_stupid_quotes?" method).  There are a few other characters in there as well which don't have Latin-1 equivalents, including the Euro sign "€".&lt;/p&gt;

&lt;p&gt;Anyway, the next chunk replaces the fancy quote characters with the standard ASCII equivalents.&lt;/p&gt;

&lt;p&gt;One possible change to this piece of code would be to force the encoding to Windows-1252 and then transcode to UTF-8, which would preserve the fancy quotation marks and apostrophes.  I don't do that simply because I prefer to standardize quote marks to the ASCII versions.  In some other contexts that might be a less preferred choice.&lt;/p&gt;

&lt;p&gt;Here, I use a regular expression to find them and, if found, use the "tr" method to replace them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;        &lt;span class="n"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\x82\x8b\x91\x92\x9b\xb4\x84\x93\x94&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"''''''&lt;/span&gt;&lt;span class="se"&gt;\"\"\"&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, I force the string to Latin-1 encoding, then transcode to UTF-8:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;      &lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;force_encoding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'ISO-8859-1'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'UTF-8'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A better way to do this would be to check for the presence of characters in the 80-9F range and use Windows-1252 instead of ISO-8859-1 (aka "Latin-1").  That would pick up Euro signs and such.&lt;/p&gt;

&lt;p&gt;In the last part, there were no high characters at all, so we force the encoding to be UTF-8.  A regular ASCII string is also a standard UTF-8 or Latin-1 string as well.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;    &lt;span class="k"&gt;else&lt;/span&gt;
      &lt;span class="c1"&gt;# No high characters, just mark as UTF-8&lt;/span&gt;
      &lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;force_encoding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'UTF-8'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So, let's turn this on its head.  First, we need to start out with the check for high characters, and there's good reason for that.  Ruby has a built-in method "String#ascii_only?".  I'm going to give some high praise here - this method is written how I would write it.  Go ahead, have a look:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/ruby/ruby/blob/bed34b3a52afde6d98fcef19e199d0af293577be/string.c#L618"&gt;https://github.com/ruby/ruby/blob/bed34b3a52afde6d98fcef19e199d0af293577be/string.c#L618&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That's actually the opposite - "search_nonascii", but that's ultimately what "ascii_only?" uses.  Why do I like it?  It is as fast as the CPU can perform this check.  It looks at each word and sees if any of the high bits are set.  Instead of looking byte by byte, it looks at entire 32 or 64-bit words.&lt;/p&gt;

&lt;p&gt;So, that's way preferable to using a regular expression.  Better yet, if the string has no high characters there's no reason to even continue with the rest of this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;module&lt;/span&gt; &lt;span class="nn"&gt;FixEncoding&lt;/span&gt;
  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nc"&gt;FixEncoding&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fix_encoding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ascii_only?&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;force_encoding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'UTF-8'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;
      &lt;span class="n"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;b&lt;/span&gt;
      &lt;span class="c1"&gt;# Rest of code&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Putting that check first will short-circuit the rest of our checks if there are no high characters.  And since that's the fastest check we have, it should speed this up dramatically in that case.  Note that this also precludes the string copy, so there's an even bigger win.&lt;/p&gt;

&lt;p&gt;Next, we need to strip the BOM.  This can be done without a regular expression to speed it up.  Here's the old way again:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=~&lt;/span&gt; &lt;span class="sr"&gt;/\A\xEF\xBB\xBF/n&lt;/span&gt;
      &lt;span class="n"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gsub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/\A\xEF\xBB\xBF/n&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can use &lt;code&gt;String#byteslice&lt;/code&gt; in both places to make this faster:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nc"&gt;FixEncoding&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;remove_bom&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;byteslice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\xEF\xBB\xBF&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;b&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;byteslice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="o"&gt;..-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;str&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So, this is very different.  First, we're slicing the first 3 bytes off and comparing to the BOM (also as a binary string).  If they match, we replace &lt;code&gt;str&lt;/code&gt; with all but the first three bytes of &lt;code&gt;str&lt;/code&gt;.  Both parts of this are much faster than the original, and &lt;code&gt;String#byteslice&lt;/code&gt; is the fastest way to handle it.&lt;/p&gt;

&lt;p&gt;Next, we check for UTF-8 characters:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nc"&gt;FixEncoding&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;has_utf8?&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/[\xc0-\xf7][\x80-\xbf]/n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is mostly the same, but I've simplified the regexp by removing the extraneous capture and repetition.&lt;/p&gt;

&lt;p&gt;Next, we can check for stupid quotes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nc"&gt;FixEncoding&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;has_stupid_quotes?&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/[\x82\x8b\x91\x92\x9b\xb4\x84\x93\x94]/n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and replace them if we find them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nc"&gt;FixEncoding&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace_stupid_quotes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\x82\x8b\x91\x92\x9b\xb4\x84\x93\x94&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"''''''&lt;/span&gt;&lt;span class="se"&gt;\"\"\"&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This remains unchanged, save for moving it to its own function.&lt;/p&gt;

&lt;p&gt;The final piece of the puzzle is to determine what the likely encoding is, force it to that encoding, then transcode to UTF-8.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nc"&gt;FixEncoding&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;has_win1252?&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/[\x80-\x9f]/n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nc"&gt;FixEncoding&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;likely_8bit_encoding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ascii_only?&lt;/span&gt;
      &lt;span class="s2"&gt;"ASCII-8BIT"&lt;/span&gt;
    &lt;span class="k"&gt;elsif&lt;/span&gt; &lt;span class="n"&gt;has_win1252?&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="s2"&gt;"WINDOWS-1252"&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;
      &lt;span class="s2"&gt;"ISO-8859-1"&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that we again do the "ascii_only?" check.  Why?  I've replaced the high quote marks with standard ASCII equivalents, so we may well have an ASCII string again.  That's faster than the regular expression check, so we're again looking for a short-circuit.&lt;/p&gt;

&lt;p&gt;With that, we can write our final helper:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nc"&gt;FixEncoding&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;transcode_to_utf8&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"UTF-8"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;likely_8bit_encoding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that using the &lt;code&gt;encode&lt;/code&gt; method like that is the equivalent of using &lt;code&gt;force_encoding&lt;/code&gt; with the second argument followed by &lt;code&gt;encode&lt;/code&gt; with the first argument:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;   &lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;force_encoding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;likely_8bit_encoding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"UTF-8"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At this point, our &lt;code&gt;fix_encoding&lt;/code&gt; function is simpler, fully testable, and pretty much all acting at the same semantic level:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nc"&gt;FixEncoding&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fix_encoding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ascii_only?&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;force_encoding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'UTF-8'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;
      &lt;span class="n"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;b&lt;/span&gt;

      &lt;span class="n"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;remove_bom&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

      &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;has_utf8?&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;force_encoding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'UTF-8'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="k"&gt;else&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;has_stupid_quotes?&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
          &lt;span class="n"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;replace_stupid_quotes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;end&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;transcode_to_utf8&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="k"&gt;end&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The entire thing is longer now, but in reality there's no more code that before and what code there is will run faster.  While I don't normally worry too much about the speed of Ruby code this is often used in processing multi-megabyte files where any speed improvement is appreciated.&lt;/p&gt;

&lt;p&gt;The complete code is available here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://gist.github.com/mdchaney/e2b05eafab81cbdc4dfed6dd2f8e69a6"&gt;https://gist.github.com/mdchaney/e2b05eafab81cbdc4dfed6dd2f8e69a6&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That's not tested, though.  Next time, I'll create some tests and find out how I did.&lt;/p&gt;

</description>
      <category>ruby</category>
      <category>unicode</category>
    </item>
  </channel>
</rss>
