Jeroen van Baarsen for AppSignal

Posted on Feb 27, 2019

Diving into Ruby's #dup and #clone


In today's post, we will look into Ruby's #dup and #clone. We'll start with a real-life example that triggered this interest. After that, we'll dive deeper with the goal of learning how #dup is implemented in Ruby and how it compares to #clone. We'll then close off by implementing our own copying behavior on top of these methods. Let's go!

How I Started Using Dup

When I worked at a company that specialized in setting up campaigns for NGOs to collect donations, I regularly had to copy campaigns and create new ones. For example, after the 2018 campaign ended, a new one for 2019 was needed.

A campaign usually had loads of configuration options, which I didn't really feel like setting up again. It would take quite some time and was error-prone. So I started by copying the DB record and went from there.

For the first few campaigns, I actually copied the records by hand. It looked something like this:

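current_campaign = Campaign.find(1)
# Plain assignment would only copy the reference, so build a new record
# from the existing record's attributes instead
new_campaign = Campaign.new(current_campaign.attributes)
new_campaign.id = nil
new_campaign.created_at = nil
new_campaign.updated_at = nil
new_campaign.title = "Campaign 2019"
new_campaign.save!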

This works, but it requires a lot of typing and is error-prone: I have forgotten to set created_at to nil a few times in the past.

This felt like enough of a pain that I couldn't imagine it was the best way to go about it. And it turns out there is a better way!

new_campaign = Campaign.find(1).dup
new_campaign.title = "Campaign 2019"
new_campaign.save!

This gives us a new, unsaved record with the ID and the timestamps set to nil, which is exactly what we want.

This was how I first got into using #dup. Now, let's go and take a deeper look into how #dup actually works.

What Is Going on Under the Hood?

The default Ruby implementation of the #dup method lets you add special initializers to your object that are called on the new copy when it is created via #dup. These methods are:

initialize_copy
initialize_dup

The implementation of these methods is actually quite interesting, as they do almost nothing by default: #initialize_dup simply calls #initialize_copy, which only runs a few sanity checks. They are basically placeholders for you to override.

This is taken directly from the Ruby source code:

VALUE
rb_obj_dup(VALUE obj)
{
    VALUE dup;

    if (special_object_p(obj)) {
        return obj;
    }
    dup = rb_obj_alloc(rb_obj_class(obj));
    init_copy(dup, obj);
    rb_funcall(dup, id_init_dup, 1, obj);

    return dup;
}

For us, the interesting part is the rb_funcall line, where Ruby calls the #initialize_dup initializer on the new copy.

rb_funcall is used a lot in Ruby's C code to call a method on an object. Here it calls the method identified by id_init_dup (that is, #initialize_dup) on the dup object. The 1 is the number of arguments that follow, in this case just one: obj.

Let's dive a bit deeper and look at that implementation:

VALUE
rb_obj_init_dup_clone(VALUE obj, VALUE orig)
{
    rb_funcall(obj, id_init_copy, 1, orig);
    return obj;
}

As you can see, nothing is actually happening here other than a call to #initialize_copy (id_init_copy) with the original object. Now that we are down the rabbit hole, let's look at that method as well:

VALUE
rb_obj_init_copy(VALUE obj, VALUE orig)
{
    if (obj == orig) return obj;
    rb_check_frozen(obj);
    rb_check_trusted(obj);
    if (TYPE(obj) != TYPE(orig) || rb_obj_class(obj) != rb_obj_class(orig)) {
        rb_raise(rb_eTypeError, "initialize_copy should take same class object");
    }
    return obj;
}

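Even though there is more code, nothing special is happening here except for a few sanity checks that Ruby needs internally, such as making sure the copy is not frozen and has the same class as the original. (The details of these checks might be a good subject for another time.)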
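To see two of these checks fire, you can call #initialize_copy directly on a plain object. This is only a quick experiment on MRI, where this C code lives: Point is a throwaway class invented for the example, and #send is needed because #initialize_copy is private.

```ruby
# Point is a throwaway class that inherits Object#initialize_copy unchanged.
class Point; end

# Check 1: the original must have the same class as the copy.
begin
  Point.new.send(:initialize_copy, Object.new)
rescue TypeError => e
  type_error = e
end
puts type_error.message # prints: initialize_copy should take same class object

# Check 2: the copy itself must not be frozen (FrozenError exists since Ruby 2.5).
begin
  Point.new.freeze.send(:initialize_copy, Point.new)
rescue FrozenError => e
  frozen_error = e
end
puts frozen_error.class # prints: FrozenError
```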

In other words, Ruby gives you hooks to override and provides the tools needed to implement your own copying behavior.
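Putting the three C functions together, here is a rough pure-Ruby approximation of what #dup does. It is a sketch under simplifying assumptions, not how Ruby works internally: sketch_dup, SPECIAL_CLASSES and Sheep are made-up names, the class list only approximates special_object_p, and copying instance variables one by one stands in for the C-level init_copy, which does more than that. It only makes sense for plain Ruby objects that keep their state in instance variables.

```ruby
# Hypothetical list approximating special_object_p: values returned as-is.
SPECIAL_CLASSES = [Integer, Float, Symbol, Rational, Complex,
                   NilClass, TrueClass, FalseClass].freeze

# Hypothetical helper: a rough pure-Ruby approximation of rb_obj_dup.
def sketch_dup(obj)
  # Like special_object_p: immediates and similar values are not copied.
  return obj if SPECIAL_CLASSES.any? { |klass| obj.is_a?(klass) }

  # Like rb_obj_alloc(rb_obj_class(obj)): allocate without calling #initialize.
  copy = obj.class.allocate

  # Stand-in for init_copy: a shallow copy of each instance variable.
  obj.instance_variables.each do |ivar|
    copy.instance_variable_set(ivar, obj.instance_variable_get(ivar))
  end

  # Like rb_funcall(dup, id_init_dup, 1, obj): run the private hook,
  # which by default goes on to call #initialize_copy.
  copy.send(:initialize_dup, obj)
  copy
end

# Hypothetical example class that customizes the copy via the hook.
class Sheep
  attr_accessor :name

  def initialize_dup(original)
    @name = "#{original.name} (copy)"
    super
  end
end

dolly = Sheep.new
dolly.name = "Dolly"
dolly_copy = sketch_dup(dolly)
puts dolly_copy.name # prints: Dolly (copy)
```

Because sketch_dup ends by calling #initialize_dup, the Sheep hook runs just as it does for a real dolly.dup. That is the point of the hook design: Ruby does the raw copying and then hands you the new object to adjust.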

Rails' Dup Implementation

Rails overrides these hooks in a number of places, but for now we are only interested in how the id and timestamp fields get cleared.

The ID is cleared in ActiveRecord's Core module. It uses the model's configured primary key, so even if you changed the primary key column, it will still be reset.

# activerecord/lib/active_record/core.rb
def initialize_dup(other) # :nodoc:
  @attributes = @attributes.deep_dup
  @attributes.reset(self.class.primary_key)

  _run_initialize_callbacks

  @new_record               = true
  @destroyed                = false
  @_start_transaction_state = {}
  @transaction_state        = nil

  super
end

The timestamps are cleared in the Timestamp module. It tells Rails to clear all the timestamp columns that Rails can set on create and update (created_at, created_on, updated_at and updated_on).

# activerecord/lib/active_record/timestamp.rb
def initialize_dup(other) # :nodoc:
  super
  clear_timestamp_attributes
end
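To see how the two Rails overrides cooperate, here is a minimal plain-Ruby imitation of the same pattern. It is not Rails code: CoreLike, TimestampLike and FakeRecord are hypothetical stand-ins, the attributes live in a plain Hash, and that Hash is copied shallowly with #dup rather than with Rails' #deep_dup. Because TimestampLike is included last, its #initialize_dup runs first, calls super into CoreLike, and clears the timestamps after the rest of the chain has finished.

```ruby
# Hypothetical stand-in for ActiveRecord::Core's initialize_dup.
module CoreLike
  def initialize_dup(other)
    @attributes = @attributes.dup # don't share the Hash with the original
    @attributes["id"] = nil       # reset the primary key
    @new_record = true
    super                         # continues to Object#initialize_dup
  end
end

# Hypothetical stand-in for ActiveRecord::Timestamp's initialize_dup.
module TimestampLike
  def initialize_dup(other)
    super # let CoreLike (and Object) do their work first
    @attributes["created_at"] = nil
    @attributes["updated_at"] = nil
  end
end

class FakeRecord
  include CoreLike
  include TimestampLike # included last, so it comes first in the lookup chain

  attr_reader :attributes

  def initialize(attributes)
    @attributes = attributes
    @new_record = false
  end

  def new_record?
    @new_record
  end
end

now = Time.now
original = FakeRecord.new({ "id" => 1, "title" => "Campaign 2018",
                            "created_at" => now, "updated_at" => now })
copy = original.dup
copy.attributes["id"]         # => nil
copy.attributes["title"]      # => "Campaign 2018"
copy.attributes["created_at"] # => nil
copy.new_record?              # => true
```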

An interesting fact here is that Rails deliberately chose to override the #initialize_dup method instead of the #initialize_copy method. Why would it do that? Let's investigate.

Object#initialize_copy Explained

In the above code snippets, we saw how Ruby calls #initialize_dup when you call #dup on an object. But there is also an #initialize_copy method. To better explain where it is used, let's look at an example:

class Animal
  attr_accessor :name

  def initialize_copy(*args)
    puts "#initialize_copy is called"
    super
  end

  def initialize_dup(*args)
    puts "#initialize_dup is called"
    super
  end
end

animal = Animal.new
animal.dup

# Output:
# #initialize_dup is called
# #initialize_copy is called

This shows the calling order: Ruby first calls #initialize_dup, which then calls #initialize_copy. If we had left the call to super out of #initialize_dup, #initialize_copy would never have been called, so it is important to keep it in.
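A small experiment makes the role of super concrete. WithSuper, WithoutSuper and CALLS are made-up names for this sketch; both classes log each hook into CALLS, and the only difference is whether #initialize_dup calls super.

```ruby
# Collects the hooks Ruby invokes, in order.
CALLS = []

class WithSuper
  def initialize_dup(original)
    CALLS << "WithSuper#initialize_dup"
    super # Object#initialize_dup goes on to call #initialize_copy
  end

  def initialize_copy(original)
    CALLS << "WithSuper#initialize_copy"
    super
  end
end

class WithoutSuper
  def initialize_dup(original)
    CALLS << "WithoutSuper#initialize_dup"
    # No super here, so #initialize_copy below is never reached.
  end

  def initialize_copy(original)
    CALLS << "WithoutSuper#initialize_copy"
    super
  end
end

WithSuper.new.dup
WithoutSuper.new.dup

puts CALLS
# Output:
# WithSuper#initialize_dup
# WithSuper#initialize_copy
# WithoutSuper#initialize_dup
```

Skipping super does not stop instance variables from being copied, since Ruby already did that in C before calling the hook; it only skips #initialize_copy and the checks it performs.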

Are There Other Methods to Copy Something?

Now that we have seen this implementation, you might be wondering why there are separate #initialize_* methods at all. The answer is that there is another way to copy objects: #clone. You generally use #clone when you want a more faithful copy that includes the object's internal state. In plain Ruby, for example, #clone keeps the original's frozen state and singleton methods, both of which #dup drops.
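A quick check on an ordinary object shows two of the most visible differences between #clone and #dup in plain Ruby: the frozen state and singleton methods. The greet method is just an example name; freeze: false on #clone is available since Ruby 2.4.

```ruby
original = Object.new

# A singleton method lives in the object's singleton class.
def original.greet
  "hi"
end
original.freeze

cloned = original.clone
duped  = original.dup
thawed = original.clone(freeze: false) # clone, but leave the copy unfrozen

puts cloned.frozen?             # prints: true
puts duped.frozen?              # prints: false
puts cloned.respond_to?(:greet) # prints: true
puts duped.respond_to?(:greet)  # prints: false
puts thawed.frozen?             # prints: false
```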

Rails relies on this distinction in ActiveRecord: #dup lets you duplicate a record without its "internal" state (id and timestamps), while #clone is left to Ruby's default behavior.

Because #clone is a separate method, it also gets its own initializer: #initialize_clone, which you can override. It follows the same lifecycle as #initialize_dup and calls up to #initialize_copy through super.
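The same tracing trick used for #dup earlier also works for #clone. Traced and CLONE_CALLS are made-up names for this sketch, which records the order in which Ruby invokes the hooks for a plain #clone call. The one-argument signature is fine here; on Ruby 3.0+, calling clone(freeze: ...) also passes a freeze: keyword to #initialize_clone, so a hook meant to handle that needs to accept it.

```ruby
# Records the order in which Ruby invokes the copy hooks.
CLONE_CALLS = []

class Traced
  def initialize_clone(original)
    CLONE_CALLS << :initialize_clone
    super # Object#initialize_clone goes on to call #initialize_copy
  end

  def initialize_copy(original)
    CLONE_CALLS << :initialize_copy
    super
  end
end

Traced.new.clone
p CLONE_CALLS # prints: [:initialize_clone, :initialize_copy]
```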

Knowing this, the naming of the initializer methods makes more sense: use #initialize_dup or #initialize_clone for behavior that is specific to #dup or #clone, and put behavior shared by both in #initialize_copy.

Cloning an Animal

(just an example, no animals were hurt for this blog post)

Now let's look at an example of how it works in practice.

require "securerandom"

class Animal
  attr_accessor :name, :dna, :age

  def initialize
    self.dna = generate_dna
  end

  def initialize_copy(original_animal)
    self.age = 0
    super
  end

  def initialize_dup(original_animal)
    self.dna = generate_dna
    self.name = "A new name"
    super
  end

  def initialize_clone(original_animal)
    self.name = "#{original_animal.name} 2"
    super
  end

  def generate_dna
    SecureRandom.hex
  end
end

bello = Animal.new
bello.name = "Bello"
bello.age = 10

bello_clone = bello.clone
bello_dup = bello.dup

bello_clone.name # => "Bello 2"
bello_clone.age # => 0
bello_clone.dna == bello.dna # => true

bello_dup.name # => "A new name"
bello_dup.age # => 0
bello_dup.dna == bello.dna # => false

Let's break down what is actually happening here. We have a class called Animal, and depending on how we copy the animal, it should have different behavior:

When we clone the animal, the DNA remains the same, and its name will be the original name with 2 appended to it.
When we duplicate the animal, we make a new animal based on the original one. It gets its own DNA and a new name.
In both cases, the copy starts out as a baby (age 0).

We implemented three different initializers to make this happen. Both #initialize_dup and #initialize_clone call up to #initialize_copy via super, which ensures that the age is always reset to 0.

Rounding up the CLONES and Other Animals

We started with an itch we needed to scratch ourselves: copying a database record. We went from copying by hand in the Campaign example to #dup and #clone. We then went from the practical to the fascinating and looked at how this is implemented in Ruby. We also played around with cloning and duping animals. We hope you enjoyed this deep dive as much as we enjoyed writing it.