In today's post, we will look into Ruby's #dup and #clone. We'll start with a real-life example that triggered this interest. After that, we'll dive deeper with the goal of learning how #dup is implemented in Ruby and how it compares to #clone. We'll then close off by implementing our own #dup method. Let's go!
How I Started Using Dup
When I worked at a company that specialized in setting up campaigns for NGOs to collect donations, I regularly had to copy campaigns and create new ones. For example, after the 2018 campaign ended, a new one for 2019 was needed.
A campaign usually had loads of configuration options, which I didn't really feel like setting up again. It would take quite some time and was error-prone. So I started by copying the DB record and went from there.
For the first few campaigns, I actually copied that by hand. It looked something like this:
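current_campaign = Campaign.find(1)
# Build a new, unsaved record from the existing attributes.
# (Plain assignment would only alias the original record.)
new_campaign = Campaign.new(current_campaign.attributes)
new_campaign.id = nil
new_campaign.created_at = nil
new_campaign.updated_at = nil
new_campaign.title = "Campaign 2019"
new_campaign.save!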
This works, but requires a lot of typing, not to mention it is error-prone. I have forgotten to set the created_at to nil a few times in the past.
Since this felt like a bit of a pain, I couldn't imagine that it was the best way to go about it. And it turns out, there is a better way!
new_campaign = Campaign.find(1).dup
new_campaign.title = "Campaign 2019"
new_campaign.save!
This will set the ID and the timestamps to nil, which is exactly what we want to accomplish.
This was how I first got into using #dup. Now, let's go and take a deeper look into how #dup actually works.
What Is Going on Under the Hood?
The default Ruby implementation of the #dup method allows you to add special initializers to your object that are called when the object is copied via the #dup method. These methods are:
initialize_copy
initialize_dup
The implementation of these methods is actually quite interesting, as they don't do anything by default. They are basically placeholders for you to override.
This is taken directly from the Ruby source code:
VALUE
rb_obj_dup(VALUE obj)
{
    VALUE dup;

    if (special_object_p(obj)) {
        return obj;
    }
    dup = rb_obj_alloc(rb_obj_class(obj));
    init_copy(dup, obj);
    rb_funcall(dup, id_init_dup, 1, obj);

    return dup;
}
For us, the interesting part is the rb_funcall call near the end, where Ruby calls the initializer method #initialize_dup.
rb_funcall is a function that is used a lot in Ruby's C code. It calls a method on an object. In this case, it calls the method identified by id_init_dup (#initialize_dup) on the dup object. The 1 tells it how many arguments follow, in this case only one: obj.
Let's dive a bit deeper and look at that implementation:
VALUE
rb_obj_init_dup_clone(VALUE obj, VALUE orig)
{
    rb_funcall(obj, id_init_copy, 1, orig);
    return obj;
}
As you can see in this example, nothing is actually happening other than it calling id_init_copy. Now that we are down the rabbit hole, let's look at that method as well:
VALUE
rb_obj_init_copy(VALUE obj, VALUE orig)
{
    if (obj == orig) return obj;
    rb_check_frozen(obj);
    rb_check_trusted(obj);
    if (TYPE(obj) != TYPE(orig) || rb_obj_class(obj) != rb_obj_class(orig)) {
        rb_raise(rb_eTypeError, "initialize_copy should take same class object");
    }
    return obj;
}
Even though there is more code, nothing special is happening except for some checks that are required internally (but that might be a good subject for another time).
So what happens in the implementation is that Ruby gives you an endpoint and provides you with the tools needed to implement your own interesting behavior.
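To make the flow of rb_obj_dup easier to follow, here is a rough Ruby-level sketch of the same steps. It is only an approximation: the names SketchDup, sketch_dup, and Point are hypothetical, and the real init_copy does more than copy instance variables.

```ruby
# A rough Ruby-level sketch of what rb_obj_dup does in C.
# SketchDup, sketch_dup and Point are hypothetical names, not part of Ruby.
module SketchDup
  def sketch_dup
    # Like rb_obj_alloc: create an empty instance without running #initialize.
    copy = self.class.allocate

    # Simplified stand-in for init_copy: shallow-copy the instance variables.
    instance_variables.each do |ivar|
      copy.instance_variable_set(ivar, instance_variable_get(ivar))
    end

    # Like rb_funcall(dup, id_init_dup, 1, obj). The hook is private, hence #send.
    copy.send(:initialize_dup, self)
    copy
  end
end

class Point
  include SketchDup
  attr_accessor :x, :tags

  def initialize(x, tags)
    @x = x
    @tags = tags
  end
end

point = Point.new(1, ["a"])
copy = point.sketch_dup

copy.equal?(point)           # => false, it is a new object
copy.x                       # => 1
copy.tags.equal?(point.tags) # => true, the copy is shallow
```

Note that the copy shares the tags array with the original. That shallow behavior is the reason the hooks exist: they give you a place to copy nested state yourself.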
Rails' Dup Implementation
This is exactly what Rails did in a bunch of places, but for now, we are only interested in how the id and timestamp fields get cleared.
The ID gets cleared in the core module for ActiveRecord. It takes into account what your primary key is, so even if you changed that, it will still reset it.
# activerecord/lib/active_record/core.rb
def initialize_dup(other) # :nodoc:
  @attributes = @attributes.deep_dup
  @attributes.reset(self.class.primary_key)
  _run_initialize_callbacks
  @new_record = true
  @destroyed = false
  @_start_transaction_state = {}
  @transaction_state = nil
  super
end
The timestamps are cleared in the Timestamps module. It tells Rails to clear out all the timestamps that Rails can use for creating and updating (created_at, created_on, updated_at and updated_on).
# activerecord/lib/active_record/timestamp.rb
def initialize_dup(other) # :nodoc:
  super
  clear_timestamp_attributes
end
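The same idea can be tried out without Rails. The following is a minimal plain-Ruby sketch, assuming a hypothetical PlainRecord class that keeps its data in a Hash. It is not ActiveRecord and only imitates the spirit of the two methods above.

```ruby
# PlainRecord is a hypothetical stand-in for a model; it is not ActiveRecord.
class PlainRecord
  attr_reader :attributes

  def initialize(attributes)
    @attributes = attributes
  end

  def initialize_dup(original)
    # Without this, the copy would share the same Hash as the original.
    @attributes = original.attributes.dup
    # Clear the fields that identify the stored row.
    @attributes[:id] = nil
    @attributes[:created_at] = nil
    @attributes[:updated_at] = nil
    super
  end
end

now = Time.now
campaign = PlainRecord.new({ id: 1, title: "Campaign 2018", created_at: now, updated_at: now })

new_campaign = campaign.dup
new_campaign.attributes[:title] = "Campaign 2019"

campaign.attributes[:id]        # => 1
campaign.attributes[:title]     # => "Campaign 2018"
new_campaign.attributes[:id]    # => nil
new_campaign.attributes[:title] # => "Campaign 2019"
```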
An interesting fact here is that Rails deliberately chose to override the #initialize_dup method instead of the #initialize_copy method. Why would it do that? Let's investigate.
Object#initialize_copy Explained
In the above code snippets, we saw how Ruby calls #initialize_dup when you use #dup on an object. But there is also an #initialize_copy method. To better explain where this is used, let's look at an example:
class Animal
  attr_accessor :name

  def initialize_copy(*args)
    puts "#initialize_copy is called"
    super
  end

  def initialize_dup(*args)
    puts "#initialize_dup is called"
    super
  end
end

animal = Animal.new
animal.dup
# => #initialize_dup is called
# => #initialize_copy is called
We can now see what the calling order is. Ruby first calls out to #initialize_dup and then calls #initialize_copy. If we had left the call to super out of the #initialize_dup method, #initialize_copy would never have been called, so it is important to keep that in.
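The effect of leaving out super can be checked with a small sketch. The class names WithSuper and WithoutSuper are made up for this illustration, and the hooks record their calls in an array instead of printing.

```ruby
# WithSuper and WithoutSuper are hypothetical classes for illustration.
class WithSuper
  attr_reader :calls

  def initialize
    @calls = []
  end

  def initialize_copy(original)
    @calls += [:initialize_copy]
    super
  end

  def initialize_dup(original)
    @calls = [:initialize_dup]
    super # hands over to Ruby's default, which calls #initialize_copy
  end
end

class WithoutSuper < WithSuper
  def initialize_dup(original)
    @calls = [:initialize_dup]
    # No super here, so the chain stops and #initialize_copy is skipped.
  end
end

with_super_calls = WithSuper.new.dup.calls
without_super_calls = WithoutSuper.new.dup.calls

with_super_calls    # => [:initialize_dup, :initialize_copy]
without_super_calls # => [:initialize_dup]
```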
Are There Other Methods to Copy Something?
Now that we have seen this implementation, you might be wondering what the use case is for having two #initialize_* methods. The answer is: there is another way to copy objects, called #clone. You generally use #clone if you want to copy an object including its internal state, such as its frozen status and its singleton methods.
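In plain Ruby, the most visible difference is that #clone carries over the frozen state and any singleton methods, while #dup does not. A small sketch (the greet method is just an example name):

```ruby
original = Object.new

# A singleton method, defined on this one object only.
def original.greet
  "hello"
end

original.freeze

cloned = original.clone
duped  = original.dup

cloned.frozen?             # => true
duped.frozen?              # => false
cloned.respond_to?(:greet) # => true
duped.respond_to?(:greet)  # => false
```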
This is what Rails is using with its #dup method on ActiveRecord. It uses #dup to allow you to duplicate a record without its "internal" state (id and timestamps), and leaves #clone up to Ruby to implement.
Having this extra method also calls for a specific initializer when using the #clone method. For this, you can override #initialize_clone. This method uses the same lifecycle as #initialize_dup and will call up towards #initialize_copy.
Knowing this, the naming of the initializer methods makes a bit more sense. We can use #initialize_(dup|clone) for specific implementations depending on whether we use #dup or #clone. If we have overarching behavior that is used for both, we can place it inside #initialize_copy.
Cloning an Animal
(just an example, no animals were hurt for this blog post)
Now let's look at an example of how it works in practice.
require "securerandom"

class Animal
  attr_accessor :name, :dna, :age

  def initialize
    self.dna = generate_dna
  end

  # Shared by #dup and #clone: every copy starts as a baby.
  def initialize_copy(original_animal)
    self.age = 0
    super
  end

  # Only for #dup: a new animal with its own DNA and name.
  def initialize_dup(original_animal)
    self.dna = generate_dna
    self.name = "A new name"
    super
  end

  # Only for #clone: same DNA, name derived from the original.
  def initialize_clone(original_animal)
    self.name = "#{original_animal.name} 2"
    super
  end

  def generate_dna
    SecureRandom.hex
  end
end

bello = Animal.new
bello.name = "Bello"
bello.age = 10

bello_clone = bello.clone
bello_dup = bello.dup

bello_clone.name # => "Bello 2"
bello_clone.age # => 0
bello_dup.name # => "A new name"
bello_dup.age # => 0
Let's break down what is actually happening here. We have a class called Animal, and depending on how we copy the animal, it should have different behavior:
- When we clone the animal, the DNA remains the same, and its name will be the original name with 2 appended to it.
- When we duplicate the animal, we make a new animal based on the original one. It gets its own DNA and a new name.
- In all cases the animal starts as a baby.
We implemented three different initializers to make this happen. Because they call super, the #initialize_(dup|clone) methods will always call up to #initialize_copy, thus ensuring that the age is set to 0.
Rounding up the CLONES and Other Animals
Starting by explaining the itch we needed to scratch ourselves, we looked into copying a database record. We went from copying by hand in the Campaign example, to #dup and #clone. We then took it from the practical to the fascinating and looked into how this is implemented in Ruby. We also played around with #clone-ing and #dup-ing animals. We hope you enjoyed our deep dive as much as we did writing it.