I was recently working on building my first web app with Sinatra and ran into the issue that I needed to store an array/hash in my database. I should mention that there are certainly databases (like Postgres) that support an Array datatype; unfortunately, I was using SQLite3, which does not. That led me down a rabbit hole to find the simplest workaround, and I found that it was actually a pretty common question.
Here's what I want to be able to do:
class CreatePandas < ActiveRecord::Migration
def change
create_table :pandas do |t|
t.string :name
t.integer :age
t.array :favorite_foods # obviously can't do this because there's no array datatype
end
end
end
Serialize the data
The most prominent answer by far was to serialize using the ActiveRecord method serialize. The idea here is that #serialize
uses YAML to turn our array or hash into a string, so that it can be stored in the database and reconstructed later. For an idea of how that's working behind the scenes, we can take a look at this totally unbiased list of vegetables:
veggies = ["carrots", "peas", "mushrooms"].to_yaml
#=> "---\n- carrots\n- peas\n- mushrooms\n"
preferences = {great: ["carrots"], gross: ["peas", "mushrooms"]}.to_yaml
#=> "---\n:great:\n- carrots\n:gross:\n- peas\n- mushrooms\n"
YAML.load(veggies) #=> ["carrots", "peas", "mushrooms"]
YAML.load(preferences) #=> {:great=>["carrots"], :gross=>["peas", "mushrooms"]}
So that's how serialize
works, but how do I use it? Well, now that we know our list of favorite foods will actually be stored as a string, we can change our datatype in the migration.
...
create_table :pandas do |t|
t.string :name
t.integer :age
t.text :favorite_foods
end
We also need to tell our model to encode and decode favorite_foods
so that we can interact with it as an array in our program.
class Panda < ActiveRecord::Base
serialize :favorite_foods, Array
end
The second argument is optional, but helps to specify what type of coder you'd like to use or what type of object to expect when entering or retrieving data in the database. Here we're using Array
as our object type to expect, but we could also use Hash
(if we were working with hashes instead of arrays) or JSON
(if we wanted to use JSON instead of YAML as our coder).
Now we're off to the races. We can create and edit Pandas to our hearts' content, without worrying about the favorite_foods
attribute being an array.
fluf = Panda.create(name: "Fluffy", age: 13, favorite_foods: ["pizza", "bamboo"])
#=> #<Panda:0x00007fc39e0d7960 id: 1, name: "Fluffy", age: 13, favorite_foods: ["pizza", "bamboo"]>
fluf.favorite_foods.first #=> "pizza"
fluf.favorite_foods << "pasta" #=> ["pizza", "bamboo", "pasta"]
fluf.save #=> true
Drawbacks to using serialize
For a lot of situations, this is the perfect simple solution, and we don't need to go any further. But if you play around with serialize
you'll notice there are a couple of limitations that keep it from being the one-size-fits-all, perfect tool.
The biggest drawback is querying. If I want to find Fluffy at any time, I can simply...
fluf = Panda.find_by(name: "Fluffy")
#=> #<Panda:0x00007fc39e0d7960 id: 1, name: "Fluffy", age: 13, favorite_foods: ["pizza", "bamboo", "pasta"]>
And if we want to see what food Fluffy likes...
fluf.favorite_foods.include?("pizza") #=> true
fluf.favorite_foods.include?("gyros") #=> false
But what if we want to find all the pandas in my database that like pizza? Even assuming that my array is ordered by preference, and we just want to know who has pizza as their top choice, we would need something like this:
Panda.find_by(:favorite_foods.first => "pizza") # or
Panda.find_by(:favorite_foods[0] => "pizza") # or maybe even
Panda.find_by(:favorite_foods.include?("pizza"))
None of that's going to work though, and the reason is pretty simple. If we remember back to the beginning, the whole point of using serialize
was to enable us to turn our arrays into strings. Now our database is very confused (and going to send all kinds of errors) because we're trying to call Array methods on something that we've explicitly asked to be stored as text. For pandas and pizza that may not matter very much, especially at a small scale (we can simply request all the pandas from our database – turning our favorite_foods
back into an array and allowing us to iterate over it with no problems). But what if instead of pandas we had a dataset of thousands of social media users... that method quickly becomes impractical.
That takes us to the other potential drawback: performance. The reason the above situation isn't practical is because for every user we pull from the database, Ruby will create a new object. We can see that even when we retrieve the same database entry more than once.
Panda.find_by(name: "Fluffy")
#=> #<Panda:0x00007fbb8b0881c8 id: 1, name: "Fluffy", age: 13, favorite_foods: ["pizza", "bamboo", "pasta"]>
Panda.find_by(name: "Fluffy")
#=> #<Panda:0x00007fbb877f69e0 id: 1, name: "Fluffy", age: 13, favorite_foods: ["pizza", "bamboo", "pasta"]>
Also, because of the serialization process, if we update a user's name (or any other attribute), the serialized column will also be updated, even if we didn't change anything about it. That's a lot of extra work for our program.
What if I need to query the data?
If you do need to query based on the information in an array-like structure (and it's not practical to switch the database you're using), an easy option is to create another table and model for the data previously held in your array/hash. Let's update our simple example, starting with the migrations.
# /db/migrate/001_create_pandas.rb
...
create_table :pandas do |t|
t.string :name
t.integer :age
# t.text :favorite_foods # no longer necessary
end
# /db/migrate/002_create_favorites.rb
...
create_table :favorites do |t|
t.string :food
t.belongs_to :panda
end
Now we can create a standard has_many
/belongs_to
relationship between the two models. I'm assuming a familiarity with ActiveRecord Associations here, but if this is new or not making sense, check out the Rails Guide for a good explanation.
# /app/models/pandas.rb
class Panda < ActiveRecord::Base
has_many :favorites
end
# /app/models/favorites.rb
class Favorites < ActiveRecord::Base
belongs_to :panda
end
And that about does it. We wanted to be able to search for pizza, and now we can.
pizza_luvers = Favorite.where("food = 'pizza'")
#=> [#<Favorite:0x00007fbb8aa5b1b0 id: 1, food: "pizza", panda_id: 1>,
#<Favorite:0x00007fbb8aa5afd0 id: 5, food: "pizza", panda_id: 3>]
## To get the pandas' names ##
pizza_luvers.collect{|p| p.panda.name}
#=> ["Fluffy", "Mei Xiang"]
## To see a panda's favorite foods ##
Panda.find_by(name: "Fluffy").favorites
#=> [#<Favorite:0x00007fbb8771e018 id: 1, food: "pizza", panda_id: 1>,
#<Favorite:0x00007fbb8771ded8 id: 3, food: "bamboo", panda_id: 1>,
#<Favorite:0x00007fbb875f7c98 id: 4, food: "pasta", panda_id: 1>]
Obviously our array of favorite panda foods has a limited utility, but even with an overly-simplistic example, I like the added functionality of an additional table over serialized data. For my real-life use case, this was a much better solution, because I needed to be able to query and sort by hash data. I was actually surprised to see that serialize
was recommended so much, so if I'm missing something here, definitely let me know.
Hopefully this is helpful for anyone who finds themself in the same position I did working on my project!
Top comments (1)
Great explanation. Thanks!