cole-flournoy

Posted on Feb 25, 2021 • Edited on Feb 27, 2021

Workaround for Storing Arrays in Databases With No Array Type

#ruby #webdev #rails

I was recently working on building my first web app with Sinatra and ran into the issue that I needed to store an array/hash in my database. I should mention that there are certainly databases (like Postgres) that support an Array datatype; unfortunately, I was using SQLite3, which does not. That led me down a rabbit hole to find the simplest workaround, and I found that it was actually a pretty common question.

Here's what I want to be able to do:

class CreatePandas < ActiveRecord::Migration
  def change
    create_table :pandas do |t|
      t.string :name
      t.integer :age
      t.array :favorite_foods   # obviously can't do this because there's no array datatype
    end 
  end
end

Serialize the data

The most prominent answer by far was to serialize using the ActiveRecord method serialize. The idea here is that #serialize uses YAML to turn our array or hash into a string, so that it can be stored in the database and reconstructed later. For an idea of how that's working behind the scenes, we can take a look at this totally unbiased list of vegetables:

veggies = ["carrots", "peas", "mushrooms"].to_yaml 
#=> "---\n- carrots\n- peas\n- mushrooms\n"
preferences = {great: ["carrots"], gross: ["peas", "mushrooms"]}.to_yaml
#=> "---\n:great:\n- carrots\n:gross:\n- peas\n- mushrooms\n"

YAML.load(veggies) #=> ["carrots", "peas", "mushrooms"]
YAML.load(preferences) #=> {:great=>["carrots"], :gross=>["peas", "mushrooms"]}

So that's how serialize works, but how do I use it? Well, now that we know our list of favorite foods will actually be stored as a string, we can change our datatype in the migration.

...
create_table :pandas do |t|
  t.string :name
  t.integer :age
  t.text :favorite_foods 
end

We also need to tell our model to encode and decode favorite_foods so that we can interact with it as an array in our program.

class Panda < ActiveRecord::Base 
  serialize :favorite_foods, Array  
end

The second argument is optional, but helps to specify what type of coder you'd like to use or what type of object to expect when entering or retrieving data in the database. Here we're using Array as our object type to expect, but we could also use Hash (if we were working with hashes instead of arrays) or JSON (if we wanted to use JSON instead of YAML as our coder).

Now we're off to the races. We can create and edit Pandas to our hearts' content, without worrying about the favorite_foods attribute being an array.

fluf = Panda.create(name: "Fluffy", age: 13, favorite_foods: ["pizza", "bamboo"]) 
#=> #<Panda:0x00007fc39e0d7960 id: 1, name: "Fluffy", age: 13, favorite_foods: ["pizza", "bamboo"]>

fluf.favorite_foods.first #=> "pizza"
fluf.favorite_foods << "pasta" #=> ["pizza", "bamboo", "pasta"]
fluf.save #=> true

Drawbacks to using serialize

For a lot of situations, this is the perfect simple solution, and we don't need to go any further. But if you play around with serialize you'll notice there are a couple of limitations that keep it from being the one-size-fits-all, perfect tool.

The biggest drawback is querying. If I want to find Fluffy at any time, I can simply...

fluf = Panda.find_by(name: "Fluffy")
#=> #<Panda:0x00007fc39e0d7960 id: 1, name: "Fluffy", age: 13, favorite_foods: ["pizza", "bamboo", "pasta"]>

And if we want to see what food Fluffy likes...

fluf.favorite_foods.include?("pizza") #=> true
fluf.favorite_foods.include?("gyros") #=> false

But what if we want to find all the pandas in my database that like pizza? Even assuming that my array is ordered by preference, and we just want to know who has pizza as their top choice, we would need something like this:

Panda.find_by(:favorite_foods.first => "pizza") # or
Panda.find_by(:favorite_foods[0] => "pizza") # or maybe even
Panda.find_by(:favorite_foods.include?("pizza"))

None of that's going to work though, and the reason is pretty simple. If we remember back to the beginning, the whole point of using serialize was to enable us to turn our arrays into strings. Now our database is very confused (and going to send all kinds of errors) because we're trying to call Array methods on something that we've explicitly asked to be stored as text. For pandas and pizza that may not matter very much, especially at a small scale (we can simply request all the pandas from our database – turning our favorite_foods back into an array and allowing us to iterate over it with no problems). But what if instead of pandas we had a dataset of thousands of social media users... that method quickly becomes impractical.

That takes us to the other potential drawback: performance. The reason the above situation isn't practical is because for every user we pull from the database, Ruby will create a new object. We can see that even when we retrieve the same database entry more than once.

Panda.find_by(name: "Fluffy")
#=> #<Panda:0x00007fbb8b0881c8 id: 1, name: "Fluffy", age: 13, favorite_foods: ["pizza", "bamboo", "pasta"]>
Panda.find_by(name: "Fluffy")
#=> #<Panda:0x00007fbb877f69e0 id: 1, name: "Fluffy", age: 13, favorite_foods: ["pizza", "bamboo", "pasta"]>

Also, because of the serialization process, if we update a user's name (or any other attribute), the serialized column will also be updated, even if we didn't change anything about it. That's a lot of extra work for our program.

What if I need to query the data?

If you do need to query based on the information in an array-like structure (and it's not practical to switch the database you're using), an easy option is to create another table and model for the data previously held in your array/hash. Let's update our simple example, starting with the migrations.

# /db/migrate/001_create_pandas.rb
...
create_table :pandas do |t|
  t.string :name
  t.integer :age
  # t.text :favorite_foods  # no longer necessary
end

# /db/migrate/002_create_favorites.rb
...
create_table :favorites do |t|
  t.string :food
  t.belongs_to :panda   
end

Now we can create a standard has_many/belongs_to relationship between the two models. I'm assuming a familiarity with ActiveRecord Associations here, but if this is new or not making sense, check out the Rails Guide for a good explanation.

# /app/models/pandas.rb
class Panda < ActiveRecord::Base 
  has_many :favorites
end

# /app/models/favorites.rb
class Favorites < ActiveRecord::Base 
  belongs_to :panda
end

And that about does it. We wanted to be able to search for pizza, and now we can.

pizza_luvers = Favorite.where("food = 'pizza'")
#=> [#<Favorite:0x00007fbb8aa5b1b0 id: 1, food: "pizza", panda_id: 1>, 
     #<Favorite:0x00007fbb8aa5afd0 id: 5, food: "pizza", panda_id: 3>]

  ## To get the pandas' names ##
pizza_luvers.collect{|p| p.panda.name}   
#=> ["Fluffy", "Mei Xiang"]

  ## To see a panda's favorite foods ##
Panda.find_by(name: "Fluffy").favorites   
#=> [#<Favorite:0x00007fbb8771e018 id: 1, food: "pizza", panda_id: 1>, 
     #<Favorite:0x00007fbb8771ded8 id: 3, food: "bamboo", panda_id: 1>,
     #<Favorite:0x00007fbb875f7c98 id: 4, food: "pasta", panda_id: 1>]

Obviously our array of favorite panda foods has a limited utility, but even with an overly-simplistic example, I like the added functionality of an additional table over serialized data. For my real-life use case, this was a much better solution, because I needed to be able to query and sort by hash data. I was actually surprised to see that serialize was recommended so much, so if I'm missing something here, definitely let me know.

Hopefully this is helpful for anyone who finds themself in the same position I did working on my project!

Deliver your unique apps, your own way.

Heroku tackles the toil — patching and upgrading, 24/7 ops and security, build systems, failovers, and more. Stay focused on building great data-driven applications.

Learn More

Top comments (1)

Kevin Gathuku • Jun 15 '23

Great explanation. Thanks!