Mathias Polligkeit

Posted on Jun 5, 2020

Adventures in Ecto and PostgreSQL: Partial unique indexes and upserts with associations

#elixir #ecto #postgres

Let’s say you were asked by your government to implement a national dog registry in order to help them to enforce their new one-dog-policy. Everyone who wants to buy a dog will need to register it at the municipal government.

Since your government cares a lot about the well-being of the nation’s dogs and generally invests a lot of time and resources into research endeavours, they want to collect some additional data on all the dogs and want all dog owners to check in once a year to keep the information up to date. Because a year is a long time between updates to conduct proper research and also to keep the feedback loop short regarding new dog-happiness policies, the government also plans to equip dogs and dog-related products with sensors in the future.

Setup

Versions: Elixir 1.10.0, ecto 3.3.3, ecto_sql 3.3.4, PostgreSQL 11.5

Let’s start off with this Ecto migration:

defmodule DogRegistry.Repo.Migrations.CreateTables do
  use Ecto.Migration

  def change do
    create table(:owners, primary_key: false) do
      add :id, :binary_id, primary_key: true
      add :name, :string, null: false
      add :city, :string, null: false
      timestamps()
    end

    create table(:dogs, primary_key: false) do
      add :id, :binary_id, primary_key: true
      add :name, :string, null: false
      add :status, :string, null: false
      add :owner_id, references(:owners, on_delete: :nilify_all, type: :uuid)
      timestamps()
    end

    create table(:dog_toys, primary_key: false) do
      add :id, :binary_id, primary_key: true
      add :product, :string
      add :rating, :integer
      add :dog_id, references(:dogs, on_delete: :nilify_all, type: :uuid)
      timestamps()
    end

    create index(:dog_toys, :dog_id)
  end
end

With this we create an owners table, a dogs table and a dog_toys table. A dog belongs to an owner and a dog toy belongs to a dog. The government will need a lot more information on the dogs, of course, including various psychological measures and health statistics, but let’s keep it simple for now.

We also create Ecto schemas for each of these tables, but since they are quite unremarkable, I’ll only give you the Dog schema:

defmodule DogRegistry.Dog do
  use Ecto.Schema

  import Ecto.Changeset

  alias DogRegistry.DogToy
  alias DogRegistry.Owner
  alias DogRegistry.Repo

  @primary_key {:id, :binary_id, autogenerate: true}
  @foreign_key_type :binary_id
  schema "dogs" do
    field :name, :string
    field :status, :string
    belongs_to :owner, Owner
    has_many :toys, DogToy, on_replace: :delete
    timestamps()
  end

  def changeset(dog, attrs) do
    dog
    |> cast(attrs, [:name, :owner_id, :status])
    |> cast_assoc(:toys)
    |> validate_required([:name, :owner_id, :status])
    |> validate_inclusion(:status, ["alive", "dead"])
    |> foreign_key_constraint(:owner_id)
  end
end

Partial unique indexes

You decided to handle both dog registrations and updates to existing dogs with a single function. Don’t ask me why, you didn’t tell me.

This means we need to implement this as an upsert. The first thing we need is a unique index on the dogs table to prevent the clerk from inserting a second dog with the same owner. However, one-dog-policy does not mean that you may only have one single dog throughout your entire lifetime. Once your dog died, you can get a new one. This means the unique index may only apply if the dog is alive.

PostgreSQL allows you to define partial indexes to do just that.
In SQL that index would be defined like this:

CREATE UNIQUE INDEX dogs_owner_id_alive_ix ON dogs (owner_id, status)
WHERE status = 'alive';

Ecto.Migration.unique_index/3 allows you to add a where option, so the statement above looks like this in your application:

defmodule DogRegistry.Repo.Migrations.DogsOwnerIdAliveIx do
  use Ecto.Migration
  def change do
    create unique_index(:dogs, [:owner_id, :status],
             name: "dogs_owner_id_alive_ix",
             where: "status = 'alive'"
           )
  end
end

The first tests

We create a test module that uses the DataCase module Phoenix creates and sets up some fixtures. The complete code example is linked below. The first describe block only has a simple test to make sure you can insert a new dog into the database:

describe "create_or_update/1" do
  test "creates a dog" do
    owner = insert_owner!(@owner)
    assert {:ok, %Dog{}} = Dog.create_or_update(params(@living_dog_1, owner))
    assert [%Dog{}] = Repo.all(Dog)
  end

  test "allows to add multiple dead dogs" do
    owner = insert_owner!(@owner)
    assert {:ok, %Dog{}} = Dog.create_or_update(params(@dead_dog_1, owner))
    assert {:ok, %Dog{}} = Dog.create_or_update(params(@dead_dog_2, owner))
    assert {:ok, %Dog{}} = Dog.create_or_update(params(@living_dog_1, owner))
    assert {:ok, %Dog{}} = Dog.create_or_update(params(@dead_dog_3, owner))
    assert [_, _, _, _] = Repo.all(Dog)
  end
end

The params/2 function does nothing but add the ID of the second argument to the map that is passed as the first argument. I know that you know that there is a pipe operator.

This is the function that passes the tests:

def create_or_update(attrs) do
  %__MODULE__{}
  |> changeset(attrs)
  |> Repo.insert()
end

The upsert test

Now on to the more interesting test case.

test "replaces previous dog if alive" do
  owner = insert_owner!(@owner)
  assert {:ok, %Dog{}} = Dog.create_or_update(params(@living_dog_1, owner))
  assert {:ok, %Dog{}} = Dog.create_or_update(params(@living_dog_2, owner))
  assert [%Dog{name: name}] = Repo.all(Dog)
  assert name == @living_dog_2.name
end

This looks similar as the previous test, but now you ensure that there is still only one dog in the database after calling the function a second time. This means if there is already a living dog for the owner in the database, the entry will be replaced silently. You slowly come to the conclusion that this behaviour is probably not what you would want in this scenario, but since you are already invested in this example, you decide to continue.

Unsurprisingly, this test fails with the current implementation. To perform an upsert, you can set the :on_conflict option when calling Repo.insert/2. When using PostgreSQL, you also need to set the :conflict_target option to tell PostgreSQL which constraints you expect. This can be a list of columns or the constraint name itself. Since we named the unique index, we can try this:

def create_or_update(attrs) do
  %__MODULE__{}
  |> changeset(attrs)
  |> Repo.insert(
    on_conflict: :replace_all,
    conflict_target: {:constraint, :dogs_owner_id_alive_ix}
  )
end

However, now we're getting this error in all the tests:

** (Postgrex.Error) ERROR 42704 (undefined_object) constraint "dogs_owner_id_alive_ix" for table "dogs" does not exist

What’s going on?

Ecto turns the options above into this SQL fragment:

ON CONFLICT ON CONSTRAINT "dogs_owner_id_alive_ix"`

This is syntactically correct, but PostgreSQL requires you to specify the constraint including the whole WHERE clause. So we need to find a way to tell Ecto to do that. Unfortunately, there is no shortcut for this at the moment, but instead of passing the constraint name, Ecto also allows us to pass an unsafe SQL fragment.

def create_or_update(attrs) do
  %__MODULE__{}
  |> changeset(attrs)
  |> Repo.insert(
    on_conflict: :replace_all,
    conflict_target:
      {:unsafe_fragment, "(owner_id, status) WHERE status = 'alive'"}
  )
end

This works!

Associations

Your plan was to update all the associations of the dog (except for the owner) with that same function call. So far, the test fixtures look like this:

@dead_dog_1 %{name: "Clarence", status: "dead"}
@dead_dog_2 %{name: "Penelope", status: "dead"}
@dead_dog_3 %{name: "Christopher", status: "dead"}
@living_dog_1 %{name: "Dwayne", status: "alive"}
@living_dog_2 %{name: "Claire", status: "alive"}

Very simple, no associations. Let’s give Dwayne some toys and see what happens.

@living_dog_1 %{
    name: "Dwayne",
    status: "alive",
    toys: [
      %{product: "round chewie thing", rating: 5},
      %{product: "long catchie thing", rating: 4}
    ]
  }

Running the tests again, we get a foreign key constraint error for the test create_or_update/1 replaces previous dog if alive:

1) test create_or_update/1 replaces previous dog if alive (DogRegistry.DogTest)
     test/dog_registry/dog_test.exs:50
     ** (Ecto.ConstraintError) constraint error when attempting to insert struct:
* dog_toys_dog_id_fkey (foreign_key_constraint)
If you would like to stop this constraint violation from raising an
     exception and instead add it as an error to your changeset, please
     call `foreign_key_constraint/3` on your changeset with the constraint
     `:name` as an option.
The changeset defined the following constraints:
* dogs_owner_id_fkey (foreign_key_constraint)

The other tests pass, though. So inserting a dog including toys is no problem, but updating an existing dog with toys is.
Let’s remove the toys from the fixtures for a moment and update the test like this:

test "replaces previous dog if alive" do
  owner = insert_owner!(@owner)
  assert {:ok, %Dog{id: id_1}} =
               Dog.create_or_update(params(@living_dog_1, owner))
  assert {:ok, %Dog{id: id_2}} =
               Dog.create_or_update(params(@living_dog_2, owner))
  IO.inspect(id_1, label: "ID1")
  IO.inspect(id_2, label: "ID2")
  assert [_] = Repo.all(Dog)
end

You will get an output similar to this:

ID1: "6a274d9b-f0c1-4f06-b8db-1ef3c7103da4"
ID2: "7e9f7b9c-7a64-4a96-a0a3-2aaa08360b33"

As it happens, passing on_conflict: :replace_all to Repo.insert/2 will not only replace the values in the changeset, but also generate new values for all autogenerated fields including timestamps and the ID. If we want to keep some of the existing values, we can pass {:replace_all_except, fields} or {:replace, fields} as the option value. Let's change the on_conflict option of create_or_update/1 to:

on_conflict: {:replace_all_except, [:id, :inserted_at]}

We want to keep the original ID, but we also want to keep the inserted_at value, while we are happy to get an updated updated_at value.

If we run the test again, we will still see two different IDs. Why is that?

We configured our application with UUIDs as primary keys. Unlike auto-incrementing integer IDs, UUIDs are auto-generated by Ecto, not in the database (unless you set autogenerate to false in your schema and configure your database to generate them). And since Ecto already knows all values it generated or set itself, it doesn’t ask them from the database when running the query. So in our case, Ecto generates a UUID, sends an insert statement to the database, the database sees a conflict and makes a replacement instead, but keeps the old ID, and this existing ID is not returned back to Ecto.

The fix is easy, though. Repo.insert/2 accepts another option called :returning, which specifies which fields should be returned from the database. If you set returning to true, all fields will be returned.

So the final version of create_or_update/1 looks like this:

def create_or_update(attrs) do
  %__MODULE__{}
  |> changeset(attrs)
  |> Repo.insert(
    returning: [:id],
    on_conflict: {:replace_all_except, [:id, :inserted_at]},
    conflict_target:
      {:unsafe_fragment, "(owner_id, status) WHERE status = 'alive'"}
  )
end

Now the IDs in the test output will match and the tests will also pass with fixtures including associations.

However, our tests still don't have any assertions about the associations. Let's change that by adding this test:

test "replaces all associations" do
  owner = insert_owner!(@owner)
  assert {:ok, %Dog{}} = Dog.create_or_update(params(@living_dog_1, owner))
  assert {:ok, %Dog{}} = Dog.create_or_update(params(@living_dog_2, owner))
  assert [%DogToy{product: "small squeaky thing"}] = Repo.all(DogToy)
end

Similar to the previous test, except that we actually have a look at the dog toys after the upsert. @living_dog_1 had two toys, @living_dog_2 had only one, so we would expect to be only the single toy of dog 2 in the database after the update. But this test fails, the database includes both toys of dog 1 and the toy of dog 2.

The problem is that PostgreSQL can handle the replacement of the dog entry itself, but it doesn't know anything about the associations, and since we only passed an empty struct as the first argument to Ecto.Changeset.changeset/2 in our create_or_update/2 function, Ecto has no way of tracking the associations either.

Now, we could change that by 1) changing the function signature of create_or_update/2 and 2) passing a fully preloaded struct to the function, but that would defeat the purpose of the upsert function, because we could then as easily call a regular update function with that preloaded resource. Which means that if you want to do a complete replacement of the existing associations, you shouldn't use an upsert function after all.

Complete source code: github.com/woylie/article-dog-registry

Don't look directly into the sun.

originally published on Medium