Jamie Gaskins

Posted on Feb 11, 2018

Fast Type-Checked Serializers for Ruby Web APIs

#ruby #json #api #web

When you need two different processes to be able to talk to each other, they need to be able to connect and speak the same language. If one of them is a web app, you're using HTTP (and everything it's built on top of), so that covers most of it. If you're sending requests and receiving responses serialized as JSON, you're even closer. But we still need to figure out that last mile — what JSON objects are we going to send and receive?

If our web server is running on Ruby, we just need to get our models converted into hashes and collections converted into arrays. Conversion to JSON is a simple to_json call at that point. How we convert them into those hashes and arrays is the choice to be made.
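
As a rough illustration of that last step, here is a minimal sketch using only Ruby's standard library. The Order struct and its fields are hypothetical stand-ins for a real model, not part of any gem discussed here.

```ruby
require 'json'

# Hypothetical stand-in for a model; a real app would use its own classes.
Order = Struct.new(:id, :customer_name, :total_price)

orders = [
  Order.new('a1', 'Ada', 1999),
  Order.new('b2', 'Grace', 4500),
]

# Models become hashes, and the collection becomes an array of them...
primitives = orders.map(&:to_h)

# ...and from there, JSON is a single to_json call away.
json = primitives.to_json
puts json
```

The interesting decisions all live in the middle step: which keys end up in each hash, and what types their values have.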

Conversion Options

There are plenty of gems available to convert arbitrary Ruby objects into our JSON-serializable format. The most well-known of these is active_model_serializers, but a couple weeks ago Netflix announced their own serialization gem called fast_jsonapi.

Both gems use a similar declarative style. With active_model_serializers, for example, a serializer looks like this:

class OrderSerializer < ActiveModel::Serializer
  attributes(
    :id,
    :customer_name,
    :customer_id,
    :line_item_ids,
    :delivery_address,
    :delivery_instructions,
    :total_price,
  )
end

The Downside of Those Options

Both of those gems are great. You can tell by looking at this serializer exactly what fields it'll be sending. The thing they're missing is validation of the types of the data they are emitting. Is delivery_address a string or is it broken out into a hash with each address part? Which of these fields can be nil? Is total_price sent as a string to be used presentationally, a floating-point number for calculation, or an integer number of cents to avoid floating-point error? These are impossible to tell by looking at the attribute declaration in the serializer.
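
To make the total_price ambiguity concrete, here is a small sketch of the three representations mentioned above, along with the kind of floating-point surprise that integer cents avoid. The specific values are made up for illustration.

```ruby
require 'json'

# Three plausible payloads for the same attribute name.
as_display_string = { total_price: '$19.99' }.to_json
as_float          = { total_price: 19.99 }.to_json
as_cents          = { total_price: 1999 }.to_json

puts as_display_string
puts as_float
puts as_cents

# Floating-point arithmetic is not exact for many decimal fractions,
# while integer arithmetic on cents is.
float_sum_is_exact = (0.1 + 0.2 == 0.3)
cents_sum_is_exact = (10 + 20 == 30)

puts float_sum_is_exact
puts cents_sum_is_exact
```

All three payloads are valid JSON with the same key, which is why a client cannot tell them apart from the attribute name alone.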

You might be thinking "why do I want type checking in a dynamic language?!" While I can see the benefits of static typing, I really do enjoy the freedom that dynamic typing gives us to build applications. However, when you have two different applications that must both agree on how they speak to each other, you need to be 100% sure about all of the details, just like getting contract details ironed out in a consulting agreement. Adding unit tests to verify that a serializer emits the right data types is a lot of work. What if the serializers did that for us and would even let us know if our data was incorrect in production?

Well, I wouldn't be writing this if I didn't already have a solution for that. Enter primalize, a serialization gem that comes with type checking out of the box. Primalize is so named because it converts more advanced objects into their primitive counterparts, but it does it in an intelligent way.

How does it work?

Let's start by declaring a serializer:

class OrderSerializer < Primalize::Single
  attributes(
    id: string, # UUIDs for primary keys are great
    customer_name: string,
    customer_id: string,
    line_item_ids: array(string),
    delivery_address: primalize(AddressSerializer),
    delivery_instructions: optional(string),
    total_price: integer,
  )
end

Notice that every attribute we declare has a type. We say that id is a string, that line_item_ids is an array of strings, and that total_price is an integer. These are pretty easy to understand and they tell us exactly what we're getting.

Notice the delivery_instructions attribute is marked as optional. This means it can be nil. If an attribute isn't marked as optional, it can't be nil.

Here's a list of the various types that Primalize supports for model serializers:

integer: whole numbers
float: floating-point numbers
number: any numeric value
string: text
boolean: explicitly true or false (not "truthy" or "falsy" values)
array(*types): an array containing values of the specified types
- Example: array(string, integer)
optional(*types): any of the specified types or nil
- Example: optional(string), both "foo" and nil are acceptable values
enum(*values): must be one of the specified values
- Example: enum('requested', 'shipped', 'delivered')
timestamp: a Date, Time, or DateTime value
any(*types): any value of the given types
- Example: any(string, integer) will only match on strings and integers
- If no types are specified, any value will match
primalize(YourPrimalizerClass): primalizes the specified attribute with the given Primalize::Single subclass
- Example: primalize(OrderSerializer)
object(**types): a hash of the specified structure
- Example: object(id: integer, name: string)
- Only the required keys need to be specified. The rest of the hash will pass.
- If no keys are specified, all of them are optional and it will match any hash.
- Ruby objects already define a method called hash that's used for resolving hash keys and determining Set inclusion, so we had to use the more language-agnostic name object. If we'd used hash, it would be effectively impossible to use a serializer class as a hash key or store it in a Set.
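
The naming concern in that last point can be reproduced in plain Ruby. This is only a sketch of the general problem, assuming the type helpers are class-level methods; BrokenSerializer and SafeSerializer are hypothetical classes, not Primalize code.

```ruby
# Hypothetical class that (unwisely) reuses the name `hash` for a type helper.
class BrokenSerializer
  def self.hash(**types)
    types # returns a Hash of types, not the Integer that Ruby expects
  end
end

# A class that picks a different name leaves Ruby's built-in `hash` intact.
class SafeSerializer
  def self.object(**types)
    types
  end
end

registry = {}
registry[SafeSerializer] = :ok # fine: Ruby uses the built-in hash method

broken_error = nil
begin
  # Ruby calls BrokenSerializer.hash to find a bucket for the key and
  # gets back something that is not an Integer.
  registry[BrokenSerializer] = :ok
rescue StandardError => e
  broken_error = e # on CRuby this is typically a TypeError
end

puts registry.key?(SafeSerializer)
puts broken_error.class
```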

These type declarations are composable, so we can set up some really complex type declarations if our API calls for it:

attributes(
  user: object(
    name: optional(string),
    email: string,
    nicknames: array(string),
    role: optional(enum('user', 'agent', 'manager', 'admin')),
  ),
)

If you find yourself writing a lot of nested objects, though, it might be worth extracting that to another serializer and using primalize(ThatSerializer).
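
One way to picture why these declarations compose so easily is to treat each type as a small callable that answers "does this value match?". The sketch below mirrors some of the DSL names from the list above, but the Types module is a simplified assumption for illustration and not how Primalize is implemented.

```ruby
# Each "type" is a lambda that returns true when a value matches.
module Types
  module_function

  def string
    ->(value) { value.is_a?(String) }
  end

  def integer
    ->(value) { value.is_a?(Integer) }
  end

  def array(*types)
    ->(value) { value.is_a?(Array) && value.all? { |item| types.any? { |t| t.call(item) } } }
  end

  def optional(*types)
    ->(value) { value.nil? || types.any? { |t| t.call(value) } }
  end

  def enum(*values)
    ->(value) { values.include?(value) }
  end

  # Only the listed keys are checked; extra keys in the hash pass.
  def object(**types)
    ->(value) { value.is_a?(Hash) && types.all? { |key, type| type.call(value[key]) } }
  end
end

user_type = Types.object(
  name: Types.optional(Types.string),
  email: Types.string,
  nicknames: Types.array(Types.string),
  role: Types.optional(Types.enum('user', 'agent', 'manager', 'admin')),
)

valid_user   = { name: nil, email: 'ada@example.com', nicknames: ['ada'], role: 'admin' }
invalid_user = valid_user.merge(role: 'superuser')

puts user_type.call(valid_user)
puts user_type.call(invalid_user)
```

Because every helper returns the same kind of callable, nesting them needs no special cases.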

What happens if the type check fails?

In Ruby, we can't enforce that our models don't have the wrong types of attributes because any variable can hold any value, but we can run a type check at the time of serialization.

Serializers have a default type-mismatch handler, which is a callable object (as in, responds to call) that receives serializer_class, attribute, type, value. By default, it raises an exception, which is great for a development environment. In production, though, it might be preferable to let it pass through while still sending alerts.

You can customize what you do when a type mismatch occurs on your individual serializer class by setting MySerializerClass.type_mismatch_handler to your preferred method of handling it. To set it for all model serializers, use Primalize::Single.type_mismatch_handler. For example:

MySerializerClass.type_mismatch_handler = proc do |serializer, attr, type, value|
  message = "#{serializer}##{attr} is specified as #{type.inspect}, but is #{value.inspect}"

  Slack.notify '#bugs', message
  BugTracker.notify message

  # the return value of the block is the value to be used for that attribute
  value
end
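
To show how a strict default and a lenient override can coexist, here is a self-contained sketch. MiniSerializer and its check method are hypothetical; they only imitate the handler contract described above (serializer, attribute, type, value) and are not Primalize's internals.

```ruby
# Simplified stand-in for a serializer with a pluggable mismatch handler.
class MiniSerializer
  class << self
    attr_accessor :type_mismatch_handler
  end

  # Default handler: fail loudly, which suits development and test.
  self.type_mismatch_handler = proc do |serializer, attr, type, value|
    raise ArgumentError, "#{serializer}##{attr} is specified as #{type}, but is #{value.inspect}"
  end

  def self.check(attr, type, value)
    return value if value.is_a?(type)

    # Whatever the handler returns is used as the attribute's value.
    type_mismatch_handler.call(self, attr, type, value)
  end
end

# With the default handler, a mismatch raises.
strict_error = begin
  MiniSerializer.check(:total_price, Integer, '19.99')
  nil
rescue ArgumentError => e
  e.message
end

# A lenient handler records an alert and lets the value pass through.
alerts = []
MiniSerializer.type_mismatch_handler = proc do |serializer, attr, type, value|
  alerts << "#{serializer}##{attr} expected #{type}, got #{value.inspect}"
  value
end
lenient_value = MiniSerializer.check(:total_price, Integer, '19.99')

puts strict_error
puts lenient_value
puts alerts.length
```

In a real application, the alerts array would be replaced by whatever notification service you already use.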

Attribute conversion

If an attribute isn't already the type you're expecting, you can provide a block to its type declaration to specify how to coerce it to that type. For example, if an Address#city returns a City object instead of just the name, we could serialize it like this:

class AddressSerializer < Primalize::Single
  attributes(
    city: string { |city| city.name },
    # ...
  )
end

This indicates that we would call name on the city that's in the address's city attribute. The type check will still occur here. If the result of that block isn't a string, we'll trigger a type mismatch.
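
The order of operations (coerce first, then type-check the result) can be sketched in a few lines of plain Ruby. The coerce_and_check helper and the City struct are hypothetical and exist only for this illustration.

```ruby
City = Struct.new(:name)

# Apply the optional coercion block, then type-check whatever it returned.
def coerce_and_check(value, type, &coercion)
  result = coercion ? coercion.call(value) : value
  raise TypeError, "expected #{type}, got #{result.inspect}" unless result.is_a?(type)

  result
end

city = City.new('Baltimore')

# The block turns the City into its name, which passes the String check.
city_name = coerce_and_check(city, String) { |c| c.name }

# A block that does not produce a String still triggers a mismatch.
mismatch = begin
  coerce_and_check(city, String) { |c| c }
  nil
rescue TypeError => e
  e.message
end

puts city_name
puts mismatch
```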

Virtual attributes

Sometimes you want to provide attributes that don't actually exist on the model being serialized. Just like how our server-side domain models don't need to match the database schema, a client doesn't need to know that the server-side model doesn't have a particular attribute. For those "virtual" attributes, you can define a method that will compute the attribute from what the model does have:

class AddressSerializer < Primalize::Single
  attributes(
    latitude: float,
    longitude: float,
    # ...
  )

  def latitude
    object.coordinates.latitude
  end

  def longitude
    object.coordinates.longitude
  end
end
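
One plausible way to implement this lookup is to prefer a method defined on the serializer and fall back to the model otherwise. The sketch below makes that assumption explicit; MiniAddressSerializer, Address, and Coordinates are hypothetical, and Primalize's actual dispatch may differ.

```ruby
Coordinates = Struct.new(:latitude, :longitude)
Address = Struct.new(:street, :coordinates)

class MiniAddressSerializer
  ATTRIBUTES = %i[street latitude longitude].freeze

  attr_reader :object

  def initialize(object)
    @object = object
  end

  # Virtual attributes: the model has no latitude or longitude of its own.
  def latitude
    object.coordinates.latitude
  end

  def longitude
    object.coordinates.longitude
  end

  def call
    ATTRIBUTES.map { |attr|
      # Prefer a method defined on the serializer; otherwise ask the model.
      source = respond_to?(attr) ? self : object
      [attr, source.public_send(attr)]
    }.to_h
  end
end

address = Address.new('1 Main St', Coordinates.new(39.29, -76.61))
serialized_address = MiniAddressSerializer.new(address).call
puts serialized_address.inspect
```

The client sees three flat attributes and never learns that two of them came from a nested coordinates object.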

Composite Serializers

That covers individual model serializers. There's also first-class support for returning associated objects side by side in a single response, as an alternative to nesting them with primalize(AnotherSerializer):

class OrderResponse < Primalize::Many
  attributes(
    order: OrderSerializer,
    line_items: enumerable(LineItemSerializer),
    address: AddressSerializer,
    customer: CustomerSerializer,

    # Only required for corporate accounts
    purchase_order: optional(PurchaseOrderSerializer),
  )
end

I typically refer to these as "response serializers" and, while it's possible to serialize just the domain model in an API response, I almost always wrap them inside a serializer like this in case I need to return associated models in the future. If you have any API consumers you don't control, once you start returning the model as the top-level object, you're stuck with it most of the time.

To use this serializer, you instantiate it with the keys you gave it:

OrderResponse.new(
  order: order,
  line_items: order.line_items,
  address: order.delivery_address,
  customer: order.customer,
  purchase_order: order.purchase_order,
)

Primalize::Many doesn't traverse the object graph for you, and this might feel inconvenient, but it's intended functionality. It ensures that you can customize what you're sending. For example, if your order is an ActiveRecord model, you may not want to send all of its line_items together. You might split them between taxable_line_items and non_taxable_line_items for some business reason. If the serializer traversed the association for you, you might not be able to specify that without adding a method on the Order model. With Primalize, though, you can set up your serializer like this:

class OrderResponse < Primalize::Many
  attributes(
    # ...
    taxable_line_items: enumerable(LineItemSerializer),
    non_taxable_line_items: enumerable(LineItemSerializer),
  )
end

OrderResponse.new(
  # ...
  taxable_line_items: order.line_items.taxable,
  non_taxable_line_items: order.line_items.non_taxable,
)

The type checking comes into play with response serializers, as well. For example, if we passed a single line item in where we specified enumerable(LineItemSerializer), we would get an error. If we passed nil for a field without specifying it as optional, we would also get an error.
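
Those two failure cases can be sketched with a simplified stand-in. MiniResponse, its schema format, and the lambda serializers below are assumptions made for illustration; they are not Primalize::Many and say nothing about its internals.

```ruby
# Simplified response serializer: schema maps key => [kind, serializer],
# where kind is :single, :enumerable, or :optional.
class MiniResponse
  def initialize(schema, **values)
    @schema = schema
    @values = values
  end

  def call
    @schema.each_with_object({}) do |(key, (kind, serializer)), result|
      result[key] = serialize(key, kind, serializer, @values[key])
    end
  end

  private

  def serialize(key, kind, serializer, value)
    case kind
    when :optional
      value.nil? ? nil : serializer.call(value)
    when :enumerable
      raise ArgumentError, "#{key} must be enumerable" unless value.is_a?(Enumerable)

      value.map { |item| serializer.call(item) }
    else
      raise ArgumentError, "#{key} cannot be nil" if value.nil?

      serializer.call(value)
    end
  end
end

# Hypothetical model (a plain class, since Struct itself is Enumerable).
class LineItem
  attr_reader :sku

  def initialize(sku)
    @sku = sku
  end
end

schema = {
  line_items: [:enumerable, ->(item) { { sku: item.sku } }],
  purchase_order: [:optional, ->(po) { { number: po } }],
}

ok_response = MiniResponse.new(schema, line_items: [LineItem.new('A1')], purchase_order: nil).call

single_item_error = begin
  MiniResponse.new(schema, line_items: LineItem.new('A1'), purchase_order: nil).call
  nil
rescue ArgumentError => e
  e.message
end

puts ok_response.inspect
puts single_item_error
```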

All this helps make more robust API endpoints.

JSONAPI

Some client applications might be written in a way that is more suited to the JSONAPI response structure. For such applications, there is a JSONAPI wrapper for Primalize.

Performance

I mentioned in the title that these serializers are fast, but I haven't touched on that part yet. When I benchmarked Primalize::JSONAPI against AMS and Netflix's fast_jsonapi gem, it was over 1200% as fast as AMS and about half as fast as the Netflix gem. When you consider that the selling point of fast_jsonapi is that … well … it's fast, and the difference between it and Primalize to serialize over 1000 models is 12ms (less than 12µs each), it's still close enough. On typical payloads (dozens of models, maybe 100), you'll have garbage-collection pauses longer than the difference between them.
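
The arithmetic behind those figures is short enough to check directly. The sketch below uses the 12ms and 1000-model numbers quoted above as round figures; the 100-model estimate additionally assumes the difference scales linearly with the number of models, which the benchmark does not establish.

```ruby
difference_ms = 12.0 # quoted difference between the two gems
models = 1000        # the benchmark serialized "over 1000" models

# Per-model difference in microseconds (an upper bound, since the
# benchmark used more than 1000 models).
per_model_us = difference_ms * 1000 / models

# Assumption: linear scaling down to a payload of 100 models.
typical_models = 100
typical_difference_ms = per_model_us * typical_models / 1000

puts per_model_us
puts typical_difference_ms
```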

Also, that benchmark was using the JSONAPI wrapper, which is doing considerably more work (including its own naive traversal of associated models). If you don't require that particular format, you can stick with Primalize::Many to cut off about 25% of that time.

I don't know about you, but I'm certainly willing to trade 12ms (much less on typical payloads) for peace of mind that my response payloads match what the client expects.

Future Development

Primalize is a gem based on a pattern I implemented at a previous employer. It has been used in production for around two years and has stabilized the communication between their API and its consumers significantly while also improving its performance. With that said, there are still ways to improve.

One idea that a coworker at that employer requested was the ability to generate RAML or Swagger documentation. For example, if an API endpoint returns an OrderResponse, you should be able to generate the exact structure a client could expect for that endpoint. RAML docs can be imported into Postman for easier testing and consumption of REST APIs.

Performance could probably be improved even more, both in the baseline Primalize classes and the JSONAPI wrapper.

If you'd like to contribute, I'm always open to conversation (gitter / twitter / comments on this post), suggestions, and pull requests. Thanks, everyone! ❤️

Top comments (5)

John Carroll • Feb 13 '18 • Edited

Interesting. Have you given any thought to support validating a json string or a hash using a similar API? (i.e. the opposite of this) I'm guessing (admittedly, without giving it much thought) that the logic would be similar. I ask because, personally, I use GraphQL-ruby which handles serialization. GraphQL's ability to validate params is limited, however.

Jamie Gaskins • Feb 13 '18

I have, indeed. A lot. :-) There's an issue on the Primalize repo that discusses this. TL;DR: It's hard without knowing how your objects are constructed and insight into how things were deserialized. It's a feature I'd love to add if someone has thoughts on how to do it, though.

John Carroll • Feb 13 '18

Currently I make use of the dry-validation & dry-types gems. I really like dry-validation's API, but dry-validation is pretty buggy (in their defense, it is in beta). Your API seems similar, which I like (albeit yours seems a tad less flexible). I'd happily give up some flexibility for stability, however. I'll check out the issue in the Repo.

John Carroll • Feb 13 '18

Hmm, having just read the issue in the repo, it looks like you might be considering something a bit different from what I have in mind. Seems like you're interested in validating AND immediately instantiating objects?

Personally, I've been looking for a great solution to just validate a params (or other) hash, leaving me to confidently do whatever I want with the hash afterwards. But I'll keep an eye on Primalize. I appreciate the simplicity of the gem.

Jamie Gaskins • Feb 14 '18

It does sound like you and I have different goals. Primalize is for converting domain objects into primitive data structures for the purpose of serialization. It doesn’t have to be serialized, you can do whatever you like with it, but that’s the use case it’s optimized for.

The purpose of the type checking is only for ensuring that the structure I send matches what the client expects. The values of those types are beyond its scope because you’re probably already allowing your domain model to store that value. You can use Primalize’s attribute coercion or virtual attributes to massage those values, but I'm not sure it should be up to the serializer to validate the values.

Then again, maybe it’s not beyond the scope. You could argue that it’s a serializer’s responsibility to ensure it’s not emitting an unexpected value. For example, if part of the contract of the REST API is that a particular attribute returned would be a percentage represented as a float between 0-1, it might be worth arguing and I would totally entertain a discussion about it in the repo’s issues. :-) If it helps, it’d be really simple to implement.

Also, Dry::Types and Dry::Validation are more flexible, but they’re also quite a bit more verbose. I can give Primalize attributes declaration directly to a consumer of my REST API and it would be immediately obvious to them what to expect, even if they don’t know Ruby, because the attributes declaration is even in the same shape as the data they would receive. They might have to spend a little more time deciphering the Dry::Types DSL.
