Serialization, JSON, and Distributed Systems

Adrienne Domingus. Originally published at Medium.

Photo by Jonathan Pielmayer on Unsplash. It’s pickles, because “Pickling” is a form of Python object serialization, even though we won’t talk about that today :)

I was recently working with a teammate who ran into an error when testing their work in the staging environment, one they hadn’t seen when developing locally. This was the error they were seeing:

TypeError: <Object> is not JSON serializable

But, “I’m not calling json.dumps() on anything!” they said. And sure enough, they weren’t. But they were enqueueing background tasks, which, when developing locally, run in the foreground without ever being enqueued. This was the first hint at the difference between the two environments, and at where the serialization was happening.

But first, let’s define our terms.

Serialization

Serialization is the process of turning data structures into another format that can be stored or transmitted over the network.

Examples of serialization formats include JSON, XML, and YAML, so the process of turning any data type into one of these formats is called “serialization.”
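To make that concrete, here is a minimal sketch of a round trip using Python's standard json module. The payload and its field names are made up for illustration.

```python
import json

# Hypothetical payload; the keys and values are illustrative only.
payload = {"user_id": 42, "tags": ["a", "b"]}

serialized = json.dumps(payload)        # Python dict -> JSON text (a str)
wire_bytes = serialized.encode("utf-8")  # text -> bytes, ready to store or send

# On the receiving side, the process runs in reverse (deserialization).
restored = json.loads(wire_bytes.decode("utf-8"))

print(type(serialized).__name__)
print(restored == payload)
```

The serialized form is just text, which is what makes it possible to write it to disk or hand it to another process.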

Spoiler alert: In this case, what we’re looking at is the attempt to turn an object returned from the Django ORM into JSON.
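The same kind of error can be reproduced without Django at all. In this sketch, Article is a hypothetical plain class standing in for an ORM object; the exact wording of the message varies between Python versions, so the code captures it rather than assuming it.

```python
import json


class Article:
    """Hypothetical stand-in for an object returned from the ORM."""

    def __init__(self, pk, title):
        self.pk = pk
        self.title = title


article = Article(pk=1, title="Pickles")

try:
    # json.dumps has no idea how to represent an arbitrary Python object.
    json.dumps({"article": article})
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```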

JSON

JSON is one of the most common ways of formatting data for sending over the network. It stands for JavaScript Object Notation, because it was derived from JavaScript, but it is actually language-agnostic, and most languages have built-in functions to move data into and out of JSON. It supports:

  • Numbers
  • Strings
  • Arrays or lists
  • Hashes/dictionaries/objects
  • Boolean values
  • Empty values, using null
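As a quick illustration of the list above, this sketch encodes one Python value of each kind with the standard json module. The key names are just labels I picked.

```python
import json

document = {
    "number": 3.5,
    "string": "hello",
    "array": [1, 2, 3],
    "object": {"nested": "value"},
    "boolean": True,   # becomes JSON true
    "empty": None,     # becomes JSON null
}

encoded = json.dumps(document)
print(encoded)
# {"number": 3.5, "string": "hello", "array": [1, 2, 3], "object": {"nested": "value"}, "boolean": true, "empty": null}
```

Anything outside these types, such as an arbitrary class instance, has no JSON representation unless you supply one yourself.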

Distributed Systems, or: What can’t be serialized

Ok, so what was happening in this case? My teammate had edited some existing code so that a background task was being passed an ORM object as a keyword argument. This worked locally because background tasks are never enqueued in the local dev environment, but once we moved to staging, they were.

More specifically: when a background task is enqueued, its positional and keyword arguments are sent over the network to a queue, so they can be picked up later by separate worker processes. To be sent over the network, these arguments must first be serialized. The library’s (Celery, in this case) attempt to do this was throwing the error in question: TypeError: <Object> is not JSON serializable.
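The difference between the two environments can be sketched without a real task queue. This is not Celery's implementation: enqueue, broker, send_email, and Article are all hypothetical names, and the eager flag stands in for whatever mechanism makes tasks run in the foreground locally.

```python
import json


class Article:
    """Hypothetical stand-in for an ORM object."""

    def __init__(self, pk):
        self.pk = pk


broker = []  # stands in for the message queue that workers read from


def send_email(article):
    return f"emailed about article {article.pk}"


def enqueue(func, *args, eager=False, **kwargs):
    if eager:
        # Local dev: run immediately in this process; nothing is serialized.
        return func(*args, **kwargs)
    # Staging: the call must become text before it can cross the network.
    message = json.dumps({"task": func.__name__, "args": args, "kwargs": kwargs})
    broker.append(message)
    return None


# Works locally, because the object never leaves the process.
local_result = enqueue(send_email, eager=True, article=Article(pk=1))

# Fails once the arguments actually have to be serialized.
try:
    enqueue(send_email, article=Article(pk=1))
    staging_error = None
except TypeError as exc:
    staging_error = str(exc)

print(local_result)
print(staging_error)
```

In Celery, one setting that produces this kind of foreground execution is task_always_eager, though I am only assuming that is how a given local setup is configured.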

So…what do we do about this? There are a couple of options:

  • Pass the id of the object as the calling argument instead of the object itself, and have the task re-retrieve it from the database. This does require an additional database query, but if the task needs to operate on the object itself, this isn’t avoidable.
  • Retrieve the data necessary from the object before enqueuing the task, and pass only that along instead.
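Both options can be sketched side by side. Everything here is hypothetical: DATABASE is a dict standing in for the ORM, serialize_call mimics what a task library does to the arguments, and the task names are invented.

```python
import json


class Article:
    """Hypothetical stand-in for an ORM object."""

    def __init__(self, pk, title):
        self.pk = pk
        self.title = title


DATABASE = {1: Article(pk=1, title="Pickles")}  # stands in for a database table


def serialize_call(task_name, **kwargs):
    # Mimics the serialization step that happens when a task is enqueued.
    return json.dumps({"task": task_name, "kwargs": kwargs})


def notify_by_id(article_id):
    # Option 1: the task looks the object up again from its id.
    # With Django this would be something like Article.objects.get(pk=article_id).
    article = DATABASE[article_id]
    return f"notified about {article.title}"


def notify_by_title(title):
    # Option 2: the task receives only the plain data it needs.
    return f"notified about {title}"


article = DATABASE[1]
message_1 = serialize_call("notify_by_id", article_id=article.pk)
message_2 = serialize_call("notify_by_title", title=article.title)

# Worker side: deserialize the arguments and run the task.
result_1 = notify_by_id(**json.loads(message_1)["kwargs"])
result_2 = notify_by_title(**json.loads(message_2)["kwargs"])

print(result_1)
print(result_2)
```

In both cases only an integer or a string crosses the network, so the serialization step succeeds. One difference worth weighing: the first option sees the object as it is when the task runs, while the second sees the data as it was when the task was enqueued.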
