The lightspeed returns. ⚡
Today I'm going to cover pickling in Python. Very briefly, pickling is the act of serializing Python objects so they can be sent to another Python program, or saved to disk and read back later, for example by the same program after it is stopped and restarted.
There are six pickling formats (protocols) in Python, numbered 0 through 5, each newer than the last. This post will only cover the latest format, protocol 5, which was added in Python 3.8. In particular, Python 2 does not support this format.
There is also a simpler serialization module called marshal, but it should not be used for this: its format is not guaranteed to be portable across Python versions, since it mainly exists to support .pyc files.
Heads up: the pickle module is not secure if it's used by itself. Pickled data is basically a sequence of opcodes for pickle's own small virtual machine (not Python bytecode), so it is possible to craft a malicious pickle that crashes the interpreter or runs arbitrary code when it is loaded. Never unpickle data from a source you don't trust, and always sign your pickled data with the hmac module so you can detect tampering (loosely, think of the difference between HTTP and HTTPS, although signing only gives you integrity, not secrecy).
hmac and SSL in general will be saved for a future post; in this one I will only cover pickle.
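Until that post is out, here is a minimal sketch of what signing can look like, using only the standard library. The helper names sign_pickle and verify_and_load and the hard-coded key are made up for illustration; a real key would have to come from somewhere secret, not from your source code.

```python
import hashlib
import hmac
import pickle

# Hypothetical shared secret, for illustration only.
SECRET_KEY = b"replace-me-with-a-real-secret"
DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes


def sign_pickle(obj, key=SECRET_KEY):
    """Pickle obj and prepend an HMAC-SHA256 signature of the pickled bytes."""
    data = pickle.dumps(obj)
    signature = hmac.new(key, data, hashlib.sha256).digest()
    return signature + data


def verify_and_load(blob, key=SECRET_KEY):
    """Check the signature first; only unpickle if it matches."""
    signature, data = blob[:DIGEST_SIZE], blob[DIGEST_SIZE:]
    expected = hmac.new(key, data, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError("signature mismatch: refusing to unpickle")
    return pickle.loads(data)


blob = sign_pickle({"some": "dictionary"})
print(verify_and_load(blob))  # {'some': 'dictionary'}
```

The important part is the order: the signature is checked before pickle.loads() ever sees the data.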
Why not just use JSON?
Hey, JSON is a great data format to use... for data. It can't help you if you're trying to send over a set, a function or a class instance, because that's not what it's designed to do. The pickle module is designed to handle almost every single Python object in the language. So while JSON can serialize lists, dictionaries (also called maps and hash tables), strings, numbers, booleans and null, and the resulting file is human-readable, that's all it can do.
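To make the difference concrete, here is a small sketch with a dictionary that holds a set and a complex number (the data itself is just something I made up). JSON gives up on it, while pickle round-trips it.

```python
import json
import pickle

# A set and a complex number: neither has a JSON representation.
data = {"tags": {"a", "b"}, "point": complex(1, 2)}

try:
    json.dumps(data)
    json_ok = True
except TypeError:
    json_ok = False

# pickle serializes the same object and restores an equal copy.
restored = pickle.loads(pickle.dumps(data))

print(json_ok)           # False
print(restored == data)  # True
```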
Dump
To serialize an object we call pickle.dump(obj, filehandle, protocol=None). This function has some other arguments you don't need to know. It dumps the object into a file handle (the file must already be open, in binary write mode); think of it as a time capsule. The protocol=None argument means pickle uses its default protocol, pickle.DEFAULT_PROTOCOL, which is not necessarily the latest one. You can pass a protocol number in this argument instead, or pickle.HIGHEST_PROTOCOL to get the newest protocol your Python supports.
When using dump(), make sure you pass an opened file handle. Don't give it the file name or it won't work.
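Here is a short sketch of both cases. The temporary path and the file name capsule.pkl are placeholders I picked for the example; any writable path would do.

```python
import os
import pickle
import tempfile

obj = {"some": "dictionary"}

# A throwaway location for the example.
path = os.path.join(tempfile.mkdtemp(), "capsule.pkl")

# Right: open the file in binary write mode and pass the handle.
# The with block closes the file for us.
with open(path, "wb") as f:
    pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)

print(os.path.getsize(path) > 0)  # True

# Wrong: passing the file name instead of a handle raises TypeError.
try:
    pickle.dump(obj, path)
except TypeError as exc:
    print("passing a file name fails:", type(exc).__name__)
```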
Want the actual bytes of the serialized object instead of dumping it into a file? Use pickle.dumps(obj) instead. It returns the serialized object as a bytes object.
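A quick sketch of what you get back (the list is just sample data):

```python
import pickle

data = pickle.dumps([1, 2, 3])

print(isinstance(data, bytes))          # True
print(pickle.loads(data) == [1, 2, 3])  # True

# You can pick the protocol here too, exactly like with dump().
newest = pickle.dumps([1, 2, 3], protocol=pickle.HIGHEST_PROTOCOL)
print(isinstance(newest, bytes))        # True
```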
Load
To load an object from a file, we call pickle.load(filehandle), which returns the Python object that was serialized. Please note that nothing tells you in advance what type you will get back, so if you need to know, check it after loading with something like obj = pickle.load(filehandle); type(obj).
Similarly, you can load an object from the serialized bytes instead of a file by calling pickle.loads(bytes_object).
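Both functions side by side, as a small sketch. I use io.BytesIO here as a stand-in for a file opened with 'rb', so the example doesn't need to touch the disk.

```python
import io
import pickle

original = {"some": "dictionary", "numbers": [1, 2, 3]}
data = pickle.dumps(original)

# loads() reads from a bytes object...
from_bytes = pickle.loads(data)

# ...while load() reads from a binary file-like object.
from_file = pickle.load(io.BytesIO(data))

print(type(from_bytes))        # <class 'dict'>
print(from_bytes == original)  # True
print(from_bytes is original)  # False: you get a copy, not the same object
print(from_file == original)   # True
```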
Catching Errors
If for some reason your pickled data gets corrupted (or as I like to call it, spoiled), then load() and loads() will raise an UnpicklingError. This is an exception you can catch for unpickling failures. Be aware that other exceptions, such as EOFError, AttributeError or ImportError, can also be raised during unpickling. Note that pickled objects don't actually expire. They get messed up if, among other things, you only copy part of the serialized data. Off the top of my head, I recall that incomplete downloads of the pickled data can do that too.
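Here is a sketch that simulates an incomplete download by cutting the pickled bytes in half. The helper load_or_default is a name I made up. It also catches EOFError, because truncated data can surface as that instead of UnpicklingError depending on where the cut lands.

```python
import pickle


def load_or_default(data, default=None):
    """Hypothetical helper: unpickle data, or return default if it is spoiled."""
    try:
        return pickle.loads(data)
    except (pickle.UnpicklingError, EOFError):
        return default


good = pickle.dumps({"some": "dictionary"})
spoiled = good[: len(good) // 2]  # simulate an incomplete download

print(load_or_default(good))     # {'some': 'dictionary'}
print(load_or_default(spoiled))  # None
```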
Also, dump() and dumps() will raise a PicklingError if the object can't be pickled.
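A lambda is the classic example of something that can't be pickled. In this sketch, try_dumps is a hypothetical helper name. It catches AttributeError and TypeError as well, since some unpicklable objects raise those instead of PicklingError, depending on the object and the Python version.

```python
import pickle


def try_dumps(obj):
    """Hypothetical helper: return the pickled bytes, or None if obj can't be pickled."""
    try:
        return pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError) as exc:
        print("could not pickle:", type(exc).__name__)
        return None


print(try_dumps([1, 2, 3]) is not None)  # True
print(try_dumps(lambda x: x + 1))        # None (after the "could not pickle" message)
```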
Copied from the Python documentation (Don't worry if you don't understand some of these types, as long as the type you use can be pickled):
What can be pickled and unpickled?
The following types can be pickled:
None, True, and False
integers, floating point numbers, complex numbers
strings, bytes, bytearrays
tuples, lists, sets, and dictionaries containing only picklable objects
functions defined at the top level of a module (using def, not lambda)
built-in functions defined at the top level of a module
classes that are defined at the top level of a module
instances of such classes whose __dict__ or the result of calling __getstate__() is picklable (see section Pickling Class Instances for details).
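A few items from that list in action. This sketch sticks to standard library objects (math.sqrt, OrderedDict and Fraction are just convenient picks of mine): functions and classes are pickled by name, so you get the very same object back, while instances come back as equal copies.

```python
import math
import pickle
from collections import OrderedDict
from fractions import Fraction

# A built-in function and a class are stored as a reference to their
# module and name, so unpickling returns the identical object.
print(pickle.loads(pickle.dumps(math.sqrt)) is math.sqrt)      # True
print(pickle.loads(pickle.dumps(OrderedDict)) is OrderedDict)  # True

# An instance of a class defined at the top level of a module
# comes back as a new, equal object.
third = Fraction(1, 3)
copy = pickle.loads(pickle.dumps(third))
print(copy == third)  # True
print(copy is third)  # False
```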
Do note that when a PicklingError is raised, part of the object may already have been written to the file! This can happen with dump() (but not dumps(), since no files are involved). So I advise you to always pickle your object with dumps(), take the resulting serialized data, open the file with open(file, 'wb') and write the serialized data into the file object you get back.
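That advice can be sketched as a small helper. The name save_pickle and the temporary path are my own placeholders. Because dumps() runs before the file is opened, a failed pickle never touches what is already on disk.

```python
import os
import pickle
import tempfile


def save_pickle(obj, path):
    """Hypothetical helper: pickle fully in memory, then write in one go."""
    data = pickle.dumps(obj)  # fails here, before the file is opened
    with open(path, "wb") as f:
        f.write(data)


path = os.path.join(tempfile.mkdtemp(), "safe.pkl")
save_pickle({"some": "dictionary"}, path)

try:
    save_pickle(lambda: None, path)  # a lambda can't be pickled
except (pickle.PicklingError, AttributeError):
    pass

# The earlier, valid contents are still intact.
with open(path, "rb") as f:
    print(pickle.load(f))  # {'some': 'dictionary'}
```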
Want to catch both of these exceptions at the same time? Catch pickle.PickleError instead. It is the common base class of PicklingError and UnpicklingError.
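A tiny sketch of the class relationship and of a single except clause covering both. The describe helper and the garbage bytes are made up for the example.

```python
import pickle

# Both exceptions derive from PickleError.
print(issubclass(pickle.PicklingError, pickle.PickleError))    # True
print(issubclass(pickle.UnpicklingError, pickle.PickleError))  # True


def describe(data):
    """Hypothetical helper: unpickle data or report which pickle error occurred."""
    try:
        return pickle.loads(data)
    except pickle.PickleError as exc:
        return "pickle problem: " + type(exc).__name__


print(describe(pickle.dumps("fresh")))  # fresh
print(describe(b"?garbage"))
```

Keep in mind that PickleError does not cover things like EOFError, so you may still want to catch those separately.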
Serializing multiple objects in a file
Python has ready-made classes for putting objects into a file one at a time and reading them back one at a time. pickle.Pickler(filehandle, protocol=None) creates a Pickler object whose dump() method writes one object per call, and pickle.Unpickler(filehandle) creates an Unpickler object whose load() method reads one object per call. Note that Unpickler does not take a protocol argument, because the protocol is detected from the data.
An example might help clear things up:
>>> import pickle
>>> writefile = open('somefile', 'wb')
>>> p = pickle.Pickler(writefile)
>>> p.dump([1, 2, 3])
>>> p.dump('string')
>>> p.dump(None)
>>> writefile.close()
>>> readfile = open('somefile', 'rb')
>>> u = pickle.Unpickler(readfile)
>>> u.load()
[1, 2, 3]
>>> u.load()
'string'
>>> u.load() # This loads None but the result is not shown.
>>> readfile.seek(0, 0) # Rewinds to beginning of file
0
>>> u.load()
[1, 2, 3]
>>> u.load()
'string'
>>> u.load() # This loads None again.
>>> u.load()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
EOFError: Ran out of input
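Since running out of objects shows up as an EOFError, you can wrap the reading side in a loop that stops on it. This is only a sketch: load_all is a hypothetical helper, and io.BytesIO stands in for a real file so the example stays self-contained.

```python
import io
import pickle


def load_all(filehandle):
    """Hypothetical helper: yield every pickled object until the file runs out."""
    unpickler = pickle.Unpickler(filehandle)
    while True:
        try:
            yield unpickler.load()
        except EOFError:
            return


buffer = io.BytesIO()  # stands in for a file opened with 'wb'
pickler = pickle.Pickler(buffer)
for item in ([1, 2, 3], "string", None):
    pickler.dump(item)

buffer.seek(0)  # rewind before reading
print(list(load_all(buffer)))  # [[1, 2, 3], 'string', None]
```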
These classes also have various advanced parameters which won't be discussed here. You probably don't need to know them either, as they mainly deal with Python 2 compatibility and out-of-band buffers.
TL;DR
Was this pickle post confusing? Here is how you get started with it at lightspeed:
>>> import pickle
>>> f = open('somefile', 'wb')
>>> pickle.dump({'some': 'dictionary'}, f)
>>> f.close()
>>> f = open('somefile', 'rb')
>>> pickle.load(f)
{'some': 'dictionary'}
If you're going to send the object across a network connection, try this instead:
>>> import pickle
>>> obj = pickle.dumps({'some': 'dictionary'})
>>> # Send obj somewhere
>>> # ...
>>> # Some other python instance which received obj
>>> import pickle
>>> pickle.loads(obj)
{'some': 'dictionary'}
And we're done
If you see any glaring mistakes in this post, be sure to notify me so I can fix them.