[jit] add docs for serialization #23456
suo commented Jul 29, 2019
ping @ZolotukhinM, you mentioned you had some questions about this :)
Differential Revision: [D16552602](https://our.internmc.facebook.com/intern/diff/D16552602)
ZolotukhinM left a comment
Super-helpful doc, thank you!
torch/csrc/jit/docs/serialization.md Outdated
@@ -0,0 +1,237 @@
+ # Torchscript serialization
+ This document explains the Torchscript serialization format, and the anatomy of a call to `torch::jit::save()` or `torch::jit::load()`.
s/Torchscript/TorchScript/
torch/csrc/jit/docs/serialization.md Outdated
+ You'll notice that there are `.py` and `.pkl` files in this archive. That's because our serialization format tries to mimic Python's. All "code-like" information (methods, modules, classes, functions) are stored as human-readable `.py` containing valid Python syntax, and all "data-like" information (attributes, objects, etc.) are pickled using a subset of Python's pickle protocol.
+ A model is really a top-level module with some submodules, parameters, and so on depending on what the author needs. So, `data.pkl` contains the pickled top-level module. Deserializing the model is as simple as calling `unpickle()` on `data.pkl`, which will restore the module object with its associated code and data.
> Deserializing the model is as simple as calling `unpickle()` on `data.pkl`
What confuses me in this paragraph is that you were previously saying that pickle is only for data - how do we pick up the code as well from this? Some clarification might be helpful.
While this was meant to be high-level, understandably, I agree with @ZolotukhinM that it won't hurt to go a little bit deeper here. I also have a few questions.
- Do we pickle submodule filenames into `data.pkl`, which are then used to locate the corresponding `py` files in a zip archive and compile them?
- How do a module's parameters get assigned their tensors' data (their weights) back? Do we use `state_dict` or do we do it in some other way?
- Are UDTs treated any differently from `Module`s?

If we add a higher-level description of what `load`/`__getstate__` for `ScriptModule` do, it will be most helpful.
Thanks for the feedback! I will add a section on how models are loaded, hopefully that will clarify these questions
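
One way to picture how unpickling `data.pkl` can bring code along with it: the pickle stores only a qualified class name plus state, and the loader resolves that name against source files stored elsewhere in the archive. The sketch below is a toy in plain Python and is not the TorchScript implementation. `toy_model`, `Linear`, `save_archive`, `load_archive`, and `ArchiveUnpickler` are hypothetical names, and the `code/` path is an assumed layout.

```python
import io
import pickle
import sys
import types
import zipfile

# Hypothetical names and layout, chosen only for this illustration.
MODULE_NAME = "toy_model"
CODE_PATH = "code/toy_model.py"
DATA_PATH = "data.pkl"

TOY_SOURCE = """
class Linear:
    def __init__(self, weight, bias):
        self.weight = weight
        self.bias = bias

    def forward(self, x):
        return self.weight * x + self.bias
"""


class ArchiveUnpickler(pickle.Unpickler):
    """Resolve class names against code compiled from the archive."""

    def __init__(self, file, namespace):
        super().__init__(file)
        self._namespace = namespace

    def find_class(self, module, name):
        if module == MODULE_NAME:
            return self._namespace[name]
        raise pickle.UnpicklingError(f"refusing to load {module}.{name}")


def save_archive(weight, bias):
    """Write the source as a .py entry and the object state as a .pkl entry."""
    module = types.ModuleType(MODULE_NAME)
    exec(TOY_SOURCE, module.__dict__)
    # pickle looks classes up by module name, so register the module briefly.
    sys.modules[MODULE_NAME] = module
    try:
        payload = pickle.dumps(module.Linear(weight, bias))
    finally:
        del sys.modules[MODULE_NAME]
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(CODE_PATH, TOY_SOURCE)
        archive.writestr(DATA_PATH, payload)
    return buffer.getvalue()


def load_archive(blob):
    """Compile the archived code first, then unpickle the data against it."""
    with zipfile.ZipFile(io.BytesIO(blob)) as archive:
        namespace = {"__name__": MODULE_NAME}
        exec(archive.read(CODE_PATH).decode("utf-8"), namespace)
        data = io.BytesIO(archive.read(DATA_PATH))
        return ArchiveUnpickler(data, namespace).load()


blob = save_archive(2.0, 1.0)
model = load_archive(blob)
print(model.forward(3.0))  # prints 7.0
```

The pickle itself never contains the body of `forward`; it only names `toy_model.Linear`, and `find_class` is the hook where the name is matched to compiled code.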
Krovatkin left a comment
Added a few comments.
torch/csrc/jit/docs/serialization.md Outdated
+ 1. It is the owner (in a C++ sense) for all code objects.
+ 2. It forms a namespace in which code objects must have unique names.
+ A `CompilationUnit` is created whenever `torch::jit::load()` is invoked, to place the deserializednewly code objects in. In Python, there is a single global `CompilationUnit` that holds all code objects defined in Python.
> deserializednewly
missing a whitespace.
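
The two properties quoted from the doc (ownership of code objects, and a namespace with unique names) can be pictured with a small Python stand-in. This is only a sketch under assumptions: `ToyCompilationUnit`, `toy_load`, and the `toy.Foo` name are hypothetical, and the real `CompilationUnit` is a C++ class whose interface is not shown in this thread.

```python
class ToyCompilationUnit:
    """Hypothetical stand-in: owns code objects and keeps their names unique."""

    def __init__(self):
        self._code_objects = {}  # qualified name -> code object

    def define(self, qualified_name, code_object):
        # Namespace rule: a name may be defined only once per unit.
        if qualified_name in self._code_objects:
            raise ValueError(f"{qualified_name!r} is already defined in this unit")
        self._code_objects[qualified_name] = code_object
        return code_object

    def find(self, qualified_name):
        return self._code_objects.get(qualified_name)


def toy_load(definitions):
    """Mimic a load call: each call places its code objects in a new unit."""
    unit = ToyCompilationUnit()
    for qualified_name, code_object in definitions.items():
        unit.define(qualified_name, code_object)
    return unit


# Two loads can both define "toy.Foo" because each has its own unit.
first = toy_load({"toy.Foo": "class Foo, version A"})
second = toy_load({"toy.Foo": "class Foo, version B"})
print(first.find("toy.Foo"))
print(second.find("toy.Foo"))

# Redefining a name inside one unit is rejected.
try:
    first.define("toy.Foo", "class Foo, again")
except ValueError as err:
    print("rejected:", err)
```

Creating a fresh unit per load is what lets two archives carry identically named classes without clashing, while a single shared unit would have to reject the second definition.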
+ A `CompilationUnit` is created whenever `torch::jit::load()` is invoked, to place the deserializednewly code objects in. In Python, there is a single global `CompilationUnit` that holds all code objects defined in Python.
+ ### `CompilationUnit` ownership semantics
+ There are a few different entities that participate in the ownership model:
Maybe the topic of ownership deserves its own section, which can be read independently from the serialization topic?
+ At a high level, code serialization means:
+ 1. Transforming `ClassType`s and `Function`s (called "code objects") into Python source code.
`ClassType` includes `ScriptModule`s and UDTs?
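
Step 1 of the quoted list, turning a code object into Python source, can be pictured with a toy printer and a matching importer. This is a rough sketch and not how TorchScript prints code: `ToyFunction`, `print_source`, and `import_source` are hypothetical, and the "code object" here is reduced to a name, parameters, and one return expression.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ToyFunction:
    """Hypothetical code object: a name, parameters, and one return expression."""
    name: str
    params: Tuple[str, ...]
    body: str


def print_source(fn):
    """Serialize the code object as valid, human-readable Python source."""
    args = ", ".join(fn.params)
    return f"def {fn.name}({args}):\n    return {fn.body}\n"


def import_source(source, name):
    """Reverse direction: compile the source and fetch the callable by name."""
    namespace = {}
    exec(source, namespace)
    return namespace[name]


scale = ToyFunction(name="scale", params=("x", "k"), body="x * k")
source = print_source(scale)
print(source)

restored = import_source(source, "scale")
print(restored(3, 4))  # prints 12
```

Because the serialized form is ordinary Python text, it can be read and diffed by a person as well as compiled back by the loader.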
Stack from ghstack:
Differential Revision: D16552602