[jit] add docs for serialization #23456

Closed

suo wants to merge 6 commits into gh/suo/106/base from gh/suo/106/head

@suo (Member) commented Jul 26, 2019 (edited)

Stack from ghstack:

Differential Revision: D16552602

@pytorchbot added the `oncall: jit` label Jul 26, 2019
@pytorchbot added the `module: docs` label Jul 26, 2019
suo added a commit that referenced this pull request Jul 26, 2019
ghstack-source-id: 7743b42
Pull Request resolved: #23456
@suo (Member, Author) commented Jul 29, 2019

ping @ZolotukhinM, you mentioned you had some questions about this :)

@ZolotukhinM left a comment

Super-helpful doc, thank you!

@@ -0,0 +1,237 @@
# Torchscript serialization

This document explains the Torchscript serialization format, and the anatomy of a call to `torch::jit::save()` or `torch::jit::load()`.

s/Torchscript/TorchScript/


You'll notice that there are `.py` and `.pkl` files in this archive. That's because our serialization format tries to mimic Python's. All "code-like" information (methods, modules, classes, functions) is stored as human-readable `.py` files containing valid Python syntax, and all "data-like" information (attributes, objects, etc.) is pickled using a subset of Python's pickle protocol.

A model is really a top-level module with some submodules, parameters, and so on depending on what the author needs. So, `data.pkl` contains the pickled top-level module. Deserializing the model is as simple as calling `unpickle()` on `data.pkl`, which will restore the module object with its associated code and data.

> Deserializing the model is as simple as calling `unpickle()` on `data.pkl`

What confuses me in this paragraph is that you were previously saying that pickle is only for data - how do we pick up the code as well from this? Some clarification might be helpful.

@Krovatkin (Contributor) replied:

While this was meant to be high-level, understandably, I agree with @ZolotukhinM that it won't hurt to go a little deeper here. I also have a few questions:

  • Do we pickle submodule filenames into `data.pkl`, which are then used to locate the corresponding `.py` files in the zip archive and compile them?
  • How do a module's parameters get their tensors' data (their weights) assigned back? Do we use `state_dict`, or do we do it some other way?
  • Are UDTs treated any differently from `Module`s?

A higher-level description of what `load`/`__getstate__` do for a `ScriptModule` would be most helpful.

@suo (Member, Author) replied:

Thanks for the feedback! I will add a section on how models are loaded; hopefully that will clarify these questions.
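How unpickling `data.pkl` can also bring back code is easiest to see in a toy model. The sketch below is plain Python and is not TorchScript's implementation; it assumes the pickle refers to each class only by a qualified name, and that the loader resolves that name by compiling the matching `.py` file from the archive. The module name `toy_code`, the `code/` layout, and every helper here are hypothetical. The standard `pickle.Unpickler.find_class` hook is what lets the loader supply classes itself.

```python
import io
import pickle
import sys
import types
import zipfile

MODULE_NAME = "toy_code"  # hypothetical qualified-name prefix for archived code

# "Code-like" content: stored in the archive as readable Python source.
SOURCE = """
class MyModule:
    def __init__(self, weight):
        self.weight = weight

    def forward(self, x):
        return [w * x for w in self.weight]
"""


def compile_module(source):
    """Execute `source` into a fresh module object named MODULE_NAME."""
    mod = types.ModuleType(MODULE_NAME)
    exec(source, mod.__dict__)
    return mod


def save(buf):
    mod = compile_module(SOURCE)
    # pickle records a class by (module, name) and checks that the module is
    # importable, so register it only for the duration of the dump.
    sys.modules[MODULE_NAME] = mod
    try:
        data = pickle.dumps(mod.MyModule([1.0, 2.0]))
    finally:
        del sys.modules[MODULE_NAME]
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(f"code/{MODULE_NAME}.py", SOURCE)  # code: human-readable
        zf.writestr("data.pkl", data)  # data: pickled state plus the class *name*


class ArchiveUnpickler(pickle.Unpickler):
    """Resolves class names found in data.pkl by compiling source from the archive."""

    def __init__(self, file, archive):
        super().__init__(file)
        self.archive = archive
        self.compiled = {}

    def find_class(self, module, name):
        if module == MODULE_NAME:
            if module not in self.compiled:
                source = self.archive.read(f"code/{module}.py").decode("utf-8")
                self.compiled[module] = compile_module(source)
            return getattr(self.compiled[module], name)
        return super().find_class(module, name)


def load(buf):
    with zipfile.ZipFile(buf) as zf:
        return ArchiveUnpickler(io.BytesIO(zf.read("data.pkl")), zf).load()


buf = io.BytesIO()
save(buf)
model = load(buf)
print(model.forward(3.0))  # [3.0, 6.0]
```

In this toy, the loaded object's class is compiled from `code/toy_code.py` inside the archive, while its attribute values come from `data.pkl`; the pickle only carries the name that links the two.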

@Krovatkin (Contributor) left a comment

Added a few comments.


A `CompilationUnit` serves two purposes:

1. It is the owner (in the C++ sense) of all code objects.
2. It forms a namespace in which code objects must have unique names.

A `CompilationUnit` is created whenever `torch::jit::load()` is invoked, to place the deserializednewly code objects in. In Python, there is a single global `CompilationUnit` that holds all code objects defined in Python.
@Krovatkin (Contributor) commented:

> deserializednewly

Missing a whitespace.
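The two roles listed in the excerpt (owning all code objects, and acting as a namespace with unique names) can be modeled by a toy registry. This is a minimal sketch under stated assumptions: `ToyCompilationUnit`, `define`, `find`, `PYTHON_CU`, and `toy_load` are hypothetical names, code objects are stood in for by Python callables, and nothing here mirrors the real `CompilationUnit` interface.

```python
from typing import Callable, Dict


class ToyCompilationUnit:
    """Hypothetical stand-in for a CompilationUnit.

    It owns every code object registered with it (here, plain callables)
    and acts as a namespace: each qualified name may be defined only once.
    """

    def __init__(self) -> None:
        self._code: Dict[str, Callable] = {}

    def define(self, qualified_name: str, fn: Callable) -> None:
        if qualified_name in self._code:
            raise ValueError(f"{qualified_name!r} is already defined in this unit")
        self._code[qualified_name] = fn

    def find(self, qualified_name: str) -> Callable:
        return self._code[qualified_name]


# Hypothetical analogue of the single global unit holding Python-defined code.
PYTHON_CU = ToyCompilationUnit()


def toy_load(definitions: Dict[str, Callable]) -> ToyCompilationUnit:
    """Each load gets its own fresh unit, so separate loads never clash."""
    cu = ToyCompilationUnit()
    for name, fn in definitions.items():
        cu.define(name, fn)
    return cu


a = toy_load({"MyModule.forward": lambda x: x + 1})
b = toy_load({"MyModule.forward": lambda x: x * 2})  # same name, different unit
print(a.find("MyModule.forward")(3), b.find("MyModule.forward")(3))  # 4 6
```

Because each `toy_load` call creates a fresh unit, two loads that both define `MyModule.forward` coexist, while a second definition of the same name inside one unit is rejected.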

### `CompilationUnit` ownership semantics
There are a few different entities that participate in the ownership model:
@Krovatkin (Contributor) commented:

Maybe the topic of ownership deserves its own section, which could be read independently of the serialization topic?


At a high level, code serialization means:

1. Transforming `ClassType`s and `Function`s (called "code objects") into Python source code.
@Krovatkin (Contributor) commented:

Does `ClassType` include `ScriptModule`s and UDTs?
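The first step listed above, turning code objects into Python source, can be pictured with a toy printer. This is a sketch under assumptions, not TorchScript's printer: `ToyFunction` and `ToyClass` are hypothetical stand-ins for `Function` and `ClassType` (each method body is a single return expression), and writing each class to `code/<module path>.py` is an illustrative choice, not the archive's documented layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyFunction:
    """Hypothetical code object: a method whose body is one return expression."""
    name: str
    params: List[str]
    return_expr: str


@dataclass
class ToyClass:
    """Hypothetical class-like code object; qualified_name is assumed dotted."""
    qualified_name: str  # e.g. "pkg.models.Scale"
    attributes: List[str]
    methods: List[ToyFunction] = field(default_factory=list)


def print_class(cls: ToyClass) -> str:
    """Emit valid, human-readable Python source for one class."""
    short_name = cls.qualified_name.rsplit(".", 1)[-1]
    lines = [f"class {short_name}:"]
    lines.append(f"    def __init__({', '.join(['self'] + cls.attributes)}):")
    for attr in cls.attributes:
        lines.append(f"        self.{attr} = {attr}")
    if not cls.attributes:
        lines.append("        pass")
    for fn in cls.methods:
        lines.append("")
        lines.append(f"    def {fn.name}({', '.join(['self'] + fn.params)}):")
        lines.append(f"        return {fn.return_expr}")
    return "\n".join(lines) + "\n"


def serialize_code(classes: List[ToyClass]) -> Dict[str, str]:
    """Group printed classes into one file per module prefix of the qualified name."""
    files: Dict[str, str] = {}
    for cls in classes:
        prefix = cls.qualified_name.rsplit(".", 1)[0]
        path = "code/" + prefix.replace(".", "/") + ".py"
        files[path] = files.get(path, "") + print_class(cls) + "\n"
    return files


scale = ToyClass("pkg.models.Scale", ["factor"],
                 [ToyFunction("forward", ["x"], "x * self.factor")])
files = serialize_code([scale])
namespace: dict = {}
exec(files["code/pkg/models.py"], namespace)  # the emitted source is valid Python
print(namespace["Scale"](3).forward(2))  # 6
```

Executing the emitted source yields a working class, which is the property that lets such `.py` files be both human-readable and loadable.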

@facebook-github-bot (Contributor) commented:

@suo merged this pull request in 1c86b8a.


Reviewers

@zdevito approved these changes
@ZolotukhinM approved these changes
@Krovatkin approved these changes

Labels

Merged, module: docs, oncall: jit

7 participants

@suo @facebook-github-bot @zdevito @ZolotukhinM @Krovatkin @pytorchbot @mruberry
