Closed
Description
I have created a PyTorch model checkpoint using torch.save; however, I'm unable to load this model using torch.load. I run into the following error:
    >>> torch.load('model_best.pth.tar')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/ubuntu/anaconda3/envs/pytorch_source/lib/python3.7/site-packages/torch/serialization.py", line 358, in load
        return _load(f, map_location, pickle_module)
      File "/home/ubuntu/anaconda3/envs/pytorch_source/lib/python3.7/site-packages/torch/serialization.py", line 549, in _load
        deserialized_objects[key]._set_from_file(f, offset, f_should_read_directly)
    RuntimeError: storage has wrong size: expected -7659745797817883467 got 512

The model was saved using code like this:
    def save_checkpoint(epoch, model, best_top5, optimizer, is_best=False, filename='checkpoint.pth.tar'):
        state = {
            'epoch': epoch + 1,
            'state_dict': model.state_dict(),
            'best_top5': best_top5,
            'optimizer': optimizer.state_dict(),
        }
        torch.save(state, filename)

    if args.local_rank == 0:
        if is_best:
            save_checkpoint(epoch, model, best_top5, optimizer, is_best=True, filename='model_best.pth.tar')
The model was trained across multiple p3.16xlarge instances.
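One possible explanation, not confirmed anywhere in this report: with several instances, every node has a process whose local_rank is 0, so if the instances share a filesystem, more than one process may write model_best.pth.tar at the same time and leave a corrupted file. The sketch below shows one way to rule that out. It assumes the launcher exports a RANK environment variable holding the global rank (torch.distributed.get_rank() is an alternative once the process group is initialized). The helper names is_global_writer and atomic_write are hypothetical and are not part of PyTorch.

```python
import os
import tempfile


def is_global_writer(env=os.environ):
    """Return True only for the process whose global rank is 0.

    Assumption: the launcher exports RANK. When it is missing
    (single-process run), the process is treated as rank 0.
    """
    return int(env.get("RANK", "0")) == 0


def atomic_write(path, write_fn):
    """Write to a temporary file in the target directory, then rename it.

    write_fn is any callable that takes a file path and writes the
    checkpoint there. Readers never see a half-written file at `path`,
    because os.replace swaps the finished file in as a single step
    (on a POSIX filesystem, within one directory).
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    os.close(fd)
    try:
        write_fn(tmp_path)
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the partial temporary file before re-raising.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With the save code above, this could be used as `if is_global_writer() and is_best: atomic_write('model_best.pth.tar', lambda tmp: torch.save(state, tmp))`. If each instance writes to its own local disk instead, the local_rank check is already enough and the cause would lie elsewhere.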