[Bugfix] Fix bug in cross entropy loss #3457

Merged

xiexinch merged 1 commit into open-mmlab:dev-1.x from mmeendez8:main

Dec 4, 2023

mmeendez8 commented Nov 30, 2023

Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help it get feedback more easily. If you do not understand some items, don't worry: just make the pull request and seek help from maintainers.

Motivation

Fixes #3412

Modification

We just need to create the tensor with torch.stack() instead of torch.tensor().
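As a minimal sketch of why this swap matters, assume the loss builds per-pixel weights with a comprehension like `[class_weight[cls] for cls in label]`, where `label` is an integer tensor of shape `(N, H, W)` and `class_weight` is a 1-D tensor with one weight per class. Iterating over a batched `label` yields one `(H, W)` map per image, so every list element is a multi-element tensor. `torch.tensor()` expects scalars (or nested sequences of scalars) and is expected to fail on such a list, while `torch.stack()` joins the maps along a new batch dimension. The helper names below are hypothetical, not mmseg's code, and the weights are example values.

```python
import torch


def per_pixel_weights(label: torch.Tensor, class_weight: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: look up one class weight per pixel.

    label: (N, H, W) integer class indices; class_weight: (C,) float weights.
    """
    # Each iteration yields an (H, W) label map, so class_weight[cls] is (H, W);
    # torch.stack joins the N maps into a single (N, H, W) tensor.
    return torch.stack([class_weight[cls] for cls in label]).to(class_weight.device)


def per_pixel_weights_old(label: torch.Tensor, class_weight: torch.Tensor) -> torch.Tensor:
    # torch.tensor() cannot build a tensor from a list of multi-element tensors,
    # so this variant is expected to raise for batched (N, H, W) labels.
    return torch.tensor([class_weight[cls] for cls in label], device=class_weight.device)


class_weight = torch.tensor([1.0, 5.133, 5.9931, 5.0811, 4.3589])
label = torch.randint(0, 5, (2, 4, 4))  # two 4x4 label maps with classes 0..4

weights = per_pixel_weights(label, class_weight)
print(weights.shape)  # torch.Size([2, 4, 4])

try:
    per_pixel_weights_old(label, class_weight)
except (ValueError, TypeError, RuntimeError) as err:
    print("torch.tensor() failed:", err)
```

Because `class_weight[label]` performs the same lookup in a single vectorized indexing call, it is a convenient reference when checking the stacked result.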

BC-breaking (Optional)

Does the modification introduce changes that break the backward-compatibility of the downstream repos?
If so, please describe how it breaks the compatibility and how the downstream projects should modify their code to keep compatibility with this PR.

Use cases (Optional)

If this PR introduces a new feature, it is better to list some use cases here, and update the documentation.

Checklist

Pre-commit or other linting tools are used to fix the potential lint issues.
The modification is covered by complete unit tests. If not, please add more unit tests to ensure correctness.
If the modification has potential influence on downstream projects, this PR should be tested with downstream projects, like MMDet or MMDet3D.
The documentation has been modified accordingly, like docstring or example tutorials.

Fix bug in cross entropy loss (commit 1253791)

xiexinch changed the base branch from main to dev-1.x

December 4, 2023 06:13

xiexinch approved these changes

Dec 4, 2023

xiexinch merged commit e51f511 into open-mmlab:dev-1.x

Dec 4, 2023

mmeendez8 mentioned this pull request

Dec 7, 2023

issue with class weight and cross entropy loss #3412

Closed

call560 commented Jan 2, 2024 (edited)

I applied this 'bug fix' to resolve the weighted cross-entropy (WCE) loss error reported during K-Net training, but I got the assertion error again. The error message is as follows:

../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [203,0,0], thread: [96,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
...
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [203,0,0], thread: [127,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [250,0,0], thread: [96,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
...
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [250,0,0], thread: [127,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [249,0,0], thread: [96,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
...
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [249,0,0], thread: [127,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
Traceback (most recent call last):
  File "tools/train.py", line 104, in <module>
    main()
  File "tools/train.py", line 100, in main
    runner.train()
  File "/root/miniconda3/envs/mmseg/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1777, in train
    model = self.train_loop.run() # type: ignore
  File "/root/miniconda3/envs/mmseg/lib/python3.8/site-packages/mmengine/runner/loops.py", line 278, in run
    self.run_iter(data_batch)
  File "/root/miniconda3/envs/mmseg/lib/python3.8/site-packages/mmengine/runner/loops.py", line 301, in run_iter
    outputs = self.runner.model.train_step(
  File "/root/miniconda3/envs/mmseg/lib/python3.8/site-packages/mmengine/model/base_model/base_model.py", line 114, in train_step
    losses = self._run_forward(data, mode='loss') # type: ignore
  File "/root/miniconda3/envs/mmseg/lib/python3.8/site-packages/mmengine/model/base_model/base_model.py", line 346, in _run_forward
    results = self(**data, mode=mode)
  File "/root/miniconda3/envs/mmseg/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/autodl-tmp/defect_mmseg/mmsegmentation/mmseg/models/segmentors/base.py", line 94, in forward
    return self.loss(inputs, data_samples)
  File "/root/autodl-tmp/defect_mmseg/mmsegmentation/mmseg/models/segmentors/encoder_decoder.py", line 178, in loss
    loss_decode = self._decode_head_forward_train(x, data_samples)
  File "/root/autodl-tmp/defect_mmseg/mmsegmentation/mmseg/models/segmentors/encoder_decoder.py", line 139, in _decode_head_forward_train
    loss_decode = self.decode_head.loss(inputs, data_samples,
  File "/root/autodl-tmp/defect_mmseg/mmsegmentation/mmseg/models/decode_heads/decode_head.py", line 262, in loss
    losses = self.loss_by_feat(seg_logits, batch_data_samples)
  File "/root/autodl-tmp/defect_mmseg/mmsegmentation/mmseg/models/decode_heads/knet_head.py", line 456, in loss_by_feat
    loss = self.kernel_generate_head.loss_by_feat(
  File "/root/autodl-tmp/defect_mmseg/mmsegmentation/mmseg/models/decode_heads/decode_head.py", line 324, in loss_by_feat
    loss[loss_decode.loss_name] = loss_decode(
  File "/root/miniconda3/envs/mmseg/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/autodl-tmp/defect_mmseg/mmsegmentation/mmseg/models/losses/cross_entropy_loss.py", line 288, in forward
    loss_cls = self.loss_weight * self.cls_criterion(
  File "/root/autodl-tmp/defect_mmseg/mmsegmentation/mmseg/models/losses/cross_entropy_loss.py", line 73, in cross_entropy
    avg_factor = label_weights.sum()
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
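The assertion comes from an index lookup (IndexKernel.cu), and the Python traceback points at `avg_factor = label_weights.sum()`; as the message notes, CUDA errors are reported asynchronously, so the failing kernel may have run earlier. One possible explanation, not confirmed in this thread, is that pixels carrying the ignore label are used to index the five-element `class_weight`. This config pads segmentation maps with `seg_pad_val=255`, and 255 is assumed here to be the loss's `ignore_index`, so a lookup like `class_weight[255]` would be out of range. The sketch below, assuming that cause, reproduces the lookup on CPU, where it fails with a readable `IndexError` instead of a device-side assert, and shows one way to mask ignored pixels before the lookup. `masked_label_weights` and `IGNORE_INDEX` are hypothetical names, not part of mmseg.

```python
import torch

IGNORE_INDEX = 255  # assumption: matches seg_pad_val=255 and the loss's ignore_index


def masked_label_weights(label: torch.Tensor, class_weight: torch.Tensor,
                         ignore_index: int = IGNORE_INDEX) -> torch.Tensor:
    """Hypothetical helper: per-pixel class weights, with ignored pixels set to 0."""
    valid = label != ignore_index
    # Replace ignored labels with a valid index (0) so the lookup stays in range.
    safe_label = label.masked_fill(~valid, 0)
    weights = class_weight[safe_label]
    # Zero the weights of ignored pixels so they do not contribute to avg_factor.
    return weights * valid.to(weights.dtype)


class_weight = torch.tensor([1.0, 5.133, 5.9931, 5.0811, 4.3589])
label = torch.tensor([[[0, 1, 255],
                       [4, 255, 2]]])  # shape (1, 2, 3) with two padded pixels

# Unmasked lookup: on CPU this raises IndexError; on CUDA the same indexing
# would be expected to trip the "index out of bounds" device-side assert.
try:
    class_weight[label]
except IndexError as err:
    print("out-of-range lookup:", err)

weights = masked_label_weights(label, class_weight)
avg_factor = weights.sum()
print(weights)
print(avg_factor)
```

Rerunning the original job with CUDA_LAUNCH_BLOCKING=1, as the error message suggests, should make the reported frame line up with the kernel that actually failed, which helps confirm or rule out this explanation.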

This is my configuration file:

albu_train_transforms = [
dict(limit=45, p=0.5, type='SafeRotate'),
dict(p=0.5, type='Flip'),
dict(
p=0.3,
transforms=[
dict(p=1, type='RandomBrightnessContrast'),
dict(p=1, scale=0.4, type='RandomToneCurve'),
],
type='OneOf'),
dict(
n=2,
p=0.3,
transforms=[
dict(blur_limit=(
9,
11,
), p=1.0, type='GaussianBlur'),
dict(p=1.0, type='GridDistortion'),
dict(clip_limit=4.0, p=1, tile_grid_size=(
8,
8,
), type='CLAHE'),
dict(
alpha=(
0.8,
1.0,
),
blur_limit=(
11,
31,
),
p=1,
threshold=0,
type='UnsharpMask'),
dict(
color_shift=(
0.1,
0.3,
),
intensity=(
0.3,
0.5,
),
p=1.0,
type='ISONoise'),
dict(p=0.3, type='RandomGravel'),
],
type='SomeOf'),
dict(
p=0.1,
transforms=[
dict(
alpha_coef=0.1,
fog_coef_lower=0.2,
fog_coef_upper=0.5,
p=0.5,
type='RandomFog'),
dict(brightness_coefficient=0.8, p=1.0, type='RandomRain'),
dict(
brightness_coeff=1.0,
p=0.5,
snow_point_lower=0.2,
snow_point_upper=0.5,
type='RandomSnow'),
dict(
angle_lower=0.5,
flare_roi=(
0,
0,
1,
0.5,
),
p=0.2,
src_radius=50,
type='RandomSunFlare'),
dict(
num_shadows_lower=1,
num_shadows_upper=1,
p=0.2,
type='RandomShadow'),
dict(
cutout_threshold=(
0.3,
0.6,
),
mean=0.4,
p=0.2,
std=0.3,
type='Spatter'),
],
type='OneOf'),
dict(
p=0.1,
transforms=[
dict(
p=1.0,
quality_lower=30,
quality_upper=70,
type='ImageCompression'),
dict(p=1.0, type='RingingOvershoot'),
],
type='OneOf'),
]
checkpoint_file = 'https://download.openmmlab.com/mmsegmentation/v0.5/pretrain/swin/swin_large_patch4_window7_224_22k_20220308-d5bdebaf.pth'
conv_kernel_size = 1
crop_size = (
512,
512,
)
data_preprocessor = dict(
bgr_to_rgb=True,
mean=[
123.675,
116.28,
103.53,
],
pad_val=0,
seg_pad_val=255,
size=(
512,
512,
),
std=[
58.395,
57.12,
57.375,
],
type='SegDataPreProcessor')
data_root = './data/coco/'
dataset_type = 'ZBr10KDataset'
default_hooks = dict(
checkpoint=dict(
by_epoch=False,
interval=2500,
max_keep_ckpts=2,
save_best='mIoU',
type='CheckpointHook'),
logger=dict(interval=100, log_metric_by_epoch=False, type='LoggerHook'),
param_scheduler=dict(type='ParamSchedulerHook'),
sampler_seed=dict(type='DistSamplerSeedHook'),
timer=dict(type='IterTimerHook'),
visualization=dict(type='SegVisualizationHook'))
default_scope = 'mmseg'
env_cfg = dict(
cudnn_benchmark=True,
dist_cfg=dict(backend='nccl'),
mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0))
img_ratios = [
0.5,
0.75,
1.0,
1.25,
1.5,
1.75,
]
launcher = 'none'
load_from = None
log_level = 'INFO'
log_processor = dict(by_epoch=False)
model = dict(
auxiliary_head=dict(
align_corners=False,
channels=256,
concat_input=False,
dropout_ratio=0.1,
in_channels=768,
in_index=2,
loss_decode=dict(
class_weight=[
1.0,
5.133,
5.9931,
5.0811,
4.3589,
],
loss_weight=0.4,
type='CrossEntropyLoss',
use_sigmoid=False),
norm_cfg=dict(requires_grad=True, type='BN'),
num_classes=5,
num_convs=1,
type='FCNHead'),
backbone=dict(
attn_drop_rate=0.0,
depths=[
2,
2,
18,
2,
],
drop_path_rate=0.3,
drop_rate=0.0,
embed_dims=192,
mlp_ratio=4,
num_heads=[
6,
12,
24,
48,
],
out_indices=(
0,
1,
2,
3,
),
patch_norm=True,
qk_scale=None,
qkv_bias=True,
type='SwinTransformer',
use_abs_pos_embed=False,
window_size=7),
data_preprocessor=dict(
bgr_to_rgb=True,
mean=[
123.675,
116.28,
103.53,
],
pad_val=0,
seg_pad_val=255,
size=(
512,
512,
),
std=[
58.395,
57.12,
57.375,
],
type='SegDataPreProcessor'),
decode_head=dict(
kernel_generate_head=dict(
align_corners=False,
channels=512,
dropout_ratio=0.1,
in_channels=[
192,
384,
768,
1536,
],
in_index=[
0,
1,
2,
3,
],
loss_decode=dict(
class_weight=[
1.0,
5.133,
5.9931,
5.0811,
4.3589,
],
loss_weight=1.0,
type='CrossEntropyLoss',
use_sigmoid=False),
norm_cfg=dict(requires_grad=True, type='BN'),
num_classes=5,
pool_scales=(
1,
2,
3,
6,
),
type='UPerHead'),
kernel_update_head=[
dict(
conv_kernel_size=1,
dropout=0.0,
feat_transform_cfg=dict(
act_cfg=None, conv_cfg=dict(type='Conv2d')),
feedforward_channels=2048,
ffn_act_cfg=dict(inplace=True, type='ReLU'),
in_channels=512,
kernel_updator_cfg=dict(
act_cfg=dict(inplace=True, type='ReLU'),
feat_channels=256,
in_channels=256,
norm_cfg=dict(type='LN'),
out_channels=256,
type='KernelUpdator'),
num_classes=5,
num_ffn_fcs=2,
num_heads=8,
num_mask_fcs=1,
out_channels=512,
type='KernelUpdateHead',
with_ffn=True),
dict(
conv_kernel_size=1,
dropout=0.0,
feat_transform_cfg=dict(
act_cfg=None, conv_cfg=dict(type='Conv2d')),
feedforward_channels=2048,
ffn_act_cfg=dict(inplace=True, type='ReLU'),
in_channels=512,
kernel_updator_cfg=dict(
act_cfg=dict(inplace=True, type='ReLU'),
feat_channels=256,
in_channels=256,
norm_cfg=dict(type='LN'),
out_channels=256,
type='KernelUpdator'),
num_classes=5,
num_ffn_fcs=2,
num_heads=8,
num_mask_fcs=1,
out_channels=512,
type='KernelUpdateHead',
with_ffn=True),
dict(
conv_kernel_size=1,
dropout=0.0,
feat_transform_cfg=dict(
act_cfg=None, conv_cfg=dict(type='Conv2d')),
feedforward_channels=2048,
ffn_act_cfg=dict(inplace=True, type='ReLU'),
in_channels=512,
kernel_updator_cfg=dict(
act_cfg=dict(inplace=True, type='ReLU'),
feat_channels=256,
in_channels=256,
norm_cfg=dict(type='LN'),
out_channels=256,
type='KernelUpdator'),
num_classes=5,
num_ffn_fcs=2,
num_heads=8,
num_mask_fcs=1,
out_channels=512,
type='KernelUpdateHead',
with_ffn=True),
],
num_stages=3,
type='IterativeDecodeHead'),
pretrained=
'https://download.openmmlab.com/mmsegmentation/v0.5/pretrain/swin/swin_large_patch4_window7_224_22k_20220308-d5bdebaf.pth',
test_cfg=dict(mode='whole'),
train_cfg=dict(),
type='EncoderDecoder')
norm_cfg = dict(requires_grad=True, type='BN')
num_stages = 3
optim_wrapper = dict(
clip_grad=dict(max_norm=1, norm_type=2),
optimizer=dict(
betas=(
0.9,
0.999,
), lr=6e-05, type='AdamW', weight_decay=0.0005),
paramwise_cfg=dict(
custom_keys=dict(
absolute_pos_embed=dict(decay_mult=0.0),
norm=dict(decay_mult=0.0),
relative_position_bias_table=dict(decay_mult=0.0))),
type='OptimWrapper')
optimizer = dict(lr=0.01, momentum=0.9, type='SGD', weight_decay=0.0005)
param_scheduler = [
dict(
begin=0, by_epoch=False, end=1000, start_factor=0.001,
type='LinearLR'),
dict(
begin=1000,
by_epoch=False,
end=80000,
milestones=[
60000,
72000,
],
type='MultiStepLR'),
]
randomness = dict(seed=0)
resume = False
test_cfg = dict(type='TestLoop')
test_dataloader = dict(
batch_size=1,
dataset=dict(
data_prefix=dict(
img_path='images/test', seg_map_path='annotations/test'),
data_root='./data/coco/',
pipeline=[
dict(type='LoadImageFromFile'),
dict(keep_ratio=True, scale=(
2048,
1024,
), type='Resize'),
dict(type='LoadAnnotations'),
dict(type='PackSegInputs'),
],
type='ZBr10KDataset'),
num_workers=4,
persistent_workers=True,
sampler=dict(shuffle=False, type='DefaultSampler'))
test_evaluator = dict(
iou_metrics=[
'mIoU',
'mDice',
'mFscore',
], type='IoUMetric')
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(keep_ratio=True, scale=(
2048,
1024,
), type='Resize'),
dict(type='LoadAnnotations'),
dict(type='PackSegInputs'),
]
train_cfg = dict(max_iters=40000, type='IterBasedTrainLoop', val_interval=500)
train_dataloader = dict(
batch_size=6,
dataset=dict(
data_prefix=dict(
img_path='images/train', seg_map_path='annotations/train'),
data_root='./data/coco/',
pipeline=[
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations'),
dict(
keep_ratio=True,
ratio_range=(
0.5,
2.0,
),
scale=(
2048,
1024,
),
type='RandomResize'),
dict(
cat_max_ratio=0.75, crop_size=(
512,
512,
), type='RandomCrop'),
dict(
transforms=[
dict(limit=45, p=0.5, type='SafeRotate'),
dict(p=0.5, type='Flip'),
dict(
p=0.3,
transforms=[
dict(p=1, type='RandomBrightnessContrast'),
dict(p=1, scale=0.4, type='RandomToneCurve'),
],
type='OneOf'),
dict(
n=2,
p=0.3,
transforms=[
dict(
blur_limit=(
9,
11,
),
p=1.0,
type='GaussianBlur'),
dict(p=1.0, type='GridDistortion'),
dict(
clip_limit=4.0,
p=1,
tile_grid_size=(
8,
8,
),
type='CLAHE'),
dict(
alpha=(
0.8,
1.0,
),
blur_limit=(
11,
31,
),
p=1,
threshold=0,
type='UnsharpMask'),
dict(
color_shift=(
0.1,
0.3,
),
intensity=(
0.3,
0.5,
),
p=1.0,
type='ISONoise'),
dict(p=0.3, type='RandomGravel'),
],
type='SomeOf'),
dict(
p=0.1,
transforms=[
dict(
alpha_coef=0.1,
fog_coef_lower=0.2,
fog_coef_upper=0.5,
p=0.5,
type='RandomFog'),
dict(
brightness_coefficient=0.8,
p=1.0,
type='RandomRain'),
dict(
brightness_coeff=1.0,
p=0.5,
snow_point_lower=0.2,
snow_point_upper=0.5,
type='RandomSnow'),
dict(
angle_lower=0.5,
flare_roi=(
0,
0,
1,
0.5,
),
p=0.2,
src_radius=50,
type='RandomSunFlare'),
dict(
num_shadows_lower=1,
num_shadows_upper=1,
p=0.2,
type='RandomShadow'),
dict(
cutout_threshold=(
0.3,
0.6,
),
mean=0.4,
p=0.2,
std=0.3,
type='Spatter'),
],
type='OneOf'),
dict(
p=0.1,
transforms=[
dict(
p=1.0,
quality_lower=30,
quality_upper=70,
type='ImageCompression'),
dict(p=1.0, type='RingingOvershoot'),
],
type='OneOf'),
],
type='Albu'),
dict(type='PackSegInputs'),
],
type='ZBr10KDataset'),
num_workers=2,
persistent_workers=True,
sampler=dict(shuffle=True, type='InfiniteSampler'))
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations'),
dict(
keep_ratio=True,
ratio_range=(
0.5,
2.0,
),
scale=(
2048,
1024,
),
type='RandomResize'),
dict(cat_max_ratio=0.75, crop_size=(
512,
512,
), type='RandomCrop'),
dict(
transforms=[
dict(limit=45, p=0.5, type='SafeRotate'),
dict(p=0.5, type='Flip'),
dict(
p=0.3,
transforms=[
dict(p=1, type='RandomBrightnessContrast'),
dict(p=1, scale=0.4, type='RandomToneCurve'),
],
type='OneOf'),
dict(
n=2,
p=0.3,
transforms=[
dict(blur_limit=(
9,
11,
), p=1.0, type='GaussianBlur'),
dict(p=1.0, type='GridDistortion'),
dict(
clip_limit=4.0,
p=1,
tile_grid_size=(
8,
8,
),
type='CLAHE'),
dict(
alpha=(
0.8,
1.0,
),
blur_limit=(
11,
31,
),
p=1,
threshold=0,
type='UnsharpMask'),
dict(
color_shift=(
0.1,
0.3,
),
intensity=(
0.3,
0.5,
),
p=1.0,
type='ISONoise'),
dict(p=0.3, type='RandomGravel'),
],
type='SomeOf'),
dict(
p=0.1,
transforms=[
dict(
alpha_coef=0.1,
fog_coef_lower=0.2,
fog_coef_upper=0.5,
p=0.5,
type='RandomFog'),
dict(brightness_coefficient=0.8, p=1.0, type='RandomRain'),
dict(
brightness_coeff=1.0,
p=0.5,
snow_point_lower=0.2,
snow_point_upper=0.5,
type='RandomSnow'),
dict(
angle_lower=0.5,
flare_roi=(
0,
0,
1,
0.5,
),
p=0.2,
src_radius=50,
type='RandomSunFlare'),
dict(
num_shadows_lower=1,
num_shadows_upper=1,
p=0.2,
type='RandomShadow'),
dict(
cutout_threshold=(
0.3,
0.6,
),
mean=0.4,
p=0.2,
std=0.3,
type='Spatter'),
],
type='OneOf'),
dict(
p=0.1,
transforms=[
dict(
p=1.0,
quality_lower=30,
quality_upper=70,
type='ImageCompression'),
dict(p=1.0, type='RingingOvershoot'),
],
type='OneOf'),
],
type='Albu'),
dict(type='PackSegInputs'),
]
tta_model = dict(type='SegTTAModel')
tta_pipeline = [
dict(file_client_args=dict(backend='disk'), type='LoadImageFromFile'),
dict(
transforms=[
[
dict(keep_ratio=True, scale_factor=0.5, type='Resize'),
dict(keep_ratio=True, scale_factor=0.75, type='Resize'),
dict(keep_ratio=True, scale_factor=1.0, type='Resize'),
dict(keep_ratio=True, scale_factor=1.25, type='Resize'),
dict(keep_ratio=True, scale_factor=1.5, type='Resize'),
dict(keep_ratio=True, scale_factor=1.75, type='Resize'),
],
[
dict(direction='horizontal', prob=0.0, type='RandomFlip'),
dict(direction='horizontal', prob=1.0, type='RandomFlip'),
],
[
dict(type='LoadAnnotations'),
],
[
dict(type='PackSegInputs'),
],
],
type='TestTimeAug'),
]
val_cfg = dict(type='ValLoop')
val_dataloader = dict(
batch_size=1,
dataset=dict(
data_prefix=dict(
img_path='images/val', seg_map_path='annotations/val'),
data_root='./data/coco/',
pipeline=[
dict(type='LoadImageFromFile'),
dict(keep_ratio=True, scale=(
2048,
1024,
), type='Resize'),
dict(type='LoadAnnotations'),
dict(type='PackSegInputs'),
],
type='ZBr10KDataset'),
num_workers=4,
persistent_workers=True,
sampler=dict(shuffle=False, type='DefaultSampler'))
val_evaluator = dict(
iou_metrics=[
'mIoU',
'mDice',
'mFscore',
], type='IoUMetric')
vis_backends = [
dict(type='LocalVisBackend'),
]
visualizer = dict(
name='visualizer',
type='SegLocalVisualizer',
vis_backends=[
dict(type='LocalVisBackend'),
])
work_dir = './work_dirs/ZBr10KDataset-KNet-albu-loss'
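The `tta_pipeline` in the config above nests its transforms in four groups. As we understand mmcv's `TestTimeAug`, it builds one pipeline per combination by taking one transform from each group. That means this config should produce 12 augmented views per image: 6 resize scales times 2 flip settings. The pure-Python sketch below illustrates that expansion with `itertools.product`. The tuples are simplified, hypothetical stand-ins for the config dicts and are not the mmcv API.

```python
from itertools import product

# Simplified stand-ins for the four transform groups in tta_pipeline.
# Assumption: TestTimeAug picks one transform from every group, so the
# resulting pipelines are the Cartesian product of the groups.
scales = [0.5, 0.75, 1.0, 1.25, 1.5, 1.75]
tta_stages = [
    [('Resize', s) for s in scales],             # 6 scale factors
    [('RandomFlip', 0.0), ('RandomFlip', 1.0)],  # never flip / always flip
    [('LoadAnnotations',)],
    [('PackSegInputs',)],
]


def expand_tta(stages):
    """Return every transform chain formed by choosing one item per stage."""
    return [list(chain) for chain in product(*stages)]


chains = expand_tta(tta_stages)
print(len(chains))  # 6 scales x 2 flip settings x 1 x 1 = 12
print(chains[0])    # smallest scale, no flip
```

Each of these chains runs once per test image, so test-time inference costs roughly 12 forward passes per image with this setting.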

This is my repository version information:
sys.platform: linux
Python: 3.8.18 (default, Sep 11 2023, 13:40:15) [GCC 11.2.0]
CUDA available: True
numpy_random_seed: 2147483648
GPU 0: NVIDIA GeForce RTX 4090
CUDA_HOME: /usr/local/cuda-11.8
NVCC: Cuda compilation tools, release 11.8, V11.8.89
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
PyTorch: 2.0.1+cu118
PyTorch compiling details: PyTorch built with:

GCC 9.3
C++ Version: 201703
Intel(R) oneAPI Math Kernel Library Version 2023.1-Product Build 20230303 for Intel(R) 64 architecture applications
Intel(R) MKL-DNN v2.7.3 (Git Hash 6dbeffbae1f23cbbeae17adb7b5b13f1f37c080e)
OpenMP 201511 (a.k.a. OpenMP 4.5)
LAPACK is enabled (usually provided by MKL)
NNPACK is enabled
CPU capability usage: AVX2
CUDA Runtime 11.8
NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
CuDNN 8.7
Magma 2.6.1
Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.8, CUDNN_VERSION=8.7.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.0.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,

TorchVision: 0.15.2+cu118
OpenCV: 4.8.1
MMEngine: 0.10.1
MMSegmentation: 1.2.1+cbf9af1


shiomi326 commented Feb 7, 2024

I have the same issue.

nahidnazifi87 pushed a commit to nahidnazifi87/mmsegmentation_playground that referenced this pull request on Apr 5, 2024

[Bugfix] Fix bug in cross entropy loss (open-mmlab#3457), commit 0df84b7



hadariru commented Apr 17, 2024

Is there any progress on this?
I found that the label index is 255, which is outside the range of indices defined in class_weight.
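The failure described in the comment above can be reproduced in plain Python. When a per-pixel weight is looked up from `class_weight` for every label, a label value of 255 indexes past the end of a short weight list. The sketch below is an illustration, not mmsegmentation's implementation. The helper names, the three-class weights, and the label values are hypothetical. Treating 255 as an ignore index that gets weight 0 *before* the lookup is also an assumption: it is one way to avoid the error, not necessarily what this PR does.

```python
IGNORE_INDEX = 255  # label value reported in the comment; assumed to mark ignored pixels


def naive_label_weights(labels, class_weight):
    """Look up a weight for every label; fails if any label is out of range."""
    return [class_weight[cls] for cls in labels]


def safe_label_weights(labels, class_weight, ignore_index=IGNORE_INDEX):
    """Give ignored pixels weight 0 instead of indexing class_weight with them."""
    return [0.0 if cls == ignore_index else class_weight[cls] for cls in labels]


class_weight = [1.0, 2.0, 0.5]  # hypothetical weights for 3 classes
labels = [0, 1, 255, 2, 1]      # hypothetical flattened label map with one ignored pixel

try:
    naive_label_weights(labels, class_weight)
except IndexError as err:
    print('naive lookup failed:', err)  # list index out of range

weights = safe_label_weights(labels, class_weight)
print(weights)       # [1.0, 2.0, 0.0, 0.5, 2.0]
print(sum(weights))  # 5.5, a class-weighted averaging factor over valid pixels
```

Masking before the lookup matters for any ignore value outside `0..num_classes-1`. A negative ignore value would not raise an error in Python. It would instead silently wrap around to the last class weight.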
