[GSoC] Add block quantized models #270
Conversation
- constant weight category supported
- add data type saturation
- handled the case in which all the elements within a block are the same
- benchmark script modified to support block quantized models
- block quantized some models
- …dpose blocked model fix, removed blocked CRNN EN
vpisarev commented Oct 25, 2024
@fengyuentau, when is it expected to be merged? I believe the patch is very useful.
fengyuentau commented Oct 25, 2024
We can, but since block quantization in OpenCV does not yet provide any acceleration in terms of inference speed, merging this patch only increases the size of the zoo, and we gain no benefit from it for now. Merging is postponed until block quantization works better in OpenCV in practice.
vpisarev commented Oct 25, 2024
I don't quite get it. We don't see any performance degradation with block-wise quantized models, and the models get smaller, not bigger. That is, we get a smaller size at the same speed. But if you are still in doubt, let's discuss it next week at our meeting; please include it in the agenda.
fengyuentau commented Oct 25, 2024
Okay, let's do it in the next meeting.
DaniAffCH commented Oct 25, 2024 • edited
Just to add my two cents: despite not achieving an inference speed improvement, this PR significantly reduces the network size while retaining the original accuracy. Further optimization could be done in the future, including adapting block-wise quantization to the new inference engine.
fengyuentau commented Oct 30, 2024
@DaniAffCH Thank you for all the effort on this pull request. We decided to merge this pull request with the following changes:
fengyuentau commented Oct 30, 2024
Also, please provide each command that you used to generate the block-quantized models, just to ensure reproducibility.
fengyuentau commented Nov 6, 2024
@DaniAffCH Do you plan to push commits to finalize this PR? If not, I will merge this one first and then make the changes in subsequent PRs.
DaniAffCH commented Nov 6, 2024
Yes, I'll definitely address your comments. I've been busy with other projects, but now I can finalize this PR.
DaniAffCH commented Nov 6, 2024
I've just updated the file suffixes and the related README files.
Regarding text detection, I couldn't find a suitable evaluation script in `eval`, so I don't know how to test it. Regarding text recognition, I decided not to include the English version because of a severe drop in accuracy. The drop doesn't occur in the Chinese version, so I decided to include only that one.
DaniAffCH commented Nov 6, 2024
All the models have been block-quantized using the block quantization script with the following command: `python block_quantize.py --input_model INPUT_PATH --block_size 64`
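For reproducibility, here is a minimal sketch of how that command could be run over the whole zoo. The `models/` directory layout, the script path, and the "int8"-in-filename skip filter are assumptions, not part of this PR:

```python
import subprocess
from pathlib import Path

# Assumption: fp32 models are .onnx files under models/ and already-quantized
# artifacts contain "int8" in their file name.
for model in list(Path("models").rglob("*.onnx")):
    if "int8" in model.name:
        continue  # skip models that are already quantized
    subprocess.run(
        ["python", "tools/quantize/block_quantize.py",
         "--input_model", str(model), "--block_size", "64"],
        check=True,
    )
```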
fengyuentau left a comment
Generally looks good to me. It is suggested to finish all the renamings from bint8 to int8bq (see the sketch after this comment).
Also add the following content to the section "Blockwise quantization usage" in `tools/quantize/README.md`:
> Block-quantized models under each model directory are generated with `--block_size=64`
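The suffix renaming suggested above could be scripted along these lines. This is a sketch only; the `models/` directory layout is an assumption:

```python
from pathlib import Path

# Rename every file carrying the old "bint8" suffix to the agreed "int8bq".
# Materialize the generator first so renaming doesn't disturb the traversal.
for path in list(Path("models").rglob("*bint8*")):
    new_name = path.name.replace("bint8", "int8bq")
    path.rename(path.with_name(new_name))
    print(f"renamed {path.name} -> {new_name}")
```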
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
DaniAffCH commented Nov 6, 2024
Done!
fengyuentau left a comment
Great! Thank you! 👍
This PR introduces block-quantized versions for most of the opencv_zoo models.
All the models have been quantized using a block size of 64, as this configuration demonstrated good performance empirically.
Additionally, the block quantization tool has been enhanced to handle more cases (an illustrative sketch follows this list):
- the `Constant` weight category in ONNX (previously, it only supported `initializers`)
- data type saturation
- the case in which all the elements within a block are the same

Finally, the benchmark tool has been modified to support block-quantized models.
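For illustration, here is a minimal sketch of per-block asymmetric int8 quantization covering two of the edge cases above. It is not the actual `block_quantize.py` implementation; the function name and fallback logic are hypothetical:

```python
import numpy as np

def quantize_block(block: np.ndarray, bits: int = 8):
    """Illustrative per-block asymmetric quantization (hypothetical helper)."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1  # int8: -128..127
    lo, hi = float(block.min()), float(block.max())
    if hi == lo:
        # Edge case: all elements in the block are identical. A zero range
        # would give a degenerate scale, so fall back to scale = 1.
        scale = 1.0
    else:
        scale = (hi - lo) / (qmax - qmin)
    zero_point = int(np.clip(round(qmin - lo / scale), qmin, qmax))
    q = np.round(block / scale + zero_point)
    # Data type saturation: clip to the int8 range before casting so that
    # out-of-range values saturate instead of wrapping around.
    q = np.clip(q, qmin, qmax).astype(np.int8)
    return q, scale, zero_point
```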
The following table contains block quantization statistics for the quantized models:
The tables below summarize the metric changes between the original fp32 models and their block-quantized and int8-quantized versions:
The following models haven't been quantized: