
This issue trackerhas been migrated toGitHub, and is currentlyread-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.
Created on2016-10-22 19:13 byserhiy.storchaka, last changed2022-04-11 14:58 byadmin. This issue is nowclosed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| 28509-smaller-update.patch | methane,2016-10-25 19:05 | review | ||
| 28509-smaller-update2.patch | methane,2016-10-26 10:08 | review | ||
| Messages (11) | |||
|---|---|---|---|
| msg279215 -(view) | Author: Serhiy Storchaka (serhiy.storchaka)*![]() | Date: 2016-10-22 19:13 | |
The size of small key-sharing dictionary (PEP 412) can be larger than the size of normal dictionary.Python 3.6:>>> def dictsizes(k):... d = {'a%d'%i: None for i in range(k)}... class C:... def __init__(self):... self.__dict__.update(d)... return sys.getsizeof(d), sys.getsizeof(C().__dict__)... >>> for i in range(20):... print(i, dictsizes(i))... 0 (128, 60)1 (128, 60)2 (128, 60)3 (128, 60)4 (128, 60)5 (128, 60)6 (196, 196)7 (196, 196)8 (196, 344)9 (196, 344)10 (196, 344)11 (344, 344)12 (344, 344)13 (344, 344)14 (344, 344)15 (344, 344)16 (344, 628)17 (344, 628)18 (344, 628)19 (344, 628)Normal dictionaries of size 8-10 are more compact than corresponding key-sharing dictionaries.Python 3.5:>>> for i in range(20):... print(i, dictsizes(i))... 0 (144, 48)1 (144, 48)2 (144, 48)3 (144, 48)4 (144, 240)5 (144, 240)6 (240, 240)7 (240, 240)8 (240, 432)9 (240, 432)10 (240, 432)11 (240, 432)12 (432, 432)13 (432, 432)14 (432, 432)15 (432, 432)16 (432, 816)17 (432, 816)18 (432, 816)19 (432, 816)In Python 3.5 normal dictionaries of size 4-5 and 8-11 are more compact than corresponding key-sharing dictionaries. And note that key-sharing dictionaries of size 0-3 consume more memory on 3.6 that on 3.5. I think we should use more thrifty strategy for allocating key-sharing dictionaries. | |||
| msg279234 -(view) | Author: Inada Naoki (methane)*![]() | Date: 2016-10-23 02:25 | |
> 0 (128, 60)> 1 (128, 60)> 2 (128, 60)> 3 (128, 60)> 4 (128, 60)> 5 (128, 60)Minimum dict keysize is 8, and it's capacity is 5.> 6 (196, 196)Dict is resized. And since __dict__.update() caused the resizing,both are normal (combined) dict.> 7 (196, 196)> 8 (196, 344)dict.update() cause faster increasing keysize.Adding to dict cause resizing when number of items reaches 2/3 of keysize.On the other hand, dict.update() targets 1/2 of keysize is filled.In this case, keysize is 16 and 16 * 2 // 3 = 10. Since 8 < 10, addingitem to keydoesn't increase it's size. Since 8 >= (16 / 2), dict.update() createsdict having keysize == 32.(note: My patch inhttp://bugs.python.org/issue28147 changes >= to >.So keysize == 16 whennumber of items == 8).But, while checking this, I found another bug in dict_merge. /* Do one big resize at the start, rather than * incrementally resizing as we insert new items. Expect * that there will be no (or few) overlapping keys. */ if (mp->ma_keys->dk_usable * 3 < other->ma_used * 2) if (dictresize(mp, (mp->ma_used + other->ma_used)*2) != 0) return -1;dk_usable means "how many new items can be inserted without resizing".So this if statement should be: if (mp->ma_keys->dk_usable < other->ma_used) | |||
| msg279235 -(view) | Author: Inada Naoki (methane)*![]() | Date: 2016-10-23 02:41 | |
And I feel current target size of dict_merge is bit larger.When inserting new item:* ma_used = dk_size*2 / 3 when right before increasing keys* ma_used = dk_size / 3 when right after increasing keysOn the other hand, current dict_merge creates:* ma_used = dk_size / 2 when all keys in two dict is distinct* ma_used = dk_size / 4 when all keys in two dict is sameIf changing it to dictresize(mp, (mp->ma_used + other->ma_used)*3/2),* ma_used = dk_size*2 / 3 when all keys in two dict is distinct* ma_used = dk_size / 3 when all keys in two dict is sameI think this is more consistent. | |||
| msg279240 -(view) | Author: Serhiy Storchaka (serhiy.storchaka)*![]() | Date: 2016-10-23 07:15 | |
> Dict is resized. And since __dict__.update() caused the resizing,both are normal (combined) dict.Ah, my bad, I missed this point. The original issue now disappears. Thanks Naoki.But dict.update (and maybe inserting) can use more economical allocation strategy. | |||
| msg279447 -(view) | Author: Inada Naoki (methane)*![]() | Date: 2016-10-25 19:05 | |
script:import sysfor i in range(25): a = {} b = {j: j for j in range(i)} a.update(b) print(i, sys.getsizeof(a))before:0 2561 2562 2563 2564 2565 2566 3847 3848 6649 66410 66411 66412 66413 66414 66415 66416 120017 120018 120019 120020 120021 120022 120023 120024 1200patched:0 2561 2562 2563 2564 2565 2566 3847 3848 3849 38410 38411 66412 66413 66414 66415 66416 66417 66418 66419 66420 66421 66422 120023 120024 1200 | |||
| msg279458 -(view) | Author: Serhiy Storchaka (serhiy.storchaka)*![]() | Date: 2016-10-25 21:10 | |
LGTM.Here is a script that produces more compact output. The first column is a size after which the dict is resized.import sysp = 10000b = {}for i in range(10000): a = {} b[i] = i a.update(b) s = sys.getsizeof(a) if s > p: print(i, p) p = sUnpatched:5 1287 19615 34431 62863 1208127 2612255 5176511 102921023 205362047 410124095 819768191 163892Patched:5 12810 19621 34442 62885 1208170 2612341 5176682 102921365 205362730 410125461 81976But I suggest instead the condition mp->ma_keys->dk_usable < other->ma_useduse the condition mp->ma_used + mp->ma_keys->dk_usable < other->ma_usedIf there are overlapping keys this can allow to avoid resizing. In worst keys one resizing will be happened in dictinsert().Yes one estimation is: USABLE_FRACTION(2 * mp->ma_keys->dk_size) < mp->ma_used + other->ma_usedDict size is at least doubled after resizing. No need to make preliminary resizing if the final result is the same. The benefit is that if there are many overlapping keys the final size can be ESTIMATE_SIZE(other->ma_used) instead of ESTIMATE_SIZE(mp->ma_used + other->ma_used).All this conditions can be combined (because they have different computational cost): mp->ma_keys->dk_usable < other->ma_used && mp->ma_used + mp->ma_keys->dk_usable < other->ma_used && USABLE_FRACTION(2 * mp->ma_keys->dk_size) < mp->ma_used + other->ma_used | |||
| msg279480 -(view) | Author: Inada Naoki (methane)*![]() | Date: 2016-10-26 01:47 | |
I feel that accept one resize while merging is better.How about this? /* Do one big resize at the start, rather than incrementally * resizing. At most one resize happen while merging. */ if (USABLE_FRACTION(mp->ma_keys->dk_size) < other->ma_used) { assert(mp->ma_used < other->ma_used); if (dictresize(mp, ESTIMATE_SIZE(other->ma_used))) { return -1; } } | |||
| msg279488 -(view) | Author: Raymond Hettinger (rhettinger)*![]() | Date: 2016-10-26 04:17 | |
FWIW, I prefer the existing code be left as is. We've already greatly compacted the dictionaries. There is no need to be ultra-aggressive in shaving off every little corner. There is some advantage to having the dict be more sparse (fewer collisions, quicker insertion time for the update, quicker lookups etc). | |||
| msg279497 -(view) | Author: Inada Naoki (methane)*![]() | Date: 2016-10-26 10:08 | |
OK, I won't change it to allow additional resize while merging, after pre-resize.But current code has two problem:* Pre-resize happen even when resizing is not necessary. (ex. two dict has same keys).* Pre-resize allocates too much memory which doesn't make sense.Next patch (28509-smaller-update2.patch) seems better because:* When pre-resize doesn't happen, at most one resize while merging.* When pre-resize happens, no resize happens while merging.* Doesn't make surprisingly sparse dict when two dicts have same keys.PoC code:import sysb = {}for i in range(16): b[i] = i a = b.copy() a.update(b) # No need to resize print(i, sys.getsizeof(a), sys.getsizeof(b))Current:0 256 2561 256 2562 256 2563 664 2564 664 2565 384 384 # !!!6 664 3847 664 3848 664 3849 664 38410 664 66411 664 66412 1200 66413 1200 66414 1200 66415 1200 664With second patch:0 256 2561 256 2562 256 2563 256 2564 256 2565 384 3846 384 3847 384 3848 384 3849 384 38410 664 66411 664 66412 664 66413 664 66414 664 66415 664 664 | |||
| msg279528 -(view) | Author: Serhiy Storchaka (serhiy.storchaka)*![]() | Date: 2016-10-27 07:16 | |
28509-smaller-update2.patch LGTM.Your idea inmsg279480 looks worth, but needs benchmarking different corner cases to check that it doesn't cause to a regression. I think we will have enough time for this at 3.7 developing stage. | |||
| msg279534 -(view) | Author: Roundup Robot (python-dev)![]() | Date: 2016-10-27 10:30 | |
New changeset8c2615decd2e by INADA Naoki in branch '3.6':Issue#28509: dict.update() no longer allocate unnecessary large memoryhttps://hg.python.org/cpython/rev/8c2615decd2eNew changesetdeb3e5857d8c by INADA Naoki in branch 'default':Issue#28509: dict.update() no longer allocate unnecessary large memoryhttps://hg.python.org/cpython/rev/deb3e5857d8c | |||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:58:38 | admin | set | github: 72695 |
| 2017-10-23 11:18:07 | serhiy.storchaka | set | pull_requests: -pull_request839 |
| 2017-03-31 16:36:08 | dstufft | set | pull_requests: +pull_request839 |
| 2016-10-27 10:31:04 | methane | set | status: open -> closed resolution: fixed stage: commit review -> resolved |
| 2016-10-27 10:30:21 | python-dev | set | nosy: +python-dev messages: +msg279534 |
| 2016-10-27 07:16:06 | serhiy.storchaka | set | assignee:methane messages: +msg279528 stage: patch review -> commit review |
| 2016-10-26 10:08:44 | methane | set | files: +28509-smaller-update2.patch messages: +msg279497 |
| 2016-10-26 04:17:40 | rhettinger | set | messages: +msg279488 |
| 2016-10-26 01:47:13 | methane | set | messages: +msg279480 |
| 2016-10-25 21:10:17 | serhiy.storchaka | set | messages: +msg279458 |
| 2016-10-25 19:17:57 | serhiy.storchaka | set | stage: needs patch -> patch review |
| 2016-10-25 19:05:25 | methane | set | files: +28509-smaller-update.patch keywords: +patch messages: +msg279447 |
| 2016-10-23 07:15:54 | serhiy.storchaka | set | title: Key-sharing dictionaries can inrease the memory consumption -> dict.update allocates too much stage: needs patch messages: +msg279240 versions: + Python 3.6, Python 3.7 |
| 2016-10-23 05:17:40 | xiang.zhang | set | nosy: +xiang.zhang |
| 2016-10-23 02:41:31 | methane | set | messages: +msg279235 |
| 2016-10-23 02:25:55 | methane | set | messages: +msg279234 |
| 2016-10-22 19:13:37 | serhiy.storchaka | create | |