Movatterモバイル変換


[0]ホーム

URL:


homepage

Issue28509

This issue trackerhas been migrated toGitHub, and is currentlyread-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title:dict.update allocates too much
Type:resource usageStage:resolved
Components:Interpreter CoreVersions:Python 3.7, Python 3.6
process
Status:closedResolution:fixed
Dependencies:Superseder:
Assigned To: methaneNosy List: Mark.Shannon, benjamin.peterson, methane, python-dev, rhettinger, serhiy.storchaka, xiang.zhang
Priority:normalKeywords:patch

Created on2016-10-22 19:13 byserhiy.storchaka, last changed2022-04-11 14:58 byadmin. This issue is nowclosed.

Files
File nameUploadedDescriptionEdit
28509-smaller-update.patchmethane,2016-10-25 19:05review
28509-smaller-update2.patchmethane,2016-10-26 10:08review
Messages (11)
msg279215 -(view)Author: Serhiy Storchaka (serhiy.storchaka)*(Python committer)Date: 2016-10-22 19:13
The size of small key-sharing dictionary (PEP 412) can be larger than the size of normal dictionary.Python 3.6:>>> def dictsizes(k):...     d = {'a%d'%i: None for i in range(k)}...     class C:...         def __init__(self):...             self.__dict__.update(d)...     return sys.getsizeof(d), sys.getsizeof(C().__dict__)... >>> for i in range(20):...     print(i, dictsizes(i))... 0 (128, 60)1 (128, 60)2 (128, 60)3 (128, 60)4 (128, 60)5 (128, 60)6 (196, 196)7 (196, 196)8 (196, 344)9 (196, 344)10 (196, 344)11 (344, 344)12 (344, 344)13 (344, 344)14 (344, 344)15 (344, 344)16 (344, 628)17 (344, 628)18 (344, 628)19 (344, 628)Normal dictionaries of size 8-10 are more compact than corresponding key-sharing dictionaries.Python 3.5:>>> for i in range(20):...     print(i, dictsizes(i))... 0 (144, 48)1 (144, 48)2 (144, 48)3 (144, 48)4 (144, 240)5 (144, 240)6 (240, 240)7 (240, 240)8 (240, 432)9 (240, 432)10 (240, 432)11 (240, 432)12 (432, 432)13 (432, 432)14 (432, 432)15 (432, 432)16 (432, 816)17 (432, 816)18 (432, 816)19 (432, 816)In Python 3.5 normal dictionaries of size 4-5 and 8-11 are more compact than corresponding key-sharing dictionaries. And note that key-sharing dictionaries of size 0-3 consume more memory on 3.6 that on 3.5. I think we should use more thrifty strategy for allocating key-sharing dictionaries.
msg279234 -(view)Author: Inada Naoki (methane)*(Python committer)Date: 2016-10-23 02:25
> 0 (128, 60)> 1 (128, 60)> 2 (128, 60)> 3 (128, 60)> 4 (128, 60)> 5 (128, 60)Minimum dict keysize is 8, and it's capacity is 5.> 6 (196, 196)Dict is resized. And since __dict__.update() caused the resizing,both are normal (combined) dict.> 7 (196, 196)> 8 (196, 344)dict.update() cause faster increasing keysize.Adding to dict cause resizing when number of items reaches 2/3 of keysize.On the other hand, dict.update() targets 1/2 of keysize is filled.In this case, keysize is 16 and 16 * 2 // 3 = 10. Since 8 < 10, addingitem to keydoesn't increase it's size. Since 8 >= (16 / 2), dict.update() createsdict having keysize == 32.(note: My patch inhttp://bugs.python.org/issue28147 changes >= to >.So keysize == 16 whennumber of items == 8).But, while checking this, I found another bug in dict_merge.        /* Do one big resize at the start, rather than         * incrementally resizing as we insert new items.  Expect         * that there will be no (or few) overlapping keys.         */        if (mp->ma_keys->dk_usable * 3 < other->ma_used * 2)            if (dictresize(mp, (mp->ma_used + other->ma_used)*2) != 0)               return -1;dk_usable means "how many new items can be inserted without resizing".So this if statement should be:        if (mp->ma_keys->dk_usable < other->ma_used)
msg279235 -(view)Author: Inada Naoki (methane)*(Python committer)Date: 2016-10-23 02:41
And I feel current target size of dict_merge is bit larger.When inserting new item:* ma_used = dk_size*2 / 3 when right before increasing keys* ma_used = dk_size    / 3 when right after increasing keysOn the other hand, current dict_merge creates:* ma_used = dk_size / 2 when all keys in two dict is distinct* ma_used = dk_size / 4 when all keys in two dict is sameIf changing it to dictresize(mp, (mp->ma_used + other->ma_used)*3/2),* ma_used = dk_size*2 / 3 when all keys in two dict is distinct* ma_used = dk_size    / 3 when all keys in two dict is sameI think this is more consistent.
msg279240 -(view)Author: Serhiy Storchaka (serhiy.storchaka)*(Python committer)Date: 2016-10-23 07:15
> Dict is resized. And since __dict__.update() caused the resizing,both are normal (combined) dict.Ah, my bad, I missed this point. The original issue now disappears. Thanks Naoki.But dict.update (and maybe inserting) can use more economical allocation strategy.
msg279447 -(view)Author: Inada Naoki (methane)*(Python committer)Date: 2016-10-25 19:05
script:import sysfor i in range(25):    a = {}    b = {j: j for j in range(i)}    a.update(b)    print(i, sys.getsizeof(a))before:0 2561 2562 2563 2564 2565 2566 3847 3848 6649 66410 66411 66412 66413 66414 66415 66416 120017 120018 120019 120020 120021 120022 120023 120024 1200patched:0 2561 2562 2563 2564 2565 2566 3847 3848 3849 38410 38411 66412 66413 66414 66415 66416 66417 66418 66419 66420 66421 66422 120023 120024 1200
msg279458 -(view)Author: Serhiy Storchaka (serhiy.storchaka)*(Python committer)Date: 2016-10-25 21:10
LGTM.Here is a script that produces more compact output. The first column is a size after which the dict is resized.import sysp = 10000b = {}for i in range(10000):    a = {}    b[i] = i    a.update(b)    s = sys.getsizeof(a)    if s > p:        print(i, p)    p = sUnpatched:5 1287 19615 34431 62863 1208127 2612255 5176511 102921023 205362047 410124095 819768191 163892Patched:5 12810 19621 34442 62885 1208170 2612341 5176682 102921365 205362730 410125461 81976But I suggest instead the condition    mp->ma_keys->dk_usable < other->ma_useduse the condition    mp->ma_used + mp->ma_keys->dk_usable < other->ma_usedIf there are overlapping keys this can allow to avoid resizing. In worst keys one resizing will be happened in dictinsert().Yes one estimation is:    USABLE_FRACTION(2 * mp->ma_keys->dk_size) < mp->ma_used + other->ma_usedDict size is at least doubled after resizing. No need to make preliminary resizing if the final result is the same. The benefit is that if there are many overlapping keys the final size can be ESTIMATE_SIZE(other->ma_used) instead of ESTIMATE_SIZE(mp->ma_used + other->ma_used).All this conditions can be combined (because they have different computational cost):    mp->ma_keys->dk_usable < other->ma_used &&    mp->ma_used + mp->ma_keys->dk_usable < other->ma_used &&    USABLE_FRACTION(2 * mp->ma_keys->dk_size) < mp->ma_used + other->ma_used
msg279480 -(view)Author: Inada Naoki (methane)*(Python committer)Date: 2016-10-26 01:47
I feel that accept one resize while merging is better.How about this?        /* Do one big resize at the start, rather than incrementally         * resizing.  At most one resize happen while merging.         */        if (USABLE_FRACTION(mp->ma_keys->dk_size) < other->ma_used) {            assert(mp->ma_used < other->ma_used);            if (dictresize(mp, ESTIMATE_SIZE(other->ma_used))) {               return -1;            }        }
msg279488 -(view)Author: Raymond Hettinger (rhettinger)*(Python committer)Date: 2016-10-26 04:17
FWIW, I prefer the existing code be left as is.   We've already greatly compacted the dictionaries.  There is no need to be ultra-aggressive in shaving off every little corner.  There is some advantage to having the dict be more sparse (fewer collisions, quicker insertion time for the update, quicker lookups etc).
msg279497 -(view)Author: Inada Naoki (methane)*(Python committer)Date: 2016-10-26 10:08
OK, I won't change it to allow additional resize while merging, after pre-resize.But current code has two problem:* Pre-resize happen even when resizing is not necessary. (ex. two dict has same keys).* Pre-resize allocates too much memory which doesn't make sense.Next patch (28509-smaller-update2.patch) seems better because:* When pre-resize doesn't happen, at most one resize while merging.* When pre-resize happens, no resize happens while merging.* Doesn't make surprisingly sparse dict when two dicts have same keys.PoC code:import sysb = {}for i in range(16):    b[i] = i    a = b.copy()    a.update(b)  # No need to resize    print(i, sys.getsizeof(a), sys.getsizeof(b))Current:0 256 2561 256 2562 256 2563 664 2564 664 2565 384 384  # !!!6 664 3847 664 3848 664 3849 664 38410 664 66411 664 66412 1200 66413 1200 66414 1200 66415 1200 664With second patch:0 256 2561 256 2562 256 2563 256 2564 256 2565 384 3846 384 3847 384 3848 384 3849 384 38410 664 66411 664 66412 664 66413 664 66414 664 66415 664 664
msg279528 -(view)Author: Serhiy Storchaka (serhiy.storchaka)*(Python committer)Date: 2016-10-27 07:16
28509-smaller-update2.patch LGTM.Your idea inmsg279480 looks worth, but needs benchmarking different corner cases to check that it doesn't cause to a regression. I think we will have enough time for this at 3.7 developing stage.
msg279534 -(view)Author: Roundup Robot (python-dev)(Python triager)Date: 2016-10-27 10:30
New changeset8c2615decd2e by INADA Naoki in branch '3.6':Issue#28509: dict.update() no longer allocate unnecessary large memoryhttps://hg.python.org/cpython/rev/8c2615decd2eNew changesetdeb3e5857d8c by INADA Naoki in branch 'default':Issue#28509: dict.update() no longer allocate unnecessary large memoryhttps://hg.python.org/cpython/rev/deb3e5857d8c
History
DateUserActionArgs
2022-04-11 14:58:38adminsetgithub: 72695
2017-10-23 11:18:07serhiy.storchakasetpull_requests: -pull_request839
2017-03-31 16:36:08dstufftsetpull_requests: +pull_request839
2016-10-27 10:31:04methanesetstatus: open -> closed
resolution: fixed
stage: commit review -> resolved
2016-10-27 10:30:21python-devsetnosy: +python-dev
messages: +msg279534
2016-10-27 07:16:06serhiy.storchakasetassignee:methane
messages: +msg279528
stage: patch review -> commit review
2016-10-26 10:08:44methanesetfiles: +28509-smaller-update2.patch

messages: +msg279497
2016-10-26 04:17:40rhettingersetmessages: +msg279488
2016-10-26 01:47:13methanesetmessages: +msg279480
2016-10-25 21:10:17serhiy.storchakasetmessages: +msg279458
2016-10-25 19:17:57serhiy.storchakasetstage: needs patch -> patch review
2016-10-25 19:05:25methanesetfiles: +28509-smaller-update.patch
keywords: +patch
messages: +msg279447
2016-10-23 07:15:54serhiy.storchakasettitle: Key-sharing dictionaries can inrease the memory consumption -> dict.update allocates too much
stage: needs patch
messages: +msg279240
versions: + Python 3.6, Python 3.7
2016-10-23 05:17:40xiang.zhangsetnosy: +xiang.zhang
2016-10-23 02:41:31methanesetmessages: +msg279235
2016-10-23 02:25:55methanesetmessages: +msg279234
2016-10-22 19:13:37serhiy.storchakacreate
Supported byThe Python Software Foundation,
Powered byRoundup
Copyright © 1990-2022,Python Software Foundation
Legal Statements

[8]ページ先頭

©2009-2026 Movatter.jp