
This issue trackerhas been migrated toGitHub, and is currentlyread-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.
Created on2017-03-30 18:09 bymethane, last changed2022-04-11 14:58 byadmin. This issue is nowclosed.
| Pull Requests | |||
|---|---|---|---|
| URL | Status | Linked | Edit |
| PR 943 | merged | methane,2017-04-01 06:40 | |
| PR 945 | merged | methane,2017-04-01 08:32 | |
| Messages (12) | |||
|---|---|---|---|
| msg290868 -(view) | Author: Inada Naoki (methane)*![]() | Date: 2017-03-30 18:09 | |
(original thread ishttps://mail.python.org/pipermail/python-list/2017-March/720391.html)https://github.com/python/cpython/commit/4897300276d870f99459c82b937f0ac22450f0b6this commit doubles sizeof set object created by set_merge().It is used by constructor of set and frozenset.$ /usr/bin/python3Python 3.5.2+ (default, Sep 22 2016, 12:18:14)[GCC 6.2.0 20160927] on linuxType "help", "copyright", "credits" or "license" for more information.>>> import sys>>> s = set(range(10))>>> sys.getsizeof(frozenset(s))736$ python3Python 3.6.0 (default, Dec 30 2016, 20:49:54)[GCC 6.2.0 20161005] on linuxType "help", "copyright", "credits" or "license" for more information.>>> import sys>>> s = set(range(10))>>> sys.getsizeof(frozenset(s))1248 | |||
| msg290888 -(view) | Author: Inada Naoki (methane)*![]() | Date: 2017-03-31 03:16 | |
https://gist.github.com/methane/8faf12621cdb2166019bbcee65987e99This patch fixes the regression.But I think it is still larger than idiomatic. See this code:code: minused *= 2; /* Find the smallest table size > minused. */ /* XXX speed-up with intrinsics */ for (newsize = PySet_MINSIZE; newsize <= minused && newsize > 0; newsize <<= 1) ;When original minused is X, newsize will be about 2X ~ 4X.For set.add(), preserving extra space for further add() make sense.But for set_merge(), intention is avoiding repeated resize while merging.There may be not "further add()", especially for `frozenset(s)`.So 30% ~ 60% seems better than 25% ~ 50%.How do you think, Raymond? | |||
| msg290906 -(view) | Author: Raymond Hettinger (rhettinger)*![]() | Date: 2017-03-31 11:25 | |
I'll look at this over the next week or two. I don't really like the proposed patch at all but will carefully think through the speed/space trade-offs. | |||
| msg290908 -(view) | Author: Raymond Hettinger (rhettinger)*![]() | Date: 2017-03-31 12:18 | |
Hmm, I wonder why I'm not seeing the same sizes you are seeing.$ cat setsize.pyfrom sys import getsizeofprint( [getsizeof(frozenset(range(n))) for n in range(20)] )$ python3.4 setsize.py[224, 224, 224, 224, 224, 224, 736, 736, 736, 736, 736, 736, 736, 736, 736, 736, 736, 736, 736, 736]$ python3.5 setsize.py[224, 224, 224, 224, 224, 224, 736, 736, 736, 736, 736, 736, 736, 736, 736, 736, 736, 736, 736, 736]$ python3.6 setsize.py[224, 224, 224, 224, 224, 736, 736, 736, 736, 736, 736, 736, 736, 736, 736, 736, 736, 736, 736, 736]$ python3.4 --versionPython 3.4.4$ python3.5 --versionPython 3.5.3$ python3.6 --versionPython 3.6.1 | |||
| msg290909 -(view) | Author: Inada Naoki (methane)*![]() | Date: 2017-03-31 12:22 | |
See set_update_internal().https://github.com/python/cpython/blob/master/Objects/setobject.c#L969-L1016This happens only when iterable is set or dict.>>> import sys>>> sys.getsizeof(set(range(10)))736>>> sys.getsizeof(set(set(range(10))))1248>>> sys.getsizeof(set(dict.fromkeys(range(10))))1248 | |||
| msg290911 -(view) | Author: Serhiy Storchaka (serhiy.storchaka)*![]() | Date: 2017-03-31 12:50 | |
$ ./python -c 'N = 6000; from sys import getsizeof; s = [getsizeof(frozenset(range(n))) for n in range(N)]; print( [(n, s[n]) for n in range(N) if not n or s[n] != s[n-1]] )'3.5: [(0, 112), (6, 368), (22, 1136), (86, 4208), (342, 16496), (1366, 65648), (5462, 262256)]3.6: [(0, 112), (5, 368), (21, 1136), (85, 4208), (341, 16496), (1365, 65648), (5461, 262256)]3.7: [(0, 112), (5, 368), (19, 1136), (77, 4208), (307, 16496), (1229, 65648), (4915, 262256)]$ ./python -c 'N = 6000; from sys import getsizeof; s = [getsizeof(frozenset(set(range(n)))) for n in range(N)]; print( [(n, s[n]) for n in range(N) if not n or s[n] != s[n-1]] )'3.5: [(0, 112), (6, 240), (8, 368), (16, 624), (32, 1136), (64, 2160), (128, 4208), (256, 8304), (512, 16496), (1024, 32880), (2048, 65648), (4096, 131184)]3.6: [(0, 112), (5, 368), (8, 624), (16, 1136), (32, 2160), (64, 4208), (128, 8304), (256, 16496), (512, 32880), (1024, 65648), (2048, 131184), (4096, 262256)]3.7: [(0, 112), (5, 368), (8, 624), (16, 1136), (32, 2160), (64, 4208), (128, 8304), (256, 16496), (512, 32880), (1024, 65648), (2048, 131184), (4096, 262256)]frozenset(range(n)) is slightly larger in 3.7 than in 3.6. It is 4 times larger for about 10% of sizes.frozenset(set(range(n))) is 2 times larger in 3.6 than in 3.5 for all sizes >= 16. | |||
| msg290912 -(view) | Author: Inada Naoki (methane)*![]() | Date: 2017-03-31 12:52 | |
> frozenset(range(n)) is slightly larger in 3.7 than in 3.6. It is 4 times larger for about 10% of sizes.This is intensional:https://github.com/python/cpython/commit/5cd87a8d61246b0a6233bfb8503d4718b693cef0load factor is reduced from 66% to 60%. (-10%) | |||
| msg290915 -(view) | Author: Raymond Hettinger (rhettinger)*![]() | Date: 2017-03-31 15:24 | |
I think the best thing to do is to undo the refactoring inhttps://github.com/python/cpython/commit/4897300276d870f99459c82b937f0ac22450f0b6 . It is was intended be neutral but did affect set_update_internal() for small sets. | |||
| msg290928 -(view) | Author: Inada Naoki (methane)*![]() | Date: 2017-03-31 17:19 | |
I agree. Before thinking about rebalance between size and speed,resolving regression is important. | |||
| msg290934 -(view) | Author: Raymond Hettinger (rhettinger)*![]() | Date: 2017-03-31 18:01 | |
Do you want to prepare a PR for me? I not yet set-up with the ways of Github. Please limit the PR to just unwinding the refactoring in the simplest possible way..If in the future you want to chat about speed/space trade-offs, we can do that offline. I've spent years thinking about this, interacting with users, discussing with other devs, speaking on the topic, and working through use cases for sets. The original reasons for the choices made largely are still true today. I would be happy to walk you through the history (the tracker isn't a good place to do this, IRC would serve us better). | |||
| msg290970 -(view) | Author: Inada Naoki (methane)*![]() | Date: 2017-04-01 08:20 | |
New changesete82cf8675bacd7a03de508ed11865fc2701dcef5 by INADA Naoki in branch 'master':bpo-29949: Fix set memory usage regression (GH-943)https://github.com/python/cpython/commit/e82cf8675bacd7a03de508ed11865fc2701dcef5 | |||
| msg290984 -(view) | Author: Inada Naoki (methane)*![]() | Date: 2017-04-01 14:29 | |
New changesetefde51ad54c58353f25ff80c8d30dbee82ef33a3 by INADA Naoki in branch '3.6':bpo-29949: Fix set memory usage regression (GH-945)https://github.com/python/cpython/commit/efde51ad54c58353f25ff80c8d30dbee82ef33a3 | |||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:58:44 | admin | set | github: 74135 |
| 2017-04-01 14:29:33 | methane | set | messages: +msg290984 |
| 2017-04-01 11:53:48 | methane | set | status: open -> closed resolution: fixed stage: resolved |
| 2017-04-01 08:32:42 | methane | set | pull_requests: +pull_request1127 |
| 2017-04-01 08:20:27 | methane | set | messages: +msg290970 |
| 2017-04-01 06:40:30 | methane | set | pull_requests: +pull_request1125 |
| 2017-03-31 23:12:12 | cameron | set | nosy: +cameron |
| 2017-03-31 18:01:38 | rhettinger | set | messages: +msg290934 |
| 2017-03-31 17:19:50 | methane | set | messages: +msg290928 |
| 2017-03-31 15:24:31 | rhettinger | set | messages: +msg290915 |
| 2017-03-31 12:52:51 | methane | set | messages: +msg290912 |
| 2017-03-31 12:50:37 | serhiy.storchaka | set | messages: +msg290911 |
| 2017-03-31 12:22:41 | methane | set | messages: +msg290909 |
| 2017-03-31 12:18:05 | rhettinger | set | messages: +msg290908 |
| 2017-03-31 11:25:48 | rhettinger | set | messages: +msg290906 |
| 2017-03-31 11:11:54 | rhettinger | set | assignee:rhettinger |
| 2017-03-31 03:16:56 | methane | set | messages: +msg290888 |
| 2017-03-30 18:25:51 | jgosmann | set | nosy: +jgosmann |
| 2017-03-30 18:15:04 | serhiy.storchaka | set | nosy: +serhiy.storchaka |
| 2017-03-30 18:09:13 | methane | create | |