tokenize.generate_tokens() performance regression in 3.12 #119118

Closed

Assignees

pablogsall, lysnikolaou

Labels

performance (Performance or resource usage), type-bug (An unexpected behavior, bug, or error)

Description

@devdanzin

Bug report

Bug description:

There seems to be a significant performance regression in tokenize.generate_tokens() between 3.11 and 3.12 when tokenizing a (very) large dict on a single line. I searched the existing issues but couldn't find anything about this.

To reproduce, rename the file largedict.py.txt to largedict.py in the same directory as the script below, then run the script. That file comes from coveragepy/coveragepy#1785.
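If the largedict.py.txt attachment isn't at hand, a file of similar shape can be generated synthetically. This is a hypothetical stand-in, assuming the attachment's d attribute is a string holding one huge single-line dict literal as source code (consistent with the token output below), not the original data:

# Hypothetical stand-in for largedict.py.txt (assumption: the module's
# attribute d is a string containing a very large dict literal as source,
# all on a single line).
entries = ", ".join(f"'key_{i}': 'value_{i}'" for i in range(100_000))
with open("largedict.py", "w", encoding="utf-8") as f:
    f.write("d = '''a_large_dict_literal = {" + entries + "}'''\n")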

import io, time, sys, tokenize
import largedict

text = largedict.d
readline = io.StringIO(text).readline
glob_start = start = time.time()
print(f"{sys.implementation.name} {sys.platform} {sys.version}")
for i, (ttype, ttext, (sline, scol), (_, ecol), _) in enumerate(tokenize.generate_tokens(readline)):
    if i % 500 == 0:
        # Report the time taken for each batch of 500 tokens.
        print(i, ttype, ttext, sline, scol, time.time() - start)
        start = time.time()
    if i % 5000 == 0:
        print(time.time() - glob_start)
print(f"Time taken: {time.time() - glob_start}")

For Python 3.12, this results in:

cpython linux 3.12.3 (main, May 17 2024, 07:19:22) [GCC 11.4.0]
0 1 a_large_dict_literal 1 0 0.04641866683959961
0.046633005142211914
500 3 ':tombol_a_(golongan_darah):' 1 2675 9.689745903015137
1000 3 ':flagge_anguilla:' 1 5261 9.767053604125977
1500 3 ':флаг_Армения:' 1 7879 9.258271932601929
[...]

For Python 3.11, this results in:

cpython linux 3.11.0rc1 (main, Aug 12 2022, 10:02:14) [GCC 11.2.0]
0 1 a_large_dict_literal 1 0 0.013637304306030273
0.013663768768310547
500 3 ':tombol_a_(golongan_darah):' 1 2675 0.002939462661743164
1000 3 ':flagge_anguilla:' 1 5261 0.0028715133666992188
1500 3 ':флаг_Армения:' 1 7879 0.002806425094604492
[...]
352500 3 'pt' 1 2589077 0.003370046615600586
Time taken: 2.1244866847991943

That is, in Python 3.12 each batch of 500 tokens takes over 9 seconds to process, while in Python 3.11 all 352,500 tokens are processed in a bit over 2 seconds.

I can reproduce this on Linux (WSL) and Windows. It also seems to affect 3.13.
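A quick way to probe whether the cost is tied to line length rather than token count is to tokenize the same entries packed onto one line versus spread across many lines. This is a diagnostic sketch, not part of the original report; the entry count is kept small so affected builds finish in a reasonable time:

import io
import time
import tokenize

def time_tokenize(source):
    # Tokenize the whole source and return elapsed wall-clock seconds.
    readline = io.StringIO(source).readline
    start = time.time()
    for _ in tokenize.generate_tokens(readline):
        pass
    return time.time() - start

pairs = [f"'k{i}': 'v{i}'" for i in range(5_000)]
one_line = "d = {" + ", ".join(pairs) + "}\n"         # everything on one line
many_lines = "d = {\n" + ",\n".join(pairs) + "\n}\n"  # one entry per line

print("one line:  ", time_tokenize(one_line))
print("many lines:", time_tokenize(many_lines))

If the slowdown depends on line length, as the per-500-token timings above suggest, the single-line variant should be dramatically slower on 3.12, while the two should be comparable on 3.11.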

CPython versions tested on:

3.9, 3.10, 3.11, 3.12

Operating systems tested on:

Linux, Windows

Linked PRs
