GH-98686: Quicken everything #98687


Merged
brandtbucher merged 12 commits into python:main from brandtbucher:quicken-everything
Nov 2, 2022

Conversation

@brandtbucher (Member) commented Oct 25, 2022 (edited by bedevere-bot)

@brandtbucher added the performance (Performance or resource usage) and interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Oct 25, 2022
@brandtbucher self-assigned this Oct 25, 2022
@brandtbucher (Member, Author) commented:

1% faster:

Slower (10):

- xml_etree_iterparse: 102 ms +- 2 ms -> 106 ms +- 2 ms: 1.04x slower
- python_startup_no_site: 6.09 ms +- 0.01 ms -> 6.29 ms +- 0.01 ms: 1.03x slower
- spectral_norm: 94.8 ms +- 3.9 ms -> 96.7 ms +- 2.0 ms: 1.02x slower
- python_startup: 8.47 ms +- 0.01 ms -> 8.63 ms +- 0.01 ms: 1.02x slower
- chameleon: 6.54 ms +- 0.06 ms -> 6.63 ms +- 0.07 ms: 1.01x slower
- nqueens: 79.9 ms +- 1.1 ms -> 81.0 ms +- 0.9 ms: 1.01x slower
- pickle: 10.3 us +- 0.1 us -> 10.4 us +- 0.1 us: 1.01x slower
- json: 4.66 ms +- 0.10 ms -> 4.71 ms +- 0.10 ms: 1.01x slower
- regex_dna: 204 ms +- 1 ms -> 205 ms +- 1 ms: 1.01x slower
- regex_compile: 129 ms +- 1 ms -> 130 ms +- 0 ms: 1.00x slower

Faster (42):

- richards: 45.2 ms +- 0.7 ms -> 42.2 ms +- 0.8 ms: 1.07x faster
- regex_v8: 22.5 ms +- 0.1 ms -> 21.3 ms +- 0.1 ms: 1.06x faster
- pidigits: 198 ms +- 0 ms -> 189 ms +- 0 ms: 1.05x faster
- go: 141 ms +- 2 ms -> 135 ms +- 1 ms: 1.05x faster
- scimark_sor: 108 ms +- 2 ms -> 104 ms +- 1 ms: 1.04x faster
- mdp: 2.55 sec +- 0.01 sec -> 2.48 sec +- 0.01 sec: 1.03x faster
- pyflate: 410 ms +- 3 ms -> 398 ms +- 3 ms: 1.03x faster
- deepcopy_reduce: 2.95 us +- 0.05 us -> 2.86 us +- 0.04 us: 1.03x faster
- scimark_lu: 111 ms +- 2 ms -> 108 ms +- 2 ms: 1.03x faster
- pickle_list: 4.21 us +- 0.05 us -> 4.11 us +- 0.04 us: 1.03x faster
- deltablue: 3.40 ms +- 0.04 ms -> 3.32 ms +- 0.03 ms: 1.02x faster
- scimark_fft: 322 ms +- 2 ms -> 315 ms +- 4 ms: 1.02x faster
- genshi_text: 21.6 ms +- 0.2 ms -> 21.2 ms +- 0.2 ms: 1.02x faster
- coroutines: 25.0 ms +- 0.3 ms -> 24.5 ms +- 0.3 ms: 1.02x faster
- xml_etree_process: 54.2 ms +- 0.8 ms -> 53.2 ms +- 0.6 ms: 1.02x faster
- xml_etree_generate: 77.0 ms +- 1.2 ms -> 75.6 ms +- 0.7 ms: 1.02x faster
- deepcopy: 334 us +- 6 us -> 328 us +- 3 us: 1.02x faster
- django_template: 33.1 ms +- 0.3 ms -> 32.5 ms +- 0.3 ms: 1.02x faster
- tornado_http: 94.2 ms +- 1.4 ms -> 92.6 ms +- 1.4 ms: 1.02x faster
- pycparser: 1.10 sec +- 0.02 sec -> 1.08 sec +- 0.02 sec: 1.02x faster
- thrift: 756 us +- 14 us -> 745 us +- 9 us: 1.02x faster
- pickle_dict: 31.3 us +- 0.2 us -> 30.9 us +- 0.2 us: 1.01x faster
- fannkuch: 381 ms +- 4 ms -> 376 ms +- 6 ms: 1.01x faster
- sqlalchemy_declarative: 134 ms +- 3 ms -> 133 ms +- 3 ms: 1.01x faster
- async_tree_cpu_io_mixed: 743 ms +- 20 ms -> 734 ms +- 16 ms: 1.01x faster
- pickle_pure_python: 288 us +- 2 us -> 284 us +- 4 us: 1.01x faster
- logging_simple: 5.85 us +- 0.09 us -> 5.79 us +- 0.09 us: 1.01x faster
- logging_format: 6.49 us +- 0.07 us -> 6.42 us +- 0.08 us: 1.01x faster
- sympy_str: 283 ms +- 3 ms -> 280 ms +- 3 ms: 1.01x faster
- sqlglot_normalize: 105 ms +- 1 ms -> 104 ms +- 1 ms: 1.01x faster
- 2to3: 251 ms +- 1 ms -> 248 ms +- 1 ms: 1.01x faster
- mako: 9.76 ms +- 0.05 ms -> 9.67 ms +- 0.04 ms: 1.01x faster
- pathlib: 17.7 ms +- 0.3 ms -> 17.5 ms +- 0.2 ms: 1.01x faster
- gunicorn: 1.08 ms +- 0.01 ms -> 1.07 ms +- 0.01 ms: 1.01x faster
- sqlite_synth: 2.62 us +- 0.06 us -> 2.60 us +- 0.04 us: 1.01x faster
- sympy_expand: 457 ms +- 3 ms -> 454 ms +- 7 ms: 1.01x faster
- regex_effbot: 3.52 ms +- 0.01 ms -> 3.50 ms +- 0.01 ms: 1.01x faster
- dulwich_log: 61.2 ms +- 0.3 ms -> 60.9 ms +- 0.3 ms: 1.01x faster
- sqlglot_optimize: 51.5 ms +- 0.4 ms -> 51.2 ms +- 0.3 ms: 1.01x faster
- nbody: 95.8 ms +- 0.8 ms -> 95.4 ms +- 0.9 ms: 1.00x faster
- generators: 72.7 ms +- 0.2 ms -> 72.4 ms +- 0.3 ms: 1.00x faster
- sympy_integrate: 20.4 ms +- 0.1 ms -> 20.4 ms +- 0.1 ms: 1.00x faster

Benchmark hidden because not significant (33): aiohttp, async_tree_none, async_tree_io, async_tree_memoization, chaos, coverage, crypto_pyaes, deepcopy_memo, float, genshi_xml, hexiom, html5lib, json_dumps, json_loads, logging_silent, meteor_contest, mypy, pprint_safe_repr, pprint_pformat, pylint, raytrace, scimark_monte_carlo, scimark_sparse_mat_mult, sqlalchemy_imperative, sqlglot_parse, sqlglot_transpile, sympy_sum, telco, unpack_sequence, unpickle, unpickle_list, unpickle_pure_python, xml_etree_parse

Geometric mean: 1.01x faster

@markshannon (Member) commented:

I notice that startup (both with and without -S) is slower. Presumably that is the overhead of calling _PyCode_Quicken() all the time.
Presumably all that overhead will go away when we move quickening to the bytecode compiler.

@markshannon (Member) left a comment:

The original design made a distinction between the initial adaptive counter (which was zero as there was already a delay for quickening), and the first value for the backoff counter (15).

This PR loses that distinction.

Maybe we don't need that distinction, since we already have to hit 53 misses before falling back to the adaptive form. Maybe we should retry specialization almost immediately and rely on the exponential backoff in case of repeated failures?

How about setting the initial value to 3 (or 5, or 7) to avoid specializing too aggressively at startup?
We can reuse the same value, as it makes little difference once misses are taken into account (53 + 7 is near enough to 53 + 15).

One final point (not a fault of this PR, but relevant): in adaptive_counter_start(), the counter is set to (2**backoff - 1); there is no reason for this.
We could set the counter to 3 or 5 and the backoff to 4.

@brandtbucher thoughts?
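The counter/backoff interplay being discussed can be sketched in a few lines of Python. This is a rough model of the scheme, not the actual CPython C code; MAX_BACKOFF and the helper names here are illustrative assumptions:

```python
# Rough Python model (not the CPython C implementation) of the adaptive
# counter scheme under discussion. MAX_BACKOFF and the function names are
# illustrative assumptions, not CPython's actual values.

MAX_BACKOFF = 6  # assumed cap on the backoff exponent


def counter_start(backoff):
    # The point being raised: the counter is set to 2**backoff - 1,
    # which couples the retry delay to the backoff exponent.
    return 2 ** backoff - 1


def after_failed_specialization(backoff):
    """Return (new_counter, new_backoff) after a failed specialization."""
    backoff = min(backoff + 1, MAX_BACKOFF)
    return counter_start(backoff), backoff


# Starting from backoff=1, repeated failures wait 1, 3, 7, 15, ... runs
# before the next specialization attempt.
delays = []
backoff = 1
counter = counter_start(backoff)
for _ in range(4):
    delays.append(counter)
    counter, backoff = after_failed_specialization(backoff)
print(delays)  # -> [1, 3, 7, 15]
```

Decoupling the two values, as suggested above, would mean `counter_start` no longer has to be an exact power of two minus one.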

/* The initial counter value is 31 == 2**ADAPTIVE_BACKOFF_START - 1 */
#define ADAPTIVE_BACKOFF_START 5
/* The initial counter value is 1 == 2**ADAPTIVE_BACKOFF_START - 1 */
#define ADAPTIVE_BACKOFF_START 1
@markshannon (Member):

This is for backoff, after failure. Do you want to change this, or just lower the initial counter value?

uint8_t adaptive_opcode = _PyOpcode_Adaptive[opcode];
if (adaptive_opcode) {
    _Py_SET_OPCODE(instructions[i], adaptive_opcode);
    // Make sure the adaptive counter is zero:
    assert(instructions[i + 1] == 0);
    instructions[i + 1] = adaptive_counter_start();
@markshannon (Member):

I don't think this is what you want. adaptive_counter_start() is supposed to be used after resetting to the adaptive form after repeated misses.

Maybe rename adaptive_counter_start to adaptive_initial_backoff_value() to avoid future confusion?

What values did you try for the initial counter value?
1 seems low, as we might be spending too much effort specializing startup code.

@bedevere-bot

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

@brandtbucher (Member, Author) commented:

Recapping our discussion on Monday:

An initial value of 1 seems to work best (1% faster) out of all the values I tried (0, 1, 3, 7, 15, 31, 63, 127, 255, 511, 1023, 2047, and 4095). With a value of 1, we specialize the second time each adaptive instruction is run. This makes sense: running twice is probably a much better indicator of "hotness" than running once, but additional warmup delays only prevent specialization. Most types stabilize by the second loop iteration, too.

It's not strictly necessary to use the same value for the initial warmup delay and the initial backoff value, but 1 seems to work well for both. We can always tweak/rename one of them later if we want.

markshannon reacted with thumbs up emoji
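The effect of the initial value described in the recap above can be seen with a tiny model. This is a sketch of the counter scheme under stated assumptions, not the exact C implementation: an adaptive instruction decrements its counter on each execution and only attempts to specialize once the counter reaches zero.

```python
# Tiny model of warmup: an adaptive instruction decrements its counter each
# time it executes and tries to specialize when the counter reaches zero.
# This is an illustrative sketch, not the exact CPython implementation.

def runs_until_specialization(initial_counter):
    counter = initial_counter
    runs = 0
    while True:
        runs += 1          # the instruction executes once
        if counter == 0:   # counter exhausted: attempt specialization
            return runs
        counter -= 1

# With an initial counter of 1, specialization happens on the *second* run.
print(runs_until_specialization(1))   # -> 2
print(runs_until_specialization(0))   # -> 1 (specialize immediately)
print(runs_until_specialization(31))  # -> 32
```

Under this model, an initial counter of 0 would attempt specialization on the very first execution, which is why code that runs only once (e.g. at startup) makes 0 too aggressive a choice.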

@brandtbucher added the 🔨 test-with-buildbots (Test PR w/ buildbots; report in status section) label Nov 1, 2022
@bedevere-bot

🤖 New build scheduled with the buildbot fleet by @brandtbucher for commit 17292a3 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

@bedevere-bot removed the 🔨 test-with-buildbots (Test PR w/ buildbots; report in status section) label Nov 1, 2022
@markshannon self-requested a review November 2, 2022 15:03
Reviewers

@markshannon approved these changes

Assignees

@brandtbucher

Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs), performance (Performance or resource usage)

3 participants
@brandtbucher, @markshannon, @bedevere-bot
