Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Improve performance - skip manifest read if no components/targets requested#2917

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Draft
JoshMcguigan wants to merge1 commit intorust-lang:master
base:master
Choose a base branch
Loading
fromJoshMcguigan:skip-manifest-read

Conversation

JoshMcguigan
Copy link

@JoshMcguiganJoshMcguigan commentedDec 4, 2021
edited
Loading

I was looking into why rustup had a performance penalty over calling cargo directly in simple cases, and found most of the time was spent parsing themultirust-channel-manifest.toml file. Looking more closely, it seemed parsing that file was not really required in many cases. This PR tweaks the ordering of the code to skip parsing this file when it isn't needed.

I do not intend to make any behavioral change with this, other than potentially if some code paths handle an error while reading this file (and now they won't see that error in cases where we skip reading the file entirely). I believe this is the reason theheal_damaged_toolchain test is now failing. I need to look into this more closely (this PR is marked as draft until I figure this out), but I'd appreciate any insight into whether that test was intended to cover this behavior.

The performance changes were tested in a brand new hello world cargo project (cargo new hello), after building rustup in release mode and installing it into arustup/home directory as described in the contributing guidelines. The before tests were done on the current master branch, because it seems to already be much faster than the most recent official release.

Before this change a repeated call tocargo build (through the rustup proxy) would take ~90ms. After this change it takes ~35ms. Running cargo directly takes ~22ms. So this changereduces the rustup overhead in this case from 68ms to 13ms (a reduction of 80%).

michalhosna reacted with thumbs up emojidhztao68 reacted with eyes emoji
@JoshMcguigan
Copy link
Author

It looks like theheal_damaged_toolchain test is failing because it relies onrustup show active-toolchain having previously read/parsed the manifest, which is specifically what this PR removes (since reading/parsing the manifest seems to not be strictly required for that operation, and it takes meaningful time).

Is my understanding correct here? And is it reasonable to a command likerustup show active-toolchain to NOT heal the damaged toolchain manifest file (since it doesn't even need to look at that file)? If so, is there a command which would be more appropriate to run as part of this test to demonstrate the healing?

@kinnison
Copy link
Contributor

Personally I think the request to show the active toolchain (which implies checking for components in therust-toolchain.toml being available) should be able to succeed at healing things.

I remain convinced that the correct way to ameliorate the incredible slowness of parsing the TOML is to, instead, keep a cached faster-to-parse copy alongside it, e.g. BINCODE.

@dralley
Copy link
Contributor

dralley commentedJan 12, 2022
edited
Loading

Has any profiling been done on the TOML parsing itself? Just glancing through the file, the parsing code doesn't seem especially efficient, it could probably benefit from a little focused attention.

edit: I just created one and the main thing that stands is tons and tons of individual allocations and drops

@JoshMcguigan
Copy link
Author

JoshMcguigan commentedJan 12, 2022
edited
Loading

I created a small one.

use std::fs::read_to_string;fnmain(){let s =read_to_string("/home/josh/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/multirust-channel-manifest.toml").unwrap();for _in0..10{        s.parse::<toml::Value>().unwrap();}}

This runs in ~450ms on my machine. I found I can improve this by ~100ms by bypassing the\r\n windows style newline handling insrc/tokens.rs, which I actually wonder if that is where you are seeing all the allocations. However I'm not sure how we could get this improvement while also handling windows style newlines (short of replacing all\r\n with\n before parsing, but there is a memory use cost there).

edit: I just created one and the main thing that stands is tons and tons of individual allocations and drops

I'd like to poke around at this. How did you gather this data?

@dralley
Copy link
Contributor

dralley commentedJan 12, 2022
edited
Loading

I'd like to poke around at this. How did you gather this data?

If you pass the--reverse flag it groups together all of the tiny calls at the bottom of the call stack instead of the ones at the top, if you do it seems that about 25% of the total time is spent on allocs, frees and moves. Another 8% is spent computing hashes.

@JoshMcguigan
Copy link
Author

I've been using frida tracing to count malloc calls, and its about 100,000 allocations to parse the manifest file a single time (so the example above would be 10x that). I replacedVec in one place withTinyVec and cut the allocations by ~10%, but the runtime didn't really change.

I'm probably not going to continue down this path right now, but I'd be very curious to see if anyone else comes up with anything fruitful here.

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers
No reviews
Assignees
No one assigned
Labels
None yet
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

3 participants
@JoshMcguigan@kinnison@dralley

[8]ページ先頭

©2009-2025 Movatter.jp