Movatterモバイル変換


[0]ホーム

URL:


Wayback Machine
1 capture
30 Aug 2024
JulAUGSep
Previous capture30Next capture
202320242025
success
fail
COLLECTED BY
TIMESTAMPS
loading
The Wayback Machine - https://web.archive.org/web/20240830112711/https://github.com/dotnet/coreclr/pull/17350
Skip to content

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.
/coreclrPublic archive

Add Word2Vec Benchmark Harness#17350

Merged

Conversation

michellemcdaniel
Copy link

This change adds an additional scenario benchmark, the Word2Vec
benchmark. The harness pulls down Word2Vec.Net from eabdullin, applies a
patch of changes that we made to work with netcoreapp21, harness the
word training and search, and then runs the benchmark. It also updates
the timeout for running benchmarks, since the training scenario on a
100M file takes about 7 minutes locally.

4creators and arthurvb reacted with thumbs up emoji
This change adds an additional scenario benchmark, the Word2Vecbenchmark. The harness pulls down Word2Vec.Net from eabdullin, applies apatch of changes that we made to work with netcoreapp21, harness theword training and search, and then runs the benchmark. It also updatesthe timeout for running benchmarks, since the training scenario on a100M file takes about 7 minutes locally.
@michellemcdaniel
Copy link
Author

@dotnet-bot test Windows_NT x64 perf scenarios please

1 similar comment
@michellemcdaniel
Copy link
Author

@dotnet-bot test Windows_NT x64 perf scenarios please

@michellemcdaniel
Copy link
Author

@dotnet-bot test Windows_NT x64 tiered ryujit Performance Scenarios Tests please

@michellemcdaniel
Copy link
Author

@dotnet-bot test Windows_NT x64 perf scenarios please

2 similar comments
@michellemcdaniel
Copy link
Author

@dotnet-bot test Windows_NT x64 perf scenarios please

@michellemcdaniel
Copy link
Author

@dotnet-bot test Windows_NT x64 perf scenarios please

Copy link
Member

@adamsitnikadamsitnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Overall the code looks good to me.

I am not sure which approach is the best:git submodule,fork orpatch file. Thepatch file seems to be very easy to add, but I am not sure about the maintenance. For example: how do I edit any of the files included in the patch?

@jorive what is your opinion about that?

It looks like we could remove Dotnet-Install scripts to make the diff file simpler (they are not used by the benchmark)

@@ -178,6 +178,7 @@ public static void DeleteDirectory(string path, ITestOutputHelper output)
}
try
{
SetAttributesNormal(path);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Why do we need that? I am sure there is a good reason for that, could please you add a comment?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This was something I needed on my machine, where it seems like the permissions are a bit funky. Basically, it wasn't able to delete various directories, so it would crash. Changing the attributes gave the app permission to delete the directory.

int retries = 10;
for (int i = 0; i < retries; i++)
{
if (!File.Exists(sourceFileName) && File.Exists(destFileName))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

what if!File.Exists(sourceFileName) is true, butFile.Exists(destFileName) not?

I would use|| instead of&& and move it outside of the loop

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I just copied the logic from MoveDirectory here. I think the logic basically is that if the file already exists in the destination, but doesn't in the source location, that's ok, we can just return. If it doesn't exist in either location, then we want the exception to fire.

output.WriteLine($" Attempt #{i + 1} failed: {e.Message}");
}
// if something has a transient lock on the file waiting may resolve the issue
Thread.Sleep((i + 1) * 10);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

nit: the code could have been simpler withi = 1; i <= retries (feweri + 1 to get the human friendly index)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Again, this is copied from MoveDirectory, and I'd like the logic to match.

return null;
}

string GetLocalWord2VecNetRepoDirectory()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

nit: dead code

string tfm = DotNetSetup.GetTargetFrameworkMonikerForFrameworkVersion(dotNetInstall.FrameworkVersion);
string word2VecNetPublishDir = GetWord2VecNetPublishDirectory(dotNetInstall, outputDir, tfm);

var url = "http://mattmahoney.net/dc/text8.zip";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

nit: it would be nice to have some comment saying what is the content of the file.

btw how big is this file and what is the license for it? maybe we could have our own copy of it in case somebody deletes it in the future?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

The file in the zip is 100MB. It is part of, as far as I can tell, and open data compression benchmark, but basically it's just 100MB of text from Wikipedia. We potentially could add a copy to azure blob storage, but I didn't have access when I wrote up the benchmark.


namespace JitBench
{
class Word2VecBenchmark : Word2VecNetBenchmark
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

do we plan to have more Word2VecNetBenchmarks? if not then both types (Word2VecBenchmarkWord2VecNetBenchmark) can be merged to a single class

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

We may add additional ML benchmarks, but not any more word2vec. I'll update this.

private readonly Metric TrainingMetric = new Metric("Training", "ms");
private readonly Metric FirstSearchMetric = new Metric("First Search", "ms");
private readonly Metric MedianSearchMetric = new Metric("Median Search", "ms");
private readonly Metric MeanSearchMetric = new Metric("Mean Search", "ms");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

nit: dead code

private const string Word2VecNetCommitSha1Id = "6012a2b5b886926918d51b1b56387d785115f448";
private const string Word2VecNetPatch = "word2vecnet.patch";
private const string EnvironmentFileName = "Word2VecNetEnvironment.txt";
private const string StoreDirName = ".store";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

nit: dead code

@@ -67,7 +67,7 @@ public ProcessRunner(string exePath, string arguments, string replayCommand = nu
_p.StartInfo = psi;
_p.EnableRaisingEvents = false;
_loggers = new List<IProcessLogger>();
_timeout = TimeSpan.FromMinutes(10);
_timeout = TimeSpan.FromMinutes(60);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

good idea! 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This changes the default timeout for every process JitBench launches. If you just have one particular process that is long running I'd suggest changing just that one rather than giving every process a free pass to be slow:

 new ProcessRunner("bla", "blabla").WithTimeout(TimeSpan.FromMinutes(60))

@michellemcdaniel
Copy link
Author

@dotnet-bot test Windows_NT x64 perf scenarios please

@michellemcdaniel
Copy link
Author

@adamsitnik In terms of patch vs submodule, I think what we decided on was that we were going to create a perf user that will fork off the various benchmarks we want to use, and then we'd have actual changes to those benchmarks there. Until that account and those forks exist, I'm not sure we decided on a best practice, but creating the patch was the easiest thing for me, since this doesn't really belong anywhere else (ie, it's not a aspnet benchmark, so does it really belong as a submodule in JitBench itself? I'm not sure).

Copy link
Member

@adamsitnikadamsitnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

LGTM!

@adamsitnik
Copy link
Member

@adiaaida after thinking about this for a moment the patch approach has an advantage: once we have the new git user we just fork given repo, apply the patch, push the changes, update the url and that's it 👍

michellemcdaniel reacted with thumbs up emoji

@michellemcdaniel
Copy link
Author

@dotnet-bot test Windows_NT x64 perf scenarios please

@michellemcdaniel
Copy link
Author

@dotnet-bot test Windows_NT x64 perf scenarios please

@michellemcdaniel
Copy link
Author

@noahfalk Do you have any idea why on the tiered Scenario test we are failing to restore from nuget? I don't see this on the full opt, so I'm super confused. It's only on my new benchmark, not on MusicStore/AllReady, so I'm wondering if I missed something. The failure is:

Running Process: C:\j\w\perf_scenario---d898017e\bin\sandbox_logs\Scenarios\Off\JitBench\.dotnet\dotnet.exe publish -c Release -f netcoreapp2.1    Working Directory: C:\j\w\perf_scenario---d898017e\bin\sandbox_logs\Scenarios\Off\JitBench\W\Word2VecScenario    Additional Environment Variables: WORD2VEC_FRAMEWORK_VERSION=2.1.0-preview3-26327-01    {        00:00.187: Microsoft (R) Build Engine version 15.5.178.35674 for .NET Core        00:00.187: Copyright (C) Microsoft Corporation. All rights reserved.        00:00.187:         00:00.670:   Restoring packages for C:\j\w\perf_scenario---d898017e\bin\sandbox_logs\Scenarios\Off\JitBench\W\Word2Vec.Net\Word2Vec.Net.csproj...        00:00.670:   Restoring packages for C:\j\w\perf_scenario---d898017e\bin\sandbox_logs\Scenarios\Off\JitBench\W\Word2VecScenario\Word2VecScenario.csproj...        00:00.806: C:\j\w\perf_scenario---d898017e\bin\sandbox_logs\Scenarios\Off\JitBench\.dotnet\sdk\2.2.0-preview1-007558\NuGet.targets(103,5): error : Unable to load the service index for source https://api.nuget.org/v3/index.json. [C:\j\w\perf_scenario---d898017e\bin\sandbox_logs\Scenarios\Off\JitBench\W\Word2VecScenario\Word2VecScenario.csproj]        00:00.806: C:\j\w\perf_scenario---d898017e\bin\sandbox_logs\Scenarios\Off\JitBench\.dotnet\sdk\2.2.0-preview1-007558\NuGet.targets(103,5): error :   Unknown error (0xffffffff) [C:\j\w\perf_scenario---d898017e\bin\sandbox_logs\Scenarios\Off\JitBench\W\Word2VecScenario\Word2VecScenario.csproj]    }    Exit code: 1 ( 00:00.868 elapsed)


string GetLocalWord2VecNetRepoDirectory()
{
return Environment.GetEnvironmentVariable("MUSICSTORE_PRIVATE_REPO");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

You probably want a new env var : )


string GetCoreClrRoot()
{
string currentDirectory = Directory.GetCurrentDirectory();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This looks like there is a dependency for the benchmark to run correctly that either the current directory is the root of the coreclr repo, or the CORECLR_REPO env var is specified?

The standard way I run this on my own dev machine is to go \tests\src\performance\scenarios\JitBench\JitBench_unofficial and then use "dotnet run" from there so it appears my workflow would break. I'd suggest that you locate the patch file relative to JitBench.dll, and during build of JitBench.dll you ensure that the patch file is copied adjacent or in a well-known subfolder. That will eliminate any need to refer or reason about the source tree.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I guess the question I have here is how would the application know where the file was located if it is located alongside the dll? Especially if we're changing directories to clone/build/etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I'd suggest using:

Uri uri = new Uri(Assembly.GetExecutingAssembly().CodeBase);string filePathToJitBench = uri.AbsolutePath;//compute some other path relative to this one

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Ah got it. It looks like on core, we actually use GetEntryAssembly(), but otherwise, it's the same. Thanks for the help.

@noahfalk
Copy link
Member

That unable to load service index error for NuGet looks suspiciously like a network error to me. If you are able to reproduce it consistently for one test and not for others I'd guess that is because the others have files locally cached so that only the new test is hitting the network. If you try again now does the issue still reproduce?

@michellemcdaniel
Copy link
Author

Every time I've run that test, it has failed for that reason. I cannot repro it locally.

@noahfalk
Copy link
Member

All the references I can find suggest its likely to be network environment related:
NuGet/Home#2880

You said it wasn't failing locally so I assume that means it was failing in CI? Perhaps you can get manual access to the machine or script the JitBench test to do some automated probing on your behalf. A network trace with fidler might give some indication what was going on, or perhaps engage the NuGet team and see if they have a suggestion? Sadly the "unknown error (0xffffffff) ..." doesn't give much to go on.

@michellemcdaniel
Copy link
Author

@dotnet-bot test Windows_NT x64 perf scenarios please

@michellemcdaniel
Copy link
Author

@noahfalk Updated to use the csproj to copy the patch file along side the build dlls. Tested using your workflow. PTAL.

@michellemcdaniel
Copy link
Author

@dotnet-bot test Windows_NT x64 perf scenarios please

@@ -54,5 +54,8 @@
Overwrite="true"
Encoding="Unicode"/>
</Target>
<Target Name="AfterBuild">
<Copy SourceFiles="Resources\word2vecnet.patch" DestinationFolder="$(OutDir)" ContinueOnError="true" />
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Don't you want to fail the build if not copied?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

removed.

@michellemcdaniel
Copy link
Author

@dotnet-bot test Windows_NT x64 perf scenarios please

@michellemcdanielmichellemcdaniel merged commit34561ef intodotnet:masterApr 5, 2018
@michellemcdanielmichellemcdaniel deleted the addMLBenchmark branchAugust 28, 2018 21:38
Sign up for freeto subscribe to this conversation on GitHub. Already have an account?Sign in.
Reviewers

@jorivejorivejorive left review comments

@adamsitnikadamsitnikadamsitnik approved these changes

@noahfalknoahfalknoahfalk approved these changes

Assignees
No one assigned
Labels
None yet
Projects
None yet
Milestone
No milestone
4 participants
@michellemcdaniel@adamsitnik@noahfalk@jorive

[8]ページ先頭

©2009-2025 Movatter.jp