- Notifications
You must be signed in to change notification settings - Fork2.7k
Conversation
This change adds an additional scenario benchmark, the Word2Vec
benchmark. The harness pulls down Word2Vec.Net from eabdullin, applies a
patch of changes that we made to work with netcoreapp21, harness the
word training and search, and then runs the benchmark. It also updates
the timeout for running benchmarks, since the training scenario on a
100M file takes about 7 minutes locally.
This change adds an additional scenario benchmark, the Word2Vecbenchmark. The harness pulls down Word2Vec.Net from eabdullin, applies apatch of changes that we made to work with netcoreapp21, harness theword training and search, and then runs the benchmark. It also updatesthe timeout for running benchmarks, since the training scenario on a100M file takes about 7 minutes locally.
@dotnet-bot test Windows_NT x64 perf scenarios please |
1 similar comment
@dotnet-bot test Windows_NT x64 perf scenarios please |
@dotnet-bot test Windows_NT x64 tiered ryujit Performance Scenarios Tests please |
@dotnet-bot test Windows_NT x64 perf scenarios please |
2 similar comments
@dotnet-bot test Windows_NT x64 perf scenarios please |
@dotnet-bot test Windows_NT x64 perf scenarios please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Overall the code looks good to me.
I am not sure which approach is the best:git submodule
,fork
orpatch
file. Thepatch
file seems to be very easy to add, but I am not sure about the maintenance. For example: how do I edit any of the files included in the patch?
@jorive what is your opinion about that?
It looks like we could remove Dotnet-Install scripts to make the diff file simpler (they are not used by the benchmark)
@@ -178,6 +178,7 @@ public static void DeleteDirectory(string path, ITestOutputHelper output) | |||
} | |||
try | |||
{ | |||
SetAttributesNormal(path); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Why do we need that? I am sure there is a good reason for that, could please you add a comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
This was something I needed on my machine, where it seems like the permissions are a bit funky. Basically, it wasn't able to delete various directories, so it would crash. Changing the attributes gave the app permission to delete the directory.
int retries = 10; | ||
for (int i = 0; i < retries; i++) | ||
{ | ||
if (!File.Exists(sourceFileName) && File.Exists(destFileName)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
what if!File.Exists(sourceFileName)
is true, butFile.Exists(destFileName)
not?
I would use||
instead of&&
and move it outside of the loop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I just copied the logic from MoveDirectory here. I think the logic basically is that if the file already exists in the destination, but doesn't in the source location, that's ok, we can just return. If it doesn't exist in either location, then we want the exception to fire.
output.WriteLine($" Attempt #{i + 1} failed: {e.Message}"); | ||
} | ||
// if something has a transient lock on the file waiting may resolve the issue | ||
Thread.Sleep((i + 1) * 10); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
nit: the code could have been simpler withi = 1; i <= retries
(feweri + 1
to get the human friendly index)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Again, this is copied from MoveDirectory, and I'd like the logic to match.
return null; | ||
} | ||
string GetLocalWord2VecNetRepoDirectory() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
nit: dead code
string tfm = DotNetSetup.GetTargetFrameworkMonikerForFrameworkVersion(dotNetInstall.FrameworkVersion); | ||
string word2VecNetPublishDir = GetWord2VecNetPublishDirectory(dotNetInstall, outputDir, tfm); | ||
var url = "http://mattmahoney.net/dc/text8.zip"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
nit: it would be nice to have some comment saying what is the content of the file.
btw how big is this file and what is the license for it? maybe we could have our own copy of it in case somebody deletes it in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
The file in the zip is 100MB. It is part of, as far as I can tell, and open data compression benchmark, but basically it's just 100MB of text from Wikipedia. We potentially could add a copy to azure blob storage, but I didn't have access when I wrote up the benchmark.
namespace JitBench | ||
{ | ||
class Word2VecBenchmark : Word2VecNetBenchmark |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
do we plan to have more Word2VecNetBenchmarks? if not then both types (Word2VecBenchmark
Word2VecNetBenchmark
) can be merged to a single class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
We may add additional ML benchmarks, but not any more word2vec. I'll update this.
private readonly Metric TrainingMetric = new Metric("Training", "ms"); | ||
private readonly Metric FirstSearchMetric = new Metric("First Search", "ms"); | ||
private readonly Metric MedianSearchMetric = new Metric("Median Search", "ms"); | ||
private readonly Metric MeanSearchMetric = new Metric("Mean Search", "ms"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
nit: dead code
private const string Word2VecNetCommitSha1Id = "6012a2b5b886926918d51b1b56387d785115f448"; | ||
private const string Word2VecNetPatch = "word2vecnet.patch"; | ||
private const string EnvironmentFileName = "Word2VecNetEnvironment.txt"; | ||
private const string StoreDirName = ".store"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
nit: dead code
@@ -67,7 +67,7 @@ public ProcessRunner(string exePath, string arguments, string replayCommand = nu | |||
_p.StartInfo = psi; | |||
_p.EnableRaisingEvents = false; | |||
_loggers = new List<IProcessLogger>(); | |||
_timeout = TimeSpan.FromMinutes(10); | |||
_timeout = TimeSpan.FromMinutes(60); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
good idea! 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
This changes the default timeout for every process JitBench launches. If you just have one particular process that is long running I'd suggest changing just that one rather than giving every process a free pass to be slow:
new ProcessRunner("bla", "blabla").WithTimeout(TimeSpan.FromMinutes(60))
@dotnet-bot test Windows_NT x64 perf scenarios please |
@adamsitnik In terms of patch vs submodule, I think what we decided on was that we were going to create a perf user that will fork off the various benchmarks we want to use, and then we'd have actual changes to those benchmarks there. Until that account and those forks exist, I'm not sure we decided on a best practice, but creating the patch was the easiest thing for me, since this doesn't really belong anywhere else (ie, it's not a aspnet benchmark, so does it really belong as a submodule in JitBench itself? I'm not sure). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
LGTM!
@adiaaida after thinking about this for a moment the patch approach has an advantage: once we have the new git user we just fork given repo, apply the patch, push the changes, update the url and that's it 👍 |
@dotnet-bot test Windows_NT x64 perf scenarios please |
@dotnet-bot test Windows_NT x64 perf scenarios please |
@noahfalk Do you have any idea why on the tiered Scenario test we are failing to restore from nuget? I don't see this on the full opt, so I'm super confused. It's only on my new benchmark, not on MusicStore/AllReady, so I'm wondering if I missed something. The failure is:
|
string GetLocalWord2VecNetRepoDirectory() | ||
{ | ||
return Environment.GetEnvironmentVariable("MUSICSTORE_PRIVATE_REPO"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
You probably want a new env var : )
string GetCoreClrRoot() | ||
{ | ||
string currentDirectory = Directory.GetCurrentDirectory(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
This looks like there is a dependency for the benchmark to run correctly that either the current directory is the root of the coreclr repo, or the CORECLR_REPO env var is specified?
The standard way I run this on my own dev machine is to go \tests\src\performance\scenarios\JitBench\JitBench_unofficial and then use "dotnet run" from there so it appears my workflow would break. I'd suggest that you locate the patch file relative to JitBench.dll, and during build of JitBench.dll you ensure that the patch file is copied adjacent or in a well-known subfolder. That will eliminate any need to refer or reason about the source tree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I guess the question I have here is how would the application know where the file was located if it is located alongside the dll? Especially if we're changing directories to clone/build/etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I'd suggest using:
Uri uri = new Uri(Assembly.GetExecutingAssembly().CodeBase);string filePathToJitBench = uri.AbsolutePath;//compute some other path relative to this one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Ah got it. It looks like on core, we actually use GetEntryAssembly(), but otherwise, it's the same. Thanks for the help.
That unable to load service index error for NuGet looks suspiciously like a network error to me. If you are able to reproduce it consistently for one test and not for others I'd guess that is because the others have files locally cached so that only the new test is hitting the network. If you try again now does the issue still reproduce? |
Every time I've run that test, it has failed for that reason. I cannot repro it locally. |
All the references I can find suggest its likely to be network environment related: You said it wasn't failing locally so I assume that means it was failing in CI? Perhaps you can get manual access to the machine or script the JitBench test to do some automated probing on your behalf. A network trace with fidler might give some indication what was going on, or perhaps engage the NuGet team and see if they have a suggestion? Sadly the "unknown error (0xffffffff) ..." doesn't give much to go on. |
@dotnet-bot test Windows_NT x64 perf scenarios please |
@noahfalk Updated to use the csproj to copy the patch file along side the build dlls. Tested using your workflow. PTAL. |
@dotnet-bot test Windows_NT x64 perf scenarios please |
@@ -54,5 +54,8 @@ | |||
Overwrite="true" | |||
Encoding="Unicode"/> | |||
</Target> | |||
<Target Name="AfterBuild"> | |||
<Copy SourceFiles="Resources\word2vecnet.patch" DestinationFolder="$(OutDir)" ContinueOnError="true" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Don't you want to fail the build if not copied?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
removed.
@dotnet-bot test Windows_NT x64 perf scenarios please |