
A set of scripts to quickly visualize baseline models for a bunch of tasks #106


Draft: IanMagnusson wants to merge 5 commits into main from baseline-sampler


@IanMagnusson (Contributor)

This visualization helps you quickly learn the noise, spread, saturation, and scaling profile of a task. The scripts run the evals, retrieve the results, and produce a visualization like this:

[Attached figure: minerva_math::olmes.pdf]

@IanMagnusson (Contributor, Author)

This is more or less working, but there are still some TODOs for fixing the aggregation on mt_mbpp.

@kyleclo (Collaborator)

Thanks @IanMagnusson! Do you mind:

  • Moving these into a subdir like scripts/visualize_baselines_on_benchmark/ with a README.md on how to invoke them? For the README, the main questions I have are:
    ◦ What aspects of each script do I need to modify for a given new benchmark?
    ◦ When do I need to use download_checkpoints.sh? Was this something you already ran one time? What $OUTPUT_DIR did you use?
    ◦ Does baseline_task_viz.py consume baseline_sampler_data.json? If so, where did you get baseline_sampler_data.json in the first place?
  • Migrating baseline_sampler_data.json, which is currently in the repo root, into this subdir as well?


Reviewers: @kyleclo (awaiting requested review)
At least 1 approving review is required to merge this pull request.

Assignees: no one assigned
Labels: none yet
Projects: none yet
Milestone: no milestone


3 participants: @IanMagnusson, @kyleclo
