
Create durable runner for top-k variant selection #5245


Draft
amishler wants to merge 40 commits into alan/evals-split-file-add-tests from alan/evals-topk-runner-rebased

Conversation

@amishler (Member)

No description provided.

Aaron1011 and others added 30 commits December 16, 2025 01:43
* Only write streaming provider-proxy cache body on finish
  We were previously writing the current body when the stream was dropped, even if it never finished (e.g. due to a timeout). As a result, we could end up writing an invalid body to disk due to an *earlier* request from the e2e tests (if the e2e tests hit an internal timeout and dropped the stream), overwriting the good cache line from a later successful test run.
* Wait for provider-proxy write before sending back response
  This should help prevent race conditions involving client-side retries.
* Fix typo
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Fix clippy
* Forward Cloudflare R2 env vars into fixtures container
  Since these were unset inside the container, we were falling back to the slow dev-endpoint download, rather than the fast aws cli download.
* Install aws cli in fixtures Dockerfile
* added multi-threaded runtime for test that was exploding
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Viraj Mehta <viraj@tensorzero.com>
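The commit above describes buffering the streamed body and writing it to the cache only once the stream finishes, rather than whenever the stream is dropped. A minimal sketch of that pattern, assuming an in-memory map stands in for the on-disk cache (`StreamingCacheWriter` and `Cache` are hypothetical names, not the provider-proxy's actual types):

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical stand-in for the on-disk cache: cache key -> response body.
type Cache = Rc<RefCell<HashMap<String, Vec<u8>>>>;

/// Hypothetical sketch: buffers a streamed body and commits it only on finish.
/// There is deliberately no `Drop` impl that writes the buffer, so a stream
/// dropped early (e.g. after a timeout) never reaches the cache.
struct StreamingCacheWriter {
    key: String,
    buffer: Vec<u8>,
    cache: Cache,
}

impl StreamingCacheWriter {
    fn new(key: &str, cache: Cache) -> Self {
        Self { key: key.to_string(), buffer: Vec::new(), cache }
    }

    /// Append one chunk of the streamed body to the in-memory buffer.
    fn push_chunk(&mut self, chunk: &[u8]) {
        self.buffer.extend_from_slice(chunk);
    }

    /// Consume the writer and persist the complete body.
    fn finish(self) {
        let Self { key, buffer, cache } = self;
        cache.borrow_mut().insert(key, buffer);
    }
}
```

Because `finish` takes `self` by value, the only path to the cache is an explicit, completed stream; dropping the writer simply discards the partial buffer.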
…es (#5189)
* added a get config route
* added migration for tags and missing file
* removed failing assertion
* added json schema derives to uninitialized variant and evaluation types
* made the migration public from tensorzero-core crate
* Stream inferences and datapoints in UI
* Stream inferences and datapoints in UI
* Stream inferences and datapoints in UI
* Stream inferences and datapoints in UI
* Stream inferences and datapoints in UI
---------
Co-authored-by: Viraj Mehta <viraj@tensorzero.com>
This should reduce the flakiness of this test (we were previously only discarding the thought blocks in some cases)
* added a get config route
* added migration for tags and missing file
* removed failing assertion
* made the migration public from tensorzero-core crate
* added generic action handler
* added e2e tests
* added rust client method
* use rust client in tests
* fixed PR comments
Closes #4783.
Co-authored-by: Gabriel Bianconi <1275491+GabrielBianconi@users.noreply.github.com>
* Add support for extra_headers/extra_body in relay mode
  The relay gateway now forwards these options to the downstream gateway (after performing variant-level filtering on the relay gateway).
* Fix typo
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Remove collect
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Run fmt
* Add some unit tests
* Add more unit tests
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
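The commit above says the relay gateway forwards `extra_headers`/`extra_body` downstream after performing variant-level filtering. One way such a filter could look, as a sketch assuming each entry may optionally be scoped to a single variant (`ExtraEntry`, its fields, and `filter_for_variant` are hypothetical and not TensorZero's actual types):

```rust
/// Hypothetical shape of one `extra_body` / `extra_headers` entry: an optional
/// variant scope, a JSON-pointer-style target, and the value to apply.
#[derive(Debug, Clone, PartialEq)]
struct ExtraEntry {
    variant_name: Option<String>,
    pointer: String,
    value: String,
}

/// Keep the entries that apply to `variant`: unscoped entries plus those scoped
/// to exactly this variant. In this sketch, the survivors are what a relay
/// gateway would forward to the downstream gateway.
fn filter_for_variant<'a>(entries: &'a [ExtraEntry], variant: &str) -> Vec<&'a ExtraEntry> {
    entries
        .iter()
        // An entry with no variant scope applies everywhere.
        .filter(|e| e.variant_name.as_deref().map_or(true, |v| v == variant))
        .collect()
}
```

Filtering on the relay side means entries scoped to variants the relay did not pick never reach the downstream gateway; input order is preserved.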
Co-authored-by: Aaron Hill <aaron@tensorzero.com>
…modal (#5197)
* Add searchable evaluation and variant selectors to Launch Evaluation modal
  - Add generic combobox UI components (Combobox, ComboboxInput, ComboboxContent, useCombobox hook)
  - Add EvaluationSelector component that wraps the generic Combobox
  - Add VariantSelector component that wraps the generic Combobox
  - Update LaunchEvaluationModal to use searchable selectors instead of static Select
  - Update Input component focus style to use border darkening instead of ring
* Add documentation comment to useCombobox hook
* Update e2e tests for Combobox selectors in Launch Evaluation modal
)
* e2e tests: feedback: construct payloads using structs + serialize
  Contributes to #4710.
* rely on Serialize
---------
Co-authored-by: Aaron Hill <aaron@tensorzero.com>
* Prevent duplicate fake-path templates from being loaded
  When loading templates from the config, any fake-path keys should be unique (these potentially come from agent-generated variant configs, which will have inline templates)
* Fix typo
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Fix test
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
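The commit above requires fake-path template keys to be unique when loading templates from the config. A minimal sketch of such a check, assuming the keys are available as string slices (`check_unique_template_keys` is a hypothetical helper, not the config loader's real function):

```rust
use std::collections::HashSet;

/// Hypothetical sketch: reject a set of inline templates that reuse a fake-path
/// key, instead of letting a later template silently replace an earlier one.
fn check_unique_template_keys<'a, I>(keys: I) -> Result<(), String>
where
    I: IntoIterator<Item = &'a str>,
{
    let mut seen = HashSet::new();
    for key in keys {
        // `HashSet::insert` returns false when the key was already present.
        if !seen.insert(key) {
            return Err(format!("duplicate template path: {key}"));
        }
    }
    Ok(())
}
```

Failing on the first duplicate keeps the error message pointed at the offending key, which matters when the configs are agent-generated.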
* Clean up Docker Compose for UI
* Lower log level for config message
* Tag filter for != should still require the tag to be set
* Add test
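The commit above changes a `!=` tag filter so that it does not match rows that lack the tag entirely. A minimal sketch of that predicate over an in-memory tag map, assuming tags are string key/value pairs (`tag_not_equal` is a hypothetical helper; the PR's actual filter is likely built into a database query, which this does not attempt to reproduce):

```rust
use std::collections::HashMap;

/// Hypothetical sketch of the filter semantics: `tag != value` matches only when
/// the tag is present AND differs, so rows without the tag are excluded.
fn tag_not_equal(tags: &HashMap<String, String>, key: &str, value: &str) -> bool {
    matches!(tags.get(key), Some(v) if v.as_str() != value)
}
```

The naive alternative, `tags.get(key) != Some(&value.to_string())`, would also return true for rows missing the tag, which is the behavior the commit rules out.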
* Initial partial implementation of betting-based confidence sequences
* Fix bets to use previous time step's variance estimate
* Add updates to wealth processes
* Add last step of confidence sequence calculation from hedged wealth process
* Change product to log-sum-exp for numerical stability
* Add computation of point estimator for the mean
* Combine weighting and hedged wealth process computation
* Move bet calculation to helper function, add bet truncation
* Tweak comment string for clarity
* Add module-level and type-level docstrings
* Remove max vs combo hedging choice, explain use of max in docstring
* For computing mean estimate, restrict search to inside confidence interval for efficiency
* Add unit tests, fix search for interval endpoints
* Tweak test and test comments
* Fix regularized mean value, add regression tests with known values
* Move test, clarify comment strings in another test
* Add test with known confidence sequence values
* Change tests to use non-constant observations for variance fluctuations
* Rename module to contrast with asymptotic_confidence_sequences.rs
* Add input validation and associated tests
* Remove unnecessary iterator method for m-values
* Make return type a Result, use anyhow for errors
* Create enums and structs
* Add check_topk_stopping(), allow dead code as needed
* Add epsilon argument, changed from allow to expect dead_code
* Add wip note to docstring
* Add tests using non-zero epsilon tolerance
* Add more tests with k_min not equal to k_max
* Add docstrings to tests
* Change return value when k_min > num_variants
* Tweak docstrings
* Remove enum that will be included in a separate PR
* Add VariantStatus enum
* Refactor evaluations to accept a vector of variants for batch processing
* Remove changes from topk.rs for now
* Use partition_point() instead of binary search for finding confidence sequence bounds
* Make enum for specifying grid of points where wealth processes are calculated
* Move argument validation for WealthProcessGridPoints to new constructor methods
* Fix bug where variant could count itself when checking how many variants it beats
* Add variant_names option to CLI and python client
* Fix tests to use new WealthProcessGridPoints enum
* Don't multiply num_datapoints by num_variants
* Update some tests to use variant_names instead of variant_name
* Add deprecation warning for variant_name argument
* Add deprecation warning for variant_name arg to python client
* Factor out batch evals functionality into helper function
* Revert change to allow passing multiple variants to run_evaluations()
* Add variant back into tracing span
* Pre-resolve datapoint inputs to minimize clones
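The commits above build a betting-based confidence sequence from hedged wealth processes, accumulated in log space for numerical stability and evaluated over a grid of candidate means. The sketch below illustrates the general technique for observations in [0, 1], assuming the predictable plug-in bets and the hedged capital process (an equal-weight mix of a "long" and a "short" bettor) from Waudby-Smith and Ramdas's betting confidence sequences. `betting_cs`, `log_add_exp`, the truncation constant `C = 0.5`, and the uniform grid are illustrative choices, not the PR's code; the PR's own hedging (its commits mention using a max) and grid handling may differ.

```rust
/// Numerically stable log(exp(a) + exp(b)).
fn log_add_exp(a: f64, b: f64) -> f64 {
    let m = a.max(b);
    if m == f64::NEG_INFINITY {
        return m;
    }
    m + ((a - m).exp() + (b - m).exp()).ln()
}

/// Hypothetical sketch of a hedged betting confidence sequence for the mean of
/// observations in [0, 1]. Returns the smallest and largest grid points
/// (j / grid_size) that remain in the confidence set after observing `xs`.
fn betting_cs(xs: &[f64], alpha: f64, grid_size: usize) -> Result<(f64, f64), String> {
    if !(alpha > 0.0 && alpha < 1.0) {
        return Err(format!("alpha must be in (0, 1), got {alpha}"));
    }
    if grid_size == 0 {
        return Err("grid_size must be positive".to_string());
    }
    if xs.iter().any(|x| !(0.0..=1.0).contains(x)) {
        return Err("observations must lie in [0, 1]".to_string());
    }
    // Truncation constant c < 1 keeps every wealth factor >= 1 - c > 0.
    const C: f64 = 0.5;

    // Predictable plug-in bets: lambda_t depends only on x_1..x_{t-1}.
    let mut lambdas = Vec::with_capacity(xs.len());
    let (mut sum_x, mut sum_sq_dev) = (0.0_f64, 0.0_f64);
    let mut var_prev: f64 = 0.25; // prior variance estimate before any data
    for (i, &x) in xs.iter().enumerate() {
        let t = (i + 1) as f64;
        let raw = (2.0 * (2.0 / alpha).ln() / (var_prev * t * (t + 1.0).ln())).sqrt();
        lambdas.push(raw.min(C));
        // Update the regularized mean and variance estimates after betting.
        sum_x += x;
        let mu_t = (0.5 + sum_x) / (t + 1.0);
        sum_sq_dev += (x - mu_t).powi(2);
        var_prev = (0.25 + sum_sq_dev) / (t + 1.0);
    }

    let log_threshold = (1.0 / alpha).ln();
    let log_half = 0.5_f64.ln();
    let mut bounds: Option<(f64, f64)> = None;
    for j in 0..=grid_size {
        let m = j as f64 / grid_size as f64;
        // Accumulate log-wealth (a sum of logs) instead of multiplying factors.
        let (mut log_long, mut log_short) = (0.0_f64, 0.0_f64);
        for (&x, &lam) in xs.iter().zip(&lambdas) {
            // Per-candidate truncation keeps both factors strictly positive.
            let lam_long = lam.min(C / m);
            let lam_short = lam.min(C / (1.0 - m));
            log_long += (1.0 + lam_long * (x - m)).ln();
            log_short += (1.0 - lam_short * (x - m)).ln();
        }
        // Hedged capital: equal-weight mix of the long and short bettors.
        let log_hedged = log_add_exp(log_half + log_long, log_half + log_short);
        // Candidate m stays in the set while hedged wealth is below 1 / alpha.
        if log_hedged < log_threshold {
            bounds = Some(match bounds {
                None => (m, m),
                Some((lo, _)) => (lo, m),
            });
        }
    }
    bounds.ok_or_else(|| "no grid point remains in the confidence set".to_string())
}
```

This sketch scans every grid point independently, costing O(grid_size × n); the commits mention using `partition_point()` to locate the bounds, which is not reproduced here.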
* Migrate getWorkflowEvaluationProjects
* Add clickhouse e2e tests
)
* added the key info to the request extensions from tensorzero-auth
* put in a single request extension
…5216)
Bumps the rust-dependencies group with 4 updates in the / directory: [reqwest](https://github.com/seanmonstar/reqwest), [minijinja](https://github.com/mitsuhiko/minijinja), [rcgen](https://github.com/rustls/rcgen) and [tree-sitter](https://github.com/tree-sitter/tree-sitter).
Updates `reqwest` from 0.12.25 to 0.12.26
- [Release notes](https://github.com/seanmonstar/reqwest/releases)
- [Changelog](https://github.com/seanmonstar/reqwest/blob/master/CHANGELOG.md)
- [Commits](seanmonstar/reqwest@v0.12.25...v0.12.26)
Updates `minijinja` from 2.13.0 to 2.14.0
- [Release notes](https://github.com/mitsuhiko/minijinja/releases)
- [Changelog](https://github.com/mitsuhiko/minijinja/blob/main/CHANGELOG.md)
- [Commits](mitsuhiko/minijinja@2.13.0...2.14.0)
Updates `rcgen` from 0.14.5 to 0.14.6
- [Release notes](https://github.com/rustls/rcgen/releases)
- [Commits](rustls/rcgen@v0.14.5...v0.14.6)
Updates `tree-sitter` from 0.25.10 to 0.26.3
- [Release notes](https://github.com/tree-sitter/tree-sitter/releases)
- [Commits](tree-sitter/tree-sitter@v0.25.10...v0.26.3)
---
updated-dependencies:
- dependency-name: reqwest
  dependency-version: 0.12.26
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: rust-dependencies
- dependency-name: minijinja
  dependency-version: 2.14.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: rust-dependencies
- dependency-name: rcgen
  dependency-version: 0.14.6
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: rust-dependencies
- dependency-name: tree-sitter
  dependency-version: 0.26.3
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: rust-dependencies
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Claude 3.7 -> 4.5
* Claude 3.7 -> 4.5
* Claude 3.7 -> 4.5
* Claude 3.7 -> 4.5
* Tag filter for != should still require the tag to be set
* Add test
* Fix other NULL filters
* Fix
* Fix
* Migrate countInferencesForEpisode
* Update route
This setting has significant performance implications, and is useful to know when viewing the soon-to-be-added 'overhead' metric
@amishler changed the base branch from alan/evals-split-file-add-tests to main December 17, 2025 19:57
@amishler changed the base branch from main to alan/evals-split-file-add-tests December 17, 2025 19:58

Reviewers

No reviews

Assignees

No one assigned

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

9 participants

@amishler @Aaron1011 @GabrielBianconi @virajmehta @jinnovation @quangIO @simeonlee @ecalifornica @shuyangli
