Tags: pytorch/test-infra

Tags

v20250709-181311

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
GPG key ID: B5690EEEBB952194
[ez][CH] Fix infra_metrics.cloud.watch_metrics schema: use DateTime64 (#6909)

The timestamp used by cloudwatch has milliseconds, so change the timestamp field to match that

Testing: replaced the old table, then ran `python tools/rockset_migration/s32ch.py --clickhouse-table "infra_metrics.cloudwatch_metrics" --stored-data t.json --s3-bucket fbossci-cloudwatch-metrics --s3-prefix ghci-related`
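To illustrate why the column type matters, here is a minimal Python sketch, not taken from the repository, that renders an epoch timestamp in milliseconds with three fractional digits. It assumes the column is declared with millisecond precision (ClickHouse writes this as `DateTime64(3)`); the commit message does not state the precision, and the function name `format_millis` is hypothetical.

```python
from datetime import datetime, timezone


def format_millis(ts_ms: int) -> str:
    """Render an epoch timestamp in milliseconds as 'YYYY-MM-DD HH:MM:SS.mmm' (UTC)."""
    # Split into whole seconds and the millisecond remainder so nothing is lost.
    seconds, millis = divmod(ts_ms, 1000)
    dt = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return dt.strftime("%Y-%m-%d %H:%M:%S") + f".{millis:03d}"


print(format_millis(1_500))  # 1970-01-01 00:00:01.500
```

A column with one-second resolution would have no place to store the `.500` part of that value.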

v20250708-173352

[ghinfra] Set up ingestion from s3 -> clickhouse for cloudwatch (#6898)

Path: cloudwatch metrics -> firehose -> s3 (new bucket fbossci-cloudwatch-metrics) -> clickhouse

This is the s3 -> clickhouse part

I think clickhouse has some built-in ingestions for kinesis but I'm lazy...

Requires pytorch-labs/pytorch-gha-infra#751

Testing: ran the python code via `python tools/rockset_migration/s32ch.py --clickhouse-table "infra_metrics.cloudwatch_metrics" --stored-data t.json --s3-bucket fbossci-cloudwatch-metrics --s3-prefix ghci-related`
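The s3 -> clickhouse step can be pictured with a small Python sketch. This is not the code in `s32ch.py`. It assumes the objects in the bucket hold one JSON record per line, and the field names (`metric_name`, `timestamp`, `value`) and the helper `iter_rows` are hypothetical, chosen only for illustration.

```python
import json
from typing import Iterator


def iter_rows(blob: str) -> Iterator[dict]:
    """Yield one flat row per non-empty JSON line in an S3 object body."""
    for line in blob.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        yield {
            # Hypothetical field names; adjust to the real record layout.
            "metric_name": record["metric_name"],
            "timestamp_ms": int(record["timestamp"]),
            "value": json.dumps(record.get("value", {}), sort_keys=True),
        }


sample = '{"metric_name": "CPUUtilization", "timestamp": 1500, "value": {"max": 1.0}}\n\n'
rows = list(iter_rows(sample))
print(rows)
```

Rows shaped like this could then be handed to whatever ClickHouse client does the insert; that part is left out because the commit message does not describe it.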

v20250703-021349

Add revert category extraction and exclude `ghfirst` reverts from stats (#6882)

Adds revert category extraction from GitHub comments and excludes `ghfirst` reverts from precision/recall metrics.

## Changes

### 1. Added Revert Category Extraction
- New method `extract_revert_categories_batch()` in `autorevert_checker.py`
- Extracts categories (`nosignal`, `ignoredsignal`, `landrace`, `weird`, `ghfirst`) from GitHub issue comments
- Single batch query for performance

### 2. Enhanced `get_commits_reverted_with_info()`
- Now includes category information for each revert
- Uses batch extraction for all reverts at once

### 3. Updated Metrics Calculation
- Excludes `ghfirst` reverts from recall calculation
- Shows category breakdown in summary statistics
- Per-workflow precision now shows both total and non-ghfirst metrics

### 4. Fixed Pattern Detection Bug
- Fixed `AttributeError: 'NoneType' object has no attribute 'head_sha'`
- Created proper mapping between failures and their newer commits

<details>
<summary>Bug Fix Details</summary>

**Problem**: `newer_commit_same_job` was used outside its loop scope

**Solution**: Created `failure_to_newer_commit` dict to track mappings

```python
# Map each failure to its newer commit
failure_to_newer_commit = {}
for (rule, job) in suspected_failures:
    newer_commit_same_job, newer_same_jobs = self._find_last_commit_with_job(...)
    if newer_commit_same_job and any(...):
        failure_to_newer_commit[(rule, job)] = newer_commit_same_job

# Use mapping in pattern creation
for (failure_rule, job_name), newer_commit in failure_to_newer_commit.items():
    patterns.append({
        "newer_commits": [newer_commit.head_sha, suspected_commit1.head_sha],
        ...
    })
```

</details>

## Example Output

```
python -m pytorch_auto_revert autorevert-checker Lint trunk pull inductor linux-binary-manywheel --hours 720 --verbose
==================================================
SUMMARY STATISTICS
==================================================
Workflow(s): Lint, trunk, pull, inductor, linux-binary-manywheel
Timeframe: 720 hours
Commits checked: 6741
Auto revert patterns detected: 419
Actual reverts inside auto revert patterns detected (precision): 50 (11.9%)
Total revert commits in period: 121
Revert categories:
  nosignal: 46 (38.0%)
  ghfirst: 28 (23.1%)
  uncategorized: 21 (17.4%)
  ignoredsignal: 16 (13.2%)
  weird: 9 (7.4%)
  landrace: 1 (0.8%)
Total reverts excluding ghfirst: 93
Reverts (excluding ghfirst) that dont match any auto revert pattern detected (recall): 50 (53.8%)
Per workflow precision:
  Lint: 6 reverts out of 17 patterns (35.3%) [excluding ghfirst: 6 (35.3%)]
  trunk: 2 reverts out of 14 patterns (14.3%) [excluding ghfirst: 2 (14.3%)]
  pull: 40 reverts out of 354 patterns (11.3%) [excluding ghfirst: 33 (9.3%)]
  inductor: 2 reverts out of 31 patterns (6.5%) [excluding ghfirst: 2 (6.5%)]
  linux-binary-manywheel: 0 reverts out of 3 patterns (0.0%) [excluding ghfirst: 0 (0.0%)]
Reverted patterns:
  - Python RuntimeError: e1aee866 (ignoredsignal)
  - GitHub workflows weren't regenerated: 3b6569b1 (ignoredsignal)
  - GitHub workflows weren't regenerated: bbbced94 (landrace)
  - Python RuntimeError: 060838c2 (nosignal)
  - Lintrunner failure: 1a55fb0e (weird)
  - Lintrunner failure: 3239da0c (nosignal)
  - MSVC compiler error: eab45643 (ignoredsignal)
  - Bad response status code: ea7b2330 (uncategorized)
  - gtest failure: 347ace4c (nosignal)
  - pytest failure: 216bd609 (nosignal)
  - pytest failure: 84c588e5 (ignoredsignal)
  - GHA error: 863327ae (ignoredsignal)
  - GHA error: eb9efb37 (ghfirst)
  - GHA error: 9c39bc24 (ignoredsignal)
  - Python Test File RuntimeError: 6de41ce0 (uncategorized)
  - Fallback for other test failure rules: 3f920f3d (nosignal)
  - pytest failure: f179b719 (ghfirst)
  - pytest failure: 92409b6c (uncategorized)
  - pytest failure: d1b4e0fa (nosignal)
  - pytest failure: 099d0d61 (uncategorized)
  - pytest failure: c79c7bbe (nosignal)
  - pytest failure: c95f7fa8 (nosignal)
  - pytest failure: 08dae945 (weird)
  - pytest failure: fb75dea2 (uncategorized)
  - pytest failure: 9de23d0c (uncategorized)
  - pytest failure: 830a335a (ghfirst)
  - pytest failure: 6d3a4356 (ignoredsignal)
  - pytest failure: a6a3a441 (nosignal)
  - Python Test timeout (KeyboardInterrupt): 8142a028 (nosignal)
  - pr_time_benchmarks regression: 2b9d638e (weird)
  - pytest failure: dc5e8f79 (nosignal)
  - pytest failure: 5264f8cd (weird)
  - pytest failure: 8823138e (nosignal)
  - pytest failure: f154f9b3 (ignoredsignal)
  - pr_time_benchmarks regression: b07725a9 (ghfirst)
  - pr_time_benchmarks regression: d4d0ede6 (ignoredsignal)
  - Build error: 2596e3d0 (nosignal)
  - pytest failure: c1f531f0 (ghfirst)
  - GHA error: c6b4f986 (nosignal)
  - GHA error: 529e0357 (nosignal)
  - GHA error: e694280d (ghfirst)
  - GHA error: e1180c72 (weird)
  - GHA error: 7dcc77e4 (nosignal)
  - GHA error: a3098a74 (ignoredsignal)
  - GHA error: a14f427d (nosignal)
  - GHA error: bee9c70c (nosignal)
  - GHA error: 409c396a (ghfirst)
  - GHA error: 67fb9b7c (nosignal)
  - GHA error: 1b50c125 (nosignal)
  - pytest failure: 196c95d4 (nosignal)
```
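The category extraction and the headline percentages can be sketched in a few lines of Python. This is an illustration, not the code in `autorevert_checker.py`: it assumes the category appears in the revert comment as a `-c <category>` (or `--classification <category>`) flag, and the helper names `extract_category` and `summarize` are hypothetical. The numbers fed to `summarize` are the ones printed in the summary above, and the sketch follows that summary in dividing the 50 matched reverts by the 93 non-ghfirst reverts.

```python
import re
from collections import Counter

KNOWN_CATEGORIES = {"nosignal", "ignoredsignal", "landrace", "weird", "ghfirst"}
# Assumed comment shape: '... revert -m "reason" -c nosignal'
_CATEGORY_RE = re.compile(r"(?:^|\s)(?:-c|--classification)[\s=]+(\w+)")


def extract_category(comment: str) -> str:
    """Return the revert category named in a comment, or 'uncategorized'."""
    match = _CATEGORY_RE.search(comment)
    if match and match.group(1).lower() in KNOWN_CATEGORIES:
        return match.group(1).lower()
    return "uncategorized"


def summarize(categories: list, patterns: int, matched: int) -> dict:
    """Compute precision and ghfirst-excluded recall as percentages."""
    counts = Counter(categories)
    non_ghfirst = len(categories) - counts["ghfirst"]
    return {
        "non_ghfirst": non_ghfirst,
        "precision_pct": round(100 * matched / patterns, 1),
        "recall_pct": round(100 * matched / non_ghfirst, 1),
    }


# Category counts copied from the summary above (121 reverts in total).
categories = (["nosignal"] * 46 + ["ghfirst"] * 28 + ["uncategorized"] * 21
              + ["ignoredsignal"] * 16 + ["weird"] * 9 + ["landrace"] * 1)
stats = summarize(categories, patterns=419, matched=50)
print(stats)  # {'non_ghfirst': 93, 'precision_pct': 11.9, 'recall_pct': 53.8}
```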

v20250630-183403

[Pytorch AutoRevert] - Improves autorevert check heuristics (#6853)

Makes some improvements to the back analysis for the revert logic, with the goal of improving precision and recall and validating it as a viable strategy.

Checked against the workflows: pull trunk inductor linux-binary-manywheel

Old code:
```
Timeframe: 720 hours
Commits checked: 6177
Auto revert patterns detected: 188
Actual reverts inside auto revert patterns detected: 24 (12.8%)
Total revert commits in period: 115
Reverts that dont match any auto revert pattern detected: 91
```

Newer code:
```
Workflow(s): pull, trunk, inductor, linux-binary-manywheel
Timeframe: 720 hours
Commits checked: 5403
Auto revert patterns detected: 442
Actual reverts inside auto revert patterns detected (precision): 48 (10.9%)
Total revert commits in period: 115
Reverts that dont match any auto revert pattern detected (recall): 67 (58.3%)
Per workflow precision:
  pull: 45 reverts out of 411 patterns (10.9%)
  trunk: 1 reverts out of 8 patterns (12.5%)
  inductor: 2 reverts out of 20 patterns (10.0%)
  linux-binary-manywheel: 0 reverts out of 3 patterns (0.0%)
```

Critical implemented changes:
* Look forward and back for the first commit that ran the failed job, instead of always trusting the commit right before or right after.
* Job names have parts we don't care about, like shard indices. As a failure could happen in any shard, we want to find any shard with the same failure.

Things I tried that didn't lead to great results:
* ignoring error classification - too low precision, no significant increase in recall
* not requiring error repetition - too low precision, no significant increase in recall

My take:

With a precision of 10%, the cost of re-running jobs to confirm redness status is justified. Even though it is not possible to test, I suspect that requiring the same output 2 times for all 3 signals should raise the precision to a very high standard. Unfortunately, the only way to test this is to run it in shadow mode.

With a recall of 55%, it points to being able to capture **most** of the introduced trunk redness errors. Lots of reverts might not be caused by CI redness, especially not in the workflows we are analyzing (could be performance degradation, GHF/internal reasons and many others). This number seems comfortable to provide a substantial gain in benefit for CI quality.
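The two critical changes listed in this commit message can be sketched as follows. This is a minimal illustration under stated assumptions, not the checker's actual code: it assumes job names look like `build / test (config, shard, num_shards, runner)`, that commits are held in a list ordered oldest to newest, and that each commit carries a set of normalized job names. Both helper names and the data layout are hypothetical.

```python
import re
from typing import Optional

# Assumed shape: "... (default, 1, 5, linux.4xlarge)" where "1, 5" is the shard index pair.
_SHARD_RE = re.compile(r",\s*\d+,\s*\d+")


def normalize_job_name(name: str) -> str:
    """Drop the shard index pair so all shards of a job compare equal."""
    return _SHARD_RE.sub("", name, count=1)


def find_nearest_commit_with_job(commits: list, start: int, step: int, job: str) -> Optional[dict]:
    """Walk from `start` in direction `step` (+1 or -1) to the first commit that ran `job`."""
    i = start + step
    while 0 <= i < len(commits):
        if job in commits[i]["jobs"]:
            return commits[i]
        i += step
    return None  # no commit in that direction ran the job


job = normalize_job_name("linux-jammy / test (default, 1, 5, linux.4xlarge)")
commits = [
    {"sha": "a", "jobs": {job}},
    {"sha": "b", "jobs": set()},  # the job was skipped on this commit
    {"sha": "c", "jobs": {job}},
]
print(job)
print(find_nearest_commit_with_job(commits, 1, -1, job)["sha"])
```

Searching in both directions skips over commits where the job never ran, which is the case that looking only at the immediate neighbor would miss.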

v20250630-164255

runners: Revert things related to batch termination (#6868)

This reverts the following PRs:
* #6859
* #6858
* #6855
* #6854
* #6852

These were causing issues where scale-down was too aggressively scaling down instances, leading to runners not being refreshed by scale-up.

I do think the SSM expiration stuff is worth a re-do though, but there were merge conflicts so I have to revert the entire thing.

v20250627-203612

runners: Fix lint (#6859)

There were some outstanding lint issues from previous PRs.

Fixes the lint and formatting.

Signed-off-by: Eli Uriegas <eliuriegas@meta.com>

v20250627-202541

[ez][docs] Add wiki maintenance magic strings to aws/lambda/readme (#6856)

As in title

Also
* switches some things to permalinks
* some capitalization

Most of this was written by yang, so I can't take credit even though the wiki maintenance script will say otherwise

v20250627-200622

runners: make ssm policy an array (#6858)

Fixes an issue where the SSM parameter policies were not being set correctly.

Resulted in errors like:

ValidationException: Invalid policies input: {"Type":"Expiration","Version":"1.0","Attributes":{"Timestamp":"2025-06-27T19:11:55.437Z"}}.

Signed-off-by: Eli Uriegas <eliuriegas@meta.com>
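The error above shows a single policy object being passed where SSM expects a JSON array of policies. A minimal Python sketch of the corrected shape follows; it only builds the string and does not call AWS. The runner code itself is not shown in this commit message, so the function name `expiration_policies` is hypothetical.

```python
import json
from datetime import datetime, timezone


def expiration_policies(expires_at: datetime) -> str:
    """Serialize one Expiration policy wrapped in a JSON array."""
    dt = expires_at.astimezone(timezone.utc)
    # ISO 8601 with millisecond precision and a trailing Z, as in the error message above.
    timestamp = dt.strftime("%Y-%m-%dT%H:%M:%S") + f".{dt.microsecond // 1000:03d}Z"
    policy = {
        "Type": "Expiration",
        "Version": "1.0",
        "Attributes": {"Timestamp": timestamp},
    }
    return json.dumps([policy])  # the list wrapper is the fix: an array, not a bare object


when = datetime(2025, 6, 27, 19, 11, 55, 437000, tzinfo=timezone.utc)
policies = expiration_policies(when)
print(policies)
```

The resulting string is what would be supplied as the `Policies` value when creating the parameter.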

v20250627-185904

[log classifier] Rule for graph break registry check (#6837)

For failures like [GH job link](https://github.com/pytorch/pytorch/actions/runs/15859789097/job/44714997710) [HUD commit link](https://hud.pytorch.org/pytorch/pytorch/commit/c1ad4b8e7a16f54c35a3908b56ed7d9f95eef586)

Currently matches ` ##[error]Process completed with exit code 1.` but there is a better line `Found the unimplemented_v2 or unimplemented_v2_with_warning calls below that don't match the registry in graph_break_registry.json.`
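The idea of a more specific rule taking precedence over the generic exit-code line can be sketched in Python. This is an illustration only and does not reflect the log classifier's real rule format or matching order; the rule names, the `RULES` list, and the assumption that earlier rules win are all hypothetical.

```python
import re
from typing import Optional, Tuple

# Hypothetical rule table: earlier entries are assumed to take priority.
RULES = [
    (
        "Graph break registry check",
        re.compile(
            r"Found the unimplemented_v2 or unimplemented_v2_with_warning calls below "
            r"that don't match the registry in graph_break_registry\.json"
        ),
    ),
    ("GHA error", re.compile(r"##\[error\]Process completed with exit code \d+")),
]


def classify(log_lines: list) -> Optional[Tuple[str, int]]:
    """Return (rule name, 1-based line number) for the highest-priority matching rule."""
    for name, pattern in RULES:
        for line_no, line in enumerate(log_lines, 1):
            if pattern.search(line):
                return name, line_no
    return None


log = [
    "Running registry check",
    "Found the unimplemented_v2 or unimplemented_v2_with_warning calls below "
    "that don't match the registry in graph_break_registry.json.",
    "##[error]Process completed with exit code 1.",
]
print(classify(log))  # ('Graph break registry check', 2)
```

With the specific rule listed first, the log is attributed to the registry mismatch rather than to the generic exit-code line that follows it.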

v20250627-183532

runners: Add expiration policy to SSM parameters (#6855)

Instead of doing expensive cleanups we can rely on SSM parameter policies to do the cleanup for us!

This is a workaround to avoid the need to do expensive cleanup of SSM parameters.

Signed-off-by: Eli Uriegas <eliuriegas@meta.com>