
Add performance tests #1868


Draft

alexeyr-ci2 wants to merge 38 commits into master from alexeyr/performance-tests
Conversation

@alexeyr-ci2 (Collaborator) commented Oct 20, 2025 · edited by justin808

Summary

Add performance tests. Fixes #1777.

Pull Request checklist

  • [ ] Add/update test to cover these changes
  • [ ] Update documentation
  • [ ] Update CHANGELOG file


@coderabbitai (Contributor)

coderabbitai bot commented Oct 20, 2025 · edited

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status option to false in the CodeRabbit configuration file.


@alexeyr-ci2 changed the title from "Alexeyr/performance tests" to "Add performance tests" on Oct 20, 2025
@alexeyr-ci force-pushed the alexeyr/performance-tests branch from 980a424 to 4a01be4 on October 20, 2025 19:01
@alexeyr-ci2 force-pushed the alexeyr/performance-tests branch 5 times, most recently from ec26e73 to a616cd1 on November 5, 2025 11:54
@claude

claude bot commented Nov 5, 2025

Code Review - PR #1868: Add Performance Tests

Thanks for this contribution! This is a solid implementation of performance testing infrastructure.


✅ Strengths

  1. Comprehensive Tool Coverage: Using three industry-standard tools (Fortio, Vegeta, k6) provides good cross-validation
  2. Well-Structured Workflow: Clear separation of concerns with logical step grouping
  3. Good Error Handling: Proper use of set -euo pipefail and validation steps
  4. Caching Strategy: Smart use of GitHub Actions cache for benchmark tool binaries
  5. Security Conscious: SSH access properly gated with warnings and actor-only access
  6. Flexible Configuration: Parameterized inputs allow customization

🐛 Critical Issues

  1. Node.js Version Mismatch (benchmark.yml:155)

  2. Missing Trailing Newlines (CRITICAL per CLAUDE.md)

    • Files spec/dummy/bin/prod and spec/dummy/bin/prod-assets missing final newlines
    • These will cause CI failures; run rake autofix to fix them
  3. Race Condition in Server Startup (benchmark.yml:220)

    • Background server might not be fully initialized when curl check succeeds
    • Recommend adding health check endpoint or verifying multiple successful responses
  4. Incomplete PR Checklist

    • Documentation and CHANGELOG boxes unchecked
    • Per CLAUDE.md: Should update CHANGELOG for user-visible features
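For issue 3, the "multiple successful responses" idea could be sketched as a bash helper like the one below (the function name, counts, and URL are illustrative, not taken from the PR):

```shell
# wait_until_ready CHECK_CMD NEEDED MAX_TRIES
# Succeeds (returns 0) once CHECK_CMD has passed NEEDED times in a row.
wait_until_ready() {
  local check_cmd="$1" needed="$2" max_tries="$3" ok=0 i
  for ((i = 0; i < max_tries; i++)); do
    if $check_cmd; then
      ok=$((ok + 1))
      if ((ok >= needed)); then return 0; fi
    else
      ok=0  # reset on any failure: we want consecutive successes
    fi
    sleep 0.2
  done
  return 1
}

# In the workflow this might be invoked as (URL illustrative):
#   wait_until_ready "curl -sf -o /dev/null http://localhost:3000/" 3 60
```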

⚠️ Performance & Quality Issues

  1. bench.sh:66 - Unrealistic iteration count assumes 10 iterations/sec per VU
  2. Fortio Optimization Missing - Referenced guide not implemented
  3. No Resource Monitoring - Missing CPU/memory metrics
  4. Missing Test Validation - No thresholds to detect performance regressions
  5. Single Route Testing - Only tests server_side_hello_world_hooks
  6. Complex Validation Logic - Lines 17-27 hard to read
  7. Inconsistent Quoting - Variables not consistently quoted
  8. Hard-coded Timeouts - 30s and 60s timeouts not configurable

📚 Documentation Issues

  1. Missing Documentation - No README for running benchmarks locally
  2. Workflow Triggers - Runs on ALL PRs (expensive, should use labels/manual only)
  3. No Historical Tracking - Results not compared over time

🎯 Before Merging

  1. Fix Node.js version (22 → 20)
  2. Add trailing newlines
  3. Run bundle exec rubocop
  4. Update CHANGELOG.md
  5. Add performance thresholds
  6. Update workflow triggers

📊 Summary

Assessment: Good foundation but needs critical fixes before merge

Effort: Critical issues ~30 mins, High priority ~1-2 hours

Main concerns: Node.js version mismatch and missing trailing newlines will cause CI failures.

Let me know if you need help with any of these!

@alexeyr-ci2 force-pushed the alexeyr/performance-tests branch 5 times, most recently from 040c0a5 to ad56e14 on November 5, 2025 15:49
@claude

claude bot commented Nov 5, 2025

PR Review: Add Performance Tests (#1868)

Summary

This PR adds comprehensive performance testing infrastructure using three industry-standard benchmarking tools (Fortio, Vegeta, and k6). It is a well-structured implementation addressing issue #1777.

Critical Issues

1. Missing Trailing Newlines

CRITICAL per CLAUDE.md: Files MUST end with a newline character or CI will fail:

  • spec/dummy/bin/prod (line 4)
  • spec/dummy/bin/prod-assets (line 9)
  • spec/performance/bench.sh (line 202)

2. RuboCop Required

CRITICAL per CLAUDE.md: Must run bundle exec rubocop and fix ALL violations before commit.

Potential Bugs

K6 JSON Parsing (bench.sh:187-192)

The jq expression uses add, which returns null on an empty array and breaks the downstream arithmetic. Use add // 0 for safety.
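The empty-array behavior is easy to confirm with jq directly:

```shell
# `add` on an empty array yields null, which breaks later arithmetic;
# `add // 0` substitutes a numeric default.
echo '[]'      | jq 'add'        # null
echo '[]'      | jq 'add // 0'   # 0
echo '[1,2,3]' | jq 'add // 0'   # 6
```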

Server Startup Race (benchmark.yml:254)

Uses exit 0 inside the loop, which exits the entire step. Should use break instead.
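A minimal sketch of the corrected pattern: break only leaves the polling loop, so the rest of the step still runs (the health-check condition below is a stand-in, not the PR's actual curl call).

```shell
# Poll until ready, then fall through to the remaining steps.
ready=false
for i in 1 2 3 4 5; do
  if [ "$i" -ge 3 ]; then  # stand-in for "curl health check succeeded"
    ready=true
    break                  # `break` leaves the loop; `exit 0` would end the whole step
  fi
  sleep 0.1
done

if ! $ready; then
  echo "server never became ready" >&2
  exit 1
fi
echo "continuing with benchmark steps"
```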

npm vs yarn (benchmark.yml:200)

Uses npm install --global yalc, but CLAUDE.md requires yarn-only commands.

Code Quality Strengths

  1. Excellent tool selection - three benchmarking tools for cross-validation
  2. Well-structured workflow with clear sections
  3. Good error handling with set -euo pipefail
  4. Comprehensive parameterization via workflow inputs
  5. Smart binary caching for faster CI
  6. Security-conscious SSH access gating

Suggestions

  1. Shell portability: Replace bc with awk for float comparisons
  2. Hardcoded route: Make server_side_hello_world_hooks configurable
  3. Duration: 30s may be too short, consider 60s for stable measurements
  4. Add documentation for local benchmark execution
  5. Consider performance regression detection vs master
  6. Add spec/performance/README.md with interpretation guide

Test Coverage

Good: Infrastructure in place, runs on PR/push, results preserved as artifacts
Missing: No bash script tests, no performance thresholds, no local testing docs

Verdict

Solid PR with high code quality. Performance testing infrastructure will be valuable.

Required Before Merge:

  1. Fix trailing newlines (will cause CI failure)
  2. Run bundle exec rubocop and fix violations
  3. Fix K6 empty array bug

Recommended:

  1. Add basic documentation
  2. Verify workflow passes on CI
  3. Consider CHANGELOG entry

Great work overall!

@alexeyr-ci2 force-pushed the alexeyr/performance-tests branch 2 times, most recently from e350a34 to 7b9d621 on November 6, 2025 10:25
@claude

claude bot commented Nov 6, 2025

PR Review: Add Performance Tests

Thank you for this comprehensive performance testing implementation! This is a well-structured addition to the repository. Here's my detailed feedback:


Strengths

  1. Comprehensive Tool Coverage: Great choice using three different load testing tools (Fortio, Vegeta, k6) to get diverse perspectives on performance
  2. Excellent Error Handling: The bash script has robust parameter validation and error handling throughout
  3. Smart Caching: GitHub Actions workflow efficiently caches binaries to speed up future runs
  4. Good Documentation: Clear comments, parameter descriptions, and helpful echo statements
  5. Configurable Parameters: Extensive workflow_dispatch inputs allow flexible testing scenarios
  6. Production-Ready Setup: Proper Puma configuration with workers and preloading for production benchmarks

🐛 Potential Issues

1. Script Portability - bc Dependency (bench.sh:21,33)

The script uses bc for floating-point comparisons but bc is not installed in the workflow.

Fix: Add bc to the apt package installation step in the workflow, or use bash-native arithmetic for integer comparisons.
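One possible awk-based replacement for the bc float comparison (variable names and threshold values are illustrative):

```shell
# awk exits 0 when the comparison holds, so it can drive `if` directly.
P99="1.87"
THRESHOLD="2.0"
if awk -v a="$P99" -v b="$THRESHOLD" 'BEGIN { exit !(a < b) }'; then
  echo "p99 within threshold"
else
  echo "p99 exceeds threshold"
fi
```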

2. Race Condition in Server Startup (benchmark.yml:248)

The server process is backgrounded but there is no guarantee it has started before the polling loop begins. If the server fails to start immediately, the loop might miss error messages.

Suggestion: Add a brief sleep 2 before the polling loop, or capture the PID and check if the process is still running during the timeout loop.
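The PID-checking suggestion could be sketched as a helper like this (the function name and the curl command are illustrative, not from the PR):

```shell
# wait_for_server PID CHECK_CMD MAX_TRIES
# Fails immediately if the server process dies; otherwise polls CHECK_CMD.
wait_for_server() {
  local pid="$1" check_cmd="$2" max_tries="$3" i
  for ((i = 0; i < max_tries; i++)); do
    if ! kill -0 "$pid" 2>/dev/null; then
      echo "server process $pid died during startup" >&2
      return 1
    fi
    $check_cmd && return 0
    sleep 0.2
  done
  return 1
}

# Workflow usage (illustrative):
#   bin/prod & SERVER_PID=$!
#   sleep 2   # let the process boot or crash before polling
#   wait_for_server "$SERVER_PID" "curl -sf -o /dev/null http://localhost:3000/" 60
```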

3. k6 Status Reporting Edge Case (bench.sh:199-204)

If .root_group.checks is empty, add returns null, causing K6_REQS_OTHER calculation to fail.

Fix: Add null handling by changing the jq expression to use | add // 0

4. Missing Newline at End of Files

Per CLAUDE.md requirements: ALWAYS ensure files end with a newline character

Files needing newlines:

  • spec/dummy/bin/prod (line 4)
  • spec/dummy/bin/prod-assets (line 9)
  • spec/performance/bench.sh (line 214)

Fix: Run rake autofix or add trailing newlines manually.

5. Benchmark Workflow Runs on Every PR (benchmark.yml:57-58)

Running 30-second benchmarks with 3 tools on every PR will consume significant CI minutes and slow down PR feedback.

Suggestion:

  • Remove pull_request trigger and only keep workflow_dispatch + push to master
  • Or add a condition to only run on PRs with a specific label (e.g., performance)
  • Or significantly reduce default duration (e.g., 10s) for PR runs

🔍 Code Quality & Best Practices

Security

  • ✅ SSH access is properly gated behind workflow_dispatch input with clear warnings
  • ✅ limit-access-to-actor: true prevents unauthorized access
  • ⚠️ Consider adding a comment warning maintainers never to merge code while SSH session is active

Performance

  • ✅ Binary caching will significantly speed up repeated runs
  • ✅ Server warm-up phase (10 requests) is good practice
  • ⚠️ REQUEST_TIMEOUT=60s is very generous - consider 30s default to catch timeout issues faster

Maintainability

  • ✅ Clear section comments in YAML
  • ✅ Parameterized configuration
  • ⚠️ Consider extracting the jq parsing logic (lines 167-209) into a separate function for readability

📊 Test Coverage

Missing:

  • ❌ No automated tests for the bench.sh script itself
  • ❌ No validation that the summary.txt format is correct
  • ❌ No performance regression detection (just data collection)

Suggestions:

  1. Add a unit test for bench.sh parameter validation
  2. Consider storing benchmark results over time and comparing against baselines
  3. Add a step to post benchmark results as a PR comment for visibility

🔧 Configuration Issues

Puma Configuration (spec/dummy/config/puma.rb:38)

Workflow sets default to 4 workers (WEB_CONCURRENCY: 4), but Puma defaults to 2 if not set. The workflow default should match.

Consistency check needed: Verify that WEB_CONCURRENCY=4 is intentional for CI runners with 2 CPU cores.


📝 Documentation & Changelog

Per CLAUDE.md: Update CHANGELOG.md for user-visible changes only (features, bug fixes, breaking changes, deprecations, performance improvements)

This PR qualifies as it adds a performance testing feature. Needs:

  • ✅ Changelog entry format: PR 1868 by alexeyr-ci2: Add performance benchmarking workflow with Fortio, Vegeta, and k6.

Missing:

  • ❌ No README/docs update explaining how to run benchmarks locally
  • ❌ No documentation on interpreting benchmark results

🎯 Recommendations

Before Merge (Critical):

  1. ✅ Add trailing newlines to all new files
  2. ✅ Run bundle exec rubocop and fix any violations
  3. ✅ Add CHANGELOG entry
  4. ✅ Fix k6 null handling edge case
  5. ✅ Reconsider pull_request trigger or reduce duration

Nice to Have:
6. Add bc to dependencies or remove floating-point support
7. Add brief README section on running benchmarks
8. Handle server startup race condition
9. Consider performance regression detection


🎓 Overall Assessment

This is a solid, production-ready implementation of performance testing. The code quality is high, with excellent error handling and configurability. The main concerns are:

  1. CI cost - running on every PR may be expensive
  2. Missing documentation - users won't know how to use this
  3. File formatting - needs trailing newlines per repo standards

Recommendation: Request changes for the critical items (newlines, changelog, PR trigger), then approve.


Score: 8/10 - Great implementation, minor issues to address.

@alexeyr-ci2 force-pushed the alexeyr/performance-tests branch from 7b9d621 to f2b9ef8 on November 7, 2025 13:13
@claude

claude bot commented Nov 7, 2025

Pull Request Review: Add Performance Tests

Thank you for this comprehensive performance testing implementation! This is a valuable addition to the project. Below is my detailed review:


Strengths

  1. Multi-tool approach: Using Fortio, Vegeta, and K6 provides cross-validation of results and flexibility for different testing scenarios.

  2. Well-structured workflow: The GitHub Actions workflow is logically organized with clear steps and helpful comments.

  3. Configurable parameters: Extensive configurability via workflow inputs allows fine-tuning for different performance scenarios.

  4. Good error handling: The bench.sh script includes comprehensive input validation and error messages.

  5. Caching: Binary caching for tools reduces workflow execution time.

  6. Production mode testing: Tests against production-like configuration (Puma workers, precompiled assets) for realistic results.


🔍 Issues and Concerns

Critical Issues

  1. Missing file newlines (spec/dummy/bin/prod, spec/dummy/bin/prod-assets)

    • Location: spec/dummy/bin/prod:4, spec/dummy/bin/prod-assets:9
    • Impact: CI will fail per CLAUDE.md requirements
    • Fix: Ensure both files end with a newline character
    • Reference: CLAUDE.md states "ALWAYS ensure files end with a newline character"
  2. RuboCop not run

    • Impact: MANDATORY requirement before commits
    • Required action: Run bundle exec rubocop and fix ALL violations
    • Reference: CLAUDE.md: "BEFORE EVERY COMMIT/PUSH: ALWAYS run bundle exec rubocop and fix ALL violations"

High Priority Issues

  1. Workflow runs on every PR/push to master (benchmark.yml:55-58)

    • Impact: Performance tests are resource-intensive and will slow down CI significantly
    • Recommendation: Only run on manual workflow_dispatch, specific labels (e.g., run-benchmarks), or scheduled runs (e.g., nightly)
  2. Server process not properly cleaned up (benchmark.yml:248)

    • Issue: Server started in background with bin/prod & but never explicitly killed
    • Impact: May leave orphaned processes, especially if workflow fails
    • Fix: Add cleanup step or use a trap/signal handler
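A trap registered right after backgrounding the server covers every exit path, including failures under set -e. In this self-contained sketch, sleep 30 stands in for the real bin/prod command:

```shell
# Register an EXIT trap immediately after backgrounding the server so it is
# killed even when a later benchmark step fails.
sleep 30 &                      # stand-in for `bin/prod &`
SERVER_PID=$!
trap 'kill "$SERVER_PID" 2>/dev/null || true' EXIT

# ... benchmark steps run here; the trap fires on success, failure, or signal.
```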

Medium Priority Issues

  1. Hardcoded route in bench.sh (line 6)

    • Issue: The default route (ROUTE defaults to server_side_hello_world_hooks) may not exist or may not be representative
    • Recommendation: Verify this route exists in the dummy app, consider testing multiple routes, and document what this route tests
  2. No baseline comparison

    • Issue: Results are collected but not compared against previous runs
    • Enhancement: Consider using benchmark-action to track performance over time
  3. Missing documentation

    • Issue: No README or documentation for how to run benchmarks locally, interpret results, or understand what "good" performance looks like
    • Recommendation: Add spec/performance/README.md

🔒 Security Considerations

  1. SSH access via tmate (benchmark.yml:108-113)
    • Status: Properly gated behind workflow input (default: false)
    • Good: Includes security warnings and limit-access-to-actor

🧪 Test Coverage

Missing:

  • No unit tests for the bash script logic
  • No tests verifying the benchmark workflow succeeds
  • No validation that summary output format is correct

Recommendations:

  • Add a dry-run mode to bench.sh for testing
  • Consider shellcheck for bash script validation

📝 Suggested Changes

Must Fix (Blocks Merge)

  • Add trailing newlines to spec/dummy/bin/prod and spec/dummy/bin/prod-assets
  • Run bundle exec rubocop and fix all violations
  • Change workflow triggers to avoid running on every PR/push

Should Fix (High Priority)

  • Add server cleanup/shutdown in workflow
  • Verify the benchmark route exists and is representative
  • Add basic documentation (spec/performance/README.md)

Nice to Have

  • Add baseline comparison/tracking
  • Add shellcheck to CI for bash scripts
  • Consider shorter default duration for PR checks
  • Extract K6 test to separate file

Changelog

Per CLAUDE.md, this should be added to CHANGELOG.md as it's a new feature.


🎯 Overall Assessment

This is a solid foundation for performance testing with good multi-tool coverage and configurability. The main blockers are CLAUDE.md compliance issues (trailing newlines, RuboCop). Once those are fixed and the workflow triggers are adjusted to avoid running on every PR, this will be ready to merge.

Estimated effort to address critical issues: ~30 minutes

Great work overall! 🚀

@alexeyr-ci2 force-pushed the alexeyr/performance-tests branch from c58f480 to 43f8858 on November 7, 2025 13:36
@claude

claude bot commented Nov 7, 2025

Code Review: Add Performance Tests

Thank you for this comprehensive performance testing addition! This is a well-structured PR that addresses issue #1777.


✅ Strengths

1. Excellent Tool Coverage

  • Multiple benchmarking tools (Fortio, Vegeta, k6) provide good cross-validation
  • Smart caching strategy for tool binaries reduces CI time
  • Flexible configuration via environment variables

2. Robust Error Handling

  • Good use of set -euo pipefail in bash scripts
  • Comprehensive input validation with clear error messages
  • Server health checks with timeout protection

3. Production-Ready Configuration

  • Proper Puma clustering setup for production benchmarks
  • Preloading and worker configuration aligned with best practices
  • Appropriate separation of prod/dev configurations

🔒 Security Concerns - CRITICAL

SSH Access Feature (lines 96-119 in .github/workflows/benchmark.yml)

This poses significant security risks:

Issues:

  1. Detached mode risk: Using detached: true leaves the SSH session open indefinitely
  2. Repository exposure: The entire repository is accessible via SSH
  3. No audit trail: Limited visibility into commands executed during SSH sessions
  4. Secrets exposure risk: If secrets are added later, they could be exposed

Recommendations:

  • Consider removing this feature entirely for production workflows
  • If kept: Remove detached: true, add timeout-minutes: 15, and restrict to non-master branches
  • Document that NO secrets should ever be added to this workflow

🐛 Potential Bugs

1. Division by Zero Risk (spec/performance/bench.sh:195-197)

If K6_REQS_KNOWN_STATUS is null/empty, this could fail. Add validation.

2. Race Condition in Server Startup (spec/dummy/bin/prod)

The rails s command starts but there is no guarantee assets are fully loaded. Consider adding a readiness check that validates asset serving.

3. Missing Error Context

When benchmarks fail, the error message does not indicate which tool failed. Consider wrapping each tool execution with proper error context.

4. Hardcoded Paths (spec/performance/bench.sh:87)

If run from wrong directory, this could create directories in unexpected locations. Consider using SCRIPT_DIR to make paths relative to the script location.
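The standard idiom is to anchor paths to the script's own directory (the results directory name below is illustrative):

```shell
# Resolve the script's directory so output paths do not depend on the
# caller's working directory.
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]:-$0}")" && pwd)"
RESULTS_DIR="${SCRIPT_DIR}/results"   # illustrative output directory
mkdir -p "$RESULTS_DIR"
echo "writing results to $RESULTS_DIR"
```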


⚡ Performance Considerations

  1. Workflow runs on every push/PR: For a 20-minute benchmark suite, consider only running on workflow_dispatch and push to master, or add path filters to skip documentation changes.

  2. Sequential warmup (spec/performance/bench.sh:73-78): Consider parallel warmup for efficiency.

  3. Triple tool execution: Running all 3 tools provides validation but triples execution time. Consider defaulting to k6 only.


🧪 Test Coverage - Missing

  1. Unit tests for benchmark script: Input validation and JSON parsing logic should be tested
  2. Workflow validation: Consider adding actionlint to CI
  3. Integration test: Should verify benchmark artifacts are generated correctly

📝 Documentation - REQUIRED

Missing per CLAUDE.md requirements:

  1. CHANGELOG.md update - This is a user-visible feature that developers will use. Action needed: Update CHANGELOG.md

  2. Inline documentation: spec/performance/bench.sh needs header comment explaining:

    • Purpose
    • Usage examples
    • Required dependencies
    • Expected output format
  3. Workflow documentation: Add comment block in .github/workflows/benchmark.yml explaining when to run benchmarks and how to interpret results


🎨 Code Quality Suggestions

Good Practices:

  • Consistent error handling
  • Input validation with clear messages
  • Proper use of environment variables with defaults
  • JSON output for machine-readable results

Suggestions:

  1. YAML Anchors: Lines 7-14 define anchors but only use them once - consider simplifying
  2. DRY Principle: Lines 165-208 have very similar blocks for each tool - consider refactoring
  3. Magic Numbers: Several hardcoded values should be configurable (TIMEOUT_SEC=60, warmup count, sleep durations)
  4. Puma Configuration: Default of 2 workers may be too low - document why this was chosen
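For the magic-numbers point, the hard-coded values could become environment overrides with defaults via parameter expansion (variable names are illustrative):

```shell
# Callers override these via the environment; otherwise the previous
# hard-coded values remain the defaults.
TIMEOUT_SEC="${TIMEOUT_SEC:-60}"
WARMUP_REQUESTS="${WARMUP_REQUESTS:-10}"
STARTUP_SLEEP="${STARTUP_SLEEP:-2}"
echo "timeout=${TIMEOUT_SEC}s warmup=${WARMUP_REQUESTS} startup-sleep=${STARTUP_SLEEP}s"
```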

📋 Action Items Summary

🔴 High Priority (Before Merge):

  1. Address SSH security concerns - Remove or significantly restrict
  2. Add CHANGELOG.md entry (required per project guidelines)
  3. Fix potential null reference in K6 status parsing
  4. Add workflow path filters to prevent unnecessary runs

🟡 Medium Priority (Before Merge):

  1. Add header documentation to bench.sh
  2. Add error context to benchmark failures
  3. Fix hardcoded paths in bash script

🟢 Low Priority (Future Enhancement):

  1. Add unit tests for benchmark script
  2. Refactor duplicate code in parsing logic
  3. Make magic numbers configurable

✨ Overall Assessment

This is a solid foundation for performance testing! The multi-tool approach and comprehensive workflow setup show good engineering practices.

Main concerns:

  1. Security: SSH feature needs significant changes or removal
  2. Documentation: Missing CHANGELOG and inline docs (required per CLAUDE.md)
  3. Robustness: A few edge cases in error handling

Recommendation: Request changes for security and documentation issues, then approve after those are addressed.

Great work on this comprehensive testing infrastructure!

@alexeyr-ci2 force-pushed the alexeyr/performance-tests branch 8 times, most recently from be4af9e to 564b8b6 on November 7, 2025 20:20