- Notifications
You must be signed in to change notification settings - Fork1.1k
feat: add hard-limited presets metric#18008
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
32f43de to39748bdCompare39748bd to6fb6800Comparef6ea98b to2667684Compare8db12a2 to3100e01CompareUh oh!
There was an error while loading.Please reload this page.
| key:=hardLimitedPresetKey{orgName:orgName,templateName:templateName,presetName:presetName} | ||
| mc.isPresetHardLimited[key]=isHardLimited |
ssncferreiraMay 26, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
nit: With this approach, we’re accumulating entries that are no longer relevant over time. This can cause the metrics to keep reporting0 for presets that should disappear (e.g., deleted or outdated presets).
A small improvement would be to remove presets from the map whenisHardLimited == false:
if isHardLimited {mc.isPresetHardLimited[key] = true} else {delete(mc.isPresetHardLimited, key)}This keeps the metric data cleaner and avoids unnecessary entries in Prometheus.
But there is a catch 🤔 With this change, there’s a brief window where Prometheus may still show the previous value (1) for a preset after it’s been removed from the map. This is expected behavior in Prometheus, metrics are not deleted immediately, but expire after some time without being reported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Good catch, fixed here:80c89fe.
I was under a false assumption that it will report stale value indefinitely.
ssncferreira left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Overall looks good to me, just a note on the metrics setting: removing entries from the map whenisHardLimited keeps metrics clean and reduce cardinality. The trade-off is that Prometheus will show stale data briefly after removal, since metrics aren’t deleted immediately but expire after some time without being reported. This behavior is expected and I think acceptable in this context.
ssncferreira left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
LGTM! 🚀 The comments in the tests are especially helpful, they make it really easy to follow the reconciliation process and understand the logic behind each condition.
Just one small suggestion regarding the metric description.
Uh oh!
There was an error while loading.Please reload this page.
Co-authored-by: Susana Ferreira <ssncferreira@gmail.com>
2a15aa8 intomainUh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Closes#17988
Define
preset_hard_limitedmetric which for every preset indicates whether a given preset has reached the hard failure limit (1 for hard-limited, 0 otherwise).CLI example:
NOTE:
Only active template version is tracked. If admin creates new template version - old value of metric (for previous template version) will be overwritten with new value of metric (for active template version).
Because
template_versionis not part of metric:Implementation is similar to implementation of
MetricResourceReplacementsCountmetric