- Notifications
You must be signed in to change notification settings - Fork907
feat: add hard-limited presets metric#18008
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
32f43de
to39748bd
Compare39748bd
to6fb6800
Comparef6ea98b
to2667684
Compare8db12a2
to3100e01
CompareUh oh!
There was an error while loading.Please reload this page.
key := hardLimitedPresetKey{orgName: orgName, templateName: templateName, presetName: presetName} | ||
mc.isPresetHardLimited[key] = isHardLimited |
ssncferreiraMay 26, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
nit: With this approach, we’re accumulating entries that are no longer relevant over time. This can cause the metrics to keep reporting0
for presets that should disappear (e.g., deleted or outdated presets).
A small improvement would be to remove presets from the map whenisHardLimited == false
:
if isHardLimited {mc.isPresetHardLimited[key] = true} else {delete(mc.isPresetHardLimited, key)}
This keeps the metric data cleaner and avoids unnecessary entries in Prometheus.
But there is a catch 🤔 With this change, there’s a brief window where Prometheus may still show the previous value (1
) for a preset after it’s been removed from the map. This is expected behavior in Prometheus, metrics are not deleted immediately, but expire after some time without being reported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Good catch, fixed here:80c89fe.
I was under a false assumption that it will report stale value indefinitely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Overall looks good to me, just a note on the metrics setting: removing entries from the map whenisHardLimited
keeps metrics clean and reduce cardinality. The trade-off is that Prometheus will show stale data briefly after removal, since metrics aren’t deleted immediately but expire after some time without being reported. This behavior is expected and I think acceptable in this context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
LGTM! 🚀 The comments in the tests are especially helpful, they make it really easy to follow the reconciliation process and understand the logic behind each condition.
Just one small suggestion regarding the metric description.
Uh oh!
There was an error while loading.Please reload this page.
Co-authored-by: Susana Ferreira <ssncferreira@gmail.com>
2a15aa8
intomainUh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Closes#17988
Define
preset_hard_limited
metric which for every preset indicates whether a given preset has reached the hard failure limit (1 for hard-limited, 0 otherwise).CLI example:
NOTE:
Only active template version is tracked. If admin creates new template version - old value of metric (for previous template version) will be overwritten with new value of metric (for active template version).
Because
template_version
is not part of metric:Implementation is similar to implementation of
MetricResourceReplacementsCount
metric