feat(coderd/database/dbpurge): add retention for audit logs #21025
base: mafredri/feat-coderd-db-retention-policy-2
Conversation
Force-pushed 07ae594 to 9ca58c3
Force-pushed 782f1f7 to 039afdb
Force-pushed 9ca58c3 to a21395a
dannykopping left a comment
LGTM, except for batch size; I won't need to re-review.
```go
// locks that could impact concurrent database operations.
connectionLogsBatchSize = 1000
// Batch size for audit log deletion.
auditLogsBatchSize = 1000
```
For a sufficiently large deployment, would this even be able to keep up? 1000 entries every 10m would probably not be enough. Deletes are pretty quick and have granular locks, so I'd say go far more aggressive than this.
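For context on the "granular locks" point: Postgres `DELETE` has no `LIMIT` clause, so batch-limited purges are typically expressed with a key subquery. A minimal sketch of that pattern is below; the column names and the 90-day interval are assumptions for illustration, not the PR's actual query.

```sql
-- One purge tick: delete at most one batch of expired audit logs.
-- The batch is bounded via a key subquery because Postgres DELETE
-- does not support LIMIT directly. Each batch only locks the rows
-- it touches, so concurrent inserts/reads proceed normally.
DELETE FROM audit_logs
WHERE id IN (
    SELECT id
    FROM audit_logs
    WHERE "time" < now() - interval '90 days'
    ORDER BY "time"
    LIMIT 1000
);
```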
That's fair criticism. I could just bump this to 10k, or in a follow-up PR implement one of the following (a rough sketch of the first option follows this list):
- If the backlog exceeds the batch size, decrease the wait time (say, 10m -> 5s)
- Check the backlog and adjust the batch size accordingly (catch up within N time)
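A sketch of the first option, assuming the purge loop reports how many rows the last tick deleted; all names and values here are illustrative, not the PR's code:

```go
package dbpurge

import "time"

const (
	auditLogsBatchSize = 1000             // rows deleted per tick
	normalInterval     = 10 * time.Minute // regular purge cadence
	catchUpInterval    = 5 * time.Second  // while draining a backlog
)

// nextInterval shortens the wait while a backlog is being drained:
// a full batch deleted on the last tick suggests more rows remain.
func nextInterval(lastDeleted int64) time.Duration {
	if lastDeleted >= auditLogsBatchSize {
		return catchUpInterval
	}
	return normalInterval
}
```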
I'd also want to update dbpurge to not run all deletes in a single transaction, but given the lock/replica sync that's a bit tricky.
I don't think we need anything sophisticated like that, which would break predictability.
10K per tick should be fine, even for large installations.
Once we have coder/internal#1139, and we include the total number of records to be purged, operators could see if purge is not keeping up.
For now I think it's easy enough to just add a 0 and call it a day.
Nice, wasn't aware of that issue 👍🏻
Add configurable retention policy for audit logs. The DeleteOldAuditLogs query excludes deprecated connection events (connect, disconnect, open, close), which are handled separately by DeleteOldAuditLogConnectionEvents. Falls back to global retention if audit logs retention is unset. Disabled (0) by default.

Depends on #21021
Updates #20743
Audit logs retention is now explicit: it's enabled when --audit-logs-retention is set to a non-zero duration, and disabled when set to 0. No fallback to global retention.
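In other words, a sketch of the described semantics; the function and callback names here are hypothetical, not the PR's actual code:

```go
package dbpurge

import "time"

// maybePurgeAuditLogs applies the explicit retention semantics:
// zero (the default) disables audit log purging entirely, and the
// global retention setting is never consulted as a fallback.
// purgeBefore stands in for the real delete query (hypothetical).
func maybePurgeAuditLogs(retention time.Duration, now time.Time, purgeBefore func(time.Time)) {
	if retention <= 0 {
		return
	}
	purgeBefore(now.Add(-retention))
}
```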
Use :execrows instead of :one to simplify the query by removing the extra CTE wrapper. This lets PostgreSQL return the row count directly via RowsAffected() instead of requiring an explicit COUNT(*) scan.
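For illustration, the shape of that change; a sketch of the two sqlc annotations, not the PR's exact queries, and the parameter name is an assumption:

```sql
-- Before: :one needs a CTE wrapper just to count deleted rows.
-- name: DeleteOldAuditLogs :one
WITH deleted AS (
    DELETE FROM audit_logs
    WHERE "time" < now() - @retention_period::interval
    RETURNING 1
)
SELECT COUNT(*) FROM deleted;

-- After: :execrows makes sqlc generate a method returning
-- (int64, error) backed by RowsAffected(), so no CTE or
-- COUNT(*) scan is needed.
-- name: DeleteOldAuditLogs :execrows
DELETE FROM audit_logs
WHERE "time" < now() - @retention_period::interval;
```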
Force-pushed a21395a to 82f1c2b
Add configurable retention policy for audit logs. The DeleteOldAuditLogs query excludes deprecated connection events (connect, disconnect, open, close), which are handled separately by DeleteOldAuditLogConnectionEvents. Disabled (0) by default.

Depends on #21021
Updates #20743
PR Stack
- workspace_agent_logs