- Notifications
You must be signed in to change notification settings - Fork1.1k
chore: remove stats batcher and workspace usage tracker#13393
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
Uh oh!
There was an error while loading.Please reload this page.
mafredri commentedMay 29, 2024
I was under the impression that |
johnstcn commentedMay 29, 2024
Yep, up to 1024 stats updates. |
johnstcn left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I'm happy to remove this if it's not enough of a net positive!
f0ssel commentedMay 29, 2024
The batcher is totally valid, but my concern is that we are only saving 1 db call when reporting the stat already takes 7-8 db calls to get to that point, and continues to use the DB afterwards too. In return we lose the ability to return an error on failed writes, and a good bit of code to manage this buffer. |
f0ssel commentedMay 30, 2024
Going to continue removing some batchers on this branch and going to scaletest the impact to see if these changes are safe to merge |
mafredri commentedMay 30, 2024 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
I see, that's fair reasoning. I believe writing to this table (agent stats) used to be more expensive as we kept 6 months of data in it. Now we only keep about 1 day. Edit: Still, there's a big difference between read and write queries. I assume most of those 8 DB calls are read-only? |
f0ssel commentedMay 30, 2024 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
@mafredri Thanks for bringing this up, I do share your concerns since it's a write operation, so I've went ahead and made an alternate PR (#13418) where we keep these structures in place under the |
mafredri commentedMay 30, 2024
👍🏻
If we find that they're not beneficial, I'm fine with removing them entirely. We can always dig in the git history if we realize we need them back. |
| errGroup.Go(func()error { | ||
| err:=r.opts.StatsBatcher.Add(now,workspaceAgent.ID,workspace.TemplateID,workspace.OwnerID,workspace.ID,stats) | ||
| start:=time.Now() | ||
| connectionsByProto:=json.RawMessage(`[]`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I wonder if[{}] would be a more accurate representation here? 🤔 Not sure it's necessary though.
johnstcn left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
After performing a 1,000 user scaletest and comparing with results from yesterday. Given a GCPdb2-custom-8 instance with 100GB of disk:
- Before:
- With no active connections, seeing DB CPU usage of around 80% on a GCP
db2-custom-8instance. - With active SSH connections, seeing DB CPU usage of around 90%
- InsertWorkspaceAgentStats averaging 1.2 q/s
- With no active connections, seeing DB CPU usage of around 80% on a GCP
- After:
- With no active connections, seeing DB CPU usage of around 86%
- With active SSH connections, seeing DB CPU usage maxing out.
- InsertWorkspaceAgentStats averaging 33.3 q/s
The increased queries during active connections is likely due to the agents only reporting stats if there are active connections.
I can't be sure of the wider ramifications of this change, but it does definitely cause increased database load in a scaletest environment. I would recommend against merging this.
f0ssel commentedMay 31, 2024
Thanks for this data, glad to close and happy to know these are working as intended! |
It's become clear that only
workspacestats.ReportAgentStat()is the only caller ofStatsBatcher.Add(). Using the stats batcher saves us a DB call, but only 1 call is saved and many are still called outside of the batcher for reporting a stat. I don't think the StatsBatcher caching abstraction is serving us anymore, so I think it should be removed in favor of writing directly to the DB when called.