Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

chore: remove stats batcher and workspace usage tracker#13393

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Closed
f0ssel wants to merge6 commits intomainfromf0ssel/remove-stats-batcher-2

Conversation

f0ssel
Copy link
Contributor

It's become clear that onlyworkspacestats.ReportAgentStat() is the only caller ofStatsBatcher.Add(). Using the stats batcher saves us a DB call, but only 1 call is saved and many are still called outside of the batcher for reporting a stat. I don't think the StatsBatcher caching abstraction is serving us anymore, so I think it should be removed in favor of writing directly to the DB when called.

@f0sself0ssel requested a review fromjohnstcnMay 29, 2024 18:12
@mafredri
Copy link
Member

I was under the impression thatbatchstats buffers up to a certain amount of updates, so let's say 100 workspaces all report their agent stats at the same time, all 100 stats would be written in one go. Is this not the case? (Please note, I'm not making a case for or against the batcher here, just wondering if the premise is correct?)

@johnstcn
Copy link
Member

I was under the impression thatbatchstats buffers up to a certain amount of updates, so let's say 100 workspaces all report their agent stats at the same time, all 100 stats would be written in one go. Is this not the case? (Please note, I'm not making a case for or against the batcher here, just wondering if the premise is correct?)

Yep, up to 1024 stats updates.

johnstcn
johnstcn previously approved these changesMay 29, 2024
Copy link
Member

@johnstcnjohnstcn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I'm happy to remove this if it's not enough of a net positive!

@f0ssel
Copy link
ContributorAuthor

The batcher is totally valid, but my concern is that we are only saving 1 db call when reporting the stat already takes 7-8 db calls to get to that point, and continues to use the DB afterwards too. In return we lose the ability to return an error on failed writes, and a good bit of code to manage this buffer.

@f0ssel
Copy link
ContributorAuthor

Going to continue removing some batchers on this branch and going to scaletest the impact to see if these changes are safe to merge

@f0sself0ssel changed the titlechore: remove stats batcherchore: remove stats batcher and workspace usage trackerMay 30, 2024
@mafredri
Copy link
Member

mafredri commentedMay 30, 2024
edited
Loading

The batcher is totally valid, but my concern is that we are only saving 1 db call when reporting the stat already takes 7-8 db calls to get to that point, and continues to use the DB afterwards too. In return we lose the ability to return an error on failed writes, and a good bit of code to manage this buffer.

I see, that's fair reasoning. I believe writing to this table (agent stats) used to be more expensive as we kept 6 months of data in it. Now we only keep about 1 day.

Edit: Still, there's a big difference between read and write queries. I assume most of those 8 DB calls are read-only?

@f0ssel
Copy link
ContributorAuthor

f0ssel commentedMay 30, 2024
edited
Loading

@mafredri Thanks for bringing this up, I do share your concerns since it's a write operation, so I've went ahead and made an alternate PR (#13418) where we keep these structures in place under theworkspacestats.Reporter. I'd also like your perspective on scale testing: If we do a scale test and find that these batchers are not showing any impact (that we measure), would you still want to keep them for posterity or would you be in favor of removing them?

@f0sself0ssel requested a review frommafredriMay 30, 2024 19:45
@mafredri
Copy link
Member

@mafredri Thanks for bringing this up, I do share your concerns since it's a write operation, so I've went ahead and made an alternate PR (#13418) where we keep these structures in place under theworkspacestats.Reporter.

👍🏻

I'd also like your perspective on scale testing: If we do a scale test and find that these batchers are not showing any impact (that we measure), would you still want to keep them for posterity or would you be in favor of removing them?

If we find that they're not beneficial, I'm fine with removing them entirely. We can always dig in the git history if we realize we need them back.

@@ -146,11 +147,45 @@ func (r *Reporter) ReportAgentStats(ctx context.Context, now time.Time, workspac

var errGroup errgroup.Group
errGroup.Go(func() error {
err := r.opts.StatsBatcher.Add(now, workspaceAgent.ID, workspace.TemplateID, workspace.OwnerID, workspace.ID, stats)
start := time.Now()
connectionsByProto := json.RawMessage(`[]`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I wonder if[{}] would be a more accurate representation here? 🤔 Not sure it's necessary though.

Copy link
Member

@johnstcnjohnstcn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

After performing a 1,000 user scaletest and comparing with results from yesterday. Given a GCPdb2-custom-8 instance with 100GB of disk:

  • Before:
    • With no active connections, seeing DB CPU usage of around 80% on a GCPdb2-custom-8 instance.
    • With active SSH connections, seeing DB CPU usage of around 90%
    • InsertWorkspaceAgentStats averaging 1.2 q/s
  • After:
    • With no active connections, seeing DB CPU usage of around 86%
    • With active SSH connections, seeing DB CPU usage maxing out.
    • InsertWorkspaceAgentStats averaging 33.3 q/s

The increased queries during active connections is likely due to the agents only reporting stats if there are active connections.

I can't be sure of the wider ramifications of this change, but it does definitely cause increased database load in a scaletest environment. I would recommend against merging this.

@johnstcnjohnstcn dismissed theirstale reviewMay 31, 2024 13:21

scaletest results

@f0ssel
Copy link
ContributorAuthor

Thanks for this data, glad to close and happy to know these are working as intended!

@f0sself0ssel closed thisMay 31, 2024
@github-actionsgithub-actionsbot locked and limited conversation to collaboratorsMay 31, 2024
@github-actionsgithub-actionsbot deleted the f0ssel/remove-stats-batcher-2 branchDecember 1, 2024 00:07
Sign up for freeto subscribe to this conversation on GitHub. Already have an account?Sign in.
Reviewers

@johnstcnjohnstcnjohnstcn left review comments

@mafredrimafredrimafredri approved these changes

Assignees

@f0sself0ssel

Labels
None yet
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

3 participants
@f0ssel@mafredri@johnstcn

[8]ページ先頭

©2009-2025 Movatter.jp