Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

feat(coderd/database): track user status changes over time#16019

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
SasSwart merged 32 commits intomainfromjjs/dau-history-backend
Jan 13, 2025

Conversation

SasSwart
Copy link
Contributor

@SasSwartSasSwart commentedJan 3, 2025
edited
Loading

RE:#15740,#15297

In order to add a graph to the coder frontend to show user status over time as an indicator of license usage, this PR adds the following:

  • a newapi.insightsUserStatusCountsOverTime endpoint to the API
  • which calls a newGetUserStatusCountsOverTime query from postgres
  • which relies on two new tablesuser_status_changes anduser_deleted
  • which are populated by a new trigger and function that tracks updates to the users table

The graph itself will be added in a subsequent PR

Copy link
Member

@EmyrkEmyrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Small nit, can we use the wordChart instead ofGraph in various comments when referring to the frontend display?

For a moment I was wondering why we needed a graph data structure.


On deleting a user more than once. We should enforce this at the db_schema level to not allow it if it's going to break our logic. An index could be made on theuser_deleted(user_id).

Comment on lines 24 to 28
CREATETABLEuser_deleted (
id uuidPRIMARY KEY DEFAULT gen_random_uuid(),
user_id uuidNOT NULLREFERENCES users(id),
deleted_attimestamptzNOT NULL DEFAULTCURRENT_TIMESTAMP
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Will we ever be able to "un-delete" a user? Would is be easy to implement this to handle that case if we ever support it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Why we don't implement this as a user status change? Perhaps as an additional field on the table or modify the enum to include deleted. Wrt to changing the enum, it doesn't make much sense to me to differentiate between e.g. deleted/active or deleted/dormant users (or whatever the possibilities are), so changing it makes sense IMO.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

That is true. I assume we can still make an index with a conditional expression onstatus == deleted. We might want to have some trigger or something to prevent "un-deletion" if we want to prevent that.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I considered these options, but avoided them simply because they expand the scope beyond a reasonable time cost for this feature. We could relatively easy migrate from this to a space wheredeleted is just another user status.

I'm trying to remain agnostic of the massive rabbit-hole shaped question of what the business rule should be around un/re-deletion.

The trigger and function right now would handle un/re-deletion correctly, but the query would break as identified by@Emyrk in another comment below.

I would prefer that we stick to the current implementation because it is good enough and I can't find anywhere that we support un/re-deletion right now. If y'all would like to explicitly request a specific change I'd be happy to consider it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

@SasSwart can we open an issue to address un-deletion and mention this query if we decide to support it?

Just to track it

Comment on lines +798 to +799
WHERE changed_at> @start_time::timestamptz
AND changed_at< @end_time::timestamptz
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

If it is inclusive, should it be?

Suggested change
WHERE changed_at> @start_time::timestamptz
AND changed_at< @end_time::timestamptz
WHERE changed_at>= @start_time::timestamptz
AND changed_at<= @end_time::timestamptz

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I think that's covered by the initial and final union.

Personally I prefer start_time inclusive, end_time exclusive though, as most of the time that mirrors the intuitive understanding of data. Say we want to view Monday to Sunday (1 week), we ultimately want Monday 00:00 -> Sunday 23:59 but it's simplest to request this as Monday 00:00 -> Monday 00:00 instead of subtracting an arbitrary unit of time or using an end-of-day function. Alternatively, if we include Monday 00:00 at the end, we include data from the beyond the range.

Copy link
ContributorAuthor

@SasSwartSasSwartJan 9, 2025
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I think that's covered by the initial and final union.

Correct.

Monday 00:00 -> Monday 00:00 instead of subtracting an arbitrary unit of time or using an end-of-day function

I think the inclusive approach is good enough. There are higher impact changes to make. If you'd like me to change it, please ask me directly to do so and I will. My own opinion is that for how this chart is going to be displayed inclusive works well enough.

mafredri reacted with thumbs up emoji
usc.new_status,
usc.changed_at
FROM user_status_changes usc
LEFT JOIN user_deleted udONusc.user_id=ud.user_id
Copy link
Member

@EmyrkEmyrkJan 3, 2025
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

If a user is deleted twice in it's history (should not be possible, but would be nice to be durable against), then this join will duplicate the status changed rows for each deletion event.

We could make the table a subquery that returns 1 row, or we should enforce a user can only be deleted once.

Minimal example:https://www.db-fiddle.com/f/4jyoMCicNSZpjMt4jFYoz5/15847

Copy link
Member

@mafredrimafredri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Nice work and good job adding all that test coverage. A few suggestions and questions, but nothing major.

Comment on lines 24 to 28
CREATETABLEuser_deleted (
id uuidPRIMARY KEY DEFAULT gen_random_uuid(),
user_id uuidNOT NULLREFERENCES users(id),
deleted_attimestamptzNOT NULL DEFAULTCURRENT_TIMESTAMP
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Why we don't implement this as a user status change? Perhaps as an additional field on the table or modify the enum to include deleted. Wrt to changing the enum, it doesn't make much sense to me to differentiate between e.g. deleted/active or deleted/dormant users (or whatever the possibilities are), so changing it makes sense IMO.

Comment on lines +798 to +799
WHERE changed_at> @start_time::timestamptz
AND changed_at< @end_time::timestamptz
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I think that's covered by the initial and final union.

Personally I prefer start_time inclusive, end_time exclusive though, as most of the time that mirrors the intuitive understanding of data. Say we want to view Monday to Sunday (1 week), we ultimately want Monday 00:00 -> Sunday 23:59 but it's simplest to request this as Monday 00:00 -> Monday 00:00 instead of subtracting an arbitrary unit of time or using an end-of-day function. Alternatively, if we include Monday 00:00 at the end, we include data from the beyond the range.

rsc1.new_status
FROM dates_of_interest d
LEFT JOIN relevant_status_changes rsc1ONrsc1.changed_at<=d.date
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I'm a bit worried about the number of CTEs and performance, but I think this is fine for now and let's not prematurely optimize. Just raising this so you're aware that in some cases a CTE performs worse than a pure query with joins and subqueries. That's mainly because the result of a CTE lacks indexes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I was wondering about performance too, but user count and user status changes is going to be way less data than say workspace builds. Like on the order of 1000s. 🤷

Might be worth a benchmark, but we don't have a good way to populate "large" datasets atm.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I did it this way for legibility. Trying to write the optimal solution here made my own query inscrutable even to myself. I don't consider this to be on a particularly hot path. We do have metrics for this query. If it needs to be optimised, we can do so.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Just raising this so you're aware that in some cases a CTE performs worse than a pure query with joins and subqueries

Also these CTEs were designed to reduce the amount of data being handled as early as possible. We use the existing indices while we have them to identify exactly the data we need ASAP and then once we need to join it all the hope is that we've brought down the volume low enough that it's performant regardless of the lack of indices on CTEs.

@SasSwartSasSwartforce-pushed thejjs/dau-history-backend branch from699ee8a to8fca0c5CompareJanuary 9, 2025 10:48
@SasSwartSasSwart merged commit4543b21 intomainJan 13, 2025
36 checks passed
@SasSwartSasSwart deleted the jjs/dau-history-backend branchJanuary 13, 2025 11:08
@github-actionsgithub-actionsbot locked and limited conversation to collaboratorsJan 13, 2025
Sign up for freeto subscribe to this conversation on GitHub. Already have an account?Sign in.

Reviewers

@EmyrkEmyrkEmyrk left review comments

@mafredrimafredrimafredri approved these changes

Assignees

@SasSwartSasSwart

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

3 participants

@SasSwart@mafredri@Emyrk

[8]ページ先頭

©2009-2025 Movatter.jp