RFC: Insights page for Coder admins #8109
-
I have had discussions with a few users to gather valuable insights on the data they find interesting to have. Primarily, they are focused on Coder engagement and detecting failures/errors. To enhance visibility in these areas, we can have a dashboard with the following components:
In terms of error/failure detection, we can implement the following features:
These additions aim to provide valuable insights and help our customers identify engagement patterns and potential issues. A preview of how it should look:
The features above are only the initial set. In a second version, we also expect to have the following features:
These additions will be included in the second version to further enhance our insights and improve the overall user experience.
Back-end
I will wait for approval from @bpmct and @mtojek on the feature proposal before describing the back-end (BE) requirements for developing this screen.
Replies: 10 comments 22 replies
-
I like this first iteration. The first steps and the next steps are pretty clear to me. What do you think about also switching the order of items in the sidebar (failed actions next to failed builds, active users next to the DAU chart)?
-
Makes total sense.
-
Should this be per-template instead of global? Or maybe we allow for a filter...
-
Idk, I still think the insights belong to the deployment and not to the template. E.g., if I want to see user activity, I would like to see it per deployment and not per template; if I want to see the number of workspaces in a failed state, I would look at the whole deployment so I can see if this is specific to a template, etc. I see value in scoping by template, I just think deployment metrics give the user a better view of what is happening.
-
@bpmct what are popular parameters?
👀 1
-
Like which rich parameters are most used with a template. Specifically:
All are rich parameters at the moment, but there's no way to see an aggregate/summary of what people use.
-
Ahhhhh I see. But this would be by template, right?
-
Yes. Per template :)
-
Here's a query that returns the connection latency in milliseconds for each user, grouped by template:

```sql
SELECT
	user_id,
	template_id,
	coalesce((PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY connection_median_latency_ms)), -1)::FLOAT AS workspace_connection_latency_50,
	coalesce((PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY connection_median_latency_ms)), -1)::FLOAT AS workspace_connection_latency_95
FROM workspace_agent_stats
WHERE connection_median_latency_ms > 0
GROUP BY user_id, template_id;
```

This would let us easily understand which users are having a bad experience with specific templates. It's important to group by template because some templates might be region limited.
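To make the percentile math in that query easier to reason about, here is a minimal Python sketch of the same aggregation. It assumes the stats arrive as `(user_id, template_id, latency_ms)` tuples; the function names `percentile_cont` and `latency_by_user_and_template` are hypothetical and not part of Coder. The interpolation follows the linear-interpolation behavior of PostgreSQL's `PERCENTILE_CONT`.

```python
import math
from collections import defaultdict


def percentile_cont(values, fraction):
    """Continuous percentile with linear interpolation between neighbors."""
    ordered = sorted(values)
    if not ordered:
        # Mirrors the coalesce(..., -1) fallback used in the query.
        return -1.0
    position = fraction * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    weight = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * weight


def latency_by_user_and_template(rows):
    """Group positive latencies by (user_id, template_id) and compute P50/P95."""
    groups = defaultdict(list)
    for user_id, template_id, latency_ms in rows:
        # Same filter as WHERE connection_median_latency_ms > 0.
        if latency_ms > 0:
            groups[(user_id, template_id)].append(latency_ms)
    return {
        key: {
            "P50": percentile_cont(latencies, 0.5),
            "P95": percentile_cont(latencies, 0.95),
        }
        for key, latencies in groups.items()
    }


# Made-up sample rows, for illustration only.
rows = [
    ("u1", "t1", 10.0),
    ("u1", "t1", 20.0),
    ("u1", "t1", 30.0),
    ("u1", "t1", -1.0),  # filtered out, like in the query
    ("u2", "t1", 5.0),
]
result = latency_by_user_and_template(rows)
```

Because the grouping key includes the template, a user who is fast on one template and slow on a region-limited one shows up as two separate entries.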
-
Since this is planned as an improvement after the first release, we can put some thought into it later. cc: @bpmct Maybe this should live in an "insights" page inside the template page and not in the "deployment" insights.
-
I like the initial concept for the Insights page. We can decide to add more visualizations later, but the graphs you placed on the mockup are good candidates for the MVP of the feature. I'm wondering if we should make them draggable, so DevOps can rearrange or show/hide a few of them.
👍 What you presented here is good enough to start drafting these requirements. I'm curious if we should pack all relevant endpoints behind the API family
-
Making them draggable adds complexity because you would have to be able to resize the areas as well. I like the idea of putting the insights endpoints under the insights route.
👍 1
-
So what I'm understanding is that we want to have two insights pages, one for the deployment and another one for a given template. Is that correct, @bpmct @kylecarbs? If yes, do you think we could start by developing the one for the deployment, since we already have the mock?
-
I prefer bars instead of lines for the 1st and 3rd plots.
❤️ 1
-
As part of this RFC, I'd like to see a draft of the public backend APIs as early as possible. (Posting here so I don't forget.)
👍 1
-
Agreed. In v1 we wrote ad hoc SQL queries for the few metrics we supported. In v2, we are seemingly doing the same: metrics are gathered and queried on a case-by-case basis. It would be ideal to leverage APIs similar to those of Prometheus or another standard time-series DB. Maybe we can even leverage some library that sits on top of Postgres. Whatever we come up with should be easy to expand and adapt to new requirements (adding labels for filters, changing the query from, say, daily to weekly, etc.). The SQL queries in v1 were very hard to maintain, as they were long and difficult to follow and test.
-
This is a proposal for the backend API of RFC: Insights page for Coder admins. This proposal introduces a single endpoint for reporting template insights (or deployment-wide insights, given no template filter). The motivation behind this is to simplify the API and reduce the number of requests needed to get all the data for the insights page. Outside the scope of the proposal, this format can also help ensure data consistency between weekly/daily intervals (for instance, when viewing this week and new data came in between the two requests). This also lets us handle concurrency on the server side instead of the client performing multiple concurrent requests. We would introduce the following endpoint, request and response:

```json
{
  "report": {
    "start_time": "2023-07-01T00:00:00.000000Z",
    "end_time": "2023-07-08T00:00:00.000000Z",
    "templates": ["uuid1", "uuid2"],
    "active_users": 22,
    "user_latency": [
      {
        "user_id": "fcb9f5c7-ad6d-4515-b12e-496bc04ca116", // Optional, useful for linking.
        "name": "John Doe",
        "connection_latency_ms": { "P50": 5.601, "P95": 16.352049999999984 }
      },
      {
        "user_id": "aee4bef9-479f-488e-abb4-b2bce2bf9e0d",
        "name": "Jane Doe",
        "connection_latency_ms": { "P50": 31.312, "P95": 119.832 }
      }
    ],
    "usage_builtin": {
      "vscode": {
        // TODO: Name + icon here too, to simplify the UI?
        "seconds": 54000
      },
      "jetbrains": { "seconds": 900 },
      "web-terminal": { "seconds": 5400 },
      "ssh": { "seconds": 10800 }
    },
    "usage_apps": [
      {
        // As long as name/slug/icon match, we can merge these between multiple templates.
        "display_name": "code-server",
        "slug": "code-server",
        "icon": "/icon/code.svg",
        "seconds": 10800
      }
    ],
    "usage_parameters": [
      {
        // As long as name/slug match, we can merge these between multiple templates.
        "display_name": "Coder Repository Directory",
        "name": "coder_repository_directory",
        "values": [
          { "value": "~/coder", "icon": "", "count": 10 },
          { "value": "~/coder.com", "icon": "", "count": 2 }
        ]
      },
      {
        "display_name": "Dotfiles URL",
        "name": "dotfiles_url",
        "values": [
          { "value": "~/usr/.file", "icon": "", "count": 10 },
          { "value": null, "icon": "", "count": 2 }
        ]
      },
      {
        "display_name": "Region",
        "name": "region",
        "values": [
          { "value": "Pittsburgh", "icon": "/icon/flag1.svg", "count": 8 },
          { "value": "Helsinki", "icon": "/icon/flag1.svg", "count": 2 },
          { "value": "Sydney", "icon": "/icon/flag3.svg", "count": 1 },
          { "value": "Sao Paulo", "icon": "/icon/flag4.svg", "count": 1 }
        ]
      }
    ]
  },
  "interval_reports": [
    {
      "start_time": "2023-07-01T00:00:00.000000Z",
      "end_time": "2023-07-02T00:00:00.000000Z",
      "templates": ["uuid1", "uuid2"],
      "interval": "day",
      "active_users": 19
    },
    {
      "start_time": "2023-07-02T00:00:00.000000Z",
      "end_time": "2023-07-03T00:00:00.000000Z",
      ...
    },
    {...}, {...}, {...}, {...}, {...}
  ]
}
```

Note: One logical split that could be done here is to separate For now, our interval reporting requirements are slim, and we only need this data for We can introduce this endpoint in stages, where we start with a single KPI or a few KPIs and expand upon it as we go. The first stage would be to introduce the endpoint with the following KPIs (they are all based on the same existing data source):
This data is available, but we need to write queries to pull it out:
We currently don't track the following, which will require storing the data and querying it:
-
The interface looks good!
-
👍🏻
Ok, this is good to know. One question comes to mind: If this was a view of deployment, would we want to be able to show app/parameter usage there as well? If yes, I think it makes sense keeping as is, but if no, then we can simplify this data to be for one template only.
Sounds good 👍🏻
I think this is sensible. Maybe we should change this from the get-go? Like this:

```json
"usage": [
  {
    "type": "builtin",
    "display_name": "Visual Studio Code", // Could be omitted/let frontend decide.
    "slug": "vscode",
    "icon": "/icon/vscode.svg", // Could be omitted/let frontend decide.
    "seconds": 54000
  },
  {
    "type": "app",
    "display_name": "code-server",
    "slug": "code-server",
    "icon": "/icon/code.svg",
    "seconds": 10800
  }
],
```

This format is conducive to introducing new data in the UI without needing frontend changes: the backend can simply add to the array and it would show up.
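As a rough illustration of why a flat `usage` array needs no consumer changes when entries are added, here is a minimal Python sketch of a generic renderer. The function name `summarize_usage` and the output format are hypothetical; a real frontend would be written differently, but the idea of iterating without special-casing any `type` or `slug` is the same.

```python
def summarize_usage(usage):
    """Return display rows sorted by time spent, most used first."""
    rows = []
    for entry in sorted(usage, key=lambda e: e["seconds"], reverse=True):
        # Fall back to the slug if the backend omits display_name.
        label = entry.get("display_name") or entry["slug"]
        hours = entry["seconds"] / 3600
        rows.append(f"{label} ({entry['type']}): {hours:.1f}h")
    return rows


# Entries taken from the example above; a new entry added by the backend
# would appear in the output without any change to summarize_usage.
usage = [
    {"type": "app", "display_name": "code-server", "slug": "code-server",
     "icon": "/icon/code.svg", "seconds": 10800},
    {"type": "builtin", "display_name": "Visual Studio Code", "slug": "vscode",
     "icon": "/icon/vscode.svg", "seconds": 54000},
]
rows = summarize_usage(usage)
```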
I was thinking we could track activity on the proxied app URL (e.g. |
-
Thanks for fleshing out the API sketch! I reviewed it, and here is a set of questions I have:
-
Thanks for the feedback, @mtojek!
👍 1
-
Here is an updated proposal, based on the feedback: The

```json
{
  "report": {
    "start_time": "2023-07-01T00:00:00.000000Z",
    "end_time": "2023-07-08T00:00:00.000000Z",
    "template_ids": ["uuid1", "uuid2"],
    "active_users": 22,
    "usage_apps": [
      {
        "template_ids": ["uuid1", "uuid2"],
        "type": "builtin",
        "display_name": "Visual Studio Code",
        "slug": "vscode",
        "icon": "/icon/vscode.svg",
        "seconds": 54000
      },
      {
        "template_ids": ["uuid1", "uuid2"],
        "type": "builtin",
        "display_name": "JetBrains",
        "slug": "jetbrains",
        "icon": "/icon/jetbrains.svg",
        "seconds": 900
      },
      {
        "template_ids": ["uuid1", "uuid2"],
        "type": "builtin",
        "display_name": "Web Terminal",
        "slug": "web-terminal",
        "icon": "/icon/terminal.svg",
        "seconds": 5400
      },
      {
        "template_ids": ["uuid1", "uuid2"],
        "type": "builtin",
        "display_name": "SSH",
        "slug": "ssh",
        "icon": "/icon/ssh.svg",
        "seconds": 10800
      },
      {
        "template_ids": ["uuid1", "uuid2"],
        "type": "app",
        "display_name": "code-server",
        "slug": "code-server",
        "icon": "/icon/code.svg",
        "seconds": 10800
      }
    ],
    "usage_parameters": [
      {
        "template_ids": ["uuid1", "uuid2"],
        "display_name": "Coder Repository Directory",
        "name": "coder_repository_directory",
        "values": [
          { "value": "~/coder", "icon": "", "count": 10 },
          { "value": "~/coder.com", "icon": "", "count": 2 }
        ]
      },
      {
        "template_ids": ["uuid2"],
        "display_name": "Dotfiles URL",
        "name": "dotfiles_url",
        "values": [
          { "value": "~/usr/.file", "icon": "", "count": 10 },
          { "value": null, "icon": "", "count": 2 }
        ]
      },
      {
        "template_ids": ["uuid1"],
        "display_name": "Region",
        "name": "region",
        "values": [
          { "value": "Pittsburgh", "icon": "/icon/flag1.svg", "count": 8 },
          { "value": "Helsinki", "icon": "/icon/flag1.svg", "count": 2 },
          { "value": "Sydney", "icon": "/icon/flag3.svg", "count": 1 },
          { "value": "Sao Paulo", "icon": "/icon/flag4.svg", "count": 1 }
        ]
      }
    ]
  },
  "interval_reports": [
    {
      "start_time": "2023-07-01T00:00:00.000000Z",
      "end_time": "2023-07-02T00:00:00.000000Z",
      "template_ids": ["uuid1", "uuid2"],
      "interval": "day",
      "active_users": 19
    },
    {
      "start_time": "2023-07-02T00:00:00.000000Z",
      "end_time": "2023-07-03T00:00:00.000000Z",
      ...
    },
    {...}, {...}, {...}, {...}, {...}
  ]
}
```

User latency is its own endpoint that supports filtering on template; this allows us to easily support pagination as needed.

```json
{
  "report": {
    "start_time": "2023-07-01T00:00:00.000000Z",
    "end_time": "2023-07-08T00:00:00.000000Z",
    "template_ids": ["uuid1", "uuid2"],
    "latency": [
      {
        "template_ids": ["uuid1"],
        "user_id": "fcb9f5c7-ad6d-4515-b12e-496bc04ca116",
        "name": "John Doe",
        "connection_latency_ms": { "P50": 5.601, "P95": 16.352049999999984 }
      },
      {
        "template_ids": ["uuid2"],
        "user_id": "aee4bef9-479f-488e-abb4-b2bce2bf9e0d",
        "name": "Jane Doe",
        "connection_latency_ms": { "P50": 31.312, "P95": 119.832 }
      }
    ]
  }
}
```
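The per-entry `template_ids` in the `usage_apps` list implies that matching entries from several templates get folded into one. Here is a minimal Python sketch of how such a merge could work, assuming the backend starts from one row per template with a hypothetical `template_id` field and merges rows whose `type`, `slug`, `display_name` and `icon` all match. The function name `merge_usage_apps` and the input shape are illustrative assumptions, not the actual implementation.

```python
def merge_usage_apps(per_template_rows):
    """Merge per-template usage rows that describe the same app."""
    merged = {}
    for row in per_template_rows:
        # Assumption: two rows are "the same app" when all four fields match.
        key = (row["type"], row["slug"], row["display_name"], row["icon"])
        if key not in merged:
            merged[key] = {
                "template_ids": [],
                "type": row["type"],
                "display_name": row["display_name"],
                "slug": row["slug"],
                "icon": row["icon"],
                "seconds": 0,
            }
        entry = merged[key]
        entry["seconds"] += row["seconds"]
        if row["template_id"] not in entry["template_ids"]:
            entry["template_ids"].append(row["template_id"])
    # Dicts preserve insertion order, so output order follows first appearance.
    return list(merged.values())


# Made-up per-template rows, for illustration only.
per_template_rows = [
    {"template_id": "uuid1", "type": "builtin", "display_name": "Visual Studio Code",
     "slug": "vscode", "icon": "/icon/vscode.svg", "seconds": 30000},
    {"template_id": "uuid2", "type": "builtin", "display_name": "Visual Studio Code",
     "slug": "vscode", "icon": "/icon/vscode.svg", "seconds": 24000},
    {"template_id": "uuid1", "type": "app", "display_name": "code-server",
     "slug": "code-server", "icon": "/icon/code.svg", "seconds": 10800},
]
usage_apps = merge_usage_apps(per_template_rows)
```

Keeping `template_ids` on the merged entry lets the UI show which templates contributed to a total without a second request.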
-
We already expose many metrics to Prometheus, so why are we adding this natively to Coder? Can't users create their own dashboards in Grafana using our Prometheus metrics?
-
👍 1
-
I know about this. But I was thinking about the need and motivation to do this natively.
-
Something that's not possible via Prometheus, for example, is giving the number of unique active users for a certain time frame (something that's to be shown in the proposed insights page). Prometheus can show how many unique users there are at any given moment, but if we want the count for a day, we can't simply add these values up.
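A small Python sketch of why the per-sample counts can't be summed: the same user appears in several samples, so adding the counts overstates the total, while the real daily figure needs the union of user IDs. The sample data and the function name `unique_active_users` are made up for illustration.

```python
def unique_active_users(samples):
    """Count distinct users across all samples in a time frame."""
    seen = set()
    for users in samples:
        seen.update(users)
    return len(seen)


# Three point-in-time samples of who was active during one day.
samples = [
    {"alice", "bob"},
    {"bob"},
    {"bob", "carol"},
]

summed = sum(len(users) for users in samples)  # counts bob three times
unique = unique_active_users(samples)          # counts each user once
```

Only a store that keeps the user identity per activity record (as opposed to a pre-aggregated gauge) can answer the unique-count question for an arbitrary window.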
👍 1
-
Sounds reasonable, so we can paginate these results.