prometheus-community/postgres_exporterPublic

NotificationsYou must be signed in to change notification settings
Fork805
Star3.3k

Update the PLPGSQL queries to resolve duplicate queries#1013

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Jump to bottom

Open

vigneshkumar2016 wants to merge1 commit intoprometheus-community:master

base:master

Choose a base branch

fromvigneshkumar2016:patch-1

Open

Update the PLPGSQL queries to resolve duplicate queries#1013

vigneshkumar2016 wants to merge1 commit intoprometheus-community:masterfromvigneshkumar2016:patch-1

Conversation

Copy link

vigneshkumar2016 commentedMar 11, 2024

Update the PLPGSQL queries to resolve duplicate queries

sysadmind reviewed

Mar 14, 2024

View reviewed changes

Copy link

Contributor

sysadmind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I think it looks pretty good. I want to understand the CROSS JOIN, but everything else looks good.

Thank you for your patience.

collector/pg_stat_statements.go

		pss.blk_write_time / 1000.0 as block_write_seconds_total
		FROM pg_stat_statements pss
		JOIN pg_database ON pg_database.oid = pss.dbid
		CROSS JOIN percentiles

Copy link

Contributor

sysadmindMar 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Can you explain the CROSS JOIN here? I'm having trouble understanding the why. My understanding is that CROSS JOIN will produce a cartesian product where every possible combination of table A and B will exist.

Copy link

Author

vigneshkumar2016 commentedMar 15, 2024

I think it looks pretty good. I want to understand the CROSS JOIN, but everything else looks good.
Thank you for your patience.

Can you please help in merging this branch to master please

vigneshkumar2016 requested a review fromsysadmind

March 15, 2024 03:58

Copy link

Contributor

sysadmind commentedMar 15, 2024

I need the explanation of the CROSS JOIN before I am willing to merge this code. It's not obvious why it is there so I feel that it deserves some context.

Copy link

Contributor

sysadmind commentedMar 19, 2024

If you're asking about the fact that the base branch is out of date, you can reference the github guide here:https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/keeping-your-pull-request-in-sync-with-the-base-branch#updating-your-pull-request-branch

You want to be sure to use the rebase option as that will make my review easier.

Copy link

Author

vigneshkumar2016 commentedMar 19, 2024

I need the explanation of the CROSS JOIN before I am willing to merge this code. It's not obvious why it is there so I feel that it deserves some context.

Added cross join to achieve the Cartesian product, as the query id gets duplicated in case multiple case scenarios

query id gets duplicated when the query and query planner shares similar execution pattern.
subqueries and nested queries running shares same query id
multiple users running same query shares same query id.

We need to possibly generate the Cartesian product as it generates all possible combinations and doesn't excludes out the any vital information that's required. Hence this query is designed in such a way, because the fact the pg_stat_statements table doens't have a unique key constraint to exclude queryid. Hope this helps.

Update the PLPGSQL queries to resolve duplicate queries

1081775

Update the PLPGSQL queries to resolve duplicate queriesSigned-off-by: VigneshViggu <vigneshkumar.venugopal@outlook.com>

vigneshkumar2016 force-pushed thepatch-1 branch fromcb73f19 to1081775Compare

March 20, 2024 03:11

Copy link

Author

vigneshkumar2016 commentedMar 20, 2024

If you're asking about the fact that the base branch is out of date, you can reference the github guide here:https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/keeping-your-pull-request-in-sync-with-the-base-branch#updating-your-pull-request-branch
You want to be sure to use the rebase option as that will make my review easier.

this PR has been rebased with recent changes in the master.

Copy link

Author

vigneshkumar2016 commentedMar 21, 2024

If you're asking about the fact that the base branch is out of date, you can reference the github guide here:https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/keeping-your-pull-request-in-sync-with-the-base-branch#updating-your-pull-request-branch
You want to be sure to use the rebase option as that will make my review easier.

Any defined time line on merging this PR to main.. any sooner that we can expect ?

Copy link

Contributor

sysadmind commentedMar 21, 2024

I need to find time to sit down with the query and understand it. Even after your explanation I'm having a hard time understanding the intention of the CROSS JOIN. SQL hides a lot of intention and it's harder to comment than code. I want to be sure that it's producing the desired outcome before merging and that means finding some time to run tests against an instance and getting a better understanding of the query. I'm hoping that I can find some time over the next week.

Another option would be to do more of the work in go instead of SQL which would make it easier to comment on the intention of the why.

Copy link

Author

vigneshkumar2016 commentedMar 22, 2024

I need to find time to sit down with the query and understand it. Even after your explanation I'm having a hard time understanding the intention of the CROSS JOIN. SQL hides a lot of intention and it's harder to comment than code. I want to be sure that it's producing the desired outcome before merging and that means finding some time to run tests against an instance and getting a better understanding of the query. I'm hoping that I can find some time over the next week.
Another option would be to do more of the work in go instead of SQL which would make it easier to comment on the intention of the why.

Do let me know if you need any extra hands or help.. I'm open to help explaining the query and it's logic

Copy link

longtomjr commentedMar 25, 2024

Hi. Thanks for the PR@vigneshkumar2016 !

I have been testing this change on a database (pg 16) that is tracking a lot of statements. TheLIMIT 100 is filtering out a lot of statements, since there are a lot of queries in the selected percentile. With this change I get different values compared to the master branch queries, which orders the records by the total seconds before taking the 100 records.

I am able to get it to match using by wrapping the query from the PR (for pg 16) in anotherSELECT statement which does theORDER BY andLIMIT. I am sure that there is a more elegant solution, this was just a 5 minute fix to check if it gives the expected output on my side.

SELECT*FROM (SELECT DISTINCTON (pss.queryid, pg_get_userbyid(pss.userid),pg_database.datname)                pg_get_userbyid(pss.userid)AS user,pg_database.datnameAS database_name,pss.queryid,pss.callsAS calls_total,pss.total_exec_time/1000.0AS seconds_total,pss.rowsAS rows_total,pss.blk_read_time/1000.0AS block_read_seconds_total,pss.blk_write_time/1000.0AS block_write_seconds_totalFROM pg_stat_statements pssJOIN pg_databaseONpg_database.oid=pss.dbidJOIN (SELECT percentile_cont(0.1) WITHIN GROUP (ORDER BY total_exec_time)AS percentile_valFROM pg_stat_statements            )AS percONpss.total_exec_time>perc.percentile_valORDER BYpss.queryid, pg_get_userbyid(pss.userid)DESC,pg_database.datname    )ORDER BY seconds_totalDESCLIMIT100;

I did not test the query for older PG versions, but it looks like it will have the same issue.

Looking forward to having a fix for this merged! Thanks again@vigneshkumar2016.

Copy link

isaiasanchez commentedAug 8, 2024•
edited
Loading

I'm arriving late to this discussion, i don't know if it's related with having two entries in pg_stat_statements with same queryid, this is related to the pg_stat_statements.track = 'all' witch includes a new boolean field called: toplevel, to avoid duplication we can set: pg_stat_statements.track = 'top'.

My suggestion would be to add this field to the labels, what I cannot foresee is if it would be more than one query with toplevel false and the same queryid.

I detected with pg16 and pg_stat_statements 1.10, easy to replicate running:
EXPLAIN ANALYZE SELECT t.a FROM generate_series(1,10) t(a);

Labels

None yet

Movatterモバイル変換

Update the PLPGSQL queries to resolve duplicate queries#1013

Are you sure you want to change the base?

Update the PLPGSQL queries to resolve duplicate queries#1013

Uh oh!

Conversation

vigneshkumar2016 commentedMar 11, 2024

Uh oh!

sysadmind left a comment

Choose a reason for hiding this comment

Uh oh!

sysadmindMar 14, 2024

Choose a reason for hiding this comment

Uh oh!

vigneshkumar2016 commentedMar 15, 2024

Uh oh!

sysadmind commentedMar 15, 2024

Uh oh!

sysadmind commentedMar 19, 2024

Uh oh!

vigneshkumar2016 commentedMar 19, 2024

Uh oh!

vigneshkumar2016 commentedMar 20, 2024

Uh oh!

vigneshkumar2016 commentedMar 21, 2024

Uh oh!

sysadmind commentedMar 21, 2024

Uh oh!

vigneshkumar2016 commentedMar 22, 2024

Uh oh!

longtomjr commentedMar 25, 2024

Uh oh!

isaiasanchez commentedAug 8, 2024• editedLoading Uh oh!There was an error while loading.Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

isaiasanchez commentedAug 8, 2024•
edited
Loading