chore: fix concurrent `CommitQuota` transactions for unrelated users/orgs #15261

Merged

Emyrk merged 19 commits into `main` from `stevenmasley/postgres_tx_serial` on Nov 1, 2024

Conversation

@Emyrk (Member) commented Oct 28, 2024 (edited)

What this PR does

The tests `DoubleQuotaUnrelatedWorkspaces` and `DoubleQuotaUserWorkspacesDiffOrgs` fail on `main`. With the change to the `GetQuotaConsumedForUser` query, 2 `CommitQuota` calls for different users or different organizations do NOT fail with `pq: could not serialize access due to read/write dependencies among transactions`.

Before, a complete table scan was done on the `workspace_builds` table for `GetQuotaConsumedForUser`. This means any write to any row would cause the serialization failure (see the Postgres docs). By preventing a full table scan, we remove the serialization error for unrelated workspaces (different user, different org, or both).

`GetQuotaConsumedForUser` Changes

The failure condition being fixed is below. `w1` and `w2` could belong to different users, organizations, and templates and still cause a failure. This is because the old query did a seq scan on the `workspace_builds` table. Since that is the table being updated, we really want to prevent that.

So before, this would fail for any 2 workspaces. Now it only fails if `w1` and `w2` are owned by the same user and organization.

```
//  +---------------------+---------------------+
//  | W1 Quota Tx         | W2 Quota Tx         |
//  +---------------------+---------------------+
//  | Begin Tx            |                     |
//  +---------------------+---------------------+
//  |                     | Begin Tx            |
//  +---------------------+---------------------+
//  | GetQuota(w1)        |                     |
//  +---------------------+---------------------+
//  | GetAllowance(w1)    |                     |
//  +---------------------+---------------------+
//  | UpdateBuildCost(w1) |                     |
//  +---------------------+---------------------+
//  |                     | UpdateBuildCost(w2) |
//  +---------------------+---------------------+
//  |                     | GetQuota(w2)        |
//  +---------------------+---------------------+
//  |                     | GetAllowance(w2)    |
//  +---------------------+---------------------+
//  | CommitTx()          |                     |
//  +---------------------+---------------------+
//  |                     | CommitTx()          |
//  +---------------------+---------------------+
```
Before changes (`SQL EXPLAIN`):

```
Aggregate  (cost=45.95..45.96 rows=1 width=8)
  ->  Merge Join  (cost=41.28..45.95 rows=1 width=4)
        Merge Cond: (workspaces.id = wb.workspace_id)
        ->  Sort  (cost=8.17..8.18 rows=1 width=16)
              Sort Key: workspaces.id
              ->  Index Scan using workspaces_owner_id_lower_idx on workspaces  (cost=0.14..8.16 rows=1 width=16)
                    Index Cond: (owner_id = 'b4ed5c8a-725e-482d-b5a7-368a1dd7cd77'::uuid)
                    Filter: (organization_id = 'b4ed5c8a-725e-482d-b5a7-368a1dd7cd77'::uuid)
        ->  Unique  (cost=33.11..35.26 rows=200 width=44)
              ->  Sort  (cost=33.11..34.18 rows=430 width=44)
                    Sort Key: wb.workspace_id, wb.created_at DESC
                    ->  Seq Scan on workspace_builds wb  (cost=0.00..14.30 rows=430 width=44)
```
After changes (`SQL EXPLAIN`):

```
Aggregate  (cost=30.88..30.89 rows=1 width=8)
  ->  Nested Loop  (cost=22.80..30.88 rows=1 width=4)
        Join Filter: (wb.workspace_id = workspaces.id)
        ->  Index Scan using workspaces_owner_id_lower_idx on workspaces  (cost=0.14..8.16 rows=1 width=16)
              Index Cond: (owner_id = 'b4ed5c8a-725e-482d-b5a7-368a1dd7cd77'::uuid)
              Filter: (organization_id = 'b4ed5c8a-725e-482d-b5a7-368a1dd7cd77'::uuid)
        ->  Unique  (cost=22.66..22.67 rows=2 width=44)
              ->  Sort  (cost=22.66..22.66 rows=2 width=44)
                    Sort Key: wb.workspace_id, wb.created_at DESC
                    ->  Nested Loop  (cost=4.16..22.65 rows=2 width=44)
                          ->  Seq Scan on workspaces workspaces_1  (cost=0.00..13.12 rows=1 width=16)
                                Filter: (owner_id = 'b4ed5c8a-725e-482d-b5a7-368a1dd7cd77'::uuid)
                          ->  Bitmap Heap Scan on workspace_builds wb  (cost=4.16..9.50 rows=2 width=28)
                                Recheck Cond: (workspace_id = workspaces_1.id)
                                ->  Bitmap Index Scan on workspace_builds_workspace_id_build_number_key  (cost=0.00..4.16 rows=2 width=0)
                                      Index Cond: (workspace_id = workspaces_1.id)
```

Future Work

Concurrent transaction tests that still fail are `t.Skip`'d. They remain in case we decide to solve this further in the future.

@Emyrk changed the title from "chore: add unit tests that exercise serial transaction bug" to "chore: fix concurrent `CommitQuota` transactions for unrelated users/orgs" on Oct 29, 2024
Comment on lines 58 to 66
```go
func New(sdb *sql.DB, opts ...func(*sqlQuerier)) Store {
	dbx := sqlx.NewDb(sdb, "postgres")
	return &sqlQuerier{
		db:  dbx,
		sdb: dbx,
		// This is an arbitrary number.
		serialRetryCount: 3,
	}
}
```
@Emyrk (Member, Author):

I made this configurable so unit tests should not fail if they hit serial errors. We are not doing much concurrency in our tests, and if we are hitting these failures, we should understand them better and solve them case by case.

We can change this if we get too many flakes.
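For illustration only, the retry behaviour discussed here could be sketched roughly as below. `withSerialRetries` and `errSerialization` are hypothetical names (not from this PR), and real code would inspect the pq error code for SQLSTATE 40001 rather than matching a sentinel error:

```go
package main

import (
	"errors"
	"fmt"
)

// errSerialization stands in for pq's serialization failure
// (SQLSTATE 40001); the real check would inspect the *pq.Error code.
var errSerialization = errors.New("pq: could not serialize access due to read/write dependencies among transactions")

// withSerialRetries runs txFn, retrying up to retryCount extra times
// when it fails with a serialization error, mirroring the
// serialRetryCount idea above.
func withSerialRetries(retryCount int, txFn func() error) error {
	var err error
	for attempt := 0; attempt <= retryCount; attempt++ {
		err = txFn()
		if err == nil || !errors.Is(err, errSerialization) {
			return err
		}
	}
	return err
}

func main() {
	attempts := 0
	err := withSerialRetries(3, func() error {
		attempts++
		if attempts <= 2 { // fail the first two attempts
			return errSerialization
		}
		return nil
	})
	fmt.Println(attempts, err == nil) // 3 true
}
```

With the retry count set to 0 (retries disabled, as done for the tests here), a single serialization failure surfaces immediately.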

Comment on lines 21 to 51

```diff
 	DISTINCT ON
-	(workspace_id) id,
-	workspace_id,
-	daily_cost
+	(wb.workspace_id) wb.id,
+	wb.workspace_id,
+	wb.daily_cost
 FROM
 	workspace_builds wb
+-- This INNER JOIN prevents a seq scan of the workspace_builds table.
+-- Limit the rows to the absolute minimum required, which is all workspaces
+-- in a given organization for a given user.
+INNER JOIN
+	workspaces ON wb.workspace_id = workspaces.id
+WHERE
+	workspaces.owner_id = @owner_id AND
+	workspaces.organization_id = @organization_id
 ORDER BY
-	workspace_id,
-	created_at DESC
+	wb.workspace_id,
+	wb.created_at DESC
 )
 SELECT
 	coalesce(SUM(daily_cost), 0)::BIGINT
 FROM
 	workspaces
 JOIN latest_builds ON
 	latest_builds.workspace_id = workspaces.id
-WHERE NOT
-	deleted AND
+WHERE
+	NOT deleted AND
+	-- We can likely remove these conditions since we check above.
+	-- But it does not hurt to be defensive and make sure future query changes
+	-- do not break anything.
 	workspaces.owner_id = @owner_id AND
 	workspaces.organization_id = @organization_id
 ;
```
@Emyrk (Member, Author):

This is the fix. Everything else is just tests.

@Emyrk marked this pull request as ready for review on October 29, 2024 00:25
```diff
-	(workspace_id) id,
-	workspace_id,
-	daily_cost
+	(wb.workspace_id) wb.id,
```
@spikecurtis (Contributor):

I don't think the build ID is needed.

@Emyrk (Member, Author):

I didn't try and change the query too much beyond the seq scan.

It looks like it was never used at all:

```sql
-- name: GetQuotaConsumedForUser :one
WITH latest_builds AS (
	SELECT
		DISTINCT ON
		(workspace_id) id,
		workspace_id,
		daily_cost
	FROM
		workspace_builds wb
	ORDER BY
		workspace_id,
		created_at DESC
)
SELECT
	coalesce(SUM(daily_cost), 0)::BIGINT
FROM
	workspaces
JOIN latest_builds ON
	latest_builds.workspace_id = workspaces.id
WHERE NOT deleted AND workspaces.owner_id = $1;
```
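As an illustrative aside (not code from this PR), the `DISTINCT ON (workspace_id) ... ORDER BY created_at DESC` plus `SUM(daily_cost)` logic can be mirrored in memory, which makes the query's semantics easy to see. Here `buildNumber` stands in for the ordering column, and the deleted/owner/org filters are omitted:

```go
package main

import "fmt"

// build mirrors the columns the quota query reads from workspace_builds.
type build struct {
	workspaceID string
	buildNumber int // stands in for ORDER BY created_at DESC
	dailyCost   int64
}

// quotaConsumed keeps only the latest build per workspace, then sums
// daily_cost — the same shape as DISTINCT ON + SUM in the query.
func quotaConsumed(builds []build) int64 {
	latest := map[string]build{}
	for _, b := range builds {
		if cur, ok := latest[b.workspaceID]; !ok || b.buildNumber > cur.buildNumber {
			latest[b.workspaceID] = b
		}
	}
	var sum int64
	for _, b := range latest {
		sum += b.dailyCost
	}
	return sum
}

func main() {
	builds := []build{
		{"w1", 1, 10},
		{"w1", 2, 25}, // only the latest build of w1 counts
		{"w2", 1, 5},
	}
	fmt.Println(quotaConsumed(builds)) // 30
}
```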

```diff
-	workspace_id,
-	created_at DESC
+	wb.workspace_id,
+	wb.created_at DESC
```
@spikecurtis (Contributor):

It might be faster to use `wb.build_number`, since the query planner has access to the `workspace_builds_workspace_id_build_number_key` index. That could be why it's having to do the:

```
->  Sort  (cost=22.66..22.66 rows=2 width=44)
      Sort Key: wb.workspace_id, wb.created_at DESC
```

@Emyrk (Member, Author):

I wondered about that too. I had a specific goal and did not want to start mutating the query too much and soliciting feedback on different things.

Can I throw this up in another PR?

@spikecurtis (Contributor):

yeah, that works

```sql
SELECT
	coalesce(SUM(daily_cost), 0)::BIGINT
FROM
	workspaces
```
@spikecurtis (Contributor):

Looks like it's still doing a Seq Scan on `workspaces`:

```
->  Seq Scan on workspaces workspaces_1  (cost=0.00..13.12 rows=1 width=16)
      Filter: (owner_id = 'b4ed5c8a-725e-482d-b5a7-368a1dd7cd77'::uuid)
```

Doesn't that mean any update to the workspaces table will still cause a serialization error?

However! I think the query planner takes into account cardinality of the columns relative to the rows in the table, so it might make a different plan in a "real" deployment.

@Emyrk (Member, Author):

Yes, we still do a seq scan on `workspaces`.

> To guarantee true serializability PostgreSQL uses predicate locking, which means that it keeps locks which allow it to determine when a write would have had an impact on the result of a previous read from a concurrent transaction, had it run first.

The serializable lock only affects tables (rows) that are written to. The `CommitQuota` transaction only writes to `workspace_builds`. I have a test case that writes to the `workspaces` table, and it does not cause a serialization error.

@spikecurtis (Contributor) commented Oct 31, 2024 (edited):

How do you square this with the observation that a Seq Scan on `workspace_builds` caused a serialization error? I think the PG docs say a Seq Scan locks the entire relation.

Maybe it depends on the number of workspaces and the assumed cardinality.

@Emyrk (Member, Author):

Experimentally, I am running this:

  1. Begin TX
  2. GetQuotaConsumedForUser(user, org)
  3. UpdateWorkspaceBuildCostByID(workspace, 10)

The TX lock mode that is important here is `SIReadLock`.

> Predicate locks in PostgreSQL, like in most other database systems, are based on data actually accessed by a transaction. These will show up in the pg_locks system view with a mode of SIReadLock.

What I see my change doing is `GetQuotaConsumedForUser` used to do:

```
992-<nil> [granted] workspace_builds/relation/SIReadLock:
992-<nil> [granted] workspace_builds_pkey/page/SIReadLock: page=1
992-<nil> [granted] workspaces/relation/SIReadLock:
992-<nil> [granted] workspaces_owner_id_lower_idx/page/SIReadLock: page=1
```

And now does:

```
929-<nil> [granted] workspace_builds_pkey/page/SIReadLock: page=1
929-<nil> [granted] workspace_builds_workspace_id_build_number_key/page/SIReadLock: page=1
929-<nil> [granted] workspaces/relation/SIReadLock:
929-<nil> [granted] workspaces_owner_id_lower_idx/page/SIReadLock: page=1
```

So we went from a full `relation` lock on the table to one on the pkey (was there before too) and the build number index.

The `workspaces` table still has the same locks, and would be improved by your index suggestions. For this specific error mode, we insert into the `workspace_builds` table though, so I was honestly not even looking at other tables at this point.

With my query changes (raw):

```
930-<nil> [granted] <nil>/virtualxid/ExclusiveLock: waiting to acquire virtual tx id lock
929- 3848 [granted] <nil>/transactionid/ExclusiveLock: ???
929-<nil> [granted] <nil>/virtualxid/ExclusiveLock: waiting to acquire virtual tx id lock
930-<nil> [granted] pg_locks/relation/AccessShareLock:
929-<nil> [granted] workspace_builds/relation/AccessShareLock:
929-<nil> [granted] workspace_builds/relation/RowExclusiveLock:
929-<nil> [granted] workspace_builds_job_id_key/relation/AccessShareLock:
929-<nil> [granted] workspace_builds_job_id_key/relation/RowExclusiveLock:
929-<nil> [granted] workspace_builds_pkey/relation/RowExclusiveLock:
929-<nil> [granted] workspace_builds_pkey/relation/AccessShareLock:
929-<nil> [granted] workspace_builds_pkey/page/SIReadLock: page=1
929-<nil> [granted] workspace_builds_workspace_id_build_number_key/relation/RowExclusiveLock:
929-<nil> [granted] workspace_builds_workspace_id_build_number_key/relation/AccessShareLock:
929-<nil> [granted] workspace_builds_workspace_id_build_number_key/page/SIReadLock: page=1
929-<nil> [granted] workspaces/relation/AccessShareLock:
929-<nil> [granted] workspaces/relation/SIReadLock:
929-<nil> [granted] workspaces_owner_id_lower_idx/relation/AccessShareLock:
929-<nil> [granted] workspaces_owner_id_lower_idx/page/SIReadLock: page=1
929-<nil> [granted] workspaces_pkey/relation/AccessShareLock:
```
Before my changes (raw):

```
993-<nil> [granted] <nil>/virtualxid/ExclusiveLock: waiting to acquire virtual tx id lock
992- 4039 [granted] <nil>/transactionid/ExclusiveLock: ???
992-<nil> [granted] <nil>/virtualxid/ExclusiveLock: waiting to acquire virtual tx id lock
993-<nil> [granted] pg_locks/relation/AccessShareLock:
992-<nil> [granted] workspace_builds/relation/AccessShareLock:
992-<nil> [granted] workspace_builds/relation/SIReadLock:
992-<nil> [granted] workspace_builds/relation/RowExclusiveLock:
992-<nil> [granted] workspace_builds_job_id_key/relation/AccessShareLock:
992-<nil> [granted] workspace_builds_job_id_key/relation/RowExclusiveLock:
992-<nil> [granted] workspace_builds_pkey/relation/RowExclusiveLock:
992-<nil> [granted] workspace_builds_pkey/relation/AccessShareLock:
992-<nil> [granted] workspace_builds_pkey/page/SIReadLock: page=1
992-<nil> [granted] workspace_builds_workspace_id_build_number_key/relation/RowExclusiveLock:
992-<nil> [granted] workspace_builds_workspace_id_build_number_key/relation/AccessShareLock:
992-<nil> [granted] workspaces/relation/AccessShareLock:
992-<nil> [granted] workspaces/relation/SIReadLock:
992-<nil> [granted] workspaces_owner_id_lower_idx/relation/AccessShareLock:
992-<nil> [granted] workspaces_owner_id_lower_idx/page/SIReadLock: page=1
992-<nil> [granted] workspaces_pkey/relation/AccessShareLock:
```

Disclaimer: I recognize the docs suggest these lock patterns can change with table size and memory availability. This is a very small dataset.
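For reference, listings like the ones above come from the `pg_locks` system view while the transaction is open; a minimal sketch of such a query (the actual `PGLocks` helper added in this PR may select different columns):

```sql
-- Show predicate (SIRead) locks currently held, per relation.
SELECT relation::regclass, locktype, mode, page, tuple, granted
FROM pg_locks
WHERE mode = 'SIReadLock';
```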

@spikecurtis (Contributor):

Interesting that serializable transactions only interfere with other serializable transactions --- OK for now, but I feel like this puts us on shaky ground.

Today we insert new builds with repeatable read isolation, but do not set the `daily_cost`. If we ever started setting the cost but didn't up the isolation to serializable, we'd start seeing quota anomalies.

Or, if we ever introduce a new update query on `workspaces` that runs at serializable isolation, it will start interfering with every quota transaction.

Not necessarily for this PR, but I think we should still aim to remove the Seq Scan on `workspaces`.

@Emyrk (Member, Author):

> Or, if we ever introduce a new update query on `workspaces` that runs at serializable isolation, it will start interfering with every quota transaction.

Yes.

> Not necessarily for this PR, but I think we should still aim to remove the Seq Scan on `workspaces`.

I agree. I will make a follow-up PR (or issue if it takes too long) to address all these extra concerns.

In this PR I did disable retries, so a unit test is more likely to fail if we hit this (would be a valid flake).


I do not see a way to implement forward-thinking protections if someone adds a new serializable TX. So you are correct, a change in the future can break things further, especially because of the seq scan on `workspaces`.

@Emyrk (Member, Author):

> Interesting that serializable transactions only interfere with other serializable transactions --- OK for now, but I feel like this puts us on shaky ground.

Just to clarify, I specifically meant the error `pq: could not serialize access due to read/write dependencies among transactions`, which was reported.

There is another error you can get that is not between 2 serializable transactions:

`pq: could not serialize access due to concurrent update`

```sql
-- Limit the rows to the absolute minimum required, which is all workspaces
-- in a given organization for a given user.
INNER JOIN
	workspaces ON wb.workspace_id = workspaces.id
```
@spikecurtis (Contributor):

Ok, so in the query plan you have now, it's doing this nested loop where it finds the workspace IDs, then for each workspace ID it's doing this bitmap query:

```
->  Bitmap Heap Scan on workspace_builds wb  (cost=4.16..9.50 rows=2 width=28)
      Recheck Cond: (workspace_id = workspaces_1.id)
      ->  Bitmap Index Scan on workspace_builds_workspace_id_build_number_key  (cost=0.00..4.16 rows=2 width=0)
            Index Cond: (workspace_id = workspaces_1.id)
```

That is, it first scans the index to find the pages to load, then scans the pages with the `Recheck Cond`.

Do you know whether this results in page locks for the transaction, or tuple locks (I'm assuming these are row-level locks)? Page locks have a greater likelihood of catching unrelated transactions.

And, some suggestions:

1. Can we just move the `daily_cost` directly to the workspace table? We only ever do computations with the most recent cost. If we really needed to keep it on the `workspace_build` for compatibility or querying history, we could put it in both places, and have this query only look at `workspaces`. That would remove a join as well. If we built an index of `(org_id, owner_id)` then we'd also be very unlikely to ever need to Seq scan the `workspaces` table for quotas.

2. If we don't want to go that far, we could add an index `ON workspace_builds (workspace_id, build_number, daily_cost)` that would allow the quota query to compute the results right from the index, so it'd never have to do a bitmap scan and read whole pages.
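As a sketch, the two suggestions above could look like this (index names are hypothetical, not from the PR):

```sql
-- Suggestion 1: find a user's workspaces in an org without a seq scan.
CREATE INDEX workspaces_organization_id_owner_id_idx
    ON workspaces (organization_id, owner_id);

-- Suggestion 2: a covering index so the quota query can read
-- daily_cost straight from the index (index-only scan).
CREATE INDEX workspace_builds_workspace_id_build_number_daily_cost_idx
    ON workspace_builds (workspace_id, build_number, daily_cost);
```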

@Emyrk (Member, Author):

> The particular locks acquired during execution of a query will depend on the plan used by the query, and multiple finer-grained locks (e.g., tuple locks) may be combined into fewer coarser-grained locks (e.g., page locks) during the course of the transaction to prevent exhaustion of the memory used to track the locks.

I was under the impression it was row-level locks, but I admit that was a result of my experiments with very few rows. The docs seem to imply it could take out larger locks.

> Can we just move the `daily_cost` directly to the workspace table? We only ever do computations with the most recent cost. If we really needed to keep it on the `workspace_build` for compatibility or querying history, we could put it in both places, and have this query only look at `workspaces`. That would remove a join as well. If we built an index of `(org_id, owner_id)` then we'd also be very unlikely to ever need to Seq scan the `workspaces` table for quotas.

Originally there was a suspicion we were getting this error from different transactions interfering with `CommitQuota`. But the error reported can only occur between 2 serializable transactions, meaning it is `CommitQuota` interfering with itself.

I thought about moving `daily_cost` to its own table entirely, but I'm not sure that would actually improve much.

I think this index + sorting by build number would be a large win: `workspace_builds (workspace_id, build_number, daily_cost)`

As I look at this query more, I don't see why we need to inner join `workspaces` again. All the information can be pulled from the `latest_builds` subquery.

If we add the index `(org_id, owner_id)` on `workspaces` as well, that is another win here.

According to the quote I pulled though, is the goal just to reduce the number of rows touched to prevent the lock from having to go more "coarse" to the page lock? The text implies the behavior depends on the memory availability.

@spikecurtis (Contributor):

I agree that we don't need to join to `workspaces` twice.

And yeah, my understanding is that with page locks, updating quota for an unrelated user could cause a serialization error if the builds happen to share a page. So it's undesirable to use page locks if we could get away with finer grained locking.

I realize that postgres can use page locks for memory reasons, and maybe there isn't anything we can do about that, but I'm also wondering whether it automatically uses page locks when it does a bitmap query, rather than tuple locks if we were able to use the index.

```go
Group(database.Group{
	QuotaAllowance: 10,
}, user).
Do()
```
@spikecurtis (Contributor):

I'm pretty sure the query planner uses the cardinality of different columns relative to the table size to determine how to run its query. That means to get reasonable assurance about whether different transactions interfere, we need a realistic number of organizations, users, workspaces, and workspace builds.

The PG docs have this to say:

> Predicate locks in PostgreSQL, like in most other database systems, are based on data actually accessed by a transaction. These will show up in the pg_locks system view with a mode of SIReadLock. The particular locks acquired during execution of a query will depend on the plan used by the query, and multiple finer-grained locks (e.g., tuple locks) may be combined into fewer coarser-grained locks (e.g., page locks) during the course of the transaction to prevent exhaustion of the memory used to track the locks.

If any page locks are held by our transactions, either as a result of bitmap scans (if that's a thing), or as a result of coalescing finer-grained locks, then whether or not transactions serialize without errors could depend on factors not controlled by this test (PostgreSQL config, version, insert timing, etc.). That is to say, flaky.

Really, what we want is a realistic amount of data, and then to run repeated transactions, with overlapping pairs chosen randomly, and then to gather statistics on how many pairs interfered with one another. I don't think it makes sense to run these on every CI run. We could even extend to an arbitrary number of simultaneous transactions on different workspaces --- since as deployments grow in size the number of simultaneous builds increases.

@Emyrk (Member, Author) commented Oct 31, 2024 (edited):

You are right. So this does not put anything to bed; however, this is a step in the right direction. I would prefer to iterate on this with more PRs.

I added metrics to track and log serialization failures: #15215

So at the very least, in the next release we can begin collecting better frequency metrics on these events.

I can spin up some larger datasets. I think we should take the easy index optimizations you pointed out above.

@spikecurtis (Contributor) left a comment:

Still some more optimizations we could do, but let's get this in, since I think it will help and we want to ship something to the customer who is having issues right now.

adds `PGLocks` query for debugging what pg_locks are held during transactions.
@Emyrk merged commit `854044e` into `main` on Nov 1, 2024
26 checks passed
@Emyrk deleted the `stevenmasley/postgres_tx_serial` branch on November 1, 2024 15:05
@github-actions bot locked and limited conversation to collaborators on Nov 1, 2024
Reviewers

@spikecurtis approved these changes
@dannykopping awaiting requested review

Assignees

@Emyrk

2 participants: @Emyrk, @spikecurtis
