Multithreaded replication WIP #1454


Draft

meiji163 wants to merge 67 commits into master from meiji163/parallel-repl

Conversation

@meiji163 (Contributor) commented Oct 4, 2024 (edited)

Description

This PR introduces multi-threaded replication for applying DML queries to the ghost table. The goal is to be able to migrate tables with a high rate of DML queries (e.g. >5k rows/s). Currently gh-ost lags behind in these situations, taking a very long time to complete or not completing at all.

Similar to MySQL replication threads, gh-ost streams binlog events from the source and groups them into transactions, then submits the transactions to a pool of workers that apply them concurrently to the ghost table. We ensure that dependent transactions are applied in a consistent order (equivalent to MySQL multi-threaded replication with replica_parallel_type=LOGICAL_CLOCK and replica_preserve_commit_order=0).

With WRITESET enabled on the source, this allows a great deal of parallelism in the transaction applier.
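
The sketch below illustrates the intended shape of this design. It is a minimal illustration only; the package, type, and function names are assumptions, not the actual gh-ost code. Row events are grouped into transactions that carry their LOGICAL_CLOCK metadata (sequence_number, last_committed), and a fixed pool of workers applies them to the ghost table.

```go
// Minimal illustrative sketch; names here are assumptions, not gh-ost's implementation.
package applier

import "sync"

// Transaction groups the DML of one binlog transaction together with its
// LOGICAL_CLOCK metadata from the source.
type Transaction struct {
	SequenceNumber int64
	LastCommitted  int64
	Queries        []string // statements to apply to the ghost table
}

// applyPool fans transactions out to nWorkers goroutines. The coordinator is
// responsible for only sending a transaction once its dependencies are satisfied
// (the lowWaterMark >= lastCommitted rule discussed later in this thread).
func applyPool(transactions <-chan *Transaction, nWorkers int, apply func(*Transaction) error) error {
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		firstErr error
	)
	for i := 0; i < nWorkers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for tx := range transactions {
				if err := apply(tx); err != nil {
					mu.Lock()
					if firstErr == nil {
						firstErr = err // remember the first failure; real code would abort the migration
					}
					mu.Unlock()
					return
				}
			}
		}()
	}
	wg.Wait()
	return firstErr
}
```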

Changes

TODO

Performance tests

TODO

In case this PR introduced Go code changes:

  • contributed code is using same conventions as original code
  • script/cibuild returns with no formatting errors, build errors or unit test errors.

arthurschreiber and others added 30 commits (September 30, 2024 09:15)
Multiple `TableMap` events can happen inside of a single transaction, so we can't really use them to filter out events early or anything like that. Instead, we'll just skip them. Also adds some basic test cases for the transaction streaming.
Co-authored-by: Daniel Joos <danieljoos@github.com>
mhamza15 previously approved these changes Apr 1, 2025
* Remove error return value since we don't use it.
* Lock the mutex whenever we plan to update the low watermark to avoid a race condition.
* Check for data races in our unit tests.
* Still return an error from ProcessEventsUntilDrained but actually check it in our code.
* Make coordinator_test.go check the err from ProcessEventsUntilDrained again.
* Remove unreachable return in ProcessEventsUntilDrained.
hugodorea previously approved these changes Apr 9, 2025
…ark (#1531)
* Notify waiting channels on completed transaction, not just the watermark.
* Add checksum validation to coordinator test.
* Use errgroup to perform transactions concurrently in coordinator_test.go.
* Configure concurrency separate from total number of transactions.
* Run similar number of txs to previous test and ignore context.
* Have at least 1 child in a transaction.
* Notify waiting channels for the current sequence number.
hugodorea previously approved these changes Apr 10, 2025
@meiji163 (Contributor, Author) commented Apr 12, 2025 (edited)

Despite promising performance results in testing, we stopped developing this branch in Nov 2024 after running into intermittent data inconsistency problems in internal replica tests. I believe I've now tracked down the source of this issue. Below is the investigation, for anyone interested.

Investigation

The data inconsistency appeared intermittently on several different gh-ost testing replicas running the MTR version.
I was able to reproduce the error locally using a docker localtest with sysbench write load; the data inconsistency happens fairly reliably at ~900 trx/sec on the table.

The test table looks like this:

CREATE TABLE `sbtest1` (
  `id` int NOT NULL AUTO_INCREMENT,
  `k` int NOT NULL DEFAULT '0',
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`)
);

The testing result is a checksum mismatch (usually only one row), in this case the row with id=5025; the second column k is wrong. I ran the test with general_log='ON' on the primary and replica to see what is going on. Here are the last two transactions on sbtest1 from sysbench affecting row 5025:

-- 2025-04-12T03:28:44.209273Z        25 Execute
BEGIN;
DELETE FROM sbtest1 WHERE id=5025;
INSERT INTO sbtest1 (id, k, c, pad) VALUES (5025, 5046, '55585975399-51936995975-90609908571-88981758242-41639509045-49015163211-63909390173-09873895014-17528416149-59787710722', '90699347551-90936038435-69760642136-45340328341-67205199431');
COMMIT;
-- 2025-04-12T03:28:44.209695Z        28 Execute
UPDATE sbtest1 SET k=k+1 WHERE id=5025;

And the corresponding binlog events on the replica:

last_committed=89053    sequence_number=89058
DELETE FROM `test`.`sbtest1`
WHERE
  @1=5025
  @2=4993
  @3='72162371109-65711525437-30164254657-02236716337-47638530925-52423543892-06270192544-11372615750-04017656641-19388264173'
  @4='44029122667-48848103638-83352868135-91599152925-97809617080'
# at 145248382
INSERT INTO `test`.`sbtest1`
SET
  @1=5025
  @2=5046
  @3='55585975399-51936995975-90609908571-88981758242-41639509045-49015163211-63909390173-09873895014-17528416149-59787710722'
  @4='90699347551-90936038435-69760642136-45340328341-67205199431'
# at 145248672
...
last_committed=89062    sequence_number=89065
UPDATE `test`.`sbtest1`
WHERE
  @1=5025
  @2=5046
  @3='55585975399-51936995975-90609908571-88981758242-41639509045-49015163211-63909390173-09873895014-17528416149-59787710722'
  @4='90699347551-90936038435-69760642136-45340328341-67205199431'
SET
  @1=5025
  @2=5047
  @3='55585975399-51936995975-90609908571-88981758242-41639509045-49015163211-63909390173-09873895014-17528416149-59787710722'
  @4='90699347551-90936038435-69760642136-45340328341-67205199431'
# at 145258107

So the correct value is k=5047. But on _sbtest1_gho the order of the transactions is switched: gh-ost first updates the row to k=5047, then deletes and reinserts it with k=5046, resulting in the wrong final value k=5046.

We can look at the dependency (sub)graph for the original transactions on sbtest1 (sequence numbers 89058 and 89065). It looks like this:

graph LR;
    89058 --> 89053;
    89053 --> 89050;
    89065 --> 89062;
    89062 --> 89053;

This means that once transactions 89053 and 89062 finish, the coordinator may schedule 89058 and 89065 concurrently.

Comparing this to what the MySQL replication applier coordinator does (sql/rpl_rli_pdb.cc), I realized that a transaction should be scheduled if and only if lowWaterMark >= lastCommitted.

The lastCommitted of a transaction is its most recent (i.e. greatest) dependency. The lowWaterMark is a global variable that maintains the invariant that all sequence numbers <= lowWaterMark are complete. Therefore, if we schedule a transaction when lastCommitted > lowWaterMark, it may have dependencies that haven't completed yet, even if the lastCommitted transaction itself is complete.

In the example, 89065 must wait until lowWaterMark >= 89062, at which point it is guaranteed that 89058 has completed.
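
To make the rule concrete, here is a tiny sketch (a hypothetical helper for illustration, not the Coordinator's actual code) of the scheduling predicate applied to the example above:

```go
// canSchedule reports whether a transaction with the given lastCommitted may be
// scheduled. lowWaterMark is the invariant watermark: every transaction with a
// sequence number <= lowWaterMark has completed.
func canSchedule(lowWaterMark, lastCommitted int64) bool {
	return lowWaterMark >= lastCommitted
}

// Example from above, for transaction 89065 (lastCommitted = 89062):
//   canSchedule(89055, 89062) == false // even if 89062 itself finished, 89058 may not have
//   canSchedule(89062, 89062) == true  // everything <= 89062, including 89058, is done
```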

Fix

In our Coordinator, the culprit is this check in WaitForTransaction:

if _, ok := c.completedJobs[lastCommitted]; ok {
	return nil
}

In the example, it allowed 89065 to be applied after 89062 completed, but before 89058 completed.

After removing these lines, the sysbench localtest passes consistently.
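
For reference, here is roughly what the corrected wait looks like as a sketch (the package, field, and method names are assumptions for illustration, not the exact gh-ost code): the only way for the wait to return is that the low watermark has caught up to lastCommitted, with no short-circuit for individually completed jobs.

```go
// Illustrative sketch only; names are assumptions, not gh-ost's actual Coordinator.
package coordinator

import "sync"

// Coordinator is a pared-down stand-in tracking only the low watermark.
type Coordinator struct {
	mu           sync.Mutex
	cond         *sync.Cond
	lowWaterMark int64 // invariant: all sequence numbers <= lowWaterMark are complete
}

func NewCoordinator() *Coordinator {
	c := &Coordinator{}
	c.cond = sync.NewCond(&c.mu)
	return c
}

// waitForLowWaterMark blocks until every transaction with a sequence number
// <= lastCommitted has completed. Crucially, there is no early return when the
// lastCommitted transaction alone is known to be complete.
func (c *Coordinator) waitForLowWaterMark(lastCommitted int64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for c.lowWaterMark < lastCommitted {
		c.cond.Wait()
	}
}

// advanceLowWaterMark is called once workers complete transactions and the
// watermark moves forward; waiters then re-check their condition.
func (c *Coordinator) advanceLowWaterMark(newWaterMark int64) {
	c.mu.Lock()
	if newWaterMark > c.lowWaterMark {
		c.lowWaterMark = newWaterMark
	}
	c.mu.Unlock()
	c.cond.Broadcast()
}
```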


arthurschreiber previously approved these changes Apr 12, 2025
@TomKnaepen commented

Hi @meiji163, my team has been running into the exact limits you describe in this PR, so we were very excited to see the progress already made here. However, over the last 2 weeks I've run some tests with this PR and still notice a data inconsistency issue.

We use gh-ost on AWS Aurora, and my testing has been with a production-like setup. On the table we're trying to migrate, the only interesting queries (i.e. non-SELECTs) are fairly simple: we insert rows and then update some of them fairly quickly to remove a timestamp value.

The inconsistency I see is caused by some UPDATEs not being applied to the ghost table. This happens infrequently but consistently throughout the tests (50 rows affected out of ~2 million).
Here's an example problematic row:

#250516 13:06:46 server id 1951482084  end_log_pos 26539240 CRC32 0x364830a9   Anonymous_GTID  last_committed=1822 sequence_number=1824    rbr_only=yes    original_committed_timestamp=1747393606597218   immediate_commit_timestamp=1747393606597218 transaction_length=3944
# at 26539538
### INSERT INTO `DB_ACTIVE`.`message`
### SET
###   @1=39181433
--
#250516 13:06:46 server id 1951482084  end_log_pos 26589750 CRC32 0x2b5cf4fe   Anonymous_GTID  last_committed=1867 sequence_number=1868    rbr_only=yes    original_committed_timestamp=1747393606648774   immediate_commit_timestamp=1747393606648774 transaction_length=7447
# at 26590057
### UPDATE `DB_ACTIVE`.`message`
### WHERE
###   @1=39181433
###   @52=1747480006593
### SET
###   @1=39181433
###   @52=NULL
--
#250516 13:06:46 server id 1951482084  end_log_pos 28141453 CRC32 0xd8e3f39c   Anonymous_GTID  last_committed=2029 sequence_number=2030    rbr_only=yes    original_committed_timestamp=1747393606747371   immediate_commit_timestamp=1747393606747371 transaction_length=3949
# at 28141756
### INSERT INTO `DB_ACTIVE`.`_message_gho`
### SET
###   @1=39181433

It doesn't quite look like the same issue you ran into and described above because the dependency graph looked fairly linear:

last_committed=1822 sequence_number=1824 -- INSERT
last_committed=1824 sequence_number=1825
last_committed=1825 sequence_number=1826
last_committed=1826 sequence_number=1828
last_committed=1828 sequence_number=1832
last_committed=1832 sequence_number=1836
last_committed=1836 sequence_number=1837
last_committed=1837 sequence_number=1840
last_committed=1840 sequence_number=1844
last_committed=1844 sequence_number=1854
last_committed=1854 sequence_number=1856
last_committed=1856 sequence_number=1857
last_committed=1857 sequence_number=1858
last_committed=1858 sequence_number=1862
last_committed=1862 sequence_number=1863
last_committed=1863 sequence_number=1866
last_committed=1866 sequence_number=1867
last_committed=1867 sequence_number=1868 -- UPDATE

Do you have any ideas what could be the issue, and is there anything else I can provide?

Thank you!

@meiji163 (Contributor, Author) commented

Thanks for reporting this @TomKnaepen. I also ran into more data consistency issues after the latest changes, but I haven't had time to investigate further. We thought it might be related to the binlogsyncer connection being closed and reopened. Is there any error in your gh-ost output?

@TomKnaepen commented

I don't recall seeing any errors at the time; I ran some new tests on a smaller scale this week to see if I could find anything, but there are no errors or anything noteworthy in the gh-ost output.
I tested both revisions 2eacc78 and 7c3032f since I thought you might be referring to that commit, but both resulted in inconsistent data.
