Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Parallel import with trackable jobs.#275

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
matchish merged 14 commits intomatchish:8.xfromDavisPuciriuss:improve-import-speed
May 30, 2024

Conversation

@DavisPuciriuss
Copy link

@DavisPuciriussDavisPuciriuss commentedMay 7, 2024
edited
Loading

Continuation of#78.

Added FromScope, that uses forPageAfterId instead of forPage.

Added a new option within ImportCommand - "--parallel".
This option requires user to set-up async job handler within their configuration.
Main thread continues to pull data from the database, while workers handle making the data searchable.

For the parallel import to work, addedTrackable-Jobs package, that looks to be actively maintained.
Updated the existing tests to pass, and added few tests to handle synced parallel import.
Started to make changes for larastan.

Performance testing (~250k records, mysql):

  • Release v7.6.0 no changes:
    ~600 seconds.
  • Release v7.6.0 with forPageAfterId instead of forPage:
    ~350 seconds.
  • Release v7.6.0 with forPageAfterId and "--parallel" option, Laravel Horizon using redis 1x queue, 4x processes:
    ~245 seconds.

Will appreciate any feedback on what should be improved for this PR to be merged.

matchish, edgarsn, hkulekci, and Watercycle reacted with rocket emoji
Davis Puciriuss added4 commitsMay 7, 2024 12:50
… package name.Makefile added option to run larastan seperately.gitignore added "/build/".
Updated StageInterface, updated existing stages to include the new interface methods.Changed PullFromSource to instead of being multiple instance, be one and iterable.Added PullFromSourceParallel to handle parallel import.
@matchish
Copy link
Owner

Thank you for your great work on this pull request! I really appreciate your effort to integrate the parallel import feature. Here are a few areas where I believe we could improve the implementation:

  1. Tests: Currently, the tests are failing. Could you take a look to ensure that the new changes don't break existing functionalities?
  2. Optional Dependency: It would be beneficial if thetrackable-jobs package remains optional for users who are not interested in the parallel import feature.
  3. Backward Compatibility: Please ensure that for users not utilizing the parallel import feature, there are no changes in behavior. This will help us maintain backward compatibility and potentially release this feature faster.
  4. Progress Visibility: During the parallel import, being able to see progress would enhance the user experience.
  5. Cancel Overlapping Imports: Finally, it would be ideal if a new import(sync or parallel) could cancel the current parallel import to prevent overlaps.

Thank you once again for your contribution! Looking forward to your thoughts and further improvements.

Trackable-Job package added to composer as a suggestion.styleci.
@matchish
Copy link
Owner

mateusjunges/laravel-trackable-jobs should be installed into test envhttps://github.com/matchish/laravel-scout-elasticsearch/blob/master/.github/workflows/test-application.yaml

@matchish
Copy link
Owner

matchish commentedMay 13, 2024
edited
Loading

Ensure that the master branch of the fork is synchronized with this repository, or add this line to your branch.

@DavisPuciriuss
Copy link
Author

DavisPuciriuss commentedMay 13, 2024
edited
Loading

Synchronized this PR with master branch.

@DavisPuciriuss
Copy link
Author

The master branch on the fork is now synced aswell. Missed previously that you mentioned that it needs to be synced on the master branch.

@matchish
Copy link
Owner

could you push a new commit. for example readme update with section about parallel feature. let's see if it will run new workflow

@matchish
Copy link
Owner

or close this pr and open new one

@matchishmatchish marked this pull request as ready for reviewMay 15, 2024 07:21
@codecov-commenter
Copy link

codecov-commenter commentedMay 15, 2024
edited
Loading

Codecov Report

Attention: Patch coverage is95.25862% with11 lines in your changes are missing coverage. Please review.

Project coverage is 98.11%. Comparing base(2a83b2b) to head(9842153).

FilesPatch %Lines
src/Jobs/Stages/PullFromSourceParallel.php91.07%5 Missing⚠️
...rc/ElasticSearch/EloquentHitsIteratorAggregate.php75.00%2 Missing⚠️
src/Engines/ElasticSearchEngine.php83.33%2 Missing⚠️
src/Jobs/ProcessSearchable.php92.30%1 Missing⚠️
src/Jobs/Stages/StopTrackedJobs.php93.75%1 Missing⚠️
Additional details and impacted files
@@             Coverage Diff              @@##                8.x     #275      +/-   ##============================================+ Coverage     95.99%   98.11%   +2.12%- Complexity      194      255      +61============================================  Files            36       40       +4       Lines           624      797     +173     ============================================+ Hits            599      782     +183+ Misses           25       15      -10

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report?Share it here.

@matchish
Copy link
Owner

Tests: Currently, the tests are failing. Could you take a look to ensure that the new changes don't break existing functionalities?
Optional Dependency: It would be beneficial if the trackable-jobs package remains optional for users who are not interested in the parallel import feature.
Backward Compatibility: Please ensure that for users not utilizing the parallel import feature, there are no changes in behavior. This will help us maintain backward compatibility and potentially release this feature faster.
Progress Visibility: During the parallel import, being able to see progress would enhance the user experience.
Cancel Overlapping Imports: Finally, it would be ideal if a new import(sync or parallel) could cancel the current parallel import to prevent overlaps.

First two points for sure done.
What about rest? Do you think we are safe?

Also plz apply diff from stylecidiff file
Use the git apply command followed by the path to your diff file, like this
git apply O375ML.diff

@DavisPuciriuss
Copy link
Author

Hey, applied the diff from styleci.

Progress Visibility: If this is meant as the progress bar within console, when importing. Then this is already done and there is a test for this specifically here:
https://github.com/DavisPuciriuss/laravel-scout-elasticsearch/blob/a1ae08089a8f563fa12e778f6d1ec3fcb57a7d6d/tests/Integration/Jobs/ImportTest.php#L47

Cancel Overlapping Imports: Would like some more explanation on this, as currently cancelling the parallel import it will not continue to pull in data from DB, but will finish the pending jobs. Is there a way to detect if someone has cancelled the current import ?

Backward Compatibility: The major changes are within StageInterface and ImportSource interfaces. Both of these interfaces have new functions. In the StageInterface case, the problem was providing the next PullFromSource with the lastId for usage in forPageAfterId(). Solved this by adding a completed() method, allowing some stages to become iterable.
This could also be changed into creating another StageInterfaceIterable, but then it would require re-creating the existing stages and would create a lot of duplicated code. Maybe you have a better suggestion for this ?

hkulekci reacted with eyes emoji

@matchish
Copy link
Owner

matchish commentedMay 15, 2024
edited
Loading

Cancel Overlapping Imports: Would like some more explanation on this, as currently cancelling the parallel import it will not continue to pull in data from DB, but will finish the pending jobs. Is there a way to detect if someone has cancelled the current import ?

I believe proper way is to check if there are started/pending import jobs and cancelling them before to start a new import. otherwise could be overlapping with jobs that not cancelled yet from previous import.

@DavisPuciriuss
Copy link
Author

Okay, will make some changes tomorrow with in-progress or pending jobs, when starting another import job.

What about the other two points, are those fine or should I make some changes there aswell ?

@matchish
Copy link
Owner

What about the other two points, are those fine or should I make some changes there aswell ?

I have to dive deeper. Can't answer now

@DavisPuciriuss
Copy link
Author

Cancel Overlapping Imports: Would like some more explanation on this, as currently cancelling the parallel import it will not continue to pull in data from DB, but will finish the pending jobs. Is there a way to detect if someone has cancelled the current import ?

I believe proper way is to check if there are started/pending import jobs and cancelling them before to start a new import. otherwise could be overlapping with jobs that not cancelled yet from previous import.

This has been added, if you start a new import, the previous one will cancel.

@matchish
Copy link
Owner

I’m not sure when I can dive deep into the code. There are some things to polish, but overall, it looks fine.

Since the code is not backward compatible and to avoid blocking the release, I propose releasing it as8.0.0-alpha.1 to highlight that the API might change. We can merge it into the 8x branch, which will live on top of the master branch. This way, we can improve it based on feedback from developers and continue contributing to stable versions.

hkulekci, edgarsn, and DavisPuciriuss reacted with heart emoji

@matchishmatchish changed the base branch frommaster to8.xMay 28, 2024 14:46
@matchish
Copy link
Owner

If agree plz update Changelog and we can release

@DavisPuciriuss
Copy link
Author

Hey, great to hear back from you.
The alpha release would be great, to both test the code out more and receive more feedback as you mentioned.
Updated the changelog.

@matchish
Copy link
Owner

Releasing. Huge thanks to@DavisPuciriuss! Your contribution is a significant step forward for the project.

@matchishmatchish merged commit304ff4a intomatchish:8.xMay 30, 2024
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

No reviews

Assignees

No one assigned

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

3 participants

@DavisPuciriuss@matchish@codecov-commenter

[8]ページ先頭

©2009-2025 Movatter.jp