Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork847
Avoid processing a single commit multiple times#1058
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
base:master
Are you sure you want to change the base?
Avoid processing a single commit multiple times#1058
Uh oh!
There was an error while loading.Please reload this page.
Conversation
The repo in question:https://github.com/OpenVoxProject/openvox-agent
|
Uh oh!
There was an error while loading.Please reload this page.
When listing the commits that appear in a tag, some commits are addedmultiple time to the resulting Set. As a Set cannot contain duplicate,this is only a waste of computing resources, but with large repositorieswith a lot of branches and merges, this can require a tremendous amountof time.To illustrate the issue, consider the following git history with 5commits:```A---B---C---D \ / E```Assuming A is the root commit, and we call `commits_in_tag(D)`. Let'sinspect the value of `shas` and `queue` at the beginning of the loop,and what is pushed to these variables on each iteration:1. `shas = {}; queue = [D] #=> shas << D; queue << C`2. `shas = {D}; queue = [C] #=> shas << C; queue << [B, E]`3. `shas = {D, C}; queue = [B, E] #=> shas << B; queue << A`4. `shas = {D, C, B}; queue = [E, A] #=> shas << E; queue << B`5. `shas = {D, C, B, E}; queue = [A, B] #=> shas << A`6. `shas = {D, C, B, E, A}; queue = [B] #=> shas << B; queue << A`7. `shas = {D, C, B, E, A}; queue = [A] #=> shas << A`8. `shas = {D, C, B, E, A}; queue = [] #=> exit loop`As you can see above, 6 and 7 attempt to add B and A to the set, butthey where already added at step 3 and 5, so these action are no-op.This change checks if the current commit is already present in `shas`,and if so skip to the next queued commit. The above example becomes:1. `shas = {}; queue = [D] #=> shas << D; queue << C`2. `shas = {D}; queue = [C] #=> shas << C; queue << [B, E]`3. `shas = {D, C}; queue = [B, E] #=> shas << B; queue << A`4. `shas = {D, C, B}; queue = [E, A] #=> shas << E; queue << B`5. `shas = {D, C, B, E}; queue = [A, B] #=> shas << A`6. `shas = {D, C, B, E, A}; queue = [B] #=> next`7. `shas = {D, C, B, E, A}; queue = [] #=> exit loop`While we where previously unable to generate the ChangeLog of one of ourprojects in a reasonable amount of time (Ctrl+C after 15+ minutes ofwaiting at 100% CPU), this change allows us to generate it in ~45s on mylaptop.
4fa93b7
toa0344af
Compare
Uh oh!
There was an error while loading.Please reload this page.
When listing the commits that appear in a tag, some commits are added multiple time to the resulting Set. As a Set cannot contain duplicate, this is only a waste of computing resources, but with large repositories with a lot of branches and merges, this can require a tremendous amount of time.
To illustrate the issue, consider the following git history with 5 commits:
Assuming A is the root commit, and we call
commits_in_tag(D)
. Let's inspect the value ofshas
andqueue
at the beginning of the loop, and what is pushed to these variables on each iteration:shas = {}; queue = [D] #=> shas << D; queue << C
shas = {D}; queue = [C] #=> shas << C; queue << [B, E]
shas = {D, C}; queue = [B, E] #=> shas << B; queue << A
shas = {D, C, B}; queue = [E, A] #=> shas << E; queue << B
shas = {D, C, B, E}; queue = [A, B] #=> shas << A
shas = {D, C, B, E, A}; queue = [B] #=> shas << B; queue << A
shas = {D, C, B, E, A}; queue = [A] #=> shas << A
shas = {D, C, B, E, A}; queue = [] #=> exit loop
As you can see above, 6 and 7 attempt to add B and A to the set, but they where already added at step 3 and 5, so these action are no-op.
This change checks if the current commit is already present in
shas
, and if so skip to the next queued commit. The above example becomes:shas = {}; queue = [D] #=> shas << D; queue << C
shas = {D}; queue = [C] #=> shas << C; queue << [B, E]
shas = {D, C}; queue = [B, E] #=> shas << B; queue << A
shas = {D, C, B}; queue = [E, A] #=> shas << E; queue << B
shas = {D, C, B, E}; queue = [A, B] #=> shas << A
shas = {D, C, B, E, A}; queue = [B] #=> next
shas = {D, C, B, E, A}; queue = [] #=> exit loop
While we where previously unable to generate the ChangeLog of one of our projects in a reasonable amount of time (Ctrl+C after 15+ minutes of waiting at 100% CPU), this change allows us to generate it in ~45s on my laptop.