- Notifications
You must be signed in to change notification settings - Fork1.6k
Refactorto_dataframe deterministicly update progress bar.#8303
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
Previously, a background thread was used to collect progress bar updatesfrom worker threads. So as not to block downloads for progress barupdates, `put_nowait` was used to make progress bar updates. Missedwrites to the progress bar were ignored. This caused non-deterministicprogress bar updates and test flakiness.Now, worker threads push dataframes to the queue, and the return valuesfor `download_dataframe_bqstorage` and`download_dataframe_tabledata_list` have been updated to return aniterable of pandas DataFrame objects instead of a single DataFrame. Thisallows progress bar updates to be done independently of which underlyingAPI is used to download the DataFrames.Also, the logic for working with pandas has been moved to the`_pandas_helpers` module.
plamut left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
The changes generally seem fine to me, but I did leave a few comments that might be relevant - please check, just in case.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
plamut left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Looks good, and thanks for the additional explanation of the design decisions!
Will wait if the other reviewers have something else to add.
| # prevents the queue from filling up, because the main thread | ||
| # has smaller gaps in time between calls to the queue's get | ||
| # method. For a detailed explaination, see: | ||
| # https://friendliness.dev/2019/06/18/python-nowait/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
This is great. 👍
Previously, a background thread was used to collect progress bar updates
from worker threads. So as not to block downloads for progress bar
updates,
put_nowaitwas used to make progress bar updates. Missedwrites to the progress bar were ignored. This caused non-deterministic
progress bar updates and test flakiness.
Now, worker threads push dataframes to the queue, and the return values
for
download_dataframe_bqstorageanddownload_dataframe_tabledata_listhave been updated to return aniterable of pandas DataFrame objects instead of a single DataFrame. This
allows progress bar updates to be done independently of which underlying
API is used to download the DataFrames.
Also, the logic for working with pandas has been moved to the
_pandas_helpersmodule.Closes#8175