Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

feat: Refactor DataFrame.to_pandas_batches and update TableWidget#2250

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Draft
shuoweil wants to merge1 commit intomain
base:main
Choose a base branch
Loading
fromshuowei-anywidget-index-testcase

Conversation

@shuoweil
Copy link
Contributor

@shuoweilshuoweil commentedNov 10, 2025
edited
Loading

This branch introduces significant enhancements to the interactive table widget, particularly for displaying complex data structures like nested STRUCTs and ARRAYs, which implicitly improves the visual handling of multi-index DataFrames.

  • Improved Nested Data Display (html.py)
  • Refined Multi-Index and Row Grouping (anywidget.py,table_widget.js,table_widget.css)
  • General UI/UX Improvements

TODO(b/469861913): Nested columns from structs (e.g., 'struct_col.name') are not currently sortable

See an example here: screen/48TZkxRbmuWNYBB

Fixes #<459515995> 🦕

@shuoweilshuoweil self-assigned thisNov 10, 2025
@shuoweilshuoweil requested review froma team ascode ownersNovember 10, 2025 22:11
@review-notebook-app
Copy link

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered byReviewNB

@product-auto-labelproduct-auto-labelbot added size: lPull request size is large. api: bigqueryIssues related to the googleapis/python-bigquery-dataframes API. labelsNov 10, 2025
@shuoweilshuoweil changed the titlefeat: Enhance anywidget notebook with Index and MultiIndex example and improve test robustnessfeat: Display index and mutiindex in anywidget modeNov 10, 2025
@shuoweilshuoweil marked this pull request as draftNovember 11, 2025 19:13
@shuoweilshuoweil removed the request for review fromtswastNovember 11, 2025 19:13
@shuoweilshuoweilforce-pushed theshuowei-anywidget-index-testcase branch from441bb94 to33fd6e3CompareNovember 11, 2025 20:30
@shuoweilshuoweil marked this pull request as ready for reviewNovember 11, 2025 21:42
@shuoweilshuoweil changed the titlefeat: Display index and mutiindex in anywidget modefeat: Display index and multi-index in anywidget modeNov 12, 2025
Comment on lines 32 to 80
def_calculate_rowspans(dataframe:pd.DataFrame)->list[list[int]]:
"""Calculates the rowspan for each cell in a MultiIndex DataFrame.
Args:
dataframe (pd.DataFrame):
The DataFrame for which to calculate index rowspans.
Returns:
list[list[int]]:
A list of lists, where each inner list corresponds to an index level
and contains the rowspan for each row at that level. A value of 0
indicates that the cell should not be rendered (it's covered by a
previous rowspan).
"""
ifnotisinstance(dataframe.index,pd.MultiIndex):
# If not a MultiIndex, no rowspans are needed for the index itself.
# Return a structure that indicates each index cell should be rendered once.
return [[1]*len(dataframe.index)]ifdataframe.index.nlevels>0else []

rowspans:list[list[int]]= []
forlevel_idxinrange(dataframe.index.nlevels):
current_level_spans:list[int]= []
current_value=None
current_span=0

foriinrange(len(dataframe.index)):
value=dataframe.index.get_level_values(level_idx)[i]

ifvalue==current_value:
current_span+=1
current_level_spans.append(0)# Mark as covered by previous rowspan
else:
# If new value, finalize previous span and start a new one
ifcurrent_span>0:
# Update the rowspan for the start of the previous span
current_level_spans[i-current_span]=current_span
current_value=value
current_span=1
current_level_spans.append(0)# Placeholder, will be updated later

# Finalize the last span
ifcurrent_span>0:
current_level_spans[len(dataframe.index)-current_span]=current_span

rowspans.append(current_level_spans)

returnrowspans


Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

While this makes for a somewhat cleaner view, I think we'd be better off not implementing rowpans. For example sorting by the various index columns is less intuitive if the index isn't rendered the same as other columns.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

A new implementation is used without rowspans.

table_html.append(' <tr style="text-align: left;">')

# Add index headers
fornameindataframe.index.names:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Let's try to actually refer to the original BigFrames DataFrame when deciding which index columns to render or not. In particular, we should not render the index if the BigFrame DataFrame has a NULL index.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

My new version correctly refers to the original BigFrames DataFrame's _block.has_index property. If has_index is False (indicating a "NULL index"), the _prepare_dataframe_for_display method ensures that a default "Row" column is inserted for display instead of attempting to render a non-existent index. My recent refactoring preserved this established logic.

@shuoweilshuoweil marked this pull request as draftDecember 5, 2025 18:53
@product-auto-labelproduct-auto-labelbot added size: xlPull request size is extra large. and removed size: lPull request size is large. labelsDec 17, 2025
@shuoweilshuoweil changed the titlefeat: Display index and multi-index in anywidget modefeat: Display custom multi-index in anywidget modeDec 17, 2025
@shuoweilshuoweil marked this pull request as ready for reviewDecember 19, 2025 02:40
orderable_columns=traitlets.List(traitlets.Unicode(), []).tag(sync=True)
_initial_load_complete=traitlets.Bool(False).tag(sync=True)
_batches:Optional[blocks.PandasBatches]=None
_batches:blocks.PandasBatches|None=None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This syntax was added in Python 3.10.https://peps.python.org/pep-0604/

Currently, we are supporting Python 3.9

python_requires=">=3.9",
but since this version is end-of-life, we should probably drop it sooner rather than later.

Please wait to use this syntax until we drop Python 3.9. I have filed b/470438395 to track this.

shuoweil reacted with thumbs up emoji
Comment on lines 269 to 270
ifhasattr(self,"_setting_html"):
return
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I'd prefer we initialize this to false ininit instead of relying on hasattr.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Also, consider multi threading. We probably need a lock to fully prevent two table htmls being set at the same time.

Copy link
ContributorAuthor

@shuoweilshuoweilDec 19, 2025
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Here is what I change to: (1) self._setting_html = False and self._setting_html_lock = threading.Lock() were added to the TableWidget.init method
(2) The _set_table_html method was refactored to use with self._setting_html_lock: and the self._setting_html flag to prevent race conditions during HTML rendering. The flag is reset in the finally block to ensure it's always released.

# TODO(b/463715504): Support sorting by index columns.
df_to_display=df_to_display.sort_values(
by=self.sort_column,ascending=self.sort_ascending
try:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I would prefer a context manager instead of a try/except/finally block. That way we can be more explicit about what the intention is. Are you sure you want to competely skip setting the html if another thread tries to set it? If not, just do a with block with a lock here. If so, please explain why it's safe to skip.

Aside: I would actually expect the opposite here, where later set calls would interrupt any set call that is currently running.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

A threading.Lock was used as a context manager (with self._setting_html_lock:) within _set_table_html. The logic now explicitly uses the _setting_html flag within this locked context to control whether HTML rendering proceeds or is skipped, ensuring thread safety and clear intent.

def_get_page_data(self)->pd.DataFrame:
"""Get the data for the current page, handling unknown row count."""
# This loop is to handle auto-correction of page number when row count is unknown
whileTrue:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This looks like it's intended for retries, but this is a very unsafe way to do that, if is the intention.

Also, I don't understand how this change relates to adding columns for mutiindex.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

No, it is not related to multiindex display. I mix so many features into the same pr. I have refactor them out. And move this one to a separate new branch.

@shuoweilshuoweilforce-pushed theshuowei-anywidget-index-testcase branch froma7783d7 to15993a3CompareDecember 19, 2025 23:20
@product-auto-labelproduct-auto-labelbot added size: mPull request size is medium. and removed size: xlPull request size is extra large. labelsDec 19, 2025
@shuoweilshuoweil marked this pull request as draftDecember 19, 2025 23:26
@shuoweilshuoweil changed the titlefeat: Display custom multi-index in anywidget modefeat: Refactor DataFrame.to_pandas_batches and update TableWidgetDec 19, 2025
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

@tswasttswasttswast requested changes

Requested changes must be addressed to merge this pull request.

Assignees

@shuoweilshuoweil

Labels

api: bigqueryIssues related to the googleapis/python-bigquery-dataframes API.size: mPull request size is medium.

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

2 participants

@shuoweil@tswast

[8]ページ先頭

©2009-2025 Movatter.jp