Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Feature: Support passing DataFrames to table.table#28830

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
timhoffm merged 14 commits intomatplotlib:mainfromanijjar:feat/28726
Oct 29, 2024

Conversation

anijjar
Copy link
Contributor

@anijjaranijjar commentedSep 17, 2024
edited
Loading

PR summary

Resolves#28726 : feature request: support passing DataFrames to table.table
dawnwangcx wanted a feature to enable building tables using Pandas's DataFrame object. I implemented his solution and made a Unit test in test_table.py to confirm the DataFrame was correctly converted into a table.

PR checklist

@anijjar
Copy link
ContributorAuthor

PyTest Results

image

Copy link
Member

@timhoffmtimhoffm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Thanks for the contribution. The solution is slightly more complex because pandas is an optional dependency.

Also this should get a test. Something like (unchecked - you may need to slightly adapt it to make it work):

def test_table_dataframe():    df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])    fig, ax = plt.subplots()    table = ax.table(df)    assert table[0, 0] == 'a'    assert table[0, 1] == 'b'    assert table[1, 0] == '1'    assert table[1, 1] == '2'    assert table[2, 0] == '3'    assert table[2, 1] == '4'

@anijjar
Copy link
ContributorAuthor

Pytest test_table.py::test_table_dataframe

Unit Test

def test_table_dataframe():    # Test if Pandas Data Frame can be passed in cellText    import pandas as pd    data = {        'Letter': ['A', 'B', 'C'],        'Number': [100, 200, 300]    }    df = pd.DataFrame(data)    fig, ax = plt.subplots()    ax.axis('off')    table = ax.table(df, loc='center')    assert table[0, 0].get_text().get_text() == 'Letter'    assert table[0, 1].get_text().get_text() == 'Number'    assert table[1, 0].get_text().get_text() == 'A'    assert table[1, 1].get_text().get_text() == str(100)    assert table[2, 0].get_text().get_text() == 'B'    assert table[2, 1].get_text().get_text() == str(200)    assert table[3, 0].get_text().get_text() == 'C'    assert table[3, 1].get_text().get_text() == str(300)

Result

Passed

Copy link
Member

@timhoffmtimhoffm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Copy link
Member

@story645story645 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Slight concern about the wording in the what's new so gonna give you time to address, but otherwise I think this is a great addition!

Copy link
Member

@story645story645 left a comment
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

To fix the failing tests, add pandas to the testing requirements
I think tohttps://github.com/matplotlib/matplotlib/blob/main/requirements/testing/minver.txt or limit the new tests to 3.10+ like the xarray tests

I'm honestly extra confused why those tests don't find pandas since it's in the extra testing

@anijjar
Copy link
ContributorAuthor

To fix the failing tests, add pandas to the testing requirements I think tohttps://github.com/matplotlib/matplotlib/blob/main/requirements/testing/minver.txt or limit the new tests to 3.10+ like the xarray tests

I'm honestly extra confused why those tests don't find pandas since it's in the extra testing

This is where I am confused also. For now, , Ill modify the the mini version file to include pandas and hopefully, that test can run correctly.

story645 reacted with thumbs up emoji

@anijjar
Copy link
ContributorAuthor

I think I found the problem. Inside the tests.yml file, the extra-requirements line is missing from the other workflows. We could add pandas to all.txt, but I suggest against it because it is not a default dependency. So instead, Ill add the extra-requirements line to the other workflows.

          - os: ubuntu-22.04            python-version: '3.11'            # https://www.riverbankcomputing.com/pipermail/pyqt/2023-November/045606.html            pyqt6-ver: '!=6.6.0'            # https://bugreports.qt.io/projects/PYSIDE/issues/PYSIDE-2346            pyside6-ver: '!=6.5.1'            extra-requirements: '-r requirements/testing/extra.txt'          - os: ubuntu-22.04            python-version: '3.12'            # https://www.riverbankcomputing.com/pipermail/pyqt/2023-November/045606.html            pyqt6-ver: '!=6.6.0'            # https://bugreports.qt.io/projects/PYSIDE/issues/PYSIDE-2346            pyside6-ver: '!=6.5.1'          - os: ubuntu-22.04....

For the minimum versions workflow, minver.txt doesn't add dependencies, but defines the package versions for the all.txt file. Ill revert my changes there.

@story645
Copy link
Member

story645 commentedSep 23, 2024
edited
Loading

Sorry for the backtracking, but I think instead of changing the minimum version tests to run everything, you should flag this test to only run when pandas is installed.

@@ -674,7 +676,7 @@ def table(ax,

Parameters
----------
cellText : 2D list of str, optional
cellText : 2D list of str,Pandas.DataFrame,optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I'm not sure we want to call out pandas data frames here. How do we do this elsewhere? I'd just state "object" and then below, state that
"""
IfcellText is not a list of texts, we attempt to accesscellText.columns.to_numpy() for the column headers,
andcellText.to_numpy() for the table's cells.
"""

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

We don't have any other functions explicitly take DataFrames. Typically, DataFrame is covered by what we callarray-like (and internally runs viato_numpy(). It's different here because we take the column headers as well. Due to this particular usage, we've chosen to explicitly check for DataFrames (_is_pandas_dataframe()). I believe that duck-typing would be a bit too vague here: An object withobj.columns.to_numpy() andobj.to_numpy() is very particular. I'm not sure if this matches anything but pandas.DataFrame - and if there exists other such objects, I'm not clear that accepting them is appropriate. If the need for similar objects should arise, we can always revisit. For now, the explicit type is the simplest documentation and matches implementation:

Suggested change
cellText :2Dlistofstr,Pandas.DataFrame,optional
cellText :2Dlistofstrorpandas.DataFrame,optional

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I linked an example of another very popular package, polars, that has the same form of data frame. I'm sure there are others out there. I don't think we should be special casing pandas here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Polars is logically equivalent, but it does not have the same API. See#28830 (comment).

There are several other dataframes out their (https://data-apis.org/dataframe-api/draft/purpose_and_scope.html#history-and-dataframe-implementations), but the problem is that their API is currently not consistent. IMHO we therefore would have to special case to support multiple of them.

While I appreciate the attempt to duck-type instead of explicitly relying on types, I believe the current dataframe APIs are to inhomogeneous to support reasonable duck-typing. Any object that supportsobj.columns.to_numpy() andobj.to_numpy() is too technical - i suspect that at least half of the pandas users would not know whether that holds forpandas.DataFrame or not. Duck-typing would work ifdataframe-like was well-defined, but it is currently not (see the draft dataframe API standard).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I thinknp.asanyarray(df.columns) works on both pandas and polars, and is pretty straightforward.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

If you really want to push for duck-typing, go for it. But I'd require reasonable type stubs (acceptingAny is not good enough) and type documentation (users should at least easily see that pandas.DataFrame as the most common dataframe type is supported).

Ideally, that claimed behavior would also be tested. (It's easy for pandas, because it's a test dependency. For others, you'd likely need a mock similar totest_unpack_to_numpy_from_torch.)

I personally believe it's not worth the added effort and would advise against requiring this from a first-time contribution. When accepting the limited scope of pandas, this PR is almost ready apart from minor documentation issues and the type stub. I therefore suggest to take this PR as a limited but well scoped improvement. We don't paint us in any corner with that. You can generalize in a followup PR.

anijjar reacted with thumbs up emoji
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

But I'd require reasonable type stubs

OK, but I dont' think you will get pandas dataframes in a typestub from this PR either, will you? I'm personally pretty 👎 on losing functionality because of type stubbing issues.

@@ -744,6 +746,13 @@ def table(ax,
cols = len(cellColours[0])
cellText = [[''] * cols] * rows

# Check if we have a Pandas DataFrame
if _is_pandas_dataframe(cellText):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I'd actually make this more generic, and check ifcellText has a columns attribute and ato_numpy method. That lets non-pandas objects also work; for instance polars is a pandas competitor, and there is no reason its dataframes could not be passed in here:https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.DataFrame.columns.html
https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.DataFrame.to_numpy.html

Copy link
Member

@timhoffmtimhoffmSep 24, 2024
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Note thatpolars.DataFrame.columns is of typelist[str], so the current implementation usingcellText.columns.to_numpy() would fail. These are exactly the subtleties I did not want to go into as part of this first-time contributor PR. Let's start with pandas, which is a clean addition. We can always generalize later, which should likely conform to thedataframe API standard (note that this is still draft)

As a side-remark: The table implementation has lots of design problems. If one was serious about tables, the wholeTable would need to be rewritten/replaced. Therefore, I wouldn't spend too much effort on trying to improve implementation.

Copy link
Member

@timhoffmtimhoffm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This needs an update to the table stub intables.pyi

anijjar reacted with thumbs up emoji
@anijjar
Copy link
ContributorAuthor

Sorry for the backtracking, but I think instead of changing the minimum version tests to run everything, you should flag this test to only run when pandas is installed.

Sure, Ill revert my changes and add a flag instead.

Thank you all for the thorough review of my commits. I'm learning a lot👍🏾. About the polars package, I dont mind repeating what I did here and make another PR for it. If it is as popular as you say, I can see the value in adding support.

@anijjar
Copy link
ContributorAuthor

anijjar commentedSep 24, 2024
edited
Loading

When trying to commit the tables.pyi file, I get this error from pre-commit.

mypy.....................................................................Failed- hook id: mypy- exit code: 1Warning: Unpack is already enabled by defaultlib\matplotlib\table.pyi:71: error: Name "pandas.DataFrame" is not defined  [name-defined]Found 1 error in 1 file (checked 107 source files)

from this change

cellText: Sequence[Sequence[str]] | 'pandas.DataFrame' | None = ...,

Im not sure how to fix this unless I outright import the pandas module.

@timhoffm
Copy link
Member

When trying to commit the tables.pyi file, I get this error from pre-commit.

mypy.....................................................................Failed- hook id: mypy- exit code: 1Warning: Unpack is already enabled by defaultlib\matplotlib\table.pyi:71: error: Name "pandas.DataFrame" is not defined  [name-defined]Found 1 error in 1 file (checked 107 source files)

from this change

cellText: Sequence[Sequence[str]] | 'pandas.DataFrame' | None = ...,

Im not sure how to fix this unless I outright import the pandas module.

I believe it should work to

try:    from pandas import DataFrameexcept ImportError:    DataFrame = None...cellText: Sequence[Sequence[str]] | DataFrame | None = ...,

but maybe@QuLogic or@ksunden have a better solution.

Copy link
Member

@story645story645 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Bunch of hopefully small things and also this PR will need two approvals

from typing import Any, Literal
from typing import Any, Literal, TYPE_CHECKING

if TYPE_CHECKING:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Having anif TYPE_CHECKING: in a pyi is kind of redundant, as pyi isonly used for type checking.

This does introduce a type-check time requirement onpandas, which I don't fully love, but is notthat bad. The if/else here does not protect against not having Pandas (and not even sure a try/except would work in apyi... it would at least be unusual)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I'll reiterate that I don't think we should be special casing pandas at all, and we should not have a pandas dataframe as a type annotation. Is there anywhere else in the library that we do this?

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Should I import the DataFrame method directly?

@anijjaranijjar requested a review fromstory645October 3, 2024 04:43
Copy link
Member

@timhoffmtimhoffm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Following#28726 (comment), I compared with the pandas implementation:

https://github.com/pandas-dev/pandas/blob/5829e3ea20adc978ebfb82f08d3d5347108be0f0/pandas/plotting/_matplotlib/tools.py#L72-L88

We should mingle columns and index intorowLabels andcolLabels: IfrowLabels /colLabels are not given, use index / columns as the respective labels.

We have two options for handling these with additional explicit labels:

  1. explicit labels take precedence and overwrite index/columns values
  2. we error if both are given.

While 1) could be convenient to overwrite the labels, I'm inclined to go with 2) - in the face of ambiguity, refuse the temptation to guess. We could always expand to 1) later if the need would arise.

Copy link
Member

@story645story645 left a comment
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I think this works, but I think we should absolutely support overwriting the pandas column and row labels b/c I'm pretty sure that would be my use case at least half the time.

I don't think 1. is guessing b/c they're explicitly passing in labels.

@timhoffm
Copy link
Member

@story645 adding the support for row/column labels is a straight forward extension that can be added later. I thought it might be nice to get a basic version in (i) because there already has been a lot of discussion and I don't want to overburden the PR with additional tweaks, in particular since this is a first-time contribution, and (ii) because it would be nice to push this into 3.10, which we most likely will miss when doing additional changes.

story645 reacted with thumbs up emoji

@timhoffmtimhoffm added this to thev3.10.0 milestoneOct 29, 2024
@timhoffm
Copy link
Member

I'll go and merge as is. Titles can be handled in a followup PR.

@timhoffmtimhoffm merged commit8e6d6b6 intomatplotlib:mainOct 29, 2024
41 of 42 checks passed
@timhoffm
Copy link
Member

@anijjar thanks and congratulations on your first contribution to Matplotlib 🎉. We hope to see you again.

anijjar reacted with heart emoji

@anijjaranijjar deleted the feat/28726 branchNovember 1, 2024 04:06
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@jklymakjklymakjklymak left review comments

@ksundenksundenksunden left review comments

@timhoffmtimhoffmtimhoffm approved these changes

@story645story645story645 approved these changes

Assignees
No one assigned
Projects
Milestone
v3.10.0
Development

Successfully merging this pull request may close these issues.

feature request: support passing DataFrames to table.table
5 participants
@anijjar@story645@timhoffm@jklymak@ksunden

[8]ページ先頭

©2009-2025 Movatter.jp