Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Fix PDF bloat for off-axis scatter with per-point colors#30746

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Open
FazeelUsmani wants to merge5 commits intomatplotlib:main
base:main
Choose a base branch
Loading
fromFazeelUsmani:fix-matplotlib-pdf-scatter

Conversation

@FazeelUsmani
Copy link
Contributor

@FazeelUsmaniFazeelUsmani commentedNov 12, 2025
edited
Loading

Fixes#2488

PR summary

Problem

When usingscatter() with per-point colors and then moving the axes limits so all points are off-screen, the PDF backend still writes all marker paths to the file, resulting in unnecessarily large PDFs (~400KB instead of ~7KB for 1000 points).

Solution

Added bounds checking indraw_path_collection (backend_pdf.py:2121-2125) to skip markers outside the visible canvas. This mirrors the existing optimization already present indraw_markers (lines 2157-2159).

Implementation

  • Check marker offsets(xo, yo) against canvas bounds(0, 0) to(width*72, height*72)
  • Skip theoutput() call for out-of-bounds markers
  • Conservative, backend-specific change (PDF only)
  • No API changes or user-facing behavior modifications

Reproduction Example

importnumpyasnpimportmatplotlib.pyplotaspltx=np.random.random(1000)y=np.random.random(1000)c=np.random.random(1000)# Before fix: ~410 KB PDF# After fix: ~7-15 KB PDFplt.figure()plt.scatter(x,y,c=c)plt.xlim(20,30)# Move all points off-axis (data is 0-1 range)plt.savefig('scatter_offaxis.pdf')

Testing

Added three new tests intest_backend_pdf.py:

  • test_scatter_offaxis_colored_pdf_size
  • test_scatter_offaxis_colored_visual
  • test_scatter_mixed_onoff_axis

PR checklist

Skip emitting markers outside canvas bounds in draw_path_collectionto reduce PDF file size when scatter points are off-axis.Fixesmatplotlib#2488
@FazeelUsmaniFazeelUsmani marked this pull request as draftNovember 12, 2025 09:41
@FazeelUsmaniFazeelUsmani marked this pull request as ready for reviewNovember 12, 2025 10:29
@jklymak
Copy link
Member

This seems to remove a marker if it's center is outside the document. What happens if the marker size is large enough that the edge of the marker should still be shown even if the center is not on the page?

# The off-axis colored scatter should be close to empty size
# Allow up to 50KB overhead for axes/metadata, but should be much smaller
# than if all 1000 markers were written (which would add ~200-400KB)
assertsize_offaxis_colored<size_empty+50_000, (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

The 50kb padding is confusing to me. The axes should be the same output either way (as the limits are the same etc). If there was anything extra I would expect it to be the structures for the scatter that has no paths (there are some open/close group stuff emitted if I recall correctly). To account for that we could adax2.scatter([], []) to the empty test.

It is probably worth also adding a third example where there are markers in in the plot (ax3.scatter(x+20, y+20, c=c)) to test that fully off axis one is about the same size as the empty one and both are smaller than the one with visible markers.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I've updated the tests per your suggestions:

  1. Addedax2.scatter([], []) to the baseline test to match the axes structure exactly
  2. Reduced tolerance from 50KB to 5KB since axes output is now identical
  3. Added a third test case with visible markers (x + 20, y + 20, c=c) that validates:
    • Visible scatter is >50KB larger than empty
    • Visible scatter is >50KB larger than off-axis

Now, we can compare all three outputs:

  • fully off-axis scatter --> small file
  • empty scatter --> small file
  • visible scatter --> larger file

# may be partially visible even if its center is outside the canvas.
canvas_width=self.file.width*72
canvas_height=self.file.height*72
ifnot (-max_marker_extent<=xo<=canvas_width+max_marker_extent
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Why did you use a maximum extent rather than doing the computation here and having per-maker and per-direction filtering?

We have to compute the extent of every maker no matter what so might as well get the better behavior by doing it here.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I initially used a singlemax_marker_extent to simplify the bounds check and avoid recalculating the extent for each marker.

Yes, you're right - per-marker filtering is more precise. I've updated the commit.

bbox=path.get_extents(transform)
max_marker_extent=max(max_marker_extent,
bbox.width/2,bbox.height/2)
name=self.file.pathCollectionObject(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Does this write something into the file our only update our Python side sate?

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

It does not write into file immediately but seems it writes after all drawing is done. Let me fix this by creating path templates only for used path_ids and compute extents for creted paths.

# Get the bounding box of the transformed marker path.
# Use get_extents() which is more efficient than transforming
# all vertices, and add padding for stroke width.
bbox=path.get_extents(transform)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Did you test that this works as expected with log scale and polar?

@FazeelUsmani
Copy link
ContributorAuthor

This seems to remove a marker if it's center is outside the document. What happens if the marker size is large enough that the edge of the marker should still be shown even if the center is not on the page?

Great catch! You're absolutely right - the initial implementation didn't account for marker size. I've
updated the code to compute per-marker extents (bounding boxes) and only skip markers that are
completely outside the canvas.

The new implementation:

  • Calculates the extent (half-width, half-height) for each marker path
  • Checks if the marker's bounding box intersects the canvas:(-extent_x <= xo <= canvas_width + extent_x)
  • Only skips markers where the entire bounding box is outside the visible area

I've also addedtest_scatter_large_markers_partial_clip which specifically tests markers with centers
outside the canvas but edges extending into the visible area, checked and these are now correctly rendered.

@jklymak
Copy link
Member

@FazeelUsmani that looks correc. Nice job!

However, now I wonder if we are trading file-size bloat for run-time bloat? This seems to check the extent of every marker - what if I have 10,000 markers, and they are all in the axes - will this be super slow in needlessly getting all the extents? Does a hybrid of the two approaches seem useful (eg only update the extent if the markers centre is not in the figure?)

FazeelUsmani reacted with eyes emoji

@FazeelUsmani
Copy link
ContributorAuthor

@jklymak Yes, hybrid suits well here. We will get extent only if the marker is off the grid.
Added the changes.

@FazeelUsmani
Copy link
ContributorAuthor

@jklymak, it's ready for review.

Copy link
Member

@jklymakjklymak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Looks good.

FazeelUsmani reacted with hooray emoji
@FazeelUsmani
Copy link
ContributorAuthor

Hi@tacaswell, can you please take another look?

@FazeelUsmani
Copy link
ContributorAuthor

Thanks! for approving this PR@jklymak.

I've a question: Usually, when the PR is merged after changes are accepted? Is it before release?

@jklymak
Copy link
Member

The PR requires two approvals and then will be merged

@FazeelUsmani
Copy link
ContributorAuthor

I see.@jklymak, can you please look at this PR#30756? It has one approval, just need another.

@FazeelUsmani
Copy link
ContributorAuthor

The PR requires two approvals and then will be merged

@tacaswell waiting for you. Let me know if you've any questions or need clarifications

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

@jklymakjklymakjklymak approved these changes

@tacaswelltacaswellAwaiting requested review from tacaswell

Assignees

No one assigned

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

Off-axes scatter() points unnecessarily saved to PDF when coloured

3 participants

@FazeelUsmani@jklymak@tacaswell

[8]ページ先頭

©2009-2025 Movatter.jp