Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork8.1k
Fix PDF bloat for off-axis scatter with per-point colors#30746
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
base:main
Are you sure you want to change the base?
Conversation
Skip emitting markers outside canvas bounds in draw_path_collectionto reduce PDF file size when scatter points are off-axis.Fixesmatplotlib#2488
jklymak commentedNov 12, 2025
This seems to remove a marker if it's center is outside the document. What happens if the marker size is large enough that the edge of the marker should still be shown even if the center is not on the page? |
| # The off-axis colored scatter should be close to empty size | ||
| # Allow up to 50KB overhead for axes/metadata, but should be much smaller | ||
| # than if all 1000 markers were written (which would add ~200-400KB) | ||
| assertsize_offaxis_colored<size_empty+50_000, ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
The 50kb padding is confusing to me. The axes should be the same output either way (as the limits are the same etc). If there was anything extra I would expect it to be the structures for the scatter that has no paths (there are some open/close group stuff emitted if I recall correctly). To account for that we could adax2.scatter([], []) to the empty test.
It is probably worth also adding a third example where there are markers in in the plot (ax3.scatter(x+20, y+20, c=c)) to test that fully off axis one is about the same size as the empty one and both are smaller than the one with visible markers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I've updated the tests per your suggestions:
- Added
ax2.scatter([], [])to the baseline test to match the axes structure exactly - Reduced tolerance from 50KB to 5KB since axes output is now identical
- Added a third test case with visible markers (x + 20, y + 20, c=c) that validates:
- Visible scatter is >50KB larger than empty
- Visible scatter is >50KB larger than off-axis
Now, we can compare all three outputs:
- fully off-axis scatter --> small file
- empty scatter --> small file
- visible scatter --> larger file
| # may be partially visible even if its center is outside the canvas. | ||
| canvas_width=self.file.width*72 | ||
| canvas_height=self.file.height*72 | ||
| ifnot (-max_marker_extent<=xo<=canvas_width+max_marker_extent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Why did you use a maximum extent rather than doing the computation here and having per-maker and per-direction filtering?
We have to compute the extent of every maker no matter what so might as well get the better behavior by doing it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I initially used a singlemax_marker_extent to simplify the bounds check and avoid recalculating the extent for each marker.
Yes, you're right - per-marker filtering is more precise. I've updated the commit.
| bbox=path.get_extents(transform) | ||
| max_marker_extent=max(max_marker_extent, | ||
| bbox.width/2,bbox.height/2) | ||
| name=self.file.pathCollectionObject( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Does this write something into the file our only update our Python side sate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
It does not write into file immediately but seems it writes after all drawing is done. Let me fix this by creating path templates only for used path_ids and compute extents for creted paths.
| # Get the bounding box of the transformed marker path. | ||
| # Use get_extents() which is more efficient than transforming | ||
| # all vertices, and add padding for stroke width. | ||
| bbox=path.get_extents(transform) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Did you test that this works as expected with log scale and polar?
FazeelUsmani commentedNov 13, 2025
Great catch! You're absolutely right - the initial implementation didn't account for marker size. I've The new implementation:
I've also added |
jklymak commentedNov 13, 2025
@FazeelUsmani that looks correc. Nice job! However, now I wonder if we are trading file-size bloat for run-time bloat? This seems to check the extent of every marker - what if I have 10,000 markers, and they are all in the axes - will this be super slow in needlessly getting all the extents? Does a hybrid of the two approaches seem useful (eg only update the extent if the markers centre is not in the figure?) |
FazeelUsmani commentedNov 14, 2025
@jklymak Yes, hybrid suits well here. We will get extent only if the marker is off the grid. |
FazeelUsmani commentedNov 16, 2025
@jklymak, it's ready for review. |
jklymak left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Looks good.
FazeelUsmani commentedNov 17, 2025
Hi@tacaswell, can you please take another look? |
FazeelUsmani commentedNov 18, 2025
Thanks! for approving this PR@jklymak. I've a question: Usually, when the PR is merged after changes are accepted? Is it before release? |
jklymak commentedNov 18, 2025
The PR requires two approvals and then will be merged |
FazeelUsmani commentedNov 18, 2025
FazeelUsmani commentedNov 18, 2025
@tacaswell waiting for you. Let me know if you've any questions or need clarifications |
Uh oh!
There was an error while loading.Please reload this page.
Fixes#2488
PR summary
Problem
When using
scatter()with per-point colors and then moving the axes limits so all points are off-screen, the PDF backend still writes all marker paths to the file, resulting in unnecessarily large PDFs (~400KB instead of ~7KB for 1000 points).Solution
Added bounds checking in
draw_path_collection(backend_pdf.py:2121-2125) to skip markers outside the visible canvas. This mirrors the existing optimization already present indraw_markers(lines 2157-2159).Implementation
(xo, yo)against canvas bounds(0, 0)to(width*72, height*72)output()call for out-of-bounds markersReproduction Example
Testing
Added three new tests in
test_backend_pdf.py:test_scatter_offaxis_colored_pdf_sizetest_scatter_offaxis_colored_visualtest_scatter_mixed_onoff_axisPR checklist