Deprecate spider attributes that can be replaced by settings, round 2 (#7039)


Open

Laerte wants to merge 15 commits into scrapy:master from Laerte:master

Conversation

@Laerte (Member):

close: #7038

@Laerte self-assigned this Sep 3, 2025
@Laerte requested a review from Copilot September 3, 2025 04:34
Copilot AI left a comment:

Pull Request Overview

This PR deprecates spider attributes that can be replaced by settings, continuing the work from issue #7038. The changes introduce a centralized warning function for deprecated spider attributes and apply deprecation warnings across multiple middleware components.

  • Adds a centralized warn_on_deprecated_spider_attribute function for consistent deprecation messaging
  • Replaces inline deprecation warnings with calls to the new function across multiple middleware files
  • Updates redirect middleware to include REDIRECT_ALLOWED_HTTP_CODES setting support
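The centralized helper described above could be sketched as follows. This is an illustrative guess at its shape based on the call sites shown later in the review; the real function lives in scrapy/utils/deprecate.py and likely uses Scrapy's ScrapyDeprecationWarning rather than the plain DeprecationWarning used here, and its message wording may differ:

```python
import warnings


def warn_on_deprecated_spider_attribute(attribute: str, setting: str) -> None:
    """Emit one consistently worded deprecation warning for a spider
    attribute that now has a setting-based replacement.

    Sketch only: the actual Scrapy helper likely raises
    ScrapyDeprecationWarning and may phrase the message differently.
    """
    warnings.warn(
        f"Setting the {attribute} attribute on a spider is deprecated; "
        f"use the {setting} setting instead.",
        DeprecationWarning,
        stacklevel=3,
    )
```

Middlewares would then call it as, e.g., warn_on_deprecated_spider_attribute("download_timeout", "DOWNLOAD_TIMEOUT") instead of repeating the warning text inline.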

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Summary per file:

  • scrapy/utils/deprecate.py: Adds centralized function for spider attribute deprecation warnings
  • scrapy/spidermiddlewares/httperror.py: Integrates REDIRECT_ALLOWED_HTTP_CODES setting into the allowed codes list
  • scrapy/extensions/throttle.py: Adds deprecation warning for the download_delay spider attribute
  • scrapy/downloadermiddlewares/useragent.py: Adds deprecation warning for the user_agent spider attribute
  • scrapy/downloadermiddlewares/redirect.py: Adds REDIRECT_ALLOWED_HTTP_CODES setting support and handle_httpstatus_list deprecation
  • scrapy/downloadermiddlewares/httpcompression.py: Refactors to use the centralized deprecation warning function
  • scrapy/downloadermiddlewares/downloadtimeout.py: Adds deprecation warning for the download_timeout spider attribute
  • scrapy/core/http2/protocol.py: Adds deprecation warnings for download size attributes
  • scrapy/core/downloader/handlers/http11.py: Adds deprecation warnings for download size attributes
  • scrapy/core/downloader/__init__.py: Refactors to use the centralized deprecation warning function
  • docs/topics/downloader-middleware.rst: Updates documentation to reference the new setting instead of the deprecated attribute


Comment on lines +43 to +46:

    self.handle_httpstatus_list: list[int] = [
        *settings.getlist("HTTPERROR_ALLOWED_CODES"),
        *settings.getlist("REDIRECT_ALLOWED_HTTP_CODES"),
    ]
@Laerte (Member, Author):

We already have HTTPERROR_ALLOWED_CODES, which overrides handle_httpstatus_list, but the setting name is not generic (it carries the HTTPERROR_ prefix). We could use it in RedirectMiddleware instead of creating a new setting.
Member:

Right, handle_httpstatus_list (the spider attribute and the meta key) is mostly for HttpErrorMiddleware; RedirectMiddleware is its additional user. Similarly for HTTPERROR_ALLOW_ALL (the setting) and handle_httpstatus_all (the spider attribute and the meta key). But out of these, RedirectMiddleware only uses both meta keys and the handle_httpstatus_list spider attribute, not the spider attribute and the setting for "all". This is inconsistent (though I haven't considered what the spider behavior would be if this were made consistent).

Member:

I don't like the HTTP errors middleware reading the REDIRECT_ALLOWED_HTTP_CODES setting…

I wonder if we should take a step back and try to think through what the ideal “settings and meta API” should be, and then try to move towards it in a backward-compatible way with deprecations where needed.

I feel this should be similar to dont_filter, where you have a generic setting/meta and a specific setting/meta (e.g. allow_offsite). I think the specific should override the generic, i.e. no merging: if the generic value is [401] and you want to extend it with 402 for a specific middleware, you set the specific setting/meta to [401, 402], not just [402]. But I am not completely sure.
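The "specific overrides generic, no merging" rule proposed here could look like the following resolution helper. This is purely an illustration of the suggested semantics; resolve_allowed_codes is not a Scrapy API, and the reviewer explicitly says they are not sure this is the right design:

```python
def resolve_allowed_codes(generic, specific):
    """Return the allowed-codes list under the proposed precedence:
    a specific setting/meta value, when set, fully replaces the generic
    one rather than being merged with it.

    Hypothetical sketch; not part of Scrapy.
    """
    if specific is not None:
        return list(specific)
    return list(generic or [])
```

Under this rule, extending a generic [401] with 402 for one middleware means setting the specific value to [401, 402], since [402] alone would drop 401.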

@codecov bot commented Sep 3, 2025 (edited):
❌ 1 Tests Failed:

Tests completed: 3513 | Failed: 1 | Passed: 3512 | Skipped: 87
View the top 1 failed test(s) by shortest run time:

tests/test_downloadermiddleware_httpcompression.py::tests.test_downloadermiddleware_httpcompression
Stack Traces | 0s run time

    scrapy/downloadermiddlewares/httpcompression.py:39: in <module>
        brotli.Decompressor.can_accept_more_data
    E   AttributeError: type object 'brotli.Decompressor' has no attribute 'can_accept_more_data'

    During handling of the above exception, another exception occurred:

    tests/test_downloadermiddleware_httpcompression.py:10: in <module>
        from scrapy.downloadermiddlewares.httpcompression import (
    scrapy/downloadermiddlewares/httpcompression.py:41: in <module>
        warnings.warn(
        ^^^^^^^^
    E   NameError: name 'warnings' is not defined. Did you forget to import 'warnings'?
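The NameError above comes from httpcompression.py calling warnings.warn at module import time without warnings being imported. A minimal, hedged sketch of the pattern (the exact structure, attribute probe, and warning text in Scrapy may differ; the point is only that the capability check needs the warnings import in scope):

```python
import warnings  # the missing import that triggered the NameError in CI


def check_brotli_streaming() -> bool:
    """Return True if an installed brotli module supports streaming
    decompression, probing the same attribute as the failing import.

    Illustrative sketch, not Scrapy's actual module-level code.
    """
    try:
        import brotli

        # Newer brotli releases expose streaming support via this attribute.
        brotli.Decompressor.can_accept_more_data
    except ImportError:
        return False  # brotli support is optional
    except AttributeError:
        warnings.warn(
            "brotli is installed but lacks streaming support; "
            "whole response bodies will be buffered during decompression.",
            stacklevel=2,
        )
        return False
    return True
```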


@wRAR (Member) commented Sep 3, 2025 (edited):
There are tests that use these newly deprecated attributes that need to be fixed or removed.

For download handler tests, that likely means creating the crawler and the handler in the test function, as already done in test_download_allow_data_loss_via_setting() (I'm still not sure what the most idiomatic way is to get a fixture based on some argument). I haven't checked other tests.

@wRAR (Member) commented Sep 14, 2025 (edited):

In case you've missed it (as it's only in extra-deps), tests/test_downloader_handler_twisted_http2.py::TestHttps2::test_download_with_maxsize_very_large_file still produces a warning.

Comment on test code:

            ),
        ],
    )
    def test_startdelay_definition(min_spider, min_setting, start_setting, expected):
Member:

Is this test useless/duplicate without min_spider?

@Laerte (Member, Author):

It seems so; the only assert is "assert spider.download_delay == expected". Let me know if you want to keep the test and drop min_spider; I'm not sure what we would check in that case.

Member:

OK, this is stupid :)

The extension sets spider.download_delay in _spider_opened() (based on AUTOTHROTTLE_START_DELAY, DOWNLOAD_DELAY and the original value of spider.download_delay, just like the params of this test), so that the downloader can use it as the start delay value for newly created slots in scrapy.core.downloader._get_concurrency_delay(). So if we want to stop using spider.download_delay, we need this to work in some other way (and I don't think we can change the DOWNLOAD_DELAY setting value here).

@Gallaecio thoughts?
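The coupling described in the comment above can be illustrated with a small sketch. This is not Scrapy's actual code; the function name and the max-based combination are assumptions made only to show why writing the result back to spider.download_delay is load-bearing for the downloader's per-slot start delay:

```python
def start_delay(autothrottle_start_delay, download_delay_setting,
                spider_download_delay=None):
    """Illustrative sketch (not Scrapy's implementation): AutoThrottle
    derives a start delay that is at least the DOWNLOAD_DELAY setting,
    at least the spider's own download_delay if one was set, and at
    least AUTOTHROTTLE_START_DELAY; the result is what gets written
    back to spider.download_delay for newly created downloader slots.
    """
    min_delay = max(
        download_delay_setting,
        spider_download_delay if spider_download_delay is not None else 0.0,
    )
    return max(min_delay, autothrottle_start_delay)
```

This makes the migration problem concrete: if the spider attribute goes away, the computed value needs a new channel to reach the downloader, since mutating the DOWNLOAD_DELAY setting at that point is not an option.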

Member:
Puff, headache incoming… 😓

Member:

Maybe we should keep download_delay until #6660 is addressed?

@wRAR (Member) commented Oct 6, 2025 (edited):

I'm fine with keeping it. Note that we need to revert a part of #6994 before the next release for this: #7117

@Laerte (Member, Author):

> In case you've missed it (as it's only in extra-deps), tests/test_downloader_handler_twisted_http2.py::TestHttps2::test_download_with_maxsize_very_large_file still produces a warning.

Thanks for flagging, fixed.

@wRAR added this to the Scrapy 2.14 milestone Oct 3, 2025
Comment on lines +96 to +101
ifhasattr(spider,"download_maxsize"):
warn_on_deprecated_spider_attribute("download_maxsize","DOWNLOAD_MAXSIZE")
ifhasattr(spider,"download_warnsize"):
warn_on_deprecated_spider_attribute(
"download_warnsize","DOWNLOAD_WARNSIZE"
)
Member:

I wonder if we should try to find a way to trigger this only once per spider, but no strong opinion. Maybe louder is better 🙂

Comment on lines 156 to +167:

        in getattr(self.crawler.spider, "handle_httpstatus_list", [])
        or response.status in request.meta.get("handle_httpstatus_list", [])
        or response.status in self.handle_httpstatus_list
Member:

I wonder if we should handle this somewhat differently.

i.e., when both the setting and the spider attribute are defined, the spider attribute, even if deprecated, should be considered an override of the setting value, and the setting value should be ignored.

In fact, I wonder if we should also consider the meta value an override of both the setting and the spider attribute, and hence ignore those if the meta value is set. But I am fine with keeping that out of scope here.

Either way, we should make sure we clarify this in the docs and the release notes.
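The precedence the reviewer is floating (meta overrides the deprecated spider attribute, which overrides the setting, with values replacing rather than merging) could be sketched as follows. resolve_handle_httpstatus_list is hypothetical and does not reflect Scrapy's current behavior, which the comment notes is an open question:

```python
from typing import Any

_MISSING = object()  # sentinel to distinguish "attribute absent" from []


def resolve_handle_httpstatus_list(meta: dict, spider: Any,
                                   setting_value: list) -> list:
    """Hypothetical precedence sketch from the review discussion:
    request.meta value > (deprecated) spider attribute > setting value,
    with each level fully replacing the ones below it (no merging).
    """
    if "handle_httpstatus_list" in meta:
        return meta["handle_httpstatus_list"]
    attr = getattr(spider, "handle_httpstatus_list", _MISSING)
    if attr is not _MISSING:
        return attr
    return setting_value
```

As the comment says, whichever behavior is chosen should be spelled out in the docs and release notes, since it changes what users observe when they set more than one of these at once.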

@wRAR mentioned this pull request Oct 24, 2025

Reviewers

@wRAR left review comments

@Gallaecio left review comments

Copilot left review comments

Assignees

@Laerte

Labels

None yet

Projects

None yet

Milestone

Scrapy 2.14

Development

Successfully merging this pull request may close these issues.

Deprecate spider attributes that can be replaced by settings, round 2

3 participants

@Laerte, @wRAR, @Gallaecio
