
Async API for download handlers. #7164


Draft
wRAR wants to merge 5 commits into scrapy:master from wRAR:download-handlers-api

Conversation

@wRAR (Member)

Also deprecates scrapy.utils.decorators.defers().

Fixes #6778, fixes #4944.

This intentionally breaks the (undocumented) signatures of existing built-in download handlers, but keeps compatibility with (also undocumented) old custom ones, emitting deprecation warnings.

The actual API is up for discussion; mostly I wonder whether the lazy attr should be mandatory or not.
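To make the discussion concrete, here is a minimal sketch of what a new-style coroutine-based download handler could look like under this PR. This is an assumption-laden illustration, not the PR's actual code: the Request/Response stand-ins are simplified dataclasses, and the exact semantics of the lazy attribute are still under discussion in the PR itself.

```python
import asyncio
from dataclasses import dataclass

# Simplified stand-ins for scrapy.Request / scrapy.http.Response, only so
# this sketch is self-contained; a real handler would use Scrapy's classes.
@dataclass
class Request:
    url: str

@dataclass
class Response:
    url: str
    status: int = 200

class MyDownloadHandler:
    # Per the PR description there is a `lazy` attribute; whether it is
    # mandatory, and its exact meaning, is an open question in the PR.
    lazy = False

    def __init__(self, crawler):
        self.crawler = crawler

    # New-style handlers can be plain coroutines instead of returning
    # Twisted Deferreds; old Deferred-based signatures keep working but
    # emit deprecation warnings, per the description above.
    async def download_request(self, request: Request) -> Response:
        await asyncio.sleep(0)  # placeholder for real network I/O
        return Response(url=request.url, status=200)

resp = asyncio.run(MyDownloadHandler(crawler=None).download_request(Request("data:,a")))
print(resp.status)  # 200
```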

@codecov
codecov bot commented Nov 28, 2025 (edited)

Codecov Report

❌ Patch coverage is 98.13084% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 91.44%. Comparing base (1e8de24) to head (a8d4c98).
✅ All tests successful. No failed tests found.

Files with missing lines | Patch % | Lines
scrapy/core/downloader/handlers/ftp.py | 90.90% | 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7164      +/-   ##
==========================================
+ Coverage   91.40%   91.44%   +0.03%
==========================================
  Files         165      166       +1
  Lines       12636    12623      -13
  Branches     1619     1613       -6
==========================================
- Hits        11550    11543       -7
+ Misses        813      808       -5
+ Partials      273      272       -1
Files with missing lines | Coverage Δ
scrapy/core/downloader/__init__.py | 92.90% <100.00%> (ø)
scrapy/core/downloader/handlers/__init__.py | 96.92% <100.00%> (+10.03%) ⬆️
scrapy/core/downloader/handlers/base.py | 100.00% <100.00%> (ø)
scrapy/core/downloader/handlers/datauri.py | 93.75% <100.00%> (-0.37%) ⬇️
scrapy/core/downloader/handlers/file.py | 100.00% <100.00%> (ø)
scrapy/core/downloader/handlers/http10.py | 100.00% <100.00%> (ø)
scrapy/core/downloader/handlers/http11.py | 93.67% <100.00%> (-0.04%) ⬇️
scrapy/core/downloader/handlers/http2.py | 100.00% <100.00%> (ø)
scrapy/core/downloader/handlers/s3.py | 100.00% <100.00%> (+5.88%) ⬆️
scrapy/extensions/telnet.py | 89.09% <100.00%> (-0.20%) ⬇️
... and 2 more

... and 2 files with indirect coverage changes

def gotClient(
    self, client: FTPClient, request: Request, filepath: str
) -> Deferred[Response]:
    self.client = client
@wRAR (Member, Author):

Never used in Scrapy (and changing it on every request makes it not useful).

def _build_response(
    self, result: Any, request: Request, protocol: ReceivedDataProtocol
) -> Response:
    self.result = result
@wRAR (Member, Author):

Ditto.

Comment on lines -29 to -33

aws_access_key_id: str | None = None,
aws_secret_access_key: str | None = None,
aws_session_token: str | None = None,
httpdownloadhandler: type[HTTP11DownloadHandler] = HTTP11DownloadHandler,
**kw: Any,
@wRAR (Member, Author):

User code can't pass these extra args. I've looked into why they were even added, but I've forgotten again.

):
class S3DownloadHandler(BaseDownloadHandler):
    def __init__(self, crawler: Crawler):
        if not is_botocore_available():
@wRAR (Member, Author):

It became clear after the refactoring that, technically, botocore is not needed for anonymous requests (they are converted into https://s3.amazonaws.com ones), but it may be more confusing to change this.
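For readers unfamiliar with this conversion, a rough illustration of rewriting an anonymous s3:// URL into a plain HTTPS one follows. This is not Scrapy's actual code, and the exact URL shape Scrapy produces (path-style vs. virtual-hosted-style) may differ; the function name is hypothetical.

```python
from urllib.parse import urlparse

# Illustrative only: map an anonymous s3:// URL onto a plain HTTPS URL,
# which an ordinary HTTP download handler can fetch without botocore.
def s3_to_https(url: str) -> str:
    parts = urlparse(url)
    bucket = parts.netloc
    key = parts.path.lstrip("/")
    # Virtual-hosted-style addressing assumed for this sketch.
    return f"https://{bucket}.s3.amazonaws.com/{key}"

print(s3_to_https("s3://mybucket/some/key.txt"))
# https://mybucket.s3.amazonaws.com/some/key.txt
```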


class TestFTPBase:

class TestFTPBase(ABC):
@wRAR (Member, Author):

This change is unrelated; I found that TestFTPBase methods were also being run as test cases.

r = await self.download_request(dh, request)
r = await dh.download_request(request)
assert r.status == 404
assert r.body == b"['550 nonexistent.txt: No such file or directory.']"
@wRAR (Member, Author):

Yes, it's a repr of a list 🤷‍♂️
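Concretely, the asserted body in the test above is what you get by taking the repr() of a one-element list of FTP error strings and encoding it, which a short snippet can confirm:

```python
# The FTP error body matches the repr() of a list of error strings.
errors = ["550 nonexistent.txt: No such file or directory."]
body = str(errors).encode()
print(body)
# b"['550 nonexistent.txt: No such file or directory.']"
```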

expected_urls = ["data:,a", "data:,b", "data:,c", "data:,d"]
assert actual_urls == expected_urls, f"{actual_urls=} != {expected_urls=}"

@pytest.mark.skip(reason="Hangs")  # requires changes from #7161
@wRAR (Member, Author):

Both this PR and #7161 introduce more context switch points to (probably) ExecutionEngine.close_spider_async(), and this test starts to fail; the ExecutionEngine workaround is included in that earlier PR and helps with this PR too.

@wRAR added this to the Scrapy 2.14 milestone on Nov 29, 2025


Milestone

Scrapy 2.14

Development

Successfully merging this pull request may close these issues.

Add support for async def functions in custom download handlers
Document download handler interface

