- Notifications
You must be signed in to change notification settings - Fork11.2k
Async API for download handlers.#7164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
base:master
Are you sure you want to change the base?
Conversation
codecovbot commentedNov 28, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@## master #7164 +/- ##==========================================+ Coverage 91.40% 91.44% +0.03%========================================== Files 165 166 +1 Lines 12636 12623 -13 Branches 1619 1613 -6 ==========================================- Hits 11550 11543 -7+ Misses 813 808 -5+ Partials 273 272 -1
|
| defgotClient( | ||
| self,client:FTPClient,request:Request,filepath:str | ||
| )->Deferred[Response]: | ||
| self.client=client |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Never used in Scrapy (and changing it on every request makes it not useful).
| def_build_response( | ||
| self,result:Any,request:Request,protocol:ReceivedDataProtocol | ||
| )->Response: | ||
| self.result=result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Ditto.
| aws_access_key_id:str|None=None, | ||
| aws_secret_access_key:str|None=None, | ||
| aws_session_token:str|None=None, | ||
| httpdownloadhandler:type[HTTP11DownloadHandler]=HTTP11DownloadHandler, | ||
| **kw:Any, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
User code can't pass these extra args. I've studied why were they even added but forgot it again.
| ): | ||
| classS3DownloadHandler(BaseDownloadHandler): | ||
| def__init__(self,crawler:Crawler): | ||
| ifnotis_botocore_available(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
It became clear after the refactoring that, technically, botocore is not needed for anonymous requests (they are converted intohttps://s3.amazonaws.com ones), but it may be more confusing to change this.
| classTestFTPBase: | ||
| classTestFTPBase(ABC): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
This change is unrelated, I've found thatTestFTPBase methods were also run as test cases.
| r=awaitself.download_request(dh,request) | ||
| r=awaitdh.download_request(request) | ||
| assertr.status==404 | ||
| assertr.body==b"['550 nonexistent.txt: No such file or directory.']" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Yes, it's a repr of a list 🤷♂️
| expected_urls= ["data:,a","data:,b","data:,c","data:,d"] | ||
| assertactual_urls==expected_urls,f"{actual_urls=} !={expected_urls=}" | ||
| @pytest.mark.skip(reason="Hangs")# requires changes from #7161 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Both this PR and#7161 introduce more context switch points to, probably,ExecutionEngine.close_spider_async() and this test starts to fail, theExecutionEngine workaround is included in that earlier PR and helps with this PR too.
Also deprecates
scrapy.utils.decorators.defers().Fixes#6778,fixes#4944.
This intentionally breaks the (undocumented) signatures of existing built-in download handlers, but keeps compatibility with (also undocumented) old custom ones, emitting deprecation warnings.
The actual API is up for discussion, mostly I wonder if the
lazyattr should be mandatory or not.