Forked from scrapy/scrapy

Commit ed63fa9
Deprecate spider args of middlewares and pipelines. (scrapy#7006)
* Deprecate spider args of spider middleware methods.
* Deprecate the spider arg of pipeline process_item().
* Fix a typing issue.
* Deprecate the spider arg of pipeline {open,close}_spider().
* Cleanup.
* Update docs.
* Add pragma: no cover to some deprecated code.
* More tests.
* More tests.
* More tests.
* Simplify _process_parallel().
1 parent b683308 · commit ed63fa9
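For user code, this deprecation means dropping the trailing `spider` parameter from pipeline and middleware callbacks. A minimal before/after sketch (a made-up VAT pipeline in the spirit of the docs' `PricePipeline`; plain dicts stand in for Scrapy items):

```python
class OldStylePipeline:
    # Deprecated signature: Scrapy used to pass the spider explicitly.
    def process_item(self, item, spider):
        item["price"] = round(item["price"] * 1.15, 2)
        return item


class NewStylePipeline:
    # New signature: no spider argument.
    def process_item(self, item):
        item["price"] = round(item["price"] * 1.15, 2)
        return item


if __name__ == "__main__":
    print(NewStylePipeline().process_item({"price": 10.0}))  # {'price': 11.5}
```

Old-style methods keep working during the deprecation period; the middleware manager detects which registered callbacks still declare `spider` and only passes it to those.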

File tree

34 files changed: +619 −385 lines

docs/faq.rst

Lines changed: 1 addition & 1 deletion

@@ -349,7 +349,7 @@ method for this purpose. For example:

     class MultiplyItemsMiddleware:
-        def process_spider_output(self, response, result, spider):
+        def process_spider_output(self, response, result):
             for item_or_request in result:
                 if isinstance(item_or_request, Request):
                     continue

docs/news.rst

Lines changed: 14 additions & 0 deletions

@@ -44,6 +44,20 @@ Backward-incompatible changes

 -  :meth:`~scrapy.spidermiddlewares.referer.ReferrerPolicy.referrer`

+-  :class:`scrapy.middleware.MiddlewareManager` no longer includes code for
+   handling ``open_spider()`` and ``close_spider()`` component methods. As
+   this code was only used for pipelines, it was moved into
+   :class:`scrapy.pipelines.ItemPipelineManager`. This change should only
+   affect custom subclasses of :class:`~scrapy.middleware.MiddlewareManager`.
+   The following code was moved:
+
+   -  ``scrapy.middleware.MiddlewareManager.open_spider()``
+
+   -  ``scrapy.middleware.MiddlewareManager.close_spider()``
+
+   -  Code in ``scrapy.middleware.MiddlewareManager._add_middleware()`` that
+      processes ``open_spider()`` and ``close_spider()`` component methods.
+
 -  :meth:`scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware.process_request`
    now returns a coroutine; previously it returned a
    :class:`~twisted.internet.defer.Deferred` object or ``None``. The

docs/topics/coroutines.rst

Lines changed: 4 additions & 4 deletions

@@ -191,7 +191,7 @@ shorter and cleaner:

            adapter["field"] = data
            return item

-        def process_item(self, item, spider):
+        def process_item(self, item):
            adapter = ItemAdapter(item)
            dfd = db.get_some_data(adapter["id"])
            dfd.addCallback(self._update_item, item)

@@ -205,7 +205,7 @@ becomes:

     class DbPipeline:
-        async def process_item(self, item, spider):
+        async def process_item(self, item):
            adapter = ItemAdapter(item)
            adapter["field"] = await db.get_some_data(adapter["id"])
            return item

@@ -421,12 +421,12 @@ For example:

 .. code-block:: python

     class UniversalSpiderMiddleware:
-        def process_spider_output(self, response, result, spider):
+        def process_spider_output(self, response, result):
            for r in result:
                # ... do something with r
                yield r

-        async def process_spider_output_async(self, response, result, spider):
+        async def process_spider_output_async(self, response, result):
            async for r in result:
                # ... do something with r
                yield r
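The coroutines hunks convert a Deferred-callback pipeline into a native coroutine with the new spider-less signature. The same shape, sketched with asyncio and a stubbed `get_some_data()` helper (the docs' real example uses Twisted's `Deferred` and a `db` module that are not reproduced here):

```python
import asyncio


async def get_some_data(key):
    # Stand-in for the docs' db.get_some_data(); pretend this hits a database.
    return f"data-for-{key}"


class DbPipeline:
    # New-style coroutine pipeline: no spider argument.
    async def process_item(self, item):
        item["field"] = await get_some_data(item["id"])
        return item


if __name__ == "__main__":
    item = asyncio.run(DbPipeline().process_item({"id": 7}))
    print(item)  # {'id': 7, 'field': 'data-for-7'}
```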

docs/topics/exporters.rst

Lines changed: 1 addition & 1 deletion

@@ -67,7 +67,7 @@ value of one of their fields:

            self.year_to_exporter[year] = (exporter, xml_file)
            return self.year_to_exporter[year][0]

-        def process_item(self, item, spider):
+        def process_item(self, item):
            exporter = self._exporter_for_item(item)
            exporter.export_item(item)
            return item

docs/topics/item-pipeline.rst

Lines changed: 20 additions & 22 deletions

@@ -26,7 +26,7 @@ Writing your own item pipeline

 Each item pipeline is a :ref:`component <topics-components>` that must
 implement the following method:

-.. method:: process_item(self, item, spider)
+.. method:: process_item(self, item)

    This method is called for every item pipeline component.

@@ -42,25 +42,16 @@ implement the following method:

    :param item: the scraped item
    :type item: :ref:`item object <item-types>`

-   :param spider: the spider which scraped the item
-   :type spider: :class:`~scrapy.Spider` object
-
 Additionally, they may also implement the following methods:

-.. method:: open_spider(self, spider)
+.. method:: open_spider(self)

    This method is called when the spider is opened.

-   :param spider: the spider which was opened
-   :type spider: :class:`~scrapy.Spider` object
-
-.. method:: close_spider(self, spider)
+.. method:: close_spider(self)

    This method is called when the spider is closed.

-   :param spider: the spider which was closed
-   :type spider: :class:`~scrapy.Spider` object
-

 Item pipeline example
 =====================

@@ -82,7 +73,7 @@ contain a price:

     class PricePipeline:
         vat_factor = 1.15

-        def process_item(self, item, spider):
+        def process_item(self, item):
            adapter = ItemAdapter(item)
            if adapter.get("price"):
                if adapter.get("price_excludes_vat"):

@@ -107,13 +98,13 @@ format:

     class JsonWriterPipeline:
-        def open_spider(self, spider):
+        def open_spider(self):
            self.file = open("items.jsonl", "w")

-        def close_spider(self, spider):
+        def close_spider(self):
            self.file.close()

-        def process_item(self, item, spider):
+        def process_item(self, item):
            line = json.dumps(ItemAdapter(item).asdict()) + "\n"
            self.file.write(line)
            return item

@@ -153,14 +144,14 @@ The main point of this example is to show how to :ref:`get the crawler

                mongo_db=crawler.settings.get("MONGO_DATABASE", "items"),
            )

-        def open_spider(self, spider):
+        def open_spider(self):
            self.client = pymongo.MongoClient(self.mongo_uri)
            self.db = self.client[self.mongo_db]

-        def close_spider(self, spider):
+        def close_spider(self):
            self.client.close()

-        def process_item(self, item, spider):
+        def process_item(self, item):
            self.db[self.collection_name].insert_one(ItemAdapter(item).asdict())
            return item

@@ -198,12 +189,19 @@ item.

        SPLASH_URL = "http://localhost:8050/render.png?url={}"

-        async def process_item(self, item, spider):
+        def __init__(self, crawler):
+            self.crawler = crawler
+
+        @classmethod
+        def from_crawler(cls, crawler):
+            return cls(crawler)
+
+        async def process_item(self, item):
            adapter = ItemAdapter(item)
            encoded_item_url = quote(adapter["url"])
            screenshot_url = self.SPLASH_URL.format(encoded_item_url)
            request = scrapy.Request(screenshot_url, callback=NO_CALLBACK)
-           response = await spider.crawler.engine.download_async(request)
+           response = await self.crawler.engine.download_async(request)

            if response.status != 200:
                # Error happened, return item.

@@ -238,7 +236,7 @@ returns multiple items with the same id:

        def __init__(self):
            self.ids_seen = set()

-       def process_item(self, item, spider):
+       def process_item(self, item):
            adapter = ItemAdapter(item)
            if adapter["id"] in self.ids_seen:
                raise DropItem(f"Item ID already seen: {adapter['id']}")
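The ScreenshotPipeline hunk above shows the replacement for spider-based access: accept the crawler in `from_crawler()`, store it on the instance, and use `self.crawler` instead of `spider.crawler`. A standalone sketch of that pattern (the `Crawler` class here is a minimal stand-in for `scrapy.crawler.Crawler`, and the settings lookup is illustrative only):

```python
class Crawler:
    """Stand-in for scrapy.crawler.Crawler, just enough for this sketch."""

    def __init__(self, settings):
        self.settings = settings


class MyPipeline:
    def __init__(self, crawler):
        # Keep a reference so the other methods no longer need a spider arg.
        self.crawler = crawler

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy builds components through this classmethod.
        return cls(crawler)

    def process_item(self, item):
        item["project"] = self.crawler.settings["BOT_NAME"]
        return item


if __name__ == "__main__":
    pipeline = MyPipeline.from_crawler(Crawler({"BOT_NAME": "demo"}))
    print(pipeline.process_item({}))  # {'project': 'demo'}
```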

docs/topics/settings.rst

Lines changed: 1 addition & 1 deletion

@@ -554,7 +554,7 @@ When writing an item pipeline, you can force a different log level by setting

     class MyPipeline:
-        def process_item(self, item, spider):
+        def process_item(self, item):
            if not item.get("price"):
                raise DropItem("Missing price data", log_level="INFO")
            return item

docs/topics/spider-middleware.rst

Lines changed: 4 additions & 13 deletions

@@ -94,7 +94,7 @@ one or more of these methods:

     def process_start_requests(self, start, spider):
         yield from start

-.. method:: process_spider_input(response, spider)
+.. method:: process_spider_input(response)

    This method is called for each response that goes through the spider
    middleware and into the spider, for processing.

@@ -116,11 +116,7 @@ one or more of these methods:

    :param response: the response being processed
    :type response: :class:`~scrapy.http.Response` object

-   :param spider: the spider for which this response is intended
-   :type spider: :class:`~scrapy.Spider` object
-
-
-.. method:: process_spider_output(response, result, spider)
+.. method:: process_spider_output(response, result)

    This method is called with the results returned from the Spider, after
    it has processed the response.

@@ -149,10 +145,7 @@ one or more of these methods:

    :type result: an iterable of :class:`~scrapy.Request` objects and
       :ref:`item objects <topics-items>`

-   :param spider: the spider whose result is being processed
-   :type spider: :class:`~scrapy.Spider` object
-
-.. method:: process_spider_output_async(response, result, spider)
+.. method:: process_spider_output_async(response, result)
    :async:

    .. versionadded:: 2.7

@@ -161,7 +154,7 @@ one or more of these methods:

    which will be called instead of :meth:`process_spider_output` if
    ``result`` is an :term:`asynchronous iterable`.

-.. method:: process_spider_exception(response, exception, spider)
+.. method:: process_spider_exception(response, exception)

    This method is called when a spider or :meth:`process_spider_output`
    method (from a previous spider middleware) raises an exception.

@@ -186,8 +179,6 @@ one or more of these methods:

    :param exception: the exception raised
    :type exception: :exc:`Exception` object

-   :param spider: the spider which raised the exception
-   :type spider: :class:`~scrapy.Spider` object

 Base class for custom spider middlewares
 ----------------------------------------
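Put together, a new-style spider middleware implements these methods without the `spider` parameter. A hypothetical filtering middleware using the updated signatures (the `offsite` flag is a made-up convention for this sketch, and dicts stand in for Scrapy's response/item objects):

```python
class OffsiteItemsMiddleware:
    """Hypothetical middleware written against the new spider-less signatures."""

    def process_spider_input(self, response):
        # Must return None (let the response through) or raise an exception.
        return None

    def process_spider_output(self, response, result):
        # Drop results flagged as offsite; yield everything else unchanged.
        for r in result:
            if not r.get("offsite"):
                yield r


if __name__ == "__main__":
    mw = OffsiteItemsMiddleware()
    items = [{"id": 1}, {"id": 2, "offsite": True}]
    kept = list(mw.process_spider_output(response=None, result=items))
    print(kept)  # [{'id': 1}]
```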

scrapy/core/spidermw.py

Lines changed: 28 additions & 5 deletions

@@ -117,20 +117,30 @@ def _check_deprecated_process_start_requests_use(
         )

     def _add_middleware(self, mw: Any) -> None:
-        super()._add_middleware(mw)
         if hasattr(mw, "process_spider_input"):
             self.methods["process_spider_input"].append(mw.process_spider_input)
+            self._check_mw_method_spider_arg(mw.process_spider_input)
+
         if self._use_start_requests:
             if hasattr(mw, "process_start_requests"):
                 self.methods["process_start_requests"].appendleft(
                     mw.process_start_requests
                 )
             elif hasattr(mw, "process_start"):
                 self.methods["process_start"].appendleft(mw.process_start)
+
         process_spider_output = self._get_async_method_pair(mw, "process_spider_output")
         self.methods["process_spider_output"].appendleft(process_spider_output)
+        if callable(process_spider_output):
+            self._check_mw_method_spider_arg(process_spider_output)
+        elif isinstance(process_spider_output, tuple):
+            for m in process_spider_output:
+                self._check_mw_method_spider_arg(m)
+
         process_spider_exception = getattr(mw, "process_spider_exception", None)
         self.methods["process_spider_exception"].appendleft(process_spider_exception)
+        if process_spider_exception is not None:
+            self._check_mw_method_spider_arg(process_spider_exception)

     async def _process_spider_input(
         self,

@@ -141,7 +151,10 @@ async def _process_spider_input(
         for method in self.methods["process_spider_input"]:
             method = cast("Callable", method)
             try:
-                result = method(response=response, spider=self._spider)
+                if method in self._mw_methods_requiring_spider:
+                    result = method(response=response, spider=self._spider)
+                else:
+                    result = method(response=response)
                 if result is not None:
                     msg = (
                         f"{global_object_name(method)} must return None "

@@ -212,7 +225,12 @@ def _process_spider_exception(
             if method is None:
                 continue
             method = cast("Callable", method)
-            result = method(response=response, exception=exception, spider=self._spider)
+            if method in self._mw_methods_requiring_spider:
+                result = method(
+                    response=response, exception=exception, spider=self._spider
+                )
+            else:
+                result = method(response=response, exception=exception)
             if _isiterable(result):
                 # stop exception handling by handing control over to the
                 # process_spider_output chain if an iterable has been returned

@@ -298,7 +316,12 @@ def _process_spider_output(
                 )
                 recovered = MutableChain(recovered_collected)
                 # might fail directly if the output value is not a generator
-                result = method(response=response, result=result, spider=self._spider)
+                if method in self._mw_methods_requiring_spider:
+                    result = method(
+                        response=response, result=result, spider=self._spider
+                    )
+                else:
+                    result = method(response=response, result=result)
             except Exception as ex:
                 exception_result: Failure | MutableChain[_T] | MutableAsyncChain[_T] = (
                     self._process_spider_exception(response, ex, method_index + 1)

@@ -421,7 +444,7 @@ async def process_start(
         if self._use_start_requests:
             sync_start = iter(self._spider.start_requests())
             sync_start = await self._process_chain(
-                "process_start_requests", sync_start, always_add_spider=True
             )
             start: AsyncIterator[Any] = as_async_generator(sync_start)
         else:
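The `_mw_methods_requiring_spider` checks in these hunks are what keep old-style middlewares working: the manager records, at registration time, which callbacks still declare a `spider` parameter and passes it only to those. A self-contained sketch of that dispatch idea using `inspect.signature` (an approximation for illustration; the real code does the bookkeeping in `_check_mw_method_spider_arg`, which is not shown in this diff):

```python
import inspect


class MiniManager:
    """Toy manager mimicking the per-method spider-arg dispatch."""

    def __init__(self):
        self.methods = []
        self._methods_requiring_spider = set()

    def add(self, method):
        self.methods.append(method)
        # Old-style callbacks still declare a ``spider`` parameter.
        if "spider" in inspect.signature(method).parameters:
            self._methods_requiring_spider.add(method)

    def process(self, response, spider):
        results = []
        for method in self.methods:
            # Pass ``spider`` only to callbacks that asked for it.
            if method in self._methods_requiring_spider:
                results.append(method(response=response, spider=spider))
            else:
                results.append(method(response=response))
        return results


def old_style(response, spider):
    return f"old:{response}:{spider}"


def new_style(response):
    return f"new:{response}"


if __name__ == "__main__":
    mgr = MiniManager()
    mgr.add(old_style)
    mgr.add(new_style)
    print(mgr.process("r1", "s1"))  # ['old:r1:s1', 'new:r1']
```

This lets both signatures coexist in one chain during the deprecation period, at the cost of a membership check per call.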

0 commit comments