PlaywrightCrawler error_handler cannot access Page object #1482

Open

Bug

Labels

t-tooling: Issues with this label are in the ownership of the tooling team.

Description

janbuchar opened on Oct 14, 2025

In the JavaScript version, the error handler can access the `Page` object via `PlaywrightCrawlingContext.page`. While porting the `ContextPipeline` to JavaScript, I discovered that the Python version doesn't implement this.

Test case

```python
async def test_error_handler_can_access_page(server_url: URL) -> None:
    crawler = PlaywrightCrawler(max_request_retries=2)

    request_handler = mock.AsyncMock(side_effect=RuntimeError('Intentional crash'))
    crawler.router.default_handler(request_handler)

    error_handler_calls: list[str | None] = []

    @crawler.error_handler
    async def error_handler(context: BasicCrawlingContext | PlaywrightCrawlingContext, _error: Exception) -> None:
        error_handler_calls.append(
            await context.page.content() if isinstance(context, PlaywrightCrawlingContext) else None
        )

    await crawler.run([str(server_url / 'hello-world')])

    assert error_handler_calls == [HELLO_WORLD, HELLO_WORLD, HELLO_WORLD]
```
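To illustrate why a test like this might fail, here is a minimal, library-free sketch of the suspected ordering problem. It assumes the page is created by an async-generator middleware whose cleanup runs as soon as the request handler raises, before the error handler is invoked. `FakePage`, `page_middleware`, `run_request_handler` and `run_task` are hypothetical stand-ins, not the actual Crawlee internals.

```python
import asyncio
from collections.abc import AsyncGenerator


class FakePage:
    """Hypothetical stand-in for a Playwright page."""

    def __init__(self) -> None:
        self.closed = False

    async def content(self) -> str:
        if self.closed:
            raise RuntimeError('Page is closed')
        return '<html>hello</html>'

    async def close(self) -> None:
        self.closed = True


async def page_middleware() -> AsyncGenerator[FakePage, None]:
    """Async-generator middleware: setup before the yield, cleanup after it."""
    page = FakePage()
    try:
        yield page
    finally:
        await page.close()


async def run_request_handler(observed: list[FakePage]) -> None:
    """Drive the middleware and a request handler that always crashes."""
    middleware = page_middleware()
    page = await middleware.__anext__()
    observed.append(page)
    try:
        raise RuntimeError('Intentional crash')
    finally:
        # Cleanup runs here, before the caller ever sees the exception.
        await middleware.aclose()


async def run_task() -> str:
    """Call the request handler, then a simulated error handler on failure."""
    observed: list[FakePage] = []
    try:
        await run_request_handler(observed)
    except RuntimeError:
        # Simulated error handler: the page has already been closed.
        try:
            return await observed[0].content()
        except RuntimeError as exc:
            return f'error handler failed: {exc}'
    return 'no error'


if __name__ == '__main__':
    print(asyncio.run(run_task()))
```

In this sketch the error handler receives a page that is already closed, which mirrors the behavior the test case is meant to catch.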

Possible solutions

- Run the error handlers before the cleanup step of the context pipeline
  - this is a fairly big change and we probably want to do it after "fix: Only apply requestHandlerTimeout to request handler #1474"
  - changing this in the adaptive playwright crawler will be especially tricky
- Add some "deferred cleanup" step to the context pipeline and call that after error handlers are done
  - it's unclear how this would fit in the current async generator based middleware model
  - considerable refactoring of `_run_request_handler` and `__run_task_function` would still be necessary: error handlers are called by the latter, and the context pipeline is only handled in the former
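One way the "deferred cleanup" idea could look is sketched below. It uses `contextlib.AsyncExitStack` from the standard library as the deferred step. This is only an assumption about a possible shape, not a proposal for the actual implementation: `FakePage`, `page_middleware`, `run_request_handler` and `run_task` are hypothetical names, and the real context pipeline may need a different mechanism. The middleware registers its cleanup on a stack owned by the outer task function instead of running it after the `yield`, so the page stays open until the error handler has finished.

```python
import asyncio
from collections.abc import AsyncGenerator
from contextlib import AsyncExitStack


class FakePage:
    """Hypothetical stand-in for a Playwright page."""

    def __init__(self) -> None:
        self.closed = False

    async def content(self) -> str:
        if self.closed:
            raise RuntimeError('Page is closed')
        return '<html>hello</html>'

    async def close(self) -> None:
        self.closed = True


async def page_middleware(deferred: AsyncExitStack) -> AsyncGenerator[FakePage, None]:
    """Middleware that defers cleanup instead of running it after the yield."""
    page = FakePage()
    # Register the cleanup; it runs only when the owner closes the stack.
    deferred.push_async_callback(page.close)
    yield page


async def run_request_handler(context: dict[str, FakePage], deferred: AsyncExitStack) -> None:
    """Drive the middleware and a request handler that always crashes."""
    middleware = page_middleware(deferred)
    context['page'] = await middleware.__anext__()
    try:
        raise RuntimeError('Intentional crash')
    finally:
        # Finalizes the generator, but the page itself is left open.
        await middleware.aclose()


async def run_task() -> tuple[str, bool]:
    """Own the deferred-cleanup stack and call the error handler inside it."""
    context: dict[str, FakePage] = {}
    seen = ''
    async with AsyncExitStack() as deferred:
        try:
            await run_request_handler(context, deferred)
        except RuntimeError:
            # Simulated error handler: the page is still usable here.
            seen = await context['page'].content()
    # Leaving the `async with` block ran the deferred cleanup.
    return seen, context['page'].closed


if __name__ == '__main__':
    print(asyncio.run(run_task()))
```

In this sketch the outer function plays the role of the task function that calls the error handlers, which is why it owns the stack. The same ownership question is what makes the refactoring mentioned above necessary.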

