Contributing to Scrapy

Important

Double check that you are reading the most recent version of this document at https://docs.scrapy.org/en/master/contributing.html

By participating in this project you agree to abide by the terms of our Code of Conduct. Please report unacceptable behavior to opensource@zyte.com.

There are many ways to contribute to Scrapy. Here are some of them:

  • Report bugs and request features in the issue tracker, trying to follow the guidelines detailed in Reporting bugs below.

  • Submit patches for new functionality and/or bug fixes. Please read Writing patches and Submitting patches below for details on how to write and submit a patch.

  • Blog about Scrapy. Tell the world how you’re using Scrapy. This will help newcomers with more examples and will help the Scrapy project to increase its visibility.

  • Join the Scrapy subreddit and share your ideas on how to improve Scrapy. We’re always open to suggestions.

  • Answer Scrapy questions at Stack Overflow.

Reporting bugs

Note

Please report security issues only to scrapy-security@googlegroups.com. This is a private list, open only to trusted Scrapy developers, and its archives are not public.

Well-written bug reports are very helpful, so keep in mind the following guidelines when reporting a new bug.

  • check the FAQ first to see if your issue is addressed in a well-known question

  • if you have a general question about Scrapy usage, please ask it at Stack Overflow (use the “scrapy” tag).

  • check the open issues to see if the issue has already been reported. If it has, don’t dismiss the report, but check the ticket history and comments. If you have additional useful information, please leave a comment, or consider sending a pull request with a fix.

  • search the scrapy-users list and Scrapy subreddit to see if it has been discussed there, or if you’re not sure whether what you’re seeing is a bug. You can also ask in the #scrapy IRC channel.

  • write complete, reproducible, specific bug reports. The smaller the test case, the better. Remember that other developers won’t have your project to reproduce the bug, so please include all relevant files required to reproduce it. See, for example, Stack Overflow’s guide on creating a Minimal, Complete, and Verifiable example exhibiting the issue.

  • the most awesome way to provide a complete reproducible example is to send a pull request which adds a failing test case to the Scrapy testing suite (see Submitting patches). This is helpful even if you don’t intend to fix the issue yourself.

  • include the output of scrapy version -v so developers working on your bug know exactly which version and platform it occurred on, which is often very helpful for reproducing it, or for knowing whether it was already fixed.

Finding work

If you have decided to make a contribution to Scrapy, but you do not know what to contribute, you have a few options to find pending work:

  • Check out the contribution GitHub page, which lists open issues tagged as good first issue.

    There are also help wanted issues, but mind that some may require familiarity with the Scrapy code base. You can also target any other issue, provided it is not tagged as discuss.

  • If you enjoy writing documentation, there are documentation issues as well, but mind that some may require familiarity with the Scrapy code base too.

  • If you enjoy writing automated tests, you can work on increasing our test coverage.

  • If you enjoy code cleanup, we welcome fixes for issues detected by our static analysis tools. See pyproject.toml for silenced issues that may need addressing.

    Mind that there are some issues we do not aim to address at all; these usually include a comment explaining the reason. Do not confuse such comments with those that merely describe what a non-descriptive issue code is about.

If you have found an issue, make sure you read the entire issue thread before you ask questions. That includes related issues and pull requests that show up in the issue thread when the issue is mentioned elsewhere.

We do not assign issues, and you do not need to announce that you are going to start working on an issue either. If you want to work on an issue, just go ahead and write a patch for it.

Do not discard an issue simply because there is an open pull request for it. Check if open pull requests are active first. And even if some are active, if you think you can build a better implementation, feel free to create a pull request with your approach.

If you decide to work on something without an open issue, please:

  • Do not create an issue to work on code coverage or code cleanup; create a pull request directly.

  • Do not create both an issue and a pull request right away. Either open an issue first to get feedback on whether the issue is worth addressing, and create a pull request later only if the feedback from the team is positive, or create only a pull request, if you think a discussion will be easier over your code.

  • Do not add docstrings for the sake of adding docstrings, or only to address silenced Ruff issues. We expect docstrings to exist only when they add something significant for readers, such as explaining something that is not easy to understand from reading the corresponding code, summarizing a long, hard-to-read implementation, providing context about calling code, or indicating purposely uncaught exceptions from called code.

  • Do not add tests that use as much mocking as possible just to touch a given line of code and hence improve line coverage. While we do aim to maximize test coverage, tests should be written for real scenarios, with minimal mocking. We usually prefer end-to-end tests.

Writing patches

The better a patch is written, the higher the chances that it’ll get accepted and the sooner it will be merged.

Well-written patches should:

  • contain the minimum amount of code required for the specific change. Small patches are easier to review and merge. So, if you’re doing more than one change (or bug fix), please consider submitting one patch per change. Do not collapse multiple changes into a single patch. For big changes, consider using a patch queue.

  • pass all unit tests. See Running tests below.

  • include one (or more) test cases that check the bug fixed or the new functionality added. See Writing tests below.

  • if you’re adding or changing a public (documented) API, please include the documentation changes in the same patch. See Documentation policies below.

  • if you’re adding a private API, please add a regular expression to the coverage_ignore_pyobjects variable of docs/conf.py to exclude the new private API from documentation coverage checks.

    To see if your private API is skipped properly, generate a documentation coverage report as follows:

    tox -e docs-coverage
  • if you are removing deprecated code, first make sure that at least 1 year (12 months) has passed since the release that introduced the deprecation. See Deprecation policy.

Submitting patches

The best way to submit a patch is to issue a pull request on GitHub, optionally creating a new issue first.

Remember to explain what was fixed or what the new functionality is (what it is, why it’s needed, etc.). The more info you include, the easier it will be for core developers to understand and accept your patch.

If your pull request aims to resolve an open issue, link it accordingly, e.g.:

Resolves #123

You can also discuss the new functionality (or bug fix) before creating the patch, but it’s always good to have a patch ready to illustrate your arguments and show that you have put some additional thought into the subject. A good starting point is to send a pull request on GitHub. It can be simple enough to illustrate your idea, and leave documentation/tests for later, after the idea has been validated and proven useful. Alternatively, you can start a conversation in the Scrapy subreddit to discuss your idea first.

Sometimes there is an existing pull request for the problem you’d like to solve, which is stalled for some reason. Often the pull request is headed in the right direction, but changes have been requested by Scrapy maintainers, and the original pull request author hasn’t had time to address them. In this case, consider picking up this pull request: open a new pull request with all commits from the original pull request, as well as additional changes to address the raised issues. Doing so helps a lot; it is not considered rude as long as the original author is acknowledged by keeping their commits.

You can pull an existing pull request to a local branch by running git fetch upstream pull/$PR_NUMBER/head:$BRANCH_NAME_TO_CREATE (replace ‘upstream’ with a remote name for the Scrapy repository, $PR_NUMBER with the ID of the pull request, and $BRANCH_NAME_TO_CREATE with the name of the branch you want to create locally). See also: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally#modifying-an-inactive-pull-request-locally.

When writing GitHub pull requests, try to keep titles short but descriptive. For example, for bug #411 (“Scrapy hangs if an exception raises in start_requests”), prefer “Fix hanging when exception occurs in start_requests (#411)” over “Fix for #411”. Complete titles make it easy to skim through the issue tracker.

Finally, try to keep aesthetic changes (PEP 8 compliance, unused import removal, etc.) in separate commits from functional changes. This will make pull requests easier to review and more likely to get merged.

Coding style

Please follow these coding conventions when writing code for inclusion in Scrapy:

Pre-commit

We usepre-commit to automatically address simple code issues before everycommit.

After you create a local clone of your fork of the Scrapy repository:

  1. Install pre-commit.

  2. In the root of your local clone of the Scrapy repository, run the following command:

    pre-commit install

Now pre-commit will check your changes every time you create a Git commit. Upon finding issues, pre-commit aborts your commit, and either fixes those issues automatically, or only reports them to you. If it fixes those issues automatically, creating your commit again should succeed. Otherwise, you may need to address the corresponding issues manually first.

Documentation policies

For reference documentation of API members (classes, methods, etc.) use docstrings and make sure that the Sphinx documentation uses the autodoc extension to pull the docstrings. API reference documentation should follow docstring conventions (PEP 257) and be IDE-friendly: short, to the point, and it may provide short examples.
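For instance, a short, PEP 257-style docstring for a hypothetical helper (not an actual Scrapy API) might look like this — a one-line summary, a blank line, then details and a short example that autodoc can pull into the reference as-is:

```python
def slugify(title):
    """Return *title* lowercased, with spaces replaced by hyphens.

    >>> slugify("Writing Patches")
    'writing-patches'
    """
    return title.lower().replace(" ", "-")
```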

Other types of documentation, such as tutorials or topics, should be covered in files within the docs/ directory. This includes documentation that is specific to an API member, but goes beyond API reference documentation.

In any case, if something is covered in a docstring, use the autodoc extension to pull the docstring into the documentation instead of duplicating the docstring in files within the docs/ directory.

Documentation updates that cover new or modified features must use Sphinx’s versionadded and versionchanged directives. Use VERSION as the version; we will replace it with the actual version right before the corresponding release. When we release a new major or minor version of Scrapy, we remove these directives if they are older than 3 years.
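For example, documentation for a new feature could be annotated as follows (the setting name is hypothetical, for illustration only):

```rst
.. versionadded:: VERSION
   The ``MY_NEW_SETTING`` setting.
```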

Documentation about deprecated features must be removed as those features are deprecated, so that new readers do not run into it. New deprecations and deprecation removals are documented in the release notes.

Tests

Tests are implemented using the Twisted unit-testing framework. Running tests requires tox.

Running tests

To run all tests:

tox

To run a specific test (say tests/test_loader.py) use:

tox -- tests/test_loader.py

To run the tests on a specific tox environment, use -e <name> with an environment name from tox.ini. For example, to run the tests with Python 3.10 use:

tox -e py310

You can also specify a comma-separated list of environments, and use tox’s parallel mode to run the tests on multiple environments in parallel:

tox -e py39,py310 -p auto

To pass command-line options to pytest, add them after -- in your call to tox. Using -- overrides the default positional arguments defined in tox.ini, so you must include those default positional arguments (scrapy tests) after -- as well:

tox -- scrapy tests -x  # stop after first failure

You can also use the pytest-xdist plugin. For example, to run all tests on the Python 3.10 tox environment using all your CPU cores:

tox -e py310 -- scrapy tests -n auto

To see the coverage report, install coverage (pip install coverage) and run:

coverage report

See the output of coverage --help for more options, like the HTML or XML report.

Writing tests

All functionality (including new features and bug fixes) must include a test case to check that it works as expected, so please include tests for your patches if you want them to get accepted sooner.

Scrapy uses unit tests, which are located in the tests/ directory. Their module name typically resembles the full path of the module they’re testing. For example, the item loaders code is in:

scrapy.loader

And its unit tests are in:

tests/test_loader.py
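As a sketch, a minimal test in such a module could look like the following. Note that normalize_price is a made-up function defined inline so the example is self-contained; a real test would import and exercise code from scrapy.loader instead.

```python
# Sketch of a minimal test in the style of tests/test_loader.py.
def normalize_price(value):
    """Strip currency symbols and thousands separators, return a float."""
    return float(value.replace("$", "").replace(",", ""))


def test_normalize_price():
    # Real scenario, no mocking: feed a raw scraped string, check the result.
    assert normalize_price("$1,250.50") == 1250.50
```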