Automated training for Privacy Badger. Badger Sett automates browsers to visit websites to produce fresh Privacy Badger tracker data.

EFForg/badger-sett

A sett or set is a badger's den, which usually consists of a network of tunnels and numerous entrances. Setts incorporate larger chambers used for sleeping or rearing young.

This script is designed to raise young Privacy Badgers by teaching them about the trackers on popular sites. Every day, crawler.py visits thousands of the top sites from the Tranco List with the latest version of Privacy Badger, and saves its findings in results.json.

See the following EFF.org blog post for more information: Giving Privacy Badger a Jump Start.
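After a scan finishes, it can help to get a quick overview of what landed in results.json. The following is a minimal sketch, not part of Badger Sett itself. It assumes only that results.json is a JSON object at the top level; the helper name summarize_results is hypothetical, and no particular key names are assumed.

```python
import json
from pathlib import Path


def summarize_results(path):
    """Return {top-level key: number of entries} for a JSON results file.

    Assumption: the file holds a JSON object at the top level.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    summary = {}
    for key, value in data.items():
        # Containers report their size; scalars (e.g. a version string) count as 1.
        summary[key] = len(value) if isinstance(value, (dict, list)) else 1
    return summary


if __name__ == "__main__":
    results = Path("results.json")
    if results.exists():
        for key, count in summarize_results(results).items():
            print(f"{key}: {count}")
    else:
        print("results.json not found; run a scan first")
```

Comparing these counts between two scans is a cheap way to spot a run that collected far less data than usual.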

Development setup

  1. Install Python 3.8+

  2. Create and activate a Python virtual environment:

    python3 -m venv venv
    source ./venv/bin/activate
    pip install -U pip

    For more, read this blog post.

  3. Install Python dependencies with pip install -r requirements.txt

  4. Run static analysis with prospector

  5. Run unit tests with pytest

  6. Take a look at Badger Sett command-line flags with ./crawler.py --help

  7. Git clone the Privacy Badger repository somewhere

  8. Try running a tiny scan:

    ./crawler.py firefox 5 --no-xvfb --log-stdout --pb-dir /path/to/privacybadger

Production setup with Docker

Docker takes care of all dependencies, including setting up the latest browser version.

However, Docker brings its own complexity. Problems from improper file ownership and permissions are a particular pain point.

  1. Prerequisites: have Docker installed. Make sure your user is part of the docker group so that you can build and run Docker images without sudo. You can add yourself to the group with

    $ sudo usermod -aG docker $USER

  2. Clone the repository

    $ git clone https://github.com/efforg/badger-sett

  3. Run a scan

    $ BROWSER=firefox ./runscan.sh 500

    This will scan the top 500 sites on the Tranco list in Firefox with the latest version of Privacy Badger's master branch.

    To run the script with a different branch of Privacy Badger, set the PB_BRANCH variable, e.g.

    $ PB_BRANCH=my-feature-branch BROWSER=firefox ./runscan.sh 500

    You can also pass arguments to crawler.py, the Python script that does the actual crawl. Any arguments passed to runscan.sh will be forwarded to crawler.py. For example, to exclude all websites ending with .gov and .mil from your website visit list:

    $ BROWSER=edge ./runscan.sh 500 --exclude .gov,.mil

  4. Monitor the scan

    To have the scan print verbose output about which sites it's visiting, use the --log-stdout argument.

    If you don't use that argument, all output will still be logged to docker-out/log.txt, beginning after the script outputs "Running scan in Docker..."
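To make the --exclude option from step 3 concrete, here is a minimal sketch of suffix-based filtering. It is an illustration under stated assumptions, not the code crawler.py actually runs: the function names parse_exclude and filter_domains are hypothetical, and the sketch assumes the option value is a comma-separated list of plain domain suffixes.

```python
def parse_exclude(arg):
    """Split a comma-separated --exclude value into a tuple of suffixes."""
    return tuple(s.strip().lower() for s in arg.split(",") if s.strip())


def filter_domains(domains, exclude_arg):
    """Drop every domain that ends with one of the excluded suffixes.

    Assumption: suffixes are literal strings such as ".gov" (no wildcards).
    """
    suffixes = parse_exclude(exclude_arg)
    if not suffixes:
        return list(domains)
    # str.endswith accepts a tuple and returns True if any suffix matches.
    return [d for d in domains if not d.lower().endswith(suffixes)]


if __name__ == "__main__":
    sample = ["example.com", "nasa.gov", "army.mil", "eff.org"]
    print(filter_domains(sample, ".gov,.mil"))  # ['example.com', 'eff.org']
```

Keeping the leading dot in each suffix matters: ".gov" excludes nasa.gov but would not exclude a hypothetical domain such as notgov.com.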

Automatic scanning

To set up the script to run periodically and automatically update the repository with its results:

  1. Create a new SSH key with ssh-keygen. Give it a name unique to the repository.

    $ ssh-keygen
    Generating public/private rsa key pair.
    Enter file in which to save the key (/home/USER/.ssh/id_rsa): /home/USER/.ssh/id_rsa_badger_sett

  2. Add the new key as a deploy key with R/W access to the repo on GitHub: https://developer.github.com/v3/guides/managing-deploy-keys/

  3. Add an SSH host alias for GitHub that uses the new key pair. Create or open ~/.ssh/config and add the following:

    Host github-badger-sett
      HostName github.com
      User git
      IdentityFile /home/USER/.ssh/id_rsa_badger_sett

  4. Configure git to connect to the remote over SSH. Edit .git/config:

    [remote "origin"]
      url = ssh://git@github-badger-sett:/efforg/badger-sett

    This will have git connect to the remote using the new SSH keys by default.

  5. Create a cron job to call runscan.sh once a day. Set the environment variable RUN_BY_CRON=1 to turn off TTY forwarding to docker run (which would break the script in cron), and set GIT_PUSH=1 to have the script automatically commit and push results.json when the scan finishes. Here's an example crontab entry:

    0 0 * * *  RUN_BY_CRON=1 GIT_PUSH=1 BROWSER=chrome /home/USER/badger-sett/runscan.sh 6000 --exclude=.mil,.mil.??,.gov,.gov.??,.edu,.edu.??

  6. If everything has been set up correctly, the script should push a new version of results.json after each scan.
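The example crontab entry in step 5 uses exclude patterns such as .gov.??, which look like shell-style wildcards where each ? stands for one character (so .gov.?? would cover .gov.uk or .gov.au). The sketch below shows one way such patterns could be matched using Python's standard fnmatch module. Whether crawler.py interprets them exactly this way is an assumption, and the helper name is_excluded is hypothetical.

```python
from fnmatch import fnmatchcase


def is_excluded(domain, patterns):
    """Return True if the end of the domain matches any exclude pattern.

    Assumption: patterns use shell-style wildcards, where ? matches
    exactly one character.
    """
    domain = domain.lower()
    # Prefix each pattern with * so it is matched against the end of the domain.
    return any(fnmatchcase(domain, "*" + p) for p in patterns)


if __name__ == "__main__":
    # The same pattern list as in the example crontab entry.
    patterns = ".mil,.mil.??,.gov,.gov.??,.edu,.edu.??".split(",")
    for d in ["mit.edu", "service.gov.uk", "example.com", "education.com"]:
        print(d, is_excluded(d, patterns))
```

Under this interpretation, mit.edu and service.gov.uk are excluded, while example.com and education.com are still visited.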
