openwpm/OpenWPM

A web privacy measurement framework

OpenWPM is a web privacy measurement framework which makes it easy to collect data for privacy studies on a scale of thousands to millions of websites. OpenWPM is built on top of Firefox, with automation provided by Selenium. It includes several hooks for data collection. Check out the instrumentation section below for more details.


Installation

OpenWPM is tested on Ubuntu 18.04 via GitHub Actions and is commonly used via the Docker container that this repo builds, which is also based on Ubuntu. Although we don't officially support other platforms, conda is a cross-platform utility, and the install script can be expected to work on OSX and other Linux distributions.

OpenWPM does not support Windows: #503

Pre-requisites

The main pre-requisite for OpenWPM is conda, a fast cross-platform package management tool.

Conda is open-source and can be installed from https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html.

Install

An installation script, install.sh, is included to install the conda environment, install unbranded Firefox, and build the instrumentation extension.

All installation is confined to your conda environment and should not affect your machine. The installation script will, however, override any existing conda environment named openwpm.

To run the install script, run

./install.sh

After running the install script, activate your conda environment by running:

conda activate openwpm

Mac OSX

You may need to install make / gcc in order to build the extension. The necessary packages are part of Xcode: xcode-select --install

We do not run CI tests for Mac, so new issues may arise. We welcome PRs to fix these issues and add full CI testing for Mac.

Running Firefox with xvfb on OSX requires the user to install an X11 server; we suggest XQuartz. This setup has not been tested, and we welcome feedback on whether it works.

Quick Start

Once installed, it is very easy to run a quick test of OpenWPM. Check out demo.py for an example. This will use the default settings specified in openwpm/config.py::ManagerParams and openwpm/config.py::BrowserParams, with the exception of the changes specified in demo.py.

The demo script also includes a sample of how to use the Tranco top sites list via the optional command line flag demo.py --tranco. Note that since this is a real top sites list, it will include NSFW websites, some of which will be highly ranked.
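To show what such a site list looks like before it is handed to a crawl, here is a minimal sketch that reads a locally downloaded Tranco-style CSV (one `rank,domain` pair per line) and turns the top N entries into URLs. The helper name `load_top_sites`, the `http://` prefix and the local file name are assumptions for illustration; this is not taken from demo.py.

```python
import csv
import io
from typing import Iterable, List


def load_top_sites(lines: Iterable[str], limit: int) -> List[str]:
    """Return the first `limit` domains of a Tranco-style CSV as http:// URLs.

    Each data line is expected to look like "1,example.com" (rank, domain).
    Hypothetical helper for illustration; not taken from demo.py.
    """
    sites: List[str] = []
    for row in csv.reader(lines):
        if len(sites) >= limit:
            break
        if len(row) < 2 or not row[0].strip().isdigit():
            continue  # skip blank lines, malformed rows, or a header row
        sites.append("http://" + row[1].strip())
    return sites


if __name__ == "__main__":
    # With a real download you would pass an open file instead, e.g.
    # open("tranco.csv", newline="")  (hypothetical local copy of the list).
    sample = io.StringIO("1,example.com\n2,example.org\n3,example.net\n")
    print(load_top_sites(sample, limit=2))
```

Keeping `limit` small is worthwhile for a first run: the list contains real, highly ranked sites, so the NSFW caveat applies to whatever you feed into the crawl.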

More information on the instrumentation and configuration parameters is given below.

The docs provide a more in-depth tutorial, and a description of the methods of data collection available.

Troubleshooting

  1. WebDriverException: Message: The browser appears to have exited before we could connect...

    This error indicates that Firefox exited during startup (or was prevented from starting). There are many possible causes of this error:

    • If you are seeing this error for all browser spawn attempts check that:

      • Both Selenium and Firefox are the appropriate versions. Run the following commands and check that the versions they output match the required versions in install.sh and environment.yaml. If not, re-run the install script.

          cd firefox-bin/
          ./firefox --version

      and

          conda list selenium
      • If you are running in a headless environment (e.g. a remote server), ensure that all browsers have the headless browser parameter set to True before launching.
    • If you are seeing this error randomly during crawls, it can be caused by an overtaxed system, either memory or CPU usage. Try lowering the number of concurrent browsers.

  2. In older versions of Firefox (pre 74) the setting to enable extensions was called extensions.legacy.enabled. If you need to work with an earlier Firefox, update the setting name extensions.experiments.enabled in openwpm/deploy_browsers/configure_firefox.py.

  3. Make sure your conda environment is activated (conda activate openwpm). You can see your environments and the active one by running conda env list; the active environment will have a * by it.

  4. make / gcc may need to be installed in order to build the web extension. On Ubuntu, this is achieved with apt-get install make. On OSX the necessary packages are part of Xcode: xcode-select --install.

  5. On a very sparse operating system additional dependencies may need to be installed. See the Dockerfile for more inspiration, or open an issue if you are still having problems.

  6. If you see errors related to incompatible or non-existing Python packages, try re-running the command with the environment variable PYTHONNOUSERSITE set, e.g. PYTHONNOUSERSITE=True python demo.py. If that fixes your issues, you are experiencing issue 689, which can be fixed by clearing your Python user site-packages directory, by prepending PYTHONNOUSERSITE=True to a specific command, or by setting the environment variable for the session (e.g., export PYTHONNOUSERSITE=True in bash). Please also add a comment to that issue to let us know you ran into this problem.
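To check whether the user site-packages directory is in play for a given interpreter, the standard library can report it directly. This is a small diagnostic sketch using only `sys` and `site`; it is not part of OpenWPM, and the function name `user_site_status` is illustrative.

```python
import site
import sys


def user_site_status() -> dict:
    """Report whether Python adds the user site-packages directory to sys.path.

    sys.flags.no_user_site is 1 when PYTHONNOUSERSITE is set (or python -s is used).
    """
    user_site = site.getusersitepackages()  # path is returned even when disabled
    return {
        "no_user_site_flag": bool(sys.flags.no_user_site),
        "user_site_dir": user_site,
        "user_site_on_path": user_site in sys.path,
    }


if __name__ == "__main__":
    for key, value in user_site_status().items():
        print(f"{key}: {value}")
```

Run it once normally and once with PYTHONNOUSERSITE=True prepended (inside the activated openwpm environment): if `user_site_on_path` flips from True to False and the errors disappear, the packages in `user_site_dir` are the likely culprits.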

Documentation

Further information is available at OpenWPM's Documentation Page.

Advice for Measurement Researchers

OpenWPM is often used for web measurement research. We recommend the following for researchers using the tool:

Use a versioned release. We aim to follow Firefox's release cadence, which is roughly once every four weeks. If we happen to fall behind on checking in new releases, please file an issue. Versions more than a few months out of date will use unsupported versions of Firefox, which are likely to have known security vulnerabilities. Versions less than v0.10.0 are from a previous architecture and should not be used.

Include the OpenWPM version number in your publication. As of v0.10.0, OpenWPM pins all Python, npm, and system dependencies. Including this information alongside your work will allow other researchers to contextualize the results, and can be helpful if future versions of OpenWPM have instrumentation bugs that impact results.

Developer instructions

If you want to contribute to OpenWPM, have a look at our CONTRIBUTING.md.

Instrumentation and Configuration

OpenWPM provides a breadth of configuration options, which can be found in Configuration.md. More detail on the output is available below.

Storage

OpenWPM distinguishes between two types of data, structured and unstructured. Structured data is all data captured by the instrumentation or emitted by the platform. Generally speaking, all data you download is unstructured data.

For each of the data classes we offer a variety of storage providers, and you are encouraged to implement your own, should the provided backends not be enough for you.

We have an outstanding issue to enable saving content generated by commands, such as screenshots and page dumps, to unstructured storage (see #232).
For now, they get saved to manager_params.data_directory.

Local Storage

For storing structured data locally we offer two StorageProviders:

  • The SQLiteStorageProvider, which writes all data into a SQLite database
    • This is the recommended approach for getting started, as the data is easily explorable
  • The LocalArrowProvider, which stores the data in Parquet files
    • This method integrates well with NumPy/Pandas
    • It might be harder to process ad hoc
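As a rough illustration of why SQLite output is easy to explore ad hoc, the sketch below queries an in-memory database with plain SQL and the standard library. The `http_requests(visit_id, url)` table here is a deliberately simplified, hypothetical stand-in; the real crawl database has many more tables and columns.

```python
import sqlite3
from typing import Dict, List, Tuple
from urllib.parse import urlparse


def requests_per_host(conn: sqlite3.Connection, visit_id: int) -> List[Tuple[str, int]]:
    """Count HTTP requests per hostname for one visit.

    Assumes a simplified, hypothetical table http_requests(visit_id, url).
    """
    counts: Dict[str, int] = {}
    query = "SELECT url FROM http_requests WHERE visit_id = ?"
    for (url,) in conn.execute(query, (visit_id,)):
        host = urlparse(url).hostname or ""
        counts[host] = counts.get(host, 0) + 1
    # Most-requested hosts first; ties broken alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Stand-in table; not the real OpenWPM schema.
    conn.execute("CREATE TABLE http_requests (visit_id INTEGER, url TEXT)")
    conn.executemany(
        "INSERT INTO http_requests VALUES (?, ?)",
        [
            (1, "https://example.com/"),
            (1, "https://tracker.example.net/pixel.gif"),
            (1, "https://tracker.example.net/sync"),
            (2, "https://example.org/"),
        ],
    )
    print(requests_per_host(conn, visit_id=1))
    conn.close()
```

The same kind of question against Parquet output would typically go through a dataframe library instead, which is the trade-off the two bullets above describe.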

For storing unstructured data locally we also offer two solutions:

  • The LevelDBProvider, which stores all data in a LevelDB database
    • This is the recommended approach
  • The LocalGzipProvider, which gzips and stores the files individually on disk
    • Please note that file systems usually don't handle thousands of files in one folder well
    • Use with care or for single site visits
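To make the "one file per item" trade-off concrete, here is a hypothetical sketch of a gzip-per-file store keyed by the SHA-256 hash of the content. It is not the LocalGzipProvider implementation, and keying by content hash is an assumption made for the example; it only shows why every distinct saved resource becomes its own file in a single directory, which is what the caveat about thousands of files refers to.

```python
import gzip
import hashlib
import tempfile
from pathlib import Path


def save_gzipped(directory: Path, content: bytes) -> Path:
    """Gzip `content` into its own file named after its SHA-256 digest.

    Identical content maps to the same file, so duplicates are stored once.
    """
    directory.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256(content).hexdigest()
    path = directory / f"{digest}.gz"
    if not path.exists():
        path.write_bytes(gzip.compress(content))
    return path


def load_gzipped(path: Path) -> bytes:
    """Read back a file written by save_gzipped."""
    return gzip.decompress(path.read_bytes())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp)
        first = save_gzipped(store, b"console.log('hi');")
        save_gzipped(store, b"console.log('hi');")  # duplicate: no new file
        save_gzipped(store, b"console.log('bye');")
        print(load_gzipped(first))
        print(len(list(store.iterdir())), "files")  # one file per distinct item
```

A crawl over many sites would add one file per distinct resource to the same folder, whereas a single key-value database such as LevelDB keeps everything in a handful of files.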

Remote storage

When running in the cloud, saving records to local disk is not practical, so we offer remote StorageProviders for S3 (see #823) and GCP. Currently, all remote StorageProviders write to the respective object storage service (S3/GCS). The structured providers use the Parquet format.

NOTE: The Parquet and SQL schemas should be kept in sync except for output-specific columns (e.g., instance_id in the Parquet output). You can compare the two schemas by running diff -y openwpm/DataAggregator/schema.sql openwpm/DataAggregator/parquet_schema.py.
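`diff -y` shows every textual difference side by side. As a hypothetical complement that compares column names only, the sketch below pulls column names out of a `CREATE TABLE` statement (one column per line) and compares them with a set of Parquet field names, ignoring output-specific columns such as `instance_id`. The example table and field set are made up and not taken from OpenWPM's schema files, and the parser is deliberately naive.

```python
from typing import Iterable, Set, Tuple

# Table-level constraints that appear on their own line but are not columns.
CONSTRAINT_KEYWORDS = {"PRIMARY", "FOREIGN", "UNIQUE", "CHECK", "CONSTRAINT"}


def sql_columns(create_statement: str) -> Set[str]:
    """Extract column names from a CREATE TABLE statement with one column per line.

    Deliberately naive: it does not handle several columns on one line.
    """
    body = create_statement[create_statement.index("(") + 1 : create_statement.rindex(")")]
    columns: Set[str] = set()
    for line in body.splitlines():
        line = line.strip()
        if not line or line.startswith("--"):
            continue  # skip blank lines and SQL comments
        first = line.split()[0]
        if first.split("(")[0].upper() in CONSTRAINT_KEYWORDS:
            continue  # e.g. FOREIGN KEY(visit_id) REFERENCES ...
        columns.add(first.strip('"`,'))
    return columns


def schema_mismatch(
    sql_cols: Iterable[str], parquet_cols: Iterable[str], ignore: Iterable[str] = ("instance_id",)
) -> Tuple[Set[str], Set[str]]:
    """Return (columns only in SQL, columns only in Parquet), minus ignored ones."""
    sql_set, parquet_set, ignored = set(sql_cols), set(parquet_cols), set(ignore)
    return sql_set - parquet_set - ignored, parquet_set - sql_set - ignored


# Made-up example table; not taken from OpenWPM's real schema files.
CREATE_SQL = """
CREATE TABLE IF NOT EXISTS example_table (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    visit_id INTEGER NOT NULL,
    url TEXT NOT NULL,
    time_stamp DATETIME NOT NULL,
    FOREIGN KEY(visit_id) REFERENCES site_visits(visit_id)
);
"""
PARQUET_FIELDS = {"visit_id", "url", "time_stamp", "instance_id"}

if __name__ == "__main__":
    sql_only, parquet_only = schema_mismatch(sql_columns(CREATE_SQL), PARQUET_FIELDS)
    print("only in SQL:", sorted(sql_only))
    print("only in Parquet:", sorted(parquet_only))
```

In this toy example the auto-increment `id` column shows up as SQL-only, which is exactly the kind of output-specific difference you would then decide to ignore or fix.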

Docker Deployment for OpenWPM

OpenWPM can be run in a Docker container. This is similar to running OpenWPM in a virtual machine, only with less overhead.

Building the Docker Container

Step 1: install Docker on your system. Most Linux distributions have Docker in their repositories. It can also be installed from docker.com. For Ubuntu you can use: sudo apt-get install docker.io

You can test the installation with: sudo docker run hello-world

Note: in order to run Docker without root privileges, add your user to the docker group (sudo usermod -a -G docker $USER). You will have to log out and log back in for the change to take effect, and possibly also restart the Docker service.

Step 2: to build the image, run the following command from a terminal within the root OpenWPM directory:

    docker build -f Dockerfile -t openwpm .

After a few minutes, the container is ready to use.

Running Measurements from inside the Container

You can run the demo measurement from inside the container, as follows:

First of all, you need to give the container permissions on your local X server. You can do this by running: xhost +local:docker

Then you can run the demo script using:

    mkdir -p docker-volume && docker run -v $PWD/docker-volume:/opt/OpenWPM/datadir \
        -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix --shm-size=2g \
        -it --init openwpm

Note: the --shm-size=2g parameter is required, as it increases the amount of shared memory available to Firefox. Without this parameter you can expect Firefox to crash on 20-30% of sites.

This command uses bind-mounts to share scripts and output between the container and host, as explained below (note the paths in the command assume it's being run from the root OpenWPM directory):

  • run starts the openwpm container and executes the python /opt/OpenWPM/demo.py command.

  • -v binds a directory on the host ($PWD/docker-volume) to a directory in the container (/opt/OpenWPM/datadir). Binding allows the script's output to be saved on the host (./docker-volume), and also allows you to pass inputs to the Docker container (if necessary). We first create the docker-volume directory (if it doesn't exist), as Docker will otherwise create it with root permissions.

  • The -it option states the command is to be run interactively (use -d for detached mode).

  • The demo script runs instances of Firefox that are not headless. As such, this command requires a connection to the host display server. If you are running headless crawls, you can remove the following options: -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix.
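If you launch the container from a script rather than a shell, the options described above can be assembled as an argument list. This is a hypothetical helper (the name `docker_run_args` and its parameters are illustrative); it only reproduces the flags from the command above, drops the display-related ones for headless crawls, and swaps -it for -d in detached mode.

```python
import os
import shlex
from pathlib import Path
from typing import List


def docker_run_args(
    host_dir: Path, headless: bool, detach: bool = False, image: str = "openwpm"
) -> List[str]:
    """Assemble the `docker run` arguments described above (hypothetical helper)."""
    # -v: bind-mount the host output directory onto the container's datadir.
    args = ["docker", "run", "-v", f"{host_dir.resolve()}:/opt/OpenWPM/datadir"]
    if not headless:
        # Non-headless Firefox needs the host display; DISPLAY must be set on the host.
        args += [
            "-e", f"DISPLAY={os.environ['DISPLAY']}",
            "-v", "/tmp/.X11-unix:/tmp/.X11-unix",
        ]
    # More shared memory for Firefox, then interactive (-it) or detached (-d) mode.
    args += ["--shm-size=2g", "-d" if detach else "-it", "--init", image]
    return args


if __name__ == "__main__":
    # Create ./docker-volume first (mkdir -p docker-volume) so Docker does not create it as root.
    cmd = docker_run_args(Path("docker-volume"), headless=True)
    print(shlex.join(cmd))
    # subprocess.run(cmd, check=True) would launch the container.
```

Building a list instead of a shell string avoids quoting problems with paths that contain spaces; `shlex.join` is only used to print a copy-pasteable version of the command.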

Alternatively, it is possible to run jobs as the user openwpm in the container too, but this might cause problems with non-headless browsers. It is therefore only recommended for headless crawls.

MacOS GUI applications in Docker

Requirements: Install XQuartz by following these instructions.

Given properly installed prerequisites (including a reboot), the helper script run-on-osx-via-docker.sh in the project root folder can be used to facilitate working with Docker on Mac OSX.

To open a bash session within the environment:

./run-on-osx-via-docker.sh /bin/bash

Or, run commands directly:

./run-on-osx-via-docker.sh python demo.py
./run-on-osx-via-docker.sh python -m test.manual_test
./run-on-osx-via-docker.sh python -m pytest
./run-on-osx-via-docker.sh python -m pytest -vv -s

Citation

If you use OpenWPM in your research, please cite our CCS 2016 publication on the infrastructure. You can use the following BibTeX.

@inproceedings{englehardt2016census,
  author    = "Steven Englehardt and Arvind Narayanan",
  title     = "{Online tracking: A 1-million-site measurement and analysis}",
  booktitle = {Proceedings of ACM CCS 2016},
  year      = "2016",
}

OpenWPM has been used in over 75 studies.

License

OpenWPM is licensed under GNU GPLv3. Additional code has been included from FourthParty and Privacy Badger, both of which are licensed GPLv3+.

