Parsing/Visual Diff Testing

From mediawiki.org

The code for generating visual diffs is in the integration/visualdiff repo on gerrit. The two main directories are:

diffserver/ has code for running a visual-diff server for generating diffs on demand.
testreduce/ has code for running mass visual diff testing via the testreduce setup, and for configuring the testreduce server.

This uses the testreduce code, which is in the mediawiki/services/parsoid/testreduce repo on gerrit. The testreduce repository includes the SQL code for setting up a new database and scripts to extract title lists.

On github, these two repositories are mirrored at:


visualdiff: GitHub project page, git repository URL, commit history
testreduce: GitHub project page, git repository URL, commit history

Overview

We have visual diff code set up on ctt-qa-03.wikitextexp.eqiad1.wikimedia.cloud. ctt-qa-03 is a labs server and you can run visual diff tests only against public APIs (whether Parsoid, MediaWiki, or something else altogether).

Currently, there is one visual-diff instance on this VM.

http://parsoid-vs-core.wmflabs.org is used with the parsoid_vs_core and other databases, and parsoid-vs-core-vd and parsoid-vs-core-vd-client testreduce services. This instance is set up to compare Parsoid rendering and core parser rendering for production wiki pages.

Other visual diff instances can be set up as long as the right visualdiff, testreduce, proxy domain and nginx configs are updated.

Testreduce code

The testreduce code is in /srv/testreduce, which is used to run the parsoid-vs-core-vd and parsoid-vs-core-vd-client services. The systemd controller files for these services are in /lib/systemd/system/parsoid-vs-core-vd.service and /lib/systemd/system/parsoid-vs-core-vd-client.service. These files were derived from the puppetized code for similar services on scandum used for Parsoid's roundtrip testing.

The testreduce server config is in /etc/testreduce/parsoid-vs-core-vd.settings.js. The testreduce client config is in /etc/testreduce/parsoid-vs-core-vd-client.config.js, which also includes a section that provides the config for the visual diff tests that are to be run.

Visualdiff code

The visualdiff code is in /srv/visualdiff, which also provides config and hooks to use it with testreduce. The file /etc/testreduce/parsoid-vs-core-vd-client.config.js also provides the visualdiff config. It specifies how to fetch the HTML for the two screenshots, specifies uprightdiff as the diffing engine to use, and sets a few other parameters that control these. The comments should be fairly self-explanatory. The uprightdiff code is in /srv/uprightdiff.

There is a separate helper service for viewing results for a single title without having to go digging for them in the directory containing them. On ctt-qa-03, the code in /srv/visualdiff/diffserver/diffserver.js is run as the visualdiff-item service. The config for this is in /etc/visualdiff/parsoid-vs-core-diffserver.config.js. The systemd controller file is in /lib/systemd/system/parsoid-vs-core-diffserver.service.

Managing services: parsoid-vs-core-vd, parsoid-vs-core-vd-client, parsoid-vs-core-diffserver

To {stop,restart,start} all clients:

    sudo service parsoid-vs-core-vd-client stop
    sudo service parsoid-vs-core-vd-client restart
    sudo service parsoid-vs-core-vd-client start

Client logs are in systemd journals and can be accessed as:

    ### Logs for the parsoid-vs-core-vd-client service
    # equivalent to tail -f <log-file>
    sudo journalctl -f -u parsoid-vs-core-vd-client
    # equivalent to tail -n 1000
    sudo journalctl -n 1000 -u parsoid-vs-core-vd-client

    ### Logs of the parsoid-vs-core-vd testreduce server
    sudo journalctl -f -u parsoid-vs-core-vd

    ### Logs of the parsoid-vs-core-diffserver service
    sudo journalctl -u parsoid-vs-core-diffserver

The public-facing web UIs for these services are managed by an nginx config in /etc/nginx/sites-available/parsoid-vs-core-vd. It provides access to the web UI for the parsoid-vs-core-vd and parsoid-vs-core-diffserver services and also enables directory listing for the screenshots generated during the test runs. The config should be self-explanatory.

Updating the code to test (and being run by the clients)

Unlike Parsoid, where the code to test is determined by the latest git commit, in the parsoid-vs-core setup the code to run lives on a separate VM. Sometimes the change is in the config files and may not be available in a git repository (at least as of today). The testreduce codebase implicitly assumes that the test to run is a git commit. However, the testreduce client config file (/etc/testreduce/parsoid-vs-core-vd-client.config.js) can declare a getGitCommit function that is then used by the server and clients to identify the test run in the database. In our case, this function simply returns a unique string identifying the test run. So, to initiate a new test run, change the string returned by this function, save the file, and restart the parsoid-vs-core-vd-client service.

Anyway, here are the steps:

Log in to ctt-qa-03.wikitextexp.eqiad1.wikimedia.cloud.
Edit /etc/testreduce/parsoid-vs-core-vd-client.config.js and update the string in the getGitCommit function at the bottom.
Restart the parsoid-vs-core-vd-client service.
Restarting the parsoid-vs-core-vd service shouldn't be necessary, but occasionally that service might crash and need restarting.
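
As a rough illustration of the idea, the function only needs to return a string that differs from the one used for the previous run. The sketch below is not the actual config file: the makeTestRunId helper and the label format are hypothetical, and the exact signature testreduce expects for this hook is an assumption.

```javascript
'use strict';

// Hypothetical helper: build a label that is unique per test run.
// The "<name>-<YYYY-MM-DD>" format is only an illustration.
function makeTestRunId(name, date) {
	return `${name}-${date.toISOString().slice(0, 10)}`;
}

// Stand-in for the hook described above: it returns an identifying
// string instead of a real git hash. Change the arguments (or simply
// hard-code a new string) to start a new test run.
function getGitCommit() {
	return makeTestRunId('parsoid-vs-core', new Date(Date.UTC(2024, 7, 1)));
}

console.log(getGitCommit()); // parsoid-vs-core-2024-08-01
```

Any scheme works as long as the string has not been used for an earlier run in the same database.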

Updating the testreduce, visualdiff, uprightdiff code

Of course, there will continue to be bug fixes and tweaks to these codebases. To update the relevant code, go to /srv/testreduce, /srv/visualdiff, or /srv/uprightdiff, do a git pull, and restart the affected services.

Generating new title lists (method 1)

There is a script server/scripts/gen_titles.js in the testreduce repo to generate title lists. Read the README file in that repo for hints on its use. Briefly: first create/edit testdb.info.js in that repository to include the target wikis, using their database name in the prefix column. It will probably be much smaller than the file which is checked in, which is for complete round trip testing. For example:

    #!/usr/bin/env node
    'use strict';

    module.exports = {
        // How many titles do you want?
        size: 10000,
        // How many of those do you want from traffic popularity
        popular_pages_percentage: 50,
        // How many of those do you want from the dumps?
        // Rest will come from recent changes stream
        dump_percentage: 25,
        wikis: [
            // wikivoyage
            { prefix: 'cswikivoyage', limit: 1 },
            { prefix: 'hiwikivoyage', limit: 1 },
            { prefix: 'shnwikivoyage', limit: 1 },
            { prefix: 'pswikivoyage', limit: 1 },
            { prefix: 'trwikivoyage', limit: 1 },
        ],
    };
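
To make the percentages in that config concrete, here is a small sketch of how the title budget divides among the three sources named in the comments. The titleBudget function is hypothetical, the rounding is an assumption, and how gen_titles.js actually applies these numbers (for example per wiki or overall) is not shown here.

```javascript
'use strict';

// Split a title budget into the three sources named in the config comments.
// Assumption: percentages are rounded down and whatever is left over
// is taken from the recent changes stream.
function titleBudget({ size, popular_pages_percentage, dump_percentage }) {
	const popular = Math.floor(size * popular_pages_percentage / 100);
	const dumps = Math.floor(size * dump_percentage / 100);
	return { popular, dumps, recentChanges: size - popular - dumps };
}

console.log(titleBudget({
	size: 10000,
	popular_pages_percentage: 50,
	dump_percentage: 25,
}));
// { popular: 5000, dumps: 2500, recentChanges: 2500 }
```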

Then run the scripts in the order described in the README, after first running npm install at the top level of the testreduce repo:

    npm install
    cd server/scripts
    node fetch_rc.js
    node fetch_top_ranked.js
    node gen_titles.js

You will now have a bunch of *.sql files, one for each target wiki. Transfer these as well as the create_everything.mysql script to ctt-qa-03.wikitextexp.eqiad1.wikimedia.cloud with something like:

    cat ../sql/create_everything.mysql *.sql > 20240801.sql
    scp 20240801.sql ctt-qa-03:

Creating a new database with your generated titles

We'll assume you want to make a new database for these new test results. You could optionally delete the old database and use these instructions to create it with the same name. Start by logging in to ctt-qa-03 using ssh. Then edit /etc/testreduce/parsoid-vs-core-vd.settings.js:

    // Database to use.
    database: "parsoid_rv_deploy_targets",
    // User for MySQL login.
    user: "testreduce",
    // Password.
    password: "$PASSWORD"

Change the database name (in our example, to parsoid_rv2_deploy_targets) and remember it. Also look at what the password is set to and remember it. Now start mysql:

    mysql -u testreduce -p$PASSWORD

and run the following commands (substituting your own new database name for parsoid_rv2_deploy_targets):

    create database parsoid_rv2_deploy_targets;
    use parsoid_rv2_deploy_targets;
    source 20240801.sql;
    quit

Edit /etc/testreduce/parsoid-vs-core-vd-client.config.js and update the string in the gitCommitFetch function at the bottom to match the latest running version of mediawiki from versions.toolforge.org/. Restart all the services:

    sudo service parsoid-vs-core-vd restart
    sudo service parsoid-vs-core-vd-client restart

And check that everything is running:

    sudo journalctl -f -u parsoid-vs-core-vd

Generating new title lists (method 2)

There is a script tools/gen_visualdiff_titles.js in the Parsoid repo, which has some hints for its use in a comment at the top of the file. It starts by using Quarry to get a list of titles from your target wiki(s). Go to quarry.wmcloud.org and log in with your meta.wikimedia.org account. Click the "New Query" button. The first thing you will need to do is enter the dbname corresponding to the wiki you are targeting. Check this list to map from project domain to dbname; it can sometimes be unintuitive.

For testing parsoid read views in the main article space, use the following query:

    select page_title, page_namespace from page where page_is_redirect=0 and page_namespace=0;

For testing discussion tools (i.e., selecting pages in the Talk namespaces), use:

    select page_title, page_namespace from page where page_is_redirect=0 and mod(page_namespace,2)=1;

Use the "Download Data" button and save this as allpages.json.

This will be used to generate a random sample of all pages on a wiki. It is recommended to supplement this with a "most frequently viewed" list, to ensure that the most popular pages are included in the visualdiff. Go to https://pageviews.wmcloud.org/topviews/ and enter your target project (in domain form this time, not dbname). If you are looking at main article space pages, leave the "Show only mainspace pages" box checked. If you want discussion tools pages, uncheck the "Show only mainspace pages" box and type "Talk:" in the Search box.

Use the "Download" button and save this as topviews.json.

Now use the gen_visualdiff_titles.js tool as follows:

    node tools/gen_visualdiff_titles.js $DBNAME allpages.json 1000 > titles.sql
    node tools/gen_visualdiff_titles.js $DBNAME topviews.json 1000 >> titles.sql
    sort -u titles.sql > $DBNAME-titles.sql
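
The overall effect of these three commands is a deduplicated union of a random sample of all pages and a list of popular pages. The sketch below illustrates that idea on plain arrays of titles. It is not the code of gen_visualdiff_titles.js: the sample and mergeTitleLists functions are hypothetical, and taking the first `count` entries of the top-views list is an assumption.

```javascript
'use strict';

// Pick up to `count` distinct random entries using a partial
// Fisher-Yates shuffle on a copy of the input.
function sample(titles, count, random = Math.random) {
	const pool = titles.slice();
	const n = Math.min(count, pool.length);
	for (let i = 0; i < n; i++) {
		const j = i + Math.floor(random() * (pool.length - i));
		[pool[i], pool[j]] = [pool[j], pool[i]];
	}
	return pool.slice(0, n);
}

// Union of a random sample and the most-viewed titles, with duplicates
// removed and the result sorted (similar in spirit to `sort -u`).
function mergeTitleLists(allPages, topViews, count) {
	const merged = new Set([
		...sample(allPages, count),
		...topViews.slice(0, count),
	]);
	return [...merged].sort();
}

console.log(mergeTitleLists(['A', 'B', 'C'], ['B', 'D'], 3));
// [ 'A', 'B', 'C', 'D' ]
```

A title that appears in both inputs ends up in the output only once, which is why the final title count can be lower than the sum of the two requested sizes.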

You will now have a bunch of *-titles.sql files, one for each target wiki. Transfer these to ctt-qa-03.wikitextexp.eqiad1.wikimedia.cloud.

Now follow the instructions under "method 1" above to create a new database and load these .sql files into it.

Retesting a subset of titles

There are a few useful scripts in the tools/ directory in the visualdiff repo (checked out in /srv/visualdiff on ctt-qa-03).

tools/purge_404s.sh : Useful to run once in a while (or even after every run) to delete titles that have been deleted from the wiki to eliminate failure noise.
tools/retry_significant_failures.sh : Retry all test titles in the latest run that had scores >= 1000 indicating a diff that is not all due to vertical whitespace shifts.
tools/retry_significant_regressions.sh : Retry all test titles where the old run had a non-significant-diff result and latest run reported a significant-diff result.
tools/retry_all_regressions.sh : Retry all test titles where the latest run had a higher diff score than the previous run.
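
The selection rules behind the three retry scripts can be summarized in a few lines. The sketch below is not taken from the scripts themselves: the selectRetries function and the shape of the input records (title, previousScore, latestScore) are hypothetical and only restate the criteria described above.

```javascript
'use strict';

// Scores of 1000 or more indicate a diff that is not entirely due to
// vertical whitespace shifts.
const SIGNIFICANT_THRESHOLD = 1000;

function isSignificant(score) {
	return score >= SIGNIFICANT_THRESHOLD;
}

// results: [{ title, previousScore, latestScore }] (hypothetical shape)
function selectRetries(results) {
	const titles = (rows) => rows.map((r) => r.title);
	return {
		// latest run had a significant diff
		significantFailures: titles(results.filter(
			(r) => isSignificant(r.latestScore))),
		// insignificant before, significant now
		significantRegressions: titles(results.filter(
			(r) => !isSignificant(r.previousScore) && isSignificant(r.latestScore))),
		// any increase in score
		allRegressions: titles(results.filter(
			(r) => r.latestScore > r.previousScore)),
	};
}

console.log(selectRetries([
	{ title: 'A', previousScore: 10, latestScore: 2500 },
	{ title: 'B', previousScore: 3000, latestScore: 1500 },
	{ title: 'C', previousScore: 5, latestScore: 7 },
]));
```

In this example A appears in all three sets, B only in significantFailures, and C only in allRegressions.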

Some other useful scripts

The visualdiff repo has a couple of useful scripts in the tools/ directory.

tools/stats.sh: This script can be used to generate a wikitable of stats (you can tweak it to generate stats for all wikis in the db or a subset of wikis) ordered by diff runs in reverse chronological order. This script was used to generate the tables here and here.
tools/diffs.sh: This script can be used to generate a wikitable (or CSV dump) of diffs (with score > 1000 unless you tweak it for other thresholds) per wiki, which is useful to prioritize work as well as to distribute the work of analyzing diffs by wiki. This script was used to generate the table here.

Some useful sql commands

Here is a command to list diff titles to further inspect (edit suitably):

    select latest_score, concat("http://parsoid-vs-core.wmflabs.org/diff/", prefix, "/", title)
    from pages
    where prefix='eowikivoyage' and latest_score > 1000
    order by latest_score desc;

Resource usage and # of test clients

parsng-qa-01 is a large labs VM with 12 CPU cores, 32 GB memory, and a 400+ GB disk. Even so, visual diff testing can use up all these resources. 20 testreduce clients seems to be about the upper end of how many can be run at the same time. This is enough to sometimes bring CPU load to 13-15 and memory usage to 28+ GB. 16 clients is probably a more comfortable number. The # of test clients to run can be tweaked by editing /lib/systemd/system/parsoid-vs-core-vd-client.service.

The screenshots from puppeteer and from uprightdiff are written to /data/visualdiffs/pngs organized by wiki prefix. These images are overwritten with each test run, because storing them per test run takes too much disk space: 125GB is used per test run. In the future, we could consider storing results from the most recent 2-3 runs or get a larger disk and expand that range a bit more.

Web UI for browsing results

The screenshots from puppeteer and from uprightdiff are written to /srv/visualdiff/pngs organized by wiki prefix and are accessible via HTTP at https://parsoid-vs-core.wmflabs.org/images/.

However, a better way of browsing these results is via the parsoid-vs-core-vd web UI at https://parsoid-vs-core.wmflabs.org. The /topfails link sorts results in descending order of score, which makes it easy to look at pages that generate the most prominent diffs first. The @remote link on the results listing page is an easy way to look at the 2 HTML screenshots and the uprightdiff screenshot. That output is outsourced to the visualdiff-item service. It simply links to the existing screenshots (or, if they are missing, generates them on demand).

Uprightdiff numeric scoring

Uprightdiff compares the two candidate images and returns 3 metrics:

* modifiedArea : This is a simple count of the number of pixels for which the source does not match the destination (after they have both been expanded to the same size).
* movedArea    : The number of pixels for which nonzero motion was detected.
* residualArea : The number of pixels which differed between the resulting image and the second input image.

In other words,

if modifiedArea == 0, then the images had pixel-perfect match. In this scenario, movedArea and residualArea will also be zero.
if modifiedArea > 0, then the images obviously differed. If residualArea == 0, then it tells us that all the differences could be accounted for by vertical motion and the rendering differences are mostly insignificant. In this scenario, movedArea tells us how many pixels were affected.
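
The two cases above, plus the remaining case where residualArea is nonzero, can be expressed as a small decision function. This is only a sketch: the classifyDiff name and the three labels are hypothetical and do not come from the visualdiff code.

```javascript
'use strict';

// Classify an uprightdiff result using the rules described above.
// The label strings are illustrative, not names used by visualdiff.
function classifyDiff({ modifiedArea, movedArea, residualArea }) {
	if (modifiedArea === 0) {
		// Pixel-perfect match; movedArea and residualArea are also zero.
		return 'pixel-perfect';
	}
	if (residualArea === 0) {
		// Every difference is explained by vertical motion;
		// movedArea says how many pixels were affected.
		return 'insignificant';
	}
	// Some differences remain after accounting for motion.
	return 'significant';
}

console.log(classifyDiff({ modifiedArea: 0, movedArea: 0, residualArea: 0 }));
// pixel-perfect
console.log(classifyDiff({ modifiedArea: 500, movedArea: 500, residualArea: 0 }));
// insignificant
console.log(classifyDiff({ modifiedArea: 500, movedArea: 200, residualArea: 40 }));
// significant
```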

The goal of generating a numerical score is to be able to (a) compare test results for different pages and identify the most significant ones, and (b) compare test results for the same page across test runs and determine whether our fixes improved or worsened the situation. With these goals in mind, the visual diffing code takes the totalArea of the image and uses the above 3 metrics to generate 2 different numbers.

SignificantDiffMetric (when residualArea > 0): 75 * residualArea / totalArea + 0.25 * min(max(2^(residualArea / 100000) - 1, 0), 100)
InsignificantDiffMetric (when residualArea == 0): 50 * modifiedArea / totalArea + 50 * movedArea / totalArea
ErrorMetric: 1 if the test had a fatal error, 0 otherwise.

The total score is then computed as 1,000,000 * ErrorMetric + 1,000 * SignificantDiffMetric + InsignificantDiffMetric (In other words, this can be seen as a number in base-1000 notation).
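
The following sketch transcribes the three formulas and the total score literally, to make the arithmetic easier to follow. It is not the visualdiff code itself: the function names are hypothetical, and it assumes that each metric is 0 whenever its "when" condition does not hold and that no rounding is applied.

```javascript
'use strict';

// SignificantDiffMetric, defined when residualArea > 0 (assumed 0 otherwise).
function significantDiffMetric(residualArea, totalArea) {
	if (residualArea <= 0) {
		return 0;
	}
	const ratioTerm = 75 * residualArea / totalArea;
	const sizeTerm = 0.25 * Math.min(
		Math.max(Math.pow(2, residualArea / 100000) - 1, 0), 100);
	return ratioTerm + sizeTerm;
}

// InsignificantDiffMetric, defined when residualArea == 0 (assumed 0 otherwise).
function insignificantDiffMetric(modifiedArea, movedArea, residualArea, totalArea) {
	if (residualArea !== 0) {
		return 0;
	}
	return 50 * modifiedArea / totalArea + 50 * movedArea / totalArea;
}

// Total score: each metric occupies its own "base-1000 digit".
function totalScore(metrics, hadFatalError = false) {
	const { modifiedArea, movedArea, residualArea, totalArea } = metrics;
	const errorMetric = hadFatalError ? 1 : 0;
	return 1000000 * errorMetric +
		1000 * significantDiffMetric(residualArea, totalArea) +
		insignificantDiffMetric(modifiedArea, movedArea, residualArea, totalArea);
}

// Only vertical shifts: 50 * 0.01 + 50 * 0.01 = 1
console.log(totalScore({ modifiedArea: 1000, movedArea: 1000, residualArea: 0, totalArea: 100000 }));
// Residual diff: 1000 * (75 * 0.1 + 0.25 * (2^1 - 1)) = 7750
console.log(totalScore({ modifiedArea: 200000, movedArea: 0, residualArea: 100000, totalArea: 1000000 }));
```

Both metrics are bounded by 100, so the three components never overlap in the combined score, and any score of 1000 or more means residualArea was nonzero.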

This scoring technique gives us what we want. In addition, the significant diff metric tries to flag pages that are really large (big totalArea value) and that have a sizeable pixel diff (big residualArea) which is nevertheless fairly small relative to the size of the page (small residualArea / totalArea ratio). A simple residualArea / totalArea ratio would favor small pages with mostly insignificant residualArea values over large pages with mostly significant residualArea values. So, we pick a 1M area as our baseline and figure out how big the residual area is relative to that and use exponentiation to weight those heavily.

We believe that this numeric metric lets us quickly identify problematic rendering differences and use mass visual diff testing without having to manually sift through thousands of diff images to identify where to focus our efforts.

Updating the VMs

Just to be clear, the above talks about labs VMs which, in the following discussion, are the hosts to the VMs that mediawiki-vagrant spins up. This section is about keeping mediawiki-vagrant and the VMs it spins up up-to-date.

In the future, it might be easiest to just create new labs VMs and start from scratch. https://phabricator.wikimedia.org/T204566#4797907 has some notes from when we updated the VMs this way in 2018. The following notes might nevertheless be a useful guide in case problems arise while upgrading.

Troubleshooting notes from May 2020 while upgrading vagrant and mediawiki checkout

Keeping mediawiki-vagrant up-to-date is supposed to be as simple as git pull && vagrant provision. In practice, that wasn't so. This is most likely because of NFS issues that were left unresolved when setting it up. At the time, vagrant reload was abused until no errors were reported when starting up. To consistently start up without error, the suggestion from T139859 was used to set vagrant config nfs_shares no.

Unfortunately, after booting, the permissions in /vagrant are in a problematic state. To work around it, on the hosts, do sudo chown -R mwvagrant:wikidev /srv/mediawiki-vagrant and, in the VM, do sudo chown -R vagrant:www-data /vagrant. That at least allows for basic vagrant commands to work.

Generally, to update mediawiki in the VMs, vagrant ssh in and then fix the permissions. Then, instead of using vagrant git-update on the hosts, just invoke run-git-update from inside the VM.

There were a few other one-off problems when provisioning the VMs that required apt-get install php-redis php-igbinary php-luasandbox and fixing links to the available modules when going to php 7.2. These are unlikely to need repeating. They were from T213016 and T213993.

All that said and done, the major hurdle was that we were using an import from before the actor migration began. I imagine that because users weren't imported, when the schema migration scripts in maintenance/update.php ran, we ended up in a broken state. In order to fix the revision_actor_temp table, I just assigned all the actions to the Admin user. Based on T249185#6028521, in the VMs, create a file t.sql,

    insert into revision_actor_temp (revactor_rev, revactor_actor, revactor_timestamp, revactor_page)
    select rev_id, 1, rev_timestamp, rev_page
    from revision r
    where not exists (select 1 from revision_actor_temp a where a.revactor_rev = r.rev_id);

and then run,

    #!/usr/bin/env bash
    for db in $(alldbs); do
        echo $db
        mysql $db < t.sql
    done

Retrieved from "https://www.mediawiki.org/w/index.php?title=Parsing/Visual_Diff_Testing&oldid=7045118"
