Parsoid
A bidirectional runtime wikitext parser. Converts back and forth between wikitext and HTML/XML DOM with RDFa.

For the older version of Parsoid written in JavaScript (Node.js), see Parsoid/JS.
Parsoid is a PHP library bundled with MediaWiki (since version 1.35) that is used for converting back and forth between wikitext and HTML. It has been under development since 2012, originally written in JavaScript and built to support the VisualEditor. Eventually, the goal is to fully replace MediaWiki's current native parser with Parsoid.
The legacy parser is still supported in MediaWiki 1.43 (LTS), but it likely will not be supported in the next LTS.
Parsoid is an application which can translate back and forth between MediaWiki's wikitext syntax and an equivalent HTML/RDFa document model with enhanced support for automated processing and rich editing.
It has been under development by a team at the Wikimedia Foundation since 2012. It is currently used extensively by VisualEditor, Content translation and other applications.
Parsoid is intended to provide flawless back-and-forth conversion, i.e. to avoid information loss and also prevent "dirty diffs".
On Wikimedia wikis, for several applications, Parsoid is currently proxied behind RESTBase, which stores the HTML translated by Parsoid. It is expected that RESTBase will eventually be replaced with a cache more tightly integrated with MediaWiki.
For more on the overall project, see this blog post from March 2013. To read about the HTML model being used, see MediaWiki DOM spec.
Parsoid was originally structured as a web service and written in JavaScript, making use of Node.js. A tech talk from February 2019 (slides) and a blog post describe the process of porting it to PHP. The Parsoid extension API is currently under active development; a tech talk from August 2020 describes this work.
GitHub Repository: https://github.com/wikimedia/parsoid
Parsoid has been included in MediaWiki since version 1.35. No configuration is necessary to enable it.
Parsoid exports an internal REST API which was historically used by RESTBase and not accessible outside the WMF internal cluster. This is no longer required for VisualEditor or core read views, and the internal API is being deprecated and is planned for removal in MW 1.43.
Parsoid is nominally a composer library used by MediaWiki core. If you still require the internal API for some reason, you can explicitly load Parsoid "as an extension" by adding the following to LocalSettings.php:
wfLoadExtension( 'Parsoid', "$IP/vendor/wikimedia/parsoid/extension.json" );
Any remaining third-party users of the internal Parsoid API are strongly encouraged to migrate to the core REST HTML page endpoint which provides equivalent functionality.
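As a rough illustration of that migration, the core REST API serves a page's HTML at v1/page/{title}/html under rest.php. The sketch below only assembles such a URL. The helper name core_html_url and the base URL http://localhost:8080/w are assumptions made for illustration and are not part of MediaWiki or Parsoid.

```shell
# Hypothetical helper: print the core REST URL for a page's HTML.
#   $1 = wiki base URL including the script path (assumed example: http://localhost:8080/w)
#   $2 = page title, already URL-encoded (example: Main%20Page)
core_html_url() {
  printf '%s/rest.php/v1/page/%s/html\n' "$1" "$2"
}

# Example request against a running wiki (commented out because it needs a live server):
# curl -s "$(core_html_url http://localhost:8080/w Main%20Page)"
```

Keeping the URL construction separate from the request makes it easy to swap in your own server and script path before fetching anything.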
Development happens in the Parsoid Git repository. Code review happens in Gerrit. See Gerrit/Getting started to set up an account for yourself.
If you use the MediaWiki-Vagrant development environment using a virtual machine, you can simply add the role visualeditor to it and it will set up a working Parsoid along with Extension:VisualEditor.
The instructions below are for MediaWiki 1.35 or later. Check Parsoid/JS if you are running the old version of Parsoid written in JavaScript, and used for MW 1.34 and earlier.
In a standard MediaWiki installation, Parsoid is included from MediaWiki as a composer library, wikimedia/parsoid.
For development purposes you usually want to use a git checkout of Parsoid, and not the version bundled in MediaWiki core as a composer library. The following lines added to LocalSettings.php allow use of a git checkout of Parsoid (optionally), load the Parsoid REST API with wfLoadExtension (rather than using the version bundled in VisualEditor) and manually do the Parsoid configuration which is usually done by VisualEditor:
$parsoidInstallDir = 'vendor/wikimedia/parsoid'; # bundled copy
#$parsoidInstallDir = '/my/path/to/git/checkout/of/Parsoid';

// For developers: ensure Parsoid is executed from $parsoidInstallDir,
// (not the version included in mediawiki-core by default)
// Must occur *before* wfLoadExtension()
if ( $parsoidInstallDir !== 'vendor/wikimedia/parsoid' ) {
    function wfInterceptParsoidLoading( $className ) {
        // Only intercept Parsoid namespace classes
        if ( preg_match( '/(MW|Wikimedia\\\\)Parsoid\\\\/', $className ) ) {
            $fileName = Autoloader::find( $className );
            if ( $fileName !== null ) {
                require $fileName;
            }
        }
    }
    spl_autoload_register( 'wfInterceptParsoidLoading', true, true );
    // AutoLoader::registerNamespaces was added in MW 1.39
    AutoLoader::registerNamespaces( [
        // Keep this in sync with the "autoload" clause in
        // $parsoidInstallDir/composer.json
        'Wikimedia\\Parsoid\\' => "$parsoidInstallDir/src/",
    ] );
}

wfLoadExtension( 'Parsoid', "$parsoidInstallDir/extension.json" );

# Manually configure Parsoid
$wgVisualEditorParsoidAutoConfig = false;
$wgParsoidSettings = [
    'useSelser' => true,
    'rtTestMode' => false,
    'linting' => false,
];
$wgVirtualRestConfig['modules']['parsoid'] = [
    // URL to the Parsoid instance.
    // If Parsoid is not running locally, you should change $wgServer to match the non-local host
    // While using Docker in macOS, you may need to replace $wgServer with http://host.docker.internal:8080
    // While using Docker in linux, you may need to replace $wgServer with http://172.17.0.1:8080
    'url' => $wgServer . $wgScriptPath . '/rest.php',
    // Parsoid "domain", see below (optional, rarely needed)
    // 'domain' => 'localhost',
];
unset( $parsoidInstallDir );
These lines are not necessary for most users of VisualEditor, who can use VisualEditor's auto-configuration and the bundled Parsoid code included in MediaWiki, but they will be required for most developers.
If you're serving MediaWiki with Nginx, you'll also need to add something like this in your server block (assuming your MediaWiki setup has its files residing in /w/):
location /w/rest.php/ {
    try_files $uri $uri/ /w/rest.php?$query_string;
}
If you are running MediaWiki using Docker and linking your local Parsoid repository to MediaWiki, you need to map an additional volume into the Docker container. The simplest way is to create a docker-compose.override.yml file in the MediaWiki project and put the code below inside it (adjusting the path). If you already have a docker-compose.override.yml file, modify it accordingly.
services:
  mediawiki:
    volumes:
      - ./:/var/www/html/w:cached
      - /my/path/to/git/checkout/of/Parsoid:/my/path/to/git/checkout/of/Parsoid
To test proper configuration, visit {$wgScriptPath}/rest.php/{$domain}/v3/page/html/Main%20Page where $domain is the hostname in your $wgCanonicalServer. (Note that production WMF servers do not expose the Parsoid REST API to the external network.)
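The check above can be sketched as a small shell helper that assembles the test URL from its parts. The function name parsoid_check_url and the example values (http://localhost:8080, /w, localhost) are hypothetical; substitute your own server, $wgScriptPath and domain.

```shell
# Hypothetical helper: print the Parsoid REST URL used to verify the configuration.
#   $1 = server URL plus $wgScriptPath (assumed example: http://localhost:8080/w)
#   $2 = Parsoid "domain", i.e. the hostname in $wgCanonicalServer (assumed example: localhost)
#   $3 = page title, already URL-encoded (example: Main%20Page)
parsoid_check_url() {
  printf '%s/rest.php/%s/v3/page/html/%s\n' "$1" "$2" "$3"
}

# Example request against a local development wiki (commented out because it needs a live server):
# curl -s "$(parsoid_check_url http://localhost:8080/w localhost Main%20Page)"
```

If the configuration is working, requesting the printed URL should return the Parsoid HTML for the page rather than an error.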
To run all parser tests and mocha tests:
$ composer test
The parser tests have quite a few options now, which can be listed using php bin/parserTests.php --help.
If you have the environment variable MW_INSTALL_DIR pointing to a configured MediaWiki installation, you can run some additional tests with:
$ composer phan-integrated
You can convert simple wikitext snippets from the command line using the parse.php script in the bin/ directory:
$ echo '[[Foo]]' | php bin/parse.php
The parse script has a lot of options. php bin/parse.php --help gives you information about this.
See Parsoid/Debugging for debugging tips.
As of October 2021
Parsoid is always available as a library since it is a composer dependency of MediaWiki core. But two pieces are not enabled:
- the Parsoid ServiceWiring
- the Parsoid REST API
The test runner Quibble would enable them if it detects that mediawiki/services/parsoid.git has been cloned as part of the build, in which case it:
- points the autoloader for Wikimedia\Parsoid to the cloned code (effectively replacing the version installed by composer)
- loads the extension with wfLoadExtension( 'Parsoid', '/path/to/cloned/repo' );
The ServiceWiring should be enabled in MediaWiki starting with 1.38.
The REST API would theoretically never get merged in MediaWiki: a) it has never been exposed to the public in production, it is an internal API used by RESTBase which is going away; b) it has never been security audited; and c) it is redundant with the enterprise MediaWiki API. The solution will be for VisualEditor to invoke Parsoid directly via the VisualEditor Action API, which would save a round trip through the REST API.
Loading the extension is thus a hack which enables using interfaces subject to change and which we don't really want people to use yet.
For most purposes, Parsoid should thus not be added as a CI dependency. The only exception as of October 2021 is the Disambiguator MediaWiki extension.
Loading Parsoid as an extension lets us run MediaWiki integration test jobs against mediawiki/services/parsoid.git (such as Quibble, apitesting) and ensure Parsoid and MediaWiki work together.
An extension may be able to write tests with Parsoid even when the repository has not been cloned. Since it is a composer dependency of MediaWiki core, the Wikimedia\Parsoid namespace is available, but the service wiring part is not (it is in extension/src in the Parsoid repository and exposed as the \MWParsoid namespace). The ParsoidTestFileSuite.php code would only run the parser tests if Parsoid has been loaded (which should be the default with MediaWiki 1.38).
For CI, Parsoid is tested against the tip of MediaWiki, whereas MediaWiki is tested with the composer dependency. In case of a breaking change, the Parsoid change gets merged first (which breaks its CI but not MediaWiki's) and MediaWiki gets adjusted when Parsoid is updated. It is thus a one-way change.
For MediaWiki release builds, we have an integration of Parsoid ServiceWiring into VisualEditor in order to have VisualEditor work without further configuration (besides a wfLoadExtension( 'VisualEditor' )). The release build also enables the REST API and hooks everything up so that Parsoid works out of the box. This is done by copying a bit of Parsoid code into VisualEditor, which is not in the master branch of VisualEditor since that would be obsolete as soon as Parsoid is updated. Instead the code is maintained in two places.
The original application was written in JavaScript (using Node.js) and started running on the Wikimedia cluster in December 2012. In 2019, Parsoid was ported to PHP, and the PHP version replaced the JS version on the Wikimedia cluster in December 2019. Parsoid is being integrated into core MediaWiki, with the goal of eventually replacing MediaWiki's current native parser. In early 2024, Parsoid began to be used on some production wikis of the Wikimedia Foundation as the default parser for read views.
Parsoid (the PHP version) has been natively bundled with MediaWiki since version 1.35, released in September 2020. For non-Wikimedia installations, Parsoid/JS was supported until the end-of-life of MediaWiki 1.31 (LTS) in September 2021.
- ?useparsoid=1 to the URL
- $wgParserMigrationEnableParsoidArticlePages = true;
- $wgParserMigrationEnableParsoidDiscussionTools = true;
If you need help or have questions/feedback, you can contact us in #mediawiki-parsoid or the wikitext-l mailing list. If all that fails, you can also contact us by email at content-transform-team at the wikimedia.org domain.
Parsoid is maintained by the Content Transform Team.