Currently, we have two separate wikitext parsers that are used in MediaWiki on the Wikimedia cluster (and several other third-party MediaWiki installations). One is the original core parser (legacy parser), and the other is Parsoid. As of early 2023, the core parser was used for all desktop and mobile web read views, while Parsoid was used to serve all editing clients (VisualEditor, Structured Discussions, Content Translation), linting tools (Extension:Linter), some gadgets, mobile apps, Kiwix offline reader, Wikimedia Enterprise, and the Google knowledge graph project.
The goal of this project is to arrive at a single parser that supports all clients and use cases. This will make our wikis more reliable and consistent for editors, readers, and tools to use. Having a single code base for wikitext processing will facilitate future wikitext features.
This project is primarily driven by the Content Transform Team (previously Parsing Team) with participation from the MediaWiki Platform team, all the internal teams that develop Parsoid clients, the Movement Communications team (previously Community Relations Specialists), Wikimedia wiki editor communities, and third-party MediaWiki projects, since this parser unification will touch them all.
This page contains an overview of the project; we also maintain a roadmap, milestones, and updates page, and a list of pages with additional technical information.
Longer Term Goal: Parsoid is the default wikitext engine for MediaWiki and the legacy parser is removed from the codebase.
Intermediate Goal: Parsoid replaces the core parser for all wikitext use cases on the Wikimedia cluster.
Why unification?: Maintaining two wikitext engines requires a lot of resources and would require a duplication of efforts for new features.
Why Parsoid?: Parsoid already meets all the editing use cases and the API use cases that are unique to Parsoid (e.g. Enterprise, Kiwix), and active work is in progress for it to meet all the read use cases. The legacy parser does not support HTML-based editing use cases (e.g. VisualEditor).
As our rollout progresses, we will continue to identify other QA and testing methodologies as required to ensure we can roll out this change in as smooth and non-disruptive a fashion as possible.
At this stage of this project, we have split this work into a number of steps to achieve the intermediate goal.
To validate how our roadmap evolves and to use data-driven decision making for deployments, we have developed a Confidence Framework for Parsoid Read Views. This framework contains the guidelines for how we prioritise features, bugfixes, and deployments.
The switch to Parsoid-generated HTML should be transparent to most users. Below, we outline some possible impacts on readers, editors, and developers.
Parsoid models and processes wikitext differently from the legacy parser, and this can sometimes lead to differences in rendering in edge cases. If a wikitext pattern is commonly used, we have attempted to support it in Parsoid where possible; where that was not possible, we have either fixed the affected wikitext or provided support for fixing it. At this time, we believe all rendering differences we expect to run into will be edge cases that can likely be resolved by fixing wikitext either on individual pages or in templates.
Parsoid's internal processing model is different from the legacy parser. As a result, extensions may need to be updated. This only impacts extensions that do one or more of the following: (a) operate on wikitext, (b) provide handlers for parser hooks, (c) call a public method of the legacy parser.
Extensions that process wikitext will definitely need to be updated to work with Parsoid. To date, the vast majority of such extensions have been updated. Since Parsoid continues to access the legacy parser for expanding templates and processing parser functions, any parser hooks triggered during this processing will continue to operate, as will the extensions that rely on them. For the rest, we are exploring strategies to minimize the updates needed to extensions.
We file Phabricator tasks for all impacted extensions, and will fix whatever extensions we can within our team. If you are an extension developer, we would greatly appreciate any proactive work and prompt code review for patches we might submit.
The Content Transform Team is driving this project. Our goal is to make this switch to Parsoid as seamless as possible. To that end, we have tried to roll out changes gradually over the years.
We started by replacing HTML4 Tidy with HTML5 RemexHtml in the 2015–2018 timeframe. In 2019, in preparation for integrating Parsoid more closely into MediaWiki core, we ported Parsoid from JS to PHP. This switch went very smoothly. In the 2020–2022 timeframe, we started work to unify the media output generated by Parsoid and by core. This has mostly involved making changes to core, but we have occasionally adjusted Parsoid's output based on feedback and other technical considerations. In 2024, we began deploying Parsoid as the default parser for page views on Wikivoyage.
Going forward, we will provide support in the following ways:
Starting November 2023, you can opt in to using the new Parsoid parser for reading articles on Wikipedia. See Help:Extension:ParserMigration for more information!
Other things you can do to help: