This page documents the internals of DiscussionTools for developers of the extension and tools that build on top of it, like JS gadgets or SQL queries.
To learn about common reasons why DiscussionTools might not work as expected on a specific page, see Why can't I reply to this comment?
Most DiscussionTools features rely on the talk page comment parser introduced in this extension (no relation to the MediaWiki wikitext parser).
The comment parser takes as input the HTML rendering of the discussion page (produced by either Parsoid or the old wikitext parser), and gives as output a representation of the comments and threads on the page.
Note that DiscussionTools does not deal with the wikitext at all, only with HTML.
DiscussionTools recognizes two kinds of items: headings and comments. Other content on the page is not included in the representation.
Headings and comments form a tree structure. Comments can be top-level (represented as replies to headings), or be replies to other comments. Headings can be top-level, or be sub-headings (represented as replies to other headings). A thread is a heading together with its tree of replies.
Each item has the following properties:
Comments additionally have:
Headings additionally have:
Most of this information (notably excluding the ranges / content) is stored in the database used for permalinks. Some of it is also encoded back into the HTML in the formatter, as described below.
Below is an example discussion, and the comment parser's representation of it (pseudocode):
[
  HeadingItem( { level: 0, range: (h2: A), replies: [
    CommentItem( { level: 1, range: (p: B), replies: [
      CommentItem( { level: 2, range: (li: C, li: C), replies: [
        CommentItem( { level: 3, range: (li: D), replies: [
          CommentItem( { level: 4, range: (li: E), replies: [] } ),
          CommentItem( { level: 4, range: (li: F), replies: [] } ),
        ] } ),
      ] } ),
      CommentItem( { level: 2, range: (li: G), replies: [] } ),
    ] } ),
    CommentItem( { level: 1, range: (p: H), replies: [
      CommentItem( { level: 2, range: (li: I), replies: [] } ),
    ] } ),
  ] } )
]
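The same shape can be written as a small data class. This is only an illustrative sketch in Python, not the extension's PHP or JS code: the class name `ThreadItem`, the `kind` field, and the `flatten` helper are hypothetical, and the DOM range is replaced by a plain string.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ThreadItem:
    """A heading or comment; level/range/replies follow the pseudocode."""
    kind: str    # "heading" or "comment" (hypothetical field)
    level: int   # depth in the tree; the top-level heading is 0
    range: str   # stand-in for the DOM range, e.g. "li: C"
    replies: List["ThreadItem"] = field(default_factory=list)


def flatten(item: ThreadItem) -> List[ThreadItem]:
    """Return the item and all of its descendants in document order."""
    result = [item]
    for reply in item.replies:
        result.extend(flatten(reply))
    return result


# Part of the example thread: heading A, comment B, and B's replies C and G.
thread = ThreadItem("heading", 0, "h2: A", [
    ThreadItem("comment", 1, "p: B", [
        ThreadItem("comment", 2, "li: C"),
        ThreadItem("comment", 2, "li: G"),
    ]),
])
```

Calling `flatten(thread)` walks the thread depth-first, which is the order in which the items appear on the page.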
The first step to obtain the above is to find the comments and headings that exist on the page.
Timestamps are parsed by an algorithm that reverses the steps taken by MediaWiki to output them. Only timestamps that exactly match MediaWiki's date formats are accepted, to guarantee that they can be parsed unambiguously. DST timezones and language variants are supported.
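The idea of reversing a date format can be sketched by turning each format character into a regular expression fragment. The sketch below is a simplification under stated assumptions: it handles only the format string "H:i, j F Y" with English month names and a literal "(UTC)" suffix, and the names `FRAGMENTS`, `format_to_regex`, and `parse_timestamp` are hypothetical. The real parser also covers other formats, timezones, and language variants.

```python
import re
from datetime import datetime, timezone

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Regex fragment for each supported format character (PHP date() style).
FRAGMENTS = {
    "H": r"(?P<hour>\d{2})",
    "i": r"(?P<minute>\d{2})",
    "j": r"(?P<day>\d{1,2})",
    "F": r"(?P<month>" + "|".join(MONTHS) + ")",
    "Y": r"(?P<year>\d{4})",
}


def format_to_regex(date_format):
    """Build a regex that matches only text produced by date_format."""
    parts = [FRAGMENTS.get(ch, re.escape(ch)) for ch in date_format]
    # Assumption: the timezone abbreviation is always "(UTC)".
    return re.compile("".join(parts) + r" \(UTC\)")


def parse_timestamp(text, date_format="H:i, j F Y"):
    """Return the first exactly matching timestamp in text, or None."""
    match = format_to_regex(date_format).search(text)
    if match is None:
        return None
    return datetime(
        int(match["year"]),
        MONTHS.index(match["month"]) + 1,
        int(match["day"]),
        int(match["hour"]),
        int(match["minute"]),
        tzinfo=timezone.utc,
    )
```

Text that only resembles a signature timestamp, such as an abbreviated month name, does not match and is ignored.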
Comments are assigned as replies to other comments depending on the indentation level.
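One common way to turn indentation levels into a reply tree is to keep a stack of the currently open ancestors. The following is a minimal sketch of that technique, not the extension's actual algorithm; the function name `build_tree` and the use of plain labels instead of comment objects are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def build_tree(comments: List[Tuple[str, int]]) -> Dict[str, List[str]]:
    """Map each item to the labels of its direct replies.

    comments lists (label, indentation level) pairs in page order, with
    levels starting at 1. The heading is level 0 and is named "heading".
    """
    replies: Dict[str, List[str]] = {"heading": []}
    stack: List[Tuple[str, int]] = [("heading", 0)]
    for label, level in comments:
        # Pop until the top of the stack is less indented than this comment;
        # the heading always stays on the stack as the last resort parent.
        while len(stack) > 1 and stack[-1][1] >= level:
            stack.pop()
        parent = stack[-1][0]
        replies[parent].append(label)
        replies[label] = []
        stack.append((label, level))
    return replies


# The example discussion from this page, as (label, level) pairs.
example = [("B", 1), ("C", 2), ("D", 3), ("E", 4), ("F", 4),
           ("G", 2), ("H", 1), ("I", 2)]
tree = build_tree(example)
```

For the example discussion, this yields B and H as replies to the heading, C and G as replies to B, and E and F as replies to D.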
Item IDs and names are computed based only on the author, date and time, and thread structure. They do not depend on the text of the comment or the heading. This allows identical IDs/names to be assigned to the same comment even if it is modified in later revisions of the page, or the same heading even if it is renamed, and to be identical when language variants are in use.
Item IDs are unique within the revision being parsed. If two items were to be otherwise indistinguishable, they are numbered sequentially.
Item names are consistent across all pages and revisions where the item might appear, even when it's moved or changed.
As a result of the assignment logic above, when multiple comments or headings have the same author, date and time, they will be assigned the same ID (but only if they're in different revisions or pages) and the same name (possible even within a single revision). This is rare, but it does happen.
Some discussion features will treat those comments as if they were the same comment, which may be surprising if they look obviously different to a human. See details below.
To really identify a single item, you must use the revision ID plus the item ID.
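The difference between names and IDs can be illustrated with a short sketch. It is deliberately simplified and partly hypothetical: the "h-" prefix matches the example data elsewhere on this page, but the "c-" prefix for comments, the numeric suffix format, and the function names are assumptions, and the thread-structure component of real IDs is left out.

```python
from collections import Counter
from typing import List


def item_name(kind: str, author: str, timestamp: str) -> str:
    """Build a name from the author and timestamp only (never the text)."""
    prefix = "h" if kind == "heading" else "c"  # "c" is an assumed prefix
    return f"{prefix}-{author.replace(' ', '_')}-{timestamp}"


def assign_ids(names: List[str]) -> List[str]:
    """Make IDs unique within one revision by numbering repeated names."""
    seen: Counter = Counter()
    ids = []
    for name in names:
        seen[name] += 1
        # The first occurrence keeps the plain name; later ones get a
        # sequential suffix (the suffix format is hypothetical).
        ids.append(name if seen[name] == 1 else f"{name}-{seen[name]}")
    return ids
```

Two comments by the same author in the same minute share a name, but within one revision they still receive distinct IDs.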
The formatter inserts reply links into the DOM in PHP, as well as comment start and end markers.
Care is taken not to insert them in invalid places, like inside a <style> or a <br> tag.
Item properties from the comment tree data structure are included as a JSON data attribute on the reply links. Together with the markers, they are later used in JS code to reconstruct the comment tree without running the comment parser.
We use markers instead of directly storing the range to allow some compatibility with other extensions and gadgets that modify the client-side DOM.
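Rebuilding a tree from per-item JSON can be sketched as follows. The JSON shape used here (objects with "id" and "replies" keys) is a hypothetical stand-in, not the attribute format DiscussionTools actually emits, and the function names are invented for the example.

```python
import json
from typing import Dict, List


def rebuild_tree(attribute_values: List[str]) -> Dict[str, List[str]]:
    """Rebuild item -> direct replies links from per-item JSON strings."""
    items = [json.loads(value) for value in attribute_values]
    return {item["id"]: item["replies"] for item in items}


def descendants(tree: Dict[str, List[str]], item_id: str) -> List[str]:
    """List all replies below an item, depth-first, without re-parsing."""
    result = []
    for reply_id in tree.get(item_id, []):
        result.append(reply_id)
        result.extend(descendants(tree, reply_id))
    return result


# Hypothetical attribute values, one per reply link found in the DOM.
attributes = [
    '{"id": "c-B", "replies": ["c-C", "c-G"]}',
    '{"id": "c-C", "replies": ["c-D"]}',
    '{"id": "c-D", "replies": []}',
    '{"id": "c-G", "replies": []}',
]
tree_from_json = rebuild_tree(attributes)
```

Because every item carries its own properties, the client only needs to read the attributes and never has to run the comment parser again.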
The modifier inserts the reply widget into the DOM in JS, as if the reply widget was a new reply to the comment.
The DOM tree is suitably rearranged to ensure correct indentation level of the reply (wrapper nodes are added, and other nodes may be moved around).
The reply is added below all existing replies to the given comment (and replies to them), with indentation level of the given comment plus 1.
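The placement rule can be expressed on the comment tree alone, ignoring the DOM rearrangement. This is a minimal sketch with hypothetical names (`Comment`, `last_descendant`, `add_reply`); the real modifier operates on DOM nodes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Comment:
    label: str
    level: int
    replies: List["Comment"] = field(default_factory=list)


def last_descendant(comment: Comment) -> Comment:
    """Follow the last reply at each depth to the final item in the subtree."""
    while comment.replies:
        comment = comment.replies[-1]
    return comment


def add_reply(parent: Comment, label: str) -> Comment:
    """Append a reply; return the item it is placed directly below."""
    placed_after = last_descendant(parent)
    parent.replies.append(Comment(label, parent.level + 1))
    return placed_after


# Comment C from the example, with reply D, which has replies E and F.
c = Comment("C", 2, [Comment("D", 3, [Comment("E", 4), Comment("F", 4)])])
below = add_reply(c, "new reply")
```

Here the new reply to C is positioned below F, the last item in C's subtree, but it is indented at level 3, one deeper than C.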
Saving comments uses the same modifier algorithm, implemented in PHP. The contents of each paragraph in the reply are inserted inside a list item node. Then the HTML is converted back to wikitext using Parsoid, which is saved as a new revision of the page.
When replying in wikitext mode, each line of wikitext is added inside a list item node as a transclusion. Parsoid includes the wikitext unchanged in its output.
Saving comments does not operate directly on wikitext, but rather uses HTML throughout the process and Parsoid to convert it. This has some benefits and drawbacks.
Benefits:
Drawbacks:
When running the comment parser on Parsoid HTML, we can use the information about comment ranges from our comment parser and information about template-generated content from Parsoid HTML to determine whether a comment visible on the page has been transcluded from a different page, and post the reply there.
Most of the comment parser, modifier, and data structure code has two implementations: in PHP and JavaScript. It is a historical accident, as the tools were first prototyped in JS to make it easy to test them with live content on Wikipedia, and then reimplemented in PHP to improve performance (particularly to avoid fetching and sending the full page's Parsoid HTML when saving replies). But once we had them, we kept them both: it helps avoid bugs by comparing the two implementations and allows some client-side actions to happen without consulting the server, e.g. inserting the reply widget.
Unlike the reply tool, the new topic tool saves the comment as wikitext, using the existing APIs to add a new section to a page. In visual mode the comment is converted to wikitext first.
Conceptually, in our data structure, adding a new topic thread is the same as adding a new heading and then adding a top-level comment as a reply to that heading. The interface code reuses much of the reply tool by putting that concept into reality. It seemed like a good idea at the time.
Users can subscribe to receive notifications about new replies in a topic. We currently only allow subscribing to level 2 headings that have comments directly underneath (not in sub-sections; this may change: T275943, T298617#7695392).
The model could theoretically support subscribing to notifications about replies to any comment or heading. However, it would require much more complexity in the user interface (particularly in managing subscriptions when multiple subscriptions with different states could overlap), so we gave up on it.
Each subscription has the following properties:
Item name (sub_item field in SQL). This is a concatenation of the username and timestamp of the first comment under that heading. This is used when generating notifications. Example data: h-Admin-20231223222800
Page title (sub_namespace and sub_title) and section title (sub_section) where this item appeared when the subscription was created. This is not used when generating notifications, and may not match where the thread actually appears (if it was archived or renamed). It is only intended to be used as a human-readable label when managing subscriptions (not implemented yet).
State (sub_state): subscribed (1) or unsubscribed (0). Currently unused, but intended to be used for unsubscribing from automatic subscriptions.
User (sub_user). Corresponds to the user_id.
Creation timestamp (sub_created).
Last notification timestamp (sub_notified, which can be null).
This data is stored in a database table called discussiontools_subscription.
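For tools that query this data, a rough sketch of looking up subscribers by item name is shown below. It uses an in-memory SQLite database purely for illustration: the column names come from this page, but the column types, the sample row, and the `subscribed_users` helper are assumptions and do not reproduce the extension's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types are guesses made for this example only.
conn.execute("""
    CREATE TABLE discussiontools_subscription (
        sub_item TEXT NOT NULL,
        sub_namespace INTEGER NOT NULL,
        sub_title TEXT NOT NULL,
        sub_section TEXT NOT NULL,
        sub_state INTEGER NOT NULL,
        sub_user INTEGER NOT NULL,
        sub_created TEXT NOT NULL,
        sub_notified TEXT
    )
""")
# A made-up subscription of user 42 to the example item name.
conn.execute(
    "INSERT INTO discussiontools_subscription VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    ("h-Admin-20231223222800", 1, "Blah", "Example", 1, 42,
     "20231224000000", None),
)


def subscribed_users(connection, item_name):
    """Return IDs of users whose subscription to item_name is active."""
    rows = connection.execute(
        "SELECT sub_user FROM discussiontools_subscription "
        "WHERE sub_item = ? AND sub_state = 1 ORDER BY sub_user",
        (item_name,),
    )
    return [row[0] for row in rows]
```

The lookup filters only on sub_item and sub_state, since the page and section columns are labels and play no part in matching.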
Echo separates the concepts of events and notifications. A single event can result in notifications sent to many users, depending on its user locators (to include users) and user filters (to exclude them).
Whenever an edit to a talk page is saved, Echo compares the previous and new page revision to generate its events, e.g. mentions. DiscussionTools extends this mechanism, and compares the previous and new comment trees to find new comments and generate events for them.
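Finding new comments between two revisions amounts to a difference between two collections of items. The sketch below matches items by name and counts duplicates, which is an assumption made for illustration; the function name `find_new_comments` is hypothetical and the real comparison works on full comment trees.

```python
from collections import Counter
from typing import List


def find_new_comments(old_names: List[str], new_names: List[str]) -> List[str]:
    """Return names present in the new revision but not in the old one.

    Names can legitimately repeat, so occurrences are counted rather than
    compared as plain sets.
    """
    remaining = Counter(old_names)
    added = []
    for name in new_names:
        if remaining[name] > 0:
            remaining[name] -= 1  # matched a comment that already existed
        else:
            added.append(name)
    return added
```

Because matching ignores position, a comment that was only moved within the page is not reported as new.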
Each event has the following properties:
This data is stored in one of Echo's database tables, however only the title and agent can be queried directly. Everything else is in a serialized blob.
We generate an event for every new talk page comment, regardless of whether anyone is subscribed to the thread it's in. We generate notifications only for subscribed users.
If the edit would result in an Echo event related to talk pages (that is: mention, mention-summary, or edit-user-talk) as well as a DiscussionTools comment event, we avoid sending double notifications by using a filter to exclude the users who were mentioned and, if the edit was to a user talk page, its owner. Instead we enhance the Echo event with the comment's ID and name to show a direct link to the new comment (rather than just a section where it was added) and the comment's content to show a snippet (unless Echo provided one).
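The de-duplication rule can be summarized as a set difference. This is a sketch with hypothetical names and plain usernames; in Echo the inclusion and exclusion are done by user locators and user filters rather than by one function.

```python
from typing import Optional, Set


def comment_recipients(subscribers: Set[str], mentioned: Set[str],
                       talk_page_owner: Optional[str] = None) -> Set[str]:
    """Return subscribers who should get the DiscussionTools notification.

    Mentioned users, and the owner of a user talk page, already receive an
    Echo notification for the same edit, so they are filtered out here.
    """
    excluded = set(mentioned)
    if talk_page_owner is not None:
        excluded.add(talk_page_owner)
    return subscribers - excluded
```

For an edit outside the user talk namespace, `talk_page_owner` stays `None` and only mentioned users are excluded.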
Sections you subscribe to are identified by the author, date and time of the oldest comment (this is their item name). This allows for sections to be moved, renamed, or archived/unarchived, without losing the subscriptions. It also allows subscriptions to be handled consistently for sections that are transcluded on different pages (e.g. some wikis' village pumps are set up like that).
There are some scenarios where the item name will change, and the connection between the subscriptions and the topic is lost:
If you subscribe to a heading whose item name is indistinguishable from another's, everything behaves as if you had subscribed to both – e.g. you'll get notifications for both of them, and unsubscribing from one will also unsubscribe you from the other. This is necessary to handle transcluded sections mentioned above.
The changes provided by this feature are intended to make talk pages look more clearly like places where people are commenting.
When enabled, the HTML contains two versions of the markup, and CSS classes are used to toggle which one is visible. This increases the HTML size, but avoids splitting the parser cache, and allows the changes to be disabled without reloading the page (this is used by the mobile "Read as wiki page" button).
Metadata related to discussion activity is shown for each topic: a link to and the date of the latest comment, the number of comments, and the number of people in the discussion. It is computed based on the structure from the comment parser, and is only shown in sections that contain at least one discussion comment (we currently ignore comments in sub-sections, but this may change: T298617#7695392). Only the data in the comments is considered, not historical information (e.g. someone who fixed a typo in a comment, but didn't leave any comments, is not counted in "people in discussion").
The changes are only applied in the "Talk" and "User talk" namespaces, to avoid unexpected formatting in namespaces that mix content and discussion (e.g. "Wikipedia:" namespace in many projects).
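Computing this kind of per-topic metadata from the comments alone could look like the sketch below. The input shape (author and sortable timestamp pairs), the dictionary keys, and the function name `topic_metadata` are assumptions for illustration.

```python
from typing import Dict, List, Optional, Tuple


def topic_metadata(comments: List[Tuple[str, str]]) -> Optional[Dict[str, object]]:
    """Summarize a topic from (author, timestamp) pairs.

    Timestamps are assumed to be sortable YYYYMMDDHHMMSS strings.
    """
    if not comments:
        return None  # sections without comments show no metadata
    latest_author, latest_timestamp = max(comments, key=lambda c: c[1])
    return {
        "latest_author": latest_author,
        "latest_timestamp": latest_timestamp,
        "comment_count": len(comments),
        # Only authors of comments count, not other editors of the page.
        "author_count": len({author for author, _ in comments}),
    }


meta = topic_metadata([
    ("Alice", "20240101120000"),
    ("Bob", "20240102090000"),
    ("Alice", "20240103080000"),
])
```

In this example Alice wrote two of the three comments, so the topic has three comments but only two people in the discussion.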
MediaWiki version: ≥ 1.39 (Gerrit change 771974)
When discussion topics get archived, or moved to a different page for some other reason, or when discussion comments are moved to a different place on the page, the normal links to comments break. This affects links in our own notifications, as well as internal links using comment IDs used in wikitext discussions (of the form [[Talk:Blah#c-...]]).
To solve this problem, you can use permanent links of the form [[Special:GoToComment/c-...]]. The special page will redirect to the current location of the comment. This only works when $wgDiscussionToolsEnablePermalinksBackend is enabled.
In some rare cases we might not be able to redirect to the "current location" for the comment (or heading).
It might not be visible anywhere, because:
Or it might be visible in more than one place, because:
Or it might be older than the permalinks feature (the feature only has data about comments added after it was deployed, unless the data for older comments is back-filled, which has not been decided yet).
In these cases the permanent link will instead redirect to Special:FindComment, which displays as much information as possible to help you figure out what happened:
This feature is backed by a database of comment metadata. For every comment ID and name that has ever appeared on wiki pages (ever since the feature was enabled), we store every page title on which it appeared, and the oldest and newest revision in which it appeared. This information is generated entirely from the wikitext and there's no API to edit it (like the pagelinks table).
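The redirect decision described in this section can be sketched as a lookup with a fallback. The mapping of comment IDs to pages, the function name `resolve_permalink`, and the subpage-style Special:FindComment target are assumptions made for this example; the real special pages query the database tables instead.

```python
from typing import Dict, List


def resolve_permalink(comment_id: str,
                      current_pages: Dict[str, List[str]]) -> str:
    """Return the link target for a permalink to comment_id.

    current_pages maps a comment ID to the pages where it is currently
    visible. Exactly one page gives a direct redirect; zero or several
    pages fall back to the search page.
    """
    pages = current_pages.get(comment_id, [])
    if len(pages) == 1:
        return f"{pages[0]}#{comment_id}"
    # The target format below is illustrative only.
    return f"Special:FindComment/{comment_id}"


# Hypothetical data: one archived comment, one transcluded on two pages.
locations = {
    "c-Admin-20231223222800": ["Talk:Blah/Archive 1"],
    "c-Other-20240101000000": ["Talk:A", "Talk:B"],
}
```

A comment that is visible nowhere and a comment that is visible in several places take the same fallback path, since neither has a single current location.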
The data is generated/updated as a part of refreshlinks jobs. Under normal circumstances these updates are small (just recording the comments that have been added or removed on a page since the last edit or template refresh). However, right after the feature is enabled, the relevant database tables are empty; any refreshlinks job will cause the information about all comments on the page to be generated. To populate the data of all pages after enabling the feature, the persistRevisionThreadItems.php maintenance script must be run. Otherwise, it will be populated when talk pages get edited directly or indirectly as part of template changes, which may cause overload if a template used on many talk pages is edited (phab:T334258).
The database also includes some additional information, stored in the discussiontools_items, discussiontools_item_pages, discussiontools_item_revisions, and discussiontools_item_ids tables.
This is a subset of the comment parser data structure, notably excluding the content of the comment/heading. We will use it to improve the performance of features that previously needed to run the comment parser repeatedly on old revisions (e.g. checking for new comments while the user is replying, generating notifications). It may also become useful in the future to measure talk page usage (e.g. how many people comment in topics, or how long it takes until topics are archived).
To save disk space, only data about the oldest and newest revisions of items is kept in the discussiontools_item_revisions table. After the page is edited and the comment appears in a newer revision of the page, the row for the older revision (previously newest) is deleted.
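The pruning rule can be modeled by keeping just two revision IDs per item. This sketch uses an in-memory dictionary and a hypothetical `record_revision` function, and it assumes revisions are processed in increasing order; the extension itself stores and deletes database rows.

```python
from typing import Dict, Tuple


def record_revision(revisions: Dict[str, Tuple[int, int]],
                    item_id: str, rev_id: int) -> None:
    """Track the (oldest, newest) revision in which an item appears."""
    if item_id not in revisions:
        # First sighting: the revision is both the oldest and the newest.
        revisions[item_id] = (rev_id, rev_id)
    else:
        # The previously newest revision is dropped unless it is also the
        # oldest one, which is always kept.
        oldest, _previous_newest = revisions[item_id]
        revisions[item_id] = (oldest, rev_id)


seen_in: Dict[str, Tuple[int, int]] = {}
for rev in (100, 105, 110):
    record_revision(seen_in, "c-Admin-20231223222800", rev)
```

After three edits the item is associated with revisions 100 and 110 only; the intermediate revision 105 is no longer recorded.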
You can find a few examples in the PHPUnit integration tests of the extension.
Each directory contains a MediaWiki dump that can be imported into your wiki, and JSON dumps of the database tables produced by importing it.