On this page, old discussions are archived after 7 days. An overview of all archives can be found at this page's archive index. The current archive is located at 2025/10.
SpBot archives all sections tagged with {{Section resolved|1=~~~~}} after 1 day and sections whose oldest comment is older than 7 days.
Latest comment: 3 days ago · 27 comments · 13 people in discussion
Currently, for many topics and applications, it would make more sense to query Wikipedia than to query Wikidata and to extract the data from there. This kind of negates the point of Wikidata.
Many articles about people, software, projects, plants, diseases, films and organizations have more data in their infoboxes and categories than in their Wikidata item.
This could be an image of the subject (such as of a skin disease), the license a piece of free software is released under, or the programming language it was written in.
Another, more concrete example is spoken text audio (P989) – these are set in the articles of the Wikipedias, but in Wikidata only English ones are widely set, and even those are far from complete.
I think for Wikidata to be really useful and successful in the real world, such as for querying metadata about things like software or films, it needs to be at least as good as Wikipedia for this data, and then move on from there to cover additional areas and contain additional data in more parseable formats.
One can also query Wikipedia in various ways, so currently it would often, if not usually, be more advisable to recommend or implement uses of APIs that query Wikipedia. I think striving to have all the data, and to cover the same data applications as Wikipedia, would be a great goal for Wikidata for now if it is to become successful in terms of people using it in practice (and, by extension, of many people being aware that it exists).
Currently, there seems to be only one major tool to synchronize data from Wikipedia into Wikidata and from Wikidata into Wikipedia: Harvest Templates. That tool appears to be unmaintained, little used and little known; there aren't many discussions, meta pages or coordination regarding it, and it has severe bugs and limitations. For example, it can't set a qualifier alongside a value, and when trying to import spoken Wikipedia audios via the Wikipedia template for these, it fails due to some bug after 30 seconds or so.
A nice thing about it is that people can share the configurations they use for importing data so other people can pick them up – https://pltools.toolforge.org/harvesttemplates/share.php Note, however, that there are no indications of how much is still being done, whether a given harvest is already run regularly, which language versions it could be used on too, and so on. Here's the HT for Wikipedia audio versions that is also in that large list. You could use this for testing and to see the aforementioned bug. Those harvests are all specific to one language Wikipedia; one can't adjust a harvest to import the narrated article audios from all the Wikipedias that have a template for these, so one would have to adjust the harvest for each and every one of the hundreds of Wikipedias that have these... and then regularly rerun this manually every once in a while. It doesn't seem like anybody is doing or did this for most (or all) Wikipedias. Likewise, Wikipedia has more images of skin diseases than the Wikidata items about the skin diseases, and software items do not have the programming language info set even though this metadata could be imported from, for example, GitHub and/or the Wikipedia categories.
I think being more useful than Wikipedia for some applications would be a second step after first outmatching Wikipedia on structured data.
"Currently, for many topics and applications, it would make more sense to query Wikipedia than to query Wikidata and to extract the data from there. " There are hundreds if not thousands of infobox parameters which does not have an equivalent data property on Wikidata. That's the issue. There is also the issue that new data property proposals tends to face an heavy amount of scrutinization compared to other types of properties which can discourage users from trying to make new proposals. --Trade (talk)00:30, 2 October 2025 (UTC)Reply
I am curious how to implement a query of Wikipedia, given the infoboxes are not structured: do you mean it's more useful to copy and paste, article by article, from a Wikipedia than to run a Wikidata query? Bouzinac 💬●✒️●💛 08:57, 2 October 2025 (UTC)
"how to implement a query of Wikipedia, given the infoboxes are not structured" – since you got no reply, and since Trade only mentioned this and I think only in reference to what I said: Wikipedia can be queried using APIs and you can then extract the data in the infoboxes because it is structured by the predefined infobox parameters. Search the Web for example for "wikipedia get data from infobox" and you'll find many tutorials and premade tools for this using the Wikipedia API, such as this post from 2012 with a premade tool to extract the infobox data. These are probably more viewed or popular on the Web than questions and tutorials about querying data from Wikidata. Additionally, there is a new dataset which has the data prestructured; see Matěj Suchánek's comment below. Just to clarify the thread topic and intent again: I think Wikidata would have to be at least as good as Wikipedia for structured data for this to change, in the sense of the (category and) infobox data there being all covered here. Prototyperspective (talk) 11:20, 5 October 2025 (UTC)
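For anyone who wants to try this themselves, here is a minimal sketch of that kind of infobox extraction, assuming the requests and mwparserfromhell Python libraries; the article title and parameter name below are only illustrative examples, not something taken from the discussion above.

```python
# Minimal sketch: fetch an article's wikitext via the MediaWiki API and read
# the parameters of its infobox template. Assumes requests + mwparserfromhell.
import requests
import mwparserfromhell

API = "https://en.wikipedia.org/w/api.php"

def infobox_params(title):
    resp = requests.get(API, params={
        "action": "parse",
        "page": title,
        "prop": "wikitext",
        "format": "json",
        "formatversion": 2,
    })
    wikitext = resp.json()["parse"]["wikitext"]
    parsed = mwparserfromhell.parse(wikitext)
    for template in parsed.filter_templates():
        # crude heuristic: first template whose name starts with "Infobox"
        if template.name.strip().lower().startswith("infobox"):
            return {str(p.name).strip(): str(p.value).strip() for p in template.params}
    return {}

# example usage (article and parameter name are illustrative)
print(infobox_params("GNU Emacs").get("programming language"))
```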
Not sure what you mean by "data property proposals" vs other types of properties? External identifiers are the exception here, with slightly lower requirements for approval and creation.
More importantly, we need more people regularly commenting on and supporting properties they want created; without clear support they won't be. See Wikidata:Property proposal/Overview for the current list; there are a lot still open with no comments in the last 3 months, many 6 months or more. ArthurPSmith (talk) 17:53, 2 October 2025 (UTC)
I don't think the challenges or scrutiny faced by new data properties are a major factor – there already are very many properties, but they aren't set on many items where Wikipedia has the data. The main example of this was spoken text audio (P989), but it also applies to many other properties.
It may be an issue, but it's not "the" issue and, as far as I can see, not among the main issues. Instead, people propose more and more properties, but there is little work on and consideration of how to get the data for these into Wikidata, not even when lots or all of that data is readily available in other Wikimedia projects. Prototyperspective (talk) 17:55, 2 October 2025 (UTC)
"there is little work and consideration of how to get the data for these into Wikidata" There's plenty of people who do want the data into Wikidata but there are much less people willing to "lend" out their bots by giving it additional tasksTrade (talk)22:05, 3 October 2025 (UTC)Reply
That doesn't or wouldn't change anything – so if you're arguing there is not little consideration but moderate or even much consideration of that, I couldn't see any evidence or indications of that and would welcome any info suggesting to the contrary. Moreover, these aren't two separate groups – if people want to have the data, they could for example get Harvest Template to work or ask about HT and at least document and discuss its issues or build a bot or ask for bots like that.Prototyperspective (talk)22:42, 4 October 2025 (UTC)Reply
Combining data from multiple sources is a useful skill; more people should get into the habit of doing this. DBpedia might have more data imported from Wikipedia templates (e.g. [1]), and thanks to sitelinks there should be no problem combining this with data from Wikidata in a query. Infrastruktur (talk) 10:21, 2 October 2025 (UTC)
On a related note, Wikimedia Enterprise has recently started publishing datasets of Wikipedia's "parsed content", including infoboxes (see Tech News). I wonder if this is something where we could go for missing data and import it to Wikidata.
However, the habit of copying Wikipedia infoboxes to Wikidata has been somewhat controversial because of its error rate and lack of references to sources (or means to import them). On Wikidata, we can take a different approach and import directly from external sources (public registries, etc.), and we also indicate them as the source. Of course, this is possible on Wikipedia, too, but AFAIC imports like this are not done often. On Wikidata, this is "by design".
Also, queries over Wikipedia are more difficult, inefficient, or the available data varies by Wikipedia editions (e.g., categorization trees). Nothing comparable to WDQS is available for Wikipedia.
So, to answer your question "how can Wikidata be useful IRL if it has less data than Wikipedia": 1) by offering actual structured data, 2) by offering quality, 3) by offering efficient means for aggregating results over many data entries. --Matěj Suchánek (talk) 11:17, 2 October 2025 (UTC)
"publishing datasets of Wikipedia's 'parsed content', including infoboxes (see Tech News). I wonder if this is something where we could go for missing data and import them to Wikidata." That's a great idea! Has somebody looked into this? This may enable importing more data more quickly and easily than with Harvest Templates. Please also let me know if you know of any resources about importing data to Wikidata from these dumps.
Good point. 1. Many people also edit items by hand (and without adding a source). 2. Many values in Wikipedia do have sources, which could be imported too. 3. For many values, like spoken Wikipedia audios again, there is no need for a source, as it would make no sense. 4. If there are some flaws from time to time, these are quite rare, at least on ENWP, and the other sources that data is imported from can also be flawed; most importantly though, if a value gets edited, some tool could show a mismatch so that it can be fixed in Wikipedia or Wikidata, wherever the error is.
There's the Wikipedia API, and probably more questions, demos, and projects online about getting data from Wikipedia infoboxes etc. than from Wikidata, even though querying Wikidata via SPARQL is often more convenient (despite the fact that it can quickly run into timeouts and SPARQL is not something many people know much about). That the available data varies per language is unimportant as, depending on the data sought, people can simply import from the largest Wikipedia, ENWP. These queries aren't necessarily more difficult or inefficient; people can use premade packages for making Wikipedia API calls, and if you get less data with Wikidata, that is still less efficient even if WDQS is more designed for these kinds of things.
1) Wikipedia also has structured data, in infoboxes and categories. Here there's just more, and basically only, structured data, and things built with queryability in mind. That it has structured data doesn't answer the question as such; if the data is too incomplete, it can be as structured as possible and still not enable any real-world uses. 2) Wikipedia is checked by far more people, and other databases have quality too, so I wonder how that would answer the question – it's more like a goal or positive aspect; WD is not super reliable, and how millions of, for example, book and study items with essentially no watchers can all be effectively protected from vandalism may be a subject for another day. 3) I appreciate the answer, but here again what is missing is the real-world practical part – this may be great in theory, but in practice even the best efficient means for aggregating results over many data entries are of no use if the data is very incomplete. The emphasis is on use in practice, secondarily on specific uses that are of interest to many people, and only thirdly on Wikidata in terms of technical tools / potential.
"many people also edit items by hand (and without adding a source)" Perhaps Wikidata should make editing items by hand less cumbersome then?Trade (talk)22:12, 3 October 2025 (UTC)Reply
You can't take tiny parts completely out of context like that. I said this in the context of Suchánek's claim about the error rate of data in English Wikipedia infoboxes. This doesn't address anything I said, and Wikidata editing isn't cumbersome, I think, except that the add statement button shouldn't just be at a variable place at the bottom. Prototyperspective (talk) 22:38, 4 October 2025 (UTC)
Well, since there apparently isn't an issue with manual editing or adding sources being cumbersome, I guess there's nothing for us to discuss then?
Your comments really aren't constructive. If you personally find there is nothing to discuss, I'd suggest not commenting. The comment by Suchánek, whose thread you derailed by bringing up something entirely unrelated and unconnected to the thread topics, was precisely on-topic, insightful and interesting. Prototyperspective (talk) 11:13, 5 October 2025 (UTC)
"it's more like a goal or positive aspect" Yes, maybe this describes my view better.
"are of no use if the data is very incomplete" I agree. It's worth noting, though, that we can use it to measure (in)completeness, too.
"about error rate of data in English Wikipedia infoboxes" In infoboxes in general. Perhaps English Wikipedia has higher quality, yet it doesn't cover all items. If true, this would be a good motivation for trying to import these dumps.
Any effort to import data from infoboxes should be avoided unless the citations used in those infoboxes are imported as well. Too often I see large imports using imported from Wikimedia project (P143) as a reference. Citing the Wikipedia article is as bad as having nothing at all, not only because the data may change and we're left with an incongruity, and because those imports can sit for years without being properly reviewed and cited, but also for the same reason Wikipedia:Wikipedia is not a reliable source (Q22964187) exists on multiple sites. —Huntster (t@c) 13:07, 5 October 2025 (UTC)
"that we can use it to measure (in)completeness, too" If one could measure it that way by comparing both datasets, then that would probably already be three quarters of the work of importing the Wikipedia data into Wikidata – I guess except if one takes a small sample and compares the data for those. "Perhaps English Wikipedia has higher quality, yet it doesn't cover all items" It depends on the subject in many cases (and one could also limit the imports to these); it has something like >95% of items across the Wikipedias. Basically, one could limit infobox data imports to the global Wikipedia, which is ENWP. Mismatch Finder? Interesting, I didn't know about it, and if that tool can't do this yet, it seems like a perfect candidate for functionality to find mismatches and missing data in Wikidata when comparing the linked Wikipedia article(s). Prototyperspective (talk) 23:30, 5 October 2025 (UTC)
My attempts to use Wikipedia infoboxes generally fail as there is huge inconsistency in the way infoboxes are coded on each page, with attempts to parse fields scuppered by inconsistent units and endless inline templates. The raw wikicode is hard to parse, the formatted HTML is easier, but mostly other sources that just render databases are better still. And unlike here, there is no attempt to make infoboxes consistent across the whole of human knowledge; it's very balkanised. Vicarage (talk) 18:16, 2 October 2025 (UTC)
It would be very useful if you could document the difficulties you faced as well as the methods (including the tools) you used. In my limited experience, templates simply had parameters that were set in the articles in consistent ways, with no embedded templates other than refs and these kinds of [sup notes], which I think could basically be excluded by regexing away {{xyz}}. Interesting to hear about your experience with this.
I built a Python bot to extract dates of birth and death from various language Wikipedias. The code is here and I used a YAML file to map out the different templates and parameters, which is here. I used mwparserfromhell to parse the wikitext, but it failed quite a bit more often than I expected. From what I've seen, most of the sourcing tends to live in the article body rather than in the infoboxes. Difool (talk) 01:54, 24 October 2025 (UTC)
@Prototyperspective: Making a page to list the status quo of what we have and what issues there are sounds like a good idea. When it comes to importing sources, Harvest Templates was written before we had LLMs. Given that LLMs now exist, a better setup would use an LLM to check the sources of the Wikipedia article and import the source as well if it backs up the claim, and otherwise not import the given claim. It could optionally provide a list of wrong citations that could be shared with the Wikipedia as well, to help them improve their data quality. ChristianKl ❪✉❫ 11:46, 19 October 2025 (UTC)
Good idea – things like this could also be mentioned on that page and discussed or developed further on its talk page over time. However, I don't think use of LLMs would necessarily be better. As in the example above, an LLM is not needed to extract an ID from a Wikipedia template. Secondly, Harvest Templates could be improved by adding LLM features. Moreover, there could be a separate tool, and both could be useful for their own use cases. Your idea seems specific to sources, and I don't know if you're suggesting this to be used for other kinds of data as well. For importing things from infoboxes including the source, an LLM is not needed and may be less efficient, less reliable or infeasible. If the purpose is to generate as many statements from a Wikipedia article as possible, each along with their references, then an LLM would be the right tool, but there would still be issues to mitigate: it making errors, and the Wikipedia article changing over time (e.g. faulty things being removed and new data being added). By the way, I'd imagine that if Abstract Wikipedia is ever going to be comprehensive it would need something like that to get data from Wikipedia into Wikidata into AW (however, I'm not convinced at all that this project has much potential and that it would be good to focus on it instead of an alternative approach that has become feasible in recent years). Prototyperspective (talk) 23:21, 23 October 2025 (UTC)
Strong support for moving InfoBox data from WikiPedia to WikiData, and Strong support for one global InfoBox template, or very few global InfoBox templates with strictly defined scope. Strong oppose gazillions of local templates, nonsense redirects, parameter aliases, notorious usage of unsupported parameters, still-bad template design 12 years after Lua got activated, and the unwillingness of the WikiPedia communities to migrate from indiscriminate quantity and obsession with new features to "strength by simplicity". Taylor 49 (talk) 11:57, 19 October 2025 (UTC)
The premise is wrong. There are multiple Wikipedias and one Wikidata. Wikidata contains data different from any Wikipedia; however, from a dataset point of view, its data is often similar to but inherently different from what Wikipedias hold. No red links or black links, and the likelihood of false friends is different as well. Wikidata knows about data missing from categories. It knows about data that Wikipedias are unable to maintain. There are too many lists that change regularly and too many Wikipedias that cannot keep up.
If we want to be smart about it, we harvest data on an organised scale and seed the harvested data on an organised scale. There must be a point to it, and the only valid point is that our projects aim to serve an audience. We will serve our audience better once we start cooperating and collaborating at scale. Thanks, GerardM (talk) 19:09, 20 October 2025 (UTC)
Usually, when some data is updated, the latest information is entered in a Wikipedia infobox by a Wikipedia user as the user searches for sources and expands a Wikipedia article. Then some other Wikipedia users translate the article into other languages and update the infoboxes elsewhere. Finally, a bot crawls one of these Wikipedia infoboxes and updates it here. But that is uncertain: if outdated information is already present in Wikidata from a previous bot, the bot can't be sure which one is the latest. Midleading (talk) 03:02, 24 October 2025 (UTC)
Latest comment: 4 days ago · 6 comments · 4 people in discussion
The Wounded Tone (Q3291150) is about a band with no real notability. There are articles in 10 languages but all very short with no references, and Google search only finds Wikipedia and pages based on Wikipedia articles, but I'm not sure a Wikidata deletion request would succeed because of the number of sitelinks that would have to be deleted first. Is there somewhere to make deletion requests such as this? Peter James (talk) 20:17, 12 October 2025 (UTC)
My guess is that someone related to the band made it, and because they did so around 2009 and because the articles were so old, they flew under the radar. Immanuelle (talk) 21:08, 12 October 2025 (UTC)
We cannot delete it here while valid sitelinks remain. You would need to request deletion on every project and wait for that to complete. You might find Twinkle Global useful, and note the "Request at m:Global sysops/Requests" option available on small wikis. Bovlb (talk) 21:16, 12 October 2025 (UTC)
This is a common issue. Still, you must get those 10 articles on 10 wikipedias (or at least 9 of 10) deleted before the Q-item can and ultimately should be deleted.Taylor 49 (talk)09:24, 23 October 2025 (UTC)Reply
Generally, we don't expect text properties to contain wikitext, including square brackets for page links. However, there are Wikipedias that rely on it for infoboxes, like I think is happening here. The article it:Venezia (metropolitana di Roma) on itwiki has a template that retrieves that statement's value to be parsed as wikitext.
I am opposed to this practice, but I don't often change it when I see it. It would really require a dialogue with that wiki and the template users before making a breaking change.William Graham (talk)13:38, 16 October 2025 (UTC)Reply
What I was wondering is whether I'd ask a cleaning bot to remove brackets and wikicode stuff inside text such as addresses, or if it'd be better cleaned using QuickStatements (removing, cleaning, restoring)? Bouzinac 💬●✒️●💛 08:44, 17 October 2025 (UTC)
Wikitext shouldn't be used on Wikidata, because it is interpreted differently per project (i.e., [[link]] may be blue on itwiki, but red on cswiki). This makes the data useful only for the project it was imported from, and we shouldn't really give up on this. --Matěj Suchánek (talk) 07:49, 18 October 2025 (UTC)
I see your points and understand there's a broader problem than simply cleaning some texts.
Nevertheless, querying gives 281 values containing [[ and 2,182,917 without, so I'd guess it's more of a small, fixable problem.
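For reference, a count like that can be reproduced with a WDQS query; below is a rough sketch run via Python. The property P6375 (located at street address) is only my assumption about which text property is meant here, and the query may need narrowing or a LIMIT to avoid timeouts.

```python
# Rough sketch: count string values that still contain wikitext link brackets.
# Assumption: the text property in question is P6375 ("located at street
# address"); swap in the actual property if it is a different one.
import requests

QUERY = """
SELECT (COUNT(?value) AS ?withBrackets) WHERE {
  ?item wdt:P6375 ?value .
  FILTER(CONTAINS(STR(?value), "[["))
}
"""

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "bracket-count-sketch/0.1"},
)
print(resp.json()["results"]["bindings"][0]["withBrackets"]["value"])
```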
Latest comment: 6 days ago · 5 comments · 4 people in discussion
Hi everyone! Do we have a way to identify items that have long descriptions in a given language? For reference, this item Siddharth Patel (Q136481316) has a description that doesn't follow the description guidelines. I was wondering if there's a way to find other items that fall into this category, maybe with a query? Thanks Soylacarli (talk) 16:20, 17 October 2025 (UTC)
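One possible starting point (an untested sketch): a WDQS query filtering on description length, run here via Python. Scanning all descriptions will time out, so the class restriction (P31 film is just an example) and the 250-character threshold are assumptions to adapt, not part of any guideline.

```python
# Hedged sketch: find items of a given class whose English description is
# suspiciously long. The class (Q11424, film) and the length threshold are
# placeholders; a full scan of all items would time out on WDQS.
import requests

QUERY = """
SELECT ?item ?desc (STRLEN(?desc) AS ?len) WHERE {
  ?item wdt:P31 wd:Q11424 ;
        schema:description ?desc .
  FILTER(LANG(?desc) = "en" && STRLEN(?desc) > 250)
}
LIMIT 100
"""

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "long-description-sketch/0.1"},
)
for row in resp.json()["results"]["bindings"]:
    print(row["item"]["value"], row["len"]["value"])
```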
Latest comment: 7 days ago · 11 comments · 4 people in discussion
On the page burgomaster (Q177529) the short descriptions in most languages mention "(to be used in P39)". That is useful on Wikidata, but it causes issues on Wikipedias, so should we remove it? I already removed it on English. Immanuelle (talk) 00:16, 18 October 2025 (UTC)
No. I don't think such use should be encouraged, but I also don't think actively removing it is a good idea. It's a clunky workaround for a clunky UX problem, but there aren't any other useful solutions right now. We have Wikidata usage instructions (P2559) but that isn't actually surfaced anywhere in the Wikidata UI AFAIK (see phab:T97566 and phab:T140131). M2Ys4U (talk) 05:01, 18 October 2025 (UTC)
I have a very experimental user script to surface the usage instructions at User:Ainali/usageInstructions.js. If there are any, a small icon is shown next to the property that gives a tiny popup with the instructions. (Unfortunately, it still only works on existing statements, not while adding new ones.) Ainali (talk) 12:31, 18 October 2025 (UTC)
I think usage instructions should not be in the description field. (There might be exceptions for "meta"-items, that is, items that we mainly use on Wikidata properties, but even then P2559 should be preferred.)Ainali (talk)06:56, 19 October 2025 (UTC)Reply
If you want to revive a discussion like that it would make sense for you to do the work to make it easy to engage for other people. Linking to a long discussion doesn't make it easy for other people to engage. Having a document with what you think the structure should be, would make it easier for someone else to comment whether they agree or think there are issues.ChristianKl ❪✉❫12:14, 18 October 2025 (UTC)Reply
Latest comment: 20 hours ago · 5 comments · 5 people in discussion
Hello Wikidata editors,
I am Ryan Martin, and I hold the Guinness World Record for the most basketball free throws made in one hour. I would like to have my record added to Wikidata properly. Here are the details:
Sure. Ryan Martin grew up in Maine, played college basketball at UMAINE and Keene State College, and was drafted to play professional basketball in the NBL of Canada. He now travels around the country giving shooting clinics to athletes of all levels. 169.244.86.67 14:09, 20 October 2025 (UTC)
Latest comment: 4 days ago · 5 comments · 3 people in discussion
Hello everyone,
We are a team building a web application for Supreme Court of Ghana cases, powered by the Wikidata Query Service. The goal is to make Ghana's case law more accessible to students, lawyers, and researchers using open data.
However, we’ve recently discovered a few major data issues that are affecting the reliability of our queries and app output, and we’d appreciate the community’s advice on how to fix them properly.
During our earlier batch uploads, we mistakenly used the Ghanaian date format (DD/MM/YYYY) instead of the ISO format (YYYY-MM-DD).
→ Many items now appear with incorrect years such as 14 or 19 instead of 2014 or 2019.
Several cases share identical or very similar titles (e.g., Republic v. High Court).
→ This has led to automated or manual merges of distinct cases into single items, causing entries to have multiple decision dates and citations that don't belong together.
Extract all Ghana Supreme Court case data using SPARQL (titles, dates, citations).
Correct all wrong dates (convert DD/MM/YYYY → YYYY-MM-DD) and prepare re-uploads (a conversion sketch follows after this list).
Separate merged cases by restoring or creating new unique items for each decision.
Prevent future merges by:
Including unique identifiers (citations, registry numbers, or full dates) in item labels,
Requesting semi-protection for verified cases, or
Setting up a dedicated taskforce (e.g.,WikiProject Ghana Law) to coordinate ongoing cleanup and monitoring.
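A minimal sketch of how that date-conversion step could look, assuming the corrected statements are re-uploaded with QuickStatements; the CSV layout and the property P577 used below are placeholders rather than the dataset's real schema.

```python
# Hedged sketch: convert DD/MM/YYYY strings into the QuickStatements date
# format (+YYYY-MM-DDT00:00:00Z/11 = day precision) and emit V1 commands.
# The input columns and the property (P577) are assumptions, not the real schema.
import csv
from datetime import datetime

def to_qs_date(ghana_date):
    # "14/03/2019" -> "+2019-03-14T00:00:00Z/11"
    d = datetime.strptime(ghana_date, "%d/%m/%Y")
    return d.strftime("+%Y-%m-%dT00:00:00Z/11")

with open("cases.csv") as src, open("fix_dates.qs", "w") as out:
    for row in csv.DictReader(src):  # expects columns "qid" and "decision_date"
        out.write(f'{row["qid"]}\tP577\t{to_qs_date(row["decision_date"])}\n')
```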
We’re also open to discussing whether adedicated Wikibase instance for Ghana’s legal data (linked to Wikidata) might be a sustainable long-term solution.
What’s the most efficient way tofix large numbers of incorrect date values on Wikidata?
How can weunmerge or clean up merged case items?
Are there existingtools or workflows to prevent similar-title merges in future?
Would it be acceptable todelete and re-upload the affected data, or should weedit in place?
How can we bestprotect verified items that serve as stable references for live applications?
We’d be grateful for any technical guidance, lessons from similar projects, or help from experienced contributors.Thanks,
You can undo batch edits at https://editgroups.toolforge.org/b/QSv2T/1759991263799/ (the last number is the batch ID; you can do this for each batch you created), but I don't think it deletes already created items, as normal users don't have the deletion right on Wikidata.
This is not a use case for which semi-protection is used on Wikidata.
The general workflow is to start with a small batch of items and look at the items you created whether everything is right before you make huge edits.
Unique identifiers should not be in the item label, but you can have them in the item description. The item description is the place to add information to disambiguate cases.ChristianKl ❪✉❫12:23, 19 October 2025 (UTC)Reply
Thank you very much for the feedback and clarification @ChristianKl.
We understand the normal workflow with QuickStatements and EditGroups, and we’ve used the batch undo tool before.
However, in this case, the issue goes beyond just reverting batches. It affects data reliability at the structural level: because the data was uploaded using the wrong date format (DD/MM/YYYY instead of YYYY-MM-DD), and because several similarly titled cases have since been merged or cross-linked by mistake, it has become very difficult to separate and clean the affected items manually.
For our use case, a liveweb app for the Supreme Court of Ghana cases powered by WQS, we need the data to be fully reliable and unambiguous.
Given the scale of the problem, we think the best path forward would be to:
Permanently delete all items that match the specific properties of the previous Ghana Supreme Court case uploads.
Re-upload the entire cleaned dataset in one verified batch, following the correct date format and improved disambiguation (using unique descriptions instead of identifiers in labels).
After upload, coordinate with admins to monitor future edits and prevent accidental merges.
This would restore the data integrity and ensure that Wikidata remains a dependable source for Ghana’s legal information.
Would any administrator be willing to assist with or authorize a mass deletion of the affected items so that we can re-upload the corrected dataset in a single batch?
We’re happy to provide the exact list of item IDs (QIDs) for deletion if needed.
Thank you again for your guidance; we truly want to handle this the right way and keep the Ghana legal data consistent with Wikidata’s quality standards.
How many items are we talking about? If the case law is strongly self-contained, perhaps your own Wikibase instance would be better. How many bad merges have occurred, and can they be spotted and undone manually? We certainly don't like doing mass deletes, as it leaves floating references in our data structure, and you don't know who has started using the data. Vicarage (talk) 21:44, 20 October 2025 (UTC)
Thank you very much for the response and for raising those important points, @Vicarage.
The dataset was recently updated, and we're currently looking at over 1,000 Supreme Court of Ghana cases affected by the wrong date format, and we do not yet have a definite figure for the merging issues.
Unfortunately, many of the merges are quite complex. Several items have accumulated multiple decision dates and citations, so identifying and undoing them manually would take an enormous amount of time and still risk leaving inconsistencies behind.
We completely understand the hesitation around mass deletions and the risk of floating references, but given that this particular dataset is self-contained and recently uploaded, it does not yet have widespread external usage.
That is why we believe a clean re-upload of all the cases (after full correction and verification) would be the safest and most reliable approach for both Wikidata integrity and our web app's data accuracy.
If mass deletion is not an option, we would appreciate guidance on any other method the admins would recommend to achieve the same clean reset, for example, bulk null edits or admin-assisted cleanup scripts.
This is not a normal use-case for deletion. I wouldn't object to another admin doing the deletions but deleting items to make readding data less complicated isn't something we normally do in Wikidata. It wastes IDs.
The fact that you suggest doing everything in one batch instead of creating one small batch to check whether things are alright as I suggested, also doesn't exactly raise my confidence.
"After upload, coordinate with admins to monitor future edits and prevent accidental merges." it's not the job of Wikidata admins to monitor edits. If you are interested in data, it's your role to monitor the data you are interested in.ChristianKl ❪✉❫14:09, 22 October 2025 (UTC)Reply
Help us decide the name of the new Abstract Wikipedia project
Latest comment: 6 days ago · 1 comment · 1 person in discussion
Hello. Please help pick a name for the new Abstract Wikipedia wiki project. This project will be a wiki that will enable users to combine functions from Wikifunctions and data from Wikidata in order to generate natural language sentences in any supported language. These sentences can then be used by any Wikipedia (or elsewhere).
There will be two rounds of voting, each followed by legal review of candidates, with votes beginning on 20 October and 17 November 2025. Our goal is to have a final project name selected by mid-December 2025. If you would like to participate, then please learn more and vote now at meta-wiki. Thank you!
Latest comment: 6 days ago · 3 comments · 2 people in discussion
Hi, I hope this is the correct spot for my question. I'm MartinD on NL Wikipedia, Commons and NL Wikivoyage. One of my pastimes is bringing articles on French municipalities up to date. This is done with a template that asks Wikidata for a number of details for a given municipality, in any case the most recent population number and the source for this information. At this moment, that is the population as of January 1st, 2022, and the source is a publication by INSEE, Populations de référence 2022 (Q131560738).
The templates in the (to some extent) standard article on NL led to a footnote after the text itself ("XXX persons as of January 1st, 2022") and an appendix that stated the source of this number. Please see the article on Coutevroult, updated October 15, for how it worked until a few days ago.
As far as we can see, there have been no changes to the software on our side that could explain this. Obviously, we would like, if possible, to "back up" our claim as to the population with a reliable source.
According to my esteemed colleague Mbch331 on the Dutch Wikipedia, it is. Last evening he informed me that the problem is solved. My thanks to everybody who has been involved in this matter! Kind regards,MartinD (talk)06:58, 21 October 2025 (UTC)Reply
🎉 Wikidata's 13th Birthday! Join us online, October 29 17:00 UTC (in your timezone) for presents, birthday messages, games...and a 📣surprise announcement (don't miss it!). 🎁 Add your own gifts to the list - anything that celebrates Wikidata and its amazing community... a script you’ve written, tool improvement, visual, poem...the sky's the limit. Get the call link here:13th Birthday Presents & Messages call.
The Dublin Core DCMI 2025 conference will take place in Barcelona, Oct 22-25.
Presentation:Derivative Relationships and Bibliographic Families Among Creative Works: A Systematic Study of Their Application by the Wikidata Community from the FRBR and BIBFRAME Perspective
Workshop by Jneubert on converting a complex web application into a large static site integrated with Wikidata
Tutorial from Jelabra on shaping linked data and knowledge graphs.
WikiLokal - The tool uses your device location to find Wikidata Items and Wikipedia articles within a 3 kilometre radius of your location. Created by User:Affandy Murad
Newest General datatypes: organizational chart (image that displays the structure of this organization or government agency and the relationships and relative ranks of its parts and positions/jobs)
Showcase Lexemes: led (L762994) - Norwegian Nynorsk noun (leː) meaning "a joint between bones", "a movable body part", or "a generation"
Development
Mobile statement editing: We made progress on support for more datatypes for editing.
Wikidata integration in Wikipedia and co:
We are investigating an issue with infoboxes using data from Wikidata after some recent changes to usage tracking (phab:T407684)
We are working on more improvements to the Databox module
Ontology federation: We are starting to work on making it possible for other Wikibase instances to use Wikidata's Items as values in their statements. This expands on previous work around federated properties.
Dumps: We have worked on fixing issues with the dumps that were failing recently. They should now be generated again and we continue to look into the cause.
@Kolja21: there are 14,953 items containing described by source (P1343) BEIC Digital Library (Q51955019) (https://w.wiki/FkfU). The identity of the person can be established by looking at the bibliographic record(s) present in BEIC, except in cases like the ones above where the link is faulty. Among these items, 2,638 have no identifiers (https://w.wiki/FkfS); as of now these IMHO do not respect WD:N2, since they are not instances of "a clearly identifiable conceptual or material entity"; at least, they are not presently, and a certain amount of manual work is needed to fix them. @Spinoziano (BEIC): would you like to start improving them? I also notice that Wikidata:BEIC would need an update. Epìdosis 21:36, 20 October 2025 (UTC)
Those "search links" were the way to access authority records by the corresponding identifier, the controlled author name. In most cases BEIC contributed these to CERL, so they can also be found on the CERL Thesaurus (which was also added back then).
It's true that the digital library has endured some software and data migrations, by reason of which some records may no longer be accessible, or not from the same locations. In the case of Joshua Bell, seeing he was a musician, it was probably some opera recording, which might have been removed for copyright or other reasons (it was a small collection and the partnership with the sources may have ended). Nemo 21:50, 20 October 2025 (UTC)
I fixed Pierozzi and removed Bell. I already regularly do manual work to fix these items, but it is a long-term maintenance job. I think it is not a problem when BEIC Digital Library is cited as the only source if the link provides clear biographical data and references for works, possibly with links to works available online (indicated in the BEIC source) and files uploaded on Wikimedia Commons, but, as Epìdosis notices, there are evidently many items in which this is missing and which need to be revised and corrected with more references. --Spinoziano (BEIC) (talk) 07:32, 21 October 2025 (UTC)
Latest comment: 3 days ago · 7 comments · 4 people in discussion
I've been gone for a while but I'm back to try to create more integration between Wikidata and enwiki. I really think increasing cross-wiki usage is key for Wikidata's long-term success. You can see some discussion of this at en:Template_talk:Infobox_social_media_personality#Module:YouTubeSubscribers.
I already have permission for my bot to sync YouTube subscriber counts, but one potential stumbling block is YouTube view data. Traditionally we haven't synced that data, but enwiki wants the total number of views. It's trivial for me to have my bot do that, but I'm not sure if there's consensus. Should I just "do it" following the same rules we wrote for subscribers?
I truly wonder if ephemeral data like this is best kept on Wikidata. What I would do is use Wikidata to establish the link between the article and the YouTube channel, then update the data in the article once in a while. Thanks, GerardM (talk) 06:21, 21 October 2025 (UTC)
I mean, what counts as "ephemeral data"? Is the information tracked by User:Github-wiki-bot ephemeral?
I see generally two objections to tracking information like this. One is edit volume and the other is historical retention. The edit volume is going to happen either way; indeed, it's better to have the edit volume in one place rather than multiplied across all the wikis. Historical retention is a real issue. It's historical Wikidata law that we retain all "valid" but dated data, but we could consider changing that rule (e.g. only retain one distinct value per year). Personally, I got involved in tracking social media data because I had use cases where I wanted to execute SPARQL queries with access to these statistics. BrokenSegue (talk) 11:43, 21 October 2025 (UTC)
I think changing the rules for historical data would be a good solution for constantly changing values like subscribers, views, exchange rates, members, etc. We want current data, but we must avoid bloating the system. "One value per year for values older than one year" would be my preferred rule. NGOgo (WikiProject Nonprofits!) 13:16, 21 October 2025 (UTC)
My broader concern is that wikis are not going to be interested in feeding in data from wikidata that never changes. What's the value? Labor and quality savings come from centralizing and automating ongoing work.BrokenSegue (talk)16:46, 21 October 2025 (UTC)Reply
Ultimately I prefer ONE YouTube module over a gazillion YouTube modules. If User:BorkedBot already has approval to update subscriber counts, then updating closely related view data on the same approval and consensus is IMHO no problem. Still, I am very skeptical towards Gugl and YouTube. They are in no way reliable sources; subscriber counts and view data can be manipulated easily by the owner or by third parties. LooTube's policy when it comes to privacy is absolutely unacceptable. Access to information without being tracked or pressured to give away personal details is a right, and a substantial part of LooTube's "technology" serves nothing else than infringing this right. Plus, LooTube aggressively prefers proprietary file formats over free formats, makes downloading of videos difficult, uses an inherently addictive business model, and has unreasonably high system requirements. I discourage use of Gugl and LooTube since they are a threat to both our freedom and our planet. Also support sane rules for constantly changing values such as "number of members" or "CO2 concentration". Do not edit more than one time per year, unless an extraordinary event causes a sudden unusually large change. Taylor 49 (talk) 09:18, 23 October 2025 (UTC)
Latest comment: 5 days ago · 4 comments · 3 people in discussion
When I merge an item, one of the merged items is turned into a redirect to the other, leaving statements on items that refer to the now-redirected item dangling and breaking constraints such as inverse relationship constraints. Is there a bot that fixes these? The Anome (talk) 08:02, 21 October 2025 (UTC)
Latest comment: 4 days ago · 4 comments · 2 people in discussion
If we accept that books (at least those literary works that were formally published) are notable, because they are clearly identifiable entities and there are plenty of reputable reference sites, from the Library of Congress downwards, to document them, can we argue that all the authors involved are notable enough to have their own entries? I think there is a clear structural need, because a book needs authors, but the author entry could be very bare if all that is recorded is their name, their occupation of author, and perhaps a book database author table entry. If I wanted to upload a bibliography for a subject area, I'd want to get the book titles, dates and subjects well characterised, but not go down the rabbit hole of author biographies. I know we do this for academics writing papers, but should we for the general public? Is it preferable to use author name string (P2093) instead? Vicarage (talk) 07:29, 22 October 2025 (UTC)
I see the value in separating book item creation from author identification during mass uploads. But if I know the author isn't yet in Wikidata, and I'm adding multiple books by them, I'd create an item to ensure the link is present, even if the author item is very bare. I see your point when it comes to one-time authors, though I wonder how often those truly exist?
To me Unknown suggests not known generally, rather than merely a person who does not yet have a wikidata entry, and I certainly find Unknown placemarkers much harder to deal with than absences. And if a work has 2 authors, but only one is in WD, I think the Unknown approach feels very odd.Vicarage (talk)18:26, 22 October 2025 (UTC)Reply
As you aren't an established user, you should not assume that you fully understand which company items are valid. Wikidata is not a place to advertise your company.
Latest comment: 3 days ago · 3 comments · 3 people in discussion
Is there an efficient/quick way to either create new items or add an honor or award to existing items for these 2024 and 2025 Class of Honorary Fellows of the American Psychiatric Association? Many if not all of them may meet the English Wiki WP:PROF notability requirement. Any assistance is appreciated! Coqui002 (talk) 00:31, 23 October 2025 (UTC)
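One possible semi-automated route (a hedged sketch, not a recommended workflow): look up each fellow with the wbsearchentities API and emit QuickStatements lines adding award received (P166). The Q_AWARD placeholder stands for an item for the honor, which would have to be found or created first, and every match still needs manual review.

```python
# Hedged sketch: search for each fellow by name and print QuickStatements
# commands adding award received (P166). Q_AWARD is a placeholder QID.
import requests

API = "https://www.wikidata.org/w/api.php"
Q_AWARD = "Q_AWARD"  # placeholder: QID of the APA honorary fellowship item
names = ["Example Name One", "Example Name Two"]  # paste the fellows here

for name in names:
    r = requests.get(API, params={
        "action": "wbsearchentities",
        "search": name,
        "language": "en",
        "type": "item",
        "format": "json",
    })
    hits = r.json().get("search", [])
    if hits:
        print(f"{hits[0]['id']}\tP166\t{Q_AWARD}")  # review matches before uploading
    else:
        print(f"# no match for {name} - item may need to be created")
```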
The amount of capital invested in Grokipedia is a lot higher than random AI slop projects. I think for each project proposal we should ask whether it helps Wikidata and not just add one project because we add other projects that share the same group.ChristianKl ❪✉❫22:22, 23 October 2025 (UTC)Reply
If someone opens a property proposal which is then immediately flooded with votes based on hearsay, another proposal can always be made later when people have had a chance to familiarize themselves with it, and the website has had a chance to stamp out inevitable issues. There's no rush. Infrastruktur (talk) 21:29, 23 October 2025 (UTC)
Latest comment: 4 days ago · 2 comments · 2 people in discussion
Hi all, I created a new Wikidata item for Kontraktorstudio (Q136544079), an Indonesian enterprise providing soundproofing and acoustic interior services (Silent Box portable booths, studio design).
Please check whether the item meets community standards and if any structural improvements are suggested (preferred properties, qualifiers, or additional references).
Happy to add more references or external IDs on request.
We also have backup or reserve team or crew (P3015), which does not have an inverse, so we have the odd situation that an astronaut only has part of their career visibly documented in their entry (unless you used qualifiers, and that would make it too easy to put Fred Haise on Apollo 11). So I think you are right, we can only eliminate astronaut mission (P450), as a space mission would be odd with only its backup crew listed in the entry. Vicarage (talk) 06:24, 24 October 2025 (UTC)
Q4340209 and Q42844 are almost clearly separated, yet messily entangled
Latest comment: 2 days ago · 3 comments · 2 people in discussion
- right?
Q4340209 should precisely not be a subclass of Q12136, because that's what Q42844 is for, correct?
I am currently not sure whether I'm misunderstanding something, but if I'm not overlooking anything, I'd go through Q43402090 and remove/downrank everything that's related not to the mood but to the disease.
On the statement, the value "212 kilogram" is supported by a non-reliable reference, namely "imported from Wikimedia project: Russian Wikipedia". The article on Russian Wikipedia has no mention of the mass, so what you can do is:
- Search a reference about his weight;
- Tell me the value, I can edit the article as I am extended confirmed; and
Thanks to you all. If you found a more recent source that states 66 kg, no problem, what is important is that 212 kg (obviously exaggerated) is changed.Piccionaia (talk)05:46, 25 October 2025 (UTC)Reply
Your proposal speaks about somehow defining new properties while skipping the step of creating property proposals to seek consensus for new properties. It ignores that we currently have 20 subproperties of has part(s) (P527) and somehow suggests we can analyze the use of has part(s) (P527) without looking at its current subproperties and then propose a list of subproperties. part of (P361) currently has 33 subproperties, and if you want to look at mereology, it probably also makes sense to look at those before proposing how to reorganize everything. ChristianKl ❪✉❫ 11:22, 25 October 2025 (UTC)
Has part(s), when describing physical parthood, is transitive. So human would have part human hand, since human has part human body, and human body has part human hand.
About the 20 subproperties, they are mostly single use and not like the parthood relationships that we want to start with. The reason we're not using properties but qualifiers is indeed because attaining community consensus is really difficult and takes up much more time than simply using qualifiers. I don't think it does any harm to add qualifiers like "Describes physical parthood" or "Describes conceptual parthood" etc. I would love to hear you out if you disagree with this though.Egezort (talk)12:20, 25 October 2025 (UTC)Reply
I think it's problematic to have people who don't understand how properties work inside of Wikidata but study ontology in other contexts try to dictate how things should be done within Wikidata without familiarizing themselves with the needs of Wikidata.
If you try to make a huge change without seeking community consensus you create a mess where data doesn't follow a uniform structure. Saying that you want to create a task force and get a grant but don't want to spend the time to attain community consensus because it feels too hard, is in general something that should lead to rejecting any grant.
'qualifiers like "Describes physical parthood"' this sounds to me like you don't understand how qualifiers in Wikidata work. There's no "describes" property that you could use that way and the motivation seems pretty strange. The primary reason we have the property proposal process is because we want consistent data modeling. The position that you want to start a Mereology Task Force but don't care about consistent data modeling of mereology seems strange to me. Yes, understanding the needs of Wikidata well enough to create a consensus for the change you want to make is not easy but it's what the core problem is about.
I know that "Describes physical parthood" is not a suitable qualifier. I didn't explicitly say what I would do but gave an example. For a realer example, what I would do is probably:
This wouldn't be inconsistent modelling. We would only use it for physical parthood relationships. If you explain why you think this is inconsistent, or bad modelling, I'd be happy to hear, and change my approach if necessary.
"body and hand" and "Body and left hand" are both physical parthood relationships. But the transitive nature doesn't apply here, left hand is a subclass of hand, not a part. For an example with the transitive quality, it would be "body has hands" and "hands have fingers", therefore "body has fingers".
I think the "has part(s) of the class" property is a step in the right direction, but I'm not convinced that it is sufficient.
I'm not trying to bypass community consensus completely, but the property proposal process is very hard and also it may prove impossible to go further. I've been an observer of some efforts that halted for weeks just because they couldn't get properties passed. This is not productive in any way. However, we do commit to always listening to objections and trying to find a suitable solution, no matter what the objection is. I would encourage you to follow our efforts and I can ping you whenever a discussion comes up. In cases of significant opposition, of course we will not go forward without convincing the objectors.Egezort (talk)00:27, 27 October 2025 (UTC)Reply
New items being created now don't have many gaps - Q136644610 and Q136644703 were not created but all IDs in between exist - but when this was reported there were many gaps. I couldn't find a gap of 94, but Q136628825 is followed by Q136628908; there could have been a deleted item or two somewhere in that gap, but most did not exist. It could have been a batch with invalid data, sitelinks already in use or to nonexistent pages, or labels and descriptions matching existing items. Peter James (talk) 10:36, 25 October 2025 (UTC)
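For anyone wanting to check such a gap themselves, here is a hedged sketch that asks the wbgetentities API which QIDs in a numeric range exist (missing entities come back with a "missing" key); the range below is just the one mentioned above, used as an example.

```python
# Hedged sketch: report which QIDs in a numeric range exist on Wikidata.
import requests

API = "https://www.wikidata.org/w/api.php"

def existing_qids(start, end):
    qids = [f"Q{n}" for n in range(start, end + 1)]
    found = []
    for i in range(0, len(qids), 50):  # API limit: 50 ids per request
        chunk = qids[i:i + 50]
        r = requests.get(API, params={
            "action": "wbgetentities",
            "ids": "|".join(chunk),
            "props": "",  # existence check only
            "format": "json",
        })
        for qid, entity in r.json().get("entities", {}).items():
            if "missing" not in entity:
                found.append(qid)
    return found

print(existing_qids(136628825, 136628908))
```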
Probably, but when I added the topic there were indeed massive gaps (some were around 130). These gaps have diminished but they're still here - you're likely right - probably invalid or deleted data. QwertyZ34 (talk) 12:35, 25 October 2025 (UTC)
While page_ids are assigned sequentially (i.e., no deletions), many entity IDs are skipped.
In general, gaps occur when creating a new item fails due to anti-spam filters or similar, or they indicate some rogue bot or tool. However, neither Special:AbuseLog nor the spam blacklist gives an idea. Not sure where we could get more hints. --Matěj Suchánek (talk) 18:26, 25 October 2025 (UTC)
In most cases it would be better if the items were created with the data corrected or blacklisted links removed, or (for items repeatedly created and deleted, or non-notable page types) if the requests were not submitted. I think a check that the data is valid could be performed first (at least with "Create a new item" and QuickStatements and similar tools) - QuickStatements would be better if it were possible to return to the screen where data can be edited before retrying. The same could probably be done with blacklists and filters - I thought requests would go through the filters before being submitted to the database. Items with identical label and description to another could probably be created and logged somewhere, as it's more likely that the descriptions should be changed - the decision not to allow them has resulted in incomplete imports and, in QuickStatements, is one cause of the error "no success flag set in API result" (Q136647709, Q136647711, Q136647714 and Q136647720 from https://quickstatements.toolforge.org/#/batch/251405). The other cause of failure to create an item I have encountered is sitelink conflicts, for which one solution would be to create an item and then add sitelinks. I think "item created but sitelink to ... could not be added as it is used in Q..., consider merging" or even "item created but sitelink could not be added as the page was not found", logging it in a list of potentially duplicate or non-notable items, would be better than "the save has failed", which is all QuickStatements told me when I attempted to create a new item with a link I thought I had removed from another item but was still there. Peter James (talk) 04:43, 26 October 2025 (UTC)
I would say P291 is currently a bit unclear and would benefit from clarification, to be more specific about its meaning. That would then allow for making the call. ChristianKl ❪✉❫ 11:28, 25 October 2025 (UTC)
Moving one property to another preserving qualifiers and references
Latest comment: 1 day ago · 4 comments · 2 people in discussion
I want to move one property to another for about 2000 items. I know I can use Move Claim in the GUI, or write special SPARQL and write my own QuickStatements, but surely someone has written a batch tool I can run from bash with a list of Q values and a P one.Vicarage (talk)14:53, 25 October 2025 (UTC)Reply
A proper deletion discussion for P450 would be required here. It would then be pretty simple to move existing claims to P5096 including all qualifiers and references. One issue might be already existing P5096 claims, but that is a future problem at this point. —MisterSynergy (talk)16:47, 25 October 2025 (UTC)Reply
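For what it's worth, here is an untested pywikibot sketch of such a move (P450 → P5096, carrying qualifiers and references); it assumes a plain list of QIDs in a file, ordinary value snaks (no somevalue/novalue), and that consensus for the migration exists before anything like it is run.

```python
# Untested sketch: copy each P450 claim to P5096 with qualifiers and
# references, then remove the original claim. Try it on a test item first.
import pywikibot

site = pywikibot.Site("wikidata", "wikidata")
repo = site.data_repository()
SOURCE_PID, TARGET_PID = "P450", "P5096"

with open("qids.txt") as f:
    qids = [line.strip() for line in f if line.strip()]

for qid in qids:
    item = pywikibot.ItemPage(repo, qid)
    item.get()
    for claim in item.claims.get(SOURCE_PID, []):
        new_claim = pywikibot.Claim(repo, TARGET_PID)
        new_claim.setTarget(claim.getTarget())
        item.addClaim(new_claim, summary=f"Move {SOURCE_PID} to {TARGET_PID}")
        for qual_pid, quals in claim.qualifiers.items():  # copy qualifiers
            for qual in quals:
                q = pywikibot.Claim(repo, qual_pid, is_qualifier=True)
                q.setTarget(qual.getTarget())
                new_claim.addQualifier(q)
        for source in claim.sources:  # copy each reference block
            refs = []
            for ref_pid, ref_claims in source.items():
                for ref in ref_claims:
                    r = pywikibot.Claim(repo, ref_pid, is_reference=True)
                    r.setTarget(ref.getTarget())
                    refs.append(r)
            if refs:
                new_claim.addSources(refs)
        item.removeClaims([claim], summary=f"Moved to {TARGET_PID}")
```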
I think that this discussion is resolved and can be archived. If you disagree, don't hesitate to replace this template with your comment.RVA2869 (talk)13:22, 26 October 2025 (UTC)Reply
Currently this element is sort of a mess. It's stated that it's only an instance of profession (Q28640) and occupation (Q12737077) – which are correct for the "football coach" profession – but most of the sitelinks mostly describe the "football manager" team position (position (Q4164871)), and it's even named differently in different languages: for some it's coach, for some manager. I believe this element should be split into two, a "football coach" profession and a "football manager" position (being a subclass of head coach (Q3246315)), so that the first one could be used for the occupation (P106) property and the second one for position held (P39). What do you think? And what should we do with the sitelinks then? Well very well (talk) 14:19, 26 October 2025 (UTC)
It's kind of a mess in the real world as well. Some teams have a manager called a coach and some have a coach called a manager! My own team's manager has a job title now of Head Coach (I think every one of his immediate predecessors was First Team Manager) but everyone calls him manager and regards him as a continuation of that position. So even in that example is he a coach or a manager? Your suggestion does make sense but it's going to be hard to reconcile everything as the two labels are so fluid.GrimRob (talk)22:19, 26 October 2025 (UTC)Reply
I believe that still in this example his profession is coach and his position is manager/head coach. This actually also benefits some other elements like senior coach (Q136649133) and assistant coach (Q11703711), which are clearly position-only (e.g. "assistant coach" doesn't make sense as a profession — the profession is also coach — but makes much sense as a position). Well very well (talk) 03:29, 27 October 2025 (UTC)
Latest comment: 15 hours ago · 1 comment · 1 person in discussion
Hello!
An RFC has been published about the systemic integration of the Copilot AI as a built-in agent to support author coordinate systems such as 10-Ц / 10-V. The proposal concerns the automatic replacement of outdated artifacts, semantic continuity, and a transition to global ethics through machine-readable values.