Over the weekend of 25 November, citation templates received some updates. One change in particular goes a long way toward flagging freely available resources. Here's a short history of what was needed for the most recent changes to fully pay off.
In October 2016, so-called "access locks" were deployed in CS1 and CS2 templates (see Signpost coverage). After a few RfCs on visual appearance, things settled on the current scheme:
Access locks for always-free resources, like papers hosted on arXiv or papers with PMCIDs, were automatically rolled out. But the main identifier for scientific articles is the sometimes-free DOI, which requires the presence of |doi-access=free to signal that a particular DOI link is free to read.
For those unfamiliar with DOIs, they are roughly the equivalent of what ISBNs are for books, and usually point to individual academic papers published in peer-reviewed journals. Their structure is 10.xxxxx/foobar, with the 10.xxxxx part being the DOI prefix, identifying who has registered the DOI in question. DOI registrants can be access platforms like JSTOR (10.2307), individual journals like Notre Dame Journal of Formal Logic (10.1305), or publishers like the IEEE (10.1109).
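The prefix/suffix structure described above can be illustrated with a minimal Python sketch. The function name `split_doi` is hypothetical, and the suffix `foobar` is the same placeholder used in the text, not a real DOI.

```python
def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI into (prefix, suffix) at the first slash.

    The prefix (10.xxxxx) identifies the registrant; the suffix
    identifies the individual item.
    """
    prefix, sep, suffix = doi.strip().partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


# 10.2307 is the JSTOR prefix quoted in the text; "foobar" is a placeholder.
print(split_doi("10.2307/foobar"))
```

Splitting at the first slash only is deliberate, since DOI suffixes may themselves contain slashes.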
While the initial roll-out of DOI access locks was done manually and semi-automatically with WP:AWB, OA Bot greatly assisted in flagging free-to-read resources on select articles. However, OA Bot tends to be user-activated on specific articles, rather than systematically crawling every article on Wikipedia.
One way to find swathes of free DOIs is to identify DOI prefixes belonging to known open-access publishers. For example, 10.3389 belongs to the (in)famous Frontiers Media, while 10.3390 belongs to the equally controversial MDPI. It's then a simple matter to have Citation bot flag them. It worked pretty well for the big publishers, so an effort was made to identify more open-access DOI prefixes, and the bot was updated accordingly.
Targeted Citation bot runs were done from database dumps, rather efficiently to begin with. But while database scans are good at finding articles containing specific DOI prefixes, they are bad at finding articles containing unflagged DOIs with these prefixes. This means that if, hypothetically, 92% of all articles with MDPI DOIs were already flagged, 92% of the processing spent on articles with MDPI prefixes would be wasted. As of writing, that's 12,151 articles, meaning well over 11,000 articles would be processed for nothing to catch the other ~1,000. And the next time, if you have 98% flagged ... you'll have an even more inefficient run.
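The arithmetic behind that estimate can be checked with a short sketch. The 92% and 98% shares are the hypothetical figures from the paragraph above, and `wasted_work` is an illustrative helper name, not part of any bot.

```python
def wasted_work(total_articles: int, flagged_share: float) -> tuple[int, int]:
    """Return (already_flagged, still_unflagged) for a dump-based run."""
    already_flagged = round(total_articles * flagged_share)
    return already_flagged, total_articles - already_flagged


# 12,151 is the article count quoted in the text; the shares are hypothetical.
for share in (0.92, 0.98):
    flagged, unflagged = wasted_work(12_151, share)
    print(f"{share:.0%} flagged: {flagged} processed for nothing, {unflagged} useful")
```

The more complete the flagging, the smaller the useful fraction of each run, which is why a prefix-only scan becomes less efficient over time.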
Luckily, with the recent update to the CS1 and CS2 citation templates, we have a solution: Category:CS1 maint: unflagged free DOI. This is a category that specifically tracks whether a citation has a) a known free DOI prefix and b) a DOI that has not been flagged as free. As of writing, a bit over 16,000 Wikipedia articles have been identified and processed. Here's an example edit: flagging 2 DOIs with prefix 10.3847, belonging to the American Astronomical Society. Here's another: flagging 4 DOIs with prefixes 10.1186, associated with BioMed Central journals, and 10.1073, associated with Proceedings of the National Academy of Sciences of the United States of America.
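The two conditions the category tracks can be sketched roughly as follows. This is not the templates' actual implementation; it is a simplified Python illustration that assumes a citation is available as a wikitext string. The prefix set contains only prefixes named in this article, and the regular expressions and the function name `is_unflagged_free_doi` are assumptions made for the example.

```python
import re

# Illustrative subset: prefixes named in this article. The real list is
# maintained with the citation templates and is much longer.
KNOWN_FREE_PREFIXES = {"10.3389", "10.3390", "10.3847", "10.1186", "10.1073"}

# Simplified parameter matching; real template parsing is more involved.
DOI_PARAM = re.compile(r"\|\s*doi\s*=\s*([^|}\s]+)")
ACCESS_PARAM = re.compile(r"\|\s*doi-access\s*=\s*free\b")


def is_unflagged_free_doi(citation: str) -> bool:
    """True if the citation has a known free DOI prefix but no doi-access=free."""
    match = DOI_PARAM.search(citation)
    if match is None:
        return False
    prefix = match.group(1).partition("/")[0]
    return prefix in KNOWN_FREE_PREFIXES and not ACCESS_PARAM.search(citation)


# "example" is a placeholder suffix, not a real DOI.
print(is_unflagged_free_doi("{{cite journal |title=Example |doi=10.3847/example}}"))
print(is_unflagged_free_doi("{{cite journal |doi=10.3847/example |doi-access=free}}"))
```

Only citations for which such a check is true need a bot visit, which avoids reprocessing articles that are already fully flagged.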
The hope is to have the category mostly cleared by the end of December, when it will contain only new additions. Those should be easily handled by daily bot runs.
About 2 to 3% of the 16,000 or so articles seem to have a free DOI that is unflagged in Wikidata, which are (mostly) the ones remaining in the category. Sadly, {{cite q}} makes it impossible to deal with it here, as well as the many other issues Citation bot is able to correct. Hopefully Wikidata people can look at the updates to the CS1 and CS2 templates, work out what is going on on their side, and update things accordingly.
It should be a relatively straightforward task for someone who understands how Wikidata works. That someone isn't me. But it could be you!
"a free DOI that is unflagged in Wikidata... Sadly, {{cite q}} makes it impossible to deal with it here"
No, it does not; for example:
{{Cite Q|Q55893751}} could be changed by the bot to:
{{Cite Q|Q55893751 |doi-access=free}} and would render as:
However, rather than adding metadata to multiple instances of the same citation, it's far more sensible to hold the data on Wikidata, and to render it as part of each citation from there - which is {{Cite Q}}'s purpose.
Those of us working on Cite Q, and on citation metadata on Wikidata, would have appreciated being informed of this initiative when it was being developed, in order that the functionality could be rolled out, and metadata updated (by a bot acting on DOI prefixes in exactly the same manner as described above), in parallel. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:39, 4 December 2023 (UTC)
--[[--------------------------< B U I L D _ K N O W N _ F R E E _ D O I _ R E G I S T R A N T S _ T A B L E >--
|doi-access=, {{cite Q}} doesn't mention what its equivalent Wikidata property is. Headbomb {t · c · p · b} 21:31, 4 December 2023 (UTC)