Wikiget is a Unix command-line tool to retrieve lists of article titles from Wikipedia, search Wikipedia, edit Wikipedia and more.
Features:
Bot makers often need a list of target article titles: for example, all articles in a category, articles that use a template (backlinks), or articles edited by a user (user contributions). Wikiget provides a simple front-end to common API requests.
Search Wikipedia from the command line, with options for regex searches and snippet output.
Editing Wikipedia couldn't be easier with the -E option. See EDITSETUP for authentication.
Wikiget options and examples:
Wikiget - command-line access to some Wikimedia API functions

Usage:

 Backlinks:
       -b <name>        Backlinks for article, template, userpage, etc..
         -t <types>     (option) 1-3 letter string of types of backlinks:
                          n(ormal)t(ranscluded)f(ile). Default: "ntf".
                          See -h for more info
         -n <namespace> (option) Pipe-separated numeric value(s) of namespace(s)
                          Only list pages in this namespace. Default: 0
                          See -h for NS codes and examples

 Forward-links:
       -F <name>        Forward-links for article, template, userpage, etc..

 Redirects:
       -B <name>        Redirects for article, template, userpage, etc..
         -n <namespace> (option) Pipe-separated numeric value(s) of namespace(s)
                          Only list redirects in this namespace. Default: 0
                          See -h for NS codes and examples

 User contributions:
       -u <username>    Username without User: prefix
         -s <starttime> Start time in YMD format (-s 20150101). Required with -u
         -e <endtime>   End time in YMD format (-e 20151231). If same as -s,
                          does 24hr range. Required with -u
         -i <regex>     (option) Edit comment must include regex match
         -j <regex>     (option) Edit comment must exclude regex match
         -n <namespace> (option) Pipe-separated numeric value(s) of namespace
                          Only list pages in this namespace. Default: 0
                          See -h for NS codes and examples

 Recent changes:
       -r               Recent changes (past 30 days) aka Special:RecentChanges
                          Either -o or -t required
         -o <username>  Only list changes made by this user
         -k <tag>       Only list changes tagged with this tag
         -i <regex>     (option) Edit comment must include regex match
         -j <regex>     (option) Edit comment must exclude regex match
         -n <namespace> (option) Pipe-separated numeric value(s) of namespace
                          Only list pages in this namespace. Default: 0
                          See -h for NS codes and examples

 Category list:
       -c <category>    List articles in a category
         -q <types>     (option) 1-3 letter string of types of links:
                          p(age)s(ubcat)f(ile). Default: "p"

 Search-result list:
       -a <search>      List of articles containing a search string
                          See docs https://www.mediawiki.org/wiki/Help:CirrusSearch
         -d             (option) Include search-result snippet in output (def: title)
         -g <target>    (option) Search in "title" or "text" (def: "text")
         -n <namespace> (option) Pipe-separated numeric value(s) of namespace
                          Only list pages in this namespace. Default: 0
                          See -h for NS codes and examples
         -i <maxsize>   (option) Max number of results to return. Default: 10000
                          10k max limit imposed by search engine
         -j             (option) Show number of search results

 External links list:
       -x <domain name> List articles containing domain name (Special:Linksearch)
                          Works with domain-name only. To search for a full URI use
                          regex. eg. -a "insource:/http:\/\/gq.com\/home.htm/"
                          To include subdomains use wildcards: "-x *.domain.com"
         -n <namespace> (option) Pipe-separated numeric value(s) of namespace
                          Only list pages in this namespace. Default: 0
                          See -h for NS codes and examples

 Print wiki text:
       -w <article>     Print wiki text of article
         -p             (option) Plain-text version (strip wiki markup)
         -f             (option) Don't follow redirects (print redirect page)

 All pages:
       -A               Print a list of page titles on the wiki (possibly very large)
         -t <# type>    1=All, 2=Skip redirects, 3=Only redirects. Default: 2
         -k <#>         Number of pages to return. 0 is all. Default: 10
         -n <namespace> (option) Pipe-separated numeric value(s) of namespace
                          Only list pages in this namespace. Default: 0
                          See -h for NS codes and examples

 Edit page:
       -E <title>       Edit a page with this title. Requires -S and -P
         -S <summary>   Edit summary
         -P <filename>  Page content filename. If "STDIN" read from stdin
                          See EDITSETUP for authentication configuration
       -R <page>        Move from page name. Requires -T
         -T <page>      Move to page name
       -G <page>        Purge page
       -I               Show OAuth userinfo

 Global options:
       -l <language>    Wiki language code (default: en)
                          See https://en.wikipedia.org/wiki/List_of_Wikipedias
       -z <project>     Wiki project (default: wikipedia)
                          https://en.wikipedia.org/wiki/Wikipedia:Wikimedia_sister_projects
       -m <#>           API maxlag value (default: 5)
                          See https://www.mediawiki.org/wiki/API:Etiquette#Use_maxlag_parameter
       -y               Print debugging to stderr (show URLs sent to API)
       -V               Version and copyright
       -h               Help with examples

Examples:

 Backlinks:
   for a User: showing all link types ("ntf")
     wikiget -b "User:Jimbo Wales"
   for a User: showing normal and transcluded links
     wikiget -b "User:Jimbo Wales" -t nt
   for a Template: showing transcluded links
     wikiget -b "Template:Gutenberg author" -t t
   for a File: showing file links
     wikiget -b "File:Justforyoucritter.jpg" -t f
   for article "Paris (Idaho)" on the French Wiki
     wikiget -b "Paris (Idaho)" -l fr

 User contributions:
   show all edits from 9/10-9/12 on 2001
     wikiget -u "Jimbo Wales" -s 20010910 -e 20010912
   show all edits during the 24hrs of 9/11
     wikiget -u "Jimbo Wales" -s 20010911 -e 20010911
   show all edits when the edit-comment starts with 'A'
     wikiget -u "Jimbo Wales" -s 20010911 -e 20010911 -i "^A"
   articles only
     wikiget -u "Jimbo Wales" -s 20010911 -e 20010930 -n 0
   talk pages only
     wikiget -u "Jimbo Wales" -s 20010911 -e 20010930 -n 1
   talk and articles only
     wikiget -u "Jimbo Wales" -s 20010911 -e 20010930 -n "0|1"

   -n codes: https://www.mediawiki.org/wiki/Extension_default_namespaces

 Recent changes:
   show edits for prior 30 days by IABot made under someone else's name
   (ie. OAuth) with an edit summary including this target word
     wikiget -k "OAuth CID: 1804" -r -i "Bluelinking"

   CID list: https://en.wikipedia.org/wiki/Special:Tags

 Category list:
   pages in a category
     wikiget -c "Category:1900 births"
   subcats in a category
     wikiget -c "Category:Dead people" -q s
   subcats and pages in a category
     wikiget -c "Category:Dead people" -q sp

 Search-result list:
   article titles containing a search
     wikiget -a "Jethro Tull" -g title
   first 50 articles containing a search
     wikiget -a John -i 50
   include snippet of text containing the search string
     wikiget -a John -i 50 -d
   search talk and articles only
     wikiget -a "Barleycorn" -n "0|1"
   regex search, include debug output
     wikiget -a "insource:/ia[^.]*[.]us[.]/" -y
   subpages of User:GreenC
     wikiget -a "user: subpageof:GreenC"

   search docs: https://www.mediawiki.org/wiki/Help:CirrusSearch
   -n codes: https://www.mediawiki.org/wiki/Extension_default_namespaces

 External link list:
   list articles containing a URL with this domain
     wikiget -x "news.yahoo.com"
   list articles in NS 1 containing a URL with this domain
     wikiget -x "*.yahoo.com" -n 1

 All pages:
   all page titles excluding redirects w/debug tracking progress
     wikiget -A -t 2 -y > list.txt
   first 50 page titles including redirects
     wikiget -A -t 1 -k 50 > list.txt

 Print wiki text:
   wiki text of article "Paris" on the English Wiki
     wikiget -w "Paris"
   plain text of article "China" on the French Wiki
     wikiget -w "China" -p -l fr
   wiki text of article on Wikinews
     wikiget -w "Healthy cloned monkeys born in Shanghai" -z wikinews

 Edit page:
   Edit "Paris" by uploading new content from the local file paris.ws
     wikiget -E "Paris" -S "Fix spelling" -P "/home/paris.ws"
   Input via stdin
     cat /home/paris.ws | wikiget -E "Paris" -S "Fix spelling" -P STDIN
Download wikiget.awk
Set executable: chmod 750 wikiget.awk
Optionally create a symlink: ln -s wikiget.awk wikiget
Change hashbang (first line) to location of GNU Awk 4+ - use 'which gawk' to see where it is on your system.
Change the "Contact" line to your Wikipedia Username (optional or leave blank)
Requires one of the following to be in the path: wget, curl or lynx (use 'which wget' to see where it is on your system)
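The requirement above can be checked before the first run. The following is a minimal sketch, not part of Wikiget; the function name find_fetcher is hypothetical, and it only tests whether one of the three programs is in the path.

```shell
#!/bin/sh
# Hypothetical helper (not part of Wikiget): print the name of the first
# of wget, curl or lynx found in the PATH; return 1 if none is found.
find_fetcher() {
    for tool in wget curl lynx; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            return 0
        fi
    done
    return 1
}
```

For example, `find_fetcher || echo "install wget, curl or lynx" >&2` prints a warning when none of the three is available.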
The advantage of working in Unix is access to other tools. Some examples follow.
A search-replace bot:
wikiget -w "Wikipedia" | sed 's/Wikipedia/Wikipodium/g' | wikiget -E "Wikipedia" -S "Change to Wikipodium" -P STDIN
Explanation: download the wikitext (-w) of the article "Wikipedia", replace all occurrences of 'Wikipedia' with 'Wikipodium' using sed, and upload the result (-E) with an edit summary (-S), taking the page content from STDIN (-P STDIN). This can be placed in a for-loop that operates on a list of articles.
This Unix pipe method is for light, quick work. A production bot would instead have a script invoke wikiget with -P and check its output for an error, i.e. a result other than "Success" or "No change", and then retry. In about 5% of uploads the WMF servers fail and a retry is needed; up to 3 retries are usually enough. Retries are not built into Wikiget because how to handle error results depends on the calling application.
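One way to run the same search-replace over a list of articles is sketched below. The function name replace_in_articles and the one-title-per-line list file are assumptions for illustration, not part of Wikiget.

```shell
#!/bin/sh
# Hypothetical helper (not part of Wikiget): apply one sed expression to
# every article named in a list file (one title per line).
# Arguments: list file, sed expression, edit summary.
replace_in_articles() {
    while IFS= read -r title; do
        [ -n "$title" ] || continue    # skip blank lines
        # </dev/null keeps the download step from reading the title list
        wikiget -w "$title" </dev/null \
            | sed "$2" \
            | wikiget -E "$title" -S "$3" -P STDIN
    done < "$1"
}
```

Usage might look like `replace_in_articles titles.txt 's/Wikipedia/Wikipodium/g' "Change to Wikipodium"`, where titles.txt is a list produced by one of the list options such as -c or -b.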
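The retry logic described above could be sketched as follows. This is an illustration under assumptions, not Wikiget's own behaviour: the function name upload_with_retry and the RETRY_DELAY variable are hypothetical, and the sketch assumes the result text ("Success" or "No change") is printed to stdout.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of Wikiget): upload a file with -E and
# retry when the result is neither "Success" nor "No change".
# Arguments: title, edit summary, content file, retries (default 3).
# RETRY_DELAY (hypothetical): seconds to wait before a retry, default 5.
upload_with_retry() {
    retries=${4:-3}
    attempt=0
    while [ "$attempt" -le "$retries" ]; do
        # wait before every attempt except the first
        [ "$attempt" -eq 0 ] || sleep "${RETRY_DELAY:-5}"
        result=$(wikiget -E "$1" -S "$2" -P "$3")
        case "$result" in
            *Success*|*"No change"*)
                echo "$result"
                return 0
                ;;
        esac
        echo "upload of \"$1\" failed: $result" >&2
        attempt=$((attempt + 1))
    done
    return 1
}
```

A calling script can then decide what to do when the function returns non-zero, for example log the title for a later run.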
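To find the intersection of two categories (articles that exist in both), download each category list to a file with the -c option, then use grep to print the titles common to both files.
To find the names unique to the second list (list2), invert the grep match so that only titles absent from the first list are printed.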
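A minimal sketch of both operations follows. The function names in_both and only_in_second are hypothetical, the file names list1 and list2 are illustrative, and "Category:1950 deaths" is only an assumed second category. The grep options used are -x (match whole lines), -F (fixed strings, not regex), -f (read patterns from a file) and -v (invert the match).

```shell
#!/bin/sh
# Hypothetical helpers (not part of Wikiget). Each takes two files of
# titles, one title per line.
#
# The lists would first be downloaded, for example:
#   wikiget -c "Category:1900 births" > list1
#   wikiget -c "Category:1950 deaths" > list2

# Print the titles in the second file that also appear in the first.
in_both() {
    grep -xF -f "$1" "$2"
}

# Print the titles in the second file that do not appear in the first.
only_in_second() {
    grep -vxF -f "$1" "$2"
}
```

Matching whole lines as fixed strings keeps a title such as "Paris" from matching "Paris (Idaho)" and keeps punctuation in titles from being read as regex syntax.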
by User:GreenC (en.wikipedia.org)
MIT License
Wikiget is part of the BotWikiAwk framework of tools and libraries for building and running bots on Wikipedia
https://github.com/greencardamom/BotWikiAwk