A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.
An open-access preprint[1] has announced the results from a study attempting to predict early box-office revenues from Wikipedia traffic and activity data. The authors – a team of computational social scientists from Budapest University of Technology and Economics, Aalto University and the Central European University – submit that behavioral patterns on Wikipedia can be used for accurate forecasting, matching and in some cases outperforming the use of social media data for predictive modeling. The results, based on a corpus of 312 English Wikipedia articles on movies released in 2010, indicate that the joint editing activity and traffic measures on Wikipedia are strong predictors of box-office revenue for highly successful movies.
The authors contrast their early prediction approach with the more popular real-time prediction/monitoring methods, and suggest that movie popularity can be accurately predicted well in advance, up to a month before release. The study received broad press coverage and was featured in The Guardian, the MIT Technology Review and the Hollywood Reporter, among others. The authors observe that their approach, being "free of any language based analysis, e.g., sentiment analysis, could be easily generalized to non-English speaking movie markets or even other kinds of products". The dataset used for this study, including the financial and Wikipedia activity data, is available among the supplementary materials of the paper.
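To make "predictive modeling" from Wikipedia activity concrete, here is a minimal sketch of fitting a multivariate linear regression of revenue on pre-release activity features by ordinary least squares. It is an illustration under stated assumptions, not a reproduction of the preprint's model. The two features (page views and edit count), the helper names `fit_ols` and `predict`, and the synthetic data are all hypothetical.

```python
# Hypothetical sketch: ordinary least squares on Wikipedia activity features.
# The feature choice and the synthetic data below are illustrative assumptions,
# not the preprint's model or dataset.
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square linear system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    # Augmented matrix [a | b] so each row operation updates both sides.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest entry in this column for stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if m[col][col] == 0.0:
            raise ValueError("singular system: features are collinear")
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - acc) / m[r][r]
    return x


def fit_ols(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Return [intercept, coef_1, ..., coef_k] minimizing the squared error."""
    # Prepend a constant column so the model has an intercept term.
    design = [[1.0, *map(float, r)] for r in rows]
    k = len(design[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(beta: Sequence[float], features: Sequence[float]) -> float:
    """Apply a fitted linear model to one movie's feature vector."""
    return beta[0] + sum(b * f for b, f in zip(beta[1:], features))


if __name__ == "__main__":
    # Synthetic rows: (page views, edit count) measured before release.
    # Revenue is generated from a known rule so the fit can be checked.
    rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8)]
    revenue = [2 + 3 * views + 0.5 * edits for views, edits in rows]
    beta = fit_ols(rows, revenue)
    print([round(b, 6) for b in beta])
```

Because the synthetic revenue is generated exactly from a linear rule, the fitted coefficients should recover that rule. With real data, the point of interest would instead be how well features measured a month before release predict revenue on held-out movies.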
A study[2] by researchers at Kyoto University presents a detailed assessment of the readability of the English Wikipedia against Encyclopedia Britannica and the Simple English Wikipedia using a series of readability metrics, and finds that Wikipedia "seems to lag behind the other encyclopedias in terms of readability and comprehensibility of its content". The paper, presented at CIKM’12, uses a variety of metrics spanning syntactical readability indices (such as Flesch reading ease, the automated readability index and the Coleman–Liau index) as well as metrics based on word popularity (including the Dale–Chall readability formula and word frequency indices derived from Google News or the American National Corpus).
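The syntactical indices named above combine simple counts of words, sentences, letters and syllables with fixed coefficients. The sketch below implements the standard published formulas for Flesch reading ease, the automated readability index and the Coleman–Liau index. The tokenizer and the vowel-group syllable counter are rough heuristics assumed for illustration, not the pipeline used in the Kyoto study. Word-popularity measures such as Dale–Chall need an external word list and are left out.

```python
# Hypothetical sketch of three syntactic readability indices using their
# standard published coefficients. The tokenizer and syllable counter are
# crude heuristics (assumptions), not the study's text-processing pipeline.
import re

VOWEL_GROUP = re.compile(r"[aeiouy]+")


def count_syllables(word: str) -> int:
    """Estimate syllables as vowel groups, dropping a silent final 'e'."""
    w = word.lower()
    n = len(VOWEL_GROUP.findall(w))
    if w.endswith("e") and not w.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def text_stats(text: str) -> tuple:
    """Return (words, sentences, letters, syllables) for a plain-text passage."""
    words = re.findall(r"[A-Za-z]+", text)
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    letters = sum(len(w) for w in words)
    syllables = sum(count_syllables(w) for w in words)
    return len(words), sentences, letters, syllables


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def automated_readability_index(words: int, sentences: int, characters: int) -> float:
    """Approximate US school grade level from characters per word and words per sentence."""
    return 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43


def coleman_liau_index(words: int, sentences: int, letters: int) -> float:
    """Approximate US grade level from letters and sentences per 100 words."""
    letters_per_100 = letters / words * 100
    sentences_per_100 = sentences / words * 100
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8


if __name__ == "__main__":
    # Letters stand in for "characters" in the ARI formula here.
    w, s, l, syl = text_stats("The cat sat on the mat. It was happy.")
    print(flesch_reading_ease(w, s, syl))
    print(automated_readability_index(w, s, l))
    print(coleman_liau_index(w, s, l))
```

Flesch reading ease runs in the opposite direction from the other two indices. Higher Flesch scores mean easier text, while ARI and Coleman–Liau approximate a US school grade level, so higher values mean harder text.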
The authors prepared a corpus of matching articles for the purpose of comparison between the English and Simple English Wikipedia. The study did not perform a random selection of articles, but selected a sample based on the existence of a corresponding article in Simple Wikipedia. The findings of the first analysis indicate that Simple Wikipedia consistently outperforms the English Wikipedia on all readability metrics. Wikipedia also appears to contain on average more proper nouns than Britannica – which, the authors speculate, may be due to specific editorial policies. The second section of the paper measures readability for 500 articles for each one of eight topic categories selected from DBpedia (biology, chemistry, computing, economics, history, literature, mathematics, and philosophy).
The comparison indicates that articles in the computing category are the most readable by syntactical and familiarity measures. Biology and chemistry, on the other hand, seem to include the most difficult articles. The final section reviews the readability of Britannica articles, in particular comparing the readability of articles in the "introductory" class with that of Simple Wikipedia articles, and the readability of "encyclopedia" class articles with that of Wikipedia articles. The findings indicate that Britannica outperforms Wikipedia in readability overall, while Britannica's introductory articles outperform Simple Wikipedia articles. It should be noted that the comparisons were not performed on matched pairs and that the criteria used to sample articles from Britannica were not specified.
A paper whose preprint was previously covered in this research report, and which has now been published as a full research article in PLOS One,[3] found that the Simple English Wikipedia has a higher degree of complexity than the corpus of Charles Dickens' books when measured via the Gunning fog index, but is less complex than the British National Corpus, "which is a reasonable approximation to what we would want to think of as ‘English in general’". See also the September issue of this research report for a summary of a third readability study, which applied the standard Flesch Reading Ease test to the English and Simple English Wikipedias.
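The Gunning fog index used in the PLOS One paper weighs average sentence length against the share of "complex" words. Below is a minimal sketch that approximates complex words as words with three or more estimated syllables. Gunning's original definition also excludes proper nouns, familiar jargon and compound words, which this heuristic ignores, so its scores will run somewhat high on text full of names. The function names are hypothetical.

```python
# Hypothetical sketch of the Gunning fog index. "Complex" words are
# approximated as words with three or more estimated syllables; the original
# definition also excludes proper nouns, familiar jargon and compounds,
# which this heuristic does not.
import re

VOWEL_GROUP = re.compile(r"[aeiouy]+")


def count_syllables(word: str) -> int:
    """Estimate syllables as vowel groups, dropping a silent final 'e'."""
    w = word.lower()
    n = len(VOWEL_GROUP.findall(w))
    if w.endswith("e") and not w.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def gunning_fog_from_counts(words: int, sentences: int, complex_words: int) -> float:
    """0.4 * (average sentence length + percentage of complex words)."""
    return 0.4 * (words / sentences + 100 * complex_words / words)


def gunning_fog(text: str) -> float:
    """Estimate the fog index of a plain-text passage."""
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        raise ValueError("text contains no words")
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)
    return gunning_fog_from_counts(len(words), sentences, complex_words)


if __name__ == "__main__":
    print(gunning_fog("Readability metrics summarize sentence length and vocabulary."))
```

The result is conventionally read as the years of formal education a reader would need to follow the text on a first reading. This is why a corpus can score as "more complex" than Dickens simply by using shorter sentences with more polysyllabic terms.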
An article appearing in Information, Communication & Society[4] studies the discussion pages of the English and German September 11 attacks articles. It contributes to the ongoing debates on collaborative knowledge creation in the context of wikis and Web 2.0, on the participation of experts and amateurs on Wikipedia, and, indirectly, on the reliability of Wikipedia. The article's research question, coming from the sociology of knowledge and social constructivism perspectives, asks to what degree Wikipedia's "anyone can edit" policy democratizes the production of knowledge, removing it from traditional hierarchies "between experts and lay participants". The term democratization is used here in the context of such theoretical concepts as wisdom of crowds, participatory culture, produsage and (more critically) the notions of the cult of the amateur or digital Maoism. All of these refer to the fact that Wikipedia's editors are more often amateurs ("lay participants") than professionally recognized experts.
Using the grounded theory approach, the study focuses not on editors, but on their arguments. It finds that, due to community-upheld Wikipedia policies such as Wikipedia:Reliable sources, dissenting opinions ("traditionally marginalized types of knowledge") such as various conspiracy theories are still marginalized or excluded outright. According to the author, this "did not lead to a ‘democratization’ of knowledge production, but rather re-enacted established hierarchies". The finding should be seen in context. As the author notes, the article was written by amateurs ("lay participants") who nonetheless chose to reproduce traditional knowledge hierarchies, relegating various conspiracy theories and similar claims not backed by reliable sources to obscurity on Wikipedia. The paper concludes that Wikipedia, like other encyclopedias, is prone to a "scientism bias", i.e. treating scientifically backed knowledge as "better" than knowledge coming from alternative outlets. Despite Wikipedia's "anyone can edit" motto, the paper finds support for the argument that Wikipedia puts more stress on article quality than on democratic participation, or in the words of the article: "Although laypeople apparently play a significant part in the text production, this does not mean that they favor lay knowledge. On the contrary, it is clearly elite knowledge of well-established authorities which is finally included in the article, whereas alternative interpretations are harshly excluded or at least marginalized."
Side-note: this reviewer finds the study's use of the Firefox add-on Wired-Maker for content analysis rather ingenious, and applauds the mention of such a practical methodological tip in the paper.
At the Academy of Management conference in Boston, Dariusz Jemielniak presented a paper titled "Trust, Control, and Formalization in Open-Collaboration Communities: A Qualitative Study of Wikipedia"[5]. It is built around a detailed description and interpretation of the Essjay controversy on the English Wikipedia in 2007, which concerned the use of inaccurate credentials by the active Wikipedian and administrator Essjay. The paper is framed in terms of the literature from organization theory on trust and control. Jemielniak argues that, according to organization theory, organizations must be able or willing to trust participants, or else rely on control systems that essentially obviate the need for trust. Using ethnographic data from Wikipedia, Jemielniak suggests that Wikipedia, and perhaps a series of similar computer-mediated "open-collaboration communities", instead relies on a series of procedures and "legalistic remedies" that provide a previously untheorized alternative to the traditional control systems used in organizations.
The working paper is the first in what Jemielniak suggests will be a series of papers based on a long-term participatory ethnographic study. Over the past five years, Jemielniak has edited Wikipedia almost daily, and he is a steward on Wikimedia projects. He is also the chair of the Wikimedia movement's newly established Funds Dissemination Committee and recently announced the committee's recommendations on funding requests by various Wikimedia organizations totaling US$10.4M. Jemielniak uses his own experience as well as detailed on-wiki records of conversations surrounding the Essjay affair to walk through the controversy and its implications in depth. He discusses how Wikipedians construct authority; how they initially reacted with indifference to the revelation that Essjay had used fake credentials; how this changed when new information about Essjay's use of his credentials came to light; how a series of proposals to prevent or respond to such issues in the future were raised; and how the community essentially decided to keep the status quo.
The paper paints a detailed, nuanced, and deeply informed portrait of Wikipedians' responses to the controversy, and of the ways in which trust and its relationships to authority and credentials are navigated in the project. The author suggests that the creation of rules and legalistic procedures allowed Wikipedians to walk a line between rejecting descriptions of authority per se and minimizing the effects of inaccurate descriptions of authority. They did so by suggesting that editors on Wikipedia should rely much more heavily on users' experience and on the degree to which particular contributions conform to Wikipedia's content guidelines.
A working paper by the same writer, presented at the annual meeting of the Society for Applied Anthropology,[6] gives an overview of Wikipedia's culture by reviewing the role of its norms, guidelines and policies.

The paper summary seems to convey the impression that R. König is a "far out there" ultra-relativist / strong programmist. Hope that's what was intended... AnonMoos (talk) 17:24, 28 November 2012 (UTC)
"the UK [being] Europe's most visible country ... is quite interesting because it isn't the country in Europe that uses Wikipedia the most (Germany does)" - Perhaps it's because the Premier League is Europe's leading football league and British artists (especially actors and musicians) are much more famous than Germans. --NaBUru38 (talk) 18:05, 28 November 2012 (UTC)
I always enjoy reading these interesting Recent Research Reports. Thank you to those who contribute to the reports! --Pine✉ 18:59, 28 November 2012 (UTC)