Wikipedia:WikiProject Languages

From Wikipedia, the free encyclopedia
Wikimedia subject-area collaboration
"Wikipedia:Languages" redirects here. You may be looking for Wikipedia:Language or a list of Wikipedias in different languages.
"WP:LANG" redirects here. For the Manual of Style guideline, see MOS:LANG.
This is a WikiProject, an open group of Wikipedia editors. New participants are welcome; feel free to talk to us!

This WikiProject aims primarily to provide a consistent treatment of each human language on Wikipedia. Many languages already have extensive pages, but the systematic information on those pages is not presented in a consistent way. The purpose of this WikiProject is to present that information consistently, and to ensure that each of the major areas is covered at least briefly for each language.

These are only suggestions, things to give you focus and to get you going, and you shouldn't feel obligated in the least to follow them. However, try to stick to the format for the Infobox for each language. See the template for an example Infobox.

The easiest way to get started writing for a language that doesn't already have an article or to convert an article to the WikiProject format is to start with the template.

Article alerts


Today's featured articles

Articles for deletion

Categories for discussion

Redirects for discussion

Good article nominees

Requested moves

Articles to be merged

Articles to be split


Articles for creation


Quality articles


Featured articles marked in bold have appeared on the Main Page.

Languages and language families



Other



Formerly recognized content


Article assessment


Place the {{WikiProject Languages}} project banner template on the talk pages of any language-related articles. To rate the article on the quality scale, add one of the following parameters:

  • class=FA for featured articles
  • class=A for A-class articles
  • class=GA for good articles
  • class=B for B-class articles
  • class=start for Start-class articles
  • class=stub for Stub-class articles (which may not necessarily have a "stub" message on them!)
  • class=NA for non-articles (templates, images, etc.)

See WP:GRADES for pointers on classification.
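
As a minimal sketch of how the rating parameter fits into the banner, the following Python helper builds the wikitext to place on a talk page. The function name `banner_wikitext` is hypothetical, and the set of accepted values simply mirrors the list above; it is an illustration, not part of any project tooling.

```python
# Quality classes exactly as listed in the assessment section above.
VALID_CLASSES = {"FA", "A", "GA", "B", "start", "stub", "NA"}


def banner_wikitext(quality_class=None):
    """Return the project banner wikitext, optionally with a class rating.

    Hypothetical helper for illustration only.
    """
    if quality_class is None:
        return "{{WikiProject Languages}}"
    if quality_class not in VALID_CLASSES:
        raise ValueError(f"unknown class: {quality_class!r}")
    return "{{WikiProject Languages|class=" + quality_class + "}}"


print(banner_wikitext("B"))  # banner with a B-class rating
```

Rejecting unknown values early is a design choice of this sketch; it only guards against typos in the parameter.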

Statistics


Index · Statistics · Log

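Language articles by quality and importance (columns give importance; "-" marks an empty cell)

Quality          Top    High     Mid     Low      NA     ???   Total
FA                 1       1       1       5       -       -       8
GA                 4       6       4      15       -       1      30
B                 47      48      70     153       -     104     422
C                 65     189     234     744       -     540   1,772
Start             15     338     329   1,670       -   1,529   3,881
Stub               -     362     262   3,539       -   3,108   7,271
List               6       4      22      92       4     129     257
Category           -       -       -       -   7,017       -   7,017
Disambig           -       -       -       -     135       -     135
File               -       -       -       -      14       -      14
Portal             -       -       -       -       1       -       1
Project            -       -       -       -      18       -      18
Redirect           -      24      18     185   1,118       -   1,345
Template           -       -       -       -   1,007       -   1,007
NA                 -       -       -       -      42       -      42
Other              -       -       -       -      72       -      72
Assessed         138     972     940   6,403   9,428   5,411  23,292
Unassessed         -       -       -       -       -       1       1
Total            138     972     940   6,403   9,428   5,412  23,293

WikiWork factors: ω = 71,441, Ω = 5.34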


Article names

Further information: Category:Wikipedia lists of language names from common sources

The guidelines for article titles for languages are at Wikipedia:Naming conventions (languages). In short, most language articles should be titled XXX language. Reasons for this recommendation:

  1. Ambiguity. While some languages have special forms that refer unambiguously to the language, English is inherently ambiguous about language names. Having a standard of "XXX language" ensures that it's always unambiguous.
  2. Precedent. This is how Encyclopædia Britannica and many other English-language encyclopedias name their articles.

When there is nothing to disambiguate a language name from, such as Hindi, Esperanto or Inuktitut, there is no need for the word "language".

Whether the varieties of Arabic and Chinese should be called "languages" or "dialects" continues to be a highly controversial issue. The current convention is: use NAME + Arabic for Arabic varieties (e.g. Egyptian Arabic) and NAME + Chinese for Chinese varieties (e.g. Mandarin Chinese). Infoboxes are put at both Arabic language and Chinese language and at their first-level subdivisions. However, where there is little controversy that a variety of Arabic or Chinese is a dialect (when it is demonstrably mutually intelligible with other dialects), then 'dialect' is acceptable in the title.
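
The titling convention described above can be summarized in a short sketch. The function `suggest_title` and its parameters are hypothetical; whether a bare name is unambiguous, or whether 'dialect' is acceptable, remains an editorial judgment that the caller has to supply.

```python
def suggest_title(name, unambiguous=False, variety_of=None):
    """Suggest an article title following the convention sketched above.

    name        -- the bare language or variety name, e.g. "French"
    unambiguous -- True when the bare name has nothing to be confused with
    variety_of  -- "Arabic" or "Chinese" for varieties of those languages
    """
    if variety_of in ("Arabic", "Chinese"):
        return f"{name} {variety_of}"   # NAME + Arabic / NAME + Chinese
    if unambiguous:
        return name                     # e.g. Hindi, Esperanto, Inuktitut
    return f"{name} language"           # default: "XXX language"


print(suggest_title("French"))
print(suggest_title("Hindi", unambiguous=True))
print(suggest_title("Egyptian", variety_of="Arabic"))
```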

Even in cases in which there is a consensus that varieties of a language have dialect status, the number of such dialects and the divisions between them are often vaguely defined, and controversies exist among dialectologists over whether certain varieties should be treated in a unified way or are best understood as separate though related varieties. Separate articles should only be written on varieties (e.g., Estuary English) or related groups of varieties (e.g., Hispanic English) that have been studied well enough by linguists that at least a minimal body of literature exists about that variety or group of varieties as a distinct dialect or group of dialects. Phonological, morphosyntactic, or lexical variation that may be considered subdialectal should be noted as "differences within X dialect", where X is a dialect as discussed in the relevant literature. Controversies over dialect status can be noted in articles as such, but should also be based on citable work. In the title, names used in the literature to refer to that dialect should be preferred over folk-linguistic terms (e.g., Inland North versus Midwestern Accent).

Article structure


If you would like to create an article on a new language, you can use {{subst:New language article}} to help streamline the process. An example structure and explanation of the sections can be found at /Template for oral languages and /Template (sign language) for sign languages. Language articles are subject to Wikipedia's inclusion criteria.

Open tasks


General


Updates


Population data has been mostly updated from Ethnologue 16 to 17. However, an unknown number of articles which did not have the ref field set to "e16" slipped through the cracks. For instance, Cumanagoto did not have a ref'd population figure because E16 had mistakenly listed it as extinct. Articles which are not ref'd to Ethnologue could be checked in case E17 has a more recent figure.

User:PotatoBot helps keep ISO redirects in sync with changing WP articles and ISO standards. The results of the latest run are displayed at ISO 639 log and ISO 639 language articles missing.

Names at Spurious_languages#Spurious_according_to_Glottolog with asterisks have not been addressed.

Articles to improve: Category:Language articles with unknown population not citing Ethnologue 18

Articles citing previous editions of Ethnologue can be found in the following categories:

Articles citing undated versions can be found in:

Most should be updated to a reference to the latest Ethnologue edition or to another reliable source. However, references to old editions may continue to be appropriate: for example, with undated citations; where an old edition shows the date or range of estimates of the source and that information has been lost from recent editions; or where a new source in the latest edition of Ethnologue just cites an old edition of Ethnologue, in which case we should cite the old edition ourselves.

Some articles do not use templates such as {{e25}}:

Short descriptions


All articles should have a short description. As of December 2022, about 1,000 articles about languages do not have one (search: -hastemplate:"Short description" hastemplate:"Infobox language").
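
For editors who want to re-run that search, the following sketch turns the search string quoted above into a link. It assumes the usual `index.php?title=Special:Search&search=...` URL form on English Wikipedia; the helper name `search_url` is hypothetical.

```python
from urllib.parse import urlencode

# Search string quoted in the paragraph above.
QUERY = '-hastemplate:"Short description" hastemplate:"Infobox language"'


def search_url(query, base="https://en.wikipedia.org/w/index.php"):
    """Build a Special:Search URL for the given query string (assumed URL form)."""
    return base + "?" + urlencode({"title": "Special:Search", "search": query})


print(search_url(QUERY))
```

`urlencode` percent-encodes the quotes and colons, so the resulting link can be pasted into a talk-page message without further escaping.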

Articles to be created


Here you can see languages that other Wikipedias have articles on but the English Wikipedia does not (generated by a SPARQL query that you can run yourself). Below is a list of the five that have the most interwikis:

Red links should either be redirected or have their own articles.

Articles with red links

99.9% of ISO language names have articles, though not always one-to-one (e.g. Fulani, Zhuang, and Mazatec); the small remainder which do not are spurious, dubious, or insufficiently attested to justify their own article, and are redirected to an article stating that.

Lists for evaluation

The lists below are of self-links in our articles, language names from various sources which do not have articles or redirects, and suspicious cases to keep track of.

Lists of obscure names from common refs
INALI
  • 48 at INALI names for Mexican languages (27 Mixtec & 6 Nahuatl to be reviewed; 12 Zapotec & 3 others attempted). Even blue links may be wrong, due to confusion of similar town names or misidentification at Ethnologue.
AIATSIS
  • 7 potential languages with data. The AIATSIS database is periodically updated, with new languages confirmed.
Ethnologue 11
  • Holima ["near Dobu" – misreading of Molima?], Waelulu ["existence unconfirmed"; taken from V&V]
Voegelin (1977)
36 red-linked names; the list doesn't bother with red links for what Loukotka says is unattested.
Blue links have not been checked. Many are presumably inadvertent homonyms rather than the language intended by V&V.
Ruhlen (1987)
  • S.Am.: 12 (see key) extremely obscure names of mostly unattested languages, not even listed in Campbell & Grondona 2012, and for only a few does Loukotka say anything other than 'unknown'. Those not found in Loukotka might be copy errors.
There are also at least half a dozen names in Ruhlen which take you to what is apparently the wrong article. One is a typo, 3 are unidentified, and 2 have perhaps just been reclassified.
Campbell & Grondona
Linguist List local-use ISO
Glottolog
25 at Talk:Glottolog#Unclassified_languages
93 more at Wikipedia:WikiProject Languages/Glottolog languages without ISO codes (both for Glottolog 2.2)
Circular and suspicious links
Identity suspect
Nshi, Sotatipo, Lui, Pasto (wrong ISO?), Kanamarí and Karipuná (contradicted by E17), Gulei (marked "?" in list), Sonde, Ngoni, Pretoria-Tsonga (marked "§" in list) & Mangala
Circular links of ISO names with summary data
Loloish, Qiangic (3 listed + old name Pingfang, which I can't ID), unclassified Asian (Bhatola: presumably a Gond dialect, Warduji: presumably a Persian dialect), Hindi (Ghera: Pakistani enclave of unidentified Indian language), conlang codes (Kotava, Romanova: old articles were deleted as not-notable)
Cases to track
No 1-to-1 correspondence to ISO
Tracking only; no need to fix.
Gbaya language (Central African Republic), Gbaya language (Sudan), Syriac language
ISO languages without info box
Typically because there are problems in defining the language. Tracking only; no need to fix.
Minor languages covered in family article: Loloish (4)
Language uncertain: Mina, Majhwar
Rd. to script or history article: Epi-Olmec (undeciphered), Ancient Zapotec, Middle Korean
Rd. to spurious-language article: Parsi-Dari, Parsi, Tapeba
Newly discovered or unattested languages without ISO codes
Lubu (unattested and extinct)
Cuyama (unattested and extinct)

Requests for expansion


Images for articles in Category:Wikipedia requested photographs of languages.

Requests for attention


(no article Ashéninka people; Keres functions as the lang article but reads as a family article)

Tagged categories


Category:Articles lacking sources


Only language varieties are included here. Subjects such as 'French language in Jordan' and 'Westernized Chinese language', though in bad shape, are not listed.

  • 2004–2014: (only articles with 'language', 'dialect', 'creole', or 'pidgin' in name are included; distilled from an insane number of articles)
English: Jewish English languages
Germanic: Central Franconian dialects, Eastphalian dialect, Hamburgisch dialect, Norwegian dialects, Orsamål dialect, Ripuarian language, Sognamål dialect
Romance: Chipilo Venetian dialect, Comasco-Lecchese dialects, Fornes dialects, Pavese dialect, Sabino dialect, Sutsilvan dialects (Romansh)
Slavic: Debar dialect, Reka dialect, Strumica dialect
Maltese: Qormi dialect, Żejtun dialect
Chinese: Luoyang dialect, Mango dialect, Qihai dialect, Weihai dialect, Ningbo dialect, Ganyu dialect, Fu'an dialect, Xuzhou dialect
other: Kfar Kama Adyghe dialect (Adyghe), Enuani dialect (Igbo), Thanjavur Marathi dialect, South Korean standard language

Titles containing 'language' checked through November 2024: Bangi–Tetela languages

Category:Orphaned articles


(same search terms as missing sources)

Ordek-Burnu language (moved to 'stele')

Open ISO issues


The following ISO requests for new languages from previous years were still open in January 2016. The articles should be updated if they are accepted. (See the current list, reviewed to 2024-11.)

2023-006  ynb  Yamben
2021-044  ftg  Taigi

Articles proposed for deletion


including WP:AFD, WP:PROD and other processes

Articles to watch


The following are language articles which come under repeated POV attack, often for ethnic or nationalistic reasons. Feel free to add ones you've noticed, and to remove languages which have not been a problem for some time. That way, if one of us drops out from editing, the articles we've been watching hopefully won't go to pot.

(Note: Ethnologue 17 and the Swedish Nationalencyklopedin use Indian census data, which is not a RS because it does not have a consistent definition of Hindi. For example, part of the Awadhi population is listed under Awadhi, but most is counted as Hindi. This problem is acknowledged in the presentation of the census results, but has gotten lost in secondary sources.)

Interpreting online sources of data

Essay on editing Wikipedia
This is an essay.
It contains the advice or opinions of one or more Wikipedia contributors. This page is not an encyclopedia article, nor is it one of Wikipedia's policies or guidelines, as it has not been thoroughly vetted by the community. Some essays represent widespread norms; others only represent minority viewpoints.

Ethnologue has long been the default source for language data on WP, despite its often poor referencing. It was the only global reference that was freely available online when the majority of WP language articles were created, but has since become expensive for any use beyond basic information on a language (Essentials Plan: US$480/year as of 2023). In February 2023 a free "Starter" user category was created that gives information on language names, population figures (only in rough magnitude ranges), location, vitality and classification.[1] Users in low and middle-income countries as defined by the World Bank are still eligible for free access to the "Essentials" plan. People who contribute to Ethnologue at least once a year also receive a free Standard Plan. Alternatively, Glottolog (for classification and general sourcing) combined with the Endangered Languages Project (for demographic data) is probably another reliable default set of free online sources, though there are also reliable specialized sites such as AIATSIS for Australia. Linguist List/MultiTree maintains some value for long-extinct languages.

There are several advantages to Ethnologue: for many languages, it's the only demographic data we have; for others, it provides a check on the politicization and population inflation that we experience when we allow advocates of a language to cherry-pick sources. Nonetheless, Ethnologue data needs to be carefully evaluated. Besides the now prohibitive cost, there are a few common and serious problems:

  • The family trees are auto-generated, and should not be relied on. Auto-generation is skewed by idiosyncratic entries in the language articles. In E16, for example, the Maban family was listed as a branch of the Luo languages, because one of the Luo languages was named Maban; meanwhile, there were two separate Luo branches of Nilotic due to the spelling of "Luo" not matching across articles. The more obvious problems of this sort had been remedied in E17, but Ethnologue trees are still not a RS for classification, and the languages under a node are not a RS for the membership of a particular group. Many of our articles still say that there are X languages in the Y branch of a family, based on Ethnologue, but all that can be relied on is the classification cited in individual Ethnologue articles, and those are not sourced.
  • Speaker data is inconsistent. For instance, in E14, Gawwada was cited as having 32,698 mother tongue speakers, including 27,477 monolinguals, based on the 1998 census. In E17, it is cited as having 68,600 speakers based on the 2007 census, but still 27,500 monolinguals, without informing the reader that that figure comes from an older census. Similarly, the cited size of the ethnic group may be only half the cited number of speakers, due to it being several decades older. If the number of monolinguals or ethnic members is not given a citation date by Ethnologue, it is useless and should not be repeated by us. The number of speakers and the dialects of the language may be from different sources, with the result that the number of speakers may not be that of all dialects, or may include speakers of other ISO languages. (This is occasionally noted in the Ethnologue entry.) Very commonly, when a language is named after one of its dialects, the speaker number is that of the dialect, not of the language as a whole. Also, a language may be split up into separate ISO codes with the result that one article covers one variety but inherits the number of speakers of all varieties from the old article. Ethnologue has handled this well in recent years, but has not been able to go back and fix such errors inherited from old editions.
  • Ethnologue's arithmetic is consistently bad. For instance, Ethnologue lists five Central Iranian languages as having had 7,030 speakers reported in 2000. It appears that their source listed 35,000 speakers total, and Ethnologue divided that figure by 5 for the individual articles, with no indication that the result was no more than a guess. This kind of problem is not uncommon. Even more commonly, Ethnologue will add together incompatible data from various sources, paying no attention to significant figures. For example, if one source reported 2 to 5 million speakers in country A in 1975, and another 5 to 10 thousand in country B in 2006, Ethnologue will report the total as 3,507,500 speakers (3.5 million, the median of 2 and 5 million, plus 7,500, the median of 5–10,000). Old editions such as E14 are actually more reliable in this regard, as they tend to note that the estimate for country A was 2 to 5 million, when later editions will simply report 3.5 million as if that were the figure in the source. If the original source cannot be verified, we should at least look at each of the country figures that make up the total and redo the arithmetic, so as to avoid spurious precision as much as practicable.
  • Dates are not reliable indicators of when the data was taken. Unless they are census data, which has the problem all censuses do of speakers intentionally misreporting their language, the dates given by Ethnologue are the date of publication of their source. That can be several decades after the date the data was collected. The result is that an older cited date may report the same or more recent data than a newer cited date. For instance, several Australian languages were cited as "SIL 2011" in E17. However, in E16 they all had the same numbers of speakers cited to "Wurm and Hattori 1983". In other cases the source that Ethnologue uses may cite an old edition of Ethnologue, or the source that Ethnologue used in an old edition. And the sources themselves may have problems that are not mentioned in Ethnologue. For instance, one source from the 1990s notes that its numbers are copied from a 1980s publication that was based on unpublished fieldwork that had been conducted in the 1950s. In the Ethnologue entry, however, only the 1990s date was given. For another example, the data for the Hindi languages was updated between E16 and E17, based on the new Indian census. However, the census makes it clear that many Awadhi speakers, for example, reported their language to be "Hindi" rather than Awadhi. The result is that the E17 figure for Hindi is inflated by perhaps 100 million people who should be listed under other languages, but there is no warning about this in Ethnologue. Many entries are also undated. Some of these are recent oversights that will be fixed in the next edition, but many are inherited from old editions of Ethnologue, and the editorial team may be unable to identify their source. In such cases, citing the edition of Ethnologue that first reported the figure might give the reader some indication that it is not recent data.
  • Figures may be ethnic numbers and an order of magnitude greater than the actual number of speakers. A good start in cleaning this up was made in E17, but there has been some backsliding as well, with old linguistic survey data of heritage languages being replaced with recent census data that reports ethnic identification rather than language ability.
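
The spurious-precision problem described in the arithmetic bullet above can be made concrete with a short sketch using the country A and country B figures from that example. The helper names (`midpoint`, `round_sig`) and the choice of one significant figure are assumptions made for illustration; how coarsely to round is an editorial judgment, not something this essay prescribes.

```python
from math import floor, log10


def midpoint(low, high):
    """Midpoint of a reported range."""
    return (low + high) / 2


def round_sig(value, sig):
    """Round a positive integer to `sig` significant figures."""
    digits = sig - 1 - floor(log10(value))
    return round(value, digits)


# Country A: 2 to 5 million (1975); country B: 5 to 10 thousand (2006).
a_low, a_high = 2_000_000, 5_000_000
b_low, b_high = 5_000, 10_000

# Adding midpoints reproduces the falsely precise total.
naive_total = midpoint(a_low, a_high) + midpoint(b_low, b_high)

# Adding the bounds and rounding keeps only the precision the sources support.
low_total = round_sig(a_low + b_low, 1)
high_total = round_sig(a_high + b_high, 1)

print(f"naive:  {naive_total:,.0f}")
print(f"hedged: {low_total:,} to {high_total:,}")
```

At one significant figure the contribution from country B disappears entirely, which is the point: the combined estimate is no more precise than "2 to 5 million".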

Such problems are understandable: Ethnologue is an enormous project with a very small editorial team. For years, Ethnologue had a reputation for being unresponsive, so many linguists do not bother to correct the errors they find, but since ca. 2012 they have been appreciative of feedback, and the quality of their coverage has improved markedly. Nonetheless, Ethnologue's sources (when they can be identified) should be checked for the accuracy of its claims whenever possible, and other sources used when they are available and Ethnologue's sources cannot be identified.

Glottolog is a reliably cited and well-researched alternative to Ethnologue. Apart from not covering demographics, it does a generally superior job, for instance in verifying and updating the classifications it adopts, in marking languages as 'spurious' when they cannot be verified to exist, and most importantly in citing its sources both for the languages and for their classifications. But it is largely the work of a single person (Harald Hammarström), and he has not had the time to improve on Ethnologue for all the languages of the world, so in some cases Glottolog is not (yet) an independent source. In most cases Hammarström has personally vetted the sources, even to the extent of doing his own comparison of the raw lexical or morphological data to evaluate which classification is the most accurate, though it may take some digging for the reader to determine all his evidence. He does not, however, distinguish whether a language with no known relatives is an isolate (a family of one) or simply unclassified due to lack of data or research, listing all such cases as 'isolates'. Maps are included, but the locations are points rather than areas as in Ethnologue (not that the areas in Ethnologue are necessarily accurate), and in some cases appear to be offset from where the language is actually spoken (all points on the map shifted by seemingly the same amount and direction, a problem that besets our automated location maps as well). Finally, Glottolog should not be relied on for dialects, as they were copied wholesale from MultiTree without verification and are often spurious. Only in a few cases has Glottolog since evaluated dialect data. (Dialects are typeset in italics, languages in boldface. Where the Glottolog dialects differ from those of MultiTree, they are likely to be Hammarström's or a colleague's work and thus reliable.)

The Endangered Languages Project does not attempt to include all the world's languages (it ignores languages with millions of speakers, for example, and as of 2021 doesn't cover some poorly documented areas of the world), but as of 2021 it has articles on 3585 languages/lects, 285 without ISO codes. ELP concentrates on demographic data, and tries to provide the most recent reliable sources for speaker population, transmission rate, bilingualism, etc., and so nicely complements Glottolog. In some cases it provides the date of the data, not just the date of its publication. Non-demographic data is minimal, and (like Glottolog) the maps show the languages as points rather than areas (e.g. Comorian, where the location dot is in the middle of the ocean between the islands). There are indications that some of the data has been input by people who don't understand it, such as locations of African languages being on the wrong side of the continent. It should therefore be used for its references rather than as a reliable source in its own right.

AIATSIS presents data from multiple sources for the indigenous languages spoken within the national borders of Australia. Its primary focus is on identifying the many names found in the literature, resolving synonyms and ambiguities, evaluating whether putative lects can be confirmed to be distinct languages or dialects, and identifying which names might benefit from further investigation of archival sources.

Unreliable sites

Linguist List / MultiTree is a former undergrad student project that includes a large number of language names not found in Ethnologue, but their identification is highly unreliable, and can often be seen to be spurious with even a cursory glance at the literature. Since the creation of Glottolog they are no longer of much value as a source of references for living languages, though they do provide some informative expert summaries of the literature for long-extinct languages.

ISO 639-3 is only a reliable source for ISO codes and names. It should not be relied on for preferred names or spellings, whether a lect is a distinct language or a dialect, or whether it is still spoken. For example, despite its stated ideal of distinguishing languages by mutual intelligibility, for political reasons ISO maintains separate 639-3 codes for Serbian and Croatian, Urdu and Modern Standard Hindi, and Malaysian and Indonesian, despite private acknowledgements that doing so violates their stated aim. (Such pluricentric distinctions would be better maintained at ISO 639-2.) However, because ISO 639-3 codes are widely used to identify languages, WP language articles should include the ISO 639-3 name in the lead or a dedicated section if they use something different for the article name, and we should create redirects for those ISO names and for the codes themselves.

Global Recordings Network copies much of its data from Ethnologue, misidentifies alternative names as languages, and contradicts itself with speaker numbers.

  1. ^ "Ethnologue pricing information". Ethnologue.com. SIL International. Retrieved 22 February 2023.

Templates

  • {{interlinear}} for aligning interlinear glossing
  • {{IPA}} to format IPA, {{IPAc-en}} to convert ASCII input to normalized IPA for English, etc. (we have IPA templates for many other languages, which link to a pronunciation key)

Infoboxes


Project banner


Please add {{WikiProject Languages}} to talk pages of relevant articles. Articles with this template are put into Category:WikiProject Languages articles.

Stubs

See also: Wikipedia:WikiProject Stub sorting/Stub types/Culture § Language and literature

Language stubs should be tagged with the most appropriate template of these:

Userbox


After you sign up, you can add the project userbox to your user page by adding the following: {{User WikiProject Languages}}. Your username will then automatically be added to Category:WikiProject Languages participants.

Related WikiProjects


This WikiProject is a descendant of WikiProject Linguistics. It has descendants of its own, most of which aren't particularly active at present.

See also:

Active


Inactive or defunct


Project volunteers


If you'd like to help out, be contacted by others interested in this WikiProject's subject, and receive task assignments and project-related updates on your talk page, please add your name here:

Categories

Wikimedia Commons has media related to Language.

See also

Retrieved from "https://en.wikipedia.org/w/index.php?title=Wikipedia:WikiProject_Languages&oldid=1313794766"