
Module:languages

From Wiktionary, the free dictionary

The following documentation is located at Module:languages/documentation.

This module is used to retrieve and manage the languages that can have Wiktionary entries, and the information associated with them. See Wiktionary:Languages for more information.

For the languages and language varieties that may be used in etymologies, see Module:etymology languages. For language families, which sometimes also appear in etymologies, see Module:families.

This module provides access to other modules. To access the information from within a template, see Module:languages/templates.

The information itself is stored in the various data modules that are subpages of this module. These modules should not be used directly by any other module; the data should only be accessed through the functions provided by this module.

Data submodules:

Extra data submodules (for less frequently used data):

Finding and retrieving languages

The module exports a number of functions that are used to find languages.

export.getDataModuleName

function export.getDataModuleName(code)

This function lacks documentation. Please add a description of its usages, inputs and outputs, or its difference from similar functions, or make it local to remove it from the function list.

export.getExtraDataModuleName

function export.getExtraDataModuleName(code)

This function lacks documentation. Please add a description of its usages, inputs and outputs, or its difference from similar functions, or make it local to remove it from the function list.

export.makeObject

function export.makeObject(code, data, dontCanonicalizeAliases)

This function lacks documentation. Please add a description of its usages, inputs and outputs, or its difference from similar functions, or make it local to remove it from the function list.

export.getByCode

function export.getByCode(code, paramForError, allowEtymLang, allowFamily)

Finds the language whose code matches the one provided. If it exists, it returns a Language object representing the language. Otherwise, it returns nil, unless paramForError is given, in which case an error is generated. If paramForError is true, a generic error message mentioning the bad code is generated; otherwise paramForError should be a string or number specifying the parameter that the code came from, and this parameter will be mentioned in the error message along with the bad code. If allowEtymLang is specified, etymology-only language codes are allowed and looked up along with normal language codes. If allowFamily is specified, language family codes are allowed and looked up along with normal language codes.
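The lookup contract described above can be sketched in standalone Lua. This is only an illustration under stated assumptions: the tables toy_languages and toy_etym_languages and the function get_by_code are hypothetical stand-ins, the error wording is invented, and a false paramForError is treated here as "not given". It is not the module's actual implementation.

```lua
-- Toy data standing in for the real data modules (hypothetical).
local toy_languages = { fr = "French", la = "Latin" }
local toy_etym_languages = { ["fr-CA"] = "Canadian French" }

-- Sketch of the documented contract: return an object, nil, or raise an error.
local function get_by_code(code, paramForError, allowEtymLang)
    local name = toy_languages[code]
    if not name and allowEtymLang then
        -- Etymology-only codes are consulted only when explicitly allowed.
        name = toy_etym_languages[code]
    end
    if name then
        return { code = code, name = name }
    end
    if not paramForError then
        -- No error requested: signal failure with nil.
        return nil
    end
    if paramForError == true then
        -- Generic message that mentions only the bad code.
        error("The language code '" .. tostring(code) .. "' is not valid.")
    end
    -- Otherwise mention the parameter the code came from.
    error("The language code '" .. tostring(code) ..
        "' in parameter " .. tostring(paramForError) .. " is not valid.")
end
```

In real module code the equivalent call would be require("Module:languages").getByCode(code, paramForError, allowEtymLang, allowFamily), as documented above.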

export.getByCanonicalName

function export.getByCanonicalName(name, errorIfInvalid, allowEtymLang, allowFamily)

Finds the language whose canonical name (the name used to represent that language on Wiktionary) or other name matches the one provided. If it exists, it returns a Language object representing the language. Otherwise, it returns nil, unless errorIfInvalid is given, in which case an error is generated. If allowEtymLang is specified, etymology-only language names are allowed and looked up along with normal language names. If allowFamily is specified, language family names are allowed and looked up along with normal language names. The canonical name of a language should always be unique (it is an error for two languages on Wiktionary to share the same canonical name), so this is guaranteed to give at most one result. This function is powered by Module:languages/canonical names, which contains a pre-generated mapping of full-language canonical names to codes. It is generated by going through the Category:Language data modules for full languages. When allowEtymLang is specified for the above function, Module:etymology languages/canonical names may also be used, and when allowFamily is specified for the above function, Module:families/canonical names may also be used.

export.finalizeData

function export.finalizeData(data, main_type, variety)

Used by Module:languages/data/2 (et al.) and Module:etymology languages/data, Module:families/data, Module:scripts/data and Module:writing systems/data to finalize the data into the format that is actually returned.

export.err

function export.err(lang_code, param, code_desc, template_tag, not_real_lang)

For backwards compatibility only; modules should require the error module (Module:languages/error) themselves.

Language objects

A Language object is returned from one of the functions above. It is a Lua representation of a language and the data associated with it. It has a number of methods that can be called on it, using the : syntax. For example:

local m_languages = require("Module:languages")
local lang = m_languages.getByCode("fr")
local name = lang:getCanonicalName()
-- "name" will now be "French"

Language:getCode

function Language:getCode()

Returns the language code of the language. Example: "fr" for French.

Language:getCanonicalName

function Language:getCanonicalName()

Returns the canonical name of the language. This is the name used to represent that language on Wiktionary, and is guaranteed to be unique to that language alone. Example: "French" for French.

Language:getDisplayForm

function Language:getDisplayForm()

Returns the display form of the language. The display form of a language, family or script is the form it takes when appearing as the source in categories such as "English terms derived from source" or "English given names from source", and is also the displayed text in makeCategoryLink() links. For full and etymology-only languages, this is the same as the canonical name, but for families it reads "name languages" (e.g. "Indo-Iranian languages"), and for scripts it reads "name script" (e.g. "Arabic script").

Language:getHTMLAttribute

function Language:getHTMLAttribute(sc, region)

Returns the value which should be used in the HTML lang= attribute for tagged text in the language.

Language:getAliases

function Language:getAliases()

Returns a table of the aliases that the language is known by, excluding the canonical name. Aliases are synonyms for the language in question. The names are not guaranteed to be unique, in that sometimes more than one language is known by the same name. Example: {"High German", "New High German", "Deutsch"} for German.

Language:getVarieties

function Language:getVarieties(flatten)

Returns a table of the known subvarieties of a given language, excluding subvarieties that have been given explicit etymology-only language codes. The names are not guaranteed to be unique, in that sometimes a given name refers to a subvariety of more than one language. Example: {"Southern Aymara", "Central Aymara"} for Aymara. Note that the returned value can have nested tables in it, when a subvariety goes by more than one name. Example: {"North Azerbaijani", "South Azerbaijani", {"Afshar", "Afshari", "Afshar Azerbaijani", "Afchar"}, {"Qashqa'i", "Qashqai", "Kashkay"}, "Sonqor"} for Azerbaijani. Here, for example, Afshar, Afshari, Afshar Azerbaijani and Afchar all refer to the same subvariety, whose preferred name is Afshar (the one listed first). To avoid a return value with nested tables in it, specify a non-nil value for the flatten parameter; in that case, the return value would be {"North Azerbaijani", "South Azerbaijani", "Afshar", "Afshari", "Afshar Azerbaijani", "Afchar", "Qashqa'i", "Qashqai", "Kashkay", "Sonqor"}.
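The effect of the flatten parameter can be illustrated with a short standalone sketch. The helper flatten_varieties below is a hypothetical name, and it assumes nesting is only one level deep, as in the Azerbaijani example; it is not the module's own code.

```lua
-- Flatten a varieties table whose items are either names or
-- tables of alternative names (one level of nesting assumed).
local function flatten_varieties(varieties)
    local flat = {}
    for _, item in ipairs(varieties) do
        if type(item) == "table" then
            -- A nested table lists several names for one subvariety.
            for _, name in ipairs(item) do
                flat[#flat + 1] = name
            end
        else
            flat[#flat + 1] = item
        end
    end
    return flat
end

local azerbaijani = {
    "North Azerbaijani", "South Azerbaijani",
    {"Afshar", "Afshari", "Afshar Azerbaijani", "Afchar"},
    {"Qashqa'i", "Qashqai", "Kashkay"},
    "Sonqor",
}
local flat = flatten_varieties(azerbaijani)
```

Order is preserved, so the preferred name of each subvariety still comes before its alternatives in the flat result.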

Language:getOtherNames

function Language:getOtherNames()

Returns a table of the "other names" that the language is known by, which are listed in the otherNames field. Note that the otherNames field itself is deprecated, and entries listed there should eventually be moved to either aliases or varieties.

Language:getAllNames

function Language:getAllNames()

Returns a combined table of the canonical name, aliases, varieties and other names of a given language.

Language:getTypes

function Language:getTypes()

Returns a table of types as a lookup table (with the types as keys).

The possible types are:

  • language: This is a language, either full or etymology-only.
  • full: This is a "full" (not etymology-only) language, i.e. the union of regular, reconstructed and appendix-constructed. Note that the types full and etymology-only also exist for families, so if you want to check specifically for a full language and you have an object that might be a family, you should use hasType("language", "full") and not simply hasType("full").
  • etymology-only: This is an etymology-only (not full) language, whose parent is another etymology-only language or a full language. Note that the types full and etymology-only also exist for families, so if you want to check specifically for an etymology-only language and you have an object that might be a family, you should use hasType("language", "etymology-only") and not simply hasType("etymology-only").
  • regular: This indicates a full language that is attested according to WT:CFI and therefore permitted in the main namespace. There may also be reconstructed terms for the language, which are placed in the Reconstruction namespace and must be prefixed with * to indicate a reconstruction. Most full languages are natural (not constructed) languages, but a few constructed languages (e.g. Esperanto and Volapük, among others) are also allowed in the mainspace and considered regular languages.
  • reconstructed: This language is not attested according to WT:CFI, and therefore is allowed only in the Reconstruction namespace. All terms in this language are reconstructed, and must be prefixed with *. Languages such as Proto-Indo-European and Proto-Germanic are in this category.
  • appendix-constructed: This language is attested but does not meet the additional requirements set out for constructed languages (WT:CFI#Constructed languages). Its entries must therefore be in the Appendix namespace, but they are not reconstructed and therefore should not have * prefixed in links.

Language:hasType

function Language:hasType(...)

Given a list of types as strings, returns true if the language has all of them.
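The "all of them" semantics, and why hasType("language", "full") is safer than hasType("full") when the object might be a family, can be sketched as follows. The function has_type and both sample tables are hypothetical; in particular, the "family" key is an assumption made only for illustration.

```lua
-- types is a lookup table with the type names as keys,
-- in the shape that getTypes() is documented to return.
local function has_type(types, ...)
    for i = 1, select("#", ...) do
        -- Every requested type must be present.
        if not types[(select(i, ...))] then
            return false
        end
    end
    return true
end

-- Hypothetical type tables for a full language and for a family.
local french_types = { language = true, full = true, regular = true }
local family_types = { family = true, full = true }

local family_is_full = has_type(family_types, "full")
local family_is_full_language = has_type(family_types, "language", "full")
```

With these sample tables, checking "full" alone succeeds for the family, whereas the two-type check correctly rejects it.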

Language:getWikimediaLanguages

function Language:getWikimediaLanguages()

Returns a table containing WikimediaLanguage objects (see Module:wikimedia languages), which represent languages and their codes as they are used in Wikimedia projects for interwiki linking and such. More than one object may be returned, as a single Wiktionary language may correspond to multiple Wikimedia languages. For example, Wiktionary's single code sh (Serbo-Croatian) maps to four Wikimedia codes: sh (Serbo-Croatian), bs (Bosnian), hr (Croatian) and sr (Serbian). The code for the Wikimedia language is retrieved from the wikimedia_codes property in the data modules. If that property is not present, the code of the current language is used. If none of the available codes is actually a valid Wikimedia code, an empty table is returned.

Language:getWikimediaLanguageCodes

function Language:getWikimediaLanguageCodes()

This function lacks documentation. Please add a description of its usages, inputs and outputs, or its difference from similar functions, or make it local to remove it from the function list.

Language:getWikipediaArticle

function Language:getWikipediaArticle(noCategoryFallback, project)

Returns the name of the Wikipedia article for the language. project specifies the language and project to retrieve the article from, defaulting to "enwiki" for the English Wikipedia. Normally, if specified, it should be the project code for a specific-language Wikipedia, e.g. "zhwiki" for the Chinese Wikipedia, but it can be any project, including non-Wikipedia ones. If the project is the English Wikipedia and the property wikipedia_article is present in the data module, it will be used first. In all other cases, a sitelink will be generated from :getWikidataItem (if set). The resulting value (or lack of value) is cached so that subsequent calls are fast. If no value could be determined, and noCategoryFallback is false, :getCategoryName is used as fallback; otherwise, nil is returned. Note that if noCategoryFallback is nil or omitted, it defaults to false if the project is the English Wikipedia, otherwise to true. In other words, under normal circumstances, if the English Wikipedia article couldn't be retrieved, the return value will fall back to a link to the language's category, but this won't normally happen for any other project.
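The defaulting rule for noCategoryFallback is easy to misread, so here is a minimal sketch of it. The helper name resolve_no_category_fallback is hypothetical and only mirrors the description above; it is not taken from the module.

```lua
-- Work out the effective value of noCategoryFallback.
local function resolve_no_category_fallback(noCategoryFallback, project)
    project = project or "enwiki"
    if noCategoryFallback == nil then
        -- Default: allow the category fallback only for the English Wikipedia.
        return project ~= "enwiki"
    end
    return noCategoryFallback
end

local default_enwiki = resolve_no_category_fallback(nil, nil)
local default_zhwiki = resolve_no_category_fallback(nil, "zhwiki")
```

An explicit true or false always wins over the project-based default.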

Language:makeWikipediaLink

function Language:makeWikipediaLink()

This function lacks documentation. Please add a description of its usages, inputs and outputs, or its difference from similar functions, or make it local to remove it from the function list.

Language:getCommonsCategory

function Language:getCommonsCategory()

Returns the name of the Wikimedia Commons category page for the language.

Language:getWikidataItem

function Language:getWikidataItem()

Returns the Wikidata item id for the language, or nil. This corresponds to the second field in the data modules.

Language:getScripts

function Language:getScripts()

Returns a table of Script objects for all scripts that the language is written in. See Module:scripts.

Language:getScriptCodes

function Language:getScriptCodes()

Returns the table of script codes in the language's data file.

Language:findBestScript

function Language:findBestScript(text, forceDetect)

Given some text, this function iterates through the scripts of a given language and tries to find the script that best matches the text. It returns a Script object representing the script. If no match is found at all, it returns the None script object.

Language:getFamily

function Language:getFamily()

Returns a Family object for the language family that the language belongs to. See Module:families.

Language:getFamilyCode

function Language:getFamilyCode()

Returns the family code in the language's data file.

Language:getFamilyName

function Language:getFamilyName()

This function lacks documentation. Please add a description of its usages, inputs and outputs, or its difference from similar functions, or make it local to remove it from the function list.

Language:inFamily

function Language:inFamily(...)

Checks whether the language belongs to family (which can be a family code or object). A list of objects can be given in place of family; in that case, returns true if the language belongs to any of the specified families. Note that some languages (in particular, certain creoles) can have multiple immediate ancestors potentially belonging to different families; in that case, returns true if the language belongs to any of the specified families.

Language:getParent

function Language:getParent()

This function lacks documentation. Please add a description of its usages, inputs and outputs, or its difference from similar functions, or make it local to remove it from the function list.

Language:getParentCode

function Language:getParentCode()

This function lacks documentation. Please add a description of its usages, inputs and outputs, or its difference from similar functions, or make it local to remove it from the function list.

Language:getParentName

function Language:getParentName()

This function lacks documentation. Please add a description of its usages, inputs and outputs, or its difference from similar functions, or make it local to remove it from the function list.

Language:getParentChain

function Language:getParentChain()

This function lacks documentation. Please add a description of its usages, inputs and outputs, or its difference from similar functions, or make it local to remove it from the function list.

Language:hasParent

function Language:hasParent(...)

This function lacks documentation. Please add a description of its usages, inputs and outputs, or its difference from similar functions, or make it local to remove it from the function list.

Language:getFull

function Language:getFull()

If the language is etymology-only, this iterates through parents until a full language or family is found, and the corresponding object is returned. If the language is a full language, then it simply returns itself.

Language:getFullCode

function Language:getFullCode()

If the language is an etymology-only language, this iterates through parents until a full language or family is found, and the corresponding code is returned. If the language is a full language, then it simply returns the language code.

Language:getFullName

function Language:getFullName()

If the language is an etymology-only language, this iterates through parents until a full language or family is found, and the corresponding canonical name is returned. If the language is a full language, then it simply returns the canonical name of the language.
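The parent walk shared by getFull, getFullCode and getFullName can be sketched with toy data. The table toy_data and the function get_full_code are hypothetical; the three Old English codes are the ones used as an example in the module's own header comment, but the data layout here is invented for illustration and families are left out.

```lua
-- Toy parent links: ang-nor -> ang-ang -> ang (a full language).
local toy_data = {
    ["ang"]     = { name = "Old English", full = true },
    ["ang-ang"] = { name = "Anglian Old English", parent = "ang" },
    ["ang-nor"] = { name = "Northumbrian Old English", parent = "ang-ang" },
}

-- Follow parent links upwards until a full language is reached.
local function get_full_code(code)
    local entry = toy_data[code]
    while entry and not entry.full do
        code = entry.parent
        entry = code and toy_data[code] or nil
    end
    -- nil if the code is unknown or the chain never reaches a full language.
    return entry and code or nil
end

local full_of_northumbrian = get_full_code("ang-nor")
```

A full language maps to itself, which matches the "simply returns itself" behaviour described above.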

Language:getAncestors

function Language:getAncestors()

Returns a table of Language objects for all languages that this language is directly descended from. Generally this is only a single language, but creoles, pidgins and mixed languages can have multiple ancestors.

Language:getAncestorCodes

function Language:getAncestorCodes()

Returns a table of language codes for all languages that this language is directly descended from. Generally this is only a single language, but creoles, pidgins and mixed languages can have multiple ancestors.

Language:hasAncestor

function Language:hasAncestor(...)

Given a list of language objects or codes, returns true if at least one of them is an ancestor. This includes any etymology-only children of that ancestor. If the language's ancestor(s) are etymology-only languages, it will also return true for the parent(s) of those languages (e.g. if Vulgar Latin is the ancestor, it will also return true for its parent, Latin). However, a parent is excluded from this if the ancestor is also ancestral to that parent (e.g. if Classical Persian is the ancestor, Persian would return false, because Classical Persian is also ancestral to Persian).

Language:getAncestorChain

function Language:getAncestorChain()

This function lacks documentation. Please add a description of its usages, inputs and outputs, or its difference from similar functions, or make it local to remove it from the function list.

Language:getAncestorChainOld

function Language:getAncestorChainOld()

This function lacks documentation. Please add a description of its usages, inputs and outputs, or its difference from similar functions, or make it local to remove it from the function list.

Language:getDescendants

function Language:getDescendants()

This function lacks documentation. Please add a description of its usages, inputs and outputs, or its difference from similar functions, or make it local to remove it from the function list.

Language:getDescendantCodes

function Language:getDescendantCodes()

This function lacks documentation. Please add a description of its usages, inputs and outputs, or its difference from similar functions, or make it local to remove it from the function list.

Language:getDescendantNames

function Language:getDescendantNames()

This function lacks documentation. Please add a description of its usages, inputs and outputs, or its difference from similar functions, or make it local to remove it from the function list.

Language:hasDescendant

function Language:hasDescendant(...)

This function lacks documentation. Please add a description of its usages, inputs and outputs, or its difference from similar functions, or make it local to remove it from the function list.

Language:getChildren

function Language:getChildren()

This function lacks documentation. Please add a description of its usages, inputs and outputs, or its difference from similar functions, or make it local to remove it from the function list.

Language:getChildrenCodes

function Language:getChildrenCodes()

This function lacks documentation. Please add a description of its usages, inputs and outputs, or its difference from similar functions, or make it local to remove it from the function list.

Language:getChildrenNames

function Language:getChildrenNames()

This function lacks documentation. Please add a description of its usages, inputs and outputs, or its difference from similar functions, or make it local to remove it from the function list.

Language:hasChild

function Language:hasChild(...)

This function lacks documentation. Please add a description of its usages, inputs and outputs, or its difference from similar functions, or make it local to remove it from the function list.

Language:getCategoryName

function Language:getCategoryName(nocap)

Returns the name of the main category of that language. Example: "French language" for French, whose category is at Category:French language. Unless the optional argument nocap is given, the language name at the beginning of the returned value will be capitalized. This capitalization is correct for category names, but not if the language name is lowercase and the returned value of this function is used in the middle of a sentence.

Language:makeCategoryLink

function Language:makeCategoryLink()

Creates a link to the category; the link text is the canonical name.

Language:getStandardCharacters

function Language:getStandardCharacters(sc)

This function lacks documentation. Please add a description of its usages, inputs and outputs, or its difference from similar functions, or make it local to remove it from the function list.

Language:makeEntryName

function Language:makeEntryName(text, sc)

Make the entry name (i.e. the correct page name).

Language:generateForms

function Language:generateForms(text, sc)

Generates alternative forms using a specified method, and returns them as a table. If no method is specified, returns a table containing only the input term.

Language:makeSortKey

function Language:makeSortKey(text, sc)

Creates a sort key for the given entry name, following the rules appropriate for the language. This removes diacritical marks from the entry name if they are not considered significant for sorting, and may perform some other changes. Any initial hyphen is also removed, and anything in parentheses is removed as well. The sort_key setting for each language in the data modules defines the replacements made by this function, or it gives the name of the module that takes the entry name and returns a sort key.
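As a rough illustration of the steps just described, the sketch below builds a sort key for a made-up language. The replacements table and the function toy_sort_key are hypothetical; real languages take their rules from the sort_key setting, and this toy version only handles ASCII case conversion.

```lua
-- Hypothetical replacements for a language that ignores these diacritics.
local replacements = { ["é"] = "e", ["è"] = "e", ["ç"] = "c" }

local function toy_sort_key(entry_name)
    local key = entry_name:lower()
    key = key:gsub("^%-", "")   -- drop an initial hyphen
    key = key:gsub("%b()", "")  -- drop anything in parentheses
    for from, to in pairs(replacements) do
        key = key:gsub(from, to) -- strip non-significant diacritics
    end
    key = key:gsub("^%s+", ""):gsub("%s+$", "") -- trim leftover spaces
    return (key:upper())
end

local key_for_suffix = toy_sort_key("-été")
local key_with_parens = toy_sort_key("café (boisson)")
```

Lowercasing first and uppercasing last follows the order given in the module's header comment for sort keys; everything else here is an assumption.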

Language:makeDisplayText

function Language:makeDisplayText(text, sc, keepPrefixes)

Make the display text (i.e. what is displayed on the page).

Language:transliterate

function Language:transliterate(text, sc, module_override)

Transliterates the text from the given script into the Latin script (see Wiktionary:Transliteration and romanization). The language must have the translit property for this to work; if it is not present, nil is returned. Returns three values:

  1. The transliteration.
  2. A boolean which indicates whether the transliteration failed for an unexpected reason. If false, then the transliteration either succeeded, or the module is returning nothing in a controlled way (e.g. the input was "-"). Generally, this means that no maintenance action is required. If true, then the transliteration is nil because either the input or output was defective in some way (e.g. Module:ar-translit will not transliterate non-vocalised inputs, and this module will fail partially-completed transliterations in all languages). Note that this value can be manually set by the transliteration module, so make sure to cross-check to ensure it is accurate.
  3. A table of categories selected by the transliteration module, which should be in the format expected by format_categories in Module:utilities.

The sc parameter is handled by the transliteration module, and how it is handled is specific to that module. Some transliteration modules may tolerate nil as the script; others require it to be one of the possible scripts that the module can transliterate, and will show an error if it's not one of them. For this reason, the sc parameter should always be provided when writing non-language-specific code. The module_override parameter is used to override the default module that is used to provide the transliteration. This is useful in cases where you need to demonstrate a particular module in use, but there is no default module yet, or you want to demonstrate an alternative version of a transliteration module before making it official. It should not be used in real modules or templates, only for testing. All uses of this parameter are tracked by Wiktionary:Tracking/languages/module_override.

Known bugs:

  • This function assumes tr(s1) .. tr(s2) == tr(s1 .. s2). When this assumption fails, wikitext markup like ''' can cause wrong transliterations.
  • HTML entities such as the one for ', often used to escape wikitext markup, do not work.
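To see why the concatenation assumption in the first known bug matters, consider a toy transliterator with one context-sensitive rule. The function toy_tr and its rule (a word-final "a" becomes "ah") are invented for illustration and do not correspond to any real transliteration module.

```lua
-- Toy transliterator: only a string-final "a" is written as "ah".
local function toy_tr(text)
    return (text:gsub("a$", "ah"))
end

-- Transliterating the whole word at once.
local whole = toy_tr("kat")

-- Transliterating two pieces separately, as happens when markup
-- such as ''' splits the text between "ka" and "t".
local pieces = toy_tr("ka") .. toy_tr("t")
```

Here the piecewise result differs from the whole-word result because "a" is wrongly treated as final in the first piece. Any rule that depends on neighbouring characters can break in the same way.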

Language:overrideManualTranslit

function Language:overrideManualTranslit(sc)

This function lacks documentation. Please add a description of its usages, inputs and outputs, or its difference from similar functions, or make it local to remove it from the function list.

Language:link_tr

function Language:link_tr(sc)

This function lacks documentation. Please add a description of its usages, inputs and outputs, or its difference from similar functions, or make it local to remove it from the function list.

Language:hasTranslit

function Language:hasTranslit()

Returns true if the language has a transliteration module, or false if it doesn't.

Language:hasDottedDotlessI

function Language:hasDottedDotlessI()

Returns true if the language uses the letters I/ı and İ/i, or false if it doesn't.

Language:toJSON

function Language:toJSON(opts)

This function lacks documentation. Please add a description of its usages, inputs and outputs, or its difference from similar functions, or make it local to remove it from the function list.

Language:getData

function Language:getData(extra, raw)

This function is not for use in entries or other content pages. Returns a blob of data about the language. The format of this blob is undocumented, and perhaps unstable; it's intended for things like the module's own unit-tests, which are "close friends" with the module and will be kept up-to-date as the format changes. If extra is set, any extra data in the relevant /extra module will be included. (Note that it will be included anyway if it has already been loaded into the language object.) If raw is set, then the returned data will not contain any data inherited from parent objects.

Do NOT use these methods! All uses should be pre-approved on the talk page!

Language:loadInExtraData

function Language:loadInExtraData()

This function lacks documentation. Please add a description of its usages, inputs and outputs, or its difference from similar functions, or make it local to remove it from the function list.

Language:getDataModuleName

function Language:getDataModuleName()

Returns the name of the module containing the language's data. Currently, this is always Module:scripts/data.

Language:getExtraDataModuleName

function Language:getExtraDataModuleName()

Returns the name of the module containing the language's data. Currently, this is always Module:scripts/data.

Error function

See Module:languages/error.

Subpages

See also

--[=[This module implements fetching of language-specific information and processing text in a given language.There are two types of languages: full languages and etymology-only languages. The essential difference is that onlyfull languages appear in L2 headings in vocabulary entries, and hence categories like [[:Category:French nouns]] existonly for full languages. Etymology-only languages have either a full language or another etymology-only language astheir parent (in the parent-child inheritance sense), and for etymology-only languages with another etymology-onlylanguage as their parent, a full language can always be derived by following the parent links upwards. For example,"Canadian French", code 'fr-CA', is an etymology-only language whose parent is the full language "French", code 'fr'.An example of an etymology-only language with another etymology-only parent is "Northumbrian Old English", code'ang-nor', which has "Anglian Old English", code 'ang-ang' as its parent; this is an etymology-only language whoseparent is "Old English", code "ang", which is a full language. (This is because Northumbrian Old English is considereda variety of Anglian Old English.) Sometimes the parent is the "Undetermined" language, code 'und'; this is the case,for example, for "substrate" languages such as "Pre-Greek", code 'qsb-grc', and "the BMAC substrate", code 'qsb-bma'.It is important to distinguish language ''parents'' from language ''ancestors''. The parent-child relationship is oneof containment, i.e. if X is a child of Y, X is considered a variety of Y. On the other hand, the ancestor-descendantrelationship is one of descent in time. For example, "Classical Latin", code 'la-cla', and "Late Latin", code 'la-lat',are both etymology-only languages with "Latin", code 'la', as their parents, because both of the former are varietiesof Latin. 
However, Late Latin does *NOT* have Classical Latin as its parent because Late Latin is *not* a variety ofClassical Latin; rather, it is a descendant. There is in fact a separate 'ancestors' field that is used to express theancestor-descendant relationship, and Late Latin's ancestor is given as Classical Latin. It is also important to notethat sometimes an etymology-only language is actually the conceptual ancestor of its parent language. This happens,for example, with "Old Italian" (code 'roa-oit'), which is an etymology-only variant of full language "Italian" (code'it'), and with "Old Latin" (code 'itc-ola'), which is an etymology-only variant of Latin. In both cases, the fulllanguage has the etymology-only variant listed as an ancestor. This allows a Latin term to inherit from Old Latinusing the {{tl|inh}} template (where in this template, "inheritance" refers to ancestral inheritance, i.e. inheritancein time, rather than in the parent-child sense); likewise for Italian and Old Italian.Full languages come in three subtypes:* {regular}: This indicates a full language that is attested according to [[WT:CFI]] and therefore permitted in themain namespace. There may also be reconstructed terms for the language, which are placed in the{Reconstruction} namespace and must be prefixed with * to indicate a reconstruction. Most full languagesare natural (not constructed) languages, but a few constructed languages (e.g. Esperanto and Volapük,among others) are also allowed in the mainspace and considered regular languages.* {reconstructed}: This language is not attested according to [[WT:CFI]], and therefore is allowed only in the{Reconstruction} namespace. All terms in this language are reconstructed, and must be prefixed with*. Languages such as Proto-Indo-European and Proto-Germanic are in this category.* {appendix-constructed}: This language is attested but does not meet the additional requirements set out forconstructed languages ([[WT:CFI#Constructed languages]]). 
Its entries must therefore be inthe Appendix namespace, but they are not reconstructed and therefore should not have *prefixed in links. Most constructed languages are of this subtype.Both full languages and etymology-only languages have a {Language} object associated with them, which is fetched usingthe {getByCode} function in [[Module:languages]] to convert a language code to a {Language} object. Depending on theoptions supplied to this function, etymology-only languages may or may not be accepted, and family codes may beaccepted (returning a {Family} object as described in [[Module:families]]). There are also separate {getByCanonicalName}functions in [[Module:languages]] and [[Module:etymology languages]] to convert a language's canonical name to a{Language} object (depending on whether the canonical name refers to a full or etymology-only language).Textual strings belonging to a given language come in several different ''text variants'':# The ''input text'' is what the user supplies in wikitext, in the parameters to {{tl|m}}, {{tl|l}}, {{tl|ux}},{{tl|t}}, {{tl|lang}} and the like.# The ''display text'' is the text in the form as it will be displayed to the user. This can include accent marks thatare stripped to form the entry text (see below), as well as embedded bracketed links that are variously processedfurther. The display text is generated from the input text by applying language-specific transformations; for mostlanguages, there will be no such transformations. Examples of transformations are bad-character replacements forcertain languages (e.g. replacing 'l' or '1' to [[palochka]] in certain languages in Cyrillic); and for Thai andKhmer, converting space-separated words to bracketed words and resolving respelling substitutions such as [กรีน/กฺรีน],which indicate how to transliterate given words.# The ''entry text'' is the text in the form used to generate a link to a Wiktionary entry. 
  This is usually generated
  from the display text by stripping certain sorts of diacritics on a per-language basis, and sometimes doing other
  transformations. The concept of ''entry text'' only really makes sense for text that does not contain embedded links,
  meaning that display text containing embedded links will need to have the links individually processed to get
  per-link entry text in order to generate the resolved display text (see below).
# The ''resolved display text'' is the result of resolving embedded links in the display text (e.g. converting them to
  two-part links where the first part has entry-text transformations applied, and adding appropriate language-specific
  fragments) and adding appropriate language and script tagging. This text can be passed directly to MediaWiki for
  display.
# The ''source translit text'' is the text as supplied to the language-specific {transliterate()} method. The form of
  the source translit text may need to be language-specific, e.g. Thai and Khmer will need the full unprocessed input
  text, whereas other languages may need to work off the display text. [FIXME: It's still unclear to me how embedded
  bracketed links are handled in the existing code.] In general, embedded links need to be removed (i.e. converted to
  their "bare display" form by taking the right part of two-part links and removing double brackets), but when this
  happens is unclear to me [FIXME]. Some languages have a chop-up-and-paste-together scheme that sends parts of the
  text through the transliterate mechanism, and for others (those listed with "cont" in {substitution} in
  [[Module:languages/data]]) they receive the full input text, but preprocessed in certain ways.
  (The wisdom of this is
  still unclear to me.)
# The ''transliterated text'' (or ''transliteration'') is the result of transliterating the source translit text.
  Unlike for all the other text variants except the transcribed text, it is always in the Latin script.
# The ''transcribed text'' (or ''transcription'') is the result of transcribing the source translit text, where
  "transcription" here means a close approximation to the phonetic form of the language in languages (e.g. Akkadian,
  Sumerian, Ancient Egyptian, maybe Tibetan) that have a wide difference between the written letters and spoken form.
  Unlike for all the other text variants other than the transliterated text, it is always in the Latin script.
  Currently, the transcribed text is always supplied manually by the user; there is no such thing as a
  {lua|transcribe()} method on language objects.
# The ''sort key'' is the text used in sort keys for determining the placing of pages in categories they belong to. The
  sort key is generated from the pagename or a specified ''sort base'' by lowercasing, doing language-specific
  transformations and then uppercasing the result. If the sort base is supplied and is generated from input text, it
  needs to be converted to display text, have embedded links removed (i.e. resolving them to their right side if they
  are two-part links) and have entry text transformations applied.
# There are other text variants that occur in usexes (specifically, there are normalized variants of several of the
  above text variants), but we can skip them for now.

The following methods exist on {Language} objects to convert between different text variants:
# {makeDisplayText}: This converts input text to display text.
# {lua|makeEntryName}: This converts input or display text to entry text. [FIXME: This needs some rethinking.
  In
  particular, {lua|makeEntryName} is sometimes called on display text (in some paths inside of [[Module:links]]) and
  sometimes called on input text (in other paths inside of [[Module:links]], and usually from other modules). We need
  to make sure we don't try to convert input text to display text twice, but at the same time we need to support
  calling it directly on input text since so many modules do this. This means we need to add a parameter indicating
  whether the passed-in text is input or display text; if the former, we call {lua|makeDisplayText} ourselves.]
# {lua|transliterate}: This appears to convert input text with embedded brackets removed into a transliteration.
  [FIXME: This needs some rethinking. In particular, it calls {lua|processDisplayText} on its input, which won't work
  for Thai and Khmer, so we may need language-specific flags indicating whether to pass the input text directly to the
  language transliterate method. In addition, I'm not sure how embedded links are handled in the existing translit code;
  a lot of callers remove the links themselves before calling {lua|transliterate()}, which I assume is wrong.]
# {lua|makeSortKey}: This converts entry text (?) to a sort key.
  [FIXME: Clarify this.]
]=]

local export = {}

local debug_track_module = "Module:debug/track"
local etymology_languages_data_module = "Module:etymology languages/data"
local families_module = "Module:families"
local json_module = "Module:JSON"
local language_like_module = "Module:language-like"
local languages_data_module = "Module:languages/data"
local languages_data_patterns_module = "Module:languages/data/patterns"
local links_data_module = "Module:links/data"
local load_module = "Module:load"
local scripts_module = "Module:scripts"
local scripts_data_module = "Module:scripts/data"
local string_encode_entities_module = "Module:string/encode entities"
local string_pattern_escape_module = "Module:string/patternEscape"
local string_replacement_escape_module = "Module:string/replacementEscape"
local string_utilities_module = "Module:string utilities"
local table_module = "Module:table"
local utilities_module = "Module:utilities"
local wikimedia_languages_module = "Module:wikimedia languages"

local mw = mw
local string = string
local table = table

local char = string.char
local concat = table.concat
local find = string.find
local floor = math.floor
local get_by_code -- Defined below.
local get_data_module_name -- Defined below.
local get_extra_data_module_name -- Defined below.
local getmetatable = getmetatable
local gmatch = string.gmatch
local gsub = string.gsub
local insert = table.insert
local ipairs = ipairs
local is_known_language_tag = mw.language.isKnownLanguageTag
local make_object -- Defined below.
local match = string.match
local next = next
local pairs = pairs
local remove = table.remove
local require = require
local select = select
local setmetatable = setmetatable
local sub = string.sub
local type = type
local unstrip = mw.text.unstrip

-- Loaded as needed by findBestScript.
local Hans_chars
local Hant_chars

local function check_object(...)
	check_object = require(utilities_module).check_object
	return check_object(...)
end

local function debug_track(...)
	debug_track = require(debug_track_module)
	return debug_track(...)
end

local function decode_entities(...)
	decode_entities = require(string_utilities_module).decode_entities
	return decode_entities(...)
end

local function decode_uri(...)
	decode_uri = require(string_utilities_module).decode_uri
	return decode_uri(...)
end

local function deep_copy(...)
	deep_copy = require(table_module).deepCopy
	return deep_copy(...)
end

local function encode_entities(...)
	encode_entities = require(string_encode_entities_module)
	return encode_entities(...)
end

local function get_script(...)
	get_script = require(scripts_module).getByCode
	return get_script(...)
end

local function find_best_script_without_lang(...)
	find_best_script_without_lang = require(scripts_module).findBestScriptWithoutLang
	return find_best_script_without_lang(...)
end

local function get_family(...)
	get_family = require(families_module).getByCode
	return get_family(...)
end

local function get_plaintext(...)
	get_plaintext = require(utilities_module).get_plaintext
	return get_plaintext(...)
end

local function get_wikimedia_lang(...)
	get_wikimedia_lang = require(wikimedia_languages_module).getByCode
	return get_wikimedia_lang(...)
end

local function keys_to_list(...)
	keys_to_list = require(table_module).keysToList
	return keys_to_list(...)
end

local function list_to_set(...)
	list_to_set = require(table_module).listToSet
	return list_to_set(...)
end

local function load_data(...)
	load_data = require(load_module).load_data
	return load_data(...)
end

local function make_family_object(...)
	make_family_object = require(families_module).makeObject
	return make_family_object(...)
end

local function pattern_escape(...)
	pattern_escape = require(string_pattern_escape_module)
	return pattern_escape(...)
end

local function remove_duplicates(...)
	remove_duplicates = require(table_module).removeDuplicates
	return remove_duplicates(...)
end

local function replacement_escape(...)
	replacement_escape = require(string_replacement_escape_module)
	return replacement_escape(...)
end

local function safe_require(...)
	safe_require = require(load_module).safe_require
	return safe_require(...)
end

local function shallow_copy(...)
	shallow_copy = require(table_module).shallowCopy
	return shallow_copy(...)
end

local function split(...)
	split = require(string_utilities_module).split
	return split(...)
end

local function to_json(...)
	to_json = require(json_module).toJSON
	return to_json(...)
end

local function u(...)
	u = require(string_utilities_module).char
	return u(...)
end

local function ugsub(...)
	ugsub = require(string_utilities_module).gsub
	return ugsub(...)
end

local function ulen(...)
	ulen = require(string_utilities_module).len
	return ulen(...)
end

local function ulower(...)
	ulower = require(string_utilities_module).lower
	return ulower(...)
end

local function umatch(...)
	umatch = require(string_utilities_module).match
	return umatch(...)
end

local function uupper(...)
	uupper = require(string_utilities_module).upper
	return uupper(...)
end

local function track(page)
	debug_track("languages/" .. page)
	return true
end

local function normalize_code(code)
	return load_data(languages_data_module).aliases[code] or code
end

local function check_inputs(self, check, default, ...)
	local n = select("#", ...)
	if n == 0 then
		return false
	end
	local ret = check(self, (...))
	if ret ~= nil then
		return ret
	elseif n > 1 then
		local inputs = {...}
		for i = 2, n do
			ret = check(self, inputs[i])
			if ret ~= nil then
				return ret
			end
		end
	end
	return default
end

local function make_link(self, target, display)
	local prefix, main
	if self:getFamilyCode() == "qfa-sub" then
		prefix, main = display:match("^(the )(.*)")
		if not prefix then
			prefix, main = display:match("^(a )(.*)")
		end
	end
	return (prefix or "") .. "[[" .. target .. "|" .. (main or display) .. "]]"
end

-- Convert risky characters to HTML entities, which minimizes interference once returned (e.g.
-- for "sms:a", "<!-- -->" etc.).
local function escape_risky_characters(text)
	-- Spacing characters in isolation generally need to be escaped in order to be properly processed by the MediaWiki software.
	if umatch(text, "^%s*$") then
		return encode_entities(text, text)
	end
	return encode_entities(text, "!#%&*+/:;<=>?@[\\]_{|}")
end

-- Temporarily convert various formatting characters to PUA to prevent them from being disrupted by the substitution process.
local function doTempSubstitutions(text, subbedChars, keepCarets, noTrim)
	-- Clone so that we don't insert any extra patterns into the table in package.loaded. For some reason, using require seems to keep memory use down; probably because the table is always cloned.
	local patterns = shallow_copy(require(languages_data_patterns_module))
	if keepCarets then
		insert(patterns, "((\\+)%^)")
		insert(patterns, "((%^))")
	end
	-- Ensure any whitespace at the beginning and end is temp substituted, to prevent it from being accidentally trimmed. We only want to trim any final spaces added during the substitution process (e.g.
	-- by a module), which means we only do this during the first round of temp substitutions.
	if not noTrim then
		insert(patterns, "^([\128-\191\244]*(%s+))")
		insert(patterns, "((%s+)[\128-\191\244]*)$")
	end
	-- Pre-substitution of "[[" and "]]", which makes pattern matching more accurate.
	text = gsub(text, "%f[%[]%[%[", "\1"):gsub("%f[%]]%]%]", "\2")
	local i = #subbedChars
	for _, pattern in ipairs(patterns) do
		-- Patterns ending in \0 stand for things like "[[" or "]]", so the inserted PUA are treated as breaks between terms by modules that scrape info from pages.
		local term_divider
		pattern = gsub(pattern, "%z$", function(divider)
			term_divider = divider == "\0"
			return ""
		end)
		text = gsub(text, pattern, function(...)
			local m = {...}
			local m1New = m[1]
			for k = 2, #m do
				local n = i + k - 1
				subbedChars[n] = m[k]
				local byte2 = floor(n / 4096) % 64 + (term_divider and 128 or 136)
				local byte3 = floor(n / 64) % 64 + 128
				local byte4 = n % 64 + 128
				m1New = gsub(m1New, pattern_escape(m[k]), "\244" .. char(byte2) .. char(byte3) .. char(byte4), 1)
			end
			i = i + #m - 1
			return m1New
		end)
	end
	text = gsub(text, "\1", "%[%["):gsub("\2", "%]%]")
	return text, subbedChars
end

-- Reinsert any formatting that was temporarily substituted.
local function undoTempSubstitutions(text, subbedChars)
	for i = 1, #subbedChars do
		local byte2 = floor(i / 4096) % 64 + 128
		local byte3 = floor(i / 64) % 64 + 128
		local byte4 = i % 64 + 128
		text = gsub(text, "\244[" .. char(byte2) .. char(byte2+8) .. "]" .. char(byte3) .. char(byte4), replacement_escape(subbedChars[i]))
	end
	text = gsub(text, "\1", "%[%["):gsub("\2", "%]%]")
	return text
end

-- Check if the raw text is an unsupported title, and if so return that. Otherwise, remove HTML entities.
-- We do the pre-conversion to avoid loading the unsupported title list unnecessarily.
local function checkNoEntities(self, text)
	local textNoEnc = decode_entities(text)
	if textNoEnc ~= text and load_data(links_data_module).unsupported_titles[text] then
		return text
	else
		return textNoEnc
	end
end

-- If no script object is provided (or if it's invalid or None), get one.
local function checkScript(text, self, sc)
	if not check_object("script", true, sc) or sc:getCode() == "None" then
		return self:findBestScript(text)
	end
	return sc
end

local function normalize(text, sc)
	text = sc:fixDiscouragedSequences(text)
	return sc:toFixedNFD(text)
end

local function doSubstitutions(self, text, sc, substitution_data, data_field, function_name, recursed)
	local fail, cats, actual_substitution_data = nil, {}, substitution_data
	-- If there are language-specific substitutes given in the data module, use those.
	if type(substitution_data) == "table" then
		-- If a script is specified, run this function with the script-specific data before continuing.
		local sc_code = sc:getCode()
		local has_substitution_data = false
		if substitution_data[sc_code] then
			has_substitution_data = true
			text, fail, cats, actual_substitution_data = doSubstitutions(self, text, sc, substitution_data[sc_code], data_field, function_name, true)
		-- Hant, Hans and Hani are usually treated the same, so add a special case to avoid having to specify each one separately.
		elseif sc_code:match("^Han") and substitution_data.Hani then
			has_substitution_data = true
			text, fail, cats, actual_substitution_data = doSubstitutions(self, text, sc, substitution_data.Hani, data_field, function_name, true)
		-- Substitution data with key 1 in the outer table may be given as a fallback.
		elseif substitution_data[1] then
			has_substitution_data = true
			text, fail, cats, actual_substitution_data = doSubstitutions(self, text, sc, substitution_data[1], data_field, function_name, true)
		end
		-- Iterate over all strings in the "from" subtable, and gsub with the corresponding string in "to".
		-- We work with the NFD decomposed forms, as this simplifies many substitutions.
		if substitution_data.from then
			has_substitution_data = true
			for i, from in ipairs(substitution_data.from) do
				-- Normalize each loop, to ensure multi-stage substitutions work correctly.
				text = sc:toFixedNFD(text)
				text = ugsub(text, sc:toFixedNFD(from), substitution_data.to[i] or "")
			end
		end

		if substitution_data.remove_diacritics then
			has_substitution_data = true
			text = sc:toFixedNFD(text)
			-- Convert exceptions to PUA.
			local remove_exceptions, substitutes = substitution_data.remove_exceptions
			if remove_exceptions then
				substitutes = {}
				local i = 0
				for _, exception in ipairs(remove_exceptions) do
					exception = sc:toFixedNFD(exception)
					text = ugsub(text, exception, function(m)
						i = i + 1
						local subst = u(0x80000 + i)
						substitutes[subst] = m
						return subst
					end)
				end
			end
			-- Strip diacritics.
			text = ugsub(text, "[" .. substitution_data.remove_diacritics .. "]", "")
			-- Convert exceptions back.
			if remove_exceptions then
				text = text:gsub("\242[\128-\191]*", substitutes)
			end
		end

		if not has_substitution_data and sc._data[data_field] then
			text, fail, cats, actual_substitution_data = doSubstitutions(self, text, sc, sc._data[data_field], data_field, function_name, true)
		end
	elseif type(substitution_data) == "string" then
		-- If there is a dedicated function module, use that.
		local module = safe_require("Module:" .. substitution_data)
		if module then
			-- TODO: translit functions should take objects, not codes.
			-- TODO: translit functions should be called with form NFD.
			if function_name == "tr" then
				text, fail, cats = module[function_name](text, self._code, sc:getCode())
			else
				text, fail, cats = module[function_name](sc:toFixedNFD(text), self, sc)
			end
			-- TODO: get rid of the `fail` and `cats` return values.
			if fail ~= nil then
				track("fail")
				track("fail/" .. self._code)
			end
			if cats ~= nil then
				track("cats")
				track("cats/" .. self._code)
			end
		else
			error("Substitution data '" .. substitution_data .. "' does not match an existing module.")
		end
	elseif substitution_data == nil and sc._data[data_field] then
		-- If language-specific sort key (etc.)
		-- is nil, fall back to script-wide sort key (etc.).
		text, fail, cats, actual_substitution_data = doSubstitutions(self, text, sc, sc._data[data_field], data_field, function_name, true)
	end

	-- Don't normalize to NFC if this is the inner loop or if a module returned nil.
	if recursed or not text then
		return text, fail, cats, actual_substitution_data
	end
	-- Fix any discouraged sequences created during the substitution process, and normalize into the final form.
	return sc:toFixedNFC(sc:fixDiscouragedSequences(text)), fail, cats, actual_substitution_data
end

-- Split the text into sections, based on the presence of temporarily substituted formatting characters, then iterate over each one to apply substitutions. This avoids putting PUA characters through language-specific modules, which may be unequipped for them.
local function iterateSectionSubstitutions(self, text, sc, subbedChars, keepCarets, substitution_data, data_field, function_name, notrim)
	local fail, cats, sections = nil, {}
	-- See [[Module:languages/data]].
	if not find(text, "\244") or (load_data(languages_data_module).substitution[self._code] == "cont") then
		sections = {text}
	else
		sections = split(text, "\244[\128-\143][\128-\191]*", true)
	end
	local actual_substitution_data
	for _, section in ipairs(sections) do
		-- Don't bother processing empty strings or whitespace (which may also not be handled well by dedicated modules).
		if gsub(section, "%s+", "") ~= "" then
			local sub, sub_fail, sub_cats, this_actual_substitution_data = doSubstitutions(self, section, sc, substitution_data, data_field, function_name)
			actual_substitution_data = this_actual_substitution_data
			-- Second round of temporary substitutions, in case any formatting was added by the main substitution process.
			-- However, don't do this if the section contains formatting already (as it would have had to have been escaped to reach this stage, and therefore should be given as raw text).
			if sub and subbedChars then
				local noSub
				for _, pattern in ipairs(require(languages_data_patterns_module)) do
					if match(section, pattern .. "%z?") then
						noSub = true
					end
				end
				if not noSub then
					sub, subbedChars = doTempSubstitutions(sub, subbedChars, keepCarets, true)
				end
			end
			if (not sub) or sub_fail then
				text = sub
				fail = sub_fail
				cats = sub_cats or {}
				break
			end
			text = sub and gsub(text, pattern_escape(section), replacement_escape(sub), 1) or text
			if type(sub_cats) == "table" then
				for _, cat in ipairs(sub_cats) do
					insert(cats, cat)
				end
			end
		end
	end

	if not notrim then
		-- Trim, unless there are only spacing characters, while ignoring any final formatting characters.
		-- Do not trim sort keys because spaces at the beginning are significant.
		text = text and text:gsub("^([\128-\191\244]*)%s+(%S)", "%1%2"):gsub("(%S)%s+([\128-\191\244]*)$", "%1%2")
	end

	-- Remove duplicate categories.
	if #cats > 1 then
		cats = remove_duplicates(cats)
	end

	return text, fail, cats, subbedChars, actual_substitution_data
end

-- Process carets (and any escapes). Default to simple removal, if no pattern/replacement is given.
local function processCarets(text, pattern, repl)
	local rep
	repeat
		text, rep = gsub(text, "\\\\(\\*^)", "\3%1")
	until rep == 0
	return text:gsub("\\^", "\4"):gsub(pattern or "%^", repl or ""):gsub("\3", "\\"):gsub("\4", "^")
end

-- Remove carets if they are used to capitalize parts of transliterations (unless they have been escaped).
local function removeCarets(text, sc)
	if not sc:hasCapitalization() and sc:isTransliterated() and text:find("^", 1, true) then
		return processCarets(text)
	else
		return text
	end
end

local Language = {}

--[==[Returns the language code of the language. Example: {{code|lua|"fr"}} for French.]==]
function Language:getCode()
	return self._code
end

--[==[Returns the canonical name of the language. This is the name used to represent that language on Wiktionary, and is guaranteed to be unique to that language alone.
Example: {{code|lua|"French"}} for French.]==]
function Language:getCanonicalName()
	local name = self._name
	if name == nil then
		name = self._data[1]
		self._name = name
	end
	return name
end

--[==[Return the display form of the language. The display form of a language, family or script is the form it takes when
appearing as the <code><var>source</var></code> in categories such as <code>English terms derived from
<var>source</var></code> or <code>English given names from <var>source</var></code>, and is also the displayed text
in {makeCategoryLink()} links. For full and etymology-only languages, this is the same as the canonical name, but
for families, it reads <code>"<var>name</var> languages"</code> (e.g. {"Indo-Iranian languages"}), and for scripts,
it reads <code>"<var>name</var> script"</code> (e.g. {"Arabic script"}).]==]
function Language:getDisplayForm()
	local form = self._displayForm
	if form == nil then
		form = self:getCanonicalName()
		-- Add article and " substrate" to substrates that lack them.
		if self:getFamilyCode() == "qfa-sub" then
			if not (sub(form, 1, 4) == "the " or sub(form, 1, 2) == "a ") then
				form = "a " .. form
			end
			if not match(form, " [Ss]ubstrate") then
				form = form .. " substrate"
			end
		end
		self._displayForm = form
	end
	return form
end

--[==[Returns the value which should be used in the HTML lang= attribute for tagged text in the language.]==]
function Language:getHTMLAttribute(sc, region)
	local code = self._code
	if not find(code, "-", 1, true) then
		return code .. "-" .. sc:getCode() .. (region and "-" .. region or "")
	end
	local parent = self:getParent()
	region = region or match(code, "%f[%u][%u-]+%f[%U]")
	if parent then
		return parent:getHTMLAttribute(sc, region)
	end
	-- TODO: ISO family codes can also be used.
	return "mis-" .. sc:getCode() .. (region and "-" .. region or "")
end

--[==[Returns a table of the aliases that the language is known by, excluding the canonical name. Aliases are synonyms for the language in question. The names are not guaranteed to be unique, in that sometimes more than one language is known by the same name.
Example: {{code|lua|{"High German", "New High German", "Deutsch"} }} for [[:Category:German language|German]].]==]
function Language:getAliases()
	self:loadInExtraData()
	return require(language_like_module).getAliases(self)
end

--[==[Return a table of the known subvarieties of a given language, excluding subvarieties that have been given
explicit etymology-only language codes. The names are not guaranteed to be unique, in that sometimes a given name
refers to a subvariety of more than one language. Example: {{code|lua|{"Southern Aymara", "Central Aymara"} }} for
[[:Category:Aymara language|Aymara]]. Note that the returned value can have nested tables in it, when a subvariety
goes by more than one name. Example: {{code|lua|{"North Azerbaijani", "South Azerbaijani", {"Afshar", "Afshari",
"Afshar Azerbaijani", "Afchar"}, {"Qashqa'i", "Qashqai", "Kashkay"}, "Sonqor"} }} for
[[:Category:Azerbaijani language|Azerbaijani]]. Here, for example, Afshar, Afshari, Afshar Azerbaijani and Afchar
all refer to the same subvariety, whose preferred name is Afshar (the one listed first). To avoid a return value
with nested tables in it, specify a non-{{code|lua|nil}} value for the <code>flatten</code> parameter; in that case,
the return value would be {{code|lua|{"North Azerbaijani", "South Azerbaijani", "Afshar", "Afshari",
"Afshar Azerbaijani", "Afchar", "Qashqa'i", "Qashqai", "Kashkay", "Sonqor"} }}.]==]
function Language:getVarieties(flatten)
	self:loadInExtraData()
	return require(language_like_module).getVarieties(self, flatten)
end

--[==[Returns a table of the "other names" that the language is known by, which are listed in the <code>otherNames</code> field.
It should be noted that the <code>otherNames</code> field itself is deprecated, and entries listed there should eventually be moved to either <code>aliases</code> or <code>varieties</code>.]==]
function Language:getOtherNames() -- To be eventually removed, once there are no more uses of the `otherNames` field.
	self:loadInExtraData()
	return require(language_like_module).getOtherNames(self)
end

--[==[Return a combined table of the canonical name, aliases, varieties and other names of a given language.]==]
function Language:getAllNames()
	self:loadInExtraData()
	return require(language_like_module).getAllNames(self)
end

--[==[Returns a table of types as a lookup table (with the types as keys).

The possible types are
* {language}: This is a language, either full or etymology-only.
* {full}: This is a "full" (not etymology-only) language, i.e. the union of {regular}, {reconstructed} and
  {appendix-constructed}. Note that the types {full} and {etymology-only} also exist for families, so if you
  want to check specifically for a full language and you have an object that might be a family, you should
  use {{lua|hasType("language", "full")}} and not simply {{lua|hasType("full")}}.
* {etymology-only}: This is an etymology-only (not full) language, whose parent is another etymology-only
  language or a full language. Note that the types {full} and {etymology-only} also exist for
  families, so if you want to check specifically for an etymology-only language and you have an
  object that might be a family, you should use {{lua|hasType("language", "etymology-only")}}
  and not simply {{lua|hasType("etymology-only")}}.
* {regular}: This indicates a full language that is attested according to [[WT:CFI]] and therefore permitted
  in the main namespace. There may also be reconstructed terms for the language, which are placed in
  the {Reconstruction} namespace and must be prefixed with * to indicate a reconstruction. Most full
  languages are natural (not constructed) languages, but a few constructed languages (e.g.
  Esperanto
  and Volapük, among others) are also allowed in the mainspace and considered regular languages.
* {reconstructed}: This language is not attested according to [[WT:CFI]], and therefore is allowed only in the
  {Reconstruction} namespace. All terms in this language are reconstructed, and must be prefixed
  with *. Languages such as Proto-Indo-European and Proto-Germanic are in this category.
* {appendix-constructed}: This language is attested but does not meet the additional requirements set out for
  constructed languages ([[WT:CFI#Constructed languages]]). Its entries must therefore
  be in the Appendix namespace, but they are not reconstructed and therefore should
  not have * prefixed in links.
]==]
function Language:getTypes()
	local types = self._types
	if types == nil then
		types = {language = true}
		if self:getFullCode() == self._code then
			types.full = true
		else
			types["etymology-only"] = true
		end
		for t in gmatch(self._data.type, "[^,]+") do
			types[t] = true
		end
		self._types = types
	end
	return types
end

--[==[Given a list of types as strings, returns true if the language has all of them.]==]
function Language:hasType(...)
	Language.hasType = require(language_like_module).hasType
	return self:hasType(...)
end

--[==[Returns a table containing <code>WikimediaLanguage</code> objects (see [[Module:wikimedia languages]]), which represent languages and their codes as they are used in Wikimedia projects for interwiki linking and such. More than one object may be returned, as a single Wiktionary language may correspond to multiple Wikimedia languages. For example, Wiktionary's single code <code>sh</code> (Serbo-Croatian) maps to four Wikimedia codes: <code>sh</code> (Serbo-Croatian), <code>bs</code> (Bosnian), <code>hr</code> (Croatian) and <code>sr</code> (Serbian).

The code for the Wikimedia language is retrieved from the <code>wikimedia_codes</code> property in the data modules. If that property is not present, the code of the current language is used.
If none of the available codes is actually a valid Wikimedia code, an empty table is returned.]==]
function Language:getWikimediaLanguages()
	local wm_langs = self._wikimediaLanguageObjects
	if wm_langs == nil then
		local codes = self:getWikimediaLanguageCodes()
		wm_langs = {}
		for i = 1, #codes do
			wm_langs[i] = get_wikimedia_lang(codes[i])
		end
		self._wikimediaLanguageObjects = wm_langs
	end
	return wm_langs
end

function Language:getWikimediaLanguageCodes()
	local wm_langs = self._wikimediaLanguageCodes
	if wm_langs == nil then
		wm_langs = self._data.wikimedia_codes
		if wm_langs then
			wm_langs = split(wm_langs, ",", true, true)
		else
			local code = self._code
			if is_known_language_tag(code) then
				wm_langs = {code}
			else
				-- Inherit, but only if no codes are specified in the data *and*
				-- the language code isn't a valid Wikimedia language code.
				local parent = self:getParent()
				wm_langs = parent and parent:getWikimediaLanguageCodes() or {}
			end
		end
		self._wikimediaLanguageCodes = wm_langs
	end
	return wm_langs
end

--[==[Returns the name of the Wikipedia article for the language. `project` specifies the language and project to retrieve
the article from, defaulting to {"enwiki"} for the English Wikipedia. Normally if specified it should be the project
code for a specific-language Wikipedia e.g. "zhwiki" for the Chinese Wikipedia, but it can be any project, including
non-Wikipedia ones. If the project is the English Wikipedia and the property {wikipedia_article} is present in the data
module it will be used first. In all other cases, a sitelink will be generated from {:getWikidataItem} (if set). The
resulting value (or lack of value) is cached so that subsequent calls are fast. If no value could be determined, and
`noCategoryFallback` is {false}, {:getCategoryName} is used as fallback; otherwise, {nil} is returned. Note that if
`noCategoryFallback` is {nil} or omitted, it defaults to {false} if the project is the English Wikipedia, otherwise
to {true}.
In other words, under normal circumstances, if the English Wikipedia article couldn't be retrieved, the
return value will fall back to a link to the language's category, but this won't normally happen for any other project.]==]
function Language:getWikipediaArticle(noCategoryFallback, project)
	Language.getWikipediaArticle = require(language_like_module).getWikipediaArticle
	return self:getWikipediaArticle(noCategoryFallback, project)
end

function Language:makeWikipediaLink()
	return make_link(self, "w:" .. self:getWikipediaArticle(), self:getCanonicalName())
end

--[==[Returns the name of the Wikimedia Commons category page for the language.]==]
function Language:getCommonsCategory()
	Language.getCommonsCategory = require(language_like_module).getCommonsCategory
	return self:getCommonsCategory()
end

--[==[Returns the Wikidata item id for the language or <code>nil</code>. This corresponds to the second field in the data modules.]==]
function Language:getWikidataItem()
	Language.getWikidataItem = require(language_like_module).getWikidataItem
	return self:getWikidataItem()
end

--[==[Returns a table of <code>Script</code> objects for all scripts that the language is written in.
See [[Module:scripts]].]==]
function Language:getScripts()
	local scripts = self._scriptObjects
	if scripts == nil then
		local codes = self:getScriptCodes()
		if codes[1] == "All" then
			scripts = load_data(scripts_data_module)
		else
			scripts = {}
			for i = 1, #codes do
				scripts[i] = get_script(codes[i])
			end
		end
		self._scriptObjects = scripts
	end
	return scripts
end

--[==[Returns the table of script codes in the language's data file.]==]
function Language:getScriptCodes()
	local scripts = self._scriptCodes
	if scripts == nil then
		scripts = self._data[4]
		if scripts then
			local codes, n = {}, 0
			for code in gmatch(scripts, "[^,]+") do
				n = n + 1
				-- Special handling of "Hants", which represents "Hani", "Hant" and "Hans" collectively.
				if code == "Hants" then
					codes[n] = "Hani"
					codes[n + 1] = "Hant"
					codes[n + 2] = "Hans"
					n = n + 2
				else
					codes[n] = code
				end
			end
			scripts = codes
		else
			scripts = {"None"}
		end
		self._scriptCodes = scripts
	end
	return scripts
end

--[==[Given some text, this function iterates through the scripts of a given language and tries to find the script that best matches the text. It returns a {{code|lua|Script}} object representing the script. If no match is found at all, it returns the {{code|lua|None}} script object.]==]
function Language:findBestScript(text, forceDetect)
	if not text or text == "" or text == "-" then
		return get_script("None")
	end

	-- Differs from table returned by getScriptCodes, as Hants is not normalized into its constituents.
	local codes = self._bestScriptCodes
	if codes == nil then
		codes = self._data[4]
		codes = codes and split(codes, ",", true, true) or {"None"}
		self._bestScriptCodes = codes
	end

	local first_sc = codes[1]

	if first_sc == "All" then
		return find_best_script_without_lang(text)
	end

	local codes_len = #codes

	if not (forceDetect or first_sc == "Hants" or codes_len > 1) then
		first_sc = get_script(first_sc)
		local charset = first_sc.characters
		return charset and umatch(text, "[" .. charset .. "]") and first_sc or get_script("None")
	end

	-- Remove all formatting characters.
	text = get_plaintext(text)

	-- Remove all spaces and any ASCII punctuation.
	-- Some non-ASCII punctuation is script-specific, so can't be removed.
	text = ugsub(text, "[%s!\"#%%&'()*,%-./:;?@[\\%]_{}]+", "")
	if #text == 0 then
		return get_script("None")
	end

	-- Try to match every script against the text,
	-- and return the one with the most matching characters.
	local bestcount, bestscript, length = 0
	for i = 1, codes_len do
		local sc = codes[i]
		-- Special case for "Hants", which is a special code that represents whichever of "Hant" or "Hans" best matches, or "Hani" if they match equally. This avoids having to list all three. In addition, "Hants" will be treated as the best match if there is at least one matching character, under the assumption that a Han script is desirable in terms that contain a mix of Han and other scripts (not counting those which use Jpan or Kore).
		if sc == "Hants" then
			local Hani = get_script("Hani")
			if not Hant_chars then
				Hant_chars = load_data("Module:zh/data/ts")
				Hans_chars = load_data("Module:zh/data/st")
			end
			local t, s, found = 0, 0
			-- This is faster than using mw.ustring.gmatch directly.
			for ch in gmatch(ugsub(text, "[" .. Hani.characters .. "]", "\255%0"), "\255(.[\128-\191]*)") do
				found = true
				if Hant_chars[ch] then
					t = t + 1
					if Hans_chars[ch] then
						s = s + 1
					end
				elseif Hans_chars[ch] then
					s = s + 1
				else
					t, s = t + 1, s + 1
				end
			end

			if found then
				if t == s then
					return Hani
				end
				return get_script(t > s and "Hant" or "Hans")
			end
		else
			sc = get_script(sc)

			if not length then
				length = ulen(text)
			end

			-- Count characters by removing everything in the script's charset and comparing to the original length.
			local charset = sc.characters
			local count = charset and length - ulen(ugsub(text, "[" .. charset .. "]+", "")) or 0

			if count >= length then
				return sc
			elseif count > bestcount then
				bestcount = count
				bestscript = sc
			end
		end
	end

	-- Return best matching script, or otherwise None.
	return bestscript or get_script("None")
end

--[==[Returns a <code>Family</code> object for the language family that the language belongs to.
See [[Module:families]].]==]
function Language:getFamily()
	local family = self._familyObject
	if family == nil then
		family = self:getFamilyCode()
		-- If the value is nil, it's cached as false.
		family = family and get_family(family) or false
		self._familyObject = family
	end
	return family or nil
end

--[==[Returns the family code in the language's data file.]==]
function Language:getFamilyCode()
	local family = self._familyCode
	if family == nil then
		-- If the value is nil, it's cached as false.
		family = self._data[3] or false
		self._familyCode = family
	end
	return family or nil
end

function Language:getFamilyName()
	local family = self._familyName
	if family == nil then
		family = self:getFamily()
		-- If the value is nil, it's cached as false.
		family = family and family:getCanonicalName() or false
		self._familyName = family
	end
	return family or nil
end

do
	local function check_family(self, family)
		if type(family) == "table" then
			family = family:getCode()
		end
		if self:getFamilyCode() == family then
			return true
		end
		local self_family = self:getFamily()
		if self_family:inFamily(family) then
			return true
		-- If the family isn't a real family (e.g. creoles) check any ancestors.
		elseif self_family:inFamily("qfa-not") then
			local ancestors = self:getAncestors()
			for _, ancestor in ipairs(ancestors) do
				if ancestor:inFamily(family) then
					return true
				end
			end
		end
	end

	--[==[Check whether the language belongs to `family` (which can be a family code or object). A list of objects can be given in place of `family`; in that case, return true if the language belongs to any of the specified families.
Note that some languages (in particular, certain creoles) can have multiple immediate ancestors potentially belonging to different families; in that case, return true if the language belongs to any of the specified families.]==]
	function Language:inFamily(...)
		if self:getFamilyCode() == nil then
			return false
		end
		return check_inputs(self, check_family, false, ...)
	end
end

function Language:getParent()
	local parent = self._parentObject
	if parent == nil then
		parent = self:getParentCode()
		-- If the value is nil, it's cached as false.
		parent = parent and get_by_code(parent, nil, true, true) or false
		self._parentObject = parent
	end
	return parent or nil
end

function Language:getParentCode()
	local parent = self._parentCode
	if parent == nil then
		-- If the value is nil, it's cached as false.
		parent = self._data.parent or false
		self._parentCode = parent
	end
	return parent or nil
end

function Language:getParentName()
	local parent = self._parentName
	if parent == nil then
		parent = self:getParent()
		-- If the value is nil, it's cached as false.
		parent = parent and parent:getCanonicalName() or false
		self._parentName = parent
	end
	return parent or nil
end

function Language:getParentChain()
	local chain = self._parentChain
	if chain == nil then
		chain = {}
		local parent, n = self:getParent(), 0
		while parent do
			n = n + 1
			chain[n] = parent
			parent = parent:getParent()
		end
		self._parentChain = chain
	end
	return chain
end

do
	local function check_lang(self, lang)
		for _, parent in ipairs(self:getParentChain()) do
			if (type(lang) == "string" and lang or lang:getCode()) == parent:getCode() then
				return true
			end
		end
	end

	function Language:hasParent(...)
		return check_inputs(self, check_lang, false, ...)
	end
end

--[==[If the language is etymology-only, this iterates through parents until a full language or family is found, and the corresponding object is returned.
If the language is a full language, then it simply returns itself.]==]
function Language:getFull()
	local full = self._fullObject
	if full == nil then
		full = self:getFullCode()
		full = full == self._code and self or get_by_code(full)
		self._fullObject = full
	end
	return full
end

--[==[If the language is an etymology-only language, this iterates through parents until a full language or family is found, and the corresponding code is returned. If the language is a full language, then it simply returns the language code.]==]
function Language:getFullCode()
	return self._fullCode or self._code
end

--[==[If the language is an etymology-only language, this iterates through parents until a full language or family is found, and the corresponding canonical name is returned. If the language is a full language, then it simply returns the canonical name of the language.]==]
function Language:getFullName()
	local full = self._fullName
	if full == nil then
		full = self:getFull():getCanonicalName()
		self._fullName = full
	end
	return full
end

--[==[Returns a table of <code class="nf">Language</code> objects for all languages that this language is directly descended from.
Generally this is only a single language, but creoles, pidgins and mixed languages can have multiple ancestors.]==]
function Language:getAncestors()
	local ancestors = self._ancestorObjects
	if ancestors == nil then
		ancestors = {}
		local ancestor_codes = self:getAncestorCodes()
		if #ancestor_codes > 0 then
			for _, ancestor in ipairs(ancestor_codes) do
				insert(ancestors, get_by_code(ancestor, nil, true))
			end
		else
			local fam = self:getFamily()
			local protoLang = fam and fam:getProtoLanguage() or nil
			-- For the cases where the current language is the proto-language
			-- of its family, or an etymology-only language that is ancestral to that
			-- proto-language, we need to step up a level higher right from the
			-- start.
			if protoLang and (
				protoLang:getCode() == self._code or
				(self:hasType("etymology-only") and protoLang:hasAncestor(self))
			) then
				fam = fam:getFamily()
				protoLang = fam and fam:getProtoLanguage() or nil
			end
			while not protoLang and not (not fam or fam:getCode() == "qfa-not") do
				fam = fam:getFamily()
				protoLang = fam and fam:getProtoLanguage() or nil
			end
			insert(ancestors, protoLang)
		end
		self._ancestorObjects = ancestors
	end
	return ancestors
end

do
	-- Avoid a language being its own ancestor via class inheritance. We only need to check for this if the language has inherited an ancestor table from its parent, because we never want to drop ancestors that have been explicitly set in the data.
	-- Recursively iterate over ancestors until we either find self or run out. If self is found, return true.
	local function check_ancestor(self, lang)
		local codes = lang:getAncestorCodes()
		if not codes then
			return nil
		end
		for i = 1, #codes do
			local code = codes[i]
			if code == self._code then
				return true
			end
			local anc = get_by_code(code, nil, true)
			if check_ancestor(self, anc) then
				return true
			end
		end
	end

	--[==[Returns a table of <code class="nf">Language</code> codes for all languages that this language is directly descended from.
Generally this is only a single language, but creoles, pidgins and mixed languages can have multiple ancestors.]==]
	function Language:getAncestorCodes()
		if self._ancestorCodes then
			return self._ancestorCodes
		end
		local data = self._data
		local codes = data.ancestors
		if codes == nil then
			codes = {}
			self._ancestorCodes = codes
			return codes
		end
		codes = split(codes, ",", true, true)
		self._ancestorCodes = codes
		-- If there are no codes or the ancestors weren't inherited data, there's nothing left to check.
		if #codes == 0 or self:getData(false, "raw").ancestors ~= nil then
			return codes
		end
		local i, code = 1
		while i <= #codes do
			code = codes[i]
			if check_ancestor(self, self) then
				remove(codes, i)
			else
				i = i + 1
			end
		end
		return codes
	end
end

--[==[Given a list of language objects or codes, returns true if at least one of them is an ancestor. This includes any etymology-only children of that ancestor. If the language's ancestor(s) are etymology-only languages, it will also return true for those languages' parent(s) (e.g. if Vulgar Latin is the ancestor, it will also return true for its parent, Latin). However, a parent is excluded from this if the ancestor is also ancestral to that parent (e.g. if Classical Persian is the ancestor, Persian would return false, because Classical Persian is also ancestral to Persian).]==]
function Language:hasAncestor(...)
	local function iterateOverAncestorTree(node, func, parent_check)
		local ancestors = node:getAncestors()
		local ancestorsParents = {}
		for _, ancestor in ipairs(ancestors) do
			-- When checking the parents of the other language, and the ancestor is also a parent, skip to the next ancestor, so that we exclude any etymology-only children of that parent that are not directly related (see below).
			local ret = (parent_check or not node:hasParent(ancestor)) and
				func(ancestor) or iterateOverAncestorTree(ancestor, func, parent_check)
			if ret then
				return ret
			end
		end
		-- Check the parents of any ancestors. We don't do this if checking the parents of the other language, so that we exclude any etymology-only children of those parents that are not directly related (e.g.
		-- if the ancestor is Vulgar Latin and we are checking New Latin, we want it to return false because they are on different ancestral branches. As such, if we're already checking the parent of New Latin (Latin) we don't want to compare it to the parent of the ancestor (Latin), as this would be a false positive; it should be one or the other).
		if not parent_check then
			return nil
		end
		for _, ancestor in ipairs(ancestors) do
			local ancestorParents = ancestor:getParentChain()
			for _, ancestorParent in ipairs(ancestorParents) do
				if ancestorParent:getCode() == self._code or ancestorParent:hasAncestor(ancestor) then
					break
				else
					insert(ancestorsParents, ancestorParent)
				end
			end
		end
		for _, ancestorParent in ipairs(ancestorsParents) do
			local ret = func(ancestorParent)
			if ret then
				return ret
			end
		end
	end

	local function do_iteration(otherlang, parent_check)
		-- otherlang can't be self
		if (type(otherlang) == "string" and otherlang or otherlang:getCode()) == self._code then
			return false
		end
		repeat
			if iterateOverAncestorTree(
				self,
				function(ancestor)
					return ancestor:getCode() == (type(otherlang) == "string" and otherlang or otherlang:getCode())
				end,
				parent_check
			) then
				return true
			elseif type(otherlang) == "string" then
				otherlang = get_by_code(otherlang, nil, true)
			end
			otherlang = otherlang:getParent()
			parent_check = false
		until not otherlang
	end

	local parent_check = true
	for _, otherlang in ipairs{...} do
		local ret = do_iteration(otherlang, parent_check)
		if ret then
			return true
		end
	end
	return false
end

do
	local function construct_node(lang, memo)
		local branch, ancestors = {lang = lang:getCode()}
		memo[lang:getCode()] = branch
		for _, ancestor in ipairs(lang:getAncestors()) do
			if ancestors == nil then
				ancestors = {}
			end
			insert(ancestors, memo[ancestor:getCode()] or construct_node(ancestor, memo))
		end
		branch.ancestors = ancestors
		return branch
	end

	function Language:getAncestorChain()
		local chain = self._ancestorChain
		if chain == nil then
			chain = construct_node(self, {})
			self._ancestorChain = chain
		end
		return chain
	end
end

function Language:getAncestorChainOld()
	local chain = self._ancestorChain
	if chain == nil then
		chain = {}
		local step = self
		while true do
			local ancestors = step:getAncestors()
			step = #ancestors == 1 and ancestors[1] or nil
			if not step then
				break
			end
			insert(chain, step)
		end
		self._ancestorChain = chain
	end
	return chain
end

local function fetch_descendants(self, fmt)
	local descendants, family = {}, self:getFamily()
	-- Iterate over all three datasets.
	for _, data in ipairs{
		require("Module:languages/code to canonical name"),
		require("Module:etymology languages/code to canonical name"),
		require("Module:families/code to canonical name"),
	} do
		for code in pairs(data) do
			local lang = get_by_code(code, nil, true, true)
			-- Test for a descendant. Earlier tests weed out most candidates, while the more intensive tests are only used sparingly.
			if (
				code ~= self._code and -- Not self.
				lang:inFamily(family) and -- In the same family.
				(
					family:getProtoLanguageCode() == self._code or -- Self is the protolanguage.
					self:hasDescendant(lang) or -- Full hasDescendant check.
					(lang:getFullCode() == self._code and not self:hasAncestor(lang)) -- Etymology-only child which isn't an ancestor.
				)
			) then
				if fmt == "object" then
					insert(descendants, lang)
				elseif fmt == "code" then
					insert(descendants, code)
				elseif fmt == "name" then
					insert(descendants, lang:getCanonicalName())
				end
			end
		end
	end
	return descendants
end

function Language:getDescendants()
	local descendants = self._descendantObjects
	if descendants == nil then
		descendants = fetch_descendants(self, "object")
		self._descendantObjects = descendants
	end
	return descendants
end

function Language:getDescendantCodes()
	local descendants = self._descendantCodes
	if descendants == nil then
		descendants = fetch_descendants(self, "code")
		self._descendantCodes = descendants
	end
	return descendants
end

function Language:getDescendantNames()
	local descendants = self._descendantNames
	if descendants == nil then
		descendants = fetch_descendants(self, "name")
		self._descendantNames = descendants
	end
	return descendants
end

do
	local function check_lang(self, lang)
		if type(lang) == "string" then
			lang = get_by_code(lang, nil, true)
		end
		if lang:hasAncestor(self) then
			return true
		end
	end

	function Language:hasDescendant(...)
		return check_inputs(self, check_lang, false, ...)
	end
end

local function fetch_children(self, fmt)
	local m_etym_data = require(etymology_languages_data_module)
	local self_code, children = self._code, {}
	for code, lang in pairs(m_etym_data) do
		local _lang = lang
		repeat
			local parent = _lang.parent
			if parent == self_code then
				if fmt == "object" then
					insert(children, get_by_code(code, nil, true))
				elseif fmt == "code" then
					insert(children, code)
				elseif fmt == "name" then
					insert(children, lang[1])
				end
				break
			end
			_lang = m_etym_data[parent]
		until not _lang
	end
	return children
end

function Language:getChildren()
	local children = self._childObjects
	if children == nil then
		children = fetch_children(self, "object")
		self._childObjects = children
	end
	return children
end

function Language:getChildrenCodes()
	local children = self._childCodes
	if children == nil then
		children = fetch_children(self, "code")
		self._childCodes = children
	end
	return children
end

function Language:getChildrenNames()
	local children = self._childNames
	if children == nil then
		children = fetch_children(self, "name")
		self._childNames = children
	end
	return children
end

function Language:hasChild(...)
	local lang = ...
	if not lang then
		return false
	elseif type(lang) == "string" then
		lang = get_by_code(lang, nil, true)
	end
	if lang:hasParent(self) then
		return true
	end
	return self:hasChild(select(2, ...))
end

--[==[Returns the name of the main category of that language. Example: {{code|lua|"French language"}} for French, whose category is at [[:Category:French language]]. Unless optional argument <code>nocap</code> is given, the language name at the beginning of the returned value will be capitalized.
This capitalization is correct for category names, but not if the language name is lowercase and the returned value of this function is used in the middle of a sentence.]==]
function Language:getCategoryName(nocap)
	local name = self._categoryName
	if name == nil then
		name = self:getCanonicalName()
		-- If a substrate, omit any leading article.
		if self:getFamilyCode() == "qfa-sub" then
			name = name:gsub("^the ", ""):gsub("^a ", "")
		end
		-- Only add " language" if a full language.
		if self:hasType("full") then
			-- Unless the canonical name already ends with "language", "lect" or their derivatives, add " language".
			if not (match(name, "[Ll]anguage$") or match(name, "[Ll]ect$")) then
				name = name .. " language"
			end
		end
		self._categoryName = name
	end
	if nocap then
		return name
	end
	return mw.getContentLanguage():ucfirst(name)
end

--[==[Creates a link to the category; the link text is the canonical name.]==]
function Language:makeCategoryLink()
	return make_link(self, ":Category:" .. self:getCategoryName(), self:getDisplayForm())
end

function Language:getStandardCharacters(sc)
	local standard_chars = self._data.standardChars
	if type(standard_chars) ~= "table" then
		return standard_chars
	elseif sc and type(sc) ~= "string" then
		check_object("script", nil, sc)
		sc = sc:getCode()
	end
	if (not sc) or sc == "None" then
		local scripts = {}
		for _, script in pairs(standard_chars) do
			insert(scripts, script)
		end
		return concat(scripts)
	end
	if standard_chars[sc] then
		return standard_chars[sc] .. (standard_chars[1] or "")
	end
end

--[==[Make the entry name (i.e. the correct page name).]==]
function Language:makeEntryName(text, sc)
	if (not text) or text == "" then
		return text, nil, {}
	end

	-- Set `unsupported` as true if certain conditions are met.
	local unsupported
	-- Check if there's an unsupported character. \239\191\189 is the replacement character U+FFFD, which can't be typed directly here due to an abuse filter.
	-- Unix-style dot-slash notation is also unsupported, as it is used for relative paths in links, as are 3 or more consecutive tildes.
	-- Note: match is faster with magic characters/charsets; find is faster with plaintext.
	if (
		match(text, "[#<>%[%]_{|}]") or
		find(text, "\239\191\189") or
		match(text, "%f[^%z/]%.%.?%f[%z/]") or
		find(text, "~~~")
	) then
		unsupported = true
	-- If it looks like an interwiki link.
	elseif find(text, ":") then
		local prefix = gsub(text, "^:*(.-):.*", ulower)
		if (
			load_data("Module:data/namespaces")[prefix] or
			load_data("Module:data/interwikis")[prefix]
		) then
			unsupported = true
		end
	end

	-- Check if the text is a listed unsupported title.
	local unsupportedTitles = load_data(links_data_module).unsupported_titles
	if unsupportedTitles[text] then
		return "Unsupported titles/" .. unsupportedTitles[text], nil, {}
	end

	sc = checkScript(text, self, sc)

	local fail, cats
	text = normalize(text, sc)
	text, fail, cats = iterateSectionSubstitutions(self, text, sc, nil, nil, self._data.entry_name, "entry_name", "makeEntryName")

	text = umatch(text, "^[¿¡]?(.-[^%s%p].-)%s*[؟?!;՛՜ ՞ ՟?!︖︕।॥။၊་།]?$") or text

	-- Escape unsupported characters so they can be used in titles. ` is used as a delimiter for this, so a raw use of it in an unsupported title is also escaped here to prevent interference; this is only done with unsupported titles, though, so inclusion won't in itself mean a title is treated as unsupported (which is why it's excluded from the earlier test).
	if unsupported then
		local unsupported_characters = load_data(links_data_module).unsupported_characters
		text = text:gsub("[#<>%[%]_`{|}\239]\191?\189?", unsupported_characters)
			:gsub("%f[^%z/]%.%.?%f[%z/]", function(m)
				return gsub(m, "%.", "`period`")
			end)
			:gsub("~~~+", function(m)
				return gsub(m, "~", "`tilde`")
			end)
		text = "Unsupported titles/" .. text
	else
		-- Check if this is a mammoth page.
		-- If so, which subpage should we link to?
		local mammoth_pages = load_data(links_data_module).mammoth_pages
		if mammoth_pages[text] then
			local canonical_name = self:getCanonicalName()
			if canonical_name ~= "Translingual" and canonical_name ~= "English" then
				local this_subpage
				for _, subpage_spec in ipairs(load_data(links_data_module).mammoth_page_subpage_list) do
					-- unpack() fails utterly on data loaded using mw.loadData() even if offsets are given
					local subpage, pattern = subpage_spec[1], subpage_spec[2]
					if pattern == true or umatch(self:getCanonicalName(), pattern) then
						this_subpage = subpage
						break
					end
				end
				if not this_subpage then
					error("Internal error: Bad data in mammoth_page_subpage_list, in [[Module:links/data]]; last entry didn't have 'true' in it")
				end
				text = text .. "/" .. this_subpage
			end
		end
	end

	return text, fail, cats
end

--[==[Generates alternative forms using a specified method, and returns them as a table. If no method is specified, returns a table containing only the input term.]==]
function Language:generateForms(text, sc)
	local generate_forms = self._data.generate_forms
	if generate_forms == nil then
		return {text}
	end
	sc = checkScript(text, self, sc)
	return require("Module:" .. self._data.generate_forms).generateForms(text, self, sc)
end

--[==[Creates a sort key for the given entry name, following the rules appropriate for the language. This removes diacritical marks from the entry name if they are not considered significant for sorting, and may perform some other changes.
Any initial hyphen is also removed, and any parentheses are removed as well.

The <code>sort_key</code> setting for each language in the data modules defines the replacements made by this function, or it gives the name of the module that takes the entry name and returns a sortkey.]==]
function Language:makeSortKey(text, sc)
	if (not text) or text == "" then
		return text, nil, {}
	end
	if match(text, "<[^<>]+>") then
		track("track HTML tag")
	end
	-- Remove directional characters, bold, italics, soft hyphens, strip markers and HTML tags.
	-- FIXME: Partly duplicated with remove_formatting() in [[Module:links]].
	text = ugsub(text, "[\194\173\226\128\170-\226\128\174\226\129\166-\226\129\169]", "")
	text = text:gsub("('*)'''(.-'*)'''", "%1%2"):gsub("('*)''(.-'*)''", "%1%2")
	text = gsub(unstrip(text), "<[^<>]+>", "")

	text = decode_uri(text, "PATH")
	text = checkNoEntities(self, text)

	-- Remove initial hyphens and * unless the term only consists of spacing + punctuation characters.
	text = ugsub(text, "^([􀀀-􏿽]*)[-־ـ᠊*]+([􀀀-􏿽]*)(.*[^%s%p].*)", "%1%2%3")

	sc = checkScript(text, self, sc)

	text = normalize(text, sc)
	text = removeCarets(text, sc)

	-- For languages with dotted dotless i, ensure that "İ" is sorted as "i", and "I" is sorted as "ı".
	if self:hasDottedDotlessI() then
		text = gsub(text, "I\204\135", "i") -- decomposed "İ"
			:gsub("I", "ı")
		text = sc:toFixedNFD(text)
	end

	-- Convert to lowercase, make the sortkey, then convert to uppercase. Where the language has dotted dotless i, it is usually not necessary to convert "i" to "İ" and "ı" to "I" first, because "I" will always be interpreted as conventional "I" (not dotless "İ") by any sorting algorithms, which will have been taken into account by the sortkey substitutions themselves.
	-- However, if no sortkey substitutions have been specified, then conversion is necessary so as to prevent "i" and "ı" both being sorted as "I".
	-- An exception is made for scripts that (sometimes) sort by scraping page content, as that means they are sensitive to changes in capitalization (as it changes the target page).
	local fail, cats
	if not sc:sortByScraping() then
		text = ulower(text)
	end

	local actual_substitution_data
	-- Don't trim whitespace here because it's significant at the beginning of a sort key or sort base.
	text, fail, cats, _, actual_substitution_data = iterateSectionSubstitutions(self, text, sc, nil, nil, self._data.sort_key, "sort_key", "makeSortKey", "notrim")

	if not sc:sortByScraping() then
		if self:hasDottedDotlessI() and not actual_substitution_data then
			text = gsub(gsub(text, "ı", "I"), "i", "İ")
			text = sc:toFixedNFC(text)
		end
		text = uupper(text)
	end

	-- Remove parentheses, as long as they are either preceded or followed by something.
	text = gsub(text, "(.)[()]+", "%1"):gsub("[()]+(.)", "%1")

	text = escape_risky_characters(text)
	return text, fail, cats
end

--[==[Create the form used as a basis for display text and transliteration.]==]
local function processDisplayText(text, self, sc, keepCarets, keepPrefixes)
	local subbedChars = {}
	text, subbedChars = doTempSubstitutions(text, subbedChars, keepCarets)

	text = decode_uri(text, "PATH")
	text = checkNoEntities(self, text)

	sc = checkScript(text, self, sc)
	local fail, cats
	text = normalize(text, sc)
	text, fail, cats, subbedChars = iterateSectionSubstitutions(self, text, sc, subbedChars, keepCarets, self._data.display_text, "display_text", "makeDisplayText")

	text = removeCarets(text, sc)

	-- Remove any interwiki link prefixes (unless they have been escaped or this has been disabled).
	if find(text, ":") and not keepPrefixes then
		local rep
		repeat
			text, rep = gsub(text, "\\\\(\\*:)", "\3%1")
		until rep == 0
		text = gsub(text, "\\:", "\4")
		while true do
			local prefix = gsub(text, "^(.-):.+", function(m1)
				return gsub(m1, "\244[\128-\191]*", "")
			end)
			-- Check if the prefix is an interwiki, though ignore capitalised Wiktionary:, which is a
			-- namespace.
			if not prefix or prefix == text or prefix == "Wiktionary"
				or not (load_data("Module:data/interwikis")[ulower(prefix)] or prefix == "") then
				break
			end
			text = gsub(text, "^(.-):(.*)", function(m1, m2)
				local ret = {}
				for subbedChar in gmatch(m1, "\244[\128-\191]*") do
					insert(ret, subbedChar)
				end
				return concat(ret) .. m2
			end)
		end
		text = gsub(text, "\3", "\\"):gsub("\4", ":")
	end

	return text, fail, cats, subbedChars
end

--[==[Make the display text (i.e. what is displayed on the page).]==]
function Language:makeDisplayText(text, sc, keepPrefixes)
	if (not text) or text == "" then
		return text, nil, {}
	end

	local fail, cats, subbedChars
	text, fail, cats, subbedChars = processDisplayText(text, self, sc, nil, keepPrefixes)

	text = escape_risky_characters(text)
	return undoTempSubstitutions(text, subbedChars), fail, cats
end

--[==[Transliterates the text from the given script into the Latin script (see [[Wiktionary:Transliteration and romanization]]). The language must have the <code>translit</code> property for this to work; if it is not present, {{code|lua|nil}} is returned.

Returns three values:
# The transliteration.
# A boolean which indicates whether the transliteration failed for an unexpected reason. If {{code|lua|false}}, then the transliteration either succeeded, or the module is returning nothing in a controlled way (e.g. the input was {{code|lua|"-"}}). Generally, this means that no maintenance action is required. If {{code|lua|true}}, then the transliteration is {{code|lua|nil}} because either the input or output was defective in some way (e.g. [[Module:ar-translit]] will not transliterate non-vocalised inputs, and this module will fail partially-completed transliterations in all languages).
Note that this value can be manually set by the transliteration module, so make sure to cross-check to ensure it is accurate.
# A table of categories selected by the transliteration module, which should be in the format expected by {{code|lua|format_categories}} in [[Module:utilities]].

The <code>sc</code> parameter is handled by the transliteration module, and how it is handled is specific to that module. Some transliteration modules may tolerate {{code|lua|nil}} as the script, others require it to be one of the possible scripts that the module can transliterate, and will show an error if it's not one of them. For this reason, the <code>sc</code> parameter should always be provided when writing non-language-specific code.

The <code>module_override</code> parameter is used to override the default module that is used to provide the transliteration. This is useful in cases where you need to demonstrate a particular module in use, but there is no default module yet, or you want to demonstrate an alternative version of a transliteration module before making it official. It should not be used in real modules or templates, only for testing. All uses of this parameter are tracked by [[Wiktionary:Tracking/languages/module_override]].

'''Known bugs''':
* This function assumes {tr(s1) .. tr(s2) == tr(s1 .. s2)}.
When this assertion fails, wikitext markups like <nowiki>'''</nowiki> can cause wrong transliterations.
* HTML entities like <code>&amp;apos;</code>, often used to escape wikitext markups, do not work.]==]
function Language:transliterate(text, sc, module_override)
	-- If there is no text, or the language doesn't have transliteration data and there's no override, return nil.
	if not (self._data.translit or module_override) then
		return nil, false, {}
	elseif (not text) or text == "" or text == "-" then
		return text, false, {}
	end
	-- If the script is not transliteratable (and no override is given), return nil.
	sc = checkScript(text, self, sc)
	if not (sc:isTransliterated() or module_override) then
		-- temporary tracking to see if/when this gets triggered
		track("non-transliterable")
		track("non-transliterable/" .. self._code)
		track("non-transliterable/" .. sc:getCode())
		track("non-transliterable/" .. sc:getCode() .. "/" .. self._code)
		return nil, true, {}
	end

	-- Remove any strip markers.
	text = unstrip(text)

	-- Do not process the formatting into PUA characters for certain languages.
	local processed = load_data(languages_data_module).substitution[self._code] ~= "none"

	-- Get the display text with the keepCarets flag set.
	local fail, cats, subbedChars
	if processed then
		text, fail, cats, subbedChars = processDisplayText(text, self, sc, true)
	end

	-- Transliterate (using the module override if applicable).
	text, fail, cats, subbedChars = iterateSectionSubstitutions(self, text, sc, subbedChars, true, module_override or self._data.translit, "translit", "tr")

	if not text then
		return nil, true, cats
	end

	-- Incomplete transliterations return nil.
	local charset = sc.characters
	if charset and umatch(text, "[" .. charset .. "]") then
		-- Remove any characters in Latin, which includes Latin characters also included in other scripts (as these are false positives), as well as any PUA substitutions. Anything remaining should only be script code "None" (e.g.
		-- numerals).
		local check_text = ugsub(text, "[" .. get_script("Latn").characters .. "􀀀-􏿽]+", "")
		-- Set none_is_last_resort_only flag, so that any non-None chars will cause a script other than "None" to be returned.
		if find_best_script_without_lang(check_text, true):getCode() ~= "None" then
			return nil, true, cats
		end
	end

	if processed then
		text = escape_risky_characters(text)
		text = undoTempSubstitutions(text, subbedChars)
	end

	-- If the script does not use capitalization, then capitalize any letters of the transliteration which are immediately preceded by a caret (and remove the caret).
	if text and not sc:hasCapitalization() and text:find("^", 1, true) then
		text = processCarets(text, "%^([\128-\191\244]*%*?)([^\128-\191\244][\128-\191]*)", function(m1, m2)
			return m1 .. uupper(m2)
		end)
	end

	-- Track module overrides.
	if module_override ~= nil then
		track("module_override")
	end

	fail = text == nil and (not not fail) or false

	return text, fail, cats
end

do
	local function handle_language_spec(self, spec, sc)
		local ret = self["_" .. spec]
		if ret == nil then
			ret = self._data[spec]
			if type(ret) == "string" then
				ret = list_to_set(split(ret, ",", true, true))
			end
			self["_" .. spec] = ret
		end
		if type(ret) == "table" then
			ret = ret[sc:getCode()]
		end
		return not not ret
	end

	function Language:overrideManualTranslit(sc)
		return handle_language_spec(self, "override_translit", sc)
	end

	function Language:link_tr(sc)
		return handle_language_spec(self, "link_tr", sc)
	end
end

--[==[Returns {{code|lua|true}} if the language has a transliteration module, or {{code|lua|false}} if it doesn't.]==]
function Language:hasTranslit()
	return not not self._data.translit
end

--[==[Returns {{code|lua|true}} if the language uses the letters I/ı and İ/i, or {{code|lua|false}} if it
doesn't.]==]
function Language:hasDottedDotlessI()
	return not not self._data.dotted_dotless_i
end

function Language:toJSON(opts)
	local entry_name, entry_name_patterns, entry_name_remove_diacritics = self._data.entry_name
	if entry_name then
		if entry_name.from then
			entry_name_patterns = {}
			for i, from in ipairs(entry_name.from) do
				insert(entry_name_patterns, {from = from, to = entry_name.to[i] or ""})
			end
		end
		entry_name_remove_diacritics = entry_name.remove_diacritics
	end
	-- mainCode should only end up non-nil if dontCanonicalizeAliases is passed to make_object().
	-- props should either contain zero-argument functions to compute the value, or the value itself.
	local props = {
		ancestors = function() return self:getAncestorCodes() end,
		canonicalName = function() return self:getCanonicalName() end,
		categoryName = function() return self:getCategoryName("nocap") end,
		code = self._code,
		mainCode = self._mainCode,
		parent = function() return self:getParentCode() end,
		full = function() return self:getFullCode() end,
		entryNamePatterns = entry_name_patterns,
		entryNameRemoveDiacritics = entry_name_remove_diacritics,
		family = function() return self:getFamilyCode() end,
		aliases = function() return self:getAliases() end,
		varieties = function() return self:getVarieties() end,
		otherNames = function() return self:getOtherNames() end,
		scripts = function() return self:getScriptCodes() end,
		type = function() return keys_to_list(self:getTypes()) end,
		wikimediaLanguages = function() return self:getWikimediaLanguageCodes() end,
		wikidataItem = function() return self:getWikidataItem() end,
		wikipediaArticle = function() return self:getWikipediaArticle(true) end,
	}
	local ret = {}
	for prop, val in pairs(props) do
		if not opts.skip_fields or not opts.skip_fields[prop] then
			if type(val) == "function" then
				ret[prop] = val()
			else
				ret[prop] = val
			end
		end
	end
	-- Use `deep_copy` when returning a table, so that there are no editing restrictions imposed by
	-- `mw.loadData`.
	return opts and opts.lua_table and deep_copy(ret) or to_json(ret, opts)
end

function export.getDataModuleName(code)
	local letter = match(code, "^(%l)%l%l?$")
	return "Module:" .. (
		letter == nil and "languages/data/exceptional" or
		#code == 2 and "languages/data/2" or
		"languages/data/3/" .. letter
	)
end
get_data_module_name = export.getDataModuleName

function export.getExtraDataModuleName(code)
	return get_data_module_name(code) .. "/extra"
end
get_extra_data_module_name = export.getExtraDataModuleName

do
	local function make_stack(data)
		local key_types = {
			[2] = "unique",
			aliases = "unique",
			otherNames = "unique",
			type = "append",
			varieties = "unique",
			wikipedia_article = "unique",
			wikimedia_codes = "unique"
		}

		local function __index(self, k)
			local stack, key_type = getmetatable(self), key_types[k]
			-- Data that isn't inherited from the parent.
			if key_type == "unique" then
				local v = stack[stack[make_stack]][k]
				if v == nil then
					local layer = stack[0]
					if layer then -- Could be false if there's no extra data.
						v = layer[k]
					end
				end
				return v
			-- Data that is appended by each generation.
			elseif key_type == "append" then
				local parts, offset, n = {}, 0, stack[make_stack]
				for i = 1, n do
					local part = stack[i][k]
					if part == nil then
						offset = offset + 1
					else
						parts[i - offset] = part
					end
				end
				return offset ~= n and concat(parts, ",") or nil
			end
			local n = stack[make_stack]
			while true do
				local layer = stack[n]
				if not layer then -- Could be false if there's no extra data.
					return nil
				end
				local v = layer[k]
				if v ~= nil then
					return v
				end
				n = n - 1
			end
		end

		local function __newindex()
			error("table is read-only")
		end

		local function __pairs(self)
			-- Iterate down the stack, caching keys to avoid duplicate returns.
			local stack, seen = getmetatable(self), {}
			local n = stack[make_stack]
			local iter, state, k, v = pairs(stack[n])
			return function()
				repeat
					repeat
						k = iter(state, k)
						if k == nil then
							n = n - 1
							local layer = stack[n]
							if not layer then -- Could be false if there's no extra data.
								return nil
							end
							iter, state, k = pairs(layer)
						end
					until not (k == nil or seen[k])
					-- Get the value via a lookup, as the one returned by the
					-- iterator will be the raw value from the current layer,
					-- which may not be the one __index will return for that
					-- key.
					-- Also memoize the key in `seen` (even if the lookup
					-- returns nil) so that it doesn't get looked up again.
					-- TODO: store values in `self`, avoiding the need to create
					-- the `seen` table. The iterator will need to iterate over
					-- `self` with `next` first to find these on future loops.
					v, seen[k] = self[k], true
				until v ~= nil
				return k, v
			end
		end

		local __ipairs = require(table_module).indexIpairs

		function make_stack(data)
			local stack = {
				data,
				[make_stack] = 1, -- stores the length and acts as a sentinel to confirm a given metatable is a stack.
				__index = __index,
				__newindex = __newindex,
				__pairs = __pairs,
				__ipairs = __ipairs,
			}
			stack.__metatable = stack
			return setmetatable({}, stack), stack
		end

		return make_stack(data)
	end

	local function get_stack(data)
		local stack = getmetatable(data)
		return stack and type(stack) == "table" and stack[make_stack] and stack or nil
	end

	--[==[
	<span style="color: var(--wikt-palette-red,#BA0000)">This function is not for use in entries or other content pages.</span>
	Returns a blob of data about the language. The format of this blob is undocumented, and perhaps unstable; it's intended for things like the module's own unit-tests, which are "close friends" with the module and will be kept up-to-date as the format changes. If `extra` is set, any extra data in the relevant `/extra` module will be included. (Note that it will be included anyway if it has already been loaded into the language object.) If `raw` is set, then the returned data will not contain any data inherited from parent objects.
	-- Do NOT use these methods!
	-- All uses should be pre-approved on the talk page!
	]==]
	function Language:getData(extra, raw)
		if extra then
			self:loadInExtraData()
		end
		local data = self._data
		-- If raw is not set, just return the data.
		if not raw then
			return data
		end
		local stack = get_stack(data)
		-- If there isn't a stack or its length is 1, return the data.
		-- Extra data (if any) will be included, as it's stored at key 0 and doesn't affect the reported length.
		if stack == nil then
			return data
		end
		local n = stack[make_stack]
		if n == 1 then
			return data
		end
		local extra = stack[0]
		-- If there isn't any extra data, return the top layer of the stack.
		if extra == nil then
			return stack[n]
		end
		-- If there is, return a new stack which has the top layer at key 1 and the extra data at key 0.
		data, stack = make_stack(stack[n])
		stack[0] = extra
		return data
	end

	function Language:loadInExtraData()
		-- Only full languages have extra data.
		if not self:hasType("language", "full") then
			return
		end
		local data = self._data
		-- If there's no stack, create one.
		local stack = get_stack(self._data)
		if stack == nil then
			data, stack = make_stack(data)
		-- If already loaded, return.
		elseif stack[0] ~= nil then
			return
		end
		self._data = data
		-- Load extra data from the relevant module and add it to the stack at key 0, so that the __index and __pairs metamethods will pick it up, since they iterate down the stack until they run out of layers.
		local code = self._code
		local modulename = get_extra_data_module_name(code)
		-- No data cached as false.
		stack[0] = modulename and load_data(modulename)[code] or false
	end

	--[==[Returns the name of the module containing the language's data. For etymology-only languages this is the etymology languages data module; for all other languages it is the module that `export.getDataModuleName` selects from the language's code.]==]
	function Language:getDataModuleName()
		local name = self._dataModuleName
		if name == nil then
			name = self:hasType("etymology-only") and etymology_languages_data_module or
				get_data_module_name(self._mainCode or self._code)
			self._dataModuleName = name
		end
		return name
	end

	--[==[Returns the name of the module containing the language's data.
	More precisely, this is the language's extra data module (the `/extra` subpage of its main data module), or {{code|lua|nil}} for etymology-only languages, which have no extra data module.]==]
	function Language:getExtraDataModuleName()
		local name = self._extraDataModuleName
		if name == nil then
			name = not self:hasType("etymology-only") and get_extra_data_module_name(self._mainCode or self._code) or false
			self._extraDataModuleName = name
		end
		return name or nil
	end

	function export.makeObject(code, data, dontCanonicalizeAliases)
		local data_type = type(data)
		if data_type ~= "table" then
			error(("bad argument #2 to 'makeObject' (table expected, got %s)"):format(data_type))
		end
		-- Convert any aliases.
		local input_code = code
		code = normalize_code(code)
		input_code = dontCanonicalizeAliases and input_code or code
		local parent
		if data.parent then
			parent = get_by_code(data.parent, nil, true, true)
		else
			parent = Language
		end
		parent.__index = parent
		local lang = {_code = input_code}
		-- This can only happen if dontCanonicalizeAliases is passed to make_object().
		if code ~= input_code then
			lang._mainCode = code
		end
		local parent_data = parent._data
		if parent_data == nil then
			-- Full code is the same as the code.
			lang._fullCode = parent._code or code
		else
			-- Copy full code.
			lang._fullCode = parent._fullCode
			local stack = get_stack(parent_data)
			if stack == nil then
				parent_data, stack = make_stack(parent_data)
			end
			-- Insert the input data as the new top layer of the stack.
			local n = stack[make_stack] + 1
			data, stack[n], stack[make_stack] = parent_data, data, n
		end
		lang._data = data
		return setmetatable(lang, parent)
	end
	make_object = export.makeObject
end

--[==[Finds the language whose code matches the one provided. If it exists, it returns a <code class="nf">Language</code> object representing the language. Otherwise, it returns {{code|lua|nil}}, unless <code class="n">paramForError</code> is given, in which case an error is generated. If <code class="n">paramForError</code> is {{code|lua|true}}, a generic error message mentioning the bad code is generated; otherwise <code class="n">paramForError</code> should be a string or number specifying the parameter that the code came from, and this parameter will be mentioned in the error message along with the bad code.
If <code class="n">allowEtymLang</code> is specified, etymology-only language codes are allowed and looked up along with normal language codes. If <code class="n">allowFamily</code> is specified, language family codes are allowed and looked up along with normal language codes.]==]
function export.getByCode(code, paramForError, allowEtymLang, allowFamily)
	-- Track uses of paramForError, ultimately so it can be removed, as error-handling should be done by [[Module:parameters]], not here.
	if paramForError ~= nil then
		track("paramForError")
	end
	if type(code) ~= "string" then
		local typ
		if not code then
			typ = "nil"
		elseif check_object("language", true, code) then
			typ = "a language object"
		elseif check_object("family", true, code) then
			typ = "a family object"
		else
			typ = "a " .. type(code)
		end
		error("The function getByCode expects a string as its first argument, but received " .. typ .. ".")
	end
	local m_data = load_data(languages_data_module)
	if m_data.aliases[code] or m_data.track[code] then
		track(code)
	end
	local norm_code = normalize_code(code)
	-- Get the data, checking for etymology-only languages if allowEtymLang is set.
	local data = load_data(get_data_module_name(norm_code))[norm_code] or
		allowEtymLang and load_data(etymology_languages_data_module)[norm_code]
	-- If no data was found and allowFamily is set, check the family data. If the main family data was found, make the object with [[Module:families]] instead, as family objects have different methods.
	-- However, if it's an etymology-only family, use make_object in this module (which handles object inheritance), and the family-specific methods will be inherited from the parent object.
	if data == nil and allowFamily then
		data = load_data("Module:families/data")[norm_code]
		if data ~= nil then
			if data.parent == nil then
				return make_family_object(norm_code, data)
			elseif not allowEtymLang then
				data = nil
			end
		end
	end
	local retval = code and data and make_object(code, data)
	if not retval and paramForError then
		require("Module:languages/errorGetBy").code(code, paramForError, allowEtymLang, allowFamily)
	end
	return retval
end
get_by_code = export.getByCode

--[==[Finds the language whose canonical name (the name used to represent that language on Wiktionary) or other name matches the one provided. If it exists, it returns a <code class="nf">Language</code> object representing the language. Otherwise, it returns {{code|lua|nil}}, unless <code class="n">paramForError</code> is given, in which case an error is generated. If <code class="n">allowEtymLang</code> is specified, etymology-only language codes are allowed and looked up along with normal language codes. If <code class="n">allowFamily</code> is specified, language family codes are allowed and looked up along with normal language codes.

The canonical name of languages should always be unique (it is an error for two languages on Wiktionary to share the same canonical name), so this is guaranteed to give at most one result.

This function is powered by [[Module:languages/canonical names]], which contains a pre-generated mapping of full-language canonical names to codes. It is generated by going through the [[:Category:Language data modules]] for full languages.
When <code class="n">allowEtymLang</code> is specified for the above function, [[Module:etymology languages/canonical names]] may also be used, and when <code class="n">allowFamily</code> is specified for the above function, [[Module:families/canonical names]] may also be used.]==]
function export.getByCanonicalName(name, errorIfInvalid, allowEtymLang, allowFamily)
	local byName = load_data("Module:languages/canonical names")
	local code = byName and byName[name]
	if not code and allowEtymLang then
		byName = load_data("Module:etymology languages/canonical names")
		code = byName and byName[name] or
			byName[gsub(name, " [Ss]ubstrate$", "")] or
			byName[gsub(name, "^a ", "")] or
			byName[gsub(name, "^a ", ""):gsub(" [Ss]ubstrate$", "")] or
			-- For etymology families like "ira-pro".
			-- FIXME: This is not ideal, as it allows " languages" to be appended to any etymology-only language, too.
			byName[match(name, "^(.*) languages$")]
	end
	if not code and allowFamily then
		byName = load_data("Module:families/canonical names")
		code = byName[name] or byName[match(name, "^(.*) languages$")]
	end
	local retval = code and get_by_code(code, errorIfInvalid, allowEtymLang, allowFamily)
	if not retval and errorIfInvalid then
		require("Module:languages/errorGetBy").canonicalName(name, allowEtymLang, allowFamily)
	end
	return retval
end

--[==[Used by [[Module:languages/data/2]] (et al.) and [[Module:etymology languages/data]], [[Module:families/data]], [[Module:scripts/data]] and [[Module:writing systems/data]] to finalize the data into the format that is actually returned.]==]
function export.finalizeData(data, main_type, variety)
	local fields = {"type"}
	if main_type == "language" then
		insert(fields, 4) -- script codes
		insert(fields, "ancestors")
		insert(fields, "link_tr")
		insert(fields, "override_translit")
		insert(fields, "wikimedia_codes")
	elseif main_type == "script" then
		insert(fields, 3) -- writing system codes
	end
	-- Families and writing systems have no extra fields to process.
	local fields_len = #fields
	for _, entity in next, data do
		if variety then
			-- Move parent from 3 to "parent" and family from "family" to 3.
			-- These are different for the sake of convenience, since very few varieties have the family specified, whereas all of them have a parent.
			entity.parent, entity[3], entity.family = entity[3], entity.family
		-- Give the type "regular" iff not a variety and no other types are assigned.
		elseif not (entity.type or entity.parent) then
			entity.type = "regular"
		end
		for i = 1, fields_len do
			local key = fields[i]
			local field = entity[key]
			if field and type(field) == "string" then
				entity[key] = gsub(field, "%s*,%s*", ",")
			end
		end
	end
	return data
end

--[==[For backwards compatibility only; modules should require the error themselves.]==]
function export.err(lang_code, param, code_desc, template_tag, not_real_lang)
	return require("Module:languages/error")(lang_code, param, code_desc, template_tag, not_real_lang)
end

return export
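
As a standalone illustration of how export.getDataModuleName routes a language code to a data module, the sketch below repeats the same pattern test outside the module. It assumes that the module's local `match` is an alias of `string.match` (the alias is not defined in this part of the source); the name `data_module_name` and the sample codes are chosen for illustration only.

```lua
-- Hypothetical standalone copy of the routing logic; not part of the module.
local function data_module_name(code)
	-- Matches only codes made of two or three lowercase ASCII letters,
	-- capturing the first letter.
	local letter = string.match(code, "^(%l)%l%l?$")
	return "Module:" .. (
		-- Anything else (hyphenated codes, uppercase, etc.) is "exceptional".
		letter == nil and "languages/data/exceptional" or
		-- Two-letter codes all share one data module.
		#code == 2 and "languages/data/2" or
		-- Three-letter codes are split by their first letter.
		"languages/data/3/" .. letter
	)
end

print(data_module_name("en"))      -- Module:languages/data/2
print(data_module_name("ang"))     -- Module:languages/data/3/a
print(data_module_name("gem-pro")) -- Module:languages/data/exceptional
```

Appending "/extra" to any of these results gives the name that export.getExtraDataModuleName returns for the same code.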
Retrieved from "https://en.wiktionary.org/w/index.php?title=Module:languages&oldid=88609244"