In computer programming languages, an identifier is a lexical token (also called a symbol, but not to be confused with the symbol primitive data type) that names the language's entities. Some of the kinds of entities an identifier might denote include variables, data types, labels, subroutines, and modules.
Which character sequences constitute identifiers depends on the lexical grammar of the language. A common rule is alphanumeric sequences, with underscore also allowed (in some languages, _ is not allowed), and with the condition that an identifier cannot begin with a numerical digit (to simplify lexing by avoiding confusion with integer literals). So foo, foo1, foo_bar, and _foo are allowed, but 1foo is not. This is the definition used in earlier versions of C and C++, Python, and many other languages. Later versions of these languages, along with many other modern languages, support many more Unicode characters in an identifier. However, a common restriction is not to permit whitespace characters and language operators; this simplifies tokenization by making it free-form and context-free. For example, forbidding + in identifiers due to its use as a binary operation means that a+b and a + b can be tokenized the same, while if it were allowed, a+b would be an identifier, not an addition. Whitespace in an identifier is particularly problematic, because if spaces are allowed in identifiers, then a clause such as if rainy day then 1 is legal, with rainy day as an identifier, and tokenizing this requires the phrasal context of being in the condition of an if clause.

Some languages do allow spaces in identifiers, however, such as ALGOL 68 and some ALGOL variants. For example, the following is a valid statement: real half pi; which could be entered as .real. half pi; (keywords are represented in boldface, concretely via stropping). In ALGOL this was possible because keywords are syntactically differentiated, so there is no risk of collision or ambiguity; spaces are eliminated during the line reconstruction phase; and the source was processed via scannerless parsing, so lexing could be context-sensitive.
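As a minimal sketch, assuming the classic ASCII-only rule (letter or underscore first, then letters, digits, or underscores) and a deliberately toy token set, the following Python checks whether a string is an identifier and shows why forbidding + inside identifiers lets a+b and a + b tokenize identically. The names IDENTIFIER, TOKEN, is_identifier, and tokenize are hypothetical and not taken from any real compiler.

```python
import re

# Classic identifier rule (ASCII only, as in early C): a letter or underscore,
# followed by any number of letters, digits, or underscores.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# Toy token pattern, tried left to right at each position:
# identifier | integer literal | any single non-whitespace character.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|[0-9]+|\S")


def is_identifier(text: str) -> bool:
    """Return True if the whole string matches the classic identifier rule."""
    return IDENTIFIER.fullmatch(text) is not None


def tokenize(source: str) -> list[str]:
    """Split source into tokens; whitespace only separates tokens."""
    return TOKEN.findall(source)


for word in ["foo", "foo1", "foo_bar", "_foo", "1foo"]:
    print(word, is_identifier(word))

print(tokenize("a+b"))    # ['a', '+', 'b']
print(tokenize("a + b"))  # ['a', '+', 'b']
print(tokenize("1foo"))   # ['1', 'foo']: an integer literal, then an identifier
```

Because + can never appear inside an identifier here, the lexer never needs to look at the surrounding context to decide where a+b splits.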
In most languages, some character sequences have the lexical form of an identifier but are known as keywords. For example, if is frequently a keyword for an if clause, but lexically it has the same form as ig or foo, namely a sequence of letters. This overlap can be handled in various ways: keywords may be forbidden from being identifiers, which simplifies tokenization and parsing, in which case they are reserved words; both uses may be allowed but distinguished in other ways, such as via stropping; or keyword sequences may be allowed as identifiers, with the intended sense determined from context, which requires a context-sensitive lexer. Non-keywords may also be reserved words (forbidden as identifiers), particularly for forward compatibility, in case a word may become a keyword in the future. In a few languages, e.g., PL/I, the distinction is not clear.
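Python makes the overlap between keywords and identifiers easy to observe: the built-in str.isidentifier() checks only the lexical form, so it accepts if, while the standard keyword module reports which of those words are reserved. A small sketch, where the classify function is hypothetical and Python is used only as one example language:

```python
import keyword


def classify(word: str) -> str:
    """Classify a word under Python's lexical rules."""
    if not word.isidentifier():
        # Fails the lexical form itself (e.g. starts with a digit).
        return "not an identifier"
    if keyword.iskeyword(word):
        # Has identifier form, but is reserved for the language.
        return "reserved word"
    return "identifier"


for w in ["if", "ig", "foo", "1foo"]:
    print(w, "->", classify(w))
```

Here if and ig are lexically indistinguishable; only the keyword table separates them, which is the "forbidden from being identifiers" strategy described above.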
The scope, or accessibility within a program, of an identifier can be either local or global. A global identifier is declared outside of functions and is available throughout the program. A local identifier is declared within a specific function and is only available within that function.[1]
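A minimal Python illustration of the two cases follows. Note the assumptions: in Python, "global" means module-level rather than whole-program, and the global statement is a Python-specific requirement for rebinding a module-level name inside a function. The names counter, increment, and step are illustrative only.

```python
counter = 0  # global identifier: declared at module level


def increment() -> int:
    global counter  # Python-specific: needed to rebind the module-level name
    step = 1        # local identifier: exists only inside increment()
    counter += step
    return counter


print(increment())          # 1
print(increment())          # 2
print("step" in globals())  # False: the local name never reaches module scope
```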
For implementations of programming languages that use a compiler, identifiers are often only compile-time entities. That is, at run time the compiled program contains references to memory addresses and offsets rather than the textual identifier tokens (these memory addresses, or offsets, having been assigned by the compiler to each identifier).
In languages that support reflection, such as interactive evaluation of source code (using an interpreter or an incremental compiler), identifiers are also runtime entities, sometimes even as first-class objects that can be freely manipulated and evaluated. In Lisp, these are called symbols.
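As an illustration of identifiers surviving to run time, the sketch below uses Python's reflection built-ins getattr, setattr, and vars to read, create, and list attributes by name. Unlike Lisp symbols, Python represents these names as ordinary strings rather than a dedicated symbol type; the Point class is hypothetical.

```python
class Point:
    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y


p = Point(1.0, 2.0)
name = "x"               # an identifier held as ordinary runtime data
print(getattr(p, name))  # 1.0: look up the attribute by its name
setattr(p, "z", 3.0)     # create a new attribute by name at run time
print(sorted(vars(p)))   # ['x', 'y', 'z']: attribute names are still present
```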
Compilers and interpreters do not usually assign any semantic meaning to an identifier based on the actual character sequence used. However, there are exceptions. For example:
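In some languages, such as Go, identifiers' uniqueness is based on their spelling and their visibility. In Go, an identifier whose first character is an uppercase letter is exported (visible outside its package), and two identifiers are considered different if they are spelled differently, or if they appear in different packages and are not exported.[2]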
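Go's rule can be modelled in a few lines. The following is a simplified Python sketch, not Go's actual implementation. It assumes the Go specification's definitions: an identifier is exported if its first character is a Unicode uppercase letter (category Lu), and two identifiers are different if they are spelled differently, or if they appear in different packages and are not exported. The helper names is_exported and same_identifier are hypothetical, and the sketch ignores the specification's further condition on where an exported identifier must be declared.

```python
import unicodedata


def is_exported(name: str) -> bool:
    """Simplified Go export check: first character is in Unicode category Lu.

    Ignores Go's additional requirement about where the name is declared.
    """
    return bool(name) and unicodedata.category(name[0]) == "Lu"


def same_identifier(name_a: str, pkg_a: str, name_b: str, pkg_b: str) -> bool:
    """Model of Go's uniqueness rule for two identifiers."""
    if name_a != name_b:
        return False  # spelled differently: always different
    if pkg_a != pkg_b and not is_exported(name_a):
        return False  # same spelling, different packages, unexported
    return True


print(same_identifier("count", "a", "count", "b"))  # False
print(same_identifier("Count", "a", "Count", "b"))  # True
```

Here the spelling of the first character alone changes whether two otherwise identical names refer to the same identifier, an exception to the general rule that character sequences carry no semantic meaning.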
In HTML, an identifier (the id attribute) is one of the possible attributes of an HTML element. Its value must be unique within the document.
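Because HTML parsers do not reject duplicate id values, uniqueness is usually checked separately. A minimal sketch using Python's standard html.parser module is shown below; the IdCollector class and duplicate_ids function are hypothetical helpers, not part of any HTML validator.

```python
from collections import Counter
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Record the value of every id attribute seen in start tags."""

    def __init__(self) -> None:
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        for name, value in attrs:
            if name == "id" and value is not None:
                self.ids.append(value)


def duplicate_ids(html: str) -> list[str]:
    """Return id values that occur more than once, sorted."""
    collector = IdCollector()
    collector.feed(html)
    collector.close()
    return sorted(i for i, n in Counter(collector.ids).items() if n > 1)


print(duplicate_ids('<p id="a"></p><div id="b"></div><span id="a"></span>'))  # ['a']
```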