1. Introduction
This reference manual describes the Python programming language. It is not intended as a tutorial.
While I am trying to be as precise as possible, I chose to use English rather than formal specifications for everything except syntax and lexical analysis. This should make the document more understandable to the average reader, but will leave room for ambiguities. Consequently, if you were coming from Mars and tried to re-implement Python from this document alone, you might have to guess things and in fact you would probably end up implementing quite a different language. On the other hand, if you are using Python and wonder what the precise rules about a particular area of the language are, you should definitely be able to find them here. If you would like to see a more formal definition of the language, maybe you could volunteer your time, or invent a cloning machine :-).
It is dangerous to add too many implementation details to a language reference document: the implementation may change, and other implementations of the same language may work differently. On the other hand, CPython is the one Python implementation in widespread use (although alternate implementations continue to gain support), and its particular quirks are sometimes worth mentioning, especially where the implementation imposes additional limitations. Therefore, you'll find short "implementation notes" sprinkled throughout the text.
Every Python implementation comes with a number of built-in and standard modules. These are documented in The Python Standard Library. A few built-in modules are mentioned when they interact in a significant way with the language definition.
1.1. Alternate Implementations
Though there is one Python implementation which is by far the most popular, there are some alternate implementations which are of particular interest to different audiences.
Known implementations include:
- CPython
This is the original and most-maintained implementation of Python, written in C. New language features generally appear here first.
- Jython
Python implemented in Java. This implementation can be used as a scripting language for Java applications, or can be used to create applications using the Java class libraries. It is also often used to create tests for Java libraries. More information can be found at the Jython website.
- Python for .NET
This implementation actually uses the CPython implementation, but is a managed .NET application and makes .NET libraries available. It was created by Brian Lloyd. For more information, see the Python for .NET home page.
- IronPython
An alternate Python for .NET. Unlike Python.NET, this is a complete Python implementation that generates IL, and compiles Python code directly to .NET assemblies. It was created by Jim Hugunin, the original creator of Jython. For more information, see the IronPython website.
- PyPy
An implementation of Python written completely in Python. It supports several advanced features not found in other implementations, like stackless support and a Just in Time compiler. One of the goals of the project is to encourage experimentation with the language itself by making it easier to modify the interpreter (since it is written in Python). Additional information is available on the PyPy project's home page.
Each of these implementations varies in some way from the language as documented in this manual, or introduces specific information beyond what's covered in the standard Python documentation. Please refer to the implementation-specific documentation to determine what else you need to know about the specific implementation you're using.
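Before consulting implementation-specific documentation, it can help to confirm which implementation is actually running a program. The following is a minimal sketch using the standard `platform` and `sys` modules; the variable names are illustrative only, and the exact strings reported depend on the interpreter in use.

```python
import platform
import sys

# Name of the running implementation, for example "CPython" or "PyPy".
implementation = platform.python_implementation()

# Version of the Python language the interpreter reports, as a string.
version = platform.python_version()

# sys.implementation.name is a lowercase identifier for the same information.
print(f"{implementation} {version} (sys.implementation.name = {sys.implementation.name!r})")
```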
1.2. Notation
The descriptions of lexical analysis and syntax use a grammar notation that is a mixture of EBNF and PEG. For example:
name:   letter (letter | digit | "_")*
letter: "a"..."z" | "A"..."Z"
digit:  "0"..."9"
In this example, the first line says that a name is a letter followed by a sequence of zero or more letters, digits, and underscores. A letter in turn is any of the single characters 'a' through 'z' and 'A' through 'Z'; a digit is a single character from '0' to '9'.
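To make the reading of the example rule concrete, here is a minimal sketch of a hand-written matcher for it. The function name `is_name` and the constants are hypothetical, chosen for illustration. Note that this follows the example grammar only; it is not Python's actual identifier rule, which is defined in the lexical analysis chapter.

```python
import string

# letter: "a"..."z" | "A"..."Z"
LETTERS = set(string.ascii_lowercase + string.ascii_uppercase)
# digit: "0"..."9"
DIGITS = set(string.digits)


def is_name(text: str) -> bool:
    """Return True if all of `text` matches the example `name` rule."""
    # name: letter (letter | digit | "_")*
    # The first character must be a letter.
    if not text or text[0] not in LETTERS:
        return False
    # Zero or more further characters, each a letter, a digit, or "_".
    return all(ch in LETTERS or ch in DIGITS or ch == "_" for ch in text[1:])


print(is_name("a1_b"))  # matches: letter, then digit, "_", letter
print(is_name("_a"))    # does not match: the example rule requires a leading letter
```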
Each rule begins with a name (which identifies the rule that's being defined) followed by a colon, ":". The definition to the right of the colon uses the following syntax elements:
- name: A name refers to another rule. Where possible, it is a link to the rule's definition.
  - TOKEN: An uppercase name refers to a token. For the purposes of grammar definitions, tokens are the same as rules.
- "text", 'text': Text in single or double quotes must match literally (without the quotes). The type of quote is chosen according to the meaning of text:
  - 'if': A name in single quotes denotes a keyword.
  - "case": A name in double quotes denotes a soft-keyword.
  - '@': A non-letter symbol in single quotes denotes an OP token, that is, a delimiter or operator.
- e1 e2: Items separated only by whitespace denote a sequence. Here, e1 must be followed by e2.
- e1 | e2: A vertical bar is used to separate alternatives. It denotes PEG's "ordered choice": if e1 matches, e2 is not considered. In traditional PEG grammars, this is written as a slash, /, rather than a vertical bar. See PEP 617 for more background and details.
- e*: A star means zero or more repetitions of the preceding item.
- e+: Likewise, a plus means one or more repetitions.
- [e]: A phrase enclosed in square brackets means zero or one occurrences. In other words, the enclosed phrase is optional.
- e?: A question mark has exactly the same meaning as square brackets: the preceding item is optional.
- (e): Parentheses are used for grouping.
- "a"..."z": Two literal characters separated by three dots mean a choice of any single character in the given (inclusive) range of ASCII characters. This notation is only used in lexical definitions.
- <...>: A phrase between angular brackets gives an informal description of the matched symbol (for example, <any ASCII character except "\">), or an abbreviation that is defined in nearby text (for example, <Lu>). This notation is only used in lexical definitions.
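The "ordered choice" meaning of the vertical bar is the element most likely to surprise readers used to traditional BNF, so a small illustration may help. The following sketch uses hypothetical helper functions (`literal`, `sequence`, `choice`, `matches_fully`) written for this example; it is not how CPython's parser is implemented. It only shows that once an earlier alternative matches, later alternatives are not tried.

```python
from typing import Callable, Optional

# A parser takes (text, position) and returns the position after the match,
# or None if it does not match at that position.
Parser = Callable[[str, int], Optional[int]]


def literal(s: str) -> Parser:
    """Match the exact text `s`, like a quoted string in the grammar."""
    def parse(text: str, pos: int) -> Optional[int]:
        return pos + len(s) if text.startswith(s, pos) else None
    return parse


def sequence(*parsers: Parser) -> Parser:
    """e1 e2: every item must match, one after the other."""
    def parse(text: str, pos: int) -> Optional[int]:
        current: Optional[int] = pos
        for p in parsers:
            current = p(text, current)
            if current is None:
                return None
        return current
    return parse


def choice(*parsers: Parser) -> Parser:
    """e1 | e2: ordered choice; the first alternative that matches wins."""
    def parse(text: str, pos: int) -> Optional[int]:
        for p in parsers:
            end = p(text, pos)
            if end is not None:
                return end  # later alternatives are not considered
        return None
    return parse


def matches_fully(parser: Parser, text: str) -> bool:
    """Return True if `parser` consumes all of `text`."""
    return parser(text, 0) == len(text)


# ("a" | "ab") "c": on "abc" the choice commits to "a", so "c" then fails on "b".
short_first = sequence(choice(literal("a"), literal("ab")), literal("c"))
# ("ab" | "a") "c": the longer alternative is tried first, so "abc" matches.
long_first = sequence(choice(literal("ab"), literal("a")), literal("c"))

print(matches_fully(short_first, "abc"))
print(matches_fully(long_first, "abc"))
```

In this sketch the order of alternatives changes which inputs are accepted, which is why the order in which alternatives are written in a PEG-style rule matters.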
The unary operators (*, +, ?) bind as tightly as possible; the vertical bar (|) binds most loosely.
White space is only meaningful to separate tokens.
Rules are normally contained on a single line, but rules that are too long may be wrapped:
literal: stringliteral | bytesliteral
         | integer | floatnumber | imagnumber
Alternatively, rules may be formatted with the first line ending at the colon, and each alternative beginning with a vertical bar on a new line. For example:
literal:
   | stringliteral
   | bytesliteral
   | integer
   | floatnumber
   | imagnumber
This does not mean that there is an empty first alternative.
1.2.1. Lexical and Syntactic definitions
There is some difference between lexical and syntactic analysis: the lexical analyzer operates on the individual characters of the input source, while the parser (syntactic analyzer) operates on the stream of tokens generated by the lexical analysis. However, in some cases the exact boundary between the two phases is a CPython implementation detail.
The practical difference between the two is that in lexical definitions, all whitespace is significant. The lexical analyzer discards all whitespace that is not converted to tokens like token.INDENT or NEWLINE. Syntactic definitions then use these tokens, rather than source characters.
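The effect of lexical analysis on whitespace can be observed with the standard library's `tokenize` module. The sketch below tokenizes a two-line snippet (the snippet and variable names are illustrative) and prints each token's type and text. Keep in mind that `tokenize` is a pure-Python tokenizer intended for tools; it also reports items such as comments, so its output is an approximation of what the parser receives, not an exact view of CPython's internal tokenizer.

```python
import io
import tokenize

# A small snippet with an indented block; the spaces between tokens
# do not produce tokens, but the indentation and line ends do.
source = "if x:\n    y = 1\n"

tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
token_names = [tokenize.tok_name[tok.type] for tok in tokens]

for tok in tokens:
    # tok.type is a numeric code; tok_name maps it to a readable name.
    print(f"{tokenize.tok_name[tok.type]:<10} {tok.string!r}")
```

The printed list contains NEWLINE, INDENT, and DEDENT tokens, while the single spaces around "=" and after "if" do not appear as tokens of their own.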
This documentation uses the same BNF grammar for both styles of definitions. All uses of BNF in the next chapter (Lexical analysis) are lexical definitions; uses in subsequent chapters are syntactic definitions.