The purpose of this tutorial is to get an experienced Python programmer up to speed with the basics of the C language and how it’s used in the CPython source code. It assumes you already have an intermediate understanding of Python syntax.
That said, C is a fairly limited language, and most of its usage in CPython falls under a small set of syntax rules. Getting to the point where you understand the code is a much smaller step than being able to write C effectively. This tutorial is aimed at the first goal but not the second.
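In this tutorial, you’ll learn what the C preprocessor does and how to read the C syntax that shows up most often in the CPython source: statements, loops, functions, pointers, strings, and structs.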
One of the first things that stands out as a big difference between Python and C is the C preprocessor. You’ll look at that first.
Note: This tutorial is adapted from the appendix, “Introduction to C for Python Programmers,” in CPython Internals: Your Guide to the Python Interpreter.
The preprocessor, as the name suggests, is run on your source files before the compiler runs. It has very limited abilities, but you can use them to great advantage in building C programs.
The preprocessor produces a new file, which is what the compiler will actually process. All the commands to the preprocessor, called directives, start on their own line, with a # symbol as the first non-whitespace character.
The main purpose of the preprocessor is to do text substitution in the source file, but it can also include or exclude text conditionally with #if and similar directives.
You’ll start with the most frequent preprocessor directive: #include.
#include
#include is used to pull the contents of one file into the current source file. There’s nothing sophisticated about #include. It reads a file from the file system, runs the preprocessor on that file, and puts the results into the output file. This is done recursively for each #include directive.
For example, if you look at CPython’s Modules/_multiprocessing/semaphore.c file, then near the top you’ll see the following line:
#include "multiprocessing.h"
This tells the preprocessor to pull in the entire contents of multiprocessing.h and put them into the output file at this position.
You’ll notice two different forms for the #include statement. One of them uses quotes ("") to specify the name of the include file, and the other uses angle brackets (<>). The difference comes from which paths are searched when looking for the file on the file system.
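The exact search order is compiler-specific, but in general, if you use <> for the filename, then the preprocessor searches only the standard include directories: the system directories plus any added on the compiler command line. Using quotes around the filename instead makes the preprocessor look first in the directory of the file containing the #include and then fall back to the same directories it searches for <>.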
#define
#define allows you to do simple text substitution and also plays into the #if directives you’ll see below.
At its most basic, #define lets you define a new symbol that gets replaced with a text string in the preprocessor output.
Continuing in semaphore.c, you’ll find this line:
#define SEM_FAILED NULL
This tells the preprocessor to replace every instance of SEM_FAILED below this point with the literal text NULL before the code is sent to the compiler.
#define
items can also take parameters as in this Windows-specific version ofSEM_CREATE
:
#define SEM_CREATE(name, val, max) CreateSemaphore(NULL, val, max, NULL)
In this case, the preprocessor will expect SEM_CREATE() to look like a function call with three arguments. This parameterized form is what people usually mean by a macro, although strictly speaking every #define creates one. The preprocessor substitutes the text of the three arguments directly into the output code.
For example, on line 460 of semaphore.c, the SEM_CREATE macro is used like this:
handle = SEM_CREATE(name, value, max);
When you’re compiling for Windows, this macro will be expanded so that line looks like this:
handle = CreateSemaphore(NULL, value, max, NULL);
In a later section, you’ll see how this macro is defined differently on Windows and other operating systems.
#undef
This directive erases any previous preprocessor definition from #define. This makes it possible to have a #define in effect for only part of a file.
#if
The preprocessor also supports conditional directives, which let you include or exclude sections of text based on certain conditions. Conditional blocks are closed with the #endif directive and can also use #elif and #else for fine-tuned adjustments.
There are three basic forms of #if that you’ll see in the CPython source:
- #ifdef <macro> includes the subsequent block of text if the specified macro is defined. You may also see it written as #if defined(<macro>).
- #ifndef <macro> includes the subsequent block of text if the specified macro is not defined.
- #if <macro> includes the subsequent block of text if the macro is defined and evaluates to a nonzero value. A name that isn’t defined at all is treated as 0 here.
Note the use of “text” instead of “code” to describe what’s included or excluded from the file. The preprocessor knows nothing of C syntax and doesn’t care what the specified text is.
#pragma
Pragmas are instructions or hints to the compiler. In general, you can ignore these while reading the code as they usually deal with how the code is compiled, not how the code runs.
#error
Finally, #error displays a message and stops preprocessing, which makes the build fail. Again, you can safely ignore these for reading the CPython source code.
This section won’t cover all aspects of C, nor is it intended to teach you how to write C. It will focus on aspects of C that are different or confusing for Python developers the first time they see them.
Unlike in Python, whitespace isn’t important to the C compiler. The compiler doesn’t care if you split statements across lines or jam your entire program into a single, very long line. This is because it uses delimiters for all statements and blocks.
There are, of course, very specific rules for the parser, but in general you’ll be able to understand the CPython source just knowing that each statement ends with a semicolon (;), and all blocks of code are surrounded by curly braces ({}).
The exception to this rule is that if a block has only a single statement, then the curly braces can be omitted.
All variables in C must be declared, meaning there needs to be a statement stating the type of that variable before it’s used. Note that, unlike in Python, the data type that a single variable can hold can’t change.
Here are a few examples:
#include <stdio.h>

int main(void)
{
    /* Comments are included between slash-asterisk and asterisk-slash */
    /* This style of comment can span several lines -
       so this part is still a comment. */

    // Comments can also come after two slashes
    // This type of comment only goes until the end of the line, so new
    // lines must start with double slashes (//).

    int x = 0;  // Declares x to be of type 'int' and initializes it to 0

    if (x == 0) {
        // This is a block of code
        int y = 1;  // y is only a valid variable name until the closing }
        // More statements here
        printf("x is %d y is %d\n", x, y);
    }

    // Single-line blocks do not require curly brackets
    if (x == 13)
        printf("x is 13!\n");
    printf("past the if block\n");

    return 0;
}
In general, you’ll see that the CPython code is very cleanly formatted and typically sticks to a single style within a given module.
if Statements
In C, if works generally like it does in Python. If the condition is true, then the following block is executed. The else and else if syntax should be familiar enough to Python programmers. Note that C if statements don’t need a closing marker like the preprocessor’s #endif, because blocks are delimited by {}.
There’s a shorthand in C for short if…else statements called the ternary operator:
condition ? true_result : false_result
You can find it in semaphore.c where, for Windows, it defines a macro for SEM_CLOSE():
#define SEM_CLOSE(sem) (CloseHandle(sem) ? 0 : -1)
This macro evaluates to 0 if the function CloseHandle() returns true (any nonzero value) and to -1 otherwise.
Note: Boolean variable types are supported and used in parts of the CPython source, but they weren’t part of the original language. The _Bool type, and its bool alias from <stdbool.h>, arrived with the C99 standard. C interprets conditions using a simple rule: 0 or NULL is false, and everything else is true.
switch Statements
Unlike Python, C also supports switch. Using switch can be viewed as a shortcut for extended if…else if chains. This example is from semaphore.c:
switch (WaitForSingleObjectEx(handle, 0, FALSE)) {
case WAIT_OBJECT_0:
    if (!ReleaseSemaphore(handle, 1, &previous))
        return MP_STANDARD_ERROR;
    *value = previous + 1;
    return 0;
case WAIT_TIMEOUT:
    *value = 0;
    return 0;
default:
    return MP_STANDARD_ERROR;
}
This performs a switch on the return value from WaitForSingleObjectEx(). If the value is WAIT_OBJECT_0, then the first block is executed. The WAIT_TIMEOUT value results in the second block, and anything else matches the default block.
Note that the value being tested, in this case the return value from WaitForSingleObjectEx(), must be an integral value or an enumerated type, and each case must be a constant value.
There are three looping structures in C:
- for loops
- while loops
- do…while loops
for loops have syntax that’s quite different from Python:
for (<initialization>; <condition>; <increment>) {
    <code to be looped over>
}
In addition to the code to be executed in the loop, there are three sections of code that control the for loop:
The <initialization> section runs exactly once when the loop is started. It’s typically used to set a loop counter to an initial value (and possibly to declare the loop counter).
The <increment> code runs immediately after each pass through the main block of the loop. Traditionally, this will increment the loop counter.
Finally, the <condition> is evaluated before every pass through the loop: once right after <initialization>, and then again after each <increment>. The loop ends as soon as the condition evaluates to false (zero), so if it’s false from the start, the body never runs.
Here’s an example from Modules/sha512module.c:
for (i = 0; i < 8; ++i) {
    S[i] = sha_info->digest[i];
}
This loop will run 8 times, with i incrementing from 0 to 7, and will terminate when the condition is checked and i is 8.
while loops are virtually identical to their Python counterparts. The do…while syntax is a little different, however. The condition on a do…while loop isn’t checked until after the body of the loop has executed for the first time, so the body always runs at least once.
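There are many instances of for loops and while loops in the CPython code base, but do…while loops are rare. Where they do appear, it’s mostly in the do { ... } while (0) idiom, which wraps a multi-statement macro so that it behaves like a single statement.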
The syntax for functions in C is similar to that in Python, with the addition that the return type and parameter types must be specified. The C syntax looks like this:
<return_type> function_name(<parameters>) {
    <function_body>
}
The return type can be any valid type in C, including built-in types like int and double as well as custom types like PyObject, as in this example from semaphore.c:
static PyObject *
semlock_release(SemLockObject *self, PyObject *args)
{
    <statements of function body here>
}
Here you see a couple of C-specific features in play. First, remember that whitespace doesn’t matter. Much of the CPython source code puts the return type of a function on the line above the rest of the function declaration. That’s the PyObject * part. You’ll take a closer look at the use of * a little later, but for now it’s important to know that there are several modifiers that you can place on functions and variables.
static is one of these modifiers. There are some complex rules governing how modifiers operate. For instance, the static modifier here means something very different than if you placed it in front of a variable declaration. On a function, it makes the function visible only within its own source file. On a variable inside a function, it makes the variable keep its value between calls.
Fortunately, you can generally ignore these modifiers while trying to read and understand the CPython source code.
The parameter list for functions is a comma-separated list of variables, similar to what you use in Python. Again, C requires specific types for each parameter, so SemLockObject *self says that the first parameter is a pointer to a SemLockObject and is called self. Note that all parameters in C are positional. There are no keyword arguments or default values.
Let’s look at what the “pointer” part of that statement means.
To give some context, the parameters that are passed to C functions are all passed by value, meaning the function operates on a copy of the value and not on the original value in the calling function. To work around this, callers frequently pass in the address of some data that the function should be able to modify.
These addresses are called pointers and have types, so int * is a pointer to an integer value and is of a different type than double *, which is a pointer to a double-precision floating-point number.
As mentioned above, pointers are variables that hold the address of a value. These are used frequently in C, as seen in this example:
static PyObject *
semlock_release(SemLockObject *self, PyObject *args)
{
    <statements of function body here>
}
Here, the self parameter will hold the address of, or a pointer to, a SemLockObject value. Also note that the function will return a pointer to a PyObject value.
Note: For an in-depth look at how to simulate pointers in Python, check out Pointers in Python: What’s the Point?
There’s a special value in C called NULL that indicates a pointer doesn’t point to anything. You’ll see pointers assigned to NULL and checked against NULL throughout the CPython source. This is important since there are very few limitations as to what values a pointer can have, and accessing a memory location that isn’t part of your program can cause very strange behavior.
On the other hand, if you try to access the memory at NULL, then on most platforms your program will crash immediately, typically with a segmentation fault. This may not seem better, but it’s generally easier to figure out a memory bug if NULL is accessed than if a random memory address is modified.
C doesn’t have a string type. There’s a convention around which many standard library functions are written, but there’s no actual type. Rather, strings in C are stored as arrays of char (for ASCII) or wchar_t (for wide, Unicode characters) values, each of which holds a single character. Strings are marked with a null terminator, which has the value 0 and is usually written in code as '\0'.
Basic string operations like strlen() rely on this null terminator to mark the end of the string.
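Because strings are just arrays of values, they can’t be copied with = or compared with ==. Assigning to an array isn’t allowed, and using == on two strings compares their addresses, not their contents. The standard library has the strcpy() and strcmp() functions (and their wchar_t cousins, wcscpy() and wcscmp()) for doing these operations and more.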
Your final stop on this mini-tour of C is how you can create new types in C: structs. The struct keyword allows you to group a set of different data types together into a new, custom data type:
struct <struct_name> {
    <type> <member_name>;
    <type> <member_name>;
    ...
};
This partial example from Modules/arraymodule.c shows a struct declaration:
struct arraydescr {
    char typecode;
    int itemsize;
    ...
};
This creates a new data type called arraydescr, which has many members, the first two of which are a char typecode and an int itemsize.
Frequently structs will be used as part of a typedef, which provides a simple alias for the name. Without a typedef, as in the example above, every variable of the new type must be declared with the full name, as in struct arraydescr x;.
You’ll frequently see syntax like this:
typedef struct {
    PyObject_HEAD
    SEM_HANDLE handle;
    unsigned long last_tid;
    int count;
    int maxvalue;
    int kind;
    char *name;
} SemLockObject;
This creates a new, custom struct type and gives it the name SemLockObject. To declare a variable of this type, you can simply use the alias: SemLockObject x;.
This wraps up your quick walk through C syntax. Although this description barely scratches the surface of the C language, you now have sufficient knowledge to read and understand the CPython source code.
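In this tutorial, you learned what the C preprocessor does, how directives such as #include, #define, and #if transform source files, and how C’s statements, loops, functions, pointers, strings, and structs compare to their Python counterparts.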
Now that you’re familiar with C, you can deepen your knowledge of the inner workings of Python by exploring the CPython source code. Happy Pythoning!
About Jim Anderson
Jim has been programming for a long time in a variety of languages. He has worked on embedded systems, built distributed build systems, done off-shore vendor management, and sat in many, many meetings.