I do consider assignment statements and pointer variables to be among computer science's "most valuable treasures."
Donald Knuth, Structured Programming, with go to Statements[1]
A pointer a pointing to the memory address associated with a variable b, i.e., a contains the memory address 1008 of the variable b. In this diagram, the computing architecture uses the same address space and data primitive for both pointers and non-pointers; this need not be the case.
In computer science, a pointer is an object in many programming languages that stores a memory address. This can be that of another value located in computer memory, or in some cases, that of memory-mapped computer hardware. A pointer references a location in memory, and obtaining the value stored at that location is known as dereferencing the pointer. As an analogy, a page number in a book's index could be considered a pointer to the corresponding page; dereferencing such a pointer would be done by flipping to the page with the given page number and reading the text found on that page. The actual format and content of a pointer variable is dependent on the underlying computer architecture.
Using pointers significantly improves performance for repetitive operations, like traversing iterable data structures (e.g. strings, lookup tables, control tables, linked lists, and tree structures). In particular, it is often much cheaper in time and space to copy and dereference pointers than it is to copy and access the data to which the pointers point.
A pointer is a simple, more concrete implementation of the more abstract reference data type. Several languages, especially low-level languages, support some type of pointer, although some have more restrictions on their use than others. While "pointer" has been used to refer to references in general, it more properly applies to data structures whose interface explicitly allows the pointer to be manipulated (arithmetically via pointer arithmetic) as a memory address, as opposed to a magic cookie or capability which does not allow such.[citation needed] Because pointers allow both protected and unprotected access to memory addresses, there are risks associated with using them, particularly in the latter case. Primitive pointers are often stored in a format similar to an integer; however, attempting to dereference or "look up" such a pointer whose value is not a valid memory address could cause a program to crash (or contain invalid data). To alleviate this potential problem, as a matter of type safety, pointers are considered a separate type parameterized by the type of data they point to, even if the underlying representation is an integer. Other measures may also be taken (such as validation and bounds checking) to verify that the pointer variable contains a value that is both a valid memory address and within the numerical range that the processor is capable of addressing.
In 1955, Soviet Ukrainian computer scientist Kateryna Yushchenko created the Address programming language that made possible indirect addressing and addresses of the highest rank – analogous to pointers. This language was widely used on Soviet Union computers. However, it was unknown outside the Soviet Union, and usually Harold Lawson is credited with the invention, in 1964, of the pointer.[2] In 2000, Lawson was presented the Computer Pioneer Award by the IEEE "[f]or inventing the pointer variable and introducing this concept into PL/I, thus providing for the first time, the capability to flexibly treat linked lists in a general-purpose high-level language".[3] His seminal paper on the concepts appeared in the June 1967 issue of CACM entitled: PL/I List Processing. According to the Oxford English Dictionary, the word pointer first appeared in print as a stack pointer in a technical memorandum by the System Development Corporation.
A data primitive (or just primitive) is any datum that can be read from or written to computer memory using one memory access (for instance, both a byte and a word are primitives).
A data aggregate (or just aggregate) is a group of primitives that are logically contiguous in memory and that are viewed collectively as one datum (for instance, an aggregate could be 3 logically contiguous bytes, the values of which represent the 3 coordinates of a point in space). When an aggregate is entirely composed of the same type of primitive, the aggregate may be called an array; in a sense, a multi-byte word primitive is an array of bytes, and some programs use words in this way.
In the context of these definitions, a byte is the smallest primitive; each memory address specifies a different byte. The memory address of the initial byte of a datum is considered the memory address (or base memory address) of the entire datum.
A memory pointer (or just pointer) is a primitive, the value of which is intended to be used as a memory address; it is said that a pointer points to a memory address. It is also said that a pointer points to a datum [in memory] when the pointer's value is the datum's memory address.
More generally, a pointer is a kind of reference, and it is said that a pointer references a datum stored somewhere in memory; to obtain that datum is to dereference the pointer. The feature that separates pointers from other kinds of reference is that a pointer's value is meant to be interpreted as a memory address, which is a rather low-level concept.
References serve as a level of indirection: a pointer's value determines which memory address (that is, which datum) is to be used in a calculation. Because indirection is a fundamental aspect of algorithms, pointers are often expressed as a fundamental data type in programming languages; in statically (or strongly) typed programming languages, the type of a pointer determines the type of the datum to which the pointer points.
Pointers are a very thin abstraction on top of the addressing capabilities provided by most modern architectures. In the simplest scheme, an address, or a numeric index, is assigned to each unit of memory in the system, where the unit is typically either a byte or a word – depending on whether the architecture is byte-addressable or word-addressable – effectively transforming all of memory into a very large array. The system would then also provide an operation to retrieve the value stored in the memory unit at a given address (usually utilizing the machine's general-purpose registers).
In the usual case, a pointer is large enough to hold more addresses than there are units of memory in the system. This introduces the possibility that a program may attempt to access an address which corresponds to no unit of memory, either because not enough memory is installed (i.e. beyond the range of available memory) or because the architecture does not support such addresses. The first case may, on certain platforms such as the Intel x86 architecture, be called a segmentation fault (segfault). The second case is possible in the current implementation of AMD64, where pointers are 64 bits long and addresses only extend to 48 bits. Pointers must conform to certain rules (canonical addresses), so if a non-canonical pointer is dereferenced, the processor raises a general protection fault.
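The canonical-address rule can be illustrated with a short C sketch. It assumes the 48-bit scheme mentioned above, in which bits 63 through 47 of an address must all be equal (all zeros or all ones); the function name is_canonical48 is hypothetical and is not part of any standard API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 48-bit virtual addresses, so bits 63..47 (17 bits)
 * must be identical for the address to be canonical. */
bool is_canonical48(uint64_t addr) {
    uint64_t upper = addr >> 47;            /* keep bits 63..47 */
    return upper == 0 || upper == 0x1FFFF;  /* all zeros or all ones */
}
```

Under this assumption, 0x00007FFFFFFFFFFF and 0xFFFF800000000000 are canonical, while 0x0000800000000000 is not.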
On the other hand, some systems have more units of memory than there are addresses. In this case, a more complex scheme such as memory segmentation or paging is employed to use different parts of the memory at different times. Later 32-bit incarnations of the x86 architecture supported up to 36 bits of physical memory addresses, which were mapped to the 32-bit linear address space through the PAE paging mechanism. Thus, only 1/16 of the possible total memory may be accessed at a time. Another example in the same computer family was the 16-bit protected mode of the 80286 processor, which, though supporting only 16 MB of physical memory, could access up to 1 GB of virtual memory, but the combination of 16-bit address and segment registers made accessing more than 64 KB in one data structure cumbersome.
In order to provide a consistent interface, some architectures provide memory-mapped I/O, which allows some addresses to refer to units of memory while others refer to device registers of other devices in the computer. There are analogous concepts such as file offsets, array indices, and remote object references that serve some of the same purposes as addresses for other types of objects.
Pointers are directly supported without restrictions in languages such as PL/I, C, C++, Pascal, FreeBASIC, and implicitly in most assembly languages. They are used mainly to construct references, which in turn are fundamental to constructing nearly all data structures, and to pass data between different parts of a program.
In functional programming languages that rely heavily on lists, data references are managed abstractly by using primitive constructs like cons and the corresponding elements car and cdr, which can be thought of as specialised pointers to the first and second components of a cons-cell. This gives rise to some of the idiomatic "flavour" of functional programming. By structuring data in such cons-lists, these languages facilitate recursive means for building and processing data, for example by recursively accessing the head and tail elements of lists of lists; e.g. "taking the car of the cdr of the cdr". By contrast, memory management based on pointer dereferencing in some approximation of an array of memory addresses facilitates treating variables as slots into which data can be assigned imperatively.
When dealing with arrays, the critical lookup operation typically involves a stage called address calculation, which involves constructing a pointer to the desired data element in the array. In other data structures, such as linked lists, pointers are used as references to explicitly tie one piece of the structure to another.
Pointers are used to pass parameters by reference. This is useful if the programmer wants a function's modifications to a parameter to be visible to the function's caller. This is also useful for returning multiple values from a function.
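As a minimal sketch of returning multiple values through pointer parameters, the hypothetical function divide below (its name and signature are illustrative assumptions) writes a quotient and a remainder into locations supplied by the caller and uses its ordinary return value only as a success flag.

```c
/* Writes dividend / divisor and dividend % divisor through the two
 * output pointers. Returns 1 on success, 0 if divisor is zero
 * (in which case the outputs are left untouched). */
int divide(int dividend, int divisor, int *quotient, int *remainder) {
    if (divisor == 0)
        return 0;
    *quotient = dividend / divisor;
    *remainder = dividend % divisor;
    return 1;
}
```

A caller would declare two int variables and pass their addresses, for example divide(17, 5, &q, &r), after which q holds 3 and r holds 2.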
Pointers can also be used to allocate and deallocate dynamic variables and arrays in memory. Since a variable will often become redundant after it has served its purpose, it is a waste of memory to keep it, and therefore it is good practice to deallocate it (using the original pointer reference) when it is no longer needed. Failure to do so may result in a memory leak (where available free memory gradually, or in severe cases rapidly, diminishes because of an accumulation of numerous redundant memory blocks).
The basic syntax to define a pointer in C is:
int *ptr;
This declares ptr as the identifier of an object of the following type:
pointer that points to an object of type int
This is usually stated more succinctly as "ptr is a pointer to int."
Because the C language does not specify an implicit initialization for objects of automatic storage duration,[5] care should often be taken to ensure that the address to which ptr points is valid; this is why it is sometimes suggested that a pointer be explicitly initialized to the null pointer value, which is traditionally specified in C with the standardized macro NULL:[6]
int *ptr = NULL;
Dereferencing a null pointer in C produces undefined behavior,[7] which could be catastrophic. However, most implementations[citation needed] simply halt execution of the program in question, usually with a segmentation fault.
However, initializing pointers unnecessarily could hinder program analysis, thereby hiding bugs.
In any case, once a pointer has been declared, the next logical step is for it to point at something:
int a = 5;
int *ptr = NULL;

ptr = &a;
This assigns the value of the address of a to ptr. For example, if a is stored at memory location 0x8130, then the value of ptr will be 0x8130 after the assignment. To dereference the pointer, an asterisk is used again:
*ptr = 8;
This means: take the contents of ptr (which is 0x8130), "locate" that address in memory and set its value to 8. If a is later accessed again, its new value will be 8.
This example may be clearer if memory is examined directly. Assume that a is located at address 0x8130 in memory and ptr at 0x8134; also assume this is a 32-bit machine such that an int is 32 bits wide. The following is what would be in memory after the following code snippet is executed:
int a = 5;
int *ptr = NULL;

Address    Contents
0x8130     0x00000005
0x8134     0x00000000

(The NULL pointer shown here is 0x00000000.) Assigning the address of a to ptr:
ptr = &a;
yields the following memory values:

Address    Contents
0x8130     0x00000005
0x8134     0x00008130

Then dereferencing ptr by coding:
*ptr = 8;
makes the computer take the contents of ptr (which is 0x8130), "locate" that address, and assign 8 to that location, yielding the following memory:

Address    Contents
0x8130     0x00000008
0x8134     0x00008130

Clearly, accessing a will yield the value of 8 because the previous instruction modified the contents of a by way of the pointer ptr.
When setting up data structures like lists, queues and trees, it is necessary to have pointers to help manage how the structure is implemented and controlled. Typical examples of pointers are start pointers, end pointers, and stack pointers. These pointers can either be absolute (the actual physical address or a virtual address in virtual memory) or relative (an offset from an absolute start address ("base") that typically uses fewer bits than a full address, but will usually require one additional arithmetic operation to resolve).
Relative addresses are a form of manual memory segmentation, and share many of its advantages and disadvantages. A two-byte offset, containing a 16-bit, unsigned integer, can be used to provide relative addressing for up to 64 KiB (2^16 bytes) of a data structure. This can easily be extended to 128, 256 or 512 KiB if the address pointed to is forced to be aligned on a half-word, word or double-word boundary (but requiring an additional "shift left" bitwise operation, by 1, 2 or 3 bits, in order to adjust the offset by a factor of 2, 4 or 8 before its addition to the base address). Generally, though, such schemes are a lot of trouble, and for convenience to the programmer absolute addresses (and underlying that, a flat address space) are preferred.
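Resolving such a relative address can be sketched in C as follows. The function name resolve_offset and its parameters are hypothetical; the sketch assumes a 16-bit unsigned offset and a shift of 0 to 3 bits chosen to match the alignment of the stored objects.

```c
#include <stddef.h>
#include <stdint.h>

/* Turns a 16-bit relative offset into an absolute address.
 * shift is 0 for byte granularity, or 1, 2 or 3 when targets are
 * aligned on 2-, 4- or 8-byte boundaries (scaling the reach of the
 * offset from 64 KiB to 128, 256 or 512 KiB). */
void *resolve_offset(void *base, uint16_t offset, unsigned shift) {
    return (unsigned char *)base + ((size_t)offset << shift);
}
```

With a shift of 2, for instance, offset 3 resolves to the address 12 bytes past the base.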
A one-byte offset, such as the hexadecimal ASCII value of a character (e.g. X'29'), can be used to point to an alternative integer value (or index) in an array (e.g., X'01'). In this way, characters can be very efficiently translated from 'raw data' to a usable sequential index and then to an absolute address without a lookup table.
In C, array indexing is formally defined in terms of pointer arithmetic; that is, the language specification requires that array[i] be equivalent to *(array + i).[8] Thus in C, arrays can be thought of as pointers to consecutive areas of memory (with no gaps),[8] and the syntax for accessing arrays is identical to that which can be used to dereference pointers. For example, an array named array can be declared and used in the following manner:
int array[5];      /* Declares 5 contiguous integers */
int *ptr = array;  /* Arrays can be used as pointers */
ptr[0] = 1;        /* Pointers can be indexed with array syntax */
*(array + 1) = 2;  /* Arrays can be dereferenced with pointer syntax */
*(1 + array) = 2;  /* Pointer addition is commutative */
2[array] = 4;      /* Subscript operator is commutative */
This allocates a block of five integers and names the block array, which acts as a pointer to the block. Another common use of pointers is to point to dynamically allocated memory from malloc, which returns a consecutive block of memory of no less than the requested size that can be used as an array.
While most operators on arrays and pointers are equivalent, the result of the sizeof operator differs. In this example, sizeof(array) will evaluate to 5*sizeof(int) (the size of the array), while sizeof(ptr) will evaluate to sizeof(int*), the size of the pointer itself.
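The sizeof difference matters in practice once an array is passed to a function. The following sketch is illustrative only: the macro ARRAY_LEN and the function size_seen_by_callee are hypothetical names, not part of the standard library.

```c
#include <stddef.h>

/* Element count of a true array; only meaningful where the array
 * type itself (not a pointer to its first element) is visible. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* An array argument arrives as a pointer to its first element, so
 * sizeof here reports the size of the pointer, not of the array. */
size_t size_seen_by_callee(int *ptr) {
    return sizeof(ptr);
}
```

For int array[5], ARRAY_LEN(array) is 5 in the declaring scope, whereas size_seen_by_callee(array) returns sizeof(int *), which is why C functions that take arrays usually also take an explicit length.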
Default values of an array can be declared like:
int array[5] = {2, 4, 3, 1, 5};
If array is located in memory starting at address 0x1000 on a 32-bit little-endian machine, then memory will contain the following (values are in hexadecimal, like the addresses):

Address    0    1    2    3
1000       2    0    0    0
1004       4    0    0    0
1008       3    0    0    0
100C       1    0    0    0
1010       5    0    0    0

Represented here are five integers: 2, 4, 3, 1, and 5. These five integers occupy 32 bits (4 bytes) each, with the least-significant byte stored first (this is a little-endian CPU architecture), and are stored consecutively starting at address 0x1000.
The syntax for C with pointers is:
array means 0x1000;
array + 1 means 0x1004: the "+ 1" means to add the size of 1 int, which is 4 bytes;
*array means to dereference the contents of array. Considering the contents as a memory address (0x1000), look up the value at that location (0x0002);
array[i] means element number i, 0-based, of array, which is translated into *(array + i).
The last example is how to access the contents of array. Breaking it down:
array + i is the memory location of the (i)th element of array, starting at i=0;
*(array + i) takes that memory address and dereferences it to access the value.
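The same address arithmetic can be used to walk an array without an index variable. The function sum_ints below is a hypothetical illustration: it advances a pointer one element at a time and dereferences it, which is what array[i] does implicitly.

```c
#include <stddef.h>

/* Sums count ints starting at first by stepping a pointer through
 * the array; ++p advances by sizeof(int) bytes, not by one byte. */
int sum_ints(const int *first, size_t count) {
    int total = 0;
    for (const int *p = first; p != first + count; ++p)
        total += *p;
    return total;
}
```

For the array {2, 4, 3, 1, 5} used above, sum_ints(array, 5) returns 15.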
Below is an example definition of a linked list in C.
/* the empty linked list is represented by NULL
 * or some other sentinel value */
#define EMPTY_LIST NULL

struct link {
    void        *data;  /* data of this link */
    struct link *next;  /* next link; EMPTY_LIST if there is none */
};
This pointer-recursive definition is essentially the same as the reference-recursive definition from the language Haskell:
data Link a = Nil | Cons a (Link a)
Nil is the empty list, and Cons a (Link a) is a cons cell of type a with another link also of type a.
The definition with references, however, is type-checked and does not use potentially confusing signal values. For this reason, data structures in C are usually dealt with via wrapper functions, which are carefully checked for correctness.
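A few wrapper functions of the kind mentioned above might look like the following sketch. The struct is repeated so that the block is self-contained; the function names link_push, link_length and link_free are hypothetical, and the choice to report allocation failure by returning NULL is an assumption of this example.

```c
#include <stddef.h>
#include <stdlib.h>

#define EMPTY_LIST NULL

struct link {
    void        *data;
    struct link *next;
};

/* Prepends data to list. Returns the new head, or NULL if the
 * allocation fails (the original list is then left untouched, so
 * the caller should keep its old head pointer until it has checked). */
struct link *link_push(struct link *list, void *data) {
    struct link *node = malloc(sizeof *node);
    if (node == NULL)
        return NULL;
    node->data = data;
    node->next = list;
    return node;
}

/* Counts the links by following next pointers to the sentinel. */
size_t link_length(const struct link *list) {
    size_t n = 0;
    for (; list != EMPTY_LIST; list = list->next)
        n++;
    return n;
}

/* Frees every link (but not the data the links point to). */
void link_free(struct link *list) {
    while (list != EMPTY_LIST) {
        struct link *next = list->next;  /* save before freeing */
        free(list);
        list = next;
    }
}
```

Keeping all direct manipulation of next inside such functions confines the sentinel checks to one place.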
Pointers can be used to pass variables by their address, allowing their value to be changed. For example, consider the following C code:
/* a copy of the int n can be changed within the function without affecting the calling code */
void passByValue(int n) {
    n = 12;
}

/* a pointer m is passed instead. No copy of the value pointed to by m is created */
void passByAddress(int *m) {
    *m = 14;
}

int main(void) {
    int x = 3;

    /* pass a copy of x's value as the argument */
    passByValue(x);
    // the value was changed inside the function, but x is still 3 from here on

    /* pass x's address as the argument */
    passByAddress(&x);
    // x was actually changed by the function and is now equal to 14 here

    return 0;
}
In some programs, the required amount of memory depends on what the user may enter. In such cases the programmer needs to allocate memory dynamically. This is done by allocating memory on the heap rather than on the stack, where variables usually are stored (although variables can also be stored in the CPU registers). Dynamically allocated memory can only be accessed through pointers; unlike ordinary variables, dynamically allocated objects cannot be given names.
Pointers are used to store and manage the addresses of dynamically allocated blocks of memory. Such blocks are used to store data objects or arrays of objects. Most structured and object-oriented languages provide an area of memory, called the heap or free store, from which objects are dynamically allocated.
The example C code below illustrates how structure objects are dynamically allocated and referenced. The standard C library provides the function malloc() for allocating memory blocks from the heap. It takes the size of an object to allocate as a parameter and returns a pointer to a newly allocated block of memory suitable for storing the object, or it returns a null pointer if the allocation failed.
#include <stdlib.h>  /* malloc, free */
#include <string.h>  /* memset, strlen, strcpy */

/* Parts inventory item */
struct Item {
    int   id;    /* Part number */
    char *name;  /* Part name */
    float cost;  /* Cost */
};

/* Allocate and initialize a new Item object */
struct Item *make_item(const char *name) {
    struct Item *item;

    /* Allocate a block of memory for a new Item object */
    item = malloc(sizeof(struct Item));
    if (item == NULL)
        return NULL;

    /* Initialize the members of the new Item */
    memset(item, 0, sizeof(struct Item));
    item->id = -1;
    item->name = NULL;
    item->cost = 0.0;

    /* Save a copy of the name in the new Item */
    item->name = malloc(strlen(name) + 1);
    if (item->name == NULL) {
        free(item);
        return NULL;
    }
    strcpy(item->name, name);

    /* Return the newly created Item object */
    return item;
}
The code below illustrates how memory objects are dynamically deallocated, i.e., returned to the heap or free store. The standard C library provides the function free() for deallocating a previously allocated memory block and returning it to the heap.
/* Deallocate an Item object */
void destroy_item(struct Item *item) {
    /* Check for a null object pointer */
    if (item == NULL)
        return;

    /* Deallocate the name string saved within the Item */
    if (item->name != NULL) {
        free(item->name);
        item->name = NULL;
    }

    /* Deallocate the Item object itself */
    free(item);
}
On some computing architectures, pointers can be used to directly manipulate memory or memory-mapped devices.
Assigning addresses to pointers is an invaluable tool when programming microcontrollers. Below is a simple example declaring a pointer of type int and initialising it to a hexadecimal address (in this example, the constant 0x7FFF):
int *hardware_address = (int *)0x7FFF;
In the mid 80s, using the BIOS to access the video capabilities of PCs was slow. Applications that were display-intensive typically accessed CGA video memory directly by casting the hexadecimal constant 0xB8000 to a pointer to an array of 80 unsigned 16-bit int values. Each value consisted of an ASCII code in the low byte, and a colour in the high byte. Thus, to put the letter 'A' at row 5, column 2 in bright white on blue, one would write code like the following:
#define VID ((unsigned short (*)[80])0xB8000)

void foo(void) {
    VID[4][1] = 0x1F00 | 'A';
}
Control tables that are used to control program flow usually make extensive use of pointers. The pointers, usually embedded in a table entry, may, for instance, be used to hold the entry points to subroutines to be executed, based on certain conditions defined in the same table entry. The pointers can, however, simply be indexes to other separate, but associated, tables comprising an array of the actual addresses, or the addresses themselves (depending upon the programming language constructs available). They can also be used to point to earlier table entries (as in loop processing) or forward to skip some table entries (as in a switch or "early" exit from a loop). For this latter purpose, the "pointer" may simply be the table entry number itself and can be transformed into an actual address by simple arithmetic.
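A small control table can be sketched in C with function pointers. Everything named here (op_entry, op_table, dispatch and the three handlers) is hypothetical and serves only to illustrate a table whose entries pair a condition with the entry point of the subroutine to run.

```c
#include <stddef.h>

static int op_add(int a, int b) { return a + b; }
static int op_sub(int a, int b) { return a - b; }
static int op_mul(int a, int b) { return a * b; }

/* One table entry: the condition (a symbol) and the subroutine
 * to execute when that condition matches. */
struct op_entry {
    char symbol;
    int (*handler)(int, int);
};

static const struct op_entry op_table[] = {
    { '+', op_add },
    { '-', op_sub },
    { '*', op_mul },
};

/* Scans the table for symbol and calls the matching handler through
 * its pointer. Returns 1 and stores the result on a match, or 0
 * (leaving *result unchanged) if no entry matches. */
int dispatch(char symbol, int lhs, int rhs, int *result) {
    for (size_t i = 0; i < sizeof op_table / sizeof op_table[0]; i++) {
        if (op_table[i].symbol == symbol) {
            *result = op_table[i].handler(lhs, rhs);
            return 1;
        }
    }
    return 0;
}
```

Adding a new operation then means adding a table row rather than another branch in the control flow.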
In many languages, pointers have the additional restriction that the object they point to has a specific type. For example, a pointer may be declared to point to an integer; the language will then attempt to prevent the programmer from pointing it to objects which are not integers, such as floating-point numbers, eliminating some errors.
For example, in C
int *money;
char *bags;
money would be an integer pointer and bags would be a char pointer. The following would yield a compiler warning of "assignment from incompatible pointer type" under GCC
bags = money;
because money and bags were declared with different types. To suppress the compiler warning, it must be made explicit that you do indeed wish to make the assignment by typecasting it
bags = (char *)money;
which says to cast the integer pointer of money to a char pointer and assign to bags.
A 2005 draft of the C standard requires that casting a pointer derived from one type to one of another type should maintain the alignment correctness for both types (6.3.2.3 Pointers, par. 7):[9]
char *external_buffer = "abcdef";
int *internal_data;

internal_data = (int *)external_buffer;  // UNDEFINED BEHAVIOUR if "the resulting pointer
                                         // is not correctly aligned"
In languages that allow pointer arithmetic, arithmetic on pointers takes into account the size of the type. For example, adding an integer number to a pointer produces another pointer that points to an address that is higher by that number times the size of the type. This allows us to easily compute the address of elements of an array of a given type, as was shown in the C arrays example above. When a pointer of one type is cast to another type of a different size, the programmer should expect that pointer arithmetic will be calculated differently. In C, for example, if the money array starts at 0x2000 and sizeof(int) is 4 bytes whereas sizeof(char) is 1 byte, then money + 1 will point to 0x2004, but bags + 1 would point to 0x2001. Other risks of casting include loss of data when "wide" data is written to "narrow" locations (e.g. bags[0] = 65537;), unexpected results when bit-shifting values, and comparison problems, especially with signed vs unsigned values.
Although it is impossible in general to determine at compile-time which casts are safe, some languages store run-time type information which can be used to confirm that these dangerous casts are valid at runtime. Other languages merely accept a conservative approximation of safe casts, or none at all.
In C and C++, even if two pointers compare as equal, that doesn't mean they are equivalent. In these languages and LLVM, the rule is interpreted to mean that "just because two pointers point to the same address, does not mean they are equal in the sense that they can be used interchangeably", the difference between the pointers being referred to as their provenance.[10] Casting to an integer type such as uintptr_t is implementation-defined, and the comparison it provides does not give any more insight as to whether the two pointers are interchangeable. In addition, further conversion to bytes and arithmetic will throw off optimizers trying to keep track of the use of pointers, a problem still being elucidated in academic research.[11]
As a pointer allows a program to attempt to access an object that may not be defined, pointers can be the origin of a variety of programming errors. However, the usefulness of pointers is so great that it can be difficult to perform programming tasks without them. Consequently, many languages have created constructs designed to provide some of the useful features of pointers without some of their pitfalls, also sometimes referred to as pointer hazards. In this context, pointers that directly address memory (as used in this article) are referred to as raw pointers, by contrast with smart pointers or other variants.
One major problem with pointers is that as long as they can be directly manipulated as a number, they can be made to point to unused addresses or to data which is being used for other purposes. Many languages, including most functional programming languages and recent imperative programming languages like Java, replace pointers with a more opaque type of reference, typically referred to as simply a reference, which can only be used to refer to objects and not manipulated as numbers, preventing this type of error. Array indexing is handled as a special case.
A pointer which does not have any address assigned to it is called a wild pointer. Any attempt to use such uninitialized pointers can cause unexpected behavior, either because the initial value is not a valid address, or because using it may damage other parts of the program. The result is often a segmentation fault, storage violation or wild branch (if used as a function pointer or branch address).
In systems with explicit memory allocation, it is possible to create a dangling pointer by deallocating the memory region it points into. This type of pointer is dangerous and subtle because a deallocated memory region may contain the same data as it did before it was deallocated but may then be reallocated and overwritten by unrelated code, unknown to the earlier code. Languages with garbage collection prevent this type of error because deallocation is performed automatically when there are no more references in scope.
Some languages, like C++, support smart pointers, which use a simple form of reference counting to help track allocation of dynamic memory in addition to acting as a reference. In the absence of reference cycles, where an object refers to itself indirectly through a sequence of smart pointers, these eliminate the possibility of dangling pointers and memory leaks. Delphi strings support reference counting natively.
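C itself has no smart pointers, but the reference-counting idea behind them can be sketched by hand. The type shared_int and its three functions are hypothetical names for this illustration; a real smart pointer automates the retain and release calls that are written out explicitly here.

```c
#include <stdlib.h>

/* A heap object that carries its own reference count. */
struct shared_int {
    size_t refs;
    int    value;
};

/* Creates the object with one reference; returns NULL on failure. */
struct shared_int *shared_int_new(int value) {
    struct shared_int *s = malloc(sizeof *s);
    if (s != NULL) {
        s->refs = 1;
        s->value = value;
    }
    return s;
}

/* Registers one more owner and returns the same pointer. */
struct shared_int *shared_int_retain(struct shared_int *s) {
    s->refs++;
    return s;
}

/* Drops one owner; frees the object when the last owner is gone.
 * Returns 1 if the object was freed, 0 otherwise. */
int shared_int_release(struct shared_int *s) {
    if (--s->refs == 0) {
        free(s);
        return 1;
    }
    return 0;
}
```

The object is freed exactly once, when the final owner releases it, so no owner is left holding a dangling pointer as long as every retain is paired with one release.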
A null pointer has a value reserved for indicating that the pointer does not refer to a valid object. Null pointers are routinely used to represent conditions such as the end of a list of unknown length or the failure to perform some action; this use of null pointers can be compared to nullable types and to the Nothing value in an option type.
A dangling pointer is a pointer that does not point to a valid object and consequently may make a program crash or behave oddly. In the Pascal or C programming languages, pointers that are not specifically initialized may point to unpredictable addresses in memory.
The following example code shows a dangling pointer:
int func(void) {
    char *p1 = malloc(sizeof(char)); /* (undefined) value of some place on the heap */
    char *p2;                        /* dangling (uninitialized) pointer */
    *p1 = 'a';                       /* This is OK, assuming malloc() has not returned NULL. */
    *p2 = 'b';                       /* This invokes undefined behavior */
}
Here, p2 may point to anywhere in memory, so performing the assignment *p2 = 'b'; can corrupt an unknown area of memory or trigger a segmentation fault.
Where a pointer is used as the address of the entry point to a program or start of a function which doesn't return anything and is also either uninitialized or corrupted, if a call or jump is nevertheless made to this address, a "wild branch" is said to have occurred. In other words, a wild branch is a function pointer that is wild (dangling).
The consequences are usually unpredictable and the error may present itself in several different ways depending upon whether or not the pointer is a "valid" address and whether or not there is (coincidentally) a valid instruction (opcode) at that address. The detection of a wild branch can present one of the most difficult and frustrating debugging exercises since much of the evidence may already have been destroyed beforehand or by execution of one or more inappropriate instructions at the branch location. If available, an instruction set simulator can usually not only detect a wild branch before it takes effect, but also provide a complete or partial trace of its history.
An autorelative pointer is a pointer whose value is interpreted as an offset from the address of the pointer itself; thus, if a data structure has an autorelative pointer member that points to some portion of the data structure itself, then the data structure may be relocated in memory without having to update the value of the autorelative pointer.[12]
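The relocation property can be demonstrated with a minimal C sketch. The names rel_ptr, rel_set, rel_get and struct record are hypothetical, and the sketch assumes the target lies inside the same structure as the pointer and within the range of a 32-bit signed offset.

```c
#include <stdint.h>

/* An autorelative pointer: stores target address minus own address. */
typedef int32_t rel_ptr;

/* Makes *slot refer to target (assumed to be in the same object). */
void rel_set(rel_ptr *slot, const void *target) {
    *slot = (int32_t)((const char *)target - (const char *)slot);
}

/* Recovers the absolute address from the pointer's own location. */
void *rel_get(rel_ptr *slot) {
    return (char *)slot + *slot;
}

/* A structure whose name member refers to its own text member. */
struct record {
    rel_ptr name;
    char    text[12];
};
```

If a record is initialized with rel_set(&a.name, a.text) and then copied byte for byte into another record b, rel_get(&b.name) yields b.text without any fix-up, because the stored offset is unchanged by the move.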
The cited patent also uses the term self-relative pointer to mean the same thing. However, the meaning of that term has been used in other ways:
to mean an offset from the address of a structure rather than from the address of the pointer itself;[citation needed]
to mean a pointer containing its own address, which can be useful for reconstructing in any arbitrary region of memory a collection of data structures that point to each other.[13]
A based pointer is a pointer whose value is an offset from the value of another pointer. This can be used to store and load blocks of data, assigning the address of the beginning of the block to the base pointer.[14]
In some languages, a pointer can reference another pointer, requiring multiple dereference operations to get to the original value. While each level of indirection may add a performance cost, it is sometimes necessary in order to provide correct behavior for complex data structures. For example, in C it is typical to define a linked list in terms of an element that contains a pointer to the next element of the list:
struct element {
    struct element *next;
    int             value;
};

struct element *head = NULL;
This implementation uses a pointer to the first element in the list as a surrogate for the entire list. If a new value is added to the beginning of the list,head has to be changed to point to the new element. Since C arguments are always passed by value, using double indirection allows the insertion to be implemented correctly, and has the desirable side-effect of eliminating special case code to deal with insertions at the front of the list:
    // Given a sorted list at *head, insert the element item at the first
    // location where all earlier elements have lesser or equal value.
    void insert(struct element **head, struct element *item)
    {
        struct element **p;  // p points to a pointer to an element

        for (p = head; *p != NULL; p = &(*p)->next) {
            if (item->value <= (*p)->value)
                break;
        }
        item->next = *p;
        *p = item;
    }

    // Caller does this:
    insert(&head, item);
In this case, if the value of item is less than that of head, the caller's head is properly updated to the address of the new item.
A basic example is the argv argument to the main function in C (and C++), which is given in the prototype as char **argv. This is because the variable argv itself is a pointer to the first element of an array of strings (each string being a pointer to its characters), so *argv is a pointer to the 0th string (by convention the name of the program), and **argv is the 0th character of the 0th string.
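The two levels of indirection in argv can be made explicit with a few small helpers. This is a sketch; the function names program_name, first_char and count_args are hypothetical, and the vector is assumed to end with a null pointer, as the C standard requires of argv.

```c
#include <assert.h>
#include <stddef.h>

/* Return the 0th string of an argv-style vector, i.e. *argv. */
static const char *program_name(char **argv)
{
    assert(argv != NULL && *argv != NULL);
    return *argv;            /* same as argv[0] */
}

/* Return the 0th character of the 0th string, i.e. **argv. */
static char first_char(char **argv)
{
    assert(argv != NULL && *argv != NULL);
    return **argv;           /* same as argv[0][0] */
}

/* Count the arguments by walking the vector up to its NULL terminator. */
static size_t count_args(char **argv)
{
    size_t n = 0;
    while (argv[n] != NULL)
        n++;
    return n;
}
```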
In some languages, a pointer can reference executable code, i.e., it can point to a function, method, or procedure. A function pointer will store the address of a function to be invoked. While this facility can be used to call functions dynamically, it is often a favorite technique of virus and other malicious software writers.
    int sum(int n1, int n2) {   // Function with two integer parameters returning an integer value
        return n1 + n2;
    }

    int main(void) {
        int a = 1, b = 2, x, y;   // a and b are given arbitrary values so the calls are well-defined
        int (*fp)(int, int);      // Function pointer which can point to a function like sum
        fp = &sum;                // fp now points to function sum
        x = (*fp)(a, b);          // Calls function sum with arguments a and b
        y = sum(a, b);            // Calls function sum with arguments a and b
    }
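Calling functions dynamically is often done through a table of function pointers. The following is a minimal sketch; the operation codes, the typedef binop and the function apply are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

static int add(int a, int b) { return a + b; }
static int sub(int a, int b) { return a - b; }
static int mul(int a, int b) { return a * b; }

/* A table of function pointers, indexed by a hypothetical operation code. */
typedef int (*binop)(int, int);
static const binop ops[] = { add, sub, mul };

/* Select the function at run time and call it through the pointer. */
static int apply(size_t opcode, int a, int b)
{
    assert(opcode < sizeof ops / sizeof ops[0]);
    return ops[opcode](a, b);
}
```

The bounds check on opcode matters: an out-of-range index would fetch an arbitrary value and call it as code.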
In doubly linked lists or tree structures, a back pointer held on an element 'points back' to the item referring to the current element. These are useful for navigation and manipulation, at the expense of greater memory use.
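The trade-off can be seen in a doubly linked list, where the back pointer lets an element be unlinked without searching for its predecessor. This is a sketch with hypothetical names (node, insert_after, unlink_node).

```c
#include <assert.h>
#include <stddef.h>

struct node {
    struct node *prev;   /* back pointer to the element that refers to us */
    struct node *next;
    int          value;
};

/* Link n directly after pos, updating both forward and back pointers. */
static void insert_after(struct node *pos, struct node *n)
{
    assert(pos != NULL && n != NULL);
    n->prev = pos;
    n->next = pos->next;
    if (pos->next != NULL)
        pos->next->prev = n;
    pos->next = n;
}

/* Unlink n given only a pointer to it; no list traversal is needed. */
static void unlink_node(struct node *n)
{
    assert(n != NULL);
    if (n->prev != NULL)
        n->prev->next = n->next;
    if (n->next != NULL)
        n->next->prev = n->prev;
    n->prev = n->next = NULL;
}
```

The cost is one extra pointer per element and two extra stores on each insertion.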
It is possible to simulate pointer behavior using an index to a (normally one-dimensional) array.
Primarily for languages which do not support pointers explicitly but do support arrays, the array can be thought of and processed as if it were the entire memory range (within the scope of the particular array) and any index to it can be thought of as equivalent to a general-purpose register in assembly language (that points to the individual bytes but whose actual value is relative to the start of the array, not its absolute address in memory). Assuming the array is, say, a contiguous 16 megabyte character data structure, individual bytes (or a string of contiguous bytes within the array) can be directly addressed and manipulated using the name of the array with a 31-bit unsigned integer as the simulated pointer (this is quite similar to the C arrays example shown above). Pointer arithmetic can be simulated by adding or subtracting from the index, with minimal additional overhead compared to genuine pointer arithmetic.
It is even theoretically possible, using the above technique together with a suitable instruction set simulator, to simulate any machine code or the intermediate code (byte code) of any processor/language in another language that does not support pointers at all (for example Java / JavaScript). To achieve this, the binary code can initially be loaded into contiguous bytes of the array for the simulator to "read", interpret and execute entirely within the memory containing the same array. If necessary, to completely avoid buffer overflow problems, bounds checking can usually be inserted by the compiler (or if not, hand coded in the simulator).
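The index-as-pointer technique is shown here in C purely for illustration (C has real pointers, so the simulation is not needed there). The names memory, sim_ptr, load, store and store_string are hypothetical, and the 64-byte array stands in for the much larger array described above.

```c
#include <assert.h>
#include <stddef.h>

#define MEM_SIZE 64
static unsigned char memory[MEM_SIZE];   /* the simulated address space */

typedef size_t sim_ptr;                   /* a simulated pointer is an index */

/* "Dereference" a simulated pointer for reading, with a bounds check. */
static unsigned char load(sim_ptr p)
{
    assert(p < MEM_SIZE);
    return memory[p];
}

/* "Dereference" a simulated pointer for writing, with a bounds check. */
static void store(sim_ptr p, unsigned char v)
{
    assert(p < MEM_SIZE);
    memory[p] = v;
}

/* Copy a NUL-terminated string into simulated memory and return the
 * index just past the stored terminator. */
static sim_ptr store_string(sim_ptr dst, const char *s)
{
    do {
        store(dst++, (unsigned char)*s);   /* dst++ is simulated pointer arithmetic */
    } while (*s++ != '\0');
    return dst;
}
```

The asserts play the role of the hand-coded bounds checking mentioned above: an index outside the array is rejected instead of touching unrelated memory.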
Ada is a strongly typed language where all pointers are typed and only safe type conversions are permitted. All pointers are by default initialized to null, and any attempt to access data through a null pointer causes an exception to be raised. Pointers in Ada are called access types. Ada 83 did not permit arithmetic on access types (although many compiler vendors provided for it as a non-standard feature), but Ada 95 supports “safe” arithmetic on access types via the package System.Storage_Elements.
Several old versions of BASIC for the Windows platform had support for STRPTR() to return the address of a string, and for VARPTR() to return the address of a variable. Visual Basic 5 also had support for OBJPTR() to return the address of an object interface, and for an ADDRESSOF operator to return the address of a function. The types of all of these are integers, but their values are equivalent to those held by pointer types.
Newer dialects of BASIC, such as FreeBASIC or BlitzMax, have exhaustive pointer implementations, however. In FreeBASIC, arithmetic on ANY pointers (equivalent to C's void*) is treated as though the ANY pointer had a width of one byte. ANY pointers cannot be dereferenced, as in C. Also, casting between ANY and any other type's pointers will not generate any warnings.
In C and C++, pointers are variables that store addresses and can be null. Each pointer has a type it points to, but one can freely cast between pointer types (but not between a function pointer and an object pointer). A special pointer type called the “void pointer” allows pointing to any (non-function) object, but is limited by the fact that it cannot be dereferenced directly (it must first be cast). The address itself can often be directly manipulated by casting a pointer to and from an integral type of sufficient size, though the results are implementation-defined and may indeed cause undefined behavior; while earlier C standards did not have an integral type that was guaranteed to be large enough, C99 specifies the uintptr_t typedef name defined in <stdint.h>, but an implementation need not provide it.
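Where uintptr_t is available, a pointer can be converted to an integer and back, as in the sketch below. The function names round_trip and is_aligned are hypothetical. The round trip through void* is guaranteed by the C standard to compare equal to the original pointer, whereas the alignment test relies on the implementation-defined integer representation of addresses and is shown only as a common idiom.

```c
#include <assert.h>
#include <stdint.h>

/* Round-trip an object pointer through uintptr_t. */
static int *round_trip(int *p)
{
    uintptr_t bits = (uintptr_t)(void *)p;   /* pointer -> integer */
    return (int *)(void *)bits;               /* integer -> pointer */
}

/* Check whether an address is a multiple of align (a power of two).
 * Assumes addresses convert to integers in the usual linear way. */
static int is_aligned(const void *p, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}
```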
C++ fully supports C pointers and C typecasting. It also supports a new group of typecasting operators to help catch some unintended dangerous casts at compile-time. Since C++11, the C++ standard library also provides smart pointers (unique_ptr, shared_ptr and weak_ptr) which can be used in some situations as a safer alternative to primitive C pointers. C++ also supports another form of reference, quite different from a pointer, called simply a reference or reference type.
Pointer arithmetic, that is, the ability to modify a pointer's target address with arithmetic operations (as well as magnitude comparisons), is restricted by the language standard to remain within the bounds of a single array object (or just after it), and will otherwise invoke undefined behavior. Adding or subtracting from a pointer moves it by a multiple of the size of its datatype. For example, adding 1 to a pointer to 4-byte integer values will increment the pointer's pointed-to byte-address by 4. This has the effect of incrementing the pointer to point at the next element in a contiguous array of integers, which is often the intended result. Pointer arithmetic cannot be performed on void pointers because the void type has no size, and thus the pointed address cannot be added to, although gcc and other compilers will perform byte arithmetic on void* as a non-standard extension, treating it as if it were char *.
Pointer arithmetic provides the programmer with a single way of dealing with different types: adding and subtracting the number of elements required instead of the actual offset in bytes. (Pointer arithmetic with char * pointers uses byte offsets, because sizeof(char) is 1 by definition.) In particular, the C definition explicitly declares that the syntax a[n], which denotes the element at index n of the array a, is equivalent to *(a + n), which is the content of the element pointed to by a + n. This implies that n[a] is equivalent to a[n], and one can write, e.g., a[3] or 3[a] equally well to access the fourth element of an array a.
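These rules can be checked with a short sketch. The function names sum and stride_bytes are hypothetical; everything else is standard C.

```c
#include <assert.h>
#include <stddef.h>

/* Sum n ints by stepping a pointer; each ++ advances by sizeof(int) bytes. */
static int sum(const int *a, size_t n)
{
    assert(a != NULL);
    int total = 0;
    for (const int *p = a; p != a + n; ++p)   /* a + n: one past the end */
        total += *p;
    return total;
}

/* Distance in bytes between consecutive elements, measured via char *. */
static ptrdiff_t stride_bytes(const int *a)
{
    return (const char *)(a + 1) - (const char *)a;
}
```

For an array int a[4], the expressions a[3], *(a + 3) and 3[a] all name the same element, and stride_bytes(a) equals sizeof(int), whatever that size is on the platform.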
While powerful, pointer arithmetic can be a source of computer bugs. It tends to confuse novice programmers, forcing them into different contexts: an expression can be an ordinary arithmetic one or a pointer arithmetic one, and sometimes it is easy to mistake one for the other. In response to this, many modern high-level computer languages (for example Java) do not permit direct access to memory using addresses. Also, the safe C dialect Cyclone addresses many of the issues with pointers. See C programming language for more discussion.
The void pointer, or void*, is supported in ANSI C and C++ as a generic pointer type. A pointer to void can store the address of any object (not function),[a] and, in C, is implicitly converted to any other object pointer type on assignment, but it must be explicitly cast if dereferenced. K&R C used char* for the “type-agnostic pointer” purpose (before ANSI C).
    int x = 4;
    void* p1 = &x;
    int* p2 = p1;        // void* implicitly converted to int*: valid C, but not C++
    int a = *p2;
    int b = *(int*)p1;   // when dereferencing inline, there is no implicit conversion
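A typical use of the generic pointer type is a routine that works on objects of any type by treating them as raw bytes. The sketch below uses a hypothetical function swap_bytes; the caller supplies the object size, since void* carries no type information.

```c
#include <assert.h>
#include <stddef.h>

/* Swap two non-overlapping objects of size bytes each through void pointers. */
static void swap_bytes(void *a, void *b, size_t size)
{
    assert(a != NULL && b != NULL);
    unsigned char *pa = a;   /* implicit void* -> unsigned char* (valid C) */
    unsigned char *pb = b;
    for (size_t i = 0; i < size; i++) {
        unsigned char tmp = pa[i];
        pa[i] = pb[i];
        pb[i] = tmp;
    }
}
```

The same function then serves for int, double or structure operands, for example swap_bytes(&x, &y, sizeof x).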
C++ does not allow the implicit conversion of void* to other pointer types, even in assignments. This was a design decision to avoid careless and even unintended casts, though most compilers only output warnings, not errors, when encountering other casts.
    int x = 4;
    void* p1 = &x;
    int* p2 = p1;                          // this fails in C++: there is no implicit conversion from void*
    int* p3 = (int*)p1;                    // C-style cast
    int* p4 = reinterpret_cast<int*>(p1);  // C++ cast
In C++, there is no void& (reference to void) to complement void* (pointer to void), because references behave like aliases to the variables they point to, and there can never be a variable whose type is void.
In C++, pointers to non-static members of a class can be defined. If a class C has a member T a, then &C::a is a pointer to the member a, of type T C::*. The member can be an object or a function.[16] Such pointers can be used on the right-hand side of the operators .* and ->* to access the corresponding member.
    #include <iostream>

    struct S {
        int a;
        int f() const { return a; }
    };

    int main() {
        S s1{};
        S* ptrS = &s1;

        int S::* ptr = &S::a;               // pointer to S::a
        int (S::* fp)() const = &S::f;      // pointer to S::f

        s1.*ptr = 1;
        std::cout << (s1.*fp)() << "\n";    // prints 1

        ptrS->*ptr = 2;
        std::cout << (ptrS->*fp)() << "\n"; // prints 2
    }
These pointer declarations cover most variants of pointer declarations. Of course it is possible to have triple pointers, but the main principles behind a triple pointer already exist in a double pointer. The naming used here is what the expression typeid(type).name() equals for each of these types when using g++ or clang.[17][18]
    char A5_A5_c[5][5];       /* array of arrays of chars */
    char *A5_Pc[5];           /* array of pointers to chars */
    char **PPc;               /* pointer to pointer to char ("double pointer") */
    char (*PA5_c)[5];         /* pointer to array(s) of chars */
    char *FPcvE();            /* function which returns a pointer to char(s) */
    char (*PFcvE)();          /* pointer to a function which returns a char */
    char (*FPA5_cvE())[5];    /* function which returns pointer to an array of chars */
    char (*A5_PFcvE[5])();    /* an array of pointers to functions which return a char */
The following declarations involving pointers-to-member are valid only in C++:
    class C;
    class D;
    char C::*M1Cc;                /* pointer-to-member to char */
    char C::*A5_M1Cc[5];          /* array of pointers-to-member to char */
    char *C::*M1CPc;              /* pointer-to-member to pointer to char(s) */
    char C::**PM1Cc;              /* pointer to pointer-to-member to char */
    char (C::*M1CA5_c)[5];        /* pointer-to-member to array(s) of chars */
    char C::*FM1CcvE();           /* function which returns a pointer-to-member to char */
    char D::*C::*M1CM1Dc;         /* pointer-to-member to pointer-to-member to char */
    char C::*C::*M1CMS_c;         /* pointer-to-member to pointer-to-member to char */
    char (C::*FM1CA5_cvE())[5];   /* function which returns pointer-to-member to an array of chars */
    char (C::*M1CFcvE)();         /* pointer-to-member-function which returns a char */
    char (C::*A5_M1CFcvE[5])();   /* an array of pointers-to-member-functions which return a char */
In the C# programming language, pointers are supported either by marking blocks of code that include pointers with the unsafe keyword, or by using the System.Runtime.CompilerServices assembly provisions for pointer access. The syntax is essentially the same as in C++, and the address pointed to can be either managed or unmanaged memory. However, pointers to managed memory (any pointer to a managed object) must be declared using the fixed keyword, which prevents the garbage collector from moving the pointed object as part of memory management while the pointer is in scope, thus keeping the pointer address valid.
However, an exception to this is the IntPtr structure, which is a memory-managed equivalent to int*, and requires neither the unsafe keyword nor the CompilerServices assembly. This type is often returned when using methods from System.Runtime.InteropServices, for example:
    // Get 16 bytes of memory from the process's unmanaged memory
    IntPtr pointer = System.Runtime.InteropServices.Marshal.AllocHGlobal(16);

    // Do something with the allocated memory

    // Free the allocated memory
    System.Runtime.InteropServices.Marshal.FreeHGlobal(pointer);
The .NET framework includes many classes and methods in the System and System.Runtime.InteropServices namespaces (such as the Marshal class) which convert .NET types (for example, System.String) to and from many unmanaged types and pointers (for example, LPWSTR or void*) to allow communication with unmanaged code. Most such methods have the same security permission requirements as unmanaged code, since they can affect arbitrary places in memory.
The COBOL programming language supports pointers to variables. Primitive or group (record) data objects declared within the LINKAGE SECTION of a program are inherently pointer-based, where the only memory allocated within the program is space for the address of the data item (typically a single memory word). In program source code, these data items are used just like any other WORKING-STORAGE variable, but their contents are implicitly accessed indirectly through their LINKAGE pointers.
Memory space for each pointed-to data object is typically allocated dynamically using external CALL statements or via embedded extended language constructs such as EXEC CICS or EXEC SQL statements.
Extended versions of COBOL also provide pointer variables declared with USAGE IS POINTER clauses. The values of such pointer variables are established and modified using SET and SET ADDRESS statements.
Some extended versions of COBOL also provide PROCEDURE-POINTER variables, which are capable of storing the addresses of executable code.
The PL/I language provides full support for pointers to all data types (including pointers to structures), recursion, multitasking, string handling, and extensive built-in functions. PL/I was quite a leap forward compared to the programming languages of its time.[citation needed] PL/I pointers are untyped, and therefore no casting is required for pointer dereferencing or assignment. The declaration syntax for a pointer is DECLARE xxx POINTER;, which declares a pointer named "xxx". Pointers are used with BASED variables. A based variable can be declared with a default locator (DECLARE xxx BASED(ppp);) or without (DECLARE xxx BASED;), where xxx is a based variable, which may be an element variable, a structure, or an array, and ppp is the default pointer. Such a variable can be addressed without an explicit pointer reference (xxx=1;), or may be addressed with an explicit reference to the default locator (ppp), or to any other pointer (qqq->xxx=1;).
Pointer arithmetic is not part of the PL/I standard, but many compilers allow expressions of the form ptr = ptr±expression. IBM PL/I also has the builtin function PTRADD to perform the arithmetic. Pointer arithmetic is always performed in bytes.
IBM Enterprise PL/I compilers have a new form of typed pointer called a HANDLE.
The Eiffel object-oriented language employs value and reference semantics without pointer arithmetic. Nevertheless, pointer classes are provided. They offer pointer arithmetic, typecasting, explicit memory management, interfacing with non-Eiffel software, and other features.
Fortran-90 introduced a strongly typed pointer capability. Fortran pointers contain more than just a simple memory address. They also encapsulate the lower and upper bounds of array dimensions, strides (for example, to support arbitrary array sections), and other metadata. An association operator, =>, is used to associate a POINTER to a variable which has a TARGET attribute. The Fortran-90 ALLOCATE statement may also be used to associate a pointer to a block of memory. For example, the following code might be used to define and create a linked list structure:
    type real_list_t
      real :: sample_data(100)
      type (real_list_t), pointer :: next => null ()
    end type

    type (real_list_t), target :: my_real_list
    type (real_list_t), pointer :: real_list_temp

    real_list_temp => my_real_list
    do
      read (1,iostat=ioerr) real_list_temp%sample_data
      if (ioerr /= 0) exit
      allocate (real_list_temp%next)
      real_list_temp => real_list_temp%next
    end do
Fortran-2003 adds support for procedure pointers. Also, as part of the C Interoperability feature, Fortran-2003 supports intrinsic functions for converting C-style pointers into Fortran pointers and back.
Go has pointers. Its declaration syntax is equivalent to that of C, but written the other way around, ending with the type. Unlike C, Go has garbage collection, and disallows pointer arithmetic. Reference types, like in C++, do not exist. Some built-in types, like maps and channels, are boxed (i.e. internally they are pointers to mutable structures), and are initialized using the make function. In an approach to unified syntax between pointers and non-pointers, the arrow (->) operator has been dropped: the dot operator on a pointer refers to the field or method of the dereferenced object. This, however, only works with one level of indirection.
There is no explicit representation of pointers in Java. Instead, more complex data structures like objects and arrays are implemented using references. The language does not provide any explicit pointer manipulation operators. It is still possible for code to attempt to dereference a null reference (null pointer), however, which results in a run-time exception being thrown. The space occupied by unreferenced memory objects is recovered automatically by garbage collection at run-time.[20]
In Modula-2, pointers are implemented very much as in Pascal, as are VAR parameters in procedure calls. Modula-2 is even more strongly typed than Pascal, with fewer ways to escape the type system. Some of the variants of Modula-2 (such as Modula-3) include garbage collection.
In Oberon, much as with Modula-2, pointers are available. There are still fewer ways to evade the type system, and so Oberon and its variants are still safer with respect to pointers than Modula-2 or its variants. As with Modula-3, garbage collection is a part of the language specification.
Unlike many languages that feature pointers, standard ISO Pascal only allows pointers to reference dynamically created variables that are anonymous and does not allow them to reference standard static or local variables.[21] It does not have pointer arithmetic. Pointers also must have an associated type, and a pointer to one type is not compatible with a pointer to another type (e.g. a pointer to a char is not compatible with a pointer to an integer). This helps eliminate the type security issues inherent with other pointer implementations, particularly those used for PL/I or C. It also removes some risks caused by dangling pointers, but the ability to dynamically let go of referenced space by using the dispose standard procedure (which has the same effect as the free library function found in C) means that the risk of dangling pointers has not been entirely eliminated.[22]
However, in some commercial and open source Pascal (or derivatives) compiler implementations, like Free Pascal,[23] Turbo Pascal or the Object Pascal in Embarcadero Delphi, a pointer is allowed to reference standard static or local variables and can be cast from one pointer type to another. Moreover, pointer arithmetic is unrestricted: adding or subtracting from a pointer moves it by that number of bytes in either direction, but using the Inc or Dec standard procedures with it moves the pointer by the size of the data type it is declared to point to. An untyped pointer is also provided under the name Pointer, which is compatible with other pointer types.
The Perl programming language supports pointers, although they are rarely used, in the form of the pack and unpack functions. These are intended only for simple interactions with compiled OS libraries. In all other cases, Perl uses references, which are typed and do not allow any form of pointer arithmetic. They are used to construct complex data structures.[24]
Some compilers allow storing the addresses of functions in void pointers. The C++ standard lists converting a function pointer to void* as a conditionally supported feature and the C standard says such conversions are "common extensions". This is required by the POSIX function dlsym.[15]
ISO/IEC 9899, clause 7.17, paragraph 3: NULL... which expands to an implementation-defined null pointer constant...
ISO/IEC 9899, clause 6.5.3.2, paragraph 4, footnote 87: If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined... Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer...
US patent 6625718, Steiner, Robert C. (Broomfield, CO), "Pointers that are relative to their own present locations", issued 2003-09-23, assigned to Avaya Technology Corp. (Basking Ridge, NJ)
US patent 6115721, Nagy, Michael (Tampa, FL), "System and method for database save and restore using self-pointers", issued 2000-09-05, assigned to IBM (Armonk, NY)