| C Programming Advanced data types | Pointers and arrays |
In the chapter Variables, we looked at the primitive data types. However, more advanced data types allow us greater flexibility in managing data in our program.
Structs are data types made of variables of other data types (possibly including other structs). They are used to group pieces of information into meaningful units, and also permit some constructs not possible otherwise. The variables declared in a struct are called members.
One defines a struct using the struct keyword and a block of members. These members are specified using variable declarations. For example:

    struct mystruct {
        int int_member;
        double double_member;
        char string_member[25];
    } struct_var;

struct_var is a variable of type struct mystruct, which we declared along with the definition of the new struct mystruct data type.
This new type's name is made up of multiple words, just like some built-in types, such as unsigned long.
More commonly, struct variables are declared after the definition of the struct, using the form:

    struct mystruct {
        // ...
    };

    struct mystruct struct_var;

The members of a struct variable may be accessed using the member access operator "." (a dot):

    struct_var.int_member = 0;

It is common practice to make a type synonym so we don't have to type "struct mystruct" all the time. C allows us to do so using a typedef statement, which aliases a type:

    typedef struct {
        // ...
    } Mystruct;

The struct itself has no tag (there is no name on the first line), so it can only be referred to through its alias, Mystruct. Then the following may be used:

    Mystruct struct_var;

Structs may contain not only their own variables but may also contain other structs:

    #include <stdio.h>
    #include <stdlib.h>

    struct weapon {
        char name[100];
        int attack_power;
        struct {
            int strength;
            int agility;
            int intelligence;
        } attributes;
    };

    int main(int argc, char *argv[])
    {
        struct weapon sword = { "A cool thing", 5, { 3, 1, 0 } };
        printf("This sword requires %d STR.\n", sword.attributes.strength);
        /* sizeof yields a size_t, which is printed with %zu */
        printf("It also takes up %zu bytes.\n", sizeof sword);
        return EXIT_SUCCESS;
    }

Outputs:

    This sword requires 3 STR.
    It also takes up 116 bytes.

If our new type really is a type, then like any other type, it must have a size.
Recall struct mystruct from earlier. It is composed of an int, a double, and a char[25]. On most modern systems, these have sizes of 4, 8, and 25 bytes, respectively. What do you think is the size of struct mystruct as a whole? Would it be 4 + 8 + 25 = 37 bytes?
Storage and alignment sizes for each data type can vary from system to system. In this section, we'll use the numbers for a 64-bit computer, but the concepts themselves apply everywhere.
Let's test this assumption.

    #include <stdalign.h>  /* provides alignof before C23 */
    #include <stdio.h>
    #include <stdlib.h>

    struct mystruct {
        int int_member;
        double double_member;
        char string_member[25];
    };

    int main(int argc, char *argv[])
    {
        printf("sizeof(int) = %zu\n", sizeof(int));
        printf("sizeof(double) = %zu\n", sizeof(double));
        printf("sizeof(char[25]) = %zu\n", sizeof(char[25]));
        printf("sizeof(struct mystruct) = %zu\n", sizeof(struct mystruct));
        printf("alignof(int) = %zu\n", alignof(int));
        printf("alignof(double) = %zu\n", alignof(double));
        printf("alignof(char[25]) = %zu\n", alignof(char[25]));
        printf("alignof(struct mystruct) = %zu\n", alignof(struct mystruct));
        return EXIT_SUCCESS;
    }

Output:

    sizeof(int) = 4
    sizeof(double) = 8
    sizeof(char[25]) = 25
    sizeof(struct mystruct) = 48
    alignof(int) = 4
    alignof(double) = 8
    alignof(char[25]) = 1
    alignof(struct mystruct) = 8

alignof(...) by itself is new in C23. In older development environments, you must #include <stdalign.h> beforehand or instead type it as _Alignof(...).
The whole is greater than the sum of its parts! Why is this? It has to do with alignment, which we'll cover now.
If a two-byte short is placed in memory at address 0080, another one couldn't be placed at 0079 or 0081 since it would overlap; 0078 and 0082 are the next closest options, leaving no gap between them in memory. However, if there was a one-byte char at 0082, a following second short wouldn't be allocated at 0083, but at 0084, leaving one byte of padding in the middle.
This is because your processor is a little picky in how it wants to load data from and store data to memory. This pickiness is related to your processor's word size, which is 64 bits (or 8 bytes) on most computers today. There might be a performance penalty if either of the following happen:
Compiled programs must have the processor either access members of the new type directly or split a load or store of the whole thing into multiple word-size operations. On some systems, the above scenarios are outright prohibited. To avoid performance penalties (at best) or crashes (at worst), every type in C has an alignment: a number of bytes such that the address of each instance of that type must be a multiple of it. When data is allocated, unused padding bytes are placed in memory before it until the alignment is met. On typical systems, primitive types have sizes that are powers of two, so their alignments are usually their sizes or the word size, whichever is smaller. Structs, being made of multiple members of possibly different sizes, have the alignment of their largest-alignment member.
The alignof keyword used in the size test program above works like sizeof. It is not a function; during compilation, it evaluates to the alignment of the type it is given. Using the values from that program's output, we can recreate how our data is laid out in memory:
| Offset (hex) | Value |
|---|---|
| 00 | 4 bytes for int_member |
| 04 | 4 bytes of padding, to align double_member |
| 08 | 8 bytes for double_member |
| 10 | 25 bytes for string_member |
| 29 | 7 bytes of padding, to align the struct itself |
If we rearrange the struct so that the members are ordered from greatest alignment to least alignment, the padding hole is removed from the middle of the struct.
Swapping the first two members:

    struct mystruct {
        double double_member;
        int int_member;
        char string_member[25];
    };

...results in sizeof(struct mystruct) == 40, saving eight bytes! Let's check its new memory layout:
| Offset (hex) | Value |
|---|---|
| 00 | 8 bytes for double_member |
| 08 | 4 bytes for int_member |
| 0C | 25 bytes for string_member |
| 25 | 3 bytes of padding, to align the struct itself |
Now, the only unused space is the unavoidable alignment padding at the end of the struct.
Enumerations are artificial data types representing associations between labels and integers. Unlike structs or unions, they are not composed of other data types. An example declaration:

    enum weather {
        sunny,
        windy,
        cloudy,
        rain,
    } weather_outside;

In the example above, sunny equals 0, windy equals 1, and so on. It is possible to assign explicit values to labels, but each value must be an integer constant expression (such as a literal) that fits in the integer range.
Similar declaration syntax that applies for structs and unions also applies for enums. Also, one normally doesn't need to be concerned with the integers that labels represent:

    enum weather weather_outside = rain;

This peculiar property makes enums especially convenient in switch-case statements:

    switch (weather_outside) {
    case sunny:
        wear_sunglasses();
        break;
    case windy:
        wear_windbreaker();
        break;
    case cloudy:
        get_umbrella();
        break;
    case rain:
        get_umbrella();
        wear_raincoat();
        break;
    }

Sometimes, the underlying data type for the enumeration matters. Since C23, it is possible to set the exact type of integer to use:[1]

    enum weather : short {
        sunny,
        windy,
        cloudy,
        rain,
    } weather_outside;

This is an example of a feature that originated in C++ and in some C compilers' extensions to the language, and was later standardized.
The definition of a union is similar to that of a struct. The difference between the two is that in a struct, the members occupy different areas of memory, but in a union, the members occupy the same area of memory. Thus, in the following type, for example:
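
    union {
        int i;
        double d;
    } u;

The programmer can access either u.i or u.d, but not both at the same time. Since u.i and u.d occupy the same area of memory, modifying one modifies the value of the other.
The size of a union is at least the size of its largest member; padding may be added at the end so that the union meets the alignment of its most strictly aligned member.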
Imagine that you are developing a settings editor for an application. In this application, setting values can be integers, floating-point numbers, or single characters. Despite this, it's possible to represent all setting values with a single type.
Structs, enums, and unions can be combined to create a tagged union, a complex type which pairs some data that varies in type with information on the current type of that data. The C runtime does not keep track of the types of data in your program, so an enum is used to define all types that data could be. A union represents the varying-type data, and a struct keeps the type enum and data union together. Combining this with type synonyms, we get the following:

    #include <stdio.h>
    #include <stdlib.h>

    typedef enum {
        integer,
        decimal,
        character
    } SettingValueType;

    typedef struct {
        SettingValueType type;
        union {
            int integer;
            double decimal;
            char character;
        } data;
    } SettingValue;

    int main(int argc, char *argv[])
    {
        SettingValue value = { decimal, { .decimal = 3.7 } };

        /* The type field tells us which union member is currently valid. */
        switch (value.type) {
        case integer:
            printf("int value = %d\n", value.data.integer);
            break;
        case decimal:
            printf("double value = %lf\n", value.data.decimal);
            break;
        case character:
            printf("char value = %c\n", value.data.character);
            break;
        }
        /* sizeof yields a size_t, which is printed with %zu */
        printf("sizeof value = %zu\n", sizeof value);
        return EXIT_SUCCESS;
    }

Outputs:

    double value = 3.700000
    sizeof value = 16
