12
\$\begingroup\$

I wrote a generic dynamic array using the preprocessor. My use case for it is a compiler I'm writing that uses dynamic arrays a lot, and my current void pointer implementation is becoming a bit hard to use as I have to remember the type each array stores when using it.I'm wondering if this is a good readable implementation and interface.

I'm aware that using an existing library would most likely be better, but as this is for learning purposes, I want to implement all the data structures I use.

#include <stdio.h>#include <stdlib.h>#include <assert.h>#define ARRAY_INITIAL_CAPACITY 8#define Array(type) type *// [capacity][used][data]...#define initArray(array) do { \    size_t *data = malloc(2 * sizeof(size_t) + ARRAY_INITIAL_CAPACITY * sizeof(*(array))); \    data[0] =  ARRAY_INITIAL_CAPACITY; \    data[1] = 0; \    (array) = (typeof(*(array)) *)&data[2]; \} while(0)#define freeArray(array) free((size_t *)(array)-2)#define arrayCapacity(array) ((array) ? *((size_t *)(array)-2) : 0lu)#define arrayUsed(array) ((array) ? *((size_t *)(array)-1) : 0lu)#define arrayEmpty(array) (arrayUsed(array) == 0)#define arrayPush(array, value) do { \    if(arrayUsed(array) + 1lu >= arrayCapacity(array)) { \        size_t *data = ((size_t *)(array)-2); \        size_t newCap = arrayCapacity(array) * 2 + 2 * sizeof(size_t); \        data = realloc(data, newCap); \        (array) = (typeof(*(array)) *)&data[2]; \    } \    size_t *used = ((size_t *)(array) - 1); \    (array)[(*used)++] = (value); \} while(0)#define arrayPop(array) (assert(arrayUsed(array) > 0), (array)[--*((size_t *)(array) - 1)])#define arrayGet(array, idx) (assert((idx) < arrayUsed(array)), (array)[(idx)])// callback type has to be void (*)(T, U)// where T is the type of the items in the array,// and U is the type of the user_data argument.#define arrayMap(array, callback, user_data) do { \    for(size_t i = 0; i < arrayUsed(array); ++i) {\        callback(array[i], (user_data)); \    } \} while(0)#define arrayCopy(dest, src) do { \    for(size_t i = 0; i < arrayUsed(src); ++i) { \        arrayPush((dest), (src)[i]); \    } \} while(0)

Example usage:

static void callback(int item, void *user_data) {    printf("%d\n", item);}int main(void) {    Array(int) nums = NULL;    initArray(nums);    arrayPush(nums, 1);    arrayPush(nums, 2);    printf("capacity: %lu, used: %lu\n", arrayCapacity(nums), arrayUsed(nums));    // Iterating    for(int i = 0; i < arrayUsed(nums); ++i) {        printf("%d\n", arrayGet(nums, i));    }    // Iterating in reverse    for(int i = arrayUsed(nums)-1; i >= 0; --i) {        printf("%d\n", nums[i]);    }    Array(int) nums2 = NULL;    initArray(nums2);    arrayCopy(nums2, nums);    while(!arrayEmpty(nums)) {        printf("%d\n", arrayPop(nums));    }    arrayMap(nums2, callback, NULL);    freeArray(nums2);    freeArray(nums);    return 0;}
askedJul 4, 2022 at 12:39
Itai Nelken's user avatar
\$\endgroup\$

4 Answers4

9
\$\begingroup\$

General Observations

I think this is a great learning project, and asking for a review is a great idea.

The keywordtypeof is not part of the current C Programming Standard and that means the code isn't portable, it can only be compiled by compilers that have provided the extension.

Generally hiding memory allocation is a bad idea, it makes the code very hard to debug and maintain.

Expecting a function such ascallback() to be defined in a macro can be problematic.

While using macros in C can sometimes greatly reduce the amount of code written, it is very difficult to maintain code written this way, writing, debugging and maintaining such code is very difficult and needs to be heavily documented.

Alternate Implementation

Possibly a better way to implement this is to have a struct that contains the current capacity, the current size of the array in use and a void pointer that points to the data. Allocating the array would be a 2 step process, first allocate the struct and then allocate the data portion. Have a header file that defines the struct and function prototypes that operate on the struct. Have a C file that contains the functions that operate on the struct.

Test for Possible Memory Allocation Errors

In modern high-level languages such as C++, memory allocation errors throw an exception that the programmer can catch. This is not the case in the C programming language. While it is rare in modern computers because there is so much memory, memory allocation can fail, especially if the code is working in a limited memory application such as embedded control systems. In the C programming language when memory allocation fails, the functionsmalloc(),calloc() andrealloc() returnNULL. Referencing any memory address through aNULL pointer results inundefined behavior (UB).

Possible unknown behavior in this case can be a memory page error (in Unix this would be call Segmentation Violation), corrupted data in the program and in very old computers it could even cause the computer to reboot (corruption of the stack pointer).

To prevent thisundefined behavior a best practice is to always follow the memory allocation statement with a test that the pointer that was returned is not NULL.

answeredJul 4, 2022 at 14:29
pacmaninbw's user avatar
\$\endgroup\$
4
  • \$\begingroup\$In the compiler I'm writing (which is the use case for this array), I have wrapper functions for all the memory allocation functions that check for NULL. here for simplicity I replaced them with the regular libcmalloc,calloc, andrealloc.\$\endgroup\$CommentedJul 4, 2022 at 14:33
  • \$\begingroup\$Your alternate implementation is pretty much what I'm currently using. I was wondering if the benefits this macro implementation are enough to use it over even though it is harder to debug and maintain.\$\endgroup\$CommentedJul 4, 2022 at 14:36
  • 1
    \$\begingroup\$@ItaiNelken If you use the macros to call your struct functions it might be okay, otherwise the maintenance problems outweigh the benefits of using the macros. Please keep in mind when posting questions we want to review the actual code.\$\endgroup\$CommentedJul 4, 2022 at 14:38
  • 2
    \$\begingroup\$@ItaiNelken Still, even with your own “NULL checkingrealloc”, you should actuallyhandle allocation errors properly.realloc can fail, and you should (be able to) handle that after callingarrayPush. SoarrayPush should also have a way of returning/signaling errors - this seems impossible with these do-while macros? (Depending on the type of application you are writing, you may want to read “must” where I wrote “should”.)\$\endgroup\$CommentedJul 6, 2022 at 20:05
6
\$\begingroup\$

Bug

This is subtle.

(array) = (typeof(*(array)) *)&data[2]; may lead to UB.

OP's code assumes that the array element's can directly followCapacity andUsed - without padding. This is not certainly true as the array type may have an alignment greater than2*sizeof(size_t). See below for possible solution. Or perhaps use astatic_assert to at least detect test this problem.

Mis-matched specifier

#define arrayCapacity(array) ((array) ? *((size_t *)(array)-2) : 0lu) is a problem as the return type will be the wider ofsize_t andunsigned long. Also applies toarrayUsed().

printf("capacity: %lu, used: %lu\n", arrayCapacity(nums), arrayUsed(nums)); works well when the return type isunsigned long but risks UB otherwise.

Instead, return a specific type object and use a matching specifer.

#define arrayCapacity(array) ((array) ? *((size_t *)(array)-2) : (size_t)0)printf("capacity: %zu, used: %zu\n", arrayCapacity(nums), arrayUsed(nums));`

Superfluousl

Thel inarrayUsed(array) + 1lu >= arrayCapacity(array) serves no purpose. Suggest a cleaner1u.

Mixed sign-ness and widths

for(int i = arrayUsed(nums)-1; i >= 0; --i) { incurs problems with extreme values. Stick to one type.

for(size_t i = arrayUsed(nums); i > 0;) {    i--;    printf("%d\n", nums[i]);}

Magic 8 ball?

Why 8 in#define ARRAY_INITIAL_CAPACITY 8? A naked 8 deserve explanation. 1 or 0 makes more sense.

Perhaps allow caller to override?

#ifndef ARRAY_INITIAL_CAPACITY#define ARRAY_INITIAL_CAPACITY 1#endif

Namespace

Rather thaninit...,free... items, use a consistent prefixarray... like all the others.

// #define initArray(array)#define arrayInit(array)

Codes usesarray...,ARRAY...,...Array and an unnamed file to hold it all. "array" is a very common word in coding.

Suggest a more unique name likedyarray for all to reduce collisions. Usedyarray... consistently to reduce scattering of namespace usage.


A clean way to handle the alignment issue mentioned above is to have aunion before the array instead of 2size_t. (or seealign_as)

typedef union {  struct {    size_t used;    size_t capacity;  } s;  array dummy;} array_header;

Allocate withsizeof(array_header) + sizeof(array)*n.array_header will have any needed padding to put it in front of an array ofarray.

answeredJul 4, 2022 at 14:52
chux's user avatar
\$\endgroup\$
7
  • 1
    \$\begingroup\$@ItaiNelken Extreme example: Considerarray aslong double * where*array may have an alignment requirement of 16 andsize_t is size 4. Then(array) = (typeof(*(array)) *)&data[2]; fails to form a valid pointer forarray. It really comes down to how portable you want your code.\$\endgroup\$CommentedJul 4, 2022 at 16:40
  • 2
    \$\begingroup\$@Itai Nelken, Yeslong double * works on your unnamed compiler/machine, but C does not specify that the alignment needs oflong double are not more that 2size_t. It can fail other places. Again, it really comes down to how portable you want your code. Do you want to code to spec or just your target compiler/machine?\$\endgroup\$CommentedJul 4, 2022 at 20:35
  • 2
    \$\begingroup\$@ItaiNelken: GCC for Linux or MacOS (thus using the x86-64 System V ABI) hasalignof(long double) == 16. For Windows it's only an 8-byte type, for compatibility with MSVC which doesn't expose the 80-bit x87 type.alignof(long double*) is of course only 8, we're talking about an array oflong double elements, not pointers. But on those ABIs,size_t is 8 bytes, andmalloc returns 16-byte-aligned memory, so you're actually fine.\$\endgroup\$CommentedJul 5, 2022 at 5:10
  • 3
    \$\begingroup\$@ItaiNelken: You'd have a problem for the Linux x32 ABI (ILP32; 32-bit pointers in long mode), wheresize_t is a 4-byte type, butalignof(long double) is still 16. So your array start would be an odd multiple of 8. (Try it withgcc -mx32:godbolt.org/z/aGocaEdK4)\$\endgroup\$CommentedJul 5, 2022 at 5:13
  • 1
    \$\begingroup\$You'd also have a problem on normal implementations with any type with more thanalignof(max_align_t), such as AVX__m256 on x86-64, or any 16-byte-aligned type (like__m128) on 32-bit x86 wherealignof(max_align_t) == 8, so malloc wouldn't be aligned enough to hold it. But over-aligned types are inherently problematic and would require avoiding realloc, unlike the 2x size_t assumption which as chux says you could avoid without fundamental changes to how you allocate.\$\endgroup\$CommentedJul 5, 2022 at 5:14
4
\$\begingroup\$

Take care here:

#define Array(type) type *

Observe thatconst Array(t) is adifferent type toArray(t) const! Similarly forvolatile. That might be what you want, but it may well trip up some of your users.

answeredJul 5, 2022 at 7:52
Toby Speight's user avatar
\$\endgroup\$
3
\$\begingroup\$
#define arrayPush(array, value) do { \    if(arrayUsed(array) + 1lu >= arrayCapacity(array)) { \

ifarray is null, 0 + 1 > 0 and this will go on to dereference the null pointer.

Also, it's unlikely to be reached, but this has a wraparound error if the arrayUsed value reaches max unsigned long. Should beused < capacity -1

answeredJul 5, 2022 at 19:47
AShelly's user avatar
\$\endgroup\$

You mustlog in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.