This is my first C program I just wrote in like 30 minutes and I was wondering if you could give me any pointers on ways to improve this. I'm decent at programming (around 5 years experience), but have no experience with C so I imagine I'm not following the proper idioms and/or doing things properly.
Note: I didn't make it "generic", because learning templates was beyond the scope of this little adventure.
```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/** A dynamically sized array of `int` */
typedef struct {
    size_t capacity;
    size_t length;
    int *data;
} DynamicArray;

/** Appends `value` to the `DynamicArray` */
void dynamic_array_push(DynamicArray *arr, int value) {
    if (arr->length == arr->capacity) {
        int *new_data = malloc(arr->capacity * 2 * sizeof(int));
        memcpy(new_data, arr->data, arr->capacity * sizeof(int));
        free(arr->data);
        arr->data = new_data;
        arr->capacity *= 2;
    }
    arr->data[arr->length] = value;
    arr->length++;
}

/** Cleans up the `DynamicArray`. */
void dynamic_array_cleanup(DynamicArray *arr) {
    free(arr->data);
}

DynamicArray dynamic_array_new(size_t cap) {
    DynamicArray arr = {
        .capacity = cap,
        .length = 0,
        .data = malloc(cap * sizeof(int)),
    };
    return arr;
}

int main() {
    DynamicArray arr = dynamic_array_new(2);

    // Putting in some test values
    for (int i = 0; i < 30; i++) {
        dynamic_array_push(&arr, i);
    }

    // Testing that the data came in correctly
    for (int i = 0; i < arr.length; i++) {
        assert(i == arr.data[i]);
    }

    dynamic_array_cleanup(&arr);
    return 0;
}
```

- G. Sliepen (Dec 12, 2023): C doesn't have templates; that's only in C++.
- Ajinkya Kamat (Dec 12, 2023): A quick and dirty way of turning this into kind-of generic is to wrap it in a macro that takes a type; then you can easily create a `DynamicArray` for any type you like.
- Ben Voigt (Dec 13, 2023): Potential overflow in `cap * sizeof(int)` is a vulnerability.
- chux (Dec 14, 2023): @G.Sliepen C does have `_Generic`, which can be used in a C++-template-like fashion.
- Zachiah (Dec 15, 2023): I know you're not supposed to have "thank you" comments, but seriously, thanks for the overwhelming number of extremely helpful comments and answers. I wish I could accept multiple answers! This is honestly the best-received post I've had across all of Stack Exchange, and it has really made me happy!
5 Answers
Thank you for the very nice Doxygen comment. Consider renaming the struct to `DynamicIntArray`, and then I won't even have to read the comment. Or perhaps in the use case you're tackling, there is a concept in the business domain, such as Widget, that can be represented as an `int`. Then we'd have a `DynamicWidgetArray`.
magic number
Some code bases, such as BIND9, choose to start (some of) their structs with a distinct magic number. This is a debugging aid, so library code can choose to assert that an object of the expected type was passed in.

It's completely optional, but you might consider adopting such a practice. It tends to make debugging with gdb a bit easier.
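For illustration, a minimal sketch of the idiom. The constant value, the `magic` field, and the `dynamic_array_check` helper are all invented here, not part of the original code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical magic value; any distinctive constant works.
   (BIND9 builds its magic constants from four ASCII characters,
   so they are recognizable in a hex dump.) */
#define DYNARRAY_MAGIC 0xDA7A0001u

typedef struct {
    unsigned magic;    /* always DYNARRAY_MAGIC for a live object */
    size_t capacity;
    size_t length;
    int *data;
} DynamicArray;

/* Library entry points can cheaply verify they were handed the right type. */
static void dynamic_array_check(const DynamicArray *arr) {
    assert(arr != NULL && arr->magic == DYNARRAY_MAGIC);
}
```

Constructors set the field; cleanup can scribble over it, so a stale pointer fails the check instead of corrupting memory silently.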
libc function reimplemented
```c
int *new_data = malloc(arr->capacity * 2 * sizeof(int));
memcpy(new_data, arr->data, arr->capacity * sizeof(int));
free(arr->data);
arr->data = new_data;
```

This is duplicating the functionality of a `realloc()` call. For the author's sake, may as well just reuse a well-tested, well-documented function. For the gentle reader's sake, write a single line instead of several, using a familiar contract.
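A sketch of the same grow step collapsed into one `realloc` call; the `dynamic_array_grow` helper name is mine, and the failure check is included because `realloc` can return NULL:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    size_t capacity;
    size_t length;
    int *data;
} DynamicArray;

/* One realloc call replaces the malloc/memcpy/free trio.
   Returns false (leaving the array intact) if the allocation fails. */
static bool dynamic_array_grow(DynamicArray *arr) {
    int *new_data = realloc(arr->data, arr->capacity * 2 * sizeof(int));
    if (new_data == NULL)
        return false;
    arr->data = new_data;
    arr->capacity *= 2;
    return true;
}
```

On failure the original block is untouched, which is a nicer contract than the DIY version (whose `malloc` result was never checked at all).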
cleanup
In `dynamic_array_cleanup` the `free()` is very nice. Consider additionally zeroing out those three fields, so we don't have a dangling pointer that tempts some application to try a use-after-free blunder. Certainly it is accurate that `length` and `capacity` are zero after the `free()`.

Or write `0xDeadBeef` into them, for the benefit of someone examining memory with gdb.
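A sketch of that defensive cleanup (the struct is repeated here so the fragment stands alone):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    size_t capacity;
    size_t length;
    int *data;
} DynamicArray;

/* Zeroing the fields turns a use-after-cleanup into a reproducible
   NULL dereference on an empty array, rather than a silent use-after-free. */
void dynamic_array_cleanup(DynamicArray *arr) {
    free(arr->data);
    arr->data = NULL;
    arr->length = 0;
    arr->capacity = 0;
}
```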
public API
You have defined a Public API that is broken -- it is not fit for purpose.
Sometimes `malloc()` returns NULL. I know, it's sad. But it's a fact of life -- one which a library author must cope with. And C doesn't offer error reporting via exceptions.

Each time you make a call, your code must verify that `malloc()` succeeded. You should either document that a `malloc` failure will tear down the calling process, or you should return its status to the caller, so that it becomes the caller's problem to decide what to do.
multiply by zero
If the caller fails to supply `new()` with a positive number, then Bad Things can happen. Recycling an old struct after `free()` can also run afoul of this. @MatthieuM. observes that the growth function does not work with `cap == 0`, because `0 * 2 == 0`. `max(cap * 2, 1)` will work.

Even `cap * 2 + 1` would suffice.
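As a tiny sketch of that guard (the helper name is invented):

```c
#include <assert.h>
#include <stddef.h>

/* Doubling stalls at zero, so treat an empty capacity specially. */
static size_t grown_capacity(size_t cap) {
    return cap > 0 ? cap * 2 : 1;   /* cap * 2 + 1 would also break the stall */
}
```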
- Zachiah (Dec 13, 2023): Thanks for the advice. Zeroing out the fields makes sense. In Rust I would have just taken ownership of the object if I needed to make a cleanup function, and that would solve this, so I'm not used to thinking about it haha. With the magic number thing, you are saying each struct would have a field at the beginning with the number, and then we can essentially polymorphically check the magic number after receiving a pointer to verify it is of the correct type? That's very interesting... As for the malloc stuff, yeah, I completely forgot about that...
- Peter Cordes (Dec 14, 2023): `malloc`/copy/`free` isn't just reimplementing `realloc`, it's forcing the worst-case behaviour. The actual `realloc` function isn't required to suck the way C++ `std::vector` is (unless libraries + compilers are very clever in working around the C++ allocator API), especially for large reallocations.
- Matthieu M. (Dec 14, 2023): I don't feel like adding an answer for just one tiny remark, so I would appreciate it if you were to include it in your answer: the growth function does not work with `cap == 0`, because `0 * 2 == 0`. `max(cap * 2, 1)` will work.
Use `realloc` - it can be much more efficient than malloc+copy+free

Unlike C++, C's allocator API doesn't totally suck: it has `realloc`.
(Unfortunately there's no portable `aligned_realloc`, but `malloc`/`realloc` give you memory that's sufficiently aligned for any standard type up to `max_align_t`. MSVC has `_aligned_realloc` and `_aligned_calloc`, but I'm not aware of a POSIX or GNU equivalent.)
If there's free memory after the end of the existing allocation, `realloc` can just grow the existing allocation without having to copy anything. Or if the C library knows how to use a system call like Linux `mremap(MREMAP_MAYMOVE)` (see the man page), it can move it to a new virtual address without actually copying the physical pages. (This probably only works if the allocation started at the beginning of a page, but that's what glibc `malloc` does for large allocations. Actually it keeps the first 16 bytes of the page for bookkeeping data, so unfortunately you get a pointer that's aligned by 16 but misaligned by 32 and wider. But there isn't unrelated data that `mremap` would unmap from where it's supposed to be.) Glibc `realloc` does in fact use `mremap(MREMAP_MAYMOVE)`; I checked with `strace ./a.out` after bumping up the loop count for your test `main` to 2 billion. Hopefully other `malloc` implementations are similarly smart on the platforms they target.
Besides the copying time, the other downside of alloc+copy is that you need both copies allocated at once. For a 5 GiB array of `int`, you'd temporarily have 10 GiB of "dirty" anonymous pages, forcing the OS to find a lot of extra pages -- at best just dropping some potentially-useful disk cache, at worst thrashing swap space as it pages out pages from other processes.
Calling `realloc` is at worst like manually calling `malloc`/`memcpy`/`free`, but can be much better, especially for large allocations. Don't fall into the same trap that C++ `std::vector` is stuck in, of having to always alloc+copy+dealloc when growing.

As a GNU extension, there's a `reallocarray` which checks for multiply overflow like `calloc` does (but it doesn't zero the newly-allocated space when growing). (Available in glibc 2.26, OpenBSD 5.6, FreeBSD 11.0.) Portable C should do that check manually.
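A portable sketch of that manual check, mimicking `reallocarray`'s contract; the wrapper name is invented, and like the real extension it does not zero the new space:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Portable stand-in for GNU/BSD reallocarray: refuse requests whose
   byte count would wrap around, then fall through to realloc. */
static void *checked_reallocarray(void *ptr, size_t nmemb, size_t size) {
    if (size != 0 && nmemb > SIZE_MAX / size)
        return NULL;   /* nmemb * size would overflow */
    return realloc(ptr, nmemb * size);
}
```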
Some implementations, including GCC, limit the max object size to `PTRDIFF_MAX` bytes. If you're going to check manually, that's probably a good limit, unless you really care about bending the implementation limits and having arrays of 2 GiB or larger in 32-bit programs, for example, right up to `SIZE_MAX`. (Glibc `malloc` on 32-bit x86 Linux will allocate arrays larger than 2 GiB if you ask it to, for example. Under a 64-bit kernel you have the full 4 GiB of address-space. Subtracting `int *` pointers first subtracts the raw pointers and then does an arithmetic right shift to scale by `sizeof(int)` (Godbolt), so you could get a negative `ptrdiff_t` instead of e.g. 1 billion, but probably nothing breaks if you don't run any code that relies on that.)

@chux points out that `ptrdiff_t` could be wider than `size_t` in an implementation that wants to allow objects up to `SIZE_MAX` bytes. (Such an implementation would then have to effectively widen pointers before subtracting them and dividing or right-shifting, perhaps with the help of the carry flag on ISAs that have one.)

So perhaps `#define DYNAMIC_ARRAY_MAXBYTES (SIZE_MAX/2)` as a conservative lower bound on implementation-defined limits. I think if you're going to use `SIZE_MAX/2`, there's no point involving `PTRDIFF_MAX`, since I doubt there are any systems where `ptrdiff_t` is narrower than `size_t` but the max object size is still `PTRDIFF_MAX`. If you want to be paranoid about that, `#define DYNAMIC_ARRAY_MAXBYTES ((PTRDIFF_MAX <= SIZE_MAX/2) ? PTRDIFF_MAX : SIZE_MAX/2)`. A C implementation could arbitrarily set a smaller limit on max object sizes if it wanted, but there's no standard macro to expose that.
With overflow and allocation-failure checking
```c
#include <stdint.h>
#include <stdbool.h>
#include <stdlib.h>

bool dynamic_array_push(DynamicArray *arr, int value)
{
    size_t cap = arr->capacity;
    if (cap == arr->length) {
        cap *= 2;
        // can't overflow if size_t is at least as wide as ptrdiff_t, because it's unsigned.
        // But that's not technically guaranteed, so maybe check
        // cap < PTRDIFF_MAX / sizeof(int) / 2 before doubling.
        // (sizeof(int) == 1 is possible on some DSPs.)
        if (cap > PTRDIFF_MAX / sizeof(int)) {
            cap = PTRDIFF_MAX / sizeof(int);  // grow to the max object size for typical C implementations
            if (cap <= arr->length)
                return false;
        }
        int *new_data = realloc(arr->data, cap * sizeof(int));
        if (!new_data) {
            return false;  // original data is still there, but we didn't grow
        }
        arr->data = new_data;
        arr->capacity = cap;
    }
    arr->data[arr->length] = value;
    arr->length++;
    return true;
}
```

I used the same constant twice (`PTRDIFF_MAX / sizeof(int)`) instead of checking against `PTRDIFF_MAX / sizeof(int) / 2` before doubling the capacity. This hopefully helps compilers put only one 64-bit constant into a register, like x86-64 GCC does (Godbolt).
In your existing API which returns `void`, you could call `abort()` or something on failure, killing the whole process. That could be appropriate for a simplistic program, but it's perhaps better to provide that behaviour via a wrapper like `dynamic_array_push_or_abort`, so the documentation of the failure-case behaviour is right there in the code using it, easy to search for, and easy to start replacing on a case-by-case basis in a codebase that decides it sometimes wants to do something else.
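A sketch of what such a wrapper could look like; the fallible `dynamic_array_push` below is a simplified stand-in for the full overflow-checked version, only so the fragment is self-contained:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    size_t capacity;
    size_t length;
    int *data;
} DynamicArray;

/* Minimal fallible push, standing in for the realloc-based version above. */
static bool dynamic_array_push(DynamicArray *arr, int value) {
    if (arr->length == arr->capacity) {
        size_t new_cap = arr->capacity ? arr->capacity * 2 : 1;
        int *new_data = realloc(arr->data, new_cap * sizeof(int));
        if (new_data == NULL)
            return false;
        arr->data = new_data;
        arr->capacity = new_cap;
    }
    arr->data[arr->length++] = value;
    return true;
}

/* Callers that are happy to die on OOM opt in explicitly at the call site. */
static void dynamic_array_push_or_abort(DynamicArray *arr, int value) {
    if (!dynamic_array_push(arr, value))
        abort();
}
```

Grepping for `_or_abort` later tells you exactly which call sites still have the tear-down-the-process policy.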
A microbenchmark: over 2x faster for huge arrays
For growing the array from size 3 up to 2 billion `int`s (8 GiB) and then reading them back, using `realloc` is more than twice as fast on my system, with 70x fewer page faults and 9x fewer TLB misses. (All the page faults are "soft", not paging to disk. TLB misses only counts second-level TLB misses that cause page-walks.) A good fraction of the extra time is spent in the kernel, zeroing new pages.

I timed this vs. the old version on my i7-6700k Skylake with dual-channel DDR4-2666 RAM, running Arch Linux, kernel 6.5, compiled with GCC 13.2.1 `-O3 -fno-plt` (no `-march` options, although it doesn't use any new instructions even if compiled with `-march=native`; the branching on size for every push defeats auto-vectorization of the fill pattern). My `energy_performance_preference` is `balance_performance`, so the CPU only runs at 3.9 GHz. Transparent hugepages are enabled (the default); `defrag` is `defer+madvise` (and we're not making `madvise` calls, although my system has 32 GiB RAM, much of it free, so hopefully it is making hugepages).
```
$ taskset -c 1 perf stat -e task-clock,context-switches,cpu-migrations,page-faults,cycles,instructions,uops_issued.any,dtlb_load_misses.miss_causes_a_walk,dtlb_store_misses.miss_causes_a_walk ./dynarray-slow

 Performance counter stats for './dynarray-slow':

          5,616.15 msec task-clock                       #  0.999 CPUs utilized
                17      context-switches                 #  3.027 /sec
                 0      cpu-migrations                   #  0.000 /sec
           779,398      page-faults                      #  138.778 K/sec
    21,628,006,730      cycles                           #  3.851 GHz
    31,442,121,299      instructions                     #  1.45 insn per cycle
    29,540,890,921      uops_issued.any                  #  5.260 G/sec
         1,710,792      dtlb_load_misses.miss_causes_a_walk   #  304.620 K/sec
        16,175,526      dtlb_store_misses.miss_causes_a_walk  #  2.880 M/sec

       5.624499953 seconds time elapsed
       3.514993000 seconds user
       2.093666000 seconds sys

$ taskset -c 1 perf stat -e task-clock,context-switches,cpu-migrations,page-faults,cycles,instructions,uops_issued.any,dtlb_load_misses.miss_causes_a_walk,dtlb_store_misses.miss_causes_a_walk ./dynarray-fast

 Performance counter stats for './dynarray-fast':

          2,459.69 msec task-clock                       #  1.000 CPUs utilized
                 8      context-switches                 #  3.252 /sec
                 0      cpu-migrations                   #  0.000 /sec
            10,601      page-faults                      #  4.310 K/sec
     9,517,533,410      cycles                           #  3.869 GHz
    27,308,580,581      instructions                     #  2.87 insn per cycle
    22,551,679,420      uops_issued.any                  #  9.169 G/sec
         1,831,019      dtlb_load_misses.miss_causes_a_walk   #  744.412 K/sec
           116,378      dtlb_store_misses.miss_causes_a_walk  #  47.314 K/sec

       2.459971145 seconds time elapsed
       1.574258000 seconds user
       0.881689000 seconds sys
```

The test `main` is the same for both. The starting capacity of 3 means we're about half way between growth points when we exit. (I was just randomly playing with different values for that; this isn't a recommendation, just what I happened to test with.)
```c
int main(void)
{
    DynamicArray arr = dynamic_array_new(3);

    // Putting in some test values
    for (int i = 0; i < 2000000000; i++) {
        dynamic_array_push(&arr, i);
    }

    // Testing that the data came in correctly
    for (int i = 0; i < arr.length; i++) {
        assert(i == arr.data[i]);
    }

    dynamic_array_cleanup(&arr);
    return 0;
}
```

The system calls it makes are:
```
... slow version
brk(NULL)                               = 0x560b599ad000
brk(0x560b599ce000)                     = 0x560b599ce000
brk(0x560b599fe000)                     = 0x560b599fe000
mmap(NULL, 200704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3717dd9000
brk(0x560b599ce000)                     = 0x560b599ce000
mmap(NULL, 397312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3717b87000
munmap(0x7f3717dd9000, 200704)          = 0
mmap(NULL, 790528, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3717ac6000
munmap(0x7f3717b87000, 397312)          = 0
...

... fast version
brk(NULL)                               = 0x5567b45c9000
brk(0x5567b45ea000)                     = 0x5567b45ea000
mmap(NULL, 200704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbc9f7d4000   # switch to mmap, will have to copy this time
mremap(0x7fbc9f7d4000, 200704, 397312, MREMAP_MAYMOVE) = 0x7fbc9f582000
mremap(0x7fbc9f582000, 397312, 790528, MREMAP_MAYMOVE) = 0x7fbc9f4c1000
mremap(0x7fbc9f4c1000, 790528, 1576960, MREMAP_MAYMOVE) = 0x7fbc9f340000
mremap(0x7fbc9f340000, 1576960, 3149824, MREMAP_MAYMOVE) = 0x7fbc9f03f000
...
```

Glibc's default `malloc` starts with small allocations from the break (`brk`). This presumably just extends on `realloc`, vs. alloc+copy leading to some heap-shifting (as @Davislor's answer describes), leaving some allocated + dirty space on the free list, not given back to the OS (but which other small allocations could use). That's why we see two `brk` allocations (after the initial `brk(NULL)` to find the break address) from the slow version vs. only one from the `realloc` version before switching over to `mmap` (and copying that one time, at size 1.5 MiB I suppose).
Interestingly, `mremap` does keep moving the virtual address, forcing TLB invalidation for the existing part of the array. With our workload, which is append-only until the end, those invalidations don't cause extra TLB misses except in the final read-through after the last growth (when the array is so big that its start would probably already have been evicted from the TLBs anyway). And it's a single-threaded process, so there's no need for TLB shootdowns (inter-processor interrupts) to other cores. The downside could be bigger in other cases.

It might be interesting to check `/proc/<PID>/maps` to see if there was free virtual address-space after the mapping. If not, we're seeing the heap-shifting effect that @Davislor's answer talked about, except in reverse. If there is space, it's a missed optimization not to use it.

Reducing the growth factor to 1.5 for sizes above 512 MiB or something could make sense, especially on platforms like GNU/Linux that are known to have an efficient `realloc` which can use page-table tricks. Or depending on your use-case, even make growth linear instead of exponential above some threshold like 1 GiB or 1 billion `int`s. The amount of work to update N page-table entries is still O(N), but the constant factor is tiny. But more branching for decision-making on growth will bloat the code at each use of `push`, if it inlines like you want it to for efficiency in the non-growing case.
Partial inlining is possible:
```c
static inline  // in the .h
bool dynamic_array_push(DynamicArray *arr, int value) {
    if (arr->capacity == arr->length) {
        return dynamic_array_push___growth_needed(arr, value);  // defined in a separate .c file
    }
    arr->data[arr->length] = value;
    arr->length++;
    return true;
}
```

This keeps machine-code size small at each call-site. It makes every caller a non-leaf function, but that was already true because the growth path called `realloc`.

We could have just split the growth code into a separate function, not duplicating the `arr->data[arr->length] = value;` / `arr->length++;` there, and not passing `value` as an arg. But it's common to push values that are temporary at the call-site and not needed after the push. Passing it as an arg lets the caller avoid needing a call-preserved register for it, instead just computing it in a call-clobbered register since it's dead after the function call. (I haven't actually looked at an example caller in a loop to see what difference it makes. This makes the inlined store / increment code conditional on the inlined compare rather than on the return value of the `__growth_needed` slowpath, which I expect is probably about equal. A simplistic caller that just pushes random numbers or loops over another array might not be very representative anyway.)
- chux (Dec 14, 2023): `cap > PTRDIFF_MAX / sizeof(int)` is the wrong test. Consider that `ptrdiff_t` may have 2x the number of bits as `size_t`. Use `cap > SIZE_MAX / sizeof(int)`; `ptrdiff_t` is not needed in this task.
- Peter Cordes (Dec 15, 2023): @chux-ReinstateMonica: In theory `ptrdiff_t` could be wider, but it seems unrealistic unless some implementation defines the behaviour of pointer subtraction for pointers to different objects, but limits the size of any single object to much smaller than the address-space (or something with non-flat memory models). Perhaps `min(PTRDIFF_MAX, SIZE_MAX)`, since there are implementations (such as GCC) where the implementation-defined max object size is `PTRDIFF_MAX` bytes. (This means the compiler can assume pointer subtraction never has signed overflow, including before the `sar` for sizeof > 1.)
- chux (Dec 16, 2023): A wider `ptrdiff_t` than `size_t` ensures pointer subtraction never has signed overflow, even with objects near size `SIZE_MAX`.
A few points (without repeating that `realloc()` would fulfill the need better than a DIY version, or other points in current answers):

1. `int *new_data = malloc(arr->capacity * 2 * sizeof *new_data);` would be appropriate if/when the array is to become an array of `long` or `double`. Less likely to overlook changing the `sizeof(int)`, thereby creating a hard-to-find bug.
2. `..._push()` needs to return success or failure, and that return code needs to be checked by the caller.
3. This code mixes passing an entire `struct` (from `dynamic_array_new()`) with passing a pointer to the `struct`. I'd prefer to see code following one practice or the other; not mixing both.
4. (nit-picking) Better to use `size_t` (instead of `int`) when coding an array index (as in the loops of `main()`).
5. Read the part of the `realloc()` documentation that says something like "acts just like `malloc()` if the original pointer is NULL." In other words, there's no need for an initial allocation in `..._new()`. Just let the first `..._push()` with a NULL pointer perform the initial allocation. Maybe the caller will only create 1 element in the array. You probably already sense that `2` is an arbitrary initial allocation.
PS: Revising to use #5, above, would obviate #3. When you want/need an array, simply:

`array_t arrN = { 0 };`

would do the job. All struct members are set to zero (or NULL), and the first `..._push()` handles initialising the length (grows by 1) and the capacity (set to 1, then doubled on each successive call), and the `data` pointer is initialised then used. Simple...
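A sketch of that combination -- zero-initialised struct, no `..._new()`, and the first `..._push()` allocating via `realloc(NULL, ...)`; the function name and the int-as-status convention here are mine:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    size_t capacity;
    size_t length;
    int *data;
} array_t;

/* realloc(NULL, n) behaves like malloc(n), so a push on a
   zero-initialised struct performs the initial allocation itself.
   Returns 1 on success, 0 on allocation failure. */
static int array_push(array_t *arr, int value) {
    if (arr->length == arr->capacity) {
        size_t new_cap = arr->capacity ? arr->capacity * 2 : 1;
        int *p = realloc(arr->data, new_cap * sizeof *p);
        if (p == NULL)
            return 0;
        arr->data = p;
        arr->capacity = new_cap;
    }
    arr->data[arr->length++] = value;
    return 1;
}
```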
- chux (Dec 14, 2023): Using `void *new_data = ...` instead of `int *new_data = ...` is less maintenance, as it supports the "less likely to overlook changing the `sizeof(int)`..." concern.
- user272752 (Dec 14, 2023): @chux-ReinstateMonica Yes, that's an alternative. I've tended to prefer the datatype "over on the left" and `sizeof var_name` as being easier to see. Is there an advantage (that I'm not seeing) in the version you suggest? Cheers! :-)
- chux (Dec 14, 2023): This answer advocates "less likely to overlook changing the `sizeof(int)`, thereby creating a hard-to-find bug." By using `void *` on the left, we have one less piece of code to update should the type of member `.data` change, by similar reasoning.
- user272752 (Dec 14, 2023): @chux-ReinstateMonica Thanks :-) I was still in `malloc()` mindset, not `realloc()`, when writing that comment... Thank you :-) ...
- chux (Dec 15, 2023): I should have suggested `void *new_data = realloc(arr->data, sizeof arr->data[0] * arr->capacity * 2);` for a more complete idea.
Check for Overflow
At present, the algorithm to enlarge the vector does not detect if the capacity wraps around. It is actually possible, on some architectures, to overflow a `size_t` before getting an out-of-memory error (typically by allocating more than 2 GiB on a 32-bit memory model). Either compare the old size to `SIZE_MAX/2`, or (since it is unsigned, and overflow is guaranteed to truncate) check whether the new size is smaller than the old.
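Both checks can be sketched as follows; the helper name is invented, and for unsigned doubling the two conditions are equivalent:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Detect wraparound when doubling a byte count. size_t is unsigned,
   so overflow truncates, and the doubled value comparing smaller
   than the original means it wrapped. */
static bool doubling_overflows(size_t old_size) {
    size_t new_size = old_size * 2;
    return new_size < old_size;   /* equivalent to: old_size > SIZE_MAX / 2 */
}
```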
Beware of Heap-Shifting
Let's say you're working with big data and read the first megabyte into the vector. So far, so good; you've got plenty of memory in the heap. Then you run out of room and need to resize. The algorithm checks the heap and finds one free block, past the end of the first one you allocated. It allocates 2 MB of that, copies the entire array to its new location, and frees the original 1 MB.
You have more than 2 MB of data, so you need to resize again. The algorithm checks and finds one free block 1 MB in size, then the allocated block of 2 MB, then a big free block. It needs 4 MB, which is more than 1 MB, so it allocates 4 MB to the right of the current block, copies the entire array to that, then frees the 2 MB block. You now have a 3 MB free block followed by a 4 MB allocated block. You need to resize again. You now need 8 MB, which is too big for your first free block, so you move the array over to the right again and free the previous block to get a 7 MB one. This will keep happening each time, and you will never re-use any of the memory you freed. And copying a large array is expensive!
What happens if you increase the size by only 50% each time, instead of 100%? After five iterations of this, you will have a free block 8.125 times the size of your original, and you will only need one 7.59 times as large, so you will be able to go back to the beginning of the heap, not keep expanding it further and further to the right.
Of course, the real solution here is to userealloc(), which can detect if the current block can be enlarged without moving it. But that is a problem to be aware of, especially when using C++-style memory management without this feature.
- Zachiah (Dec 14, 2023): The way you explained why `realloc` is better made a lot of sense. It seems great to reuse the existing memory without moving if it can be done. Thanks!
- user272752 (Dec 14, 2023): 3 comments: 1) Good explanation. 2) Need to start with allocating `2` to gain traction for 50% increases. 3) "Grow without shifting" may only apply to 'pure' cases of accessing the heap. Intervening code (like an innocuous `printf()`) may lead to having to relocate the block anyway.
- Peter Cordes (Dec 15, 2023): Unless you keep large chunks in the free-list (instead of returning them to the OS), that virtual address-space "at the start of the heap" will be unmapped for a while before you use it again. It's likely that different physical pages will be allocated to map those virtual pages when you map them again, so the reuse doesn't help with L3 cache hits. (And it doesn't help with TLB hits, since those TLB entries had to get invalidated on unmap.)
- Peter Cordes (Dec 15, 2023): If you're worried about not being able to grow to more than half of virtual address-space with one such array that's the only thing you're allocating, then yeah, a good `realloc`, e.g. one that can use `mremap(MAYMOVE)` or equivalent, is important, like I showed in my answer, avoiding double-allocation and the actual memory traffic for copying. Or, like you said, by growing an existing allocation if there's nothing after it. (Or if there's enough space total before+after an allocation, allocate + `memmove`.) But smaller growth factors don't seem great if you typically still copy every time.
Clarifying `size_t` overflow concerns, etc.

Test before scaling, watch out for a 0 capacity, use `realloc()`, size to the referenced object rather than the type, and check for allocation success:
```c
// Prevent overflow
if (arr->capacity > SIZE_MAX / 2 / sizeof arr->data[0]) {
    // Handle overflow with TBD code
}
size_t new_capacity = (arr->capacity > 0) ? arr->capacity * 2 : 1;
void *new_data = realloc(arr->data, new_capacity * sizeof arr->data[0]);
if (new_data == NULL) {
    // Handle out-of-memory with TBD code
}
arr->data = new_data;
arr->capacity = new_capacity;
```