Arena Allocators

I never did much with C in the past. I had a brief introduction for a few weeks during a course discussing programming paradigms, but I never gave the language a real look. I figured a decent way to get accustomed to C syntax and semantics and learn something new along the way would be to implement one of the simpler memory allocation techniques, the arena allocator.

Arena allocators are different compared to your general purpose memory allocator which might use a linked list. Arenas do not suffer from internal memory fragmentation from deallocation and they can make allocations faster. How does it achieve that? Arena's are also called linear or bump allocators. Using an offset into a chunk of memory, you retrieve the next block of arbitrary size by bumping the offset forward. Since only an offset is tracked, the downside is that you cannot free individual allocations. The whole arena must be freed at once. Another potential issue is that the order of allocations can be can lead to wasted space. We'll discuss why later when we talk about memory alignment.

Implementation

To start, we'll need to define an arena struct to hold the required data.

typedef struct arena {
  void *memory;
  size_t size;
  size_t offset;
} arena_t;

The memory field will contain a pointer to the beginning of the block of memory, size is the total size of the arena in bytes, and offset tracks how many bytes is already in use and where the next allocation can be made.

Initializing the Arena

We'll define an arena_init function that accepts a size in bytes to be used to reserve memory.

arena_t *arena_init(size_t size) {
  size_t total_size = sizeof(arena_t) + size;

  void *region = mmap(NULL, total_size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

  if (region == MAP_FAILED) {
    return NULL;
  }

  arena_t *arena = (arena_t *)region;
  arena->memory = (char *)region + sizeof(arena_t);
  arena->size = size;
  arena->offset = 0;

  return arena;
}

Since we'll also be placing the arena struct in the memory, we'll need to include the arena struct size in our calculations for the total required bytes. The mmap function is then used to request a mapping for the total byte size in virtual address space.

void *mmap(void addr[.length], size_t length, int prot, int flags, int fd, off_t offset);

The mmap system call is used to create mappings in virtual address space using files or devices. The prot argument is used for memory protection, so PROT_READ and PROT_WRITE allows reading and writing to the memory. The flags argument is used to modify visibility of changes made to the memory from other processes. Using MAP_PRIVATE makes updates private to the process, and MAP_ANONYMOUS is used when the mapping is not backed by a file. The rest of the fields are not applicable with MAP_ANONYMOUS.

The call to mmap can either return a pointer to the requested mapping, or MAP_FAILED (-1) if there was an error.

Continuing, the pointer returned from mmap is casted as an arena, fields initialized with arena->memory set taking into account that the arena itself is stored at the beginning of the memory. The region is casted as char * so that we can use single byte pointer arithmetic when adding sizeof(arena_t).

Freeing the Arena

Freeing the arena is very simple. Like mentioned earlier, you cannot free individual allocations made with the arena. So to free the arena allocations, you set the internal offset to 0. This way, further allocations are overwriting previous allocations. This freeing mechanism implies that all allocations have the same lifetime.

void arena_free(arena_t *arena) {
  arena->offset = 0;
}

Destroying the Arena

To destroy the arena if no further allocations need to be made, a call to munmap with the total memory size will return everything to the OS.

int munmap(void *addr, size_t len);

This system call will remove len amount of bytes, starting at addr. In our case, we want to clear the total arena size (useable allocation space AND the arena stored at the beginning) starting from arena->memory which stored the starting address.

int arena_destroy(arena_t *arena) {
  return munmap(arena->memory, sizeof(arena_t) + arena->size);
}

Making Allocations

Finally, we'll look at how allocations are made. Allocations are made by using the offset which tracks where available memory starts. We would then move the offset forward using the size of the allocation being made.

There is one problem that comes with this though, and that's alignment. Let's look at an example using 32-bit architecture.

Alignment

Say we were to allocate a char (1 byte), an integer (4 bytes). Unaligned memory might look like this:

 0 1 2 3 4 5 6 7
|       |       | (word boundaries)
|c|i|i|i|i| | | |

The issue is that systems will load the 4 byte integer using word boundaries. A 32-bit system has 4 byte words. If I was to try reading the integer, both words would need to be read, requiring more CPU processing.

It's important to note that data alignment is usually powers of 2. Modern CPUs are optimized to read data according to word boundaries. Chars are 1-byte aligned, ints are 4-byte aligned (alignment can be found using alignof(type). Now let's align the data properly according to data type alignment:

 0 1 2 3 4 5 6 7
|       |       |
|c| | | |i|i|i|i|

Notice the integer is now aligned with the next word boundary. With this layout, the integer can be read from one word, requiring less instructions. Of course, there is a new issue, and that's the 3 wasted bytes that cannot be used due to arena allocators only moving the offset forward. This can only be solved if your allocations are predictable. If you were to allocate the integer first, then the char, the memory would look like this:

 0 1 2 3 4 5 6 7
|       |       |
|i|i|i|i|c| | | |

Now it would be possible to use the remaining 3 bytes if there were additional chars to allocate.

Bringing It Together

Here's a simple alloc implementation:

void *arena_alloc(arena_t *arena, size_t size, size_t alignment) {

  if (alignment == 0 || (alignment & (alignment - 1)) != 0) {
    return NULL;
  }

  size_t aligned_offset = (arena->offset + alignment - 1) & ~(alignment - 1);

  if (aligned_offset + size > arena->size) {
    return NULL;
  }

  void *next_free_addr = arena->memory + aligned_offset;
  memset(next_free_addr, 0, size);

  arena->offset = aligned_offset + size;

  return next_free_addr;
}

First, check that the provided alignment is a power of 2 (and not 0). There's a cool trick here using bitwise operators to determine if the alignment is a power of 2. Let a be the alignment. If a & (a - 1) != 0, then its not a power of 2. For example, if a = 4, we have 4 & 3 which is 100 & 011 in binary which is 000. If a = 7, 7 & 6 = 111 & 110 = 100.

We then calculate the aligned offset with another bitwise operator trick. Let's say arena->memory = 0x00 and arena->offset = 0x06. We want to allocate an 4-byte aligned int which should be placed at 0x08 with proper alignment. The aligned offset is then calculated as 6 + 4 - 1 & ~(4 - 1) which in binary is 1001 & ~(0011). Invert the RHS bits with the tilde (~) operator then bitwise AND them and you're left with 1000, the next 4 byte alignment.

A simple check is needed to make sure that the requested allocation size does not overflow the total memory of the arena. The arena is not resized when its memory is exhausted. There are some cases where arenas can expand, but to my knowledge, arenas are used where memory management is predictable, like batch creating temporary objects in games.

The aligned offset is used to get the pointer for the allocation. The memory is zeroed in case it was previously used before a call to arena_free(). Then the arena offset is bumped for the next allocation.

En Conclusion

Arenas are not a silver bullet as they must still be used in the correct context. In environments with dynamic memory requirements, you would need to allocate a large block of memory which can lead to a lot of waste if usage is low in practice. Arenas also only support a single lifetime, so you must be aware of where the arena has been used if calling arena_free(). The arena allocator is a very useful tool in memory management when the environment is predictable. Arenas offer very fast allocations and deallocations and minimize wasted space when used correctly.

Source code