Malfeasance - Calloc and Thread Safety

Calloc and Thread Safety

During my undergrad, I took a module on High Performance Computing which was taught in C and covered the usual suspects: MPI, OpenMP, CUDA etc. For each assignment, students uploaded their source code which was then automatically tested before being inspected by hand. The content was relatively straightforward, however when the 3rd assignment was returned to us, a number of students had been docked 25% of the mark with the comment:

calloc is NOT guaranteed to be thread safe.

Firstly, I did a quick sanity check and looked up the libc reference:

void * calloc (size_t count, size_t eltsize)
Preliminary: | MT-Safe | AS-Unsafe lock | AC-Unsafe lock fd mem | See POSIX Safety Concepts.

Checking POSIX Safety Concepts revealed that 'MT-Safe' does indeed mean multithread safe. Relief! There is some sanity in the world and calloc is promised to be thread safe in any standard implementation.

However this left the question of the lecturer's demonstration code:

#include <stdlib.h>
#include <stdio.h>
#include <omp.h>

int main () {
  int* my_ptr;
  int tid, i, j; 
  int overlaps     = 0; 
  int num_threads  = 8;
  int alloc_amount = 1000000;
  unsigned int start_address [num_threads];
  unsigned int end_address   [num_threads];

  #pragma omp parallel private(tid,my_ptr) default(shared) num_threads(num_threads)
  { /* begin parallel region */
    tid                = omp_get_thread_num();
    my_ptr             = (int*) calloc(N, sizeof(int));
    start_address[tid] = (unsigned int) &my_ptr[0];
    end_address  [tid] = (unsigned int) &my_ptr[N-1];
    free(my_ptr);
  } /* end parallel region */

  unsigned int range = end_address[0] - start_address[0];

  for (i = 0; i < num_threads; i++) {
    for (j = 0; j < num_threads; j++) {
      if (j == i) continue;
      if ((start_address[j] > start_address[i]) 
      &&  (start_address[j] - start_address[i] < range)) {
        overlaps = 1;
      }
    }
  }

  if (overlaps == 1) printf("FAIL\n");
}

Which also came with the attached comment:

On the CS desktops, this shows that arrays created by calloc running on different threads sometimes overlap by significant fractions of a Mb, using both icc and gcc. On my desktop iMac (gcc), this never happens, so it really is implementation dependent and not portable to assume calloc is thread safe in an OpenMP context.

Running the program on the university machines did indeed lead to a plethora of FAIL messages, however examining the source code leads to a number of issues:

Issue 1: Type Width

start_address[tid] = (unsigned int)&my_ptr[0];

Here we are making an implicit assumption that a conversion from void* to unsigned int makes sense. Consulting the documentation reveals that an unsigned int is guaranteed to be at least 2 bytes wide (but typically 4 bytes wide). Contrastingly, our pointers are going to be either 4 or 8 bytes wide. This mismatch leads the program to throw away half the memory address on 64 bit systems and explains why the lecturer saw issues on the CS Desktops, but didn't have any problems on his (presumably 32-bit) Mac.

Issue 2: Return Value Checking

my_ptr = (int*) calloc(N, sizeof(int));

When a calloc call fails, it returns NULL. This can happen when there is not enough memory to perform the allocation or in rare cases because the heap has become highly fragmented and no contiguous memory of the required size exists. In this case, the program will treat NULL as 0 and consequently

(start_address[j] - start_address[i] < range)

will always be true, leading to a false positive.

Issue 3: Race Conditions

The code is intended to produce the following sequence of events:

A: my_ptr = (int*) calloc(N, sizeof(int));
B: my_ptr = (int*) calloc(N, sizeof(int));
A: free(my_ptr);
B: free(my_ptr);

However, it is equally possible that the following pattern occurs:

A: my_ptr = (int*) calloc(N, sizeof(int));
A: free(my_ptr);
B: my_ptr = (int*) calloc(N, sizeof(int));
B: free(my_ptr);

which our example program will treat as an overlap, but in fact is perfectly acceptable as the memory was freed before being reallocated. We can fix this by introducing a #pragma omp barrier between the allocation and the free to ensure only the first pattern can occur.

Wrap up

Despite the lecturer warning their students about all of these issues, they still crept into the relatively simple demonstration code. Thankfully, after some wrangling over email, the marks were returned to the affected students.