In this article, I’ll use the POSIX Threads Library to introduce threads and their implementation in the Linux environment. I’ll also try to provide a good mix of application and introductory theory along the way. Chapter 4 of ALP will serve as the primary reference. This article is written more for me than anyone else, so it’s a bit terse in areas that I’m already familiar with.
Threads are somewhat like processes in that they allow a program to do more than one thing concurrently, but at a finer-grained level of execution. Conceptually, threads exist within a process, and unlike separate processes, the threads of a process share the same memory space and resources as the thread that created them.
Threads are not part of the standard C library. Instead, Linux implements the POSIX Threads Library (pthreads). As a result, you must include the header file pthread.h and link your program with libpthread (for example, by passing -pthread or -lpthread to the compiler). Of course, POSIX threads are not the only thread implementation.
Just as each process has a process ID, each thread is identified by a thread ID of type pthread_t. Threads are created using the pthread_create function, which places restrictions on the thread function: it must take a single void * argument and return a void *. Threads can be joinable or detached. A joinable thread (the default) hangs around until pthread_join is called to obtain its return value; a detached thread automatically cleans itself up after it’s finished. An example of creating a thread might be:
pthread_t thread_id;
pthread_create (&thread_id, NULL,
&my_function, NULL);
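To make the fragment above a little more concrete, here is a minimal sketch of creating a joinable thread and collecting its result with pthread_join. The names square_job, square, and square_in_thread are hypothetical, made up for this illustration; only the pthread_* calls come from the library.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical argument block shared between caller and thread. */
struct square_job {
    int input;
    int output;
};

/* Thread function: must take a void * and return a void *. */
static void *square(void *arg)
{
    struct square_job *job = arg;

    job->output = job->input * job->input;
    return job;   /* handed back to whoever calls pthread_join */
}

/* Runs square() in a new joinable thread and waits for it to finish.
   Returns 0 on success, or a pthread error number on failure. */
int square_in_thread(int n, int *result)
{
    struct square_job job = { n, 0 };
    pthread_t thread_id;
    void *retval = NULL;
    int rc;

    rc = pthread_create(&thread_id, NULL, &square, &job);
    if (rc != 0)
        return rc;

    /* Blocks until the thread finishes, then stores its return value. */
    rc = pthread_join(thread_id, &retval);
    if (rc != 0)
        return rc;

    *result = ((struct square_job *) retval)->output;
    return 0;
}
```

Passing a pointer to a struct is one common way around the single void * parameter; the struct has to stay alive until the thread is done with it, which is why the join happens before job goes out of scope.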
It is also possible for a thread to request that another thread be terminated. This is known as thread cancellation and is done using pthread_cancel. The effect of a cancellation request depends on the target thread’s cancelability: whether it is asynchronously cancelable, synchronously cancelable (deferred, the default), or uncancelable. A synchronously cancelable thread acts on a pending request only when it reaches a cancellation point; one can be created explicitly by calling pthread_testcancel. For critical sections, cancellation can be disabled entirely using pthread_setcancelstate.
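As a rough sketch of how these pieces fit together, the worker below disables cancellation around a small critical section and then offers an explicit cancellation point. The names spin_worker and cancel_worker_demo are hypothetical, and the "balance" update merely stands in for real work that must not be interrupted halfway.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical worker: loops forever until it is cancelled. */
static void *spin_worker(void *arg)
{
    long *balance = arg;
    int old_state, ignored;

    for (;;) {
        /* Critical section: a cancel request stays pending in here. */
        pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old_state);
        *balance += 100;
        *balance -= 100;
        pthread_setcancelstate(old_state, &ignored);

        /* Explicit cancellation point: a pending request is acted on here. */
        pthread_testcancel();
    }
    return NULL;   /* not reached */
}

/* Starts the worker, cancels it, and joins it.
   Returns 1 if the thread was cancelled with the balance left intact,
   0 if not, and -1 if a pthread call failed. */
int cancel_worker_demo(void)
{
    pthread_t worker;
    void *retval = NULL;
    long balance = 0;

    if (pthread_create(&worker, NULL, &spin_worker, &balance) != 0)
        return -1;
    if (pthread_cancel(worker) != 0)
        return -1;
    if (pthread_join(worker, &retval) != 0)
        return -1;

    /* A cancelled thread reports PTHREAD_CANCELED as its return value. */
    return retval == PTHREAD_CANCELED && balance == 0;
}
```

Because the worker can only be cancelled at pthread_testcancel, it can never be stopped between the two balance updates.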
Normally, threads share the same variables and memory space. But thread-specific data can be used to give each thread its own copy of a variable if necessary. These thread-specific data items cannot be accessed through ordinary variable references. Instead, create a pthread_key_t with pthread_key_create and use it in conjunction with pthread_setspecific and pthread_getspecific.
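Here is a small sketch of thread-specific data in use, assuming two threads that each stash a private integer under the same key. The names slot_key, make_key, scale_own_value, and tsd_demo are hypothetical; the destructor passed to pthread_key_create (free) runs on each thread’s value when that thread exits.

```c
#include <pthread.h>
#include <stdlib.h>

static pthread_key_t slot_key;
static pthread_once_t slot_once = PTHREAD_ONCE_INIT;

/* Creates the key once; free() is run on each thread's value at exit.
   (The return value is ignored to keep the sketch short.) */
static void make_key(void)
{
    pthread_key_create(&slot_key, free);
}

/* Hypothetical thread function: stores a private copy of *arg under
   the key, reads it back, and writes ten times that value to *arg. */
static void *scale_own_value(void *arg)
{
    int *shared = arg;
    int *slot = malloc(sizeof *slot);
    int *mine;

    if (slot == NULL)
        return NULL;
    *slot = *shared;
    if (pthread_setspecific(slot_key, slot) != 0) {
        free(slot);
        return NULL;
    }

    /* Same key in every thread, but each thread gets its own pointer. */
    mine = pthread_getspecific(slot_key);
    *shared = *mine * 10;
    return NULL;
}

/* Runs two threads, one per element of values. Returns 0 on success. */
int tsd_demo(int values[2])
{
    pthread_t threads[2];
    int i, started = 0, rc = 0;

    if (pthread_once(&slot_once, make_key) != 0)
        return -1;

    for (i = 0; i < 2; i++) {
        if (pthread_create(&threads[i], NULL, &scale_own_value,
                           &values[i]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    return rc;
}
```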
Sometimes it’s also useful to have cleanup functions. A cleanup handler is called automatically when a thread exits via pthread_exit or is cancelled. Handlers are registered and removed using a stack-like discipline: pthread_cleanup_push and pthread_cleanup_pop, which must appear as a matched pair in the same block. For instance, you may want to write a cleanup handler which frees memory allocated by malloc. One way to do it might be:
pthread_cleanup_push (free, temp_buffer);
/* code which might cancel thread */
pthread_cleanup_pop (1);
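For context, the fragment above might sit inside a thread function like the following sketch. The names release_buffer, use_temp_buffer, and cleanup_demo are hypothetical, and the buffers_released counter exists only so the effect of the handler can be observed.

```c
#include <pthread.h>
#include <stdlib.h>

/* Counts handler invocations; read only after the thread is joined. */
static int buffers_released;

/* Cleanup handler: frees the buffer it was registered with. */
static void release_buffer(void *buffer)
{
    free(buffer);
    buffers_released++;
}

/* Hypothetical thread function using a temporary heap buffer. */
static void *use_temp_buffer(void *arg)
{
    char *temp_buffer = malloc(1024);

    (void) arg;
    if (temp_buffer == NULL)
        return NULL;

    /* From here on, cancellation or pthread_exit frees the buffer. */
    pthread_cleanup_push(release_buffer, temp_buffer);

    temp_buffer[0] = '\0';    /* stand-in for real work */
    pthread_testcancel();     /* code which might cancel the thread */

    /* A nonzero argument pops the handler and runs it immediately. */
    pthread_cleanup_pop(1);
    return NULL;
}

/* Runs the thread once; returns the number of handler calls so far,
   or -1 if a pthread call failed. */
int cleanup_demo(void)
{
    pthread_t worker;

    if (pthread_create(&worker, NULL, &use_temp_buffer, NULL) != 0)
        return -1;
    if (pthread_join(worker, NULL) != 0)
        return -1;
    return buffers_released;
}
```

Note that the function returns only after the pop; leaving the block between the push and the pop with a plain return is not allowed.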
As you can probably guess, programming with threads is tricky. Threads introduce two major sources of bugs in our code: race conditions and deadlocks. To eliminate race conditions, we use mutual exclusion locks, or mutexes. Only one thread can hold a given mutex at a time; any other thread that tries to lock it blocks until it is unlocked. This support is generally provided by the operating system; in pthreads, a mutex is a pthread_mutex_t, locked and unlocked using pthread_mutex_lock and pthread_mutex_unlock. Deadlocks require a bit more explanation.
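As an illustration, here is a minimal sketch of a mutex guarding a shared counter. The names counter, counter_lock, add_many, and run_counter_threads are hypothetical, as are the thread and iteration counts.

```c
#include <pthread.h>

#define NUM_THREADS 4
#define INCREMENTS  100000

static long counter;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each thread increments the shared counter INCREMENTS times. */
static void *add_many(void *arg)
{
    int i;

    (void) arg;
    for (i = 0; i < INCREMENTS; i++) {
        /* counter++ is a read-modify-write; the lock makes it safe. */
        pthread_mutex_lock(&counter_lock);
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs NUM_THREADS incrementing threads and returns the final count,
   or -1 if a thread could not be created. */
long run_counter_threads(void)
{
    pthread_t threads[NUM_THREADS];
    int i, started = 0;

    counter = 0;
    for (i = 0; i < NUM_THREADS; i++) {
        if (pthread_create(&threads[i], NULL, &add_many, NULL) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    return started == NUM_THREADS ? counter : -1;
}
```

Without the lock and unlock calls, increments from different threads can overwrite each other and the final count would typically come up short.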
A deadlock occurs when threads are blocked waiting on a state that will never occur. For example, if thread A holds one mutex and waits for a mutex held by thread B, while thread B waits for the mutex held by thread A, neither can proceed. The simplest case is a single thread that tries to lock a mutex it already holds. Here, the type of mutex determines what happens: a fast mutex (the default) will deadlock, a recursive mutex will allow the same thread to lock it multiple times (and requires a matching number of unlocks), and an error-checking mutex will detect the deadlock and return EDEADLK.
While mutexes let us protect individual shared variables, we can also coordinate whole threads through the use of semaphores and condition variables. A semaphore is essentially a counter with atomic wait (decrement, blocking at zero) and post (increment) operations. A condition variable lets a thread block until some condition, such as a flag being set, becomes true; it is always used together with a mutex so that testing and changing the condition cannot race. Condition variables are declared using the pthread_cond_t type.
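A minimal sketch of the flag pattern with a condition variable follows. The names flag, flag_lock, flag_set, wait_for_flag, and signal_flag_demo are hypothetical; the important parts are that the flag is only touched with the mutex held and that the wait sits in a loop.

```c
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t flag_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t flag_set = PTHREAD_COND_INITIALIZER;
static int flag;

/* Waiter thread: blocks until flag becomes nonzero. */
static void *wait_for_flag(void *arg)
{
    int *observed = arg;

    pthread_mutex_lock(&flag_lock);
    /* Loop: pthread_cond_wait may wake up spuriously, so re-test.
       The wait releases the mutex while blocked and re-acquires it
       before returning. */
    while (!flag)
        pthread_cond_wait(&flag_set, &flag_lock);
    *observed = flag;
    pthread_mutex_unlock(&flag_lock);
    return NULL;
}

/* Starts a waiter, sets the flag, and signals the condition.
   Returns the flag value the waiter saw, or -1 on failure. */
int signal_flag_demo(void)
{
    pthread_t waiter;
    int observed = 0;

    if (pthread_create(&waiter, NULL, &wait_for_flag, &observed) != 0)
        return -1;

    /* Change the condition with the mutex held, then wake the waiter. */
    pthread_mutex_lock(&flag_lock);
    flag = 1;
    pthread_cond_signal(&flag_set);
    pthread_mutex_unlock(&flag_lock);

    if (pthread_join(waiter, NULL) != 0)
        return -1;
    return observed;
}
```

This works whether the waiter starts before or after the flag is set: if the flag is already set, the while loop is skipped and the thread never blocks.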
The question of whether to use threads is still a hotly debated topic. In general, programs benefit most from threads when memory must be shared and tasks can be clearly parallelized.