- If a computer has multiple processors, then your program can take advantage of its ability to execute several things at once. Some modern processors even have multiple CPUs on a single chip, so there are, in principle, significant performance gains to be had.
- You may want to overlap activities, particularly when some operations involve waiting for an external event. For example, you may want to have one part of your system do disk or network I/O while another part continues computing. This case is important even in the absence of hardware multiprocessing.
- An important subcase of the above: you can often improve a user's experience by separating the interactive parts of a UI from longer-running computation. Users get frustrated quickly if, for example, the mouse pointer freezes, even when they have just requested a laborious computation. An important UI design principle is that the user should be able to see and manipulate partial results rather than wait for a long process to complete. Concurrency is a natural way to structure such programs.
There are two drawbacks to using processes for some concurrent systems:
- Creating a new process, complete with its own virtual memory and other resources, can be an expensive operation.
- Once you have separate processes running, getting them to cooperate and communicate becomes complicated. There are ways for processes to communicate, of course. The most obvious is via the file system or through pipes (which look like files to your program but have special OS support). Processes can also send signals to each other. But even with fairly efficient communication methods, the context switch from one process to another (and the copying of data from one address space to another) can be quite expensive.
A particular path of execution through a program is called a thread of control. A process, then, can consist of one program/instruction segment, a single, shared address space, and multiple threads of control. Each thread has its own virtual CPU (registers and program counter) and stack.
Threads, therefore, are cheap to create, and they can theoretically communicate with low overhead (no copying or context switching). The Linux approach to threads is unusual and interesting. We will concern ourselves here with the widely supported POSIX threads package, but the general techniques apply to any threads-based system.
Creating and Waiting for Threads
In the POSIX thread model, one creates a thread with pthread_create(), and one waits for a thread to finish with pthread_join().
int pthread_create(pthread_t *thread,
                   const pthread_attr_t *attr,
                   void *(*start_routine)(void *),
                   void *arg);
int pthread_join(pthread_t th, void **thread_return);
Think of pthread_create() as a form of function call: you would like to execute start_routine(arg), but have the function run in a new thread. The thread argument to pthread_create() is a place for the system to store a reference to the thread, its thread ID, so that you can refer to it later, for example to join with it. The attr argument is a place to set all kinds of attributes, such as the stack size; passing in NULL gives you default values for everything. The third argument is a pointer to the function the thread should invoke when it starts up. This function must take and return a void * (and you will make appropriate casts in your program). The final argument is passed to start_routine when the thread begins execution.
Note that the function the thread runs does not receive a thread identifier. If a thread wants to find out its own ID, it can call the pthread_self() function:
pthread_t pthread_self(void);
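Since pthread_t is an opaque type, thread IDs should be compared with pthread_equal() rather than ==. A small sketch (main_tid and the messages are invented for illustration):

#include <pthread.h>
#include <stdio.h>

static pthread_t main_tid;   /* hypothetical: main() would store pthread_self() here */

void *start(void *arg)
{
    (void)arg;
    /* pthread_t is opaque: compare thread IDs with pthread_equal(), not == */
    if (pthread_equal(pthread_self(), main_tid))
        printf("running in the main thread\n");
    else
        printf("running in a worker thread\n");
    return NULL;
}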
A call to pthread_join() blocks until thread th exits. It doesn't block if the thread is already done. The thread's return status is placed in the location specified by thread_return.
All the POSIX thread routines return 0 on success or an error value (from the same set as errno) on failure. They do not use errno.
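Putting these calls together, here is a minimal sketch of creating and joining a single thread (the greet routine, its argument, and the messages are invented for illustration):

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* start routine: must take and return a void * */
static void *greet(void *arg)
{
    printf("hello from the new thread: %s\n", (char *)arg);
    return arg;                              /* delivered to pthread_join() */
}

int main(void)
{
    pthread_t tid;
    void *result;
    int err;

    /* NULL attributes: default stack size, etc. */
    if ((err = pthread_create(&tid, NULL, greet, "message")) != 0) {
        fprintf(stderr, "pthread_create: %s\n", strerror(err));
        exit(1);
    }
    if ((err = pthread_join(tid, &result)) != 0) {
        fprintf(stderr, "pthread_join: %s\n", strerror(err));
        exit(1);
    }
    printf("joined; thread returned \"%s\"\n", (char *)result);
    return 0;
}

On most systems such a program is compiled and linked with the threads library, e.g. cc prog.c -lpthread.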
But creating and waiting for threads is only the start of the story.
Programming with Threads
Whereas the chief difficulty with separate processes is getting them to communicate efficiently, the major difficulty in writing thread-based programs is keeping threads from interfering with each other. The problem arises because, in principle, the machine instructions of different threads may be interleaved in any order.
Consider a simple example in which two threads are using a linked list. Each takes items off a global list (whose first element is pointed to by the global variable work_list) and processes them independently using the following code:
while (work_list != NULL) {          /* Line 1 */
    elem = work_list;                /* Line 2 */
    work_list = work_list->next;     /* Line 3 */
    elem->next = NULL;               /* Line 4 */
    process(elem);                   /* Line 5 */
}
As long as the two threads are not executing this sequence of code at
the same time, everything is fine. However, suppose things happen in
this order:
1. Thread 1 executes lines 1 and 2.
2. Thread 2 executes lines 1 and 2.
3. Thread 2 executes line 3.
4. Thread 1 executes line 3.
The result is that both threads will now process the same element (which might bring about disastrous results as they both update the same data with no coordination), and one element has been lost from the work_list (there might even be a segmentation fault as a result)!
The above code fragment is intended to be executed as a unit. In the middle of the code, the shared data is in an inconsistent state that should not be seen by other threads. Such a section of code is called a critical region.
Whenever multiple threads manipulate common data, operations must be synchronized so that no thread ever sees the data in an inconsistent state. That is, when one thread is manipulating the data, other threads must be excluded from the critical region.
An operation is atomic if it appears to happen all at once: there is no way to observe any partial results. If our threads are not to get in each other's way, the operation of taking something off the queue must be atomic. In fact, you cannot count on any ordinary program operation being atomic: even a variable assignment or an increment can be interrupted partway through by a page fault, or operate on a stale value held in a register by another thread.
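To make this concrete, here is a minimal sketch of the classic lost-update race (the bump routine and the iteration count are invented for illustration): two threads increment a shared counter without synchronization, and the final total frequently comes up short because counter++ is really a load, an add, and a store.

#include <pthread.h>
#include <stdio.h>

static long counter = 0;                 /* shared and unprotected */

static void *bump(void *arg)
{
    (void)arg;
    for (long i = 0; i < 1000000; i++)
        counter++;                       /* load, add 1, store: not atomic */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;

    /* error checking omitted for brevity */
    pthread_create(&t1, NULL, bump, NULL);
    pthread_create(&t2, NULL, bump, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld\n", counter);  /* frequently less than 2000000 */
    return 0;
}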
As you can imagine, a wide variety of synchronization techniques has been developed for writing concurrent, shared-memory applications. Unix standards support several, including mutexes, condition variables (both of which are described in Birrell's paper and in Rochkind, Ch. 5), read-write locks (described in Birrell), spin locks, and barriers. We will focus on the first two here.
Mutual Exclusion
Using threads, we can ensure that a block of code is atomic by associating a guard with the code that allows only one thread at a time to enter, providing mutual exclusion. Mutual exclusion is provided by a data element called a mutex or a lock. (The operating system uses an atomic hardware test-and-set operation to implement locks of various kinds, including mutexes.)

There is a special data type for mutexes called pthread_mutex_t, and you declare mutexes like any other variable. A mutex must be initialized before use. There is a function pthread_mutex_init() that you can use; if the mutex is statically allocated, you can instead use the constant initializer PTHREAD_MUTEX_INITIALIZER.
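Both styles of initialization look something like this (make_mutex is a name invented for this sketch):

#include <pthread.h>
#include <stdlib.h>

/* Statically allocated: the constant initializer suffices. */
static pthread_mutex_t list_mtx = PTHREAD_MUTEX_INITIALIZER;

/* Dynamically allocated (e.g., inside a malloc'd structure):
   initialize explicitly before first use. */
pthread_mutex_t *make_mutex(void)
{
    pthread_mutex_t *m = malloc(sizeof *m);
    if (m != NULL && pthread_mutex_init(m, NULL) != 0) {  /* NULL: default attributes */
        free(m);
        m = NULL;
    }
    return m;
}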
A mutex has one of two states: not owned by any thread, or owned by exactly one thread. A mutex is, conceptually, associated with a single piece of shared, mutable data. A thread gains ownership of a mutex (or seizes the lock) by calling pthread_mutex_lock() and gives up ownership (releases the lock) by calling pthread_mutex_unlock().
int pthread_mutex_lock(pthread_mutex_t *mutex);
int pthread_mutex_trylock(pthread_mutex_t *mutex);
int pthread_mutex_unlock(pthread_mutex_t *mutex);
A call to pthread_mutex_lock() grants the calling thread the lock if it is available, and otherwise blocks until the lock becomes available. pthread_mutex_trylock() is similar except that it doesn't block: if the mutex is already locked, it returns immediately with the error code EBUSY. pthread_mutex_unlock(), assuming the calling thread owns the lock, releases the mutex.
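A hedged sketch of how the non-blocking variant is typically used (try_once and its contents are invented for illustration):

#include <errno.h>
#include <pthread.h>
#include <unistd.h>

/* Attempt the critical section without blocking; return 1 if we
   ran it, 0 if someone else held the lock. */
int try_once(pthread_mutex_t *mtx)
{
    int err = pthread_mutex_trylock(mtx);
    if (err == EBUSY)
        return 0;               /* lock held elsewhere; caller does other work */
    if (err != 0)
        _exit(err);             /* unexpected failure */
    /* ... critical section ... */
    pthread_mutex_unlock(mtx);
    return 1;
}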
Returning to our previous example, we can make our threads work together productively this way:
struct elem_s *work_list = NULL;
pthread_mutex_t wl_mtx = PTHREAD_MUTEX_INITIALIZER;
struct elem_s *elem;
int rtn;
...
if ((rtn = pthread_mutex_lock(&wl_mtx)) != 0)
    _exit(rtn);
while (work_list != NULL) {          /* Line 1 */
    elem = work_list;                /* Line 2 */
    work_list = work_list->next;     /* Line 3 */
    elem->next = NULL;               /* Line 4 */
    process(elem);                   /* Line 5 */
}
if ((rtn = pthread_mutex_unlock(&wl_mtx)) != 0)
    _exit(rtn);
Programming with mutexes is tricky. First, you must ensure that you protect all shared data. Converting a program to use threads therefore first involves reducing the amount of shared data to the absolute minimum. You don't want to protect too much data, because that can slow your program down considerably (and because it's painful and error-prone). Second, you must ensure that you never have a situation in which Thread 1 is blocked waiting for a mutex held by Thread 2, while Thread 2 is blocked waiting for a mutex held by Thread 1. This situation is called deadlock (or one says the threads are locked in a deadly embrace).
Error handling needs to be thought through very carefully in a multi-threaded environment. If one thread terminates while holding a mutex, e.g., because of a programming error or because it chose to exit after a failed system call, then any thread waiting on that mutex is blocked forever. Suppose process() in the code above aborts, for example.

While we're discussing process(), it's worth asking whether this function's activities need to be in the critical section at all. If this is all the threads do, then they really aren't working in parallel, which is the reason we wanted threads in the first place!
For multi-threaded code, you want to minimize the code in critical sections, both for performance reasons and to reduce the threat of deadlock. So it would be a better design to ensure that calls to process() are thread safe, and then seize the lock only long enough to extract one element from the list:
...
for (;;) {
    if ((rtn = pthread_mutex_lock(&wl_mtx)) != 0)
        _exit(rtn);
    elem = work_list;                     /* extract one element */
    if (elem != NULL) {
        work_list = work_list->next;
        elem->next = NULL;
    }
    if ((rtn = pthread_mutex_unlock(&wl_mtx)) != 0)
        _exit(rtn);
    if (elem == NULL)
        break;                            /* list was empty */
    process(elem);                        /* outside the critical section */
}
The best technique is to associate the mutex with a data abstraction that allows access to the shared data only through exported function calls. The functions that implement the abstraction can then do all the necessary locking and unlocking. This technique may not work in the presence of upcalls, but, in general, it's the best approach. If there are multiple locks, then one must establish a partial order on the mutexes and always seize them in that order.
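For instance (a sketch using invented mutexes mtx_a and mtx_b), if every thread that needs both locks agrees to seize mtx_a before mtx_b, the circular wait that defines deadlock cannot arise:

pthread_mutex_t mtx_a = PTHREAD_MUTEX_INITIALIZER;   /* always seized first */
pthread_mutex_t mtx_b = PTHREAD_MUTEX_INITIALIZER;   /* always seized second */
...
/* Every thread follows the same order; deadlock would require two
   threads taking the locks in opposite orders. */
pthread_mutex_lock(&mtx_a);
pthread_mutex_lock(&mtx_b);
/* ... work with both shared structures ... */
pthread_mutex_unlock(&mtx_b);
pthread_mutex_unlock(&mtx_a);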
Condition Variables
It is inefficient for a thread to keep seizing a lock only to find there is no work to do. Condition variables were invented to address this problem. The basic idea is that a thread can wait (sleep) until some condition arises; another thread can signal the waiting threads when it is appropriate to proceed.

In our example above, there must be another thread putting elements on the work_list:
new_elem = read_data();
new_elem->next = NULL;
if ((rtn = pthread_mutex_lock(&wl_mtx)) != 0)
    _exit(rtn);
if (work_list == NULL)
    work_list = new_elem;
else {
    for (tmp = work_list; tmp->next != NULL; tmp = tmp->next)
        ;                                 /* find the tail */
    tmp->next = new_elem;
}
if ((rtn = pthread_mutex_unlock(&wl_mtx)) != 0)
    _exit(rtn);
This sort of relationship among threads, i.e., where one thread is collecting new tasks to do (like reading an HTTP request from the network) and other threads are carrying out the tasks, is quite common. It's an instance of the producer-consumer pattern. It is typical for multi-threaded applications to have multiple consumers.
Consumer threads should stay blocked until there is work to do. We can use condition variables for this. Here is how we wait for and signal conditions:
int pthread_cond_signal(pthread_cond_t *cond);
int pthread_cond_wait(pthread_cond_t *cond,
pthread_mutex_t *mutex);
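Like mutexes, condition variables must be initialized before use; both styles parallel mutex initialization (setup_cond is a name invented for this sketch):

/* Statically allocated: a constant initializer, as with mutexes. */
pthread_cond_t wl_cond = PTHREAD_COND_INITIALIZER;

/* Dynamically allocated: initialize explicitly before first use. */
void setup_cond(pthread_cond_t *c)
{
    pthread_cond_init(c, NULL);           /* NULL: default attributes */
}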
Notice that a condition variable is always associated with a particular mutex. We signal a condition by calling pthread_cond_signal() on that condition variable. That is easy enough. Waiting for a condition is more complex. When a thread that holds mutex calls pthread_cond_wait() with that mutex, the thread relinquishes the mutex and blocks until the condition is signaled, whereupon it seizes the mutex again before returning.
Consider our running example. Here's how we enqueue elements:
pthread_cond_t wl_cond = PTHREAD_COND_INITIALIZER;
...
new_elem = read_data();
new_elem->next = NULL;
if ((rtn = pthread_mutex_lock(&wl_mtx)) != 0) _exit(rtn);
if (work_list == NULL)
    work_list = new_elem;
else {
    for (tmp = work_list; tmp->next != NULL; tmp = tmp->next)
        ;                                 /* find the tail */
    tmp->next = new_elem;
}
if ((rtn = pthread_cond_signal(&wl_cond)) != 0) _exit(rtn);
if ((rtn = pthread_mutex_unlock(&wl_mtx)) != 0) _exit(rtn);
And here is how we take elements off the list:
if ((rtn = pthread_mutex_lock(&wl_mtx)) != 0) _exit(rtn);
while (work_list == NULL)
    if ((rtn = pthread_cond_wait(&wl_cond, &wl_mtx)) != 0) _exit(rtn);
elem = work_list;
work_list = work_list->next;
elem->next = NULL;
if ((rtn = pthread_mutex_unlock(&wl_mtx)) != 0) _exit(rtn);
process(elem);                            /* outside the critical section */
The worker thread code would be embedded in a loop that runs until it is time for the thread to terminate. There are a couple of ways to get a worker thread to terminate: the thread scheduling the work orders can put a special element on the list that tells workers everything is done (or set a flag somewhere else), or the worker threads can be explicitly cancelled. As a general rule, a special stop indicator that causes threads to terminate voluntarily is the simplest and safest choice.
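Here is one hedged sketch of the stop-flag approach (the done flag and the worker routine are invented for illustration); the producer would set done while holding wl_mtx and wake all waiting workers, e.g. with pthread_cond_broadcast():

static int done = 0;        /* hypothetical stop flag, protected by wl_mtx */

void *worker(void *arg)
{
    struct elem_s *elem;
    int rtn;

    (void)arg;
    for (;;) {
        if ((rtn = pthread_mutex_lock(&wl_mtx)) != 0) _exit(rtn);
        while (work_list == NULL && !done)
            if ((rtn = pthread_cond_wait(&wl_cond, &wl_mtx)) != 0) _exit(rtn);
        if (work_list == NULL) {          /* done was set and list is drained */
            pthread_mutex_unlock(&wl_mtx);
            return NULL;
        }
        elem = work_list;                 /* extract one element */
        work_list = work_list->next;
        elem->next = NULL;
        if ((rtn = pthread_mutex_unlock(&wl_mtx)) != 0) _exit(rtn);
        process(elem);
    }
}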
Note that the call to pthread_cond_wait() is in a loop. Why? The condition is supposed to be signaled when there is work to do, so why can't the thread just fearlessly process the element? One reason is that pthread_cond_wait() may be interrupted and return even if the condition has not been signaled. A more likely problem is that the producer may have signaled the condition long before the consumer got to this point, and under POSIX a signal on a condition variable is discarded if no thread is currently waiting on it; a consumer that waited without first re-testing the condition could block forever even though there is work to do. Testing the predicate in a loop handles both cases.