Friday 25 April 2014

Threading - Deadlock

1. Problem
The classical description of deadlock is dining philosophers problem [1]. Briefly it says that 5 philosophers are sitting at a round table to have dinner. They can't eat until they hold both the left and the right forks. As long as holding two forks, they can eat for a while and put down the fork. And others can grab the fork and get a chance to eat. There are states where the 5 philosophers can get into and none of them can eat. For instance each of them holds the fork on their left hand side and waits for others to release the fork. Because they can't talk to each other and none of them will release the fork deliberately, and therefore none of them can eat of course. At this state no further progress can make, which is called deadlock.

From this abstract problem, we can come up a few concepts:
- Thread: each philosopher acts independently and has a task to accomplish
- Shared resource: the forks that the philosophers have to acquire before acting
- Mutual exclusion: as long as the fork is held by a philosopher, no other can hold it any more until it is released.
- Multiple sequential resource acquisition: each philosopher has to acquire two forks sequentially then he can carry on eating.
- Deadlock: a state that no progress can make

Besides this classical description of deadlock I would like share with you two examples that I met in my previous projects and we can conclude some common issues in they share where deadlock could happen.

Deadlock example 1:
//********************************************************************************
class BankAccount{
public:
    void Lock();
    void Unlock();
    void Pay();
    void Receive();
private:
    long long sort_code;
    long long account_no;
    std::string name;
    std::string postcode
};
void Transfer(BankAccount& src, BankAccount& des, double amount)
{
    // lock these two accounts
    src.Lock();
    des.Lock();
    // source bank account pay the money
    // destination bank account receive the money
    src.Pay(amount);
    des.Receive(amount);
    // unlock the two accounts
    des.Unlock();
    src.Unlock();
}

BankAccount alan_smith;
BankAccount bob_brown;

// thread 1
Transfer(alan_smith, bob_brown, 500);

// thread 2
Transfer(bob_brown, alan_smith, 100);
//********************************************************************************

In the Transfer() function two bank accounts have to be locked at the same time before doing the transaction. (Someone might think that it is not necessary to lock both accounts at the same time to do the transaction. But in the fact it has to, because it has to done in one transaction otherwise it will affect the balance of the account at a time. Especially at multiple threading application if transactions can be interleaved, then some invalid action could be invoked if balance is not latched every single time. For instance, the bank account could go negative and prevent the "pay" action.) This problem shared a common issue with the dinning philosophers problem, - both of them try to acquire multiple resources. Therefore the deadlock can happen. For instance, two threads try to do the transaction on the same account as shown in Example 1. The deadlock happens when both threads execute until both have acquired the source bank account and then tries to acquires the destination account. Here comes the problem that alan_smith is acquired by thread 1 but bob_brown is acquired by thread 2. Neither of them will be able to acquire to the other/destination bank account.

Another good example is loading DLLs in Windows, when you are developing your own dynamic linked library. The implementation of DllMain() [2] will be very critical.

                                                     Figure 1 [3]
Keep in mind that there is already a lock when coming into DllMain(). Trying to lock anther resource for any reason, for instance loading a COM object, creating another process, could cause mysterious deadlock, which has proven to me very difficult to spot and debug.

2. Common issues shared by the examples
From the above 3 examples deadlock can only happen when the thread get into a certain state. Here we can conclude a few things they share all.
- Multiple threads works independently
- Each thread must acquire multiple shared resource in order to act
- Mutual exclusion: as long as a resource is held, it can't be held by others until it is released by its current owner
- No preemption: no thread has the privilege to grab resource from others
- Sequential multiple resource acquisition: a thread have to grab all the resources it needs before it can act, otherwise have to wait
- No communication between them: they can't talk to each other and release their resource deliberately in order to unlock the deadlock situation.

3. Solutions
There are a few solutions presented in the wiki page [1]. Here I would like to discuss a couple solutions I used in my previous projects.

Acquire the multiple resource in the order
This solution is very close to the one from Dijkstra described on the wiki page [1]. What it is saying is that the shared resources are labeled with an ID. When each thread tries to acquire multiple resources, it has to acquire them in an order, which is decided by the ID labeling the shared resources. As Dijkstra suggested in his original solution to assigned a partial order to the forks (shared resources), then all the philosophers have to hold the fork in the same order (ascending or descending order).
In a programming application, some techniques can be used to assign the ID to the shared resources to ensure the partial order in the universal scope. It uses the hash function to take a few data from the shared resources to generate unique ID for each shared resource. Then threads acquire the shared resource in the ascending/descending order of ID.
In Example 1 use any combination of private data member of BankAccount as the hash key to generate the ID (hash value). And in function Transfer() lock the BankAccount in the ascending/descending order of the ID.

Solution - Deadlock example 1:
//********************************************************************************
class BankAccount{
public:
    // ......
    size_t GetID();
private:
    // ......
};
void Transfer(BankAccount& src, BankAccount& des, double amount)
{
    // lock these two accounts in ascending order
    if (src.GetID() < des.GetID()) {
        src.Lock();
        des.Lock();
    } else {
        des.Lock();
        src.Lock();
    }
    // source bank account pay the money
    // destination bank account receive the money
    src.Pay(amount);
    des.Receive(amount);
    // unlock the two accounts
    if (src.GetID() < des.GetID()) {
        des.Unlock();
        src.Unlock();
    } else {
        src.Unlock();
        des.Unlock();
    }
}
//********************************************************************************

With this solution as long as thread 1 acquires any resource, it will prevent thread 2 from acquiring any shared resource until it finishes the transaction. Then these two threads will never get into the position that each of them holds only half the resources they need and constantly wait for another to release the other half. So problem solved!!!

A master thread + work threads
This idea is similar to the "arbitrator solution" in [1] but not exactly the same. The difference is that in "arbitrator solution", the waiter works as the arbitrator but passively waits for the query from philosophers. But in this solution the master thread works as the arbitrator but it works actively. More precisely it works as a scheduler, oversees the shared resources and decides which threads to go next or together. It even does not need to spool all the threads in advance. It can create and destroy the working threads as needed, which will certainly reduce the situations of race conditions. The downside of this approach is the performance penalty caused by frequently creating/destroying threads (memory fragments and so on, read my other blog entries for details about how memory affects performance, "Run out of stack" and "Run out of heap"). But this is not a big issue. Techniques like pre-created threads pool will reduce the impact on the performance.
As for implementation, in POSIX [4] condition variable and wait()/notify() mechanism can be used for this solution.

4. Tips to avoid deadlock
When design a multiple threading application, deadlock is certainly what we should avoid. A few patterns in your design serve good warnings for potential deadlock.
- Sequential resource acquisition: if they can be acquired in any order, then it is a big warning sign. Use the first solution to fix it.
- The dependency between the threads and shared resources is in a closed loop. As the dining philosophers problem illustrated, the round table describe the relationship between threads and resources. They are in a loop. Be careful if this loop is detected.

More generally think about the design
- If single mutual exclusion works, never use multiple
- If multiple acquire/release (acquire-work-release, acquire-work-release, ...) fits your needs, never use sequential shared resource acquisition (acquire-acquire- ... - work - release -... - release)

Bibliography:
[1] http://en.wikipedia.org/wiki/Dining_philosophers_problem
[2] http://msdn.microsoft.com/en-us/library/ms682583.aspx
[3] http://msdn.microsoft.com/en-us/library/windows/desktop/dn633971(v=vs.85).aspx
[4] http://www.gnu.org/software/pth/

No comments:

Post a Comment