Wednesday 17 May 2017

C++11 - Data Sharing

1. Static Variables Initialization
Static variables are initialized when the very first thread runs through it. The data race could happen when multiple thread try to run through it at the same time.
Here are a few hazards. First the same object can be initialized a few times. Second there might be memory leak when the static variable is a pointer or it has memory allocation inside and are initialized a few times. Third static variables may lose its persistence if it is not stateless - some state may take from the first initialization, some may take from the 2nd initialization and so on. Last the program could crash due to inconsistent state show in the following code snippet.

//*********************************************************************************
class Bar
{
public:
    Bar(int x, double y);
    void DoSomething();
};
void Foo(int x, double y)
{
    // C++ guarantees thread safe
    static Bar localBar = Bar(x, y);
    localPt.DoSomething();
}
//*********************************************************************************

For instance two threads run into Foo() at the same time and the first one initializes the localBar and carries on run DoSomething(). Then the second thread initializes the localBar again when the first thread is still in DoSomething(). Literally thread 1 is operating an object which is half completed and half destroyed. Crash could happen anytime.

C++11 noticed this issue and makes thread-safety of this scenario as standard. It means that C++11 guarantees that localBar is to be initialized only once. The rest threads will be blocked out until the first thread finishes the initialization. This also solves the famous singleton problem, Design pattern - Singleton pattern.

2. Data Race on Initialization
This scenario applies to those data that are needed to be protected at their first initialization. The typical case is calculation-on-demand and one of typical scenarios of this process is lazy-initialization of variables.
It is quite common in quantitative development to do the calculation-on-demand. For instance given a market and trades its risks are not calculated until they are required/query. And they are only recalculated until one of them changed.

*********************************************************************************
class Market;
class Trade;

class Risks
{
public:
    double GetVega() {
        std::call_once(m_VegaFlag, CalculateVega);
        return m_Vega;
    }
private:
    void CalculateVega();
    std::once_flag m_VegaFlag;

    Market *m_Mkt;
    Trade   *m_Trd;
    double m_Vega;
}
*********************************************************************************

std::call_once and std::once_flag are introduced into C++11 to achieve that functions that need to be run only once in the entire application life. And serves the first come the first server semantics.
This guarantees the following,
    - CalculateVega() not called if not vega is not retrieved
    - Guarantee that CalculateVega() called only once on demand

3. Reader-Writer Lock
Single-reader-and-single-writer is the simplest case. An atomic variable will do the job if the data is single primitive type. Otherwise std::mutex can be used to sync the read-and-write operation. Or even implement a lock-free guard via an atomic flag or bool.

Here we are focusing on multi-reader cases. The typical use case in quantitative world is lazy-calculation. Similar to lazy-initialization the variables are required to be re-calculated after its initialization. But the re-calculation happens rarely. For instance the market-data bumped or ref-data changed. The quick read access to the data is required because it might be frequently retrieved to calculate other derived risks. And at the same time no starvation happens to write access - in other word recalculation is served as soon as possible.

Semaphore solution:
A typical solution before C++11 is to use semaphores. It is not a part of C++ standard but a system-dependent implementation. Initialize the resources in semaphores as the total number of read-threads. Under the condition without write-access each read-thread have at least one resource for them and the read-access is quick and smooth. But when a write-access is required, the write-thread has to acquire the resources by lowest granularity until acquire all the resources in order to block all read-threads before writing. This will guarantee that it is a fairly implementation for write access - otherwise starvation of write access happens.

Semaphore Implementation
*********************************************************************************
//  a simple Semaphore API (see QSemaphore implemention)
class Semaphore
{
public:
    Semaphore(unsigned int totalResrouces);
    void Acquire(unsigned int resource);
    void Release(unsigned int resource);
    unsigned int TotalResources() const;
};

class RiskVariable1
{
public:

    double Read()
    {
        double val;
        m_Sem.Acquire(1);
        val = m_Value;
        m_Sem.Release(1);
 
        return val;
    }

    void Write(const Market &m, const RefData &rd)
    {
     
        const unsigned int TotalResources = m_Sem.TotalResources();
        // Here if call "m_Sem.Acquire(TotalResources)" can cause write-thread starvation
        // And have to acquire each resource in lowest granularity until all
        // This implementation is only for single-write.
        // Will cause dead-lock if multi-write try to access because each just holds part of resources
        for (unsigned into i = 0; i < TotalResources; ++i) {
            m_Sem.Aqcuire(1);
        }

        // hit here all resources are acquired by this write-thread
        Calculate(m, rd);

        m_Sem.Release(TotalResources);
    }
private:
    void Calculate(const Market &m, const RefData &rd)
    {
         // change the value of m_Value
    }

    double m_Value;
    Semaphore m_Sem;
};
*********************************************************************************

The above implementation works only for multi-reader-single-writer scenario. It guarantees quick read access when no write access is in request and also fair service for write access (no starvation). As soon as read access gets out of away, write access is serviced. But the above implementation will cause deadlock for multi-reader-multi-writer. It needs only tiny tweak to make it work for multi-reader-multi-writer.

//*********************************************************************************
class Semaphore
{
    // ......
};
class RiskVariable1
{
public:
    // ...... code

    void Write(const Market &m, const RefData &rd)
    {
     
        const unsigned int TotalResources = m_Sem.TotalResources();
        // Here if call "m_Sem.Acquire(TotalResources)" can cause write-thread starvation
        // And have to acquire each resource in lowest granularity until all
        // This implementation is only for multi-write.
        m_WLock.lock();
        {
            for (unsigned into i = 0; i < TotalResources; ++i) {
                m_Sem.Aqcuire(1);
            }
        }
        m_WLock.unlock();

        // hit here all resources are acquired by this write-thread
        Calculate(m, rd);

        m_Sem.Release(TotalResources);
    }
private:
    // ...... code
    double m_Value;
    Semaphore m_Sem;
    Mutex m_WLock;
};
//*********************************************************************************

The tweak is quite straightforward too. Use a mutex to guard the write thread when they come to acquire the resources of semaphore. It guarantees that only one write-thread in progress to acquire to semaphore's resource and the rest has to wait.

C++11 solution:
C++11 provides std::shared_mutex to solve this issue. And it works with any scenarios shown above including multi-reader-multi-writer.

//*********************************************************************************
class RiskVariable1
{
public:

    double Read()
    {
        std::shared_lock<std::shared_mutex> guard(m_SMut);
        return m_Val;
    }

    void Write(const Market &m, const RefData &rd)
    {
        std::unique_lock<std::shared_mutex> guard(m_SMut);
        Calculate(m, rd);
    }
private:
    void Calculate(const Market &m, const RefData &rd)
    {
         // change the value of m_Value
    }

    double m_Value;
    std::shared_mutex m_SMut;
};
//*********************************************************************************

C++11 provides std::shared_lock for quick read access and std::unique_lock for write access to guarantee mutual exclusion as only one write access allowed. Besides C++11 standard guarantees that there is starvation for write access.

4. Recursive Lock
Recursive lock is not a part of standard before C++11. However POSIX standard defines recursive lock. And both POSIX and Boost provides implementations.
Honestly speaking the application of requiring recursive lock is not that common. I did use it once and only once in my work. Here is the code snippet.

//*********************************************************************************
class BankAccount
{
public:
    void Transfer(BankAccount &toAcc, double amount)
    {
        // lock the two mutex
        std::lock(m_Lock, rhs.m_Lock);
        std::lock_guard<std::mutex> lock1(m_Lock, std::adopt_lock);
        std::lock_guard<std::mutex> lock2(rhs.m_Lock, std::adopt_lock);
        
        // transfer the money
    }

    void ServiceCharge(double amount)
    {
        std::lock_guard<std::mutex> guard(m_Lock);
        // take £10 out of this account for instance 
    }

    void ComplexTransfer(BankAccount &toAcc, double amount)
    {
        std::lock_guard<std::mutex> guard(m_Lock);
        Transfer(toAcc, amount);
        ServiceCharge(10.00); // £10.00 service charge 
    }
private:
    double m_Balance;
    std::mutex m_Lock;
};
//*********************************************************************************

Both ServiceCharge() and Transfer() are valid APIs and need to be protected individually as both lock the mutex in the implementation. The recursive case happens in ComplexTransfer() that is to do two things in atomic sense - transfer the money and do service charge. Then the lock in ComplexTransfer() has to lock from the start to the end and the lock must be the same lock as in ServiceCharge() and Transfer(). Otherwise BankAccount might have data race on m_Balance.

In the above code snippet m_Lock is locked twice in ComplexTransfer() and it has to be std::recursive_mutex.

Comparing std::mutex and std::recursive_mutex the former is light-weighted on both memory footprint and speed, because it does not need to store information who (which thread) owns it and how many times it has been locked. However std::recursive_mutex is designed to have these information so that kernel can track them against thread to manage resources and threads. Read the comparison of these two type on Recursive Lock (Mutex) vs Non-Recursive Lock (Mutex). They also share a lot in common for instance both implement std::lockable interface and both are not copy able but movable.

Wednesday 3 May 2017

C++11 Features - Solution for Deadlock

1. Problem Description
On my previous blog, Threading - Deadlock, an example of "BankAccount" shows how deadlock is to happen. Account A is to do a transfer to Account B and at the same time Account B is to do one to Account A. The first transaction is to lock A then lock B and do the transfer. The second transaction is to lock B then lock A and do the transfer. There is a good chance that only the first step happened during the two transactions. The first transaction locks A and the second locks B and both try to get the second locking. But fail to do so and two transaction are locked mutually.
The solutions proposed on Threading - Deadlock still work but C++11 and C++17 just have better solutions.

2. C++11 and C++17 Solutions
The solution proposed on Threading - Deadlock is to lock the mutex in the order to guarantee that only one thread can acquire all the resources at once. The solution of C++11 is std::lock, which is able to lock multiple lockable objects at once - in the scenario of none or all.
std::lock either locks all the lockable objects or locks none of them in case of failure (for instance one of lockable objects isn't called in parity of lock/unlock).

C++11 solutions
std::lock can used either with std::lock_guard or std::unique_lock. Comparing std::lock_guard and std::unique_lock, they have similar functionalities. std::lock_guard can be regarded as RAII wrapper of lockable objects via reference and however std::unique_lock can be regarded as RAII wrapper of lockable objects via pointer and with ownership indicator.

*******************************************************************************
class BankAccount{
public:
    void Pay();
    void Receive();
    void Transfer(BankAccount& rhs, double amount)
private:
    std::mutex m_Lock;
    long long sort_code;
    long long account_no;
    std::string name;
    std::string postcode
};

void BankAccount::Transfer(BankAccount& rhs, double amount)
{
    if (*this == rhs) {
        return;
    }
  
    // std::lock to lock both mutex
    // std::adopt_lock is to tell std::guard to take ownership only
    // rather than lock the object in the ctor
    std::lock(m_Lock, rhs.m_Lock);
    std::lock_guard<std::mutex> lock1(m_Lock, std::adopt_lock);
    std::lock_guard<std::mutex> lock2(rhs.m_Lock, std::adopt_lock);

    // do other stuff
}

void BankAccount::Transfer(BankAccount& rhs, double amount)
{
    if (*this == rhs) {
        return;
    }

    // std::defer_lock to tell std::unique_lock not to lock the object
    // just take the ownership
    // std::lock is to both mutex
    std::unique_lock<std::mutex> lock1(m_Lock, std::defer_lock);
    std::unique_lock<std::mutex> lock2(rhs.m_Lock, std::defer_lock);
    std::lock(m_Lock, rhs.m_Lock);

    // do other stuff
}
*******************************************************************************

Beside std::unique_lock is not copyable but movable but std::lock_guard is neither copyable nor movable. Therefore std::unique_lock takes extra memory, runs slower but is more flexible than std::lock_guard. A typical use of std::lock_gurad as the automatic variables for lock/unlock lockable objects. And std::unique_luck is to take the ownership of lockable objects and transfer the ownership between functions (for instance as a returning object of function).

C++17 solution
A solution to replace the combination of std::lock and std::lock_guard in C++11 is provided in C++17 as std::scoped_lock. It works as a RAII wrapper of std::lock and std::lock_guard in C++11. Very clean and beautiful.

*******************************************************************************
void BankAccount::Transfer(BankAccount& rhs, double amount)
{
    if (*this == rhs) {
        return;
    }
    std::scoped_lock sl(m_Lock, rhs.m_Lock);
    

    // do other stuff
}
*******************************************************************************