Wednesday 11 April 2018

C++17: insert_or_assign - new API for associative containers

1. Before C++17
This new API ticks a useful box for our daily programming. Think of the scenario where we insert into the container if the key does not exist and update the value if it does. Before C++17 this has to be done with a combination of the following 3 calls, as shown in Figure 1.
    - find
    - insert/emplace
    - operator []

Figure 1. Implementation before C++17

Note: emplace() is recommended over insert() for better performance.

For the reasons not to use operator [] directly, please refer to Item 24 in [1], where Scott Meyers discusses in detail the performance trade-off between operator [] and insert.
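A minimal sketch of this pre-C++17 combination (the container, key and value here are purely illustrative):

//*********************************************************************************
#include <map>
#include <string>

void UpsertBefore17(std::map<int, std::string> &prices, int key, const std::string &value)
{
    auto it = prices.find(key);
    if (it == prices.end()) {
        prices.emplace(key, value);  // key absent: insert
    }
    else {
        it->second = value;          // key present: update
    }
}
//*********************************************************************************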

2. C++17
From C++17 on, the solution shown in Figure 1 can be replaced with the new API - insert_or_assign() - as shown in Figure 2.

Figure 2. Implementation in C++17 with the New API

I have looked at the assembly generated for this code under VS2015 and there is no performance penalty. Therefore feel free and happy to use it.
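A minimal sketch of the C++17 version (same illustrative names as above):

//*********************************************************************************
#include <map>
#include <string>

void UpsertCpp17(std::map<int, std::string> &prices, int key, const std::string &value)
{
    // inserts if the key is absent, assigns to the mapped value otherwise;
    // the returned pair's second member is true on insertion, false on assignment
    auto result = prices.insert_or_assign(key, value);
    (void)result;
}
//*********************************************************************************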


Bibliography:
[1] Scott Meyers, Effective STL: 50 Specific Ways to Improve Your Use of the Standard Template Library



Tuesday 10 April 2018

C++11: a pitfall in std::to_string(float) and std::to_string(double)

1. Root cause
The problem with these two functions is that the precision of the conversion is hard-coded to 6 decimals and it is not possible to adjust it.

2. What happened to me?
I made such a mistake when implementing one of our services. It takes live market FX prices as input to compute a user-defined price (simple cases like a mid price or WAP, weighted average price), converts the price into a string (passed to the downstream service via XML), and then reconstructs it and computes numeric risk values. It worked fine for most currencies but not for JPY.

It sounds bizarre, but the fundamental issue is that live market prices based on JPY instruments already carry 6 decimals of precision, and the subsequent calculations require higher precision to make the reconstruction and risk computation correct. But std::to_string(double) keeps only 6 decimals, and the 6th decimal is rounded. This caused the numeric risk computation to fail - the values swung around even though the price was trending.

3. Solution
The std::to_string manual on cppreference only says it produces the same content as std::sprintf with "%f" "would produce for sufficiently large buf". There is no mention of conversion precision, nor is there a second parameter in the function signature to specify the required precision, so it is easy to make this mistake. Lesson learned: always keep in mind that floating-point numbers differ from integers in terms of underflow, overflow, comparison and string conversion. And the solution is rather simple - use a string stream and an I/O manipulator.
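A minimal sketch of that string-stream approach (the function name and precision are illustrative):

//*********************************************************************************
#include <iomanip>
#include <sstream>
#include <string>

std::string ToStringWithPrecision(double value, int precision)
{
    std::ostringstream oss;
    oss << std::fixed << std::setprecision(precision) << value;
    return oss.str();
}
// e.g. ToStringWithPrecision(110.123456789, 9) keeps 9 decimals,
// whereas std::to_string(110.123456789) yields "110.123457"
//*********************************************************************************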

Wednesday 17 May 2017

C++11 - Data Sharing

1. Static Variables Initialization
Static local variables are initialized when the very first thread runs through their declaration. Before C++11, a data race could happen when multiple threads run through it at the same time.
Here are a few of the hazards. First, the same object can be initialized several times. Second, there may be a memory leak when the static variable is a pointer, or allocates memory internally, and is initialized several times. Third, a static variable may lose its persistence if it is not stateless - some state may come from the first initialization, some from the second, and so on. Last, the program could crash due to inconsistent state, as shown in the following code snippet.

//*********************************************************************************
class Bar
{
public:
    Bar(int x, double y);
    void DoSomething();
};
void Foo(int x, double y)
{
    // since C++11 this initialization is guaranteed to be thread-safe
    static Bar localBar = Bar(x, y);
    localBar.DoSomething();
}
//*********************************************************************************

For instance, two threads run into Foo() at the same time; the first initializes localBar and carries on to run DoSomething(). Then the second thread initializes localBar again while the first thread is still inside DoSomething(). Effectively thread 1 is operating on an object that is half constructed and half destroyed. A crash could happen at any time.

C++11 addressed this issue and made thread-safety of this scenario part of the standard: C++11 guarantees that localBar is initialized only once. The remaining threads are blocked until the first thread finishes the initialization. This also solves the famous singleton problem, Design pattern - Singleton pattern.
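A minimal sketch of a Meyers-style singleton that relies on this C++11 guarantee (the Logger class is purely illustrative):

//*********************************************************************************
class Logger
{
public:
    static Logger& Instance()
    {
        // initialized exactly once, even if many threads call Instance() concurrently
        static Logger instance;
        return instance;
    }
private:
    Logger() = default;
    Logger(const Logger&) = delete;
    Logger& operator=(const Logger&) = delete;
};
//*********************************************************************************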

2. Data Race on Initialization
This scenario applies to data that needs to be protected on its first initialization. The typical case is calculation-on-demand, and one typical form of it is lazy initialization of variables.
Calculation-on-demand is quite common in quantitative development. For instance, given a market and trades, the risks are not calculated until they are requested/queried, and they are only recalculated when one of the inputs changes.

*********************************************************************************
#include <mutex>

class Market;
class Trade;

class Risks
{
public:
    double GetVega() {
        // pass 'this' so the member function is invoked on this object
        std::call_once(m_VegaFlag, &Risks::CalculateVega, this);
        return m_Vega;
    }
private:
    void CalculateVega();
    std::once_flag m_VegaFlag;

    Market *m_Mkt;
    Trade   *m_Trd;
    double m_Vega;
};
*********************************************************************************

std::call_once and std::once_flag were introduced in C++11 exactly for functions that need to run only once in the entire application lifetime, with first-come-first-served semantics.
This guarantees the following,
    - CalculateVega() is never called if the vega is never retrieved
    - CalculateVega() is called only once, on demand

3. Reader-Writer Lock
Single-reader-and-single-writer is the simplest case. An atomic variable will do the job if the data is a single primitive type. Otherwise std::mutex can be used to synchronize the read and write operations, or a lock-free guard can be implemented via an atomic flag or bool.
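A minimal sketch of the single-primitive case with an atomic variable (the class and member names are illustrative):

//*********************************************************************************
#include <atomic>

class SpotPrice
{
public:
    void Write(double px) { m_Price.store(px, std::memory_order_release); }
    double Read() const { return m_Price.load(std::memory_order_acquire); }
private:
    std::atomic<double> m_Price{0.0};
};
//*********************************************************************************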

Here we focus on multi-reader cases. A typical use case in the quantitative world is lazy calculation. Similar to lazy initialization, the variables need to be recalculated after their initialization, but the recalculation happens rarely - for instance when the market data is bumped or the reference data changes. Quick read access to the data is required because it may be frequently retrieved to compute other derived risks, and at the same time no starvation should happen to write access - in other words, recalculation is served as soon as possible.

Semaphore solution:
A typical solution before C++11 is to use semaphores. They are not part of the C++ standard but a system-dependent implementation. Initialize the semaphore's resources to the total number of reader threads. When there is no write access, each reader thread has at least one resource available, so read access is quick and smooth. When write access is required, the writer thread has to acquire the resources at the lowest granularity (one at a time) until it holds them all, in order to block all reader threads before writing. This keeps the implementation fair for write access - otherwise starvation of write access happens.

Semaphore Implementation
*********************************************************************************
//  a simple Semaphore API (see QSemaphore implementation)
class Semaphore
{
public:
    Semaphore(unsigned int totalResources);
    void Acquire(unsigned int resource);
    void Release(unsigned int resource);
    unsigned int TotalResources() const;
};

class RiskVariable1
{
public:

    double Read()
    {
        double val;
        m_Sem.Acquire(1);
        val = m_Value;
        m_Sem.Release(1);
 
        return val;
    }

    void Write(const Market &m, const RefData &rd)
    {
     
        const unsigned int TotalResources = m_Sem.TotalResources();
        // Calling "m_Sem.Acquire(TotalResources)" here could starve the write thread,
        // so acquire the resources one by one (lowest granularity) until all are held.
        // This implementation is only for a single writer: with multiple writers each
        // would hold only part of the resources and a deadlock would occur.
        for (unsigned int i = 0; i < TotalResources; ++i) {
            m_Sem.Acquire(1);
        }
        }

        // hit here all resources are acquired by this write-thread
        Calculate(m, rd);

        m_Sem.Release(TotalResources);
    }
private:
    void Calculate(const Market &m, const RefData &rd)
    {
         // change the value of m_Value
    }

    double m_Value;
    Semaphore m_Sem;
};
*********************************************************************************

The above implementation works only for the multi-reader-single-writer scenario. It guarantees quick read access when no write access is requested and also fair service for write access (no starvation): as soon as the readers are out of the way, the write access is serviced. However, this implementation will cause a deadlock with multiple writers. Only a tiny tweak is needed to make it work for multi-reader-multi-writer.

//*********************************************************************************
class Semaphore
{
    // ......
};
class RiskVariable1
{
public:
    // ...... code

    void Write(const Market &m, const RefData &rd)
    {
     
        const unsigned int TotalResources = m_Sem.TotalResources();
        // Acquire the resources one by one as before, but serialize the writers
        // with a mutex so this implementation also works for multiple writers.
        m_WLock.lock();
        {
            for (unsigned int i = 0; i < TotalResources; ++i) {
                m_Sem.Acquire(1);
            }
            }
        }
        m_WLock.unlock();

        // hit here all resources are acquired by this write-thread
        Calculate(m, rd);

        m_Sem.Release(TotalResources);
    }
private:
    // ...... code
    double m_Value;
    Semaphore m_Sem;
    std::mutex m_WLock;
};
//*********************************************************************************

The tweak is quite straightforward too: use a mutex to guard the writer threads when they come to acquire the semaphore's resources. It guarantees that only one writer thread at a time is in the process of acquiring the semaphore's resources and the rest have to wait.

Standard library solution:
C++17 provides std::shared_mutex (and C++14 provides std::shared_timed_mutex) to solve this issue. It works for all the scenarios shown above, including multi-reader-multi-writer.

//*********************************************************************************
#include <shared_mutex>

class RiskVariable1
{
public:

    double Read()
    {
        std::shared_lock<std::shared_mutex> guard(m_SMut);
        return m_Value;
    }

    void Write(const Market &m, const RefData &rd)
    {
        std::unique_lock<std::shared_mutex> guard(m_SMut);
        Calculate(m, rd);
    }
private:
    void Calculate(const Market &m, const RefData &rd)
    {
         // change the value of m_Value
    }

    double m_Value;
    std::shared_mutex m_SMut;
};
//*********************************************************************************

The library provides std::shared_lock (C++14) for quick shared read access and std::unique_lock for exclusive write access, guaranteeing mutual exclusion so that only one write access is allowed at a time. Typical implementations also avoid starvation of write access.

4. Recursive Lock
A recursive lock was not part of the C++ standard before C++11 (std::recursive_mutex was added in C++11). The POSIX standard, however, defines a recursive lock, and both POSIX and Boost provide implementations.
Honestly speaking, applications that require a recursive lock are not that common. I have used it once and only once in my work. Here is the code snippet.

//*********************************************************************************
class BankAccount
{
public:
    void Transfer(BankAccount &toAcc, double amount)
    {
        // lock the two mutexes together
        std::lock(m_Lock, toAcc.m_Lock);
        std::lock_guard<std::recursive_mutex> lock1(m_Lock, std::adopt_lock);
        std::lock_guard<std::recursive_mutex> lock2(toAcc.m_Lock, std::adopt_lock);
        
        // transfer the money
    }

    void ServiceCharge(double amount)
    {
        std::lock_guard<std::recursive_mutex> guard(m_Lock);
        // take £10 out of this account for instance 
    }

    void ComplexTransfer(BankAccount &toAcc, double amount)
    {
        std::lock_guard<std::recursive_mutex> guard(m_Lock);
        Transfer(toAcc, amount);
        ServiceCharge(10.00); // £10.00 service charge 
    }
private:
    double m_Balance;
    std::recursive_mutex m_Lock; // recursive: locked again via Transfer()/ServiceCharge() inside ComplexTransfer()
};
//*********************************************************************************

Both ServiceCharge() and Transfer() are valid public APIs and need to be protected individually, as both lock the mutex in their implementation. The recursive case happens in ComplexTransfer(), which has to do two things atomically - transfer the money and apply the service charge. So the lock in ComplexTransfer() has to be held from start to end, and it must be the same lock as the one in ServiceCharge() and Transfer(); otherwise BankAccount might have a data race on m_Balance.

In the above code snippet m_Lock is locked twice on the ComplexTransfer() path, which is why it has to be a std::recursive_mutex.

Comparing std::mutex and std::recursive_mutex, the former is lighter weight in both memory footprint and speed, because it does not need to record which thread owns it or how many times it has been locked. std::recursive_mutex keeps this information so that the same thread can lock it repeatedly and the lock count can be tracked per owning thread. Read the comparison of these two types in Recursive Lock (Mutex) vs Non-Recursive Lock (Mutex). They also share a lot in common: for instance both satisfy the Lockable requirements, and neither is copyable nor movable.

Wednesday 3 May 2017

C++11 Features - Solution for Deadlock

1. Problem Description
In my previous blog, Threading - Deadlock, the "BankAccount" example shows how a deadlock can happen. Account A does a transfer to Account B while, at the same time, Account B does a transfer to Account A. The first transaction locks A, then locks B, then does the transfer; the second locks B, then locks A, then does the transfer. There is a good chance that only the first step of each transaction happens: the first transaction locks A, the second locks B, both try to take the second lock, fail, and the two transactions block each other forever.
The solutions proposed in Threading - Deadlock still work, but C++11 and C++17 simply have better ones.

2. C++11 and C++17 Solutions
The solution proposed in Threading - Deadlock is to lock the mutexes in a fixed order so that only one thread can acquire all the resources at once. The C++11 solution is std::lock, which is able to lock multiple lockable objects at once in an all-or-nothing manner:
std::lock either locks all of the lockable objects or locks none of them - if any lock cannot be acquired (for instance an exception is thrown while locking), the ones already acquired are released.

C++11 solutions
std::lock can be used with either std::lock_guard or std::unique_lock. The two have similar functionality: std::lock_guard can be regarded as a plain RAII wrapper holding a reference to a lockable object, whereas std::unique_lock can be regarded as an RAII wrapper holding a pointer to the lockable object together with an ownership flag.

*******************************************************************************
class BankAccount{
public:
    void Pay();
    void Receive();
    void Transfer(BankAccount& rhs, double amount);
private:
    std::mutex m_Lock;
    long long sort_code;
    long long account_no;
    std::string name;
    std::string postcode;
};

void BankAccount::Transfer(BankAccount& rhs, double amount)
{
    if (this == &rhs) {
        return;
    }
  
    // std::lock to lock both mutex
    // std::adopt_lock is to tell std::guard to take ownership only
    // rather than lock the object in the ctor
    std::lock(m_Lock, rhs.m_Lock);
    std::lock_guard<std::mutex> lock1(m_Lock, std::adopt_lock);
    std::lock_guard<std::mutex> lock2(rhs.m_Lock, std::adopt_lock);

    // do other stuff
}

// An alternative implementation of the same function, using std::unique_lock:
void BankAccount::Transfer(BankAccount& rhs, double amount)
{
    if (this == &rhs) {
        return;
    }

    // std::defer_lock to tell std::unique_lock not to lock the object
    // just take the ownership
    // std::lock is to both mutex
    std::unique_lock<std::mutex> lock1(m_Lock, std::defer_lock);
    std::unique_lock<std::mutex> lock2(rhs.m_Lock, std::defer_lock);
    std::lock(m_Lock, rhs.m_Lock);

    // do other stuff
}
*******************************************************************************

Besides, std::unique_lock is not copyable but is movable, whereas std::lock_guard is neither copyable nor movable. Therefore std::unique_lock takes extra memory and runs slightly slower, but is more flexible than std::lock_guard. A typical use of std::lock_guard is as an automatic variable that locks/unlocks a lockable object, while std::unique_lock can take ownership of a lockable object and transfer that ownership between functions (for instance as the return value of a function).
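For instance, a function can acquire a lock and hand it back to its caller - something std::lock_guard cannot do. A minimal sketch (g_Lock and the function names are illustrative):

*******************************************************************************
#include <mutex>

std::mutex g_Lock;

std::unique_lock<std::mutex> AcquireLock()
{
    std::unique_lock<std::mutex> lock(g_Lock);
    // ... prepare some shared state while holding the lock ...
    return lock; // ownership of the locked mutex moves out to the caller
}

void Caller()
{
    std::unique_lock<std::mutex> lock = AcquireLock();
    // still holding g_Lock here; it is released when 'lock' goes out of scope
}
*******************************************************************************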

C++17 solution
C++17 provides std::scoped_lock to replace the combination of std::lock and std::lock_guard used in C++11. It works as an RAII wrapper over that std::lock + std::lock_guard pattern. Very clean and beautiful.

*******************************************************************************
void BankAccount::Transfer(BankAccount& rhs, double amount)
{
    if (this == &rhs) {
        return;
    }
    std::scoped_lock sl(m_Lock, rhs.m_Lock);
    

    // do other stuff
}
*******************************************************************************




Wednesday 22 February 2017

C++11 Features - std::thread

1. Creation
To create a thread in C++11, create a std::thread object by passing a function and its argument(s). The function can be a standalone function, a functor, a member function or a lambda, and its arguments follow the function as the extra arguments passed into std::thread.

Create std::thread object:
* Standalone Function
A standalone function is the first argument of std::thread and its arguments follow on as the 2nd, 3rd ... arguments of std::thread.

/*****************************************/
    void Foo(int x, double y) {}
    int x = 1;
    double y = 2.0;
    std::thread t1(Foo, x, y);
    std::thread t2(Foo, 1, 2.0);
/*****************************************/

* Functor
Programmers who work with STL algorithms and iterators should be familiar with functors. A functor is a callable object - literally a class with an overloaded operator().

/*****************************************/
    struct Bar {
        void operator() (int x, double y)
        {}
    };

    Bar b;
    int x = 1;
    double y = 2.0;

    std::thread t3(b, x, y);
    std::thread t4(Bar(), 1, 2.0);
/*****************************************/

Most Vexing Parse with a Functor
A declaration like "std::thread t1(Bar());" is interpreted as a function declaration: a function named t1 taking a pointer to a function (which takes no arguments and returns a Bar) as its parameter and returning a std::thread object.

The first fix is to add an extra pair of parentheses to force the compiler to evaluate "Bar()" first, producing a Bar object that is then passed as an argument to std::thread.
The second fix is to use brace (list) initialization of std::thread to tell the compiler explicitly to create/initialize the std::thread object.
/*****************************************/
    struct Bar {
        void operator() ()
        {}
    };

    std::thread t1(Bar());   // most vexing parse: declares a function, not a thread
    std::thread t2((Bar())); // fix: extra parentheses
    std::thread t3{Bar()};   // fix: brace initialization
/*****************************************/

* Member Function
The member function of a class type can be passed as the first argument of std::thread. And the second argument of std::thread has to be one of the following,
    - an instance/object of this class type
    - a reference to an instance/object of this class type
    - a pointer to an instance/object of this class type

/*****************************************/
    struct Bar {
        void DoWork(int x, double y)
        {}
    };

    Bar b;
    int x = 1;
    double y = 2.0;

    std::thread t1(&Bar::DoWork, b, x, y);            // copy of b
    std::thread t2(&Bar::DoWork, std::ref(b), x, y);  // reference to b
    std::thread t3(&Bar::DoWork, &b, x, y);           // pointer to b
/*****************************************/

* Lambda Function
std::thread can take a lambda as its first argument, and the lambda's capture list defines how the data is passed - by copy or by reference.

/*****************************************/
    struct Bar {
        void DoWork(int x, double y)
        {}
    };

    Bar b;
    int x = 1;
    double y = 2.0;

    std::thread t1([=]() mutable { b.DoWork(x, y); }); // works on copies of b, x and y
    std::thread t2([&]() { b.DoWork(x, y); });         // works on the originals by reference
/*****************************************/

Passing Arguments:
By default std::thread makes a copy of each passed argument in the parent thread and then moves the copy into the child thread. Be particularly careful about passing by reference: to make sure a reference is passed to the function associated with the child thread, std::ref has to be used.

/*****************************************/
    void Foo(int x, double& y)
    {
        y += x;
    }
    int x = 1;
    double y = 2.0;
    // std::thread t1(Foo, x, y);        // error: the copied argument cannot bind to double&
    std::thread t2(Foo, x, std::ref(y)); // y is updated as expected
/*****************************************/

For something like a functor that has state, std::ref() or a pointer to the functor must be passed into std::thread(). Otherwise the object is copied and the state change is only reflected on the newly-copied object made in std::thread's constructor.
/*****************************************/
    struct Bar {
        int x;
        Bar() : x(0)
        {}

        void Increment() {
            ++x;
        }
    };

    Bar b;
   
    std::thread t1(&Bar::Increment, b);           // a copy of b is incremented; b.x stays 0
    std::thread t2(&Bar::Increment, std::ref(b)); // b.x incremented by 1
    std::thread t3(&Bar::Increment, &b);          // b.x incremented by 1
/*****************************************/

As mentioned above, a reference (via std::ref) or a pointer can be passed into the function associated with the child thread; it is then the programmer's responsibility to make sure the passed references or pointers do not refer to objects whose lifetime has ended. Otherwise they dangle and cause crashes.

2. Destruction
The C++11 standard says that std::terminate() is called - crashing the program - if a std::thread object is destroyed while it is still joinable (i.e. it is neither empty nor has been made un-joinable). A std::thread object has to have std::thread::join() or std::thread::detach() called on it in order to become un-joinable; call std::thread::joinable() to query its status.

Join
std::thread::join() is a very strong statement: the caller thread blocks itself until the callee thread terminates and becomes un-joinable.
std::thread::join() is the usual way to make sure a multi-threaded program terminates gracefully. One common practice is to join all child threads before returning from the main() function, as sketched below.
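A minimal sketch of that practice (Worker and the thread count are illustrative):

/*****************************************/
    #include <thread>
    #include <vector>

    void Worker(int id) { /* do some work */ }

    int main()
    {
        std::vector<std::thread> workers;
        for (int i = 0; i < 4; ++i) {
            workers.emplace_back(Worker, i);
        }
        // join every child thread before main() returns so that each
        // std::thread object is un-joinable when it is destroyed
        for (auto &t : workers) {
            t.join();
        }
        return 0;
    }
/*****************************************/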

Detach
std::thread::detach() tells the thread to run in the background as a daemon. After calling it the std::thread object becomes un-joinable and the parent thread loses control of the detached child thread, because it no longer holds a valid handle (the std::thread object) to the child.

One solution to tackle this issue is the RAII mechanism: implement a class, called something like ScopedThread, that calls std::thread::join() or std::thread::detach() in its destructor if the thread is still joinable, as sketched below.
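A minimal sketch of such a ScopedThread, assuming a join-in-destructor policy (detach would work similarly):

/*****************************************/
    #include <thread>
    #include <utility>

    class ScopedThread
    {
    public:
        explicit ScopedThread(std::thread t) : m_Thread(std::move(t)) {}
        ~ScopedThread()
        {
            if (m_Thread.joinable()) {
                m_Thread.join(); // or detach(), depending on the chosen policy
            }
        }
        ScopedThread(const ScopedThread&) = delete;
        ScopedThread& operator=(const ScopedThread&) = delete;
    private:
        std::thread m_Thread;
    };

    // usage: ScopedThread st(std::thread(Foo)); // joined automatically at scope exit
/*****************************************/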

3. Crashes
Thread Must Be Un-joinable Before Going Out of Scope
The C++11 standard says that std::terminate() is called if a thread object is still joinable when its lifetime ends. This means a std::thread object must have join() or detach() called on it before it goes out of scope, otherwise the program crashes.

/*****************************************/
    struct Bar {
        void DoWork(int x, double y)
        {}
    };

    {
        Bar b;
        int x = 1;
        double y = 2.0;

        std::thread t1(&Bar::DoWork, b, x, y);
        // t1.join();   // block here until t1 returns
        // t1.detach(); // t1 runs as a daemon
    } // must call join() or detach() on t1 before hitting here.
  /*****************************************/

Exception Thrown When Copying Arguments
std::thread makes a copy of every argument passed to the function associated with it. These copies are made in the constructing (parent) thread, so if a copy constructor throws, the exception propagates out of std::thread's constructor; if nothing catches it, std::terminate() is called.

/*****************************************/
    struct Bar {
        Bar(const Bar&) {
            throw std::logic_error("Copy inhibited");
        }
        void DoWork(int x, double y)
        {}
    };

    Bar b;
    int x = 1;
    double y = 2.0;

    // throws here: b is passed by value, so a copy is made when creating the
    // std::thread object and Bar's copy constructor throws; if nothing catches
    // the exception the program terminates
    std::thread t1(&Bar::DoWork, b, x, y);
    // fine: a reference/pointer is passed, so Bar's copy constructor is not called
    std::thread t2(&Bar::DoWork, std::ref(b), x, y);
    std::thread t3(&Bar::DoWork, &b, x, y);
/*****************************************/

Dangling Reference or Pointer
A dangling reference/pointer causing a crash is nothing new, but with std::thread there is one situation that needs extra care - std::thread::detach(). A detached thread runs as a daemon: it can still run in the background after all other threads have gone out of scope. Make sure the daemon thread is not referring to any automatic variables from any other threads.

/*****************************************/
    struct Bar {
        void DoWork(int x, double& y)
        {
             // do some work and reference to y
        }
    };

    {
        Bar b;
        int x = 1;
        double y = 2.0;

        // object b is fine: it is passed by value, so a copy of b is made.
        // but y is not ok: its reference is passed and it dangles if DoWork()
        // has not returned when t1.detach() is called and execution runs
        // out of this block.
        std::thread t1(&Bar::DoWork, b, x, std::ref(y));
        t1.detach(); // t1 runs as a daemon
    }
  /*****************************************/

The Function Associated with the std::thread Object Throws an Uncaught Exception
The C++11 standard says that an exception cannot propagate outside of the thread function: exceptions have to be caught inside the thread, otherwise std::terminate() is called. (C++11 provides other facilities to deliver an exception thrown on one thread back to another, such as std::future, std::async, std::packaged_task, etc. - see the sketch after the snippet below.)

/*****************************************/
    struct Bar {
        void DoWork(int x, double y)
        {
             std::throw logic_error("Causing crash");
        }
    };

    {
        Bar b;
        int x = 1;
        double y = 2.0;

        // std::terminate() is called when DoWork() throws on the child thread
        std::thread t1(&Bar::DoWork, b, x, y);
        t1.join(); // block here until t1 returns
    }
  /*****************************************/
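As mentioned above, facilities such as std::async/std::future can carry the exception back to the calling thread instead of terminating the program; a minimal sketch (the function names are illustrative):

/*****************************************/
    #include <future>
    #include <iostream>
    #include <stdexcept>

    double Compute()
    {
        throw std::logic_error("Computation failed");
    }

    void Caller()
    {
        // the exception is stored in the shared state and re-thrown by get()
        std::future<double> f = std::async(std::launch::async, Compute);
        try {
            f.get();
        }
        catch (const std::logic_error &e) {
            std::cout << e.what() << std::endl;
        }
    }
/*****************************************/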

Ownership
std::thread is not copyable but is movable. The target of a move assignment has to be an empty or otherwise un-joinable std::thread object; otherwise the target std::thread object would be replaced while it is still joinable, and the C++11 standard says this causes std::terminate() to be called.

/*****************************************/
    void Foo()
    {
        // do work
    }
    {
        std::thread t1(Foo);
        std::thread t2 = std::move(t1); // t1 now owns nothing and t2 runs Foo
        t1 = std::thread(Foo);          // t1 runs Foo too
        t2.detach();                    // t2 becomes un-joinable now
        t2 = std::thread(Foo);          // fine: t2 was un-joinable
        t2 = std::move(t1);             // crash - t2 is still joinable
    }
  /*****************************************/

4. Deadlock
As I mentioned in Section 2 about std::thread::join(), it is a blocking call. A deadlock happens if two threads each wait for the other to terminate before they can carry on.

/*****************************************/
    std::thread t1;
    std::thread t2;
    void Bar()
    {
        // do some work
        t2.join(); // blocking until t2 to finish
        // continue work
    };

    void Foo()
    {
        // do some work
        t1.join(); // blocking until t1 to finish
        // continue work
    }

     // in parent thread
    {
        t1 = std::thread(Bar);
        t2 = std::thread(Foo);
    }
  /*****************************************/

This is quite similar to the deadlock caused by cyclic locking between multiple std::mutex objects; in this case there is a cyclic joining between threads. The solution is also similar - avoid cyclic joining and always join in a fixed order. In practice, join all threads in the main thread.

Friday 13 January 2017

BTS - Find the Longest Consecutive Path

1. Problem Description
This is a Google interview question for software engineers from careercup. Here is the original thread description,
"
Find longest consecutive path in a binary tree.
1. the path can be decreasing or increasing, i.e [1,2,3,4] and [4,3,2,1] are both valid
2. the path can be child-parent-child, not necessarily a parent-to-child path

similar to this question: http://www.geeksforgeeks.org/longest-consecutive-sequence-binary-tree/
- wtcupup2017 December 10, 2016 in United States | Report Duplicate | Flag 
".

2. Data Structure and Algorithm
Pseudo-code
- Generate a child-to-parent consecutive-depth map for each node
    * Find the longest ascending sequence if its left child < itself
    * Find the longest descending sequence if its left child > itself
    * Find the longest ascending sequence if its right child < itself
    * Find the longest descending sequence if its right child > itself
    * 0 sequence if the child is null
    * 0 sequence if the child is equal to itself
- Find the node with the longest consecutive sequence in the above map
    * Take the longer sequence if both children carry sequences in the same order
    * Sum the two sequences if the children carry sequences in different orders

3. C++ Implementation
Some of the utility functions and classes can be found in my two previous posts (which tackle two similar problems) - BTS - Find the Longest Sequence and BTS - Find the Deepest Path.
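TreeNode, TreeNodeDepth and DepthMap come from those posts; the shapes below are only an assumed reconstruction of what the code relies on (the real definitions there may differ):

// ********************************************************************************
// Assumed shapes of the utilities from the previous posts
// ********************************************************************************
#include <cstddef>
#include <map>
#include <memory>

template <typename T>
struct TreeNode
{
    explicit TreeNode(const T &v) : data(std::make_shared<T>(v)), left(NULL), right(NULL) {}
    std::shared_ptr<T> data;   // dereferenced as *node->data in the code below
    TreeNode *left;
    TreeNode *right;
};

struct TreeNodeDepth
{
    TreeNodeDepth(size_t l, size_t r) : depthOfLeft(l), depthOfRight(r) {}
    size_t depthOfLeft;   // length of the consecutive run through the left child
    size_t depthOfRight;  // length of the consecutive run through the right child
};

template <typename T>
using DepthMap = std::map<TreeNode<T>*, TreeNodeDepth>;
// ********************************************************************************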

// ********************************************************************************
// Implementation
// ********************************************************************************
template <typename T>
size_t GenerateOrderedDepthMap(TreeNode<T> *root, DepthMap<T> &depth)
{
    if (!root) {
        // special case - NULL pointer
        return 0;
    }
    // depth of this node and save into the map
    size_t leftDepth = GenerateOrderedDepthMap(root->left, depth);
    size_t rightDepth = GenerateOrderedDepthMap(root->right, depth);
    if (leftDepth != 0) {
        // ascending order
        if (*root->data > *root->left->data) {
            auto itFound = depth.find(root->left);
            assert(itFound != depth.end());
            if (itFound->second.depthOfLeft) {
                // ascending order
                if (*root->left->data > *root->left->left->data) {
                    leftDepth = itFound->second.depthOfLeft + 1;
                }
            }
            if (itFound->second.depthOfRight) {
                // ascending order
                if (*root->left->data > *root->left->right->data) {
                    if (leftDepth < itFound->second.depthOfRight + 1) {
                        leftDepth = itFound->second.depthOfRight + 1;
                    }
                }
            }
        }
        else if (*root->data < *root->left->data) {
            // descending order
            auto itFound = depth.find(root->left);
            assert(itFound != depth.end());
            if (itFound->second.depthOfLeft) {
                // descending order
                if (*root->left->data < *root->left->left->data) {
                    leftDepth = itFound->second.depthOfLeft + 1;
                }
            }
            if (itFound->second.depthOfRight) {
                // descending order
                if (*root->left->data < *root->left->right->data) {
                    if (leftDepth < itFound->second.depthOfRight + 1) {
                        leftDepth = itFound->second.depthOfRight + 1;
                    }
                }
            }
        }
        else {
            leftDepth = 0;
        }
    }
    if (rightDepth != 0) {
        // descending order
        if (*root->data > *root->right->data) {
            auto itFound = depth.find(root->right);
            assert(itFound != depth.end());
            if (itFound->second.depthOfLeft) {
                // descending order
                if (*root->right->data > *root->right->left->data) {
                    rightDepth = itFound->second.depthOfLeft + 1;
                }
            }
            if (itFound->second.depthOfRight) {
                // descending order
                if (*root->right->data > *root->right->right->data) {
                    if (rightDepth < itFound->second.depthOfRight + 1) {
                        rightDepth = itFound->second.depthOfRight + 1;
                    }
                }
            }
        }
        else if (*root->data < *root->right->data) {
            // ascending order
            auto itFound = depth.find(root->right);
            assert(itFound != depth.end());
            if (itFound->second.depthOfLeft) {
                // ascending order
                if (*root->right->data < *root->right->left->data) {
                    rightDepth = itFound->second.depthOfLeft + 1;
                }
            }
            if (itFound->second.depthOfRight) {
                // ascending order
                if (*root->right->data < *root->right->right->data) {
                    if (rightDepth < itFound->second.depthOfRight + 1) {
                        rightDepth = itFound->second.depthOfRight + 1;
                    }
                }
            }
        }
        else {
            rightDepth = 0;
        }
    }
    depth.insert(std::make_pair(root, TreeNodeDepth(leftDepth, rightDepth)));
    return 1;
}
enum WhichChildToSearch
{
    INVALID = 0,
    LEFT,
    RIGHT,
    BOTH
};
template <typename T>
std::list<TreeNode<T>*> FindTheOrderedPathFromRoot(TreeNode<T> *root,
                                                   const DepthMap<T> & dm,
                                                   const size_t depth)
{
    std::list<TreeNode<T>*> path;
    if (!root) {
        return path;
    }
    size_t tempDepth = depth;
    auto itFound = dm.find(root);
    while (itFound != dm.cend()) {
        path.push_back(itFound->first);
        if (itFound->second.depthOfLeft == tempDepth) {
            itFound = dm.find(itFound->first->left);
        }
        else if (itFound->second.depthOfRight == tempDepth) {
            itFound = dm.find(itFound->first->right);
        }
        if (tempDepth == 0) {
            break;
        }
        --tempDepth;
    }
    assert(tempDepth == 0);
    return path;
}
template <typename T>
std::list<TreeNode<T>*> FindTheLongestOrderedSeq(TreeNode<T> *root)
{
    DepthMap<T> dm;
    // generate depth map
    GenerateOrderedDepthMap(root, dm);
    auto itEnd = dm.cend();
    TreeNode<T>* nodeFound = NULL;
    size_t maxDepth = 0;
    WhichChildToSearch maxDepthSearch = INVALID;
    size_t tmp;
    WhichChildToSearch search = INVALID;
    // find the longest consecutive order
    for (auto it = dm.cbegin(); it != itEnd; ++it) {
        tmp = 0;
        if (!it->first->left) {
            tmp = it->second.depthOfRight;
            search = RIGHT;
        }
        else if (!it->first->right) {
            tmp = it->second.depthOfLeft;
            search = LEFT;
        }
        else {
            if ((*it->first->left->data < *it->first->data &&
                *it->first->data < *it->first->right->data) ||
                (*it->first->left->data > *it->first->data &&
                    *it->first->data > *it->first->right->data)) {
                tmp = it->second.depthOfLeft + it->second.depthOfRight;
                search = BOTH;
            }
            else {
                if (it->second.depthOfLeft > it->second.depthOfRight) {
                    tmp = it->second.depthOfLeft;
                    search = LEFT;
                }
                else {
                    tmp = it->second.depthOfRight;
                    search = RIGHT;
                }
            }
        }
        if (tmp > maxDepth) {
            maxDepth = tmp;
            maxDepthSearch = search;
            nodeFound = it->first;
        }
    }

    // populate the sequence
    std::list<TreeNode<T>*> path;
    if (nodeFound && maxDepthSearch) {       
        if (maxDepthSearch == LEFT || maxDepthSearch == RIGHT) {
            path = FindTheOrderedPathFromRoot(nodeFound,
                                              dm,
                                              maxDepth);
            if (maxDepthSearch == LEFT) {
                path.reverse();
            }
        }
        else {
            auto itFound = dm.find(nodeFound);
            assert(itFound != dm.cend());
            path = FindTheOrderedPathFromRoot(nodeFound->left,
                                              dm,
                                              itFound->second.depthOfLeft - 1);
            std::list<TreeNode<T>*> rightPath = FindTheOrderedPathFromRoot(
                                                    nodeFound,
                                                    dm,
                                                    itFound->second.depthOfRight);
            path.reverse();
            path.splice(path.end(), rightPath);
        }
        assert(path.size() == maxDepth + 1);
    }
    return path;
}

// ********************************************************************************
// Test
// ********************************************************************************
void TestFindTheLongestOrderedSeq()
{
    std::vector<double> input = { 1.0, 2.0, 3.0, 4.0, 5.0 };
    TreeNode<double> *root = NULL;
    //              3
    //      1               4
    //          2               5
    root = ConstructTreeOnSortedValueBFS(input);
    auto path = FindTheLongestOrderedSeq(root);
    TestResult<double>(path, { 1.0, 3.0, 4.0, 5.0 });
    input = { 6.0, 7.0, 8.0, 9.0 };
    TreeNode<double> *subTree1 = ConstructTreeOnSortedValueDFS(input);
    input = { 10.0, 11.0, 12.0, 13.0 , 14.0 };
    TreeNode<double> *subTree2 = ConstructTreeRecursive(input);
    input = { 15.0, 16.0, 17.0, 18.0 };
    TreeNode<double> *subTree3 = ConstructTreeOnSortedValueDFS(input);
    input = { 19.0, 20.0, 21.0, 22.0 , 23.0 };
    TreeNode<double> *subTree4 = ConstructTreeRecursive(input);
    TreeNode<double> *leftestNode = FindLeftestNode(root);
    assert(leftestNode != NULL);
    leftestNode->left = subTree1;
    //              3
    //         1        4
    //      8     2        5
    //    6   9
    //  7
    path = FindTheLongestOrderedSeq(root);
    TestResult<double>(path, { 1.0, 3.0, 4.0, 5.0 });
    TreeNode<double> *rightestNode = FindRightestNode(root);
    assert(rightestNode != NULL);
    rightestNode->left = subTree2;
    //              3
    //         1        4
    //      8     2         5
    //    6   9          12
    //  7            10      13
    //                  11      14
    path = FindTheLongestOrderedSeq(root);
    TestResult<double>(path, { 1.0, 3.0, 4.0, 5.0, 12.0, 13.0, 14.0 });
    TreeNode<double> *firstLeftRighestNode = FindRightestNode(root->left);
    assert(firstLeftRighestNode != NULL);
    firstLeftRighestNode->right = subTree3;
    //                    3
    //         1              4
    //      8     2                   5
    //    6   9     17          12
    //  7         15   18   10       13
    //               16            11      14
    path = FindTheLongestOrderedSeq(root);
    TestResult<double>(path, { 1.0, 3.0, 4.0, 5.0, 12.0, 13.0, 14.0 });
    TreeNode<double> *firstRightLeftestNodeOfS2 = FindRightestNode(subTree2->left);
    assert(firstRightLeftestNodeOfS2 != NULL);
    firstRightLeftestNodeOfS2->left = subTree4;
    //                    3
    //         1              4
    //      8     2                   5
    //    6   9     17          12
    //  7         15   18   10       13
    //               16           11      14
    //                         21
    //                    19     22
    //                       20       23
    path = FindTheLongestOrderedSeq(root);
    TestResult<double>(path, { 1.0, 3.0, 4.0, 5.0, 12.0, 13.0, 14.0 });
    TreeNode<double> *rightestNodeOfS2 = FindRightestNode(subTree2);
    assert(rightestNodeOfS2 != NULL);
    for (size_t i = 0; i < 8; ++i) {
        rightestNodeOfS2->right = new TreeNode<double>(100.0 + i);
        rightestNodeOfS2 = rightestNodeOfS2->right;
    }
    //                    3
    //         1              4
    //      8     2                   5
    //    6   9     17          12
    //  7         15   18   10     13
    //              16           11      14
    //                        21              100
    //                    19     22            101
    //                      20       23          102
    //                                                  103
    //                                                     104
    //                                                        105
    //                                                          106
    //                                                            107
    path = FindTheLongestOrderedSeq(root);
    TestResult<double>(path, { 1.0, 3.0, 4.0, 5.0, 12.0, 13.0, 14.0, 100.0,
        101.0, 102.0, 103.0, 104.0, 105.0, 106.0, 107.0});
    // clean up
    DeleteTree_TD(&root);
}