Tuesday 29 April 2014

Design pattern - Builder design pattern

1. Introduction
Builder design pattern is one of four creation design patterns. For more details about it please read the chapter in [1]. As shown in Figure 1 it resides between Product and Director. Director is to call buildPart() and then call getResult() to retrieve a reference/pointer to product.

                                                   Figure 1 [2]
             
Depend on how complex the application is and decide if both abstract and concrete builder layers are needed.
Builder design pattern is usually come to play when the product is quite complex (a lot of components to configure) and has a lot of options to set, which could lead to a lot of constructors that need to implement based on the possible combinations of options. (However factory method pattern is suitable for classes with under handful of constructors.)

Builder design pattern has been proven quite useful in scientific computing from my personal experience, where some solvers or systems/problems could have a list of parameters to configure. Builder design pattern can not only be used for object creation of such classes but also be very good fit for implementing unit test for such classes.

2. Used for object creation of final class (before C++11)
Before C++11 the constructor of a final class has to be declared as "private" in order to force this class not to be inheritable. Therefore some creation techniques need to be employed to create the instance of final class. Builder design pattern provides as one of solutions along with factory method pattern and friend standalone creation method. Simply declaring builder class as the friend of the final class will solve this problem. More details please see my other blog entry, http://cpluspluslearning-petert.blogspot.co.uk/2014/04/final-non-inheritable-class.html.

Note: C++11 introduced a class attribute "final" to solve this problem gracefully.

3. Replacement for telescoping constructor
This has been proved quite a case in scientific computing. Some solvers can have a list of parameters to configure. It is quite common to have more than a dozen of parameters for solvers. For instance, fmincon solver in Matlab has more than 30 solution parameters [4]. Therefore it is impossible to telescope all the combinations of constructors. Builder design pattern is a very good solution for this kind of problem.

Telescoping constructor: example solver with 12 options
//********************************************************************************
class Solver {
public:
    // telescoping constructor
    Solver(); // take default options
    Solver(double op1); // override option_1 and take the rest as default
    Solver(double op2); // override option_1 and option_2 and take the rest as default 
    // more combinations of arguments
    Solver(double op1;
           double op2;
           double op3;
           double op4;
           int op5;
           int op6;
           int op7;
           int op8;
           bool op9;
           bool op10;
           bool op11;
           bool op12); // override all options
 
private:
    void InitOptions() {
        // set the default options
    }
    // Configuration: solver's parameters/options
    double option_1;
    double option_2;
    double option_3;
    double option_4;
    int option_5;
    int option_6;
    int option_7;
    int option_8;
    bool option_9;
    bool option_10;
    bool option_11;
    bool option_12;
};
//********************************************************************************

The total number of combination [5]: C(12, 0) + C(12, 1) + C(12, 2) + ... + C(12, 12) = 2^12 = 4096. So given a list of n options, then the total number of combinations will be
   sigma(C(n, k)) , where k bounded to [0, n]
   which is eqaul to 2^n. 
Of course this is the worse case with the assumption that all the options are independent. Normally not this bad. Even the dependency of options is considered, the combination is still unbearable to enumerate all the constructors. Here builder design pattern is the right one for this issue.

Solution with builder design pattern
//********************************************************************************
class Solver {
    // sole constructor
    Solver(double op1;
           double op2;
           double op3;
           double op4;
           int op5;
           int op6;
           int op7;
           int op8;
           bool op9;
           bool op10;
           bool op11;
           bool op12); // override all options
 
private:
    // Configuration: solver's parameters/options
    double option_1;
    double option_2;
    double option_3;
    double option_4;
    int option_5;
    int option_6;
    int option_7;
    int option_8;
    bool option_9;
    bool option_10;
    bool option_11;
    bool option_12;
};

class SolverBuilder{
public:
    SolverBuilder() {
        InitOptions();
    }
    
    SolverBuilder& SetOption1(double op1) {
        op_1 = op1;
        return *this;
    }
    // ......
    void SetOption12(bool op12) {
        op_12 = op12;
        return *this;
    }

    Solver* CreateInstance() {
        return new Solver(op_1, op_2, op_3, op_4, op_5, op_6,
                          op_7, op_8, op_9, op_10, op_11, op_12);
    }
private:
    void InitOptions() {
        // set the default options
    }
    // Configuration: solver's parameters/options
    double op_1;
    double op_2;
    double op_3;
    double op_4;
    int op_5;
    int op_6;
    int op_7;
    int op_8;
    bool op_9;
    bool op_10;
    bool op_11;
    bool op_12;
};

void foo () {
    SolverBuilder sb;
    Solver* solverWithDefaultOps = sb.CreateInstance();
    Slover* solverOverrideOp1and12 = sb.SetOption1(1.0).SetOption12(true).CreateInstance();

    //......
}
//********************************************************************************

Here I have demonstrated the power of builder design pattern as an excellent replacement of telescoping constructors. Simply there is only one constructor needed and the default options are enforced in the builder.
In C++11 the solution is even more beautiful, as default value is directly supported. Read my other blog entry for more details, http://cpluspluslearning-petert.blogspot.co.uk/2014/03/c11-features-definitely-worth-learning_20.html.

C++11 solution: default value supported
//******************************************************************************
class SolverBuilder{
public:
    // use default constructor

    SolverBuilder& SetOption1(double op1) {
        op_1 = op1;
        return *this;
    }
    // ......
    SolverBuilder& SetOption12(bool op12) {
        op_12 = op12;
        return *this;
    }

    Solver* CreateInstance() {
        return new Solver(op_1, op_2, op_3, op_4, op_5, op_6,
                          op_7, op_8, op_9, op_10, op_11, op_12);
    }
private:
    // Configuration: solver's parameters/options
    double op_1 = 0.0;
    double op_2 = 0.0;
    double op_3 = 0.0;
    double op_4 = 0.0;
    int op_5 = 0;
    int op_6 = 0;
    int op_7 = 0;
    int op_8 = 0;
    bool op_9 = false;
    bool op_10 = false;
    bool op_11 = false;
    bool op_12 = false;
};
//******************************************************************************

4. Used for object creation of class together with UI
This is the same user case as last one (Section 3). Here I would like to emphasize this because this kind of applications are existing everywhere from daily life to daily work. For instance buying a flight ticket or creating C++ project in Microsoft Visual Studio. Life is getting so simple now by clicking the buttons and select (scroll down) the options.
Builder design pattern can be a very good solution for this kind of application. Each UI is used to present available choices of each options to the users. As long as selected, call function from builder to set the option. Then advance to the next option. When all options are set (with "Confrim" button in GUI or "Yes" in command line), then call function to create the instance from builder. As follows,

->UI for Option 1            ->UI for option 2       -> ... ->UI for all options to confirm
    ->Select choice                ->Select choice                    ->Yes
        ->Set option on builder   -> Set option on builder       ->Create instance from builder

In this application a builder class can always sits behind the UI and take the option from user's choice. When options are confirmed, then create the object and return. I am not a UI design expert but builder design pattern has been widely used by them after consulting with my colleagues in UI team.

5. Better unit tests implementation
I find builder design pattern is extremely useful to indicate what test case is carried out and how this test function body is to be implemented, especially fit/useful against some code that is not very tested. Here is one example that  utilize builder design pattern to build unit tests.

Builder design pattern: unit test
//********************************************************************************
class Solver {
    // sole constructor
    Solver(double op1;
           double op2;
           double op3;
           double op4;
           int op5;
           int op6;
           int op7;
           int op8;
           bool op9;
           bool op10;
           bool op11;
           bool op12); // override all options
    bool Solve(double, double); // implement the unit test
 
private:
    // Configuration: solver's parameters/options
    double option_1;
    double option_2;
    double option_3;
    double option_4;
    int option_5;
    int option_6;
    int option_7;
    int option_8;
    bool option_9;
    bool option_10;
    bool option_11;
    bool option_12;
};

class SolverUnitTestBuilder : public CppUnit::TestFixture
{
public:  
    SolverUnitTestBuilder& SetOption1(double op1) {
        op_1 = op1;
        return *this;
    }
    // ......
    SolverUnitTestBuilder& SetOption12(bool op12) {
        op_12 = op12;
        return *this;
    }
 
    std::unique_ptr<Solver*> CreateInstance() {
        return std::unique_ptr<slover*>(
        new Solver(op_1, op_2, op_3, op_4, op_5, op_6,
                            op_7, op_8, op_9, op_10, op_11, op_12));
    }

protected:
    // Op1 = 3.0
    // Op2 = 4.0
    // Op12 = 0 (false)
    // Test function Solve(0, 0)
    // Expected it return 1 (true)
    void TestSolveGivenOp1_3_Op2_4_Op12_0_Against_0_0_Expected_1() {
        CPPUNIT_ASSERT(
        (*this).SetOption1(3).SetOption2(4).SetOption12(false).CreateInstance()->Solve(0,0));
    }
private:
    CPPUNIT_TEST_SUITE(TestSolveGivenOp1_3_Op2_4_Op12_0_Against_0_0_Expected_1);
};
//********************************************************************************

Builder design pattern is used here to write self-documented code. The other programmers just need to read the function names in the header file and know what this test function is to test. Something is even better when coming to the implementation. Simply read the function name, set the correct option, create the instance, call the function and test against the expected value.
It removes the ambiguity or errors on the comments about what the code will test and how the test is setup. And the programmer does not need to switch between the header file and the function body to check what actually this test is to test against. This is especially useful when the code is under test and the team is to catch up with testing by adding a couple tests more each time when they are touching the code. This is a very good self-documented programming to tell other programmers what have and have not tested. This approach is very helpful for better communication, easy implementation and simply code reviewing.
    - Testing function naming: test function+test setting+test function argument+expected behavior
    - Implementation by using builder design pattern

Note: there is a defect of this method when coming to naming the test method. The value of option involves fractions, for instance op1 = 1.1, because C++'s function name can't have "." in naming. In my case I use "1_dot_1" for naming to represent "1.1". The naming convention is a big issue for every team. Rule is simple. Define rules for all. Or following others good examples, like Google C++ coding standard [4].

Bibliography:
[1] GOF, Erich Gamma, Richard Helm, Ralph Johnson, John Vlissides, "Design Patterns - Elements of Reusable Object-Oriented Software", 1994
[2] http://en.wikipedia.org/wiki/Builder_pattern
[3] Google C++ coding standard, http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml
[4] http://www.mathworks.co.uk/help/optim/ug/optimoptions.html
[5] Donald E. Knuth "The Art of Computer Programming", Volume 2, Fundamental Algorithm, Third Edition

Friday 25 April 2014

Threading - Deadlock

1. Problem
The classical description of deadlock is dining philosophers problem [1]. Briefly it says that 5 philosophers are sitting at a round table to have dinner. They can't eat until they hold both the left and the right forks. As long as holding two forks, they can eat for a while and put down the fork. And others can grab the fork and get a chance to eat. There are states where the 5 philosophers can get into and none of them can eat. For instance each of them holds the fork on their left hand side and waits for others to release the fork. Because they can't talk to each other and none of them will release the fork deliberately, and therefore none of them can eat of course. At this state no further progress can make, which is called deadlock.

From this abstract problem, we can come up a few concepts:
- Thread: each philosopher acts independently and has a task to accomplish
- Shared resource: the forks that the philosophers have to acquire before acting
- Mutual exclusion: as long as the fork is held by a philosopher, no other can hold it any more until it is released.
- Multiple sequential resource acquisition: each philosopher has to acquire two forks sequentially then he can carry on eating.
- Deadlock: a state that no progress can make

Besides this classical description of deadlock I would like share with you two examples that I met in my previous projects and we can conclude some common issues in they share where deadlock could happen.

Deadlock example 1:
//********************************************************************************
class BankAccount{
public:
    void Lock();
    void Unlock();
    void Pay();
    void Receive();
private:
    long long sort_code;
    long long account_no;
    std::string name;
    std::string postcode
};
void Transfer(BankAccount& src, BankAccount& des, double amount)
{
    // lock these two accounts
    src.Lock();
    des.Lock();
    // source bank account pay the money
    // destination bank account receive the money
    src.Pay(amount);
    des.Receive(amount);
    // unlock the two accounts
    des.Unlock();
    src.Unlock();
}

BankAccount alan_smith;
BankAccount bob_brown;

// thread 1
Transfer(alan_smith, bob_brown, 500);

// thread 2
Transfer(bob_brown, alan_smith, 100);
//********************************************************************************

In the Transfer() function two bank accounts have to be locked at the same time before doing the transaction. (Someone might think that it is not necessary to lock both accounts at the same time to do the transaction. But in the fact it has to, because it has to done in one transaction otherwise it will affect the balance of the account at a time. Especially at multiple threading application if transactions can be interleaved, then some invalid action could be invoked if balance is not latched every single time. For instance, the bank account could go negative and prevent the "pay" action.) This problem shared a common issue with the dinning philosophers problem, - both of them try to acquire multiple resources. Therefore the deadlock can happen. For instance, two threads try to do the transaction on the same account as shown in Example 1. The deadlock happens when both threads execute until both have acquired the source bank account and then tries to acquires the destination account. Here comes the problem that alan_smith is acquired by thread 1 but bob_brown is acquired by thread 2. Neither of them will be able to acquire to the other/destination bank account.

Another good example is loading DLLs in Windows, when you are developing your own dynamic linked library. The implementation of DllMain() [2] will be very critical.

                                                     Figure 1 [3]
Keep in mind that there is already a lock when coming into DllMain(). Trying to lock anther resource for any reason, for instance loading a COM object, creating another process, could cause mysterious deadlock, which has proven to me very difficult to spot and debug.

2. Common issues shared by the examples
From the above 3 examples deadlock can only happen when the thread get into a certain state. Here we can conclude a few things they share all.
- Multiple threads works independently
- Each thread must acquire multiple shared resource in order to act
- Mutual exclusion: as long as a resource is held, it can't be held by others until it is released by its current owner
- No preemption: no thread has the privilege to grab resource from others
- Sequential multiple resource acquisition: a thread have to grab all the resources it needs before it can act, otherwise have to wait
- No communication between them: they can't talk to each other and release their resource deliberately in order to unlock the deadlock situation.

3. Solutions
There are a few solutions presented in the wiki page [1]. Here I would like to discuss a couple solutions I used in my previous projects.

Acquire the multiple resource in the order
This solution is very close to the one from Dijkstra described on the wiki page [1]. What it is saying is that the shared resources are labeled with an ID. When each thread tries to acquire multiple resources, it has to acquire them in an order, which is decided by the ID labeling the shared resources. As Dijkstra suggested in his original solution to assigned a partial order to the forks (shared resources), then all the philosophers have to hold the fork in the same order (ascending or descending order).
In a programming application, some techniques can be used to assign the ID to the shared resources to ensure the partial order in the universal scope. It uses the hash function to take a few data from the shared resources to generate unique ID for each shared resource. Then threads acquire the shared resource in the ascending/descending order of ID.
In Example 1 use any combination of private data member of BankAccount as the hash key to generate the ID (hash value). And in function Transfer() lock the BankAccount in the ascending/descending order of the ID.

Solution - Deadlock example 1:
//********************************************************************************
class BankAccount{
public:
    // ......
    size_t GetID();
private:
    // ......
};
void Transfer(BankAccount& src, BankAccount& des, double amount)
{
    // lock these two accounts in ascending order
    if (src.GetID() < des.GetID()) {
        src.Lock();
        des.Lock();
    } else {
        des.Lock();
        src.Lock();
    }
    // source bank account pay the money
    // destination bank account receive the money
    src.Pay(amount);
    des.Receive(amount);
    // unlock the two accounts
    if (src.GetID() < des.GetID()) {
        des.Unlock();
        src.Unlock();
    } else {
        src.Unlock();
        des.Unlock();
    }
}
//********************************************************************************

With this solution as long as thread 1 acquires any resource, it will prevent thread 2 from acquiring any shared resource until it finishes the transaction. Then these two threads will never get into the position that each of them holds only half the resources they need and constantly wait for another to release the other half. So problem solved!!!

A master thread + work threads
This idea is similar to the "arbitrator solution" in [1] but not exactly the same. The difference is that in "arbitrator solution", the waiter works as the arbitrator but passively waits for the query from philosophers. But in this solution the master thread works as the arbitrator but it works actively. More precisely it works as a scheduler, oversees the shared resources and decides which threads to go next or together. It even does not need to spool all the threads in advance. It can create and destroy the working threads as needed, which will certainly reduce the situations of race conditions. The downside of this approach is the performance penalty caused by frequently creating/destroying threads (memory fragments and so on, read my other blog entries for details about how memory affects performance, "Run out of stack" and "Run out of heap"). But this is not a big issue. Techniques like pre-created threads pool will reduce the impact on the performance.
As for implementation, in POSIX [4] condition variable and wait()/notify() mechanism can be used for this solution.

4. Tips to avoid deadlock
When design a multiple threading application, deadlock is certainly what we should avoid. A few patterns in your design serve good warnings for potential deadlock.
- Sequential resource acquisition: if they can be acquired in any order, then it is a big warning sign. Use the first solution to fix it.
- The dependency between the threads and shared resources is in a closed loop. As the dining philosophers problem illustrated, the round table describe the relationship between threads and resources. They are in a loop. Be careful if this loop is detected.

More generally think about the design
- If single mutual exclusion works, never use multiple
- If multiple acquire/release (acquire-work-release, acquire-work-release, ...) fits your needs, never use sequential shared resource acquisition (acquire-acquire- ... - work - release -... - release)

Bibliography:
[1] http://en.wikipedia.org/wiki/Dining_philosophers_problem
[2] http://msdn.microsoft.com/en-us/library/ms682583.aspx
[3] http://msdn.microsoft.com/en-us/library/windows/desktop/dn633971(v=vs.85).aspx
[4] http://www.gnu.org/software/pth/

Thursday 24 April 2014

Data Structure and Algorithm - Sweep Line Algorithm

1. Problem
In the first-order differential algebraic equation system, the variables are often expressed in the function of time,
    X = f(t),
And most of time the function is not continuous but dis-continuous. An common occasions the function f(t) is piecewise-const, piecewise-linear or piecewise-quadratic. No matter what function f(t) is, as the simulation advances (the time t goes on), the variables have to be updated and therefore the problem (most likely the constraints) have to be updated as well. In this process, an algorithm should be implemented to find out what intervals the current active variables are in. A typical scenario is that half-million ~ millions variables with dozens of intervals for each. So the data level could be reaching tens of millions or even billions.

Here I would like to implement a small algorithm that is to find out how many active intervals are hit by the current value.

2. Sweep line algorithm:
Sweep line algorithms: http://en.wikipedia.org/wiki/Sweep_line_algorithm. Interval's lower bounds and upper bounds are sorted into two arrays. In this case we use qsort. Then find out how many lower bounds and upper bounds the number have been swept against a given value. The count  of this number crossing the intervals is equal to the difference of sweeping the lower bounds and upper bounds.

For say the intervals has 50 millions. 50 millons is > 2^25 and < 2^26. For a number it takes 26*2 = 52 times searching at most.

3. Implementation
Assumption:
- The maximally size of interval source file (extents.txt) is limited to 50 millions records, so these should be no problem to load it to memory. It takes roughly 0.088 seconds to load 100K lines of input file. (Read from the file, process the string and sort the lower and upper bounds)
- The number input file could be huge, so the multi-threads (producer-consumer mechanism) could boost the performance. As the size could be billions, so the reading from the file has to be chopped into multi-producer threads.
- The extends.txt and numbers.txt have to be end with an empty line to parse properly

System:
Platform: Windows 8, 64 bit version
Machine: i3-3327U @1.9G, 6G RAM
Compiler: Visual Studio 2010 Express (Win32 Console Application)

Delivery:
A single thread version
A multi thread version: 1 producer thread and 1 consumer thread

Performance: 
- Output directly to file
Single thread: 0.196 second to read numbers from 100K lines of input file and process it
Multi-thread: 0.170 second to read numbers from 100K lines of input file and process it
- Output to screen
Multi-thread has no advantage over single-thread as both takes around 31 seconds.
- Analysis: The most of time is consumed is to display to the screen.

//********************************************************************************
#include "stdafx.h"
#include <algorithm>
#include <cassert>
#include <ctime>
#include <iostream>
#include <iterator>
#include <fstream>
#include <queue>
#include <string>
#include <vector>

// system specific headers
#include <Windows.h>
#include <process.h>

using namespace std;

//********************************************************************************
namespace Utility {
class BigTextFile
{
protected:
BigTextFile(const string &fileName) : m_FileName(fileName)
    {}

virtual ~BigTextFile() {};

private:
string m_FileName;
};

class BigTextInputFile : BigTextFile
{
public:
BigTextInputFile(const string &fileName);
~BigTextInputFile();

// put the lines of data into text
// and return the acutal lines that read into text
size_t ReadLines(size_t lines, string &text);

// read the whole file into one string
void ReadAll(string &text);

private:
ifstream m_IFileStream;
};

class BigTextOutputFile : BigTextFile
{
public:
BigTextOutputFile(const string &fileName);
~BigTextOutputFile();

void WriteResults(vector<unsigned int> &results);

private:
ofstream m_OFileStream;
};

class DataParser{
public:
static void ParseIntervalStringToNumber(const string &text,
vector<unsigned int> &lowerBounds,
vector<unsigned int> &upperBounds);
static void ParseValueStringToNumber(const string &text,
vector<unsigned int> &vals);
static void ParseOneIntervalToBounds(const string &oneInterval,
unsigned int &lowerBound,
unsigned int &upperBound);
static void ParserOneLineValueStringToNumber(const string &oneLineValueString,
vector<unsigned int> &vals);
};

BigTextInputFile::BigTextInputFile(const string &fileName)
: BigTextFile(fileName),
  m_IFileStream(fileName)
{
}

BigTextInputFile::~BigTextInputFile()
{
if (m_IFileStream.is_open()) {
m_IFileStream.close();
}
}

size_t BigTextInputFile::ReadLines(size_t lines, string &text)
{
if (m_IFileStream.is_open()) {
size_t readLines= 0;
text.clear();
string oneLineString;
while (readLines < lines && m_IFileStream.good()) {
getline(m_IFileStream, oneLineString);
text += '\n';
text += oneLineString;
++readLines;
}
if (m_IFileStream.good()) {
text += '\n'; // append an return at the end if not EOF
}

return readLines;
}
else {
throw std::runtime_error("BigTextFile:: invalid file");
}
}

void BigTextInputFile::ReadAll(string &text)
{
if (m_IFileStream.is_open()){
text = string((istreambuf_iterator<char>(m_IFileStream)),
                            istreambuf_iterator<char>());
}
else {
throw std::runtime_error("BigTextFile:: invalid input file");
}
}

BigTextOutputFile::BigTextOutputFile(const string &fileName)
: BigTextFile(fileName), m_OFileStream(fileName)
{
}

BigTextOutputFile::~BigTextOutputFile()
{
if (m_OFileStream.is_open()) {
m_OFileStream.close();
}
}

void BigTextOutputFile::WriteResults(vector<unsigned int> &results)
{
if (m_OFileStream.good()) {
copy(results.begin(), results.end(),
                 ostream_iterator<unsigned int>(m_OFileStream,"\n"));
}
else{
throw std::runtime_error("BigTextFile:: invalid output file");
}
}

void DataParser::ParseIntervalStringToNumber(const string &text,
  vector<unsigned int> &lowerBounds,
  vector<unsigned int> &upperBounds)
{
size_t found, cur_pos = 0;
string oneIntervalString;
unsigned int lowerBound, upperBound;
while ((found = text.find('\n', cur_pos + 1)) != string::npos) {
oneIntervalString = text.substr(cur_pos, found - cur_pos);
ParseOneIntervalToBounds(oneIntervalString, lowerBound, upperBound);
lowerBounds.push_back(lowerBound);
upperBounds.push_back(upperBound);
cur_pos = found;
}
}

void DataParser::ParseValueStringToNumber(const string &text,
     vector<unsigned int> &values)
{
size_t found, cur_pos = 0;
string oneLineString;
while ((found = text.find('\n', cur_pos + 1)) != string::npos) {
oneLineString = text.substr(cur_pos, found - cur_pos);
ParserOneLineValueStringToNumber(oneLineString, values);
cur_pos = found;
}
}

void DataParser::ParserOneLineValueStringToNumber(const string &oneLineValueString,
 vector<unsigned int> &vals)
{
size_t found, cur_pos = 0;
string oneValueString;
while ((found = oneLineValueString.find(' ', cur_pos + 1)) != string::npos) {
oneValueString = oneLineValueString.substr(cur_pos, found - cur_pos);
vals.push_back(atoi(oneValueString.c_str()));
cur_pos = found;
}

// hit here if oneValueString has one one value left
if (!oneLineValueString.empty()) {
vals.push_back(atoi(oneLineValueString.c_str()));
}
}

void DataParser::ParseOneIntervalToBounds(const string &oneInterval,
 unsigned int &lowerBound,
 unsigned int &upperBound)
{
size_t found;
if ((found = oneInterval.find(' ')) != string::npos){
string str_LowerBound = oneInterval.substr(0, found);
        string str_UpperBound = oneInterval.substr(found + 1, oneInterval.size() - found -1);
lowerBound = atoi(str_LowerBound.c_str());
upperBound = atoi(str_UpperBound.c_str());
}
else{
throw std::runtime_error("Invalid Interval Input");
}
}
}; // Utility

//********************************************************************************
namespace Intervals{
enum IntervalCounterAlgType : unsigned int{
SWEEPLINE,
OTHER_ALG
};

// interface class
class IntervalCounter{
public:
virtual ~IntervalCounter() = 0  {};
// set the intervals based on the exisiting intervals
virtual void SetIntervals(const vector<unsigned int> &lowerBounds,
 const vector<unsigned int> &upperBounds) = 0;

// Read intervals from the files
virtual void SetIntervals(const string &fileName) = 0;

// Get the counter against a vlaue
virtual unsigned int GetCounter(unsigned int val) = 0;

// Get the counters against a list of value
virtual void GetCounters(const vector<unsigned int> &vals,
      vector<unsigned int> &counters) = 0;
};

// algorithms to detect the counters of intervals
// Use template method pattern
class IntervalCounterDetector{
public:
virtual ~IntervalCounterDetector() = 0 {};

// any preprocess work need to be done before calling detect
    // one of template methods
virtual void PreProcess(vector<unsigned int> &lowerBounds,
            vector<unsigned int> &upperBounds) = 0;
// detect the coutner
virtual unsigned int DetectCounter(unsigned int val,
      const vector<unsigned int> &lowerBounds,
    const vector<unsigned int> &upperBounds) = 0;
virtual void DetectCounters(const vector<unsigned int> &vals,
  vector<unsigned int> &results,
  const vector<unsigned int> &lowerBounds,
  const vector<unsigned int> &upperBounds) = 0;
};

// implementation of IntervalCounter
// use strategy design pattern to hold a (smart) pointer to
// the IntervalCounterDetector
// Strategy pattern for scientific computing
class IntervalCounterImp : public IntervalCounter {
public:
IntervalCounterImp(IntervalCounterAlgType);
virtual ~IntervalCounterImp();

public:
void SetIntervals(const vector<unsigned int> &lowerBounds,
          const vector<unsigned int> &upperBounds);
void SetIntervals(const string &intervalSrcFile);
unsigned int GetCounter(unsigned int val);
void GetCounters(const vector<unsigned int> &vals,
    vector<unsigned int> &coutners);
private:
void InitDetector(IntervalCounterAlgType);

private:
// list of start values of intervals
vector<unsigned int> m_LowerBounds;
// list of end values of intervals
vector<unsigned int> m_UpperBounds;

// algorithms
std::auto_ptr<IntervalCounterDetector> m_Detector;
};

// sweep line algorithm  to detect the counter
class SweepLineAlg : public IntervalCounterDetector{
public:
SweepLineAlg();
~SweepLineAlg();

// this function is to preprocess on the intervals
void PreProcess(vector<unsigned int> &lowerBounds,
vector<unsigned int> &upperBounds);

unsigned int DetectCounter(unsigned int val,
  const vector<unsigned int> &lowerBounds,
  const vector<unsigned int> &upperBounds);
void DetectCounters(const vector<unsigned int> &vals,
vector<unsigned int> &results,
const vector<unsigned int> &lowerBounds,
const vector<unsigned int> &upperBounds);
private:
static int CompareFunc(const void *a, const void *b);

struct SweepLineFunctor{
SweepLineFunctor(const vector<unsigned int> &lowerBounds,
                                const vector<unsigned int> &upperBounds);

unsigned int operator()(unsigned int val);

const vector<unsigned int> &m_SortedLowerBounds;
const vector<unsigned int> &m_SortedUpperBounds;
};

};

IntervalCounterImp::IntervalCounterImp(IntervalCounterAlgType algType)
{
InitDetector(algType);
}

IntervalCounterImp::~IntervalCounterImp()
{
}

void IntervalCounterImp::InitDetector(IntervalCounterAlgType algType)
{
switch(algType){
case SWEEPLINE:
m_Detector = auto_ptr<IntervalCounterDetector>(new SweepLineAlg());
break;
default:
assert(false);
break;
}
}

void IntervalCounterImp::SetIntervals(const vector<unsigned int> &lowerBounds,
 const vector<unsigned int> &upperBounds)
{
// shrink the size
vector<unsigned int>().swap(m_LowerBounds);
vector<unsigned int>().swap(m_UpperBounds);

// make the copy -- not good for big data
m_LowerBounds = lowerBounds;
m_UpperBounds = upperBounds;

// get ready for use
m_Detector->PreProcess(m_LowerBounds, m_UpperBounds);
}

void IntervalCounterImp::SetIntervals(const string &intervalSrcFileName)
{
// populate out the intervals to m_LowerBound and m_UpperBound
Utility::BigTextInputFile file(intervalSrcFileName);
string text;
file.ReadAll(text);

// parse the result
Utility::DataParser::ParseIntervalStringToNumber(text,
                                                                               m_LowerBounds,
                                                                               m_UpperBounds);

m_Detector->PreProcess(m_LowerBounds, m_UpperBounds);
}

unsigned int IntervalCounterImp::GetCounter(unsigned int val)
{
return m_Detector->DetectCounter(val, m_LowerBounds, m_UpperBounds);
}

void IntervalCounterImp::GetCounters(const vector<unsigned int> &vals,
vector<unsigned int> &counters)
{
m_Detector->DetectCounters(vals, counters, m_LowerBounds, m_UpperBounds);
}

SweepLineAlg::SweepLineAlg()
{
}
SweepLineAlg::~SweepLineAlg()
{
}
void SweepLineAlg::PreProcess(vector<unsigned int> &lowerBounds,
 vector<unsigned int> &upperBounds)
{
// sort the start/end arraries of intervals
qsort(&lowerBounds[0],
             lowerBounds.size(),
             sizeof(unsigned int),
             SweepLineAlg::CompareFunc);
qsort(&upperBounds[0],
             upperBounds.size(),
             sizeof(unsigned int),
             SweepLineAlg::CompareFunc);
}

int SweepLineAlg::CompareFunc(const void *a, const void *b)
{
return (*(const unsigned int*)a - *(const unsigned int*)b);
}

SweepLineAlg::SweepLineFunctor::SweepLineFunctor(const vector<unsigned int> &lowerBounds,
const vector<unsigned int> &upperBounds)
    : m_SortedLowerBounds(lowerBounds), m_SortedUpperBounds(upperBounds)
{
}

// This is actual core of sweep line algorithm
// the difference between the swept lowerBound and upperBound
unsigned int SweepLineAlg::SweepLineFunctor::operator()(unsigned int val)
{
// find out how many "lowerBounds" and "upperBounds" val has swept
vector<unsigned int>::const_iterator sweepCountInLowerBound = 
upper_bound(m_SortedLowerBounds.begin(), m_SortedLowerBounds.end(), val);
vector<unsigned int>::const_iterator sweepCountInUpperBound = 
lower_bound(m_SortedUpperBounds.begin(), m_SortedUpperBounds.end(), val);

return (distance(m_SortedLowerBounds.begin(), sweepCountInLowerBound) - 
distance(m_SortedUpperBounds.begin(), sweepCountInUpperBound));
}

unsigned int SweepLineAlg::DetectCounter(unsigned int val,
const vector<unsigned int> &lowerBounds,
const vector<unsigned int> &upperBounds)
{
return SweepLineFunctor(lowerBounds, upperBounds)(val);
}

void SweepLineAlg::DetectCounters(const vector<unsigned int> &vals,
 vector<unsigned int> &results,
 const vector<unsigned int> &lowerBounds,
 const vector<unsigned int> &upperBounds)
{
// shrink results to the needed
vector<unsigned int>().swap(results);
results.resize(vals.size());

// iterator each value and save the result
transform(vals.begin(),
                   vals.end(),
                   results.begin(),
                   SweepLineFunctor(lowerBounds, upperBounds));
}
} // Intervals

// Multiple-thread solution
//********************************************************************************
using namespace Intervals;
using namespace Utility;

struct IntervalLock{
IntervalCounter& ic;
queue<vector<unsigned int> > numbersFromFile;
string numbersFileName;
    bool eof;
    HANDLE mutex;
    HANDLE controlsemaphore;
size_t countOfConsumerThreads;

IntervalLock(IntervalCounter &intervalCounter,
    size_t consumerThreads, const string &numbersSrcFileName)
: ic(intervalCounter), eof(false),
        countOfConsumerThreads(consumerThreads),
        numbersFileName(numbersSrcFileName)
{
mutex = CreateMutex(NULL, false, NULL);
controlsemaphore = CreateSemaphore(NULL, 0, countOfConsumerThreads,NULL);
}

~IntervalLock()
{
CloseHandle(mutex);
CloseHandle(controlsemaphore);
}
};

DWORD WINAPI threadReadNumbersFromFile(void* lp){
    IntervalLock *il = (IntervalLock*)lp;

try{
BigTextInputFile numFile(il->numbersFileName);

#ifdef _DEBUG
BigTextOutputFile numResult("expected.txt");
#endif

const size_t toReadLines = 8096;
vector<unsigned int> values;
string text;
while (numFile.ReadLines(toReadLines, text) > 0){
DataParser::ParseValueStringToNumber(text, values);

WaitForSingleObject(il->mutex, INFINITE); //Mutex Lock
il->numbersFromFile.push(values);
ReleaseMutex(il->mutex); //Mutex Unlock

//ReleaseSemaphore(il->controlsemaphore,1, NULL);

// clear the data
values.clear();
}
}
catch (std::exception &e) {
il->eof=true;
// ReleaseSemaphore(il->controlsemaphore, il->countOfConsumerThreads, NULL);
        // Release all comsumer threads
cout << e.what() << endl;
}

il->eof=true;
// ReleaseSemaphore(il->controlsemaphore, il->countOfConsumerThreads, NULL);
    //Release all comsumer threads
    return 0;
}

DWORD WINAPI threadDetector(void* lp){
     IntervalLock *il = (IntervalLock*)lp;
#ifdef _DEBUG
    BigTextOutputFile numResult("expected.txt");
#endif
     while(!il->eof || il->numbersFromFile.size()>0){
         //WaitForSingleObject(il->controlsemaphore,INFINITE); //Wait for producer;

       vector<unsigned int> numbers;

       WaitForSingleObject(il->mutex, INFINITE); //Mutext Lock
       if (il->numbersFromFile.size() > 0) {
             // have a local copy
          numbers = vector<unsigned int>(il->numbersFromFile.front());
          il->numbersFromFile.pop();
       }
       ReleaseMutex(il->mutex); //Mutex Unlock

       if (!numbers.empty()) {
          vector<unsigned int> results;
          il->ic.GetCounters(numbers, results);
          copy(results.begin(),
         results.end(),
         ostream_iterator<unsigned int>(cout,"\n"));
#ifdef _DEBUG
numResult.WriteResults(results);
#endif
         }        
     }
     return 0;
}

#ifdef MULTIPLE_THREAD_SOLUTION
int _tmain(int argc, _TCHAR* argv[])
{
#ifdef PROFILING_STATS
clock_t pre_clock = clock();
clock_t cur_clock;
float timeToProcessInterval;
#endif
try {
// load the intervals
IntervalCounter &ic = IntervalCounterImp(SWEEPLINE);
ic.SetIntervals("extents.txt");

#ifdef PROFILING_STATS
cur_clock = clock();
timeToProcessInterval = (float) (cur_clock - pre_clock);
pre_clock = cur_clock;
#endif

IntervalLock il(ic, 1, "numbers.txt");

HANDLE handles[2];
handles[0] = CreateThread(0,0,&threadReadNumbersFromFile, (void*)&il , 0,0);
handles[1] = CreateThread(0,0,&threadDetector, (void*)&il, 0,0);

WaitForMultipleObjects(2, handles, true, INFINITE); //"Join" trreads
}
catch (std::exception &e){
cout << e.what();
}

#ifdef PROFILING_STATS
cur_clock = clock();
cout << "Time consumed to process intervals"
           << timeToProcessInterval/CLOCKS_PER_SEC
           << endl;
cout << "Mutlithread time consumed to process the numbers:"
           << ((float)(cur_clock -    pre_clock))/CLOCKS_PER_SEC
           << endl;
#endif

return 0;
}

#else
// Single thread solution
//********************************************************************************
int _tmain(int argc, _TCHAR* argv[])
{
#if PROFILING_STATS
clock_t pre_clock = clock();
clock_t cur_clock;
float timeToProcessInterval;
#endif

try {
IntervalCounter &ic = IntervalCounterImp(SWEEPLINE);
ic.SetIntervals("extents.txt");

#ifdef PROFILING_STATS
cur_clock = clock();
timeToProcessInterval = (float) (cur_clock - pre_clock);
pre_clock = cur_clock;
#endif

BigTextInputFile numFile("numbers.txt");

#ifdef _DEBUG
BigTextOutputFile numResult("expected.txt");
#endif

const size_t toReadLines = 4096;
vector<unsigned int> values, results;
string text;
while (numFile.ReadLines(toReadLines, text) > 0) {
DataParser::ParseValueStringToNumber(text, values);
ic.GetCounters(values, results);
copy(results.begin(),
        results.end(),
        ostream_iterator<unsigned int>(cout,"\n"));
#ifdef _DEBUG
numResult.WriteResults(results);
#endif
assert(values.size() == results.size());
values.clear();
results.clear();
}
}
catch (std::exception &e) {
cout << e.what() << endl;
}

#ifdef PROFILING_STATS
cur_clock = clock();
cout << "Time consumed to process intervals"
           << timeToProcessInterval/CLOCKS_PER_SEC
           << endl;
cout << "Single thread time consumed to process the numbers:"
           << ((float)(cur_clock - pre_clock))/CLOCKS_PER_SEC
           << endl;
#endif

return 0;
}
#endif

Tuesday 22 April 2014

Scientific Programming - Accuracy, speed and extensibility

In scientific computing, we are often working with problems/systems and solvers/algorithms. Keep 3 things in mind, accuracy, speed, and extensibility before turning them into code.

1. Accuracy
Accuracy is the most basic requirement. If a simple arithmetic operation give a wrong answer, it will certainly damage the perception of the accountability of this software. Of course accuracy more likely defined as the nature of the algorithm. The accuracy of the algorithm is normally predictable by the notation of O(deltaX^n). For instance using Talyor Series to discretize a continuous function. Based on the polynomial to simulate it (a*x^n + ...), the error will be ~O(delatX^(N+1)).
Another common methods to evaluate the accuracy is the peer comparison. Compare it with competing algorithms or well-known problems/benchmark. This often can be done when mathematicians are developing the algorithm. Those peer comparison  examples can be found on the academic publication, conference meeting or industrial known challenges.
Integration test is the another way to evaluate the accuracy of the newly developed algorithms by simple compare the result with the other peer in-house algorithms with the assumption that the software team have already accumulated considerable integration tests.

The accuracy often comes as the nature of the algorithm. But it can also be affected by the programming/computers, as the number that the computer can represent is not continuous and the computer has its resolution as well. Please read more about numerical errors in scientific computing in my previous blog entry, http://cpluspluslearning-petert.blogspot.co.uk/2014/03/scientific-computing-numerical-errors.html.

2. Speed
There are two aspects for this item. One is from the algorithm. In this territory it has a different name - convergence. This is normally defined when the algorithm is developed and accompanied together with peer comparison. For instance Linear search take O(n) and binary search takes O(log2n). This is the nature of the algorithms or solvers. If looking to improve the speed of software, this should be the first thing you should look at.

The second aspect is coming from programming. It includes the language you are using (Fortran, C. C++, Matlab) and the techniques using in the program (static polymorphism vs. dynamic polymorphism in C++)
The choice of language
This has been C/C++ playground for a long time in commercial business due to the performance and flexibility of this language. In academic world Fortran has been there for quite a long time. Of course I can't miss Matlab for quick idea modelling and experimenting. Surely for some simple application Microsoft Excel has been used for algorithm modelling as well. So far I have encounter a few combinations that are quick popular.
- Fortran and C/C++
- Matlab and C/C++
Most of time Fortran code still exists as the legacy code. And C/C++ code/module is implemented or wrapped on Fortran code to provide commercial deliverable package. Matlab is mostly existing for quick modeling only. Of course some C/C++ packages are developed as plug-ins for Matlab as well. I have seen these 3 languages working together very nicely. Comparing with the speed, Fotran code is at the top. In quite a few occasions I compared with the performance of C++ with Fortran. Fortran is ~25% faster than C++. (My guess: one of reasons causing the performance penalty in C++ is coming from the implementation of STL - memory reallocation) And some of my colleagues had experience on C, which has under but very close performance with Fortran.
Nowadays functional languages have been becoming more and more popular, like Microsoft F# and Scala. The key for their popularity is to bridge the workflow between the mathematician and developers. Often using the combinations of languages as shown above needs close work relationship between mathematicians and developers. Sometimes this could be very expensive and time-consuming, because the misunderstanding could happen between their communication and bugs could buried in the implementation as C/C++ is regarded as one of most difficult programming languages. These functional languages are developed purely for the purpose of work productivity and software quality. It simply makes programming easier for mathematicians to simplify the workflow and at the same time use cheap but more powerful hardware to compensate the speed.

Techniques used in C++
- Static polymorphism vs. dynamic polymorphism. See my other blog entry, http://cpluspluslearning-petert.blogspot.co.uk/2014/03/design-pattern-curiously-recurring.html.
- Inline functions
- General practice of more efficient C++. Except use better programming techniques, such as post-increment vs. pre-increment, how to use memory is also very key for performance of C/C++ programming. See my previous blog entries, http://cpluspluslearning-petert.blogspot.co.uk/2014/03/performance-running-out-of-memory-heap.html and http://cpluspluslearning-petert.blogspot.co.uk/2014/03/performance-running-out-of-stack.html.

3. Extensibility
It is often referring that the design has to be flexible enough to take extra algorithm to solve the existing problems/systems and new problems/systems can be constructed to be solved by existing algorithm. From this prospect we are talking about this issues from the programming point view. The software design/architecture has to be flexible to be expanded. In other words the architecture has to decouple between the system construction and algorithm implementation.

Fortunately there have been quite a few design patterns that are designed for this purpose. For instance strategy pattern and template method pattern. See my blog entry about strategy pattern, http://cpluspluslearning-petert.blogspot.co.uk/2014/04/design-patterns-for-scientific.html.

4. Implementation impact on accuracy and speed
Often when we are working on the commercial environment, the academic/programming techniques are developed/adopted to pursue higher accuracy and faster speed in order to gain competing advantages. Very often some techniques or even algorithms are supposed to achieve higher accuracy or/and faster speed in theory but in fact they fail to do so. I call it the impact of implementation.
The very first impact is the overhead of the techniques or algorithms. For example a common technique is to reduce the size of system or decompose a large system to a few smaller systems to achieve faster speed (at least for speed, could achieve higher accuracy or numerical robustness as well). Most often these techniques are not free to run at all. For some particular system the overhead could be very large. In my previous project I have seen some techniques can work really badly, a huge overhead, on some edging cases on optimization problem.
The second impact is coming from the problem/system itself. Even thought in some cases the overhead is neglect-able, the actual solving process does not speed up or even become worse than the original problem. In my previous projects I have seen much worse performance on problems with large discontinuity and sensitive cases. For instance the same problem but with varying constraints could result in tow polar performance on accuracy or/and speed.

The best practice to collect a reasonable large collection of integration tests for verifying accuracy and speed. Re-run these tests before releasing (after the feature is completed by reasonably ahead before releasing). Make sure each feature will delivery what it promises and not degrade the performance of the software. But be clear there is no techniques/algorithms have positive impact on all cases. Be realistic and practical about your target when deciding if a feature has positive impact on the performance. Have a representative collections of integration tests from your business unit and be clear of the main characteristic of this technique/algorithm, robustness, speed, accuracy and so on. If possible, work out the heuristic clause to decide if the feature is to run or not. However this has been proven very difficult in my previous projects.

5. Summary
Excellent scientific programming will needs,
- Competent mathematicians and developers
- Seamless communication between them
- Suitable choices of programming languages
- A representative collection of integration tests for accuracy and speed.

Monday 21 April 2014

Design pattern - Factory method pattern

1. Introduction
Factory method pattern is a reasonably simple pattern. Basically a base class is to provide public static function to return object pointer/smart pointer of its derived/concrete class or itself.

Example 1:
*********************************************************************************
class Bird{
public:
    static Bird* CreateBird(/*arugments*/);
    /*
     * public interface
     */
    virtual void Fly();
    virtual void Eat();
};

class Swift : public Bird{
};

class Eagle : public Bird{
};
*********************************************************************************

More details please read the chapter about factory method pattern in [1]. (Explanation of this code snippet) Here I would like to discuss two scenarios in my previous projects this design pattern applies.

2. Use Factory method pattern in interface class for better code de-coupling
I have discussed what interface class is in my previous blog entry, http://cpluspluslearning-petert.blogspot.co.uk/2014/04/pure-virtual-functions.html. It simply consists of pure virtual functions only and can't not be instantiated directly, and only its concrete class can be created. For better code de-coupling, simply it means that the code can be better categorized into modules and the code dependency can be reduced/minimized. In this case a factory method can be provided to instantiate the concrete objects. which will be up-casted into the interface class pointer. And the only behavior available to them is the public function declared in the interface and their detailed behavior however is defined by the overridden virtual functions in the derived classes.

Example 2:
*********************************************************************************
class Bird{
public:
    static Bird* CreateBird(/*arugments*/);
    /*
     * public interface
     */
    virtual void Fly() = 0;
    virtual void Eat() = 0;
};

class Swift : public Bird{
};

class Eagle : public Bird{
};
*********************************************************************************

Example 2 illustrates the implementation. A public static creation method is added into the interface class. "arguments" are passed into CreateBird() to create the concrete birds, like Swift and Eagle. And their individual behavior has to be re-implemented by overriding the pure virtual function.

Example 3:
*********************************************************************************
class Bird{
public:
    /*
     * public interface
     */
    virtual void Fly() = 0;
    virtual void Eat() = 0;
};

class Swift : public Bird{
};

class Eagle : public Bird{
};

Bird* CreateBird(/*arugments*/);
*********************************************************************************

Another implementation (Example 3) is to declare CreateBird() as a standalone function, which is to return a pointer or a smart pointer for Bird.

This design will provide excellent code-decoupling (simply Bird and its derived classes can be arranged into a standalone module). All the clients (who use Bird) do not need to access the implementation of Bird and any of its derived classes. The only thing needed to expose is the interface class (Bird) and the "arguments" passed into the creation method (CreateBird()). For instance in one of my previous hardware projects the arguments are read out from the EEPROM of device (it includes the device series no, model, features to support), pass these arguments to the creation method and it will create a unique model of device. But sometimes not all the initialization of  the derived classes share the common arguments. So a mutant version of factory method pattern emerges.

Example 4: - individual creation methods for concrete classes
*********************************************************************************
class Bird{
public:
    /*
     * public interface
     */
    virtual void Fly() = 0;
    virtual void Eat() = 0;
};

class Swift : public Bird{
};

class Eagle : public Bird{
};

Bird* CreateSwift(/*2 arugments*/);
Bird* CreateEagle(/*8 arugments*/);
*********************************************************************************

If not a common method can be found to create all the concrete object, each individual creation method can be added and return a pointer/smart pointer to the base class (Bird).

3. Avoid to create "class conceptually exists and does not in reality"
I have discussed CCEADNIR ("class conceptually exists and does not in reality") in my previous blog entry, http://cpluspluslearning-petert.blogspot.co.uk/2014/04/pure-virtual-functions.html. The main thing to achieve is to prevent CCEADNIR to be created. There are a few techniques to achieve.

Use pure virtual function (abstract class) + Factory method  pattern
If there exists pure virtual function in this class based on the abstraction/encapsulation, then it would be the best. Directly use the technique (add a public static creation method or add a standalone creation method) as described in Section 1.
If all the functions so far should have their default behavior, then the best candidate is the destructor. It will not bring any performance penalty and simply enforce that the base class cannot be instantiate directly. Again apply the technique, adding a creation method, as described in Section 1.

Use protected constructor of base class + Factory method pattern
Here the constructors of the base class will be declared as protected. Then only its derived classes can access it (with the assumption that no friend classes and functions).

Example 5:
*********************************************************************************
class Bird{
public:
    /*
     * public interface
     */
    virtual void Fly();
    virtual void Eat();
protected:
    // constructor not public
    Bird();
    Bird(const Bird&);
    Bird& operator=(const Bird&)
};

class Swift : public Bird{
};

class Eagle : public Bird{
};

 Bird* CreateBird(/*arugments*/);
*********************************************************************************

Add a public static creation function to instantiate the objects of the concrete class and do not provide choices (in the arguments passed into the CreateBird()) to create the base class (Bird). Or add a standalone function and do not declare it as a friend of the base class (Bird).

Both the techniques provide good solution and as well result in better code-decoupling.

Summary
Factory method pattern is one of most used design patterns in all my previous projects. All in all I think the best feature of this pattern is better code-decoupling.

Bibliography:
[1] GOF, Erich Gamma, Richard Helm, Ralph Johnson, John Vlissides, "Design Patterns - Elements of Reusable Object-Oriented Software", 1994