C++: March 2014

Thursday 20 March 2014

C++11 features - Improvement on object construction

1. Default value for member variables in class
This is really excellent improvement. Image in C++03 all the member variables with default values have to go to the initilization list. This is really a troublesome issue. A lot of code duplication and bad maintainability.

Problem in C++03:
class Foo {
Foo() : x(1), y(1.0), z("Foo") {}
Foo(int i) : x(i), y(1.0), z("Foo") {}
private:
int x;
double y;
std::string z;
}.;

For all the constructors the member variables with default values has to go to the initialization list. Otherwise temporary objects will be created/deleted if any of them goes into constructor body or any member funciton. (More details please read my previous post: http://cpluspluslearning-petert.blogspot.co.uk/2014/03/the-order-of-object-initializationdestr.html)
Think about that a class has a few constructors, then all the default value initialization has to be copied into the initialization list for all constructor. Keeping all the default value the same will be hell-hot of job. By the way a lot of typing and at the same time have to keep them in the order in which they are declared in the class, if the initialization has dependency.
Maintainability is another issue. How to make sure all have the same default value when new developer coming and going, adding and removing constructor/code.

Duplication is evil.

Improvement in C++11:
class Foo {
Foo() {}
Foo(int i) : x(i){}
private:
int x = 1;
double y = 1.0
std::string z = std::string("Foo");
}.;

Clean and tidy. If nothing appeared/initialized in the initialization list or in the construction body, the member variables, x, y and z will be initialized as the default value when they are declared. For instance Foo() will take their default values because nothing appears in its initialization list and the constructor body is empty. Foo(int) will customize member variable x, but y and z will their default values.
Much easier to maintain. All the default value is defined when member variables are declared. They just have one place to go - where they are declared.

2. Delegate constructor and target constructor
Delegate constructor is referring to a constructor that delegates its object construction job into another constructor. Target constructor is referring to a constructor that is used by a delegate constructor to construct the object. A constructor can be delegate constructor and target constructor at the same time. But in one class there is at least one constructor is pure target constructor, otherwise the loop of delegation appears, which is the pitfall we should avoid. I will discuss this later.

Let's first illustrate the problem happening in C++03 and then we present the improvement/solutions coming from C++11

Problems in C++03:
The common code shared between constructors has to be duplicated either in all constructors or being lifted into a private function and called by all constructors. In such a workaround a few problems arise:
- If the common code coming from the initialization list, then the work around to lift them into a private function will create/delete temporary objects. Potentially performance damage.
- There is no limit to prevent the private function to accommodate the shared code from being called by other function.
- If the common code is duplicated in all constructors, then the maintainability will be a headache.

class Foo {
Foo() : x(1.0), y(1.0), z(x + y) {}
Foo(double xx) : x(xx), y(1.0), z(x+y) {}
Foo(double xx, double yy) : x(xx), y(yy), z(x+y) {}
private:
double x;
double y;
double z;
};
Workaround:
class Foo {
Foo() {Init(1.0, 1.0);}
Foo(double xx) {Init(xx, 1.0)}
Foo(double xx, double yy) {Init(xx, yy);}
private:
void Init(double xx, double yy) {
x = xx;
y = yy;
z = x + y;
}
double x;
double y;
double z;
};

In this workaround before hitting into Init(), x, y and z have been initialized to its default value. And they will be re-initialized again in Init(). This will not be cheap if member variables are big objects.

Improvement on C++11:
Rather than lift the common code into a private function, C++11 allows constructor to call another constructor in the initialization list, which is called delegation.

class Foo {
Foo() : Foo(1.0) {}
Foo(double xx) : Foo(xx, 1.0) {}
Foo(double xx, double yy) : x(xx), y(yy), z(x+y) {}
Foo(Foo& f) : Foo(f.x, f.y) {}
private:
double x;
double y;
double z;
};

Foo() will delegate its construction work into Foo(double), which is to delegate its work to Foo(double, double). Here Foo() is a delegate constructor, Foo(double) is a delegate constructor and a target constructor and Foo(double, double) is a pure target constructor. And the copy constructor Foo(Foo&) is a delegate constructor as well. In this solution there is no private "Init()" function then hence no temporary objects, and there is no code duplication either.

Duplication is evil.

Keep in mind a few things from C++11
- The target constructor's selection is based on the function overloading resolution and template argument deduction.
- An object life starts when coming out of the very first constructor. Any other delegate constructor can be taken as a private function working on a created object.
- Target constructor finishes before the delegate constructor, because the target constructor are called from initialization list. In the above example Foo(), the call graph:
-> Foo()
-> Foo(double)
-> Foo(double, double) // the object life starts after coming out of this constructor
class Foo {
Foo() : Foo(1.0) {std::cout << "Foo()" << std::endl;}
Foo(double xx) : Foo(xx, 1.0) {std::cout << "Foo(double)" << std::endl;}
Foo(double xx, double yy) : x(xx), y(yy), z(x+y) {std::cout << "Foo(double, double)" << std::endl;}
Foo(Foo& f) : Foo(f.x, f.y) {}
private:
double x, y, z
};
If we do:
void main(int argc, char* argv[])
{
{Foo f;}
}
Output:
Foo(double, double)
Foo(double)
Foo()

The object life starts immediately coming out of Foo(double, double). (When the target constructor finishes before its delegating constructor)
- Any exception thrown in the target constructor can be captured by its delegating constructor.
Foo(double i, double j) {throw ExceptionFooDD;)
Foo(double i) : try : Foo(i, 0.0){}
catch(...) (throw ExceptionFooD;)
Foo() : try :Foo(0.0){}
catch (...) {throw ExceptionFoo;}
The stack unwinding can safely taken place upwards from target constructor up to delegate constructors.
(I am going to explain more on this "try block" in the next section)
- No other member variables can appeare in the initialization list if it is defined as a delegate constructor.
Foo() : F(1.0), y(1.0) {} // will generate compilation error, as y(1.0) can not appear in // initialization list any more
- No recursive loop in delegate constructors
Foo() : F(double) {}
Foo(double) : Foo(double, double) {}
Foo(double, double) : Foo() {}
In this case the recursive exists. The behavior is undefined and no diagnostics required for compilers as well according to C++11 standard. So it down to the programmer preventing this.
Tips: in one class there is at least one pure target constructor, which is not to delegate any work to any other constructor. Otherwise this could be good indication that there is a recursive loop.

3. Construction function try block
C++11 allows try block appearing in the initialization list to catch any exception from the initialization list of the member variables or target constructor. And the exception can be captured to take defense programming to make sure no resource leaked.

Problem in C++03:
class Bar{
Bar(std::string fileName) {
// open the file and allocate memory
{
throw BarCreationExcpeiton;// if something wrong with opening file or allocate memory
}
}
~Bar() {//release the resource}
private:
File* m_Handler;
int* m_Buffer;
};

class A {
A(int i) {throw ACreationExcption;}
};

class Foo {
Foo() : x(1.0) {}
Private:
Bar m_Bar;
A m_A;
int x;
};

In C++03, the exception in A() constructor can not be captured in Foo() constructor, because it is thrown before it actually hitting into Foo's constructor body. This will cause any resource allocated in m_Bar will be lost out of control.

Improvement on C+11:
class Foo {
Foo() : try : m_Bar("myFile"), m_A(0){}
catch (...) {// house-keeping} {}
Private:
Bar m_Bar;
A m_A;
int x;
};
Try block can come together with member variables in the initialization list. Catch any exception to get a chance to clean up.

4. Inherit default constructors
In C++11 it is not necessary to type all the constructors declaration and definition if the sub-class have the exactly same as its base class

Problem in C++03:
class Base {
public:
Base(int, int);
Base(int, std::string);
};

class Derived : public Base{
public:
Derived(int i, int j) : Base(i, j){}
Derived(int i, std::string s) : Base(i, s) {}
};
In C++03 even though Derived's constructors does no more than Base's constructors, the standard still requires to add that two line code for Derived's constructor. Otherwise C++03 standard will generate a default constructor Derived() for Derived class and complains about Derived() can not find its base-counterpart from Base class.

Improvement in C++11:
class Derived : public Base{
public:
using Base::Base;
};

The line "using" will bring Base's constructors into scope and no more typing needed for Derived and at the same time will prevent compilers from generating default constructor for it. Keep in mind that this is a none-or-all feature.
(
Does this "using" remind you of something? The important usage of using in C++
- using namespace std:
- using Base::foo() // to bring the hidden name/function "foo" back to the scope of base's sub-class
}

Summary
Overall excellent to see all the features in C+11. You will not feel too overwhelmed if you are coming from Java world. But good to see these features introduced finally in C++11, which is to put C++ back into head-to-head race against other modern language like Java and Python.

Bibliography:
[1] http://www.stroustrup.com/C++11FAQ.html#delegating-ctor
[2] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n1986
[3] http://en.wikipedia.org/wiki/C++11#Explicit_conversion_operators

Wednesday 19 March 2014

Scientific computing - numerical errors

In scientific computing most of time we are play with "double" to keep the accuracy. And nowadays 64-bit machine is the mainstream. So I am going to limit my dicussion within 64-bit operating system and focus on "double" data type in C++.

1. Overflow caused by arithmetic operations on signed/unsigned integer
This is the place where most of overflow happens when do arithmetic operation on integers. In both LLP64 and LP64 [1], they have 32-bits in memory. For unsigned integer [0, 2^32 -1], roughly 0~4 billion, and [-2^31, 2^31-1], roughly -2 billion~2 billion.

Nowadays billions of data are becoming quite common. I have encountered data with size of billions in quite a few occasions. And in modern computer (PC) with 8 core and at least and 16+G memory, billions of data are no long a big problem. With multi-threading/processes application billions data can be processed within seconds.

In such application overflow can happen if you use most of the data type in your program as "int" and do arithmetic operations on them. (It does provide the temptation because by default C++ will take data as "int" and still it is one of the data type that the CPU operates most efficient on.)

The overflow could happen in such cases:
- subtraction between two unsigned/singed integers or between one signed and one unsigned
- Sum up a series of integers
- Big number number multiplication.

Problems 1:
std::vector<int> myData;
GetData(myData); // feed the data
int sum = 0;
for (int i = 0; i != myData.size(); ++i) {
sum += myData[i];
}
*) sum could be easily overflow given the value and size of myData
*) i could overflow given the size of myData, like billions
*) Most compilers will generate the warning for "i != myData.size()" to complain about the comparison between signed and unsigned integer.

Tips:
*) Promote the data type for the result of arithmetic operation. For instance to declare sum as "long long"/std::size_t.
*) Use std::size_t instead of unsigned long/unsigned long long for compatibility issue across platforms (LP64 and LLP64)
*) Use std::pdiff_t instead of long/long long for compatibility issue across platforms (LP64 and LLP64)
*) std::size_t and std::pdiff_t are defined by C++ standard and in 64 bit system they are defined as 64-bit unsigned/signed integer respectively.
The code sould be look like:
std::vector<int> myData;
GetData(myData); // feed the data
std::pdiff_t sum = 0;
for (std::size_t i = 0; i != myData.size(); ++i) {
sum += myData[i];
}

Problem 2:
If you have to do arithmetic operation between singed/unsigned integer, the best option is to promote them into std::size_t and std::pdiff_t. Considering with the range they can represent, it is safe to prevent overflow by using them in most applications.
But do keep the data type as small as possible. If the data is only 12 bits, then short should be enough. This will save you tons of memory if the this is a big data application. Just keep in mind the possible data size of the intermediate/final result during the arithmetic operations.

2. double - Machine epsilon
Under IEE754 double in C++ has 64 bits in memory. Top bit for sign, the following 11 bits for exponent and the rest 52 bits is for fraction. So the resolution for double is ~10^-16 (2^-53 or 2^-52).

Bits of double [2]

In c++ you can get the machine epsilion by
double GetMachineEpsilon()
{
double epsilon = 1.0d;
while ((1.0+epsilon) != 1.0) {
epsilon/=2;
}
return epsilon;
}

The machine epsilon makes a lot of senses for some of the applications you do.
Problem: Optimization
If you are doing an optimization process. Most commonly a cost function, the variable constraints and algorithm specifications should be defined. One of most important specifications is the tolerance (termination condition and so on), which include the absolute tolerance, relative tolerance, Taylor error tolerance and so on.

Among them absolute tolerance and relative tolerance are used as termination conditions. Because of absolute tolerance is totally depends on the problem (scale), the relative tolerance can be very importance to determine if the local/global optimum is reached. Based on the numerical difference between the current and the previous iteration, it is really makes no sense to make the relative tolerance less than machine epsilon. Otherwise the termination condition could never been met and the iteration could never stop.

Problem: Simulation
If you see the difference between the current and the previous iteration becoming less and less and converging to ~10^12 (which most like taken as zero in numerical calculation, close enough to machine epsilon), this may be signalling that you can stop the process of simulation. Because this process may not be able to make any progress towards the direction of your expectation.

Tips: be aware of machine epsilon when you setting tolerance in your application

3. double - floating point underflow
This is might be the most dangerous numerical error that could impact the numerical accuracy significantly. As it is shown on Section 2 how to get machine epsilon, assume that we have a large list of very small numbers which are less than the machine epsilon. And we do addition operation of "1.0" to this full list. The final result will be "1" rather than the value it shoudl be. All the really small value in the whole list will be lost.

Here is this piece of code:
std::vector<double> val(10^9, 10^-17);
double sum = 1.0;
for (size_t i = 0; i != val.size(); ++i) {
sum += val[i];
}

We can calculate that sum of val should be 10^9*10^-17 = 10^-8. So theoretically the sum should be equal to 1.00000008. But in practice sum will be 1.0. No matter how many such numbers are added into 1.0, the result will never change and always be 1.0.

Why can this happen?
This is due to how double is represented in memory in C++. Any double value in C++ is normalized, which means that the fraction section (bottom 52 bits) is to represent a factional number between 0~1. More preciously shoue be [0.5, 1). And the exponent seciton is to present the power of 2.
For instance 0.25 will be presented as 0.5*2^-1, so the fraction will be 0.5 and the exponent will be -1. 1.0 will presented as: all fraction section set as 1 and the exponent section will be 0.

When doing the add operation on two double numbers, the small number will shift the fraction section in the right direction in order to increase the exponent section until it is equal to the exponent section of the bigger number. Then add two fraction section together. Finally this number will be normalized to make sure fraction section is bounded within [0.5, 1.0) by shifting in the right direction to increase the exponent section[3].
For instance (a=0.75) + (b=0.5*2^-53).
0.75: fraction seciton- 1100 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
exponent seciton- 0
0.5*2^-17: fraction section - 1000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
exponent section- -53
When do addition operation:
*) a > b
*) b's fraction section shift in the right direction to increase b's exponent section
- b's fraction section has to shift right for 53 (0 - (-53)) bit in order to b's exponent section = a's exponent section
- b's fraction section will become "0" after shift right 53 bits.
*) Adding a's and b's fraction sections together will not affect anything, and the sum is simply equal to a's fraction section because b's fraction section is equal to 0 now after shifting.

Hence underflow happens on this addition.

Fix: sum the small values first and then add the sum with the big value
As we can see above the real problem happening here is that the shifting operation to make sure the big number and the small number to have the same exponent section causes the smaller number lost its significance.

std::vector<double> val(10^9, 10^-17);
double sum = 0.0;
for (size_t i = 0; i != val.size(); ++i) {
sum += val[i];
}
sum +=1.0;
This will give the correct answer - 1.00000008

4. double - floating point overflow
The real danger here is not on the addition operation but the multiplication operation. The multiplication operation is simpler than addition operation.
*) Add the exponent sections of both numbers
*) Multiply the fraction sections
*) Normalize the number to make sure fraction seciton is bounded wiht [0.5, 1.0), by shifting fraction section and decreasing the exponent section.

So far I have no encounter with any overflow by these two operation yet in my daily work.

One more danger is the division, when big number is divided by a really small number, for say close to zero. In C++ standard it will generate std::NaN/std::Inifinite if divided by zero.
Tips for overflow:
Model the problem with a scaler if all the number are really huge. Define the bound for all variables. If the value of variable is really small, consider add/perturb machine epsilon to it in order to make sure divided-by-zero not happening.

5. Consider using data encoding techniques
If the data is really huge number and the size is quite big, data encoding techniques should be considered. It will not only save the space but also boost the performance.
For instance using simple (average value + difference) to represent very compact data. This could reduce the bits of data type from representing the whole range of data to representing the difference only. The data type could move from "int" to "short". Then half the memory is save.
Besides doing arithmetic operations on small numbers is cheap on bigger numbers. For instance working out 3+5 is cheaper than working out 231338393+9887867.

Bibliography:
[1] http://en.wikipedia.org/wiki/64-bit_computing
[2] http://en.wikipedia.org/wiki/Double_precision
[3] Donald E. Knuth "The Art of Computer Programming", Volume 2, Fundamental Algorithm, Third Edition

Tuesday 18 March 2014

The order of object initialization/destruction

1. Objections initialization of class/struct and its member
The order of initialization : the member variables are initialized before this class's constructor is called.
- Allocate the memory for this object
- The initialization order of member variables are the same as the order in which they are declared in the class.
- Call the constructor and the life of this object begins

The order of destruction : the reverse order of initialization
- The destructor of the class is called first
- The destructor of it memebers are called in the reverse order in which they are declared int the class.
- de-allocate the memory

struct A{
A() {std::cout << "A()" << std::endl;}
~A() {std::cout << "~A()" << std::endl;}
};

struct B{
B() {std::cout << "B()" << std::endl;}
~B() {std::cout << "~B()" << std::endl;}
};

struct C{
C() {std::cout << "C()" << std::endl;}
~C() {std::cout << "~C()" << std::endl;}
};
struct Base{
Base() {std::cout << "Base()" << std::endl;}
~Base() {std::cout << "~Base()" << std::endl;}

A a;
B b;
C c;
};

If we do
int main(int argc, char* argv[])
{
{
Base b;
}
return 0;
}
Output:
A()
B()
C()
Base()
~Base()
~C()
~B()
~A()
The order is quite clear
- the member variables firstly initialized in the order they are declared
- Then class itself
- Destroy in the reverse order of initialization

Memory map of Base
---------------------------------------------------------------------------
|
| virtual table pointer: if Base has virtual function
|
---------------------------------------------------------------------------
|
|
|
| Memory for A, call A's constructor to initialize this memory
|
|
|
|
---------------------------------------------------------------------------
|
|
|
| Memory for B, call B's constructor to initialize this memory
|
|
|
|
---------------------------------------------------------------------------
|
|
| Memory for C, call C's constructor to initialize this memory
|
|
|
---------------------------------------------------------------------------

2. The initialization/destruction order of inherited classes
The initialization order of the inherited class
- Allocate the memory for itself
- Initialize the base class member variables
- Initialize the base class
[
// if multi-inheritance
- Initialize the 2nd base class member variables
- Initialize the 2nd base class
........
]
- Initialize the sub-class's member variables
- Initialize the sub-class itself

The destruction order of inherited class: the reverse order of the initialization
struct D{
D() {std::cout << "D()" << std::endl;}
~D() {std::cout << "~D()" << std::endl;}
};

struct E{
E() {std::cout << "E()" << std::endl;}
~E() {std::cout << "~E()" << std::endl;}
};
struct Foo : public Base{
Foo() {std::cout << "Foo()" << std::endl;}
~Foo() {std::cout << "~Foo()" << std::endl;}

D d;
E e;
};

If we do
int main(int argc, char* argv[])
{
{
Foo f;
}
return 0;
}
Output:
A()
B()
C()
Base()
D()
E()
Foo()
~Foo()
~E()
~D()
~Base()
~C()
~B()
~A()

Memory map of Foo
---------------------------------------------------------------------------
|
| virtual table pointer: if Foo has virtual functions
|
---------------------------------------------------------------------------
|--------------------------------------------------------------------------
| | Memory for Base
| |-----------------------------------------------------------------------
| |
| |
| | Memory for A, call A's constructor to initialize this memory
| |
| |
| |
| |------------------------------------------------------------------------
| |
| |
| |
| | Memory for B, call B's constructor to initialize this memory
| |
| |
| |
| |
| |-------------------------------------------------------------------------
| |
| |
| | Memory for C, call C's constructor to initialize this memory
| |
| |
| |
|--------------------------------------------------------------------------
| |
| | Memory of other base classes if Foo inherits from multi-classes
| |
---------------------------------------------------------------------------
|
|
|
| Memory for D, call D's constructor to initialize this memory
|
|
---------------------------------------------------------------------------
|
|
|
| Memory for E, call E's constructor to initialize this memory
|
|
---------------------------------------------------------------------------

3. The initialization/destruction order in array
The initialization order is to create objects from low index to high index. The destruction is the reverse order of the initialization order

struct F{
F():m_Count(s_Count) {
std::cout << "F(" << m_Count << ")" << std::endl;
++s_Count;
}

F(const F& f) : m_Count(s_Count) {
std::cout << "copy to F(" << m_Count << ")" << std::endl;
++s_Count;
}

~F() {std::cout << "~F(" << m_Count << ")" << std::endl;}

static int s_Count;
int m_Count;
};

int F::s_Count = 0;

If we do
int main(int argc, char* argv[])
{
{
F fArr[5];
}
return 0;
}
Output:
F(0)
F(1)
F(2)
F(3)
F(4)
~F(4)
~F(3)
~F(2)
~F(1)
~F(0)

4. The initialization/destruction order in std::vector
The objects are initialized from low index to high index order. And the destruction order is the same order as the initialization

If we do,
int main(int argc, char* argv[])
{
{
std::vector<F> fVec(5, F());
}
return 0;
}
Output:
F(0)
copy to F(1)
copy to F(2)
copy to F(3)
copy to F(4)
~F(0)
~F(1)
~F(2)
~F(3)
~F(4)
Internally the copy constructor (the first object as the argument) is called to create new objects for the rest of the vector.

If we do,
int main(int argc, char* argv[])
{
{
std::vector<F> fVec(5);
}
return 0;
}
Output:
F(0)
copy to F(1)
F(2)
copy to F(3)
F(4)
copy to F(5)
F(6)
copy to F(7)
F(8)
copy to F(9)
~F(1)
~F(3)
~F(5)
~F(7)
~F(9)
This is the output of Miscrosoft 2010 Visual C++ Express. See how inefficient it is. 5 temporary objects are created and destroyed when creating a 5-objects vector. CRAZY!

5. Use initialization list for constructor
Eliminate unnecessary temporary object created and deleted for initializing member variables.
struct G{
G() : m_X(0) {
std::cout << "G()" << std::endl;
}
G(int i) : m_X(i) {
std::cout << "G(int)" << std::endl;
}
~G() {
std::cout << "~G()" << std::endl;
}

int m_X;
};
struct Bar{
Bar(int i) {
std::cout << "Bar(int)" << std::endl;
m_G = i;
}
~Bar() {std::cout << "~Bar()" << std::endl;}

G m_G;
};

If we do,
int main(int argc, char* argv[])
{
{
Bar b;
}
return 0;
}
Output:
G()
Bar(int)
G(int)
~G()
~Bar()
~G()
The default constructor G() is called to initialize m_G before getting into constructor Bar(int). When hitting "m_G = i", constructor G(int) was called to re-initialize m_G. Then old value of m_G is to be destructed, which is the temporary object created and destructed. This should be avoided.

If we modify Bar as
struct Bar{
Bar(int i) : m_G(i) {
std::cout << "Bar(int)" << std::endl;
}
~Bar() {std::cout << "~Bar()" << std::endl;}

G m_G;
};
Output:
G(int)
Bar(int)
~Bar()
~G()
"m_G" is to be initialized directly by passing "i" as the argument in the initialization list. Then no temporary object is created and destructed.

6. Initialize the objects in the initialization list in the same order in which they are declared
This is especially true when some member variables are taking other member variables as the arguments. The defendants have to be declared after the member variables they are depending.

struct B{
B(int i);
};
struct C{
C(int i);
};
struct A{
A(const B&, const C&);
}

struct Foo{
Foo(int i, int j) : m_B(i), m_C(j), m_A(m_B, m_C) {}

B m_B;
C m_C;
A m_A;
};
"m_A" is dependent on "m_B" and "m_C". Then "m_B" and "m_C" have to be initialized before "m_A". It will require "m_B" and "m_C" has to be declared before "m_A". Otherwise in the initialization list when hitting "m_A(m_B, m_C)", "m_B" and "m_C" are not initialized yet. Then will cause crash.

Monday 17 March 2014

C++11 features - Explicitly defaulted/deleted member functions

Problems: the default 4 generated items by compiler
class Foo{
};
An empty class has four signature generated by compiler by default:
Foo(); // constructor
Foo(Foo&); // copy constructor
Foo& operator=(const Foo&); // assignment operator
~Foo(); // destructor
(of course some other default operator are generator as well, for instance global new operator, ".", "&", "->" operator)
Quick quiz: sizeof(A) = ?
Most of time there is no problem for default constructor and desctrucotr. The problem lies with the default copy constructor and assignment operator. By default the copy constructor and assignment operator generated by compiler are doing member-wise copy/assignment. Simply there are 3 options.
1. Define your own copy constructor and assignment operator. For instance there are resources allocated in the class, like heap memory, file handler, socket connection and data sessions.
2. Accept the default copy constructor and assignment operator. Then do nothing. Simply ignore the declaration in your class.
3. Suppress the default copy constructor and assignment operator and prefer this class un-copyable or un-assignable.

Here we are concentrating the 3rd option and discuss what C++11 provides to make these option more convenient and clear.

Solutions in C++03:
1. Make this class un-copyable or un-assignable by declaring the copy constructor or assignment operator explicitly private and having no implementation.
class Foo {
public:

private:
// exmplicitly declare them private and
// do not provide any implementation
Foo(&Foo);
Foo& operator(const Foo&);
};

Declare them private: prevent any copy or assignment outside this class and stop them being called from sub-classes legitimately as well. But still one defect: can not stop friend classes and function calling them.
Provide no implementation: this is actually stopping friend classes and functions calling them. The compiler will signal linking errors when they are actually called from friends. Simply the declarations do not have function body to match.

2. (Multi)-inheritance from a base uncopyable/assignable class
class Uncopyable{
private:
Uncopyable(Uncopyable&);
Uncopyable& operator=(const Uncopyable&);
};

// Foo inherit from Uncopyable class
class Foo : public Uncopyable, public Base {
......
};

The idea is to take the 1st idea into a base class to be inherited by all un-copyable class. Because it sub-classes's copy constructor and assignment operator will try to call its base class counterpart, but it fails to do
so as they are declared as private and no implementation provided.

Solution in C++11:
1. "delete"
An new pair was introduced as default/delete. The two solution provided as in C++03 will look like:
class Foo {
Foo(Foo&) = delete;
Foo& operator=(Foo&) = delete;
};

class Uncopyable {
Uncopyable(Uncopyable&) = delete;
Uncopyable& operator=(const Uncopyable&) = delete;
};

This is crystal clear about we would like to do here - suppress the default copy assignment and assignment operator and make this class un-copyable. No ambiguity to sub-class and friends.

2. "default"
C++11 also provide "default" keyword to tell the compiler that the programmer would like to use the default version generated by compilers. This means that the programmers are happy with the member-wise copy
of copy constructor and assignment operator. If an explicit version is preferred, simple declare it and provide the implementation.

3. Extra usage of "delete"
It can be used to suppress type conversion/promotion when finding closest function signature (function overloading) or in template specialization (no matter partial or full) and template overloading.
void foo(double) {}
void foo(int) = delete; // no type promotion from int to double
This will suppress calling like foo(1). Compilers will generate error.

template<typename T> void foo(T) {}
template<> void foo<int> (int) = delete;
This will suppress this template function to be instantiate to take a int type.

I have not seen this type of things are useful yet, because most of occasions the program will require special implementation against a certain type of argument or a series of arguments. For instance to have different implementation with void foo(int) {} and special implementation with template<> void foo<Complex> (Complex) by template specification/overloading.

Especially for template class, there is a single template function can be used for all types. So impossible to enumerate all by "delete". All need to keep in mind is that what scope it is designed for.

template<typename T>
T Foo(T a, T b) {return a + b;}

template<>
Complex Foo<Complex>(Complex a, Complex b) {
return Complex(a.x+b.x, a.y+b.y)
}

So really "delete" is not the solution. This usage is really useless and C+11 should not really wasting time on facilitating this sort of convenience.

Saturday 15 March 2014

C++11 features - Move constructor

Problem in C++03: The cost of temporary objects
When a class has resources allocated inside itself (resources can be memory, file handler, database session, socket connection and so on), an explicit implementation of destructor is a must. For copy constructor and assignment operator they often appears in pair. Either both are required to be implemented or both are required to be disabled by declaring them private with no implementation. Here we are talking about the first case, they are both required to be implemented.

In C++03 for those variables appear in the right-hand side of operator=, it normally means they are const reference and their value are not allowed to be modified. When doing user-defined objects with operator=, the only legal operations are designated to copy constructor and assignment operator. Here is the problem that the copy/assignment operations, called deep-copy, are normally very expensive because most of time they are implemented as bit-wised copy.
------------------------------------------------------------------------------------------------------------
class Bar;
class Foo {
pulic:
Foo();
~Foo();
Foo (Foo& f);
Foo& operator=(const Foo&);
private:
int m_x;
Bar* m_Bar;
};

Foo a;
Foo b = a; // copy constructor (p-1)
Foo c;
c = a; // assignment operator (p-2)

Foo GetFoo() {
Foo temp; // temporary objects
.......
return temp; // more temporary objects create. (p-3)
}
------------------------------------------------------------------------------------------------------------
One of the problems is return-value type. (In C++03 we are taught not to return an object. However it is recommended to return a reference or pointer. But this will involve more or less memory management and risk of crashing the program by de-referencing of dangling reference/pointer.) In the above code (GetFoo()) an object of Foo is returned. Think about how this is achieved. One of methods to achieve it is to allocate heap memory to invoke copy constructor by passing the const reference of temp. Then temp disappears from stack as stack-unwinding when returning from function invoking. Return back to the caller of GetFoo, a temporary object is yet created again based on the object residing in heap memory and then delete the object in the heap. In the total a few temporary objects are created in deep-copy operation. Very expensive and inefficient. One of examples are the implementation of std::string and STL in C++.

Even though most modern compilers support return-type optimization, we still lose the control of how it is optimized. Therefore most of time we are recommended not return an object.

The savior : C++11 rvalue reference and hence move constructor
rvalue reference - syntax notation: T&&
Comparing with const reference in C+03, rvalue reference is newly introduced in C++11. It is referring a temporary object which is allowed to be modified after initialization. This is will completely overturn the "move semantics" in C++03, which is sort of like:
temp = a;
a = b;
b = temp;
Temporary objects are the bridge. C++11 with the introduction of rvalue reference will do the swap, which is extremely efficient for allocated resources. Think about a and b are pointing an array in the above example. A "temp" of array will be created for the move purpose in C++03. In the "swap semantics" the only thing need to do (move) is to exchange the pointer and the data can stay where they are (no temporary objects needed).

With this "swap semantics" concept, it is not only fast but also exception-safe comparing with C++03. What if exception happens when creating the temporary object (allocate memory and calling the constructor). The program could lose the control of the allocated resources. However C++11 the move constrcuctor uses "swap semantics" concept, therefore no temporary objects in the middle and it is exception-safe. To understand it better, read Item 22 and 23 of Herb Sutter's book "More Exceptional C++ - 40 New Engineering Puzzles, Programming Problems, and Solutions".
------------------------------------------------------------------------------------------------------------
Move constructor:
class Foo {
public:
Foo(Foo&&);
......
};
------------------------------------------------------------------------------------------------------------
It has been confirmed that C++11 STL has move constructor implementation. And we don't have to worry about the return type of "vector<Foo>" or "vector<Foo>&" any more. As well we are no longer dependent on the compiler vendors to optimize the code.

Friday 14 March 2014

Performance - Running out of stack

When a new process starts, memory is allocated for it to run. It includes usages for stack and heap. Stack runs much faster than heap (memory is allocated in heap via new/malloc) and it is used for accommodating automatic variables and facilitating function calling. If we take the memory like a belt. Normally stack and heap are staying at two poles. As more objects are created in them, they are grows towards together. For most cases the size of stack has a default size. (You may readjust the share of stack/head. Such a problem often comes across in embedded world.) As the program runs, more objects are created on the fly and call functions. The program may run out of stack and then flies.

- What can cause stack overflow?
1. Two many/large automatic objects created on stack. For instance try
void foo() {
int myArray[100000000];
memset(&myArray[0], sizeof(myArray), 0x00);
}
This will cause stack overflow as the required automatic array overtakes the size of stack. Note it has to be declared/committed in a function, any function. Declare it as a global variable, then it will take global/static memory area rather than stack.
2. Deep function calling
It basically has something to do with how the function calling are invoked. Let's say we have a calling graph from A->B->C....... When it starts, all the automatic variable in A (declared&committed before calling B) will be pushed into stack. Program states (Common registers, PC counter, sign registers and so on) are pushed into stack as well. Then all automatic variables in B and its program states into stack as well, then C...... With a certain depth, the stack will be exhausted and the overflow happens.
It is extremely vital to check the maximal function call depth in embedded applications (or stack size), because lacking of a OS or a cut-down version, the share between stack/heap are static. Unlike modern PC, the boundary of stack can move and the size of stack can grow as long as it does not grow over to heap section.
3. Infinite function calling - recursive function calling
void foo() {
foo();
}
If infinite function calling exits, the overflow of stack is guaranteed. Most of time it happens on recursive function implementation. This is why people towards recursive function are divided. On one hand it presents the idea/algorithm clearly than using loop and on the other hand it risks the danger of deep/infinite function calling then crashes the program. The worst problem is that it may run OK with the current arguments and it may crash when the scale of problems grows bigger to a certain size.
It is clear that recursive functions are banned (or not recommended) in embedded applications. In PC application it is neutral as long as it does not run the risk of infinite loop.

- Tips
1. Use tools to check stack size. For instance like Valgrind for Linux
2. For your self studying, run the program like
#include <iostream>
int main(int argc, char* argv[]) {
int i = 0;
int* iPtr = new int(1);
std::cout << &i << std::endl; // stack: the address of i
std::cout << &iPtr<< std::endl; // stack: the address of iPtr
std::cout << iPtr << std::endl; // heap: where iPtr points
delete iPtr;
return 0;
}

Result: 0x0054FCEC
0x0054FCE0
0x0062BBE8
Based on the value, you may be able to guess roughly the size of stack. Normally the top a few (most) significant bits are different between stack and heap. And heap has a bigger number.
3. Use safe version of C/C++ in embedded world - like objective-C
The idea is not to use heap hence the program will take all the memory for stack. Rather than using heap to dynamically create objects, use global variables or implement a objects-pool or MMU with the memory residing on stack. Then forget the heap and you have a fixed/bigger size of stack.
int main(int argc, char* argv[]) {
char Mem[1<<12]; // allocate memory in stack
int* iPtr = realloc(&Mem[0], 4);
int* iPtr1 = new(&Mem[4]) int(4);

.......
}
It gains not only the safety but also speed. But lose the convenience. In C++ you have to invoke destructor explicitly to release the resource and not call delete operator for user-define type.

Thursday 13 March 2014

Performance - Running out of memory (heap)

I am not going to waste any words to emphasize how important memory management is for C++. Let's cut into this topic directly. What should you do when running out of memory? (I am not going to talk about the program failing to allocate a huge chunk of memory, like 4G in a 2G machine. Thing like this you should avoid directly by choosing a more memory-efficient algorithm.) Here I am going to talk about what can be done when the program are gradually losing the control of the memory and when the memory consumption grows up and never down again. How to approach this issue?

The very first thing to me (I would like to do first) is to collect a memory trace when memory allocation failure happens. Tools like Valgrind, Dr. Memory (from Google) and IBM Rational PurifyPlus. I would strongly recommend Valgrind on Linux, free and powerful. For Windows if you can afford, Rational PurifyPlus is great. Otherwise Dr. Memory, it is free from Google and it does the job. Ok, as long as you get a memory trace. Fix any memory leak on "definitely loss" and pay attention to "possible loss". Fix the memory leaks first before move on.

If the code is in high quality and has no memory leak at all, the process will be tough to spot the hot data-structure and find the solution. Here is a few things you can check.

1. Spot the hot data structure from the trace. Check the life time of data. Ask yourself if the data live beyond its life-span. Especially on global variables, static variables, big data chunk memory. The point here is that create data/variables when needed and destroy them asap.

2. Check any user-defined data structure in hot memory area. If yes, check the amount and its actually size. The point here is memory alignment. In resource critical applications this is very important. Normally the actual size of memory the user-defined data structure takes is more than the value you get from "sizeof()" dynamic operator. Check the actual size it takes in memory, re-arrange its data member and ensure as least waste as possible.

3. Check the frequency of calling new/delete and malloc/free. The point here is memory fragment. Thank about that the memory is like a white paper at the beginning. After millions of memory allocation/de-allocation of small objects, it is just like scattering black pepper on the white paper. The memory is polluted and never has a decent size blank white area available when you need to write a long message. If this happens, think about having an objects pool or self-implemented memory-management unit. Scenarios like using a node-based container with frequent add/remove operations. Having a self-implement memory management unit is quite common in embedded world.

4. Check your compiler vendor. See the implementation like std::string and STL containers. Some string implementation have default minimal size, for instance 32 at least. It would be bad when the program is using a lot of small strings. As well check the STL container implementation and their default minimal size.

5. Use node-base based container if fails to allocate a large chunk of continuous memory. For instance choose std::list over std::vector, std::queue. Choose self-implemented node-based container (link-list) over array.

I see memory shortage happens mostly in embedded applications. Self-implemented MMU is the solution on the most occasions. In scientific computing often fails to allocate a large chunk of continuous memory. Think about a different algorithm. Remember better memory management sometimes means better performance. Especially with better data alignment and object pools.

Wednesday 12 March 2014

Data Structure and Algorithm - Trie

This small program is to implement a TRIE data structure (see Trie on wiki) and algorithm for a dictionary. Its main functionality follows
> apple
> appreciate
> application
------ Add there words into this dictionary
> app?
------ Ending with "?" will query the dictionary and print out all matching words that starts with "app"
> app-
------ Ending with "-" will remove all words that starts with "app" in the dictionary

Note: extra function can be implemented, for instance adding more patterns to query the dictionary. The main purpose of this program is to implement a dictionary based on TRIE.
Assumption: the words consists of 26 English letters and they are all stored in lower case.
Environment: 2010 Microsoft Visual C++ 2010 Express

//-----------------------------------------------------------------------------------------------------------
// TRIE dictionary C++ public API
// DictionaryTrie.h
#ifndef DICTIONARY_TRIE_H
#define DICTIONARY_TRIE_H

#include <string>
#include <vector>

#define NUMBER_OF_LETTERS 26
struct DictionaryTrie {
char* str;
DictionaryTrie* children[NUMBER_OF_LETTERS];
DictionaryTrie() {
str = NULL;
for (int i = 0; i != NUMBER_OF_LETTERS; ++i) {
children[i] = NULL;
}
}
};

void InitDictionaryTrie(DictionaryTrie** dtriePtr);
void AddWord(DictionaryTrie* root, const char* word);
void RemoveWord(DictionaryTrie* root, const char* word);
void QueryWord(DictionaryTrie* root, const char* word, char** result);
void QueryWord(DictionaryTrie* root, const char* word, std::vector<std::string>& result);
bool FindWord(DictionaryTrie* root, const char* word);
void TraverseDictionaryTrie(DictionaryTrie* root, char** result);
void TraverseDictionaryTrie(DictionaryTrie* root, std::vector<std::string>& result);
void DestoryDictionaryTrie(DictionaryTrie* root);
bool EmptyTrieNode(DictionaryTrie* node);

#endif

//-----------------------------------------------------------------------------------------------------------
// TRIE dictionary C++ private API
// DictionaryTrie_internal.h
#ifndef DICTIONARY_TRIE_INTERNAL_H
#define DICTIONARY_TRIE_INTERNAL_H

#include "DictionaryTrie.h"

DictionaryTrie* LocateWord(DictionaryTrie* root, const char* word);
bool EmptyTrieNode(DictionaryTrie* node);

#endif

//-----------------------------------------------------------------------------------------------------------
// TRIE dictionary C++ implementation
// DictionaryTrie.cxx
#include "stdafx.h"

#include "DictionaryTrie_internal.h"

#include <stack>
#include <stdlib.h>
#include <string.h>

void InitDictionaryTrie(DictionaryTrie** dtriePtr)
{
if (!dtriePtr) {
*dtriePtr = static_cast<DictionaryTrie*> (malloc(sizeof(dtriePtr)));
(*dtriePtr)->str = 0;
for (int i = 0; i < NUMBER_OF_LETTERS; ++ i) {
(*dtriePtr)->children[i] = 0;
}
}
}

void AddWord(DictionaryTrie* root, const char* word)
{
if (!root || !word) {
return;
}

DictionaryTrie* tempPtr = root;
int size_str = 0;
while (*(word + size_str) != '\0') {
if (!tempPtr->children[*(word + size_str) - 'a']) {
tempPtr->children[*(word + size_str) - 'a'] = new DictionaryTrie();
}
tempPtr = tempPtr->children[*(word + size_str) - 'a'];
++size_str;
}

if (!tempPtr->str) {
tempPtr->str = static_cast<char*>(malloc(size_str + 1));
strncpy(tempPtr->str, word, size_str);
*(tempPtr->str + size_str) = '\0';
}
}

void RemoveWord(DictionaryTrie* root, const char* word)
{
DictionaryTrie* tempPtr = LocateWord(root, word);
if (tempPtr) {
if (tempPtr->str) {
free(tempPtr->str);
tempPtr->str = 0;
}

if (EmptyTrieNode(tempPtr)) {
DestoryDictionaryTrie(tempPtr);
}
}
}

void QueryWord(DictionaryTrie* root, const char* word, char** result)
{
DictionaryTrie* wordFound = LocateWord(root, word);
if (wordFound) {
TraverseDictionaryTrie(wordFound, result);
}
}

void QueryWord(DictionaryTrie* root, const char* word, std::vector<std::string>& result)
{
DictionaryTrie* wordFound = LocateWord(root, word);
if (wordFound) {
TraverseDictionaryTrie(wordFound, result);
}
}

bool FindWord(DictionaryTrie* root, const char* word)
{
DictionaryTrie* tempPtr = LocateWord(root, word);
return !tempPtr && !tempPtr->str;
}

void TraverseDictionaryTrie(DictionaryTrie* root, char** result)
{
DictionaryTrie* tempPtr = root;
if (tempPtr) {
if (tempPtr->str) {
// copy the pointer rather than the string
*(result++) = tempPtr->str;
}
for (int i = 0; i < NUMBER_OF_LETTERS; ++i) {
TraverseDictionaryTrie(tempPtr->children[i], result);
}
}
}

void TraverseDictionaryTrie(DictionaryTrie* root, std::vector<std::string>& result)
{
DictionaryTrie* tempPtr = root;
if (tempPtr) {
if (tempPtr->str) {
result.push_back(tempPtr->str);
}
for (int i = 0; i < NUMBER_OF_LETTERS; ++i) {
TraverseDictionaryTrie(tempPtr->children[i], result);
}
}
}

void DestoryDictionaryTrie(DictionaryTrie* root)
{
DictionaryTrie* tempPtr = root;
if (tempPtr) {
if (tempPtr->str) {
free(tempPtr->str);
tempPtr->str = 0;
}

for (int i = 0; i < NUMBER_OF_LETTERS; ++i) {
DestoryDictionaryTrie(tempPtr->children[i]);
}
delete tempPtr;
}
}

/*
* this verison use a lot of mem
*/
/*
void DestoryDictionaryTrie(DictionaryTrie* root)
{
if (!root) {
return;
}

std::stack<DictionaryTrie*> stackDtriePtr;
stackDtriePtr.push(root);
DictionaryTrie* tempPtr;
while (!stackDtriePtr.empty()) {
tempPtr = stackDtriePtr.top();
stackDtriePtr.pop();
if (tempPtr->str) {
free(tempPtr->str);
}
for (int i = 0; i < NUMBER_OF_LETTERS; ++i) {
if (!tempPtr->children[i]) {
stackDtriePtr.push(tempPtr->children[i]);
}
}

free(tempPtr);
}
}
*/

DictionaryTrie* LocateWord(DictionaryTrie* root, const char* word)
{
if (!root || !word) {
return 0;
}

DictionaryTrie* tempPtr = root;
while (*word != '\0' && tempPtr) {
tempPtr = tempPtr->children[*word - 'a'];
++word;
}

return tempPtr;
}

bool EmptyTrieNode(DictionaryTrie* node)
{
if (node->str) {
return false;
}

for (int i = 0; i < NUMBER_OF_LETTERS; ++i) {
if (node->children[i]) {
return false;
}
}

return true;
}

//-----------------------------------------------------------------------------------------------------------
// Wrapper class Dictionary based on DictionaryTrie
// DictionaryImpl.h
#ifndef DICTIONARY_IMPL_H
#define DICTIONARY_IMPL_H

#pragma once

#include <string>
#include <vector>

struct DictionaryTrie;

class DictionaryImpl
{
public:
DictionaryImpl();
~DictionaryImpl();

void Add(const std::string& word);
void Query(const std::string& word, std::vector<std::string>& words);
void Query(const std::string& word, char** words);
void Remove(const std::string& word);
void RetrieveAll(std::vector<std::string>& words);
void RetrieveAll(char** words);

private:
DictionaryTrie* const m_DictRoot;
};

#endif

//-----------------------------------------------------------------------------------------------------------
// Wrapper class Dictionary based on DictionaryTrie
// DictionaryImpl.cxx
#include "StdAfx.h"
#include "DictionaryImpl.h"

#include "DictionaryTrie.h"

DictionaryImpl::DictionaryImpl()
: m_DictRoot(new DictionaryTrie())
{
}

DictionaryImpl::~DictionaryImpl()
{
DestoryDictionaryTrie(m_DictRoot);
}

void DictionaryImpl::Add(const std::string& word)
{
AddWord(m_DictRoot, word.c_str());
}

void DictionaryImpl::Query(const std::string& word, std::vector<std::string>& words)
{
std::vector<std::string>().swap(words);
QueryWord(m_DictRoot, word.c_str(), words);
}

void DictionaryImpl::Query(const std::string& word, char** words)
{
QueryWord(m_DictRoot, word.c_str(), words);
}

void DictionaryImpl::Remove(const std::string& word)
{
RemoveWord(m_DictRoot, word.c_str());
}

void DictionaryImpl::RetrieveAll(std::vector<std::string>& words)
{
std::vector<std::string>().swap(words);
TraverseDictionaryTrie(m_DictRoot, words);
}

void DictionaryImpl::RetrieveAll(char** words)
{
TraverseDictionaryTrie(m_DictRoot, words);
}

//-----------------------------------------------------------------------------------------------------------
// main starts
// Dictionary.cxx
// Dictionary.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"

#include "DictionaryImpl.h"

#include <boost\shared_ptr.hpp>

#include <algorithm>
#include <iostream>
#include <iterator>

void InitDictWords(DictionaryImpl& dict)
{
dict.Add("hello");
dict.Add("world");
}

void DisplayWords(std::ostream& outStream,
const std::string& prefix,
const std::string& word,
const std::vector<std::string>& result)
{
outStream << prefix << word << std::endl;
std::copy(result.begin(),
result.end(),
std::ostream_iterator<std::string>(outStream, "\n"));
}

int _tmain(int argc, _TCHAR* argv[])
{
DictionaryImpl dict;

InitDictWords(dict);

std::cout << "Welcome to Dictionary Program" << std::endl;
std::string input;
while (true) {
std::cin >> input;
if (*input.rbegin() == '\?') {
std::string word(input.begin(), input.end() - 1);
std::vector<std::string> result;
if (word.empty()) {
dict.RetrieveAll(result);
DisplayWords(std::cout, "PRINT ALL WORDS: ", "", result);
} else {
dict.Query(word, result);
if (result.empty()) {
std::cout << word << " NOT FOUND" << std::endl;
} else {
DisplayWords(std::cout, "MATCH FOUND: ", word, result);
}
}
}
else if (*input.rbegin() == '\-') {
std::string word(input.begin(), input.end() - 1);
if (word.empty()) {
break;
} else {
std::vector<std::string> result;
dict.RetrieveAll(result);
DisplayWords(std::cout, "BEFORE REMOVE: ", word, result);
dict.Remove(word);
dict.RetrieveAll(result);
DisplayWords(std::cout, "AFTER REMOVE: ", word, result);
}
}
else {
std::vector<std::string> result;
dict.RetrieveAll(result);
DisplayWords(std::cout, "BEFORE ADD: ", input, result);
dict.Add(input);
dict.RetrieveAll(result);
DisplayWords(std::cout, "AFTER ADD: ", input, result);
}
}

return 0;
}

Bibliography:
[1] http://en.wikipedia.org/wiki/Trie
[2] Glenn W. Rowe, "Introduction to Data Structure and Algorithms with C++", Prentice Hall, 1997