Friday 30 May 2014

Aggregates in C++11

One way to learn about aggregates is to start from the C++ standard, reading the definition and the clauses. Here I would like to go the other way: I will start from how aggregates are used and deduce, in reverse, the requirements defined in the standard.

1. Aggregate in C++03 Standard
C++03 inherits C-style initializer lists. An initializer list can be applied to an aggregate, which means that an aggregate can be initialized directly via "{}", and its member variables are initialized in the order in which the values appear in the {}-list. This implies two requirements:
    - The object can be initialized statically, at compile time
    - The exact memory layout is known at compile time
Here I would like to show what exactly these two requirements mean in the C++ standard.

Example 1: initializer lists on classes (Here the term "classes" refers to class, struct and union, as in the standard.)
//********************************************************************************
struct Foo {
    int x;
    double y;
};

Foo f = {1, 10.0};

struct Bar {
    int a;
    double b;
    Foo f;
    double c;
};
Bar bar = {2, 3.0, {1, 10.0}, 2.0};
//********************************************************************************

In Example 1 "Bar bar" is initialized as:
    bar.a = 2
    bar.b = 3.0
    bar.f.x = 1
    bar.f.y = 10.0
    bar.c = 2.0
This tells us that sizeof(Bar) = sizeof(int) + sizeof(double) + sizeof(Foo) + sizeof(double), where sizeof(Foo) = sizeof(int) + sizeof(double), without considering memory padding and memory alignment (see my other blog entry in Performance - Running out of memory (heap)). The compiler has to know this at compile time. It also has to know the exact memory layout and member order of Bar, so that it knows which member variable of "Bar bar" receives which value.
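
The size arithmetic above can be checked with a small sketch. It reuses Foo and Bar from Example 1; the helper names are hypothetical and only serve the illustration. Because of padding, the sum of the member sizes is only a lower bound for sizeof(Bar), so the sketch asserts ">=" rather than "==".

```cpp
#include <cassert>
#include <cstddef>   // offsetof, std::size_t

struct Foo {
    int x;
    double y;
};

struct Bar {
    int a;
    double b;
    Foo f;
    double c;
};

// Hypothetical helper: sum of Bar's member sizes, ignoring padding.
std::size_t sumOfBarMembers()
{
    return sizeof(int) + sizeof(double) + sizeof(Foo) + sizeof(double);
}

// Hypothetical helper: true when the members sit in declaration order.
bool barMembersInDeclarationOrder()
{
    return offsetof(Bar, a) == 0
        && offsetof(Bar, a) < offsetof(Bar, b)
        && offsetof(Bar, b) < offsetof(Bar, f)
        && offsetof(Bar, f) < offsetof(Bar, c);
}

void checkBarLayout()
{
    // Padding can only add bytes, never remove them.
    assert(sizeof(Bar) >= sumOfBarMembers());
    assert(barMembersInDeclarationOrder());
}
```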

Let's think about what could affect the memory size and layout of a class in C++: virtual functions and inheritance.
A virtual function adds an entry, the virtual table pointer, to the memory of an object, which increases the size of the object. More importantly, the C++ standard does not define where this virtual table pointer is located in the memory of the object (it could sit at the start or anywhere else). Therefore there is no portable way to initialize an object via an initializer list if its class has virtual functions.
Inheritance also increases the size of derived classes. Normally the size of Derived is equal to the size of its own non-static member variables plus the size of Base. (This is not exactly true if a class is empty or has virtual functions.) In practice sizeof(Derived) >= sizeof(Base), and the "equal" case typically occurs when Derived adds no non-static data members and no virtual functions of its own. As with virtual functions, the C++ standard does not define where the Base part is placed in the memory layout of Derived. Therefore there is no portable way to initialize an object via an initializer list if its class has base classes.
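
A minimal sketch of the size effects described above. The struct names are made up for this illustration, and the strict ">" comparisons are what mainstream compilers produce; the standard itself does not guarantee them.

```cpp
#include <cassert>

struct Plain       { int x; };
struct WithVirtual { int x; virtual ~WithVirtual() {} };

struct Base           { int b; };
struct Derived : Base { int d; };

void checkSizes()
{
    // The hidden virtual table pointer makes the polymorphic class larger
    // (typical behaviour, not mandated by the standard).
    assert(sizeof(WithVirtual) > sizeof(Plain));

    // Derived carries the Base part plus its own member.
    assert(sizeof(Derived) > sizeof(Base));
}
```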

Here we can conclude that an aggregate cannot have any C++ feature that increases its memory size or affects the order of its memory layout. In other words, an aggregate can use the features/qualifications of C++ as long as they have no impact on the object's memory.
    - No virtual functions
    - No base classes
    - Can have as many static (public/protected/private) member variables as you like.
    - Can have as many (public/protected/private) functions as you like.

One more thing to keep in mind is that an initializer list initializes the member variables of an aggregate directly. This implies that all the member variables have to be visible/accessible from outside by anyone, which means that all the non-static member variables of an aggregate class have to be "public". (Static member variables take no memory in the objects, because they reside in the global/static data section of memory and are shared by all the objects. They can be accessed directly through the class name plus "::".)
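
To make the point concrete, here is a small sketch of a class that keeps its non-static data public but still has static members (public and private) and member functions. Point and its members are hypothetical names for illustration. The class can still be initialized with "{}", and on common platforms its size is just the two ints.

```cpp
#include <cassert>

struct Point {
    int x;
    int y;

    static int liveCount;                 // public static: stored outside the objects
    int sum() const { return x + y; }     // non-virtual function: no per-object storage

private:
    static int hiddenCount;               // private static is fine as well
    void touch() const {}                 // so is a private member function
};

int Point::liveCount = 0;
int Point::hiddenCount = 0;

int sumOfBraceInitializedPoint()
{
    Point p = {3, 4};                     // x = 3, y = 4, in declaration order
    // Holds on common platforms: statics and functions add nothing to the object.
    assert(sizeof(Point) == 2 * sizeof(int));
    return p.sum();
}
```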

Example 2: initializer-lists on array
//********************************************************************************
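// schematic only: n is the array size, m is the number of initializers
// user-defined type (Foo has to be a complete type at this point)
Foo fooArr[n] = {F1, F2, ..., Fm};
// built-in type
int a[n] = {X1, X2, X3, ..., Xm};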
//********************************************************************************

Where F1, F2, ..., Fm are instances of Foo and X1, X2, ..., Xm are all values of the built-in type "int". Here are the 3 possible relationships between n and m:
    n = m: each element is initialized as specified
    n > m: the first m elements are initialized as specified and the rest are initialized to their default value (zero for built-in types)
    n < m: compilation error, flagged by the compiler
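
The three cases can be tried out with built-in ints. This is only a sketch; the helper functions are hypothetical names I introduce for the illustration.

```cpp
#include <cassert>

// Hypothetical helper: adds up the first n elements of an int array.
int sumOf(const int* values, int n)
{
    int total = 0;
    for (int i = 0; i < n; ++i) {
        total += values[i];
    }
    return total;
}

int sumFull()       { int a[3] = {1, 2, 3}; return sumOf(a, 3); }  // n == m
int sumPartial()    { int a[5] = {1, 2};    return sumOf(a, 5); }  // n > m: a[2..4] are 0
int lastOfPartial() { int a[5] = {1, 2};    return a[4]; }         // one of the defaulted elements

// int tooMany[2] = {1, 2, 3};   // n < m: rejected by the compiler

void checkArrayCases()
{
    assert(sumFull() == 6);
    assert(sumPartial() == 3);
    assert(lastOfPartial() == 0);
}
```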

As shown in the (n > m) case, the remaining (n - m) objects are initialized as default objects. This requires the element class to provide a default constructor. It also implies that an aggregate class cannot have a user-declared constructor: the C++ standard says that any user-declared constructor suppresses the implicit default constructor, and C++03 excludes every class with a user-declared constructor from being an aggregate. Because omitted members are initialized without any arguments, every member that is left out of the {}-list must have a usable default value: built-in types and nested aggregates are zero-initialized, and class-type members need an accessible default constructor. Keep in mind that a C++ reference has no default value, so a reference member of an aggregate can never be omitted from the {}-list; it must always be given an explicit initializer.
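
The same "rest gets the default value" rule applies to the members of an aggregate class and to arrays of aggregates. A minimal sketch, reusing Foo from Example 1 (the function names are hypothetical):

```cpp
#include <cassert>

struct Foo {
    int x;
    double y;
};

// Hypothetical helper: returns a Foo whose second member was left out of the list.
Foo halfInitialized()
{
    Foo half = {5};               // y is omitted and becomes 0.0
    return half;
}

// Hypothetical helper: returns the last element of a partially initialized array.
Foo lastOfPartialArray()
{
    Foo arr[3] = { {1, 10.0} };   // n = 3, m = 1: arr[1] and arr[2] are {0, 0.0}
    assert(arr[0].x == 1 && arr[0].y == 10.0);
    return arr[2];
}
```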

Example 3: arrays with non-aggregate
//********************************************************************************
class Foo {
public:
    Foo(int x) : m_x(x) {} // Suppresses the default constructor and
                           // therefore Foo is not an aggregate
    int m_x;
};

Foo arrFoo1[] = {Foo(1), Foo(2), Foo(3)}; // ok
Foo arrFoo2[3] = {Foo(1), Foo(2), Foo(3)}; // ok
Foo arrFoo3[3] = {Foo(1)}; // Not ok: the last two elements need a default constructor
//********************************************************************************

Keep in mind that all arrays in C++03 are aggregates, but not every initialization of them is legal. In Example 3, Foo is not an aggregate class because it has a user-defined constructor. All the array initializations are legal except arrFoo3, which is the (n > m) case shown in Example 2: the remaining (3 - 1 = 2) elements have to be initialized to the default value, and this is illegal because Foo does not provide a default constructor.

Here is the list of things worth keeping in mind about aggregates in C++03:
    - No virtual functions
    - No base classes
    - No user-declared constructors (implicit default constructor only); a user-declared copy constructor counts as a constructor too
    - No limitation on the assignment operator and destructor (user-defined allowed)
    - Reference members are allowed, but they must always be given an initializer in the {}-list
    - Member variables can be built-in types, other aggregates or class types; any member omitted from the {}-list must be default-constructible
    - All non-static member variables are public
    - Any public/protected/private static variables
    - Any public/protected/private static/non-static functions
    - Any array is an aggregate
    - Arrays can hold both aggregate and non-aggregate elements
    - In the (n > m) case shown in Example 2, the element type must be default-constructible

2. Improvements in the C++11 Standard
There is no significant change to aggregates in C++11. However, some features newly introduced in C++11 relax the requirements/definition of an aggregate.

Feature 1: explicitly defaulted member function
For more details about this feature, please refer to my other blog entry, Explicitly defaulted/deleted member functions.
This is not really an improvement. It simply provides a notation for explicitly declaring that the default constructor generated by the compiler is used.

Example 4
//********************************************************************************
// C++11
class Foo {
public:
    Foo() = default;
    int m_x;
};
//********************************************************************************

C++11 allows "= default" to declare explicitly that Foo uses the compiler-generated default constructor. In C++11 Foo is still an aggregate, because the constructor is not user-provided. In C++03, however, any declaration/definition of a constructor prevents a class from being an aggregate.

Feature 2: default value for class member variables
For more details about this feature, please refer to my other blog entry, Improvement on object construction.

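This is a significant improvement of C++11 over C++03. It allows member variables to have default values other than zero. In Example 1, the default values of the member variables of Foo and Bar are {0, 0.0} and {0, 0.0, {0, 0.0}, 0.0} when the objects are value-initialized. Note that under the C++11 rules a class with default member initializers no longer counts as an aggregate; C++14 lifted this restriction.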

Example 5: default values in C++11
//********************************************************************************
struct Foo {
    int x = 1;
    double y = 1.0;
};

struct Bar {
    int a = 10;
    double b = 10.0;
    Foo f = {2, 20.0};   // needs C++14: with default member initializers, Foo is not an aggregate in C++11
    double c = 30.0;
};

Bar barArr[3];
//********************************************************************************

barArr holds 3 Bar objects, each with the default value {10, 10.0, {2, 20.0}, 30.0} as specified in the declaration. This saves a lot of the time/code that C++03 needs to re-initialize the objects to different values.
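
Here is a runnable variant of the idea, written so that it also compiles in strict C++11 mode: the nested Foo member simply picks up Foo's own defaults instead of being brace-initialized. The function name is hypothetical, and this is a sketch rather than a replacement for Example 5.

```cpp
#include <cassert>

struct Foo {
    int x = 2;
    double y = 20.0;
};

struct Bar {
    int a = 10;
    double b = 10.0;
    Foo f;               // uses Foo's default member initializers
    double c = 30.0;
};

// Hypothetical helper: true when every element carries the declared defaults.
bool allElementsHaveDefaults()
{
    Bar barArr[3];       // no initializer list needed
    for (int i = 0; i < 3; ++i) {
        const Bar& bar = barArr[i];
        if (bar.a != 10 || bar.b != 10.0 ||
            bar.f.x != 2 || bar.f.y != 20.0 || bar.c != 30.0) {
            return false;
        }
    }
    return true;
}

void checkDefaults()
{
    assert(allElementsHaveDefaults());
}
```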

Bibliography:
[1] C++03 Standard
[2] C++11 Standard
[3] http://www.stroustrup.com/C++11FAQ.html
[4] N2640 by Jason Merrill and Daveed Vandevoorde
[5] http://en.wikipedia.org/wiki/C++11

Tuesday 27 May 2014

C++11 - Runtime Type Information (RTTI)

RTTI refers to the facility defined by the C++ standard that returns an object's type information at runtime. The C++11 standard provides the definitions in <typeinfo>, which includes 3 parts:
- class std::type_info
- class std::bad_cast
- class std::bad_typeid

1. std::type_info
For more details, refer to the C++11 standard.
//********************************************************************************
namespace std {
class type_info {
public:
    virtual ~type_info();
    bool operator==(const type_info& rhs) const noexcept;
    bool operator!=(const type_info& rhs) const noexcept;
    bool before(const type_info& rhs) const noexcept;
    size_t hash_code() const noexcept;
    const char* name() const noexcept;
    type_info(const type_info& rhs) = delete; // cannot be copied
    type_info& operator=(const type_info& rhs) = delete; // cannot be copied
};
}
//********************************************************************************

std::type_info is the type of the result of the typeid() operator. The result is an lvalue of static type const std::type_info. Its dynamic type may be std::type_info itself or an implementation-defined class derived from it.
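
A short sketch of how the members in the synopsis above are typically used. Widget and the helper names are hypothetical. Since the string returned by name() is implementation-defined (often a mangled name), the sketch does not compare it against anything.

```cpp
#include <cassert>
#include <typeinfo>

struct Widget {};

// Hypothetical helper: true when both type_info objects describe the same type.
bool describeSameType(const std::type_info& a, const std::type_info& b)
{
    // type_info objects that compare equal must also agree on hash_code().
    return a == b && a.hash_code() == b.hash_code();
}

void checkTypeInfo()
{
    // The result of typeid can only be bound by reference:
    // the copy constructor and copy assignment are deleted.
    const std::type_info& widgetInfo = typeid(Widget);

    assert(describeSameType(widgetInfo, typeid(Widget)));
    assert(widgetInfo != typeid(int));

    const char* name = widgetInfo.name();   // implementation-defined text
    (void)name;
}
```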

2. Type identification: typeid()
Type identification can be applied to an object, an object reference, an object pointer or a type. cv-qualifiers do not affect its value. It returns a std::type_info or an object of one of its sub-classes.

Example 1: cv specifier and pointer/reference
//********************************************************************************
#include <iostream>
#include <typeinfo>

class Foo {
};

class Bar {
};

void test(void)
{
    Foo foo;
    const Foo fooConst = Foo();   // explicit initializer for the const object
    volatile Foo fooVolatile;
    const Foo& fooRef = fooConst;
    Foo* fooPtr = &foo;
    Bar bar;
    std::cout << (typeid(foo) == typeid(fooConst)) << std::endl;     // true
    std::cout << (typeid(foo) == typeid(fooVolatile)) << std::endl;  // true
    std::cout << (typeid(foo) == typeid(fooRef)) << std::endl;        // true
    std::cout << (typeid(foo) == typeid(fooPtr)) << std::endl;         // false
    std::cout << (typeid(foo) == typeid(*fooPtr)) << std::endl;       // true
    std::cout << (typeid(fooPtr) == typeid(fooRef)) << std::endl;   // false
    std::cout << (typeid(*fooPtr) == typeid(fooRef)) << std::endl; // true
    std::cout << (typeid(foo) == typeid(bar)) << std::endl;             // false
}
//********************************************************************************

Example 1 shows that const-volatile qualifiers do not affect the return value of typeid, and neither does a reference type. A pointer has to be de-referenced to obtain the type of the object it points to.

Example 2:  base and derived classes without virtual function
//********************************************************************************
class Base {
};

class Derived : public Base {
};

void test()
{
    Base b;
    Derived d;
    Base& bRef = d;
    Base* bPtr = &d;
    std::cout << (typeid(b) == typeid(d)) << std::endl;              // false
    std::cout << (typeid(bRef) == typeid(d)) << std::endl;         // false
    std::cout << (typeid(bPtr) == typeid(d)) << std::endl;          // false
    std::cout << (typeid(*bPtr) == typeid(d)) << std::endl;        // false
    std::cout << (typeid(bRef) == typeid(bPtr)) << std::endl;     // false
    std::cout << (typeid(bRef) == typeid(*bPtr)) << std::endl;   // true
}
//********************************************************************************

Example 3: base and derived classes with virtual function
//********************************************************************************
#include <string>   // std::string used by GetName()

class Base1 {
public:
    virtual std::string GetName() {
        return "Base1";
    }
};

class Derived1 : public Base1 {
public:
    virtual std::string GetName() {
        return "Derived1";
    }
};

void test()
{
    Base1 b1;
    Derived1 d1;
    Base1& b1Ref = d1;
    Base1* b1Ptr = &d1;
    std::cout << (typeid(b1) == typeid(d1)) << std::endl;              // false
    std::cout << (typeid(b1Ref) == typeid(d1)) << std::endl;         // true
    std::cout << (typeid(b1Ptr) == typeid(d1)) << std::endl;          // false
    std::cout << (typeid(*b1Ptr) == typeid(d1)) << std::endl;        // true
    std::cout << (typeid(b1Ref) == typeid(b1Ptr)) << std::endl;     // false
    std::cout << (typeid(b1Ref) == typeid(*b1Ptr)) << std::endl;   // true
}
//********************************************************************************

Example 2 and Example 3 show the importance of virtual functions in inheritance. Without a virtual function, typeid reports the static type of the expression; with one, it reports the dynamic type of the object. A virtual function affects the memory layout of an object, and the information reachable through the virtual table is accessed when the typeid() operator is called. Please refer to my other blog entries, virtual table and (pure) virtual function.
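
One consequence worth seeing in code: as far as I understand the standard, the operand of typeid is only evaluated at runtime when it is a glvalue of a polymorphic class type. The sketch below counts how often the operand expression really runs; all names are hypothetical, and some compilers emit a warning about the side effect in the polymorphic case.

```cpp
#include <cassert>
#include <typeinfo>

struct Plain {};
struct PlainChild : Plain {};

struct Poly { virtual ~Poly() {} };
struct PolyChild : Poly {};

int calls = 0;   // counts how often an operand expression is really evaluated

Plain& plainRef() { static PlainChild obj; ++calls; return obj; }
Poly&  polyRef()  { static PolyChild obj;  ++calls; return obj; }

int callsForPlainOperand()
{
    calls = 0;
    // Non-polymorphic: the static type Plain is used, plainRef() never runs.
    bool isStaticType = (typeid(plainRef()) == typeid(Plain));
    assert(isStaticType);
    return calls;
}

int callsForPolyOperand()
{
    calls = 0;
    // Polymorphic: polyRef() runs and the dynamic type PolyChild is found.
    bool isDynamicType = (typeid(polyRef()) == typeid(PolyChild));
    assert(isDynamicType);
    return calls;
}
```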

3. dynamic_cast and std::bad_cast
Please refer to my other blog entry, Casting in C++.

4. Exceptions: std::bad_typeid and std::bad_cast
std::bad_typeid is thrown when typeid() is applied to a de-referenced null pointer of a polymorphic class type. std::bad_cast is discussed in my other blog entry, Casting in C++.

//********************************************************************************
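#include <typeinfo>

class Foo {
public:
    virtual ~Foo() {}          // Foo must be polymorphic for typeid to inspect the object
};

void test()
{
    Foo* fooPtr = nullptr;
    typeid(fooPtr);            // does not throw: yields the type_info of Foo*
    typeid(*fooPtr);           // throws std::bad_typeid
}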
//********************************************************************************
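
For completeness, a minimal sketch of catching the exception. Shape and the function names are hypothetical. The exception is only raised for the de-referenced null pointer of a polymorphic type; asking for the type of the pointer itself is always safe.

```cpp
#include <cassert>
#include <typeinfo>

struct Shape {
    virtual ~Shape() {}   // polymorphic, so typeid(*p) inspects the object
};

// Hypothetical helper: true when typeid on a null Shape* de-reference throws.
bool throwsOnNullDereference()
{
    Shape* p = nullptr;
    try {
        const std::type_info& info = typeid(*p);
        (void)info;
    } catch (const std::bad_typeid&) {
        return true;
    }
    return false;
}

// Hypothetical helper: typeid of the pointer itself never touches the object.
bool pointerTypeIsSafe()
{
    Shape* p = nullptr;
    return typeid(p) == typeid(Shape*);
}

void checkBadTypeid()
{
    assert(throwsOnNullDereference());
    assert(pointerTypeIsSafe());
}
```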

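5. Calling in constructor and destructor
The C++11 standard allows dynamic_cast and typeid() inside a constructor or destructor, but there they treat the object as an instance of the class whose constructor or destructor is running, not of the most derived class. The behavior is undefined if the operand refers to the object under construction or destruction and its static type is neither that class nor one of its bases, and it is also undefined if they are applied to an object whose lifetime has not started yet or has already ended. This is because the virtual table is not completely set up before the constructor finishes, and it is rolled back step by step as the destructors run. Please refer to my other blog entry, The order of object initialization/destruction.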
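
A small sketch of what typeid sees while an object is still being built, based on my reading of the construction/destruction rules in the standard. Base, Derived and the flag members are hypothetical names for this illustration.

```cpp
#include <cassert>
#include <typeinfo>

struct Base {
    bool sawBaseInCtor;
    Base() : sawBaseInCtor(false)
    {
        // While Base() runs, the object is still "only" a Base.
        sawBaseInCtor = (typeid(*this) == typeid(Base));
    }
    virtual ~Base() {}
};

struct Derived : Base {
    bool sawDerivedInCtor;
    Derived() : sawDerivedInCtor(false)
    {
        // Once Derived() runs, the Derived part of the virtual table is in place.
        sawDerivedInCtor = (typeid(*this) == typeid(Derived));
    }
};

void checkTypeidDuringConstruction()
{
    Derived d;
    assert(d.sawBaseInCtor);
    assert(d.sawDerivedInCtor);

    // After construction, the complete object reports its most derived type.
    Base& asBase = d;
    assert(typeid(asBase) == typeid(Derived));
}
```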

6. dynamic_cast vs. typeid
typeid and std::bad_typeid are rarely used in C++ programming, while dynamic_cast has to be used on rare occasions. (If either of them is used very often, it indicates a bad software design.) One example is finding out the exact type held in a boost::any.
Performance-wise, dynamic_cast and typeid both have to access the virtual table, so each of them costs at least a pointer de-reference to the virtual table. Beyond that the cost depends on the implementation: a typeid comparison checks for one exact type, while dynamic_cast may have to walk the inheritance hierarchy.

//********************************************************************************
#include <typeinfo>

class Base {
public:
    virtual ~Base() {}   // with virtual functions
};
class Derived : public Base {
    // with virtual functions
};

void test()
{
    Base* bPtr = new Derived;

    // typeid + static_cast
    if (typeid(*bPtr) == typeid(Derived)) {
        Derived* dPtr = static_cast<Derived*>(bPtr);
        // do stuff with dPtr
        (void)dPtr;
    }

    // dynamic casting
    Derived* dPtr = dynamic_cast<Derived*>(bPtr);
    if (dPtr) {
        // do stuff with dPtr
    }

    delete bPtr;
}
//********************************************************************************

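These two solutions behave the same here, where the dynamic type of the object is exactly Derived, and they cost about the same. They are not interchangeable in general: the typeid check fails for classes derived from Derived, while dynamic_cast succeeds for them.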
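
The two checks can give different answers once a third class enters the hierarchy. The sketch below assumes a hypothetical MoreDerived class; the helper names are mine as well.

```cpp
#include <cassert>
#include <typeinfo>

struct Base { virtual ~Base() {} };
struct Derived : Base {};
struct MoreDerived : Derived {};

// typeid matches only when the dynamic type is exactly Derived.
bool isExactlyDerived(Base& b)
{
    return typeid(b) == typeid(Derived);
}

// dynamic_cast also succeeds for anything that inherits from Derived.
bool isDerivedOrBelow(Base& b)
{
    return dynamic_cast<Derived*>(&b) != nullptr;
}

void checkBothApproaches()
{
    Base base;
    Derived derived;
    MoreDerived moreDerived;

    assert(!isExactlyDerived(base)        && !isDerivedOrBelow(base));
    assert( isExactlyDerived(derived)     &&  isDerivedOrBelow(derived));
    assert(!isExactlyDerived(moreDerived) &&  isDerivedOrBelow(moreDerived));
}
```

Which one fits depends on the intent: typeid suits an exact-type check, while dynamic_cast suits "is this usable as a Derived".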

Bibliography:
[1] C++ 11 Standard