Friday 14 March 2014

Performance - Running out of stack

When a new process starts, memory is allocated for it to run, including a stack and a heap. Allocating on the stack is much faster than allocating on the heap (heap memory is obtained via new/malloc); the stack holds automatic variables and supports function calls. Picture the memory as a belt: in the classic layout the stack and the heap sit at opposite ends, and as more objects are created they grow towards each other. In most cases the stack has a default size. (You may be able to readjust the split between stack and heap; this problem comes up often in the embedded world.) As the program runs, it creates more objects and calls more functions, so it may run out of stack and then crash.

- What can cause stack overflow?
1. Too many or too large automatic objects created on the stack. For instance, try
    void foo() {
         int myArray[100000000];
         memset(&myArray[0], 0x00, sizeof(myArray));   // memset(dest, value, size); needs <cstring>
    }
    This will cause a stack overflow because the automatic array (about 400 MB with a 4-byte int) is far larger than the stack. Note that the array has to be declared inside a function, any function. If you declare it as a global variable instead, it takes space in the global/static memory area rather than on the stack.
2. Deep function calling
    This has to do with how function calls work. Say we have a call graph A->B->C....... When A calls B, all the automatic variables of A (those declared before the call) are already on the stack, and the program state (general-purpose registers, the program counter, status registers and so on) is pushed onto the stack as well. Then come the automatic variables of B and its program state, then C...... At a certain depth the stack is exhausted and the overflow happens.
    It is vital to check the maximal function call depth (or the stack size) in embedded applications, because with no OS, or only a cut-down one, the split between stack and heap is static. On a modern PC, by contrast, the boundary of the stack can move and the stack can grow, up to a limit set by the OS and as long as it does not run into the heap section.
3. Infinite function calling - recursive function calling
    void foo() {
        foo();
    }
    If infinite function calling exists, a stack overflow is all but guaranteed (unless the compiler optimizes the call away). Most of the time it happens in a recursive function implementation. This is why opinions on recursive functions are divided. On one hand, recursion often expresses the idea/algorithm more clearly than a loop; on the other hand, it runs the risk of deep or infinite function calling, which crashes the program. The worst part is that it may run fine with the current arguments and then crash once the problem grows beyond a certain size.
    This is why recursive functions are banned (or at least not recommended) in many embedded applications. In PC applications recursion is acceptable as long as it does not run the risk of infinite or excessively deep recursion.
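
To make the three causes above a bit more concrete, here is a minimal sketch. All the names in it (bigBufferOnHeap, deepestLocal, stackUsed, sumIterative) are made up for illustration. It keeps a large buffer off the stack, estimates how much stack a chain of nested calls takes by comparing the addresses of local variables, and shows a loop that replaces a recursive sum. The address comparison is only a rough measurement and depends on the compiler and its options.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Cause 1: a large buffer. std::vector stores its elements on the heap,
// so only a small control object sits on the stack.
std::size_t bigBufferOnHeap(std::size_t count) {
    std::vector<int> myArray(count, 0);   // zero-filled, no memset needed
    return myArray.size();
}

// Causes 2 and 3: every call level keeps its own frame alive.
// Returns the address of a local variable at the deepest level.
std::uintptr_t deepestLocal(int depth) {
    volatile char marker = 0;
    std::uintptr_t here = reinterpret_cast<std::uintptr_t>(&marker);
    if (depth == 0) {
        return here;
    }
    std::uintptr_t below = deepestLocal(depth - 1);
    marker = 1;   // touched after the call, so this frame must stay alive
    return below;
}

// Rough number of stack bytes between the caller and the deepest level.
std::size_t stackUsed(int depth) {
    volatile char top = 0;
    std::uintptr_t a = reinterpret_cast<std::uintptr_t>(&top);
    std::uintptr_t b = deepestLocal(depth);
    return static_cast<std::size_t>(a > b ? a - b : b - a);
}

// The loop form of the recursive sum 1 + 2 + ... + n uses constant stack.
std::uint64_t sumIterative(std::uint64_t n) {
    std::uint64_t total = 0;
    for (std::uint64_t i = 1; i <= n; ++i) {
        total += i;
    }
    return total;
}
```

Calling stackUsed with growing depth values should show the number going up roughly in proportion to the depth, which is exactly the budget you have to check against the stack size on an embedded target.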

- Tips
1. Use tools to check stack usage, for instance Valgrind on Linux.
2. For self-study, run a program like
    #include <iostream>
    int main(int argc, char* argv[]) {
        int i = 0;
        int* iPtr = new int(1);
        std::cout << &i << std::endl;      // stack: the address of i
        std::cout << &iPtr << std::endl;   // stack: the address of iPtr
        std::cout << iPtr << std::endl;    // heap: where iPtr points
        delete iPtr;
        return 0;
    }

    Result: 0x0054FCEC
               0x0054FCE0
               0x0062BBE8
    Based on these values, you may be able to make a rough guess at the size of the stack. Normally the top few (most significant) bits differ between stack and heap addresses. In this run the heap address is the bigger number; the actual layout depends on the platform.
3. Use a safe, restricted subset of C/C++ in the embedded world
   The idea is not to use the heap at all, so that the program can give all the memory to the stack. Rather than creating objects dynamically on the heap, use global variables, or implement an object pool or a small memory manager whose memory resides on the stack. Then forget the heap and you have a fixed, bigger stack.
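   #include <new>   // placement new
   int main(int argc, char* argv[]) {
      alignas(int) char Mem[1<<12];                 // memory pool on the stack, aligned for int
      int* iPtr = new(&Mem[0]) int;                 // placement new; never pass stack memory to realloc()
      int* iPtr1 = new(&Mem[sizeof(int)]) int(4);   // next slot in the pool

      .......
   }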
This gains not only safety but also speed, at the cost of convenience. In C++, for a user-defined type created this way, you have to invoke the destructor explicitly to release its resources, and you must not call the delete operator on it.
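
As a rough illustration of the explicit destructor call, here is a small sketch. The Tracked type, its liveCount counter and the useStackPool function are hypothetical names invented for this example; the counter is only there so that the constructor and destructor calls can be observed.

```cpp
#include <cassert>
#include <new>   // placement new

// Hypothetical user-defined type; liveCount counts the objects currently alive.
struct Tracked {
    static int liveCount;
    int value;
    explicit Tracked(int v) : value(v) { ++liveCount; }
    ~Tracked() { --liveCount; }
};
int Tracked::liveCount = 0;

// Builds two objects inside a stack buffer, uses them, then destroys them by hand.
int useStackPool() {
    // Raw storage on the stack, aligned and sized for four Tracked objects.
    alignas(Tracked) unsigned char Mem[4 * sizeof(Tracked)];

    Tracked* a = new (&Mem[0]) Tracked(4);                 // first slot
    Tracked* b = new (&Mem[sizeof(Tracked)]) Tracked(5);   // second slot
    assert(Tracked::liveCount == 2);

    int sum = a->value + b->value;

    // Explicit destructor calls, in reverse order of construction.
    // No "delete": the storage belongs to Mem and goes away on return.
    b->~Tracked();
    a->~Tracked();
    return sum;
}
```

After useStackPool returns, liveCount is back to zero, which shows that both destructors ran even though delete was never called.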

 
