Tuesday 8 July 2014

C++ Performance - Array vs. std::vector

1. Problem
Continuing with the investigation on how data structure impacts on performance, another interesting points has been discovered. The access via array/pointer is much much faster than the access via std::vector. I believe that this point is compiler-dependent and the performance issue is reported on Microsoft MSVC 2005.

Please check the two assembly generated by two implementations. The huge difference can easily found on very simple 3 lines of codes,

//********************************************************************************
// m_Var is in std::vector type
// hidden is a pointer type and an array is passed to it.
    // if (m_Var[*(jclist+iwork[i]) - 1].m_Hidden) {
    if (hidden[*(jclist+iwork[i]) - 1]) {
        goto NEXT_VARIABLE;
    }
//********************************************************************************

Assembly 1 is the assembly generated for the pointer/array access and Assembly 2 is the assembly generated for std::vector access. The reason why this code is bloated is because MSVC compiler  enables the dynamic sanity checking (out of bound) on std::vector.

// Assembly 1:  Pointer/Array Access
//********************************************************************************
    // if (m_Var[*(jclist+iwork[i]) - 1].m_Hidden) {
    if (hidden[*(jclist+iwork[i]) - 1]) {
0040306F  mov         esi,dword ptr [esp+14h]
00403073  mov         ecx,dword ptr [esi+ecx*4]
00403076  mov         esi,dword ptr [esp+1Ch]
0040307A  cmp         byte ptr [ecx+esi-1],0
0040307F  jne         NEXT_VARIABLE (403050h)
        goto NEXT_VARIABLE;
    }
//********************************************************************************

// Assembly 2: std::vector Access
//********************************************************************************
    // if (hidden[*(jclist+iwork[i]) - 1]) {
    if (m_Var[*(jclist+iwork[i]) - 1].m_Hidden) {
00403119  mov         edx,dword ptr [esp+28h]
0040311D  mov         ebx,dword ptr [edx+eax*4]
00403120  mov         eax,dword ptr [esp+24h]
00403124  mov         esi,dword ptr [eax+2Ch]
00403127  mov         eax,dword ptr [esi+0Ch]
0040312A  mov         edx,dword ptr [esi+10h]
0040312D  mov         ebp,eax
0040312F  add         edx,eax
00403131  sub         ebx,1
00403134  cmp         ebp,edx
00403136  jbe         NEXT_VARIABLE+52h (403142h)
00403138  call        dword ptr [__imp___invalid_parameter_noinfo (405118h)]
0040313E  mov         ecx,dword ptr [esp+3Ch]
00403142  mov         eax,dword ptr [esi+0Ch]
00403145  mov         edx,dword ptr [esi+10h]
00403148  add         ebx,ebp
0040314A  add         edx,eax
0040314C  cmp         ebx,edx
0040314E  ja          NEXT_VARIABLE+64h (403154h)
00403150  cmp         ebx,eax
00403152  jae         NEXT_VARIABLE+6Eh (40315Eh)
00403154  call        dword ptr [__imp___invalid_parameter_noinfo (405118h)]
0040315A  mov         ecx,dword ptr [esp+3Ch]
0040315E  mov         eax,dword ptr [esi+10h]
00403161  add         eax,dword ptr [esi+0Ch]
00403164  cmp         ebx,eax
00403166  jb          NEXT_VARIABLE+82h (403172h)
00403168  call        dword ptr [__imp___invalid_parameter_noinfo (405118h)]
0040316E  mov         ecx,dword ptr [esp+3Ch]
00403172  mov         eax,dword ptr [esi+8]
00403175  cmp         eax,ebx
00403177  ja          NEXT_VARIABLE+8Bh (40317Bh)
00403179  sub         ebx,eax
0040317B  mov         edx,dword ptr [esi+4]
0040317E  mov         eax,dword ptr [edx+ebx*4]
00403181  cmp         byte ptr [eax+11h],0
00403185  jne         NEXT_VARIABLE (4030F0h)
        goto NEXT_VARIABLE;
    }
//********************************************************************************

2 Fixes:
This dynamic sanity checking can be disabled by setting _SECURE_SCL and _HAS_ITERATOR_DEBUGGING as 0. These options have been set as default 0 in release mode since Visual Studio 2008 and later. But unfortunately in my case I have to use older version of MSVC, because the performance regression happens between releases and releases. Some releases were done many years ago via MSVC 2005.

As BOOST C++ library suggested that the performance of STL is implementation dependent. For Microsoft MSVC disable dynamic sanity checking as suggested above. Or choose stlport that outperforms Miscosoft MSVC.

3. Summary
For hot spot code area, using array and pointer (preventing from using feature-laden C++, STL or algorithm) is not a bad choice. In my case Fortran obviously outperforms the C++ implementation, because any overhead can be extra especially in hot spot code area.

In order to gain better performance and good maintainability at the same time, personally I believe using C to implement the algorithms and then using C++ to wrap them is not a bad choice.

Bibiliograhy:
[1] http://msdn.microsoft.com/en-us/library/hh697468.aspx
[2] http://cowboyprogramming.com/2007/02/22/what-is-_invalid_parameter_noinfo-and-how-do-i-get-rid-of-it/
[3] http://www.boost.org/doc/libs/1_50_0/libs/geometry/doc/html/geometry/compilation.html
[4] http://sourceforge.net/projects/stlport/

No comments:

Post a Comment