1. Problem
Continuing with the investigation on how data structure impacts on performance, another interesting points has been discovered. The access via array/pointer is much much faster than the access via std::vector. I believe that this point is compiler-dependent and the performance issue is reported on Microsoft MSVC 2005.
Please check the two assembly generated by two implementations. The huge difference can easily found on very simple 3 lines of codes,
//********************************************************************************
// m_Var is in std::vector type
// hidden is a pointer type and an array is passed to it.
// if (m_Var[*(jclist+iwork[i]) - 1].m_Hidden) {
if (hidden[*(jclist+iwork[i]) - 1]) {
goto NEXT_VARIABLE;
}
//********************************************************************************
Assembly 1 is the assembly generated for the pointer/array access and Assembly 2 is the assembly generated for std::vector access. The reason why this code is bloated is because MSVC compiler enables the dynamic sanity checking (out of bound) on std::vector.
// Assembly 1: Pointer/Array Access
//********************************************************************************
// if (m_Var[*(jclist+iwork[i]) - 1].m_Hidden) {
if (hidden[*(jclist+iwork[i]) - 1]) {
0040306F mov esi,dword ptr [esp+14h]
00403073 mov ecx,dword ptr [esi+ecx*4]
00403076 mov esi,dword ptr [esp+1Ch]
0040307A cmp byte ptr [ecx+esi-1],0
0040307F jne NEXT_VARIABLE (403050h)
goto NEXT_VARIABLE;
}
//********************************************************************************
// Assembly 2: std::vector Access
//********************************************************************************
// if (hidden[*(jclist+iwork[i]) - 1]) {
if (m_Var[*(jclist+iwork[i]) - 1].m_Hidden) {
00403119 mov edx,dword ptr [esp+28h]
0040311D mov ebx,dword ptr [edx+eax*4]
00403120 mov eax,dword ptr [esp+24h]
00403124 mov esi,dword ptr [eax+2Ch]
00403127 mov eax,dword ptr [esi+0Ch]
0040312A mov edx,dword ptr [esi+10h]
0040312D mov ebp,eax
0040312F add edx,eax
00403131 sub ebx,1
00403134 cmp ebp,edx
00403136 jbe NEXT_VARIABLE+52h (403142h)
00403138 call dword ptr [__imp___invalid_parameter_noinfo (405118h)]
0040313E mov ecx,dword ptr [esp+3Ch]
00403142 mov eax,dword ptr [esi+0Ch]
00403145 mov edx,dword ptr [esi+10h]
00403148 add ebx,ebp
0040314A add edx,eax
0040314C cmp ebx,edx
0040314E ja NEXT_VARIABLE+64h (403154h)
00403150 cmp ebx,eax
00403152 jae NEXT_VARIABLE+6Eh (40315Eh)
00403154 call dword ptr [__imp___invalid_parameter_noinfo (405118h)]
0040315A mov ecx,dword ptr [esp+3Ch]
0040315E mov eax,dword ptr [esi+10h]
00403161 add eax,dword ptr [esi+0Ch]
00403164 cmp ebx,eax
00403166 jb NEXT_VARIABLE+82h (403172h)
00403168 call dword ptr [__imp___invalid_parameter_noinfo (405118h)]
0040316E mov ecx,dword ptr [esp+3Ch]
00403172 mov eax,dword ptr [esi+8]
00403175 cmp eax,ebx
00403177 ja NEXT_VARIABLE+8Bh (40317Bh)
00403179 sub ebx,eax
0040317B mov edx,dword ptr [esi+4]
0040317E mov eax,dword ptr [edx+ebx*4]
00403181 cmp byte ptr [eax+11h],0
00403185 jne NEXT_VARIABLE (4030F0h)
goto NEXT_VARIABLE;
}
//********************************************************************************
2 Fixes:
This dynamic sanity checking can be disabled by setting _SECURE_SCL and _HAS_ITERATOR_DEBUGGING as 0. These options have been set as default 0 in release mode since Visual Studio 2008 and later. But unfortunately in my case I have to use older version of MSVC, because the performance regression happens between releases and releases. Some releases were done many years ago via MSVC 2005.
As BOOST C++ library suggested that the performance of STL is implementation dependent. For Microsoft MSVC disable dynamic sanity checking as suggested above. Or choose stlport that outperforms Miscosoft MSVC.
3. Summary
For hot spot code area, using array and pointer (preventing from using feature-laden C++, STL or algorithm) is not a bad choice. In my case Fortran obviously outperforms the C++ implementation, because any overhead can be extra especially in hot spot code area.
In order to gain better performance and good maintainability at the same time, personally I believe using C to implement the algorithms and then using C++ to wrap them is not a bad choice.
Bibiliograhy:
[1] http://msdn.microsoft.com/en-us/library/hh697468.aspx
[2] http://cowboyprogramming.com/2007/02/22/what-is-_invalid_parameter_noinfo-and-how-do-i-get-rid-of-it/
[3] http://www.boost.org/doc/libs/1_50_0/libs/geometry/doc/html/geometry/compilation.html
[4] http://sourceforge.net/projects/stlport/
No comments:
Post a Comment