When using Google one rarely gets the right answer until one already knows the
right questions to ask ...
That 64bit runs slower than 32bit has always niggled me, since when moving up
from 8/16bit platforms to 32bit one always saw a substantial speed improvement.
I've been asking the wrong questions about the 'timing attack' used to decode
passwords, and the detailed analysis being supplied was convincing, but it's
based on a lower level mistake. The 'timing attack' only works if one is doing
byte-by-byte compares, and I'd not twigged that this was the mistake being made.
We tend to ignore just how something is achieved, so using memcmp() seems a
sensible low level function, but it has a time penalty when all one wants is
'equal or not'. It was not until I started to look at how it works that I found
all the explanations on how, having found a mismatch, it then works out which
character sorts lower to produce the '<' or '>' result, and that is where the
problem comes from. All the effort to produce a 'safe' comparison in other C
software aims to hide that timing difference? But why are we wasting time doing
all that work? What is needed initially is to simply drop memcmp() and switch to
a memequ() macro which bails as soon as a mismatch is found. It could have a
flag which scans the whole buffer before bailing if you want that extra
'safety'.
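Something along these lines is what I have in mind. memequ() is just my
suggested name, not an existing libc call, and it is sketched as functions here
rather than a macro, so treat it as a minimal sketch rather than tested code:

#include <stddef.h>

/* Byte-wise sketch: answers 'equal or not' and bails on the first
   mismatch instead of working out which byte sorts lower. */
static int memequ(const void *p1, const void *p2, size_t n)
{
    const unsigned char *a = p1, *b = p2;
    while (n--)
        if (*a++ != *b++)
            return 0;          /* bail out immediately */
    return 1;
}

/* The 'safety' flag variant: always scans the whole buffer and never
   branches on the data, so the run time does not depend on where the
   first mismatch sits. */
static int memequ_safe(const void *p1, const void *p2, size_t n)
{
    const unsigned char *a = p1, *b = p2;
    unsigned char diff = 0;
    while (n--)
        diff |= *a++ ^ *b++;   /* accumulate differences */
    return diff == 0;
}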
How often in the PHP code is memcmp() used when a simple memequ() would be
faster ... and faster still on 64bit machines?
There is a side question here as to why, with all the instructions available on
processors, there is not a 'byte compare' which produces the right set of flags
in a single processor cycle, but I suspect that fails as soon as you introduce
Unicode characters? memcmp()'s ordering result can be wrong once one switches to
a Unicode base such as UTF-16, where byte order and code-point order disagree,
while a memequ() would not have that problem, since equal buffers are equal in
any encoding?
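A minimal illustration of that ordering problem, assuming UTF-16LE text (the
characters here are just an example I picked, not anything from the PHP code):

#include <stdio.h>
#include <string.h>

int main(void)
{
    /* 'A' is U+0041, 'Ā' is U+0100: in code-point order A < Ā */
    unsigned char a[2] = { 0x41, 0x00 };  /* 'A' in UTF-16LE */
    unsigned char b[2] = { 0x00, 0x01 };  /* 'Ā' in UTF-16LE */

    /* Byte-wise ordering says a > b, the opposite of code-point
       order, but the equal/not-equal answer is still correct. */
    printf("memcmp order: %d\n", memcmp(a, b, 2));   /* positive */
    printf("equal? %s\n", memcmp(a, b, 2) == 0 ? "yes" : "no");
    return 0;
}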
Lester Caine wrote:
> That 64bit runs slower than 32bit has always niggled me
Only having had replies to this off-list, I thought I'd follow up since I have
been trying to do my homework. The last time I addressed this area it was moving
code from 16bit to 32bit on 80386 processors. I've just dug the books out ...
Optimising code to run on 64bit hardware is not simply increasing the word sizes
and checking that buffers are correctly aligned on 64bit boundaries; it is also
ensuring that processes can be optimised where appropriate. It may be that the
current view is that this is the compiler's problem? That seems to be the gist
of what I am being told, but the compilers can only work with what they are
supplied, and if the wrong process is selected they are not going to change it
to a better one. Setting a buffer to 7 bytes and then using the 'spare' byte for
something else, for example, or using a process that has byte-by-byte
requirements but does not actually then use the data generated.
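To make the 7-byte example concrete, here is a hypothetical sketch (the struct
and function names are mine, invented purely for illustration):

#include <stdint.h>
#include <string.h>

/* Packing a flag into the 8th byte means the key can no longer be
   compared as one 64bit word. */
struct packed_key {
    char key[7];     /* the data we actually want to compare     */
    char in_use;     /* 'spare' byte reused for something else   */
};

struct padded_key {
    char key[8];     /* 7 bytes of data, last byte always zero   */
};

/* packed: forced into a byte-limited compare */
static int packed_equal(const struct packed_key *a,
                        const struct packed_key *b)
{
    return memcmp(a->key, b->key, 7) == 0;
}

/* padded: free to compare a single 64bit word */
static int padded_equal(const struct padded_key *a,
                        const struct padded_key *b)
{
    uint64_t wa, wb;
    memcpy(&wa, a->key, 8);   /* alignment-safe word loads */
    memcpy(&wb, b->key, 8);
    return wa == wb;
}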
The timing attack analysis is a good example of this, and an area where a slight
change to the process in general may give speed improvements even on the 32bit
platform. memcpy()/strcpy() can be optimised by the compiler to use double-word
or quad-word machine code instructions, but only if there are no byte- or
word-restricted elements to the process. (I have not investigated IF compilers
do achieve the best result.) The timing attack analysis has shown that this is
the problem area, but by not using memcmp(), so that the largest word size can
be used for the comparison, the problem is managed better. Adding a conversion
to provide a -1/0/1 result effectively blocks optimising even if only a 0/1 is
finally used!
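That word-wide comparison might look something like this; again a sketch under
my own naming, with memcpy() loads used as one portable way to get
alignment-safe word reads that compilers usually fold into a single instruction:

#include <stddef.h>
#include <string.h>

/* Word-wide variant of the memequ() idea: compare size_t-sized chunks
   first, mop up the tail byte by byte.  Returns only 1 (equal) or 0
   (not equal), so there is never any need to work out which byte
   inside a mismatching word actually differs. */
static int memequ_word(const void *p1, const void *p2, size_t n)
{
    const unsigned char *a = p1, *b = p2;

    while (n >= sizeof(size_t)) {
        size_t wa, wb;
        memcpy(&wa, a, sizeof wa);
        memcpy(&wb, b, sizeof wb);
        if (wa != wb)
            return 0;           /* bail on the first differing word */
        a += sizeof(size_t);
        b += sizeof(size_t);
        n -= sizeof(size_t);
    }
    while (n--)                 /* remaining tail bytes */
        if (*a++ != *b++)
            return 0;
    return 1;
}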
The Windows platform should benefit from the same rationalisations; however, it
does seem as if the 64bit builds are designed to optimise memcmp() as long as
buffers are a multiple of 8 bytes. 32bit builds work the same as Linux ...
Certainly some other platforms will have different requirements to optimise, but
as part of the optimisation of 64bit support on Intel/AMD processors it would be
useful to identify those base level functions which would benefit from closer
inspection, rather than the simple assumption that the time involved is not
worth worrying about? If there is a difference in the 6th byte of a string, a
64bit compare finds the mismatch in its first 8-byte word but then has to scan
up to 6 bytes within it to produce the ordered result, while a compare optimised
for 32bit hits the mismatch in its second word and only scans 2 bytes. This sort
of extra processing may account for a 64bit setup being slower, and if the byte
scan is not even needed, then both operations could be improved? Obviously we
have differing pointer sizes which need moving around, but one is moving twice
as much data per memory cycle ... if you have the correct memory setup.
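The byte scan I mean is the step sketched below; this is my own hypothetical
reconstruction of what an ordered compare has to do after a word mismatch, not
code lifted from any particular libc:

#include <stdint.h>

/* Once a word compare has flagged a mismatch, producing the -1/0/1
   ordering still needs a byte scan inside that word to find the first
   differing position.  Assumes the words were loaded from the buffers
   with memcpy(), so their in-memory bytes are still in buffer order.
   With the mismatch at byte 6, an 8-byte word needs up to 6 byte steps
   here; a 32bit implementation hits the mismatch in its second word
   and scans only 2.  A memequ() skips this step entirely. */
static int order_within_word(uint64_t wa, uint64_t wb)
{
    const unsigned char *a = (const unsigned char *)&wa;
    const unsigned char *b = (const unsigned char *)&wb;
    for (int i = 0; i < 8; i++)
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;
    return 0;   /* words were equal after all */
}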
--
Lester Caine - G8HFL
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk