[VOTE] Timing attack safe string comparison function

11 years ago by Yasuo Ohgaki — view source

unread

Hi Rouven,

as I've received no further feedback I've opened the voting on "Timing
attack safe string comparison function":

https://wiki.php.net/rfc/timing_attack

Voting ends on 2014/02/09 11:00PM UTC

Best regards

Thank you for setting up vote!
Timing attack is in the wild.
This function is must have function, even for release versions. IMHO.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

11 years ago by Sara Golemon — view source

unread

as I've received no further feedback I've opened the voting on "Timing attack safe string comparison function":

https://wiki.php.net/rfc/timing_attack

Voting ends on 2014/02/09 11:00PM UTC

Voted yes, but IMO the comparison function should behave a little more
like ===. That is: something like hash_compare(null,"") should return
false. Possibly be even more strict and require both input parameters
to be string (e.g. hash_compare(123,123) would return false as well).

But there's some "non-PHP"ish about that idea so I'm not horribly fussed by it.

-Sara

11 years ago by Yasuo Ohgaki — view source

unread

Hi all,

Voted yes, but IMO the comparison function should behave a little more
like ===. That is: something like hash_compare(null,"") should return
false. Possibly be even more strict and require both input parameters
to be string (e.g. hash_compare(123,123) would return false as well).

Good point!

Comment just to make sure we have better implementation.
hash_compare() should not accept anything other than string.
E_WARNING is appropriate. IMO.
So get parameter as zval and check types before processing
is preferred.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

11 years ago by Nikita Popov — view source

unread

On Sun, Feb 2, 2014 at 11:50 PM, Rouven Weßling me@rouvenwessling.dewrote:

Hi internals,

as I've received no further feedback I've opened the voting on "Timing
attack safe string comparison function":

https://wiki.php.net/rfc/timing_attack

Voting ends on 2014/02/09 11:00PM UTC

Best regards
Rouven

Did your code already get reviewed by someone with understanding of the
issue? From a quick glance, two potential issues:

You are using MAX, i.e. an if-then-else branch. I'm pretty sure that the
if and else branches will have different instruction counts in that case.
Simple alternative would be something fixed like mod_len = known_len+1 or
known_len&1.
You leak information on mod_len / known_len, because you will have
different cache access patterns for comparing always the same 10 memory
positions and 10000 different ones, at least I'd assume so.
I don't know how you can prevent the latter issue, and if it is possible at
all. Personally I'd just drop the length magic and explicitly document it
to be safe for equal-length strings only. In any case you should have this
reviewed by someone with more than just a cursory understanding of the
matter.

Nikita

11 years ago by padraic.brady@gmail.com — view source

unread

Hi Nikita,

Did your code already get reviewed by someone with understanding of the
issue? From a quick glance, two potential issues:

You are using MAX, i.e. an if-then-else branch. I'm pretty sure that the
if and else branches will have different instruction counts in that case.
Simple alternative would be something fixed like mod_len = known_len+1 or
known_len&1.

You leak information on mod_len / known_len, because you will have
different cache access patterns for comparing always the same 10 memory
positions and 10000 different ones, at least I'd assume so.
I don't know how you can prevent the latter issue, and if it is possible at
all. Personally I'd just drop the length magic and explicitly document it
to be safe for equal-length strings only. In any case you should have this
reviewed by someone with more than just a cursory understanding of the
matter.

The constant time comparison function, which is fairly standard at
this point, doesn't protect length - only the timing of comparing two
strings. The inputs should have identical length otherwise it's being
used inappropriately. It may seem that length being leaked is bad, but
it's assumed to be a predictable factor in hashing passwords, etc.
That may get taken for granted given how and where it's implemented in
userland so an exception is not out of the question for a base
function where lengths do differ.

Paddy

--
Pádraic Brady

http://blog.astrumfutura.com
http://www.survivethedeepend.com
Zend Framework Community Review Team
Zend Framework PHP-FIG Representative

11 years ago by Yasuo Ohgaki — view source

unread

Hi Nikita,

Hi internals,

as I've received no further feedback I've opened the voting on "Timing
attack safe string comparison function":

https://wiki.php.net/rfc/timing_attack

Voting ends on 2014/02/09 11:00PM UTC

Best regards
Rouven

Did your code already get reviewed by someone with understanding of the
issue? From a quick glance, two potential issues:

You are using MAX, i.e. an if-then-else branch. I'm pretty sure that the
if and else branches will have different instruction counts in that case.
Simple alternative would be something fixed like mod_len = known_len+1 or
known_len&1.

You leak information on mod_len / known_len, because you will have
different cache access patterns for comparing always the same 10 memory
positions and 10000 different ones, at least I'd assume so.
I don't know how you can prevent the latter issue, and if it is possible at
all. Personally I'd just drop the length magic and explicitly document it
to be safe for equal-length strings only. In any case you should have this
reviewed by someone with more than just a cursory understanding of the
matter.

Length leak is known issue and we may improve these. There was discussion
for this.
For the sake of completeness, we may address issues now or later.

To be honest, I think length leak must be avoided especially for shorter
strings.
It would be better to iterate at least 100 times regardless of input.

Perhaps, something like

/**
- If known_string has a length of 0 we set the length to 1,
- this will cause us to compare all bytes of userString with the null
  byte which fails
*/
mod_len = MAX(known_len, 1);
len = MAX(known_len, 127);
/* This is security sensitive code. Do not optimize this for speed. */
result = known_len - user_len;
for (j = 0; j < user_len; j++) {

for (j = 0; j < len; j++) {

  result |= known_str[j % mod_len] ^ user_str[j];

result |= known_str[j % known_len] ^ user_str[j % user_len];

}

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

11 years ago by Yasuo Ohgaki — view source

unread

Hi all,

We can be more conservative.
So 127 is not long enough for SHA-512, make it 256 for larger hashes?

Length leak is known issue and we may improve these. There was discussion
for this.
For the sake of completeness, we may address issues now or later.

To be honest, I think length leak must be avoided especially for shorter
strings.
It would be better to iterate at least 100 times regardless of input.

Perhaps, something like

/**

If known_string has a length of 0 we set the length to 1,

this will cause us to compare all bytes of userString with the null
byte which fails

*/

mod_len = MAX(known_len, 1);
len = MAX(known_len, 127);

len = MAX(known_len, 256);

/* This is security sensitive code. Do not optimize this for speed. */

result = known_len - user_len;

for (j = 0; j < user_len; j++) {

for (j = 0; j < len; j++) {
result |= known_str[j % mod_len] ^ user_str[j];
result |= known_str[j % known_len] ^ user_str[j % user_len];

}

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

11 years ago by Stas Malyshev — view source

unread

Hi!

You are using MAX, i.e. an if-then-else branch. I'm pretty sure that the
if and else branches will have different instruction counts in that case.
Simple alternative would be something fixed like mod_len = known_len+1 or
known_len&1.

You leak information on mod_len / known_len, because you will have

If it's meant to compare hashes and other such things, we can presume
the attacker already knows what your code does, and thus knows what the
expected hash length is. What he doesn't know is what that hash is. The
timing attack is based on the fact that regular comparison drops after
first mismatch, so the attacker by trying different first symbols and
using time as oracle between match and mismatch, can guess the hash. The
length of the hash however is not useful for him - for most standard
crypto protocols all lengths are already known and even if you are using
some modifications basic crypto principles tell us to assume your
algorithm is known to the attacker and thus most probably your known
hash length is too.

--
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227

11 years ago by Yasuo Ohgaki — view source

unread

Hi Stas,

On Tue, Feb 4, 2014 at 7:09 AM, Stas Malyshev smalyshev@sugarcrm.comwrote:

You are using MAX, i.e. an if-then-else branch. I'm pretty sure that
the
if and else branches will have different instruction counts in that case.
Simple alternative would be something fixed like mod_len = known_len+1 or
known_len&1.

You leak information on mod_len / known_len, because you will have

If it's meant to compare hashes and other such things, we can presume
the attacker already knows what your code does, and thus knows what the
expected hash length is. What he doesn't know is what that hash is. The
timing attack is based on the fact that regular comparison drops after
first mismatch, so the attacker by trying different first symbols and
using time as oracle between match and mismatch, can guess the hash. The
length of the hash however is not useful for him - for most standard
crypto protocols all lengths are already known and even if you are using
some modifications basic crypto principles tell us to assume your
algorithm is known to the attacker and thus most probably your known
hash length is too.

I agree partially. Length leak would not be serious issue when length is
long enough.

I've already posted possible improvement that would not be affected by
supplied string length.

/**
- If known_string has a length of 0 we set the length to 1,
- this will cause us to compare all bytes of userString with the null
  byte which fails
*/
mod_len = MAX(known_len, 1);
len = MAX(known_len, 256);
/* This is security sensitive code. Do not optimize this for speed. */
result = known_len - user_len;
for (j = 0; j < user_len; j++) {

for (j = 0; j < len; j++) {

  result |= known_str[j % mod_len] ^ user_str[j];

result |= known_str[j % known_len] ^ user_str[j % user_len];

}

It's enough for hash functions up to 1024 bits. For SHA-3, we may set
larger minimum.
Comments, corrections, improvements are appreciated.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

11 years ago by Yasuo Ohgaki — view source

unread

Hi all,

It's enough for hash functions up to 1024 bits. For SHA-3, we may set
larger minimum.

Just an additional note.
AFAIK, SHA-3 also has 512 bits maximum.
Since it has more bits internally, it may be extended.
There may be hash functions that have larger bits.
256 iterations would be long enough for many years.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

11 years ago by Yasuo Ohgaki — view source

unread

Hi all,

It could be optimized a little since 256 is too much for now.
How about make MAX returns max of 3 values?

len = MAX(known_len, user_len, 64);

/**

If known_string has a length of 0 we set the length to 1,

this will cause us to compare all bytes of userString with the null
byte which fails

*/

mod_len = MAX(known_len, 1);
len = MAX(known_len, 256);

len = MAX(known_len, user_len, 64);

/* This is security sensitive code. Do not optimize this for speed. */

result = known_len - user_len;

for (j = 0; j < user_len; j++) {

for (j = 0; j < len; j++) {
result |= known_str[j % mod_len] ^ user_str[j];
result |= known_str[j % known_len] ^ user_str[j % user_len];

}

64 is long enough for SHA-256 and if parameter is longer than that it will
be used.
Even if user used it to compare 'raw password', they are protected well
from timing
attack.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

11 years ago by me@rouvenwessling.de — view source

unread

Hi everyone,

sorry for not getting back earlier, real life taking it's toll. I'll try to answer things in just a couple of E-Mail to not cause too much traffic (there seems to be a lot lately!)

as I've received no further feedback I've opened the voting on "Timing attack safe string comparison function":

I've voted yes. However, at the risk of opening more bikeshedding again, I should say that I don't think hash_compare is an appropriate name. It's a timing attack-safe string comparison function, so I think something like str_equals_time_constant might be better as it is not so much a hash comparison function as a string comparison function.

I'd rather not discuss this anymore. The discussion was open since mid-december. By my reading, there was no universally agreed name. Since it was suggested this goes into ext/hash starting with hash_ also seemed important to me.

Voted yes, but IMO the comparison function should behave a little more
like ===. That is: something like hash_compare(null,"") should return
false. Possibly be even more strict and require both input parameters
to be string (e.g. hash_compare(123,123) would return false as well).

But there's some "non-PHP"ish about that idea so I'm not horribly fussed by it.

That sounds like a good idea to me. I'd wager we can change this post RFC too? Or is the process more strict than I think it is?

I like the suggestion of an E_WARNING for that case too.

Did your code already get reviewed by someone with understanding of the issue? From a quick glance, two potential issues:

Anthony Ferra reviewed (and improved) my original patch, however it has been changed since. Notably the MAX change wasn't included originally.

You are using MAX, i.e. an if-then-else branch. I'm pretty sure that the if and else branches will have different instruction counts in that case. Simple alternative would be something fixed like mod_len = known_len+1 or known_len&1.

This problem should be limited by that fact, that except for the case where known_len == 0, always the same branch is hit.

You leak information on mod_len / known_len, because you will have different cache access patterns for comparing always the same 10 memory positions and 10000 different ones, at least I'd assume so.
I don't know how you can prevent the latter issue, and if it is possible at all. Personally I'd just drop the length magic and explicitly document it to be safe for equal-length strings only. In any case you should have this reviewed by someone with more than just a cursory understanding of the matter.

I agree leaking length may happen. I'm wondering if my first patch wasn't safer in this regard. If no better implementation can be found I'd suggest to just document, that length may leak. However to me that's acceptable as long as this doesn't allow approximating the known string by changing the input byte by byte.

Given the name hash_compare() it makes a good case to add a bail-out
branch based on string length, in which case you don't have to remember
which argument should come first ... still, it may be nice to have a
generic comparison function :)

While I like the idea, of foregoing this entire discussion I think it is a good idea to get something better than === out to people who compare mixed length strings.

It could be optimized a little since 256 is too much for now.
How about make MAX returns max of 3 values?

len = MAX(known_len, user_len, 64);

What would this buy us? We still get branch on MAX and the memory access would still go the same memory if the string is shorter than 64 bytes.

Best regards
Rouven

11 years ago by Yasuo Ohgaki — view source

unread

Hi Rouven,

It could be optimized a little since 256 is too much for now.
How about make MAX returns max of 3 values?

len = MAX(known_len, user_len, 64);

What would this buy us? We still get branch on MAX and the memory access
would still go the same memory if the string is shorter than 64 bytes.

I think we may ignore length leak and focus comparison.
len = MAX(known_len, 64) is for to make more difficult to guess short
secret(known_str) byte length, hopefully. Just trying to less obvious, but
it's not hiding.

Perhaps, something like this would be good enough.

/**
- If known_string has a length of 0 we set the length to 1,
- this will cause us to compare all bytes of userString with the null
  byte which fails
*/
mod_len = MAX(known_len, 1);
len = MAX(user_len, 64); // Do not care much
len = MAX(known_len, len); // Do not care much

// These kind of operations have done somewhere anyway
// Just don't care.
k = (unsinged char *)emalloc(len+1)
u = (unsinged char *)emalloc(len+1);
memset(k, 0, len+1);
memset(u, 0, len+1);
strncpy(k, known_str, known_len);
strncpy(u, user_str, user_len);

/* This is security sensitive code. Do not optimize this for speed. */
result = known_len - user_len;

for (j = 0; j < user_len; j++) {

  result |= known_str[j % mod_len] ^ user_str[j];

for (; len > 0; len--) {

                             result |= *k++ ^ *u++; // This must

be constant. Use simpler operation and keep constant operation here is
enough.

}

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

11 years ago by Yasuo Ohgaki — view source

unread

Hi all,

Padraic gave me an another idea of additional mitigation for this.
Although we cannot rely on it, randomized delay can be used
as mitigation. It would be good for length leak.

Perhaps, something like this would be good enough.

/**

If known_string has a length of 0 we set the length to 1,

this will cause us to compare all bytes of userString with the null
byte which fails

*/

mod_len = MAX(known_len, 1);
len = MAX(user_len, 64); // Do not care much
len = MAX(known_len, len); // Do not care much

// These kind of operations have done somewhere anyway
// Just don't care.
k = (unsinged char *)emalloc(len+1)
u = (unsinged char *)emalloc(len+1);
memset(k, 0, len+1);
memset(u, 0, len+1);
strncpy(k, known_str, known_len);
strncpy(u, user_str, user_len);

// Determination of delay is tricky. Too short or too long delay does not
work.
// It depends on execution path/data. e.g. How many times
strlen/strncpy/etc is called, length of string.
// I'm not sure if this is sufficient/valid as mitigation for average
usage. Experiments are needed.
// Improvement/suggestion is appreciated.
r1 = (unsigned char)get_random_byte();
r1 *= 2; // doubles range
for (; r1 > 0; r1--) {
r2 = (unsigned char)get_random_byte();
for (; r2 > 0; r2--) {
if (r1 < r2) {
buf[r2] = r2 % r1;
} else {
buf[r2] = r1 % r2;
}
}
}

/* This is security sensitive code. Do not optimize this for speed. */

result = known_len - user_len;

for (j = 0; j < user_len; j++) {
result |= known_str[j % mod_len] ^ user_str[j];
for (; len > 0; len--) {
                             result |= *k++ ^ *u++; // This must be constant. Use simpler operation and keep constant operation here is enough.
}

--
Yasuo Ohgaki
yohgaki@ohgaki.net

11 years ago by Yasuo Ohgaki — view source

unread

Hi all,

Padraic gave me an another idea of additional mitigation for this.
Although we cannot rely on it, randomized delay can be used
as mitigation. It would be good for length leak.

Perhaps, something like this would be good enough.

/**

If known_string has a length of 0 we set the length to 1,

this will cause us to compare all bytes of userString with the
null byte which fails

*/

mod_len = MAX(known_len, 1);
len = MAX(user_len, 64); // Do not care much
len = MAX(known_len, len); // Do not care much

// These kind of operations have done somewhere anyway
// Just don't care.
k = (unsinged char *)emalloc(len+1)
u = (unsinged char *)emalloc(len+1);
memset(k, 0, len+1);
memset(u, 0, len+1);
strncpy(k, known_str, known_len);
strncpy(u, user_str, user_len);

// Determination of delay is tricky. Too short or too long delay does not
work.
// It depends on execution path/data. e.g. How many times
strlen/strncpy/etc is called, length of string.
// I'm not sure if this is sufficient/valid as mitigation for average
usage. Experiments are needed.
// Improvement/suggestion is appreciated.
r1 = (unsigned char)get_random_byte();
r1 *= 2; // doubles range
for (; r1 > 0; r1--) {
r2 = (unsigned char)get_random_byte();
for (; r2 > 0; r2--) {
if (r1 < r2) {
buf[r2] = r2 % r1;
} else {
buf[r2] = r1 % r2;
}
}
}
/* This is security sensitive code. Do not optimize this for speed. */

result = known_len - user_len;

for (j = 0; j < user_len; j++) {
   result |= known_str[j % mod_len] ^ user_str[j];
for (; len > 0; len--) {
                             result |= *k++ ^ *u++; // This must be constant. Use simpler operation and keep constant operation here is enough.
}

Since comparison of short and/or not hashed data (e.g. user supplied raw
password) should
not be done as the function name imply, we may better to document so that
users always
compare hashed values even when they store raw password/etc.
So randomized delay may be overkill.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

11 years ago by Yasuo Ohgaki — view source

unread

Sorry for multiple posts.

Since comparison of short and/or not hashed data (e.g. user supplied raw
password) should
not be done as the function name imply, we may better to document so that
users always
compare hashed values even when they store raw password/etc.
So randomized delay may be overkill.

Because user should not pass other than hashed values, we may
return FALSE simply when length mismatches. Generated hashed
length should not be a secret. This get rid of length leak issue and
the function name is good for this purpose and make the operation
always constant.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.netSo

11 years ago by Yasuo Ohgaki — view source

unread

It's me again.

Sorry for multiple posts.

Since comparison of short and/or not hashed data (e.g. user supplied raw
password) should
not be done as the function name imply, we may better to document so that
users always
compare hashed values even when they store raw password/etc.
So randomized delay may be overkill.

Because user should not pass other than hashed values, we may
return FALSE simply when length mismatches. Generated hashed
length should not be a secret. This get rid of length leak issue and
the function name is good for this purpose and make the operation
always constant.

Since there is internal code that is vulnerable to timing attack,
could you make it PHPAPI?

For example, ext/session/mod_mm.c is comparing session ID using strcmp()

 for (prev = NULL, ret = data->hash[slot]; ret; prev = ret, ret =

ret->next) {
if (ret->hv == hv && !strcmp(ret->key, key)) {
break;
}
}

Regards,

P.S. Other save handlers are also vulnerable to timing attack.
It could be mitigated the attack by specifying minimum length of session
ID. I'll add this new INI option to session module.

--
Yasuo Ohgaki
yohgaki@ohgaki.net

11 years ago by Christopher Jones — view source

unread

Hi all,

Padraic gave me an another idea of additional mitigation for this.

What's the status of the RFC? It's listed as under voting but there
is deep discussion still ongoing. The RFC is very short on technical
detail. It is also lacking an end-of-vote date. It's not clear what
the RFCs path forward is. (If this info is in a mail thread, but not
in the RFC then remember readers/voters should not have to trawl
internals mail to understand the proposal and its direction).

Personally, I suggest the vote be closed/withdrawn with the assumption
the concept was accepted 15 to 1. Then work on the code until a
mutually acceptable and useful implementation is found. After that, a
quick vote can be made on the implementation.

Chris

--
christopher.jones@oracle.com http://twitter.com/ghrd
Free PHP & Oracle book:
http://www.oracle.com/technetwork/topics/php/underground-php-oracle-manual-098250.html

11 years ago by Pierre Joye — view source

unread

On Thu, Feb 6, 2014 at 5:23 AM, Christopher Jones
christopher.jones@oracle.com wrote:

Hi all,

Padraic gave me an another idea of additional mitigation for this.

What's the status of the RFC?

Voting phase

It's listed as under voting but there
is deep discussion still ongoing.

Yes, a request for peer review has been asked, that's why I asked a
couple of security related contacts to take a look at the code. It
cannot hurt.

The RFC is very short on technical
detail. It is also lacking an end-of-vote date.

It is one week, so let add it :)

It's not clear what
the RFCs path forward is. (If this info is in a mail thread, but not
in the RFC then remember readers/voters should not have to trawl
internals mail to understand the proposal and its direction).

Personally, I suggest the vote be closed/withdrawn with the assumption
the concept was accepted 15 to 1. Then work on the code until a
mutually acceptable and useful implementation is found. After that, a
quick vote can be made on the implementation.

We do not have to over react here, it is, for a change, that there is
clear concensus about the need or wish for this feature. It is not a
trivial thing to implement but we have time to make it rock solid
until final 5.6.0.

Cheers,

Pierre

@pierrejoye | http://www.libgd.org

11 years ago by Stas Malyshev — view source

unread

Hi!

We do not have to over react here, it is, for a change, that there is
clear consensus about the need or wish for this feature. It is not a
trivial thing to implement but we have time to make it rock solid
until final 5.6.0.

There's a consensus about the feature as it was proposed, but when all
kind of things start to be added to it, that eventually becomes a
different feature from one that was voted on. If the proposal is not
ready, then the vote should be delayed. If it's ready then the constant
stream of changes, tweaks and additions looks strange - it's really hard
to know what the vote is actually about.

Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227

11 years ago by Pierre Joye — view source

unread

Hi!

We do not have to over react here, it is, for a change, that there is
clear consensus about the need or wish for this feature. It is not a
trivial thing to implement but we have time to make it rock solid
until final 5.6.0.

There's a consensus about the feature as it was proposed,

For what I see there is one:

https://wiki.php.net/rfc/timing_attack

but when all
kind of things start to be added to it, that eventually becomes a
different feature from one that was voted on.

What I mean is that the need of a function to do hash equality tests
with time attacks protection got a positive consensus. The
implementation details are indeed subject to change and will certainly
get updates until 5.6.0 final, and surely afterwards as well.

If the proposal is not
ready, then the vote should be delayed. If it's ready then the constant
stream of changes, tweaks and additions looks strange - it's really hard
to know what the vote is actually about.

There are a couple of caveats that need to be solved. Not sure it
needs a new vote but that should not be a problem to redo it. My point
is that such feature seems to be desired for 5.6 and we should try to
make sure it gets in, as long as it is possible.

Cheers,

Pierre

@pierrejoye | http://www.libgd.org

11 years ago by Tjerk Meesters — view source

unread

On Sun, Feb 2, 2014 at 11:50 PM, Rouven Weßling <me@rouvenwessling.de

wrote:

Hi internals,

as I've received no further feedback I've opened the voting on "Timing
attack safe string comparison function":

https://wiki.php.net/rfc/timing_attack

Voting ends on 2014/02/09 11:00PM UTC

Best regards
Rouven

Did your code already get reviewed by someone with understanding of the
issue? From a quick glance, two potential issues:

You are using MAX, i.e. an if-then-else branch. I'm pretty sure that the
if and else branches will have different instruction counts in that case.
Simple alternative would be something fixed like mod_len = known_len+1 or
known_len&1.

I can't speak for other compilers, but gcc compiles the ternary to this:

    cmpl    $0, -8(%rbp)
    cmovg   -8(%rbp), %eax

So it comes down to whether cmovg takes a variable amount of time; that
said, the move operation is only required if the known length is zero, i.e.
you're comparing user input against an empty string which is quite unlikely.

You leak information on mod_len / known_len, because you will have
different cache access patterns for comparing always the same 10 memory
positions and 10000 different ones, at least I'd assume so.

There's one particular scenario that comes to mind: if the string data
spans at least two memory pages; assuming memory allocation is optimized to
fit allocated memory inside one page whenever possible, we're talking about
string lengths bigger than the page itself, i.e. 4kB / 8kB.

Given that the known string is relatively small, the leaked information may
reveal that the given string is bigger than the known string.

The above is my own logical deduction, I'm not an expert in this area :)

I don't know how you can prevent the latter issue, and if it is possible at

all. Personally I'd just drop the length magic and explicitly document it
to be safe for equal-length strings only. In any case you should have this
reviewed by someone with more than just a cursory understanding of the
matter.

Given the name hash_compare() it makes a good case to add a bail-out
branch based on string length, in which case you don't have to remember
which argument should come first ... still, it may be nice to have a
generic comparison function :)

Nikita

--

Tjerk

11 years ago by Yasuo Ohgaki — view source

unread

Hi Rouven,

I made SipHash version of str_compare() as a sample.
There is timing safe php_compare(), which is stolen from BSD.

https://github.com/yohgaki/php-src/compare/PHP-5.6-rfc-hash-compare

[yohgaki@dev github-php-src]$ ./php-bin -r 'var_dump(str_compare("abc",
"abc"));'
bool(true)
[yohgaki@dev github-php-src]$ ./php-bin -r 'var_dump(str_compare("asfasdf",
"slkjojoeiwrj"));'
bool(false)

It's quick patch made less than 30 min.
So it can be improved, I suppose.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

11 years ago by Yasuo Ohgaki — view source

unread

Hi all,

I made SipHash version of str_compare() as a sample.
There is timing safe php_compare(), which is stolen from BSD.

https://github.com/yohgaki/php-src/compare/PHP-5.6-rfc-hash-compare

[yohgaki@dev github-php-src]$ ./php-bin -r 'var_dump(str_compare("abc",
"abc"));'
bool(true)
[yohgaki@dev github-php-src]$ ./php-bin -r
'var_dump(str_compare("asfasdf", "slkjojoeiwrj"));'
bool(false)

It's quick patch made less than 30 min.
So it can be improved, I suppose.

I thought it would be better to compare performance difference.
Added more functions to play with.
There are

bool str_siphash_compare(str, str) - siphash. timing safe. (64bit)
bool str_xxhash32_compare(str, str) - xxhash. timing safe. (32bit)
bool str_md5_compare(str, str) - md5. Timing safe (128bit)
bool str_byte_compare(str, str) - Byte compare. Timing safe. No division.
bool str_byte_compare2(str, str) - Byte compare. Timing safe. With
division. (Modulo as this RFC)
bool str_compare(str, str) - plain strncmp(). Not timing safe.

I didn't took bench mark and did minimum tests.
I appreciate if anyone take benchmark.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.netH

11 years ago by Yasuo Ohgaki — view source

unread

I made SipHash version of str_compare() as a sample.
There is timing safe php_compare(), which is stolen from BSD.

https://github.com/yohgaki/php-src/compare/PHP-5.6-rfc-hash-compare

[yohgaki@dev github-php-src]$ ./php-bin -r 'var_dump(str_compare("abc",
"abc"));'
bool(true)
[yohgaki@dev github-php-src]$ ./php-bin -r
'var_dump(str_compare("asfasdf", "slkjojoeiwrj"));'
bool(false)

It's quick patch made less than 30 min.
So it can be improved, I suppose.

I thought it would be better to compare performance difference.
Added more functions to play with.
There are

bool str_siphash_compare(str, str) - siphash. timing safe. (64bit)
bool str_xxhash32_compare(str, str) - xxhash. timing safe. (32bit)
bool str_md5_compare(str, str) - md5. Timing safe (128bit)
bool str_byte_compare(str, str) - Byte compare. Timing safe. No division.
bool str_byte_compare2(str, str) - Byte compare. Timing safe. With
division. (Modulo as this RFC)
bool str_compare(str, str) - plain strncmp(). Not timing safe.

I didn't took bench mark and did minimum tests.
I appreciate if anyone take benchmark.

Added yet another function to compare suggested by Lester.

bool str_word_compare(str, str)

This function compares data word by word rather than byte by byte.
It supposed to be faster for large data.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

11 years ago by Yasuo Ohgaki — view source

unread

Hi all,

I made SipHash version of str_compare() as a sample.
There is timing safe php_compare(), which is stolen from BSD.

https://github.com/yohgaki/php-src/compare/PHP-5.6-rfc-hash-compare

[yohgaki@dev github-php-src]$ ./php-bin -r 'var_dump(str_compare("abc",
"abc"));'
bool(true)
[yohgaki@dev github-php-src]$ ./php-bin -r
'var_dump(str_compare("asfasdf", "slkjojoeiwrj"));'
bool(false)

It's quick patch made less than 30 min.
So it can be improved, I suppose.

I thought it would be better to compare performance difference.
Added more functions to play with.
There are

bool str_siphash_compare(str, str) - siphash. timing safe. (64bit)
bool str_xxhash32_compare(str, str) - xxhash. timing safe. (32bit)
bool str_md5_compare(str, str) - md5. Timing safe (128bit)
bool str_byte_compare(str, str) - Byte compare. Timing safe. No
division.
bool str_byte_compare2(str, str) - Byte compare. Timing safe. With
division. (Modulo as this RFC)
bool str_compare(str, str) - plain strncmp(). Not timing safe.

I didn't took bench mark and did minimum tests.
I appreciate if anyone take benchmark.

Added yet another function to compare suggested by Lester.

bool str_word_compare(str, str)

This function compares data word by word rather than byte by byte.
It supposed to be faster for large data.

I took a benchmark. str_compare() is not timing safe. It's there for
reference.

str_siphash_compare Elapsed: 1.389824 Iterations: 1000000 DataSize: 8
str_xxhash32_compare Elapsed: 1.241737 Iterations: 1000000 DataSize: 8
str_md5_compare Elapsed: 3.029127 Iterations: 1000000 DataSize: 8
str_byte_compare Elapsed: 1.236183 Iterations: 1000000 DataSize: 8
str_byte_compare2 Elapsed: 1.269901 Iterations: 1000000 DataSize: 8
str_word_compare Elapsed: 1.273266 Iterations: 1000000 DataSize: 8
str_compare Elapsed: 1.181425 Iterations: 1000000 DataSize: 8

str_byte_compare() is the winner for small data.
I'm a little surprised that str_xxhash32_compare() is the second.
str_word_compare() is marginally slower.

str_siphash_compare Elapsed: 2.341025 Iterations: 1000000 DataSize: 128
str_xxhash32_compare Elapsed: 1.560131 Iterations: 1000000 DataSize: 128
str_md5_compare Elapsed: 6.055007 Iterations: 1000000 DataSize: 128
str_byte_compare Elapsed: 1.799050 Iterations: 1000000 DataSize: 128
str_byte_compare2 Elapsed: 2.163229 Iterations: 1000000 DataSize: 128
str_word_compare Elapsed: 1.337508 Iterations: 1000000 DataSize: 128
str_compare Elapsed: 1.194582 Iterations: 1000000 DataSize: 128

str_word_compare() is the winner for relatively large data.

It seems str_word_compare() is the way to go.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

11 years ago by Yasuo Ohgaki — view source

unread

Hi Dmitry,

str_word_compare Elapsed: 1.337508 Iterations: 1000000 DataSize: 128
str_compare Elapsed: 1.194582 Iterations: 1000000 DataSize: 128

str_word_compare() is the winner for relatively large data.

It seems the difference is marginal. It may be better to make ==/=== string
comparison timing safe by default if php_word_compare() implementation
does not have timing issue.

Patch for php_word_compare() is
https://github.com/yohgaki/php-src/commit/302a53db87c93b469fb85041e8c505207e3a6d9c

What do you think?

Regards,

P.S.
xxhash32 is surprisingly fast, but it has limited hash space. Therefore,
it's not a good idea to use it for string comparison. IIRC or if it did not
changed, Zend uses old DJB hash. It may be good idea to use xxhash
for PHP array.

--
Yasuo Ohgaki
yohgaki@ohgaki.net

11 years ago by Yasuo Ohgaki — view source

unread

I took a benchmark. str_compare() is not timing safe. It's there for
reference.

str_siphash_compare Elapsed: 1.389824 Iterations: 1000000 DataSize: 8
str_xxhash32_compare Elapsed: 1.241737 Iterations: 1000000 DataSize: 8
str_md5_compare Elapsed: 3.029127 Iterations: 1000000 DataSize: 8
str_byte_compare Elapsed: 1.236183 Iterations: 1000000 DataSize: 8
str_byte_compare2 Elapsed: 1.269901 Iterations: 1000000 DataSize: 8
str_word_compare Elapsed: 1.273266 Iterations: 1000000 DataSize: 8
str_compare Elapsed: 1.181425 Iterations: 1000000 DataSize: 8

str_byte_compare() is the winner for small data.
I'm a little surprised that str_xxhash32_compare() is the second.
str_word_compare() is marginally slower.

str_siphash_compare Elapsed: 2.341025 Iterations: 1000000 DataSize: 128
str_xxhash32_compare Elapsed: 1.560131 Iterations: 1000000 DataSize: 128
str_md5_compare Elapsed: 6.055007 Iterations: 1000000 DataSize: 128
str_byte_compare Elapsed: 1.799050 Iterations: 1000000 DataSize: 128
str_byte_compare2 Elapsed: 2.163229 Iterations: 1000000 DataSize: 128
str_word_compare Elapsed: 1.337508 Iterations: 1000000 DataSize: 128
str_compare Elapsed: 1.194582 Iterations: 1000000 DataSize: 128

str_word_compare() is the winner for relatively large data.

It seems str_word_compare() is the way to go.

https://gist.github.com/yohgaki/ede544f290c6cf9fa90d

This is the benchmark script.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

11 years ago by Tjerk Meesters — view source

unread

Hi,

Hi all,

On Fri, Feb 7, 2014 at 10:39 AM, Yasuo Ohgaki yohgaki@ohgaki.net
wrote:

On Fri, Feb 7, 2014 at 8:05 AM, Yasuo Ohgaki yohgaki@ohgaki.net
wrote:

I made SipHash version of str_compare() as a sample.
There is timing safe php_compare(), which is stolen from BSD.

https://github.com/yohgaki/php-src/compare/PHP-5.6-rfc-hash-compare

[yohgaki@dev github-php-src]$ ./php-bin -r
'var_dump(str_compare("abc",
"abc"));'
bool(true)
[yohgaki@dev github-php-src]$ ./php-bin -r
'var_dump(str_compare("asfasdf", "slkjojoeiwrj"));'
bool(false)

It's quick patch made less than 30 min.
So it can be improved, I suppose.

I thought it would be better to compare performance difference.
Added more functions to play with.
There are

bool str_siphash_compare(str, str) - siphash. timing safe. (64bit)
bool str_xxhash32_compare(str, str) - xxhash. timing safe. (32bit)
bool str_md5_compare(str, str) - md5. Timing safe (128bit)
bool str_byte_compare(str, str) - Byte compare. Timing safe. No
division.
bool str_byte_compare2(str, str) - Byte compare. Timing safe. With
division. (Modulo as this RFC)
bool str_compare(str, str) - plain strncmp(). Not timing safe.

I didn't took bench mark and did minimum tests.
I appreciate if anyone take benchmark.

Added yet another function to compare suggested by Lester.

bool str_word_compare(str, str)

This function compares data word by word rather than byte by byte.
It supposed to be faster for large data.

I took a benchmark. str_compare() is not timing safe. It's there for
reference.

str_siphash_compare Elapsed: 1.389824 Iterations: 1000000 DataSize: 8
str_xxhash32_compare Elapsed: 1.241737 Iterations: 1000000 DataSize: 8
str_md5_compare Elapsed: 3.029127 Iterations: 1000000 DataSize: 8
str_byte_compare Elapsed: 1.236183 Iterations: 1000000 DataSize: 8
str_byte_compare2 Elapsed: 1.269901 Iterations: 1000000 DataSize: 8
str_word_compare Elapsed: 1.273266 Iterations: 1000000 DataSize: 8
str_compare Elapsed: 1.181425 Iterations: 1000000 DataSize: 8

str_byte_compare() is the winner for small data.
I'm a little surprised that str_xxhash32_compare() is the second.
str_word_compare() is marginally slower.

str_siphash_compare Elapsed: 2.341025 Iterations: 1000000 DataSize: 128
str_xxhash32_compare Elapsed: 1.560131 Iterations: 1000000 DataSize: 128
str_md5_compare Elapsed: 6.055007 Iterations: 1000000 DataSize: 128
str_byte_compare Elapsed: 1.799050 Iterations: 1000000 DataSize: 128
str_byte_compare2 Elapsed: 2.163229 Iterations: 1000000 DataSize: 128
str_word_compare Elapsed: 1.337508 Iterations: 1000000 DataSize: 128
str_compare Elapsed: 1.194582 Iterations: 1000000 DataSize: 128

str_word_compare() is the winner for relatively large data.

It seems str_word_compare() is the way to go.

Quite honestly, I don't think that an absolute difference of 0.46 μs per
function call should be used to pick the "winner" from the line-up you're
made.

That said, the implementation of str_word_compare introduces a MIN()
branch and the second loop should loop until MAX(b1_len, b2_len) instead
of b2_len alone; imo you've just made it harder to prove that it actually
prevents timing attacks.

From what has been discussed so far, to me, the question that needs an
answer is:

"Should we have a function that attempts to provide a constant comparison
for strings of different lengths?"

If not, reduce the function to just a length mismatch branch and a loop;
length information is leaked as long as both strings are of difference
lengths only.

If so, then we need to create and run a simulation program to prove that a
particular algorithm is indeed suitable with a certain amount of confidence
for the most common scenarios.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

--

Tjerk

11 years ago by padraic.brady@gmail.com — view source

unread

Hi,

str_word_compare() is the winner for relatively large data.

It seems str_word_compare() is the way to go.

Quite honestly, I don't think that an absolute difference of 0.46 μs per
function call should be used to pick the "winner" from the line-up you're
made.

That said, the implementation of str_word_compare introduces a MIN()
branch and the second loop should loop until MAX(b1_len, b2_len) instead
of b2_len alone; imo you've just made it harder to prove that it actually
prevents timing attacks.

Which is a significant risk. This is extremely easy to get wrong which
is why adopting an established solution that has actually withstood
examination is better than going with something that perhaps hasn’t.
There’s also Keep It Simple, Stupid ;).

From what has been discussed so far, to me, the question that needs an
answer is:

"Should we have a function that attempts to provide a constant comparison
for strings of different lengths?"

If not, reduce the function to just a length mismatch branch and a loop;
length information is leaked as long as both strings are of difference
lengths only.

If so, then we need to create and run a simulation program to prove that a
particular algorithm is indeed suitable with a certain amount of confidence
for the most common scenarios.

We do not need to protect length information because we’re not
protecting strings of arbitrary length. We’re protecting hashes which
have fixed lengths across a small range of commonly used types, i.e. a
hacker should already have a good idea of what the length is. So what
is length hiding? It’s security through obscurity.

The function docs should just state that it only accepts strings of
equal length or will throw an error. Users can then make merry,
bikeshedding workarounds on their own time, like hashing both strings
to an equal length and then comparing the hashes. That sort of thing.
We may even find that we already do that…

Paddy

--
Pádraic Brady

http://blog.astrumfutura.com
http://www.survivethedeepend.com
Zend Framework Community Review Team
Zend Framework PHP-FIG Representative

Hi,

Hi all,

On Fri, Feb 7, 2014 at 10:39 AM, Yasuo Ohgaki yohgaki@ohgaki.net
wrote:

On Fri, Feb 7, 2014 at 8:05 AM, Yasuo Ohgaki yohgaki@ohgaki.net
wrote:

I made SipHash version of str_compare() as a sample.
There is timing safe php_compare(), which is stolen from BSD.

https://github.com/yohgaki/php-src/compare/PHP-5.6-rfc-hash-compare

[yohgaki@dev github-php-src]$ ./php-bin -r
'var_dump(str_compare("abc",
"abc"));'
bool(true)
[yohgaki@dev github-php-src]$ ./php-bin -r
'var_dump(str_compare("asfasdf", "slkjojoeiwrj"));'
bool(false)

It's quick patch made less than 30 min.
So it can be improved, I suppose.

I thought it would be better to compare performance difference.
Added more functions to play with.
There are

bool str_siphash_compare(str, str) - siphash. timing safe. (64bit)
bool str_xxhash32_compare(str, str) - xxhash. timing safe. (32bit)
bool str_md5_compare(str, str) - md5. Timing safe (128bit)
bool str_byte_compare(str, str) - Byte compare. Timing safe. No
division.
bool str_byte_compare2(str, str) - Byte compare. Timing safe. With
division. (Modulo as this RFC)
bool str_compare(str, str) - plain strncmp(). Not timing safe.

I didn't took bench mark and did minimum tests.
I appreciate if anyone take benchmark.

Added yet another function to compare suggested by Lester.

bool str_word_compare(str, str)

This function compares data word by word rather than byte by byte.
It supposed to be faster for large data.

I took a benchmark. str_compare() is not timing safe. It's there for
reference.

str_siphash_compare Elapsed: 1.389824 Iterations: 1000000 DataSize: 8
str_xxhash32_compare Elapsed: 1.241737 Iterations: 1000000 DataSize: 8
str_md5_compare Elapsed: 3.029127 Iterations: 1000000 DataSize: 8
str_byte_compare Elapsed: 1.236183 Iterations: 1000000 DataSize: 8
str_byte_compare2 Elapsed: 1.269901 Iterations: 1000000 DataSize: 8
str_word_compare Elapsed: 1.273266 Iterations: 1000000 DataSize: 8
str_compare Elapsed: 1.181425 Iterations: 1000000 DataSize: 8

str_byte_compare() is the winner for small data.
I'm a little surprised that str_xxhash32_compare() is the second.
str_word_compare() is marginally slower.

str_siphash_compare Elapsed: 2.341025 Iterations: 1000000 DataSize: 128
str_xxhash32_compare Elapsed: 1.560131 Iterations: 1000000 DataSize: 128
str_md5_compare Elapsed: 6.055007 Iterations: 1000000 DataSize: 128
str_byte_compare Elapsed: 1.799050 Iterations: 1000000 DataSize: 128
str_byte_compare2 Elapsed: 2.163229 Iterations: 1000000 DataSize: 128
str_word_compare Elapsed: 1.337508 Iterations: 1000000 DataSize: 128
str_compare Elapsed: 1.194582 Iterations: 1000000 DataSize: 128

str_word_compare() is the winner for relatively large data.

It seems str_word_compare() is the way to go.

Quite honestly, I don't think that an absolute difference of 0.46 μs per
function call should be used to pick the "winner" from the line-up you're
made.

That said, the implementation of str_word_compare introduces a MIN()
branch and the second loop should loop until MAX(b1_len, b2_len) instead
of b2_len alone; imo you've just made it harder to prove that it actually
prevents timing attacks.

From what has been discussed so far, to me, the question that needs an
answer is:

"Should we have a function that attempts to provide a constant comparison
for strings of different lengths?"

If not, reduce the function to just a length mismatch branch and a loop;
length information is leaked as long as both strings are of difference
lengths only.

If so, then we need to create and run a simulation program to prove that a
particular algorithm is indeed suitable with a certain amount of confidence
for the most common scenarios.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

--

Tjerk

--

--
Pádraic Brady

http://blog.astrumfutura.com
http://www.survivethedeepend.com
Zend Framework Community Review Team
Zend Framework PHP-FIG Representative

11 years ago by Lester Caine — view source

unread

Yasuo Ohgaki wrote:

Added yet another function to compare suggested by Lester.

bool str_word_compare(str, str)

This function compares data word by word rather than byte by byte.
It supposed to be faster for large data.

Yasuo ... you seem to be missing the point, and I'm not quite sure where all
these different hash compares come from but ...

bool memequ() should only return true or false and simply compare 'word by word'
be that 32bit or 64bit words. There should be nothing in the macro than an word
XOR, and buffers should be padded to the right boundary. It is the processing of
unnecessary byte related data which is adding back in your 'insecurity', but it
is also adding extra unnecessary processing.

The point about 'memequ' is that it should both speed up any use of memcpy where
there is no need to return anything other than a match ... it is the subsequent
search to find the '</>' byte which even introduces the timing attack ... and it
also makes these compares unicode safe! So that it is only the compares that
need the fine result that then need handling for a full unicode installation.

--
Lester Caine - G8HFL

Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk

11 years ago by Yasuo Ohgaki — view source

unread

Hi Lester,

Yasuo Ohgaki wrote:

Added yet another function to compare suggested by Lester.

bool str_word_compare(str, str)

This function compares data word by word rather than byte by byte.
It supposed to be faster for large data.

Yasuo ... you seem to be missing the point, and I'm not quite sure where
all these different hash compares come from but ...

bool memequ() should only return true or false and simply compare 'word by
word' be that 32bit or 64bit words. There should be nothing in the macro
than an word XOR, and buffers should be padded to the right boundary. It is
the processing of unnecessary byte related data which is adding back in
your 'insecurity', but it is also adding extra unnecessary processing.

The point about 'memequ' is that it should both speed up any use of memcpy
where there is no need to return anything other than a match ... it is the
subsequent search to find the '</>' byte which even introduces the timing
attack ... and it also makes these compares unicode safe! So that it is
only the compares that need the fine result that then need handling for a
full unicode installation.

The reason why word version of comparison function(str_word_compare) does
not
simply compare word by word, is alignment.

We cannot assume word aligned data for strings. This is the reason why
there is
additional byte by byte comparison. It could be made to align memory to
word size,
but it requires memory allocation for comparison. I'm not sure which is
better, though.

Regards,

BTW, byte by byte part comparison can be optimized by switch since
unaligned data
is only up to 7 bytes.

--
Yasuo Ohgaki
yohgaki@ohgaki.net

11 years ago by Lester Caine — view source

unread

Yasuo Ohgaki wrote:

We cannot assume word aligned data for strings. This is the reason why
there is
additional byte by byte comparison. It could be made to align memory to
word size,
but it requires memory allocation for comparison. I'm not sure which is
better, though.

Since PHP is controlling the memory management, there is no reason it can't
properly align data elements ... based on the target platform. When you only had
640k of memory every byte counted, but these days the speed gain provided by
simply wasting a few bytes and always padding buffers to a relevant word
boundary is far greater than the space saving. Things have changed somewhat
since the original C compilers were written. But this is better discussed on the
32/64bit optimization thread. Suffice to say that Firebird recorded substantial
speed improvement once the basic ground rules were accepted and unnecessary
'byte related' processing dropped.

--
Lester Caine - G8HFL

Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk

11 years ago by Yasuo Ohgaki — view source

unread

Hi all,

Yasuo Ohgaki wrote:

We cannot assume word aligned data for strings. This is the reason why
there is
additional byte by byte comparison. It could be made to align memory to
word size,
but it requires memory allocation for comparison. I'm not sure which is
better, though.

Since PHP is controlling the memory management, there is no reason it
can't properly align data elements ... based on the target platform. When
you only had 640k of memory every byte counted, but these days the speed
gain provided by simply wasting a few bytes and always padding buffers to a
relevant word boundary is far greater than the space saving. Things have
changed somewhat since the original C compilers were written. But this is
better discussed on the 32/64bit optimization thread. Suffice to say that
Firebird recorded substantial speed improvement once the basic ground rules
were accepted and unnecessary 'byte related' processing dropped.

Word aligned memory allocation can be done. It may be good for
fragmentation also.
I don't read current memory manager code, so I cannot answer this.

Anyone?

--
Yasuo Ohgaki
yohgaki@ohgaki.net

11 years ago by me@rouvenwessling.de — view source

unread

I'm incredibly sorry I haven't been able to get back to this earlier.

The RFC was accepted 22 to 1. As there was an abundance of feedback during the voting period and beyond I'll update the implementation accordingly. After that I'll discuss whether this goes into 5.6 or 5.7 with the RMs.

Best regards
Rouven

Hi internals,

as I've received no further feedback I've opened the voting on "Timing attack safe string comparison function":

https://wiki.php.net/rfc/timing_attack

Voting ends on 2014/02/09 11:00PM UTC

Best regards
Rouven

11 years ago by me@rouvenwessling.de — view source

unread

Hello together,

I've updated the patch, taking the following feedback into account:
-Renamed function to hash_equals
-Error out early in case string lengths are not equal (I've maintained the name known_string and user_string too allow improving this in the future, also makes for a nicer error message)
-Only allow strings to be compared

The patch can be found here: https://github.com/realityking/php-src/compare/hash_equals

If anyone thinks, that this needs a new RFC please say so.

Best regards
Rouven

I'm incredibly sorry I haven't been able to get back to this earlier.

The RFC was accepted 22 to 1. As there was an abundance of feedback during the voting period and beyond I'll update the implementation accordingly. After that I'll discuss whether this goes into 5.6 or 5.7 with the RMs.

Best regards
Rouven

Hi internals,

as I've received no further feedback I've opened the voting on "Timing attack safe string comparison function":

https://wiki.php.net/rfc/timing_attack

Voting ends on 2014/02/09 11:00PM UTC

Best regards
Rouven

11 years ago by Yasuo Ohgaki — view source

unread

Hi Rouven,

On Mon, Feb 24, 2014 at 3:31 AM, Rouven Weßling me@rouvenwessling.dewrote:

I've updated the patch, taking the following feedback into account:
-Renamed function to hash_equals
-Error out early in case string lengths are not equal (I've maintained the
name known_string and user_string too allow improving this in the future,
also makes for a nicer error message)
-Only allow strings to be compared

The patch can be found here:
https://github.com/realityking/php-src/compare/hash_equals

If anyone thinks, that this needs a new RFC please say so.

I did some experiments. It seems it's good to implement timing safe
comparison in engine. i.e. We can make ==/=== secure by default like
Python. It would be much safer get rid of all timing from PHP.

We need new RFC to include the change in engine.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

11 years ago by me@rouvenwessling.de — view source

unread

Hi Yasuo,

I did some experiments. It seems it's good to implement timing safe comparison in engine. i.e. We can make ==/=== secure by default like Python. It would be much safer get rid of all timing from PHP.

We need new RFC to include the change in engine.

That's not how I read that discussion (though I might have missed a mail). Also personally I don't like it. I don't see that the supposed gain in security is worth the performance implication. Also if it turns out there's a bug, and we'd have to make it 100 times slower for some reason, than that's not a big deal for a function like hash_equals. It is however if it affects all comparisons.

Since I don't believe in that change, I'm not interested in proposing that RFC.

Best regards
Rouve

11 years ago by Yasuo Ohgaki — view source

unread

Hi Rouven,

On Mon, Feb 24, 2014 at 6:05 AM, Rouven Weßling me@rouvenwessling.dewrote:

I did some experiments. It seems it's good to implement timing safe
comparison in engine. i.e. We can make ==/=== secure by default like
Python. It would be much safer get rid of all timing from PHP.

We need new RFC to include the change in engine.

That's not how I read that discussion (though I might have missed a mail).
Also personally I don't like it. I don't see that the supposed gain in
security is worth the performance implication. Also if it turns out there's
a bug, and we'd have to make it 100 times slower for some reason, than
that's not a big deal for a function like hash_equals. It is however if it
affects all comparisons.

Since I don't believe in that change, I'm not interested in proposing that
RFC.

Understood. This is OK.

From the experiment, performance implication is not much.
Automatic protection in any place would worth the trade off. IMHO.
Python even uses less efficient hash for comparison. I heard they
use optimized version of SipHash, we may better try it before deciding
algorithm, though.

Any other comments regarding ==/=== timing safety?

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

11 years ago by Lester Caine — view source

unread

Yasuo Ohgaki wrote:

I did some experiments. It seems it's good to implement timing safe
comparison in engine. i.e. We can make ==/=== secure by default like
Python. It would be much safer get rid of all timing from PHP.

We need new RFC to include the change in engine.

That's not how I read that discussion (though I might have missed a mail).
Also personally I don't like it. I don't see that the supposed gain in
security is worth the performance implication. Also if it turns out there's
a bug, and we'd have to make it 100 times slower for some reason, than
that's not a big deal for a function like hash_equals. It is however if it
affects all comparisons.

Since I don't believe in that change, I'm not interested in proposing that
RFC.

Understood. This is OK.

From the experiment, performance implication is not much.
Automatic protection in any place would worth the trade off. IMHO.
Python even uses less efficient hash for comparison. I heard they
use optimized version of SipHash, we may better try it before deciding
algorithm, though.

Any other comments regarding ==/=== timing safety?

Surely there are two elements to this?

Simply comparing two strings in a time stable manner is relatively easy. You
just modify the compare to store the result and always complete a full scan.
Only the lengths of the string come into that. How physically that is handled in
the processor will actually determine if the time taken is 'byte sensitive'
anyway! Simply stopping the core process from the unnecessary <> comparison
where the answer is not used is going to be a performance gain, all be it
minuscule, but the timing differences that are trying to be hidden are even
smaller than that? Tidying the core use of string compare is generally more
useful than simply adding yet another method of calling it.

The other element here is actually calculating the hash value to compare with?
That will also depend on just how the processor is processing multiple bytes, so
simply fixing the final comparison may be somewhat irrelevant if the hash
generation also has a leak? Yes it should be another constant, but what is being
fed in is the variable being changed during the attack?

--
Lester Caine - G8HFL

Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk

11 years ago by Yasuo Ohgaki — view source

unread

Hi Lester,

Yasuo Ohgaki wrote:

I did some experiments. It seems it's good to implement timing safe

comparison in engine. i.e. We can make ==/=== secure by default like
Python. It would be much safer get rid of all timing from PHP.

We need new RFC to include the change in engine.

That's not how I read that discussion (though I might have missed a
mail).
Also personally I don't like it. I don't see that the supposed gain in
security is worth the performance implication. Also if it turns out
there's
a bug, and we'd have to make it 100 times slower for some reason, than
that's not a big deal for a function like hash_equals. It is however if
it
affects all comparisons.

Since I don't believe in that change, I'm not interested in proposing
that
RFC.

Understood. This is OK.

From the experiment, performance implication is not much.

Automatic protection in any place would worth the trade off. IMHO.
Python even uses less efficient hash for comparison. I heard they
use optimized version of SipHash, we may better try it before deciding
algorithm, though.

Any other comments regarding ==/=== timing safety?

Surely there are two elements to this?

Simply comparing two strings in a time stable manner is relatively easy.
You just modify the compare to store the result and always complete a full
scan. Only the lengths of the string come into that. How physically that is
handled in the processor will actually determine if the time taken is 'byte
sensitive' anyway! Simply stopping the core process from the unnecessary <>
comparison where the answer is not used is going to be a performance gain,
all be it minuscule, but the timing differences that are trying to be
hidden are even smaller than that? Tidying the core use of string compare
is generally more useful than simply adding yet another method of calling
it.

The other element here is actually calculating the hash value to compare
with? That will also depend on just how the processor is processing
multiple bytes, so simply fixing the final comparison may be somewhat
irrelevant if the hash generation also has a leak? Yes it should be another
constant, but what is being fed in is the variable being changed during the
attack?

Good hash function will produce unpredictable values with constant size.
Computation is too complex to get the usable difference, I suppose.
Therefore, hash value comparison should be safe.

I don't know why Python uses optimized version of SipHash. Are they using
shared strings like Java? If so, it's understandable.

From my experiment, simple byte by byte comparison seems the way to go.
There is only negligible difference between strncmp() and simple byte by
byte comparison. If we care for performance, word by word comparison may be
used. It was the fastest. Detailed benchmark should be taken, though.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

11 years ago by Yasuo Ohgaki — view source

unread

Hi all,

Hi internals,

as I've received no further feedback I've opened the voting on "Timing
attack safe string comparison function":

https://wiki.php.net/rfc/timing_attack

Is there any progress?
From benchmark result, overhead for timing safe comparison is negligible
with byte by byte comparison.
I would like to see timing safe "===" for 5.6, if it's possible. (== could
be timing safe, too)

Is anyone working on it?

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

11 years ago by me@rouvenwessling.de — view source

unread

Hi internals,

as I've received no further feedback I've opened the voting on "Timing
attack safe string comparison function":

https://wiki.php.net/rfc/timing_attack

Is there any progress?

The pull request (https://github.com/php/php-src/pull/608) for that RFC is waiting to be merged, I hope someone gets to it before beta1.

From benchmark result, overhead for timing safe comparison is negligible
with byte by byte comparison.
I would like to see timing safe "===" for 5.6, if it's possible. (== could
be timing safe, too)

Is anyone working on it?

I don't know if someone else is, but I am not.

Best regards
Rouven

11 years ago by Adam Harvey — view source

unread

as I've received no further feedback I've opened the voting on "Timing
attack safe string comparison function":

https://wiki.php.net/rfc/timing_attack

Is there any progress?

The pull request (https://github.com/php/php-src/pull/608) for that RFC is waiting to be merged, I hope someone gets to it before beta1.

I'll look at merging it today.

From benchmark result, overhead for timing safe comparison is negligible
with byte by byte comparison.
I would like to see timing safe "===" for 5.6, if it's possible. (== could
be timing safe, too)

Is anyone working on it?

I don't know if someone else is, but I am not.

I'm not in favour of this — identity doesn't imply timing safety, and
I think we should keep operators as performant as possible.

Adam

11 years ago by Ferenc Kovacs — view source

unread

2014.03.18. 17:14, "Adam Harvey" aharvey@php.net ezt írta:

On Mon, Feb 3, 2014 at 7:50 AM, Rouven Weßling me@rouvenwessling.de
wrote:

as I've received no further feedback I've opened the voting on
"Timing
attack safe string comparison function":

https://wiki.php.net/rfc/timing_attack

Is there any progress?

The pull request (https://github.com/php/php-src/pull/608) for that RFC
is waiting to be merged, I hope someone gets to it before beta1.

I'll look at merging it today.

From benchmark result, overhead for timing safe comparison is
negligible
with byte by byte comparison.
I would like to see timing safe "===" for 5.6, if it's possible. (==
could
be timing safe, too)

Is anyone working on it?

I don't know if someone else is, but I am not.

I'm not in favour of this — identity doesn't imply timing safety, and
I think we should keep operators as performant as possible.

Agree and afair it was explicitly stated as out of scope for this rfc.
(sorry for not merging this sooner, thanks Adam for thaking care of this).

11 years ago by Yasuo Ohgaki — view source

unread

Hi all,

From benchmark result, overhead for timing safe comparison is
negligible
with byte by byte comparison.
I would like to see timing safe "===" for 5.6, if it's possible. (==
could
be timing safe, too)

Is anyone working on it?

I don't know if someone else is, but I am not.

I'm not in favour of this — identity doesn't imply timing safety, and
I think we should keep operators as performant as possible.

Agree and afair it was explicitly stated as out of scope for this rfc.
(sorry for not merging this sooner, thanks Adam for thaking care of this).

Benchmark reveals performance issue is negligible.

Regardless of where it is implemented, simple byte by byte compare is the
best.
We may ignore length leak at all. It's simpler (less risk) and faster.

It's better to make === timing safe like Python, since it would
be far less risks than dedicated API, but we may consider this later if the
risk is
getting greater.

I need API to be exposed so that I can use it anywhere in PHP.
Please make a PHPAPI.

--
Yasuo Ohgaki
yohgaki@ohgaki.net

11 years ago by Ferenc Kovacs — view source

unread

Hi all,

From benchmark result, overhead for timing safe comparison is
negligible
with byte by byte comparison.
I would like to see timing safe "===" for 5.6, if it's possible.
(== could
be timing safe, too)

Is anyone working on it?

I don't know if someone else is, but I am not.

I'm not in favour of this — identity doesn't imply timing safety, and
I think we should keep operators as performant as possible.

Agree and afair it was explicitly stated as out of scope for this rfc.
(sorry for not merging this sooner, thanks Adam for thaking care of this).

Benchmark reveals performance issue is negligible.

and that benchmark was executed only one platform, with random data.
I'm not saying that it's not accurate, but it is a different kind of risk
when you introduce a new method for specific compare functionality than
replacing === with this new method for everybody.

Regardless of where it is implemented, simple byte by byte compare is the
best.
We may ignore length leak at all. It's simpler (less risk) and faster.

It's better to make === timing safe like Python, since it would
be far less risks than dedicated API, but we may consider this later if
the risk is
getting greater.

I disagree, yeah we could cover up more ground if we replace the ===
implementation with a timing safe one, but that would be more riskier from
the stability point of view, than adding it as a separate function first.

--
Ferenc Kovács
@Tyr43l - http://tyrael.hu

[VOTE] Timing attack safe string comparison function

Paddy

Cheers,

Cheers,

--

--

--

-- Lester Caine - G8HFL

-- Lester Caine - G8HFL

-- Lester Caine - G8HFL

--
Lester Caine - G8HFL

--
Lester Caine - G8HFL

--
Lester Caine - G8HFL