Hi internals!
This mail turned out to be rather long, so I'll start with a TL;DR:
To fix the HashDos vulnerability for all cases (rather than just GET/POST
parsing), I propose to introduce collision counting during hashtable
insertion operations. This will throw a fatal error if the number of
collisions during an insertion operation exceeds a certain threshold.
Implementation: https://github.com/php/php-src/pull/1565
From my testing the change has no negative performance impact. The change
is small and does not break ABI.
Tracking bug (with various implementations):
https://bugs.php.net/bug.php?id=70644
What are your thoughts on this?
Nikita
For those not familiar with HashDos or interested in some considerations
regarding alternative implementations, the original mail:
In PHP 5.3.9 a partial fix for the HashDos vulnerability was introduced in
the form of max_input_vars. As a recap, HashDos exploits the fact that
hashtables (which PHP uses ubiquitously) perform very badly if the hashes
of many elements collide. In particular the performance of inserting N
elements into a hashtable can degrade from O(N) to O(N^2) using carefully
chosen keys.
A very simple demo script for creating a hashtable with many collisions:
$s = 1 << 16; $a = [];
for ($i = 0; count($a) < $s; $i += $s) { $a[$i] = 0; }
This code creates an array with about 65k elements and runs in 30s (PHP
5.6) or 4s (PHP 7) on my machine.
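As a rough illustration of why those keys collide: PHP hashtables have power-of-two sizes, an integer key is used as its own hash, and the bucket index is essentially key & (tableSize - 1). Every key in the loop above is a multiple of 2^16, so in this simplified model they all land in the same bucket and form one huge collision chain:
// Simplified model of the bucket index computation; the real masking in
// zend_hash is more involved, but the collision effect is the same.
$tableSize = 1 << 16;
foreach ([0, 1 << 16, 2 << 16, 3 << 16] as $key) {
    echo $key, ' => bucket ', $key & ($tableSize - 1), "\n";   // always bucket 0
}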
This behavior can be exploited by an attacker whenever a server running PHP
can be made to create a decent-sized array (or object) with
attacker-controlled keys. This DOS vulnerability is efficient: A 700kb
payload can easily take 5 CPU seconds to process on PHP 7 and from there it
goes up quadratically (i.e. 1.4MB payload takes 20 seconds etc).
The most obvious attack vectors are GET and POST variables, which are
automatically decoded into arrays. PHP 5.3.9 prevented this attack avenue
by introducing max_input_vars, which prevents creating GET/POST arrays with
more than a certain number of elements. As the HashDos attack relies on
O(N^2) asymptotic runtime, limiting N to small numbers is an effective fix.
However, GET/POST are not the only ways an attacker can trigger
array/object creation with specific keys. For example many PHP applications
have endpoints that accept JSON encoded payloads, which are not protected
by max_input_vars. The only means of protection currently available to
userland code is to never decode JSON payloads that exceed some
conservative size limit like 50-100KB (often not practical).
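A minimal sketch of such a userland guard (the helper name and the 64KB limit are only illustrative, not part of any proposal):
// Hypothetical userland guard: refuse to decode oversized JSON payloads.
function size_limited_json_decode(string $payload, int $maxBytes = 65536)
{
    if (strlen($payload) > $maxBytes) {
        throw new InvalidArgumentException('Refusing to decode oversized JSON payload');
    }
    return json_decode($payload, true);
}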
Apart from JSON, there will likely be various other application-specific
situations where arrays are generated which contain user-provided keys.
While we could add something similar to max_input_vars to json_decode and
unserialize, this would not solve the problem for any custom code.
There are essentially three ways to prevent HashDos attacks in general
(rather than patching specific places):
- If there are many collisions for a hash slot, switch collision resolution
from linked lists to self-balancing binary trees.
- Switch hashtables to a keyed cryptographic hash like SipHash.
- If there are very many collisions for a hash slot, throw a fatal error.
I will comment on these possibilities as they apply to PHP.
- (Self-balancing binary trees). The idea here is that a balanced binary
tree has worst-case insertion time O(log N), while the linked list we
normally use has worst-case insertion time O(N). This means that the
worst-case execution time for N insertions into a hashtable goes from
O(N^2) to O(N log N), i.e. from catastrophic to "a few times slower than
usual".
I don't think this solution is viable for PHP. Supporting binary trees next
to linked list collision resolution will make the hashtable implementation
a lot more complicated -- and it isn't exactly simple as is.
- (Randomized cryptographic hash). This approach doesn't change anything
about how collisions are handled; instead it prevents the attacker from
constructing such collisions in the first place.
This is done by using a fast cryptographic hash function (typically
SipHash), which is keyed with a (cryptographically strong) random key. As
the attacker does not know the key, he cannot efficiently construct arrays
with many collisions.
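To illustrate the keyed-hash idea in userland terms (this is only a conceptual sketch, not the proposed engine change; it uses the libsodium binding that only became part of PHP later, in 7.2):
// sodium_crypto_shorthash() is SipHash-2-4 keyed with a secret 16-byte key.
// Without knowing the key, an attacker cannot precompute colliding inputs.
$key = sodium_crypto_shorthash_keygen();
$h1  = sodium_crypto_shorthash('some-array-key', $key);
$h2  = sodium_crypto_shorthash('another-array-key', $key);
echo bin2hex($h1), "\n", bin2hex($h2), "\n";   // two 8-byte keyed hashes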
There are a number of factors to take into account when trying to apply
this to PHP:
a) The SipHash key would have to be the same for all processes in an FPM
pool. We can't use a per-process or per-request key due to arrays living in
opcache SHM. As far as I know SipHash is designed to be secure even when
the used key is long living, so this does not appear to be an issue.
b) SipHash is slower (especially for very small strings) than the hash
function we currently use (DJBX33A). As PHP 7 caches the hashes for
strings, I do not believe this to be a significant problem.
c) The real issue: Currently PHP does not hash integer keys, but they are
just as susceptible to HashDos as string keys. To protect against this
vector we'd have to start hashing integer keys as well. Especially when
using a hash function as expensive as SipHash, this would make operations
on arrays with (non-ascending or non-dense) integer keys significantly
slower.
Here is an implementation of using SipHash for both strings and integers:
https://github.com/php/php-src/compare/master...nikic:integerHash
Testing this against Wordpress showed only a small (but non-trivial)
slowdown. This is probably attributable to the fact that most
integer-indexed arrays are packed (not using a hash) and thus this change
has less impact than it might have had on PHP 5.
However, testing raw array operations, working on non-packed integer arrays
can easily be three times slower using this implementation. I do not think
this is acceptable. Maybe Wordpress doesn't use many unpacked integer
arrays, but that does not have to hold generally and this difference is
just too drastic.
- (Fatal error on many collisions). While the two previous options try to
ensure that hashtables stay efficient regardless of the used keys, the last
option aims to simply detect malicious array keys and abort the script in
such a case.
This is done by counting the number of collisions during hashtable
insertion operations and throwing a fatal error if this collision count
exceeds a certain threshold.
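To make the idea concrete, here is a rough userland model of the counting (the helper, the stand-in hash and the threshold are only illustrative; the real check lives inside the zend_hash insertion code):
// Toy model: count collisions while walking a bucket's chain during insertion
// and bail out once a threshold is exceeded.
const MAX_COLLISIONS = 1000;   // illustrative threshold

function toy_insert(array &$buckets, int $tableSize, string $key, $value)
{
    $idx = crc32($key) & ($tableSize - 1);   // stand-in for hash + mask reduction
    $collisions = 0;
    foreach ($buckets[$idx] ?? [] as $i => $pair) {
        if ($pair[0] === $key) {
            $buckets[$idx][$i][1] = $value;  // key already present: update it
            return;
        }
        if (++$collisions > MAX_COLLISIONS) {
            trigger_error('Too many hash collisions', E_USER_ERROR);
        }
    }
    $buckets[$idx][] = [$key, $value];       // append to the collision chain
}

$buckets = [];
toy_insert($buckets, 1 << 16, 'foo', 42);    // example usage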
Here is an implementation of this mechanism for PHP:
https://github.com/php/php-src/pull/1565
This is the approach I would recommend for PHP. The patch for this change
is small and non-intrusive and does not break ABI. From my testing it has
no adverse performance impact. (In microbenchmarks I found it to even be
consistently faster, as it uses a more specialized implementation for a
commonly used type of insertion operation.)
While this is certainly less elegant than the alternatives, this would
allow us to fix the security issue in a timely manner. Switching to
SipHash, while maybe a possibility for the future, would still require a
significant amount of work to be at all feasible.
Would this be a catchable Error (implementing Throwable) or a real fatal?
Having a real fatal could lead to a DOS in Aerys as you'd be able to crash
workers with carefully crafted input variables then.
Thanks, Niklas
It would be a real fatal error. Throwing an exception in this place would
be problematic, as we're inserting into hashtables all over the place and
not all of these places are exception-safe. Many places also make the
reasonable assumption that (normal) insert operations always succeed.
You are correct in saying that this approach would mean that you can now
easily trigger a fatal error in an application that was previously
susceptible to HashDos (and likely also in some that previously were not).
In most cases this should not be a problem; however, for Aerys and likely
other daemons a fatal error implies losing all connections on the current
worker. This may effectively be worse than a HashDos attack.
The only thing I can think to offer here is to make the collision number
threshold ini configurable, so you would have the ability to effectively
disable this protection. In that case you'd have to manually perform size
checks in the right places if you wish to protect against HashDos.
I'm fine with having a solution that only works for the 99% of
applications, until such a time, if ever, as we can feasibly implement one
of the alternative schemes that would cover all possible scenarios.
Nikita
I don't know if anyone has suggested this before, but why not have a
function that application developers can call to switch hash modes and
support multiple hash modes in the core?
I've always viewed 'max_input_vars' as an emergency hack and I've run
into the default 1,000 limit many times. When I hit that limit, I
inevitably have to raise it to anywhere from 3,000 to 10,000 to get the
target application to function, which, of course, puts the whole server
at risk.
If the application developer could declare what hash mode to use for new
arrays, that could potentially solve the problem more cleanly. SipHash
would be used for security scenarios (PHP loading of superglobals, user
input, etc.) and the current DJB hash for everything else. That could,
of course, lead to unfortunate situations. Alternatively, default to
SipHash for everything but the application developer could turn it off
and on as needed at runtime. Put a warning in the documentation about
the security of hashes and also mention leaving SipHash enabled when in
doubt.
--
Thomas Hruska
CubicleSoft President
I've got great, time saving software that you will find useful.
Hi Thomas,
Because any hash function has collisions.
Even if we use a hash that is stronger against collisions, computers are getting
faster and faster, so building a database of colliding keys becomes easier and
easier. A clever person may also find an algorithmic way to generate colliding
keys in the future.
In practice, we wouldn't have problems with a max number of collisions.
A max collision count resolves the issue for good, and we keep executing
code with the faster hash. I forget the exact number, but SipHash is much slower
than DJB.
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Is CLI going to be or can CLI be excluded from max collisions? After
thinking about it for a long while, that's my only area of concern here.
SAPI can (fatal) error to its heart's content and I'll find ways to
live with it. But CLI needs stability guarantees. 'max_input_vars'
didn't affect CLI. However, max collisions could negatively affect CLI
because the change affects all arrays instead of just the superglobals.
Real-world scenario: About once every 4 to 6 months I will write a
script to load up a single PHP array with a few million records. Partly
because I can and partly because I need to. On CLI, especially for my
one-off quick-n-dirty scripts, I don't care how much CPU, RAM, and other
resources are used nor how much time it takes to complete. I just care
that the script finishes running successfully. If the script reads in a
record and attempts to add it to the array, the proposed max collisions
might trigger a fatal error if it hits the max collision limit for
arrays. In my experience, CLI is silent about 50% of the time when
it encounters a fatal error. So my script would drop back to the prompt
after spending minutes to hours loading the data, having done no useful
work and emitted no errors. I would think that it had completed
successfully until I went to look at the results and found that the
results I was expecting weren't there.
I abuse PHP arrays. Especially on CLI. Sorry.
--
Thomas Hruska
CubicleSoft President
I've got great, time saving software that you will find useful.
On Sat, Nov 28, 2015 at 12:22 AM, Thomas Hruska thruska@cubiclesoft.com
wrote:
To make sure there's no confusion on this point: When I say "number of
collisions" I do not mean the total sum of collisions in a hashtable.
That's a meaningless value, as a large hashtable will have a large total
number of collisions, even if it is not malicious. What I rather mean is
the number of collisions in a single collision resolution chain.
Let's try to quantify the probability of reaching the collision limit C with
a hashtable of size N, assuming a random hash distribution. The upper
bound for this should be (N choose C) * (1/N)^(C-1), with (1/N)^(C-1) being
the probability that C elements hash to one value and (N choose C) the number
of C-element subsets of an N-element set. For large N with N >> C we can
approximate (N choose C) by (e*N/C)^C / sqrt(2*pi*C). As such our upper
bound becomes N * (e/C)^C / sqrt(2*pi*C). Choosing N = 2^31 (largest
possible hashtable) we get for C = 20 probability 8.9E-10 and for C = 100
probability 2.3E-149. The patch uses C = 1000.
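Those numbers can be checked by evaluating the bound in log space, which also avoids float underflow for C = 1000 (this snippet only reproduces the arithmetic above):
// log10 of N * (e/C)^C / sqrt(2*pi*C)
function collision_bound_log10(float $n, int $c): float
{
    return log10($n) + $c * log10(M_E / $c) - 0.5 * log10(2 * M_PI * $c);
}

$n = 2 ** 31;   // largest possible hashtable
foreach ([20, 100, 1000] as $c) {
    printf("C = %4d: bound ~ 10^%.2f\n", $c, collision_bound_log10($n, $c));
}
// Prints about -9.05 for C = 20 (~8.9E-10) and -148.64 for C = 100 (~2.3E-149).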
Now my probability-foo is pretty weak, so it's not unlikely that this
result is totally wrong ^^ In any case, the point is that even for very
large arrays, there is no way you're going to reach the collision limit
with random data. Of course it can happen that your (obviously not
perfectly random) data set happens to have keys that our hash function
behaves particularly badly with. E.g. if you create a hashtable with
pointers, you'll likely be getting an expected length of 8 for each
collision chain due to alignment. If for whatever reason you are inserting
integers that all happen to be multiples of 2^10, you could reach the
collision limit. If that's the case you're probably still better off
knowing that you're DoSing yourself, rather than just silently
slowing down to a crawl ;)
I'm not averse to making the collision limit configurable, but it would
still be enabled by default for CLI.
Nikita
Hi Thomas,
The particular case being fixed is a vulnerability which can be specifically triggered. If you have already written scripts creating arrays with millions of records, and those went through, it means there were never enough collisions produced. If you have time, I'd ask you to please test this patch with one of your previous scripts of that kind.
The reproduce case from https://github.com/bk2204/php-hash-dos contains 10^6 crafted keys, which causes the application to just hang, as the hash table will be busy resolving collisions and looking for free buckets. I was running that reproduce case and broke off the CLI script after an hour as it doesn't make sense. Of course, a cron running for hours or even days is ok, but it is clear that with malicious keys that cron will spend those days not on the actual job but on resolving collisions. So indeed, if you did meet the malicious situation, your script wouldn't work. Using the same approach for my test, I'm already able to produce an array with around 2*10^6 keys, each 11 bytes long, drawn from a 62-character set (see the script linked here https://github.com/php/php-src/pull/1565#issuecomment-160231878 ). It is supposed to show that the patch doesn't cause an issue with creating huge arrays.
I'd really plead for a bit less theoretical approach at this point - let's test it with any possibly concerning cases and see. In real life, the key length can vary, the quality of keys can vary, and keys can be explicitly made suitable for a huge data set in just the same way they can be crafted for a malicious attack. Say, use a bigger character set for keys, use longer keys, use keys generated by random_bytes(), etc. As scripts processing big datasets are usually only meant for cronjobs, taking special measures could affect their runtime; however, if a script runs for days, plus or minus a couple of hours won't be significant.
So after all, fixing the vulnerability could theoretically cost something for the justified but unusual use cases like big datasets. However, it won't make those unusual cases impossible; furthermore, it doesn't really seem to have an impact in practice, and it won't affect common cases at all. At least that is what my current tests show. Please, let's fetch the patch and test it.
Regards
Anatol
On 26/11/2015 18:24, Nikita Popov wrote:
Here is an implementation of this mechanism for PHP:
https://github.com/php/php-src/pull/1565
This is the approach I would recommend for PHP. The patch for this change
change is small and non-intrusive and does not break ABI. From my
testing it has no adverse performance impact. (In microbenchmarks
I found it to even be consistently faster, as it uses a more
specialized implementation for a commonly used type of insertion
operation.)
Thanks for this work.
I agree, this seems a very acceptable solution, I would also recommend
it, even for 7.0.1 (perhaps even for 5.6 if possible)
Remi.
This will throw a fatal error if the number of
collisions during an insertion operation exceeds a certain threshold.
To me this feels like it's just moving a DoS vector from one place to
another.
As Niklas already pointed out, he is directly affected by this.
I was considering the scenario:
- Open resources
- json_decode
- Do stuff
- Clean up resources
This makes the DoS more application specific, but there are any number of
creative uses for making an application unexpectedly fail halfway through.
You can argue it's similar to any DoS that causes a request to run out of
memory halfway through; it is, in some ways.
I don't think an exception is right for this either. People blindly catch
all exceptions because they can.
Not sure what to suggest.
Hi Nikita,
What are your thoughts on this?
Great! This is exactly what I was thinking.
I prefer collision counting rather than slower hash function.
Hardcoded collision max (1000) seems ok for me.
A catchable fatal error, at least E_RECOVERABLE_ERROR, would be preferred, IMHO.
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Currently it's not catchable, that's my main concern. If it's catchable,
it's not that much of a problem.
Regards, Niklas
Hi,
What are your thoughts on this?
First of all, thanks a lot for looking into it! That's great!
I think that it's all cool except the fact that json_decode would result in
a fatal error. I don't think that json_decode should kill the script in this
case. It's much better to return NULL and set a new JSON_ERROR_...
We could maybe add some EG var to silence it when using json_decode and set
it back when finished. In that case, when the collision limit is exceeded it
would set another EG flag that could be checked in json_parser.y as
UNEXPECTED after each addition to the object or array. It's probably not the
nicest solution and it would probably slow down the JSON parser slightly,
but it could work and it's still better than a fatal error IMHO.
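To make that concrete, a purely hypothetical sketch of how userland code might see such behaviour (the JSON_ERROR_HASH_COLLISION name is invented here; no such constant exists):
$untrustedPayload = file_get_contents('php://input') ?: '';
$data = json_decode($untrustedPayload, true);
if ($data === null && json_last_error() !== JSON_ERROR_NONE) {
    // Instead of a fatal error, the parser would bail out and report the problem
    // through the usual json_last_error() / json_last_error_msg() channel
    // (e.g. a hypothetical JSON_ERROR_HASH_COLLISION).
    error_log('json_decode() aborted: ' . json_last_error_msg());
}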
What do you think?
Cheers
Jakub
Hi Nikita,
Responding to the short version as well :)
I was checking your patch and I think it is great. Currently I see no ABI break (please correct me if I err). So IMHO, after sufficient discussion, corrections and testing, and given there's still no ABI incompatibility, it should be backported to 7.0 as early as possible.
Regards
Anatol
Hi everybody...
I am very new to this mailing list, and I do not know if my thinking about this problem is good, but in my opinion this kind of attack is based on the fact that the attacker knows in advance how to compute the hash value in order to generate collisions.
If a random salt were added to the _zend_array struct (at a cost of the salt size, 4 bytes?, per hash table),
and if the hash computation took that salt into account (add an ht parameter to each function that calculates the hash),
it would be impossible to predict the hash of a value,
and so impossible to perform this kind of attack...
What do you think about that?
If you do not want to increase the size of the _zend_array struct, perhaps a random salt initialized at the startup of a PHP program (the same for all hash tables, changing on each run) could be enough!
Best regards,
Pascal KISSIAN
On Sat, Nov 28, 2015 at 2:00 AM, Pascal KISSIAN php-mailing-list@lool.fr
wrote:
This is what variant 2 (Switch hashtables to a keyed cryptographic hash
like SipHash) describes, using a global per-pool key.
Nikita
Sorry Nikita,
I didn't fully read your first message because it was talking about changing the hash algorithm…, and I've been a bit lazy on that…
However, I was only thinking about a minor change introducing a salt.
In zend_inline_hash_func, the hash is initialized with Z_UL(5381)… what about just adding a salt value to this number? This would not cause any performance issue.
For hashing integers, why not just add a salt value as well… again with no performance issue….
For storing the salt, if you choose to have a different random value for each hash table, it would not be a problem for arrays living in opcache SHM…
(and hoping that no performance or memory issue arises from increasing the size of the hash table a bit)
pk
On Sat, Nov 28, 2015 at 12:02 PM, Pascal KISSIAN php-mailing-list@lool.fr
wrote:
In zend_inline_hash_func, the hash is initialized with Z_UL(5381)… what
about just adding a salt value to this number? This would not cause any
performance issue.
Collisions in DJBX33A are (integer overflow notwithstanding) completely
independent of the starting value, so randomizing it wouldn't help. If
you're interested in how DJB collisions are constructed, see
http://www.phpinternalsbook.com/hashtables/hash_algorithm.html#hash-collisions
.
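For example, the classic two-character collision "Ez" / "FY" survives any choice of starting value. A quick userland model of the 32-bit DJBX33A loop shows it (illustrative only):
// hash = hash * 33 + byte, truncated to 32 bits.
function djbx33a(string $s, int $start = 5381): int
{
    $h = $start & 0xFFFFFFFF;
    for ($i = 0, $len = strlen($s); $i < $len; $i++) {
        $h = ($h * 33 + ord($s[$i])) & 0xFFFFFFFF;
    }
    return $h;
}

foreach ([5381, 12345, random_int(0, 0xFFFFFFFF)] as $start) {
    // "Ez" and "FY" always collide because 'z' - 'Y' == 33 * ('F' - 'E').
    var_dump(djbx33a('Ez', $start) === djbx33a('FY', $start));   // bool(true)
}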
For hashing integers, why not just add a salt value as well… again with no
performance issue….
Similarly, this would not have any effect either. We reduce hashes using an
equivalent of hash % table_size, which is the same as (hash + N *
table_size) % table_size for any N. If you simply add an additional number
to it, the same relation still holds: (hash + salt) % table_size == (hash +
salt + N * table_size) % table_size, so elements that collided previously
still collide.
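A short check of the same point, with example integer keys that are multiples of the table size:
$tableSize = 1 << 16;
$salt = random_int(0, 1 << 30);
$a = 3 * $tableSize;   // two keys that collide today (both reduce to bucket 0)
$b = 7 * $tableSize;
var_dump($a % $tableSize === $b % $tableSize);                       // bool(true)
var_dump(($a + $salt) % $tableSize === ($b + $salt) % $tableSize);   // still bool(true)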
For storing the salt, if you choose to have one different random value
for each hash table, it would not be a problem for arrays living in
opcache SHM…
Randomizing the string hashing function would however prevent caching the
hash in the zend_string structure.
Nikita
Collisions in DJBX33A are (integer overflow notwithstanding) completely independent of the starting value, so randomizing it wouldn't help. If you're interested in how DJB collisions are constructed, see http://www.phpinternalsbook.com/hashtables/hash_algorithm.html#hash-collisions.
Very interesting reading, thanks…
Similarly, this would not have any effect either. We reduce hashes using an equivalent of hash % table_size, which is the same as (hash + N * table_size) % table_size for any N. If you simply add an additional number to it, the same relation still holds: (hash + salt) % table_size == (hash + salt + N * table_size) % table_size, so elements that collided previously still collide.
You're absolutely right! Just adding something results in a translation of the hash table cell…
Perhaps another operation could do the job? Multiplying still keeps the collisions for hashes equal to 0 modulo the table size… perhaps add + multiply….
However, my main feeling is that "an ounce of prevention is worth a pound of cure"…
… and my preference goes to your second option… taking care not to degrade performance…
pk
On 30 November 2015 at 13:58, Pascal KISSIAN php-mailing-list@lool.fr
wrote:
As for what other languages do: Python, Ruby and Perl all seem to have
switched to using SipHash - maybe we should consider this too.
On 28 November 2015 at 01:00, Pascal KISSIAN php-mailing-list@lool.fr
wrote:
As a bare minimum, such a salt would need to be XORed with the string
before hashing; simple addition still produces collisions. I suspect it
would still be easy to produce collisions, however my math is not good
enough (at least not on a Monday morning) to figure out how.
- (Self-balancing binary trees). The idea here is that a balanced binary
tree has worst-case insertion time O(log N), while the linked list we
normally use has worst-case insertion time O(N). This means that the
worst-case execution time for N insertions into a hashtable goes from
O(N^2) to O(N log N), i.e. from catastrophic to "a few times slower than
usual".I don't think this solution is viable for PHP. Supporting binary trees next
to linked list collision resolution will make the hashtable implementation
a lot more complicated -- and it isn't exactly simple as is.
I have written self-balancing binary trees on multiple occasions. I
know that we want to minimize complexity but if the complexity
actually helps solve this issue I think it would be worth it. I assume
you didn't create a proof of concept for this option – is that
correct?
I think fataling because of collisions is a poor solution to this
problem because fataling in some cases may contribute to a successful
DOS attack (I do not think Aerys is the only piece of software that
would be vulnerable to this). I am sure PHP is not the only language
that has experienced pain with HashDos – how have other languages
handled it?
Hi!
This looks pretty cool. I'd support making the limit configurable
though, is there a reason why it's not?
--
Stas Malyshev
smalyshev@gmail.com
Hi Stas,
From what I was testing, the configuration is not absolutely necessary. Normal usage doesn't seem to cause the situation reproducible by https://github.com/bk2204/php-hash-dos . Even with a big array - with patched PHP I was reaching around 2.5 million string keys and gave up. On the other hand, if such a malicious situation were reached, the application would become unusable anyway - so the configuration is senseless for that case. If the array is big and there are too many collisions, PHP would just iterate over all the buckets again and again looking for a suitable one. Maybe the only case where an INI setting could be useful would be to force an exact-zero or very low collision threshold to bail out early. At least that was my observation.
Regards
Anatol
On 01/12/2015 02:28, Anatol Belski wrote:
From what I was testing, the configuration is not absolutely necessary.
My first thought on seeing this hardcoded limit was also "we need to
make it a new ini option".
So I ran a few tests.
Especially Anatol's one (slightly optimized to get more unique keys
faster), with a debug message in zend_hash.c to display the max number
of collisions encountered during the run.
After some time...
With around 9.2 million unique keys and ~1.2GB of memory used,
the max collisions never exceeded "12".
So I really think, now, that we can go without a new ini option.
Remi.
Hi Nikita,
few notes:
It looks like the patch mixes the attempt at catching the problem (the few
lines related to the zend_hash_find_bucket changes) with optimizations to
compensate for the collision check overhead (everything else). I think it's
better to separate these parts.
Introducing zend_hash_add_or_return() in 7.0.1 is going to create a forward
incompatibility with 7.0.0. However, we may reserve it for internal usage by
removing ZEND_API.
I don't think PHP should prevent the user from constructing the arrays he
likes because of internal problems (like in your example).
$s = 1 << 16; $a = [];
for ($i = 0; count($a) < $s; $i += $s) { $a[$i] = 0; }
It creates a performance problem, but that doesn't mean we should kill it.
We have "max_execution_time" to catch them all.
I think only big arrays coming from external sources should be checked.
Your solution is incomplete anyway, because instead of crafting a single list
with 10000 collisions, an attacker will be able to use N lists with 1000
collisions.
Thanks. Dmitry.
Your solution is incomplete anyway, because instead of crafting a single list
with 10000 collisions, an attacker will be able to use N lists with 1000
collisions.
This is a good point, but bear in mind that to get the same effect as 10000
collisions in one list, you'd need 100 lists, not 10.
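A back-of-the-envelope way to see the factor of 100, using the usual ~L^2/2 comparison cost for building a single chain of length L:
$oneBigList   = 10000 ** 2 / 2;   // one list with 10000 collisions
$oneSmallList = 1000 ** 2 / 2;    // one list with 1000 collisions
echo $oneBigList / $oneSmallList, "\n";   // 100 small lists are needed to match it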
Hi Nikita,
few notes:
It looks like the patch mixes the attempt at catching the problem (the few
lines related to the zend_hash_find_bucket changes) with optimizations to
compensate for the collision check overhead (everything else). I think it's
better to separate these parts.
The addition of zend_hash_add_or_return() isn't an optimization, it is
required to ensure that the check happens in all relevant cases. The code
previously used a combination of zend_hash_find() and zend_hash_add_new().
However the whole purpose of the zend_hash_add_new() function is to skip
the part of the insertion operation where the collision count would be done.
At this point I could either a) also count collisions in zend_hash_find(),
which is imho not a good option as it's enough to count once and not at
every lookup, or b) avoid the combination of zend_hash_find() and
zend_hash_add_new() by introducing a new zend_hash_add_or_return() function
(which does the same as the previous combination but also performs the
collision count). Alternatively I would also simply make
zend_hash_add_new() the same as zend_hash_add(), which is probably not
wanted either.
Introducing zend_hash_add_or_return() in 7.0.1 is going to create a forward
incompatibility with 7.0.0. However, we may reserve it for internal usage by
removing ZEND_API.
I'm fine with not marking it ZEND_API in 7.0.
I don't think PHP should prevent the user from constructing the arrays he
likes because of internal problems (like in your example).
$s = 1 << 16; $a = [];
for ($i = 0; count($a) < $s; $i += $s) { $a[$i] = 0; }
It creates a performance problem, but that doesn't mean we should kill it.
We have "max_execution_time" to catch them all.
I think only big arrays coming from external sources should be checked.
There is no way for PHP to know whether or not an array is constructed from
an external source. Yes, of course we can special case json_decode() and
unserialize(), but this would not fix the vulnerability for any array that
is created in PHP code. It's not exactly unusual to create arrays which are
in some way based on user input.
Your solution is incomplete anyway, because instead of crafting a single list
with 10000 collisions, an attacker will be able to use N lists with 1000 collisions.
One list with N*1000 collisions and N lists with 1000 collisions each have
very different asymptotic complexity. The former is O(N^2), while the
latter is O(N). The DOS attack is based on having O(N^2) complexity.
Nikita
The addition of zend_hash_add_or_return() isn't an optimization, it is
required to ensure that the check happens in all relevant cases.
I got it now.
There is no way for PHP to know whether or not an array is constructed
from an external source.
If this is a real problem, we should rather provide an API for "safe" array
construction from "unsafe" data.
Also, if such arrays are constructed from many elements, it may make sense
to insert all elements first and then rehash them.
This will make O(2*N) complexity, even if all of them have the same hash
value.
One list with N*1000 collisions and N lists with 1000 collisions each have
very different asymptotic complexity. The former is O(N^2), while the
latter is O(N).
You are right of course about O(N) and O(N^2), but these days poor
data locality may do more harm than list length.
The fact that PHP 7 is a few times faster on your example is caused by the
linear placement of the collision list.
Once we start to work with randomized lists, in addition to the complexity, we
will also suffer from stalls because of cache misses.
Thanks. Dmitry.
I think only big arrays coming from external sources should be checked.
I tend to agree here.
We discussed it with Remi last week. I was trying to explain why having a
specially crafted hash function for inputs may be better and safer. That
includes GET/POST/env/serialize/JSON and the like.
The performance impact is most likely minimal when limited to these, while
ensuring better protection from a long-term point of view.
I may be wrong and did not think much more than brainstorming about it. So
take it with a bit of salt :)