Newsgroups: php.internals
Path: news.php.net
Xref: news.php.net php.internals:96061
Mailing-List: contact internals-help@lists.php.net; run by ezmlm
MIME-Version: 1.0
Date: Wed, 21 Sep 2016 16:23:11 +0200
Message-ID: <CAF+90c-uFg9A2f1SVYTNgjQuFX5PkenrtXd8kf37cYoG0EfgEw@mail.gmail.com>
To: Tom Worster <fsb@thefsb.org>
Cc: Rowan Collins <rowan.collins@gmail.com>, PHP Internals <internals@lists.php.net>
Content-Type: multipart/alternative; boundary=94eb2c0872daf55e44053d054801
Subject: Re: [PHP-DEV] HashDoS
From: nikita.ppv@gmail.com (Nikita Popov)

--94eb2c0872daf55e44053d054801
Content-Type: text/plain; charset=UTF-8

On Wed, Sep 21, 2016 at 4:05 PM, Tom Worster <fsb@thefsb.org> wrote:

> On 9/21/16 8:37 AM, Rowan Collins wrote:
>
>> On 21 September 2016 13:02:20 BST, Glenn Eggleton <geggleto@gmail.com>
>> wrote:
>>
>>> What if we had some sort of configuration limit on collision length?
>>>
>>
>> Previous discussions have come to the conclusion that the difference
>> between normal collision frequency and sufficient for a DoS is so large
>> that the only meaningful settings would be on or off. e.g. the proposed
>> limit is 1000, and randomly inserting millions of rows produces about 12.
>>
>> The problem with long running applications is not that they need to raise
>> the limit, it's that they need to handle the error gracefully if they are
>> in fact under attack. Because hash tables are so ubiquitous in the engine,
>> there's no guarantee that that's possible, so an attacker would have the
>> ability to crash the process with the limit turned on, or hang the CPU with
>> the limit turned off.
>>
>
> Right. It seems like count-and-limit pushes the problem onto the user who
> then has to discriminate normal from malicious causes for rising counters
> and find appropriate actions for each.
>
> Even a sophisticated user who understands hash collision counters may not
> welcome this since it adds complexity that's hard to test and involves
> questionable heuristics.
>
> Tom
>

Quoting a relevant part of the previous discussion:

> Let's [try] to quantify the probability of reaching the collision limit C
> with a hashtable of size N, assuming a random hash distribution. The
> upper bound for this should be (N choose C) * (1/N)^(C-1), with (1/N)^(C-1)
> being the probability that C elements hash to one value and (N choose C) the
> number of C-element subsets of an N-element set. For large N and N >> C we
> approximate (N choose C) by (e*N/C)^C / sqrt(2pi*C). As such our upper
> bound becomes N * (e/C)^C / sqrt(2pi*C). Choosing N = 2^31 (largest
> possible hashtable) we get for C = 20 probability 8.9E-10 and for C = 100
> probability 2.3E-149. The patch uses C = 1000.
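
For anyone who wants to check the arithmetic, here is a minimal Python sketch that evaluates the simplified bound N * (e/C)^C / sqrt(2pi*C) from the quote. It works in log space because the values underflow a double for large C. The function name log10_collision_bound is made up for this illustration and is not part of the patch.

```python
import math


def log10_collision_bound(n: int, c: int) -> float:
    """Return log10 of the bound N * (e/C)^C / sqrt(2*pi*C).

    Computed with logarithms so that very small results (large C)
    do not underflow to 0.0.
    """
    ln_bound = (
        math.log(n)                          # ln N
        + c * (1.0 - math.log(c))            # ln (e/C)^C = C * (1 - ln C)
        - 0.5 * math.log(2.0 * math.pi * c)  # ln 1/sqrt(2*pi*C)
    )
    return ln_bound / math.log(10.0)


N = 2 ** 31  # largest possible hashtable, as assumed in the quote

# The quote reports 8.9E-10 for C = 20 and 2.3E-149 for C = 100;
# C = 1000 is the limit used by the patch.
for c in (20, 100, 1000):
    print(f"C = {c}: bound ~ 10^{log10_collision_bound(N, c):.2f}")
```

The exponents printed for C = 20 and C = 100 should correspond to the two probabilities quoted above. The C = 1000 case is far smaller again.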

In other words, it is extremely unlikely that you hit the collision limit
by accident, with non-malicious data. So no, the user does not have to
discriminate normal from malicious causes. If the limit triggers, it's
malicious.
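
To get a feel for how far ordinary data stays from the limit, one can also simulate it. The following Python sketch throws uniformly random keys into buckets and reports the longest chain. It models an idealized random hash, not PHP's actual hash function or table layout, and the names longest_chain and COLLISION_LIMIT are hypothetical.

```python
import random
from collections import Counter


def longest_chain(num_keys: int, num_buckets: int, seed: int = 0) -> int:
    """Insert num_keys uniformly random keys into num_buckets buckets
    and return the size of the fullest bucket."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    counts = Counter(rng.randrange(num_buckets) for _ in range(num_keys))
    return max(counts.values())


COLLISION_LIMIT = 1000  # the value of C used by the patch

# As many keys as buckets, mirroring the N-elements-in-size-N table
# assumed in the quoted estimate.
chain = longest_chain(2 ** 17, 2 ** 17)
print(f"longest chain: {chain}, below limit: {chain < COLLISION_LIMIT}")
```

Under this model the longest chain stays in the single or low double digits, orders of magnitude below the limit.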

Nikita

--94eb2c0872daf55e44053d054801--