Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:96087 Return-Path: Mailing-List: contact internals-help@lists.php.net; run by ezmlm Delivered-To: mailing list internals@lists.php.net Received: (qmail 3732 invoked from network); 21 Sep 2016 23:30:46 -0000 Received: from unknown (HELO lists.php.net) (127.0.0.1) by localhost with SMTP; 21 Sep 2016 23:30:46 -0000 Authentication-Results: pb1.pair.com header.from=nikita.ppv@gmail.com; sender-id=pass Authentication-Results: pb1.pair.com smtp.mail=nikita.ppv@gmail.com; spf=pass; sender-id=pass Received-SPF: pass (pb1.pair.com: domain gmail.com designates 209.85.213.182 as permitted sender) X-PHP-List-Original-Sender: nikita.ppv@gmail.com X-Host-Fingerprint: 209.85.213.182 mail-yb0-f182.google.com Received: from [209.85.213.182] ([209.85.213.182:34309] helo=mail-yb0-f182.google.com) by pb1.pair.com (ecelerity 2.1.1.9-wez r(12769M)) with ESMTP id A2/50-01233-52813E75 for ; Wed, 21 Sep 2016 19:30:45 -0400 Received: by mail-yb0-f182.google.com with SMTP id x93so46729824ybh.1 for ; Wed, 21 Sep 2016 16:30:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=bWbb4cwN5ZQTQtoYW7ZkYZVDBl6YcogqYRrk1IM8wB8=; b=hwcaapyQeU8zHFLUY8iWuSC7BAI8qQ3pGNmvwSGW9pUGV0ABhUE6GAZF5aIN3s6X9d F1mS7PgLObe2kTU7z5lauDI090xWl93Q0oqmseM/S976y2f/UwmcwfMfofrp3NWu5xvo C/RG8Q7ojq6O5Ib1skd6bt37wpxdcKykwaRQbuOhDkbdCC9HuFmik1RRQxXEuRmxVHQx boOj8h1LzUM++ahafwc/olCzo6mz5wflkBZg5+h8jyig9YTv0YY7eAH8fTila4n2Ohrp uugapQGijtNzdF5aZmAmgU2Hd/ggFZ6T1YsqmjK9A3aaKSlaUdW3sDnrxDOU0D926nq3 4F6g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=bWbb4cwN5ZQTQtoYW7ZkYZVDBl6YcogqYRrk1IM8wB8=; b=FFrCIS3nEoZ7s+nhwAoHiW+UCm35iJYdput4SCrrH+z2+xAEbOF1AXtbjVCNy2dfQf 7sIKTnUIPXDfXCTsxdV7Qv3V+DAEsCpv+TtVRiGwWH/wkG988gqP4Ps5cBkPqH5fA21l MOqEn+KB86rdhIDQD5dK1wYb1dW6Z1AzgeCcc+O1SAGW7GoMZq1A9D+bkTMggwJqLSRp /xeKaWGMM4lBIobcauS+tBAFCHjmXN2/f/n7kCOC9fiIUKA3wRedUKvciPFNi/PdVrf1 CcXH/ujJwydd/GuB2mNYYFDH4+10qclGqCv1fzyXXrsgWH7i0J8FD1FcszY9d2eLLdwH nMOg== X-Gm-Message-State: AE9vXwN14hFS+URyOuqovuuBff43ghckcCrM+0IN8/07WI2czqLEkcgUZ8qndf9EZRT7GQ2MjjJ7jtiwzzFPSQ== X-Received: by 10.37.163.195 with SMTP id e61mr7229230ybi.81.1474500642972; Wed, 21 Sep 2016 16:30:42 -0700 (PDT) MIME-Version: 1.0 Received: by 10.13.215.150 with HTTP; Wed, 21 Sep 2016 16:30:42 -0700 (PDT) In-Reply-To: References: <7d5727ba-da33-e3c5-1d1f-318c45d81616@cubiclesoft.com> <9522ebc9-8d8b-045e-b701-02f1166063e6@gmail.com> <40868951-8BDA-4860-884C-B8252C1839E3@gmail.com> Date: Thu, 22 Sep 2016 01:30:42 +0200 Message-ID: To: Stanislav Malyshev Cc: PHP Internals Content-Type: multipart/alternative; boundary=f403045c05f80bad37053d0cefda Subject: Re: [PHP-DEV] HashDoS From: nikita.ppv@gmail.com (Nikita Popov) --f403045c05f80bad37053d0cefda Content-Type: text/plain; charset=UTF-8 On Thu, Sep 22, 2016 at 1:07 AM, Stanislav Malyshev wrote: > Hi! > > >> Lets [try] to quantify the probability of reaching the collision limit C > > with a hashtable of size N and assuming a random hash distribution. The > > upper bound for this should be (N choose C) * (1/N)^(C-1), with > (1/N)^(C-1) > > being the probability that C elements hash to one value and (N over C) > the > > number of C element subsets of an N element set. For large N and N >> C > we > > approximate (N choose C) to (e*N/C)^C / sqrt(2pi*C). As such our upper > > bound becomes N * (e/C)^C / sqrt(2pi*C). Choosing N = 2^31 (largest > > possible hashtable) we get for C = 20 probability 8.9E-10 and for C = 100 > > probability 2.3E-149. The patch uses C = 1000. > > I assume you're talking here about per-hashtable limit? > Here I notice two things: > > 1. Hash keys are not really random. Which can both improve and make it > worse, but we have a lot of keys named "key" and not a lot of keys named > "\x07\xF5\xDD". > > 2. This does not account for the fact that if we have two keys used by > app colliding, we'll probably see this collision a lot. Of course, if > we're counting per-hashtable, that would make is somewhat less, but still. > > To validate this theory, I applied the following patch: > > https://gist.github.com/smalyshev/2c3c085998ba81d6c163a27386aaacd8 > > And run composer update on one of my projects. I got these lines: > > Destroyed hash with coll = 48454 > Destroyed hash with coll = 41871 > Destroyed hash with coll = 41828 > Destroyed hash with coll = 40535 > Destroyed hash with coll = 32774 > Destroyed hash with coll = 22782 > Destroyed hash with coll = 21381 > Destroyed hash with coll = 15232 > Destroyed hash with coll = 14745 > Destroyed hash with coll = 12216 > Destroyed hash with coll = 11742 > Destroyed hash with coll = 11625 > Destroyed hash with coll = 8916 > Destroyed hash with coll = 6820 > Destroyed hash with coll = 5911 > Destroyed hash with coll = 5429 > Destroyed hash with coll = 3390 > Destroyed hash with coll = 1188 > > I.e. there were 18 hashes with more than 1000 collisions on the course > of the run. Is there something wrong with my count or am I counting > wrong things (I didn't see the patch for limiting counts so maybe I'm > still counting wrong things). > > > In other words, it is extremely unlikely that you hit the collision limit > > by accident, with non-malicious data. So no, the user does not have to > > discriminate normal from malicious causes. If the limit triggers, it's > > malicious. > > I'm not so sure of this now - unless, again, I misunderstand what we're > counting. As linked in my first reply, there was a previous discussion on the topic: http://markmail.org/message/ttbgcvdu4f7uymfb The collision-counting approach that Yasuo references is linked at the start of the first mail: https://github.com/php/php-src/pull/1565 Collisions are counted during insertion operations. While we perform a hashtable insertion, we first need to iterate over the existing buckets for a certain hash to see if the insert is actually an update. The implementation simply keeps track of how many buckets we inspect while doing that. If the number is above a certain threshold, an error is generated. As such, the number of collisions per hashtable is not important. Only the number of collisions for a certain hash-value is relevant. Getting C=1000 collisions for a single hash-value is extremely unlikely until you explicitly design your keys to trigger this situation. Nikita --f403045c05f80bad37053d0cefda--