Newsgroups: php.internals
Xref: news.php.net php.internals:89508
Date: Tue, 1 Dec 2015 12:50:10 +0300
To: Nikita Popov
Cc: PHP internals, Anatol Belski, Remi Collet
Subject: Re: [PHP-DEV] HashDos protection
From: dmitry@zend.com (Dmitry Stogov)

Hi Nikita,

A few notes:

It looks like the patch mixes the attempt to catch the problem (the few
lines related to the zend_hash_find_bucket changes) with an optimization
that compensates for the collision-check overhead (everything else). I
think it's better to separate these parts.

Introducing zend_hash_add_or_return() in 7.0.1 is going to create a
forward incompatibility with 7.0.0. However, we may reserve it for
internal usage by removing ZEND_API.

I don't think PHP should prevent users from constructing whatever arrays
they like because of internal problems (as in your example):

> $s = 1 << 16; $a = [];
> for ($i = 0; count($a) < $s; $i += $s) { $a[$i] = 0; }

It creates a performance problem, but that doesn't mean we should kill
it. We have "max_execution_time" to catch them all. I think only big
arrays coming from external sources should be checked.
Your solution is incomplete anyway, because instead of crafting a single
list with 10000 collisions, an attacker will be able to use N lists with
1000 collisions each.

Thanks. Dmitry.

On Thu, Nov 26, 2015 at 8:24 PM, Nikita Popov wrote:
> Hi internals!
>
> This mail turned out to be rather long, so I'll start with a TL;DR:
>
> To fix the HashDos vulnerability for *all* cases (rather than just GET/POST
> parsing), I propose to introduce collision counting during hashtable
> insertion operations. This will throw a fatal error if the number of
> collisions during an insertion operation exceeds a certain threshold.
>
> Implementation: https://github.com/php/php-src/pull/1565
>
> From my testing the change has no negative performance impact. The change
> is small and does not break ABI.
>
> Tracking bug (with various implementations):
> https://bugs.php.net/bug.php?id=70644
>
> What are your thoughts on this?
>
> Nikita
>
> ---------
>
> For those not familiar with HashDos or interested in some considerations
> regarding alternative implementations, the original mail:
>
> In PHP 5.3.9 a partial fix for the HashDos vulnerability was introduced in
> the form of max_input_vars. As a recap, HashDos exploits the fact that
> hashtables (which PHP uses ubiquitously) perform very badly if the hashes
> of many elements collide. In particular the performance of inserting N
> elements into a hashtable can degrade from O(N) to O(N^2) using carefully
> chosen keys.
>
> A very simple demo script for creating a hashtable with many collisions:
>
> $s = 1 << 16; $a = [];
> for ($i = 0; count($a) < $s; $i += $s) { $a[$i] = 0; }
>
> This code creates an array with about 65k elements and runs in 30s (PHP
> 5.6) or 4s (PHP 7) on my machine.
>
> This behavior can be exploited by an attacker whenever a server running PHP
> can be made to create a decent-sized array (or object) with
> attacker-controlled keys.
> This DOS vulnerability is efficient: A 700kb
> payload can easily take 5 CPU seconds to process on PHP 7 and from there it
> goes up quadratically (i.e. a 1.4MB payload takes 20 seconds etc).
>
> The most obvious attack vectors are GET and POST variables, which are
> automatically decoded into arrays. PHP 5.3.9 prevented this attack avenue
> by introducing max_input_vars, which prevents creating GET/POST arrays with
> more than a certain number of elements. As the HashDos attack relies on
> O(N^2) asymptotic runtime, limiting N to small numbers is an effective fix.
>
> However, GET/POST are not the only ways an attacker can trigger
> array/object creation with specific keys. For example many PHP applications
> have endpoints that accept JSON encoded payloads, which are not protected
> by max_input_vars. The only means of protection currently available to
> userland code is to never decode JSON payloads that exceed some
> conservative size limit like 50-100KB (often not practical).
>
> Apart from JSON, there will likely be various other application-specific
> situations where arrays are generated which contain user-provided keys.
> While we could add something similar to max_input_vars to json_decode and
> unserialize, this would not solve the problem for any custom code.
>
> There are essentially three ways to prevent HashDos attacks in general
> (rather than patching specific places):
>
> 1. If there are many collisions for a hash slot, switch collision
> resolution from linked lists to self-balancing binary trees.
> 2. Switch hashtables to a keyed cryptographic hash like SipHash.
> 3. If there are very many collisions for a hash slot, throw a fatal error.
>
> I will comment on these possibilities as they apply to PHP.
>
> 1. (Self-balancing binary trees). The idea here is that a balanced binary
> tree has worst-case insertion time O(log N), while the linked list we
> normally use has worst-case insertion time O(N).
> This means that the
> worst-case execution time for N insertions into a hashtable goes from
> O(N^2) to O(N log N), i.e. from catastrophic to "a few times slower than
> usual".
>
> I don't think this solution is viable for PHP. Supporting binary trees next
> to linked list collision resolution will make the hashtable implementation
> *a lot* more complicated -- and it isn't exactly simple as is.
>
> 2. (Randomized cryptographic hash). This approach doesn't change anything
> about how collisions are handled, instead it prevents the attacker from
> constructing such collisions in the first place.
>
> This is done by using a fast cryptographic hash function (typically
> SipHash), which is keyed with a (cryptographically strong) random key. As
> the attacker does not know the key, he cannot efficiently construct arrays
> with many collisions.
>
> There are a number of factors to take into account when trying to apply
> this to PHP:
>
> a) The SipHash key would have to be the same for all processes in an FPM
> pool. We can't use a per-process or per-request key due to arrays living in
> opcache SHM. As far as I know SipHash is designed to be secure even when
> the used key is long living, so this does not appear to be an issue.
>
> b) SipHash is slower (especially for very small strings) than the hash
> function we currently use (DJBX33A). As PHP 7 caches the hashes for
> strings, I do not believe this to be a significant problem.
>
> c) The real issue: Currently PHP does not hash integer keys, but they are
> just as susceptible to HashDos as string keys. To protect against this
> vector we'd have to start hashing integer keys as well. Especially when
> using a hash function as expensive as SipHash, this would make operations
> on arrays with (non-ascending or non-dense) integer keys significantly
> slower.
>
> Here is an implementation of using SipHash for both strings and integers:
> https://github.com/php/php-src/compare/master...nikic:integerHash
>
> Testing this against Wordpress showed only a small (but non-trivial)
> slowdown. This is probably attributable to the fact that most
> integer-indexed arrays are packed (not using a hash) and thus this change
> has less impact than it might have had on PHP 5.
>
> However, testing raw array operations, working on non-packed integer arrays
> can easily be three times slower using this implementation. I do not think
> this is acceptable. Maybe Wordpress doesn't use many unpacked integer
> arrays, but that does not have to hold generally and this difference is
> just too drastic.
>
> 3. (Fatal error on many collisions). While the two previous options try to
> ensure that hashtables stay efficient regardless of the used keys, the last
> option aims to simply detect malicious array keys and abort the script in
> such a case.
>
> This is done by counting the number of collisions during hashtable
> insertion operations and throwing a fatal error if this collision count
> exceeds a certain threshold.
>
> Here is an implementation of this mechanism for PHP:
> https://github.com/php/php-src/pull/1565
>
> This is the approach I would recommend for PHP. The patch for this change
> is small and non-intrusive and does not break ABI. From my testing it has
> no adverse performance impact. (In microbenchmarks I found it to even be
> consistently faster, as it uses a more specialized implementation for a
> commonly used type of insertion operation.)
>
> While this is certainly less elegant than the alternatives, this would
> allow us to fix the security issue in a timely manner. Switching to
> SipHash, while maybe a possibility for the future, would still require a
> significant amount of work to be at all feasible.
>
> Nikita