Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:72434 Return-Path: Mailing-List: contact internals-help@lists.php.net; run by ezmlm Delivered-To: mailing list internals@lists.php.net Received: (qmail 29499 invoked from network); 10 Feb 2014 09:29:35 -0000 Received: from unknown (HELO lists.php.net) (127.0.0.1) by localhost with SMTP; 10 Feb 2014 09:29:35 -0000 Authentication-Results: pb1.pair.com header.from=tjerk.meesters@gmail.com; sender-id=pass Authentication-Results: pb1.pair.com smtp.mail=tjerk.meesters@gmail.com; spf=pass; sender-id=pass Received-SPF: pass (pb1.pair.com: domain gmail.com designates 209.85.128.172 as permitted sender) X-PHP-List-Original-Sender: tjerk.meesters@gmail.com X-Host-Fingerprint: 209.85.128.172 mail-ve0-f172.google.com Received: from [209.85.128.172] ([209.85.128.172:54849] helo=mail-ve0-f172.google.com) by pb1.pair.com (ecelerity 2.1.1.9-wez r(12769M)) with ESMTP id 9E/F1-18799-DFB98F25 for ; Mon, 10 Feb 2014 04:29:34 -0500 Received: by mail-ve0-f172.google.com with SMTP id c14so4702650vea.17 for ; Mon, 10 Feb 2014 01:29:31 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=YTx/HwG6f2i7G04Y2Oa+rUTcE5Iwp4CayZZXY7bTHXY=; b=u6O+CHDCAfyWzu2p+M+E4apiqLBFawLFObbsY9mMBcqYN5Jo6HqDmCANepXP0USxs1 RlfaPnKGeuxFcN6OPgFtfngBD5lenA/LGDn9Dn3hiJu7feLc2+EocDI6hWR35VHB/KKh zw6hhfn9bYIho4EUQY7W0mFWLTHaIb6nVyfGRO7BQJ5Jcwb9p5+X6g3tdzTqLi/oD6RC 3blSjQSHT/XG7m9gggPvKSJThlAMmQHwagMDYpHvzo4wMn6KXwIdlhJvt5f+hA3h6seP cG00MOLAPdczaG4PGAxFpBmolcB5Urp9K0xzsa78mRf/94nq7C/DHh6Q81/cuucfx3ug 7j1g== MIME-Version: 1.0 X-Received: by 10.52.27.9 with SMTP id p9mr610329vdg.28.1392024571219; Mon, 10 Feb 2014 01:29:31 -0800 (PST) Received: by 10.58.133.229 with HTTP; Mon, 10 Feb 2014 01:29:31 -0800 (PST) In-Reply-To: References: <9E3AA302-1EC1-4497-996F-716555CAAB64@rouvenwessling.de> Date: Mon, 10 Feb 2014 17:29:31 +0800 Message-ID: To: Yasuo Ohgaki Cc: =?ISO-8859-1?Q?Rouven_We=DFling?= , PHP internals Content-Type: multipart/alternative; boundary=20cf307d010416159a04f209faee Subject: Re: [PHP-DEV] [VOTE] Timing attack safe string comparison function From: tjerk.meesters@gmail.com (Tjerk Meesters) --20cf307d010416159a04f209faee Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Hi, On Mon, Feb 10, 2014 at 9:15 AM, Yasuo Ohgaki wrote: > Hi all, > > On Mon, Feb 10, 2014 at 9:24 AM, Yasuo Ohgaki wrote: > > > On Fri, Feb 7, 2014 at 10:39 AM, Yasuo Ohgaki > wrote: > > > >> On Fri, Feb 7, 2014 at 8:05 AM, Yasuo Ohgaki > wrote: > >> > >>> I made SipHash version of str_compare() as a sample. > >>> There is timing safe php_compare(), which is stolen from BSD. > >>> > >>> https://github.com/yohgaki/php-src/compare/PHP-5.6-rfc-hash-compare > >>> > >>> [yohgaki@dev github-php-src]$ ./php-bin -r > 'var_dump(str_compare("abc", > >>> "abc"));' > >>> bool(true) > >>> [yohgaki@dev github-php-src]$ ./php-bin -r > >>> 'var_dump(str_compare("asfasdf", "slkjojoeiwrj"));' > >>> bool(false) > >>> > >>> It's quick patch made less than 30 min. > >>> So it can be improved, I suppose. > >>> > >> > >> I thought it would be better to compare performance difference. > >> Added more functions to play with. > >> There are > >> > >> bool str_siphash_compare(str, str) - siphash. timing safe. (64bit) > >> bool str_xxhash32_compare(str, str) - xxhash. timing safe. (32bit) > >> bool str_md5_compare(str, str) - md5. Timing safe (128bit) > >> bool str_byte_compare(str, str) - Byte compare. Timing safe. No > >> division. > >> bool str_byte_compare2(str, str) - Byte compare. Timing safe. With > >> division. (Modulo as this RFC) > >> bool str_compare(str, str) - plain strncmp(). Not timing safe. > >> > >> I didn't took bench mark and did minimum tests. > >> I appreciate if anyone take benchmark. > >> > > > > Added yet another function to compare suggested by Lester. > > > > bool str_word_compare(str, str) > > > > This function compares data word by word rather than byte by byte. > > It supposed to be faster for large data. > > > > I took a benchmark. str_compare() is not timing safe. It's there for > reference. > > str_siphash_compare Elapsed: 1.389824 Iterations: 1000000 DataSize: 8 > str_xxhash32_compare Elapsed: 1.241737 Iterations: 1000000 DataSize: 8 > str_md5_compare Elapsed: 3.029127 Iterations: 1000000 DataSize: 8 > str_byte_compare Elapsed: 1.236183 Iterations: 1000000 DataSize: 8 > str_byte_compare2 Elapsed: 1.269901 Iterations: 1000000 DataSize: 8 > str_word_compare Elapsed: 1.273266 Iterations: 1000000 DataSize: 8 > str_compare Elapsed: 1.181425 Iterations: 1000000 DataSize: 8 > > str_byte_compare() is the winner for small data. > I'm a little surprised that str_xxhash32_compare() is the second. > str_word_compare() is marginally slower. > > str_siphash_compare Elapsed: 2.341025 Iterations: 1000000 DataSize: 12= 8 > str_xxhash32_compare Elapsed: 1.560131 Iterations: 1000000 DataSize: 12= 8 > str_md5_compare Elapsed: 6.055007 Iterations: 1000000 DataSize: 12= 8 > str_byte_compare Elapsed: 1.799050 Iterations: 1000000 DataSize: 12= 8 > str_byte_compare2 Elapsed: 2.163229 Iterations: 1000000 DataSize: 12= 8 > str_word_compare Elapsed: 1.337508 Iterations: 1000000 DataSize: 12= 8 > str_compare Elapsed: 1.194582 Iterations: 1000000 DataSize: 12= 8 > > str_word_compare() is the winner for relatively large data. > > It seems str_word_compare() is the way to go. > Quite honestly, I don't think that an absolute difference of 0.46 =CE=BCs p= er function call should be used to pick the "winner" from the line-up you're made. That said, the implementation of `str_word_compare` introduces a `MIN()` branch and the second loop should loop until `MAX(b1_len, b2_len)` instead of `b2_len` alone; imo you've just made it harder to prove that it actually prevents timing attacks. From what has been discussed so far, to me, the question that needs an answer is: "Should we have a function that attempts to provide a constant comparison for strings of different lengths?" If not, reduce the function to just a length mismatch branch and a loop; length information is leaked as long as both strings are of difference lengths only. If so, then we need to create and run a simulation program to prove that a particular algorithm is indeed suitable with a certain amount of confidence for the most common scenarios. > > Regards, > > -- > Yasuo Ohgaki > yohgaki@ohgaki.net > --=20 -- Tjerk --20cf307d010416159a04f209faee--