Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:43393 Return-Path: Mailing-List: contact internals-help@lists.php.net; run by ezmlm Delivered-To: mailing list internals@lists.php.net Received: (qmail 94465 invoked from network); 18 Mar 2009 17:52:46 -0000 Received: from unknown (HELO lists.php.net) (127.0.0.1) by localhost with SMTP; 18 Mar 2009 17:52:46 -0000 Authentication-Results: pb1.pair.com smtp.mail=dmitry@zend.com; spf=pass; sender-id=pass Authentication-Results: pb1.pair.com header.from=dmitry@zend.com; sender-id=pass Received-SPF: pass (pb1.pair.com: domain zend.com designates 212.25.124.163 as permitted sender) X-PHP-List-Original-Sender: dmitry@zend.com X-Host-Fingerprint: 212.25.124.163 il-gw1.zend.com Windows 2000 SP4, XP SP1 Received: from [212.25.124.163] ([212.25.124.163:43448] helo=il-gw1.zend.com) by pb1.pair.com (ecelerity 2.1.1.9-wez r(12769M)) with ESMTP id BA/C3-26222-AE431C94 for ; Wed, 18 Mar 2009 12:52:44 -0500 Received: from ws.home ([10.1.10.6]) by il-gw1.zend.com with Microsoft SMTPSVC(6.0.3790.3959); Wed, 18 Mar 2009 19:54:11 +0200 Message-ID: <49C134E5.7020706@zend.com> Date: Wed, 18 Mar 2009 20:52:37 +0300 User-Agent: Thunderbird 2.0.0.19 (X11/20090105) MIME-Version: 1.0 To: Matt Wilmas CC: internals@lists.php.net, Lukas Kahwe Smith , =?ISO-8859-1?Q?Johannes_Schl=FCter?= References: <1113CE12226949C2939A31971420991F@pc1> <49C0A7C7.8000804@zend.com> <12E613FAA1C9422B948F00B61AD32366@pc1> In-Reply-To: <12E613FAA1C9422B948F00B61AD32366@pc1> Content-Type: multipart/mixed; boundary="------------000309000301070204090301" X-OriginalArrivalTime: 18 Mar 2009 17:54:11.0217 (UTC) FILETIME=[8AF78010:01C9A7F2] Subject: Re: [PATCH] Bug #45877: LONG_MAX/MIN array key as string/int From: dmitry@zend.com (Dmitry Stogov) --------------000309000301070204090301 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Hi Matt, I suppose I fixed the patch. Could you please check which patch is better yours or the attached one? According to attached benchmark my one is faster for most usual cases, but may be I forget something again. $a["abcdefghij"] 0.130 0.130 $a["1234567890"] 0.187 0.104 $a["2147483648"] 0.168 0.142 $a["0"] 0.116 0.082 $a["214748364x"] 0.136 0.141 $a[0] 0.080 0.080 Also, do you have other patches which I could look into before RC? Thanks. Dmitry. Matt Wilmas wrote: > Hi Dmitry, > > ----- Original Message ----- > From: "Dmitry Stogov" > Sent: Wednesday, March 18, 2009 > >> BTW may be we should eliminate strtol() at all. >> There's no need to scan the string twice. > > Your change to remove strtol() [1] is not checking for overflow > correctly (for example, zend_u_strtol()'s checks are more complicated). > This breaks handling of keys above ULONG_MAX (it's correct again after > ULONG_MAX+LONG_MAX, until ULONG_MAX * 2, etc.). See: > > var_dump(array('5000000000' => 1)); > > array(1) { > [705032704]=> > int(1) > } > > And of course the new code is a bit slower for keys that aren't fully > numeric, e.g. "123a" > > [1] http://news.php.net/php.zend-engine.cvs/7465 > >> Dmitry. > > - Matt > >> Matt Wilmas wrote: >>> Hi all, >>> >>> Assuming there are no objections, I'll commit this fix in a few hours... >>> >>> Besides the bug report(s), I had also found awhile ago that currently >>> an array key can be LONG_MAX or LONG_MIN as a string and/or integer >>> because of a check in ZEND_HANDLE_NUMERIC() (I assume to avoid a slow >>> errno check for ERANGE originally). I changed it to use the *same >>> method* that's used in the scanner, is_numeric_string(), etc., and >>> those 2 values are now treated as an integer. >>> >>> It's just a few lines changed in zend_hash.h, although I had to move >>> some of the definitions from zend_operators.h to zend.h (is that OK?). >>> >>> Patches (didn't check HEAD's yet): >>> http://realplain.com/php/array_key_limit.diff >>> http://realplain.com/php/array_key_limit_5_3.diff >>> >>> >>> - Matt > --------------000309000301070204090301 Content-Type: text/plain; name="zend_hash.h.diff.txt" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="zend_hash.h.diff.txt" Index: Zend/zend_hash.h =================================================================== RCS file: /repository/ZendEngine2/zend_hash.h,v retrieving revision 1.78.2.2.2.2.2.10 diff -u -p -d -r1.78.2.2.2.2.2.10 zend_hash.h --- Zend/zend_hash.h 18 Mar 2009 09:48:55 -0000 1.78.2.2.2.2.2.10 +++ Zend/zend_hash.h 18 Mar 2009 17:05:53 -0000 @@ -306,18 +306,42 @@ END_EXTERN_C() #define ZEND_HANDLE_NUMERIC(key, length, func) do { \ register const char *tmp = key; \ - const char *end = key + length - 1; \ - long idx; \ \ if (*tmp == '-') { \ tmp++; \ } \ - if ((*tmp >= '1' && *tmp <= '9' && (end - tmp) < MAX_LENGTH_OF_LONG) || \ - (*tmp == '0' && (end - tmp) == 1)) { \ - /* possibly a numeric index without leading zeroes */ \ - idx = (*tmp++ - '0'); \ - while (1) { \ - if (tmp == end && *tmp == '\0') { /* a numeric index */ \ + if (*tmp >= '0' && *tmp <= '9') { /* possibly a numeric index */ \ + const char *end = key + length - 1; \ + long idx; \ + \ + if (*end != '\0') { /* not a null terminated string */ \ + break; \ + } else if (*tmp == '0') { \ + if (end - tmp != 1) { \ + /* don't accept numbers with leading zeros */ \ + break; \ + } \ + idx = 0; \ + } else if (end - tmp > MAX_LENGTH_OF_LONG - 1) { \ + /* don't accept too long strings */ \ + break; \ + } else { \ + if (end - tmp == MAX_LENGTH_OF_LONG - 1) { \ + end--; /* check overflow in last digit later */ \ + } \ + idx = (*tmp++ - '0'); \ + while (tmp != end && *tmp >= '0' && *tmp <= '9') { \ + idx = (idx * 10) + (*tmp++ - '0'); \ + } \ + if (tmp != end) { \ + break; \ + } \ + if (end != key + length - 1) { \ + /* last digit can cause overflow */ \ + if (*tmp < '0' || *tmp > '9' || idx > LONG_MAX / 10) { \ + break; \ + } \ + idx = (idx * 10) + (*tmp - '0'); \ if (*key == '-') { \ idx = -idx; \ if (idx > 0) { /* overflow */ \ @@ -326,13 +350,11 @@ END_EXTERN_C() } else if (idx < 0) { /* overflow */ \ break; \ } \ - return func; \ - } else if (*tmp >= '0' && *tmp <= '9') { \ - idx = (idx * 10) + (*tmp++ - '0'); \ - } else { \ - break; \ + } else if (*key == '-') { \ + idx = -idx; \ } \ } \ + return func; \ } \ } while (0) --------------000309000301070204090301--