Newsgroups: php.internals
Date: Thu, 19 Nov 2009 20:56:21 -0600
From: matt@bitwarehouse.com (Matt Wirges)
To: jvlad
Cc: internals@lists.php.net
Subject: Re: [PHP-DEV] Re: clarification on maximum string sizes in PHP on 64 bit linux

On Thu, Nov 19, 2009 at 4:52 PM, jvlad wrote:
>> Code:
>> <?php
>> $s = str_repeat('A', pow(2,30));
>> $t = $s.str_repeat('B', pow(2,30));; // fails with segfault
>> printf("strlen: %u last-char: %s", strlen($s), substr($s, pow(2,30)-1));
>> ?>
>> ---
>> Result:
>> ./sapi/cli/php -d memory_limit=-1 a2.php
>>
>> Fatal error: Out of memory (allocated 2148270080) (tried to allocate
>> 18446744071562067969 bytes) in /home/matt/tmp/php-src-5.2/a2.php on
>> line 3
>> ----
>
> hmmm, 18446744071562067969 is 0xFFFFFFFF80000001
> it seems a 32bit variable was used somewhere in the calculations and was
> assigned to a 64bit signed int.
>
> what particular version of php did you use?
> Did you try 5.3.1RC4? 5.2.12RC1?
>
> I'd try myself if I had 4GB of RAM.
>
> -jv

I've tried PHP 5.2.11, 5.3.0, and the PHP 5.2 svn branch as of this
morning (when verifying the bug fix).

Perhaps I'm looking at this naively, but from what I can tell in the
source, the length of a string is stored as a signed int in the
zvalue_value union. It seems that the string operations within PHP
expect pointers and size_t to be 32 bits wide (and of course unsigned).
However, on a 64-bit system they are 64 bits wide (and unsigned).

Focusing for the moment again on concat_function in
Zend/zend_operators.c:

1203    if (result==op1) {  /* special case, perform operations on result */
1204        uint res_len = op1->value.str.len + op2->value.str.len;
1205
1206        if (Z_STRLEN_P(result) < 0) {
1207            efree(Z_STRVAL_P(result));
1208            ZVAL_EMPTY_STRING(result);
1209            zend_error(E_ERROR, "String size overflow");
1210        }
1211
1212        result->value.str.val = erealloc(result->value.str.val, res_len+1);
1213
1214        memcpy(result->value.str.val+result->value.str.len, op2->value.str.val, op2->value.str.len);
1215        result->value.str.val[res_len]=0;
1216        result->value.str.len = res_len;
1217    } else {

The segfault in memcpy from bug 50207 arose because the pointer
result->value.str.val is 64 bits wide, while result->value.str.len is a
signed 32-bit integer. The value of result->value.str.len is then
implicitly converted to an unsigned 64-bit integer, so we end up trying
to add a multi-exabyte offset to the original string pointer and
segfault on access.
Of course the bug fix (lines 1206-1210) prevents this now, but doesn't
allow us to in-place concatenate two strings whose initial length is
2^31 or greater.

If you look at the other half of the concat operation:

1217    } else {
1218        result->value.str.len = op1->value.str.len + op2->value.str.len;
1219        result->value.str.val = (char *) emalloc(result->value.str.len + 1);
1220        memcpy(result->value.str.val, op1->value.str.val, op1->value.str.len);
1221        memcpy(result->value.str.val+op1->value.str.len, op2->value.str.val, op2->value.str.len);
1222        result->value.str.val[result->value.str.len] = 0;
1223        result->type = IS_STRING;
1224    }

On line 1219 we pass result->value.str.len + 1, which again is a signed
32-bit integer, to emalloc, which expects a size_t. It is implicitly
converted to an unsigned 64-bit integer. In the example in my previous
email, when the length of the new string overflows the signed 32-bit
int, we get huge values for the amount to attempt to allocate for the
new string.

-m