Newsgroups: php.internals
Xref: news.php.net php.internals:25323
Date: Fri, 11 Aug 2006 17:10:06 +0200
From: pierre.php@gmail.com (Pierre)
To: "Matt W"
Cc: internals@lists.php.net
Subject: Re: [PHP-DEV] is_numeric_string causes function inconsistency
In-Reply-To: <015b01c6bd4b$c6c47b00$0201a8c0@pc1>
References: <005101c6b930$83f30b30$0201a8c0@pc1> <00fa01c6bd37$d61d1940$0201a8c0@pc1> <20060811141931.2ef39365@pierre-u64> <015b01c6bd4b$c6c47b00$0201a8c0@pc1>

Hi Matt,

On 8/11/06, Matt W wrote:
> Hello Pierre,
>
> Thanks for your reply. :-)
>
> ----- Original Message -----
> From: "Pierre"
> Sent: Friday, August 11, 2006
>
> > Hello,
> >
> > Note that I also answer your previous mail here :)
> >
> > On Fri, 11 Aug 2006 06:18:13 -0500
> > php_lists@realplain.com ("Matt W") wrote:
> >
> > > Hello again,
> > >
> > > I discovered a couple more things is_numeric... is causing problems
> > > with (leading whitespace). I doubt any of the examples I've given
> > > make sense to regular users who don't know what's happening behind
> > > the scenes. Add these to the "wrong" list:
> > >
> > > is_numeric(' .123') // bool(false)
> >
> > this one should return true.
> >
> > > ' .123' + 0 // int(0)
> >
> > ' 0.123' is cast to 0, 0+0. But if the ' .123' is allowed, it should
> > then result in 0.123+0, which is the correct behavior.
>
> I may be misunderstanding you, but ' 0.123'+0 results in the correct 0.123.
> It's just without the leading 0 that it becomes wrong. :-)

I think we should consider '.123' as 0.123. This expression is then correct.

> > > One more thing I was curious about as far as keeping things
> > > consistent is with is_numeric... (and therefore
> > > convert_scalar_to_number()), hex strings are allowed/work, but not
> > > with convert_to_[long|double]().
> >
> > I did not check convert_* while fixing/enhancing filters, but I think
> > there is a higher risk of breakages if you change these functions. We
> > should first have a clear view of what is used where and how the
> > changes affect end user scripts and extensions. It sounds like an
> > impossible task (except for 6.0).
>
> I was just wondering if convert_to_[long|double] should check for and allow
> hex strings like convert_scalar_to_number does (because it uses
> is_numeric...).
>
> > I suggest you take a look at the ext/filter code and what we accept.
> > I spent a fair amount of time asking and listening to users to see what
> > they expect. I'm quite happy with the current state and, from what I
> > hear, the users are too.
> >
> > You can check the FILTER_VALIDATE_* mode, they do the same operations
> > that we are discussing here. The sanitize mode only checks for
> > unexpected chars.
>
> Have to admit that I'm not really familiar with any of the filter stuff. :-/
> I'll keep that in mind though.

Please take a look, it really solves many of these issues.

> > > So a few PHP functions properly
> > > accept hex strings, but most will convert one to 0. Should anything
> > > be done about this difference? I have an idea about allowing hex
> > > strings in to_[long|double] using the new is_numeric... functions I
> > > will propose.
> >
> > > Few things about the current is_numeric... and hex strings, which I
> > > think I'll change in my proposal unless I hear opinions otherwise:
> > > *) Leading whitespace isn't allowed
> >
> > They should be allowed (leading/ending).
>
> Not sure if you're talking about currently, or agreeing with me. ;-) For
> these 3 points, I was only referring to hex strings. Leading space is
> allowed with non-hex.

I would like to allow them for all types (float, integer or hex); it
is what I did in ext/filter.

> > > *) A sign (±) isn't allowed
> >
> > It is allowed except in the hexadecimal notation (see the manual
> > page of is_numeric), so if you talk only about is_numeric and the hex
> > notation, it is a bug fix.
>
> Again, only referring to hex. Ah yes, I see the manual page for
> is_numeric(). Hmm OK, not sure if you'd want that to be changed then...
> The internal function would be fine ($n = -0xABC works in the parser), I
> guess, but maybe sign & hex returning true with PHP's is_numeric() is
> undesired.

+/- are not allowed for hex.

I think we should distinguish between a string conversion and
the parser.

(a) $a = - 0xFF
(b) $a = " - 0xFF; "

(a) is a perfectly valid expression within a _script_ (just like
$a=-2;), however (b) is a string and will require a _cast_ to INT. The
two cases should not be processed the same way.

(a) can be read as
$a = -1; $a *= 0xFF;
(b) is only a string assignment; it will be cast to INT when
required, and the cast will fail (int(0)).

> > > *) Hex doubles don't work. I think they should (for *whole* numbers
> > > only obviously, no "."). So '0xFFFFFFFFFF' + 0 for example, works on
> > > a 32-bit system.
> >
> > They should not, a hexadecimal notation represents an integer (long),
> > not a double. A double could be the result of a cast when it is out of
> > the integer range.
>
> Well, I think of the hex notation as just a whole number (non-floating) of
> whatever range/size. About the cast, yeah, I see that's what is done now in
> the parser if the hex number is between LONG_MAX and ULONG_MAX -- results in
> a double. hexdec(), etc. will also return a double if needed.
>
> Right now, since hex doubles don't work, you also have (on 32-bit):

I don't understand what you mean by hex double :) Do you mean that we
should convert out of range HEX to double in any case?

> is_numeric('0x7FFFFFFF') // bool(true)
> is_numeric('0x80000000') // bool(false)
>
> > > If that last one can be changed, it also should be in the language
> > > parser of course (you know, for $n = 0xFFFFFFFFFF;).
> >
> > It is the endless problem about 32/64bits issues, also I don't think
> > you are considering to use double in a for loop? :)
>
> In a for loop of a script?
> No :-), but someone may want to specify a number
> in hex larger than ULONG_MAX (and it may work if they're 64-bit, then break
> on 32-bit). I've not had a need for it (in parser), but I would like the
> larger hex strings to work as I have code like ('0x' . $hexstr) + 0 where
> $hexstr comes from a packed/binary number (after bin2hex()) that may be any
> size. I don't want to use hexdec() because it's slower (and this is
> speed-critical, in a loop) and usually the value WILL fit in a long.

Those are two different things, parser and cast operations. But I agree,
it is a bit tricky to keep this difference in mind while coding. But
you should do:

$a = 0+ ('0x'.$hexstr);

Some benchmarks (amd64):

$hexstr="FFFFFF";
$iter = 1000000;
$s1 = microtime(true);
for($i=0;$i<$iter;$i++) $a=hexdec("0x".$hexstr);
$s2 = microtime(true);
echo "hexdec: " . ($s2 - $s1) . "\n";

$s1 = microtime(true);
for($i=0;$i<$iter;$i++) $a=0+("0x".$hexstr);
$s2 = microtime(true);
echo "cast: " . ($s2 - $s1) . "\n";

hexdec: 2.6401779651642
cast: 1.4510979652405

> That reminds me, a "(number)" typecast would be nice to have. :-)

number is a human thing, I'm not sure it fits our needs :)

> > > > You get the idea. That's because is_numeric_string() *ignores* the
> > > > value from zend_strtod() if errno==ERANGE. I don't think that's
> > > > right, and it doesn't happen when convert_to_double() uses
> > > > zend_strtod():
> >
> > I have to check the sources :)
>
> Yes, it ignores the INF/ERANGE, for whatever reason. :-)

In my opinion, these issues can be considered as bugs and should be
fixed (easily done, without rewriting everything).

> I will send a new example function to the list in a few days, after doing
> some tests, etc. Its behavior should be the same, except for these bugs.
> And currently, strto[l|d] is sometimes called unnecessarily -- for example,
> I think '123 foo' would result in zend_strtod() after strtol() -- pretty
> sure that can be avoided.
> You'll see soon...

Again, take a look at ext/filter, I rewrote both float and integer (not
sure if I committed "int" yet, I will check later :), without any
of these functions.

However I consider that we should first determine what to change and
where. Like separating the parser from the cast operations
(string_to_*). Also we are missing way too many tests to validate the
changes. But as I said, it is necessary to bring consistency in this
area :)

> > > > print_r(array_count_values(array(1, ' 1', ' 1 ')))
> > > > Array
> > > > (
> > > > [1] => 2
> > > > [ 1 ] => 1
> > > > )
> >
> > This is typically an example of why we cannot change the behaviors
> > in php5, but I would definitely like to do it for php 6.x.
>
> I e-mailed Andrei about array_count_values() since I think it's incorrect.
> If so, it should be very simple to fix -- just eliminate the use of
> is_numeric_string.

The fix is certainly easy (leading spaces management), but it will
break things out there. That's what we have to avoid, imho.

> P.S. I forgot to add before that I noticed a comment Derick added a few
> days ago for is_numeric_string -- only in 5.2
> (http://cvs.php.net/viewvc.cgi/ZendEngine2/zend_operators.h?r1=1.94.2.4.2.2&r2=1.94.2.4.2.3&view=patch).
> It says it returns IS_DOUBLE if the number
> didn't fit in the integer range, but that's wrong if it's INF. :-)

INF is per definition not out of range, it is out of everything (-INF too) ;-)

Cheers,
--Pierre
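
The rules discussed in this thread can be sketched in C roughly as follows: whitespace allowed on both sides for every type, '.123' read as 0.123, no sign in front of hex, an out-of-range integer becoming a double, and the INF from an ERANGE overflow kept rather than discarded. This is only a minimal illustration built on the standard strtol() and strtod(); it is not the Zend is_numeric_string() code nor what ext/filter does, and classify_numeric(), only_space_left() and the NUM_* constants are hypothetical names introduced for the example.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical result codes, loosely modelled on IS_LONG / IS_DOUBLE. */
enum num_type { NUM_NONE = 0, NUM_LONG, NUM_DOUBLE };

/* True when only whitespace remains: "123 " passes, "123 foo" fails. */
static int only_space_left(const char *p)
{
    while (isspace((unsigned char)*p)) p++;
    return *p == '\0';
}

/* Classify s as long, double or not numeric; the value goes to *lval or *dval. */
static enum num_type classify_numeric(const char *s, long *lval, double *dval)
{
    char *end;
    const char *p;

    while (isspace((unsigned char)*s)) s++;   /* leading whitespace allowed */

    /* Hex must start with "0x" right away, so "-0xFF" is not accepted here. */
    if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) {
        long l;
        if (!isxdigit((unsigned char)s[2])) return NUM_NONE;
        errno = 0;
        l = strtol(s, &end, 16);
        if (!only_space_left(end)) return NUM_NONE;
        if (errno != ERANGE) { *lval = l; return NUM_LONG; }
        /* Too big for a long: rebuild the whole number as a double. */
        *dval = 0.0;
        for (p = s + 2; p < end; p++) {
            int c = tolower((unsigned char)*p);
            *dval = *dval * 16.0 + (isdigit(c) ? c - '0' : c - 'a' + 10);
        }
        return NUM_DOUBLE;
    }

    /* Decimal: optional sign, then a digit or ".digit". This check keeps
       strtod() from accepting "inf", "nan" or a signed hex string. */
    p = s;
    if (*p == '+' || *p == '-') p++;
    if (p[0] == '0' && (p[1] == 'x' || p[1] == 'X')) return NUM_NONE;
    if (!isdigit((unsigned char)p[0]) &&
        !(p[0] == '.' && isdigit((unsigned char)p[1]))) return NUM_NONE;

    /* Try a plain integer first. */
    errno = 0;
    {
        long l = strtol(s, &end, 10);
        if (end != s && errno != ERANGE && only_space_left(end)) {
            *lval = l;
            return NUM_LONG;
        }
    }

    /* Otherwise a double. On overflow strtod() returns +/-HUGE_VAL and sets
       errno to ERANGE; the value is kept here instead of being thrown away. */
    {
        double d = strtod(s, &end);
        if (!only_space_left(end)) return NUM_NONE;
        *dval = d;
        return NUM_DOUBLE;
    }
}
```

With this sketch, " .123" is classified as a double with value 0.123, " 0xFF " as the long 255, and "-0xFF" and "123 foo" as not numeric. Note that "123 foo" still goes through strtol() and then strtod() before being rejected, which is the redundant second call mentioned above; a hand-written scanner would avoid it.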