is_numeric_string causes function inconsistency

19 years ago by Matt W

Hi all,

Since I've been looking at is_numeric_[string|unicode], I found a weird
thing it causes; probably doesn't make sense to users; bug? Look:

abs(-1e500) // float(INF)
abs('-1e500') // int(1) WRONG
abs('-1e100') // float(1.0E+100)
is_finite(1e500) // bool(false)
is_finite('1e500') // bool(true) WRONG
is_finite('1e100') // bool(true)
is_numeric(1e500) // bool(true)
is_numeric('1e500') // bool(false) WRONG
is_numeric('1e100') // bool(true)
1e500 + 123 // float(INF)
'1e500' + 123 // int(124) WRONG

You get the idea. That's because is_numeric_string() ignores the value
from zend_strtod() if errno==ERANGE. I don't think that's right, and it
doesn't happen when convert_to_double() uses zend_strtod():

number_format(1e500) // string(3) "inf"
number_format('1e500') // string(3) "inf" RIGHT

Just wondering if others think is_numeric_string() should be changed in that
respect? I was going to rewrite the function to improve/optimize it (and
submit it of course), so I can easily change its behavior while I'm at it...
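
A minimal userland sketch of the behavior argued for above, assuming a current PHP release: keep whatever the float conversion produces on overflow instead of treating the string as non-numeric. The helper name numeric_string_to_float() is hypothetical; it is not the engine's is_numeric_string() and only mirrors the idea.

```php
<?php
// Hypothetical helper, not part of PHP: convert a decimal or exponent-notation
// string to float, keeping an overflowed result as INF instead of rejecting it.
function numeric_string_to_float(string $s): ?float
{
    // Accept plain decimal and exponent notation only (no whitespace, no hex).
    if (preg_match('/\A[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?\z/', $s) !== 1) {
        return null;
    }
    // The (float) cast keeps an out-of-range value as INF or -INF.
    return (float) $s;
}

var_dump(abs(numeric_string_to_float('-1e500')));      // float(INF)
var_dump(is_finite(numeric_string_to_float('1e500'))); // bool(false)
```

With a conversion like this, the string and float forms of 1e500 give the same answers from abs() and is_finite().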

Also, is this the desired behavior of array_count_values() (manual doesn't
say; it also uses is_numeric...)?

print_r(array_count_values(array(1, ' 1', ' 1 ')))
Array
(
[1] => 2
[ 1 ] => 1
)

Matt

19 years ago by Matt W

Hi Andrei,

You seem to be the array-function-person ;-) so I'll ask you if the
array_count_values() result in my previous message is what's intended?
Seems to me leading whitespace should not be ignored. I didn't try it
yet, but it seems zend_[u_]symtable_[find|update] should simply be used,
instead of is_numeric..., and the HANDLE_NUMERIC() macro will take care of
things just as with array keys in $a[...]. That's how the function worked
in 4.x it appears, where zend_hash_[find|update] had HANDLE_NUMERIC()
(before symtable functions).

Am I correct? Should I make a patch? Oh, and something I just thought of
looking at the code -- you think the function could/should also count
IS_DOUBLE values? After converting to string, of course. :-)
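
The key handling proposed above can be sketched in userland, assuming only PHP's normal array-key rules (a string key becomes an integer key only when it is a canonical decimal integer). The function name count_values_by_key() is hypothetical and stands in for array_count_values(); it is not the C implementation.

```php
<?php
// Hypothetical stand-in for array_count_values() that relies only on PHP's
// array-key normalization, so '1' merges with 1 but ' 1' stays a string key.
function count_values_by_key(array $values): array
{
    $counts = array();
    foreach ($values as $v) {
        if (!is_int($v) && !is_string($v)) {
            continue; // this sketch counts only integers and strings
        }
        if (isset($counts[$v])) {
            $counts[$v]++;
        } else {
            $counts[$v] = 1;
        }
    }
    return $counts;
}
```

Calling count_values_by_key(array(1, '1', ' 1', ' 1 ')) yields the key 1 with a count of 2 and separate string keys for ' 1' and ' 1 ', because leading whitespace prevents the string from being treated as an integer key.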

Thanks,
Matt

----- Original Message -----
From: "Matt W"
Sent: Sunday, August 06, 2006

...

Also, is this the desired behavior of array_count_values() (manual doesn't
say; it also uses is_numeric...)?

print_r(array_count_values(array(1, ' 1', ' 1 ')))
Array
(
[1] => 2
[ 1 ] => 1
)

19 years ago by Pierre

Hello,

Note that I also answer your previous mail here :)

On Fri, 11 Aug 2006 06:18:13 -0500
php_lists@realplain.com ("Matt W") wrote:

Hello again,

I discovered a couple more things is_numeric... is causing problems
with (leading whitespace). I doubt any of the examples I've given
make sense to regular users who don't know what's happening behind
the scenes. Add these to the "wrong" list:

is_numeric(' .123') // bool(false)

this one should return true.

' .123' + 0 // int(0)

' 0.123' is cast to 0, giving 0+0. But if ' .123' is allowed, it should
then result in 0.123+0, which is the correct behavior.

One more thing I was curious about as far as keeping things
consistent is with is_numeric... (and therefore
convert_scalar_to_number()), hex strings are allowed/work, but not
with convert_to_long|double.

I did not check convert_* while fixing/enhancing filters, but I think
there is a higher risk of breakages if you change these functions. We
should first have a clear view of what is used where and how the
changes affect end user scripts and extensions. It sounds like an
impossible task (except for 6.0).

I suggest you take a look at the ext/filter code and what we accept.
I spent a fair amount of time asking and listening to users to see what
they expect. I'm quite happy with the current state and, from what I
hear, so are the users.

You can check the FILTER_VALIDATE_* modes; they perform the same operations
that we are discussing here. The sanitize mode only checks for
unexpected chars.

So a few PHP functions properly
accept hex strings, but most will convert one to 0. Should anything
be done about this difference? I have an idea about allowing hex
strings in to_[long|double] using the new is_numeric... functions I
will propose.

Few things about the current is_numeric... and hex strings, which I
think I'll change in my proposal unless I hear opinions otherwise:
*) Leading whitespace isn't allowed

They should be allowed (leading/ending).

*) A sign (±) isn't allowed

It is allowed except in hexadecimal notation (see the manual
page of is_numeric), so if you are talking only about is_numeric and the hex
notation, it is a bug fix.

*) Hex doubles don't work. I think they should (for whole numbers
only obviously, no "."). So '0xFFFFFFFFFF' + 0 for example, works on
a 32-bit system.

They should not: hexadecimal notation represents an integer (long),
not a double. A double could be the result of a cast when the value is out of
the integer range.

If that last one can be changed, it also should be in the language
parser of course (you know, for $n = 0xFFFFFFFFFF;).

It is the endless problem of 32/64-bit issues. Also, I don't think
you are considering using a double in a for loop? :)

Since I've been looking at is_numeric_[string|unicode], I found a
weird thing it causes; probably doesn't make sense to users; bug?
Look:

abs(-1e500) // float(INF)
abs('-1e500') // int(1) WRONG

Agreed, it should be float(INF)

abs('-1e100') // float(1.0E+100)
is_finite(1e500) // bool(false)

is_finite('1e500') // bool(true) WRONG

Agreed, it should be handled as float(INF)

is_finite('1e100') // bool(true)
is_numeric(1e500) // bool(true)
is_numeric('1e500') // bool(false) WRONG

Agreed, float(INF) as before

is_numeric('1e100') // bool(true)
1e500 + 123 // float(INF)
'1e500' + 123 // int(124) WRONG

I get the feeling that the E notation has one bug; solving it should fix
most of these issues. ext/filter passes all these tests successfully, but
it had the same problems in its early days.

You get the idea. That's because is_numeric_string() ignores the
value from zend_strtod() if errno==ERANGE. I don't think that's
right, and it doesn't happen when convert_to_double() uses
zend_strtod():

I have to check the sources :)

Just wondering if others think is_numeric_string() should be
changed in that respect? I was going to rewrite the function to
improve/optimize it (and submit it of course), so I can easily
change its behavior while I'm at it...

It would be nice to bring consistency between functions. But changing
the behavior at this level can have a very large impact. It has to be
done really carefully and with many tests. I can help if you like,
both with the tests and the implementation.

Also, is this the desired behavior of array_count_values() (manual
doesn't say; it also uses is_numeric...)?

print_r(array_count_values(array(1, ' 1', ' 1 ')))
Array
(
[1] => 2
[ 1 ] => 1
)

This is a typical example of why we cannot change the behavior
in PHP 5, but I would definitely like to do it for PHP 6.x.

Cheers,
-- Pierre

19 years ago by Jochem Maas

Pierre wrote:

Hello,

Note that I also answer your previous mail here :)

On Fri, 11 Aug 2006 06:18:13 -0500
php_lists@realplain.com ("Matt W") wrote:

Hello again,

I discovered a couple more things is_numeric... is causing problems
with (leading whitespace). I doubt any of the examples I've given
make sense to regular users who don't know what's happening behind
the scenes. Add these to the "wrong" list:

is_numeric(' .123') // bool(false)

this one should return true.

As Pierre mentions further down, changing this behaviour could cause lots of
problems for existing code. From an end-user POV, having is_numeric(' .123')
return true (and therefore having maths operations using ' .123' use that string
as if it was '.123') would be tantamount to doing:

$v = " .123";
$v = trim($v);
var_dump(is_numeric($v), ($v + 0));

In order to be able to move forward and allow for leading spaces in numeric strings
maybe an ini setting could be used, one that defaults to false:

trim_numeric_strings_before_usage = 0;

Such a setting, if true, would essentially trim whitespace (like the trim() function does)
from strings before they are checked (by is_numeric) and before they are used in
calculations.

In PHP 6 the default of this ini setting could be changed to true (which would
offer quite some time to check for possible unforeseen problems), and eventually
in PHP 7, 8, 9 the setting could disappear entirely once the community is satisfied
that any/all problems have been dissipated.

I have no idea if this is feasible or desirable (I'm aware of the animosity towards
new ini settings!) but it might offer a potential resolution between moving forward
and protecting muppets like myself from 'strange behaviour' related to autocasting of
numeric strings.
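
A rough userland sketch of what such a switch would do, with the setting modeled as a plain boolean argument. The function name is_numeric_maybe_trimmed() and its parameter are hypothetical; no such ini setting or function exists in PHP.

```php
<?php
// Hypothetical illustration of the proposed setting: when $trimFirst is true,
// surrounding whitespace is removed (as trim() does) before the numeric check.
function is_numeric_maybe_trimmed(string $s, bool $trimFirst): bool
{
    if ($trimFirst) {
        $s = trim($s);
    }
    return is_numeric($s);
}

var_dump(is_numeric_maybe_trimmed(' .123 ', true)); // bool(true)
```

With the flag off, the result is whatever the running PHP version's is_numeric() reports for the untrimmed string, which is exactly the behavior under discussion in this thread.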

rgds,
Joche,

19 years ago by Matt W

Hi Jochem,

Leading whitespace is already allowed with PHP's is_numeric() function (and
corresponding internal one), math operations, etc. Only when it precedes
.123 or -.123 does the behavior change. :-)

Matt

----- Original Message -----
From: "Jochem Maas"
Sent: Friday, August 11, 2006

Pierre wrote:

is_numeric(' .123') // bool(false)

this one should return true.

As Pierre mentions further down, changing this behaviour could cause lots of
problems for existing code. From an end-user POV, having is_numeric(' .123')
return true (and therefore having maths operations using ' .123' use that string
as if it was '.123') would be tantamount to doing:

$v = " .123";
$v = trim($v);
var_dump(is_numeric($v), ($v + 0));

In order to be able to move forward and allow for leading spaces in numeric strings
maybe an ini setting could be used, one that defaults to false:

trim_numeric_strings_before_usage = 0;

Such a setting, if true, would essentially trim whitespace (like the trim() function does)
from strings before they are checked (by is_numeric) and before they are used in
calculations.

In PHP 6 the default of this ini setting could be changed to true (which would
offer quite some time to check for possible unforeseen problems), and eventually
in PHP 7, 8, 9 the setting could disappear entirely once the community is satisfied
that any/all problems have been dissipated.

I have no idea if this is feasible or desirable (I'm aware of the animosity towards
new ini settings!) but it might offer a potential resolution between moving forward
and protecting muppets like myself from 'strange behaviour' related to autocasting of
numeric strings.

rgds,
Joche,

19 years ago by Pierre

Hi Matt,

Hello Pierre,

Thanks for your reply. :-)

----- Original Message -----
From: "Pierre"
Sent: Friday, August 11, 2006

Hello,

Note that I also answer your previous mail here :)

On Fri, 11 Aug 2006 06:18:13 -0500
php_lists@realplain.com ("Matt W") wrote:

Hello again,

I discovered a couple more things is_numeric... is causing problems
with (leading whitespace). I doubt any of the examples I've given
make sense to regular users who don't know what's happening behind
the scenes. Add these to the "wrong" list:

is_numeric(' .123') // bool(false)

this one should return true.

' .123' + 0 // int(0)

' 0.123' is cast to 0, giving 0+0. But if ' .123' is allowed, it should
then result in 0.123+0, which is the correct behavior.

I may be misunderstanding you, but ' 0.123'+0 results in the correct 0.123.
It is only without the leading 0 that it becomes wrong. :-)

I think we should consider '.123' as 0.123. This expression is then correct.

One more thing I was curious about as far as keeping things
consistent is with is_numeric... (and therefore
convert_scalar_to_number()), hex strings are allowed/work, but not
with convert_to_long|double.

I did not check convert_* while fixing/enhancing filters, but I think
there is a higher risk of breakages if you change these functions. We
should first have a clear view of what is used where and how the
changes affect end user scripts and extensions. It sounds like an
impossible task (except for 6.0).

I was just wondering if convert_to_[long|double] should check for and allow
hex strings like convert_scalar_to_number does (because it uses
is_numeric...).

I suggest you take a look at the ext/filter code and what we accept.
I spent a fair amount of time asking and listening to users to see what
they expect. I'm quite happy with the current state and, from what I
hear, so are the users.

You can check the FILTER_VALIDATE_* modes; they perform the same operations
that we are discussing here. The sanitize mode only checks for
unexpected chars.

Have to admit that I'm not really familiar with any of the filter stuff. :-/
I'll keep that in mind though.

Please take a look, it really solves many of these issues.

So a few PHP functions properly
accept hex strings, but most will convert one to 0. Should anything
be done about this difference? I have an idea about allowing hex
strings in to_[long|double] using the new is_numeric... functions I
will propose.

Few things about the current is_numeric... and hex strings, which I
think I'll change in my proposal unless I hear opinions otherwise:
*) Leading whitespace isn't allowed

They should be allowed (leading/ending).

Not sure if you're talking about currently, or agreeing with me. ;-) For
these 3 points, I was only referring to hex strings. Leading space is
allowed with non-hex.

I would like to allow them for all types (float, integer or hex), it
is what I did in ext/filter.

*) A sign (±) isn't allowed

It is allowed except in hexadecimal notation (see the manual
page of is_numeric), so if you are talking only about is_numeric and the hex
notation, it is a bug fix.

Again, only referring to hex. Ah yes, I see the manual page for
is_numeric(). Hmm OK, not sure if you'd want that to be changed then...
The internal function would be fine ($n = -0xABC works in the parser), I
guess, but maybe sign & hex returning true with PHP's is_numeric() is
undesired.

+/- are not allowed for hex. I think we should distinguish
between a string conversion and the parser.

(a) $a = - 0xFF
(b) $a = " - 0xFF; "

(a) is a perfectly valid expression within a script (just like
$a=-2;), however (b) is a string and will require a cast to INT. The
two cases should not be processed the same way.

(a) can be read as $a = -1; $a *= 0xFF;
(b) is only a string assignment; it will be cast to INT when
required, and the cast will fail (int(0)).

*) Hex doubles don't work. I think they should (for whole numbers
only obviously, no "."). So '0xFFFFFFFFFF' + 0 for example, works on
a 32-bit system.

They should not: hexadecimal notation represents an integer (long),
not a double. A double could be the result of a cast when the value is out of
the integer range.

Well, I think of the hex notation as just a whole number (non-floating) of
whatever range/size. About the cast, yeah, I see that's what is done now in
the parser if the hex number is between LONG_MAX and ULONG_MAX -- results in
a double. hexdec(), etc. will also return a double if needed.

Right now, since hex doubles don't work, you also have (on 32-bit):

I don't understand what you mean by hex double :) Do you mean that we
should convert out of range HEX to double in any case?

is_numeric('0x7FFFFFFF') // bool(true)
is_numeric('0x80000000') // bool(false)

If that last one can be changed, it also should be in the language
parser of course (you know, for $n = 0xFFFFFFFFFF;).

It is the endless problem of 32/64-bit issues. Also, I don't think
you are considering using a double in a for loop? :)

In a for loop of a script? No :-), but someone may want to specify a number
in hex larger than ULONG_MAX (and it may work if they're 64-bit, then break
on 32-bit). I've not had a need for it (in parser), but I would like the
larger hex strings to work as I have code like ('0x' . $hexstr) + 0 where
$hexstr comes from a packed/binary number (after bin2hex()) that may be any
size. I don't want to use hexdec() because it's slower (and this is
speed-critical, in a loop) and usually the value WILL fit in a long.

That's two different things, parser and cast operations. But I agree,
it is a bit tricky to keep this difference in mind while coding. But
you should do:

$a = 0+ ('0x'.$hexstr);

Some benchmarks (amd64):

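// Compare hexdec() with the implicit string-to-number conversion.
$hexstr = "FFFFFF";
$iter = 1000000;

// Time hexdec() on the prefixed string.
$s1 = microtime(true);
for ($i = 0; $i < $iter; $i++) $a = hexdec("0x" . $hexstr);
$s2 = microtime(true);
echo "hexdec: " . ($s2 - $s1) . "\n";

// Time the arithmetic conversion of the same string.
$s1 = microtime(true);
for ($i = 0; $i < $iter; $i++) $a = 0 + ("0x" . $hexstr);
$s2 = microtime(true);
echo "cast: " . ($s2 - $s1) . "\n";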

hexdec: 2.6401779651642
cast: 1.4510979652405

That reminds me, a "(number)" typecast would be nice to have. :-)

number is a human thing, I'm not sure it fits our needs :)

You get the idea. That's because is_numeric_string() ignores the
value from zend_strtod() if errno==ERANGE. I don't think that's
right, and it doesn't happen when convert_to_double() uses
zend_strtod():

I have to check the sources :)

Yes, it ignores the INF/ERANGE, for whatever reason. :-)

In my opinion, these issues can be considered bugs and should be
fixed (easily done without rewriting everything).

I will send a new example function to the list in a few days, after doing
some tests, etc. Its behavior should be the same, except for these bugs.
And currently, strto[l|d] is sometimes called unnecessarily -- for example,
I think '123 foo' would result in zend_strtod() after strtol() -- pretty
sure that can be avoided. You'll see soon...

Again, take a look at ext/filter: I rewrote both float and integer
(not sure if I committed "int" yet, I will check later :), without
using any of these functions. However, I think that we should first
determine what to change and where, like separating the parser from the
cast operations (string_to_*). Also, we are missing way too many tests
to validate the changes. But as I said, it is necessary to bring
consistency to this area :)

print_r(array_count_values(array(1, ' 1', ' 1 ')))
Array
(
[1] => 2
[ 1 ] => 1
)

This is a typical example of why we cannot change the behavior
in PHP 5, but I would definitely like to do it for PHP 6.x.

I e-mailed Andrei about array_count_values() since I think it's incorrect.
If so, it should be very simple to fix -- just eliminate the use of
is_numeric_string.

The fix is certainly easy (leading spaces management), but it will
break things out there. That's what we have to avoid, imho.

P.S. I forgot to add before that I noticed a comment Derick added a few
days ago for is_numeric_string -- only in 5.2
(http://cvs.php.net/viewvc.cgi/ZendEngine2/zend_operators.h?r1=1.94.2.4.2.2&r2=1.94.2.4.2.3&view=patch).
It says it returns IS_DOUBLE if the number didn't fit in the integer range,
but that's wrong if it's INF. :-)

INF is by definition not out of range, it is out of everything (-INF too) ;-)

Cheers,
--Pierre

19 years ago by Matt W

Hi Pierre,

I will reply to the rest of your message later. Just wanted to quickly
point out another thing I found with is_numeric_string() when the
allow_errors param==0 (which is_numeric() PHP function uses) and there is
trailing whitespace:

is_numeric('1 ') // bool(false)

Again, easy to fix, but I can't believe I didn't notice this simple one
sooner. :-)

Matt

----- Original Message -----
From: "Pierre"
Sent: Friday, August 11, 2006

Hi Matt,

...

Cheers,
--Pierre

19 years ago by Pierre

Hello,

Hi Pierre,

I will reply to the rest of your message later. Just wanted to quickly
point out another thing I found with is_numeric_string() when the
allow_errors param==0 (which is_numeric() PHP function uses) and there is
trailing whitespace:

is_numeric('1 ') // bool(false)

Yes, it is still about allowing leading/trailing whitespace :-)

--Pierre

19 years ago by Matt W

Hi Pierre,

After checking places where is_numeric... functions are used
(http://lxr.php.net/ident?i=is_numeric_string), it looks like changing to
allow trailing spaces would have an impact in zend_operators.c for at
least compare_function() (and I guess increment_function() too). e.g. ('1'
== '1 ') is now FALSE. I don't have a preference for trailing whitespace
(and that behavior is consistent now), though it does seem logical to
allow in PHP's is_numeric() function. Maybe you would have a "flags"
parameter for internal is_numeric... (instead of only "allow_errors") that
says what things to allow...
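
One way to picture such a "flags" parameter is a bitmask, sketched here in userland PHP. The constant names and the function is_numeric_with_flags() are hypothetical; they only illustrate the idea and do not correspond to any existing engine API.

```php
<?php
// Hypothetical flag bits saying which extras the check should tolerate.
const NUMERIC_ALLOW_LEADING_WS  = 1;
const NUMERIC_ALLOW_TRAILING_WS = 2;

// Hypothetical check: decimal or exponent notation, with optional whitespace
// accepted only when the matching flag bit is set.
function is_numeric_with_flags(string $s, int $flags = 0): bool
{
    $number = '[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?';
    $lead   = ($flags & NUMERIC_ALLOW_LEADING_WS) ? '\s*' : '';
    $trail  = ($flags & NUMERIC_ALLOW_TRAILING_WS) ? '\s*' : '';
    return preg_match('/\A' . $lead . $number . $trail . '\z/', $s) === 1;
}

var_dump(is_numeric_with_flags('1 '));                            // bool(false)
var_dump(is_numeric_with_flags('1 ', NUMERIC_ALLOW_TRAILING_WS)); // bool(true)
```

A caller such as a comparison routine could pass no flags and keep its current strictness, while a user-facing check could opt in to whitespace.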

More below.

----- Original Message -----
From: "Pierre"
Sent: Friday, August 11, 2006

Hi Matt,
...
+/- are not allowed for hex. I think we should distinguish
between a string conversion and the parser.

(a) $a = - 0xFF
(b) $a = " - 0xFF; "

(a) is a perfectly valid expression within a script (just like
$a=-2;), however (b) is a string and will require a cast to INT. The
two cases should not be processed the same way.

(a) can be read as $a = -1; $a *= 0xFF;
(b) is only a string assignment; it will be cast to INT when
required, and the cast will fail (int(0)).

Yes, I noticed that in the parser a negative number is 2 operations (parse
number, and negate -- actually other way around I guess :-)). So it's
like -(123). But with a string conversion, it's one operation, and no
whitespace would be allowed after sign for a non-hex number either: '- 123' + 0
is 0. That's the reason I thought a sign should logically be allowed
for hex also. The C strtol() function allows signs with hex I believe (I
know zend_u_strtol() does). Not a big concern for me, mostly thinking of
consistency. :-)
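
As a small illustration of the single-step string conversion described above, here is a userland sketch that accepts a sign directly in front of a hex string but no whitespace after it. The function name signed_hex_string_to_number() is hypothetical and is not how the engine converts strings.

```php
<?php
// Hypothetical helper: parse an optionally signed hex string such as '-0xFF'.
// Whitespace between the sign and the digits is rejected, as with '- 123'.
function signed_hex_string_to_number(string $s)
{
    if (preg_match('/\A([+-]?)0[xX]([0-9a-fA-F]+)\z/', $s, $m) !== 1) {
        return null;
    }
    // hexdec() returns an int, or a float when the value exceeds the int range.
    $magnitude = hexdec($m[2]);
    return $m[1] === '-' ? -$magnitude : $magnitude;
}

var_dump(signed_hex_string_to_number('-0xFF'));  // int(-255)
var_dump(signed_hex_string_to_number('- 0xFF')); // NULL
```

The parser, by contrast, treats -0xFF as a negation applied to the literal 0xFF, which is why the two cases can diverge.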

Well, I think of the hex notation as just a whole number (non-floating) of
whatever range/size. About the cast, yeah, I see that's what is done now in
the parser if the hex number is between LONG_MAX and ULONG_MAX -- results in
a double. hexdec(), etc. will also return a double if needed.

Right now, since hex doubles don't work, you also have (on 32-bit):

I don't understand what you mean by hex double :) Do you mean that we
should convert out of range HEX to double in any case?

Yeah, I mean hex too large for long. :-) In parser and is_numeric..., it
looks like converting out-of-range hex to double was desired, but the
comments say "strtod() messes up hex", etc. I figured that issue could be
solved in both places by doing a "manual conversion" to double in case of
too large hex numbers.
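
The "manual conversion" idea can be sketched as a digit-by-digit accumulation into a float, which sidesteps strtod() entirely. This is only an illustration in userland PHP; the function name hex_string_to_float() is hypothetical and the real change would live in the engine's C code.

```php
<?php
// Hypothetical helper: convert a hex string (with or without a 0x prefix) to a
// float by accumulating one digit at a time, so any length can be handled.
function hex_string_to_float(string $hex): ?float
{
    if (preg_match('/\A(?:0[xX])?([0-9a-fA-F]+)\z/', $hex, $m) !== 1) {
        return null;
    }
    $value = 0.0;
    foreach (str_split($m[1]) as $digit) {
        // Shift the running total one hex place and add the next digit.
        $value = $value * 16 + hexdec($digit);
    }
    return $value;
}

var_dump(hex_string_to_float('0x80000000') === 2147483648.0); // bool(true)
```

Values beyond 53 bits lose precision in a double, so this only preserves exact results for hex numbers up to that size.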

That reminds me, a "(number)" typecast would be nice to have. :-)

number is a human thing, I'm not sure it fits our needs :)

The pseudo-type "number" is used in the manual, so I thought it might make
sense to have, meaning "whatever will hold this value: 'int' or
'double'/'float'." Of course, "+ 0" will do the same, but typecasts are a
bit more elegant. :-)

Cheers,
--Pierre

Matt

19 years ago by Richard Lynch

[Apologies for having accidentally responded to Matt W off-list, and
now bringing it back on-list without asking...]

From: "Richard Lynch"
Sent: Friday, August 11, 2006

Leading whitespace in PHP means that it's not a number, it's a string,
and it turns into 0.

If you change that, it will break a lot of stuff.

Don't.

:-)

This is basically what Jochem Mass said, and my reply was:

"Leading whitespace is already allowed with PHP's is_numeric() function (and
corresponding internal one), math operations, etc. Only when it precedes
.123 or -.123 does the behavior change. :-)"

So with math operations, leading whitespace doesn't cause it (an otherwise
numeric-prefix string) to turn into 0 (and never has), unless the first
character(s) after the whitespace are "." or "-." Changing this specific
(and rarely, if ever, occurring) scenario shouldn't break stuff... but merely
make it operate the way it should. :-)
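
As a point of comparison, the C library's own strtod() treats leading whitespace uniformly, whether the number starts with a digit, a sign, or a decimal point. The sketch below only demonstrates the C behaviour; lenient_to_double is a hypothetical wrapper and says nothing about how PHP's internal functions are written:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: parse with plain strtod(), which skips leading
   whitespace before looking for a sign, digits, or a decimal point. */
double lenient_to_double(const char *s)
{
    assert(s != NULL);
    return strtod(s, NULL);
}
```

With this wrapper, " 123", " .25" and " -.5" all convert to the same values as their unpadded forms.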

But I think you are talking about making changes to the way this works:

http://example.com/?foo=%20.123
<?php
$foo = $_GET['foo'];
if (is_numeric($foo)){
//error out
}
$query = "something involving '$foo'";
?>

If you break that, you're in big trouble to a lot of scripts all over
the planet, which rely on the leading space to trap their SQL problem.

I never actually use is_numeric, and would expect it to follow the
same "rules" as PHP's internal type-juggling mechanism.

I believe leading spaces should NOT be allowed for type-juggling, nor for
is_numeric, because GET/POST/COOKIE data should be subject to the most
stringent constraints reasonable to avoid security injections.

I really think the community is best served by K.I.S.S. which means
is_numeric should follow the same "rules" as type-juggling, so that
the programmer is not confused by which does what, and that those
rules for what constitutes is_numeric() should not have leading (or
trailing) spaces.

There is also a paradigm of only specifically allowing what "should
be" valid for a validity/security check on data constraints.

While I don't think leading/trailing spaces are likely to constitute a
Security Issue, there is a Principle at work that I think should be
applied.

Surely is_numeric(trim($foo)) is the right answer for the programmer
who specifically wants to allow spaces.
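
The strict rule argued for here (no leading or trailing whitespace, caller trims if they want leniency) can be sketched in C as follows. is_strict_numeric is a hypothetical name, and because it leans on strtod() it also accepts forms such as "inf", "nan" and hex floats, which a real implementation would probably want to filter out:

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical strict check: numeric only if the string has no leading or
   trailing whitespace and strtod() consumes all of it. */
int is_strict_numeric(const char *s)
{
    char *end;

    assert(s != NULL);
    if (*s == '\0' || isspace((unsigned char)*s)) {
        return 0;                   /* empty, or starts with whitespace */
    }
    (void)strtod(s, &end);
    /* Something must have been parsed, and nothing may be left over. */
    return end != s && *end == '\0';
}
```

Under this rule "123" and ".123" pass, while " 123" and "123 " are rejected unless the caller trims them first.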

The fact that PHP even allows leading spaces for this is what I would
consider a bug:
<?php
$foo = ' 123';
$bar = (int) $foo;
echo "bar: $bar";
?>

EXPECTED OUTPUT:
bar:

ACTUAL OUTPUT:
bar: 123

I understand the argument that this buggy behaviour is inconsistent
with ' .123' and ' -.123' but would counter that the bug is in
allowing the leading spaces, and is not best addressed by making it
consistently buggy.

jmho

--
Like Music?
http://l-i-e.com/artists.htm

19 years ago by Pierre

Hello,

But I think you are talking about making changes to the way this works:

http://example.com/?foo=%20.123
<?php
$foo = $_GET['foo'];
if (is_numeric($foo)){
//error out
}
$query = "something involving '$foo'";
?>

If you break that, you're in big trouble to a lot of scripts all over
the planet, which rely on the leading space to trap their SQL problem.

This example has nothing to do with what we are discussing here. There
is no conversion or detection involved here. It is a simple string
concatenation.

I never actually use is_numeric, and would expect it to follow the
same "rules" as PHP's internal type-juggling mechanism.

I believe leading spaces should NOT be allowed for type-juggling, nor for
is_numeric, because GET/POST/COOKIE data should be subject to the most
stringent constraints reasonable to avoid security injections.

Any example?

While I don't think leading/trailing spaces are likely to constitute a
Security Issue, there is a Principle at work that I think should be
applied.

Principle? which is? :)

--Pierre

19 years ago by Richard Lynch

But I think you are talking about making changes to the way this works:

http://example.com/?foo=%20.123
<?php
$foo = $_GET['foo'];
if (is_numeric($foo)){
//error out
}
$query = "something involving '$foo'";
?>

If you break that, you're in big trouble to a lot of scripts all over
the planet, which rely on the leading space to trap their SQL problem.

This example has nothing to do with what we are discussing here. There
is no conversion or detection involved here. It is a simple string
concatenation.

And yet, the way Matt W was talking at one point, it seemed he wanted
to change that as well...

Or perhaps I misunderstood.

I still believe that the same rules should apply for type-juggling and
is_numeric, for simplicity's sake.

I never actually use is_numeric, and would expect it to follow the
same "rules" as PHP's internal type-juggling mechanism.

I believe leading spaces should NOT be allowed for type-juggling, nor for
is_numeric, because GET/POST/COOKIE data should be subject to the most
stringent constraints reasonable to avoid security injections.

Any example?

The one above?...

http://example.com/?foo=%20.123

Is $_GET['foo'] a valid number?

I don't think it should be.

I believe it is "wrong" to allow leading/trailing spaces on numeric
data in any sort of auto-conversion or test for validity.

While I don't think leading/trailing spaces are likely to constitute a
Security Issue, there is a Principle at work that I think should be
applied.

Principle? which is? :)

Several, actually.

K.I.S.S. ==>
type-juggling === is_numeric
leading/trailing spaces are not numeric

The security Principle is that of allowing only the minimal needed
data characters to be valid, rather than attempting to do something
that's be-all end-all.

Still along the lines of simplicity, is the Principle of only allowing
what you really WANT to be valid, instead of attempting to disallow
what might be invalid.

While adding leading/trailing spaces to what is considered 'valid' is
not anywhere near the realm of disallowing the invalid, it's like that
slippery slope of complexity that leads there, if you know what I
mean...

Does PHP need to allow leading/trailing spaces? No.

Is there a simple userland solution if the application developer wants
to override the "Right Way"? Yes.

I believe it is "wrong" to consider ' 123' as 'numeric' in type
juggling, and equally "wrong" for is_numeric() to return TRUE for
that.

19 years ago by Pierre

Hello,

This example has nothing to do with what we are discussing here. There
is no conversion or detection involved here. It is a simple string
concatenation.

And yet, the way Matt W was talking at one point, it seemed he wanted
to change that as well...

Or perhaps I misunderstood.

I still believe that the same rules should apply for type-juggling and
is_numeric, for simplicity's sake.

That's not the same thing, there is no type juggling here.

I never actually use is_numeric, and would expect it to follow the
same "rules" as PHP's internal type-juggling mechanism.

I believe leading spaces should NOT be allowed for type-juggling, nor for
is_numeric, because GET/POST/COOKIE data should be subject to the most
stringent constraints reasonable to avoid security injections.

Any example?

The one above?...

http://example.com/?foo=%20.123

Is $_GET['foo'] a valid number?

I don't think it should be.

I believe it is "wrong" to allow leading/trailing spaces on numeric
data in any sort of auto-conversion or test for validity.

I was asking about a security problem. There is none. Limiting the area of
interest to input filtering is not a good idea; it is a very small part of
what we are talking about. I do not think arguing endlessly about whether
leading/trailing spaces are valid will help. This is actually a very small
problem (and easy to fix).

--Pierre

19 years ago by Andrei Zmievski

Guys,

I can't keep following endless (and large) email threads about things
like that. Could you please work together on a more formal proposal
taking into consideration existing state, BC, any potential future
issues etc? If you need some guidelines, I quite like how Python PEPs
do it [1]. Once we have something like that in front of us, we can
evaluate it much more effectively.

Thanks.

-Andrei

[1] http://www.python.org/dev/peps/pep-0001/#what-belongs-in-a-successful-pep
