Hi Andrei, et al.,
I was just looking at README.UNICODE, regarding interpretation of numbers:
"we restrict numbers to consist only of ASCII digits," and "Numeric strings
are supposed to adhere to the same rules." Is it correct to take that to
mean only UChar's with values from '0'-'9'/0x30-0x39 (and 'a'-'z'
equivalents for bases > 10)?
I ask because in zend_u_strtol(), HANDLE_U_NUMERIC() for array keys, etc.,
the u_digit() function is used, which also allows non-ASCII, higher-value
digit characters, doesn't it? But then in is_numeric_unicode(), when
checking for hex numbers, the ASCII values '0' and 'x' are used, which is
what I'd expect after reading README.UNICODE.
Thanks for any clarification,
Matt
> Hi Andrei, et al.,
> I was just looking at README.UNICODE, regarding interpretation of numbers:
> "we restrict numbers to consist only of ASCII digits," and "Numeric strings
> are supposed to adhere to the same rules." Is it correct to take that to
> mean only UChar's with values from '0'-'9'/0x30-0x39 (and 'a'-'z'
> equivalents for bases > 10)?
Correct.
> I ask because in zend_u_strtol(), HANDLE_U_NUMERIC() for array keys, etc.,
> the u_digit() function is used, which also allows non-ASCII, higher-value
> digit characters, doesn't it? But then in is_numeric_unicode(), when
> checking for hex numbers, the ASCII values '0' and 'x' are used, which is
> what I'd expect after reading README.UNICODE.
You're correct here again, u_digit() should not be used there.
-Andrei
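For readers following along, here is a minimal sketch of what an ASCII-only replacement for u_digit() might look like. It is not the code from the PHP tree: the UChar16 typedef is a stand-in for ICU's 16-bit UChar so the snippet compiles on its own, ascii_digit_value() is a hypothetical helper name, and accepting uppercase 'A'-'Z' alongside 'a'-'z' is an assumption modeled on strtol().

```c
#include <stdint.h>

/* Stand-in for ICU's UChar (a 16-bit UTF-16 code unit).
 * Assumption: used here only so the sketch is self-contained. */
typedef uint16_t UChar16;

/* Hypothetical helper: return the value of c as a digit in the given
 * radix (2..36), or -1 if c is not an ASCII digit/letter valid for it.
 * Unlike u_digit(), non-ASCII digits such as U+0661 are rejected. */
static int ascii_digit_value(UChar16 c, int radix)
{
    int value;

    if (c >= 0x30 && c <= 0x39) {          /* '0'-'9' */
        value = c - 0x30;
    } else if (c >= 0x61 && c <= 0x7A) {   /* 'a'-'z' */
        value = c - 0x61 + 10;
    } else if (c >= 0x41 && c <= 0x5A) {   /* 'A'-'Z' (assumed, as in strtol) */
        value = c - 0x41 + 10;
    } else {
        return -1;
    }
    return value < radix ? value : -1;
}
```

Because the check is plain range comparison, the same logic carries over from the char version to the UChar version with only the type changed.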
Hi Andrei,
All right, glad I checked. I had a few things in mind to optimize
is_numeric_string/unicode, and it's fairly straightforward in string, but
would just make things slower if u* functions were needed to do the same in
_unicode, so I was going to rethink it. Now whatever I come up with can be
easily copied to the _unicode version with just minor changes...
Thanks,
Matt
----- Original Message -----
From: "Andrei Zmievski"
Sent: Friday, November 10, 2006
Hi Andrei,
One more related question: What about for any leading whitespace with
numeric strings, like in zend_u_strtol()? Is u_isspace() needed, or are
only the ASCII-equivalents (0x20, 9-13 [\t, \n, \v, \f, \r]) allowed?
Thanks again,
Matt
We should use whatever trim() uses, I think.
-Andrei
Hello,
> We should use whatever trim() uses, I think.
I think so too (more consistent).
--Pierre
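As a rough illustration of the two candidate whitespace sets discussed above, the sketch below compares the ASCII set from Matt's question (0x20 and 0x09-0x0D) with the default character list documented for PHP's trim() (" \t\n\r\0\x0B"). The two differ: trim() does not strip \f by default but does strip NUL. Whether trim() in the Unicode branch uses exactly this list is not established in this thread, so treat that as an assumption; UChar16 and all function names here are hypothetical.

```c
#include <stdint.h>

/* Stand-in for ICU's UChar (16-bit code unit); assumption for a
 * self-contained sketch. */
typedef uint16_t UChar16;

/* The ASCII whitespace set from the question: space and 0x09-0x0D
 * (\t, \n, \v, \f, \r). */
static int is_ascii_space(UChar16 c)
{
    return c == 0x20 || (c >= 0x09 && c <= 0x0D);
}

/* The default character list documented for PHP's trim():
 * space, \t, \n, \r, NUL and \v (0x0B). Note: no \f (0x0C). */
static int is_trim_default_char(UChar16 c)
{
    return c == 0x20 || c == 0x09 || c == 0x0A ||
           c == 0x0D || c == 0x00 || c == 0x0B;
}

/* Skip leading characters accepted by the given predicate and return
 * a pointer to the first character that is not skipped (or end). */
static const UChar16 *skip_leading(const UChar16 *s, const UChar16 *end,
                                   int (*is_skippable)(UChar16))
{
    while (s < end && is_skippable(*s)) {
        s++;
    }
    return s;
}
```

With the input { 0x20, 0x0C, '1' }, the first predicate skips both leading characters while the second stops at the form feed, so the choice affects which strings are treated as numeric.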