Unicode chars allowed in numbers?

19 years ago by Matt Wilmas

Hi Andrei, et al.,

I was just looking at README.UNICODE, regarding interpretation of numbers:
"we restrict numbers to consist only of ASCII digits," and "Numeric strings
are supposed to adhere to the same rules." Is it correct to take that to
mean only UChar's with values from '0'-'9'/0x30-0x39 (and 'a'-'z'
equivalents for bases > 10)?

I ask because in zend_u_strtol(), HANDLE_U_NUMERIC() for array keys, etc.,
the u_digit() function is used, which also allows non-ASCII, higher-value
digit characters, doesn't it? But then in is_numeric_unicode(), when
checking for hex numbers, the ASCII values '0' and 'x' are used, which is
what I'd expect after reading README.UNICODE.

Thanks for any clarification,
Matt

19 years ago by Andrei Zmievski

> Hi Andrei, et al.,
>
> I was just looking at README.UNICODE, regarding interpretation of numbers:
> "we restrict numbers to consist only of ASCII digits," and "Numeric strings
> are supposed to adhere to the same rules." Is it correct to take that to
> mean only UChar's with values from '0'-'9'/0x30-0x39 (and 'a'-'z'
> equivalents for bases > 10)?

Correct.

> I ask because in zend_u_strtol(), HANDLE_U_NUMERIC() for array keys, etc.,
> the u_digit() function is used, which also allows non-ASCII, higher-value
> digit characters, doesn't it? But then in is_numeric_unicode(), when
> checking for hex numbers, the ASCII values '0' and 'x' are used, which is
> what I'd expect after reading README.UNICODE.

You're correct here again, u_digit() should not be used there.

-Andrei
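
The rule confirmed above (only 0x30-0x39, plus ASCII letters for bases above 10) can be sketched as a small helper. This is only an illustration, not the code that went into zend_u_strtol(): the names `UChar16` and `ascii_digit_value` are hypothetical, and `UChar16` is an assumed stand-in for ICU's UChar so that the snippet compiles without ICU headers.

```c
#include <stdint.h>

/* Hypothetical stand-in for ICU's UChar (one UTF-16 code unit). */
typedef uint16_t UChar16;

/* Return the value of c as a digit in the given base (2..36), or -1 if c is
 * not an ASCII digit or ASCII letter that is valid for that base.
 * Unlike u_digit(), this rejects every non-ASCII digit character. */
static int ascii_digit_value(UChar16 c, int base)
{
    int v;

    if (c >= 0x30 && c <= 0x39) {          /* '0'..'9' */
        v = c - 0x30;
    } else if (c >= 0x61 && c <= 0x7A) {   /* 'a'..'z' */
        v = c - 0x61 + 10;
    } else if (c >= 0x41 && c <= 0x5A) {   /* 'A'..'Z' */
        v = c - 0x41 + 10;
    } else {
        return -1;                         /* anything else, ASCII or not */
    }
    return v < base ? v : -1;
}
```

With this helper, U+0037 ('7') is accepted in base 10, while U+0667 (ARABIC-INDIC DIGIT SEVEN), which u_digit() treats as a digit, is rejected.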

19 years ago by Matt Wilmas

Hi Andrei,

All right, glad I checked. I had a few things in mind to optimize
is_numeric_string/unicode, and it's fairly straightforward in string, but
would just make things slower if u* functions were needed to do the same in
_unicode, so I was going to rethink it. Now whatever I come up with can be
easily copied to the _unicode version with just minor changes...

Thanks,
Matt

----- Original Message -----
From: "Andrei Zmievski"
Sent: Friday, November 10, 2006

18 years ago by Matt Wilmas

Hi Andrei,

One more related question: What about for any leading whitespace with
numeric strings, like in zend_u_strtol()? Is u_isspace() needed, or are
only the ASCII-equivalents (0x20, 9-13 [\t, \n, \v, \f, \r]) allowed?

Thanks again,
Matt
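
For reference, the ASCII-only option described in this question (0x20 and 9-13) could look roughly like the sketch below. The names `UChar16`, `is_ascii_space` and `skip_ascii_space` are hypothetical, and `UChar16` is again an assumed stand-in for ICU's UChar. Which set trim() uses for Unicode strings in this branch is not stated in the thread. In released PHP versions the documented default for trim() is space, \t, \n, \r, NUL and \v, which is a slightly different set (no \f, plus NUL), so the exact list would need checking before reusing it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for ICU's UChar (one UTF-16 code unit). */
typedef uint16_t UChar16;

/* ASCII-only whitespace test: space (0x20) and \t, \n, \v, \f, \r (9..13).
 * Unlike u_isspace(), this ignores Unicode spaces such as U+00A0 or U+3000. */
static int is_ascii_space(UChar16 c)
{
    return c == 0x20 || (c >= 0x09 && c <= 0x0D);
}

/* Return the number of leading ASCII whitespace code units in s[0..len). */
static size_t skip_ascii_space(const UChar16 *s, size_t len)
{
    size_t i = 0;

    while (i < len && is_ascii_space(s[i])) {
        i++;
    }
    return i;
}
```

A numeric-string parser would call `skip_ascii_space()` first and start reading the sign and digits at the returned offset.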

18 years ago by Andrei Zmievski

We should use whatever trim() uses, I think.

-Andrei

18 years ago by Pierre

Hello,

> We should use whatever trim() uses, I think.

I think so too (more consistent).

--Pierre