Hi!
I've prepared a fix for bug #18556 - see
https://github.com/php/php-src/pull/126
A slight complication there is that for internal operations we probably
want locale-independent lowercasing, while for regular string operations
we probably want to stay with locale-depenedent one. That's how I
implemented it, even though it adds a bit of complexity, but I think
it's the best way to solve it. Any comments/objections/improvements?
Johannes, is it ok to put this fix into 5.3 too?
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227
Hi!
I've prepared a fix for bug #18556 - see
https://github.com/php/php-src/pull/126
A slight complication there is that for internal operations we probably
want locale-independent lowercasing, while for regular string operations
we probably want to stay with locale-depenedent one. That's how I
implemented it, even though it adds a bit of complexity, but I think
it's the best way to solve it. Any comments/objections/improvements?
With this patch, will it still be possible to use foreign class names
correctly? Like writing them in Russian and expecting the
case-insensitivity to work correct?
Nikita
Hi Nikita,
That was my first problem with the code, too!
I would make the Zend engine use the ROOT Locale for PHP's language parsing
(in Java it's called Locale.ROOT, in C its the Locale named ""), so its
invariant to local settings but works correct for all of Unicode. In theory,
this "case insensitive matching" should not be done by lowercasing, in
Unicode the correct way to do it is by "case folding", which is something
different (but using the ROOT locale has almost the same effect). In general
methods like Java's String.equalsIgnoreCase(String) do this internally.
I don't think ASCII-only lowercasing is in-compatible to the allowed PHP
identifier characters used by class names and what else.
Uwe
Uwe Schindler
thetaphi@php.net - http://www.php.net
NSAPI SAPI developer
Bremen, Germany
-----Original Message-----
From: Nikita Popov [mailto:nikita.ppv@gmail.com]
Sent: Wednesday, July 11, 2012 11:41 AM
To: Stas Malyshev
Cc: PHP Internals; Johannes Schlüter
Subject: Re: [PHP-DEV] bug 18556 - tolower & localesOn Wed, Jul 11, 2012 at 7:40 AM, Stas Malyshev smalyshev@sugarcrm.com
wrote:Hi!
I've prepared a fix for bug #18556 - see
https://github.com/php/php-src/pull/126
A slight complication there is that for internal operations we
probably want locale-independent lowercasing, while for regular string
operations we probably want to stay with locale-depenedent one. That's
how I implemented it, even though it adds a bit of complexity, but I
think it's the best way to solve it. Any
comments/objections/improvements?With this patch, will it still be possible to use foreign class names
correctly?
Like writing them in Russian and expecting the case-insensitivity to work
correct?Nikita
--
To unsubscribe,
visit:
http://www.php.net/unsub.php
Sorry, typo:
I don't think ASCII-only lowercasing is in-compatible to the allowed PHP
identifier characters used by class names and what else.
I don't think ASCII-only lowercasing is compatible to the allowed PHP
identifier characters used by class names and what else.
Uwe
Uwe Schindler
thetaphi@php.net - http://www.php.net
NSAPI SAPI developer
Bremen, Germany-----Original Message-----
From: Nikita Popov [mailto:nikita.ppv@gmail.com]
Sent: Wednesday, July 11, 2012 11:41 AM
To: Stas Malyshev
Cc: PHP Internals; Johannes Schlüter
Subject: Re: [PHP-DEV] bug 18556 - tolower & localesOn Wed, Jul 11, 2012 at 7:40 AM, Stas Malyshev
smalyshev@sugarcrm.com
wrote:Hi!
I've prepared a fix for bug #18556 - see
https://github.com/php/php-src/pull/126
A slight complication there is that for internal operations we
probably want locale-independent lowercasing, while for regular
string operations we probably want to stay with locale-depenedent
one. That's how I implemented it, even though it adds a bit of
complexity, but I think it's the best way to solve it. Any
comments/objections/improvements?With this patch, will it still be possible to use foreign class names
correctly?
Like writing them in Russian and expecting the case-insensitivity to
work correct?Nikita
--
To unsubscribe,
visit:
http://www.php.net/unsub.php--
To unsubscribe,
visit:
http://www.php.net/unsub.php
Hi!
I don't think ASCII-only lowercasing is compatible to the allowed PHP
identifier characters used by class names and what else.
Right now we use tolower() which is system-dependent, locale-dependent
and doesn't really work with any character that is not 8-bit since it
uses per-character lowercasing. So I'd be very surprised if anybody
actually manages to make non-ASCII class names to work. Is there such
code in existance?
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227
2012.07.11 19:17, Stas Malyshev rašė:
Hi!
I don't think ASCII-only lowercasing is compatible to the allowed PHP
identifier characters used by class names and what else.Right now we use tolower() which is system-dependent, locale-dependent
and doesn't really work with any character that is not 8-bit since it
uses per-character lowercasing. So I'd be very surprised if anybody
actually manages to make non-ASCII class names to work. Is there such
code in existance?
In some locales weird things start within ASCII range and PHP was hit by
them multiple times.
--
Tomas
Hi!
In some locales weird things start within ASCII range and PHP was hit by
them multiple times.
And that's exactly the reason for this patch.
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227
Hi!
With this patch, will it still be possible to use foreign class names
correctly? Like writing them in Russian and expecting the
case-insensitivity to work correct?
Not really. Did it ever work properly?
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227
Johannes, is it ok to put this fix into 5.3 too?
Even if the concerns from this thread can be cleared I don't think we
should change the core language in a version like 5.3. 5.3 users who are
hit by this have their work-around (I assume), others have a good reason
to use a more modern PHP.
johannes
Hi!
Even if the concerns from this thread can be cleared I don't think we
should change the core language in a version like 5.3. 5.3 users who are
hit by this have their work-around (I assume), others have a good reason
to use a more modern PHP.
OK, I'll leave 5.3 alone then.
It shouldn't be much of a change unless somebody used non-ASCII class
names with different cases (which seems to be quite unlikely) but for
stability's sake we could keep it out of 5.3.
--
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227
Hi!
Even if the concerns from this thread can be cleared I don't think we
should change the core language in a version like 5.3. 5.3 users who are
hit by this have their work-around (I assume), others have a good reason
to use a more modern PHP.OK, I'll leave 5.3 alone then.
It shouldn't be much of a change unless somebody used non-ASCII class
names with different cases (which seems to be quite unlikely) but for
stability's sake we could keep it out of 5.3.
Hang on. Are you really going to push this into 5.4?
-Hannes
Hi!
Hang on. Are you really going to push this into 5.4?
Do you have some comments?
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227
Hi!
Hang on. Are you really going to push this into 5.4?
Do you have some comments?
It is a thing that can have really weird cascading affects.
Everything from strcmp()
to == and sort()
could start behaving differently.
The test cases for such a feature change surely should cover more then
the letter "I".
I don't consider this a normal bugfix that should go into any current
release branch.
-Hannes
Hi!
It is a thing that can have really weird cascading affects.
Everything from
strcmp()
to == andsort()
could start behaving differently.
Why? Neither of them is case insensitive, why would they start behaving
differently? And sort has nothing to do with it at all, lowercasing does
not affect sorting in any way I'm aware of. Could you please explain
what you mean?
The test cases for such a feature change surely should cover more then
the letter "I".
They should and they do. Why did you conclude it only covers letter I?
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227
hi Stas,
Hi!
Even if the concerns from this thread can be cleared I don't think we
should change the core language in a version like 5.3. 5.3 users who are
hit by this have their work-around (I assume), others have a good reason
to use a more modern PHP.OK, I'll leave 5.3 alone then.
It shouldn't be much of a change unless somebody used non-ASCII class
names with different cases (which seems to be quite unlikely) but for
stability's sake we could keep it out of 5.3.
Same for 5.4, 5.4 IS the latest stable branch.
I'd to go with master only with this patch, not like it is a huge
issue either (% wised I mean).
Cheers,
Pierre
@pierrejoye | http://blog.thepimp.net | http://www.libgd.org
Hi!
I'd to go with master only with this patch, not like it is a huge
issue either (% wised I mean).
It's a huge issue for people using these locales (PHP is unusable for
them, as I understand), but how many use those I don't know.
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227
hi,
Hi!
I'd to go with master only with this patch, not like it is a huge
issue either (% wised I mean).It's a huge issue for people using these locales (PHP is unusable for
them, as I understand), but how many use those I don't know.
Right, just like some functions are huge issues for the few people using them.
If it is not good enough for 5.3, it is not good enough for 5.4.
Cheers,
Pierre
@pierrejoye | http://blog.thepimp.net | http://www.libgd.org
Hi!
Right, just like some functions are huge issues for the few people using them.
And we regularly fix bugs in such functions in stable versions.
If it is not good enough for 5.3, it is not good enough for 5.4.
Sorry, I can't accept this argument. There were numerous things that are
in 5.4 but not in 5.3.
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227
Em Wed, 11 Jul 2012 22:17:42 +0200, Stas Malyshev smalyshev@sugarcrm.com
escreveu:
And we regularly fix bugs in such functions in stable versions.
If it is not good enough for 5.3, it is not good enough for 5.4.
Sorry, I can't accept this argument. There were numerous things that are
in 5.4 but not in 5.3.
I also think this is not appropriate for 5.4, but for different reasons.
You're changing the Zend API. You introduce new public functions. This
means an extension that uses the new symbols will be compatible with PHP
5.4.6, but not before. When this happens, it's very annoying (like the
PHP_FE_END in 5.3, though that only affected source compatibility -- this
also affects binary compatibility).
You also change the behavior of zend_binary_strncasecmp(). Just like you
had to change ext/standard/string.c, other extensions may also have to
(not to mention the risk you missed something).
--
Gustavo Lopes
hi,
Em Wed, 11 Jul 2012 22:17:42 +0200, Stas Malyshev smalyshev@sugarcrm.com
escreveu:And we regularly fix bugs in such functions in stable versions.
If it is not good enough for 5.3, it is not good enough for 5.4.
Sorry, I can't accept this argument. There were numerous things that are
in 5.4 but not in 5.3.I also think this is not appropriate for 5.4, but for different reasons.
You're changing the Zend API. You introduce new public functions. This means
an extension that uses the new symbols will be compatible with PHP 5.4.6,
but not before. When this happens, it's very annoying (like the PHP_FE_END
in 5.3, though that only affected source compatibility -- this also affects
binary compatibility).
That alone is a no go for 5.4 and 5.3.
You also change the behavior of zend_binary_strncasecmp(). Just like you had
to change ext/standard/string.c, other extensions may also have to (not to
mention the risk you missed something).
That adds way too much risks, for little to no gain (sorry for the few
people suffering from this bug).
cheers,
Pierre
@pierrejoye | http://blog.thepimp.net | http://www.libgd.org
Hi!
That adds way too much risks, for little to no gain (sorry for the few
people suffering from this bug).
Which exactly risks? So far only description of risks was from Hannes,
and all the function he mentioned do not call any of the functions being
changed.
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227
hi Stas!
Hi!
That adds way too much risks, for little to no gain (sorry for the few
people suffering from this bug).Which exactly risks? So far only description of risks was from Hannes,
and all the function he mentioned do not call any of the functions being
changed.
Stability of new introduced functions is as important as old one.
Also I really do not see any kind of appealing reason to introduce
this fix in 5.3 or 5.4. It is a many years old bug, affecting a very
few amount users. Everyone is important, but they all found work
around since they met it.
5.5 should be around next year, that won't be that bad for them to
wait until then.
Cheers,
Pierre
@pierrejoye | http://blog.thepimp.net | http://www.libgd.org
Hi!
You're changing the Zend API. You introduce new public functions. This
means an extension that uses the new symbols will be compatible with PHP
5.4.6, but not before. When this happens, it's very annoying (like the
PHP_FE_END in 5.3, though that only affected source compatibility -- this
also affects binary compatibility).
This is true, however relevant only if your extension uses low-level
Zend engine functions for case-insensitive string comparison, and you're
not using zvals, and you require this comparison to depend on current
locale. This is pretty exotic use case unless you're reimplementing
strcasecmp()
.
You also change the behavior of zend_binary_strncasecmp(). Just like you
had to change ext/standard/string.c, other extensions may also have to
(not to mention the risk you missed something).
That's kind of the point of the fix - to change the behavior (which
previously depended on current locale and thus produced wrong behavior).
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227
2012.07.11 23:07, Pierre Joye rašė:
hi,
Hi!
I'd to go with master only with this patch, not like it is a huge
issue either (% wised I mean).It's a huge issue for people using these locales (PHP is unusable for
them, as I understand), but how many use those I don't know.Right, just like some functions are huge issues for the few people using them.
If it is not good enough for 5.3, it is not good enough for 5.4.
It is not about issues with functions. It is about PHP interpreter
operating under assumption that locale sensitive strtolower and
strtoupper follow basic US English letter case rules in a-zA-Z range.
That's huge mistake for code used in international environment and
Mustafa Kemal Atatürk is smiling in his grave every time some programmer
hits that problem.
You had less complex patch posted on #35050 bug report for the last four
years.
Code already uses (or used when I was looking at that code) locale
insensitive tolower version on Windows. Why you insist on using locale
sensitive function on other platforms.
--
Tomas
Hi!
You had less complex patch posted on #35050 bug report for the last four
years.
That patch, unfortunately, is not correct since it assumes every
function using tolower wants ASCII tolower. This makes, for example,
substr_compare()
ignore locale, and probably a bunch of other functions.
Which is probably not a desired effect.
Code already uses (or used when I was looking at that code) locale
insensitive tolower version on Windows. Why you insist on using locale
sensitive function on other platforms.
I don't know when you looked at the code, but right now for windows the
following is used:
#define zend_tolower(c) _tolower_l(c, current_locale)
As specified in http://msdn.microsoft.com/en-us/library/8h19t214.aspx,
this is definitely locale sensitive function.
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227
hi,
On Thu, Jul 12, 2012 at 7:17 AM, Tomas Kuliavas
tokul@users.sourceforge.net wrote:
It is not about issues with functions. It is about PHP interpreter
operating under assumption that locale sensitive strtolower and
strtoupper follow basic US English letter case rules in a-zA-Z range.
That's huge mistake for code used in international environment
I should have said "features" instead of "functions". Also it is about
some locale(s) not working well. But that does not make PHP unusable
in intl environment nor it makes this bug critical enough to be fixed
this way in 5.3 and 5.4. I'd to repeat my suggestion, let's go with
master only.
Cheers,
Pierre
@pierrejoye | http://blog.thepimp.net | http://www.libgd.org