Fixing bug #18556 (was: Complete case-sensitivity in PHP)

13 years ago by Galen Wright-Watson

2012/4/22 C.Koy <can5koy@gmail.com>

> But, I did not start this thread to discuss such bug fix, because:
>
> It does not take a genius to figure it out, and should take minutes to
> implement for someone experienced in the internals. Given the 10 year span
> and dozens of comments/complaints on the bug's entry, it's hard to say this
> issue went unnoticed. So I had to conclude that such fix has quietly been
> overruled for performance and/or other undisclosed reasons.
>
>> Why does it matter if a solution is simple?
>
> It doesn't matter, you've misunderstood.

You've misunderstood me. While you may have set out with the goal of
discussing making PHP completely case-sensitive, that doesn't preclude
others from suggesting fixes for the specific bug you mention. Indeed, some
of the first e-mails were around the bug, and not just in the context of
case-sensitive PHP.

I didn't introduce the custom case conversion solution as a
counter-argument to case-sensitive PHP, and I wasn't asking for feedback on
that solution in the context of case-sensitive PHP; I was asking for
reasons why it wouldn't be a suitable solution for the bug. The only place
case-sensitive PHP enters into it was your statement that:

> As the recent comments on that page indicate, there's not a deterministic
> way to resolve this issue, apart from eliminating tolower() calls for
> function/class names during lookup. Hence totally case-sensitive PHP.

My proposition shows this isn't entirely true, and branches off from the
original discussion at that point. I'm focusing on fixing the bug, which is
a smaller issue than case-sensitivity. Discussion of case-sensitivity can
continue without regard to the custom conversion solution. As such, I've
changed the subject of this e-mail.

Furthermore, going back to your original e-mail, you explicitly stated it
was about the bug, making case sensitivity subordinate to it.

> This post is about bug #18556 (https://bugs.php.net/bug.php?id=18556)
> which is a decade old.

I hope you can see why others might take the bug to be the context for
case-sensitivity, rather than the other way around.

> And that's what makes me curious and confused about why this bug still
> exists. See, I'm drawing a conclusion with what little information I have,
> and stating the reasonings it's based on (first two statements).
> Overall, that and the item following it were an explanation of "why I'm
> suggesting a major feature change in solution to a specific bug", although
> no one directly asked me to.

In other words, you jumped to a conclusion. I wasn't asking about possible
reasons why custom conversion hasn't been accepted as the solution to this
bug. Neither was I asking why you didn't suggest it. I was (and still am)
asking for explicit, justifiable reasons as to whether or not it's a
suitable solution to the bug.

If it's already been rejected privately, it's time to bring the reasons
into the open (which is why I asked). If not, it should be considered
publicly.

> A comment dated 2002-09-26 on bug's page states the bug is fixed. The next
> comment dated 2006-02-17 states it reappeared.
> I don't know who did what 10, 6 years ago but it's been revoked. Why?
> That was the main reason I deemed this bug not fixable, hence suggest
> other ways to resolve.

I don't know either, but I'm not about to disregard potential fixes if
they haven't been publicly discussed. The regression could just as easily
have been a mistake. From looking at the original fix (revision 97040,
http://svn.php.net/viewvc?view=revision&revision=97040, authored by iliaa)
and the bug comments, something along the lines of what I'm suggesting has
been suggested and even implemented before, but there's no real discussion
of it. The original fix (zend_str_tolower_nlc) assumed ASCII, which isn't
entirely suitable as there are uppercase characters that it doesn't
convert, which suggests yet another reason for the regression, namely that
using zend_str_tolower would convert the characters that
zend_str_tolower_nlc missed.
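
To make the ASCII-only behaviour concrete, here is a minimal sketch of such a conversion. It is not the code from revision 97040; the name ascii_str_tolower is made up for illustration. Bytes outside A-Z are left alone, which is why non-ASCII uppercase characters are missed, and also why the result is the same under every locale.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not the actual Zend source: lowercases only the
 * ASCII letters A-Z in place and leaves every other byte untouched, so
 * the result never depends on the LC_CTYPE locale. */
void ascii_str_tolower(char *str, size_t length)
{
    assert(str != NULL);
    for (size_t i = 0; i < length; i++) {
        unsigned char c = (unsigned char)str[i];
        if (c >= 'A' && c <= 'Z') {
            str[i] = (char)(c - 'A' + 'a');
        }
    }
}
```

Called on "IJK" this yields "ijk" regardless of setlocale(); a byte such as 0xDD (capital dotted I in ISO-8859-9) passes through unchanged.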

As for the real reason why the bug reappeared, we can continue on in our
historical examination. Revision 99001 (
http://svn.php.net/viewvc?view=revision&revision=99001, also authored
by iliaa) replaced zend_str_tolower with zend_str_tolower_nlc, making all
internal Zend case conversion use ASCII. iliaa had this to say about the
change (http://news.php.net/php.zend-engine.cvs/478):

> It appears that there no reason to keep both zend_str_tolower_nlc and
> zend_str_tolower. zend_str_tolower_nlc can be safely renamed to
> zend_str_tolower. The places it is used in, do not appear to depend on
> locale. For people who do need it there is an alternative php function
> php_strtolower, which they can use, which does respect the locale. So, if
> there are no objections I'll prepare a patch that will change
> zend_str_tolower_nlc to zend_str_tolower.

Revision 128057 (http://svn.php.net/viewvc?view=revision&revision=128057,
authored by sterling) adds zend_str_tolower for use in
fast_call_user_function, which makes use of tolower rather than a custom
conversion. Revision 128060 (
http://svn.php.net/viewvc?view=revision&revision=128060, same author) then
changes zend_str_tolower to use tolower instead of its custom ASCII-based
conversion. The commit message is: "make this faster and sexier". Within
these revisions, zend_lookup_class is case sensitive. This change, in
combination with 99001, masks the reason for the custom conversion.

zend_tolower and the use of tolower_l were introduced by
revision 224372 (http://svn.php.net/viewvc?view=revision&revision=224372,
authored by stas (hi, Stas!)). The commit message is: "Improve
tolower()-related functions on Windows and VC2005 by caching locale and
using tolower_l function."

There are plenty of other edits to Zend functions affecting case handling
(look over the commit messages listed in
http://svn.php.net/viewvc/php/php-src/trunk/Zend/zend_operators.c?view=log&pathrev=225000)
that make similar tweaks involving case conversion and the character
encoding. What are we to conclude from all this? That the knowledge that the
custom conversion was a bug fix was lost as the file was edited and different
people worked on it. In other words, the fix was not lost due to a conscious decision
made by anyone, but rather the typical reason for regression (in the
original sense of the word): there's too much for anyone to keep all of it
in mind at once, so someone can easily re-introduce a bug without being
aware of it.

I trust this demonstrates that "there must be an undisclosed reason" isn't
a justifiable reason not to implement my proposed solution.

>> The abstract property that makes a locale problematic is obvious. I
>> was looking for specific locales, as they need to be identified for a
>> complete solution.
>
> I'm not a locale expert. Given the public complaints/bugs we can, in
> practice, assume this affects Turkish and Azerbaijani only. (I don't know
> about Kurdish)

Kurdish is mentioned by Mike and Tokul in the comments for the bug. I
could easily have come to the same conclusion, but I want an answer from
someone who knows without needing to make any assumptions. Are there any
locale experts (or someone willing to put in the leg-work) reading this
with a conclusive answer to my question about problematic locales?

13 years ago by Ferenc Kovacs

On Tue, Apr 24, 2012 at 1:06 AM, Galen Wright-Watson <ww.galen@gmail.com> wrote:

[...]

thanks for digging this out.

ps: you had a few extra > at the end of the first lines of your sentences,
I experienced similar problems with gmail, the solution for me was to
always put an extra new line after the quoted text.

--
Ferenc Kovács
@Tyr43l - http://tyrael.hu

13 years ago by Ferenc Kovacs

> ps: you had a few extra > at the end of the first lines of your sentences,
> I experienced similar problems with gmail, the solution for me was to
> always put an extra new line after the quoted text.

what I meant is the beginning of the first line, not the end.

--
Ferenc Kovács
@Tyr43l - http://tyrael.hu

13 years ago by Hartmut Holzgraefe

> [...] Revision 128060 (
> http://svn.php.net/viewvc?view=revision&revision=128060, same author) then
> changes zend_str_tolower to use tolower instead of its custom ASCII-based
> conversion. The commit message is: "make this faster and sexier". Within
> these revisions, zend_lookup_class is case sensitive. This change, in
> combination with 99001, mask the reason for the custom conversion.

Argh .... STERLING!!!111

ok, part of the story seems to be that i can't find the regression test
tests/lang/035.phpt that i mentioned in bug #18556 anywhere. In the 5.x
code base this is a test for some Exception related stuff, and in the
latest 4.x branch the highest test number in tests/lang is 034.phpt

So it seems as if i somehow never really committed my test case and
so Sterling, not being aware of the "turkish" history, unfixed things
during micro optimization without anything in place to warn him about
the regression he introduced :(

(AFAIR it was me back then who first stumbled upon "i"!=tolower("I")
in tr_TR after noticing that most of our "Image functions don't work
even though the gd extension is active" reports came from Turkey ...)

--
hartmut

13 years ago by C.Koy

Hi,
As of 5.3.0 this bug does not exist for function names. Only classes and
interfaces.

Could this be a clue for how to fix it for those as well?

13 years ago by Galen Wright-Watson

> As of 5.3.0 this bug does not exist for function names. Only classes and
> interfaces.

Turns out, if you cause a function to be called dynamically by (e.g.) using
a variable function, the bug will surface.

<?php
setlocale(LC_CTYPE, 'tr_TR');
function IJK() {}
# succeeds
IJK();
$f = 'IJK';
# causes Fatal error: Call to undefined function IJK()
$f();

In contrast, if you set the locale for LC_CTYPE on the command line, the
bug doesn't arise at all because the compilation and execution phases both
use the same locale.
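
The mismatch can be modelled without a Turkish locale installed. The sketch below is only an illustration: lower_tr is a hand-written stand-in for what tolower() does to 'I' under tr_TR with ISO-8859-9 (an assumption; it maps to dotless i, byte 0xFD), and make_key and lookup_succeeds are hypothetical names, not Zend functions. A name is found only when the key built at declaration time matches the key built at call time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* C-locale rule: only A-Z change. */
int lower_c(int ch)
{
    return (ch >= 'A' && ch <= 'Z') ? ch - 'A' + 'a' : ch;
}

/* Simulated tr_TR (ISO-8859-9) rule, an assumption for illustration:
 * 'I' maps to dotless i (byte 0xFD); everything else as in lower_c. */
int lower_tr(int ch)
{
    return ch == 'I' ? 0xFD : lower_c(ch);
}

/* Build a lookup key from name using the given lowercase rule. */
void make_key(const char *name, int (*lower)(int), char *key, size_t size)
{
    size_t i = 0;
    assert(name != NULL && key != NULL && size > 0);
    for (; name[i] != '\0' && i + 1 < size; i++) {
        key[i] = (char)lower((unsigned char)name[i]);
    }
    key[i] = '\0';
}

/* Returns 1 if a name declared under one rule is found under another. */
int lookup_succeeds(const char *name,
                    int (*declare_rule)(int), int (*call_rule)(int))
{
    char declared[64], called[64];
    make_key(name, declare_rule, declared, sizeof declared);
    make_key(name, call_rule, called, sizeof called);
    return strcmp(declared, called) == 0;
}
```

With the same rule on both sides the lookup of "IJK" succeeds; with lower_c at declaration and lower_tr at call time it fails, which mirrors the variable-function case above. Names without an 'I' are unaffected.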

> Could this be a clue for how to fix it for those as well?

Function names are generally resolved at compile time (dynamic function
names are resolved at run time, which is why the bug surfaces for them),
before the call to setlocale in the script has been executed. Class name
resolution is put off until execution time for autoloading and possibly
other purposes. Converting class names to lowercase at compile time may
work. A quick glance at the source shows that class_name,
fully_qualified_class_name and class_name_reference all depend on
namespace_name, which is the rule that is responsible for the parsing of
the class name.

namespace_name:
        T_STRING { $$ = $1; }
    |   namespace_name T_NS_SEPARATOR T_STRING { zend_do_build_namespace_name(&$$, &$1, &$3 TSRMLS_CC); }
;

However, static_scalar is also dependent on namespace_name, and I don't
believe that symbol should be made case-insensitive. Creating an additional
symbol for case-independency would allow a more targeted approach. The
various class symbols would then rely on this new symbol, rather than
namespace_name.

lc_namespace_name:
        T_STRING { zend_str_tolower($1); $$ = $1; }
    |   lc_namespace_name T_NS_SEPARATOR T_STRING { zend_str_tolower($3); zend_do_build_namespace_name(&$$, &$1, &$3 TSRMLS_CC); }
;

Converting class names to lower case early may have additional
consequences. It may affect class names in error messages, for example (I
didn't dig deep enough to determine this). __CLASS__ should be unaffected
(when defining a class, the class name is parsed as a T_STRING; the value
for __CLASS__ comes from this symbol). It also won't resolve the bug for
dynamic names. I suspect that altering variable_class_name and
dynamic_class_name_reference in a manner described previously (use a custom
lowercase conversion or temporarily switch locale) to convert the name
would resolve the bug in the dynamic case for class names. Changing a
number of the production rules for function_call in a similar manner should
resolve the bug for dynamic function call. Again, there will likely be
unintended consequences. Alternatively, updating
zend_do_begin_dynamic_function_call() and zend_do_fetch_class() to use
custom conversion should resolve the bug in the dynamic case.

I like the idea of using the system default locale for name conversion
(making name resolution independent of the current locale), but am
concerned that it will make name lookup slow. Instead, a second set of
locale-independent, unicode-aware conversion functions (basically, iliaa's
original solution, but Unicode compatible) to be used for identifiers would
make name resolution independent of the current locale. Any time an
identifier needs to be converted, it would use one of these functions. As
a run-time optimization, non-dynamic class names could use the system
locale conversion, but that would be a separate thing from resolving this
bug.

13 years ago by Galen Wright-Watson

On Tue, May 1, 2012 at 11:11 AM, Galen Wright-Watson <ww.galen@gmail.com> wrote:

> [...] Instead, a second set of locale-independent, unicode-aware
> conversion functions (basically, iliaa's original solution, but Unicode
> compatible) to be used for identifiers would make name resolution
> independent of the current locale. [...]

I believe all these functions would need to do is use tolower, rather than
tolower_l. So, perhaps the new functions should get the old names, and the
old functions should get "_l" appended to their names.

13 years ago by C.Koy

>> As of 5.3.0 this bug does not exist for function names. Only classes and
>> interfaces.
>
> Turns out, if you cause a function to be called dynamically by (e.g.) using
> a variable function, the bug will surface.
>
> <?php
> setlocale(LC_CTYPE, 'tr_TR');
> function IJK() {}
> # succeeds
> IJK();

If literal function call precedes the function definition, that would
fail too in 5.2.17, but not in 5.3.0.
What has changed in this regard 5.2->5.3 ?

> $f = 'IJK';
> # causes Fatal error: Call to undefined function IJK()
> $f();
>
> In contrast, if you set the locale for LC_CTYPE on the command line, the
> bug doesn't arise at all because the compilation and execution phases both
> use the same locale.

So, the bug also arises if a script started in 'tr_TR' env locale sets
its locale to 'en_US' at runtime.

[...]

> I like the idea of using the system default locale for name conversion
> (making name resolution independent of the current locale), but am

As I stated above, the locale the script was started in may not always
be 'en_US' or 'C'. (assuming that's what you mean by "system default
locale")

By the way, I noticed a setlocale(LC_CTYPE, "") call in
php_module_startup()/main.c, but can't figure if it has any relevance to
this bug.

regards,

13 years ago by Galen Wright-Watson

>>> As of 5.3.0 this bug does not exist for function names. Only classes and
>>> interfaces.
>>
>> Turns out, if you cause a function to be called dynamically by (e.g.) using
>> a variable function, the bug will surface.
>>
>> <?php
>> setlocale(LC_CTYPE, 'tr_TR');
>> function IJK() {}
>> # succeeds
>> IJK();
>>
>
> If literal function call precedes the function definition, that would fail
> too in 5.2.17, but not in 5.3.0.
> What has changed in this regard 5.2->5.3 ?
>
>
Do you mean something like the following?

<?php
setlocale(LC_CTYPE, 'tr_TR');
IJK();
setlocale(LC_CTYPE, 'en_US');
function IJK() {echo __FUNCTION__, "\n";}

I couldn't get it to generate an error under PHP 5.2.17. What am I missing?

>
>> In contrast, if you set the locale for `LC_CTYPE` on the command line, the
>> bug doesn't arise at all because the compilation and execution phases both
>> use the same locale.
>>
>>
> So, the bug also arises if a script started in 'tr_TR' env locale sets its
> locale to 'en_US' at runtime.
>
>
Yup.

$ LC_CTYPE=tr_TR php
<?php
setlocale(LC_CTYPE, 'en_US');
class I {}
$i = new I;
^D
Fatal error: Class 'I' not found in - on line 4

Call Stack:
0.3740 630760 1. {main}() -:0

I should say that the Vulcan Logic Disassembler has been very helpful to me
in exploring this bug. Thank you, Derick Rethans and the rest of the VLD
team. If you haven't tried it, check it out.

> [...]
>
>
>
>> I like the idea of using the system default locale for name conversion
>> (making name resolution independent of the current locale), but am
>>
>
> As I stated above, the locale the script was started in may not always be
> 'en_US' or 'C'. (assuming that's what you mean by "system default locale")
>
>
That's indeed what I meant; basically, the locales specified in the
`LC_CTYPE` &c. environment variables.

It shouldn't matter that the default locale isn't "en_US" or "C", as long
as PHP always uses the same locale for identifiers both during compilation
and at run-time. Of course, it also makes a certain amount of sense to
explicitly decide that PHP will use a specific locale for identifiers. I
avoided suggesting that route to avoid any issues about what locales will
be universally available.

> By the way, I noticed a setlocale(LC_CTYPE, "") call in
> php_module_startup()/main.c, but can't figure if it has any relevance to
> this bug.
>
>
That would set the locale to whatever the platform uses natively. Without
the call, the locale would be "POSIX"/"C", according to the POSIX doc (
http://pubs.opengroup.org/onlinepubs/009604499/functions/setlocale.html).
It doesn't seem terribly relevant to bug 18556, since all that matters
regarding the initial locale is that its lowercase conversion is different
from the locale that's used at run-time. If I had to guess why the locale
is set to the platform native, it's so that numeric, currency and date
formatting will be consistent with the rest of the system.
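
For anyone who wants to check this outside PHP, here is a small C sketch of the two setlocale() forms involved. The wrapper names current_ctype and adopt_native_ctype are invented for illustration; only setlocale() itself is the real API. Passing NULL queries the current locale without changing it, and passing "" switches to whatever the environment (LC_ALL, LC_CTYPE, LANG) selects.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>

/* Returns the name of the current LC_CTYPE locale without changing it.
 * A C program starts in the "C" locale until setlocale() says otherwise. */
const char *current_ctype(void)
{
    return setlocale(LC_CTYPE, NULL);
}

/* Switches LC_CTYPE to the environment's native locale, the same form of
 * call as the one noticed in php_module_startup(). Returns the new locale
 * name, or NULL if the environment names a locale that is not installed. */
const char *adopt_native_ctype(void)
{
    return setlocale(LC_CTYPE, "");
}
```

Printing current_ctype() before and after adopt_native_ctype() while running with LC_CTYPE=tr_TR in the environment shows the switch away from "C", provided that locale is installed.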

13 years ago by C.Koy

>>>> As of 5.3.0 this bug does not exist for function names. Only classes and
>>>> interfaces.
>>>
>>> Turns out, if you cause a function to be called dynamically by (e.g.) using
>>> a variable function, the bug will surface.
>>>
>>> <?php
>>> setlocale(LC_CTYPE, 'tr_TR');
>>> function IJK() {}
>>> # succeeds
>>> IJK();
>>>
>>
>> If literal function call precedes the function definition, that would fail
>> too in 5.2.17, but not in 5.3.0.
>> What has changed in this regard 5.2->5.3 ?
>>
>>
> Do you mean something like the following?
>
> <?php
> setlocale(LC_CTYPE, 'tr_TR');
> IJK();
> setlocale(LC_CTYPE, 'en_US');
> function IJK() {echo __FUNCTION__, "\n";}
>
> I couldn't get it to generate an error under PHP 5.2.17. What am I missing?
>

Try this with 5.2.17:

<?php
setlocale(LC_CTYPE, 'tr_TR');
IJK();
function IJK() {}

13 years ago by Galen Wright-Watson

> Try this with 5.2.17:
>
> <?php
> setlocale(LC_CTYPE, 'tr_TR');
> IJK();
> function IJK() {}

That also ran without error for me. I'm not sure how to account for the
different behavior. Here are the details of the system that I'm using:

$ uname -a
Linux n10 3.2.6mtv10 #1 SMP Wed Mar 14 06:22:06 PDT 2012 x86_64 GNU/Linux
$ php -v
PHP 5.2.17 with Suhosin-Patch 0.9.7 (cli) (built: May 3 2012 12:16:32)
Copyright (c) 1997-2009 The PHP Group
Zend Engine v2.2.0, Copyright (c) 1998-2010 Zend Technologies
    with Zend Optimizer v3.3.9, Copyright (c) 1998-2009, by Zend Technologies
    with Suhosin v0.9.32.1, Copyright (c) 2007-2010, by SektionEins GmbH

13 years ago by C.Koy

> That also ran without error for me. I'm not sure how to account for the
> different behavior. Here are the details of the system that I'm using:
>
> $ uname -a
> Linux n10 3.2.6mtv10 #1 SMP Wed Mar 14 06:22:06 PDT 2012 x86_64 GNU/Linux
> $ php -v
> PHP 5.2.17 with Suhosin-Patch 0.9.7 (cli) (built: May 3 2012 12:16:32)
> Copyright (c) 1997-2009 The PHP Group
> Zend Engine v2.2.0, Copyright (c) 1998-2010 Zend Technologies
>     with Zend Optimizer v3.3.9, Copyright (c) 1998-2009, by Zend Technologies
>     with Suhosin v0.9.32.1, Copyright (c) 2007-2010, by SektionEins GmbH

I've been experimenting with bare-bones PHP I've built from pristine
sources so far. Don't you think you should do the same, in dealing with
such a bug?

Here's the top portion of my 'php -i' output:

~/proj$ php-5.2.17/sapi/cli/php -i|head -28
phpinfo()
PHP Version => 5.2.17

System => Linux trvuntu 2.6.32-41-generic #88-Ubuntu SMP Thu Mar 29
13:08:43 UTC 2012 i686
Build Date => May 4 2012 20:03:30
Configure Command => './configure' '--disable-all' '--enable-cli'
'--enable-vld'
Server API => Command Line Interface
Virtual Directory Support => disabled
Configuration File (php.ini) Path => /usr/local/lib
Loaded Configuration File => (none)
Scan this dir for additional .ini files => (none)
additional .ini files parsed => (none)
PHP API => 20041225
PHP Extension => 20060613
Zend Extension => 220060519
Debug Build => no
Thread Safety => disabled
Zend Memory Manager => enabled
IPv6 Support => enabled
Registered PHP Streams => php, file, data, http, ftp
Registered Stream Socket Transports => tcp, udp, unix, udg
Registered Stream Filters => string.rot13, string.toupper,
string.tolower, string.strip_tags, convert.*, consumed

This program makes use of the Zend Scripting Language Engine:
Zend Engine v2.2.0, Copyright (c) 1998-2010 Zend Technologies

13 years ago by Wim Wisselink

> That also ran without error for me. I'm not sure how to account for the
> different behavior. Here are the details of the system that I'm using:
>
> $ uname -a
> Linux n10 3.2.6mtv10 #1 SMP Wed Mar 14 06:22:06 PDT 2012 x86_64 GNU/Linux
> $ php -v
> PHP 5.2.17 with Suhosin-Patch 0.9.7 (cli) (built: May 3 2012 12:16:32)
> Copyright (c) 1997-2009 The PHP Group
> Zend Engine v2.2.0, Copyright (c) 1998-2010 Zend Technologies
>     with Zend Optimizer v3.3.9, Copyright (c) 1998-2009, by Zend Technologies
>     with Suhosin v0.9.32.1, Copyright (c) 2007-2010, by SektionEins GmbH

Try to var_dump the result of setlocale and see if it returns the specified
locale or just 'false'. If false try the following:

setlocale(LC_ALL, 'tr_TR.UTF-8');

I had the same issue.

13 years ago by C.Koy

> Try to var_dump the setLocale and see if it return the specified locale
> or just 'false'.

I thought he was way past that control. Anyway, a simple test should
suffice:

setlocale(LC_CTYPE, 'tr_TR') or exit("setlocale failed\n");

13 years ago by Galen Wright-Watson

> I've been experimenting with bare-bones PHP I've built from pristine
> sources so far. Don't you think you should do the same, in dealing with
> such a bug?

My personal system is a BSD derivative; the Turkish locales on these use
Latin rather than Turkish case conversion (and installing a proper Turkish
locale is a mess), so I've been testing on another system. I've been
hesitant to use its resources too heavily for professional reasons. Running
a small PHP script is one thing; though time and space required for a PHP
build isn't large on modern systems, I can't justify doing so since it's
not directly related to site operations.

> Try to var_dump the setLocale and see if it return the specified locale or
> just 'false'. If false try the following:
>
> setlocale(LC_ALL, 'tr_TR.UTF-8');

I had previously tested the locale by using "strtolower('I')", as it tests
both that the locale exists and uses Turkish-language case conversion. The
systems where I tested C.Koy's script passed the "strtolower" test. Turned
out to be the Zend optimizer that prevented the error. With it not loaded,
the example script failed with a "Fatal error: Call to undefined function
IJK()" error message.

Here's a breakdown:

In both PHP 5.2 and 5.3, calling a function before defining it results in a
dynamic call (INIT_FCALL_BY_NAME+DO_FCALL_BY_NAME). Here's the PHP 5.2 dump
of C.Koy's example:

line     #  *  op                     fetch  ext  return  operands
   2     0  >  FETCH_CONSTANT                     ~0      'LC_CTYPE'
         1     SEND_VAL                                   ~0
         2     SEND_VAL                                   'tr_TR'
         3     DO_FCALL                      2            'setlocale'
   3     4     INIT_FCALL_BY_NAME                         'IJK'
         5     DO_FCALL_BY_NAME              0
   4     6     NOP
   5     7  >  RETURN                                     1
         8* >  ZEND_HANDLE_EXCEPTION

Here's the 5.3 dump:

line     #  *  op                     fetch  ext  return  operands
   2     0  >  EXT_STMT
         1     EXT_FCALL_BEGIN
         2     SEND_VAL                                   2
         3     SEND_VAL                                   'tr_TR'
         4     DO_FCALL                      2            'setlocale'
         5     EXT_FCALL_END
   3     6     EXT_STMT
         7     INIT_FCALL_BY_NAME                         'ijk', 'IJK'
         8     EXT_FCALL_BEGIN
         9     DO_FCALL_BY_NAME              0
        10     EXT_FCALL_END
   4    11     EXT_STMT
        12     NOP
   5    13  >  RETURN                                     1

From line 7 in the 5.3 dump, we see 5.3 converts the function name to
lowercase during compilation, but 5.2 doesn't. Examining the source
confirms this: you can see the lowercase conversion in 5.3's
zend_do_begin_dynamic_function_call on lines 1659 (for namespaced calls)
and 1683 (for non-namespaced calls) of zend_compile.c (
http://svn.php.net/viewvc/php/php-src/branches/PHP_5_3_10/Zend/zend_compile.c?revision=323023&view=markup#l1683),
while there's no such conversion in the same function in 5.2 (
http://svn.php.net/viewvc/php/php-src/branches/PHP_5_2/Zend/zend_compile.c?view=markup&pathrev=302150#l1450
).

5.3 only performs case conversion if the function name is a CONST
expression, which is why defining the function after calling it works but
calling a function with a variable name breaks. Correspondingly, the
ZEND_INIT_FCALL_BY_NAME_SPEC_CONST_HANDLER (in zend_vm_execute.h) uses the
first operand (which is already lowercased), while the other
INIT_FCALL_BY_NAME opcode handlers (ZEND_INIT_FCALL_BY_NAME_SPEC_*_HANDLER)
use the second, non-lowercased operand.

The 5.2 INIT_FCALL_BY_NAME opcode handlers only ever use the second,
un-lowercased operand.

So, what does this mean for fixing the bug? Not so much when the function
or class is stored in a variable, since these can't be converted to
lowercase at compile time without converting all variables, which is too
wasteful of both time and space (as both the unconverted and converted
strings would need to be stored). For object instantiation,
zend_do_begin_new_object gets the class name ultimately from the
namespace_name rule. zend_do_begin_new_object could then take the resulting
znode and create a second, lowercased copy, storing it as the second
operand. ZEND_NEW_SPEC_HANDLER would then be altered to use the second
operand (if not UNUSED) to instantiate the object. This certainly seems a
valid alternative to a lowercasing version of the namespace_name rule; it's
not as far reaching, which may be good (in that it has less impact) and bad
(in that there may be other instances of this bug that it won't fix).
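
A rough model of the dual-operand idea, in plain C rather than actual Zend code: struct name_op, compile_name and lookup_key are hypothetical stand-ins for the opcode, the compile-time step and the handler. The key is lowercased once at compile time with an ASCII-only rule, and the run-time side uses that key whenever it is present, so the locale in effect at run time never enters the lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { NAME_BUF = 64 };

/* Hypothetical stand-in for an opcode carrying two name operands. */
struct name_op {
    char original[NAME_BUF]; /* name as written in the source */
    char lowered[NAME_BUF];  /* key lowercased at compile time; "" if unused */
};

/* Compile-time side: store the name and an ASCII-lowercased copy. */
void compile_name(struct name_op *op, const char *name)
{
    size_t i = 0;
    assert(op != NULL && name != NULL);
    for (; name[i] != '\0' && i + 1 < NAME_BUF; i++) {
        unsigned char c = (unsigned char)name[i];
        op->original[i] = (char)c;
        op->lowered[i] = (char)((c >= 'A' && c <= 'Z') ? c - 'A' + 'a' : c);
    }
    op->original[i] = '\0';
    op->lowered[i] = '\0';
}

/* Run-time side: prefer the precomputed key; fall back to the original
 * name when no lowered operand was stored (the dynamic case, which this
 * approach does not fix). */
const char *lookup_key(const struct name_op *op)
{
    assert(op != NULL);
    return op->lowered[0] != '\0' ? op->lowered : op->original;
}
```

The fallback branch is where names held in variables end up, which is the part this approach leaves open.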

However, neither the dual-operand solution nor lc_namespace_name will fix
the bug when the identifier is stored in a variable. That requires fixing
the run-time portion of PHP, in particular zend_fetch_class (or
zend_do_begin_class_member_function_call, zend_do_begin_new_object and
likely others) and the INIT_FCALL_BY_NAME handlers.

I get the feeling that there are still other cases yet to be discovered
where this bug surfaces.