Newsgroups: php.internals
Path: news.php.net
Xref: news.php.net php.internals:60266
Mailing-List: contact internals-help@lists.php.net; run by ezmlm
Received-SPF: pass (pb1.pair.com: domain gmail.com designates 209.85.210.170 as permitted sender)
MIME-Version: 1.0
Date: Mon, 23 Apr 2012 16:06:35 -0700
Message-ID: <CA+Ky_uNZ_zmi8cwU0XLw+kQ_J=rJh_dRktoSb5pHX7A0hx2Fng@mail.gmail.com>
To: "C.Koy" <can5koy@gmail.com>
Cc: internals@lists.php.net
Content-Type: multipart/alternative; boundary=14dae9340745fd8bd304be60b230
Subject: Fixing bug #18556 (was: Complete case-sensitivity in PHP)
From: ww.galen@gmail.com (Galen Wright-Watson)

--14dae9340745fd8bd304be60b230
Content-Type: text/plain; charset=ISO-8859-1

On Mon, Apr 23, 2012 at 3:22 AM, C.Koy <can5koy@gmail.com> wrote:

> On 4/22/2012 11:32 PM, Galen Wright-Watson wrote:
>
>> 2012/4/22 C.Koy<can5koy@gmail.com>
>>
>>> On 4/21/2012 4:37 AM, Galen Wright-Watson wrote:
>>>
>>> But, I did not start this thread to discuss such bug fix, because:
>>>
>>> 1. It does not take a genius to figure it out, and should take minutes to
>>> implement for someone experienced in the internals. Given the 10 year span
>>> and dozens of comments/complaints on the bug's entry, it's hard to say this
>>> issue went unnoticed. So I had to conclude that such fix has quietly been
>>> overruled for performance and/or other undisclosed reasons.
>>>
>> Why does it matter if a solution is simple?
>>
>
> It doesn't matter, you've misunderstood.
>

You've misunderstood me. While you may have set out with the goal of
discussing making PHP completely case-sensitive, that doesn't preclude
others from suggesting fixes for the specific bug you mention. Indeed, some
of the first e-mails were about the bug itself, and not just in the context
of case-sensitive PHP.

I didn't introduce the custom case conversion solution as a
counter-argument to case-sensitive PHP, and I wasn't asking for feedback on
that solution in the context of case-sensitive PHP; I was asking for
reasons why it wouldn't be a suitable solution for the bug. The only place
case-sensitive PHP enters into it was your statement that:

> As the recent comments on that page indicate, there's not a deterministic
> way to resolve this issue, apart from eliminating tolower() calls for
> function/class names during lookup. Hence totally case-sensitive PHP.


My proposition shows this isn't entirely true, and branches off from the
original discussion at that point. I'm focusing on fixing the bug, which is
a smaller issue than case-sensitivity. Discussion of case-sensitivity can
continue without regard to the custom conversion solution. As such, I've
changed the subject of this e-mail.

Furthermore, going back to your original e-mail, you explicitly stated it
was about the bug, making case sensitivity subordinate to it.

> This post is about bug #18556 (https://bugs.php.net/bug.php?id=18556)
> which is a decade old.


I hope you can see why others might take the bug to be the context for
case-sensitivity, rather than the other way around.

> And that's what makes me curious and confused about why this bug still
> exists. See, I'm drawing a conclusion with what little information I have,
> and stating the reasonings it's based on (first two statements).
> Overall, that and the item following it were an explanation of "why I'm
> suggesting a major feature change in solution to a specific bug", although
> noone directly asked me to.
>

In other words, you jumped to a conclusion. I wasn't asking about possible
reasons why custom conversion hasn't been accepted as the solution to this
bug. Neither was I asking why you didn't suggest it. I was (and still am)
asking for explicit, justifiable reasons as to whether or not it's a
suitable solution to the bug.


>
>> If it's already been rejected privately, it's time to bring the reasons
>> into the open (which is why I asked). If not, it should be considered
>> publicly.
>>
>
> A comment dated 2002-09-26 on bug's page states the bug is fixed. The next
> comment dated 2006-02-17 states it reappeared.
> I don't know who did what 10, 6 years ago but it's been revoked. Why?
> That was the main reason I deemed this bug not fixable, hence suggest
> other ways to resolve.
>

I don't know either, but I'm not about to disregard potential fixes if
they haven't been publicly discussed. The regression could just as easily
have been a mistake. From looking at the original fix (revision 97040,
http://svn.php.net/viewvc?view=revision&revision=97040, authored by iliaa)
and the bug comments, something along the lines of what I'm suggesting has
been suggested and even implemented before, but there's no real discussion
of it. The original fix (zend_str_tolower_nlc) assumed ASCII, which isn't
entirely suitable, as there are uppercase characters that it doesn't
convert. That suggests yet another possible reason for the regression:
someone may have switched back to zend_str_tolower precisely because it
would convert the characters that zend_str_tolower_nlc missed.
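
To make "assumed ASCII" concrete, here is a minimal sketch of a
locale-independent conversion of the kind being discussed. The name
ascii_str_tolower is hypothetical, and this is not the code from revision
97040; it only illustrates the technique of lowercasing 'A' through 'Z'
without consulting the C locale, so the result is the same no matter what
setlocale() has been given.

```c
#include <stddef.h>

/* Hypothetical helper (not the Zend source): lowercase a buffer in place,
 * touching only the ASCII letters 'A'..'Z'. Every other byte, including
 * bytes >= 0x80 from any single-byte or multi-byte encoding, is left
 * untouched, so the current locale cannot influence the result. */
void ascii_str_tolower(char *str, size_t length)
{
    size_t i;

    for (i = 0; i < length; i++) {
        unsigned char c = (unsigned char)str[i];

        if (c >= 'A' && c <= 'Z') {
            /* 'a' - 'A' is 32 in ASCII; add it to reach the lowercase letter. */
            str[i] = (char)(c + ('a' - 'A'));
        }
    }
}
```

With this approach "INFO_Class" always becomes "info_class", while a
non-ASCII uppercase letter stays as it is. That is exactly the trade-off
mentioned above: identifiers made of ASCII letters compare consistently
under any locale, but uppercase characters outside ASCII are not folded.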

As for the real reason why the bug reappeared, we can continue on in our
historical examination. Revision 99001 (
http://svn.php.net/viewvc?view=revision&revision=99001, also authored
by iliaa) replaced zend_str_tolower with zend_str_tolower_nlc, making all
internal Zend case conversion use ASCII. iliaa had this to say about the
change (http://news.php.net/php.zend-engine.cvs/478):

> It appears that there no reason to keep both zend_str_tolower_nlc and
> zend_str_tolower.  zend_str_tolower_nlc can be safely renamed to
> zend_str_tolower. The places it is used in, do not appear to depend on
> locale.  For people who do need it there is an alternative php function
> php_strtolower, which they can use, which does respect the locale. So, if
> there are no objections I'll prepare a patch that will change
> zend_str_tolower_nlc to zend_str_tolower.


Revision 128057 (http://svn.php.net/viewvc?view=revision&revision=128057,
authored by sterling) adds zend_str_tolower for use in
fast_call_user_function, which makes use of tolower rather than a custom
conversion. Revision 128060 (
http://svn.php.net/viewvc?view=revision&revision=128060, same author) then
changes zend_str_tolower to use tolower instead of its custom ASCII-based
conversion. The commit message is: "make this faster and sexier". Within
these revisions, zend_lookup_class is case sensitive. This change, in
combination with 99001, masks the reason for the custom conversion.

zend_tolower and the use of tolower_l were introduced by
revision 224372 (http://svn.php.net/viewvc?view=revision&revision=224372,
authored by stas (hi, Stas!)). The commit message is: "Improve
tolower()-related functions on Windows and VC2005 by caching locale and
using tolower_l function."

There are plenty of other edits to Zend functions affecting case handling
(look over the commit messages listed in
http://svn.php.net/viewvc/php/php-src/trunk/Zend/zend_operators.c?view=log&pathrev=225000)
that make similar tweaks involving case conversion and the character
encoding. What are we to conclude from all this? That the knowledge that
the custom conversion was a bug fix got lost as the file was edited and
different people worked on it. In other words, the fix was not lost due to
a conscious decision
made by anyone, but rather the typical reason for regression (in the
original sense of the word): there's too much for anyone to keep all of it
in mind at once, so someone can easily re-introduce a bug without being
aware of it.

I trust this demonstrates that "there must be an undisclosed reason" isn't
a justifiable reason not to implement my proposed solution.

>> The abstract property that makes a locale problematic is obvious. I
>> was looking for specific locales, as they need to be identified for a
>> complete solution.
>
>
> I'm not locale expert. Given the public complaints/bugs we can, in
> practice, assume this affects Turkish and Azerbaijani only. (I don't know
> about Kurdish)
>

Kurdish is mentioned by Mike and Tokul in the comments for the bug. I
could easily have come to the same conclusion, but I want an answer from
someone who knows without needing to make any assumptions. Are there any
locale experts (or someone willing to put in the leg-work) reading this
with a conclusive answer to my question about problematic locales?

--14dae9340745fd8bd304be60b230--