Newsgroups: php.internals
Date: Mon, 23 Apr 2012 16:06:35 -0700
To: "C.Koy"
Cc: internals@lists.php.net
Subject: Fixing bug #18556 (was: Complete case-sensitivity in PHP)
From: ww.galen@gmail.com (Galen Wright-Watson)

On Mon, Apr 23, 2012 at 3:22 AM, C.Koy wrote:

> On 4/22/2012 11:32 PM, Galen Wright-Watson wrote:
>
>> 2012/4/22 C.Koy
>>
>> On 4/21/2012 4:37 AM, Galen Wright-Watson wrote:
>>
>>> But, I did not start this thread to discuss such bug fix, because:
>>>
>>> 1. It does not take a genius to figure it out, and should take minutes to
>>> implement for someone experienced in the internals. Given the 10 year span
>>> and dozens of comments/complaints on the bug's entry, it's hard to say this
>>> issue went unnoticed. So I had to conclude that such fix has quietly been
>>> overruled for performance and/or other undisclosed reasons.
>>>
>> Why does it matter if a solution is simple?
>>
>
> It doesn't matter, you've misunderstood.
>

You've misunderstood me. While you may have set out with the goal of discussing making PHP completely case-sensitive, that doesn't preclude others from suggesting fixes for the specific bug you mention. Indeed, some of the first e-mails were around the bug, and not just in the context of case-sensitive PHP. I didn't introduce the custom case conversion solution as a counter-argument to case-sensitive PHP, and I wasn't asking for feedback on that solution in the context of case-sensitive PHP; I was asking for reasons why it wouldn't be a suitable solution for the bug. The only place case-sensitive PHP enters into it was your statement that:

> As the recent comments on that page indicate, there's not a deterministic
> way to resolve this issue, apart from eliminating tolower() calls for
> function/class names during lookup. Hence totally case-sensitive PHP.

My proposition shows this isn't entirely true, and branches off from the original discussion at that point. I'm focusing on fixing the bug, which is a smaller issue than case-sensitivity.
Discussion of case-sensitivity can continue without regard to the custom conversion solution. As such, I've changed the subject of this e-mail.

Furthermore, going back to your original e-mail, you explicitly stated it was about the bug, making case sensitivity subordinate to it.

> This post is about bug #18556 (https://bugs.php.net/bug.php?id=18556)
> which is a decade old.

I hope you can see why others might take the bug to be the context for case-sensitivity, rather than the other way around.

> And that's what makes me curious and confused about why this bug still
> exists. See, I'm drawing a conclusion with what little information I have,
> and stating the reasonings it's based on (first two statements).
> Overall, that and the item following it were an explanation of "why I'm
> suggesting a major feature change in solution to a specific bug", although
> noone directly asked me to.
>

In other words, you jumped to a conclusion. I wasn't asking about possible reasons why custom conversion hasn't been accepted as the solution to this bug. Neither was I asking why you didn't suggest it. I was (and still am) asking for explicit, justifiable reasons as to whether or not it's a suitable solution to the bug.

>> If it's already been rejected privately, it's time to bring the reasons
>> into the open (which is why I asked). If not, it should be considered
>> publicly.
>
> A comment dated 2002-09-26 on bug's page states the bug is fixed. The next
> comment dated 2006-02-17 states it reappeared.
> I don't know who did what 10, 6 years ago but it's been revoked. Why?
> That was the main reason I deemed this bug not fixable, hence suggest
> other ways to resolve.
>

I don't know either, but I'm not about to disregard potential fixes if they haven't been publicly discussed. The regression could just as easily have been a mistake.
From looking at the original fix (revision 97040, http://svn.php.net/viewvc?view=revision&revision=97040, authored by iliaa) and the bug comments, something along the lines of what I'm suggesting has been suggested and even implemented before, but there's no real discussion of it. The original fix (zend_str_tolower_nlc) assumed ASCII, which isn't entirely suitable, as there are uppercase characters that it doesn't convert. That suggests yet another possible reason for the regression: using zend_str_tolower would convert the characters that zend_str_tolower_nlc missed.

As for the real reason why the bug reappeared, we can continue on in our historical examination. Revision 99001 (http://svn.php.net/viewvc?view=revision&revision=99001, also authored by iliaa) replaced zend_str_tolower with zend_str_tolower_nlc, making all internal Zend case conversion use ASCII. iliaa had this to say about the change (http://news.php.net/php.zend-engine.cvs/478):

> It appears that there no reason to keep both zend_str_tolower_nlc and
> zend_str_tolower. zend_str_tolower_nlc can be safely renamed to
> zend_str_tolower. The places it is used in, do not appear to depend on
> locale. For people who do need it there is an alternative php function
> php_strtolower, which they can use, which does respect the locale. So, if
> there are no objections I'll prepare a patch that will change
> zend_str_tolower_nlc to zend_str_tolower.

Revision 128057 (http://svn.php.net/viewvc?view=revision&revision=128057, authored by sterling) adds zend_str_tolower for use in fast_call_user_function, which makes use of tolower rather than a custom conversion. Revision 128060 (http://svn.php.net/viewvc?view=revision&revision=128060, same author) then changes zend_str_tolower to use tolower instead of its custom ASCII-based conversion. The commit message is: "make this faster and sexier". Within these revisions, zend_lookup_class is case sensitive.
This change, in combination with 99001, masks the reason for the custom conversion.

zend_tolower and the use of tolower_l were introduced by revision 224372 (http://svn.php.net/viewvc?view=revision&revision=224372, authored by stas (hi, Stas!)). The commit message is: "Improve tolower()-related functions on Windows and VC2005 by caching locale and using tolower_l function."

There are plenty of other edits to Zend functions affecting case handling (look over the commit messages listed in http://svn.php.net/viewvc/php/php-src/trunk/Zend/zend_operators.c?view=log&pathrev=225000) that make similar tweaks involving case conversion and the character encoding.

What are we to conclude from all this? The fact that the custom conversion was a bug fix was lost as the file was edited and different people worked on it. In other words, the fix was not lost due to a conscious decision made by anyone, but rather for the typical reason for regression (in the original sense of the word): there's too much for anyone to keep all of it in mind at once, so someone can easily re-introduce a bug without being aware of it.

I trust this demonstrates that "there must be an undisclosed reason" isn't a justifiable reason not to implement my proposed solution.

>> The abstract property that makes a locale problematic is obvious. I
>> was looking for specific locales, as they need to be identified for a
>> complete solution.
>
> I'm not locale expert. Given the public complaints/bugs we can, in
> practice, assume this affects Turkish and Azerbaijani only. (I don't know
> about Kurdish)
>

Kurdish is mentioned by Mike and Tokul in the comments for the bug. I could easily have come to the same conclusion, but I want an answer from someone who knows without needing to make any assumptions. Are there any locale experts (or someone willing to put in the leg-work) reading this with a conclusive answer to my question about problematic locales?
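
For anyone who wants the distinction between the two kinds of conversion spelled out, here is a minimal sketch of a locale-independent, ASCII-only lowercase routine, in the spirit of what zend_str_tolower_nlc is described as doing above. The function names are hypothetical and this is not the actual Zend code; it only illustrates why a custom conversion gives the same result for identifiers under any locale, while the C library's tolower() may not.

```c
#include <stddef.h>

/* Hypothetical helper (not the Zend implementation): lowercase one byte
 * using ASCII rules only. Bytes outside 'A'..'Z' are returned unchanged,
 * so the result never depends on the current LC_CTYPE setting. */
unsigned char ascii_tolower(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + ('a' - 'A')) : c;
}

/* Lowercase `len` bytes of `src` into `dst` and NUL-terminate the result.
 * `dst` must have room for len + 1 bytes. Returns `dst`. */
char *ascii_str_tolower_copy(char *dst, const char *src, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        dst[i] = (char)ascii_tolower((unsigned char)src[i]);
    }
    dst[len] = '\0';
    return dst;
}
```

With this, "INFO" always becomes "info", whatever setlocale() was called with. The trade-off is the one noted above: non-ASCII uppercase bytes are left as they are, so this is only appropriate where ASCII-only folding of names is acceptable.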