Newsgroups: php.internals
From: ww.galen@gmail.com (Galen Wright-Watson)
Date: Sun, 22 Apr 2012 13:32:28 -0700
Subject: Re: [PHP-DEV] Complete case-sensitivity in PHP
To: "C.Koy"
Cc: internals@lists.php.net

2012/4/22 C.Koy

> On 4/21/2012 4:37 AM, Galen Wright-Watson wrote:
>
>> What about instead creating a special-purpose Zend function to normalize
>> class names (zend_normalize_class_name, or zend_classname_tolower)? This
>> function would examine the current locale and, if it's a problematic one,
>> convert the string to lower case on its own (calling zend_tolower on
>> non-problematic characters). Alternatively, zend_normalize_class_name
>> could switch LC_CTYPE to an appropriate locale (e.g. "UTF-8"; the locale
>> could be determined at compile time), call zend_str_tolower_copy, then
>> switch back before returning. Then, any appropriate function (e.g.
>> zend_resolve_class_name, zend_lookup_class_ex, class_exists, class_alias)
>> would call zend_normalize_class_name instead of zend_str_tolower_copy/
>> zend_str_tolower_dup.
>>
>
> In plain words/pseudo-code, adding an "if statement" at a certain step
> should suffice, like:
>
> 1. lowercase the name;
> 2. if the effective locale is tr_XY, then replace every "ı" with "i";
> 3. look up the name;
>
> For those who have nothing to do with Turkish locales, that should incur
> the overhead of an "if" condition only.
>

The fix would need to be applied to at least four functions, so adding a
new function would be more maintainable. Also, there are locales that
don't begin with "tr_" or have "TR" in the locale name, so the condition
would need to be more complex. Converting "I" or "ı" separately from
lowercase conversion is less performant than either option I describe, as
it requires an extra loop, which is why I didn't bother suggesting it.
I suspect switching the locale is the most performant, as it doesn't
require additional tests, though I haven't examined the cost of setting
the locale.

> But, I did not start this thread to discuss such bug fix, because:
>
> 1. It does not take a genius to figure it out, and should take minutes to
> implement for someone experienced in the internals. Given the 10 year span
> and dozens of comments/complaints on the bug's entry, it's hard to say this
> issue went unnoticed. So I had to conclude that such fix has quietly been
> overruled for performance and/or other undisclosed reasons.
>

Why does it matter if a solution is simple? If anything, that a fix "does
not take a genius" is an argument in its favor, provided it also solves
the problem. If it's already been rejected privately, it's time to bring
the reasons into the open (which is why I asked). If not, it should be
considered publicly.

> 2. Absent bug #18556, case-sensitive PHP has merits as I stated in other
> post and several people voiced opinions in favor. Case-sensitive PHP is
> worth considering.
>

It is, but it's also a major BC break, hence perhaps better suited for
PHP6. Case-sensitivity is also a much bigger issue than this bug. A custom
conversion function, on the other hand, has the least impact of any option
I've read. As such, it's hopefully a solution for this bug that everyone
can agree on.

>> Does this bug pop-up for locales other than Turkish, Azerbaijani and
>> Kurdish?
>>
>
> Theoretically, this problem occurs for any locales sharing a letter
> lowercase of which is different from each other's, and the PHP script
> changes its locale among these locales throughout its execution.
>

The abstract property that makes a locale problematic is obvious. I was
looking for specific locales, as they need to be identified for a complete
solution.
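
For illustration only, here is a rough sketch of the two options described
earlier in this message, written as standalone C rather than against the
Zend API. Both function names are hypothetical. The first variant is a
simplification: it skips the locale check and always lowercases only ASCII
A-Z, so the result never depends on LC_CTYPE. The second switches LC_CTYPE
temporarily; it assumes the "C" locale (guaranteed to exist) instead of the
"UTF-8" example above, and that no other thread touches the locale
meanwhile, since setlocale is process-wide.

```c
#include <ctype.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of the Zend API.
 * Option 1 (simplified): lowercase ASCII A-Z by hand and leave every
 * other byte untouched, so "I" always maps to "i". */
static char *normalize_class_name(const char *name, size_t len)
{
    char *out = malloc(len + 1);
    if (out == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)name[i];
        out[i] = (c >= 'A' && c <= 'Z') ? (char)(c + ('a' - 'A')) : (char)c;
    }
    out[len] = '\0';
    return out;
}

/* Hypothetical helper, not part of the Zend API.
 * Option 2: save LC_CTYPE, switch to "C", lowercase with tolower(),
 * then restore the saved locale before returning. */
static char *normalize_class_name_c_locale(const char *name, size_t len)
{
    const char *current = setlocale(LC_CTYPE, NULL);
    char *saved = NULL;

    /* Copy the name: the returned string may be overwritten by the
     * next setlocale() call. */
    if (current != NULL) {
        size_t n = strlen(current) + 1;
        saved = malloc(n);
        if (saved != NULL) {
            memcpy(saved, current, n);
        }
    }
    if (saved == NULL) {
        /* Cannot restore the locale later, so do not switch at all. */
        return normalize_class_name(name, len);
    }

    char *out = malloc(len + 1);
    if (out != NULL) {
        setlocale(LC_CTYPE, "C");
        for (size_t i = 0; i < len; i++) {
            out[i] = (char)tolower((unsigned char)name[i]);
        }
        out[len] = '\0';
        setlocale(LC_CTYPE, saved);
    }
    free(saved);
    return out;
}
```

The caller owns the returned buffer and must free() it. Either helper
would stand in wherever a class name is lowercased before lookup; which
one is cheaper would need measuring.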