Newsgroups: php.internals
Date: Mon, 19 Apr 2010 11:58:54 +0800
To: internals@lists.php.net
Subject: Turkish/Azeri locale support
From: aharvey@php.net (Adam Harvey)

As at least some of you would already be aware, there's a long-standing issue with using PHP in a Turkish or Azeri locale, namely that case-insensitive lookups within the Zend engine (method names, for example) fail on lookups involving upper-case I characters, since lower-case I in those languages is ı instead of i (note the lack of a dot).

The long term plan for this, per bug #35050 and any number of duplicates, was to deal with it in PHP 6. Since PHP 6 isn't going to happen in its original form, I think we're going to need to revisit how we want to deal with this.

There's a patch linked in the bug from Tomas Kuliavas and Marcus that fixes the problem by redefining zend_tolower() as a simple locale-insensitive ASCII tolower() function, which does fix the Turkish and Azeri locales.

The potential breakage from this is that single-byte locales will no longer get case-insensitive lookups of non-ASCII characters: for example, somebody using fr_FR.ISO-8859-1 as a locale could no longer call a method É() as é(). Since it doesn't break anything when using multi-byte locales (which have never had case-insensitive lookups anyway, since the Zend Engine uses the single-byte tolower() internally), my inclination would be to apply the patch on trunk and document it as a BC issue.

I've uploaded an updated version of Tomas's patch that applies cleanly to trunk to http://www.adamharvey.name/patches/35050/zend_operators.c.diff and a phpt file to test the fix to http://www.adamharvey.name/patches/35050/bug35050.phpt. It's likely that the test would require massaging before being committed to work on Windows, but since I don't have a Windows development box readily available and don't know a thing about how Windows implements locale support, this would require help from someone familiar with the platform.

So: thoughts; concerns; alternate approaches? It would be nice to have this sorted for PHP.next.

Thanks,
Adam
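
For illustration only, here is a minimal sketch of what a locale-insensitive ASCII lowering and name comparison could look like. It is not the patch itself: ascii_tolower() and ascii_names_equal() are hypothetical names made up for this example, and the real change lives in zend_operators.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not the actual Zend function): lowercases a single
 * byte using the ASCII range only, so the result never depends on the
 * current locale. Bytes outside 'A'..'Z' are returned unchanged. */
int ascii_tolower(int c)
{
    return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

/* Hypothetical helper: compares two NUL-terminated names byte by byte,
 * ignoring case for ASCII letters only. Returns 1 if they match, 0 if not. */
int ascii_names_equal(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    while (*a != '\0' && *b != '\0') {
        if (ascii_tolower((unsigned char)*a) != ascii_tolower((unsigned char)*b)) {
            return 0;
        }
        a++;
        b++;
    }

    /* Equal only if both strings ended at the same position. */
    return *a == *b;
}
```

With a mapping like this, "INFO" and "info" compare equal whatever the locale is, which is the Turkish/Azeri fix. The flip side is the BC issue mentioned above: the ISO-8859-1 bytes for É (0xC9) and é (0xE9) are left alone and so no longer match each other.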