Newsgroups: php.internals
Xref: news.php.net php.internals:78208
Message-ID: <54469B13.3040405@sugarcrm.com>
Date: Tue, 21 Oct 2014 10:42:43 -0700
Organization: SugarCRM
To: Rowan Collins , "internals@lists.php.net"
References: <1413875212.2624.3.camel@localhost.localdomain> <1413883549.2624.22.camel@localhost.localdomain>
  <54463347.6020002@lsces.co.uk> <54463F65.8070408@gmail.com>
In-Reply-To: <54463F65.8070408@gmail.com>
Subject: Re: [PHP-DEV] [RFC] UString
From: smalyshev@sugarcrm.com (Stas Malyshev)

Hi!

> Just a quick point: most of the core is not ASCII. PHP strings are byte
> strings, completely divorced from any encoding. A few native functions
> assume ISO8859-1 (or possibly Windows CP1252), but mostly they just
> juggle whichever bytes you give them.

True, but not all extensions and functions behave this way. Some
(especially in intl, but not only there) assume UTF-8, and for others
UTF-8 is a changeable default. In practice that default often becomes the
encoding actually used, since people are not aware of the need to track
their encoding, and most of them use UTF-8 anyway.

> The main exception I can think of is that numbers are often handled
> specially, with digits and separators as defined by ASCII. But since
> we're talking UTF-8, that doesn't need to change.

The more interesting case is actually, well, case conversion. We
unknowingly used locale-dependent lowercasing routines until the
inevitable encounter with the dreaded Turkish 'i', at which point we
switched to forced ASCII lowercasing. So identifiers in the engine are
effectively assumed to be ASCII, even though you can sometimes sneak
non-ASCII past it and it will work, but weirdly.

-- 
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/