Newsgroups: php.internals
Message-ID: <41269.5975f3c3.1308671739.nsm@avilys.eik.lt>
References: <4DFF7A12.8060808@sugarcrm.com> <4E00818C.7040201@lsces.co.uk> <4E008EA3.4000403@lsces.co.uk>
Date: Tue, 21 Jun 2011 18:55:39 +0300 (EEST)
To: internals@lists.php.net
Subject: RE: [PHP-DEV] foreach() for strings
From: tokul@users.sourceforge.net ("Tomas Kuliavas")

2011.06.21 17:38 John Crenshaw wrote:
> Pierre Joye wrote:
>> On Tue, Jun 21, 2011 at 1:33 PM, Lester Caine wrote:
>>> Pierre Joye wrote:
>>>>>
>>>>> It depended on ICU there, and I would be against making a core
>>>>> thing in PHP 5.x depend on ICU.
>>>>
>>>> It can and should be done as part of intl, actually.
>>>>
>>>> But that's somehow unrelated to the proposal here, as it is about
>>>> byte, not characters :)
>>>
>>> I believe this may be where some of the new niggles may be coming
>>> from? With browsers returning unicode, it may be that some of the
>>> 'extra' characters are being returned as multibyte rather than as
>>> single bytes? Such as the problem reported on the general list
>>> currently. How do we ensure that we are dealing with single byte
>>> character strings nowadays?
>>
>> As it has been stated numerous times in this thread and other, we do
>> not do anything with multi bytes systems, unicode, etc. mbstring and
>> intl do, but php's string as of now is all about bytes, array of bytes
>> if I may describe them this way.
>>
>> And we can't change this behavior.
>
> This mindset is fundamentally broken. You can call it a byte array all
> you want, but the truth is that 99.999% of the time, when a developer
> is using a string they need it for characters, not for bytes, and
> characters are not single byte. Even English users tend to submit
> Unicode range characters at an alarming rate. If you're using a WYSIWYG
> editor, Chrome will submit non-breaking-spaces as the actual UTF8
> encoded character, not as an HTML encoded entity. Whether developers
> like it, or even know it, supporting an extended universal character
> set is not really optional.

They submit it in UTF-8 only if your HTML form allows them to do that,
or if they ignore the HTML specification and try to exploit your form.
Set the form input charset to ISO-8859-1 and your non-breaking space
will take only one byte.

-- 
Tomas
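
P.S. To make the byte counts concrete, here is a minimal sketch (in Python, purely for illustration, since the point is about the encodings rather than about PHP). The helper name `encoded_bytes` is made up for this example. It encodes U+00A0 NO-BREAK SPACE in both charsets and prints the raw bytes and their length.

```python
# U+00A0 NO-BREAK SPACE, the character discussed in the thread.
NBSP = "\u00a0"


def encoded_bytes(text: str, charset: str) -> bytes:
    """Return the raw bytes that represent `text` in `charset`.

    Hypothetical helper for this illustration only.
    """
    return text.encode(charset)


for charset in ("utf-8", "iso-8859-1"):
    raw = encoded_bytes(NBSP, charset)
    # Show the hex form of the bytes and how many there are.
    print(f"{charset}: {raw.hex()} ({len(raw)} byte(s))")
```

A byte-oriented length function such as PHP's strlen() would report the same two counts for the same input: 2 for the UTF-8 form (0xC2 0xA0) and 1 for the ISO-8859-1 form (0xA0).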