Newsgroups: php.internals
From: arvids.godjuks@gmail.com (Arvids Godjuks)
To: internals@lists.php.net
Date: Tue, 21 Jun 2011 19:05:59 +0300
Subject: Re: RE: [PHP-DEV] foreach() for strings
In-Reply-To: <41269.5975f3c3.1308671739.nsm@avilys.eik.lt>
References: <4DFF7A12.8060808@sugarcrm.com> <4E00818C.7040201@lsces.co.uk> <4E008EA3.4000403@lsces.co.uk> <41269.5975f3c3.1308671739.nsm@avilys.eik.lt>

As a userland developer, because of where I live I constantly have to work
with three languages: English, Russian (Cyrillic) and Latvian (which has its
own share of non-Latin characters). I end up using UTF-8 in every project,
and some of them give me a headache when it comes to text parsing. mbstring
covers just part of the functionality and can be turned off.

I personally think something has to be done about Unicode handling in PHP
after 5.4, so that we have an official method of dealing with it in the
core. It could probably be done in a namespace of its own, as new
functionality to which people should migrate.

My 2 cents.

On 21.06.2011 at 17:56, "Tomas Kuliavas" wrote:
> On 2011.06.21 at 17:38, John Crenshaw wrote:
>> Pierre Joye wrote:
>>> On Tue, Jun 21, 2011 at 1:33 PM, Lester Caine wrote:
>>>> Pierre Joye wrote:
>>>>>>
>>>>>> It depended on ICU there, and I would be against making a core thing
>>>>>> in PHP 5.x depend on ICU.
>>>>>
>>>>> It can and should be done as part of intl, actually.
>>>>>
>>>>> But that's somehow unrelated to the proposal here, as it is about
>>>>> byte, not characters :)
>>>>
>>>> I believe this may be where some of the new niggles may be coming from?
>>>> With browsers returning unicode, it may be that some of the 'extra'
>>>> characters are being returned as multibyte rather than as single bytes?
>>>> Such as the problem reported on the general list currently. How do we
>>>> ensure that we are dealing with single byte character strings nowadays?
>>>
>>> As it has been stated numerous times in this thread and other, we do
>>> not do anything with multi bytes systems, unicode, etc. mbstring and
>>> intl do, but php's string as of now is all about bytes, array of bytes
>>> if I may describe them this way.
>>>
>>> And we can't change this behavior.
>>
>> This mindset is fundamentally broken. You can call it a byte array all you
>> want, but the truth is that 99.999% of the time, when a developer is using
>> a string they need it for characters, not for bytes, and characters are
>> not single byte. Even English users tend to submit Unicode range
>> characters at an alarming rate. If you're using a WYSIWYG editor, Chrome
>> will submit non-breaking-spaces as the actual UTF8 encoded character, not
>> as an HTML encoded entity. Whether developers like it, or even know it,
>> supporting an extended universal character set is not really optional.
>
> They submit it in utf-8 only if your html form allows them to do that or
> they don't follow html specification and try to exploit your form. Set
> form input charset to iso-8859-1 and your nbspace will take only one byte.
>
> --
> Tomas
>
>
>
> --
> PHP Internals - PHP Runtime Development Mailing List
> To unsubscribe, visit: http://www.php.net/unsub.php
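
For anyone following the byte-versus-character argument above, here is a minimal sketch of the difference. It is written in Python rather than PHP, so it only illustrates the encodings and says nothing about how PHP's own string functions behave. The helper name byte_and_char_lengths and the sample strings are made up for illustration and do not come from the thread.

```python
# Illustrative sketch only: the names and sample strings here are
# assumptions, not anything quoted from the thread.

NBSP = "\u00a0"  # non-breaking space: one character (one code point)

# The same character occupies a different number of bytes per encoding.
utf8_bytes = NBSP.encode("utf-8")          # two bytes: 0xC2 0xA0
latin1_bytes = NBSP.encode("iso-8859-1")   # one byte: 0xA0


def byte_and_char_lengths(text: str, encoding: str = "utf-8") -> tuple[int, int]:
    """Return (byte count in the given encoding, code point count) for text."""
    return len(text.encode(encoding)), len(text)


if __name__ == "__main__":
    print(len(utf8_bytes), len(latin1_bytes))   # 2 1
    print(byte_and_char_lengths("Riga"))        # (4, 4)  plain ASCII
    print(byte_and_char_lengths("Rīga"))        # (5, 4)  Latvian "ī" takes two bytes
    print(byte_and_char_lengths("привет"))      # (12, 6) Cyrillic takes two bytes each
```

A loop over the encoded bytes of "Rīga" therefore sees five items, while a loop over its characters sees four. That gap is what a byte-oriented foreach() over a UTF-8 string would expose.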