Newsgroups: php.internals
Message-ID: <39875.5975f3c3.1308604731.nsm@avilys.eik.lt>
References: <1308584208.6296.9.camel@guybrush> <1308586150.6296.13.camel@guybrush> <1308589044.8394.27.camel@inspiron> <4DFF7E2A.50506@sugarcrm.com> <1308591260.8394.47.camel@inspiron>
Date: Tue, 21 Jun 2011 00:18:51 +0300 (EEST)
To: internals@lists.php.net
Subject: Re: [PHP-DEV] Re: foreach() for strings
From: tokul@users.sourceforge.net ("Tomas Kuliavas")

2011.06.20 21:38 Robert Eisele wrote:
> I really like the ideas shared here.
> It's a thing of consideration that array-functions should also work with
> strings. Maybe this would be the way to go, but I'm more excited about the
> OOP implementation of TextIterator and ByteIterator, which solves the whole
> problem at once (and is easier to implement, as mentioned by Stas). As
> Jonathan said, Database results with a certain encoding could get iterated,
> too. The only way to workaround the Text/Byte problem would be, offsetting
> every string with 1-2 byte "string-type" information or an additional type
> flag in the zval-structure.
> Handling everything with zval's instead of objects would have the advantage,
> that database-layers like mysqlnd could write the database-encoding directly
> into the zval and the user had no need to decide what encoding is used.
>
> A new casting operator (binary) could then cast the string to a 1-byte
> array. But this is syntactical sugar over OOP-implementations - I don't know
> which one is the better choice.
>
> For example:
>
> $utf8_string = "Jägermeister"; // information of utf8 is stored in the zval
>
> foreach ($utf8_string as $k => $v) // would iterate in byte mode
>
> foreach ((binary)$utf8_string as $k => $v) // would iterate in text mode
>
> over this:
> $utf8_obj = new ByteIterator("Jägermeister");
>
> foreach ($utf8_obj as $k => $v)
>
> foreach ($utf8_obj->toText() as $k => $v)
>
>
> I think the first one is easier and would be nicer to average developers
> (and lazy programmers like me ;o) )

You assume that the string is in utf-8. It can be in some iso-8859-x,
iso-2022-xx, utf-7, utf-16 or any other character set, single-byte or
multibyte.

If you want to parse a string character by character, use mb_substr and
mb_strlen and set the charset correctly. If the PHP bug about inconsistent
character position calculation is still unfixed, also make sure that your
string really is in that character set.

--
Tomas
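
A minimal sketch of the mb_substr/mb_strlen approach described in the reply, assuming PHP 7 or later with the mbstring extension loaded. The helper name chars_in_charset() is hypothetical and is not part of PHP or of anything proposed in this thread. The point it tries to show is that the charset has to be passed in by the caller, because the string itself does not carry it.

```php
<?php
// Sketch only. chars_in_charset() is a hypothetical helper name, not a PHP
// built-in; it assumes the mbstring extension is loaded.
function chars_in_charset(string $s, string $charset): array
{
    // Refuse input that is not valid in the declared charset, so the
    // character offsets below are well defined.
    if (!mb_check_encoding($s, $charset)) {
        throw new InvalidArgumentException("string is not valid $charset");
    }

    $chars = [];
    $len = mb_strlen($s, $charset);                // length in characters
    for ($i = 0; $i < $len; $i++) {
        $chars[] = mb_substr($s, $i, 1, $charset); // one character at offset $i
    }
    return $chars;
}

$utf8   = "J\xC3\xA4germeister";                   // "Jägermeister" as UTF-8 bytes
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

echo strlen($utf8), "\n";                                  // 13 (bytes)
echo count(chars_in_charset($utf8, 'UTF-8')), "\n";        // 12 (characters)
echo count(chars_in_charset($latin1, 'ISO-8859-1')), "\n"; // 12 (characters)

// The same bytes under the wrong charset label are rejected.
var_dump(mb_check_encoding($latin1, 'UTF-8'));             // bool(false)
```

Calling mb_substr once per offset is simple but can get slow on long strings in variable-width charsets, so this is meant as an illustration of the idea rather than a tuned implementation.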