Newsgroups: php.internals
Message-ID: <4E00FEAC.3040500@thelounge.net>
Date: Tue, 21 Jun 2011 22:27:24 +0200
To: internals@lists.php.net
In-Reply-To: <58008.5975f3c3.1308687593.nsm@avilys.eik.lt>
Subject: Re: [PHP-DEV] foreach() for strings
From: h.reindl@thelounge.net (Reindl Harald)

On 21.06.2011 22:19, Tomas Kuliavas wrote:
> 2011.06.21 20:51 Reindl Harald wrote:
>>> utf-8 is strict format. If you expect utf-8 and someone submits
>>> something else, you can tell that without any string function. You can
>>> verify utf-8 strings in pcre. You can convert nbspace to regular space,
>>> if you want. utf-8 does not have any byte sequence that can collide
>>> with nbspace byte sequence in utf-8
>>
>> show me a practicable way to detect if some input data contains UTF8
>> mb_string-functions are out of the game because there are many servers
>> even of real big companies where they are not available
>
> :) I've said pcre and not mbstring. If you read fine utf-8 manual like I
> did about 8 years ago, you would know how to detect 8bit inputs that are
> not in utf-8. utf-8 is variable byte length character set which has very
> specific rules about the way bytes are arranged. You can tell length of
> symbol in bytes based on first byte. You can tell what kind of byte values
> should be used for second, third, fourth, fifth or sixth byte. If you
> eliminate five valid utf-8 8bit byte sequences and still have 8bit data,
> it is not utf-8

I do not understand a word of this. What I miss is a simple str_is_utf8(),
or whatever you want to call it, that does this check natively and fast on
a given variable, so that a script can stop on unexpected input without
degrading performance.
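
For illustration only, here is a minimal sketch of what the pcre-based check mentioned in the quoted text could look like in userland PHP. The function name str_is_utf8() is just the wished-for name from this message and is hypothetical, not an existing PHP function. The sketch assumes the pcre extension was built with UTF-8 support, and it relies on the /u modifier making preg_match() reject subjects that are not valid UTF-8. Note that pcre follows the current UTF-8 definition, so it accepts sequences of at most four bytes, not five or six.

```php
<?php
// Hypothetical helper: str_is_utf8() is not a PHP builtin, only a sketch.
function str_is_utf8($value)
{
    // Only strings can be checked; everything else is treated as "not UTF-8".
    if (!is_string($value)) {
        return false;
    }
    // With the /u modifier pcre validates the subject as UTF-8 first.
    // preg_match() returns false for invalid input and 1 when the empty
    // pattern matches, so only a result of 1 means "valid UTF-8".
    return preg_match('//u', $value) === 1;
}

var_dump(str_is_utf8("plain ascii"));          // bool(true)
var_dump(str_is_utf8("ra\xC5\xA1\xC4\x97"));   // bool(true), valid two-byte sequences
var_dump(str_is_utf8("caf\xE9"));              // bool(false), ISO-8859-1 byte
```

No mbstring function is involved, so this would also work on servers where mbstring is not available. Whether it is fast enough for a given workload would have to be measured.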