Newsgroups: php.internals
Date: Tue, 21 Jun 2011 23:19:53 +0300 (EEST)
To: internals@lists.php.net
Subject: Re: [PHP-DEV] foreach() for strings
From:
tokul@users.sourceforge.net ("Tomas Kuliavas")

2011.06.21 20:51 Reindl Harald wrote:
>> UTF-8 is a strict format. If you expect UTF-8 and someone submits
>> something else, you can tell that without any string function. You can
>> verify UTF-8 strings in PCRE. You can convert nbspace to a regular space,
>> if you want. No other UTF-8 byte sequence can collide with the UTF-8
>> byte sequence for nbspace.
>
> Show me a practicable way to detect whether some input data contains UTF-8.
> mb_string functions are out of the game because there are many servers,
> even at really big companies, where they are not available :)

I said PCRE, not mbstring. If you had read the fine UTF-8 manual, as I did
about 8 years ago, you would know how to detect 8-bit input that is not
UTF-8. UTF-8 is a variable-length encoding with very specific rules about
how bytes are arranged. You can tell the length of a symbol in bytes from
its first byte, and you can tell which byte values are allowed in the
second, third, fourth, fifth or sixth byte. If you eliminate the five valid
kinds of UTF-8 8-bit byte sequences and still have 8-bit data, it is not
UTF-8.

--
Tomas
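The byte rules described in the reply (sequence length from the first byte, restricted values for the following bytes) can be sketched as a small validator. It is written in Python purely for illustration, since the thread itself contains no code, and the function name `is_valid_utf8` is hypothetical. One deliberate difference from the post: the post describes the original UTF-8 definition with sequences of up to six bytes, while RFC 3629 (2003) limits UTF-8 to four bytes and code points up to U+10FFFF. This sketch follows RFC 3629 and also rejects overlong forms and UTF-16 surrogates.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8 (RFC 3629 byte ranges)."""
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:                      # 0xxxxxxx: ASCII, one byte
            i += 1
            continue
        # The lead byte fixes the sequence length and the allowed range
        # for the second byte; later bytes must be 10xxxxxx (0x80-0xBF).
        if 0xC2 <= b <= 0xDF:             # 2 bytes (0xC0, 0xC1 are overlong)
            length, lo, hi = 2, 0x80, 0xBF
        elif b == 0xE0:                   # 3 bytes, reject overlong forms
            length, lo, hi = 3, 0xA0, 0xBF
        elif 0xE1 <= b <= 0xEC or b in (0xEE, 0xEF):
            length, lo, hi = 3, 0x80, 0xBF
        elif b == 0xED:                   # 3 bytes, reject U+D800..U+DFFF
            length, lo, hi = 3, 0x80, 0x9F
        elif b == 0xF0:                   # 4 bytes, reject overlong forms
            length, lo, hi = 4, 0x90, 0xBF
        elif 0xF1 <= b <= 0xF3:
            length, lo, hi = 4, 0x80, 0xBF
        elif b == 0xF4:                   # 4 bytes, cap at U+10FFFF
            length, lo, hi = 4, 0x80, 0x8F
        else:
            # Stray continuation byte, 0xC0/0xC1, or 0xF5-0xFF
            # (the latter covers the old 5- and 6-byte lead bytes).
            return False
        if i + length > n:                # sequence truncated at end of input
            return False
        if not lo <= data[i + 1] <= hi:
            return False
        for k in range(2, length):
            if not 0x80 <= data[i + k] <= 0xBF:
                return False
        i += length
    return True
```

In Python itself, a strict `data.decode("utf-8")` performs the same check; the loop is spelled out only to mirror the rules given in the reply.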
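The quoted suggestion to verify UTF-8 "in PCRE" and then convert nbspace can be sketched as one regular expression over raw bytes. Python's `re` stands in for PCRE here (an assumption made only so the sketch is runnable; per the PHP manual, PHP's own route is the `u` pattern modifier, which makes PCRE check the subject for UTF-8 validity). The names `UTF8_RE`, `looks_like_utf8`, `nbsp_to_space` and `NBSP_UTF8` are hypothetical. The byte ranges follow RFC 3629 and are not specific to Python.

```python
import re

# One alternative per well-formed UTF-8 byte pattern (RFC 3629 ranges).
# Overlong forms, UTF-16 surrogates and code points above U+10FFFF
# have no matching alternative, so they make the full match fail.
UTF8_RE = re.compile(
    rb"(?:[\x00-\x7F]"                      # 1 byte: ASCII
    rb"|[\xC2-\xDF][\x80-\xBF]"             # 2 bytes
    rb"|\xE0[\xA0-\xBF][\x80-\xBF]"         # 3 bytes, no overlongs
    rb"|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}"  # 3 bytes
    rb"|\xED[\x80-\x9F][\x80-\xBF]"         # 3 bytes, no surrogates
    rb"|\xF0[\x90-\xBF][\x80-\xBF]{2}"      # 4 bytes, no overlongs
    rb"|[\xF1-\xF3][\x80-\xBF]{3}"          # 4 bytes
    rb"|\xF4[\x80-\x8F][\x80-\xBF]{2})*"    # 4 bytes, up to U+10FFFF
)

NBSP_UTF8 = "\u00a0".encode("utf-8")  # the two bytes 0xC2 0xA0


def looks_like_utf8(data: bytes) -> bool:
    """True if the whole byte string is well-formed UTF-8."""
    return UTF8_RE.fullmatch(data) is not None


def nbsp_to_space(data: bytes) -> bytes:
    """Replace U+00A0 with an ASCII space in validated UTF-8 input."""
    if not looks_like_utf8(data):
        raise ValueError("input is not valid UTF-8")
    # 0xC2 is only ever a lead byte in UTF-8, so the pair 0xC2 0xA0 cannot
    # straddle two characters: every match really is U+00A0.
    return data.replace(NBSP_UTF8, b" ")
```

Because every alternative is selected by a distinct lead-byte range, the pattern never has to backtrack between alternatives, and a plain byte replacement is enough for the nbspace step once the input has passed the check.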