Newsgroups: php.internals
From: ircmaxell@gmail.com (Anthony Ferrara)
Date: Wed, 22 Jun 2011 09:49:16 -0400
To: Reindl Harald
Cc: internals@lists.php.net
Subject: Re: [PHP-DEV] foreach() for strings

> and why this will not return true if $str is ISO-8859-1?

For 7-bit characters (code points <= 127) it would return true. But if there is a single higher character (outside of ASCII), it would only return true if the byte sequences follow UTF-8 semantics. So it would generally return false for ISO-8859-1 text that contains non-ASCII characters.

For example, the character é is the single byte 0xe9 (code point 233) in ISO-8859-1, but the two-byte sequence 0xc3a9 in UTF-8. So if it encountered a byte stream such as 0xe92041 ("é A" in ISO-8859-1), it knows it cannot be UTF-8: 0xe9 opens a three-byte sequence, so the next byte must be a continuation byte (0x80 to 0xbf), and 0x20 is not one. But if it saw 0xc3a92041 ("é A" in UTF-8), it knows it is valid UTF-8 (it could be another character set, but it is valid in UTF-8)...

Please note that it's not checking if the string **is** UTF-8, just if the byte sequences in the string are valid when interpreted as UTF-8. You could have the Latin-1 string 0xc3a92041 ("Ã© A"), which parses as valid UTF-8...

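To make the byte-level argument above concrete, here is a minimal sketch that runs the example byte strings through a well-formedness check. The helper name is_wellformed_utf8() is hypothetical (it is not from the thread), and it assumes the PCRE extension, relying on preg_match() returning false rather than 1 when a subject is not valid UTF-8 under the /u modifier.

```php
<?php
// Hypothetical helper (not from the thread): true when $bytes is
// well-formed UTF-8. With the /u modifier, preg_match() returns false
// instead of 1 when the subject is not valid UTF-8.
function is_wellformed_utf8($bytes) {
    return preg_match('//u', $bytes) === 1;
}

$samples = array(
    'ascii only'       => "A B",
    'e-acute, Latin-1' => "\xE9 A",     // 0xe92041
    'e-acute, UTF-8'   => "\xC3\xA9 A", // 0xc3a92041
);

// Print each sample as hex next to the verdict.
foreach ($samples as $label => $bytes) {
    printf("%-18s 0x%s  %s\n", $label, bin2hex($bytes),
        is_wellformed_utf8($bytes) ? 'valid' : 'invalid');
}
```

The check only says whether the bytes are well-formed UTF-8. It cannot tell whether 0xc3a92041 was meant as UTF-8 "é A" or as Latin-1 "Ã© A", which is the caveat made above.
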
On Wed, Jun 22, 2011 at 9:40 AM, Reindl Harald wrote:
>
>
> On 22.06.2011 15:30, Gustavo Lopes wrote:
>> On Wed, 22 Jun 2011 13:21:10 +0100, Reindl Harald wrote:
>>
>>> On 22.06.2011 14:14, Gustavo Lopes wrote:
>>>> It's actually 3 lines:
>>>>
>>>> function str_is_utf8($str) {
>>>>     return $str == "" || htmlspecialchars($str, 0, "UTF-8");
>>>> }
>>>
>>>
>>> WTF should this do?
>>> this won't return boolean
>>>
>>
>> The reason it works is that
>> 1) || coerces the operands into booleans (if they get to be evaluated)
>> 2) htmlspecialchars returns "" on bad input sequence
>> 3) (bool) "" === false
>>
>> But even if you didn't know these things, you should have bothered to at least test it
>> before sending this response
>
> and why this will not return true if $str is ISO-8859-1?
>
>
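
The three points in the quoted explanation can be checked directly. The sketch below copies the quoted one-liner unchanged and adds a stricter variant, str_is_utf8_strict(), which is a hypothetical name and not something proposed in the thread. The variant compares against "" explicitly, because the string "0" is valid UTF-8 but is also coerced to false by ||.

```php
<?php
// The one-liner quoted in the thread, copied unchanged.
function str_is_utf8($str) {
    return $str == "" || htmlspecialchars($str, 0, "UTF-8");
}

// Hypothetical stricter variant (not from the thread): test for the
// empty-string failure value instead of relying on boolean coercion.
function str_is_utf8_strict($str) {
    return $str === "" || htmlspecialchars($str, 0, "UTF-8") !== "";
}

// Point 2: a bad input sequence yields "" (no ENT_IGNORE/ENT_SUBSTITUTE).
var_dump(htmlspecialchars("\xE9 A", 0, "UTF-8") === ""); // bool(true)

// Points 1 and 3: || coerces that "" to false.
var_dump(str_is_utf8("\xE9 A"));     // bool(false)
var_dump(str_is_utf8("\xC3\xA9 A")); // bool(true)

// Edge case: "0" is valid UTF-8, but (bool) "0" is false in PHP.
var_dump(str_is_utf8("0"));          // bool(false)
var_dump(str_is_utf8_strict("0"));   // bool(true)
```
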