From: steph@zend.com ("Steph Fox")
To: Johannes Schlüter
Cc: Andrei Zmievski, Antony Dovgal
Newsgroups: php.internals
Date: Sun, 25 May 2008 15:20:06 +0100
Organization: Zend Technologies
Subject: Re: [PHP-DEV] Unicode progress [Was: unicode.semantics adinfinitum]

Johannes,

>> You're telling me an explicit cast to binary could fail internally but
>> not externally? That doesn't make a lot of sense somehow.
>
> Externally the user is responsible to select the proper encoding
> internally PHP has to guess.

    case 's':
    case 'S':
        {
            char **p = va_arg(*va, char **);
            int *pl = va_arg(*va, int *);
            UConverter *conv = NULL;

            switch (Z_TYPE_PP(arg)) {
                case IS_UNICODE:
                    /* handle conversion of Unicode to binary with a specific converter */
                    if (conv != NULL) {
                        /* this is an 's' specifier */
                        SEPARATE_ZVAL_IF_NOT_REF(arg);
                        if (convert_to_string_with_converter(*arg, conv) == FAILURE) {
                            return "";
                        }
                        *p = Z_STRVAL_PP(arg);
                        *pl = Z_STRLEN_PP(arg);
                        break;
                    } else if (c == 'S' && Z_TYPE_PP(arg) != IS_NULL /* NULL is ok */) {
                        return "strictly a binary string";
                    }
                    /* fall through */

I'll try to explain why this isn't useful.

First off, you get anomalies like this:

    C:\sandbox\php6\Debug_TS>php -r "echo crc32('');"

    Warning: crc32() expects parameter 1 to be strictly a binary string,
    Unicode string given in Command line code on line 1

    C:\sandbox\php6\Debug_TS>php -r "echo crc32(null);"
    0

Second, you don't always get the same value anyway if the encoding changes.

Test script:

    echo crc32((binary)'שלום')."\n";
    echo crc32((binary)'AKUO');

with the script saved in UTF-8 and

    unicode.fallback_encoding=UTF-8
    unicode.runtime_encoding=UTF-8
    unicode.stream_encoding=UTF-8

output is:

    -1600612531
    1603041141

with the same script saved in ISO-8859-8 and

    unicode.fallback_encoding=ISO-8859-8
    unicode.runtime_encoding=ISO-8859-8
    unicode.stream_encoding=ISO-8859-8

output is:

    -2023737703
    1603041141

These are exactly the same results I see under PHP 5, depending whether the
script is saved in ISO-8859-8 or UTF-8.

Now if I remove the (binary) cast and alter the relevant section of
zend_parse_arg_impl():

    } else if (c == 'S' && Z_TYPE_PP(arg) != IS_NULL /* NULL is ok */) {
        if (zval_unicode_to_string(*(arg) TSRMLS_CC) == FAILURE) {
            return "strictly a binary string";
        }
    }
    /* fall through */

I get exactly the same results again, with or without the binary cast. All
that changes is I don't get an error when I skip the casting.

If the script encoding doesn't match up with whatever's set in INI, I don't
get as far as that stuff anyway:

    Warning: Illegal or truncated character in input: offset 0, state=0 in
    C:\sandbox\php-src\Debug_TS\help.php on line 5

    Parse error: parse error, expecting `')'' in
    C:\sandbox\php-src\Debug_TS\help.php on line 5

- regardless of whether I cast to binary or not, and regardless of whether
I've messed with the src.

- Steph