Newsgroups: php.internals
From: steph@zend.com ("Steph Fox")
To: Johannes Schlüter
Cc: "Andrei Zmievski", "Antony Dovgal"
Date: Sun, 25 May 2008 20:12:41 +0100
Subject: Re: [PHP-DEV] Unicode progress [Was: unicode.semantics adinfinitum]

Hi again Johannes,

Last one on this subject I promise...

> If the script encoding doesn't match up with whatever's set in INI, I
> don't get as far as that stuff anyway:
>
> Warning: Illegal or truncated character in input: offset 0, state=0 in
> C:\sandbox\php-src\Debug_TS\help.php on line 5

You were right to point out that runtime encoding isn't reliable. (Now I
finally got the thing working in scripts here - there's something invisible
going on with UTF-8 somewhere down the line that had me confused for a
while.)

A call to zend_unicode_to_string_ex() that explicitly passes the fallback
converter should be good enough:

    } else if (c == 'S' && Z_TYPE_PP(arg) != IS_NULL /* NULL is ok */) {
        UErrorCode status = U_ZERO_ERROR;

        zend_unicode_to_string_ex(UG(fallback_encoding_conv), p, pl,
                                  Z_USTRVAL_PP(arg), Z_USTRLEN_PP(arg), &status);
        if (U_FAILURE(status)) {
            efree(p);
            return "strictly a binary string";
        }
        break;
    }

It doesn't make sense to set unicode.fallback_encoding at script level
because - as I wrote earlier - the parser may respond differently depending
on its value. If your INI fallback setting clashes with your script encoding
you *can't* override it locally. That being the case, I don't see how a
conversion that uses the fallback encoding can be wrong here.

- Steph
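
For illustration only, the convert-then-check pattern in the snippet above can be sketched outside the PHP source tree. This is a minimal stand-alone sketch, not the Zend or ICU API: ISO-8859-1 stands in for whatever the fallback encoding is set to, and every name in it (conv_status, utf16_to_latin1, parse_binary_arg) is hypothetical. It only shows the shape of the idea: convert with one fixed converter, and reject the argument if any character cannot be represented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical status codes, standing in for ICU's UErrorCode. */
typedef enum { CONV_OK = 0, CONV_UNMAPPABLE = 1, CONV_NO_MEMORY = 2 } conv_status;

/*
 * Convert UTF-16 code units to ISO-8859-1 (assumed here as the fixed
 * "fallback" encoding). On success *out is a malloc'd, NUL-terminated
 * buffer and *out_len its length. On failure *out is NULL, *out_len is 0,
 * and *status records the reason.
 */
void utf16_to_latin1(const uint16_t *src, size_t src_len,
                     char **out, size_t *out_len, conv_status *status)
{
    assert(out != NULL && out_len != NULL && status != NULL);
    *out = NULL;
    *out_len = 0;

    char *buf = malloc(src_len + 1);
    if (buf == NULL) {
        *status = CONV_NO_MEMORY;
        return;
    }

    for (size_t i = 0; i < src_len; i++) {
        if (src[i] > 0xFF) {
            /* Not representable in the target encoding: give up cleanly. */
            free(buf);
            *status = CONV_UNMAPPABLE;
            return;
        }
        buf[i] = (char)(unsigned char)src[i];
    }
    buf[src_len] = '\0';

    *out = buf;
    *out_len = src_len;
    *status = CONV_OK;
}

/*
 * Caller side, mirroring the argument-parsing branch: returns NULL on
 * success, or a description of the expected type when conversion fails.
 */
const char *parse_binary_arg(const uint16_t *src, size_t src_len,
                             char **p, size_t *pl)
{
    conv_status status = CONV_OK;

    utf16_to_latin1(src, src_len, p, pl, &status);
    if (status != CONV_OK) {
        return "strictly a binary string";
    }
    return NULL;
}
```

Because the converter is fixed up front, the result does not depend on any per-script setting, which is the property argued for above. On failure the caller gets back no buffer to free.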