From: glopes@nebm.ist.utl.pt ("Gustavo Lopes")
Organization: Núcleo de Eng. Biomédica do I.S.T.
To: internals@lists.php.net
Subject: Re: [PHP-DEV] Zend Multibyte support
Date: Thu, 03 Nov 2011 11:07:09 -0000
Newsgroups: php.internals

On Thu, 03 Nov 2011 10:31:47 -0000, Yasuo Ohgaki wrote:

> One last quick question.
> Zend/tests/multibyte/multibyte_encoding_001.phpt sets
> mbstring.internal_encoding=SJIS.
>
> Does PHP 5.4+ suppose to work with SJIS(or other similar encoding)
> internal_encoding?

No.

What matters is that the parser generated by bison is able to recognize the tokens. On an ASCII (as opposed to EBCDIC) machine, this means the encoding must be ASCII compatible.

This is the table for SJIS:

http://icu-project.org/icu-bin/convexp?conv=ibm-943_P15A-2003&s=ALL

At first glance it appears to be ASCII compatible, since \x20-\x7E represent U+0020-U+007E, but if you take a closer look you'll see that these bytes can also appear as part of larger (multibyte) sequences. For instance, in this script: