Newsgroups: php.internals
From: ajb732@comcast.net (Al Baker)
To: internals@lists.php.net
Cc: moriyoshi@wakwak.com
Date: Sat, 04 Dec 2004 15:01:59 -0500
Message-ID: <1102190520.12784.30.camel@localhost>
Subject: mbstring internal encoding behavior

I've noticed different behavior between mbstring in PHP 4.2.2 and 4.3.9 (both on RedHat 8) in how the internal encoding affects a script.

In 4.2.2, encoding translation appeared to work and converted Shift_JIS to UTF-8 on incoming requests. We didn't try any other encodings, since this was our primary concern and it worked well. The internal_encoding setting in php.ini was UTF-8. Our language file (a very simple PHP array whose values are the translated text) was in Shift_JIS, and sending it to the browser as-is was no problem. We send the Shift_JIS language file entries to the browser [via Smarty], along with some other text that is stored in UTF-8 and run through mb_convert_encoding to convert it to Shift_JIS. All in all, this works as expected.
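
For reference, the conversion step in that setup amounts to something like the sketch below. The function name to_sjis() and the sample text are made up for illustration and are not taken from our code. The source encoding is passed explicitly as the third argument, so the call should not need to guess it from the internal encoding.

```php
<?php
// Illustrative sketch only: to_sjis() and the sample text are hypothetical.

// Convert UTF-8 text to Shift_JIS, naming the source encoding explicitly
// so the call does not rely on mbstring.internal_encoding.
function to_sjis($utf8_text)
{
    return mb_convert_encoding($utf8_text, 'SJIS', 'UTF-8');
}

$utf8 = "\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E"; // three kanji as UTF-8 bytes
$sjis = to_sjis($utf8);

// Round-trip check: converting back should reproduce the original bytes.
$back = mb_convert_encoding($sjis, 'UTF-8', 'SJIS');
echo ($back === $utf8) ? "round trip ok\n" : "round trip FAILED\n";
```

A round trip like this is a quick way to tell whether the conversion itself is broken or whether the bytes are being altered later, for example by an output handler.
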
Now we're trying to upgrade to PHP 4.3.9, and I can find no easy way to get Shift_JIS to work. With the existing setup, it just returned UTF-8 or garbled characters. In other words, mb_convert_encoding was not doing its job, and the Shift_JIS language file entries wouldn't even display. Manually converting the language file from Shift_JIS to UTF-8 and then running all the elements through mb_convert_encoding apparently did nothing either, unless I first called mb_internal_encoding() and set it to Shift_JIS (setting this in php.ini worked as well). Then the characters were displayed correctly in Shift_JIS.

I'm not sure this is the correct behavior, though. It seems to me that the internal encoding should almost always be UTF-8 and that mb_convert_encoding should work regardless of the internal encoding. I don't know the consequences of calling mb_internal_encoding at run time, or what that means for database interactions, curl interactions [PEAR SOAP], etc.

A few other observations:

- compiling with --enable-zend-multibyte made no difference
- we compiled with --enable-mbstring=all and with just --enable-mbstring; neither made a difference
- the mbstring.language php.ini setting didn't appear to make a difference
- calling mb_internal_encoding('SJIS') was the only way to make mb_convert_encoding($var, 'SJIS', 'UTF-8') work properly; otherwise mb_convert_encoding just spat out garbage
- http_input was set to UTF-8, SJIS, as was detect_order
- the http_output ini setting was set to pass, and mb_output_handler is used at run time with the preferred encoding (either UTF-8 or SJIS)
- substitute character and function overloading are both off/disabled

Any suggestions?

Thanks,
Al Baker
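
P.S. In case it helps the discussion, the workaround can be confined to the conversion call itself, roughly as sketched below. The helper name convert_with_internal() is made up for illustration, and I have not verified that restoring the previous value avoids side effects on the database, curl, or SOAP code.

```php
<?php
// Illustrative sketch only: convert_with_internal() is a hypothetical helper.

// Temporarily switch the internal encoding, convert, then restore the
// previous value so later code still sees the original setting.
function convert_with_internal($text, $to, $from)
{
    $previous = mb_internal_encoding();  // no argument: returns the current setting
    mb_internal_encoding($to);
    $converted = mb_convert_encoding($text, $to, $from);
    mb_internal_encoding($previous);     // put the original encoding back
    return $converted;
}

mb_internal_encoding('UTF-8');
$sjis = convert_with_internal("\xE6\x97\xA5\xE6\x9C\xAC", 'SJIS', 'UTF-8');

// The internal encoding should be unchanged after the call.
echo mb_internal_encoding(), "\n";
```
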