I've noticed different mbstring behavior between PHP 4.2.2 and PHP 4.3.9
(both on RedHat 8) in terms of how the internal encoding affects the
script.
In 4.2.2, the encoding translation appeared to work okay and would
convert Shift_JIS into UTF-8 on incoming requests. We didn't try any
other encodings since this was our primary concern and worked well. The
internal_encoding setting in the php.ini file was set to UTF-8. Our
language file (very simple PHP array with values being the translated
text) was in Shift_JIS, and sending it to the browser as-is was no
problem. We send the Shift_JIS language file entries to the browser
[via Smarty], along with some other text that is stored in UTF-8 and run
through mb_convert_encoding to convert it to Shift_JIS.
All in all, this worked as expected.
Now we're trying to upgrade to PHP 4.3.9, and I can find no easy way to
get Shift_JIS to work. With the existing setup, it just returns UTF-8 or
garbled characters. In other words, mb_convert_encoding was not doing its
job, and the Shift_JIS language file entries would not even display.
Manually converting the language file from Shift_JIS to UTF-8 and then
running all the elements through mb_convert_encoding apparently did
nothing either, unless I first called mb_internal_encoding() and set it
to Shift_JIS (setting this in the php.ini file worked as well). Then the
characters were displayed correctly in Shift_JIS. I'm not sure this is
the correct behavior, though. It seems to me that the internal encoding
should almost always be UTF-8 and mb_convert_encoding should work
regardless of the internal encoding.
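To make the expectation concrete, here is a minimal sketch of the conversion described above, assuming only that mbstring is loaded. The sample string and variable names are made up for illustration. The source encoding is passed explicitly as the third argument, so the call should not need to consult the internal encoding at all.

```php
<?php
// Hypothetical sample text: the UTF-8 bytes for three kanji, written as
// hex escapes so the encoding of this source file does not matter.
$utf8 = "\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E";

// Convert UTF-8 -> Shift_JIS, naming the source encoding explicitly.
$sjis = mb_convert_encoding($utf8, 'SJIS', 'UTF-8');

// Convert back to check that the round trip preserves the text.
$back = mb_convert_encoding($sjis, 'UTF-8', 'SJIS');

echo 'internal encoding: ', mb_internal_encoding(), "\n";
echo 'Shift_JIS bytes:   ', bin2hex($sjis), "\n";
echo 'round trip ok:     ', ($back === $utf8 ? 'yes' : 'no'), "\n";
```

If the round trip fails on one install and succeeds on the other with the same script, that would point at the build or configuration rather than at the calling code.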
I don't know the consequences of calling mb_internal_encoding at run
time, or what that means for database interactions, curl interactions
[PEAR SOAP], etc.
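One way to limit the exposure might be to change the internal encoding only around the code that needs it and then put it back. This is a sketch under that assumption, with made-up variable names. It only limits how long the changed setting is in effect; it does not answer what the database or SOAP layers do with it.

```php
<?php
// Remember whatever internal encoding is currently in effect.
$previous = mb_internal_encoding();

// Switch to Shift_JIS only for the block that needs it.
mb_internal_encoding('SJIS');
$during = mb_internal_encoding();

// ... output that depends on the internal encoding would go here ...

// Restore the earlier value so later code sees the old setting.
mb_internal_encoding($previous);
$after = mb_internal_encoding();

echo "before: $previous, during: $during, after: $after\n";
```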
A few other observations:
- compiling with --enable-zend-multibyte made no difference
- we compiled with --enable-mbstring=all and with just --enable-mbstring,
which did not make a difference
- the mbstring.language php.ini setting didn't appear to make a
difference
- calling mb_internal_encoding('SJIS') was the only way to make
mb_convert_encoding($var, 'SJIS', 'UTF-8') work properly; otherwise
mb_convert_encoding just spit out garbage
- http_input was set to UTF-8, SJIS, as was the detect_order
- the http_output ini setting was set to pass in the ini file, and
mb_output_handler is used at run time with the preferred encoding
(either UTF-8 or SJIS)
- substitute character and function overloading are both off/disabled
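
To compare the two installs side by side, a small script along these lines could dump the effective mbstring settings. It is only a sketch: the directive names are the usual mbstring php.ini names as I understand them, and a given build may not know all of them.

```php
<?php
// mbstring directives relevant to the observations above. ini_get()
// returns false for a directive the running PHP build does not know.
$names = array(
    'mbstring.language',
    'mbstring.internal_encoding',
    'mbstring.http_input',
    'mbstring.http_output',
    'mbstring.detect_order',
    'mbstring.substitute_character',
    'mbstring.func_overload',
    'mbstring.encoding_translation',
);

$settings = array();
foreach ($names as $name) {
    $value = ini_get($name);
    $settings[$name] = ($value === false) ? '(not available)' : $value;
}

// The run-time value can differ from php.ini if a script changed it.
$settings['mb_internal_encoding()'] = mb_internal_encoding();

foreach ($settings as $name => $value) {
    echo str_pad($name, 32), ' = ', $value, "\n";
}
```

Running the same file under 4.2.2 and 4.3.9 and diffing the output would show which defaults changed between the two.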
Any suggestions?
Thanks,
Al Baker
Up to PHP 4.2.x, mbstring supports Japanese multibyte encodings, Unicode,
and some single-byte encodings.
From 4.3.x, Korean and Chinese multibyte encodings are also supported,
and the language setting was introduced.
You should define mbstring.language in php.ini:
mbstring.language = Japanese ; for Japanese encodings like Shift_JIS
mbstring.language = Korean ; for Korean encodings
It affects the automatic encoding detection of user input
(POST/GET/Cookie) and the default encoding.
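
The language can also be selected from a script with mb_language(). The sketch below only sets the value and reads it back. Request input is presumably decoded before the script runs, so for POST/GET/Cookie detection the php.ini directive is still the one to set.

```php
<?php
// Select the Japanese language profile at run time.
$ok = mb_language('Japanese');

// Read the setting back to confirm it took effect.
$language = mb_language();

echo 'mb_language set:  ', ($ok ? 'yes' : 'no'), "\n";
echo 'current language: ', $language, "\n";
```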
Rui
On Sat, 04 Dec 2004 15:01:59 -0500
Al Baker ajb732@comcast.net wrote:
--
Rui Hirokawa rui_hirokawa@ybb.ne.jp