Hi all,
I noticed that "Zend Multibyte Support" won't be on with
./sapi/cli/php -d zend.multibyte=1
nor
zend.multibyte=on (in php.ini)
This happens with both php-src and php-src-5.4.
According to php.ini-production from php-src:
; If enabled, scripts may be written in encodings that are incompatible with
; the scanner. CP936, Big5, CP949 and Shift_JIS are the examples of such
; encodings. To use this feature, mbstring extension must be enabled.
; Default: Off
;zend.multibyte = Off
I thought it had become a runtime option.
Is this a bug or am I missing something?
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Hi,
It is almost the same for me.
What configure options are you using?
For me (Ubuntu 11.10),
if I build PHP-5.4 with 'configure --enable-mbstring', it works fine.
But if I build it with 'configure --enable-mbstring
--with-apxs2=/usr/bin/apxs2', a Shift_JIS-encoded PHP script causes a
parse error.
In both cases, zend.multibyte and zend.script_encoding can be specified.
I believe it was OK on Ubuntu 11.04 with the same option
(--with-apxs2).
Rui
Hi Yasuo,
how did you determine that "Zend Multibyte Support" wasn't enabled?
$ sapi/cli/php -d zend.multibyte=1 -i | grep -i multibyte
Zend Multibyte Support => provided by mbstring
zend.multibyte => On => On ******
Multibyte Support => enabled
Multibyte string engine => libmbfl
Multibyte (japanese) regex support => enabled
Multibyte regex (oniguruma) backtrack check => On
Multibyte regex (oniguruma) version => 4.7.1
Thanks. Dmitry.
Hi Dmitry & Rui,
I was trying to see if PHP 5.4 and trunk were also affected by this bug report:
--with-zend-multibyte and --enable-debug reports LEAK with run-test.php
https://bugs.php.net/bug.php?id=60194
So I configured with "./configure --enable-debug".
Sorry for being a lazy reader. I could turn on zend.multibyte with
"--enable-mbstring".
Now I see that PHP 5.3 doesn't depend on mbstring, but PHP 5.4+ does.
I hadn't read the last sentence. Thanks.
It looks like php-src-5.4 doesn't have leaks like PHP 5.3 does.
(I haven't completed the test, since it seems zend.multibyte is NOT working.)
$ TEST_PHP_EXECUTABLE=./sapi/cli/php ./run-tests.php -m -c ./php.ini
=====================================================================
PHP         : ./sapi/cli/php
PHP_SAPI    : cli
PHP_VERSION : 5.4.0RC1-dev
ZEND_VERSION: 2.4.0
PHP_OS      : Linux - Linux dev.inter.es-i.jp 2.6.35.14-2m.mo7.x86_64 #1 SMP Mon Sep 12 11:09:50 JST 2011 x86_64
INI actual  : /home/yohgaki/ext/svn/oss/php.net/php-src-5.4/php.ini
More .INIs  :
CWD         : /home/yohgaki/ext/svn/oss/php.net/php-src-5.4
Extra dirs  :
VALGRIND    : valgrind-3.6.1
TIME START 2011-11-03 16:36:57
PASS EXPECT [tests/run-test/test001.phpt]
PASS EXPECTF [tests/run-test/test002.phpt]
PASS EXPECTREGEX [tests/run-test/test003.phpt]
.....
However, I got the same parse error Rui mentioned with an SJIS-encoded source script.
sjis.php
<?php
echo '表';
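(You cannot copy and paste this, since it would be UTF-8. If you need the
file, let me know and I'll mail it to you directly.)
表 is a valid SJIS character whose second byte is \ (it is encoded as 0x95 0x5C),
so a scanner reading the bytes as ASCII sees \' and treats the closing quote as escaped.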
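To see why 表 breaks the literal, here is a minimal Python sketch of a byte-oriented scanner for a single-quoted literal. It is a simplified model, not PHP's scanner: it assumes, as in PHP, that a backslash inside '...' escapes the byte after it. The helper name `find_single_quoted_end` is hypothetical.

```python
def find_single_quoted_end(src: bytes, start: int) -> int:
    """Scan a single-quoted literal opening at src[start] byte by byte, the way
    an ASCII-oriented lexer would. Return the index of the closing quote, or -1
    if the literal is never terminated. (Hypothetical helper, simplified model.)"""
    if src[start:start + 1] != b"'":
        raise ValueError("no opening quote at start")
    i = start + 1
    while i < len(src):
        if src[i] == 0x5C:    # backslash: treat the next byte as escaped
            i += 2
        elif src[i] == 0x27:  # unescaped single quote closes the literal
            return i
        else:
            i += 1
    return -1


line = "echo '表';"
for encoding in ("utf-8", "shift_jis"):
    data = line.encode(encoding)
    print(encoding, data, find_single_quoted_end(data, data.index(b"'")))
```

With the UTF-8 bytes the closing quote is found; with the Shift_JIS bytes the 0x5C trail byte swallows the quote and the function returns -1, which mirrors the parse error reported in this message.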
Since PHP complains with a parse error, it seems zend.multibyte is not working in
PHP 5.4.
$ ./sapi/cli/php -i | grep multibyte
zend.multibyte => On => On
$ ./sapi/cli/php /usr/local/apache2.0/htdocs/sjis.php
PHP Parse error: syntax error, unexpected ''�';' (T_ENCAPSED_AND_WHITESPACE) in /usr/local/apache2.0/htdocs/sjis.php on line 2
It seems to behave as if zend.multibyte were Off.
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
I suppose PHP can't autodetect the SJIS encoding and needs a hint.
For 5.4:
$ php -d zend.multibyte=1 -d zend.script_encoding=SJIS sjis.php
For 5.3, PHP must be compiled with --enable-zend-multibyte, and then:
$ php -d mbstring.script_encoding=SJIS
(I've only tested 5.4, not 5.3; I just don't have a 5.3 build compiled with
--enable-zend-multibyte.)
Thanks. Dmitry.
Hi Dmitry,
Now it seems to work as it is supposed to. Thanks.
$ ./sapi/cli/php -d zend.multibyte=1 -d zend.script_encoding=SJIS sjis.php
?
(The ? is due to my terminal encoding, which is set to UTF-8.)
It seems the LEAK problem is gone in PHP 5.4, too.
[yohgaki@dev php-src-5.4]$ TEST_PHP_EXECUTABLE=./sapi/cli/php ./run-tests.php -m -c ./php.ini
=====================================================================
PHP         : ./sapi/cli/php
PHP_SAPI    : cli
PHP_VERSION : 5.4.0RC1-dev
ZEND_VERSION: 2.4.0
PHP_OS      : Linux - Linux dev.inter.es-i.jp 2.6.35.14-2m.mo7.x86_64 #1 SMP Mon Sep 12 11:09:50 JST 2011 x86_64
INI actual  : /home/yohgaki/ext/svn/oss/php.net/php-src-5.4/php.ini
More .INIs  :
CWD         : /home/yohgaki/ext/svn/oss/php.net/php-src-5.4
Extra dirs  :
VALGRIND    : valgrind-3.6.1
TIME START 2011-11-03 18:23:53
PASS EXPECT [tests/run-test/test001.phpt]
PASS EXPECTF [tests/run-test/test002.phpt]
PASS EXPECTREGEX [tests/run-test/test003.phpt]
PASS INI section allows '=' [tests/run-test/test004.phpt]
PASS Error message handling (without ZendOptimizer) [tests/run-test/test005.phpt]
PASS Error messages are shown [tests/run-test/test006.phpt]
PASS dirname test [tests/run-test/test007.phpt]
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Oops, I thought the "?" was due to my terminal encoding, but I double-checked
by redirecting the output to a file.
$ ./sapi/cli/php -d zend.multibyte=1 -d zend.script_encoding=SJIS sjis.php > tt
It became "?" instead of "表".
It seems something is wrong.
Thanks for your time.
--
Yasuo Ohgaki
yohgaki@ohgaki.net
On Thu, 03 Nov 2011 09:38:11 -0000, Yasuo Ohgaki yohgaki@ohgaki.net wrote:
You also have to specify what it's converted to (I think, but I'm not sure,
the default is ISO-8859-1). This works:
$ echo 表 | uconv -f utf-8 -t sjis > sjis.php
$ ./php -d zend.multibyte=1 -d zend.script_encoding=SJIS -d mbstring.internal_encoding=utf-8 sjis.php
表
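The conversion can be modelled in a few lines of Python. This is a simplified sketch, not PHP's code path: it assumes that the script text is converted from zend.script_encoding to the internal encoding and that characters the target cannot represent become '?'. `convert_script` is a hypothetical name, and Python's codec names stand in for PHP's encoding names.

```python
def convert_script(src: bytes, script_encoding: str, internal_encoding: str) -> bytes:
    """Hypothetical model: decode the script bytes, re-encode them in the
    internal encoding, and replace unrepresentable characters with '?'."""
    return src.decode(script_encoding).encode(internal_encoding, errors="replace")


sjis_src = "表".encode("shift_jis")
print(convert_script(sjis_src, "shift_jis", "iso-8859-1"))  # b'?'
print(convert_script(sjis_src, "shift_jis", "utf-8"))       # b'\xe8\xa1\xa8'
```

ISO-8859-1 has no 表, so with it as the target the model yields the "?" seen in the redirected output; with UTF-8 as the target the character survives.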
--
Gustavo Lopes
Hi Gustavo,
Now I see what is going on. I thought a string was treated as just a
bunch of data once it was compiled and passed through.
$ ./sapi/cli/php -d zend.multibyte=1 -d zend.script_encoding=SJIS -d mbstring.internal_encoding=utf-8 sjis.php
表
Thanks for your time.
Maybe I should document this, since it may be confusing.
--
Yasuo Ohgaki
yohgaki@ohgaki.net
One last quick question.
Zend/tests/multibyte/multibyte_encoding_001.phpt sets
mbstring.internal_encoding=SJIS.
Is PHP 5.4+ supposed to work with SJIS (or another similar encoding) as the
internal_encoding?
--
Yasuo Ohgaki
yohgaki@ohgaki.net
On Thu, 03 Nov 2011 10:31:47 -0000, Yasuo Ohgaki yohgaki@ohgaki.net wrote:
No. What matters is that the parser generated by bison is able to
recognize the tokens. In an ASCII (as opposed to EBCDIC) machine, this
means the encoding must be ASCII compatible.
This is the table for SJIS:
http://icu-project.org/icu-bin/convexp?conv=ibm-943_P15A-2003&s=ALL
It would appear to be ASCII compatible: \x20-\x7E represent
U+0020-U+007E, but if you take a closer look you'll see that these bytes
can also appear as part of larger sequences.
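The effect described here, ASCII-range bytes reappearing inside multi-byte sequences, can be checked per character with a short Python sketch. `ascii_bytes_inside_multibyte` is a hypothetical helper; it uses 表 and ソ, whose Shift_JIS trail byte is 0x5C (backslash), rather than the 漾 example discussed next.

```python
def ascii_bytes_inside_multibyte(text: str, encoding: str = "shift_jis"):
    """Return (char, encoded bytes) for each character of `text` whose encoding
    is longer than one byte and contains a byte below 0x80, i.e. a byte an
    ASCII-oriented lexer would mistake for an ordinary ASCII character."""
    hits = []
    for ch in text:
        encoded = ch.encode(encoding)
        if len(encoded) > 1 and any(b < 0x80 for b in encoded):
            hits.append((ch, encoded))
    return hits


print(ascii_bytes_inside_multibyte("表ソあabc"))
```

あ (0x82 0xA0) is not reported because both of its bytes are above 0x7F, and plain ASCII characters are single bytes, so they are skipped.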
For instance, in this script:
<?php
function a漾() {}
the character 漾 is represented with \xE0\x40, where \x40 represents @ in
ASCII, so this would give an error, just as this would:
<?php
function aà@() {}
In fact, if I save the first script as UTF-8 and then run PHP:
$ ./php -d zend.multibyte=1 -d zend.script_encoding=UTF-8 -d mbstring.internal_encoding=SJIS sjis.php
php: Zend/zend_language_scanner.l:126: encoding_filter_script_to_internal:
Assertion `internal_encoding &&
zend_multibyte_check_lexer_compatibility(internal_encoding)' failed.
Aborted
It fails with an assertion error.
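The failed assertion names zend_multibyte_check_lexer_compatibility. Its implementation is not shown in this thread, so the Python sketch below is only a heuristic in the same spirit, not a port. It reports an encoding as lexer-unsafe if any character in a sample range encodes to a multi-byte sequence containing a byte below 0x80. The helper name and the sample ranges (CJK ideographs and Hangul syllables) are assumptions.

```python
def has_ascii_inside_multibyte(encoding: str, ranges) -> bool:
    """True if some character in `ranges` encodes to several bytes, at least
    one of which is below 0x80, so a byte-oriented ASCII lexer could misread it.
    A False result only means nothing was found in the sample."""
    for lo, hi in ranges:
        for cp in range(lo, hi + 1):
            try:
                encoded = chr(cp).encode(encoding)
            except UnicodeEncodeError:
                continue  # character not representable in this encoding
            if len(encoded) > 1 and any(b < 0x80 for b in encoded):
                return True
    return False


# Assumed sample: CJK Unified Ideographs and Hangul Syllables.
SAMPLE = [(0x4E00, 0x9FFF), (0xAC00, 0xD7A3)]

for enc in ("utf-8", "euc_jp", "shift_jis", "big5", "cp936", "cp949"):
    verdict = "unsafe" if has_ascii_inside_multibyte(enc, SAMPLE) else "nothing found in sample"
    print(enc, verdict)
```

With this sample the check flags the four encodings php.ini-production names (CP936, Big5, CP949, Shift_JIS) and finds nothing for UTF-8 or EUC-JP, whose multi-byte sequences use only bytes of 0x80 and above. Since it only inspects a sample, a negative result is not a proof of compatibility.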
--
Gustavo Lopes
Hi Gustavo,
Thanks for reply.
As long as bison doesn't understand multibyte chars, the parser won't
work well with them.
Your reply is exactly what I expected.
Thank you for clarification.
--
Yasuo Ohgaki
yohgaki@ohgaki.net
$ sapi/cli/php -d zend.multibyte=1 -d zend.script_encoding=SJIS -d mbstring.internal_encoding=UTF8 -d mbstring.output_encoding=UTF8 sjis.php
表
Too many different encodings :)
Thanks. Dmitry.