Hi all,
Currently, we have many encoding settings. It would be nicer if we had
a central encoding setting.
https://wiki.php.net/rfc/default_encoding
The patch is a PoC, but the intent should be clear.
Any comments are appreciated.
Thank you.
--
Yasuo Ohgaki
yohgaki@ohgaki.net
I'm not sure what it is you are actually trying to achieve here ??
+1 on the 5.5 changes
But the rest I don't really understand what the aim is, it would seem
that renaming settings, especially ones that are not actually anything
to do with the core, is just breaking compatibility for no good reason.
What I could understand is a proposal to move the functionality provided
by mbstring/iconv into core and introduce settings complementary to
zend.script_encoding:
zend.input_encoding
zend.output_encoding
I could understand this kind of proposal being aimed at 6.
I don't get it ...
Cheers
Joe
Hi Joe,
I'm not sure what it is you are actually trying to achieve here ??
I have 3 objectives in this RFC.
- Setting the charset in the HTTP header has been recommended since the
first XSS advisory by CERT and Microsoft in February 2000.
- There are too many encoding settings, and it is better to consolidate
them.
- If we get yet another multibyte string module in the future, the new
settings can be used by it.
I'll add these to the RFC later if I haven't already written them there.
I proposed "default_charset=UTF-8" years ago, but many users were using
"ISO-8859-"/"EUC-"/etc. at that time, so we decided to leave the setting
to users.
+1 on the 5.5 changes
But the rest I don't really understand what the aim is, it would seem that
renaming settings, especially ones that are not actually anything to do
with the core, is just breaking compatibility for no good reason.
An encoding must be specified for proper operation. Leaving it unspecified
is also a security risk.
What I could understand is a proposal to move the functionality provided
by mbstring/iconv into core and introduce settings complementary to
zend.script_encoding:
zend.input_encoding
zend.output_encoding
I could understand this kind of proposal being aimed at 6.
I don't think the Zend engine will get a multibyte character handling
feature, at least not any time soon.
Currently, the Zend engine has the zend multibyte option, but it is only
for encodings that are not compatible with ISO-8859-1 (e.g. SJIS, BIG5).
These encodings have \ inside multibyte characters, so the engine cannot
correctly run scripts written in them with zend multibyte off.
However, encoding settings in the engine would work even if the engine
itself does not use them. It may be a good idea to have these settings in
the engine. I'm +1 for this idea.
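To illustrate the backslash problem mentioned above, here is a minimal userland sketch. It does not exercise the engine's lexer; it only shows, under the assumption that the string holds raw Shift_JIS bytes, how byte-oriented code mistakes the trail byte of a multibyte character for a backslash.

```php
<?php
// The kanji "表" is encoded in Shift_JIS as the two bytes 0x95 0x5C.
// The second (trail) byte has the same value as an ASCII backslash.
$sjis = "\x95\x5C";

// A byte-wise search finds a "backslash" inside the character.
$pos = strpos($sjis, "\\");

// addslashes() is byte-oriented too, so it inserts an extra 0x5C after the
// trail byte and the result is no longer the original character.
$escaped = addslashes($sjis);
```

Anything that scans such text byte by byte without knowing the encoding can make the same mistake.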
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
I don't see that it is possible to merge the settings from different
libraries, what if an application is relying on mbstring and iconv
having different settings ??
It's possible that applications are relying on the separation of their
settings in order to function properly, is what I am trying to say.
The only way you could possibly merge those configuration settings is by
also merging the functionality, there's no backward compatible way to do
that, but I can imagine at some time in the future those libraries being
used to support all of the required input/output/script encoding
features at the level of Zend.
I don't see how this can move forward and not break stuff ...
Cheers
Joe
I don't see that it is possible to merge the settings from different
libraries, what if an application is relying on mbstring and iconv having
different settings ??
I think this use case is described in the RFC. The default_charset can be
overridden, from least to most specific:
default_charset < php.* < mbstring.*/iconv.* < encoding specified by
functions
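The override order above can be sketched as a small lookup. This is only an illustration of the precedence idea, not code from the patch; resolve_encoding() and the sample values are hypothetical.

```php
<?php
/**
 * Hypothetical helper: pick the most specific non-empty encoding.
 * $layers must be ordered from least specific to most specific.
 */
function resolve_encoding(array $layers)
{
    $effective = '';
    foreach ($layers as $value) {
        // A later (more specific) layer overrides an earlier one,
        // but only if it is actually set.
        if ($value !== null && $value !== '') {
            $effective = $value;
        }
    }
    return $effective;
}

$effective = resolve_encoding(array(
    'UTF-8',   // default_charset
    '',        // php.* setting, left unset
    'EUC-JP',  // mbstring.* / iconv.* setting
    null,      // no encoding passed to the function call
));
```

Here the module-level setting wins over default_charset because nothing more specific is given.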
It's possible that applications are relying on the separation of their
settings in order to function properly, is what I am trying to say.
Same as above.
The only way you could possibly merge those configuration settings is by
also merging the functionality, there's no backward compatible way to do
that, but I can imagine at some time in the future those libraries being
used to support all of the required input/output/script encoding features
at the level of Zend.
I don't see how this can move forward and not break stuff ...
I think it's the same as above. You can override the default setting, so
everything should be fine.
I'm +1 for this, as there are really too many unnecessary settings around!
How could you override them ??
If they are removed then they cannot be referenced.
If they are not being removed then nothing is being simplified ...
Cheers
Joe
Hi Joe,
How could you override them ??
It's in the PoC patch.
I made it while 5.5 was in beta, but it should still work.
If they are removed then they cannot be referenced.
If they are not being removed then nothing is being simplified ...
The most important objective is the case where you are using UTF-8 (I
guess it's the standard today).
All you should have to do is set
default_charset='UTF-8'
and PHP then uses the setting anywhere it can apply (e.g. htmlspecialchars,
mbstring functions, etc.).
I still have to work on the functions, but the php.ini-related stuff is in
the PoC patch.
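As a rough sketch of the intended usage, assuming a PHP version in which this RFC's behavior is available (5.6 or later), htmlspecialchars() can be called without an explicit encoding argument and picks up default_charset. The sample string is made up for illustration.

```php
<?php
// Normally set once in php.ini; ini_set() is used here only so the
// example is self-contained.
ini_set('default_charset', 'UTF-8');

// "café" written as explicit UTF-8 bytes so the example does not depend
// on the encoding of this source file.
$raw = "<b>caf\xC3\xA9</b>";

// No third argument: the encoding falls back to default_charset.
$escaped = htmlspecialchars($raw, ENT_QUOTES);
```

The markup characters are escaped while the UTF-8 bytes of "é" pass through unchanged.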
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Hi Joe,
I forgot to mention that it also helps i18n applications.
For example, preg and sqlite accept only UTF-8 as a multibyte encoding.
Users may write
    if (ini_get('default_charset') !== 'UTF-8') {
        $str = mb_convert_encoding($str, 'UTF-8');
    }
    // preg, sqlite function calls here.
It simplifies things for sure.
I'll add these to the RFC later.
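A slightly fuller sketch of the same idea follows. It assumes the mbstring extension is loaded; to_utf8() is a hypothetical helper, and it passes the source encoding explicitly rather than relying on the configured defaults.

```php
<?php
/**
 * Hypothetical helper: return $str as UTF-8, converting only when the
 * given charset is something other than UTF-8.
 */
function to_utf8($str, $charset)
{
    if (strcasecmp($charset, 'UTF-8') === 0) {
        return $str; // already UTF-8, nothing to do
    }
    return mb_convert_encoding($str, 'UTF-8', $charset);
}

// Fall back to UTF-8 if default_charset is empty (an assumption made
// for this sketch).
$charset = ini_get('default_charset');
if ($charset === false || $charset === '') {
    $charset = 'UTF-8';
}

$subject = to_utf8('abc 123', $charset);

// The /u modifier makes PCRE treat the subject as UTF-8.
$matched = preg_match('/\d+/u', $subject, $m);
```

The comparison is case-insensitive because charset names are often written in lower case.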
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Sorry, I'm a shit. I should have looked at the patch first before
opening my big gob.
I will look at the patch, and join in when I have a clue :)
Cheers
Joe
Hi all,
I would like to propose this RFC for 5.6.
https://wiki.php.net/rfc/default_encoding
This change will not break existing applications. It
tweaks php.ini settings to consolidate the various encoding settings
and make "default_charset" the default.
If you have any comments, please let me know before I start the vote.
Thank you.
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Hi all,
This RFC was accepted, 8 votes to 1.
Thank you!
I'll prepare a complete patch for review.
There is a related RFC:
https://wiki.php.net/rfc/multibyte_char_handling
Comments on this RFC are appreciated.
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net