Yasuo, internals, doc-friends,
I'm working on the 5.6 migration guide, and I'm a little confused
about what state the default encoding RFC actually ended up in after
it was accepted and merged. UPGRADING says:
Changes were made to character set handling in:
- the iconv and mbstring extensions, and
- the htmlentities(), htmlspecialchars() and html_entity_decode()
  functions.

The precedence for these is now:
default_charset < internal/input/output_encoding < (mbstring.* || iconv.*) < function parameter

For example, the easiest way to use the UTF-8 encoding is to set
default_charset=UTF-8 and leave the following php.ini parameters
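
One way to read the precedence chain quoted above is "the most specific setting that is present wins". The sketch below is only an illustration of that reading: resolve_charset() is a hypothetical helper, not a PHP function, and its four arguments stand in for the four levels of the chain.

```php
<?php
// Hypothetical helper (not part of PHP): returns the first non-empty value,
// checked from highest to lowest precedence as listed in UPGRADING.
function resolve_charset($param, $extension, $encoding, $defaultCharset)
{
    foreach (array($param, $extension, $encoding, $defaultCharset) as $candidate) {
        if ($candidate !== null && $candidate !== '') {
            return $candidate;
        }
    }
    return null; // nothing configured at any level
}

// Only default_charset is set, so it should be used.
var_dump(resolve_charset(null, null, null, 'cp1252'));    // string(6) "cp1252"
// An explicit function parameter overrides everything else.
var_dump(resolve_charset('UTF-8', null, null, 'cp1252')); // string(5) "UTF-8"
```

Under this reading, a call without an encoding parameter and with no other encoding settings should fall through to default_charset.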
The way this reads to me, assuming I don't have any other encoding
settings set, I should be able to set default_charset to (for
example) "cp1252" and get Windows-1252 handling as the default in
htmlentities(), htmlspecialchars() and html_entity_decode() if I don't
specify the encoding parameter. In practice, though, that doesn't seem
to be the case. I created this script:
<?php
ini_set('default_charset', 'cp1252');
var_dump(htmlentities("\xA3", ENT_HTML5));
var_dump(htmlentities("\xA3", ENT_HTML5, 'cp1252'));
var_dump(html_entity_decode("&pound;", ENT_HTML5));
var_dump(html_entity_decode("&pound;", ENT_HTML5, 'cp1252'));
?>
And ran it using the -n option, so that anything I had set in my
php.ini would be ignored. I expected each pair of htmlentities() and
html_entity_decode() calls to return the same string (ignore the
unknown character glyphs; my terminal is UTF-8, so I expect
Windows-1252 output to be broken):
string(7) "&pound;"
string(7) "&pound;"
string(1) "�"
string(1) "�"
But instead, I got this output, suggesting that the calls without
explicit charset parameters were treated as UTF-8:
string(0) ""
string(7) "&pound;"
string(2) "£"
string(1) "�"
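
The observed output is consistent with the two calls that omit the charset being handled as UTF-8. A minimal sketch that passes the charset explicitly, so it does not depend on default_charset at all (the variable names are illustrative only):

```php
<?php
$cp1252Pound = "\xA3";     // the pound sign as a single Windows-1252 byte
$utf8Pound   = "\xC2\xA3"; // the pound sign as two UTF-8 bytes

// A lone 0xA3 byte is not valid UTF-8, and htmlentities() returns an empty
// string for invalid input unless ENT_IGNORE or ENT_SUBSTITUTE is given.
var_dump(htmlentities($cp1252Pound, ENT_HTML5, 'UTF-8') === '');
// Read as Windows-1252, the same byte is encoded as the named entity.
var_dump(htmlentities($cp1252Pound, ENT_HTML5, 'cp1252') === '&pound;');
// Decoding yields two bytes as UTF-8 and one byte as Windows-1252.
var_dump(html_entity_decode('&pound;', ENT_HTML5, 'UTF-8') === $utf8Pound);
var_dump(html_entity_decode('&pound;', ENT_HTML5, 'cp1252') === $cp1252Pound);
```

All four comparisons should print bool(true), matching the string(0), string(7), string(2) and string(1) lines in the output above.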
What am I missing here? Do htmlentities(), htmlspecialchars() and
html_entity_decode() actually respect default_charset? (Also, if I set
internal_encoding, input_encoding and output_encoding, which one
should get used for each?)
Thanks,
Adam
Hi Adam
It looks like I missed some.
I'll commit it soon.
Thank you.
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Hi Adam,
Sorry, I failed to handle it properly.
I've committed a patch. I should work now.
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
I mean it should work.
I shouldn't rely on the 30-second delay.
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Yasuo Ohgaki wrote:
>> string(0) ""
>> string(7) "&pound;"
>> string(2) "£"
>> string(1) "�"
>> What am I missing here? Do htmlentities(), htmlspecialchars() and
>> html_entity_decode() actually respect default_charset? (Also, if I set
>> internal_encoding, input_encoding and output_encoding, which one
>> should get used for each?)
> It looks like I missed some.
> I'll commit it soon.
The very character I had great fun with until UTF-8 became the norm :)
--
Lester Caine - G8HFL
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk