Hello!
I am trying to understand how PHP 6's Unicode handling works; I think I am
missing something.
My config is:
;;;;;;;;;;;;;;;;;;;;
; Unicode settings ;
;;;;;;;;;;;;;;;;;;;;
unicode.semantics = on
unicode.runtime_encoding = iso-8859-1
unicode.script_encoding = iso-8859-1
unicode.output_encoding = utf-8
unicode.from_error_mode = U_INVALID_SUBSTITUTE
unicode.from_error_subst_char = 3f
unicode.fallback_encoding = iso-8859-1
I use a MySQL database with ISO-8859-1 (Portuguese, Latin-1) text that
contains accented characters.
What I was trying to understand is this: because unicode.runtime_encoding =
iso-8859-1, I thought that all internal operations were done in this
encoding, and that data would be converted to UTF-8 only on output
(unicode.output_encoding = utf-8). So, as I saw it, I run a MySQL query over
a latin1 connection, the data arrives in my variables as ISO-8859-1, I use
it, and only when I echo it does it become UTF-8, through something like an
ISO-8859-1-to-UTF-8 conversion function.
But when I query any record that has accented characters, I get this
warning (using mysql_fetch_assoc):
Could not convert binary string to Unicode string (converter UTF-8 failed on bytes (0xE7) at offset 9)
for every accented character in every field.
What seems strange to me, given my understanding above, is that
mysql_fetch_assoc gives this error even before I access the field values.
If I change the SET NAMES query to:
mysql_query('set names utf8', $this->mysql_link);
then it works, but I would like to understand how this works, so that I
write my program the right way from the start.
Did I misunderstand something?
Thanks,
Rangel
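The warning can be reproduced at the byte level. This is a minimal Python sketch, not PHP 6 code, and it assumes a hypothetical field value "Informação" stored as ISO-8859-1 (not the poster's real data). In ISO-8859-1, "ç" is the single byte 0xE7; in UTF-8 that byte can only start a three-byte sequence, so a UTF-8 decoder rejects it as soon as the next byte is not a continuation byte.

```python
# Byte-level illustration of "converter UTF-8 failed on bytes (0xE7)".
# Assumption: a hypothetical Latin-1 field value, not real data from the thread.

latin1_bytes = "Informação".encode("iso-8859-1")
# 'ç' -> 0xE7 and 'ã' -> 0xE3, one byte each in ISO-8859-1
print(latin1_bytes)  # b'Informa\xe7\xe3o'

try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as err:
    # 0xE7 announces a 3-byte UTF-8 sequence, but 0xE3 is not a
    # continuation byte (10xxxxxx), so decoding fails at that offset.
    print(hex(latin1_bytes[err.start]), "at offset", err.start)  # 0xe7 at offset 7

# Decoding with the encoding the bytes were actually written in succeeds.
print(latin1_bytes.decode("iso-8859-1"))  # Informação
```

The offset differs from the 9 in the real warning only because the sample value is made up; the failing byte is the same.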
PHP 6 always handles all "internal work" in UTF-8.
But, as far as I remember, there should be a way to specify the encoding in
which you expect data to arrive (in this example, from the mysql_* functions).
I think someone who knows more will give you more details.
--
Alexey Zakhlestin
http://blog.milkfarmsoft.com/
No, it is UTF-16 internally.
Derick
--
Derick Rethans
http://derickrethans.nl | http://ez.no | http://xdebug.org
Hmm, OK, so what does the parameter "unicode.runtime_encoding = iso-8859-1"
do?
Does this mean that, when "unicode.semantics = on", everything is UTF-16
internally, even text read from ISO-8859-1 script files and database data,
so that if I want to return ISO-8859-1 data I need to set
"unicode.output_encoding = iso-8859-1" and PHP will convert all echoed data
from UTF-16? Is that it?
If so, can I change unicode.output_encoding at runtime, with ini_set?
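The model described here, combined with Derick's statement that strings are UTF-16 internally, amounts to: decode on input, keep UTF-16 in the middle, encode on output. The Python below is a sketch of that model only, not PHP 6's implementation; `to_internal` and `to_output` are hypothetical names, little-endian UTF-16 is an arbitrary choice for the sketch, and which PHP 6 setting governs each input step is exactly what this thread is asking.

```python
# Hypothetical model: bytes in -> UTF-16 internally -> bytes out.

def to_internal(raw: bytes, source_encoding: str) -> bytes:
    """Decode incoming bytes and keep them as UTF-16 (little-endian, no BOM)."""
    return raw.decode(source_encoding).encode("utf-16-le")

def to_output(internal: bytes, output_encoding: str) -> bytes:
    """Convert the UTF-16 internal form to the configured output encoding."""
    return internal.decode("utf-16-le").encode(output_encoding)

db_value = b"a\xe7\xe3o"                 # "ação" as ISO-8859-1 bytes
internal = to_internal(db_value, "iso-8859-1")
print(internal.hex(" "))                 # 61 00 e7 00 e3 00 6f 00

print(to_output(internal, "utf-8"))       # b'a\xc3\xa7\xc3\xa3o'
print(to_output(internal, "iso-8859-1"))  # b'a\xe7\xe3o' (original bytes back)
```

Under this model, output_encoding only affects the last step, which matches the reading in the question above; the input step still has to know the real source encoding.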
----- Original Message -----
From: "Derick Rethans" derick@php.net
To: "Alexey Zakhlestin" indeyets@gmail.com
Cc: "Rangel Reale" listas@rangelreale.com; internals@lists.php.net
Sent: Thursday, May 03, 2007 5:58 AM
Subject: Re: [PHP-DEV] Trying to understand PHP6's unicode support
php6 always handles all "internal work" in utf-8
No, it is UTF-16 internally.
Derick
--
Derick Rethans
http://derickrethans.nl | http://ez.no | http://xdebug.org
from: http://www.php.net/manual/nl/ini.php
unicode.output_encoding : PHP_INI_ALL
So yes, it can be set with ini_set.
Tijnema
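Since unicode.output_encoding is PHP_INI_ALL, it can be changed mid-script. As a loose analogy only (Python, not PHP 6), `io.TextIOWrapper.reconfigure()` switches a text stream's encoding after writes have already happened. The sketch assumes, without confirmation from this thread, that PHP 6 likewise applies a new output_encoding only to output produced after the ini_set() call.

```python
import io

# Stand-in for the response body: text is encoded into this byte buffer.
raw = io.BytesIO()
out = io.TextIOWrapper(raw, encoding="utf-8")

out.write("ç")           # encoded with the initial output encoding (UTF-8)
out.flush()

# Analogous (by assumption) to ini_set('unicode.output_encoding', 'iso-8859-1').
out.reconfigure(encoding="iso-8859-1")
out.write("ç")           # encoded with the new encoding
out.flush()

print(raw.getvalue())    # b'\xc3\xa7\xe7'
```

Output already sent keeps its old encoding, so switching mid-page would mix encodings in one response.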
IIRC runtime_encoding is used to translate Unicode strings (UTF-16) to
plain strings if some internal function accepts only plain strings.
Stanislav Malyshev, Zend Products Engineer
stas@zend.com http://www.zend.com/
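The role described for unicode.runtime_encoding can be sketched as an encode step placed in front of any function that only accepts plain (byte) strings. The Python below is illustrative only. `legacy_byte_length` is a hypothetical plain-string-only function. It is also an assumption that the configured unicode.from_error_mode = U_INVALID_SUBSTITUTE with from_error_subst_char = 3f applies to this conversion; 0x3F is "?", which is also what Python's `errors="replace"` substitutes when encoding.

```python
RUNTIME_ENCODING = "iso-8859-1"   # unicode.runtime_encoding from the config above

def to_plain_string(text: str) -> bytes:
    # Characters with no ISO-8859-1 byte become b"?" (0x3F), mirroring the
    # assumed U_INVALID_SUBSTITUTE / 3f settings.
    return text.encode(RUNTIME_ENCODING, errors="replace")

def legacy_byte_length(data: bytes) -> int:
    """Hypothetical internal function that only understands plain strings."""
    return len(data)

def call_legacy(text: str) -> int:
    # The Unicode value is converted just before the call; it is not stored
    # in the runtime encoding.
    return legacy_byte_length(to_plain_string(text))

print(to_plain_string("ação"))   # b'a\xe7\xe3o'
print(to_plain_string("5 €"))    # b'5 ?' (no euro sign in ISO-8859-1)
print(call_legacy("ação"))       # 4
```

In this reading, runtime_encoding does not affect how database results are decoded, which would explain why setting it to iso-8859-1 did not prevent the warning.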
I think you are confusing PHP features with MySQL features. By running a
'SET NAMES' query you set the client, connection and result character sets
in MySQL (see http://dev.mysql.com/doc/refman/4.1/en/charset.html). It is
not related to PHP. MySQL has had internal character set support since
version 4.1.
--
Tomas
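The MySQL side can be modelled as the server re-encoding stored text into the connection's character_set_results (one of the settings SET NAMES changes) before sending it. This Python sketch is a simplified model, not the MySQL client protocol. `server_send` is hypothetical, MySQL's latin1 is mapped to cp1252 (MySQL documents its latin1 as Windows cp1252), and MySQL's utf8 is mapped to UTF-8, which gives the same bytes for Portuguese text. The UTF-8-only client mirrors what the warning in the first message suggests, not documented PHP 6 behavior.

```python
# Simplified model of SET NAMES: the server re-encodes results into the
# connection's character_set_results before sending them to the client.
MYSQL_TO_PY = {"latin1": "cp1252", "utf8": "utf-8"}

def server_send(stored_text: str, character_set_results: str) -> bytes:
    """Hypothetical: bytes the client receives for one text column."""
    return stored_text.encode(MYSQL_TO_PY[character_set_results])

value = "Informação"

for names in ("latin1", "utf8"):
    wire = server_send(value, names)
    try:
        # A client that always decodes results as UTF-8.
        print(names, "->", wire.decode("utf-8"))
    except UnicodeDecodeError as err:
        print(names, "-> UTF-8 decode failed on byte", hex(wire[err.start]))
```

This is why `set names utf8` made the warning disappear: the fix happens on the MySQL side, while the PHP side must still decode with whatever encoding MySQL actually sent.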
But PHP 6 still needs to get the data in a "proper" encoding, so it
matters. Please see http://news.php.net/php.i18n/1062 through posting 1066
for another discussion started by Rangel Reale about the same issue.
johannes