Newsgroups: php.internals
Message-ID: <006c01c78d26$beda26f0$0301a8c0@rangeldc>
Date: Wed, 2 May 2007 22:59:52 -0300
Subject: Trying to understand PHP6's unicode support
From: listas@rangelreale.com ("Rangel Reale")

Hello!

I am trying to understand how PHP6's handling of Unicode works, and I think I am missing something. My config is:

    ;;;;;;;;;;;;;;;;;;;;
    ; Unicode settings ;
    ;;;;;;;;;;;;;;;;;;;;
    unicode.semantics = on
    unicode.runtime_encoding = iso-8859-1
    unicode.script_encoding = iso-8859-1
    unicode.output_encoding = utf-8
    unicode.from_error_mode = U_INVALID_SUBSTITUTE
    unicode.from_error_subst_char = 3f
    unicode.fallback_encoding = iso-8859-1

I use a MySQL database containing iso-8859-1 (Portuguese, latin1) text with accented characters.

Because unicode.runtime_encoding = iso-8859-1, I thought that all internal operations were done in this encoding, and that data would be converted to utf-8 only on output (unicode.output_encoding = utf-8). In other words, I expected this flow: I run a MySQL query over a latin1 connection, the data arrives in my variables as iso-8859-1, I work with it, and only when I echo it is it converted to utf-8 by something like an iso-8859-1-to-utf-8 function.

But when I query any record that has accented characters, I get this warning (using mysql_fetch_assoc):

----------
Could not convert binary string to Unicode string (converter UTF-8 failed on bytes (0xE7) at offset 9)
----------

It appears for every accented character in every field.

What is strange to me is that mysql_fetch_assoc raises this error before I have even accessed the field values, whereas from my understanding above I expected no conversion until output.

If I change the "set names" query to:

    mysql_query('set names utf8', $this->mysql_link);

then it works, but I would like to understand how this works, so that I write my program the right way from the start.

Did I misunderstand something?

Thanks,
Rangel
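
To make the byte-level side of the warning concrete, here is a minimal sketch of why a byte such as 0xE7 is rejected by a UTF-8 converter but converts cleanly once the source encoding is declared as iso-8859-1. It is an illustration only: it assumes the mbstring extension from released PHP versions, it does not use or model the PHP6 unicode.* settings, and the sample string and variable names are hypothetical.

```php
<?php
// Sketch only: uses the mbstring extension from released PHP versions,
// not the PHP6 unicode.* machinery described in the message.

// Hypothetical sample value: "coração" as ISO-8859-1 bytes,
// the kind of data a latin1 MySQL connection would return.
$latin1 = "cora\xE7\xE3o";

// Read as UTF-8, 0xE7 opens a three-byte sequence, but the next byte (0xE3)
// is not a continuation byte, so the string is not valid UTF-8.
$isValidUtf8 = mb_check_encoding($latin1, 'UTF-8');

// Declaring the real source encoding makes the conversion succeed.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

var_dump($isValidUtf8);     // bool(false)
echo bin2hex($utf8), "\n";  // 636f7261c3a7c3a36f
```

In the output, each accented character becomes two bytes (ç is c3 a7, ã is c3 a3). That fits what the warning text itself reports: the fetched bytes were handed to a UTF-8 converter. It would also fit the observation that "set names utf8" makes the warning disappear, if the server then sends the data already encoded as utf-8.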