Newsgroups: php.internals
From: andrei@gravitonic.com (Andrei Zmievski)
To: Rob Richards
Cc: "internals@lists.php.net"
Date: Tue, 18 Jul 2006 10:41:17 -0700
Subject: Re: [PHP-DEV] unicode and xml extensions

Rob,

I have not tested the patch, but it looks good to me on a cursory read. I assume it passes your tests?

My only comment concerns the use of the 't' and 'T' specifiers. Since you always have to pass binary UTF-8 strings to libxml, we should always use the 's' specifier and let PHP downconvert Unicode strings according to the runtime encoding (which you set to UTF-8).
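With the runtime encoding set to UTF-8, the downconversion that the 's' specifier relies on amounts to transcoding the engine's UTF-16 string into UTF-8 bytes before libxml2 sees them. The engine does this with its own converters; the sketch below is only a hand-rolled, standalone illustration of that UTF-16 to UTF-8 step, and the helper name `utf16_to_utf8` is hypothetical, not a PHP or libxml2 API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a PHP API): transcode UTF-16 code units to UTF-8,
 * roughly what downconversion to a UTF-8 runtime encoding has to do.
 * Writes a NUL-terminated result into out and returns its length in bytes,
 * or (size_t)-1 on an unpaired surrogate or if out is too small
 * (out's contents are then unspecified). */
static size_t utf16_to_utf8(const uint16_t *in, size_t in_len,
                            unsigned char *out, size_t out_cap)
{
    size_t o = 0;

    assert(out != NULL && (in != NULL || in_len == 0));
    if (out_cap == 0)
        return (size_t)-1;

    for (size_t i = 0; i < in_len; i++) {
        uint32_t cp = in[i];

        if (cp >= 0xD800 && cp <= 0xDBFF) {            /* high surrogate */
            if (i + 1 >= in_len || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                return (size_t)-1;                     /* unpaired */
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(in[i + 1] - 0xDC00);
            i++;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {     /* stray low surrogate */
            return (size_t)-1;
        }

        /* Bytes needed for this code point; the +1 reserves room for the NUL. */
        size_t n = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (o + n + 1 > out_cap)
            return (size_t)-1;

        switch (n) {
        case 1:
            out[o++] = (unsigned char)cp;
            break;
        case 2:
            out[o++] = (unsigned char)(0xC0 | (cp >> 6));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
            break;
        case 3:
            out[o++] = (unsigned char)(0xE0 | (cp >> 12));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
            break;
        default:
            out[o++] = (unsigned char)(0xF0 | (cp >> 18));
            out[o++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
            break;
        }
    }
    out[o] = '\0';
    return o;
}
```

For example, the UTF-16 units for "é€" (0x00E9, 0x20AC) come out as the five UTF-8 bytes C3 A9 E2 82 AC, which is the form libxml2 expects.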
-Andrei

On Jul 17, 2006, at 2:57 PM, Rob Richards wrote:

> Attached is a patch with my initial cut at unicode support for XML
> (made against the /ext directory).
> I started with XMLReader since it was the smallest.
> The code can probably be optimized a bit, but I want to make sure this
> is the right approach, because the changes made here will be the changes
> needed for the rest of the XML-based extensions (simplexml, xsl,
> xmlwriter, and xml to a point).
>
> It includes the following:
>   Macros defined in php_libxml.h (names can be changed if anyone has
>   a problem with them):
>     ZVAL_XML_STRING(z, s, flags)
>     RETVAL_XML_STRING(s, flags)
>   These take the UTF-8 output from libxml2 functions and return the
>   correct string type (UTF-16 when running in unicode mode, UTF-8
>   when not).
>
>   XMLReader:
>   To maintain BC with PHP 5, it accepts both unicode and binary
>   strings (UTF-8, as in PHP 5) as parameters. The parameters can be
>   mixed (some unicode and some binary); each string is converted to
>   UTF-8 as needed so that it works with libxml2.
>
>   So that only one hash table is needed for properties, the following
>   is used in MINIT:
>     zend_u_hash_init(&xmlreader_prop_handlers, 0, NULL, NULL, 1,
>         (zend_bool)zend_ini_long("unicode.semantics",
>         sizeof("unicode.semantics"), 1));
>
> Tests have been updated for unicode mode.
>
> Let me know if anyone sees any problems with these changes.
>
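The macro bodies are not included in the message, so the following is only a standalone sketch of the behaviour described for ZVAL_XML_STRING / RETVAL_XML_STRING: take the NUL-terminated UTF-8 buffer that libxml2 returns and, depending on a mode switch, either produce UTF-16 code units (unicode mode) or pass the UTF-8 bytes through unchanged (PHP 5 behaviour). The names `xml_result`, `xml_string_result`, `utf8_next`, `xml_result_free` and the `unicode_mode` flag are all hypothetical; the real macros fill in a zval, and the meaning of their `flags` argument is not described here, so it is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a PHP string value (not the real zval). */
typedef struct {
    int is_unicode;        /* 1: u16 holds UTF-16; 0: bytes holds UTF-8 */
    unsigned char *bytes;
    uint16_t *u16;
    size_t len;            /* bytes or code units, excluding the terminator */
} xml_result;

/* Decode one UTF-8 sequence at s[*i]; returns the code point or -1 if invalid. */
static int32_t utf8_next(const unsigned char *s, size_t n, size_t *i)
{
    unsigned char c = s[*i];
    int extra;
    int32_t cp;

    if (c < 0x80)                   { cp = c;        extra = 0; }
    else if (c >= 0xC2 && c < 0xE0) { cp = c & 0x1F; extra = 1; }
    else if (c >= 0xE0 && c < 0xF0) { cp = c & 0x0F; extra = 2; }
    else if (c >= 0xF0 && c < 0xF5) { cp = c & 0x07; extra = 3; }
    else return -1;

    if (*i + (size_t)extra >= n)    /* truncated sequence */
        return -1;
    for (int k = 1; k <= extra; k++) {
        unsigned char cc = s[*i + (size_t)k];
        if ((cc & 0xC0) != 0x80)
            return -1;
        cp = (cp << 6) | (cc & 0x3F);
    }
    /* Reject overlong forms, surrogate code points and values above U+10FFFF. */
    if ((extra == 2 && cp < 0x800) || (extra == 3 && cp < 0x10000) ||
        (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
        return -1;
    *i += (size_t)extra + 1;
    return cp;
}

/* Hypothetical analogue of the ZVAL_XML_STRING idea. Returns 0 on success,
 * -1 on invalid UTF-8 or allocation failure. */
static int xml_string_result(const char *utf8, int unicode_mode, xml_result *out)
{
    assert(utf8 != NULL && out != NULL);
    const unsigned char *s = (const unsigned char *)utf8;
    size_t n = strlen(utf8);
    memset(out, 0, sizeof *out);

    if (!unicode_mode) {            /* binary mode: copy UTF-8 as-is */
        out->bytes = malloc(n + 1);
        if (!out->bytes)
            return -1;
        memcpy(out->bytes, s, n + 1);
        out->len = n;
        return 0;
    }

    /* UTF-16 never needs more code units than the UTF-8 input has bytes. */
    out->u16 = malloc((n + 1) * sizeof(uint16_t));
    if (!out->u16)
        return -1;
    size_t i = 0, o = 0;
    while (i < n) {
        int32_t cp = utf8_next(s, n, &i);
        if (cp < 0) {
            free(out->u16);
            out->u16 = NULL;
            return -1;
        }
        if (cp >= 0x10000) {        /* encode as a surrogate pair */
            cp -= 0x10000;
            out->u16[o++] = (uint16_t)(0xD800 + (cp >> 10));
            out->u16[o++] = (uint16_t)(0xDC00 + (cp & 0x3FF));
        } else {
            out->u16[o++] = (uint16_t)cp;
        }
    }
    out->u16[o] = 0;
    out->is_unicode = 1;
    out->len = o;
    return 0;
}

static void xml_result_free(xml_result *r)
{
    free(r->bytes);
    free(r->u16);
    memset(r, 0, sizeof *r);
}
```

Keeping the non-unicode branch as a plain byte copy mirrors the BC goal in the message: code running without unicode semantics sees exactly the UTF-8 it got in PHP 5.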