Newsgroups: php.internals
Date: Thu, 17 Feb 2005 09:43:07 -0500
From: rrichards@ctindustries.net (Rob Richards)
To: Joe Orton
CC: internals@lists.php.net
Subject: Re: [PHP-DEV] [PATCH] ext/xml/compat.c fix for #32001
Message-ID: <4214AD7B.70209@ctindustries.net>
In-Reply-To: <20050217132838.GB28565@redhat.com>
References: <20050217112613.GA30445@redhat.com> <421482B3.20504@bitflux.ch> <42148BE2.5000406@ctindustries.net> <20050217132838.GB28565@redhat.com>

Right, as long as you explicitly include the encoding.
So with the patch, if you include a BOM the document won't parse (the previous behavior worked fine), and if you include a prolog without an explicit encoding it will always be treated as UTF-8 (the previous behavior was to autodetect the encoding from the byte pattern of the first few characters of the prolog).

Would it be possible to #ifdef it, then, and for older libxml versions (this is only needed when relying on encoding autodetection) check whether xmlDetectCharEncoding() can be called before the data is handed to the parser? If it detects no encoding, the charset could be set to UTF-8 to avoid the infinite loop. This would preserve BC for anyone using a prolog with no explicit encoding, or a BOM.

Rob

Joe Orton wrote:
>On Thu, Feb 17, 2005 at 07:19:46AM -0500, Rob Richards wrote:
>
>>It looks like there would be BC breaks unless libxml with the bug fix is
>>used as the encoding is detected properly and no infinite loop if an xml
>>declaration or BOM is used in the xml. So basically with the patch there
>>is no more autodetecting if used with any other libxml versions (though
>>no more possibilities of infinite loops).
>
>That's not quite right: detection based on an ASCII explicit encoding=
>still works fine with the patch applied (e.g. for
>encoding=ISO-8859-1 documents). It's *only* documents which have a BOM
>which will then fail to parse.
>
>So it is a bit of a tricky trade-off...
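The fallback Rob proposes (detect the encoding from the first bytes before parsing, and use UTF-8 if nothing is detected) can be sketched in plain C as below. This is a hypothetical illustration, not code from ext/xml/compat.c or libxml2: the helper names guess_xml_charset() and charset_for_parser() are invented here, and the detector recognizes only a simplified subset of the patterns libxml2's xmlDetectCharEncoding() checks (no UCS-4 or EBCDIC). The #ifdef guard on the libxml version is omitted because the thread does not say which libxml release contains the fix.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not PHP or libxml2 code): guess the document's
 * encoding from its first bytes. Recognizes byte order marks and a
 * BOM-less UTF-16 "<?" prolog, a simplified subset of what libxml2's
 * xmlDetectCharEncoding() checks. Returns NULL when nothing is detected. */
const char *guess_xml_charset(const unsigned char *in, size_t len)
{
    assert(in != NULL || len == 0);

    if (len >= 4) {
        /* BOM-less UTF-16: "<?" encoded as two 16-bit code units */
        if (memcmp(in, "\x00\x3C\x00\x3F", 4) == 0)
            return "UTF-16BE";
        if (memcmp(in, "\x3C\x00\x3F\x00", 4) == 0)
            return "UTF-16LE";
    }
    if (len >= 3 && memcmp(in, "\xEF\xBB\xBF", 3) == 0)
        return "UTF-8";          /* UTF-8 byte order mark */
    if (len >= 2) {
        if (memcmp(in, "\xFE\xFF", 2) == 0)
            return "UTF-16BE";   /* big-endian UTF-16 BOM */
        if (memcmp(in, "\xFF\xFE", 2) == 0)
            return "UTF-16LE";   /* little-endian UTF-16 BOM */
    }
    return NULL;                 /* no encoding detected */
}

/* Hypothetical wrapper for the proposed fallback: use the detected charset
 * if there is one, otherwise default to UTF-8 so the parser is never left
 * without an encoding (the case the thread links to the infinite loop). */
const char *charset_for_parser(const unsigned char *in, size_t len)
{
    const char *cs = guess_xml_charset(in, len);
    return cs != NULL ? cs : "UTF-8";
}
```

With this approach, a document starting with a UTF-16 BOM would still be handed to the parser as UTF-16, while an ASCII prolog with no encoding= attribute would get the UTF-8 default. Documents that declare encoding= explicitly would presumably continue to be handled by the parser itself, as Joe notes.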