Newsgroups: php.internals
From: rrichards@ctindustries.net (Rob Richards)
To: Joe Orton
CC: internals@lists.php.net
Date: Thu, 24 Feb 2005 15:49:13 -0500
Subject: Re: [PHP-DEV] [PATCH] ext/xml/compat.c fix for #32001
Message-ID: <421E3DC9.7000707@ctindustries.net>
In-Reply-To: <20050217132838.GB28565@redhat.com>

What about this patch? It is really hacky, as the charset is checked before the initial parse and it basically duplicates the libxml code with the correct fix, but it seems to work OK.
I haven't tried it yet with any large datasets that require multiple parse calls, but it should work.

Rob

Joe Orton wrote:

>That's not quite right: detection based on an ASCII explicit encoding=
>still works fine with the patch applied (e.g. for encoding=ISO-8859-1
>documents). It's *only* documents which have a BOM which will then fail
>to parse.
>
>So it is a bit of a tricky trade-off...

Attachment: compat.c.diff.txt

Index: compat.c
===================================================================
RCS file: /repository/php-src/ext/xml/compat.c,v
retrieving revision 1.40
diff -r1.40 compat.c
480a481,497
> 
> /* The following function is a hack to keep BC while avoiding 
> the infinite loop in libxml < 2.6.18 which occurs when no encoding 
> has been defined and none can be detected */
> #if LIBXML_VERSION < 20618
> 	if (parser->parser->instate == XML_PARSER_START && 
> 		parser->parser->charset == XML_CHAR_ENCODING_NONE && data_len >= 4) {
> 		xmlChar start[4];
> 
> 		start[0] = *data;
> 		start[1] = data[1];
> 		start[2] = data[2];
> 		start[3] = data[3];
> 		xmlSwitchEncoding(parser->parser, xmlDetectCharEncoding(&start[0], 4));
> 	}
> #endif
> 
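
For anyone reading the patch without libxml at hand, here is a rough standalone sketch of the kind of check the first four bytes go through. It is only an illustration of the idea, not libxml's code: the names sniff_result, sniff_encoding and should_sniff are made up for this example, the guard only mirrors the three conditions in the patch's if statement, and 4-byte encodings (UCS-4) and EBCDIC are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes for this sketch; NOT libxml's xmlCharEncoding. */
typedef enum {
    SNIFF_NONE,         /* nothing recognizable in the first four bytes */
    SNIFF_UTF8_BOM,     /* EF BB BF */
    SNIFF_UTF16LE,      /* FF FE BOM, or "<?" as 3C 00 3F 00 */
    SNIFF_UTF16BE,      /* FE FF BOM, or "<?" as 00 3C 00 3F */
    SNIFF_ASCII_COMPAT  /* "<?xm": read the encoding= declaration next */
} sniff_result;

/* Mirrors the patch's guard: only sniff on the very first chunk, when no
 * charset has been set yet, and when at least four bytes are available. */
static int should_sniff(int at_start, int charset_known, size_t len)
{
    return at_start && !charset_known && len >= 4;
}

/* Simplified detection from the first four bytes of a document.
 * UCS-4 and EBCDIC are not handled in this sketch. */
static sniff_result sniff_encoding(const unsigned char *data, size_t len)
{
    assert(data != NULL); /* precondition: caller passes a real buffer */

    if (len < 4)
        return SNIFF_NONE;

    /* Byte order marks first. */
    if (data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return SNIFF_UTF8_BOM;
    if (data[0] == 0xFF && data[1] == 0xFE)
        return SNIFF_UTF16LE;
    if (data[0] == 0xFE && data[1] == 0xFF)
        return SNIFF_UTF16BE;

    /* No BOM: look at how "<?" is laid out. */
    if (data[0] == 0x3C && data[1] == 0x00 && data[2] == 0x3F && data[3] == 0x00)
        return SNIFF_UTF16LE;
    if (data[0] == 0x00 && data[1] == 0x3C && data[2] == 0x00 && data[3] == 0x3F)
        return SNIFF_UTF16BE;
    if (data[0] == 0x3C && data[1] == 0x3F && data[2] == 0x78 && data[3] == 0x6D)
        return SNIFF_ASCII_COMPAT;

    return SNIFF_NONE;
}
```

The BOM cases are the ones Joe's note is about: a document starting with EF BB BF or FF FE carries no ASCII encoding= hint in its first bytes, so it can only be recognized by a check of this kind on the first chunk.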