Newsgroups: php.internals
Date: Thu, 17 Feb 2005 11:26:13 +0000
From: jorton@redhat.com (Joe Orton)
To: internals@lists.php.net
Subject: [PATCH] ext/xml/compat.c fix for #32001
Message-ID: <20050217112613.GA30445@redhat.com>

libxml2's charset encoding auto-detection mode is broken with the push
parser in current versions of libxml2, I found
that recently:

http://bugzilla.gnome.org/show_bug.cgi?id=162613

but trying to force it can trigger infinite loops in libxml2, which is
what happens in

http://bugs.php.net/?id=32001

So I think it's best to not force this mode. Future versions of libxml2
will set parser->charset to XML_CHAR_ENCODING_NONE by default with the
push parser and will hence work as desired with no explicit setting of
parser->charset required.

Is this patch OK?

http://www.apache.org/~jorton/php_xmlenc.diff

Index: ext/xml/compat.c
===================================================================
RCS file: /repository/php-src/ext/xml/compat.c,v
retrieving revision 1.32.2.7
diff -u -r1.32.2.7 compat.c
--- ext/xml/compat.c	17 Dec 2004 12:21:34 -0000	1.32.2.7
+++ ext/xml/compat.c	17 Feb 2005 11:12:08 -0000
@@ -379,8 +379,6 @@
 	}
 	if (encoding != NULL) {
 		parser->parser->encoding = xmlStrdup(encoding);
-	} else {
-		parser->parser->charset = XML_CHAR_ENCODING_NONE;
 	}
 	parser->parser->replaceEntities = 1;
 	parser->parser->wellFormed = 0;
Index: ext/xml/tests/bug32001.phpt
===================================================================
RCS file: ext/xml/tests/bug32001.phpt
diff -N ext/xml/tests/bug32001.phpt
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ ext/xml/tests/bug32001.phpt	17 Feb 2005 11:12:08 -0000
@@ -0,0 +1,40 @@
+--TEST--
+Bug #32001 (infinite loop in libxml character encoding detection)
+--FILE--
+simple note";
+xml_parse_into_struct($myparser, $simple, $myvals, $mytags);
+var_dump($myvals);
+--EXPECT--
+array(3) {
+  [0]=>
+  array(3) {
+    ["tag"]=>
+    string(4) "PARA"
+    ["type"]=>
+    string(4) "open"
+    ["level"]=>
+    int(1)
+  }
+  [1]=>
+  array(4) {
+    ["tag"]=>
+    string(4) "NOTE"
+    ["type"]=>
+    string(8) "complete"
+    ["level"]=>
+    int(2)
+    ["value"]=>
+    string(11) "simple note"
+  }
+  [2]=>
+  array(3) {
+    ["tag"]=>
+    string(4) "PARA"
+    ["type"]=>
+    string(5) "close"
+    ["level"]=>
+    int(1)
+  }
+}