Newsgroups: php.internals
From: rowan.collins@gmail.com (Rowan Collins)
Date: Thu, 30 Jul 2015 22:53:02 +0100
To: internals@lists.php.net
Subject: Re: [PHP-DEV] Disabling External Entities in libxml By Default
Message-ID: <0C5C85D4-1B7F-4533-9DB0-6FBC42AA048D@gmail.com>
In-Reply-To: <55BA8A75.30208@cdatazone.org>
References: <55B94D57.4070509@gmail.com> <55BA22C5.2020104@cdatazone.org> <55BA34EF.6030209@gmail.com> <55BA8A75.30208@cdatazone.org>

On 30 July 2015 21:35:01 BST, Rob Richards wrote:
>On 7/30/15 10:30 AM, Rowan Collins wrote:
>> Rob Richards wrote on 30/07/2015 14:12:
>>> If you are already working with a trusted document then you should
>>> safely be able to disable the entity loader. If you aren't then
>>> wouldn't you want to do some sort of checking (especially if you don't
>>> have an XML gateway fronting the system) for other malicious things
>>> before even opening the document regardless if it has external
>>> entities or not.
>>
>> Can you give any pointers to what kind of checking this would be, and
>> how it would be carried out without parsing the XML document in the
>> first place?
>>
>> According to the bug report, one of the affected uses is the
>> SoapClient, which by definition is dealing with remote data. I can see
>> how that could be considered "untrusted", but I can't think of any
>> particular action that would make it more trusted (quite apart from
>> the lack of an obvious point to intercept the data before it is parsed).
>>
>> Would it not make more sense for the parser to operate in an
>> "untrusted" mode - disabling external entities, maybe different limits
>> on stack depth, etc?
>>
>> Regards,
>
>All depends upon what you are trying to accomplish as this covers tree,
>streaming, different types of schemas, xsl, etc...
>For example, you can easily check if there is a DTD, imports/includes,
>specific xslt functionality, list goes on and on without ever having to
>load the document. There really is no one size fits all imo so what one
>considers untrusted someone else would consider trusted.

So effectively we should all write partial XML parsers to determine the
contents of the file, in order to decide if it's the data we expected?

Would it not make more sense to leave that to the XML library, with a
whitelist of features we actually need, URLs we trust for includes, etc?

I never want an XML file to execute system commands on my behalf; do I
have to write a regex to make sure it doesn't?

Regards,
-- 
Rowan Collins
[IMSoP]