Newsgroups: php.internals
Message-ID: <50E95C4E.3060609@sugarcrm.com>
Date: Sun, 06 Jan 2013 03:13:18 -0800
From: smalyshev@sugarcrm.com (Stas Malyshev)
Organization: SugarCRM
To: Adam Jon Richardson
CC: "internals@lists.php.net"
Subject: Re: [PHP-DEV] Providing improved functionality for escaping html (and other) output.

Hi!
> if ($allowed_html) {
>     // cycle through the whitelisted sequences
>     foreach($allowed_html as $sequence) {

What is supposed to be in $allowed_html? If those are simple fixed
strings and such, why can't you just do preg_split with
PREG_SPLIT_DELIM_CAPTURE and encode every other element of the result,
or use PREG_SPLIT_OFFSET_CAPTURE if you need something more
interesting? I would seriously advise against trying to parse HTML
with regexps, though, unless the patterns are very simple, since
browsers will accept a lot of broken HTML and will happily run scripts
in it, etc.

> Bridging the gap between strip_tags and htmlspecialchars seems like a
> reasonable consideration for PHP's core. While I do use HTMLPurifier

I think that, given the level of complexity needed to cover anything
beyond the most primitive cases, you need a full-blown HTML/XML parser
there. We do have those, so if that's what you need, why not use one
of them instead of reinventing it?

--
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227
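A minimal sketch of the preg_split approach suggested above, assuming $allowed_html holds fixed, non-empty strings such as "<b>" and "</b>". The function name escape_except() is hypothetical, and the arrow-function syntax needs PHP 7.4 or later, which is newer than the PHP releases current when this message was written. Because the pattern has exactly one capturing group, PREG_SPLIT_DELIM_CAPTURE yields text at even indexes and the matched whitelisted sequences at odd indexes, so encoding every other element is enough.

```php
<?php
// Hypothetical helper (not a PHP built-in): HTML-encode $text while leaving
// exact occurrences of the fixed, non-empty strings in $allowed untouched.
function escape_except(string $text, array $allowed): string
{
    if ($allowed === []) {
        return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    }

    // Try longer sequences first so that a whitelisted string that is a
    // prefix of another one cannot shadow it in the alternation.
    usort($allowed, fn (string $a, string $b): int => strlen($b) <=> strlen($a));

    // A single capturing group of literal alternatives; preg_quote() escapes
    // any regex metacharacters (including the '/' delimiter).
    $quoted  = array_map(fn (string $s): string => preg_quote($s, '/'), $allowed);
    $pattern = '/(' . implode('|', $quoted) . ')/';

    // With PREG_SPLIT_DELIM_CAPTURE the pieces alternate:
    // [text, captured sequence, text, captured sequence, ..., text].
    // PREG_SPLIT_NO_EMPTY is deliberately not used; it would break that
    // alternation.
    $parts = preg_split($pattern, $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    $out = '';
    foreach ($parts as $i => $part) {
        // Even indexes: ordinary text, so encode it.
        // Odd indexes: a whitelisted sequence, so keep it verbatim.
        $out .= ($i % 2 === 0) ? htmlspecialchars($part, ENT_QUOTES, 'UTF-8') : $part;
    }
    return $out;
}

echo escape_except('<b>bold</b> <script>alert(1)</script>', ['<b>', '</b>']), "\n";
// prints: <b>bold</b> &lt;script&gt;alert(1)&lt;/script&gt;
```

This only recognizes literal strings; anything with variable content, such as a tag carrying attributes, falls into the territory where the message recommends a real HTML/XML parser instead of regexps.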