Newsgroups: php.internals
Date: Thu, 27 Jun 2013 21:38:17 -0700
To: Tjerk Anne Meesters
Cc: Yasuo Ohgaki, PHP
internals list
Subject: Re: [PHP-DEV] ENT_ALL or similar option for htmlspecialchars[_decode]?
From: kris.craig@gmail.com (Kris Craig)

On Thu, Jun 27, 2013 at 9:20 PM, Kris Craig wrote:

> On Thu, Jun 27, 2013 at 7:54 PM, Tjerk Anne Meesters wrote:
>
>> On Thu, Jun 27, 2013 at 4:42 PM, Kris Craig wrote:
>>
>>> On Thu, Jun 27, 2013 at 12:03 AM, Yasuo Ohgaki wrote:
>>>
>>> > 2013/6/27 Kris Craig
>>> >
>>> >> I just noticed that htmlspecialchars_decode doesn't convert
>>> >> entities like &#10; and &#13;.
>>> >
>>> > I think htmlspecialchars_decode() only decodes
>>> >
>>> > ext/standard/html_tables.h
>>> > static const entity_stage3_row stage3_table_be_apos_00000[] = {
>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {"quot", 4} } }, {0, { {NULL, 0} } },
>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {"amp", 3} } }, {0, { {"apos", 4} } },
>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>> > {0, { {"lt", 2} } }, {0, { {NULL, 0} } }, {0, { {"gt", 2} } }, {0, { {NULL, 0} } },
>>> > };
>>> >
>>> > IIRC
>>> > I may be wrong.
>>> >
>>> >> Is there a bitmask I'm missing or are those simply not
>>> >> supported right now? If the latter, any thoughts on adding
>>> >> something along the lines of ENT_ALL to convert all valid
>>> >> entities from/to their respective characters?
>>> >
>>> > What you are looking for is html_entity_decode(), I think.
>>> >
>>> > $ php -n -r 'var_dump(html_entity_decode("&#10;="));'
>>> > string(2) "
>>> > ="
>>> >
>>> Yeah I tried html_entity_decode already, but it just returned NULL.
>>> On the same input string, htmlspecialchars_decode returned the input
>>> string but with *some* special characters decoded; 10 and 13
>>> ("\r\n", I think) were left in their encoded state. I'm not sure
>>> why there wouldn't be an option to decode all html special
>>> characters.
>>>
>> The html_entity_decode() function shouldn't return NULL, but even an
>> empty string sounds like a bug, could you file a report for this and
>> provide a reproducible test code?
>>
> Yeah I admit it could be an empty string as opposed to NULL. I wasn't
> using a var_dump() so I just assumed.
>
> I'll take another look at it and get those details.
>
> --Kris

Ok I've confirmed what's happening. If I include &#10; and/or &#13; in the
string argument passed to html_entity_decode, it returns an empty string,
presumably because those entities are not recognized by the function.
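
For anyone who wants to check this on their own build, here is a rough reproduction sketch. It assumes the entities in question are the numeric references for code points 10 and 13; the helper name compare_decoders() and the sample input are made up for illustration and are not from my actual code. It uses var_dump() so that NULL, an empty string, and a decoded string can be told apart.

```php
<?php
// Hypothetical helper (not from the thread): run both decoders on the same
// input so their results can be compared side by side.
function compare_decoders(string $input): array
{
    return [
        'htmlspecialchars_decode' => htmlspecialchars_decode($input),
        'html_entity_decode'      => html_entity_decode($input),
    ];
}

// Assumed sample input: named entities plus the numeric entities for
// code points 13 and 10.
$input = "&lt;b&gt;line one&#13;&#10;line two";

foreach (compare_decoders($input) as $function => $result) {
    // var_dump() makes NULL, "" and a decoded string distinguishable.
    echo $function, ': ';
    var_dump($result);
}
```

Results may differ between PHP versions and default flags, so the exact version is worth including in any bug report.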
Here's what the manual says:

> If the input string contains an invalid code unit sequence within the given
> encoding an empty string will be returned, unless either the ENT_IGNORE or
> ENT_SUBSTITUTE flags are set.

Can somebody explain why ENT_IGNORE isn't enabled by default? What's the
use-case for having it return the entire string as empty simply because it
contained one or more unrecognized entities? If anything, shouldn't it at
least return FALSE instead?

I would say that the bug here appears to be the fact that those valid
entities are not currently recognized, which makes me curious as to whether
or not there might be other valid entities that aren't supported, as well.

--Kris
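
To see the flag behavior from the manual text in isolation, here is a small sketch. It rests on assumptions: that "invalid code unit sequence" means malformed bytes in the input encoding (a stray 0xFF byte in UTF-8 here), and that htmlspecialchars() is a fair function to test it with. The helper encode_with() is hypothetical. Flags are passed explicitly so the result does not depend on a given PHP version's defaults.

```php
<?php
// Hypothetical helper (not from the thread): encode with explicit flags and
// an explicit encoding, so version-specific defaults play no role.
function encode_with(string $input, int $flags): string
{
    return htmlspecialchars($input, $flags, 'UTF-8');
}

// Assumed input: "a" and "b" around a 0xFF byte, which is never valid UTF-8.
$invalid = "a\xFFb";

$results = [
    'ENT_QUOTES only'             => encode_with($invalid, ENT_QUOTES),
    'ENT_QUOTES | ENT_IGNORE'     => encode_with($invalid, ENT_QUOTES | ENT_IGNORE),
    'ENT_QUOTES | ENT_SUBSTITUTE' => encode_with($invalid, ENT_QUOTES | ENT_SUBSTITUTE),
];

foreach ($results as $label => $result) {
    echo $label, ': ';
    var_dump($result);
}
```

Comparing the three results side by side shows what each flag does with the malformed byte, which may help when deciding which default would be least surprising.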