Newsgroups: php.internals
Date: Fri, 28 Jun 2013 12:47:01 +0800
To: Kris Craig
Cc: Yasuo Ohgaki, PHP internals list
Subject: Re: [PHP-DEV] ENT_ALL or similar option for htmlspecialchars[_decode]?
From: datibbaw@php.net (Tjerk Anne Meesters)

On Fri, Jun 28, 2013 at 12:38 PM, Kris Craig wrote:

> On Thu, Jun 27, 2013 at 9:20 PM, Kris Craig wrote:
>
>> On Thu, Jun 27, 2013 at 7:54 PM, Tjerk Anne Meesters wrote:
>>
>>> On Thu, Jun 27, 2013 at 4:42 PM, Kris Craig wrote:
>>>
>>>> On Thu, Jun 27, 2013 at 12:03 AM, Yasuo Ohgaki wrote:
>>>>
>>>> > 2013/6/27 Kris Craig
>>>> >
>>>> >> I just noticed that htmlspecialchars_decode doesn't convert entities like
>>>> >> and .
>>>> >>
>>>> >
>>>> > I think htmlspecialchars_decode() only decodes
>>>> >
>>>> > ext/standard/html_tables.h
>>>> > static const entity_stage3_row stage3_table_be_apos_00000[] = {
>>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {"quot", 4} } }, {0, { {NULL, 0} } },
>>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {"amp", 3} } }, {0, { {"apos", 4} } },
>>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>>> > {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
>>>> > {0, { {"lt", 2} } }, {0, { {NULL, 0} } }, {0, { {"gt", 2} } }, {0, { {NULL, 0} } },
>>>> > };
>>>> >
>>>> > IIRC
>>>> > I may be wrong.
>>>> >
>>>> >> Is there a bitmask I'm missing or are those simply not
>>>> >> supported right now? If the latter, any thoughts on adding something along
>>>> >> the lines of ENT_ALL to convert all valid entities from/to their respective
>>>> >> characters?
>>>> >>
>>>> >
>>>> > What you are looking for is html_entity_decode(), I think.
>>>> >
>>>> > $ php -n -r 'var_dump(html_entity_decode(" ="));'
>>>> > string(2) "
>>>> > ="
>>>> >
>>>> Yeah I tried html_entity_decode already, but it just returned NULL. On the
>>>> same input string, htmlspecialchars_decode returned the input string but
>>>> with *some* special characters decoded; 10 and 13 ("\r\n", I think) were
>>>> left in their encoded state. I'm not sure why there wouldn't be an option
>>>> to decode all html special characters.
>>>>
>>>
>>> The html_entity_decode() function shouldn't return NULL, but even an
>>> empty string sounds like a bug, could you file a report for this and
>>> provide a reproducible test code?
>>>
>>
>> Yeah I admit it could be an empty string as opposed to NULL. I wasn't
>> using a var_dump() so I just assumed.
>>
>> I'll take another look at it and get those details.
>>
>> --Kris
>>
>
> Ok I've confirmed what's happening. If I include and/or in
> the string argument passed to html_entity_decode, it returns an empty
> string, presumably because those entities are not recognized by the
> function. Here's what the manual says:
>

You might want to be a bit more specific, because this code works fine
across most versions:

http://3v4l.org/dan3Q

>
>> If the input string contains an invalid code unit sequence within the
>> given encoding an empty string will be returned, unless either the
>> ENT_IGNORE or ENT_SUBSTITUTE flags are set.
>

This is the manual page for htmlentities(), which is (one of) the reverse
operations of html_entity_decode().

>
> Can somebody explain why ENT_IGNORE isn't enabled by default? What's the
> use-case for having it return the entire string as empty simply because it
> contained one or more unrecognized entities? If anything, shouldn't it at
> least return FALSE instead?
>
> I would say that the bug here appears to be the fact that those valid
> entities are not currently recognized, which makes me curious as to whether
> or not there might be other valid entities that aren't supported, as well.
>
> --Kris
>

--
Tjerk