Newsgroups: php.internals
Date: Fri, 28 Jun 2013 07:21:54 +0200
Organization:
Núcleo de Engenharia Biomédica do Instituto Superior Técnico
Message-ID: <682a8bb626dd810bed4112eb939d14c3@nebm.ist.utl.pt>
Subject: Re: [PHP-DEV] ENT_ALL or similar option for htmlspecialchars[_decode]?
From: glopes@nebm.ist.utl.pt (Gustavo Lopes)

On 2013-06-28 4:10, Kris Craig wrote:
> On Thu, Jun 27, 2013 at 6:43 PM, Yasuo Ohgaki wrote:
>
>> 2013/6/27 Kris Craig
>>
>>> Yeah I tried html_entity_decode already, but it just returned NULL. On
>>> the same input string, htmlspecialchars_decode returned the input string
>>> but with *some* special characters decoded; 10 and 13 ("\r\n", I think)
>>> were left in their encoded state. I'm not sure why there wouldn't be an
>>> option to decode all html special characters.
>>>

You are missing the design purpose of htmlspecialchars_decode and
html_entity_decode. The truth is, they are not as useful as they might seem.
Their purpose is not to decode all the entities, like a browser would do. We
do not implement anything approaching the sort of parsing a browser would do;
for instance, HTML5 says you should accept certain entities not terminated
with ; and parse the stream in a certain way, and we don't do that at all.

The purpose of those two functions is just to provide something approaching
an inverse function for htmlspecialchars() and htmlentities().
html_entity_decode() has deviated somewhat from this (for instance, it
decodes all numeric entities), but I think this is nevertheless the proper
way to think about those two functions.

>>
>> Not only HTML entities, we really needs to add several decoder/encoder to
>> core.
>> For instance, Javascript \uXXXX, HTML &#XX/&#XXXX, etc.
>> I hope someone is working on it :)
>>
>
> Would you be interested in co-authoring an RFC with me for this?
>

See http://php.net/manual/en/transliterator.transliterate.php

For HTML entities, out of the box, only a transliterator for numeric
entities is provided (hex-any/XML10), but you can easily build your own
ruleset for the named entities. The performance will be below that of a
dedicated algorithm, though. And it only supports UTF-8.

-- 
Gustavo Lopes
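
To illustrate the points above, here is a minimal sketch (not a definitive implementation). It contrasts the two decode functions on a string containing the numeric entities for "\r\n", then tries the transliterator approach. The input string, the variable names and the two-rule named-entity ruleset are made up for illustration; the transliterator part assumes the intl extension is loaded and is skipped otherwise.

```php
<?php
// Hypothetical input: special characters plus numeric entities for CR and LF.
$input = '&lt;p&gt;line one&#13;&#10;line two&lt;/p&gt;';

// Approximate inverse of htmlspecialchars(): only &amp;, &lt;, &gt; and the
// quote entities are decoded, so &#13;&#10; stay encoded.
$special = htmlspecialchars_decode($input, ENT_QUOTES | ENT_HTML401);

// Approximate inverse of htmlentities(), which also decodes numeric entities.
$all = html_entity_decode($input, ENT_QUOTES | ENT_HTML401, 'UTF-8');

var_dump($special === '<p>line one&#13;&#10;line two</p>'); // bool(true)
var_dump($all === "<p>line one\r\nline two</p>");           // bool(true)

if (extension_loaded('intl')) {
    // Built-in ICU transliterator for decimal numeric entities (&#NN;).
    $numeric = Transliterator::create('Hex-Any/XML10');

    // Illustrative ruleset for two named entities; a real one would list
    // every entity you care about. Literals are quoted because & and ; are
    // reserved characters in the ICU rule syntax.
    $named = Transliterator::createFromRules("'&eacute;' > é ; '&euro;' > € ;");

    if ($numeric !== null && $named !== null) {
        // Apply both in sequence; input and output are UTF-8.
        echo $named->transliterate($numeric->transliterate('caf&eacute;&#10;')), "\n";
    }
}
```

Note that the ENT_HTML401 flag is passed explicitly here, since the set of numeric entities html_entity_decode() accepts depends on the document type.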