I just noticed that htmlspecialchars_decode doesn't convert entities like
&#10; and &#13;. Is there a bitmask I'm missing or are those simply not
supported right now? If the latter, any thoughts on adding something along
the lines of ENT_ALL to convert all valid entities from/to their respective
characters?
--Kris
2013/6/27 Kris Craig kris.craig@gmail.com
> I just noticed that htmlspecialchars_decode doesn't convert entities like
> &#10; and &#13;.
I think htmlspecialchars_decode() only decodes the entities listed in this
table from ext/standard/html_tables.h:
static const entity_stage3_row stage3_table_be_apos_00000[] = {
    {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
    {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
    {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
    {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
    {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
    {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
    {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
    {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
    /* 0x20-0x23: '"' at 0x22 */
    {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {"quot", 4} } }, {0, { {NULL, 0} } },
    /* 0x24-0x27: '&' at 0x26, '\'' at 0x27 */
    {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {"amp", 3} } }, {0, { {"apos", 4} } },
    {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
    {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
    {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
    {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
    {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } }, {0, { {NULL, 0} } },
    /* 0x3C-0x3F: '<' at 0x3C, '>' at 0x3E */
    {0, { {"lt", 2} } }, {0, { {NULL, 0} } }, {0, { {"gt", 2} } }, {0, { {NULL, 0} } },
};
IIRC, but I may be wrong.
> Is there a bitmask I'm missing or are those simply not
> supported right now? If the latter, any thoughts on adding something along
> the lines of ENT_ALL to convert all valid entities from/to their respective
> characters?
What you are looking for is html_entity_decode(), I think.
$ php -n -r 'var_dump(html_entity_decode("&#10;="));'
string(2) "
="
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Yeah I tried html_entity_decode already, but it just returned NULL. On the
same input string, htmlspecialchars_decode returned the input string but
with some special characters decoded; 10 and 13 ("\r\n", I think) were
left in their encoded state. I'm not sure why there wouldn't be an option
to decode all html special characters.
--Kris
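To make the difference between the two functions discussed so far concrete, here is a minimal sketch. The sample string is made up for illustration, and the flags and charset are passed explicitly on the assumption that defaults differ between PHP versions.

```php
<?php
// Made-up input mixing the "special" entities, a numeric entity and a named one.
$input = '&lt;b&gt;line1&#10;line2&lt;/b&gt; &amp; &eacute;';

// Passed explicitly because the default flags are not the same in every PHP version.
$flags = ENT_QUOTES | ENT_HTML401;

// Only the special characters (& " ' < >) are decoded here;
// &#10; and &eacute; are left in their encoded state.
$special = htmlspecialchars_decode($input, $flags);

// Numeric and named entities are decoded as well.
$all = html_entity_decode($input, $flags, 'UTF-8');

var_dump($special);
var_dump($all);
```

With input like this, htmlspecialchars_decode() should leave &#10; untouched, which matches the behaviour described above, while html_entity_decode() turns it into a line feed.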
Hi Kris,
2013/6/27 Kris Craig kris.craig@gmail.com
> Yeah I tried html_entity_decode already, but it just returned NULL. On
> the same input string, htmlspecialchars_decode returned the input string
> but with some special characters decoded; 10 and 13 ("\r\n", I think)
> were left in their encoded state. I'm not sure why there wouldn't be an
> option to decode all html special characters.
Not only HTML entities, we really need to add several decoders/encoders to
core.
For instance, Javascript \uXXXX, HTML &#XX/&#XXXX, etc.
I hope someone is working on it :)
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
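As a rough illustration of the kind of decoder mentioned above, a JavaScript \uXXXX decoder could be sketched in userland like this. The function name decode_js_unicode_escapes is hypothetical (it is not a PHP built-in), and the sketch assumes the mbstring extension is loaded.

```php
<?php
// Hypothetical helper, not part of PHP: decodes JavaScript-style \uXXXX
// escapes into UTF-8. Assumes the mbstring extension is available.
function decode_js_unicode_escapes(string $text): string
{
    // Match runs of consecutive escapes so that surrogate pairs
    // (two \uXXXX units for one character) are converted together.
    $result = preg_replace_callback(
        '/(?:\\\\u[0-9a-fA-F]{4})+/',
        function (array $match): string {
            preg_match_all('/\\\\u([0-9a-fA-F]{4})/', $match[0], $units);
            $utf16 = '';
            foreach ($units[1] as $hex) {
                // Each escape is one big-endian UTF-16 code unit.
                $utf16 .= pack('n', hexdec($hex));
            }
            return mb_convert_encoding($utf16, 'UTF-8', 'UTF-16BE');
        },
        $text
    );

    // preg_replace_callback() returns null on a regex error; fall back to the input.
    return $result ?? $text;
}

var_dump(decode_js_unicode_escapes('caf\u00e9'));
```

Doing the conversion through UTF-16BE keeps the sketch short; a dedicated decoder in core would presumably handle this directly and faster.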
> Not only HTML entities, we really need to add several decoders/encoders to
> core.
> For instance, Javascript \uXXXX, HTML &#XX/&#XXXX, etc.
> I hope someone is working on it :)
Would you be interested in co-authoring an RFC with me for this?
--Kris
On 2013-06-28 4:10, Kris Craig wrote:
> On Thu, Jun 27, 2013 at 6:43 PM, Yasuo Ohgaki yohgaki@ohgaki.net wrote:
>> 2013/6/27 Kris Craig kris.craig@gmail.com
>>> Yeah I tried html_entity_decode already, but it just returned NULL. On
>>> the same input string, htmlspecialchars_decode returned the input string
>>> but with some special characters decoded; 10 and 13 ("\r\n", I think)
>>> were left in their encoded state. I'm not sure why there wouldn't be an
>>> option to decode all html special characters.
You are missing the design purpose of htmlspecialchars_decode and
html_entity_decode. Truth is, they are not as useful as they might seem.
Their purpose is not to decode all the entities, like a browser would
do. We do not implement anything approaching the sort of parsing a browser
would do; for instance, HTML 5 says you should accept certain entities
not terminated with ; and parse the stream in a certain way, and we don't
do that at all. The purpose of those two functions is just to provide
something approaching an inverse function for htmlspecialchars() and
htmlentities(). html_entity_decode() has somewhat deviated from this
(for instance, it decodes all numeric entities), but I think this should
nevertheless be the proper way to think about those two functions.
>> Not only HTML entities, we really need to add several decoders/encoders to
>> core.
>> For instance, Javascript \uXXXX, HTML &#XX/&#XXXX, etc.
>> I hope someone is working on it :)
> Would you be interested in co-authoring an RFC with me for this?
See http://php.net/manual/en/transliterator.transliterate.php
For HTML entities, out of the box, only a transliterator for numeric
entities is provided (hex-any/XML10), but you can easily build your own
ruleset for the named entities. The performance will be below that of a
dedicated algorithm, though. And it only supports UTF-8.
--
Gustavo Lopes
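For reference, the numeric-entity transliterator mentioned above can be tried with a one-call sketch like the following. It assumes the intl extension is installed, and the sample string is made up for illustration.

```php
<?php
// Assumes the intl extension is available. 'Hex-Any/XML10' is the ICU
// transform for decimal numeric entities of the form &#NNN;.
$decoded = transliterator_transliterate('Hex-Any/XML10', 'a&#10;b &#233;');

// Named entities such as &eacute; are not covered by this transform;
// they would need a custom ruleset, as noted above.
var_dump($decoded);
```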
> Yeah I tried html_entity_decode already, but it just returned NULL. On the
> same input string, htmlspecialchars_decode returned the input string but
> with some special characters decoded; 10 and 13 ("\r\n", I think) were
> left in their encoded state. I'm not sure why there wouldn't be an option
> to decode all html special characters.
The html_entity_decode() function shouldn't return NULL, but even an empty
string sounds like a bug. Could you file a report for this and provide
reproducible test code?
--
Tjerk
On Thu, Jun 27, 2013 at 7:54 PM, Tjerk Anne Meesters datibbaw@php.net wrote:
> The html_entity_decode() function shouldn't return NULL, but even an empty
> string sounds like a bug. Could you file a report for this and provide
> reproducible test code?
Yeah I admit it could be an empty string as opposed to NULL. I wasn't
using var_dump() so I just assumed.
I'll take another look at it and get those details.
--Kris
Ok I've confirmed what's happening. If I include &#10; and/or &#13; in the
string argument passed to html_entity_decode, it returns an empty string,
presumably because those entities are not recognized by the function.
Here's what the manual says:
> If the input string contains an invalid code unit sequence within the given
> encoding an empty string will be returned, unless either the ENT_IGNORE or
> ENT_SUBSTITUTE flags are set.
Can somebody explain why ENT_IGNORE isn't enabled by default? What's the
use-case for having it return the entire string as empty simply because it
contained one or more unrecognized entities? If anything, shouldn't it at
least return FALSE instead?
I would say that the bug here appears to be the fact that those valid
entities are not currently recognized, which makes me curious as to whether
or not there might be other valid entities that aren't supported, as well.
--Kris
> Ok I've confirmed what's happening. If I include &#10; and/or &#13; in
> the string argument passed to html_entity_decode, it returns an empty
> string, presumably because those entities are not recognized by the
> function. Here's what the manual says:
You might want to be a bit more specific, because this code works fine
across most versions:
> > If the input string contains an invalid code unit sequence within the
> > given encoding an empty string will be returned, unless either the
> > ENT_IGNORE or ENT_SUBSTITUTE flags are set.
That is from the manual page for htmlentities(), which is (one of) the
reverse operations of html_entity_decode().
--
Tjerk
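The distinction drawn above can be sketched as follows: the empty-string behaviour belongs to the encoding functions when the input is not valid in the given charset, whereas html_entity_decode() simply leaves entities it does not know untouched. The input strings are made up for illustration, and flags and charset are passed explicitly on the assumption that defaults vary between PHP versions.

```php
<?php
// "\xFF" can never occur in valid UTF-8, so this input contains an
// invalid code unit sequence.
$invalid = "a\xFFb";

// Without ENT_IGNORE or ENT_SUBSTITUTE the whole result is an empty string.
$strict = htmlentities($invalid, ENT_QUOTES | ENT_HTML401, 'UTF-8');

// ENT_IGNORE silently drops the invalid byte.
$ignored = htmlentities($invalid, ENT_QUOTES | ENT_HTML401 | ENT_IGNORE, 'UTF-8');

// ENT_SUBSTITUTE replaces it with U+FFFD (the replacement character).
$substituted = htmlentities($invalid, ENT_QUOTES | ENT_HTML401 | ENT_SUBSTITUTE, 'UTF-8');

// Decoding: an unknown entity name is kept as-is, a numeric one is decoded.
$unknown = html_entity_decode('&nosuchentity; &#10;', ENT_QUOTES | ENT_HTML401, 'UTF-8');

var_dump($strict, $ignored, $substituted, $unknown);
```

If html_entity_decode() really does return an empty string for some input, a var_dump() of the exact input and flags, as in this sketch, would make a useful starting point for the bug report requested earlier in the thread.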