Hello,
I discovered today that anonymous class names contain a null byte right
after "class@anonymous". I don't think class names should contain
non-printable characters.
How about removing that null byte?
Regards, Niklas
Hello,
I discovered today that anonymous class names contain a null byte right
after "class@anonymous". I don't think class names should contain
non-printable characters.How about removing that null byte?
Regards, Niklas
It's also quite worrying that it is leaking memory addresses.
Hello,
I discovered today that anonymous class names contain a null byte right
after "class@anonymous". I don't think class names should contain
non-printable characters.How about removing that null byte?
Regards, Niklas
PHP uses null bytes quite a lot to produce deliberately illegal
identifiers. For instance the old eval-like create_function()
[e.g.
https://3v4l.org/hqHjh] and the serialization of private members [e.g.
https://3v4l.org/R6Y6k]
In this case, I guess the "@" in "class@anonymous" makes the name
illegal anyway, but I'm not sold on the null byte being more
unacceptable here than anywhere else.
Regards,
--
Rowan Collins
[IMSoP]
PHP uses null bytes quite a lot to produce deliberately illegal
identifiers. For instance the old eval-likecreate_function()
[e.g.
https://3v4l.org/hqHjh] and the serialization of private members [e.g.
https://3v4l.org/R6Y6k]In this case, I guess the "@" in "class@anonymous" makes the name illegal
anyway, but I'm not sold on the null byte being more unacceptable here than
anywhere else.Regards,
--
Rowan Collins
[IMSoP]That doesn't mean it's a good approach (cough namespaces cough), and
these bits of "magic" are supposed to be hidden away from users. I'm
guessing in this particular instance, the point of the null is to make
string operations cut off after "anonymous", however string operations that
respect the zval string length aren't going to do this.
e.g. var_dump()
the class name is put through sprintf and it cuts off at
the null, but get_class or ReflectionClass::getName() just returns the
original string, and exposes the implementation details.
Problem is that e.g. exception to string casts do not handle it properly,
there may be other affected areas.
Regards, Niklas
2015-11-05 16:14 GMT+01:00 Leigh leight@gmail.com:
On 5 November 2015 at 14:59, Rowan Collins rowan.collins@gmail.com
wrote:PHP uses null bytes quite a lot to produce deliberately illegal
identifiers. For instance the old eval-likecreate_function()
[e.g.
https://3v4l.org/hqHjh] and the serialization of private members [e.g.
https://3v4l.org/R6Y6k]In this case, I guess the "@" in "class@anonymous" makes the name
illegal
anyway, but I'm not sold on the null byte being more unacceptable here
than
anywhere else.Regards,
--
Rowan Collins
[IMSoP]That doesn't mean it's a good approach (cough namespaces cough), and
these bits of "magic" are supposed to be hidden away from users. I'm
guessing in this particular instance, the point of the null is to make
string operations cut off after "anonymous", however string operations that
respect the zval string length aren't going to do this.e.g.
var_dump()
the class name is put through sprintf and it cuts off at
the null, but get_class or ReflectionClass::getName() just returns the
original string, and exposes the implementation details.
On 5 November 2015 at 14:59, Rowan Collins rowan.collins@gmail.com
wrote:PHP uses null bytes quite a lot to produce deliberately illegal
identifiers. For instance the old eval-likecreate_function()
[e.g.
https://3v4l.org/hqHjh] and the serialization of private members [e.g.
https://3v4l.org/R6Y6k]In this case, I guess the "@" in "class@anonymous" makes the name
illegal
anyway, but I'm not sold on the null byte being more unacceptable here
than anywhere else.Regards,
--
Rowan Collins
[IMSoP]That doesn't mean it's a good approach (cough namespaces cough), and
these bits of "magic" are supposed to be hidden away from users. I'm
guessing in this particular instance, the point of the null is to make
string operations cut off after "anonymous", however string operations
that respect the zval string length aren't going to do this.e.g.
var_dump()
the class name is put through sprintf and it cuts off at
the null, but get_class or ReflectionClass::getName() just returns the
original string, and exposes the implementation details.
Internal names for anonymous classes need to be unique.
The current method of ensuring uniqueness ('\0' + filename + lexer pos) was
written by Dmitry[1], following from Joe's original implementation[2]. As I
understand it, this was supposed to be entirely hidden from userland; hence
the null byte.
So, I prepared a patch for get_class()
and ReflectionClass::getName()
,
which in my view should behave the same way as var_dump()
etc., but I've
now realised that ignoring the unique suffix from the class name will break
functionality that is otherwise desirable:
-
Aliasing an anonymous class... (see bug70106[3])
$instance = new class {};
$class_name = get_class($instance);
class_alias($class_name, 'MyAlias'); -
Creating a new anonymous class instance dynamically...
$instance = new class {};
$class_name = get_class($instance);
$new_instance = new $class_name;...although the
get_class()
used here isn't necessary, and you could
write= new $instance;
without ever knowing $instance's class.
Our options seem to be:
(A) Leave everything as it is. This puts us in a situation where anonymous
class names are treated inconsistently in userland; a situation that I
think could lead to considerable confusion about the "real" names of
these classes and how they should be used.
(B) Patch get_class()
and ReflectionClass::getName()
(any others?) to
strip the suffix (the '\0' and everything after it) from anonymous
class
names prior to display. This prevents userland code from distinguishing
between multiple anonymous classes; breaking functionality as described
above. To mitigate this, we could allow class_alias()
to accept
either
a class name OR an instance in the first argument. The new
operator
already supports an instance in place of a class name.
(C) Assuming that userland SHOULD have access to the internal names of
anonymous classes, remove the null byte so that the full names are
displayed everywhere. The drawback of this option is that class names
won't be as predictable. It occurs to me that a different naming scheme
could help here - wouldn't a monotonically increasing integer suffix be
enough? The current scheme seems like an awful waste of memory, too.
Option (B) seems like the best to me, because I can't really see a need to
know the unique suffix -- provided any use-cases are catered for by an
alternative (modifying class_alias()
should be enough).
Thoughts?
Steve
[1] http://git.php.net/?p=php-src.git;a=commitdiff;h=2a1b0b1#patch1
[2] http://git.php.net/?p=php-src.git;a=commitdiff;h=49608e0#patch12
[3] https://bugs.php.net/bug.php?id=70106
2015-11-09 16:27 GMT+01:00 Steven Hilder <
steven.hilder@sevenpercent.solutions>:
On 5 November 2015 at 14:59, Rowan Collins rowan.collins@gmail.com
wrote:PHP uses null bytes quite a lot to produce deliberately illegal
identifiers. For instance the old eval-likecreate_function()
[e.g.
https://3v4l.org/hqHjh] and the serialization of private members [e.g.
https://3v4l.org/R6Y6k]In this case, I guess the "@" in "class@anonymous" makes the name
illegal
anyway, but I'm not sold on the null byte being more unacceptable here
than anywhere else.Regards,
--
Rowan Collins
[IMSoP]That doesn't mean it's a good approach (cough namespaces cough), and
these bits of "magic" are supposed to be hidden away from users. I'm
guessing in this particular instance, the point of the null is to make
string operations cut off after "anonymous", however string operations
that respect the zval string length aren't going to do this.e.g.
var_dump()
the class name is put through sprintf and it cuts off at
the null, but get_class or ReflectionClass::getName() just returns the
original string, and exposes the implementation details.Internal names for anonymous classes need to be unique.
The current method of ensuring uniqueness ('\0' + filename + lexer pos) was
written by Dmitry[1], following from Joe's original implementation[2]. As I
understand it, this was supposed to be entirely hidden from userland; hence
the null byte.So, I prepared a patch for
get_class()
andReflectionClass::getName()
,
which in my view should behave the same way asvar_dump()
etc., but I've
now realised that ignoring the unique suffix from the class name will break
functionality that is otherwise desirable:
Aliasing an anonymous class... (see bug70106[3])
$instance = new class {};
$class_name = get_class($instance);
class_alias($class_name, 'MyAlias');Creating a new anonymous class instance dynamically...
$instance = new class {};
$class_name = get_class($instance);
$new_instance = new $class_name;...although the
get_class()
used here isn't necessary, and you could
write= new $instance;
without ever knowing $instance's class.Our options seem to be:
(A) Leave everything as it is. This puts us in a situation where anonymous
class names are treated inconsistently in userland; a situation that I
think could lead to considerable confusion about the "real" names of
these classes and how they should be used.(B) Patch
get_class()
andReflectionClass::getName()
(any others?) to
strip the suffix (the '\0' and everything after it) from anonymous
class
names prior to display. This prevents userland code from distinguishing
between multiple anonymous classes; breaking functionality as described
above. To mitigate this, we could allowclass_alias()
to accept
either
a class name OR an instance in the first argument. Thenew
operator
already supports an instance in place of a class name.(C) Assuming that userland SHOULD have access to the internal names of
anonymous classes, remove the null byte so that the full names are
displayed everywhere. The drawback of this option is that class names
won't be as predictable. It occurs to me that a different naming scheme
could help here - wouldn't a monotonically increasing integer suffix be
enough? The current scheme seems like an awful waste of memory, too.Option (B) seems like the best to me, because I can't really see a need to
know the unique suffix -- provided any use-cases are catered for by an
alternative (modifyingclass_alias()
should be enough).Thoughts?
Steve
[1] http://git.php.net/?p=php-src.git;a=commitdiff;h=2a1b0b1#patch1
[2] http://git.php.net/?p=php-src.git;a=commitdiff;h=49608e0#patch12
[3] https://bugs.php.net/bug.php?id=70106
Having the path info is quite useful for debugging purposes.
Regards, Niklas
Having the path info is quite useful for debugging purposes.
Regards, Niklas
It is, but it will always still be available from
ReflectionClass::getFileName()
2015-11-09 17:46 GMT+01:00 Leigh leight@gmail.com:
Having the path info is quite useful for debugging purposes.
Regards, Niklas
It is, but it will always still be available from
ReflectionClass::getFileName()
But most implementations just use something like:
$type = is_object($x) ? get_class($x) : gettype($x);
This results in non-useful debugging info when the path gets stripped.
On 9 November 2015 at 15:27, Steven Hilder
So, I prepared a patch for
get_class()
andReflectionClass::getName()
,
which in my view should behave the same way asvar_dump()
etc., but I've
now realised that ignoring the unique suffix from the class name will break
functionality that is otherwise desirable:
Can you share your patch?
It should be possible to return the sanitised name without removing it
internally.(would have to cover the get_parent_class and getParentClass
versions too, and have a quick audit of other places class names can be
spat out.)
Can we at least get the memory address hidden (non-debug builds only
perhaps? - var_dump appends a #n why not just use this number.)
On 9 November 2015 at 15:27, Steven Hilder
Can you share your patch?
See
https://github.com/php/php-src/compare/master...stevenhilder:hide-anon-class-suffix
Feedback very welcome :)
It should be possible to return the sanitised name without removing it
internally.(would have to cover the get_parent_class and getParentClass
versions too, and have a quick audit of other places class names can be
spat out.)
Yes, that's what I had in mind. The class names are sanitised only for
display in userland; I haven't changed the way that the names are generated
or stored internally.
I've applied the change to get_parent_class()
and get_called_class()
.
Nothing needed to be done for ReflectionClass::getParentClass()
because
it
returns another ReflectionClass
object, not the parent's name.
The __CLASS__
magic constant and ::class
syntax also needed changes.
The
patch includes a test which shows all of the scenarios that I've accounted
for. Please do let me know if you can think of any others.
I've also implemented the change to class_alias()
as I described in my
previous email.
Can we at least get the memory address hidden (non-debug builds only
perhaps? - var_dump appends a #n why not just use this number.)
It isn't possible to use the number that var_dump()
shows, because the
names for anonymous classes are generated during compile-time, so the
handle
isn't available yet.
How about removing that null byte?
I would suggest replacing it with something else. As I am making this
work
for Xdebug, I really need to ship the full name to the IDE so that they
can later do look-ups on anonymous class properties, and with the \0 the
IDEs seem to fuck this up. I suggest using another @ instead of the \0.
Derick - forgive me, I'm not at all familiar with how Xdebug+IDE
integration
works, but can you please elaborate a little on the following...?
-
Since Xdebug is an extension, why does the display of class names in
userland bear any impact? -
If IDEs had access to the filename portion of the class name, how would
that help them to lookup that specific class (since there could be many
anonymous classes defined in that one file)? -
Even if IDEs had access to the full anonymous class name - including the
(unpredictable) memory address of the lexer position at compile-time, how
would an IDE have any better ability to lookup the correct class?
Kind regards,
Steve
On Tue, Nov 10, 2015 at 11:04 PM, Steven Hilder <
steven.hilder@sevenpercent.solutions> wrote:
On 9 November 2015 at 15:27, Steven Hilder
Can you share your patch?See
https://github.com/php/php-src/compare/master...stevenhilder:hide-anon-class-suffixFeedback very welcome :)
It should be possible to return the sanitised name without removing it
internally.(would have to cover the get_parent_class and getParentClass
versions too, and have a quick audit of other places class names can be
spat out.)Yes, that's what I had in mind. The class names are sanitised only for
display in userland; I haven't changed the way that the names are generated
or stored internally.I've applied the change to
get_parent_class()
andget_called_class()
.
Nothing needed to be done forReflectionClass::getParentClass()
because
it
returns anotherReflectionClass
object, not the parent's name.The
__CLASS__
magic constant and::class
syntax also needed changes.
The
patch includes a test which shows all of the scenarios that I've accounted
for. Please do let me know if you can think of any others.I've also implemented the change to
class_alias()
as I described in my
previous email.
I'd like to clarify some things here from the perspective of the people who
implemented this naming scheme.
The naming was chosen to be this way intentionally, to combine two
behaviors we would like to have:
a) Things like get_class()
MUST return a class name that uniquely
identifies the anonymous class. You should be able to instantiate a new
instance of the same class based on it, etc. Having these return truncated
names is an absolute no-go, because it would require people to explicitly
handle anonymous classes in many context that currently "just work".
b) We do not want to display the fully mangled name of an anonymous class
in error messages or debug dumps. While the information contained there can
be useful, it also significantly bloats outputs. Furthermore we will likely
change the exact naming structure in the future, in which case the
anonymous class name will no longer contain any directly useful information
(maybe some hash).
Using a NUL byte conveniently achieves both of these goals. We did not
however take into account that this would cause issues with 3rd party
tooling that does not support binary data.
As such, we may have to change the naming. However we must retain goal a)
and would like to retain b). To do this we could use names along the lines
of class@anonymous$1, where the uniquely-identifying information is short
enough to be no bother when displayed. Generating these kinds of names is
non-trivial if you consider caching and may require some changes to the way
we perform class bindings. (As a side bonus, depending on how this is done,
we might fix https://bugs.php.net/bug.php?id=64291 as well.)
Not sure if this is still in scope for PHP 7.0.
Nikita
Using a NUL byte conveniently achieves both of these goals. We did not
however take into account that this would cause issues with 3rd party
tooling that does not support binary data.
I withdraw my comment. It does not cause issues for me (anymore). Having
the NUL is actually making things easier.
cheers,
Derick
2015-11-11 14:44 GMT+01:00 Derick Rethans derick@php.net:
Using a NUL byte conveniently achieves both of these goals. We did not
however take into account that this would cause issues with 3rd party
tooling that does not support binary data.I withdraw my comment. It does not cause issues for me (anymore). Having
the NUL is actually making things easier.cheers,
Derick
Semi-Related: NUL is especially an issue in exception messages in
combination with get_class, because they truncate the messages:
https://bugs.php.net/bug.php?id=70894
Regards, Niklas
I discovered today that anonymous class names contain a null byte
right after "class@anonymous". I don't think class names should
contain non-printable characters.How about removing that null byte?
I would suggest replacing it with something else. As I am making this
work for Xdebug, I really need to ship the full name to the IDE so that
they can later do look-ups on anonymous class properties, and with the
\0 the IDEs seem to fuck this up. I suggest using another @ instead of
the \0.
cheers,
Derick
http://derickrethans.nl | http://xdebug.org
Like Xdebug? Consider a donation: http://xdebug.org/donate.php
twitter: @derickr and @xdebug
Posted with an email client that doesn't mangle email: alpine
I discovered today that anonymous class names contain a null byte
right after "class@anonymous". I don't think class names should
contain non-printable characters.How about removing that null byte?
I would suggest replacing it with something else. As I am making this
work for Xdebug, I really need to ship the full name to the IDE so that
they can later do look-ups on anonymous class properties, and with the
\0 the IDEs seem to fuck this up. I suggest using another @ instead of
the \0.
Just to add another voice: +1 for this (or something similar). Having
the \0 can be worked around, but just not having to deal with it at
all is a hell of a lot easier for those of us working on third party
tooling.
Adam
Hi Derick,
Derick Rethans wrote:
I would suggest replacing it with something else. As I am making this
work for Xdebug, I really need to ship the full name to the IDE so that
they can later do look-ups on anonymous class properties, and with the
\0 the IDEs seem to fuck this up. I suggest using another @ instead of
the \0.
This is mostly tangential, but I note that we use "{closure}" as the
name of anonymous functions. So, maybe it should be "{class}@..." for
anonymous classes? It's at least semi-consistent...
Thanks.
Andrea Faulds
http://ajf.me/
This is mostly tangential, but I note that we use "{closure}" as the name
of anonymous functions. So, maybe it should be "{class}@..." for
anonymous
classes? It's at least semi-consistent...
Hi Andrea,
Only slightly tangential - I'm interested in your input on this issue, and
I'm keen to get this sorted out before 7.0 launches, so that we don't get
stuck with something suboptimal until 8.0 because of BC concerns.
To be honest, the original meaning of the class names has been lost since
Phil and Joe's RFC[1]. It had originally been class@0x[addr]
, which had a
reasonable semantic interpretation of "the (otherwise anonymous) class
which
can be found at the address [addr]". Dmitry added[2] the null byte and the
filename, but he did so between the '@' and the '0x[addr]', with
'anonymous'
added (just for good measure?). At this point, the purpose of the
'@'-symbol
was gone, in my opinion.
Furthermore, the RFC allowed for the portion of the name before the '@' to
identify the parent class:
class mine {}
new class extends mine {};
This class name will be “mine@0x7fc4be471000”.
This behaviour was present in Joe's original implementation, but later
removed[3] by Nikita. His commit message says that this was "dead code",
but
I don't see what was wrong with it.
The result of all of this is that we have a naming convention for anonymous
classes that is not only inconsistent with an existing feature of the
language, but has little in the way of reasoning behind its current format.
My patch[4] aims to save userland from this mess by hiding the entire
suffix
wherever anonymous classes' names are to be displayed. This means that all
anonymous classes will show as "class@anonymous" (which I'd happily change
to "{class}" in keeping with "{closure}") in userland, with no way of
retrieving the internal name; not even using ReflectionClass
. My
reasoning
is that users should never need to know the underlying names; if a class
has
a distinct name, then it's not really anonymous, is it?
Kind regards,
Steve
[1] https://wiki.php.net/rfc/anonymous_classes#internal_class_naming
[2] http://git.php.net/?p=php-src.git;a=commitdiff;h=2a1b0b1#patch1
[3] http://git.php.net/?p=php-src.git;a=commitdiff;h=4a7061b#patch1
[4]
https://github.com/php/php-src/compare/master...stevenhilder:hide-anon-class-suffix
Steven Hilder wrote on 11/11/2015 02:13:
This behaviour was present in Joe's original implementation, but later
removed[3] by Nikita. His commit message says that this was "dead
code", but
I don't see what was wrong with it.
It looks to me like 'decl->name' will never be set for an anonymous
class, so the old code was just passing a null into
zend_name_anon_class(), which is pointless (hence "dead" code, i.e.
"unreachable").
I guess what it should have been passed (maybe was in previous
revisions) was extends_name, which is only extracted from the AST much
further down the function.
http://git.php.net/?p=php-src.git;a=blob;f=Zend/zend_compile.c;hb=HEAD#l5315