[RFC] token_get_all() TOKEN_AS_OBJECT mode

5 years ago by Larry Garfield — view source

unread

Hi internals,

This has been discussed a while ago already, now as a proper proposal:
https://wiki.php.net/rfc/token_as_object

tl;dr is that it allows you to get token_get_all() output as an array of
PhpToken objects. This reduces memory usage, improves performance, makes
code more uniform and readable... What's not to like?

An open question is whether (at least to start with) PhpToken should be
just a data container, or whether we want to add some helper methods to it.
If this generates too much bikeshed, I'll drop methods from the proposal.

Regards,
Nikita

I love everything about this.

I would agree with Nicolas that a static constructor would be better. I don't know about polyfilling it, but it's definitely more self-descriptive.
I'm skeptical about the methods. I can see them being useful, but also being bikeshed material. For instance, if you're doing annotation parsing then docblocks are not ignorable. They're what you're actually looking for.

Two possible additions, feel free to ignore if they're too complicated:

Should it return an array of token objects, or a lazy iterable? If I'm only interested in certain types (eg, doc strings, classes, etc.) then a lazy iterable would allow me to string some filter and map operations on to it and use even less memory overall, since the whole tree is not in memory at once.
Rather than provide bikesheddable methods, would it be feasible to take a queue from PDO and let users specify a subclass of PhpToken to fetch into? That way the properties are always there, but a user can attach whatever methods make sense for them.

IMO the laziness would be more valuable, since a struct class can be operated on by an external function just as easily as a method, especially if using a functional pipeline.

--Larry Garfield

5 years ago by Nikita Popov — view source

unread

On Thu, Feb 13, 2020 at 6:06 PM Larry Garfield larry@garfieldtech.com
wrote:

Hi internals,

This has been discussed a while ago already, now as a proper proposal:
https://wiki.php.net/rfc/token_as_object

tl;dr is that it allows you to get token_get_all() output as an array of
PhpToken objects. This reduces memory usage, improves performance, makes
code more uniform and readable... What's not to like?

An open question is whether (at least to start with) PhpToken should be
just a data container, or whether we want to add some helper methods to
it.
If this generates too much bikeshed, I'll drop methods from the proposal.

Regards,
Nikita

I love everything about this.

I would agree with Nicolas that a static constructor would be better.
I don't know about polyfilling it, but it's definitely more
self-descriptive.

I'm skeptical about the methods. I can see them being useful, but also
being bikeshed material. For instance, if you're doing annotation parsing
then docblocks are not ignorable. They're what you're actually looking for.

Two possible additions, feel free to ignore if they're too complicated:

Should it return an array of token objects, or a lazy iterable? If I'm
only interested in certain types (eg, doc strings, classes, etc.) then a
lazy iterable would allow me to string some filter and map operations on to
it and use even less memory overall, since the whole tree is not in memory
at once.

I'm going to take you up on your offer and ignore this one :P Returning
tokens as an iterator is inefficient because it requires full lexer state
backups and restores for each token. Could be optimized, but I wouldn't
bother with it for this feature. I also personally have no use-case for a
lazy token stream. (It's technically sufficient for parsing, but if you
want to preserve formatting, you're going to be preserving all the tokens
anyway.)

Rather than provide bikesheddable methods, would it be feasible to take
a queue from PDO and let users specify a subclass of PhpToken to fetch
into? That way the properties are always there, but a user can attach
whatever methods make sense for them.

It would be technically feasible. If we go with a static method for
construction, then one might even say that there's reasonable expectation
that PhpToken::getAll(...) is going to return PhpToken[] and
MyPhpTokenExtension::getAll() is going to return MyPhpTokenExtension[].

I'm a bit apprehensive about this though, specifically because you mention
PDO... which, I think, isn't exactly a success story when it comes to this.
If we do this, then the behavior would be that the object gets created, the
properties populated, and no constructor gets called. The last part is
important -- when you start calling constructors and magic methods, that's
where the mess starts and you get PDO.

Regards,
Nikita

5 years ago by Nikita Popov — view source

unread

On Thu, Feb 13, 2020 at 6:06 PM Larry Garfield larry@garfieldtech.com
wrote:

Hi internals,

This has been discussed a while ago already, now as a proper proposal:
https://wiki.php.net/rfc/token_as_object

tl;dr is that it allows you to get token_get_all() output as an array of
PhpToken objects. This reduces memory usage, improves performance, makes
code more uniform and readable... What's not to like?

An open question is whether (at least to start with) PhpToken should be
just a data container, or whether we want to add some helper methods to
it.
If this generates too much bikeshed, I'll drop methods from the
proposal.

Regards,
Nikita

I love everything about this.

I would agree with Nicolas that a static constructor would be better.
I don't know about polyfilling it, but it's definitely more
self-descriptive.

I'm skeptical about the methods. I can see them being useful, but
also being bikeshed material. For instance, if you're doing annotation
parsing then docblocks are not ignorable. They're what you're actually
looking for.

Two possible additions, feel free to ignore if they're too complicated:

Should it return an array of token objects, or a lazy iterable? If
I'm only interested in certain types (eg, doc strings, classes, etc.) then
a lazy iterable would allow me to string some filter and map operations on
to it and use even less memory overall, since the whole tree is not in
memory at once.

I'm going to take you up on your offer and ignore this one :P Returning
tokens as an iterator is inefficient because it requires full lexer state
backups and restores for each token. Could be optimized, but I wouldn't
bother with it for this feature. I also personally have no use-case for a
lazy token stream. (It's technically sufficient for parsing, but if you
want to preserve formatting, you're going to be preserving all the tokens
anyway.)

Rather than provide bikesheddable methods, would it be feasible to
take a queue from PDO and let users specify a subclass of PhpToken to fetch
into? That way the properties are always there, but a user can attach
whatever methods make sense for them.

It would be technically feasible. If we go with a static method for
construction, then one might even say that there's reasonable expectation
that PhpToken::getAll(...) is going to return PhpToken[] and
MyPhpTokenExtension::getAll() is going to return MyPhpTokenExtension[].

I'm a bit apprehensive about this though, specifically because you mention
PDO... which, I think, isn't exactly a success story when it comes to this.
If we do this, then the behavior would be that the object gets created, the
properties populated, and no constructor gets called. The last part is
important -- when you start calling constructors and magic methods, that's
where the mess starts and you get PDO.

After thinking about this a bit more, there's a very nice solution to this:
Something that's missing from the current proposal is a constructor. Right
now, if code wants to insert new tokens, then those would have to be
constructed by creating the object and then manually assigning properties,
so we should definitely have a constructor.

Once we have one, we can mark it final, and thus make the construction
behavior well-defined, even if the class is extended.

Nikita

5 years ago by Thomas Hruska — view source

unread

On Thu, Feb 13, 2020 at 6:06 PM Larry Garfield larry@garfieldtech.com
wrote:

Hi internals,

This has been discussed a while ago already, now as a proper proposal:
https://wiki.php.net/rfc/token_as_object

tl;dr is that it allows you to get token_get_all() output as an array of
PhpToken objects. This reduces memory usage, improves performance, makes
code more uniform and readable... What's not to like?

An open question is whether (at least to start with) PhpToken should be
just a data container, or whether we want to add some helper methods to
it.
If this generates too much bikeshed, I'll drop methods from the proposal.

Regards,
Nikita

I love everything about this.

I would agree with Nicolas that a static constructor would be better.
I don't know about polyfilling it, but it's definitely more
self-descriptive.

I'm skeptical about the methods. I can see them being useful, but also
being bikeshed material. For instance, if you're doing annotation parsing
then docblocks are not ignorable. They're what you're actually looking for.

Two possible additions, feel free to ignore if they're too complicated:

Should it return an array of token objects, or a lazy iterable? If I'm
only interested in certain types (eg, doc strings, classes, etc.) then a
lazy iterable would allow me to string some filter and map operations on to
it and use even less memory overall, since the whole tree is not in memory
at once.

I'm going to take you up on your offer and ignore this one :P Returning
tokens as an iterator is inefficient because it requires full lexer state
backups and restores for each token. Could be optimized, but I wouldn't
bother with it for this feature. I also personally have no use-case for a
lazy token stream. (It's technically sufficient for parsing, but if you
want to preserve formatting, you're going to be preserving all the tokens
anyway.)

Try passing a 10MB PHP file that's all code into token_get_all(). It's
pretty easy to hit hard memory limits and/or start crashing PHP when
token_get_all() tokenizes the whole thing into a giant array or set of
objects. Calling gc_mem_caches() when the previous RAM bits aren't
needed anymore helps. Stream-based token parsing would be better for
RAM usage but I can see how that might be complex to implement and
largely not worth it since such scenarios will be rare and require the
ability to maintain lexer state externally as you mentioned and would
only be used by this part of the software.

--
Thomas Hruska
CubicleSoft President

I've got great, time saving software that you will find useful.

http://cubiclesoft.com/

And once you find my software useful:

http://cubiclesoft.com/donate/

5 years ago by Levi Morrison via internals — view source

unread

Hi internals,

This has been discussed a while ago already, now as a proper proposal:
https://wiki.php.net/rfc/token_as_object

tl;dr is that it allows you to get token_get_all() output as an array of
PhpToken objects. This reduces memory usage, improves performance, makes
code more uniform and readable... What's not to like?

An open question is whether (at least to start with) PhpToken should be
just a data container, or whether we want to add some helper methods to it.
If this generates too much bikeshed, I'll drop methods from the proposal.

Regards,
Nikita

Overall it looks great. Thanks, Nikita.

I do think it should use a separate function from token_get_all, and
am not sure I care whether it is a function or static method.

The only helper I see as totally non-controversial is getTokenName,
which seems completely reasonable and users should not have to keep
re-implementing that. I am open to more discussion about the others.

5 years ago by Sara Golemon — view source

unread

This has been discussed a while ago already, now as a proper proposal:
https://wiki.php.net/rfc/token_as_object

tl;dr is that it allows you to get token_get_all() output as an array of
PhpToken objects. This reduces memory usage, improves performance, makes
code more uniform and readable... What's not to like?

An open question is whether (at least to start with) PhpToken should be
just a data container, or whether we want to add some helper methods to it.
If this generates too much bikeshed, I'll drop methods from the proposal.

Yep. I remember bringing this up in 2016 and there was generally good
reception to it from you, Andrea, Stas, and Stig at the very least. Why
isn't it in? It got derailed by some bike shed colorizing and a little bit
of workplace drama then dropped on the floor.

Thanks for picking it up, and I agree with your response to Larry. As nice
as it would be to lazy iterate, the scanner is just in NO shape to tolerate
reentering userspace and potentially reinvoking the scanner before the
first run through is done.

I also agree that being able to subclass the token would be great, but that
PDO made some mistakes here and we can do that as a separate addition later
on if there's not consensus now.

I'm +1 for NOT overloading the token_get_all() function, but rather putting
a static factory method on the PhpToken class (or whatever you call it).
When we add subclassability, we can always add additional statics if our
initial signature doesn't work out.

I'm not clear why you want to final the base constructor. IMO we populate
the fields on object creation, then invoke constructor (which is a no-op in
the base class). Later uses of subclassing can deal with the properties
(or not) at that time, in their own constructor, delegating (or not) to the
base.

TL;DR +1, because I wanted this four years ago. :)
-Sara

5 years ago by Marco Pivetta — view source

unread

Thanks for picking it up, and I agree with your response to Larry. As nice
as it would be to lazy iterate, the scanner is just in NO shape to tolerate
reentering userspace and potentially reinvoking the scanner before the
first run through is done.

If this is the current state, maybe it would suffice to declare the return
type as iterable, and return a strict (fully populated) structure in a
first implementation, later to be changed to an iterator, if applicable?

Marco Pivetta

http://twitter.com/Ocramius

http://ocramius.github.com/

5 years ago by Sara Golemon — view source

unread

Thanks for picking it up, and I agree with your response to Larry. As
nice
as it would be to lazy iterate, the scanner is just in NO shape to
tolerate
reentering userspace and potentially reinvoking the scanner before the
first run through is done.

If this is the current state, maybe it would suffice to declare the return
type as iterable, and return a strict (fully populated) structure in a
first implementation, later to be changed to an iterator, if applicable?

I think that's an optimistic, but ultimately harmless approach. Arrays
satisfy Iterables. Worst case we never improve on that. Best case we get
iterable token generators.
+1

-Sara

5 years ago by Nikita Popov — view source

unread

Thanks for picking it up, and I agree with your response to Larry. As
nice
as it would be to lazy iterate, the scanner is just in NO shape to
tolerate
reentering userspace and potentially reinvoking the scanner before the
first run through is done.

If this is the current state, maybe it would suffice to declare the return
type as iterable, and return a strict (fully populated) structure in a
first implementation, later to be changed to an iterator, if applicable?

Marco Pivetta

The fact that it returns an array is an important part of the contract. If
an iterator variant is added in the future, it should be added as a
separate method.

I don't want to be writing

$tokens = PhpToken::getAll($code);
if (!is_array($tokens)) {
$tokens = iterator_to_array($tokens);
}

to convert this to the right type. And I also don't want my usage to be
implicitly "upgraded" to an iterator in the future: The iterator will
always be less efficient, and I don't want to be forced to use it if I need
an array anyway.

Nikita

5 years ago by Nikita Popov — view source

unread

This has been discussed a while ago already, now as a proper proposal:
https://wiki.php.net/rfc/token_as_object

tl;dr is that it allows you to get token_get_all() output as an array of
PhpToken objects. This reduces memory usage, improves performance, makes
code more uniform and readable... What's not to like?

An open question is whether (at least to start with) PhpToken should be
just a data container, or whether we want to add some helper methods to
it.
If this generates too much bikeshed, I'll drop methods from the proposal.

Yep. I remember bringing this up in 2016 and there was generally good
reception to it from you, Andrea, Stas, and Stig at the very least. Why
isn't it in? It got derailed by some bike shed colorizing and a little bit
of workplace drama then dropped on the floor.

Thanks for picking it up, and I agree with your response to Larry. As
nice as it would be to lazy iterate, the scanner is just in NO shape to
tolerate reentering userspace and potentially reinvoking the scanner before
the first run through is done.

I also agree that being able to subclass the token would be great, but
that PDO made some mistakes here and we can do that as a separate addition
later on if there's not consensus now.

I'm +1 for NOT overloading the token_get_all() function, but rather
putting a static factory method on the PhpToken class (or whatever you call
it). When we add subclassability, we can always add additional statics if
our initial signature doesn't work out.

As there seems to be a pretty clear consensus on this point at least, I've
updated the proposal to add the static factory method.

I'm not clear why you want to final the base constructor. IMO we populate

the fields on object creation, then invoke constructor (which is a no-op in
the base class). Later uses of subclassing can deal with the properties
(or not) at that time, in their own constructor, delegating (or not) to the
base.

Populating the fields on creation and then calling the constructor is the
whole problem with PDO, because that's not how PHP usually does
construction. Normally there is either no constructor and properties are
explicitly populated, or there is a constructor, in which case only the
constructor is called. Populating properties first and the calling a
constructor is something that essentially only PDO does.

To clarify a bit what I meant with the final constructor: As part of the
last update, I've added the following constructor:

class PhpToken {
public function __construct(int $id, string $text, int $line = -1, int
$pos = -1);
}

This constructor will initialize the corresponding properties. Now, the
behavior that would make most sense to me (if extension of the class is
allowed) is that MyPhpToken::getAll() is going to create the new tokens
based on "new MyPhpToken($id, $text, $line, $pos)". If we mark the
constructor final, then we could hardcode the construction behavior of the
base class without introducing any kind of weird rules, it would be just
the usual language semantics.

If we don't make the constructor final, then we would have to actually call
it (if it is overridden -- otherwise we can use more optimized
initialization code). We can do that (I believe calling user code here
should be perfectly safe -- the lexer is reentrant after all), it's just
extra complexity and I'm not sure it's actually useful. The final
constructor would still allow a) adding methods in the child class and b)
adding properties with default values in the child class, which seems like
it should cover most of the usefulness.

So I think the options here are:
a) Make PhpToken final and simplify don't support this.
b) Make PhpToken::__construct() final and (very easily) support basic
extension.
c) Make PhpToken::__construct() non-final and support full extension, with
some extra effort.

Regards,
Nikita

5 years ago by Rowan Tommins — view source

unread

This constructor will initialize the corresponding properties. Now, the
behavior that would make most sense to me (if extension of the class is
allowed) is that MyPhpToken::getAll() is going to create the new tokens
based on "new MyPhpToken($id, $text, $line, $pos)". If we mark the
constructor final, then we could hardcode the construction behavior of the
base class without introducing any kind of weird rules, it would be just
the usual language semantics.

It's worth noting that this is how ext/simplexml works: the $class
parameter of simplexml_load_string and simplexml_load_file must be the name
of a class that inherits from SimpleXMLElement, and an instance of that
class will be constructed for each element of the document. The constructor
of SimpleXMLElement is final, so the internal initialisation logic doesn't
have to actually call it, it can just initialise the private state directly.

Regards,

Rowan Tommins
[IMSoP]

5 years ago by Nikita Popov — view source

unread

On Fri, Feb 14, 2020 at 7:30 PM Rowan Tommins rowan.collins@gmail.com
wrote:

This constructor will initialize the corresponding properties. Now, the
behavior that would make most sense to me (if extension of the class is
allowed) is that MyPhpToken::getAll() is going to create the new tokens
based on "new MyPhpToken($id, $text, $line, $pos)". If we mark the
constructor final, then we could hardcode the construction behavior of
the
base class without introducing any kind of weird rules, it would be just
the usual language semantics.

It's worth noting that this is how ext/simplexml works: the $class
parameter of simplexml_load_string and simplexml_load_file must be the name
of a class that inherits from SimpleXMLElement, and an instance of that
class will be constructed for each element of the document. The constructor
of SimpleXMLElement is final, so the internal initialisation logic doesn't
have to actually call it, it can just initialise the private state
directly.

Thanks for pointing that out, I wasn't aware that SimpleXMLElement is
already using this pattern. Given that, I've updated the RFC to go with
that option now.

Nikita

5 years ago by Nikita Popov — view source

unread

Hi internals,

This has been discussed a while ago already, now as a proper proposal:
https://wiki.php.net/rfc/token_as_object

tl;dr is that it allows you to get token_get_all() output as an array of
PhpToken objects. This reduces memory usage, improves performance, makes
code more uniform and readable... What's not to like?

An open question is whether (at least to start with) PhpToken should be
just a data container, or whether we want to add some helper methods to it.
If this generates too much bikeshed, I'll drop methods from the proposal.

I think this proposal is in a pretty decent shape now, and I'd like to move
it to voting soon. The only remaining open question is whether we want to
add any additional predefined methods. As the class can now be extended,
every library can add their own methods, but there might still be value in
providing some things by default, primarily for performance reason. For
example, the proposed is() method can be a good bit more efficient when
implemented directly in extension code.

Any feedback on this point?

Nikita

5 years ago by Larry Garfield — view source

unread

Hi internals,

This has been discussed a while ago already, now as a proper proposal:
https://wiki.php.net/rfc/token_as_object

tl;dr is that it allows you to get token_get_all() output as an array of
PhpToken objects. This reduces memory usage, improves performance, makes
code more uniform and readable... What's not to like?

An open question is whether (at least to start with) PhpToken should be
just a data container, or whether we want to add some helper methods to it.
If this generates too much bikeshed, I'll drop methods from the proposal.

I think this proposal is in a pretty decent shape now, and I'd like to move
it to voting soon. The only remaining open question is whether we want to
add any additional predefined methods. As the class can now be extended,
every library can add their own methods, but there might still be value in
providing some things by default, primarily for performance reason. For
example, the proposed is() method can be a good bit more efficient when
implemented directly in extension code.

Any feedback on this point?

Nikita

Seems good to me in its current state.

One last thought I had: in its current form, it seems like it would be possible, in concept, to create tokens directly via the constructor (ignoring the positional values), then have a generic render command you can apply to a list of them to use for code generation. I know that's not the intent here, but in concept is what I described possible?

--Larry Garfield

5 years ago by Nikita Popov — view source

unread

On Tue, Feb 25, 2020 at 5:59 PM Larry Garfield larry@garfieldtech.com
wrote:

On Thu, Feb 13, 2020 at 10:47 AM Nikita Popov nikita.ppv@gmail.com
wrote:

Hi internals,

This has been discussed a while ago already, now as a proper proposal:
https://wiki.php.net/rfc/token_as_object

tl;dr is that it allows you to get token_get_all() output as an array
of
PhpToken objects. This reduces memory usage, improves performance,
makes
code more uniform and readable... What's not to like?

An open question is whether (at least to start with) PhpToken should be
just a data container, or whether we want to add some helper methods
to it.
If this generates too much bikeshed, I'll drop methods from the
proposal.

I think this proposal is in a pretty decent shape now, and I'd like to
move
it to voting soon. The only remaining open question is whether we want to
add any additional predefined methods. As the class can now be extended,
every library can add their own methods, but there might still be value
in
providing some things by default, primarily for performance reason. For
example, the proposed is() method can be a good bit more efficient when
implemented directly in extension code.

Any feedback on this point?

Nikita

Seems good to me in its current state.

One last thought I had: in its current form, it seems like it would be
possible, in concept, to create tokens directly via the constructor
(ignoring the positional values), then have a generic render command you
can apply to a list of them to use for code generation. I know that's not
the intent here, but in concept is what I described possible?

Sure, that works. You can construct your own tokens and the "render
command" should actually be as simple as implode(array_column($tokens,
'text')).

Nikita

5 years ago by Theodore Brown — view source

unread

This has been discussed a while ago already, now as a proper proposal:
https://wiki.php.net/rfc/token_as_object

An open question is whether (at least to start with) PhpToken should be
just a data container, or whether we want to add some helper methods to it.
If this generates too much bikeshed, I'll drop methods from the proposal.

I think this proposal is in a pretty decent shape now, and I'd like to move
it to voting soon. The only remaining open question is whether we want to
add any additional predefined methods. As the class can now be extended,
every library can add their own methods, but there might still be value in
providing some things by default, primarily for performance reason. For
example, the proposed is() method can be a good bit more efficient when
implemented directly in extension code.

Any feedback on this point?

Hi Nikita,

Thanks for this RFC. The proposed PhpToken class will definitely make it
easier to work with parsed tokens (I struggled with this when working on
the migration script for deprecated alternate array/string offset syntax).

In regards to the open question of additional methods, the getTokenName()
method would be very welcome. isIgnorable() would also be helpful
sometimes, though I don't care much one way or another if it's added.

But I'm really skeptical about the value of the is($kind) method. Code
working with tokens should know the type of value being checked (ID int,
text content, or array of one or the other), and a method that accepts
any of three types can make it harder to understand what's going on.

For example, the proposed isIgnorable() method doesn't need to use
is(), it could just check whether the id property is in the array
of token constants.

Best regards,
Theodore

5 years ago by Nikita Popov — view source

unread

On Tue, Feb 25, 2020 at 7:16 PM Theodore Brown theodorejb@outlook.com
wrote:

On Thu, Feb 13, 2020 at 10:47 AM Nikita Popov nikita.ppv@gmail.com
wrote:

This has been discussed a while ago already, now as a proper proposal:
https://wiki.php.net/rfc/token_as_object

An open question is whether (at least to start with) PhpToken should be
just a data container, or whether we want to add some helper methods
to it.
If this generates too much bikeshed, I'll drop methods from the
proposal.

I think this proposal is in a pretty decent shape now, and I'd like to
move
it to voting soon. The only remaining open question is whether we want to
add any additional predefined methods. As the class can now be extended,
every library can add their own methods, but there might still be value
in
providing some things by default, primarily for performance reason. For
example, the proposed is() method can be a good bit more efficient when
implemented directly in extension code.

Any feedback on this point?

Hi Nikita,

Thanks for this RFC. The proposed PhpToken class will definitely make it
easier to work with parsed tokens (I struggled with this when working on
the migration script for deprecated alternate array/string offset syntax).

In regards to the open question of additional methods, the getTokenName()
method would be very welcome. isIgnorable() would also be helpful
sometimes, though I don't care much one way or another if it's added.

But I'm really skeptical about the value of the is($kind) method. Code
working with tokens should know the type of value being checked (ID int,
text content, or array of one or the other), and a method that accepts
any of three types can make it harder to understand what's going on.

For example, the proposed isIgnorable() method doesn't need to use
is(), it could just check whether the id property is in the array
of token constants.

To provide some context, a method like is() is primarily useful in the
implementation of other methods accepting a token-descriptor to search for.
Token stream implementation commonly have many helper methods along the
lines of

public function findRight($pos, $findTokenType) {
$tokens = $this->tokens;
for ($count = \count($tokens); $pos < $count; $pos++) {
if ($tokens[$pos]->is($findTokenType)) {
return $pos;
}
}
return -1;
}

which search for tokens in different directions, skip tokens, check that
tokens exist while skipping whitespace etc. Having a fast primitive to
perform the token type check would be quite useful for that, I think.

Regards,
Nikita

5 years ago by michal.brzuchalski@gmail.com — view source

unread

czw., 13 lut 2020 o 10:48 Nikita Popov nikita.ppv@gmail.com napisał(a):

Hi internals,

This has been discussed a while ago already, now as a proper proposal:
https://wiki.php.net/rfc/token_as_object

tl;dr is that it allows you to get token_get_all() output as an array of
PhpToken objects. This reduces memory usage, improves performance, makes
code more uniform and readable... What's not to like?

An open question is whether (at least to start with) PhpToken should be
just a data container, or whether we want to add some helper methods to it.
If this generates too much bikeshed, I'll drop methods from the proposal.

Regards,
Nikita

Hi Nikita,

I really like the RFC itself and got only one question regarding the class
name
which I couldn't find an answer in RFC.

Why PHP prefix in PhpToken class name?
Why can't we go with Token class name alone without the prefix?

The only one which includes PHP in class names so far are only:

__PHP_Incomplete_Class
php_user_filter

Above taken from https://www.php.net/manual/en/reserved.classes.php

BR,
Michał Brzuchalski

5 years ago by Nikita Popov — view source

unread

Hi internals,

This has been discussed a while ago already, now as a proper proposal:
https://wiki.php.net/rfc/token_as_object

tl;dr is that it allows you to get token_get_all() output as an array of
PhpToken objects. This reduces memory usage, improves performance, makes
code more uniform and readable... What's not to like?

An open question is whether (at least to start with) PhpToken should be
just a data container, or whether we want to add some helper methods to it.
If this generates too much bikeshed, I'll drop methods from the proposal.

Regards,
Nikita

I've done some final updates to the RFC (
https://wiki.php.net/rfc/token_as_object) and intend to open voting
tomorrow, if there are major objections. The last changes were to clarify
that getTokenName() returns null for unknown token IDs, and to provide some
more extended rationale for what the individual methods are useful for.

@Michal: The reason why it's called PhpToken is that it is ... a PHP token
:) At least that's what it is intended to be used for -- while you could
technically use it to represent other kinds of tokens, I don't think that's
a usage that should be endorsed. Just "Token" is very generic, and not even
specific to lexing/tokenization. The term "Token" is commonly used in other
contexts as well (e.g. token ring).

Regards,
Nikita