[RFC][VOTING] Context Sensitive Lexer

11 years ago by Marcio Almada — view source — reply

unread

Hi,

Since no more issues appeared on discussion, the voting for the "Context
Sensitive Lexer" is now open. The voting will close in exactly 14 days
counting from now:

RFC: https://wiki.php.net/rfc/context_sensitive_lexer#votes

Since so few people participated on discussions, if you decide to vote "no"
please share your motives on the mailing lists by replying to this thread.
This will collaborate to the RFC process and can lead to improvements on
possible follow up RFCs.

Acknowledgments to Bob Weinand for all the advice and cooperation.

Thanks,
Márcio.

11 years ago by Nikita Popov — view source — reply

unread

Hi,

Since no more issues appeared on discussion, the voting for the "Context
Sensitive Lexer" is now open. The voting will close in exactly 14 days
counting from now:

RFC: https://wiki.php.net/rfc/context_sensitive_lexer#votes

Since so few people participated on discussions, if you decide to vote "no"
please share your motives on the mailing lists by replying to this thread.
This will collaborate to the RFC process and can lead to improvements on
possible follow up RFCs.

Acknowledgments to Bob Weinand for all the advice and cooperation.

Thanks,
Márcio.

After reviewing the implementation, I've decided to vote "no" on this RFC.

I had originally assumed that if this proposal is limited to method names
and class constants only the implementation should be pretty simple and
robust. However it turned out that there are a number of complications, the
two primary ones being:

a) We allow to define multiple class constants in one declaration. I.e. you
can write not only

const FOO = 42;

but also

const FOO = 42, BAR = 24, BAZ = 0;

This is further complicated by the fact that we allow many types of (static
scalar) expressions to be used as values for constants, so a declaration
might actually look like

const FOO = 42, BAR = [24, BAZ];

as well.

b) The trait adaptations syntax allows using method names without a generic
prefix like "::" or "->" in a number of ways. E.g. in

use A { A::foo as bar; }

foo and bar are method names. On the other hand in

use A { A::foo as public; }

the "public" is not a method name, it is used for trait method visibility
aliasing. Trait aliasing (but not precedence!) adaptations also allow
non-absolute method references on the left hand side, so

use A { foo as bar; }

without the leading A:: is possible as well.

In order to support these kinds of things, the lexer has to track a bunch
of state (e.g. to know if we're currently in a trait adaptation list as
opposed to a namespace or closure "use" list) and also make extensive use
of lexer lookahead. We also have to accurately track nesting of braces,
which is easy to get wrong when considering details like string
interpolation syntax.

After a number of issues was fixed by Marcio, the patch still contains a
few lexer rules that I don't really understand, but which seem to be
required to avoid regressions in some edge case.

So, tl;dr: I think the patch is too risky. Even if we can make sure that
we've covered all the current edge-cases and don't regress anything, I'm
afraid that this will cause complications with future changes. This ends up
replicating too many parsing aspects in the lexer in a rather ad-hoc manner.

Nikita

11 years ago by Sebastian Bergmann — view source — reply

unread

Am 04.03.2015 um 09:52 schrieb Nikita Popov:

So, tl;dr: I think the patch is too risky. Even if we can make sure that
we've covered all the current edge-cases and don't regress anything, I'm
afraid that this will cause complications with future changes. This ends up
replicating too many parsing aspects in the lexer in a rather ad-hoc manner.

I trust Nikita's expertise and have removed my +1 vote. I really want
to have what this RFC aims for but not at the price of robustness
(regressions) and flexibility (impediment for future feature
development).

11 years ago by Marcio Almada — view source — reply

unread

2015-03-04 5:52 GMT-03:00 Nikita Popov nikita.ppv@gmail.com:

After reviewing the implementation, I've decided to vote "no" on this RFC.

I had originally assumed that if this proposal is limited to method names
and class constants only the implementation should be pretty simple and
robust. However it turned out that there are a number of complications, the
two primary ones being:

a) We allow to define multiple class constants in one declaration. I.e.
you can write not only
const FOO = 42;
but also
const FOO = 42, BAR = 24, BAZ = 0;
This is further complicated by the fact that we allow many types of
(static scalar) expressions to be used as values for constants, so a
declaration might actually look like
const FOO = 42, BAR = [24, BAZ];
as well.

b) The trait adaptations syntax allows using method names without a
generic prefix like "::" or "->" in a number of ways. E.g. in
use A { A::foo as bar; }
foo and bar are method names. On the other hand in
use A { A::foo as public; }
the "public" is not a method name, it is used for trait method visibility
aliasing. Trait aliasing (but not precedence!) adaptations also allow
non-absolute method references on the left hand side, so
use A { foo as bar; }
without the leading A:: is possible as well.

In order to support these kinds of things, the lexer has to track a bunch
of state (e.g. to know if we're currently in a trait adaptation list as
opposed to a namespace or closure "use" list) and also make extensive use
of lexer lookahead. We also have to accurately track nesting of braces,
which is easy to get wrong when considering details like string
interpolation syntax.

After a number of issues was fixed by Marcio, the patch still contains a
few lexer rules that I don't really understand, but which seem to be
required to avoid regressions in some edge case.

So, tl;dr: I think the patch is too risky. Even if we can make sure that
we've covered all the current edge-cases and don't regress anything, I'm
afraid that this will cause complications with future changes. This ends up
replicating too many parsing aspects in the lexer in a rather ad-hoc manner.

Nikita

Despite the problems you pointed out are already fixed I understand your
position. In fact, the implementation must be better than this and I'm
already working on a better implementation regardless of this voting
passing or not.

I've been thinking about dropping this voting since yesterday but, seeing
the bright side, at least the current polling showed that most people want
the feature and the ones that voted "no" did it because of the
implementation and this is already enough information for me to continue
working to find other alternatives :)

I'll try to have a better patch, using lexing feedback instead, until the
end of the voting (in 11 days) so at least we can discuss it for PHP 7.1
instead.

To all:

Some people are voting on the feature without taking the implementation
itself in consideration, and it's ok if voters are doing that because not
everyone is able to evaluate the implementation. So I'm also thinking about
voting only for the feature for now.

If the implementation becomes good enough - according to internals - in
time for PHP7 we could merge it, otherwise we postpone it to PHP 7.1 or
until we get an implementation that is good enough. In that case the
feature could be approved and the implementation evaluated later only by
the ones who feel capable to.

What do you think about that? I don't know how this could fit (if it fits)
on the RFC process, so opinions are welcome. If this idea doesn't fit well
with the current RFC process, then I'll do another proposal targeting PHP
7.1 anyway.

Thanks,
Márcio

11 years ago by Marcio Almada — view source — reply

unread

Hi,

Just passing by to announce I already have a working version of the new
patch: https://github.com/php/php-src/pull/1158

The patch is 100% compatible with the proposed one with the advantages:

Has no regression or forward compatibility risks and is highly
predictable
Has an very small footprint compared to the previous attempt involving
a pure lexical approach
Is highly configurable, to make a word semi-reserved you only have to
edit a single inclusive list in a parser rule.
Requires a single compile time check

More than ever, I'd like to advice voters to vote for the feature as the
new implementation is already on it's way. There still some work to be
done, please refer to the task list on the pull request to see what still
needs to be done.

It would be nice to have the new patch reviewed too.

Thanks,
Márcio

2015-03-04 14:29 GMT-03:00 Marcio Almada marcio.web2@gmail.com:

Hi

2015-03-04 5:52 GMT-03:00 Nikita Popov nikita.ppv@gmail.com:
After reviewing the implementation, I've decided to vote "no" on this RFC.

I had originally assumed that if this proposal is limited to method names
and class constants only the implementation should be pretty simple and
robust. However it turned out that there are a number of complications, the
two primary ones being:

a) We allow to define multiple class constants in one declaration. I.e.
you can write not only
const FOO = 42;
but also
const FOO = 42, BAR = 24, BAZ = 0;
This is further complicated by the fact that we allow many types of
(static scalar) expressions to be used as values for constants, so a
declaration might actually look like
const FOO = 42, BAR = [24, BAZ];
as well.

b) The trait adaptations syntax allows using method names without a
generic prefix like "::" or "->" in a number of ways. E.g. in
use A { A::foo as bar; }
foo and bar are method names. On the other hand in
use A { A::foo as public; }
the "public" is not a method name, it is used for trait method visibility
aliasing. Trait aliasing (but not precedence!) adaptations also allow
non-absolute method references on the left hand side, so
use A { foo as bar; }
without the leading A:: is possible as well.

In order to support these kinds of things, the lexer has to track a bunch
of state (e.g. to know if we're currently in a trait adaptation list as
opposed to a namespace or closure "use" list) and also make extensive use
of lexer lookahead. We also have to accurately track nesting of braces,
which is easy to get wrong when considering details like string
interpolation syntax.

After a number of issues was fixed by Marcio, the patch still contains a
few lexer rules that I don't really understand, but which seem to be
required to avoid regressions in some edge case.

So, tl;dr: I think the patch is too risky. Even if we can make sure that
we've covered all the current edge-cases and don't regress anything, I'm
afraid that this will cause complications with future changes. This ends up
replicating too many parsing aspects in the lexer in a rather ad-hoc manner.

Nikita

11 years ago by Matteo Beccati — view source — reply

unread

Hi Marcio,

Just passing by to announce I already have a working version of the new
patch: https://github.com/php/php-src/pull/1158

The patch is 100% compatible with the proposed one with the advantages:
- Has no regression or forward compatibility risks and is highly
predictable
- Has an very small footprint compared to the previous attempt involving
a pure lexical approach
- Is highly configurable, to make a word semi-reserved you only have to
edit a single inclusive list in a parser rule.
- Requires a single compile time check
More than ever, I'd like to advice voters to vote for the feature as the
new implementation is already on it's way. There still some work to be
done, please refer to the task list on the pull request to see what still
needs to be done.

I like the approach. I'm no lexer expert but the new PR seems much
nicer. Moreover, I like the feature itself but previously voted no on
the implementation. I've therefore switched my vote to a "yes".

--
Matteo Beccati

Development & Consulting - http://www.beccati.com/

11 years ago by Sebastian Bergmann — view source — reply

unread

Am 09.03.2015 um 09:44 schrieb Matteo Beccati:

I like the approach. I'm no lexer expert but the new PR seems much nicer.
Moreover, I like the feature itself but previously voted no on the
implementation. I've therefore switched my vote to a "yes".

Dito.

11 years ago by Nikita Popov — view source — reply

unread

Hi,

Just passing by to announce I already have a working version of the new
patch: https://github.com/php/php-src/pull/1158

The patch is 100% compatible with the proposed one with the advantages:

Has no regression or forward compatibility risks and is highly
predictable

Has an very small footprint compared to the previous attempt involving
a pure lexical approach

Is highly configurable, to make a word semi-reserved you only have to
edit a single inclusive list in a parser rule.

Requires a single compile time check

More than ever, I'd like to advice voters to vote for the feature as the
new implementation is already on it's way. There still some work to be
done, please refer to the task list on the pull request to see what still
needs to be done.

It would be nice to have the new patch reviewed too.

The new implementation does indeed look much nicer :) The only open
question left is how you want to deal with ext/tokenizer support. Can you
clarify your plan regarding that?

Also, is this vote about the new implementation now?

Nikita

11 years ago by Dmitry Stogov — view source — reply

unread

On Mon, Mar 9, 2015 at 6:47 AM, Marcio Almada marcio.web2@gmail.com
wrote:

Hi,

Just passing by to announce I already have a working version of the new
patch: https://github.com/php/php-src/pull/1158

The patch is 100% compatible with the proposed one with the advantages:

Has no regression or forward compatibility risks and is highly
predictable

Has an very small footprint compared to the previous attempt
involving
a pure lexical approach

Is highly configurable, to make a word semi-reserved you only have
to
edit a single inclusive list in a parser rule.

Requires a single compile time check

More than ever, I'd like to advice voters to vote for the feature as the
new implementation is already on it's way. There still some work to be
done, please refer to the task list on the pull request to see what still
needs to be done.

It would be nice to have the new patch reviewed too.

The new implementation does indeed look much nicer :) The only open
question left is how you want to deal with ext/tokenizer support. Can you
clarify your plan regarding that?

Also, is this vote about the new implementation now?

Lets vote on proposal in general, but delay merging once the implementation
is good enough.
In this case I would say +1.

Thanks. Dmitry.

Nikita

11 years ago by Marcio Almada — view source — reply

unread

Hi,

2015-03-11 11:49 GMT-03:00 Nikita Popov nikita.ppv@gmail.com:

On Mon, Mar 9, 2015 at 6:47 AM, Marcio Almada marcio.web2@gmail.com
wrote:

Hi,

Just passing by to announce I already have a working version of the new
patch: https://github.com/php/php-src/pull/1158

The patch is 100% compatible with the proposed one with the advantages:

Has no regression or forward compatibility risks and is highly
predictable

Has an very small footprint compared to the previous attempt
involving
a pure lexical approach

Is highly configurable, to make a word semi-reserved you only have to
edit a single inclusive list in a parser rule.

Requires a single compile time check

More than ever, I'd like to advice voters to vote for the feature as the
new implementation is already on it's way. There still some work to be
done, please refer to the task list on the pull request to see what still
needs to be done.

It would be nice to have the new patch reviewed too.

The new implementation does indeed look much nicer :) The only open
question left is how you want to deal with ext/tokenizer support. Can you
clarify your plan regarding that?

I'm working right now to add full support to ext tokenizer back. I'm not
very experienced with the PHP implementation so, needless to say, I found
numerous ways to break the tokenizer extension \o/

But a solution is getting closer and might get pushed soon. If some harder
problem I don't have enough information to solve arise I'll try to ask for
some advice or collaboration, though it doesn't seem necessary for now as
we still have some time and it's been a great learning experience :)

Also, is this vote about the new implementation now?

I'm updating the RFC (rush time here gimme a few minutes) to list both
implementations side by side on the patch section, and explaining that the
second patch supersedes the first one.

Nikita

Márcio

11 years ago by Pascal MARTIN — view source — reply

unread

the voting for the "Context
Sensitive Lexer" is now open.

Hi,

After discussing this with other members of AFUP (well, not any of the
implementations, as we don't really have the expertise needed for that
-- but the feature as seen by end-users), we are on the +1 side for the
idea.

Thanks for your work!

--
Pascal MARTIN, AFUP - French UG
http://php-internals.afup.org/