Newsgroups: php.internals
Path: news.php.net
Xref: news.php.net php.internals:86274
Mailing-List: contact internals-help@lists.php.net; run by ezmlm
Received-SPF: pass (pb1.pair.com: domain gmail.com designates 74.125.82.47 as permitted sender)
MIME-Version: 1.0
In-Reply-To: <CAOsHV+tPBU5Z+cm+KxfOM9phFJEqe7A1eS9WZKRzR7aeEh1bmA@mail.gmail.com>
References: <CAOsHV+tPBU5Z+cm+KxfOM9phFJEqe7A1eS9WZKRzR7aeEh1bmA@mail.gmail.com>
Date: Sat, 16 May 2015 23:48:59 +0200
Message-ID: <CAF+90c-0x8Sh=LD=oHtWKx-ihX18Udam5b0gegricjPPFO2zJQ@mail.gmail.com>
To: Marcio Almada <marcio.web2@gmail.com>
Cc: PHP internals <internals@lists.php.net>
Content-Type: multipart/alternative; boundary=047d7bf0bfc2a90805051639ed69
Subject: Re: [PHP-DEV] Context Sensitive Language RFC - Implementation Candidate
From: nikita.ppv@gmail.com (Nikita Popov)

--047d7bf0bfc2a90805051639ed69
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

On Mon, Apr 20, 2015 at 5:32 PM, Marcio Almada <marcio.web2@gmail.com>
wrote:

> Hi,
>
> The Context Sensitive Lexer RFC
> <https://wiki.php.net/rfc/context_sensitive_lexer> passed :) and by the
> time of the voting phase, we decided to vote for the feature only and later
> discuss quality analysis on the implementations aimed to fulfill the RFC.
>
> First, I'd like to thank you all for that decision. I know that's an
> exception on the RFC process, and I am glad we choose this path for the
> following reasons:
>
> 1. Voting different RFCs describing the same feature with slightly
> different implementations would cause us to waste many voting cycles (maybe
> entire release cycles) without a guarantee of quality. The main reason to
> establish an RFC process is to chase for quality - to follow all the rules
> strictly, in this case, would be to contradict our main objective here.
>
> 2. Knowing in advance that the feature was already approved is a motivating
> factor to go on and try a good number of possible implementations and
> propose the best ones, instead of recursively voting until an
> implementation pass.
>
> With that said, this is the proposed pull request:
>
> Pull Request: https://github.com/php/php-src/pull/1221
> Diff: https://github.com/php/php-src/pull/1221/files
> RFC: https://wiki.php.net/rfc/context_sensitive_lexer
>
> There is sufficient description of the pull request itself. The ones that
> participated in the previous discussions probably won't have trouble to
> understand it, but feel free to share any doubts or suggestions here, if
> necessary.
>
> Thanks,
> Márcio
>

Sorry for the late response; I forgot about this RFC. I've only glanced over
it, but the patch looks okay from the technical side.

The thing that's bothering me is the fact that this patch is basically
saying: "It is no longer possible to correctly tokenize PHP without also
parsing it."

For example, if you're writing a syntax highlighter for PHP and you want
that syntax highlighter to be correct, you'll be writing not only a lexer,
but also a parser for PHP (which is significantly more complicated).
Actually it's worse than that: the approach of running a parser
concurrently with the token collection does not work for highlighting code
snippets, for example, where the snippet may not form syntactically valid
code. Syntax highlighting is only one example; this applies to any
external tooling that's not written in PHP and so cannot use
token_get_all().

I don't know how important this is to us, but I'm somewhat wary of going
further in the C++ direction (where you essentially need a full
type-analyzer to do a parse).


This is why I still prefer the dead-simple approach of making the next
label after :: and "function" unreserved (what we do for the label after ->
already), combined with forbidding reserved names for free functions in the
compiler (similar to the blacklist we have for classes). This doesn't cover
everything (like trait adaptations), but I think it covers the 97% case
(and actually allows us to really allow all names, without exceptions).
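
To illustrate the simple approach, here is a minimal sketch of a toy
tokenizer written in Python (not the php-src lexer and not the patch under
discussion). The keyword set is a deliberately tiny, hypothetical subset,
and the names RESERVED, UNRESERVING and tokenize are made up for this
example. It only shows that the rule "the next label after ::, -> or
function is unreserved" needs nothing more than the previous token.

```python
import re

# Hypothetical, heavily reduced keyword set; real PHP reserves many more words.
RESERVED = {"class", "function", "list", "new", "print", "return", "static"}

# Punctuation after which the next label is treated as a plain name.
UNRESERVING = {"::", "->"}

TOKEN_RE = re.compile(r"\s+|::|->|\$?[A-Za-z_][A-Za-z0-9_]*|.", re.S)
LABEL_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def tokenize(source):
    """Return (kind, text) pairs, using only the previous token as context."""
    tokens = []
    unreserve_next = False  # set by '::', '->' and the keyword 'function'
    for match in TOKEN_RE.finditer(source):
        text = match.group()
        if text.isspace():
            continue  # whitespace does not reset the one-token context
        if LABEL_RE.fullmatch(text):
            # PHP keywords are case-insensitive, hence the lower().
            reserved = text.lower() in RESERVED and not unreserve_next
            kind = "KEYWORD" if reserved else "NAME"
        elif text.startswith("$") and len(text) > 1:
            kind = "VARIABLE"
        else:
            kind = "PUNCT"
        tokens.append((kind, text))
        unreserve_next = text in UNRESERVING or (
            kind == "KEYWORD" and text.lower() == "function"
        )
    return tokens
```

Because the decision depends on a single token of lookbehind, the sketch
also works on fragments that are not valid PHP on their own, e.g.
tokenize("->list(") still reports "list" as a NAME. Rejecting a free
function named "list" would be left to a later check, as described above;
by-reference declarations such as "function &list()" are not handled here.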

Nikita

--047d7bf0bfc2a90805051639ed69--