Newsgroups: php.internals
Path: news.php.net
Xref: news.php.net php.internals:86280
Mailing-List: contact internals-help@lists.php.net; run by ezmlm
Received-SPF: pass (pb1.pair.com: domain gmail.com designates 209.85.213.65 as permitted sender)
MIME-Version: 1.0
In-Reply-To: <CAF+90c-0x8Sh=LD=oHtWKx-ihX18Udam5b0gegricjPPFO2zJQ@mail.gmail.com>
References: <CAOsHV+tPBU5Z+cm+KxfOM9phFJEqe7A1eS9WZKRzR7aeEh1bmA@mail.gmail.com>
 <CAF+90c-0x8Sh=LD=oHtWKx-ihX18Udam5b0gegricjPPFO2zJQ@mail.gmail.com>
Date: Sat, 16 May 2015 20:52:08 -0300
Message-ID: <CAOsHV+somYyqxAWiPhp4wK2b1aj2MgF114F2Kx2ZB3B6B62H4Q@mail.gmail.com>
To: Nikita Popov <nikita.ppv@gmail.com>
Cc: PHP internals <internals@lists.php.net>
Content-Type: multipart/alternative; boundary=089e0158bb4a42231005163ba7f1
Subject: Re: [PHP-DEV] Context Sensitive Language RFC - Implementation Candidate
From: marcio.web2@gmail.com (Marcio Almada)

--089e0158bb4a42231005163ba7f1
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Hi!


> Sorry for late response, forgot about this RFC. I've only glanced over it,
> but the patch looks okay from the technical side.
>
>
No problem :) There are other, more important issues being discussed that
should be prioritized, especially your engine exception RFC.


> The thing that's bothering me is the fact that this patch is basically
> saying: "It is no longer possible to correctly tokenize PHP without also
> parsing it."
>

That won't be possible with your idea of "making the next label after
'::' and 'function' unreserved" either, because it doesn't cover the entire
PHP syntax. So I took the approach that gives 100% coverage.


> For example, if you're writing a syntax highlighter for PHP and you want
> that syntax highlighter to be correct, you'll be writing not only a lexer,
> but also a parser for PHP (which is significantly more complicated).
> Actually it's worse than that: The approach of running a parser
> concurrently with the token collection does not work for highlighting code
> snippets for example, where the snippet may not form syntactically fully
> valid code.
>

If we are talking about syntax highlighters only, then it's pretty easy for
an external tool to treat whatever follows `::` and `function` as a name
and highlight it as such. You simply call *token_get_all($code)* without
the proposed *TOKEN_PARSE* flag and transform the resulting array by
applying a simple callback that simulates a lookahead.
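
As a rough sketch of that callback idea (assuming PHP 7.1+ for the short
list syntax; `retag_names_after_triggers` is a hypothetical helper name, not
part of PHP or of the patch), something like this would re-tag the word
right after `::` or `function` as a plain name:

```php
<?php
// Hypothetical helper: walks the array returned by token_get_all() and
// re-tags the identifier-like token that follows `::` or `function`
// as T_STRING, so a highlighter can treat it as a name.
function retag_names_after_triggers(array $tokens): array
{
    $expectName = false;

    foreach ($tokens as $i => $token) {
        if (!is_array($token)) {
            // Single-character tokens such as '(' or '{' cancel the lookahead.
            $expectName = false;
            continue;
        }

        [$id, $text] = $token;

        // Whitespace and comments do not cancel a pending lookahead.
        if ($id === T_WHITESPACE || $id === T_COMMENT || $id === T_DOC_COMMENT) {
            continue;
        }

        // Only word-like tokens qualify (this skips e.g. `Foo::$bar`).
        $retagged = $expectName
            && preg_match('/^[a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff]*$/', $text) === 1;

        if ($retagged) {
            $tokens[$i][0] = T_STRING;
        }

        // Arm the lookahead for the next significant token.
        $expectName = !$retagged && ($id === T_DOUBLE_COLON || $id === T_FUNCTION);
    }

    return $tokens;
}
```

Usage would be along the lines of
`retag_names_after_triggers(token_get_all($code))`. This deliberately
ignores cases such as `function &name`, so take it as an illustration of
the idea rather than a complete highlighter rule.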


> Syntax highlighting only being an example, this applies to any external
> tooling that's not written in PHP and does not have the benefit of using
> token_get_all().
>

Whichever approach is taken, external tools will need to handle that the
same way. The point is, do we want "simpler" rules and pretend a small part
of PHP syntax doesn't even exist just because of these external tools?

External tools can handle that. Static analyzers inherently have far more
complex issues to deal with than a whitelist of semi-reserved names.


> I don't know how important this is to us, but I'm somewhat wary of going
> more into the C++ direction (where you essentially need a full
> type-analyzer to do a parse).
>
>
Yes, indeed - and this was only possible because we have an AST now
(thanks) - but I don't have a problem with it *as long as the parsing is
opt-in*, and that's the case here.


> This is why I still prefer the dead-simple approach of making the next
> label after :: and "function" unreserved (what we do for the label after ->
> already), combined with forbidding reserved names for free functions in the
> compiler (similar to the blacklist we have for classes). This doesn't cover
> everything (like trait adaptations), but I think it covers the 97% case
> (and actually allows us to really allow all names, without exceptions).
>

The problem with this "dead simple" approach is that it leaves some syntax
behind, such as class constant lists and, as you just said, trait
adaptations. So unless it becomes a working solution that covers all
syntax, the simpler idea you suggested is a no-go for me, as it's not
better than the solution already being proposed.
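
For reference, the declarations I mean look roughly like this. It is only
an illustration, assuming an engine that accepts semi-reserved words in
these positions (which is what the RFC proposes); the `Greets` and
`Greeter` names are made up for the example:

```php
<?php
trait Greets
{
    public function hello()
    {
        return 'hello';
    }
}

class Greeter
{
    // Class constant list: the reserved words are declared here,
    // not reached through `::`.
    const FOREACH = 1, LIST = 2;

    // Trait adaptation: the alias after `as public` is a reserved word.
    use Greets {
        hello as public list;
    }
}
```

Neither name sits directly after `::` or `function` at the point of
declaration, which is why a rule limited to those two positions would not
reach them.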


> Nikita
>

I hope we can find enough common ground to agree here ^^

Márcio

--089e0158bb4a42231005163ba7f1--