Newsgroups: php.internals
Date: Sat, 16 May 2015 23:48:59 +0200
Subject: Re: [PHP-DEV] Context Sensitive Language RFC - Implementation Candidate
From: nikita.ppv@gmail.com (Nikita Popov)
To: Marcio Almada
Cc: PHP internals

On Mon, Apr 20, 2015 at 5:32 PM, Marcio Almada wrote:

> Hi,
>
> The Context Sensitive Lexer RFC passed :) and by the time of the voting
> phase, we decided to vote for the feature only and later discuss quality
> analysis on the implementations aimed to fulfill the RFC.
>
> First, I'd like to thank you all for that decision. I know that's an
> exception on the RFC process, and I am glad we choose this path for the
> following reasons:
>
> 1. Voting different RFCs describing the same feature with slightly
> different implementations would cause us to waste many voting cycles
> (maybe entire release cycles) without a guarantee of quality. The main
> reason to establish an RFC process is to chase for quality - to follow
> all the rules strictly, in this case, would be to contradict our main
> objective here.
>
> 2. Knowing in advance that the feature was already approved is a
> motivating factor to go on and try a good number of possible
> implementations and propose the best ones, instead of recursively voting
> until an implementation pass.
>
> With that said, this is the proposed pull request:
>
> Pull Request: https://github.com/php/php-src/pull/1221
> Diff: https://github.com/php/php-src/pull/1221/files
> RFC: https://wiki.php.net/rfc/context_sensitive_lexer
>
> There is sufficient description of the pull request itself. The ones that
> participated in the previous discussions probably won't have trouble to
> understand it, but feel free to share any doubts or suggestions here, if
> necessary.
>
> Thanks,
> Márcio

Sorry for late response, forgot about this RFC. I've only glanced over it, but the patch looks okay from the technical side.

The thing that's bothering me is that this patch is basically saying: "It is no longer possible to correctly tokenize PHP without also parsing it." For example, if you're writing a syntax highlighter for PHP and you want that syntax highlighter to be correct, you'll be writing not only a lexer, but also a parser for PHP (which is significantly more complicated). Actually it's worse than that: the approach of running a parser concurrently with the token collection does not work for highlighting code snippets, for example, where the snippet may not form syntactically valid code.

Syntax highlighting is only an example; this applies to any external tooling that's not written in PHP and does not have the benefit of using token_get_all(). I don't know how important this is to us, but I'm somewhat wary of going further in the C++ direction (where you essentially need a full type analyzer to do a parse).

This is why I still prefer the dead-simple approach of making the next label after :: and "function" unreserved (what we already do for the label after ->), combined with forbidding reserved names for free functions in the compiler (similar to the blacklist we have for classes). This doesn't cover everything (like trait adaptations), but I think it covers the 97% case (and actually allows us to really allow all names, without exceptions).

Nikita
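
To make the "dead-simple" alternative concrete, here is a rough sketch in Python of how such a lexer rule and the matching compiler check might look. It is a toy, not PHP's actual lexer or compiler: the names `tokenize`, `check_free_functions`, `RESERVED` and `UNRESERVE_NEXT` are made up for illustration, the reserved-word set is a small subset, and strings, comments and numbers are not handled.

```python
import re

# Illustrative subset only; PHP's real reserved-word list is much longer.
RESERVED = {"function", "class", "list", "foreach", "new", "return", "echo"}

# Tokens after which the next label is always treated as a plain name.
UNRESERVE_NEXT = {"::", "->", "function"}

LABEL_RE = r"[A-Za-z_][A-Za-z0-9_]*"
TOKEN_RE = re.compile(r"\s+|" + LABEL_RE + r"|::|->|.", re.S)


def tokenize(src):
    """Return (kind, text) pairs; kind is KEYWORD, LABEL or OP."""
    tokens = []
    unreserve = False  # the only piece of lexer state this approach needs
    for text in TOKEN_RE.findall(src):
        if text.isspace():
            continue  # whitespace does not reset the flag
        if re.fullmatch(LABEL_RE, text):
            is_keyword = text.lower() in RESERVED and not unreserve
            kind = "KEYWORD" if is_keyword else "LABEL"
        else:
            kind = "OP"
        tokens.append((kind, text))
        # Only the two operators or a real `function` keyword arm the flag.
        unreserve = kind != "LABEL" and text.lower() in UNRESERVE_NEXT
    return tokens


def check_free_functions(tokens):
    """Reject reserved names for functions declared outside a class body."""
    scopes = []  # one entry per open brace: True if it opens a class body
    pending_class = False
    for i, (kind, text) in enumerate(tokens):
        if kind == "KEYWORD" and text.lower() == "class":
            pending_class = True
        elif kind == "OP" and text == "{":
            scopes.append(pending_class)
            pending_class = False
        elif kind == "OP" and text == "}":
            if scopes:
                scopes.pop()
        elif kind == "KEYWORD" and text.lower() == "function":
            in_class_body = bool(scopes) and scopes[-1]
            nxt = tokens[i + 1] if i + 1 < len(tokens) else None
            if (nxt and nxt[0] == "LABEL"
                    and nxt[1].lower() in RESERVED and not in_class_body):
                raise SyntaxError(
                    f"cannot use reserved name '{nxt[1]}' for a free function")
```

In this sketch the classification depends only on the previous significant token, so an incomplete snippet such as `Foo::list(` still tokenizes (with `list` as a plain label) without any parser running alongside. A declaration like `function list() {}` outside a class tokenizes fine and is only rejected later by `check_free_functions`, which stands in for the compiler-side blacklist.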