From: nikita.ppv@gmail.com (Nikita Popov)
Date: Fri, 20 Feb 2015 13:26:25 +0100
To: marcio3w@gmail.com
Cc: PHP internals
Newsgroups: php.internals
Subject: Re: [PHP-DEV] [RFC][DISCUSSION] Context Sensitive lexer

On Fri, Feb 20, 2015 at 8:29 AM, Marcio Almada wrote:

> Hi internals,
>
> I'd like to put the "Context Sensitive Lexer" RFC into discussion phase:
>
> RFC: https://wiki.php.net/rfc/context_sensitive_lexer
> TL;DR commit: https://github.com/marcioAlmada/php-src/commit/c01014f9
> PR: https://github.com/php/php-src/pull/1054
>
> PHP currently has ~64 globally reserved words. Not infrequently, these
> reserved words end up clashing with legit alternatives to userland API
> declarations. This RFC proposes minimal changes to have a context sensitive
> lexer with support for semi-reserved words on PHP7 without causing
> maintenance issues.
>
> This could be especially useful to:
>
> - Reduce the surface of BC breaks whenever new keywords are introduced
> - Avoid restricting userland APIs, dispensing with the need for hacks like
>   unnecessary magic method calls or prefixed identifiers.
>
> The patch is 98% finished, the entire test suite is passing. I'm still
> adding more tests to it but the hard part is done. So it's time to discuss!
>
> Sincerely,
> Márcio Almada

I think we all agree that it would be nice to not be so strict about reserved keywords in some places. As such this RFC hinges on questions of implementation.

The RFC uses a purely lexer-based approach, which is nice in principle, because ext/tokenizer benefits from it as well. The disadvantage of doing this in the lexer and in the scope that you're proposing (i.e. including class names) is that it requires reimplementing quite a number of parser rules via lookahead in the lexer.
This means that a) the implementation depends on a complete understanding of the PHP syntax, otherwise we'll miss edge cases or be too strict in others, and b) it may limit us in the future, because we may not be able to introduce syntax that can't be reasonably recognized with simple lexer state management or lookahead.

To give you an example of a), your patch currently handles a single interface name properly:

    nikic@saturn:~/php-src$ sapi/cli/php -r 'class Foo implements Interface {}'
    Fatal error: Interface 'Interface' not found in Command line code on line 1

but fails as soon as you implement multiple interfaces:

    nikic@saturn:~/php-src$ sapi/cli/php -r 'class Foo implements Interface, Array {}'
    Parse error: syntax error, unexpected 'Array' (T_ARRAY), expecting identifier (T_STRING) or namespace (T_NAMESPACE) or \\ (T_NS_SEPARATOR) in Command line code on line 1

So, I'm sure this can be worked around with a couple of new lexer rules, I'm just trying to show the systematic issues of this approach.

An example for b) is harder to come by (as I'm not terribly familiar with what we can easily detect in the lexer and what we can't). One thing that comes to mind is supporting a short lambda syntax like the one available in Hack:

    (ClassName $a, $b, $c, $d) ==> $a

As this has no prefixing "function" or similar, I suspect that it may be rather hard to detect that "ClassName" is actually a class name here and requires special treatment. Un-reserving class names now may make features like this impossible (or unnecessarily hard) to implement in the future.

Due to these issues, I don't like the RFC in the current form - I think it's too ambitious. Class names simply occur in too many and diverse places.

I would suggest going with a more limited approach instead, which targets only method and class constant names. I.e. the label after -> and :: should not be reserved (we already do this for ->) and the label after "function" and "const" shouldn't be either.
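To make the limited approach concrete, here is a minimal sketch of the kind of code it would permit. The class `Collection` and its members are hypothetical names chosen for illustration, not taken from the patch. The snippet assumes PHP 7.0 or later, which accepts reserved words as method and class constant names; on the PHP 5 parser it would be a parse error.

```php
<?php
// Hypothetical class; all names here are for illustration only.
class Collection
{
    // Reserved word as a class constant name (the label after "const").
    const ARRAY = 'array-constant';

    private $items;

    public function __construct(array $items)
    {
        $this->items = $items;
    }

    // Reserved word as a method name (the label after "function").
    public function list()
    {
        return $this->items;
    }

    // Reserved word as a static method name, reached through "::".
    public static function new(array $items)
    {
        return new self($items);
    }
}

$c     = Collection::new([1, 2, 3]); // label after "::"
$items = $c->list();                 // label after "->"
$name  = Collection::ARRAY;          // class constant after "::"
```

In every one of these positions the preceding token ("function", "const", "->" or "::") already tells the lexer or parser that the next label is a member name, which is why no deeper lookahead is needed.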
Of course this would also allow defining global reserved-keyword function/const names as well, so we might want to check their names against the list of reserved keywords. Though even that is just a courtesy to the user, e.g. it's already possible to define and access reserved-keyword constants using define() and constant().

Nikita