Newsgroups: php.internals
In-Reply-To: <50486C27.20200@sugarcrm.com>
References: <50474A30.1090802@nznet.gen.nz> <5047D4A6.9030003@sugarcrm.com>
 <50486327.9050606@nznet.gen.nz> <50486C27.20200@sugarcrm.com>
Date: Thu, 6 Sep 2012 08:01:12 -0400
To: Stas Malyshev
Cc: "Morgan L. Owens", PHP internals
Subject: Re: [PHP-DEV] Re: Moving to an AST-based parsing/compilation process
From: ircmaxell@gmail.com (Anthony Ferrara)

Stas,

On Thu, Sep 6, 2012 at 5:25 AM, Stas Malyshev wrote:

> Hi!
>
> > Well, apart from perhaps leaving them with a simpler language that
> > doesn't have the inconsistencies and corner cases that currently exist
> > (and documented ad nauseum) not because of any design decision but
> > "because the parser is written that way".
>
> If you think writing new parser gets rid of all corner cases you are in
> for a big surprise. AST is not magic and parser will always be written
> exactly the way it is written - so if somebody won't implement certain
> feature in a consistent way, it won't be implemented in consistent way,
> AST or not.
>

Actually, that's not true. Right now, the parser handles both the syntax and a good bit of the grammar. That's why we have so many reserved words. The compiler step implements some of the grammar, but the parser takes care of a significant amount of it. With a move to AST-based parsing, the parser can be greatly simplified, with a very significant reduction in reserved words. This has a few benefits:

1. A reduced number of first-class tokens potentially makes parsing the syntax much more efficient. This comes at the expense of a more complicated compile step (building and processing the AST).

2. It also removes the need for the parser to worry about precedence. The parser checks syntax only and leaves operator precedence to the AST compiler step.

3. It makes it possible to extend the grammar without modifying the syntax.
   That means that PECL extensions could theoretically add compiler steps that extend not only functionality, but the grammar as well. For example, it may be possible to add language rules (such as an inline keyword for functions, or pre-processor macros) that extend the language without modifying the parser (I say "may" because it depends strongly on the design of the parser and the AST).

4. Since the parser doesn't emit opcodes directly, syntax errors (parse errors) could be made 100% recoverable. Compiler errors would still be just as difficult to recover from, though.

5. It opens the door to leveraging third-party systems. For example, the Zend VM could hypothetically be replaced by an LLVM-based VM, which would allow JIT-compiled PHP code. Note that this isn't HipHop (which supports only a limited subset of PHP), but full PHP running on a JIT VM. This could be implemented as a PECL extension that reuses the core parser and runtime environment and swaps out only the executor step. Obviously this would not be trivial to build, but right now you'd need to fork PHP to do it (which is why the existing compilers for PHP all use their own parsers).

> And it's a bit late to take design decisions on existing PHP language,
> it seems to me.
>

It will never be easier to do than today. As time goes on, the language will continue to grow, and the syntax and grammar will only get more complicated from here on. So the easiest time to do it is now.

Anthony
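Points 2 and 3 above (a syntax-only parser, with operator precedence resolved in a later pass from data the parser never sees) can be illustrated on a toy expression language. The following is a minimal Python sketch under stated assumptions, not PHP's parser or any proposed Zend API: the names tokenize, parse_flat, fold, evaluate, parse and the OPERATORS table are all hypothetical. The syntax pass only checks that numbers and operators alternate; fold then builds the tree by precedence climbing using the table, so a new operator is a table entry rather than a parser change.

```python
import operator
import re
from dataclasses import dataclass
from typing import Union

# Hypothetical operator table: symbol -> (precedence, associativity, implementation).
# Precedence lives here, not in the syntax pass.
OPERATORS = {
    "+": (10, "left", operator.add),
    "-": (10, "left", operator.sub),
    "*": (20, "left", operator.mul),
    "**": (30, "right", operator.pow),
}

# A number, or any run of operator characters; the lexer knows no specific operators.
TOKEN_RE = re.compile(r"\s*(\d+|[-+*/%.<>=!&|^]+)")


@dataclass
class Num:
    value: int


@dataclass
class BinOp:
    op: str
    left: "Node"
    right: "Node"


Node = Union[Num, BinOp]


def tokenize(src):
    tokens, pos, src = [], 0, src.rstrip()
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parse_flat(tokens):
    """Syntax-only pass: checks that operands and operators alternate.
    It builds no tree and knows nothing about precedence."""
    operands, operators = [], []
    for idx, tok in enumerate(tokens):
        if idx % 2 == 0:
            if not tok.isdigit():
                raise SyntaxError(f"expected a number, got {tok!r}")
            operands.append(Num(int(tok)))
        else:
            if tok.isdigit():
                raise SyntaxError(f"expected an operator, got {tok!r}")
            operators.append(tok)
    if not tokens or len(tokens) % 2 == 0:
        raise SyntaxError("incomplete expression")
    return operands, operators


def fold(operands, operators, table=OPERATORS):
    """Separate pass: builds the AST by precedence climbing over the flat lists."""
    for op in operators:
        if op not in table:
            raise SyntaxError(f"unknown operator {op!r}")
    i = 0  # operators consumed; operators[i] sits between operands[i] and operands[i + 1]

    def climb(lhs, min_prec):
        nonlocal i
        while i < len(operators) and table[operators[i]][0] >= min_prec:
            op = operators[i]
            prec = table[op][0]
            i += 1
            rhs = operands[i]
            # Let tighter-binding operators (or an equal right-associative one) claim rhs first.
            while i < len(operators):
                next_prec, next_assoc, _ = table[operators[i]]
                if next_prec > prec:
                    rhs = climb(rhs, prec + 1)
                elif next_prec == prec and next_assoc == "right":
                    rhs = climb(rhs, prec)
                else:
                    break
            lhs = BinOp(op, lhs, rhs)
        return lhs

    return climb(operands[0], 0)


def evaluate(node, table=OPERATORS):
    if isinstance(node, Num):
        return node.value
    return table[node.op][2](evaluate(node.left, table), evaluate(node.right, table))


def parse(src, table=OPERATORS):
    return fold(*parse_flat(tokenize(src)), table)
```

For example, evaluate(parse("2 ** 3 ** 2")) folds right-associatively to 512, while "8 - 3 - 2" folds left-associatively to 3. Passing {**OPERATORS, "%": (20, "left", operator.mod)} as the table lets "7 + 10 % 4" parse and evaluate to 9 without touching tokenize or parse_flat. The trade-off from point 1 shows up here as well: the syntax pass gets simpler, and fold takes on the extra work.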