Newsgroups: php.internals
From: gwynne@darkrainfall.org (Gwynne Raskind)
Subject: Re: [PHP-DEV] Re: EBNF
Date: Sat, 1 Jan 2011 04:00:37 -0500
To: weigelt@metux.de
Cc: internals@lists.php.net
Message-ID: <542423FA-1522-4AEC-8CC3-4AFF2DC4B453@darkrainfall.org>
In-Reply-To: <20101231115408.GD18520@nibiru.local>

On Dec 31, 2010, at 6:54 AM, Enrico Weigelt wrote:

>> After enviously looking at pythons grammar
>> (http://docs.python.org/dev/reference/grammar.html) I keep feeling
>> that PHP is missing out on a lot of interesting meta projects by not
>> having an official EBNF.
> ACK. PHP also misses a lot of other fundamental specifications
> (at least I'm not aware of them). That's probably one of reasons
> for the many problems experienced from user and enterprise operator
> side: sudden semantic changes.
>> Building your own PHP parser is _very_ hard and is PhD (Paul Biggar:)
>> level stuff if you wan't to get all the edge cases right. Having _the_
>> official EBNF would make this easier.
> Hmm, perhaps it really would make a good PhD project to actually
> create a clear specification, a full language report (at least for
> the language itself and the core library) and write an tiny reference
> implementation.
> Once that specification is finished, it should become
> the official one where official PHP is tested against.

If anyone's curious why this hasn't been done...

There has never been a language grammar, so there's been nothing to refer to at all. As for why no one's made one more recently, for fun I snagged the .l and .y files from trunk and W3C's version of EBNF from XML. In two hours of hacking away, I managed to come up with this sort-of beginning to a grammar, which I'm certain contains several errors, and only hints at a syntax:

/* http://www.w3.org/TR/REC-xml/#sec-notation */
ws ::= [ \n\r\t]+
string ::= [a-zA-Z_#x7f-#xff] [a-zA-Z0-9_#x7f-#xff]*
namespace-name ::= '\\'? string ( '\\' string )*
use-declaration ::= 'use' ws+ namespace-name ( ws+ 'as' ws+ string )?
    ( ws* ',' ws* namespace-name ( ws+ 'as' ws+ string )? )+ ws* ';'
constant-declaration ::= 'const' ws+ string ws* '=' ws* static-scalar
    ( ws* ',' ws* string ws* '=' ws* static-scalar )* ws* ';'
inner-statement ::= statement | function-declaration-statement |
    class-declaration-statement
statement ::= unticked-statement | string ':'
unticked-statement ::=
    '{' ws* inner-statement* ws* '}' |
    'if' ws* '(' ws* expr ws* ')' ws* statement ws* elseif* ws* else-single? |
    'if' ws* '(' ws* expr ws* ')' ws* ':' inner-statement* elseif-2* ws* else-single-2?
halt-compiler ::= '__halt_compiler' ws* '(' ws* ')' ws* ';'
top-statement ::=
    inner-statement |
    halt-compiler |
    'namespace' ws+ namespace-name ws* ';' |
    'namespace' ( ws+ namespace-name )? ws* '{' ws* top-statement-list ws* '}' |
    use-declaration |
    constant-declaration
script ::= top-statement*

Considering what it takes JUST to define namespaces, halt_compiler, basic blocks, and the idea of a conditional statement... well, suffice to say the "expr" production alone would be triple the size of this. It doesn't help that there's no way I'm immediately aware of to check whether a grammar like this is accurate.
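
A rough spot check is at least possible for the non-recursive rules: transcribe them into regular expressions and run sample inputs through them. The following is a minimal sketch, assuming Python's standard re module; the names WS, OPT_WS, STRING, NAMESPACE_NAME, ALIAS, use_declaration, and accepts are hypothetical helpers introduced here, not part of any PHP tooling. It only covers namespace-name and use-declaration. Recursive rules such as unticked-statement would need a real parser generator.

```python
import re

# Terminals transcribed from the draft grammar. "ws+" is collapsed to WS,
# since ws already means "one or more whitespace characters".
WS = r"[ \n\r\t]+"
OPT_WS = r"[ \n\r\t]*"
STRING = r"[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*"

# namespace-name ::= '\\'? string ( '\\' string )*
NAMESPACE_NAME = r"\\?" + STRING + r"(?:\\" + STRING + r")*"

# ( ws+ 'as' ws+ string )?
ALIAS = "(?:" + WS + "as" + WS + STRING + ")?"


def use_declaration(repeat):
    """Build the use-declaration rule; `repeat` is the quantifier applied
    to the comma-separated tail ('+' as drafted, '*' as an experiment)."""
    tail = "(?:" + OPT_WS + "," + OPT_WS + NAMESPACE_NAME + ALIAS + ")" + repeat
    return re.compile("use" + WS + NAMESPACE_NAME + ALIAS + tail + OPT_WS + ";")


AS_WRITTEN = use_declaration("+")  # the rule exactly as drafted
RELAXED = use_declaration("*")     # assumption: the tail should be optional


def accepts(pattern, text):
    """Return True if the whole text matches the compiled rule."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["use Foo\\Bar;", "use Foo\\Bar, Baz as B;"]:
        print(repr(sample), accepts(AS_WRITTEN, sample), accepts(RELAXED, sample))
```

Run against these two samples, the rule as drafted rejects a single-name "use Foo\Bar;" because the comma-separated tail is marked "+", while the "*" variant accepts it. Whether "*" is what the real parser does is an assumption to verify against the .y file, but it shows the kind of error this sort of check can surface.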

Obviously there's room for optimization. An EBNF doesn't have to jump through some of the hoops that a bison parser backed by a re2c lexer does; it could be simplified once all the parser rules were considered. Or it could be written without referring to the parser at all. Whether that would result in a better or worse grammar, I don't know.

Nonetheless, it's a significant undertaking to deal with the complexity of the language. There are dozens of tiny little edge cases in PHP's parsing that require bunches of extra parser rules. An example from above is the difference between using "statement" and "inner-statement" for the two different forms of "if". Because "statement" includes basic blocks and labels, the rule disallows writing "if: { xyz; } endif;", since apparently Zend doesn't support arbitrary basic blocks. All those cases wreak havoc on the grammar. In its present form, it will never reduce down to something nearly as small as Python's.

-- Gwynne