Newsgroups: php.internals
Date: Sat, 1 Jan 2011 14:20:35 -0200
From: guilhermeblanco@gmail.com
To: Rune Kaagaard
Cc: internals
Subject: Re: [PHP-DEV] Re: EBNF

Hi all,

The PHP grammar is far from complex. Most of the syntax can be described with a simple explanation. For example:

* A program can be split into several statements.
* A couple of items (namespace, use) cannot be declared just anywhere, so we treat them as top statements.
* A namespace declaration may contain multiple statements, either following "namespace Foo;" or enclosed in braces.
* A UseStatement can only appear inside a namespace or in the global scope.
* Finally, we support classes.

Now we can describe a good portion of the PHP grammar:

/* Terminals */
identifier
char
string
integer
float
boolean

/* Grammar Rules */
Literal ::= string | char | integer | float | boolean
Qualifier ::= ("private" | "public" | "protected") ["static"]

/* Identifiers */
NamespaceIdentifier ::= identifier {"\" identifier}
ClassIdentifier ::= identifier
MethodIdentifier ::= identifier
FullyQualifiedClassIdentifier ::= [NamespaceIdentifier "\"] ClassIdentifier

/* Root grammar */
Program ::= {TopStatement} {Statement}
TopStatement ::= NamespaceDeclaration | UseStatement | CommentStatement
Statement ::= ClassDeclaration | FunctionDeclaration | ...
/* Namespace Declaration */
NamespaceDeclaration ::= InlineNamespaceDeclaration | ScopeNamespaceDeclaration
InlineNamespaceDeclaration ::= SimpleNamespaceDeclaration ";" {UseStatement} {Statement}
ScopeNamespaceDeclaration ::= SimpleNamespaceDeclaration "{" {UseStatement} {Statement} "}"
SimpleNamespaceDeclaration ::= "namespace" NamespaceIdentifier

/* Use Statement */
UseStatement ::= "use" SimpleUseStatement {"," SimpleUseStatement} ";"
SimpleUseStatement ::= SimpleNamespaceUseStatement | SimpleClassUseStatement
SimpleNamespaceUseStatement ::= NamespaceIdentifier ["as" NamespaceIdentifier]
SimpleClassUseStatement ::= FullyQualifiedClassIdentifier ["as" ClassIdentifier]

/* Comment Declaration */
CommentStatement ::= InlineCommentStatement | MultilineCommentStatement
InlineCommentStatement ::= ("//" | "#") string
MultilineCommentStatement ::= SimpleMultilineCommentStatement | DocBlockStatement
SimpleMultilineCommentStatement ::= "/*" {"*" string} "*/"
DocBlockStatement ::= "/**" {"*" string} "*/"

/* Class Declaration */
ClassDeclaration ::= SimpleClassDeclaration "{" {ClassMemberDeclaration} "}"
SimpleClassDeclaration ::= ["abstract"] "class" ClassIdentifier ["extends" FullyQualifiedClassIdentifier] ["implements" FullyQualifiedClassIdentifier {"," FullyQualifiedClassIdentifier}]
ClassMemberDeclaration ::= ConstDeclaration | PropertyDeclaration | MethodDeclaration
ConstDeclaration ::= [DocBlockStatement] "const" identifier "=" Literal ";"
PropertyDeclaration ::= [DocBlockStatement] Qualifier Variable ["=" Literal] ";"
MethodDeclaration ::= [DocBlockStatement] (PrototypeMethodDeclaration | ComplexMethodDeclaration)
PrototypeMethodDeclaration ::= "abstract" Qualifier "function" MethodIdentifier "(" [ArgumentDeclaration] ")" ";"
ComplexMethodDeclaration ::= ["final"] Qualifier "function" MethodIdentifier "(" [ArgumentDeclaration] ")" "{" {Statement} "}"
ArgumentDeclaration ::= SimpleArgumentDeclaration {"," SimpleArgumentDeclaration}
SimpleArgumentDeclaration ::= [TypeHint] Variable ["=" Literal]
TypeHint ::= ArrayTypeHint | FullyQualifiedClassIdentifier
ArrayTypeHint ::= "array"

Now it is easy to continue the work and add the missing rules. =)

Cheers,

On Sat, Jan 1, 2011 at 12:46 PM, Rune Kaagaard wrote:
>> There has never been a language grammar, so there's been nothing to refer to at all. As for why no one's made one more recently, for fun I snagged the .l and .y files from trunk and W3C's version of EBNF from XML. In two hours of hacking away, I managed to come up with this sort-of beginning to a grammar, which I'm certain contains several errors, and only hints at a syntax:
>
> I wanted to take your EBNF for a spin so I converted it to a format
> that the python module "simpleparse" could read. I ironed out a couple
> of kinks and fixed a bug. You can see it here:
>
> http://code.google.com/p/php-snow/source/browse/branches/php-ebnf/gwynne-raskind-example/php.ebnf
>
> Then I created a prettyprinter to output the parsetree of some very
> simple PHP code. See it here:
>
> http://code.google.com/p/php-snow/source/browse/branches/php-ebnf/gwynne-raskind-example/parse_example.py
>
> and the output is here:
>
> http://code.google.com/p/php-snow/source/browse/branches/php-ebnf/gwynne-raskind-example/parse_example.output
>
>> Considering what it takes JUST to define namespaces, halt_compiler, basic blocks, and the idea of a conditional statement... well, suffice to say the "expr" production alone would be triple the size of this. It doesn't help that there's no way I'm immediately aware of to check whether a grammar like this is accurate.
>
> Thanks a lot for the example, that does not look so bad :) PHP syntax
> is not simple so of course the EBNF will not be either. But still any
> EBNF would be a lot better than none!
>
> Testability is a real issue and makes for a nice catch-22.
> A hypothetical roadmap could _maybe_ look like this:
>
> 1) Create the EBNF and reference implementation while comparing it to
> a stable release.
> 2) Rewrite the Zend implementation to read from the EBNF.
> 3) Repeat for all current releases.
>
> It's tough to try to guess about things you don't really understand.
> Looks like major work though!
>
>> Nonetheless, it's a significant undertaking to deal with the complexity of the language. There are dozens of tiny little edge cases in PHP's parsing that require bunches of extra parser rules. An example from above is the difference between using "statement" and "inner-statement" for the two different forms of "if". Because "statement" includes basic blocks and labels, the rule disallows writing "if: { xyz; } endif;", since apparently Zend doesn't support arbitrary basic blocks. All those cases wreak havoc on the grammar. In its present form, it will never reduce down to something nearly as small as Python's.
>
> Just to have a solid, complete maintained EBNF would be a _major_ leap forward!
>
> Thanks for your cool reply!
>
> Cheers
> Rune

-- 
Guilherme Blanco
São Paulo - SP/Brazil
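On the testability question raised above: one cheap way to check a fragment of this grammar is to hand-write a recognizer for a single production and run it against inputs that PHP accepts or rejects. The following is a minimal Python sketch (Python only because the simpleparse experiment in the thread uses it) for the UseStatement rules. It is not derived from the Zend .y file. The names tokenize, UseParser, parse_use and ParseError are hypothetical; identifiers are assumed to be ASCII-only; "use" and "as" are treated as reserved; and since both SimpleUseStatement alternatives describe a backslash-separated name, the sketch merges them into one rule whose alias is a single identifier.

```python
import re

# Sketch assumption: ASCII-only identifiers (PHP also allows bytes 0x7f-0xff).
IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# One token: an identifier, or one of the punctuation marks these rules use.
TOKEN_RE = re.compile(r"\s*(?:([A-Za-z_][A-Za-z0-9_]*)|(\\|,|;))")
# Sketch assumption: these words are reserved and cannot be used as names.
KEYWORDS = {"use", "as"}


class ParseError(Exception):
    """Raised when the input does not match the grammar fragment."""


def tokenize(src):
    """Split src into identifier and punctuation tokens, skipping whitespace."""
    src = src.rstrip()
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise ParseError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1) or m.group(2))
        pos = m.end()
    return tokens


class UseParser:
    r"""Recursive-descent recognizer for the UseStatement rules.

    Sketch assumption: SimpleNamespaceUseStatement and SimpleClassUseStatement
    are merged into one backslash-separated Name, and an alias is one identifier:

        UseStatement       ::= "use" SimpleUseStatement {"," SimpleUseStatement} ";"
        SimpleUseStatement ::= Name ["as" identifier]
        Name               ::= identifier {"\" identifier}
    """

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        """Return the current token, or None at end of input."""
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, value):
        """Consume the current token if it equals value, else fail."""
        if self.peek() != value:
            raise ParseError(f"expected {value!r}, got {self.peek()!r}")
        self.pos += 1

    def identifier(self):
        """Consume a non-keyword identifier and return it."""
        tok = self.peek()
        if tok is None or tok in KEYWORDS or not IDENT_RE.fullmatch(tok):
            raise ParseError(f"expected identifier, got {tok!r}")
        self.pos += 1
        return tok

    def name(self):
        # identifier {"\" identifier}, returned joined with backslashes
        parts = [self.identifier()]
        while self.peek() == "\\":
            self.pos += 1
            parts.append(self.identifier())
        return "\\".join(parts)

    def simple_use(self):
        # Name ["as" identifier] -> (name, alias or None)
        name = self.name()
        alias = None
        if self.peek() == "as":
            self.pos += 1
            alias = self.identifier()
        return name, alias

    def use_statement(self):
        # "use" SimpleUseStatement {"," SimpleUseStatement} ";" then end of input
        self.expect("use")
        clauses = [self.simple_use()]
        while self.peek() == ",":
            self.pos += 1
            clauses.append(self.simple_use())
        self.expect(";")
        if self.peek() is not None:
            raise ParseError(f"trailing input: {self.peek()!r}")
        return clauses


def parse_use(src):
    """Return [(name, alias), ...] for one use statement, or raise ParseError."""
    return UseParser(tokenize(src)).use_statement()


if __name__ == "__main__":
    print(parse_use(r"use Foo\Bar as Baz, Qux;"))
```

Running it prints [('Foo\\Bar', 'Baz'), ('Qux', None)], while input such as "use Foo Bar;" raises ParseError. Growing a list of such accept/reject cases alongside the EBNF is one possible way to attack the catch-22 Rune describes.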