Newsgroups: php.internals
From: guilhermeblanco@gmail.com ("guilhermeblanco@gmail.com")
To: Rune Kaagaard
Cc: internals
Date: Sat, 1 Jan 2011 14:23:59 -0200
Subject: Re: [PHP-DEV] Re: EBNF
References: <20101231115408.GD18520@nibiru.local> <542423FA-1522-4AEC-8CC3-4AFF2DC4B453@darkrainfall.org>

As a final note, I'd like to mention that even though PHP's grammar is
quite simple, it is light-years more complex than that of other
languages (due to the lack of standardization).
You can compare the initial description I wrote with the Java Language
Specification and draw your own conclusions:

http://java.sun.com/docs/books/jls/second_edition/html/syntax.doc.html

Cheers,

On Sat, Jan 1, 2011 at 2:20 PM, guilhermeblanco@gmail.com wrote:
> Hi all,
>
> PHP grammar is far from being complex. It is possible to describe most
> of the syntax with a simple explanation.
> Example:
>
> * We can separate a program into several statements.
> * There are a couple of items that cannot be declared in arbitrary
>   places (namespace, use), so consider them top-statements.
> * Also, a namespace declaration may contain multiple statements if you
>   define them under brackets.
> * UseStatement can only be used inside a namespace or inside global scope.
> * Finally, we support Classes.
>
> Now we can describe a good portion of PHP grammar:
>
> /* Terminals */
> identifier
> char
> string
> integer
> float
> boolean
>
> /* Grammar Rules */
> Literal ::= string | char | integer | float | boolean
>
> Qualifier ::= ("private" | "public" | "protected") ["static"]
>
> /* Identifiers */
> NamespaceIdentifier ::= identifier {"\" identifier}
> ClassIdentifier ::= identifier
> MethodIdentifier ::= identifier
> FullyQualifiedClassIdentifier ::= [NamespaceIdentifier] ClassIdentifier
>
> /* Root grammar */
> Program ::= {TopStatement} {Statement}
>
> TopStatement ::= NamespaceDeclaration | UseStatement | CommentStatement
> Statement ::= ClassDeclaration | FunctionDeclaration | ...
>
> /* Namespace Declaration */
> NamespaceDeclaration ::= InlineNamespaceDeclaration | ScopeNamespaceDeclaration
> InlineNamespaceDeclaration ::= SimpleNamespaceDeclaration ";"
>     {UseStatement} {Statement}
> ScopeNamespaceDeclaration ::= SimpleNamespaceDeclaration "{"
>     {UseStatement} {Statement} "}"
> SimpleNamespaceDeclaration ::= "namespace" NamespaceIdentifier
>
> /* Use Statement */
> UseStatement ::= "use" SimpleUseStatement {"," SimpleUseStatement} ";"
> SimpleUseStatement ::= SimpleNamespaceUseStatement | SimpleClassUseStatement
> SimpleNamespaceUseStatement ::= NamespaceIdentifier ["as" NamespaceIdentifier]
> SimpleClassUseStatement ::= FullyQualifiedClassIdentifier ["as" ClassIdentifier]
>
> /* Comment Declaration */
> CommentStatement ::= InlineCommentStatement | MultilineCommentStatement
> InlineCommentStatement ::= ("//" | "#") string
> MultilineCommentStatement ::= SimpleMultilineCommentStatement |
>     DocBlockStatement
> SimpleMultilineCommentStatement ::= "/*" {"*" string} "*/"
> DocBlockStatement ::= "/**" {"*" string} "*/"
>
> /* Class Declaration */
> ClassDeclaration ::= SimpleClassDeclaration "{" {ClassMemberDeclaration} "}"
> SimpleClassDeclaration ::= ["abstract"] "class" ClassIdentifier
>     ["extends" FullyQualifiedClassIdentifier] ["implements"
>     FullyQualifiedClassIdentifier {"," FullyQualifiedClassIdentifier}]
>
> ClassMemberDeclaration ::= ConstDeclaration | PropertyDeclaration |
>     MethodDeclaration
> ConstDeclaration ::= [DocBlockStatement] "const" identifier "=" Literal ";"
> PropertyDeclaration ::= [DocBlockStatement] Qualifier Variable ["=" Literal] ";"
> MethodDeclaration ::= [DocBlockStatement] (PrototypeMethodDeclaration
>     | ComplexMethodDeclaration)
>
> PrototypeMethodDeclaration ::= "abstract" Qualifier "function"
>     MethodIdentifier "(" {ArgumentDeclaration} ")" ";"
> ComplexMethodDeclaration ::= ["final"] Qualifier "function"
>     MethodIdentifier "(" {ArgumentDeclaration} ")" "{" {Statement} "}"
> ArgumentDeclaration ::= SimpleArgumentDeclaration {","
>     SimpleArgumentDeclaration}
> SimpleArgumentDeclaration ::= [TypeHint] Variable ["=" Literal]
> TypeHint ::= ArrayTypeHint | FullyQualifiedClassIdentifier
> ArrayTypeHint ::= "array"
>
>
> Now it is easy to continue the work and add missing rules. =)
>
> Cheers,
>
> On Sat, Jan 1, 2011 at 12:46 PM, Rune Kaagaard wrote:
>>> There has never been a language grammar, so there's been nothing to
>>> refer to at all. As for why no one's made one more recently, for fun
>>> I snagged the .l and .y files from trunk and W3C's version of EBNF
>>> from XML. In two hours of hacking away, I managed to come up with
>>> this sort-of beginning to a grammar, which I'm certain contains
>>> several errors, and only hints at a syntax:
>>
>> I wanted to take your EBNF for a spin so I converted it to a format
>> that the python module "simpleparse" could read. I ironed out a couple
>> of kinks and fixed a bug. You can see it here:
>>
>> http://code.google.com/p/php-snow/source/browse/branches/php-ebnf/gwynne-raskind-example/php.ebnf
>>
>> Then I created a prettyprinter to output the parsetree of some very
>> simple PHP code.
>> See it here:
>>
>> http://code.google.com/p/php-snow/source/browse/branches/php-ebnf/gwynne-raskind-example/parse_example.py
>>
>> and the output is here:
>>
>> http://code.google.com/p/php-snow/source/browse/branches/php-ebnf/gwynne-raskind-example/parse_example.output
>>
>>> Considering what it takes JUST to define namespaces, halt_compiler,
>>> basic blocks, and the idea of a conditional statement... well,
>>> suffice to say the "expr" production alone would be triple the size
>>> of this. It doesn't help that there's no way I'm immediately aware
>>> of to check whether a grammar like this is accurate.
>>
>> Thanks a lot for the example, that does not look so bad :) PHP syntax
>> is not simple so of course the EBNF will not be either. But still any
>> EBNF would be a lot better than none!
>>
>> Testability is a real issue and makes for a nice catch-22. A
>> hypothetical roadmap could _maybe_ look like this:
>>
>> 1) Create the EBNF and reference implementation while comparing it to
>> a stable release.
>> 2) Rewrite the Zend implementation to read from the EBNF.
>> 3) Repeat for all current releases.
>>
>> It's tough to try to guess about things you don't really understand.
>> Looks like major work though!
>>
>>> Nonetheless, it's a significant undertaking to deal with the
>>> complexity of the language. There are dozens of tiny little edge
>>> cases in PHP's parsing that require bunches of extra parser rules.
>>> An example from above is the difference between using "statement"
>>> and "inner-statement" for the two different forms of "if". Because
>>> "statement" includes basic blocks and labels, the rule disallows
>>> writing "if: { xyz; } endif;", since apparently Zend doesn't support
>>> arbitrary basic blocks. All those cases wreak havoc on the grammar.
>>> In its present form, it will never reduce down to something nearly
>>> as small as Python's.
>>
>> Just to have a solid, complete maintained EBNF would be a _major_ leap forward!
>>
>> Thanks for your cool reply!
>>
>> Cheers
>> Rune
>>
>> --
>> PHP Internals - PHP Runtime Development Mailing List
>> To unsubscribe, visit: http://www.php.net/unsub.php
>>
>
> --
> Guilherme Blanco
> Mobile: +55 (16) 9215-8480
> MSN: guilhermeblanco@hotmail.com
> São Paulo - SP/Brazil

-- 
Guilherme Blanco
Mobile: +55 (16) 9215-8480
MSN: guilhermeblanco@hotmail.com
São Paulo - SP/Brazil
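
P.S. To give a feel for how directly rules like these map onto code,
here is a minimal sketch of a recursive-descent parser for the
UseStatement and NamespaceIdentifier rules quoted above. It is only an
illustration, not a reference implementation: the names (tokenize,
UseParser, parse_use) are hypothetical, "use" and "as" are assumed to be
reserved words, and the two SimpleUseStatement alternatives are folded
into one "qualified name with an optional single-identifier alias",
because the draft grammar cannot tell them apart syntactically.

```python
import re

# Tokens: identifiers, the namespace separator "\", "," and ";".
TOKEN_RE = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|\\|,|;)")
IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
KEYWORDS = {"use", "as"}  # assumption: treated as reserved in this sketch


def tokenize(source):
    """Split source into tokens; raise ValueError on unexpected input."""
    source = source.rstrip()
    tokens, pos = [], 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class UseParser:
    """One method per grammar rule; each consumes tokens left to right."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, literal):
        if self.peek() != literal:
            raise ValueError(f"expected {literal!r}, got {self.peek()!r}")
        self.pos += 1

    def identifier(self):
        token = self.peek()
        if token is None or token in KEYWORDS or not IDENT_RE.fullmatch(token):
            raise ValueError(f"expected identifier, got {token!r}")
        self.pos += 1
        return token

    def namespace_identifier(self):
        # NamespaceIdentifier ::= identifier {"\" identifier}
        parts = [self.identifier()]
        while self.peek() == "\\":
            self.pos += 1
            parts.append(self.identifier())
        return "\\".join(parts)

    def simple_use_statement(self):
        # Simplified: NamespaceIdentifier ["as" identifier]
        name = self.namespace_identifier()
        alias = None
        if self.peek() == "as":
            self.pos += 1
            alias = self.identifier()
        return (name, alias)

    def use_statement(self):
        # UseStatement ::= "use" SimpleUseStatement {"," SimpleUseStatement} ";"
        self.expect("use")
        imports = [self.simple_use_statement()]
        while self.peek() == ",":
            self.pos += 1
            imports.append(self.simple_use_statement())
        self.expect(";")
        if self.peek() is not None:
            raise ValueError(f"trailing input: {self.peek()!r}")
        return imports


def parse_use(source):
    """Parse one use statement into a list of (name, alias) pairs."""
    return UseParser(tokenize(source)).use_statement()


if __name__ == "__main__":
    # Prints [('Foo\\Bar', 'Baz'), ('Qux', None)]
    print(parse_use(r"use Foo\Bar as Baz, Qux;"))
```

Each EBNF repetition {...} becomes a while loop and each option [...]
becomes an if, which is why a complete grammar would also make it much
easier to test an implementation against a stable release.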