Dear internals
After enviously looking at Python's grammar
(http://docs.python.org/dev/reference/grammar.html) I keep feeling
that PHP is missing out on a lot of interesting meta projects by not
having an official EBNF.
Building your own PHP parser is very hard, and it is PhD-level stuff
(Paul Biggar :) if you want to get all the edge cases right. Having an
official EBNF would make this easier.
I know there are a lot of historical reasons for this, and that creating
and maintaining said EBNF is a very serious task, one that is maybe too
big for a non-paying open source project.
But still I have to ask: am I the only one thinking about this, or is
there something I'm being completely ignorant about?
Happy New Year!
Rune Kaagaard
- Rune Kaagaard rumi.kg@gmail.com wrote:
> Dear internals
> After enviously looking at Python's grammar
> (http://docs.python.org/dev/reference/grammar.html) I keep feeling
> that PHP is missing out on a lot of interesting meta projects by not
> having an official EBNF.
ACK. PHP also lacks a lot of other fundamental specifications
(at least I'm not aware of them). That's probably one of the reasons
for the many problems experienced on the user and enterprise-operator
side: sudden semantic changes.
> Building your own PHP parser is very hard, and it is PhD-level stuff
> (Paul Biggar :) if you want to get all the edge cases right. Having an
> official EBNF would make this easier.
Hmm, perhaps it really would make a good PhD project to actually
create a clear specification, a full language report (at least for
the language itself and the core library), and write a tiny reference
implementation. Once that specification is finished, it should become
the official one that official PHP is tested against.
cu
Enrico Weigelt, metux IT service -- http://www.metux.de/
phone: +49 36207 519931 email: weigelt@metux.de
mobile: +49 151 27565287 icq: 210169427 skype: nekrad666
Embedded-Linux / Portierung / Opensource-QM / Verteilte Systeme
If anyone's curious why this hasn't been done...
There has never been a language grammar, so there's been nothing to refer to at all. As for why no one's made one more recently, for fun I snagged the .l and .y files from trunk and W3C's version of EBNF from XML. In two hours of hacking away, I managed to come up with this sort-of beginning to a grammar, which I'm certain contains several errors, and only hints at a syntax:
/* http://www.w3.org/TR/REC-xml/#sec-notation */
ws ::= [ \n\r\t]+
string ::= [a-zA-Z_#x7f-#xff] [a-zA-Z0-9_#x7f-#xff]*
namespace-name ::= '\'? string ( '\' string )*
use-declaration ::= 'use' ws+ namespace-name ( ws+ 'as' ws+ string )? ( ws* ',' ws* namespace-name ( ws+ 'as' ws+ string )? )* ws* ';'
constant-declaration ::= 'const' ws+ string ws* '=' ws* static-scalar ( ws* ',' ws* string ws* '=' ws* static-scalar )* ws* ';'
inner-statement ::= statement | function-declaration-statement | class-declaration-statement
statement ::= unticked-statement | string ':'
unticked-statement ::= '{' ws* inner-statement* ws* '}' |
    'if' ws* '(' ws* expr ws* ')' ws* statement ws* elseif* ws* else-single? |
    'if' ws* '(' ws* expr ws* ')' ws* ':' inner-statement* elseif-2* ws* else-single-2?
halt-compiler ::= '__halt_compiler' ws* '(' ws* ')' ws* ';'
top-statement ::= inner-statement |
    halt-compiler |
    'namespace' ws+ namespace-name ws* ';' |
    'namespace' ( ws+ namespace-name )? ws* '{' ws* top-statement* ws* '}' |
    use-declaration |
    constant-declaration
script ::= top-statement*
Considering what it takes JUST to define namespaces, halt_compiler, basic blocks, and the idea of a conditional statement... well, suffice to say the "expr" production alone would be triple the size of this. It doesn't help that there's no way I'm immediately aware of to check whether a grammar like this is accurate.
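One cheap way to sanity-check individual productions, at least the ones that are regular, is to transliterate them into regular expressions and throw sample inputs at them. The following is only a sketch under stated assumptions: the Python names (`WS`, `STRING`, `matches`, and so on) are made up for illustration, the comma-separated tail of use-declaration is treated as zero or more repetitions, and the `#x7f-#xff` range is matched against code points rather than bytes.

```python
import re

# Regex transliterations of a few productions from the draft grammar.
# All names here are illustrative; none of them come from PHP itself.
WS = r"[ \n\r\t]+"        # ws
OPT_WS = r"[ \n\r\t]*"    # ws*
BS = r"\\"                # a literal backslash, in regex syntax
STRING = r"[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*"

# namespace-name ::= '\'? string ( '\' string )*
NAMESPACE_NAME = f"{BS}?{STRING}(?:{BS}{STRING})*"

# One "name [as alias]" item of a use-declaration.
USE_ITEM = f"{NAMESPACE_NAME}(?:{WS}as{WS}{STRING})?"

# use-declaration, with the comma-separated tail assumed to be optional.
USE_DECLARATION = (
    f"use{WS}{USE_ITEM}(?:{OPT_WS},{OPT_WS}{USE_ITEM})*{OPT_WS};"
)


def matches(pattern: str, text: str) -> bool:
    """Return True if the whole of `text` is derivable from `pattern`."""
    return re.fullmatch(pattern, text) is not None


if __name__ == "__main__":
    for sample in ["use Foo;", "use Foo\\Bar as Baz, Qux;", "useFoo;"]:
        print(repr(sample), matches(USE_DECLARATION, sample))
```

This only works for productions without recursion; anything involving "statement" or "expr" needs a real parser.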
Obviously there's room for optimization. An EBNF doesn't have to jump through some of the hoops that a bison parser fed by a re2c lexer does; it could be simplified once all the parser rules were considered. Or it could be written without referring to the parser at all. Whether that would result in a better or worse grammar, I don't know.
Nonetheless, it's a significant undertaking to deal with the complexity of the language. There are dozens of tiny little edge cases in PHP's parsing that require bunches of extra parser rules. An example from above is the difference between using "statement" and "inner-statement" for the two different forms of "if". Because "statement" includes basic blocks and labels, the rule disallows writing "if: { xyz; } endif;", since apparently Zend doesn't support arbitrary basic blocks. All those cases wreak havoc on the grammar. In its present form, it will never reduce down to something nearly as small as Python's.
-- Gwynne
> There has never been a language grammar, so there's been nothing to refer to at all. As for why no one's made one more recently, for fun I snagged the .l and .y files from trunk and W3C's version of EBNF from XML. In two hours of hacking away, I managed to come up with this sort-of beginning to a grammar, which I'm certain contains several errors, and only hints at a syntax:
I wanted to take your EBNF for a spin, so I converted it to a format
that the Python module "simpleparse" could read. I ironed out a couple
of kinks and fixed a bug. You can see it here:
http://code.google.com/p/php-snow/source/browse/branches/php-ebnf/gwynne-raskind-example/php.ebnf
Then I created a pretty-printer to output the parse tree of some very
simple PHP code. See it here:
and the output is here:
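This is not the pretty-printer referred to above, just a minimal sketch of the idea. It assumes the parse tree is a nested list of `(tag, start, stop, children)` tuples, which is roughly the shape of the result trees simpleparse produces; treat that shape, the function name `format_tree`, and the hand-built example tree as assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Assumed node shape: (tag, start, stop, children), where start/stop are
# offsets into the source text and children is a list of nodes or None.
Node = Tuple[str, int, int, Optional[list]]


def format_tree(nodes: List[Node], source: str, depth: int = 0) -> List[str]:
    """Render a parse tree as indented lines, one line per node."""
    lines = []
    for tag, start, stop, children in nodes:
        indent = "  " * depth
        if children:
            # Inner node: print the tag, then recurse one level deeper.
            lines.append(f"{indent}{tag}")
            lines.extend(format_tree(children, source, depth + 1))
        else:
            # Leaf node: print the tag and the text it covers.
            lines.append(f"{indent}{tag}: {source[start:stop]!r}")
    return lines


# A hand-built tree for a tiny input; a real one would come from the parser.
source = "use Foo\\Bar;"
tree = [
    ("use-declaration", 0, 12, [
        ("namespace-name", 4, 11, [
            ("string", 4, 7, None),
            ("string", 8, 11, None),
        ]),
    ]),
]

if __name__ == "__main__":
    print("\n".join(format_tree(tree, source)))
```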
> Considering what it takes JUST to define namespaces, halt_compiler, basic blocks, and the idea of a conditional statement... well, suffice to say the "expr" production alone would be triple the size of this. It doesn't help that there's no way I'm immediately aware of to check whether a grammar like this is accurate.
Thanks a lot for the example, that does not look so bad :) PHP syntax
is not simple so of course the EBNF will not be either. But still any
EBNF would be a lot better than none!
Testability is a real issue and makes for a nice catch-22. A
hypothetical roadmap could maybe look like this:
- Create the EBNF and reference implementation while comparing it to
  a stable release.
- Rewrite the Zend implementation to read from the EBNF.
- Repeat for all current releases.
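The first step of such a roadmap, comparing a grammar against a stable release, could be sketched as a small differential test: feed the same snippets to the grammar-based recognizer and to a real PHP build, and report every disagreement. This is only a sketch. `php -l` is PHP's syntax-check switch, but the helper names (`php_lint_accepts`, `find_disagreements`) and the idea of passing the recognizer in as a callable are assumptions made for illustration.

```python
import os
import subprocess
import tempfile
from typing import Callable, Iterable, List, Tuple


def php_lint_accepts(code: str, php_binary: str = "php") -> bool:
    """Ask a real PHP build whether `code` parses, using `php -l`."""
    with tempfile.NamedTemporaryFile("w", suffix=".php", delete=False) as handle:
        handle.write(code)
        path = handle.name
    try:
        result = subprocess.run([php_binary, "-l", path], capture_output=True)
        return result.returncode == 0
    finally:
        os.unlink(path)


def find_disagreements(
    samples: Iterable[str],
    grammar_accepts: Callable[[str], bool],
    reference_accepts: Callable[[str], bool],
) -> List[Tuple[str, bool, bool]]:
    """Return (sample, grammar_verdict, reference_verdict) for each mismatch."""
    mismatches = []
    for sample in samples:
        from_grammar = grammar_accepts(sample)
        from_reference = reference_accepts(sample)
        if from_grammar != from_reference:
            mismatches.append((sample, from_grammar, from_reference))
    return mismatches
```

In use, `grammar_accepts` would wrap whatever parser is generated from the EBNF and `reference_accepts` would be `php_lint_accepts`, run over a corpus such as the PHP test suite. An empty result does not prove the grammar correct; it only means the corpus found no counterexample.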
It's tough to try to guess about things you don't really understand.
Looks like major work though!
> Nonetheless, it's a significant undertaking to deal with the complexity of the language. There are dozens of tiny little edge cases in PHP's parsing that require bunches of extra parser rules. An example from above is the difference between using "statement" and "inner-statement" for the two different forms of "if". Because "statement" includes basic blocks and labels, the rule disallows writing "if: { xyz; } endif;", since apparently Zend doesn't support arbitrary basic blocks. All those cases wreak havoc on the grammar. In its present form, it will never reduce down to something nearly as small as Python's.
Just having a solid, complete, maintained EBNF would be a major leap forward!
Thanks for your cool reply!
Cheers
Rune
Hi all,
PHP's grammar is far from complex. It is possible to describe most
of the syntax with a simple explanation.
Example:
- We can separate a program into several statements.
- A couple of items (namespace, use) can only be declared in specific
  places, so consider them top-statements.
- Also, a namespace declaration may contain multiple statements if you
  define them inside braces.
- UseStatement can only be used inside a namespace or in the global scope.
- Finally, we support classes.
Now we can describe a good portion of PHP grammar:
/* Terminals */
identifier
char
string
integer
float
boolean
/* Grammar Rules */
Literal ::= string | char | integer | float | boolean
Qualifier ::= ("private" | "public" | "protected") ["static"]
/* Identifiers */
NamespaceIdentifier ::= identifier {"\" identifier}
ClassIdentifier ::= identifier
MethodIdentifier ::= identifier
FullyQualifiedClassIdentifier ::= [NamespaceIdentifier] ClassIdentifier
/* Root grammar */
Program ::= {TopStatement} {Statement}
TopStatement ::= NamespaceDeclaration | UseStatement | CommentStatement
Statement ::= ClassDeclaration | FunctionDeclaration | ...
/* Namespace Declaration */
NamespaceDeclaration ::= InlineNamespaceDeclaration | ScopeNamespaceDeclaration
InlineNamespaceDeclaration ::= SimpleNamespaceDeclaration ";"
    {UseStatement} {Statement}
ScopeNamespaceDeclaration ::= SimpleNamespaceDeclaration "{"
    {UseStatement} {Statement} "}"
SimpleNamespaceDeclaration ::= "namespace" NamespaceIdentifier
/* Use Statement */
UseStatement ::= "use" SimpleUseStatement {"," SimpleUseStatement} ";"
SimpleUseStatement ::= SimpleNamespaceUseStatement | SimpleClassUseStatement
SimpleNamespaceUseStatement ::= NamespaceIdentifier ["as" NamespaceIdentifier]
SimpleClassUseStatement ::= FullyQualifiedClassIdentifier ["as" ClassIdentifier]
/* Comment Declaration */
CommentStatement ::= InlineCommentStatement | MultilineCommentStatement
InlineCommentStatement ::= ("//" | "#") string
MultilineCommentStatement ::= SimpleMultilineCommentStatement |
    DocBlockStatement
SimpleMultilineCommentStatement ::= "/*" {"*" string} "*/"
DocBlockStatement ::= "/**" {"*" string} "*/"
/* Class Declaration */
ClassDeclaration ::= SimpleClassDeclaration "{" {ClassMemberDeclaration} "}"
SimpleClassDeclaration ::= ["abstract"] "class" ClassIdentifier
    ["extends" FullyQualifiedClassIdentifier] ["implements"
    FullyQualifiedClassIdentifier {"," FullyQualifiedClassIdentifier}]
ClassMemberDeclaration ::= ConstDeclaration | PropertyDeclaration |
MethodDeclaration
ConstDeclaration ::= [DocBlockStatement] "const" identifier "=" Literal ";"
PropertyDeclaration ::= [DocBlockStatement] Qualifier Variable ["=" Literal] ";"
MethodDeclaration ::= [DocBlockStatement] (PrototypeMethodDeclaration
| ComplexMethodDeclaration)
PrototypeMethodDeclaration ::= "abstract" Qualifier "function"
MethodIdentifier "(" {ArgumentDeclaration} ");"
ComplexMethodDeclaration ::= ["final"] Qualifier "function"
MethodIdentifier "(" {ArgumentDeclaration} ")" "{" {Statement} "}"
ArgumentDeclaration ::= SimpleArgumentDeclaration {","
    SimpleArgumentDeclaration}
SimpleArgumentDeclaration ::= [TypeHint] Variable ["=" Literal]
TypeHint ::= ArrayTypeHint | FullyQualifiedClassIdentifier
ArrayTypeHint ::= "array"
Now it is easy to continue the work and add missing rules. =)
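To give a feel for how rules like these turn into code, here is a minimal recursive-descent sketch for UseStatement only. It is a sketch under stated assumptions rather than a reference implementation: the class and function names (`UseParser`, `tokenize`, `parse_use`) are invented, namespace and class imports are collapsed into a single form because they are syntactically indistinguishable, and keywords are not reserved.

```python
import re
from typing import List, Optional, Tuple

# A token is either an identifier or one of the punctuation marks \ , ;
TOKEN = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|[\\,;])")


def tokenize(text: str) -> List[str]:
    """Split the input into identifier and punctuation tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at or after offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class UseParser:
    """One method per grammar rule, each consuming tokens left to right."""

    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, literal: str) -> None:
        if self.peek() != literal:
            raise SyntaxError(f"expected {literal!r}, got {self.peek()!r}")
        self.pos += 1

    def identifier(self) -> str:
        token = self.peek()
        if token is None or not (token[0].isalpha() or token[0] == "_"):
            raise SyntaxError(f"expected identifier, got {token!r}")
        self.pos += 1
        return token

    def namespace_identifier(self) -> str:
        # NamespaceIdentifier ::= identifier {"\" identifier}
        parts = [self.identifier()]
        while self.peek() == "\\":
            self.pos += 1
            parts.append(self.identifier())
        return "\\".join(parts)

    def simple_use(self) -> Tuple[str, Optional[str]]:
        # SimpleUseStatement, simplified: name ["as" identifier]
        name = self.namespace_identifier()
        alias = None
        if self.peek() == "as":
            self.pos += 1
            alias = self.identifier()
        return (name, alias)

    def use_statement(self) -> List[Tuple[str, Optional[str]]]:
        # UseStatement ::= "use" SimpleUseStatement {"," SimpleUseStatement} ";"
        self.expect("use")
        items = [self.simple_use()]
        while self.peek() == ",":
            self.pos += 1
            items.append(self.simple_use())
        self.expect(";")
        if self.peek() is not None:
            raise SyntaxError(f"unexpected trailing token {self.peek()!r}")
        return items


def parse_use(text: str) -> List[Tuple[str, Optional[str]]]:
    return UseParser(tokenize(text)).use_statement()
```

For example, `parse_use("use Foo\\Bar as Baz, Qux;")` yields a list of (name, alias) pairs, and a missing semicolon raises `SyntaxError`.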
Cheers,
--
Guilherme Blanco
Mobile: +55 (16) 9215-8480
MSN: guilhermeblanco@hotmail.com
São Paulo - SP/Brazil
As a final note, I'd like to mention that even though PHP's grammar is
quite simple, it is light-years more complex than that of other
languages, due to the lack of standardization.
You can compare this initial description I wrote to the Java
Specification and draw your own conclusions:
http://java.sun.com/docs/books/jls/second_edition/html/syntax.doc.html
Cheers,
On Sat, Jan 1, 2011 at 2:20 PM, guilhermeblanco@gmail.com
guilhermeblanco@gmail.com wrote:
[snip]
--
Guilherme Blanco
Mobile: +55 (16) 9215-8480
MSN: guilhermeblanco@hotmail.com
São Paulo - SP/Brazil
Hi Guilherme
You wrote that Java spec? Cool! Also a very nice example of a PHP
EBNF! I think PHP needs a canonical one of those and that the parser
should be rewritten to represent said EBNF. That's what I'm dreaming of
at least :)
Cheers
Rune
On Sat, Jan 1, 2011 at 5:23 PM, guilhermeblanco@gmail.com
guilhermeblanco@gmail.com wrote:
[snip]
> Nonetheless, it's a significant undertaking to deal with the complexity of the language. There are dozens of tiny little edge cases in PHP's parsing that require bunches of extra parser rules. An example from above is the difference between using "statement" and "inner-statement" for the two different forms of "if". Because "statement" includes basic blocks and labels, the rule disallows writing "if: { xyz; } endif;", since apparently Zend doesn't support arbitrary basic blocks. All those cases wreak havoc on the grammar. In its present form, it will never reduce down to something nearly as small as Python's.
> Just to have a solid, complete maintained EBNF would be a major leap forward!
Having an EBNF would be useful in cases where we want to write
something like CoffeeScript, but for PHP. I looked at PHP's grammar
file, and it's about 1,000 lines long. Since it is used to generate the
parser, isn't it possible to strip out the C macros to create an EBNF
that catches all edge cases?
Jon
> Having an EBNF would be useful in cases where we want to write
> something like CoffeeScript, but for PHP. I looked at PHP's grammar
> file, and it's about 1,000 lines long. Since it is used to generate the
> parser, isn't it possible to strip out the C macros to create an EBNF
> that catches all edge cases?
I'm not sure at all, but I reckon a lot of those edge cases are
handled in the C macros and not in the plain "yacc"-style grammar
definition.
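For what it's worth, the mechanical part of that idea, dropping the brace-delimited C actions from yacc-style rules, is simple to sketch. The function name `strip_actions` and the sample rule are made up for illustration; the sketch handles nested braces, quoted tokens such as '{', and /* ... */ comments, but not // comments or the prologue and epilogue sections of a real .y file.

```python
def strip_actions(rules: str) -> str:
    """Remove brace-delimited C action blocks from yacc-style rule text."""
    out = []
    depth = 0     # current nesting depth inside { ... } action blocks
    quote = None  # the active quote character while inside '...' or "..."
    i = 0
    while i < len(rules):
        ch = rules[i]
        if quote is not None:
            if ch == "\\" and i + 1 < len(rules):
                chunk = rules[i:i + 2]  # keep escape pairs such as \' together
            else:
                chunk = ch
                if ch == quote:
                    quote = None
        elif rules.startswith("/*", i):
            end = rules.find("*/", i + 2)
            chunk = rules[i:] if end == -1 else rules[i:end + 2]
        elif ch in "'\"":
            quote = ch
            chunk = ch
        elif ch == "{":
            depth += 1
            i += 1
            continue
        elif ch == "}" and depth > 0:
            depth -= 1
            i += 1
            continue
        else:
            chunk = ch
        # Only text outside of action blocks survives.
        if depth == 0:
            out.append(chunk)
        i += len(chunk)
    return "".join(out)


# A made-up rule in the style of a .y file, not copied from PHP's grammar.
sample = (
    "stmt: expr ';' { emit($1); }\n"
    "    | '{' stmt_list '}' { if (x) { nest(); } }\n"
    "    ;"
)

if __name__ == "__main__":
    print(strip_actions(sample))
```

What this leaves behind is only the bare productions. Anything that is enforced inside the C code rather than by the rules themselves would be lost, which is exactly the concern raised above.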
Hi!
> But still I have to ask if I'm the only one thinking about this or is
> there something I'm being completely ignorant about?
You're not the only one thinking about it. But so far nobody moved from
thinking about it to actually doing it :)
--
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227