Hi folks. Ilija is still working on the implementation for the pattern matching RFC, which we want to complete before proposing it officially in case we run into implementation challenges.
Such as these, on which we'd like feedback on how to proceed.
Object property patterns
Consider this code snippet:
class C {
    public $prop = 42 is Foo{};
}
The parser could interpret this in multiple ways:
- make a public property named $prop, with default value of "the result of 42 is Foo" (which would be false), and then has an empty (and therefore invalid) property hook block
- make a public property named $prop, with default value of "whatever the result of 42 is Foo {}" (which would be false).
Since the parser doesn't allow for ambiguity, this is not workable. Because PHP uses only an LL(1) parser, there's no way to "determine from context" which is intended, eg, by saying "well the hook block is empty so it must have been part of the pattern before it."
In practice, no one should be writing code like the above, as it's needlessly nonsensical (it's statically false in all cases), but the parser doesn't know that.
The only solution we've come up with is to not have object patterns, but have a property list pattern that can be compounded with a type pattern. Eg:
$p is Point & {x: 5}
Instead of what we have now:
$p is Point{x: 5}
Ilija says this will resolve the parsing issue. It would also make it possible to match $p is {x: 5}, which would not check the type of $p at all, just that it's an object with an $x property with value 5. That is arguably a useful feature in some cases, but does make the common case (matching type and properties) considerably more clunky.
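For comparison, here is roughly what those pattern forms would check, written as PHP that runs today. It is only a sketch: the Point class and the two helper function names are made up for illustration, and the use of strict comparison (===) and property_exists() is an assumption about the semantics, not something the RFC text confirms.

```php
<?php
// Hypothetical class, for illustration only.
final class Point
{
    public function __construct(public int $x = 0, public int $y = 0) {}
}

// Roughly what `$p is Point & {x: 5}` (or `$p is Point{x: 5}`) would check.
function matchesPointWithX5(mixed $p): bool
{
    return $p instanceof Point && $p->x === 5;
}

// Roughly what `$p is {x: 5}` would check: any object with an $x of 5.
// Assumption: a missing property means "no match" rather than an error.
function matchesAnyObjectWithX5(mixed $p): bool
{
    return is_object($p) && property_exists($p, 'x') && $p->x === 5;
}

$other = new stdClass();
$other->x = 5;

var_dump(matchesPointWithX5(new Point(5, 1)));   // bool(true)
var_dump(matchesPointWithX5($other));            // bool(false)
var_dump(matchesAnyObjectWithX5($other));        // bool(true)
var_dump(matchesAnyObjectWithX5(new Point(3)));  // bool(false)
```

The second helper is the part that has no compact equivalent today, which is the "arguably useful" case mentioned above.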
So, questions:
- Would splitting the object pattern like that be acceptable?
- Does someone have a really good alternate suggestion that wouldn't confuse the parser?
Variable binding and pinning
Previously there was much discussion about the syntax we wanted for these features. In particular, variable binding means "pull a sub-value out of the matched value to its own variable, if the pattern matches." Variable pinning means "use some already-existing variable here to dynamically form the pattern." Naturally, these cannot both just be a variable name on their own, as that would be confusing (both for users and the engine).
For example:
$b = '12';
if ($arr is ['a' => assign to $a, 'b' => assert is equal to $b]) {
    print $a;
}
Based on my research[1], the overwhelming majority of languages use a bare variable name to indicate variable binding. Only one language, Ruby, has variable pinning, which it indicates with a ^ prefix. Following Ruby's lead, as the RFC text does right now, would yield:
$b = '12';
if ($arr is ['a' => $a, 'b' => ^$b]) {
    print $a;
}
That approach would be most like other languages with pattern matching.
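Spelled out in PHP that runs today, the match above would behave roughly like the sketch below. The helper name matchAndBind() is hypothetical, and two details are assumptions rather than settled RFC behavior: that a pinned value is compared strictly (===), and that binding only happens once the whole pattern has matched.

```php
<?php
// Hypothetical expansion of `$arr is ['a' => $a, 'b' => ^$b]`.
function matchAndBind(mixed $arr, mixed $pinned, mixed &$bound): bool
{
    // Structural check: an array that has both keys.
    if (!is_array($arr)
        || !array_key_exists('a', $arr)
        || !array_key_exists('b', $arr)) {
        return false;
    }

    // Pinning: compare against an already-existing variable.
    // Assumption: strict comparison.
    if ($arr['b'] !== $pinned) {
        return false;
    }

    // Binding: assign only after everything else matched.
    $bound = $arr['a'];
    return true;
}

$b = '12';
$a = null;
if (matchAndBind(['a' => 'hello', 'b' => '12'], $b, $a)) {
    print $a; // hello
}
```

The sketch shows why the two roles need different spellings: $a is written to, $b is only read.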
However, there is a concern that it wouldn't be self-evident to PHP devs, and the variable binding side should have the extra marker. Ilija has suggested &, as that's what's used for references, which would result in:
$b = '12';
if ($arr is ['a' => &$a, 'b' => $b]) {
    print $a;
}
There are two concerns with this approach.
- The & could get confusing with an AND conjunction, eg, $value is int & &$x (which is how you would bind $value to $x iff it is an integer).
- In practice, binding is almost certainly going to be vastly more common than pinning, so it should likely have the shorter syntax.
There are of course other prefixes that could be used, such as let (introduces a new keyword, possibly confusing as it wouldn't imply scope restrictions like in other languages) or var (no new keyword, but could still be confusing and it's not obvious which side should get it), but ^ is probably the only single-character option.
So, questions:
- Are you OK with the current Ruby-inspired syntax? ($a means bind, ^$b means pin.)
- If not, have you a counter-proposal that would garner consensus?
Thanks all.
[1] https://github.com/Crell/php-rfcs/blob/master/pattern-matching/research.md
--
Larry Garfield
larry@garfieldtech.com
Hey Larry,
Instead of symbols, why not use words?
We already have &&, but it looks like this uses & instead, which is a bitwise-and. But the language does have “and” as a keyword. So instead of:
$value is int & &$x
It would be:
$value is int and &$x
Which removes the confusion you mentioned before (also for someone like me who uses bitwise-and quite a bit).
— Rob
Is it possible to use the already added "is", like:
$arr is ['a' => $a, 'b' => is $b]
Or, to use the already known uniform variable syntax "{$var}", like:
$arr is ['a' => $a, 'b' => {$b}]
Best,
Marc
> Hey Larry,
> Instead of symbols, why not use words?
> We already have &&, but it looks like this uses & instead, which is a bitwise-and. But the language does have “and” as a keyword. So instead of:
> $value is int & &$x
> It would be:
> $value is int and &$x
> Which removes the confusion you mentioned before (also for someone like me who uses bitwise-and quite a bit).
> -- Rob
Patterns are deliberately designed as a superset of existing DNF types. You can already have a type of Foo&Bar, so we want the pattern for "instanceof Foo || instanceof Bar" to be the same. That means & and | for conjunctions is necessary. Also supporting "and" and "or" would technically be possible, but wouldn't resolve the issue (since & would still be needed either way) and would just add more complication, confusion, and inconsistency. I don't think that's viable.
> Is it possible to use the already added "is", like:
> $arr is ['a' => $a, 'b' => is $b]
Would that inner "is" be capturing $b, or injecting $b? It just pushes the same question down a level, and with an LL(1) parser we cannot tell what context we're in. (If we could, there would be no problem.)
> Or, to use the already known uniform variable syntax "{$var}", like:
> $arr is ['a' => $a, 'b' => {$b}]
That looks like it could be confused with a property pattern, assuming we end up doing that for part 1. In which case, is that "index b maps to an object with one property, and bind that property" or "index b maps to an object with one property, and that property should be the value of $b"? Or "index b maps to the value of $b"? That doesn't seem any less confusing, for the engine or human.
--Larry Garfield
> Patterns are deliberately designed as a superset of existing DNF types.
> You can already have a type of Foo&Bar, so we want the pattern for
> "instanceof Foo || instanceof Bar" to be the same. That means & and |
> for conjunctions is necessary. Also supporting "and" and "or" would
> technically be possible, but wouldn't resolve the issue (since & would
> still be needed either way) and would just add more complication,
> confusion, and inconsistency. I don't think that's viable.
And that of course should be "$x instanceof Foo && $x instanceof Bar" in the equivalent example. My bad. (Though the point applies for |, ||, or, just the same.)
--Larry Garfield
> Patterns are deliberately designed as a superset of existing DNF types. You can already have a type of Foo&Bar, so we want the pattern for "instanceof Foo || instanceof Bar" to be the same. That means & and | for conjunctions is necessary. Also supporting "and" and "or" would technically be possible, but wouldn't resolve the issue (since & would still be needed either way) and would just add more complication, confusion, and inconsistency. I don't think that's viable.
Interesting way of looking at it, not wrong either. I guess I'm not used to this syntax that much. It's like reading the following statement and working out what the value is without running it:
add($a = add($a = 2, $a), $a = add($a, $a)) + $a;
To me, mixing up both setting and using values in the same statement reads like above. Once you work out how it works, it makes sense that the first parameter is 4 and the second is 8, with $a == 8 and the result being 20 (https://3v4l.org/MPb5P). It takes some getting used to, though, and only works because we intrinsically understand the parser to AST to opcodes -- even if we don't know the details.
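For anyone who wants to trace it locally, a self-contained version might look like the following. The body of add() is an assumption (the snippet above doesn't show it) and is taken to be plain integer addition; the step-by-step comments follow the left-to-right evaluation described above.

```php
<?php
// Assumed definition: the original snippet does not show add(),
// so plain addition is used here.
function add(int $x, int $y): int
{
    return $x + $y;
}

// Arguments are evaluated left to right:
//   1. inner add($a = 2, $a)  -> add(2, 2) = 4, so $a = 4 (first argument)
//   2. $a = add($a, $a)       -> add(4, 4) = 8, so $a = 8 (second argument)
//   3. outer add(4, 8)        -> 12
//   4. 12 + $a                -> 12 + 8 = 20
$result = add($a = add($a = 2, $a), $a = add($a, $a)) + $a;

var_dump($result); // int(20)
var_dump($a);      // int(8)
```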
Personally, it might be worth deferring binding/setting values to a later RFC just to reduce the complexity until people get used to it. Right now, you (and maybe a couple of others) are probably the only people on earth who can truly understand how the syntax actually works; the rest of us may not yet be equipped to give useful feedback.
It won't stop me from trying to help, but I also don't want to waste your time by making incredibly incorrect statements like using "and" instead of "&" to denote a type. :)
— Rob
> Interesting way of looking at it, not wrong either. I guess I'm not
> used to this syntax that much. It's like reading the following
> statement and working out what the value is without running it:
Well, right now no one is used to it. :-) Unless they've spent a lot of time in Rust. Ilija and I have been staring at it longer than anyone else, obviously, but any new feature is, well, new.
> Personally, it might be worth deferring binding/setting values to a
> later RFC just to reduce the complexity until people get used to it.
> Right now, you (and maybe a couple of others) are probably the only
> people on earth who can truly understand how the syntax actually works;
> the rest of us may not yet be equipped to give useful feedback.
That's been suggested a few times. The challenge here is:
- Variable binding is a core part of pattern matching in basically every language that has it. The decomposition capability is the primary use case. So leaving that out gives us... basically a more compact way to chain instanceof, which isn't that impressive.
- Variable pinning is an optional feature, most languages don't have it, but it's a nice-to-have. It would be fine to punt to later.
- Except we really need to ensure that the syntax for variable binding doesn't cause problems for a future variable pinning feature due to syntax/parsing confusion. That means we have to do at least a large chunk of the design and trial implementation up-front to make sure we're not painting ourselves into a corner.
- At which point, we've basically implemented pinning anyway so may as well just include it.
--Larry Garfield
Hi, Larry!
First of all, I'm very excited about your Pattern Matching RFC and looking
forward to it.
> Because PHP uses only an LL(1) parser
Are there any plans to upgrade the parser to bypass these limitations? I
remember Nikita shared some thoughts on why this is not trivial in
https://wiki.php.net/rfc/arrow_functions_v2. Maybe something has changed
since then?
--
Valentin
I'm not aware of any plans to change the parser. That would be a rather dramatic and invasive change.
--Larry Garfield
> Because PHP uses only an LL(1) parser
Actually, we're using an LALR(1) parser; LL(1) is more constrained
(although, personally, I like those for their simplicity).
>> Are there any plans to upgrade the parser to bypass these limitations?
>> I remember Nikita shared some thoughts on why this is not trivial in
>> https://wiki.php.net/rfc/arrow_functions_v2. Maybe something has
>> changed since then?
>
> I'm not aware of any plans to change the parser. That would be a rather dramatic and invasive change.
There have been ideas to use some more powerful features of bison[1],
like GLR, so that would not necessarily be a drastic and invasive
change. I'm not aware of any concrete plans, and these more powerful
features are not without downsides.
Christoph
> There have been ideas to use some more powerful features of bison[1],
> like GLR, so that would not necessarily be a drastic and invasive
> change. I'm not aware of any concrete plans, and these more powerful
> features are not without downsides.
I don't think there's a big incentive to switch to a GLR parser right
now. First off, I don't believe it actually solves the ambiguity
problem we've described in this thread
(class C { public $prop = 42 is Foo{}; }), which is not limited by
lookahead, but is a full blown syntax ambiguity. Technically it could
be solved in our current LALR(1) parser by duplicating the expr
production, removing pattern matching in this production and using it
solely for property initializers, but this is a bad long term solution.
Secondly, single lookahead grammars are easier for machines and humans
to understand. Unfortunately, it's hard to predict future syntax
changes, but I believe we have managed to find acceptable compromises
so far. It's worth noting that some newer languages also strive to
avoid +1 lookahead grammars. As an example, see Rust's turbofish
syntax (e.g. Vec::<u32>), used for generics in the general expression
context to avoid confusion with < lower than comparison.
Also worth noting: Switching to a GLR parser might cause a significant
amount of work for nikic/PHP-Parser, which is based on
ircmaxell/php-yacc, which can only generate LALR(1) parsers. It might
cause even more problems for token-based tools. Sticking with the
generics example, [bar < Bar, Baz > ()] will require a lot of scanning
to understand whether to remove the spaces between bar and <. The ::<
turbofish syntax on the other hand immediately indicates generics.
Anyway, it seems we have slightly gone off-topic. :)
Ilija
> However, there is a concern that it wouldn't be self-evident to PHP devs, and the variable binding side should have the extra marker. Ilija has suggested &, as that's what's used for references, which would result in:
>
> $b = '12';
> if ($arr is ['a' => &$a, 'b' => $b]) {
>     print $a;
> }
>
> There are two concerns with this approach.
> - The & could get confusing with an AND conjunction, eg, $value is int & &$x (which is how you would bind $value to $x iff it is an integer).
> - In practice, binding is almost certainly going to be vastly more common than pinning, so it should likely have the shorter syntax.
There is already something analogous, in the sense of "binding to
someone else's variable" in named parameters. Running with that analogy
gives:
$b = '12';
if ($arr is ['a' => a:, 'b' => $b]) {
    print $a;
}