Hello, peoples.
Ilija and I have been working on and off on an RFC for pattern matching since the early work on Enumerations. A number of people have noticed and said they're looking forward to it.
It's definitely not going to make it into 8.4, but we are looking for early feedback on scoping the RFC. In short, there's a whole bunch of possible patterns that could be implemented, and some of them we already have, but we want to get a sense of what scope the zeitgeist would want in the "initial" RFC, which would be appropriate as secondary votes, and which we should explicitly save-for-later. The goal is to not spend time on particular patterns that will be contentious or not pass, and focus effort on fleshing out and polishing those that do have a decent consensus. (And thereby, we hope, avoiding an RFC failing because enough people dislike one little part of it.)
To that end, we're looking for very high level feedback on this RFC:
https://wiki.php.net/rfc/pattern-matching
By "very high level," I mean, please, do not sweat specific syntax details right now. That's a distraction. What we're asking right now is "which of these patterns should we spend time sweating specific syntax details on in the coming weeks/months?" There will be ample time for detail bikeshedding later, and we've identified a couple of areas where we know for certain further syntax development will be needed because we both hate the current syntax. :-)
If you want to just read the Overview section for a survey of the possible patterns and our current recommendations, you likely don't need to read the rest of the RFC at this point. You can if you want, but again, please stay high-level. Our goal at the moment is to get enough feedback to organize the different options into three groups:
- Part of the RFC.
- Secondary votes in the RFC.
- Future Scope.
So we know where to focus our efforts to bring it to a proper discussion.
Thank you all for your participation.
--
Larry Garfield
larry@garfieldtech.com
You won't believe it, but just now I was thinking that it would be a
wonderful feature for PHP to have some kind of type tests (like
$a is Foo&Bar or $b is Foo|Baz|null), and here you write this email.
I didn't read the whole RFC, but I'd like to say that having at least the
aforementioned type tests would be really helpful.
Thank you for your effort and have a nice day!
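For comparison, here is a rough sketch of how those two type tests have to be spelled in current PHP, without the proposed "is" operator. Foo, Bar and Baz are placeholder interfaces taken from the message above; FooBar and the two helper functions are hypothetical names used only for illustration.

```php
<?php
declare(strict_types=1);

// Placeholder types standing in for Foo, Bar and Baz from the message above.
interface Foo {}
interface Bar {}
interface Baz {}

// Hypothetical class that satisfies both Foo and Bar.
final class FooBar implements Foo, Bar {}

// Today's spelling of "$a is Foo&Bar".
function isFooAndBar(mixed $a): bool
{
    return $a instanceof Foo && $a instanceof Bar;
}

// Today's spelling of "$b is Foo|Baz|null".
function isFooOrBazOrNull(mixed $b): bool
{
    return $b === null || $b instanceof Foo || $b instanceof Baz;
}
```

The intersection case needs one instanceof per type, and the nullable union needs an explicit null comparison, which is the boilerplate a type pattern would presumably remove.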
OMG, this RFC is a true masterpiece!!!
Congratulations, it turned out really well! I hope this gets approved soon!
Rodrigo A. Vieira
On Jun 20, 2024, at 15:03 -0300, Eugene Sidelnyk zsidelnik@gmail.com wrote:
You won't believe it, but just right now I've been thinking about that it would be a wonderful feature for PHP to have some kind of type-tests (like $a is Foo&Bar or $b is Foo|Baz|null), and here you write out this email.
I didn't read the whole RFC, but I'd like to say that having at least aforementioned type tests would be really helpful
Thank you for your effort and have a nice day!
I have been looking forward to this RFC; it's such a quality-of-life improvement to be
able to do all this! In terms of things to focus on, I'd personally be very
happy with the property/param guards, "as" and "is <regex>", but I won't
say no to anything we can get extra here because it's all really nice to
have.
I noticed that with array structure patterns the count is omitted when
using ...
if ($list is [1, 3, ...]) {
    print "Yes";
}
// True. Equivalent to:
if (is_array($list)
    && array_key_exists(0, $list) && $list[0] === 1
    && array_key_exists(1, $list) && $list[1] === 3
) {
    print "Yes";
}
Wouldn't this need a count($list) >= 2? I'm not sure if the
underlying mechanism does the count check as well, but it seems to me like
a guard clause for performance reasons in the PHP variant. Maybe a tangent,
what about iterators?
"Limited expression pattern"
I think this would be very valuable to have, though in the proposal it
seems cumbersome to use regardless of syntax. It feels like I'm going to be
using the variable binding less often than matching against other
variables, what about an "out" equivalent?
$result = match ($p) is {
    Point{x: 3, y: 9, $z} => "x is 3, y is 9, z is $z",
    Point{$x, $y, $z} => "x is $x, y is $y, z is $z",
};
// vs
$x = 3;
$y = 9;
$result = match ($p) is {
    Point{x: 3, y: 9, out $z} => "x is 3, y is 9, z is $z",
    Point{$x, $y, out $z} => "x is 3, y is 9, z is $z",
};
To me this makes it much more readable, just not sure if this is even
feasible. This is not meant as bikeshedding the syntax, more of an
alternative approach to when to use which.
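To make the distinction concrete, here is a hand-written sketch in current PHP of what the second match would have to do: compare against the existing $x and $y, and bind only $z. The Point class and the describe() function are hypothetical, and this is only one reading of the intended semantics, not what the RFC implementation does.

```php
<?php
declare(strict_types=1);

// Hypothetical Point class; the examples above assume something like it.
final class Point
{
    public function __construct(
        public readonly int $x,
        public readonly int $y,
        public readonly int $z,
    ) {}
}

// $x and $y are existing values to compare against;
// only $z is bound (the "out" variable in the suggested syntax).
function describe(Point $p, int $x, int $y): ?string
{
    if ($p->x === $x && $p->y === $y) {
        $z = $p->z; // the binding step
        return "x is $x, y is $y, z is $z";
    }

    return null; // no arm matched
}
```

Written out this way, the question in the thread is only about which of the two roles (compare or bind) gets the extra marker in the pattern syntax.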
I noticed that with array structure patterns the count is omitted when
using ...
if ($list is [1, 3, ...]) {
    print "Yes";
}
// True. Equivalent to:
if (is_array($list)
    && array_key_exists(0, $list) && $list[0] === 1
    && array_key_exists(1, $list) && $list[1] === 3
) {
    print "Yes";
}
Wouldn't this need a count($list) >= 2? I'm not sure if the
underlying mechanism does the count check as well, but it seems to me
like a guard clause for performance reasons in the PHP variant.
At the moment, the implementation doesn't actually compile down into those primitives; it has its own all-C implementation. Having it instead compile down to those operations is something Ilija is exploring to see how feasible it would be. (The main advantage being that the optimizer, JIT, etc. wouldn't have to do anything new to support optimizing patterns.) The examples shown in the RFC for now are just for logical equivalency to explain the functionality. In this case, the array_key_exists() checks are sufficient for what is actually being specified, so the count() is redundant. The final implementation will almost certainly be more performant than my example equivalencies. :-)
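As a runnable illustration of that logical equivalence (not of the C implementation), the checks can be wrapped in a function in current PHP. The function name startsWithOneThree() is made up for this sketch.

```php
<?php
declare(strict_types=1);

// Hand-written equivalent of "$list is [1, 3, ...]" as described above.
// If keys 0 and 1 both exist, the array necessarily has at least two
// elements, so no separate count() check is needed for correctness.
function startsWithOneThree(mixed $list): bool
{
    return is_array($list)
        && array_key_exists(0, $list) && $list[0] === 1
        && array_key_exists(1, $list) && $list[1] === 3;
}
```

A one-element array fails on the array_key_exists(1, ...) step, which is the case a count($list) >= 2 guard would otherwise catch.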
Maybe a tangent, what about iterators?
Not supported, as you cannot examine them "all at once", by definition. I don't even know what an iterator-targeted pattern would look like, though if someone figured that out in the future there's no intrinsic reason such a pattern couldn't be added at that time.
"Limited expression pattern"
I think this would be very valuable to have, though in the proposal it
seems cumbersome to use regardless of syntax. It feels like I'm going
to be using the variable binding less often than matching against other
variables, what about an "out" equivalent?
$result = match ($p) is {
    Point{x: 3, y: 9, $z} => "x is 3, y is 9, z is $z",
    Point{$x, $y, $z} => "x is $x, y is $y, z is $z",
};
// vs
$x = 3;
$y = 9;
$result = match ($p) is {
    Point{x: 3, y: 9, out $z} => "x is 3, y is 9, z is $z",
    Point{$x, $y, out $z} => "x is 3, y is 9, z is $z",
};
To me this makes it much more readable, just not sure if this is even
feasible. This is not meant as bikeshedding the syntax, more of an
alternative approach to when to use which.
A couple of people have noted that. Assuming at least one of those two syntaxes makes it into the initial RFC (I think variable binding has to for it to be really useful), we'll have a whole separate sub-discussion on that, I'm sure.
Though, I would expect variable binding to be used more than expressions, not less, which would make the marker make more sense on the expression. But that's something to bikeshed later.
--Larry Garfield
Larry Garfield larry@garfieldtech.com wrote on 20.06.2024 at 19:38 CEST:
To that end, we're looking for very high level feedback on this RFC:
https://wiki.php.net/rfc/pattern-matching
Thank you!
$var is *; // Matches anything, more useful in the structure patterns below.
maybe also consider:
$var is mixed; // Matches anything, more useful in the structure patterns below.
// Array application, apply a pattern across an array
$foo is array<string>; // All values in $foo must be strings
$foo is array<int|float>; // All values in $foo must be ints or floats
+1
Regards
Thomas
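For reference, a rough sketch of what the array application pattern quoted above has to do when written by hand in current PHP. The helper names allValues() and isArrayOfIntOrFloat() are made up, and treating an empty array as a match is an assumption of this sketch rather than something the thread settles.

```php
<?php
declare(strict_types=1);

// Checks every value with the given predicate; the cost is linear in the
// size of the array because each element has to be visited.
function allValues(array $values, callable $predicate): bool
{
    foreach ($values as $value) {
        if (!$predicate($value)) {
            return false;
        }
    }

    return true;
}

// Rough stand-in for "$foo is array<int|float>".
function isArrayOfIntOrFloat(mixed $foo): bool
{
    return is_array($foo)
        && allValues($foo, fn (mixed $v): bool => is_int($v) || is_float($v));
}
```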
Thank you!
$var is *; // Matches anything, more useful in the structure patterns below.
maybe also consider:
$var is mixed; // Matches anything, more useful in the structure patterns below.
🤔 That should actually already work naturally through the type support. And should indeed match anything, so... maybe we'll drop the wildcard and just document to use mixed for that? It's a bit more to type, but should be pretty self-explanatory and eliminates a syntax, so... We'll consider this further.
--Larry Garfield
This definitely looks like a powerful feature I'm looking forward to.
If property/param/return guards are implemented, do you see them eventually
replacing the property/param/return types we have nowadays?
Asking for a friend.
Hello, peoples.
To that end, we're looking for very high level feedback on this RFC:
As I started reading I started thinking of "whatabouts" based on my
experience with pattern matching in other languages, and as I skimmed
the RFC I found each of them being addressed. I'm looking forward to this.
If you want my feedback about match() "is" placement, I can see the
benefits of both, and they don't look mutually exclusive, since the "is"
effectively just distributes over the branches to produce the inline
alternative; with that interpretation it's just an error to have "is" in
the top position and something other than a type pattern in any branch
because it would be equivalent to "is <not a type pattern>".
I suspect the case where one is matching against a list of types will
turn out to be quite common, so if "match ($somevar) is {" weren't
implemented there'd soon be people asking for something equivalent to
save them typing "is" over and over.
One thing to note is that if "is" were to be in the top position, it
means every branch has to be a type pattern, which means instead of
"default" the catch-all branch would be "mixed". (That's a question:
won't the branches of "match($var)is{" need to range over every possible
type?)
One tiny note about BC breakage:
If the as keyword is adopted as well, that will also be a new global
keyword.
"as" is already a global keyword (as in "foreach($arr as $e)"). So
that's not such a problem after all.
Ilija and I have been working on and off on an RFC for pattern matching since the early work on Enumerations.
I like what I see, a lot!
One quick thought that came to my mind, regarding objects:
Could we check method return values?
if ($x is Countable { count(): 0 }) ...
if ($p is Point { getX(): 3 }) ...
if ($x is Stringable { __toString(): 'hello' }|'hello') ...
while ($it is Iterator { valid(): true, current(): $value, next(): null }) ...
Maybe it goes too far.
For the variable binding, I noticed that we can overwrite the original variable:
$x is SomethingWrapper { something: $x }
In this case the bool return is not really needed.
For now this usage looks a bit unintuitive to me, but I might change
my mind and grow to like it, not sure.
For "weak mode" ~int, and also some other concepts, I notice that this
RFC is ahead of the type system.
E.g. should something like array<int> be added to the type system in
the future, or do we leave the type system behind, and rely on the new
"guards"?
public array $values is array<int>
OR
public array<int> $values
The concern here would be if in the future we plan to extend the type
system in a way that is inconsistent or incompatible with the pattern
matching system.
--- Andreas
I'm always surprised why arrays can't keep track of their internal
types. Every time an item is added to the map, just chuck in the type
and a count, then if it is removed, decrement the counter, and if
zero, remove the type. Thus checking if an array is array<int> should be a
near O(1) operation. Memory usage might be an issue (a
couple bytes per type in the array), but not terrible.... but then
again, I've been digging into the type system quite a bit over the
last few months.
And every time a modification happens, directly or indirectly, you'll
have to modify the counts too. Given how much arrays / hash tables are
used within the PHP codebase, this will eventually add up to a lot of
overhead. A lot of internal functions that work with arrays will need
to be audited and updated too. Lots of potential for introducing bugs.
It's (unfortunately) not a matter of "just" adding some counts.
This is straying a bit from this RFC's discussion, but I'm wondering if a better approach to generics for arrays would be to just not do generics for arrays.
Instead, have generics be a class-only thing, and add new built-in types (along the lines of the classes/interfaces in the Data Structures extension) specifically to provide collection support. This would accomplish several things:
- Separate object types (e.g. Array, Map, OrderedMap, Set, SparseArray, etc) rather than one "array" type that does everything. Each could have underlying storage and accessors optimized for one specific use-case, rather than having to be efficient with several different use-cases.
- No BC breaks. array and all the existing array_* functions remain untouched and unchanged. Somewhere years down the line, they can be discouraged in favor of the new interfaces.
- Being objects, these new data types would all have a fancy OOP interface, which could make chaining operations easy.
The major interoperability concern in this model would be the cost of translating between the new types and legacy array types at API boundaries for legacy code. Possibly this might limit utility to greenfield development. But since these would be entirely new, opt-in types, there are no direct BC concerns, and maybe some of the typechecking perf hit when you validate inserts/updates could be elided by the optimizer in the presence of typehints. (E.g. if you have an Array<int> and you insert a value the compiler or optimizer can prove is an int, you don't need to do a runtime type check.) There'd also probably have to be something done to maintain the COW semantics that array has without having to have explicit clone operations.
-John
It's (unfortunately) not a matter of "just" adding some counts.
Well, of course, nothing in software is "just" anything.
As to how much overhead? I guess you could create a subtype of array
that is typed, then people could use it when they need it, and if it
gets up-cast to an array, you can just toss out all the counts. As
far as down-casting to the typed array goes, it would be no less
efficient than doing
$arr = (fn(MyType ...$arr) => $arr)(...$someArray);
right now.
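For readers who have not seen that trick: spreading an array into a typed variadic parameter makes the engine type-check every element, at the cost of walking the whole array. A minimal sketch with int instead of MyType follows; the wrapper name assertListOfInt() is made up, and the sketch assumes a list-style array with integer keys.

```php
<?php
declare(strict_types=1);

// Runs the engine's parameter type check over every element of a list.
// Throws a TypeError on the first element that is not an int.
function assertListOfInt(array $someArray): array
{
    return (fn (int ...$arr) => $arr)(...$someArray);
}
```

Every call re-checks all elements, so this is a linear-time validation each time rather than a property the array remembers.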
Robert Landers
Software Engineer
Utrecht NL
On Fri, Jun 21, 2024 at 7:20 PM Robert Landers landers.robert@gmail.com
wrote:
It's (unfortunately) not a matter of "just" adding some counts.
Well, of course, nothing in software is "just" anything.
Counters are not cheap as we need one slot for each type in the array so we
need a dynamic buffer and an indirection, plus absolutely every mutation
needs to update a counter, including writes to references. It is possible
to remove the counters and to maintain an optimistic upper bound of the
type (computing the type more precisely when type checking fails), but I
feel this would not work well with pattern matching.
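To illustrate the bookkeeping being discussed, here is a userland sketch of per-type counters. It is only an analogy for what the engine would have to do internally: the class CountedArray and its methods are hypothetical, it matches exact type names only, and it ignores references and nested writes entirely.

```php
<?php
declare(strict_types=1);

// Userland illustration only: an array wrapper that keeps a count per value type.
final class CountedArray
{
    /** @var array<int|string, mixed> */
    private array $items = [];

    /** @var array<string, int> type name => number of values of that type */
    private array $typeCounts = [];

    public function set(int|string $key, mixed $value): void
    {
        if (array_key_exists($key, $this->items)) {
            $this->forget($this->items[$key]); // overwriting changes the counts too
        }
        $this->items[$key] = $value;
        $type = get_debug_type($value);
        $this->typeCounts[$type] = ($this->typeCounts[$type] ?? 0) + 1;
    }

    public function remove(int|string $key): void
    {
        if (array_key_exists($key, $this->items)) {
            $this->forget($this->items[$key]);
            unset($this->items[$key]);
        }
    }

    // True when every stored value has exactly this type (or nothing is stored).
    // Looks only at the counter table, never at the values themselves.
    public function isAll(string $type): bool
    {
        return $this->typeCounts === [] || array_keys($this->typeCounts) === [$type];
    }

    private function forget(mixed $value): void
    {
        $type = get_debug_type($value);
        if (--$this->typeCounts[$type] === 0) {
            unset($this->typeCounts[$type]);
        }
    }
}
```

The second table ($typeCounts) is the dynamically sized buffer behind an indirection, and both set() and remove() show that every mutation has to touch it.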
Also, a few things complicate this:
- Nested writes like $a[0][0][0]=1 need to backtrack to update the type of
all parent arrays after the element is added/updated
- Supporting references to properties whose type is a typed array, or
dimensions of these properties, is very hard
Fixed-type arrays may be easier to support but there are important
drawbacks in usability IMHO. This does not play well with CoW semantics.
Best Regards,
Arnaud
Counters are not cheap as we need one slot for each type in the array so we need a dynamic buffer and an indirection, plus absolutely every mutation needs to update a counter, including writes to references. It is possible to remove the counters and to maintain an optimistic upper bound of the type (computing the type more precisely when type checking fails), but I feel this would not work well with pattern matching.
To me, that sounds kinda silly. PHP does reference counting and while
there is an overhead, it doesn't prevent us from using it...
Anyway, while you make a good point for the pathological case, I
suspect that in the popular case (mostly homogeneous arrays) the
performance impact will be negligible compared to the productivity
gain of being able to type-check arrays.
Also, a few things complicate this:
- Nested writes like $a[0][0][0]=1 need to backtrack to update the type of all parent arrays after the element is added/updated
Nesting could be forbidden, at least at first. I think saying
"array<array<array<int>>>" is forbidden is totally fine.
- Supporting references to properties whose type is a typed array, or dimensions of these properties, is very hard
Also, depends on the defined behavior? If you have a typed property
that is array<string> and you try to add an int to it ... it could
fail. Thus not being hard at all.
Robert Landers
Software Engineer
Utrecht NL
To me, that sounds kinda silly. PHP does reference counting and while
there is an overhead, it doesn't prevent us from using it...
The reference count is a single pre-allocated integer on the C struct
holding the value; and even then, a lot of effort was put in during the
"PHP-NG" project (which landed in PHP 7.0) to avoid reference counting
values that didn't need it. (Nikita wrote a great post explaining the
changes here:
https://www.npopov.com/2015/05/05/Internal-value-representation-in-PHP-7-part-1.html)
The problem with type counters is not just that there's more than one
per array, it's that there's a variable number, so you can't
preallocate memory for them, they need to reference some extra resizable
hash-table. That's the "indirection" that Arnaud is talking about.
I've thought in the past about caching type checks that had passed, so
that a series of array<int> checks could be made "for the price of one"
if the array didn't change between. That might at least mean the
additional table of types would only need accessing on type checks, not
every write, but a lot of checks would still need to iterate the whole
array.
Having a structure that only allows one type could potentially be much
more efficient, because you only need to allocate space for the type
identifier, and check against it on write.
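A userland sketch of such a single-type structure might look like the following. TypedList is a hypothetical class, and comparing get_debug_type() output is a simplification that accepts exact type names only (no subclasses or unions).

```php
<?php
declare(strict_types=1);

// Illustrative single-type list: stores one type identifier and checks it on write.
final class TypedList
{
    /** @var list<mixed> */
    private array $items = [];

    public function __construct(private readonly string $type) {}

    public function push(mixed $value): void
    {
        // The only per-write cost is one comparison against the stored type.
        if (get_debug_type($value) !== $this->type) {
            throw new TypeError(
                "Expected {$this->type}, got " . get_debug_type($value)
            );
        }
        $this->items[] = $value;
    }

    /** @return list<mixed> */
    public function toArray(): array
    {
        return $this->items;
    }
}
```

Because a value of the wrong type can never get in, a later "is this all ints?" question is answered by the stored type alone, without counters and without iterating.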
--
Rowan Tommins
[IMSoP]
Ilija and I have been working on and off on an RFC for pattern matching since the early work on Enumerations.
I like what I see, a lot!
One quick thought that came to my mind, regarding objects:
Could we check method return values?
if ($x is Countable { count(): 0 }) ...
if ($p is Point { getX(): 3 }) ...
if ($x is Stringable { __toString(): 'hello' }|'hello') ...
while ($it is Iterator { valid(): true, current(): $value, next(): null }) ...
Maybe it goes too far.
Much to my surprise, Ilija says that's technically possible. However, now that we have property hooks the need for calling a method is far less, so we'd rather not in the interest of simplicity for now. A future-RFC could probably add that if it's really needed, but I don't think most languages support that anyway. (Need to verify.)
--Larry Garfield
Even if property hooks will be available at the time, a lot of classes
will be from other packages that use getter methods to communicate.
And, even with property hooks, there will be use cases where getter
methods remain the preferable implementation choice.
But ok for me to push things like this to follow-up RFCs.
This leads me to a different question regarding property guards.
What if we validate an object property guard, which later changes?
class C {
public function __construct(
public readonly Point $point is Point { $z: 0 },
) {}
}
$c = new C(new Point(0, 0, 0));
$c->point->z = 5;
The second operation would violate the guard condition.
However, I suspect that the operation will not trigger any validation
for a guard where the object is referenced.
So it would be up to the developer to only use this if Point::$z is
readonly, or they cannot rely on the guard.
This leads me to a different question regarding property guards.
What if we validate an object property guard, which later changes?
class C {
public function __construct(
public readonly Point $point is Point { $z: 0 },
) {}
}
$c = new C(new Point(0, 0, 0));
$c->point->z = 5;
The second operation would violate the guard condition.
However, I suspect that the operation will not trigger any validation
for a guard where the object is referenced.
So it would be up to the developer to only use this if Point::$z is
readonly, or they cannot rely on the guard.
This is future-scope, so we won't delve into it too deeply. But the general idea (for me) is that a property guard pattern compiles down to a set hook on the property (or equivalent). That means it would apply to all writes to that property, unless you deliberately bypass it using reflection. So it would be no less (or more) of a guarantee than an equivalent set hook.
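A small sketch of what that would and would not guarantee, assuming the guard really does behave like a check on every write to the property. It runs on current PHP by using an explicit setter as the "equivalent"; the setPoint() and getPoint() methods are hypothetical stand-ins, and on PHP 8.4 a set hook would presumably play the same role.

```php
<?php
declare(strict_types=1);

final class Point
{
    public function __construct(public int $x, public int $y, public int $z) {}
}

// Hypothetical stand-in for "public Point $point is Point { z: 0 }":
// the guard runs on every write to the property itself.
final class C
{
    private Point $point;

    public function __construct(Point $point)
    {
        $this->setPoint($point);
    }

    public function setPoint(Point $point): void
    {
        if ($point->z !== 0) {
            throw new InvalidArgumentException('point->z must be 0');
        }
        $this->point = $point;
    }

    public function getPoint(): Point
    {
        return $this->point;
    }
}

$c = new C(new Point(0, 0, 0));

// Writing to the guarded property is checked...
$rejected = false;
try {
    $c->setPoint(new Point(0, 0, 5));
} catch (InvalidArgumentException $e) {
    $rejected = true;
}

// ...but mutating the object it refers to never goes through the guard.
$c->getPoint()->z = 5;
var_dump($rejected, $c->getPoint()->z); // bool(true) int(5)
```

So the guard holds for assignments to the property, while the concern about later changes to the referenced object stands unless Point::$z is itself readonly.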
--Larry Garfield
Hello, peoples.
Ilija and I have been working on and off on an RFC for pattern matching
since the early work on Enumerations. A number of people have noticed
and said they're looking forward to it.
Hi Larry,
I haven't time to read through the full RFC at the moment, but a couple of thoughts:
As Andreas says, we should be careful not to pre-empt things that might be added to the type system in general, and end up with incompatible syntax or semantics. That particularly applies to the generic-like array<int> syntax, which is quite likely to end up in the language in some form.
The "weak-mode flag" seems useful at first glance, but unfortunately PHP has multiple sets of coercion rules, and some are ... not great. It's also not immediately obvious which contexts should actually perform coercion, and which should just assert that it's possible (e.g. match($foo) is { ~int => (int)$foo } feels redundant). So I think that would need its own RFC to avoid being stuck with something sub-optimal.
Similarly, the "as" keyword has potential, but I'm not sure about the naming, and whether it should be more than one feature. Asserting a type, casting between types, and de-structuring a type are all different use cases:
$input = '123'; $id = $input as int; // looks like a cast, but actually an assertion which will fail?
$handler as SpecialHandler; // looks like an unused expression, but actually an assertion?
$position as [$x, $y]; // looks like it's writing to $position, but actually the same as [$x, $y] = $position?
It's worth noting that in languages which statically track the type of a variable, "$foo = $bar as SomeInterface" is actually a type of object cast; but in PHP, it's the value that tracks the type, and interfaces are "duck-typed", so it would be equivalent to "assert($bar is SomeInterface); $foo = $bar;" which isn't quite the same thing.
Regards,
Rowan Tommins
[IMSoP]
Hello, peoples.
Ilija and I have been working on and off on an RFC for pattern matching
since the early work on Enumerations. A number of people have noticed
and said they're looking forward to it.
Hi Larry,
I haven't time to read through the full RFC at the moment, but a couple
of thoughts:
As Andreas says, we should be careful not to pre-empt things that might
be added to the type system in general, and end up with incompatible
syntax or semantics. That particularly applies to the generic-like
array<int> syntax, which is quite likely to end up in the language in
some form.
As noted in another thread, I don't believe that would cause any engine-level conflicts. Whether it would cause human-level conflicts is another, and valid, question.
The "weak-mode flag" seems useful at first glance, but unfortunately
PHP has multiple sets of coercion rules, and some are ... not great.
It's also not immediately obvious which contexts should actually
perform coercion, and which should just assert that it's possible
(e.g. match($foo) is { ~int => (int)$foo } feels redundant). So I think
that would need its own RFC to avoid being stuck with something
sub-optimal.
We still have different implicit coercion rules? I assumed it would be implemented to match weak-mode parameters, not casting. Though I agree, there are devils in the details on this one.
Similarly, the "as" keyword has potential, but I'm not sure about the
naming, and whether it should be more than one feature. Asserting a
type, casting between types, and de-structuring a type are all
different use cases:
$input = '123'; $id = $input as int; // looks like a cast, but actually an assertion which will fail?
$handler as SpecialHandler; // looks like an unused expression, but actually an assertion?
$position as [$x, $y]; // looks like it's writing to $position, but actually the same as [$x, $y] = $position?
It's worth noting that in languages which statically track the type of
a variable, "$foo = $bar as SomeInterface" is actually a type of object
cast; but in PHP, it's the value that tracks the type, and interfaces
are "duck-typed", so it would be equivalent to "assert($bar is
SomeInterface); $foo = $bar;" which isn't quite the same thing.
Valid points. The line between validation and casting is a bit squishy, as some casts can be forced (eg, string to int gives 0 sometimes), and others just cannot (casting to an object). So would $a as array<~int> be casting, validating, or both? Patterns make sense for validating, so it's natural to look to them for validate-and-cast. Though I recognize it could then complicate the cast-only case, if it exists.
--Larry Garfield
We still have different implicit coercion rules?
I think Gina counted 10 contexts which behave slightly differently...
Regards,
Rowan Tommins
[IMSoP]
Valid points. The line between validation and casting is a bit squishy,
as some casts can be forced (eg, string to int gives 0 sometimes), and
others just cannot (casting to an object). So would $a as
array<~int> be casting, validating, or both?
I think my concern is that both "x is T" and "x as T" read naturally as
expressions, where their main purpose is to evaluate to a result, and
side-effects are exceptional.
From that point of view, we can give intuitive meaning to the following:
- $foo is int => boolean; is $foo of type int?
- $foo is ~int => boolean; can $foo be "safely" cast to int?
- $foo as ~int => int; cast $foo to int (unless unsafe)
But then what does this mean?
- $foo as int => int; cast $foo to int if it's already an int !?
Similarly for a lot of other patterns:
- $foo as [int, int, int] => ??
- $foo as 3|5|null => ??
It seems like what's actually wanted here is something with an active
verb, like "assert"; or a closed statement like "must be":
- $foo mustbe int; statement - if $foo is not an int, throw an error
- $foo mustbe ~int; statement - if $foo cannot be "safely" cast to int, throw an error
- $foo mustbe [int, int, int];
- $foo mustbe 3|5|null;
Then the "validate-and-cast" case is a completely separate feature,
whose argument is a type, not a pattern (straw-man syntax):
- safe_cast($foo as int); cast $foo to int, unless unsafe
- safe_cast($foo as int|string); cast $foo to either int or string, using the same rules as parameters in mode 0
- safe_cast($foo as [int, int, int]); error, no such type to cast as
I think the uneasiness around binding vs matching against variables is
related: if "$x is $y" is an expression, $y reads naturally as an input
to that expression:
$arr = [ 123 ];
$fortyTwo = 42;
$arr is [ int ]; // true
$arr is [ 42 ]; // false
$arr is [ $fortyTwo ]; // false ?
As currently proposed, it's actually an output, and means something
like this:
$arr is [ mixed{bind $fortyTwo} ] // true, with $fortyTwo set to 123 !
How about using "=" to mark that a variable name is being assigned to:
$arr is [ $fortyTwo ]; // false
$arr is [ $id=int ]; // true, and $id set to 123
$arr is [ $id= ]; // equivalent to [ $id=mixed ], i.e. bind $id without further constraining its value
So taking an example from the RFC:
$result = match ($p) is {
Point{x: 3, y: 9, $z=} => "x is 3, y is 9, z is $z",
Point{$z=, $x=, y: 4} => "x is $x, y is 4, z is $z",
Point{x: 5, $y=} => "x is 5, y is $y, and z doesn't matter",
Point{$x=, $y=, $z=} => "x is $x, y is $y, z is $z",
};
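The two readings of a variable inside a pattern can be spelled out in current PHP. This is only a sketch of the distinction being discussed, not of how the RFC would implement either form; the variable names $matchesValue and $matchesAndBinds are made up for the illustration.

```php
<?php
declare(strict_types=1);

$arr = [123];
$fortyTwo = 42;

// Reading 1: $fortyTwo is an input. The pattern asks
// "is $arr a one-element list whose element equals 42?"
$matchesValue = count($arr) === 1 && $arr[0] === $fortyTwo;

// Reading 2: the variable is an output. The pattern asks
// "is $arr a one-element list of int?" and, if so, binds the element.
$id = null;
$matchesAndBinds = array_is_list($arr) && count($arr) === 1 && is_int($arr[0]);
if ($matchesAndBinds) {
    [$id] = $arr;
}

var_dump($matchesValue);    // bool(false)
var_dump($matchesAndBinds); // bool(true)
var_dump($id);              // int(123)
var_dump($fortyTwo);        // int(42)
```

With the "=" marker suggested above, the first reading would be written [ $fortyTwo ] and the second [ $id=int ], so the two can no longer be confused.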
--
Rowan Tommins
[IMSoP]
On Sat, Jun 22, 2024 at 8:04 PM Rowan Tommins [IMSoP]
imsop.php@rwec.co.uk wrote:
Valid points. The line between validation and casting is a bit squishy,
as some casts can be forced (eg, string to int gives 0 sometimes), and
others just cannot (casting to an object). So would $a as
array<~int> be casting, validating, or both?
I think my concern is that both "x is T" and "x as T" read naturally as expressions, where their main purpose is to evaluate to a result, and side-effects are exceptional.
From that point of view, we can give intuitive meaning to the following:
- $foo is int => boolean; is $foo of type int?
- $foo is ~int => boolean; can $foo be "safely" cast to int?
- $foo as ~int => int; cast $foo to int (unless unsafe)
But then what does this mean?
- $foo as int => int; cast $foo to int if it's already an int !?
I've brought this up before, but I mostly see "as" being useful for
static analysis. That's what I've mostly used it for in C#, anyway.
Logically, you know the type, but due to one-thing-or-another you
can't "prove" the type is that type (such as foreach-arrays or dealing
with results from user-code callbacks in library code). I want to be
able to say "this is an int or else."
[snip]
Now ... that being said, I'm def not a fan of this "~" thing. It makes
no sense to me. Semantically, what is the difference between "123",
123, and 123.0 other than how the bits are arranged in memory? If it
can become an int -- regardless of how it is laid out in memory -- I
expect to end up with an int as long as it is an integer. I would find
it super annoying to have to fix that bug report with a single
character, just because someone returned 123.0 instead of 123 because
they rounded the number and I got the result from a callback.
Further ~ means bitwise-not, so when you start mixing in literals ...
what does it mean?
$y = $x as ~1|~float;
What is the value of $y? Is it the literal negative two, cast to the
literal one (int), or cast to a float?
On Sat, Jun 22, 2024 at 8:04 PM Rowan Tommins [IMSoP]
imsop.php@rwec.co.uk wrote:
Valid points. The line between validation and casting is a bit squishy,
as some casts can be forced (eg, string to int gives 0 sometimes), and
others just cannot (casting to an object). So would $a as
array<~int> be casting, validating, or both?
I think my concern is that both "x is T" and "x as T" read naturally as expressions, where their main purpose is to evaluate to a result, and side-effects are exceptional.
From that point of view, we can give intuitive meaning to the following:
- $foo is int => boolean; is $foo of type int?
- $foo is ~int => boolean; can $foo be "safely" cast to int?
- $foo as ~int => int; cast $foo to int (unless unsafe)
But then what does this mean?
- $foo as int => int; cast $foo to int if it's already an int !?
I've brought this up before, but I mostly see "as" being useful for
static analysis. That's what I've mostly used it for in C#, anyway.
Logically, you know the type, but due to one-thing-or-another you
can't "prove" the type is that type (such as foreach-arrays or dealing
with results from user-code callbacks in library code). I want to be
able to say "this is an int or else."
[snip]
Now ... that being said, I'm def not a fan of this "~" thing. It makes
no sense to me. Semantically, what is the difference between "123",
123, and 123.0 other than how the bits are arranged in memory? If it
can become an int -- regardless of how it is laid out in memory -- I
expect to end up with an int as long as it is an integer. I would find
it super annoying to have to fix that bug report with a single
character, just because someone returned 123.0 instead of 123 because
they rounded the number and I got the result from a callback.
In other words, I want what I currently get from mode 0, but instead
of a warning, an error; but also not the sledgehammer of direct
casting in mode 1.
In mode 0, this is currently a warning (strictly, a deprecation notice since PHP 8.1):
(fn(int $x) => print($x))(123.1);
and this is fine:
(fn(int $x) => print($x))(123.0);
But if you are used to working in mode 1, both are a fatal error, so you end up writing an explicit cast instead:
// strict_types=1
(fn(int $x) => print($x))((int)123.1);
Which is probably much worse -- depending on your business -- because
you don't get any notice that things are going wrong; it's completely
silent.
What I would find (and I think everyone would find, IMHO) is that you
actually want to cast to int if it can be cast exactly to that type,
otherwise fail.
(fn(int $x) => print($x))(123.1 as int); // crash
(fn(int $x) => print($x))(123.0 as int); // success!
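The "cast exactly or fail" behaviour described here can be approximated in userland today. The helper below is a hypothetical sketch (to_int_or_fail is not an existing PHP function), and its rules are an assumption: it is deliberately stricter than mode 0 parameter coercion, for example it rejects booleans and the string "123.0".

```php
<?php
declare(strict_types=1);

// Hypothetical helper: convert to int only when no information is lost,
// otherwise throw a TypeError.
function to_int_or_fail(mixed $value): int
{
    if (is_int($value)) {
        return $value;
    }

    // Floats: accept only whole numbers inside the exactly
    // representable range (|x| < 2^53). INF and NAN fall through.
    if (is_float($value) && floor($value) === $value && abs($value) < 2 ** 53) {
        return (int) $value;
    }

    // Strings: accept only what FILTER_VALIDATE_INT considers an integer.
    if (is_string($value)) {
        $int = filter_var($value, FILTER_VALIDATE_INT);
        if ($int !== false) {
            return $int;
        }
    }

    throw new TypeError(
        'Cannot convert ' . get_debug_type($value) . ' to int without loss'
    );
}

var_dump(to_int_or_fail(123.0)); // int(123)
var_dump(to_int_or_fail('123')); // int(123)

try {
    to_int_or_fail(123.1);
} catch (TypeError $e) {
    echo $e->getMessage(), "\n";
}
```

Whether such a conversion belongs in an "as" operator, a function, or an extension of the cast syntax is exactly the open question in this thread.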
Further ~ means bitwise-not, so when you start mixing in literals ...
what does it mean?
$y = $x as ~1|~float;
What is the value of $y? Is it the literal negative two, cast to the
literal one (int), or cast to a float?
I've brought this up before, but I mostly see "as" being useful for
static analysis. That's what I've mostly used it for in C#, anyway.
Logically, you know the type, but due to one-thing-or-another you
can't "prove" the type is that type (such as foreach-arrays or dealing
with results from user-code callbacks in library code). I want to be
able to say "this is an int or else."
I absolutely see the use case for that; I just don't think "as" is a
good word for it, because that's not what it means in normal English.
Incidentally, according to the C# docs at
https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/operators/type-testing-and-cast#as-operator
The as operator explicitly converts the result of an expression to a
given reference or nullable value type. If the conversion isn't
possible, the as operator returns null. Unlike a cast expression, the as
operator never throws an exception.
So it more closely matches my intuition: a statement of just "foo as
Bar;" would be useless, because it's calculating a value and discarding
it, with no side effects.
As you say, the conversion might not be of the value, but of the
statically analysed type, but in C#, that's all part of the language. In
PHP "$foo = $bar as SomeInterface;" would have no visible effect except
in third-party tooling, where it can already be written "/** @var
SomeInterface $foo */ $foo = $bar;"
--
Rowan Tommins
[IMSoP]
On Sat, Jun 22, 2024 at 10:53 PM Rowan Tommins [IMSoP]
imsop.php@rwec.co.uk wrote:
I've brought this up before, but I mostly see "as" being useful for
static analysis. That's what I've mostly used it for in C#, anyway.
Logically, you know the type, but due to one-thing-or-another you
can't "prove" the type is that type (such as foreach-arrays or dealing
with results from user-code callbacks in library code). I want to be
able to say "this is an int or else."
I absolutely see the use case for that; I just don't think "as" is a good word for it, because that's not what it means in normal English.
Incidentally, according to the C# docs at https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/operators/type-testing-and-cast#as-operator
The as operator explicitly converts the result of an expression to a given reference or nullable value type. If the conversion isn't possible, the as operator returns null. Unlike a cast expression, the as operator never throws an exception.
So it more closely matches my intuition: a statement of just "foo as Bar;" would be useless, because it's calculating a value and discarding it, with no side effects.
In general, you assign the result of the operation so that the output
is useful. Here's how that might look in PHP with the C# rules:
function foo(BarInterface $bar) {
$baz = $bar as Baz;
$baz?->thing();
$bar->otherThing();
}
With "is" then it looks a little more wonky but isn't far from the
current instanceof method:
function foo(BarInterface $bar) {
if ( $bar is Baz ) $bar->thing();
$bar->otherThing();
}
With fibers/async, "as" is actually more important than "is" (at least
as far as crashing goes):
class Foo {
public BarInterface $bar;
public function doStuff() {
$baz = $this->bar as Baz;
// some stuff with $baz
callComplexThing(); // suspends current fiber,
// $this->bar is no longer the same object
// or concrete type when we return
$baz->something();
}
}
If we were to do an "is" check on the first line, by the time the
fiber is resumed, we've got a completely different type on our hands
and it would crash. Maybe that is desirable, maybe not, but we know
that we have a reference of the type we want and it won't be changed
under us by using "as."
As you say, the conversion might not be of the value, but of the statically analysed type, but in C#, that's all part of the language. In PHP "$foo = $bar as SomeInterface;" would have no visible effect except in third-party tooling, where it can already be written "/** @var SomeInterface $foo */ $foo = $bar;"
Hopefully my examples show how it can be useful (at least when it
returns null if it is the wrong type). When it gives a TypeError or
something, it becomes far less useful -- at least for the sake of
conciseness. However, it becomes far more useful for dealing with
scalar casts:
function foo(int $type) {}
foo(123.456 as int); // crashes
foo(null as int); // crashes
But even if we did return null, those would crash unless foo() took
int|null, which may or may not be what you want ...
With it always being an error if it doesn't match, it's really not
that useful, as you point out.
--
Rowan Tommins
[IMSoP]
On Sat, Jun 22, 2024 at 11:57 PM Robert Landers
landers.robert@gmail.com wrote:
Side-note: this is why my original proposal had two modes (based on
the existence of null in the typecheck):
foo(123.456 as int);
would crash due to being unable to cleanly cast to int.
foo(123.456 as int|null);
would crash from passing null (since literally everything can be
casted to null and it can't be casted to an int).
--
Rowan Tommins
[IMSoP]
In general, you assign the result of the operation so that the output
is useful. Here's how that might look in PHP with the C# rules:
function foo(BarInterface $bar) {
$baz = $bar as Baz;
$baz?->thing();
$bar->otherThing();
}
The difference is that in C#, there is an actual difference between the methods available on $bar and $baz. The "as" is performing a conversion, so the English word makes sense.
In PHP, all we could do is have a one-off runtime check, then return the value unchanged. So written out in full, that would be:
function foo(BarInterface $bar) {
$baz = null;
if ( $bar is Baz ) $baz = $bar;
$baz?->thing();
$bar->otherThing();
}
Or:
function foo(BarInterface $bar) {
$baz = $bar;
if ( ! $bar is Baz ) $baz = null;
$baz?->thing();
$bar->otherThing();
}
I can see some use in a shorthand for that, but it doesn't have much in common with casts, or any meaning of "as" that I can think of, so I don't think that would be the right name for it.
With fibers/async, "as" is actually more important than "is" (at least
as far as crashing goes):
class Foo {
public BarInterface $bar;
public function doStuff() {
$baz = $this->bar as Baz;
// some stuff with $baz
callComplexThing(); // suspends current fiber,
// $this->bar is no longer the same object
// or concrete type when we return
$baz->something();
}
}
The thing that's stopping this crashing is not the "as", it's that you're backing up the old value of $this->bar into a local variable. Nothing about $baz will "remember" that you made some type or pattern check against it just before/after you assigned it.
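That point can be checked with current PHP by writing the C#-style "as" out with instanceof. In this sketch the class Other and the $interruption callback are assumptions added for illustration; the callback stands in for a fiber suspension during which other code replaces $this->bar.

```php
<?php
declare(strict_types=1);

interface BarInterface {}

final class Baz implements BarInterface
{
    public function something(): string
    {
        return 'baz';
    }
}

final class Other implements BarInterface {}

final class Foo
{
    public BarInterface $bar;

    public function __construct()
    {
        $this->bar = new Baz();
    }

    public function doStuff(callable $interruption): string
    {
        // C#-style "as" written out by hand: Baz or null.
        $baz = $this->bar instanceof Baz ? $this->bar : null;

        // Stands in for a fiber suspension during which other code
        // replaces $this->bar.
        $interruption($this);

        // $baz still refers to the original Baz object. The type check
        // played no part in that; the local variable did.
        return $baz?->something() ?? 'no baz';
    }
}

$foo = new Foo();
$result = $foo->doStuff(function (Foo $f): void {
    $f->bar = new Other();
});

var_dump($result);                  // string(3) "baz"
var_dump($foo->bar instanceof Baz); // bool(false)
```

Replacing the first line of doStuff() with a plain "$baz = $this->bar;" gives the same result here, which is the argument being made.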
However, it becomes far more useful for dealing with
scalar casts:
function foo(int $type) {}
foo(123.456 as int); // crashes
foo(null as int); // crashes
Yes, those are actually casts/conversions in PHP, and I've repeatedly said I'd like to have those, on top of the "never fail" casts we have now. Whether that's with an "as" operator, a "cast" function, or some extension of the existing cast syntax, is an open question.
Crucially, though, I don't think most patterns would be relevant in that context, as I pointed out in a previous message.
Regards,
Rowan Tommins
[IMSoP]
On Sun, Jun 23, 2024 at 2:19 PM Rowan Tommins [IMSoP]
imsop.php@rwec.co.uk wrote:
In general, you assign the result of the operation so that the output
is useful. Here's how that might look in PHP with the C# rules:
function foo(BarInterface $bar) {
$baz = $bar as Baz;
$baz?->thing();
$bar->otherThing();
}
The difference is that in C#, there is an actual difference between the methods available on $bar and $baz. The "as" is performing a conversion, so the English word makes sense.
Ah, I see what you mean. Yeah, PHP doesn't understand type casts too
well (as in class/interface-types) but they are there, just clunky to
deal with. Here's some real-ish code with a bunch of things stripped
out:
abstract class Event {}
interface HasInnerEvent { public function getInnerEvent(): Event; }
interface DelayedEvent { public function getDelay(): int; }
class ScheduleTask extends Event implements HasInnerEvent, DelayedEvent { ... }
class Signal extends Event { ... }
Then when receiving an event:
function storeEventLocally(DelayedEvent $event): void;
function handleInnerEvent(HasInnerEvent $event): void;
/** @var Event $event */
foreach ($events as $event) {
if ($event instanceof DelayedEvent) {
storeEventLocally($event);
continue;
}
if ($event instanceof HasInnerEvent) {
handleInnerEvent($event);
}
}
With only "is" it doesn't change much (if at all, structurally), but
with the C# rules, I only need a few changes, and it becomes much more
readable (IMHO):
function storeEventLocally(DelayedEvent|null $event): void;
function handleInnerEvent(HasInnerEvent|null $event): void;
/** @var Event $event */
foreach ($events as $event) {
storeEventLocally($shouldContinue = $event as DelayedEvent);
if ($shouldContinue) continue;
handleInnerEvent($event as HasInnerEvent);
}
In PHP, all we could do is have a one-off runtime check, then return the value unchanged. So written out in full, that would be:
function foo(BarInterface $bar) {
$baz = null;
if ( $bar is Baz ) $baz = $bar;
$baz?->thing();
$bar->otherThing();
}
Or:
function foo(BarInterface $bar) {
$baz = $bar;
if ( ! $bar is Baz ) $baz = null;
$baz?->thing();
$bar->otherThing();
}
I can see some use in a shorthand for that, but it doesn't have much in common with casts, or any meaning of "as" that I can think of, so I don't think that would be the right name for it.
"as" is just fancy casting (usually), but PHP doesn't have any real
casts other than scalars so it makes the feature itself hard to grok,
and tossing in errors makes the entire example I showed completely
pointless because it would fail. In fact, I am not sure what the case
is for an "as" that throws now that I've written it out a few times.
With fibers/async, "as" is actually more important than "is" (at least
as far as crashing goes):
class Foo {
public BarInterface $bar;
public function doStuff() {
$baz = $this->bar as Baz;
// some stuff with $baz
callComplexThing(); // suspends current fiber,
// $this->bar is no longer the same object
// or concrete type when we return
$baz->something();
}
}
The thing that's stopping this crashing is not the "as", it's that you're backing up the old value of $this->bar into a local variable. Nothing about $baz will "remember" that you made some type or pattern check against it just before/after you assigned it.
I was pointing out issues with the inherent patterns they propose, but
you're not wrong. :)
However, it becomes far more useful for dealing with
scalar casts:
function foo(int $type) {}
foo(123.456 as int); // crashes
foo(null as int); // crashes
Yes, those are actually casts/conversions in PHP, and I've repeatedly said I'd like to have those, on top of the "never fail" casts we have now. Whether that's with an "as" operator, a "cast" function, or some extension of the existing cast syntax, is an open question.
Crucially, though, I don't think most patterns would be relevant in that context, as I pointed out in a previous message.
Regards,
Rowan Tommins
[IMSoP]
Ilija and I have been working on and off on an RFC for pattern matching since the early work on Enumerations. A number of people have noticed and said they're looking forward to it.
Hi Larry, I have definitely been looking forward to this. Perhaps more
so than property hooks and avis.
By "very high level," I mean, please, do not sweat specific syntax details right now. That's a distraction. What we're asking right now is "which of these patterns should we spend time sweating specific syntax details on in the coming weeks/months?" There will be ample time for detail bikeshedding later, and we've identified a couple of areas where we know for certain further syntax development will be needed because we both hate the current syntax. :-)
I think that a lot of this would be best broken up. Although much of
it is aimed towards the same general idea, a lot of the pieces have
specific use cases and special syntax additions. Overall I think this
rfc should be simplified to just pattern matching with the "is"
keyword with the patterns limited to what can be declared as property
types (DNF types) and future scoping everything else. Maybe possibly
with the addition of 1 or 2 of the top requested "is" pattern matching
capabilities as secondary votes.
- as
"is" and "as" have different responsibilities. I'm guessing the idea
is to keep them in sync. But I would still like to see this as a
future scope with a separate rfc. I do like the idea, and believe it's
much needed. But I think the pattern matching portion ("is") overshadows
the "as" portion, causing it not to get as much attention as far as
discussion and analysis goes. Especially if the idea is to sync them,
then that makes "as" just as big of an addition as "is".
For example, what if instead, a generally preferred solution/syntax
were variable types, something like:
var Foo&Bar $my_var = $alpha->bravo;
Although casting and declaring are 2 separate things. This seems like
it would accomplish the same thing without the extra keyword, with the
exception of casting being inline. How much would it get used if both
were added? Would one then become an "anti-pattern"?
- Literal patterns
Again a really nice addition. Love it and likely will be loved by
many. Although, it definitely deserves its own separate discussion.
Looking at typescript which has both enums and literal types (although
vastly different in php) caused what was once considered a nice
feature to be black listed by many. Also note how typescript separates
type land and value land. Maybe worth considering.
- Wild card
Has already been marked as not necessary it looks like and replaced by mixed.
- Object matching
Absolutely a separate rfc please. Definitely needs discussion. Could
intersect another potentially preferred solution like type aliases.
Sending one or the other into anti-pattern world.
Maybe a solution similar to this would be preferred:
type MyType = Foo&Bar
$foo = $bar as MyType|string
Or maybe not, either way I think it needs its own spotlight.
- Array sequence patterns
More in depth discussion needed. Not sure how often this comes up as a
structure people want to check against, but it can definitely be done
in user land with array slices. Even though it might be nice for
completion's sake, it may not be worth it if there's not high demand
for it. If at all could be grouped with associative array patterns. -
Associative array patterns
Love to have this one, but it also seems like a small extension or
conflicting with array shapes. Also potentially conflicts with
generics (which may or may not ever be a thing) but still let's give
it the attention it needs. Maybe group with array shapes as well.
- Array shapes
Same as above
- Capturing values out of a pattern and binding them to variables if matched
Ok I think that's stepping a bit far out of scope. Maybe `is` should
simply check and not have any side effects.
- match .. is
Nice shorthand to have but I'd rather not see short hands forced in as
an all or nothing type thing as was done with property hooks. I'd also
argue that maybe short hands should not be added until a feature has
been around for at least one release and is generally accepted. That
way we're not using up syntaxes and limiting the ability to add other
syntax features without breaking backwards compatibility. Keep in mind
that the `is` functionality alone allows this (or at least it should)
and a shorter version may or may not be desired.
match(true) {
$var is Foo => 'foo',
...
}
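For comparison, here is a minimal sketch of the closest thing that runs today, using match(true) with instanceof instead of the proposed `is`. The classes Foo and Bar and the function describe() are placeholders for illustration only, not anything from the RFC.

```php
<?php
// Hypothetical classes, for illustration only.
final class Foo {}
final class Bar {}

function describe(object $var): string
{
    // match(true) selects the first arm whose condition is strictly true.
    return match (true) {
        $var instanceof Foo => 'foo',
        $var instanceof Bar => 'bar',
        default => 'unknown',
    };
}

echo describe(new Foo()), "\n";
echo describe(new Bar()), "\n";
```

With the proposed syntax, each `instanceof` arm would presumably become an `is` pattern instead.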
** This is not an order of preference by any means, just listed as
seen in the rfc. **
I'll stop there, and hope the message is received. In summary I would
be plenty grateful for just being able to check against DNF types
initially with support for more pattern types in the near or distant
future.
I think array shapes and literal types are the only ones I'd hope for
a sooner rather than later follow up rfc targeted hopefully for the
same release. Maybe even as secondary votes. The others I'm ok with
waiting for and would rather see follow up rfcs on them as time
permits.
Ilija and I have been working on and off on an RFC for pattern matching since the early work on Enumerations. A number of people have noticed and said they're looking forward to it.
Hi Larry, I have definitely been looking forward to this. Perhaps more
so than property hooks and avis.
By "very high level," I mean, please, do not sweat specific syntax details right now. That's a distraction. What we're asking right now is "which of these patterns should we spend time sweating specific syntax details on in the coming weeks/months?" There will be ample time for detail bikeshedding later, and we've identified a couple of areas where we know for certain further syntax development will be needed because we both hate the current syntax. :-)
I think that a lot of this would be best broken up. Although much of
it is aimed towards the same general idea, a lot of the pieces have
specific use cases and special syntax additions. Overall I think this
rfc should be simplified to just pattern matching with the `is`
keyword with the patterns limited to what can be declared as property
types (DNF types) and future scoping everything else. Maybe possibly
with the addition of 1 or 2 of the top requested `is` pattern matching
capabilities as secondary votes.
To give more context, as noted, this is a stepping stone toward ADTs. Anything that is on the "hot path" for ADT support I would consider mandatory, so trying to split it up will just take more time and effort. That includes the object pattern and match support, and the object pattern realistically necessitates literals. Variable binding would also be almost mandatory for ADTs. I'm very reluctant to push off anything in that hot path, as every RFC has additional overhead, and I'm all volunteer time. :-)
- as
`is` and `as` have different responsibilities. I'm guessing the idea
is to keep them in sync. But I would still like to see this as a
future scope with a separate rfc. I do like the idea, and believe it's
much needed. But I think the pattern matching portion (`is`)
overshadows the `as` portion, causing it not to get as much attention
as far as discussion and analysis goes. Especially if the idea is to
sync them, then that makes `as` just as big of an addition as `is`.
For example, what if instead, a generally preferred solution/syntax
were variable types, something like:
var Foo&Bar $my_var = $alpha->bravo;
Although casting and declaring are 2 separate things. This seems like
it would accomplish the same thing without the extra keyword, with the
exception of casting being inline. How much would it get used if both
were added? Would one then become an "anti-pattern"?
As proposed, `as` is basically:
$foo as Bar|Baz
// Becomes
if (!($foo is Bar|Baz)) {
    throw new \Exception();
}
So it would be pretty easy to do, I believe. Whether that's what we want `as`
to be, that's a fair question.
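To make that desugaring concrete, here is a rough userland approximation that runs on current PHP. It is only a sketch: Bar and Baz are placeholder classes, asBarOrBaz() is a made-up helper name, and the exception type the real `as` would throw is not settled in this thread, so \Exception is an assumption.

```php
<?php
// Hypothetical classes standing in for Bar and Baz.
class Bar {}
class Baz {}

/**
 * Userland approximation of the proposed `$foo as Bar|Baz`:
 * return the value unchanged if it matches, otherwise throw.
 */
function asBarOrBaz(mixed $foo): Bar|Baz
{
    if (!($foo instanceof Bar || $foo instanceof Baz)) {
        // Assumption: the exception class is illustrative only.
        throw new \Exception('Value is neither Bar nor Baz');
    }
    return $foo;
}

$ok = asBarOrBaz(new Bar());

$threw = false;
try {
    asBarOrBaz('not an object');
} catch (\Exception $e) {
    $threw = true;
}
```

The point of the inline form is that the check and the use of the value happen in one expression, which a separate variable declaration would not give you.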
- Literal patterns
Again a really nice addition. Love it and likely will be loved by
many. Although, it definitely deserves its own separate discussion.
Looking at typescript which has both enums and literal types (although
vastly different in php) caused what was once considered a nice
feature to be black listed by many. Also note how typescript separates
type land and value land. Maybe worth considering.
As noted above, I don't think it's feasible to postpone this one. It's also pretty simple, and wouldn't have any conflict with enums unless we went all in on the guards options, which we most likely will not in the initial version.
- Wild card
Has already been marked as not necessary it looks like and replaced by mixed.
Some people still want it, even though it's redundant, so it may end up as a secondary vote. I don't much care myself either way.
- Object matching
Absolutely a separate rfc please. Definitely needs discussion. Could
intersect another potentially preferred solution like type aliases.
Sending one or the other into anti-pattern world.
Maybe a solution similar to this would be preferred:
type MyType = Foo&Bar
$foo = $bar as MyType|string
As noted, this is on the ADT hot path so postponing it is problematic. Especially holding it on type aliases, which have been discussed for longer than this RFC has been around (nearly 4 years) and yet no actual proposal has ever been put forward. It's unwise to wait for such a feature, especially when most likely implementations would dovetail well with patterns anyway.
- Array sequence patterns
More in depth discussion needed. Not sure how often this comes up as a
structure people want to check against, but it can definitely be done
in user land with array slices. Even though it might be nice for
completion's sake, it may not be worth it if there's not high demand
for it. If at all could be grouped with associative array patterns.
- Associative array patterns
Love to have this one, but it also seems like a small extension or
conflicting with array shapes. Also potentially conflicts with
generics (which may or may not ever be a thing) but still let's give
it the attention it needs. Maybe group with array shapes as well.
- Array shapes
Same as above
To clarify here, these all come as a set. Array shapes aren't their own "thing", they just fall out naturally from array patterns. So it's not possible for associative patterns to conflict with array shapes, as they are literally the same thing. :-) I'd have to check with Ilija but I don't believe there's much internal difference between list and associative patterns. This one isn't on the ADT hot path, so it could be postponed
I see no way for associative array patterns/shapes to conflict with generics at all.
- Capturing values out of a pattern and binding them to variables if matched
Ok I think that's stepping a bit far out of scope. Maybe `is` should
simply check and not have any side effects.
As above, this is core functionality of pattern matching for ADTs, as well as a core feature of every other language that has pattern matching, I believe. It's not out of scope, it's core scope.
- match .. is
Nice shorthand to have but i'd rather not see short hands forced in as
an all or nothing type thing as was done with property hooks. I'd also
argue that maybe short hands should not be added until a feature has
been around for at least one release and is generally accepted. That
way we're not using up syntaxes and limiting the ability to add other
syntax features without breaking backwards compatibility. Keep in mind
that the `is` functionality alone allows this (or at least it should)
and a shorter version may or may not be desired.
match(true) {
$var is Foo => 'foo',
...
}
This is also core scope of pattern matching in most languages. It's not just a shorthand, it's a direct enhancement.
--Larry Garfield
To give more context, as noted, this is a stepping stone toward ADTs. Anything that is on the "hot path" for ADT support I would consider mandatory, so trying to split it up will just take more time and effort. That includes the object pattern and match support, and the object pattern realistically necessitates literals. Variable binding would also be almost mandatory for ADTs. I'm very reluctant to push off anything in that hot path, as every RFC has additional overhead, and I'm all volunteer time. :-)
As a user of PHP, this statement concerns me. I don't want any
features rushed just because someone wants it "now" and thus end up
with something feeling half-baked, missing important features or not
completely thought through. I think it's pretty fair that this is a
pretty big RFC (as in scope). Were it a PR under review, I might
ask you to break it up into smaller PRs because it's too big to
discuss properly, and were it smaller PRs we would probably arrive at
a better solution than the giant PR.
Perhaps, it might be worth breaking up into sub-RFCs (is that a
thing?) where each pattern/feature can be discussed independently and
the vote is whether or not it gets implemented in the parent RFC but
has no binding on the parent RFC. Some things may be easy to discuss
(such as type matching) but others might be longer (such as `as`, or
arrays). But (and this is the key part) it wouldn't stop you from
proceeding on the parent RFC, which is implementing pattern matching
(in general) and any passed sub-RFCs.
Maybe that could be a path forward? I don't know who makes the rules,
but they're human rules and they can be anything -- in theory. Maybe
"sub-RFCs" needs an RFC to define it, but that might let you a) move
faster than trying to do everything all at once, and b) let
hard-to-define parts of the overall feature get the time they deserve.
Robert Landers
Software Engineer
Utrecht NL
Breaking up an RFC into sub-RFCs which are each interdependent for the greater scope to be coherent doesn't make any sense.
An RFC being "large" is not an issue.
Moreover, this RFC is already a "sub-RFC" of the much larger in scope meta ADT RFC.
The scope of this RFC seems pretty moderate to me and well appropriate, and part of this discussion is to establish what may be moved to later.
It is always possible to break down an RFC into new tiny RFCs about every single possible design decision, but that doesn't mean this should be done.
Writing RFCs, discussing them, and responding to feedback is extremely exhausting and time-consuming; as such, the authors of an RFC are entitled to decide what is in scope and what is not.
They also decide what can be moved out, and what cannot.
Part of being an RFC author is to also stand your ground on the scope and design of your RFC while taking into account the feedback.
Best regards,
Gina P. Banyard
Hey Gina,
Sorry, I wasn't exactly clear what I meant on scope. I wasn't
necessarily meaning the feature/RFC, but rather the scope of the
conversation. I count at least 12 new types of syntax here (possibly
more that I missed), and I would be surprised if some of them were to
pass in isolation; but as you said, people want the feature so they'll
pass the RFC and we'll get weird symbols that are near meaningless all
over our code. That's my concern here, not so much the feature. I want
the feature too, but there are some weird things in here that could
use longer discussions but shouldn't hold the overall feature back.
Robert Landers
Software Engineer
Utrecht NL
As stated in the original post, the intent of this thread is precisely that: To ask which patterns we should be discussing/working on now and which should be punted for later. Some of the patterns absolutely warrant further discussion that focuses on them specifically, but the intent for now is NOT to have that discussion, but just to discuss which ones to discuss and which are not worth dealing with for the initial RFC so we don't need to debate the details.
--Larry Garfield
- Object matching
Absolutely a separate rfc please. Definitely needs discussion. Could
intersect another potentially preferred solution like type aliases.
Sending one or the other into anti-pattern world.
Maybe a solution similar to this would be preferred:
type MyType = Foo&Bar
$foo = $bar as MyType|string
As noted, this is on the ADT hot path so postponing it is problematic. Especially holding it on type aliases, which have been discussed for longer than this RFC has been around (nearly 4 years) and yet no actual proposal has ever been put forward. It's unwise to wait for such a feature, especially when most likely implementations would dovetail well with patterns anyway.
I agree with Larry here, this is core to pattern matching in general and necessary for ADTs to make any sense (and I want them yesterday).
Moreover, I cannot see how this will conflict with anything vaguely close to type aliases.
Best regards,
Gina P. Banyard
To give more context, as noted, this is a stepping stone toward ADTs. Anything that is on the "hot path" for ADT support I would consider mandatory, so trying to split it up will just take more time and effort. That includes the object pattern and match support, and the object pattern realistically necessitates literals. Variable binding would also be almost mandatory for ADTs. I'm very reluctant to push off anything in that hot path, as every RFC has additional overhead, and I'm all volunteer time. :-)
I get that, and thank you for any free time you both spend here to
help php move forward. I don't mean to imply that some things don't
belong here. However, it does seem like some pieces will need more
fleshing out than others. It would be nice if those that require less
fleshing could proceed without being delayed (or completely rejected)
by those that need more. But it sounds like this is meant to be an all
or nothing type thing.
As noted above, I don't think it's feasible to postpone this one. It's also pretty simple, and wouldn't have any conflict with enums unless we went all in on the guards options, which we most likely will not in the initial version.
That's good to hear. This is one of my top requested items in this rfc
and would love to see it sooner than later.
As noted, this is on the ADT hot path so postponing it is problematic. Especially holding it on type aliases, which have been discussed for longer than this RFC has been around (nearly 4 years) and yet no actual proposal has ever been put forward. It's unwise to wait for such a feature, especially when most likely implementations would dovetail well with patterns anyway.
I should clarify I'm not suggesting to postpone based on this
hypothetical feature. Maybe it wasn't a good one, but I'm just
throwing out some potential curve balls that could be brought up
during discussion. This is one of the items that seems to need a bit
more fleshing out than others, and might potentially lead to further
delay or an outright rejection and missing out on some good pieces of
this rfc.
To clarify here, these all come as a set. Array shapes aren't their own "thing", they just fall out naturally from array patterns. So it's not possible for associative patterns to conflict with array shapes, as they are literally the same thing. :-) I'd have to check with Ilija but I don't believe there's much internal difference between list and associative patterns. This one isn't on the ADT hot path, so it could be postponed
I see no way for associative array patterns/shapes to conflict with generics at all.
Thanks for the clarification. I was looking at the example given in
the overview trying to figure out the boundaries between array
structure patterns, array shape patterns, and nested patterns. It
sounds like it can be summarized to nested patterns with the option to
add or omit `...`, determining if unexpected key/values are allowed.
Also again, just throwing out hypothetical curve balls that could
delay or halt other good parts. At this point I don't actually expect
generics to ever be a thing.
- Capturing values out of a pattern and binding them to variables if matched
Ok I think that's stepping a bit far out of scope. Maybe `is` should
simply check and not have any side effects.
As above, this is core functionality of pattern matching for ADTs, as well as a core feature of every other language that has pattern matching, I believe. It's not out of scope, it's core scope.
Ok, it just seems like a rift between expectations and reality. When
looking at things like this, if it is not actually something that's
new to me, I like to try to put myself in the shoes of someone who it
would be new to. Why? Because there are things that you can assume a
syntax does, and there are things you would only know a syntax does by
reading the docs (if it's not buried too deeply and you can find it).
I would never have assumed that there was a pattern to be appended to
`$foo is Bar` that would automatically assign one of its properties to
another variable if there was a match. But I guess that comes from
ignorance as I haven't had the luxury of familiarizing myself with a
language that does this. So if it is a common thing then I will chalk
it up to a bad assumption of mine.
- match .. is
Nice shorthand to have but i'd rather not see short hands forced in as
an all or nothing type thing as was done with property hooks. I'd also
argue that maybe short hands should not be added until a feature has
been around for at least one release and is generally accepted. That
way we're not using up syntaxes and limiting the ability to add other
syntax features without breaking backwards compatibility. Keep in mind
that the `is` functionality alone allows this (or at least it should)
and a shorter version may or may not be desired.
match(true) {
$var is Foo => 'foo',
...
}
This is also core scope of pattern matching in most languages. It's not just a shorthand, it's a direct enhancement.
I didn't mean to downplay the usefulness of this. I do think it would
be incredibly useful, and would most definitely use it. To me a short
hand is always a direct enhancement. So when I say something is a
shorthand I'm not downplaying its significance at all. Although it is
convenient and would allow us to skip ahead in the timeline, I just
think shorthands should come after a feature, not at the same time.
- Capturing values out of a pattern and binding them to variables if matched
Ok I think that's stepping a bit far out of scope. Maybe `is` should
simply check and not have any side effects.
As above, this is core functionality of pattern matching for ADTs, as well as a core feature of every other language that has pattern matching, I believe. It's not out of scope, it's core scope.
Ok, it just seems like a rift between expectations and reality. When
looking at things like this, if it is not actually something that's
new to me, I like to try to put myself in the shoes of someone who it
would be new to. Why? Because there are things that you can assume a
syntax does, and there are things you would only know a syntax does by
reading the docs (if it's not buried too deeply and you can find it).
I would never have assumed that there was a pattern to be appended to
`$foo is Bar` that would automatically assign one of its properties to
another variable if there was a match. But I guess that comes from
ignorance as I haven't had the luxury of familiarizing myself with a
language that does this. So if it is a common thing then I will chalk
it up to a bad assumption of mine.
Hi Brandon,
Something doesn't sit right with me when an argument is made that
something should be in this RFC because it's needed for another,
future RFC (in which case, it seems like it should be part of that
RFC), but in this particular case, one doesn't need to ever care about
ADTs to justify binding being a part of a pattern matching RFC.
Without familiarity with the concept, I think it's very easy to view
pattern matching as syntactic sugar for type or shape assertions: in
other words, it simply provides another way to ensure type safety.
Believe me, I've made the same mistake! Naming things is hard, and
"pattern matching" as a name just intuitively feels like it should
just be about the checking.
But pattern matching's value in type checking is incidental, and I
think a core thing to note from the preamble of the RFC is the
following:
In a sense it serves a similar purpose for complex data structures as regular expressions do for strings.
Pattern matching's core purpose is to extract data from a matched data
structure, and just as regular expressions have limited value without
matching groups, pattern matching is similar without binding. I think
it would be difficult to think of an implementation of regular
expressions without matching groups, and similarly it would be
difficult to think of a useful implementation of pattern matching
without the ability to extract the data from the matched data
structure. Because of this, every language that implements the concept
also includes binding.
Hope this helps,
Mark Trapp
To clarify here, these all come as a set. Array shapes aren't their own "thing", they just fall out naturally from array patterns. So it's not possible for associative patterns to conflict with array shapes, as they are literally the same thing. :-) I'd have to check with Ilija but I don't believe there's much internal difference between list and associative patterns. This one isn't on the ADT hot path, so it could be postponed
I see no way for associative array patterns/shapes to conflict with generics at all.
Thanks for the clarification. I was looking at the example given in
the overview trying to figure out the boundaries between array
structure patterns, array shape patterns, and nested patterns. It
sounds like it can be summarized to nested patterns with the option to
add or omit `...`, determining if unexpected key/values are allowed.
Also again, just throwing out hypothetical curve balls that could
delay or halt other good parts. At this point I don't actually expect
generics to ever be a thing.
To clarify a bit more: There are really only two slight variations on the same pattern for arrays, list and associative. In both cases, each specified element of the array is matched against its own sub-pattern, recursively. That sub-pattern may be (almost) any other pattern.
So this:
$arr is ['a' => 'A', 'b' => 'B', 'c' => [int, int, int], 'd' => string]
Is entirely valid, and decomposes to:
$arr['a'] is 'A'
&& $arr['b'] is 'B'
&& $arr['c'][0] is int
&& $arr['c'][1] is int
&& $arr['c'][2] is int
&& $arr['d'] is string
(This is why literal patterns are basically non-negotiable. On their own, they're kinda pointless. As a sub-pattern of another pattern, they're mandatory.) Array shapes aren't really a feature per se; they're just a natural fall-out of the recursive syntax.
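For anyone who wants to see that decomposition in code that runs today, here is a hand-written sketch using only existing PHP functions. The function name matchesShape() is made up for illustration, and this version deliberately does not reject extra keys, since how the proposed syntax treats unexpected keys (the `...` marker) is an RFC detail this sketch does not try to model.

```php
<?php
/**
 * Hand-written equivalent of the proposed pattern
 *   $arr is ['a' => 'A', 'b' => 'B', 'c' => [int, int, int], 'd' => string]
 * Each element is checked against its own sub-condition, recursively.
 */
function matchesShape(mixed $arr): bool
{
    return is_array($arr)
        && ($arr['a'] ?? null) === 'A'      // literal sub-pattern
        && ($arr['b'] ?? null) === 'B'      // literal sub-pattern
        && is_array($arr['c'] ?? null)      // nested list sub-pattern
        && is_int($arr['c'][0] ?? null)
        && is_int($arr['c'][1] ?? null)
        && is_int($arr['c'][2] ?? null)
        && is_string($arr['d'] ?? null);    // type sub-pattern
}

$good = ['a' => 'A', 'b' => 'B', 'c' => [1, 2, 3], 'd' => 'text'];
$bad  = ['a' => 'A', 'b' => 'B', 'c' => [1, 2, '3'], 'd' => 'text'];

var_dump(matchesShape($good));
var_dump(matchesShape($bad));
```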
--Larry Garfield
Hi Larry,
It is already a really nice RFC, even if not finished yet. Also haven't
fully read it yet.
Thank you for all your work and time put into it!
I do have some questions:
- For the generics-like pattern I do agree with the others that this
might be dangerous for the future if we (hopefully) are going at it.
- Capturing values out of a pattern and binding them to variables if
matched.
While this is very helpful, especially with `match`, from the syntax I
would read it as a condition only.
$p is Point {x: 3, y: $y}; // read as $p->y === $y but it's $y = $p->y
But this is described differently
$p is Point {y: 37, x:@($x)};
I think it would be more readable on switching the logic (somehow). like:
$p is Point {x: 3, y: $y}; // $p->y === $y
$p is Point {x: 3, y:=> $y}; // $y = $p->y
- Regex pattern
This one is interesting as well ... but I would expect native regex
syntax first before introducing it as part of a different RFC. Similar
to generics.
Following up I would expect something like this:
$re = /.*/; // RegEx object
$matches = $re->match($v); // preg_match
$v is $re; // used in pattern matching
which opens up another question: Could we have an interface allowing
objects to match in a specific way?
interface Matchable {
public function match(mixed $value): bool;
}
Thanks for working on it!
Marc
Thank you all for your participation.
It is already a really nice RFC, even if not finished yet. I also haven't
fully read it yet.
Thank you for all your work and time put into it!
I do have some questions:
- For the generics-like pattern I do agree with the others that this
might be dangerous for the future if we (hopefully) are going at it.
I'm unsure. As noted in the introduction, a pattern may look like some other construct but is not that construct. So $a is [1, 2, 3]
is not actually creating an array, for example. That means using array<int> should not, at the engine level, cause any conflict with future generics implementations, should they ever materialize. It's really just a shorthand for
foreach ($arr as $v) if (!is_int($v)) throw new \Exception();
Now, whether or not it would be confusing for the user is a different question. Array-application is not part of the critical path, so if the consensus is to hold off on that for now, we can. (Answering that question is what this thread is for.)
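As a rough illustration of that shorthand in today's PHP, here is a minimal sketch. The helper name isArrayOfInt() is hypothetical and not part of the RFC, and it returns false on a mismatch instead of throwing, on the assumption that `is` evaluates to a boolean:

```php
<?php
// Hypothetical userland approximation of `$arr is array<int>`.
// Nothing here constructs or depends on a generic type.
function isArrayOfInt(mixed $arr): bool
{
    if (!is_array($arr)) {
        return false;
    }
    foreach ($arr as $v) {
        if (!is_int($v)) {
            return false; // first non-int element fails the match
        }
    }
    return true; // every element is an int (an empty array also matches)
}

var_dump(isArrayOfInt([1, 2, 3]));   // bool(true)
var_dump(isArrayOfInt([1, '2', 3])); // bool(false)
```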
- Capturing values out of a pattern and binding them to variables if matched.
Where this is very helpful especially with match, from the syntax I
would read it as a condition only.
$p is Point {x: 3, y: $y}; // read as $p->y === $y but it's $y = $p->y
But this is described differently
$p is Point {y: 37, x:@($x)};
I think it would be more readable on switching the logic (somehow). like:
$p is Point {x: 3, y: $y}; // $p->y === $y
$p is Point {x: 3, y:=> $y}; // $y = $p->y
There was a bug in that example that I fixed this morning. It's now:
$p is Point {x: 3, y: $y}; // If $p->x === 3, bind $p->y to $y and return true.
Please ignore the old buggy version. :-)
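To make the corrected semantics concrete, here is a sketch of what that line would roughly correspond to in current PHP. The Point class and the helper matchPointX3() are hypothetical stand-ins, and leaving $y untouched on a failed match is an assumption, not something the RFC text quoted here confirms:

```php
<?php
// Hypothetical value object used only for this illustration.
final class Point
{
    public function __construct(public int $x, public int $y) {}
}

// Hypothetical emulation of `$p is Point {x: 3, y: $y}`:
// check the type and the literal first, bind $y only if both succeed.
function matchPointX3(mixed $p, ?int &$y): bool
{
    if ($p instanceof Point && $p->x === 3) {
        $y = $p->y; // binding happens only on a successful match
        return true;
    }
    return false;   // $y is left untouched (assumption)
}

$y = null;
$matched = matchPointX3(new Point(3, 7), $y);
// $matched is true and $y now holds 7
```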
- Regex pattern
This one is interesting as well ... but I would expect native regex
syntax first before introducing it as part of a different RFC. Similar
as generics.
Named capture groups are already part of regex syntax, just not often used. The example is not introducing anything new there. (Although Ilija tells me it may be hard to implement, so it may get postponed anyway. TBD.)
Following up I would expect something like this:
$re = /.*/; // RegEx object
$matches = $re->match($v); // preg_match
$v is $re; // used in pattern matching
which opens up another question: Could we have an interface allowing
objects to match in a specific way?
interface Matchable {
public function match(mixed $value): bool;
}
Oh my. I'm not sure how feasible that would be, or what the implications would be. Definitely future-scope at best. :-)
--Larry Garfield
- Regex pattern
This one is interesting as well ... but I would expect native regex
syntax first before introducing it as part of a different RFC. Similar
as generics.
Named capture groups are already part of regex syntax, just not often used. The example is not introducing anything new there. (Although Ilija tells me it may be hard to implement, so it may get postponed anyway. TBD.)
I think what Marc means is that it's weird to see the regex syntax
outside of a quoted string, since we don't currently have that syntax
defined in PHP.
It's not even clear what the constraints of that syntax would be; for
instance, this is valid PHP:
$result = preg_match(subject: $string, pattern: "#
(int|string)
/(./)
#xi");
But I'm guessing it would be a challenge to parse this:
$result = $string is #
(int|string)
/(./)
#xi;
The simple answer is probably to use a more limited syntax, but defining
that syntax probably deserves its own discussion.
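For comparison, the string-based form that already works today can be sketched like this. It only uses preg_match() with a named capture group; the URL and variable names are illustrative:

```php
<?php
$url = 'https://wiki.php.net/rfc/pattern-matching';

// The same regular expression written with two different delimiters.
// With "/" as delimiter every literal slash has to be escaped.
$withSlashes = '/^https:\/\/(?<hostname>[^\/]*)/';
// With "|" as delimiter the slashes can stay unescaped.
$withPipes   = '|^https://(?<hostname>[^/]*)|';

preg_match($withSlashes, $url, $a);
preg_match($withPipes, $url, $b);

// Both calls expose the named group under the 'hostname' key.
var_dump($a['hostname'] === $b['hostname']); // bool(true)
```

The free choice of delimiter is exactly what a bare, unquoted regex literal would have to give up or restrict.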
--
Rowan Tommins
[IMSoP]
Ilija and I have been working on and off on an RFC for pattern matching since the early work on Enumerations. A number of people have noticed and said they're looking forward to it.
Hello Larry.
Thanks for this Juicy proposal.
I agree that this proposal should be proposed as a whole. If we break
it down into smaller parts, there is a chance some features will not
pass. I will show you that the range pattern and regex pattern have
greater value than what you think, and they must be placed together
with the literal pattern (in Core patterns section, not in Possible
patterns section). I don't want these patterns to fail in the poll.
I don't know the line between "high level" and beyond. I hope that
whatever I discuss here is still "high level".
- Is there any chance of is not or isnot? I am tired of looking at
this ugly code: !($foo instanceof Bar). If there is,
there should be a restricted version of pattern: no variable binding.
It doesn't make sense if the binding happens but the pattern fails.
- ..= and ..< make people have to remember what the left
side is: is it greater-than or greater-than-or-equal? What if those
operators are replaced with =..= and <..=? We don't have
to align with other languages if they have weaknesses.
- There are "between" and > operators explicitly shown in the
proposal (in the Range pattern section). I assume that >=, <, and
<= are implicitly included. What about the % operator, is it
also included? % does not directly produce a boolean, but it can
simplify patterns. Note that we must include an additional rule: if the
produced number is 0 (zero) then it will be converted to
true, otherwise false.
$foo = 2024; // leap year
$foo is %4; // true
$bar = 2025; // not a leap year
$bar is %4; // false
// furthermore:
$baz is array<%4>; // true if all its members are leap year
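A sketch of what the suggested %4 pattern would amount to in plain PHP, assuming it desugars to a remainder check against zero. Both function names are hypothetical. Note that divisibility by 4 only approximates the leap-year rule: under the Gregorian calendar 2100 is divisible by 4 but is not a leap year.

```php
<?php
// Hypothetical desugaring of `$value is %4` (with $divisor = 4).
function isDivisibleBy(int $value, int $divisor): bool
{
    return $value % $divisor === 0;
}

// Full Gregorian leap-year rule, shown for comparison.
function isLeapYear(int $year): bool
{
    return $year % 4 === 0 && ($year % 100 !== 0 || $year % 400 === 0);
}

var_dump(isDivisibleBy(2024, 4)); // bool(true)
var_dump(isDivisibleBy(2025, 4)); // bool(false)
var_dump(isDivisibleBy(2100, 4)); // bool(true)
var_dump(isLeapYear(2100));       // bool(false)
```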
- There are numeric range patterns explicitly shown in the proposal.
Hopefully there are also string range patterns.
$birthdate is "2000-01-01 00:00:00" <..< "2019-12-31 23:59:59";
$dateOfDeath is <"1980-01-01";
$name is "larry " =..< "larry!"; // true if first name is "larry"
I have no idea about string range patterns for array, since it causes
error. If there is currently no solution, hopefully we don't abandon
string range patterns entirely. The same case also occurs with numeric
range patterns for array.
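As a sketch of how such a string range could behave, the helper below (hypothetical name, not from the RFC) implements the half-open =..< form with strcmp(). Using strcmp() rather than the < operator is a deliberate assumption: the comparison operators treat two numeric strings as numbers, while strcmp() always compares byte by byte.

```php
<?php
// Hypothetical emulation of `$value is $low =..< $high` for strings:
// $low is inclusive, $high is exclusive, comparison is byte-wise.
function inHalfOpenStringRange(string $value, string $low, string $high): bool
{
    return strcmp($value, $low) >= 0 && strcmp($value, $high) < 0;
}

// "larry " (trailing space) up to, but excluding, "larry!" covers every
// string that starts with "larry ", because "!" directly follows " " in ASCII.
var_dump(inHalfOpenStringRange('larry smith', 'larry ', 'larry!')); // bool(true)
var_dump(inHalfOpenStringRange('larryx', 'larry ', 'larry!'));      // bool(false)
```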
- It is great that regular expression native syntax is now a first
class citizen in php (at least for pattern matching). I found there
are 2 drawbacks in this case: loss of flexibility and repeated
identical patterns. These things are not found in string based regular
expression.
// native syntax
$foo is /^https:\/\/(?<hostname>[^\/]*)/;
// string based: we can use any valid char as delimiter,
// as long as it is not used in the pattern.
$pattern = "/^https:\/\/(?<hostname>[^\/]*)/";
$pattern = "|^https://(?<hostname>[^/]*)/|"; // valid!
$pattern = "@^https://(?<hostname>[^/]*)/@"; // valid!
// ----------------------
// native syntax
$foo is /^https:\/\/(?<hostname>[^\/]*)/ | StringBox{value:
/^https:\/\/(?<hostname>[^\/]*)/};
// string based: RE stored in variable/constant can be used as many as we need
$pattern = "|^https://(?<hostname>[^/]*)/|";
$foo is @RE($pattern) | StringBox{value: @RE($pattern)};
// 1st pattern is scalar string, 2nd is string encapsulated in a class
// furthermore
use GenericPattern as GP;
class Person {
public string $firstName is @RE(GP::NAME_PTRN);
public string $lastName is @RE(GP::NAME_PTRN);
// ...
}
Hopefully, string based regular expression is also supported.
Honestly, if I could choose whether I should support native or
string-based syntax, I would choose to support string-based, as long
as native syntax is not yet fully supported in general.
@RE() is just an illustration of how to use a variable as a
regular expression in the pattern. It is stated that @() will be
used for arbitrary expressions. Correct me if I'm wrong, regexps are one
of the things in programming that cannot be manipulated or participate
in manipulation with other parties, even with fellow regexps. Regexps
must be used alone. That is why we need dedicated syntax for regexps.
- It is shown that type pattern, literal pattern, class constant
pattern, and expression pattern can be used to form a compound
pattern. Hopefully, the range pattern and regex pattern have the same
luxury. Furthermore, hopefully they can be mixed.
$foo is 2000 =..= 2100 & %4; // leap year in 21st century
class person {
public string $firstName is /LENGTH_PTRN/
& /FORBIDDEN_CHARS_PTRN/
& /FORBIDDEN_WORDS_PTRN/;
public string $lastName is @RE(GP::LENGTH_PTRN)
& @RE(GP::FORBIDDEN_CHARS_PTRN)
& @RE(GP::FORBIDDEN_WORDS_PTRN);
// these are hard to maintain
public string $firstName is @RE(GP::COMPLEX_NAME_PTRN);
public string $lastName is @RE(GP::COMPLEX_NAME_PTRN);
}
7. I noticed that ```as``` is tightly coupled with exception. Can we
suppress this exception with ```??```?
// it is weird to see this statement (as shown in the proposal)
$value = $foo as Foo {$username, $password};
// these statements make more sense
$foo as Foo {$username, $password};
$foo is Foo {$username, $password} ?: throw new Exception();
// if it can be suppressed, "as" is more valuable than what people think.
$newRect = $rect as Rectangle{width: <=10, height: <=5} ?? new
Rectangle(width: 10, height: 5);
8. Is there any type checking for object property pattern?
class Circle {
public int $radius;
}
// this statement will always fail unless there is type checking
$circle is Circle{radius: "10"};
~~~~~
4 of the 8 points I discussed above are related to range pattern and
regex pattern. Both are used daily. From my point of view, literal
pattern is no more special than range pattern and regex pattern. IMO,
both should be placed in Core patterns section, not in Possible
patterns section.
Regards, Hendra Gunawan.
On Mon, Jun 24, 2024 at 7:04 AM Hendra Gunawan
the.liquid.metal@gmail.com wrote:
I agree that this proposal should be proposed as a whole. If we break
it down into smaller parts, there is a chance some features will not
pass.
This is exactly my concern and I fear we will be dealing with the
repercussions of a non-well-thought-out sub-feature for years to come.
Why does new Pattern() have the ability to accept patterns as a
parameter? Can other code do that as well? Why is @($thing) a thing? @
is already an operator so seeing it used like this is strange. Same
with ~. It's also weird to see ? on the left hand side of a
literal.
PHP already has variable-variables, so use those instead of @(): $foo
is $$mypattern.
There's no need to use ? to check for existence on a key, so this:
$arr is ['a' => string, ?'b' => string, ...];
should be this:
$arr is ['a' => string, 'b' => ?string, ...];
because $arr['non-existent-key'] is NULL.
~ has no meaning in mode 0, only in strict mode, but will force people
in mode 0 to work like they are in strict mode or litter their code
with ~'s. PHP is duck-typed and making it act like it isn't is just
bad DX. Further, it isn't clear how it will affect literals since it
is also bitwise-not.
There's no need to use ? to check for existence on a key, so this:
$arr is ['a' => string, ?'b' => string, ...];
should be this:
$arr is ['a' => string, 'b' => ?string, ...];
because $arr['non-existent-key'] is NULL.
The first means b is an optional key, but if it’s there, can only be a string. The second says b is a required key, but it may be a string or null. If there were a binding involved, that determines the type of the binding in incompatible ways. I’m fine with requiring bindings to be nullable for optional keys, but it strikes me as strictly less flexible and not consistent with the rest of PHP’s behavior, at least not under E_ALL.
(Sorry for any dups. juggling sender addresses to make the listserv happy, fixed an incomplete thought while I was at it)
Cheers,
chuck
There's no need to use ? to check for existence on a key, so this:
$arr is ['a' => string, ?'b' => string, ...];
should be this:
$arr is ['a' => string, 'b' => ?string, ...];
because $arr['non-existent-key'] is NULL.
The first means b is an optional key, but if it’s there, can only be a string. The second says b is a required key, but it may be a string or null. If there were a binding involved, that determines the type of the binding in incompatible ways. I’m fine with requiring bindings to be nullable for optional keys, but it strikes me as strictly less flexible and not consistent with the rest of PHP’s behavior, at least not under E_ALL.
Hi Chuck,
To be honest, this is one of the smaller concerns I have with the new
syntax. There might be some misunderstanding here, though. A
non-existent key is NULL, always has been, and always will be. No
current RFCs are stating otherwise, to my knowledge. While a warning
is emitted when accessing a non-existent key, and some people trigger
exceptions or fatal errors on warnings, this isn’t universal, and it
isn’t always an error (after all, the value is NULL, which might be a
properly handled case). If you see the warning, it’s usually a hint
that something might be wrong, but often, a quick ?? is all you need
to make the warning go away, further reinforcing that the value of a
non-existent key is NULL.
$arr = ['a' => 'a string'];
$arr is ['a' => string, ?'b' => $value, ...];
This syntax implies that a non-existent key is a special case, and if
it passes as-is, it will be. If there is a binding and the key is
missing, what happens to that binding? Is it left in an undefined
state or is it assigned NULL? If it is assigned NULL, what is the
value of checking for non-existence; why not just check for NULL in
the value? If it is left "undefined", then you still have to handle it
as though it is a NULL value (perhaps more so because simply trying to
detect its value may cause the very warning you are trying to avoid --
and will fatally error in PHP 9: https://3v4l.org/Rbadk. See
https://wiki.php.net/rfc/undefined_variable_error_promotion).
Thus, this "key existence check" is inconsistent with the language and
possibly redundant, as you'd have to check that resulting value was
null anyway:
$arr is ['a' => string, 'b' => ?string, ...];
Hope that clarifies things.
$arr = ['a' => 'a string'];
$arr is ['a' => string, ?'b' => $value, ...];
This syntax implies that a non-existent key is a special case, and if
it passes as-is, it will be. If there is a binding and the key is
missing, what happens to that binding? Is it left in an undefined
state or is it assigned NULL? If it is assigned NULL, what is the
value of checking for non-existence; why not just check for NULL in
the value? If it is left "undefined", then you still have to handle it
as though it is a NULL value (perhaps more so because simply trying to
detect its value may cause the very warning you are trying to avoid --
and will fatally error in PHP 9: https://3v4l.org/Rbadk. See
https://wiki.php.net/rfc/undefined_variable_error_promotion).
Thus, this "key existence check" is inconsistent with the language and
possibly redundant, as you'd have to check that resulting value was
null anyway:
$arr is ['a' => string, 'b' => ?string, ...];
Hope that clarifies things.
I see what you mean: leaving $value undefined would be a high-caliber footgun for sure! I suppose binding to optional keys could just be verboten, but that kind of restriction would make for a lousy DX. Distinguishing between “optional and not nullable” and “not optional but nullable” still seems reasonable for array shapes in general, but optional and nullable pretty much have to become one once bindings are involved.
So color me convinced :)
Cheers,
chuck
The first means b is an optional key, but if it’s there, can only be a string. The second says b is a required key, but it may be a string or null. If there were a binding involved, that determines the type of the binding in incompatible ways. I’m fine with requiring bindings to be nullable for optional keys, but it strikes me as strictly less flexible and not consistent with the rest of PHP’s behavior, at least not under E_ALL.
To be honest, this is one of the smaller concerns I have with the new
syntax. There might be some misunderstanding here, though. A
non-existent key is NULL, always has been, and always will be.
This is just not accurate. Inexistent indexes are not null in PHP,
they are undefined. PHP implicitly coerces undefined to null, because
undefined is not a value accessible to users. The same occurs when
accessing $undefinedVariable. For arrays, this fact is observable
through foreach, warnings when accessing the index, and likely
others.
So yes, [?'foo' => string] and ['foo' => ?string] are indeed
different. The former accepts [], while the latter accepts ['foo' => null].
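The distinction can be sketched with two plain functions. Both names are hypothetical, and the sketch assumes that an array pattern without a trailing ... rejects extra keys:

```php
<?php
// Hypothetical emulation of `$arr is [?'foo' => string]`:
// the key may be absent, but if present it must hold a string.
function matchesOptionalFoo(array $arr): bool
{
    if ($arr === []) {
        return true; // absent key is fine
    }
    return array_keys($arr) === ['foo'] && is_string($arr['foo']);
}

// Hypothetical emulation of `$arr is ['foo' => ?string]`:
// the key must be present, but its value may be a string or null.
function matchesNullableFoo(array $arr): bool
{
    return array_keys($arr) === ['foo']
        && ($arr['foo'] === null || is_string($arr['foo']));
}

var_dump(matchesOptionalFoo([]));              // bool(true)
var_dump(matchesNullableFoo([]));              // bool(false)
var_dump(matchesOptionalFoo(['foo' => null])); // bool(false)
var_dump(matchesNullableFoo(['foo' => null])); // bool(true)
```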
$arr = ['a' => 'a string'];
$arr is ['a' => string, ?'b' => $value, ...];
This syntax implies that a non-existent key is a special case, and if
it passes as-is, it will be. If there is a binding and the key is
missing, what happens to that binding?
This is the same problem as |. Variable bindings within optional
keys must be forbidden. I already mentioned that to Larry when we
thought about this idea.
Ilija
The first means b is an optional key, but if it’s there, can only be a string. The second says b is a required key, but it may be a string or null. If there were a binding involved, that determines the type of the binding in incompatible ways. I’m fine with requiring bindings to be nullable for optional keys, but it strikes me as strictly less flexible and not consistent with the rest of PHP’s behavior, at least not under E_ALL.
To be honest, this is one of the smaller concerns I have with the new
syntax. There might be some misunderstanding here, though. A
non-existent key is NULL, always has been, and always will be.
This is just not accurate. Inexistent indexes are not null in PHP,
they are undefined. PHP implicitly coerces undefined to null, because
undefined is not a value accessible to users. The same occurs when
accessing $undefinedVariable. For arrays, this fact is observable
through foreach, warnings when accessing the index, and likely
others.
This is a bit like telling someone who fell off a ladder that they didn’t “technically” fall, instead the Earth and them pulled at each other until they collided and the ground + body absorbed the energy.
While yes, you are “technically” correct, what you describe is essentially unobservable from the context of the running code (unless you turn the warning into an error/exception). For all direct accesses of array values ($arr['key']) an array is infinitely full of nulls (I have actually depended on this property at one point for a bloom filter).
So yes, [?'foo' => string] and ['foo' => ?string] are indeed
different. The former accepts [], while the latter accepts ['foo' => null].
Are they actually different in practice though? That was my point. After the “is” in both cases, you’ll have to use null-coalescence to retrieve the value. For all intents, they are the same resulting code. If you can show a difference in the resulting code and how it is an improvement, I may be inclined to agree, but I can’t think of one.
$arr = ['a' => 'a string'];
$arr is ['a' => string, ?'b' => $value, ...];
This syntax implies that a non-existent key is a special case, and if
it passes as-is, it will be. If there is a binding and the key is
missing, what happens to that binding?
This is the same problem as |. Variable bindings within optional
keys must be forbidden. I already mentioned that to Larry when we
thought about this idea.
Ilija
— Rob
Hi Rob
To be honest, this is one of the smaller concerns I have with the new
syntax. There might be some misunderstanding here, though. A
non-existent key is NULL, always has been, and always will be.
This is just not accurate. Inexistent indexes are not null in PHP,
they are undefined. PHP implicitly coerces undefined to null, because
undefined is not a value accessible to users. The same occurs when
accessing $undefinedVariable. For arrays, this fact is observable
through foreach, warnings when accessing the index, and likely
others.
This is a bit like telling someone who fell off a ladder that they didn’t “technically” fall, instead the Earth and them pulled at each other until they collided and the ground + body absorbed the energy.
While yes, you are “technically” correct, what you describe is essentially unobservable from the context of the running code (unless you turn the warning into an error/exception). For all direct accesses of array values ($arr['key']) an array is infinitely full of nulls (I have actually depended on this property at one point for a bloom filter).
If null array values were indeed unobservable, then [] would be === to
[null] (or at least ==), and a foreach over [null] would result in 0
iterations. But neither of those are the case.
So yes, [?'foo' => string] and ['foo' => ?string] are indeed
different. The former accepts [], while the latter accepts ['foo' => null].
Are they actually different in practice though? That was my point. After the “is” in both cases, you’ll have to use null-coalescence to retrieve the value. For all intents, they are the same resulting code. If you can show a difference in the resulting code and how it is an improvement, I may be inclined to agree, but I can’t think of one.
Sure. If a null value were to mean "not set", then ['foo' => string]
should accept ['foo' => 'foo', 'bar' => null], which is absolutely
observable if the code assumes it won't see any additional indexes.
Ilija
Hi Rob
To be honest, this is one of the smaller concerns I have with the new
syntax. There might be some misunderstanding here, though. A
non-existent key is NULL, always has been, and always will be.
This is just not accurate. Inexistent indexes are not null in PHP,
they are undefined. PHP implicitly coerces undefined to null, because
undefined is not a value accessible to users. The same occurs when
accessing $undefinedVariable. For arrays, this fact is observable
through foreach, warnings when accessing the index, and likely
others.
This is a bit like telling someone who fell off a ladder that they didn’t “technically” fall, instead the Earth and them pulled at each other until they collided and the ground + body absorbed the energy.
While yes, you are “technically” correct, what you describe is essentially unobservable from the context of the running code (unless you turn the warning into an error/exception). For all direct accesses of array values ($arr['key']) an array is infinitely full of nulls (I have actually depended on this property at one point for a bloom filter).
If null array values were indeed unobservable, then [] would be === to
[null] (or at least ==), and a foreach over [null] would result in 0
iterations. But neither of those are the case.
I think there is a difference between an empty array and a null, and that is (hopefully) self-evident. I’m talking about the infinite nulls IN the array. You can write a for loop of all possible keys until the end of the universe, and all you will get is null. This is fairly easy to prove. I'll wait... :p
So yes, [?'foo' => string] and ['foo' => ?string] are indeed
different. The former accepts [], while the latter accepts ['foo' => null].
Are they actually different in practice though? That was my point. After the “is” in both cases, you’ll have to use null-coalescence to retrieve the value. For all intents, they are the same resulting code. If you can show a difference in the resulting code and how it is an improvement, I may be inclined to agree, but I can’t think of one.
Sure. If a null value were to mean "not set", then ['foo' => string]
should accept ['foo' => 'foo', 'bar' => null], which is absolutely
observable if the code assumes it won't see any additional indexes.
The only way you’d observe this (that I can think of) is by performing a for-each loop over the array. In this case, I can't think of a reason you would assert the shape of individual keys beforehand. Maybe someone who would do that can chime in with a realistic example.
What I was talking about is something like this:
$hasFoo = $arr is [?'foo' => string];
// if I understand the RFC correctly, $hasFoo is true for [] and ['foo' => 'some-string']
if ( $hasFoo ) {
$foobar = $arr['foo'] ?? null;
}
Now with the alternative:
$hasFoo = $arr is ['foo' => ?string];
// in theory, this matches [], ['foo' => null] and ['foo' => 'some-string']
if ( $hasFoo ) {
$foobar = $arr['foo'] ?? null;
}
Note that the actual code did NOT change. What we have is that in the first example, a person well-versed in PHP is going to wonder why ($hasFoo is false) === true and ($arr['foo'] is null) === true. It's logically inconsistent with itself because we clearly specified that it shouldn't exist but it does exist and the value is NULL.
Yeah, it's weird that arrays are infinitely full of null, but surprisingly useful. If we don't like it, we can always create an RFC to treat non-existent keys as an error instead of a warning.
Ilija
— Rob
The only way you’d observe this (that I can think of) is by performing a for-each loop over the array.
There are many ways you can observe the difference between an absent key and a null value; here are just a handful off the top of my head:
- array_key_exists() (that's literally its purpose!)
- array_keys()
- count() (if the array held "an infinite number of nulls", we should return infinity for every array!)
- json_encode()
- print_r(), var_dump(), var_export()
- extract()
It may be questionable to give meaning to the difference in some of these cases, but different it definitely is.
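A short sketch comparing an absent key with a key that holds null. The array names are illustrative. It shows both sides of the argument: isset() and ?? cannot tell the two apart, while the functions listed above can.

```php
<?php
$absent = [];              // no 'foo' key at all
$null   = ['foo' => null]; // 'foo' exists and holds null

// Each entry is [result for $absent, result for $null].
$checks = [
    'array_key_exists' => [array_key_exists('foo', $absent), array_key_exists('foo', $null)],
    'count'            => [count($absent), count($null)],
    'array_keys'       => [array_keys($absent), array_keys($null)],
    'json_encode'      => [json_encode($absent), json_encode($null)],
    // These two treat both arrays identically:
    'isset'            => [isset($absent['foo']), isset($null['foo'])],
    'null coalescing'  => [$absent['foo'] ?? 'default', $null['foo'] ?? 'default'],
];

var_dump($checks);
```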
If we don't like it, we can always create an RFC to treat non-existent keys as an error instead of a warning.
I believe that is the explicit intention or desire of those who raised it from Notice to Warning. It would certainly prevent some bugs where a typo leads to the wrong key being accessed.
Personally, I'd like to see a few use cases catered for first, like $counters[$key]++ and $groups[$key][] = $value; Perhaps by introducing some equivalent of Python's "defaultdict". Because I do agree that the current behaviour is useful sometimes (even if I disagree in how to describe it).
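For reference, the two idioms mentioned above can be sketched in current PHP as follows. The sample data is made up. The counter uses ?? to supply the default explicitly, which is roughly what a defaultdict-like structure would do implicitly; appending with [] to a missing key creates the nested array without any warning.

```php
<?php
$words = ['apple', 'avocado', 'banana'];

$counters = [];
$groups   = [];

foreach ($words as $word) {
    $key = $word[0]; // group by first letter

    // Explicit default instead of relying on an absent key reading as null.
    $counters[$key] = ($counters[$key] ?? 0) + 1;

    // Writing through a missing key auto-creates the inner array.
    $groups[$key][] = $word;
}

var_dump($counters, $groups);
```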
Regards,
Rowan Tommins
[IMSoP]
The only way you’d observe this (that I can think of) is by performing a for-each loop over the array.
There are many ways you can observe the difference between an absent key and a null value; here are just a handful off the top of my head:
- array_key_exists() (that's literally its purpose!)
- array_keys()
- count() (if the array held "an infinite number of nulls", we should return infinity for every array!)
- json_encode()
- print_r(), var_dump(), var_export()
- extract()
It may be questionable to give meaning to the difference in some of these cases, but different it definitely is.
True, but I was mainly referring to what you would do after performing an 'is', in which case, you probably wouldn't be using any of those functions, or if you needed to, then why do you need 'is'? Even with the
$hasFoo = $arr is [?'foo' => string];
You still have to run array_key_exists() to determine whether the key exists, which means you likely still need to figure out a default value, and null-coalesce is perfect for that ... but then it just points out that it isn't that useful of a check, and that it is inconsistent with itself.
If you are running array_keys(), then why bother performing an 'is' when you should already know the structure; that's the entire point of it. If you are running a count(), this kind of goes back to the same thing, but I would think count() would be run on a collection of data rather than structured data.
For json_encode, I could see something like this being a "final check" of some sort, but most people are likely json_encoding objects these days. This might be a good use case for this syntax, but I feel like this is the wrong way to solve the problem.
print_r, var_dump, etc. are more or less debugging tools. At least, I've never seen their output used for program execution. I could be wrong.
extract() can go get crushed by an RFC :) but it does occasionally have its usefulness. Even then, until PHP 9, an undefined variable will still be null.
If we don't like it, we can always create an RFC to treat non-existent keys as an error instead of a warning.
I believe that is the explicit intention or desire of those who raised it from Notice to Warning. It would certainly prevent some bugs where a typo leads to the wrong key being accessed.
Personally, I'd like to see a few use cases catered for first, like $counters[$key]++ and $groups[$key][] = $value; Perhaps by introducing some equivalent of Python's "defaultdict". Because I do agree that the current behaviour is useful sometimes (even if I disagree in how to describe it).
I think arrays have their uses, but I'd rather see some purpose-built data structures where you can fully take advantage of their performance properties—things like linked lists, heaps, etc. Yeah, there's SPL, but its interfaces are kind of a mess since you can use a queue like a deque or a stack (which is just weird and might or might not have performance implications), for example.
Regards,
Rowan Tommins
[IMSoP]
— Rob
It may be questionable to give meaning to the difference in some of
these cases, but different it definitely is.
True, but I was mainly referring to what you would do after performing
an 'is', in which case, you probably wouldn't be using any of those
functions, or if you needed to, then why do you need 'is'? Even with the
$hasFoo = $arr is [?'foo' => string];
You still have to run array_key_exists() to determine whether the key
exists, which means you likely still need to figure out a default value,
and null-coalesce is perfect for that ... but then it just points out
that it isn't that useful of a check, and that it is inconsistent with
itself.
So the issue has nothing to do with this hypothetical infinity of
unobservable nulls, and comes entirely down to the fact that with this
pattern a variable may pass
a) because it does not have a key named 'foo', or
b) because it has a key named 'foo' with a string value.
In other words, "this key is optional, but if it is defined it must
match this pattern".
On its lonesome, that doesn't look very useful, but I expect it would be
one component of a larger pattern (such as "['bar' => string, ?'foo' =>
string, ...]"). Rather than (near-)duplicate blocks for "it does not
have the key" and "it has the key and it holds a string", there can be
one block (which might or might not care about the distinction).
It may be questionable to give meaning to the difference in some of
these cases, but different it definitely is.
True, but I was mainly referring to what you would do after performing
an 'is', in which case, you probably wouldn't be using any of those
functions, or if you needed to, then why do you need 'is'? Even with the
$hasFoo = $arr is [?'foo' => string];
You still have to run array_key_exists() to determine whether the key
exists, which means you likely still need to figure out a default value,
and null-coalesce is perfect for that ... but then it just points out
that it isn't that useful of a check, and that it is inconsistent with
itself.
So the issue has nothing to do with this hypothetical infinity of
unobservable nulls, and comes entirely down to the fact that with this
pattern a variable may pass
a) because it does not have a key named 'foo', or
b) because it has a key named 'foo' with a string value.
I think this will be my last email on the subject because it’s like talking to a brick wall.
There’s nothing hypothetical about it.
while(true) var_dump([][random_int(PHP_INT_MIN, PHP_INT_MAX)]);
In other words, "this key is optional, but if it is defined it must
match this pattern".
Seriously, write out using it both ways. I asked in the beginning for someone to give a realistic example showing a practical difference in the final implementation and I haven’t seen one. I will gracefully eat my hat. The main issue is that key-existence-check is logically inconsistent with itself (if you say the key shouldn’t exist or be a string, you’d be surprised to get null from that key!)
And with that, I bid adieu to this topic.
On its lonesome, that doesn't look very useful, but I expect it would be
one component of a larger pattern (such as "['bar' => string, ?'foo' =>
string, ...]"). Rather than (near-)duplicate blocks for "it does not
have the key" and "it has the key and it holds a string", there can be
one block (which might or might not care about the distinction).
— Rob
Hi Rob
So the issue has nothing to do with this hypothetical infinity of
unobservable nulls, and comes entirely down to the fact that with this
pattern a variable may pass
a) because it does not have a key named 'foo', or
b) because it has a key named 'foo' with a string value.
In other words, "this key is optional, but if it is defined it must
match this pattern".
Seriously, write out using it both ways. I asked in the beginning for someone to give a realistic example showing a practical difference in the final implementation and I haven’t seen one. I will gracefully eat my hat. The main issue is that key-existence-check is logically inconsistent with itself (if you say the key shouldn’t exist or be a string, you’d be surprised to get null from that key!)
function test($value) {
if ($value is ['foo' => ?string]) {
$foo = $value['foo'];
}
}
test([]);
With your approach, the example above would emit a warning, even
though the context within test() doesn't look like it should. If
?string means that the index might or might not exist, all code that
accesses them must check for existence, even when not needing to
handle null itself. That doesn't seem desirable to me.
I also think the issue goes further. If anything|null means that the
offset might not exist, does that include mixed? That makes ['foo' =>
mixed] essentially useless.
Ilija
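The warning being discussed can be reproduced in current PHP without any pattern syntax. The following is a minimal sketch, not part of the RFC: the handler and variable names are illustrative assumptions, and it only shows that reading a missing key yields null plus a warning, while a key that exists with a null value reads silently.

```php
<?php
// Collect warnings/notices instead of printing them (illustrative handler).
$warnings = [];
set_error_handler(function (int $severity, string $message) use (&$warnings): bool {
    $warnings[] = $message;
    return true; // mark the error as handled
});

$empty = [];
$withNull = ['foo' => null];

$fromNull = $withNull['foo'];         // key exists: null, no warning
$fromMissing = $empty['foo'];         // key missing: null, plus an "undefined" warning
$coalesced = $empty['foo'] ?? null;   // key missing: null, no warning

restore_error_handler();
```

Under the "key must exist" reading of ['foo' => ?string], only the first access could occur after a successful match; under the "key may be absent" reading, the second one could.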
function test($value) {
if ($value is ['foo' => ?string]) {
$foo = $value['foo'];
}
}
test([]);
(Scroll to the end if you don’t care about my rebuttal)
I mean, there’s nothing wrong with this code, $foo will be null at the end.
With your approach, the example above would emit a warning, even
though the context within test() doesn't look like it should.
That happens every time I get one of these warnings. It’s nice to get the heads up that I didn’t think through something and need to either A) figure out how it got there, or B) provide a default — even if it is null — to make the warning go away.
If ?string means that the index might or might not exist, all code that
accesses them must check for existence, even when not needing to
handle null itself. That doesn't seem desirable to me.
This is true already. Since this warning and null coalescing were introduced, I can probably count on one hand the number of times I’ve not written something like ?? null beside an array access (though I prefer throwing an exception if appropriate).
I also think the issue goes further. If anything|null means that the
offset might not exist, does that include mixed? That makes ['foo' =>
mixed] essentially useless.
In thinking about it some more, for lists, it would be nice if there were two modes:
- regular mode: $a is [1,2,3] where it must match exactly.
- set mode: $a is set([1,2,3]) where order doesn’t matter, only that $a contains at least one of every given element.
——————
In other news, I finally found a realistic example where there is a difference between non-existence and null.
The scenario is this:
Imagine you accept a callback and you prepare an array to pass as parameters. In this case, you call the callback like so:
$callback(...$args);
Today, you either have to yolo it and hope everything is perfect or meticulously check the args to make sure the types match your documented callback signature.
If a key in $args doesn’t exist, it might be an error (or default value provided by the callback implementation) but if it is null, it might also be disastrously incorrect because the default value won’t be used.
By using pattern matching, we can check the structure of the array for non-existence or if exists, the documented type. This is where matching a list exactly is important (if using positional args).
So, now that there is a use case for it, we can work out expected behaviors of lists and associative arrays and nullability for both.
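The callback scenario can be sketched in current PHP to show where absence and null diverge. This is only an illustration under assumptions: the callback signature, the argsMatchSignature() helper, and the key names are hypothetical, and the helper stands in for a pattern along the lines of ['name' => string, ?'retries' => int]. Spreading string keys as named arguments needs PHP 8.1 or later.

```php
<?php
// Hypothetical callback with a documented signature: required string, optional int.
$callback = function (string $name, int $retries = 3): string {
    return "{$name}:{$retries}";
};

// Hypothetical validator: 'name' is required, 'retries' is optional,
// but if 'retries' is present it must be an int (null is rejected).
function argsMatchSignature(array $args): bool
{
    if (!array_key_exists('name', $args) || !is_string($args['name'])) {
        return false;
    }
    if (array_key_exists('retries', $args) && !is_int($args['retries'])) {
        return false;
    }
    // No keys beyond the documented ones.
    return array_diff_key($args, ['name' => true, 'retries' => true]) === [];
}

$absent = ['name' => 'job'];                       // default value will be used
$explicitNull = ['name' => 'job', 'retries' => null]; // would bypass the default

$resultAbsent = argsMatchSignature($absent) ? $callback(...$absent) : null;
$resultNull = argsMatchSignature($explicitNull) ? $callback(...$explicitNull) : null;
```

With the key absent the default applies; with an explicit null the check fails before the call, instead of the call raising a TypeError.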
Anyway, does anyone have a suggestion for what kind of hat I should have for dinner?
— Rob
I wonder if this could be helpful in implementing Generic Classes?
https://wiki.php.net/rfc/generics
Interface Boxable {...}
class Box<T is Boxable> {...}
Taken from the RFC ^
Best,
Richard Miles
So the issue has nothing to do with this hypothetical infinity of
unobservable nulls, and comes entirely down to the fact that with this
pattern a variable may pass
a) because it does not have a key named 'foo', or
b) because it has a key named 'foo' with a string value.
Existing array shape implementations already distinguish optional and nullable; I can’t imagine a pattern syntax that doesn’t do that too. It’s just the case of binding with optional keys that seems to be a special case that means the type of whatever you
So, possibly inventing a novel binding syntax here because I haven’t followed the syntax discussion and I don’t see bindings in array shapes in the RFC:
['foo' => $x is ?string] // just fine, $x is explicitly string|null
[?'foo' => $x is ?string] // also fine, $x is explicitly string|null
[?'foo' => $x is string] // $x is actually string|null
There’s a precedent with function parameters:
function foo(string $bar = null) // $bar is string|null
There’s some difference though
[] is [?'foo' => $x is string] // matches, $x === null
['foo' => null] is [?'foo' => $x is string] // match fails
And sorry if my sample syntax is weird and ambiguous. I learned in the 90’s that it depends on what your definition of ‘is’ is ;)
Cheers,
chuck
There’s some difference though
[] is [?'foo' => $x is string] // matches, $x === null
Except null is not a string.
There’s a precedent with function parameters:
function foo(string $bar = null) // $bar is string|null
which has already been voted for deprecation in 8.4:
https://wiki.php.net/rfc/deprecate-implicitly-nullable-types
If null array values were indeed unobservable, then [] would be === to
[null] (or at least ==), and a foreach over [null] would result in 0
iterations. But neither of those are the case.
I think there is a difference between an empty array and a null, and
that is (hopefully) self-evident. I’m talking about the infinite nulls
IN the array. You can write a for loop of all possible keys until the
end of the universe, and all you will get is null. This is fairly easy
to prove. I'll wait... :p
What about the difference between an empty array and an array that
contains a null (Ilija's example)?
echo count([]);
echo count([null]);
echo count([null, null]);
echo count([null, null, null]);
echo count([null, null, null, null]);
...
You're arguing that these are all the same array?
If you are accessing them by index, yes, they are all the same array. There is no observable difference. I think we already covered that count() would show the difference between them since it’s actually a count of known indices:
for($i = 0; $i < 4; $i++) var_dump([]);
Will output 4 nulls.
— Rob
Sorry, I’ve not yet had enough coffee, this should be:
$arr = [];
for($i = 0; $i < 4; $i++) var_dump($arr[$i]);
Will output 4 nulls.
— Rob
You are only half-correct.
It will also output 4 undefined index warnings in strict_types=1 mode :)
https://3v4l.org/DJ4AI
--
Arvīds Godjuks
+371 26 851 664
arvids.godjuks@gmail.com
Telegram: @psihius https://t.me/psihius
It will always output that warning, regardless of strict types, but as this thread has already covered numerous times, it is unobservable from the executing code unless you turn the warning into an exception.
— Rob
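For reference, the ways an empty array and an array holding a null can or cannot be told apart in current PHP can be listed side by side. This is a small sketch with illustrative variable names; it takes no position on which of these counts as "observable" for the purposes of the pattern.

```php
<?php
$empty = [];
$oneNull = [null];

// Operations that distinguish the two arrays.
$identical = $empty === $oneNull;                       // false
$counts = [count($empty), count($oneNull)];             // [0, 1]
$keyExists = [array_key_exists(0, $empty), array_key_exists(0, $oneNull)];

$iterations = 0;
foreach ($oneNull as $ignored) {
    $iterations++;                                      // runs once
}

// Operations that do not distinguish them.
$issetResults = [isset($empty[0]), isset($oneNull[0])]; // both false
$coalesced = [$empty[0] ?? null, $oneNull[0] ?? null];  // both null
```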
It will always output that warning, regardless of strict types, but as this thread has already covered numerous times, it is unobservable from the executing code unless you turn the warning into an exception.
I think this conversation is probably going round in circles, becoming a kind of "no true Scotsman" debate of "well, yes, you can observe it that way, but..." and getting further and further away from the topic of this thread.
I believe the pattern syntax in question has already been earmarked for "future scope", so if/when it actually gets proposed, we can discuss the use cases, which is what actually matters.
Regards,
Rowan Tommins
[IMSoP]
$p is Point {x: 3, y: $y};
// If $p->x === 3, bind $p->y to $y and return true.
$p is Point {y: 37, x:@($x)};
// $p->x === $x && $p->y == 37
I'm just going to put in my $0.02 here and downvote this syntax. I
believe that it should be swapped. IMO, {x: 3, y: $y} looks too
similar to the named arguments syntax.
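To make the semantics under discussion concrete, here is a rough present-day equivalent of "$p is Point {x: 3, y: $y}". It is a sketch under assumptions: the Point class shape and the matchPointX3() helper are hypothetical, and a by-reference parameter stands in for the proposed variable binding. It needs PHP 8.0 or later.

```php
<?php
// Hypothetical Point class with two public int properties.
final class Point
{
    public function __construct(public int $x, public int $y) {}
}

// Hypothetical stand-in for `$p is Point {x: 3, y: $y}`:
// returns true and binds $y only when the type and the literal both match.
function matchPointX3(mixed $p, ?int &$y): bool
{
    if ($p instanceof Point && $p->x === 3) {
        $y = $p->y;
        return true;
    }
    return false;
}

$y = null;
$matched = matchPointX3(new Point(3, 7), $y);

$other = null;
$unmatched = matchPointX3(new Point(4, 7), $other);
```

In this sketch a failed match leaves the bound variable untouched, which is one possible answer to the question of what binding should do when a pattern fails.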
Ilija and I have been working on and off on an RFC for pattern matching since the early work on Enumerations. A number of people have noticed and said they're looking forward to it.
[THIS REPLY IS FOR THE CONVENIENCE OF https://externals.io READERS
ONLY. SORRY FOR THE UNPLEASANT DISPLAY]
Hello Larry.
Thanks for this Juicy proposal.
I agree that this proposal should be proposed as a whole. If we break
it down into smaller parts, there is a chance some features will not
pass. I will show you that the range pattern and regex pattern have
greater value than what you think, and they must be placed together
with the literal pattern (in Core patterns section, not in Possible
patterns section). I don't want these patterns to fail in the poll.
I don't know the line between "high level" and beyond. I hope that
whatever I discuss here is still "high level".
- Is there any chance of "is not" or "isnot"? I am tired of looking at
this ugly code: "!($foo instanceof Bar)". If there is, there should be
a restricted version of pattern: no variable binding. It doesn't make
sense if the binding happens but the pattern fails.
- "..=" and "..<" make people have to remember what the left side is:
is it greater-than or greater-than-equal. What if those operators are
replaced with "=..=" and "<..="? We don't have to align with other
languages if they have weaknesses.
- There are "between" and ">" operators explicitly shown in the
proposal (in Range pattern section). I assume that ">=", "<", "<=" are
implicitly included. What about the "%" operator, is it also included?
"%" does not directly produce a boolean, but it can simplify a pattern.
Note that we must include an additional rule: if the produced number is
0 (zero) then it will be converted to true, otherwise false.
$foo = 2024; // leap year
$foo is %4; // true
$bar = 2025; // not a leap year
$bar is %4; // false
// furthermore:
$baz is array<%4>; // true if all its members are leap year
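As a point of comparison, the suggested "%4" check corresponds to a plain remainder test in current PHP. The sketch below uses hypothetical helper names; it also shows that divisibility by 4 alone is not the full Gregorian leap-year rule, since century years must additionally be divisible by 400.

```php
<?php
// Hypothetical equivalent of a "%N" pattern: true when the remainder is zero.
function divisibleBy(int $value, int $divisor): bool
{
    return $value % $divisor === 0;
}

// Full Gregorian leap-year rule, for comparison with the "%4" shortcut.
function isLeapYear(int $year): bool
{
    return ($year % 4 === 0 && $year % 100 !== 0) || $year % 400 === 0;
}

$results = [];
foreach ([2024, 2025, 2100] as $year) {
    $results[$year] = [divisibleBy($year, 4), isLeapYear($year)];
}
```

The two checks agree for 2024 and 2025 but differ for 2100.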
- There are numeric range patterns explicitly shown in the proposal.
Hopefully there are also string range patterns.
$birthdate is "2000-01-01 00:00:00" <..< "2019-12-31 23:59:59";
$dateOfDeath is <"1980-01-01";
$name is "larry " =..< "larry!"; // true if first name is "larry"
I have no idea about string range patterns for array, since it causes
error. If there is currently no solution, hopefully we don't abandon
string range patterns entirely. The same case also occurs with numeric
range patterns for array.
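What the string range examples would compute can be approximated today with byte-wise string comparison. This is a sketch only: the helper names are hypothetical, strcmp() is an assumed choice of ordering, and the proposal does not say which comparison a string range pattern would use.

```php
<?php
// Hypothetical helper for "low <..< high": both bounds exclusive.
function strictlyBetween(string $value, string $low, string $high): bool
{
    return strcmp($low, $value) < 0 && strcmp($value, $high) < 0;
}

// Hypothetical helper for "low =..< high": inclusive low, exclusive high.
function inHalfOpenRange(string $value, string $low, string $high): bool
{
    return strcmp($low, $value) <= 0 && strcmp($value, $high) < 0;
}

$bornInRange = strictlyBetween('2005-06-15 12:00:00', '2000-01-01 00:00:00', '2019-12-31 23:59:59');
$isLarry = inHalfOpenRange('larry garfield', 'larry ', 'larry!');
$isNotLarry = inHalfOpenRange('larryx', 'larry ', 'larry!');
```

The "larry " to "larry!" trick works because "!" is the byte immediately after the space character, so the range covers exactly the strings that start with "larry ".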
- It is great that regular expression native syntax is now a first
class citizen in php (at least for pattern matching). I found there
are 2 drawbacks in this case: loss of flexibility and repeated
identical patterns. These things are not found in string based regular
expression.
// native syntax
$foo is /^https:\/\/(?<hostname>[^\/]*)/;
// string based: we can use any valid char as delimiter,
// as long as it is not used in the pattern.
$pattern = "/^https:\/\/(?<hostname>[^\/]*)/";
$pattern = "|^https://(?<hostname>[^/]*)/|"; // valid!
$pattern = "@^https://(?<hostname>[^/]*)/@"; // valid!
// ----------------------
// native syntax
$foo is /^https:\/\/(?<hostname>[^\/]*)/ | StringBox{value: /^https:\/\/(?<hostname>[^\/]*)/};
// string based: RE stored in variable/constant can be used as many times as we need
$pattern = "|^https://(?<hostname>[^/]*)/|";
$foo is @RE($pattern) | StringBox{value: @RE($pattern)};
// 1st pattern is scalar string, 2nd is string encapsulated in a class
// furthermore
use GenericPattern as GP;
class Person {
public string $firstName is @RE(GP::NAME_PTRN);
public string $lastName is @RE(GP::NAME_PTRN);
// ...
}
Hopefully, string based regular expression is also supported.
Honestly, if I could choose whether I should support native or
string-based syntax, I would choose to support string-based, as long
as native syntax is not yet fully supported in general.
@RE() is just an illustration on how to use variable as a
regular expression in the pattern. It is stated that @() will be
used in arbitrary expressions. Correct me if I'm wrong, regexp are one
of the things in programming that cannot be manipulated or participate
in manipulation with other parties, even with fellow regexp. Regexp
must be used alone. That is why we need dedicated syntax for regexp.
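The delimiter flexibility of string-based regular expressions can be seen with preg_match() as it exists today. In this sketch the URL and variable names are illustrative, and the same expression is used with all three delimiters (without the trailing slash that appears in some of the variants above) so the results are directly comparable.

```php
<?php
$url = 'https://www.php.net/rfc';

// The same expression written with three different delimiters.
$patterns = [
    '/^https:\/\/(?<hostname>[^\/]*)/', // "/" delimiter forces escaping of every slash
    '|^https://(?<hostname>[^/]*)|',    // "|" delimiter, no escaping needed
    '@^https://(?<hostname>[^/]*)@',    // "@" delimiter, no escaping needed
];

$hostnames = [];
foreach ($patterns as $pattern) {
    // preg_match() returns 1 on a match and fills $matches with named groups.
    if (preg_match($pattern, $url, $matches) === 1) {
        $hostnames[] = $matches['hostname'];
    }
}
```

Because each pattern is an ordinary string, it can be stored in a variable or constant and reused, which is the property the string-based proposal relies on.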
- It is shown that type pattern, literal pattern, class constant
pattern, and expression pattern can be used to form a compound
pattern. Hopefully, the range pattern and regex pattern have the same
luxury. Furthermore, hopefully they can be mixed.
$foo is 2000 =..= 2100 & %4; // leap year in 21st century
class person {
public string $firstName is /LENGTH_PTRN/
& /FORBIDDEN_CHARS_PTRN/
& /FORBIDDEN_WORDS_PTRN/;
public string $lastName is @RE(GP::LENGTH_PTRN)
& @RE(GP::FORBIDDEN_CHARS_PTRN)
& @RE(GP::FORBIDDEN_WORDS_PTRN);
// these are hard to maintain
public string $firstName is /COMPLEX_NAME_PTRN/;
public string $lastName is @RE(GP::COMPLEX_NAME_PTRN);
}
- I noticed that "as" is tightly coupled with exception. Can we
suppress this exception with "??"?
// it is weird to see this statement (as shown in the proposal)
$value = $foo as Foo {$username, $password};
// these statements make more sense
$foo as Foo {$username, $password};
$foo is Foo {$username, $password} ?: throw new Exception();
// if it can be suppressed, "as" is more valuable than what people think.
$newRect = $rect as Rectangle{width: <=10, height: <=5} ?? new Rectangle(width: 10, height: 5);
- Is there any type checking for object property pattern?
class Circle {
public int $radius;
}
// this statement will always fail unless there is type checking
$circle is Circle{radius: "10"};
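The reason such a pattern could never match is visible with existing operators. A small sketch, assuming the literal pattern compares strictly (the variable names are illustrative and the default value on the property is added only to keep the example self-contained):

```php
<?php
final class Circle
{
    public int $radius = 0;
}

$circle = new Circle();
$circle->radius = 10;

// A typed int property can never be identical to the string "10".
$strictMatch = $circle->radius === "10";
// A loose comparison would match, which is what a weak mode might permit.
$looseMatch = $circle->radius == "10";

// The property type also rejects non-numeric strings outright.
$rejected = false;
try {
    $circle->radius = "ten";
} catch (TypeError $e) {
    $rejected = true;
}
```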
4 of the 8 points I discussed above are related to range pattern and
regex pattern. Both are used daily. From my point of view, literal
pattern is no more special than range pattern and regex pattern. IMO,
both should be placed in Core patterns section, not in Possible
patterns section.
Regards, Hendra Gunawan.
To that end, we're looking for very high level feedback on this RFC:
Hi folks. Thank you to those who have offered feedback so far. Based on the discussion, here's what we're thinking of doing (still subject to change, of course):
- We're going to move "as" to future-scope. There's enough weirdness around it that is independent of pattern matching itself that it will likely require its own discussion and RFC, and may or may not involve full pattern support.
- Similarly, we're going to hold off on the weak-mode flag. It sounds like the language needs to do more to fix the definition of "weak mode" before it's really viable. :-( On the plus side, if the type system itself ever adds support for a "coercion permitted" flag, patterns should inherit that naturally, I think.
- Array-application will also be pushed to future-scope. Again, there's enough type-system tie in here that is tangential to patterns that we'll pick that fight later.
- Ilija and I have discussed regex patterns a bit further, and it sounds like they’re going to be rather complicated to implement. Even assuming we agree on the syntax for it, it would be a substantial amount of code to support. (It’s not like types or literals or range where we can just drop something pre-existing into a new function.) So we’re going to hold off on this one for now, though it does seem like a high-priority follow-up for the future. (Which doesn’t have to be us!)
So let's not discuss the above items further at this point.
- I'm going to do some additional research into other languages to see how they handle binding vs using variables from scope, and what syntax markers they use and where. Once we have a better sense of what is out there and is known to work, we can make a more informed plan for what we should do in PHP. (Whether using a variable from scope in the pattern is part of the initial RFC is still an open question, but we do need to design it alongside the capture part to ensure they don't conflict.) Stay tuned on this front.
- We've removed the dedicated wildcard pattern, as it's equivalent to "mixed". If there's interest, we're open to having a secondary vote to bring it back as a short-hand pattern. It's trivial to implement and we don't have strong feelings either way.
- There's not been much discussion of range patterns. Anyone want to weigh in on those?
- The placement of "is" on match() is still an open question.
- No one has really weighed in on nested patterns for captured variables. Any thoughts there?
- I’ve seen a suggestion for capturing the “rest” of an array when using … That’s an interesting idea, and we’ll explore it, though it looks like it may have some interesting implications that push it to future scope. It feels like a nice-to-have.
Thanks all.
--Larry Garfield
- There's not been much discussion of range patterns. Anyone want to weigh in on those?
I didn't even notice them until someone else mentioned some detail of the syntax.
Like regex patterns and the generic-like array syntax, they look really useful, but also like something that should be standardised across the language, and not just snuck in to patterns.
I don't mean we have to implement "foreach(1..10 as $b)" at the same time as "$a is 1..10", necessarily, but we're going to hate ourselves if we rush in a range syntax for patterns then realise we can't reuse it elsewhere for some reason.
My personal preference would be to keep the first RFC as focused as possible, so that we have time to discuss everything in it. Additional patterns could be an immediate follow-up RFC within the same release cycle.
So I think that means the first RFC covering:
- The concept of patterns
- The "is" keyword
- match integration
- Variable binding
- Variables in expressions (as you say, we need to at least discuss the syntax for these at the same time as variable binding)
- Type patterns
- Literal patterns
- Class constant patterns
- Compound patterns (unions and intersections)
- Array destructuring-like patterns
- Object property patterns (these feel essential, but notably are the only thing on this list not based on an existing syntax)
That's still a big list of things to discuss.
Once we have the concept of patterns in the language, there's plenty of scope for adding more of them, and more places to use them, but let's not be too hasty in throwing every shiny feature in at once.
- No one has really weighed in on nested patterns for captured variables. Any thoughts there?
They feel almost-essential to me, because it feels odd to have to choose to either specify an element or capture it, as in "$foo is [int, string]" vs "$foo is [$a, $b]". On the other hand, in their full glory they allow for some extremely complex patterns, and that might put some people off.
As with variables in patterns, we probably need to at least consider how to write them before committing to the overall syntax. e.g. my earlier suggestion of "$foo=" for binding fits very naturally with "$foo=int"; but if we had ">>$foo" it would feel more natural to write "int>>$foo", and would that order be better or worse?
Regards,
Rowan Tommins
[IMSoP]
This morning, while thinking about "new Pattern($pattern)," it
occurred to me: why not create an OOP extension of patterns? It
wouldn't need an RFC or language changes and would allow the
development of patterns to the point where adding it to the core
language may be a no-brainer. You (and other users of it) would have a
pretty good idea of which patterns are actually important and useful
due to actually using them.
It's probably the "long way around" but at that point, you'd basically
just be pulling in the extension and discussing syntax as the behavior
would be well-defined and battle-tested.
Robert Landers
Software Engineer
Utrecht NL
On Tue, Jun 25, 2024 at 9:44 AM Robert Landers landers.robert@gmail.com
wrote:
On Tue, Jun 25, 2024 at 2:48 AM Larry Garfield larry@garfieldtech.com
wrote:
To that end, we're looking for very high level feedback on this RFC:
Are you referring to "new Pattern($pattern)" specifically for regexes
(similar to JS), or pattern matching in general? Is this also pointing at
putting the behavior of pattern matching in an object that effectively
executes the pseudo code from the examples, with the actual pattern
matching being a syntactic sugar to be added at a later stage?
- The placement of "is" on match() is still an open question.
"The latter is more explicit, and would allow individual arms to be
pattern matched or not depending on the presence of is."
Which would make it probably more useful, so I would be in favour of
having each arm have the "is".
- No one has really weighed in on nested patterns for captured
variables. Any thoughts there?
In your example, you have:
if ($foo is Foo{a: @($someA), $b is Point(x: 5, y: @($someY)) }) {
And my brain can't parse that. I would have no idea what that means at
first sight.
In addition to just normal captured variables:
"For object patterns (only), if the variable name to extract to is the
same as the name of the property, then the property name may be
omitted."
This is one I don't like. It adds another syntax, and IMO also makes
something less obvious what is happening. Too much magic™.
So please:
if ($p is Point {z: $z, x: 3, y: $y} ) {
And not:
if ($p is Point {$z, x: 3, $y} ) {
cheers,
Derick
“Array-application will also be pushed to future-scope”
Dang, this is what I was looking forward to the most. Helping set some precedent on this issue, and this is a solid start. I’ve been thinking about what it would look like for strictly typed arrays in PHP for a while now. I think a custom implementation like SplObjectStorage or SplFixedArray could help performance since we now have a fixed list and no need for a hash table, and it may solve some complexity in implementation. This is slightly off-topic, but if anyone is interested in working on a typed arrays initiative, please contact me directly.
That said, I agree with Robert Landers, who wrote the following:
There's no need to use ? to check for existence on a key, so this:
$arr is ['a' => string, ?'b' => string, ...];
should be this:
$arr is ['a' => string, 'b' => ?string, …];
Chuck Adams cleared that up with:
The first means b is an optional key, but if it’s there, can only be a string. The second says b is a required key, but it may be a string or null. If there were a binding involved, that determines the type of the binding in incompatible ways.
I think there is a need to ensure a key does not exist, even if we’re not happy about this syntax, but if not, having $arr is ['a' => string, 'b' => ?string, …] would still make me very happy.
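To make the distinction Chuck describes concrete, here is a rough emulation of the two patterns using only existing PHP functions. The helper names matchesOptionalKey and matchesNullableValue are made up for this sketch, and it assumes the trailing "..." means extra keys are allowed.

```php
<?php
declare(strict_types=1);

// Hypothetical helpers emulating the two proposed array patterns.

// ['a' => string, ?'b' => string, ...]:
// 'b' may be absent, but if present it must be a string.
function matchesOptionalKey(array $arr): bool
{
    return array_key_exists('a', $arr) && is_string($arr['a'])
        && (!array_key_exists('b', $arr) || is_string($arr['b']));
}

// ['a' => string, 'b' => ?string, ...]:
// 'b' must be present, but its value may be null or a string.
function matchesNullableValue(array $arr): bool
{
    return array_key_exists('a', $arr) && is_string($arr['a'])
        && array_key_exists('b', $arr)
        && ($arr['b'] === null || is_string($arr['b']));
}
```

An array with only the 'a' key satisfies the first helper but not the second, and an array where 'b' is explicitly null satisfies the second but not the first. That is why the two forms cannot simply be swapped for one another.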
Best,
Richard Miles
To that end, we're looking for very high level feedback on this RFC:
Hi folks. Thank you to those who have offered feedback so far. Based on the discussion, here's what we're thinking of doing (still subject to change, of course):
We're going to move "as" to future-scope. There's enough weirdness around it that is independent of pattern matching itself that it will likely require its own discussion and RFC, and may or may not involve full pattern support.
Similarly, we're going to hold off on the weak-mode flag. It sounds like the language needs to do more to fix the definition of "weak mode" before it's really viable. :-( On the plus side, if the type system itself ever adds support for a "coercion permitted" flag, patterns should inherit that naturally, I think.
Array-application will also be pushed to future-scope. Again, there's enough type-system tie in here that is tangential to patterns that we'll pick that fight later.
Ilija and I have discussed regex patterns a bit further, and it sounds like they’re going to be rather complicated to implement. Even assuming we agree on the syntax for it, it would be a substantial amount of code to support. (It’s not like types or literals or range where we can just drop something pre-existing into a new function.) So we’re going to hold off on this one for now, though it does seem like a high-priority follow-up for the future. (Which doesn’t have to be us!)
So let's not discuss the above items further at this point.
I'm going to do some additional research into other languages to see how they handle binding vs using variables from scope, and what syntax markers they use and where. Once we have a better sense of what is out there and is known to work, we can make a more informed plan for what we should do in PHP. (Whether using a variable from scope in the pattern is part of the initial RFC is still an open question, but we do need to design it alongside the capture part to ensure they don't conflict.) Stay tuned on this front.
We've removed the dedicated wildcard pattern, as it's equivalent to "mixed". If there's interest, we're open to having a secondary vote to bring it back as a short-hand pattern. It's trivial to implement and we don't have strong feelings either way.
There's not been much discussion of range patterns. Anyone want to weigh in on those?
The placement of "is" on match() is still an open question.
No one has really weighed in on nested patterns for captured variables. Any thoughts there?
I’ve seen a suggestion for capturing the “rest” of an array when using … That’s an interesting idea, and we’ll explore it, though it looks like it may have some interesting implications that push it to future scope. It feels like a nice-to-have.
Thanks all.
--Larry Garfield
After a coffee break, I think this is how the language could do this in a semantically pleasing way.
interface iArrayA ['a' => string ]
interface iArrayB extends iArrayA ['b' => string ]
$arr is iArrayA &| iArrayB
Best,
Richard Miles
“Array-application will also be pushed to future-scope”
Dang, this is what I was looking forward to the most. Helping set some precedent on this issue, and this is a solid start. I’ve been thinking about what it would look like for strictly typed arrays in PHP for a while now. I think a custom implementation like SplObjectStorage or SplFixedArray could help performance since we now have a fixed list and no need for a hash table, and it may solve some complexity in implementation. This is slightly off-topic, but if anyone is interested in working on a typed arrays initiative, please contact me directly.
Just a good ole' circular buffer would be nice for so many things (fiber microwork queues, etc) :)
That being said, you can abuse arrays to get better performance than SPL (or at least the same) for certain data structures (e.g., priority queues/heaps: https://withinboredom.info/2022/09/04/algorithms-in-php-priority-queues-and-heaps/ -- the listing isn't complete or fully working, FWIW). Those little arrays have much power, and I love them for some tasks.
That said, I agree with Robert Landers, who wrote the following:
There's no need to use ? to check for existence on a key, so this:
$arr is ['a' => string, ?'b' => string, ...];
should be this:
$arr is ['a' => string, 'b' => ?string, …];
Chuck Adams cleared that up with:
The first means b is an optional key, but if it’s there, can only be a string. The second says b is a required key, but it may be a string or null. If there were a binding involved, that determines the type of the binding in incompatible ways.
I think there is a need to ensure a key does not exist, even if we’re not happy about this syntax, but if not, having $arr is ['a' => string, 'b' => ?string, …] would still make me very happy.
The only time I can think of this being important (whether or not a key exists) is during serialization/deserialization, where you may want to leave something at its default value. In that case, you likely won't know what the keys are supposed to be in the first place (and thus can't use them in something like "is" or "as"). If you did know them, this would be perfectly reasonable:
$keys = array_keys($arr) is [ "a", ?"b", ?"c"];
It's still weird-looking, but it tells you that the keys may or may not exist, and it is consistent with itself. However, what if the array is in a different order or missing keys (e.g., ["c", "a"])? You can see how this gets complicated pretty quickly. If I am getting hung up on $arr['non-existent-key'] technically being null, you can imagine how someone else -- maybe even me if I thought about it long enough -- would argue that ["a", ?"b", ?"c"] may or may not match ["a", "c"], as the arguments for either reading are pretty strong and the use cases for each are just as valuable. It's a good call to punt it for deeper discussion later.
Best,
Richard Miles
Please remember to bottom-post ;)
— Rob
On Thursday, 20 June 2024, 19:38:40 UTC+2, Larry Garfield wrote:
To that end, we're looking for very high level feedback on this RFC:
Hello,
Thank you for this RFC.
Sorry if that’s a bit focused on syntax, but I’m really concerned by the binding syntax.
I would totally expect both of these snippets to do the same thing:
if ($o is Class{prop: 3}) {
and:
$x = 3;
if ($o is Class{prop: $x}) {
I’ve seen someone else in the discussion propose inverting the logic between binding and using a variable, and I agree: it’s the binding/capture that should have the special syntax.
That could be an operator other than ':' (I’ve seen '=>' proposed), or any other idea to differentiate it.
'=>' would allow combining match and capture like this:
$o is Class{prop: int => $y}
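To spell out the two readings in today's PHP, here is a small sketch. The Item class and the functions matchAgainst and matchAndCapture are hypothetical names used only to emulate the semantics; they are not proposed API, and the capture is modelled with a by-reference parameter purely for illustration.

```php
<?php
declare(strict_types=1);

// Hypothetical class standing in for "Class" in the examples above.
final class Item
{
    public function __construct(public mixed $prop) {}
}

// Reading 1: `prop: $x` compares the property against the value already in $x.
function matchAgainst(object $o, mixed $x): bool
{
    return $o instanceof Item && $o->prop === $x;
}

// Reading 2: `prop: int => $y` checks the type and, only on success, captures into $y.
function matchAndCapture(object $o, mixed &$y): bool
{
    if ($o instanceof Item && is_int($o->prop)) {
        $y = $o->prop;
        return true;
    }
    return false;
}
```

Under the first reading the variable is only read; under the second it is only written, and it is left untouched when the match fails. Giving the capture its own marker keeps the two from being confused at a glance.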
Côme