[VOTE] match expression

5 years ago by someniatko@gmail.com

Hi!

I am quite new to the internals list, but wanted to say a word, to
maybe shed some light on the discussion and let you look at it from a
slightly different angle.

I have noticed that this whole thread is arguing about whether match
needs to support an if-else-ish control-flow style: in other words,
whether match arms should support not only expressions but also
arbitrary statements.

As we know, each PHP expression can be used as a statement. We also
know that a large subset of the statements used in practice (function
calls, variable assignments) can be written in expression form. This
subset also tends to grow: recently, throwing an exception became
possible in expression form.

This means we can write code like this:

match($state) {
    State::VALID => processFurther(),
    State::PENDING => $isPending = true,
    State::INVALID => throw new \RuntimeException(),
};

IMO this covers the vast majority of use-cases of the PHP statements.
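For reference, a runnable sketch of the snippet above, assuming PHP 8.0 or later (where match and throw expressions are available). The State class, processFurther() and handle() are hypothetical stand-ins added only so the example runs on its own.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the State constants used in the snippet above.
final class State
{
    public const VALID = 'valid';
    public const PENDING = 'pending';
    public const INVALID = 'invalid';
}

// Hypothetical helper; it returns a value so the effect is observable.
function processFurther(): string
{
    return 'processed';
}

// Returns the value of the match and the flag set by the assignment arm.
function handle(string $state): array
{
    $isPending = false;
    $result = match ($state) {
        State::VALID => processFurther(),      // function call
        State::PENDING => $isPending = true,   // assignment used as an expression
        State::INVALID => throw new \RuntimeException('invalid state'), // throw expression
    };

    return [$result, $isPending];
}
```

Each arm is a single expression, so no block syntax is needed for these three kinds of "statement".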

The cases which are not covered are:

1. Statements which cannot be expressed as expressions: echo, unset.
2. Control flow which requires more than one statement, e.g.
   assigning two variables, or calling two functions with void return
   type one after another.

I would like to emphasize that this is the same problem short closures
have, and the two should be treated together. Problem no. 1 could be
solved by allowing statements like unset to be used in expression form,
if that's feasible. Problem no. 2 could be addressed by allowing
"complex" expressions consisting of, potentially, a few statements,
language-wide, solving the issue both for short closures and for match,
if it is really needed at all. In any case, an old-school closure
invoked in place can be used as a temporary workaround.
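A minimal sketch of that closure workaround, assuming PHP 8.0 or later. The function describe() and the $log array are hypothetical, chosen only to make the two statements inside the arm observable.

```php
<?php
declare(strict_types=1);

// Hypothetical example: one match arm needs two statements, so it wraps
// them in a traditional closure that is invoked immediately.
function describe(int $code, array &$log): string
{
    return match ($code) {
        1 => 'one',
        2 => (function () use (&$log): string {
            $log[] = 'first statement';
            $log[] = 'second statement';
            return 'two'; // becomes the value of this arm
        })(),
        default => 'other',
    };
}
```

The cost is the boilerplate: the closure must list every outer variable it touches in its use clause.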

> I think that sums it up nicely. Let's also remember that these are
> popular, well maintained repositories and probably don't reflect the
> average code quality very well.

IMO a good language should enforce better quality, especially when we
talk about new language constructs. Short closures already enforce
SRP, at least in the world of closures. Also, the people who would
benefit from the reduced boilerplate that match in expression-only
form provides are usually the people who care about code quality.
Those you describe as "average code quality" would not bother with the
new syntax at all and would still prefer the old-school switch, with
whatever level of entanglement they are used to. If, though, you insist
that match should completely replace the old switch, and short closures
the traditional ones, you may want to consider solving the "complex
expression" problem once for both of them, and moving it out of the
scope of this RFC. This would allow for gradual improvement of the
language, from which both parties would benefit: those who want to
write clean code (which is usually split up into functions, so that
match can be used straight away with only one expression per arm), and
those who use complex logic in switches, who would be able to do the
same in match later, once "complex expressions" are implemented.

someniatko

5 years ago by someniatko@gmail.com

Meh, sorry, I top-posted by accident. I did not have the e-mails to
respond to (because I have just subscribed to the ML), so I thought
that if I posted with the same subject, my mail would be added to the
same thread.

5 years ago by Ilija Tovilo

Hi someniatko

I think you have a firm grasp of the key issues but I don't agree with
your conclusion.

> Problem no. 2 could be addressed by
> allowing "complex" expressions consisting of, potentially, a few
> statements, language-wide, solving the issue both for short closures
> and for match

I analysed this approach a while back and I don't think there is
a universal and elegant solution. I have mentioned this on the list
but never fully explained it.

Upfront, when I say "block expression" I'm talking about a block {}
that contains any number of statements with some terminating
expression that will be returned.

$x = {
    foo();
    bar();
    <= baz();
};

// Result of baz() is now assigned to $x

There are three potential use cases for language wide block expressions.

1. Match expressions
2. Arrow functions
3. Everything else

The problem is that they all have slightly different semantics.

A block expression in a match arm would require a return value only
if the result of the enclosing match is used:

match ($x) {
    1 => {}, // Doesn't require a return value
}

$y = match ($x) {
    1 => {}, // Error, this does require a return value
};

Arrow functions would only require a return value if the function
has an explicit return type (to be consistent with functions and
normal closures)

$x = fn() => {}; // This is fine, the function returns null

// Uncaught TypeError: Return value of {closure}() must be of
// the type int or null, none returned
$x = fn(): ?int => {};

It's very questionable whether we even want to allow block-style
return values in arrow functions.

$x = fn() => {
    foo();
    bar();
    <= baz(); // Why should we allow this? You can just use return
};

For every other expression the return value of the block would
always be required

// All of these are errors, return value is required
$x = {};
foo({});
{} + 1;
// etc.

It's also highly questionable whether use case 3 is actually very
useful at all because PHP doesn't have block scoping and all the inner
variables will leak out into the outer scope. The only potential
improvement here is readability.

$this->foo = {
    $bar = new Bar();
    $foo = new Foo();
    $foo->bar = $bar;
    <= $foo; // Or whatever block syntax
};

// $bar still exists here but it's a little
// more obvious it shouldn't be used anymore
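The leak can be seen today with a plain "statement list" block, which is already valid PHP. A small sketch; the function name buildLeaky() and the array contents are made up for illustration.

```php
<?php
declare(strict_types=1);

// Hypothetical example: braces on their own do not open a new scope.
function buildLeaky(): array
{
    {
        $bar = 'bar';
        $foo = ['bar' => $bar];
    }

    // Both variables assigned inside the braces are still visible here.
    return [isset($bar), $foo];
}
```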

No matter if statement blocks become a language wide feature or not,
we won't get around handling these cases slightly differently.

An additional complication is that blocks already exist as "statement
list" statements:
https://github.com/php/php-src/blob/php-7.4.5/Zend/zend_language_parser.y#L427

function foo() {
    {
        // This is a "statement list" statement
    }

    if (true) {
        // This is also a "statement list" statement
    }
}

Whether we'd also (a) convert these to block expressions or (b) keep
"statement list" statements and block expressions separate is unclear.
If we do convert "statement list" statements to block expressions,
we'll have to make the semicolon of a statement-level block expression
optional to avoid BC breaks. I received a lot of criticism for the
same thing in this RFC. If we don't convert "statement list"
statements to block expressions, empty blocks won't become valid syntax
in match arms and we'll have to explicitly allow them in the grammar
(which is what this RFC is doing right now).

(a)

{
    // Blocks are expressions now, expressions at a statement level
    // require a semicolon.
    // The semicolon must stay optional or we'll have a BC break.
}

(b)

{
    // This is still a statement
}

match ($x) {
    1 => {
        // Also a statement which means it can't be used here unless
        // we explicitly allow expressions AND "statement list" statements
    },
}

To summarize, blocks are only really useful in match arms and arrow
functions, and they behave differently even in just these two cases.
While I wouldn't mind language-wide blocks, it isn't the universally
elegant solution people make it out to be.

This is the best explanation I could give. Let me know if it's still
not completely clear.

> IMO this covers the vast majority of use-cases of the PHP statements.

Not really, if you look at Nikita's analysis of 50 random switch
statements. It only covers ~40%.

> IMO a good language should enforce better quality.

The main issue I have with this is that good code quality doesn't look
the same to everybody. Even programmers with decades of experience
will disagree on fundamentals. I'd be very hesitant to say that moving
two lines into a function is an improvement, especially if it's only
called once. The only thing it does is disrupt the reading flow.

match ($x) {
    1 => {
        foo();
        bar();
    },
}

vs.

match ($x) {
    1 => fooAndBar(),
};

function fooAndBar() {
    foo();
    bar();
}

I don't think using a closure for all of these cases is a viable solution.

> Also, those people who would benefit
> from less boilerplate which match in expression-only form provides,
> are usually the people who care about code quality.

It's not that black and white. I work in a lot of legacy projects that
could benefit from match expressions, but it's simply not realistic to
refactor every single switch statement that contains more than a one
liner. Also, I don't always care about code quality to the same degree.
If I write a throwaway script, I wouldn't care if a match arm contains
20 lines, but the safety of match would be useful nonetheless.

Ilija

5 years ago by Rowan Tommins

> There are three potential use cases for language wide block expressions.
>
> 1. Match expressions
> 2. Arrow functions
> 3. Everything else
>
> The problem is that they all have slightly different semantics.
> [...]

I don't think that's actually true. If I'm understanding you right, you're
concerned about two things:

1. Blocks which don't have a return value where one is expected / required.
2. Blocks which do have a return value where one is not expected.

The language already has an established convention for both cases: a
function with no return statement evaluates to NULL in expression context,
and a function with a return value can be used in statement context and the
result discarded. I see no immediate reason block expressions couldn't use
the same rule.
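A short illustration of that existing convention in current PHP; the function names noReturn() and withReturn() are made up for the example.

```php
<?php
declare(strict_types=1);

// Hypothetical functions illustrating the two existing conventions.
function noReturn() {}

function withReturn(): int
{
    return 42;
}

$value = noReturn(); // no return statement: evaluates to null in expression context
withReturn();        // return value is silently discarded in statement context
```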

> $y = match ($x) {
>     1 => {}, // Error, this does require a return value
> };

This could evaluate the block to null, and thus be equivalent to:

$y = match ($x) {
1 => null,
};

> $x = fn() => {}; // This is fine, the function returns null
>
> $x = fn(): ?int => {}; // Uncaught TypeError: Return value of
> {closure}() must be of the type int or null, none returned

I had no idea that was an error; I guess it's the counterpart to ": void" -
a style check rather than an actual return type check. But I don't see a
particular problem with a short closure giving the same error as the
equivalent named function (function foo(): ?int {}) so there doesn't seem
to be anything extra to define here.
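The named-function counterpart can be checked directly. A small sketch; callAndReport() and explicitNull() are hypothetical helpers, and the exact error message differs between PHP versions, so only the exception class is inspected.

```php
<?php
declare(strict_types=1);

// Declares a nullable int return type but never returns anything.
function foo(): ?int {}

// Same signature, but with an explicit "return null;".
function explicitNull(): ?int
{
    return null;
}

// Hypothetical helper: calls $f and reports whether a TypeError was thrown.
function callAndReport(callable $f): string
{
    try {
        $f();
        return 'no error';
    } catch (\TypeError $e) {
        return 'TypeError';
    }
}
```

Falling off the end of foo() raises the error at call time, while the explicit "return null;" satisfies the same declaration.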

> $x = fn() => {
>     foo();
>     bar();
>     <= baz(); // Why should we allow this? You can just use return
> };

Because right now, you can't use return; there are no block bodied short
closures. If we did allow "return" here, there's no fundamental reason
not to also allow it in a match expression, meaning "return this as the
value of the match expression" (we might not want to reuse the keyword,
but we could).

> // All of these are errors, return value is required
> $x = {};
> foo({});
> {} + 1;
> // etc.

They would be evaluated as empty statements, and "return" null:
$x = null;
foo(null);
null + 1;

> It's also highly questionable whether use case 3 is actually very
> useful at all because PHP doesn't have block scoping and all the inner
> variables will leak out into the outer scope.
>
> [...]
>
> An additional complication is that blocks already exist as "statement
> list" statements

We could potentially solve both of these by introducing a new syntax which
made something explicitly a block expression. I'm not sure what the keyword
would be; "do" is already used, and "eval" has bad connotations, so I'll
use "block" as a straw man to demonstrate.

// block expression as RHS of assignment
$this->foo = block {
    $bar = new Bar();
    $foo = new Foo();
    $foo->bar = $bar;
    return $foo;
};
// $this->foo has been assigned, $bar and $foo are no longer in scope

// block expression as arm of match expression
$y = match ($x) {
    1 => block {
        foo();
        return bar();
    },
};
// if $x===1, foo() is executed, then $y gets the result of bar()

// block expression as result of short closure
$f = fn($x) => block { foo($x); bar($x); };
$f();

// even if the expression result isn't used, the scoping could apply
if ( foo() ) block {
    $x = 1;
};
// $x is not defined here
// note trailing semi-colon, for the same reason you need one after a
// standard anonymous function definition
// the above is actually equivalent to this:
if ( foo() ) {
    block {
        $x = 1;
    };
}

I don't know if I like this idea, but it would be a consistent
language-wide implementation of the concept with minimal compatibility
impact.
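For comparison, the scoping part of the straw-man "block" keyword can be approximated in current PHP with a closure that is invoked immediately. This is only a rough analogue, not the proposed feature; buildScoped() and the variable names are hypothetical.

```php
<?php
declare(strict_types=1);

// Hypothetical example: an immediately invoked closure gives its inner
// variables their own scope, roughly like the "block" straw man above.
function buildScoped(): array
{
    $foo = (function (): array {
        $bar = 'bar';
        $tmp = ['bar' => $bar];
        return $tmp;
    })();

    // $bar and $tmp were local to the closure and are not defined here.
    return [$foo, isset($bar), isset($tmp)];
}
```

Unlike the straw man, the closure cannot see outer variables unless they are listed in a use clause.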

> It's not that black and white. I work in a lot of legacy projects that
> could benefit from match expressions but it's simply not realistic to
> refactor every single switch statement that contains more than a one
> liner.

To use Larry's codenames, would those specifically benefit from "rustmatch"
(evaluating the switch to an expression) or from "switchng" (a stricter
switch statement)? I'd be interested to see a real-life example where you'd
want both the match to evaluate to a value, and the arms to contain more
than one statement.

Regards,

Rowan Tommins
[IMSoP]