Hi Internals,
I struggle to understand the benefit of "basic" enumerations and their
diminished API. In particular, I often find myself wanting to use
from()/tryFrom() to convert a string to an enumeration. To do this,
I must convert it to a "backed" enum and copy & paste each name to its
value. In all other regards, I still want it to behave like a "basic"
enumeration, so I won't abuse the value of the names to act like a
mapping; the values will always mirror the names, and if I need to do
any mappings, I'll add match() functions.
My question, then, is why can't basic enumerations have these semantics
by default? Or, to state it more concretely, what would be the downside
to having all "basic" enumerations actually being "backed" enumerations
whose values implicitly mirror their names for the purposes of
converting to/from strings? Would this not make basic enumerations more
useful without any particular downsides?
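The duplication described above can be sketched as follows. The enum names Suit and BackedSuit are hypothetical and only serve as illustration; the sketch shows current behaviour, not the proposed one.

```php
<?php
// Hypothetical "basic" (unit) enum: it has no from()/tryFrom().
enum Suit
{
    case Hearts;
    case Spades;
}

// To gain from()/tryFrom() today, each case name is repeated as its value.
enum BackedSuit: string
{
    case Hearts = 'Hearts';
    case Spades = 'Spades';
}

var_dump(method_exists(Suit::class, 'tryFrom')); // bool(false)
var_dump(BackedSuit::from('Hearts'));            // enum(BackedSuit::Hearts)
var_dump(BackedSuit::tryFrom('Clubs'));          // NULL
```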
Kind regards,
Bilge
Making enums not be "fancy strings" was a very deliberate decision. The RFC covers that some. There's more information in our comparison research here:
https://github.com/Crell/enum-comparison
And I wrote an article about enum usage a while back here:
https://peakd.com/hive-168588/@crell/on-the-use-of-enums
--Larry Garfield
Hi Larry,
Thanks for the resources! Whilst I can appreciate this was a deliberate
design decision that did not come about by accident, I still didn't find
(skimming) anything that directly answers the question:
What would be the downside to having all "basic" enumerations actually
being implicitly "backed" enumerations?
I gather from your (presumably derogatory) referencing of the same as
"fancy strings" that you would not approve such an implementation, but I
am struggling to understand why.
Cheers,
Bilge
P.S. Sorry for the (previously) incomplete subject line.
What would be the downside to having all "basic" enumerations actually
being implicitly "backed" enumerations?
I'm not Larry but I see a lot of value in knowing that a unit value was explicitly constructed in code, and where the value cannot originate from a conversion of some input. It is particularly useful when refactoring in a large codebase to know that every way SomeEnum::A can be constructed will contain an explicit reference to that case in code.
The uses you suggest are not without merit, of course! And that's why I'm also grateful for backed enums as an option where they make sense (e.g. for values which are ultimately stored in a database).
mjec
"Fancy strings" isn't derogatory. It's an accurate description of how enums work in some languages. (Although usually they're "fancy ints", rather than strings.) The model PHP went with is "fancy objects" (also used by most other languages at this point).
From the article I linked:
Enumerations take that a step further by allowing you to define an entirely new type space of values. Just as a Request object is not an integer, neither is a Direction enumeration with values Up and Down an integer. If a function takes an integer as a parameter, and you try to pass it a Request, both the code and the developer are going to just look at you funny. The exact same concept applies if you try to pass a Direction enum to an integer parameter.
And also see the "These things are not the same" section.
In your case, you want to "upcast" a string to an enum. That means you're doing some sort of deserialization, presumably. In that case, a backed enum is what you want. A unit enum isn't serializable, by design.
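As a rough illustration of the "enums are their own type" point, the sketch below passes an enum case to an int parameter. The function name repeatStep() is made up for this example; Direction follows the article's naming.

```php
<?php
enum Direction
{
    case Up;
    case Down;
}

// Hypothetical function that only accepts an integer.
function repeatStep(int $times): int
{
    return $times * 2;
}

$caught = false;
try {
    repeatStep(Direction::Up); // a Direction is not an int
} catch (TypeError $e) {
    $caught = true;
    echo "TypeError: a Direction is not an int\n";
}

var_dump(Direction::Up instanceof UnitEnum); // bool(true)
var_dump(is_string(Direction::Up));          // bool(false)
```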
Now, it's true we didn't try to come up with all the possible shorthands that might make sense. I could see an argument for auto-populating the backing value off the enum name if it's not specified, something like this:
enum Options: string {
    case First; // This implicitly gets "First"
    case Second = '2nd';
}
We left that out of the original RFC to keep things simple and avoid (well, reduce) bikeshedding side quests. However, I think it's a fair discussion to have on whether it makes sense to do that, or similar. I'm not sure if I'd support it myself at the moment (there would also be questions about what happens with an int-backed enum), but I think it's a reasonable discussion to have.
--Larry Garfield
I could see an argument for auto-populating the backing value off the
enum name if it's not specified, something like this:
enum Options: string {
case First; // This implicitly gets "First"
case Second = '2nd';
}
This seems like a reasonable compromise. In this case, all I need to do
is change my enum to a backed enum (suffix ": string") and I get the
benefits of implicit values. I still like the idea of the same being
possible for non-backed enums, though I imagine that is a product of my
naïveté, as I do not tend to think of things in the framing of
(de)serialization.
I'm not sure if I'd support it myself at the moment
Noted, but I once again find myself needing to ask: why not? Were it up
to me, I'd say let's start right now! :)
Aside, I am not at all concerned with integer-backed enums at this
juncture, and presume that could be a separate discussion/implementation
anyway.
Cheers,
Bilge
On 22/05/2024 00:31, Larry Garfield wrote:
I could see an argument for auto-populating the backing value off the enum name if it's not specified, something like this:
enum Options: string {
    case First; // This implicitly gets "First"
    case Second = '2nd';
}
Making it really hard and unnatural for people to use PHP enums as "cheap named strings/ints" is something we want to keep, because the whole value of enums is that they are their own separate type. A Direction is not a string, it's not an int, it's not an array, it's not a Product, it's a Direction. That's the end of it. Databases and HTML don't know from Direction or Product, though, so it has to get serialized down to something for those. For arbitrary objects, you basically have to write your own mechanism. (I did.) For enums, one comes built-in to make it more standardized: backed enums.
I'm not sure if I'd support it myself at the moment
Noted, but I once again find myself needing to ask: why not? Were it up
to me, I'd say let's start right now! :)
Mainly because I haven't thought it through to see what possible issues it could cause. It may be safe, or there may be currently-not-obvious issues that would result. That's the sort of thing we'd need to explore.
For instance, how useful would it be, given that the casing for an enum should be CamelCase (per PER-CS), but the serialized string is most often snake_case, and otherwise lowerCamel? We definitely cannot bake case folding magic into the behavior. So is it even useful at that point? I don't know. Maybe. That's what needs to be explored.
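To make the casing concern concrete, here is a small sketch using a hypothetical OrderStatus enum whose stored strings are snake_case. It only shows today's explicit form; a value mirrored from the case name would be "AwaitingPayment", which would not match the stored string.

```php
<?php
// Hypothetical enum: PascalCase case names, snake_case serialized values.
enum OrderStatus: string
{
    case AwaitingPayment = 'awaiting_payment';
    case Shipped = 'shipped';
}

// Lookups compare the backing value exactly, with no case folding.
var_dump(OrderStatus::tryFrom('awaiting_payment')); // enum(OrderStatus::AwaitingPayment)
var_dump(OrderStatus::tryFrom('AwaitingPayment'));  // NULL
```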
It's not something I'm planning to work on myself at this point, though if someone else wanted to dig into it I'm happy to help brainstorm with them.
--Larry Garfield
given that the casing for an enum should be CamelCase (per PER-CS)
Hi Larry;
I find myself yet again having to ask that php policies/discussions not revolve around the idea that PHP-FIG is a required/expected part of PHP usage.
Until a PHP RFC specifying "proper" casing for userland enums passes, can we leave the claims about what they "should be" out of discussions about language/stdlib functionality?
Cheers
Stephen
The status quo in the ecosystem is relevant to language development. FIG is a part of that ecosystem. "Everyone in Laravel does X" or "this would break Symfony which does Y" are also relevant observations to make, though in neither case is it a binding diktat, of course. By a similar token, the language doesn't require class-per-file, but the de facto standard for virtually every project that isn't WordPress is to use class-per-file for autoloading. It would be highly stupid of us to ignore that fact when discussing autoloader improvements.
The Enum RFC used PascalCase. The PHP manual uses PascalCase. We're already recommending PascalCase as the standard for enum cases.
Those who aren't following that recommendation are, from what I've seen, using ALL_CAPS. Meaning using lower_case is NOT typical, and thus the issue I mentioned (that automatically using the case name as the backing string name may not be all that useful) is present either way.
--Larry Garfield
Hi Larry,
I didn't say the community or common uses should be ignored. I just asked you not to use the phrase "X should be Y because of <external entity>".
It suggests authority where none exists.
Cheers
Stephen
I think you're reading far more "intent" or "enforcement" into my parenthetical than was intended or appropriate.
--Larry Garfield
Making it really hard and unnatural for people to use PHP enums as "cheap named strings/ints" is something we want to keep, because the whole value of enums is that they are their own separate type.
For what it's worth, the biggest downside to this decision is in
upgrading old/legacy projects to use enums. Most of the time, you want
to convert a const to an enum, and that also usually means it will be
a backed enum. Since there's no way to cast a backed enum to a
string/int, you have to instead add ->value (I really don't understand
the difference; casting would have been much more elegant); this
almost always results in lots of hidden runtime errors. At the last
couple of places, after seeing the damage the conversion could do,
doing this conversion was simply off-limits. Maybe by now, there is
some reliable tooling to handle this automatically... I haven't
looked. The point remains that it is a strange decision to use ->value
but not allow (string) or (int), or even allow implementing it yourself.
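A minimal sketch of the behaviour described above, using a hypothetical Status enum and describe() helper: reading ->value works, while a (string) cast throws at runtime. Enums also cannot declare __toString(), which is why the cast cannot be implemented in userland.

```php
<?php
// Hypothetical backed enum replacing a legacy string constant.
enum Status: string
{
    case Active = 'active';
}

function describe(Status $status): string
{
    // The backing value has to be read explicitly.
    return 'status=' . $status->value;
}

echo describe(Status::Active), "\n"; // status=active

$error = null;
try {
    $text = (string) Status::Active; // no implicit conversion exists
} catch (Error $e) {
    $error = $e;
}
echo $error === null ? "cast worked\n" : 'cast failed: ' . get_class($error) . "\n";
```

Because the failure only surfaces when the line runs, a missed ->value in rarely executed code is the kind of hidden runtime error mentioned above.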
For instance, how useful would it be, given that the casing for an enum should be CamelCase (per PER-CS), but the serialized string is most often snake_case, and otherwise lowerCamel? We definitely cannot bake case folding magic into the behavior. So is it even useful at that point? I don't know. Maybe. That's what needs to be explored.
I don't see how casing of the backed value is relevant. If you want it
to be something different, then specify it in the class body. 9/10 of
the time, you just need to serialize the enum somewhere and the casing
doesn't matter. If it does matter, then specify the values manually.
For what it's worth, the biggest downside to this decision is in upgrading old/legacy projects to use enums. Most of the time, you want to convert a const to an enum, and that also usually means it will be a backed enum.
Union types are the correct solution here:
function move(string $direction, Player $player) { ... }
becomes
function move(Direction|string $direction, Player $player) {
    if (is_string($direction)) {
        $direction = Direction::from($direction);
    }
    // We now have a normalized enum to work with.
}
The only catch there is inheritance, where it works fine if you just expand the implementers/children first, then the interface/parent. Then you can remove the string option at some point in the future, going the other way (interface/parent, then implementer/child).
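The "children first" ordering can be sketched like this. The Mover interface and the Player implementation are assumptions made for the example; the point is only that an implementer may widen a parameter type while the interface still declares string.

```php
<?php
enum Direction: string
{
    case Up = 'up';
    case Down = 'down';
}

// Hypothetical interface that has not been migrated yet.
interface Mover
{
    public function move(string $direction): string;
}

final class Player implements Mover
{
    // Step 1: widen the implementer. Parameter types are contravariant,
    // so Direction|string is compatible with the interface's string.
    public function move(Direction|string $direction): string
    {
        if (is_string($direction)) {
            $direction = Direction::from($direction);
        }
        // From here on we only deal with the enum.
        return 'moving ' . $direction->name;
    }
}

$player = new Player();
echo $player->move('up'), "\n";            // moving Up
echo $player->move(Direction::Down), "\n"; // moving Down
```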
--Larry Garfield
As a workaround, you can use something like the trait below.
trait SerializableEnum
{
    // No property is declared here: enums may not use traits that declare
    // properties, and $name is already provided by the enum itself.

    /** @return list<self> */
    abstract public static function cases(): array;

    public function toString(): string
    {
        return $this->name;
    }

    public static function from(string $name): self
    {
        return self::tryFrom($name) ?? throw new ValueError(sprintf(
            '"%s" is not a valid backing value for enum %s',
            $name,
            self::class,
        ));
    }

    public static function tryFrom(string $name): ?self
    {
        // Linear scan over all cases, comparing by case name.
        foreach (self::cases() as $case) {
            if ($case->name === $name) {
                return $case;
            }
        }
        return null;
    }
}

enum ExampleEnum
{
    use SerializableEnum;

    case ONE;
    case TWO;
    case THREE;
}

var_dump(ExampleEnum::from('ONE'));  // enum(ExampleEnum::ONE)
var_dump(ExampleEnum::from('FOUR')); // throws ValueError
Perhaps not as clean and easy as the functionality being built-in, but it gets the job done. :-D
Aaron Piotrowski
Hi
Perhaps not as clean and easy as the functionality being built-in, but it gets the job done.
I would suggest to use the built-in functionality then.
Enum cases are literally just class constants, thus you can access them
via the constant() function or the dynamic class constant fetch syntax
for PHP 8.3+ and check their existence with defined():
<?php
enum ExampleEnum
{
    case ONE;
    case TWO;
    case THREE;
}

$caseName = 'ONE';
var_dump(defined(ExampleEnum::class . "::{$caseName}"));
var_dump(constant(ExampleEnum::class . "::{$caseName}"));
var_dump(ExampleEnum::{$caseName});
Outputs:
bool(true)
enum(ExampleEnum::ONE)
enum(ExampleEnum::ONE)
Best regards
Tim Düsterhus
Hey Tim,
This solution is flawed. Not every constant is necessarily an enum case. It also isn't type-safe or as nice to read as tryFrom() and from() static methods.
Cheers,
Aaron Piotrowski
Hi
This solution is flawed. Not every constant is necessarily an enum case. It also isn't type-safe or as nice to read as tryFrom() and from() static methods.
Having regular class constants on an enum is somewhat questionable [1],
but fair enough. In that case an instanceof ExampleEnum check will do
the trick.
But you don't need to iterate on all the enum cases, checking them in
sequence, like in your suggestion.
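Combining defined(), constant() and the instanceof check could look roughly like the sketch below. The helper name caseFromName() and the DEFAULT_LABEL constant are invented for the example.

```php
<?php
enum ExampleEnum
{
    case ONE;
    case TWO;
    case THREE;

    // A regular class constant that is not an enum case.
    const DEFAULT_LABEL = 'n/a';
}

// Hypothetical helper: look a case up by name without iterating cases().
function caseFromName(string $name): ?ExampleEnum
{
    $constant = ExampleEnum::class . '::' . $name;
    if (!defined($constant)) {
        return null;
    }
    $value = constant($constant);
    // Reject constants that exist but are not enum cases.
    return $value instanceof ExampleEnum ? $value : null;
}

var_dump(caseFromName('ONE'));           // enum(ExampleEnum::ONE)
var_dump(caseFromName('FOUR'));          // NULL
var_dump(caseFromName('DEFAULT_LABEL')); // NULL
```

A constant that aliases a case, as in footnote [1], would pass the instanceof check and resolve to the aliased case.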
As for the initial suggestion of implicitly backing enums: I'm strongly
against that. Personally I find that backed enums are almost never
useful. Serialization is more reliably performed using the case name and
often there is more than one reasonable scalar representation and then
you need regular methods to obtain the different representations.
Singling out one of those scalar representations as the “blessed”
representation to store in the backed value feels semantically incorrect
to me.
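One way to read this: a unit enum can expose several scalar representations through ordinary methods, with none of them singled out as the backing value. The Priority enum and its method names below are purely illustrative.

```php
<?php
// Hypothetical unit enum with more than one reasonable scalar form.
enum Priority
{
    case Low;
    case High;

    // Representation for user interfaces.
    public function label(): string
    {
        return match ($this) {
            self::Low => 'Low priority',
            self::High => 'High priority',
        };
    }

    // Representation for sorting or numeric storage.
    public function weight(): int
    {
        return match ($this) {
            self::Low => 10,
            self::High => 90,
        };
    }
}

echo Priority::High->name, "\n";     // High
echo Priority::High->label(), "\n";  // High priority
echo Priority::High->weight(), "\n"; // 90
```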
Best regards
Tim Düsterhus
[1] Except possibly to alias an enum case for BC purposes.
As for the initial suggestion of implicitly backing enums: I'm
strongly against that. Personally I find that backed enums are almost
never useful.
Fair enough; I think we've somewhat moved on from the initial suggestion
of implicitly backed enums, towards implicit values for string enums
(for those names without explicitly defined values). What is your
opinion on this alternative approach?
Cheers,
Bilge
Hi
Generally not a fan of implicit. I like my code to be easy to read and
reason about, not easy to write (and hard to read).
Best regards
Tim Düsterhus
I could see an argument for auto-populating the backing value off the enum name if it's not specified, something like this:
enum Options: string {
case First; // This implicitly gets "First"
case Second = '2nd';
}
This reminds me of the short-hand key-value syntax that JavaScript
allows, and people have occasionally requested equivalents for in PHP,
where { foo } is equivalent to { 'foo': foo }
The downside I see to all such short-hands is that they make it much
harder to refactor safely, because the identifier and the string value
are tied together.
For instance, maybe you want to rename Options::First to Options::Legacy
and Options::Second to Options::Modern so you edit the enum, and find
all references in code:
enum Options: string {
case Legacy;
case Modern = '2nd';
}
But now everywhere you've serialized the old value of "First" is going
to break, because the first case now has the implicit backing value of
"Legacy" instead!
To avoid this, you have to go ahead and specify all the backing values:
enum Options: string {
case Legacy = 'First';
case Modern = '2nd';
}
Having to specify both the name and value in the first place makes that
decision much more obvious, for what seems to me to be very little
up-front cost.
Regards,
--
Rowan Tommins
[IMSoP]
Hi
A unit enum isn't serializable, by design.
A unit enum is perfectly serializable: https://3v4l.org/Mf9Ou
<?php
enum MyUnitEnum {
case Foo;
case Bar;
}
var_dump($serialized = serialize(MyUnitEnum::Foo));
var_dump(unserialize($serialized) === MyUnitEnum::Foo);
Outputs:
string(22) "E:14:"MyUnitEnum:Foo";"
bool(true)
Best regards
Tim Düsterhus