Hi, internals!
Nine years have passed since the last discussions of case-sensitive PHP:
https://externals.io/message/79824 and https://externals.io/message/83640.
Here I would like to revisit this topic.
What is case-sensitive in PHP 8.3:
- variables
- constants (all since https://wiki.php.net/rfc/case_insensitive_constant_deprecation)
- class constants
- properties
What is case-insensitive in PHP 8.3:
- namespaces
- functions
- classes (including self, parent and static relative class types)
- methods (including the magic ones)
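To make the two lists above concrete, here is a minimal sketch of how current PHP 8.x behaves. The class Greeter and the function shout() are made-up names used only for illustration.

```php
<?php
declare(strict_types=1);

// Hypothetical class and function, used only to illustrate the two lists.
final class Greeter
{
    public const PREFIX = 'Hello';
    public string $name = 'internals';

    public function greet(): string
    {
        return self::PREFIX . ', ' . $this->name . '!';
    }
}

function shout(string $text): string
{
    return strtoupper($text);
}

// Case-insensitive today: class, method and function names.
$greeter = new GREETER();
$viaMethod = $greeter->GREET();
$viaFunction = SHOUT('ok');

// Case-sensitive today: properties and class constants.
$hasLowerProperty = property_exists($greeter, 'name');
$hasUpperProperty = property_exists($greeter, 'NAME');
$hasLowerConstant = (new ReflectionClass(Greeter::class))->hasConstant('prefix');
```

With the proposal limited to class names, only the `new GREETER()` line would stop working; the method and function calls would keep their current behaviour.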
Pros:
- no need to convert strings to lowercase inside the engine for name lookups (a small performance and memory gain)
- better fit for case sensitive platforms that PHP code is mostly run on (Linux)
- uniform handling of ASCII and non-ASCII symbols (currently non-ASCII symbols in names are case sensitive: https://3v4l.org/PWkvG)
- PSR-4 compatibility (https://www.php-fig.org/psr/psr-4/#:~:text=All%20class%20names%20MUST%20be%20referenced%20in%20a%20case%2Dsensitive%20fashion)
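The ASCII versus non-ASCII point can be reproduced with a small sketch. It assumes the file is saved as UTF-8, and the class name Über is invented for the example.

```php
<?php
declare(strict_types=1);

// Assumption: this file is saved as UTF-8. The class name is made up.
final class Über {}

// Differs from the declaration only in ASCII letters: still found,
// because the engine lowercases ASCII bytes before the lookup.
$asciiCaseDiffers = class_exists('ÜBER', false);

// Differs in the non-ASCII letter: not found, because bytes outside
// ASCII are left untouched by the engine's lowercasing.
$nonAsciiCaseDiffers = class_exists('über', false);
```

So the same identifier is case-insensitive in some characters and case-sensitive in others, which is the inconsistency the list item refers to.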
Cons:
- pain for users, obviously
- a backward compatibility layer might be difficult to implement and/or
have a performance penalty
On con 1. I think today PHP users are much more prepared for the change:
- more and more projects adopted namespaces and PSR-4 autoloading via Composer, which never supported case-insensitivity (https://github.com/composer/composer/issues/1803, https://github.com/composer/composer/issues/8906) and so forced developers to mind casing
- static analyzers became more popular and they do complain about the wrong casing (see https://psalm.dev/r/fbdeee2f38 and https://phpstan.org/r/1789a32d-d928-4311-b02e-155dd98afbd4)
- Rector appeared (it can be used to automatically prepare the codebase for the next PHP version)
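Why PSR-4 autoloading forces correct casing can be shown with a simplified resolver. This is only a sketch and not Composer's actual implementation; the function name resolvePsr4Path(), the Acme\ prefix and the src directory are assumptions made for the example.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: maps a fully qualified class name to a file path,
 * PSR-4 style. Returns null when the name does not start with the prefix.
 */
function resolvePsr4Path(string $class, string $prefix, string $baseDir): ?string
{
    // Byte-wise, case-sensitive prefix comparison.
    if (!str_starts_with($class, $prefix)) {
        return null;
    }

    $relative = substr($class, strlen($prefix));

    return rtrim($baseDir, '/') . '/' . str_replace('\\', '/', $relative) . '.php';
}

$exact = resolvePsr4Path('Acme\\Log\\Logger', 'Acme\\', 'src');
$wrongPrefixCase = resolvePsr4Path('acme\\log\\logger', 'Acme\\', 'src');
$wrongFileCase = resolvePsr4Path('Acme\\log\\logger', 'Acme\\', 'src');
```

Here $exact is "src/Log/Logger.php", $wrongPrefixCase is null, and $wrongFileCase is "src/log/logger.php", a path that only resolves to the real file on a case-insensitive file system.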
On con 2. While considering different transition options proposed in prior
discussions (compilation flag, ini option, deprecation notice) I stumbled
upon Nikita's comment (https://externals.io/message/79824#79939):
May I recommend to only target class and class-like names for an initial
RFC? Those have the strongest argument in favor of case-sensitivity given
how current autoloader implementations work - essentially the
case-insensitivity doesn't properly work anyway in modern code.
...
I'd also appreciate having a voting option for removing case-insensitivity
right away, as opposed to throwing E_STRICT/E_DEPRECATED. If we want to
change this, I personally would rather drop it right away than start
throwing E_STRICT warnings that would make the case-insensitive usage
impossible anyway.
It makes a lot of sense to me: a fairly simple change in the core and no
performance penalty. At the same time, a gradual approach will reduce
the stress.
So the plan for 8.4 might be to just drop case insensitivity for class
names and that's it... Let's discuss that!
Best regards,
Valentin Udaltsov
I’m not saying I agree with or support this, but I think your proposal has a better chance of being accepted if you target PHP 9.0 instead of 8.4.
Cheers,
Ben
In fact, it's definitely a BC break I would not personally vote for in
8.4. This isn't some minor thing squirreled away in a library--this is
the core language, with wide impact. For this reason, I believe it
should target 9.0.
I will happily vote for this feature, as long as the patch is reasonable.
The most obvious implementation is not very good, though. The engine
uses lowercase names for case insensitivity. Namespaces are embedded
into the type names. To lowercase the namespace but not the type name,
one could do a reverse scan for a namespace separator on the type
name, and then lowercase from the start to the index of the namespace
separator. For example, "Psr\Log\LoggerInterface" needs to become
"psr\log\LoggerInterface". The problem with this is that it's not
really going to save CPU or memory because it still has to lowercase
the namespace.
We could refactor the engine to store the namespace separately from
the type name. This is a lot more work and will increase the size of
some types, which might be difficult at a technical level.
I can't think of other implementations right now. If nobody can come
up with a better implementation, I think we should consider going with
split-sensitivity on namespaces where it matches the sensitivity of
the thing it is attached to. A namespaced class would have a case
sensitive namespace but a namespaced function would still have a case
insensitive one.
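The reverse-scan idea can be sketched in userland PHP. The engine itself is written in C, so this only illustrates the transformation; the function name lowercaseNamespaceOnly() is hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical illustration: lowercases the namespace part of a name and
 * keeps the short type name as written.
 */
function lowercaseNamespaceOnly(string $name): string
{
    // Reverse scan for the last namespace separator.
    $separator = strrpos($name, '\\');
    if ($separator === false) {
        return $name; // No namespace, nothing to lowercase.
    }

    // Lowercase from the start up to the separator. Note that strtolower()
    // is ASCII-only as of PHP 8.2 and locale-dependent before that.
    return strtolower(substr($name, 0, $separator)) . substr($name, $separator);
}

$namespaced = lowercaseNamespaceOnly('Psr\\Log\\LoggerInterface');
$global = lowercaseNamespaceOnly('Countable');
```

$namespaced becomes "psr\log\LoggerInterface" and $global stays "Countable". The lowercasing work on the namespace part is still done on every call, which is the cost mentioned above.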
In reply to Levi Morrison levi.morrison@datadoghq.com (Tue, 11 June 2024, 23:18):
Hi
I'm worried that this will have an impact on Windows (case-insensitive
file system), even if it only affects class names.
It looks like this needs more discussion.
Regards
Yuya
--
Yuya Hamada (tekimen)
In reply to Levi Morrison levi.morrison@datadoghq.com (Tue, 11 June 2024, 17:13),
who was replying to Valentin Udaltsov udaltsov.valentin@gmail.com (Jun 10, 2024, 20:35):
Hi, Ben and Levi! Thank you for your interest!
Could you, please, elaborate on why you propose to target 9.0? That would
make perfect sense if PHP strictly followed semver, but we always have some
BC breaks in minor releases (
https://www.php.net/manual/en/migration82.incompatible.php,
https://www.php.net/manual/en/migration83.incompatible.php). So, is there a
real difference between 8.4 and 9.0 for this case? Or do you mean that this
BC break is way too big for 8.4?
Levi, if we bundle namespaces, classes and functions in a single change,
will that be easier to implement? Basically to remove lowercasing and put
the original type names in the lookup tables?
--
Best regards,
Valentin Udaltsov
Could you, please, elaborate on why you propose to target 9.0? That would make perfect sense if PHP strictly followed semver, but we always have some BC breaks in minor releases (https://www.php.net/manual/en/migration82.incompatible.php, https://www.php.net/manual/en/migration83.incompatible.php). So, is there a real difference between 8.4 and 9.0 for this case? Or do you mean that this BC break is way too big for 8.4?
Generally, the allowed backwards compatibility breaks in minor
versions are also minor breaks. These are mostly changes in extensions
rather than the core language. This change is in the main language and
it's potentially quite a big one.
Additionally, if this RFC were to pass, we would want extra time to
revisit the casing of suspect items for the same version. For example,
Pdo vs PDO. There's just not enough time for PHP 8.4 to do this.
Levi, if we bundle namespaces, classes and functions in a single change, will that be easier to implement? Basically to remove lowercasing and put the original type names in the lookup tables?
Yes, doing it all in one pass is easier to implement, and would
provide minor CPU and memory improvements.
While we do make backwards-incompatible changes in minor PHP versions (and we have done so since the beginning of time; I checked the last time this argument came up), we keep them to a minimum and limit them to "small" BC breaks. The judgement of what "small" means is fuzzy.
Plenty of us thought converting resources to opaque objects was "small", but others disagreed.
And I agree with Levi here: I am in favour of this change, but I don't think it should land in a minor.
For PHP, a namespace is just a prefix on a symbol that makes it distinguishable, and namespaces are already canonicalized to lowercase. This has been an issue when trying to reduce the memory footprint of constants, as the casing of the namespace was lost. [1][2][3]
Indeed, you can access a namespaced constant using two different casings of the namespace. [4]
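A small sketch of the namespaced-constant behaviour on current PHP 8.x follows. The namespace App\Config and the constant LIMIT are invented for the example.

```php
<?php
declare(strict_types=1);

// Hypothetical namespace and constant, used only for illustration.
namespace App\Config;

const LIMIT = 10;

// The namespace part is matched case-insensitively...
$declaredCasing = \App\Config\LIMIT;
$lowerNamespace = \app\config\LIMIT;
$upperNamespace = \APP\CONFIG\LIMIT;

// ...while the constant's own name is case-sensitive.
$shortNameIsCaseSensitive = false;
try {
    $ignored = \App\Config\limit;
} catch (\Error $e) {
    // PHP 8 throws an Error for an undefined constant.
    $shortNameIsCaseSensitive = true;
}
```

All three reads return the same value, while the lowercase constant name fails, so only the short name keeps its original casing.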
One difficulty is that checking the casing of every class and function at runtime to verify conformance would likely lead to a big performance degradation.
It might be possible to check some of these things at compile time (at least for functions) if the class or function is already available in the symbol table.
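For a sense of what such a casing check looks like, here is a userland sketch based on reflection. It is not how the engine would do it and says nothing about engine-level cost; the function name isCanonicalClassCasing() is hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: reports whether a class or interface name is written
 * with the same casing as its declaration.
 */
function isCanonicalClassCasing(string $usedName): bool
{
    if (!class_exists($usedName) && !interface_exists($usedName)) {
        throw new InvalidArgumentException("Unknown class or interface: $usedName");
    }

    // ReflectionClass::getName() returns the name as declared,
    // regardless of the casing used for the lookup.
    $declaredName = (new ReflectionClass($usedName))->getName();

    return $declaredName === ltrim($usedName, '\\');
}

$correct = isCanonicalClassCasing('DateTime');
$wrong = isCanonicalClassCasing('datetime');
```

$correct is true and $wrong is false. Doing an extra comparison like this on every lookup is the kind of overhead that makes a runtime check unattractive.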
Best regards,
Gina P. Banyard
[1] https://github.com/php/php-src/pull/10954
[2] https://github.com/php/php-src/issues/11423
Would this affect unserialize()?
I ask because MediaWiki's main "text" database table is an
immutable/append-only store where we store the text of each page revision
since ~2004. It is stored as serialised blobs of a value class. There have
been a number of different implementations over the past twenty years of
Wikipedia's existence (plain text, gzip-compressed, diff-compressed, etc.).
When we adopted modern autoloading in MediaWiki, we quickly found that
blobs originally serialized by PHP 4 actually encoded the class in
lowercase, regardless of the casing in source code.
From https://3v4l.org/jl0et:
class ConcatenatedGzipHistoryBlob {…}
print serialize($blob);
PHP 4.x: O:27:"concatenatedgziphistoryblob":…
PHP 5/7/8: O:27:"ConcatenatedGzipHistoryBlob":…
It is of course the application's responsibility to load these classes,
but it is arguably PHP's responsibility to be able to construct what it
serialized. I suppose anything is possible when announced as a breaking
change for PHP 9.0. I wanted to share this as something to take into
consideration as part of the impact. It is potentially worth additional
communication, or perhaps worth supporting separately.
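The current behaviour that such stored blobs rely on can be sketched as follows. The payload is a made-up, property-less stand-in for a PHP 4 era blob and not real MediaWiki data.

```php
<?php
declare(strict_types=1);

final class ConcatenatedGzipHistoryBlob {}

// Assumption: a simplified payload with a lowercased class name and no
// properties, shaped like what PHP 4 would have written.
$legacyPayload = 'O:27:"concatenatedgziphistoryblob":0:{}';

// On current PHP the class lookup is case-insensitive, so this succeeds.
$blob = unserialize($legacyPayload);
$restored = $blob instanceof ConcatenatedGzipHistoryBlob;

// Current PHP writes the name with its declared casing.
$currentPayload = serialize(new ConcatenatedGzipHistoryBlob());
```

With case-sensitive class names the lookup for the lowercased name would no longer find the class unless the application registers it some other way.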
--
Timo Tijhof,
Principal Engineer,
Wikimedia Foundation.
https://timotijhof.net/
In reply to Timo Tijhof ttijhof@wikimedia.org (Friday, 14 June 2024, 00:04):
Hi, Timo!
Thank you very much for bringing up this important case.
Here's how I see this. If PHP gets class case-sensitivity, unserialization
of classes with lowercase names will fail. This is because the engine will
start putting the MyClass class entry into the loaded classes table under
the key MyClass (not myclass), and unserialization will not be able to find
it as myclass.
Even if some deprecation layer is introduced (one that puts both the myclass
and MyClass keys into the table), you will first get a ton of notices and
then eventually end up with the same problem once the transition to case
sensitivity is complete. Hence I propose no deprecation layer: it does not
really help.
However, you will be able to use class_alias() to solve your issue. If
classes are case-sensitive, class_alias(MyClass::class, 'myclass');
should work, since MyClass != myclass anymore. And serialization works
perfectly with class aliases, see https://3v4l.org/1n1as .
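A runnable sketch of the alias approach on current PHP follows. Because class names are still case-insensitive today, the alias here has to differ from the class name by more than case; HistoryBlob and legacy_history_blob are made-up names.

```php
<?php
declare(strict_types=1);

final class HistoryBlob {}

// Assumption: on current PHP an alias that differs only in case would
// collide with the class itself, so a clearly different name is used.
$alias = 'legacy_history_blob';
class_alias(HistoryBlob::class, $alias);

// Build a property-less payload that refers to the alias name.
$payload = sprintf('O:%d:"%s":0:{}', strlen($alias), $alias);

$object = unserialize($payload);
$isHistoryBlob = $object instanceof HistoryBlob;
$resolvedName = get_class($object);
```

$isHistoryBlob is true and $resolvedName is "HistoryBlob". Under case-sensitive class names, the same call with 'myclass' as the alias would be the migration path described above.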
--
Valentin Udaltsov
I'm no one important, but I just want to say for the sake of the
public image of PHP I hope this does not pass, or at least not in the
foreseeable future.
There are NO substantial gains to speak of here and the BC break is
real and it's super annoying when they pile up and up.
Besides, this is slightly off topic, but I don't know if you know: if
you take a look at the Stack Overflow developer survey over the years,
there has been an absolute 30% drop in PHP popularity in the past few
years.
I would guess this is mostly the low-level developers not being fans
of the language removing magic quotes and other "super useful"
features. In other words, PHP lost the average joe as its target
audience. Joe's gone.
Just my 2¢:
a) this WAS the reason PHP was great and I loved to rewrite the
systems of several very successful companies who started out with
their non-technical founders who coded their way out of the box to
begin multi-million businesses
b) the PHP core and co. (a.k.a. YOU) should be acutely aware that the
language needs to be liked by not only you, dear awesome lovely
hardcore nerds, but also the users who just need to get stuff done,
business needs fulfilled.
I know this is not how YOU work, but if you ignore that part of the
language users, there might eventually not be a language to work on in
the future.
So please, keep the language loose. I hate the slight inconsistency
too, but if we ruin the day for another 20% of users, it might even be
the straw that broke the camel's back.
(Sent in reply to Valentin Udaltsov udaltsov.valentin@gmail.com, Fri, 14 Jun 2024 at 02:38.)
PHP's decline in popularity is not correlated with its objective
improvements. If you long for older (broken) versions, they are still
available.
Bilge
I'm no one important, but I just want to say for the sake of the
public image of PHP I hope this does not pass, or at least not in the
foreseeable future.
There are NO substantial gains to speak of here and the BC break is
real and it's super annoying when they pile up and up.
Besides, this is slightly off topic, but I don't know if you know, but
if you take a look at stackoverflow developer survey over the years,
there has been an absolute 30% drop of php popularity in the past few
years.
I would guess this is mostly the low-level developers not being fans
of the language removing magic quotes and other "super useful"
features. In other words, PHP lost the average joe as its target
audience. Joe's gone.
Just my 2¢:
a) this WAS the reason PHP was great and I loved to rewrite the
systems of several very successful companies who started out with
their non-technical founders who coded their way out of the box to
begin multi-million businesses
b) the PHP core and co. (a.k.a. YOU) should be acutely aware that the
language needs to be liked by not only you, dear awesome lovely
hardcore nerds, but also the users who just need to get stuff done,
business needs fulfilled.
I know this is not how YOU work, but if you ignore that part of the
language users, there might eventually not be a language to work on in
the future.
So please, keep the language loose, I hate the slight inconsistency
too, but if we ruin the day for another 20% of users, it might even be
the straw that broke the camel's back.
On Fri, 14 Jun 2024 at 02:38, Valentin Udaltsov udaltsov.valentin@gmail.com wrote:
On Friday, 14 June 2024 at 00:04, Timo Tijhof <ttijhof@wikimedia.org> wrote:
Would this affect unserialize()?
I ask because MediaWiki's main "text" database table is an immutable/append-only store where we store the text of each page revision since ~2004. It is stored as serialised blobs of a value class. There have been a number of different implementations over the past twenty years of Wikipedia's existence (plain text, gzip-compressed, diff-compressed, etc.).
When we adopted modern autoloading in MediaWiki, we quickly found that blobs originally serialized by PHP 4 actually encoded the class in lowercase, regardless of the casing in source code.
From https://3v4l.org/jl0et:
class ConcatenatedGzipHistoryBlob {…}
print serialize($blob);
PHP 4.x: O:27:"concatenatedgziphistoryblob":…
PHP 5/7/8: O:27:"ConcatenatedGzipHistoryBlob":…
It is of course the application's responsibility to load these classes, but it is arguably PHP's responsibility to be able to construct what it serialized. I suppose anything is possible when announced as a breaking change for PHP 9.0. I wanted to share this as something to take into consideration as part of the impact. It may be worth additional communication, or perhaps worth supporting separately.
--
Timo Tijhof,
Principal Engineer,
Wikimedia Foundation.
https://timotijhof.net/
Hi, Timo!
Thank you very much for bringing up this important case.
Here's how I see this. If PHP gets class case-sensitivity, unserialization of classes with lowercase names will fail. This is because the engine will start putting the MyClass class entry with key MyClass (not myclass) into the loaded classes table, and unserialization will not be able to find it as myclass.
Even if some deprecation layer is introduced (that puts both myclass and MyClass keys into the table), you will first have a ton of notices and then eventually end up with the same problem, when transition to case sensitivity is complete. Hence I propose no deprecation layer, since it does not really help.
However, you will be able to use class_alias() to solve your issue. If classes are case-sensitive, class_alias(MyClass::class, 'myclass'); should work, since MyClass != myclass anymore. And serialization works perfectly with class aliases, see https://3v4l.org/1n1as .
--
Valentin Udaltsov
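For readers following along, the class_alias() workaround described above might look roughly like the sketch below. MyClass and its text property are hypothetical stand-ins, not MediaWiki code, and the guard around class_alias() is an assumption added here so that the same script also runs on PHP 8.3, where the lowercase name already resolves and registering the alias would only raise a warning.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a legacy value class.
final class MyClass
{
    public function __construct(public string $text = '') {}
}

// Register the lowercase spelling only if it does not already resolve.
// On PHP 8.3 class lookup is case-insensitive, so this branch is skipped;
// under case-sensitive class names it would create the alias.
if (!class_exists('myclass', false)) {
    class_alias(MyClass::class, 'myclass');
}

// A payload shaped like PHP 4 output: the class name is stored in lowercase.
$legacyBlob = 'O:7:"myclass":1:{s:4:"text";s:5:"hello";}';

// Restrict unserialize() to the class we expect, listed under both spellings.
$object = unserialize($legacyBlob, ['allowed_classes' => [MyClass::class, 'myclass']]);

var_dump($object instanceof MyClass);
echo $object->text, PHP_EOL;
```

An alias points at the same class entry as the original name, which is why the instanceof check still holds for objects created through the lowercase spelling.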
Hey Rokas,
Please bottom post (it's the rules), but PHP's "decline" has little to
do with the language itself, most likely it has to do with how long
people have been coding. >42% of people have been programming less
than 9 years, and >62% for less than 14. "Hyped up" languages tend to
dominate in the earlier years of programming and even then, most of
the developers responding to that survey classify themselves as
"full-stack" (and from talking to "full-stack" developers, it mostly
tends to mean they know Javascript -- which lo-and-behold, is the top
language; surprise surprise).
I wouldn't put too much weight on that survey since it is clearly
biased towards early-career devs, in the US, who know Javascript.
Fortunately, the industry is much bigger than that.
Robert Landers
Software Engineer
Utrecht NL
While the whining about market share is off topic, the challenges of keeping up with upgrades are valid, and have been expressed many times. (Sometimes more politely than others.)
I agree that this sounds like a change with very unclear BC implications at best, and bad ones at worst, with dubious benefit. Just how much performance would we gain from case sensitive class names? If it's 20%, OK, sure, that may be worth whatever BC breaks that causes on the margins. If it's 0.2%, then frankly, no, the PR cost of pissing off people who have to manage edge cases is not worth the hassle.
At the moment, I'm leaning No on this change, because the cost/reward/backlash ratio is just not there to support it.
--Larry Garfield
Coming from the property hooks/ asymmetric visibility dude, that's pretty
rich.
Please, ad-hominem (and other) attacks are not welcome on this list.
Please familiarise yourself with the mailinglist rules
(https://github.com/php/php-src/blob/master/docs/mailinglist-rules.md).
with kind regards,
Derick
I agree that this sounds like a change with very unclear BC implications at best, and bad ones at worst, with dubious benefit. Just how much performance would we gain from case sensitive class names? If it's 20%, OK, sure, that may be worth whatever BC breaks that causes on the margins. If it's 0.2%, then frankly, no, the PR cost of pissing off people who have to manage edge cases is not worth the hassle.
At the moment, I'm leaning No on this change, because the cost/reward/backlash ratio is just not there to support it.
--Larry Garfield
Would be good to see some real-world metrics, whether or not they're the
principal/only reason this might be a good change.
Bilge
Yes, I am working on this.
--
Valentin Udaltsov
Hi
I ask because MediaWiki's main "text" database table is an
immutable/append-only store where we store the text of each page revision
since ~2004. It is stored as serialised blobs of a value class. There have
been a number of different implementations over the past twenty years of
Wikipedia's existence (plain text, gzip-compressed, diff-compressed, etc.).
Is it theoretically possible to migrate the table contents using an
upgrade script that does something along these lines:
UPDATE text SET blob = serialize(unserialize(blob));
? Or is it actually immutable by somehow incorporating the blob (or a
hash of the blob) in some kind of hash chain or Merkle tree?
Best regards
Tim Düsterhus
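A minimal sketch of the round-trip idea in the message above, assuming the blobs are mutable and the script runs on a PHP version where class lookup is still case-insensitive. HistoryBlob, normalizeBlob() and the in-memory $rows array are hypothetical names used for illustration; a real migration would read and write the actual table.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a legacy blob class; not MediaWiki's real code.
final class HistoryBlob
{
    public string $text = '';
}

/**
 * Round-trip one serialized blob so that the class name is rewritten with
 * the casing of the class declaration. Returns null if the blob cannot be
 * decoded into the expected class.
 */
function normalizeBlob(string $blob): ?string
{
    // "@" silences the diagnostic PHP emits for malformed input.
    $value = @unserialize($blob, ['allowed_classes' => [HistoryBlob::class]]);
    if (!$value instanceof HistoryBlob) {
        return null;
    }
    return serialize($value);
}

// Stand-in for table rows: id => serialized blob.
$rows = [
    1 => 'O:11:"historyblob":1:{s:4:"text";s:3:"old";}', // PHP 4 style, lowercase
    2 => 'O:11:"HistoryBlob":1:{s:4:"text";s:3:"new";}', // already canonical
];

foreach ($rows as $id => $blob) {
    $normalized = normalizeBlob($blob);
    if ($normalized !== null && $normalized !== $blob) {
        $rows[$id] = $normalized; // a real script would issue an UPDATE here
    }
}
```

Rows that fail to decode are left untouched rather than overwritten, so a partial run does not destroy data.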