Hi Internals,
Together with George, I'd like to propose an RFC for a long-standing
problem PHP has:
casting floats to string depends on the locale settings. As this behaviour
is nonsense, and
because it can cause quite serious problems, we would like to get rid of
locale-dependence
in PHP 8.
Please find our RFC at
https://wiki.php.net/rfc/locale_independent_float_to_string.
Regards,
George and Máté
Hi,
This is a great RFC! Just one minor thing.
Outputting floats as strings in locales which change the decimal
separator will have a slightly different output. In our opinion, the
backward compatibility break won't be serious in practice
In my opinion, this will be huge. I can't trace back where the thousands of
possible conversions are used in legacy software. There's a variety of
custom format functions, sprintf implementations, string casts, and
number_format usages. Would it be possible to trigger a warning or
deprecation when using a locale that would have a different result from
what the new result will be? If I'd use a locale that results in 3.5
, I
don't need a warning. If I use a locale that results in 3,5
, I would like
to see a warning of sorts so I can fix this before untested legacy code
will seriously break data exports.
Regards,
Lynn
Hi Internals,
Together with George, I'd like to propose an RFC for a long-standing
problem PHP has:
casting floats to string depends on the locale settings. As this behaviour
is nonsense, and
because it can cause quite serious problems, we would like to get rid of
locale-dependence
in PHP 8.Please find our RFC at
https://wiki.php.net/rfc/locale_independent_float_to_string.Regards,
George and Máté
Outputting floats as strings in locales which change the decimal
separator will have a slightly different output. In our opinion, the
backward compatibility break won't be serious in practiceIn my opinion, this will be huge. I can't trace back where the thousands of
possible conversions are used in legacy software. There's a variety of
custom format functions, sprintf implementations, string casts, and
number_format usages. Would it be possible to trigger a warning or
deprecation when using a locale that would have a different result from
what the new result will be? If I'd use a locale that results in3.5
, I
don't need a warning. If I use a locale that results in3,5
, I would like
to see a warning of sorts so I can fix this before untested legacy code
will seriously break data exports.
It would certainly be possible to raise a warning/notice if a (non
SCCP optimized) float to string conversion would have different results.
This would have a (small) performance impact though, so I'd would
rather avoid enforcing it (i.e. maybe temporarily introduce an INI
directive for this purpose).
--
Christoph M. Becker
Hi,
This is a great RFC! Just one minor thing.
Outputting floats as strings in locales which change the decimal
separator will have a slightly different output. In our opinion, the
backward compatibility break won't be serious in practiceIn my opinion, this will be huge. I can't trace back where the thousands of
possible conversions are used in legacy software. There's a variety of
custom format functions, sprintf implementations, string casts, and
number_format usages. Would it be possible to trigger a warning or
deprecation when using a locale that would have a different result from
what the new result will be? If I'd use a locale that results in3.5
, I
don't need a warning. If I use a locale that results in3,5
, I would like
to see a warning of sorts so I can fix this before untested legacy code
will seriously break data exports.Regards,
Lynn
We are not saying that this won't be prevalent,especially in legacy code,
however, IMHO, this is not a serious issue as it is only a matter how one
character is displayed to end users.
As said by Christoph we did not consider a deprecation warning due to the
performance impact this would lead to as float to string conversions are a
common operation.
The idea of a temporary INI setting which warn about these conversions
is an interesting idea, but I personally would rather not introduce one.
Moreover, with a temporary INI setting how long would it last, one minor
version, one major version? Something in between the two?
I'm obviously in favor of this proposal.
Only really comment I have is on
printf()
: You're right that we have %f and
%F to toggle locale-sensitivity, but the %e, %E, %g, %G formats are always
locale-sensitive. It might make sense to introduce locale-insensitive
variants of those, especially considering that %G is considered the
"standard" floating point format. Internally we support %H for that, so we
could expose that... (Alternatively, locale-sensitivity might be removed
for e/E/g/G.)Regards,
Nikita
sight I was not aware of this issue ...
Personally, I would prefer that %e and %E are not locale aware as they
are meant to represent a standard notation. This would then mean that
%g is locale aware as %f is and %G is not because %E and %F are
not locale aware.
But I'm open to suggestion considering this edge case.
Best regards
George P. Banyard
I would like
to see a warning of sorts so I can fix this before untested legacy code
will seriously break data exports.Regards,
LynnWe are not saying that this won't be prevalent,especially in legacy code,
however, IMHO, this is not a serious issue as it is only a matter how one
character is displayed to end users.
It's so much more than a character that's being displayed. There are
automated data exports with prices, that will be automatically consumed on
the other side. These kind of changes can seriously break if they were
expected to be localized currency formats. We're talking massive amounts of
data here. I've seen price formats go wrong often and seen them be fixed
even up to a week ago. It's hard to track all usages in a 20 year old code
base.
As much as I love that this can be fixed, I rather not see it fixed at all
if there's no upgrade path available.
Regards,
Lynn
Hi Lynn,
There are automated data exports with prices, that will be automatically
consumed on the other side.
I agree that these cases can go horribly wrong. However, my reasoning is
the following:
- if a piece of code currently relies on locale-independence (e.g.
automated data exports) then this
change wouldn't cause any breakage since a workaround has already been in
place there (e.g. the
programmers usevar_export()
instead of casting) - if a piece of code relies on the locale-dependent string representation
of floats then there will be
a BC break, sure, however I believe that code isn't very sensitive to the
change in the vast majority of the
cases since that data is for presentation purposes only.
Or do you have other locale-dependent use-cases in mind? I am sure there
are some but I think the number of the situations where the change is
problematic is less than what it first seems.
Regards,
Máté
I'm very sorry, I pressed the reply instead of reply all button, I hope
this fixes it!
I agree that these cases can go horribly wrong. However, my reasoning is
the following:
- if a piece of code currently relies on locale-independence (e.g.
automated data exports) then this
change wouldn't cause any breakage since a workaround has already been in
place there (e.g. the
programmers usevar_export()
instead of casting)- if a piece of code relies on the locale-dependent string representation
of floats then there will be
a BC break, sure, however I believe that code isn't very sensitive to the
change in the vast majority of the
cases since that data is for presentation purposes only.
I know it's for presentation purpose only, sadly it's not used for just
presentation code. It's being consumed and parsed by automated imports that
expect a format different than 3.5
, because that's how it organically
evolved. Even when it was meant only for presentation, the consumer
expected the 3,5
format and will keep expecting this until we notify them
of a change (and then it will probably still take weeks before this is
fixed on their side). A lot of code that I've encountered is stuff like:
$csv .= $product->getPrice() . ';';
// older code I've seen come by
$product = get_product($$var5);
$csv .= $product['price'] . ';';
As much as I'd like to see this fixed and always give a 3.5
, sadly that's
not the case if you use a different locale. Going through thousands of
files to fix this, is not going to be an easy task, especially not as this
is often old enough to not be usable by static code analyzers.
Or do you have other locale-dependent use-cases in mind? I am sure there
are some but I think the number of the situations where the change is
problematic is less than what it first seems.
No, this is the only issue I personally see with it. However, the impact
could be severe enough to not be able to upgrade to a new PHP version in
the foreseeable future, thus I would like to see an upgrade path, one that
tells you where it would've gone wrong, instead of turning the behavior on
or off. If I can gather the logs and thus find usages, I can pro-actively
start fixing where this would go wrong. I'll take a performance hit over a
breakage. Performance penalty would only have to apply if the locale is set
to something we know will change the outcome. Maybe it turns out this is a
small issue, maybe it's a big issue... I can't tell because it's either
upgrade & break, or upgrade & work in the proposed scenario. Some breakage
may take weeks to find out because scripts run periodically.
I fully understand that you don't see this as a big issue, and I really
wish I could say the same. I can't vote, so I can't change the outcome,
please consider my use-case when moving forward, thanks!
Regards,
Lynn
Hi Lynn,
Thank you very much for the input! Then we could add a secondary vote to
the RFC where an
upgrade path was decided upon.
The first such path is the one proposed by Christoph:
Introduce a temporary ini setting with which a "debug" mode of float to
string casting could be
enabled. That is, when it's turned on and a casting is affected by locale
settings then a warning
is emitted. This setting would be removed in a (near) future version of
PHP. Although
PHP 8.1 would be an unusual candidate to do so, but it might be possible
if we found a good
way to warn people from day 0 that it's just a temporary flag? Maybe by an
immediate deprecation?
Is there any other idea for a sensible upgrade path?
Cheers:
Máté
Hello all,
We've updated the implementation [1] and RFC [2] to provide a
temporary INI setting which will emit a warning every time a locale-aware
float to string conversation would have been done in PHP 7.
It's current name debug_local_sensitive_float_casts is also up to
debate and will be voted on a secondary vote.
How long this INI setting should remain in PHP is also up to discussion.
One remaining question is if we should change the %e and %E formats
for the printf family of functions to be locale independent or not and/or if
we should expose the %H format which is similar to %G but with %E
being locale independent.
Moreover, we would like to open voting next week, so your input on this
is very much appreciated so that we can amend this before going to voting.
Best regards
George P. Banyard
[1] https://github.com/php/php-src/pull/5224
[2] https://wiki.php.net/rfc/locale_independent_float_to_string
Hi Internals,
Together with George, I'd like to propose an RFC for a long-standing
problem PHP has:
casting floats to string depends on the locale settings. As this behaviour
is nonsense, and
because it can cause quite serious problems, we would like to get rid of
locale-dependence
in PHP 8.Please find our RFC at
https://wiki.php.net/rfc/locale_independent_float_to_string.
I'm obviously in favor of this proposal.
Only really comment I have is on printf()
: You're right that we have %f and
%F to toggle locale-sensitivity, but the %e, %E, %g, %G formats are always
locale-sensitive. It might make sense to introduce locale-insensitive
variants of those, especially considering that %G is considered the
"standard" floating point format. Internally we support %H for that, so we
could expose that... (Alternatively, locale-sensitivity might be removed
for e/E/g/G.)
Regards,
Nikita
Hi,
Máté Kocsis wrote:
Hi Internals,
Together with George, I'd like to propose an RFC for a long-standing
problem PHP has:
casting floats to string depends on the locale settings. As this behaviour
is nonsense, and
because it can cause quite serious problems, we would like to get rid of
locale-dependence
in PHP 8.Please find our RFC at
https://wiki.php.net/rfc/locale_independent_float_to_string.Regards,
George and Máté
Just a little thing, saying “certain locales (notably the German,
French, and Spanish ones)” perhaps understates the issue. Most
non-English-speaking countries do not use . as the decimal separator!
https://en.wikipedia.org/wiki/Decimal_separator#Arabic_numerals
Anyway, thank you for this RFC, I think fixing this is long, long
overdue. I expect there could be a serious BC impact for certain PHP
users, but I don't think there is a good long-term solution for this
short of forcing such code to be updated.
As a potential improvement, I would suggest to try to be closer to
comprehensive in documenting which functions' behaviour changes or does
not. For example, this RFC doesn't currently say what happens to
settype()
and print_r()
.
Thanks,
Andrea