Hi!
Regarding the decimal separator (a.k.a. decimal point), the behavior of
casting float to string is inconsistent with casting string to float.
While the former respects the current locale, the latter always expects a
decimal point regardless of the locale. This breaks round-trips for
locales which use something other than a dot as decimal separator (e.g.
German, which uses a comma), for instance:
$float = 1/3; // float(0,33333333333333)
$string = (string) $float; // string(16) "0,33333333333333"
$float = (float) $string; // float(0)
To me, the question is not if, but rather when and how this
inconsistency should be resolved. Regarding the when, it seems to me
we have to wait for the next major version, i.e. PHP 8. Regarding the
how, I tend to prefer the non-locale-aware behavior, i.e. float to
string conversion should always produce a decimal point. Users can
still explicitly use number_format() or NumberFormatter if they wish.
Thoughts?
--
Christoph M. Becker
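A minimal sketch of the explicit formatting mentioned above, assuming number_format() is called with both separators spelled out so that the result does not depend on the process locale. The helper names format_plain() and format_german() are hypothetical and only serve the illustration.

```php
<?php
// Hypothetical helpers; only number_format() itself is mentioned in the thread.

/** Format a float for machines: always a dot, no thousands separator. */
function format_plain(float $value, int $decimals): string
{
    return number_format($value, $decimals, '.', '');
}

/** Format a float for German-style display: comma as decimal separator. */
function format_german(float $value, int $decimals): string
{
    return number_format($value, $decimals, ',', '.');
}

echo format_plain(1 / 3, 4), "\n";    // 0.3333
echo format_german(1234.5, 2), "\n";  // 1.234,50
```

Because the separators are passed explicitly, a later setlocale() call elsewhere in the program should not change what these helpers return.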
On Wed, Dec 26, 2018 at 4:48 PM Christoph M. Becker cmbecker69@gmx.de
wrote:
> Hi!
> Regarding the decimal separator (aka. decimal point), the behavior of
> casting float to string is inconsistent with casting string to float.
> While the former regards the current locale, the latter always expects a
> decimal point regardless of the locale. This breaks round-trips for
> locales which use something else than a dot as decimal separator (e.g.
> German, which uses a comma), for instance:
> $float = 1/3; // float(0,33333333333333)
> $string = (string) $float; // string(16) "0,33333333333333"
> $float = (float) $string; // float(0)
> As for me, the question is not if, but rather when and how this
> inconsistency should be resolved. Regarding the when, it seems to me
> we have to wait for the next major version, i.e. PHP 8. Regarding the
> how I tend to prefer the non-locale aware behavior, i.e. float to
> string conversion should always produce a decimal point. Users still
> can explicitly use number_format()
> or NumberFormatter if they wish.
> Thoughts?
Yes, please. Locale-sensitive library functions are bad enough, but core
language behavior should never be locale-sensitive.
I'm definitely in favor of making float to string casts locale-independent
in PHP 8.
Nikita
On Wed, Dec 26, 2018 at 16:53, Nikita Popov nikita.ppv@gmail.com wrote:
> Yes, please. Locale-sensitive library functions are bad enough, but core
> language behavior should never be locale-sensitive.
> I'm definitely in favor of making float to string casts locale-independent
> in PHP 8.
I'm in the same boat here. I remember we had some interesting
behavioral differences on a language level too[1] (which, judging by
the newest comments there, may have resurfaced).
[1] https://bugs.php.net/bug.php?id=18556
--
regards,
Kalle Sommer Nielsen
kalle@php.net
cmbecker69@gmx.de ("Christoph M. Becker") wrote:
> Hi!
> Regarding the decimal separator (aka. decimal point), the behavior of
> casting float to string is inconsistent with casting string to float.
> [...]
I'm shocked... A lot of code here assumes (float) does the exact reverse of
(string); fortunately most servers are configured with the default locale
"C", but I'm still concerned about library portability.
Did you file a bug for that?
Anyway, here is some test code I wrote:
<?php // test-float-cast.php
/**
 * Performs string to float and float to string tests vs. locale.
 *
 * @param string $loc LC_NUMERIC locale to set for the test.
 */
function test($loc)
{
    if( setlocale(LC_NUMERIC, $loc) === FALSE )
        throw new RuntimeException("setlocale() failed");
    // Cast float to string test:
    $exp = "0.5";
    $got = (string) 0.5;
    if( $got !== $exp )
        echo "FAILED with LC_NUMERIC=$loc, (string) 0.5: got $got, exp $exp\n";
    // Cast string to float test:
    $exp = 0.5;
    $got = (float) "0.5";
    if( $got !== $exp )
        echo "FAILED with LC_NUMERIC=$loc, (float) \"0.5\": got $got, exp $exp\n";
    // printf() formatting of float:
    $exp = "0.500000";
    $got = sprintf("%f", 0.5);
    if( $got !== $exp )
        echo "FAILED with LC_NUMERIC=$loc, sprintf(\"%f\", 0.5): got $got, exp $exp\n";
}
test("C");
test("de");
Output:
FAILED with LC_NUMERIC=de, (string) 0.5: got 0,5, exp 0.5
FAILED with LC_NUMERIC=de, sprintf("%f", 0.5): got 0,500000, exp 0.500000
The second failed test is expected, as printf, scanf and co. are intended
to be locale-aware.
But the first one? Is there any simple, efficient and safe way to convert
a string back into a float?
Regards,
/|\ Umberto Salsi
/_/ www.icosaedro.it
> I'm shocked... A lot of code here assumes (float) does the exact reverse of
> (string); fortunately most servers are configured with the default locale
> "C", but I'm still concerned about library portability.
> Did you file a bug for that?
There is https://bugs.php.net/77278.
> Anyway, here is some test code I wrote:
> […]
> Output:
> FAILED with LC_NUMERIC=de, (string) 0.5: got 0,5, exp 0.5
> FAILED with LC_NUMERIC=de, sprintf("%f", 0.5): got 0,500000, exp 0.500000
> The second failed test is expected, as printf, scanf and co. are intended
> to be locale-aware.
> But the first one? Is there any simple, efficient and safe way to convert
> back a string into float?
Well, if you have the appropriate locale set, you can use scanf() with
%f. If you don't want to rely on the per-process locale, you could use
NumberFormatter::parse() instead. filter_var() with FILTER_VALIDATE_FLOAT
can also be used, and might be the best option if you don't know which
decimal separator is used, and you are sure there are no thousands
separators in the string.
--
Christoph M. Becker
> […]
> filter_var() with FILTER_VALIDATE_FLOAT
> can also be used, and might be the best option if
> you don't know which decimal separator is used, and you are sure there
> are no thousands separators in the string.
No, that doesn't work, since the 'decimal' separator must be one char.
I've mixed that up with the 'thousand' option available as of PHP 7.3.0.
--
Christoph M. Becker
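As the correction above notes, FILTER_VALIDATE_FLOAT only accepts a single-character 'decimal' option, so it helps when the separator is known in advance rather than when it has to be guessed. A minimal sketch under that assumption follows; parse_decimal() is a hypothetical helper name, not something proposed in the thread.

```php
<?php
// parse_decimal() is a hypothetical helper. The 'decimal' option and
// FILTER_NULL_ON_FAILURE belong to PHP's filter extension.

/**
 * Parse a numeric string whose decimal separator is known in advance.
 * Returns null when the input is not a valid float in that notation.
 */
function parse_decimal(string $input, string $separator): ?float
{
    return filter_var($input, FILTER_VALIDATE_FLOAT, [
        'options' => ['decimal' => $separator],
        'flags'   => FILTER_NULL_ON_FAILURE,
    ]);
}

var_dump(parse_decimal('0,5', ','));  // float(0.5)
var_dump(parse_decimal('abc', ','));  // NULL
```

The separator is an explicit argument here, so the result does not rely on whatever LC_NUMERIC happens to be set for the process.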
Hi!
> Regarding the decimal separator (aka. decimal point), the behavior of
> casting float to string is inconsistent with casting string to float.
> While the former regards the current locale, the latter always expects a
> decimal point regardless of the locale. This breaks round-trips for
> locales which use something else than a dot as decimal separator (e.g.
> German, which uses a comma), for instance:
That may be inconvenient, but changing it now would be way worse IMHO.
Ideally, system conversions would return a well-defined result (locale as
a global is a horrible idea which can only be explained by its
genesis in simpler times, when most software was written to be
run in one specific environment and a user putting a pack of punch cards
into the reader would only need them to be processed in one locale) and
for configurable outcomes you'd use functions that receive explicit
parameters.
> As for me, the question is not if, but rather when and how this
> inconsistency should be resolved. Regarding the when, it seems to me
If you mean that loading "0.3" from string would suddenly stop working
in Germany, then never. If you mean that (string)0.3 would return "0.3"
and not "0,3" in Germany, then again I'd say probably never, though very
slightly less confidently. Those people who didn't want it already have
code to deal with it, and those that wanted it would have their unit
tests crash and burn and their data pipelines blow up.
I think it's a very bad idea to change such things, which would create
hundreds of person-years of headache for anybody daring to upgrade. I
understand it's a bad situation, but I think the right exit from it is to
tell people that want predictable roundtrip results to use specific
number conversion functions, and not exchange one mess for another,
BC-breaking and havoc-wreaking, mess. I agree that the right thing to do
would be to have (string)0.3 always use a dot and never use the locale (did
I mention I think global locale is a horrible idea?) but that ship has
sailed. I don't see a use case that would be well served by breaking
BC now.
--
Stas Malyshev
smalyshev@gmail.com
> If you mean that loading "0.3" from string would suddenly stop working
> in Germany, then never. If you mean that (string)0.3 would return "0.3"
> and not "0,3" in Germany, then again I'd say probably never, though very
> slightly less confidently. Those people who didn't want it already have
> code to deal with it, and those that wanted it would have their unit
> tests crash and burn and their data pipelines blow up.
It seems to me that Nikita put it nicely[1]:
> […] but core language behavior should never be locale-sensitive.
So yes, (string)0.3 should return 0.3 in any locale.
> I think it's a very bad idea to change such things, which would create
> hundreds of year-persons of headache to anybody daring to upgrade. I
> understand it's a bad situation, but I think the right exit of it is to
> tell people that want predictable roundtrip results to use specific
> number conversion functions, and not exchange one mess to another,
> BC-breaking and havoc-wreaking, mess. I agree that the right thing to do
> would be to have (string)0.3 to always use dot and never use locale (did
> I mention I think global locale is a horrible idea?) but that ship has
> sailed. I don't see a use case that would be well served by breaking the
> BC now.
Well, to begin with it would fix the broken behavior of var_export(),
which is documented to “output or return a parsable string
representation of a variable”, which it does not necessarily.
Then, I'm not only thinking about the huge amount of existing code, but
also about the huge amount of code yet to be written. I'm pretty sure that
many new PHP developers (especially those coming from other programming
languages) stumble over the locale-aware float to string conversion
sometime.
Finally, I don't think that the global locale is the real problem for
PHP. Rather it's PHP's locale handling and the fact that setlocale()
works per process (and not per thread). When PHP starts up, no locale
is set from the environment (except for LC_CTYPE). Only if a user
explicitly calls setlocale() with the second argument not equal to "0",
the locale is changed from C to whatever has been chosen and is
available. Now consider a multi-threaded environment, which is rather
common on Windows. While it is possible to set the desired locale
immediately before the float to string conversion, it is not easily
possible in PHP to make that really thread-safe. This makes it very
hard to write robust code for these environments. Even if thread-safety
is not an issue, a program that worked fine for years may be subtly
broken by inserting a call to setlocale() somewhere. This “spooky
action at a distance” could also “have their unit tests crash and burn
and their data pipelines blow up”.
[1] http://news.php.net/php.internals/103639
--
Christoph M. Becker
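To see whether some earlier setlocale() call has changed the numeric locale, the current state can be queried without modifying it. The following is a minimal sketch, assuming the "0" query form of setlocale() mentioned above together with localeconv(); current_decimal_point() is a hypothetical helper name.

```php
<?php
// current_decimal_point() is a hypothetical helper name.

/** Return the decimal separator of the LC_NUMERIC locale currently in effect. */
function current_decimal_point(): string
{
    return localeconv()['decimal_point'];
}

// Passing "0" as the second argument queries the locale without changing it.
$numericLocale = setlocale(LC_NUMERIC, '0');

echo "LC_NUMERIC is $numericLocale, decimal point is '",
    current_decimal_point(), "'\n";
```

A library could run such a check defensively before relying on implicit float to string conversion, although in a multi-threaded setup the answer may already be stale by the time it is used.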
> Well, to begin with it would fix the broken behavior of var_export(),
> which is documented to “output or return a parsable string
> representation of a variable”, which it does not necessarily.
Nonsense. There's nothing wrong with var_export() per se. Sorry.
--
Christoph M. Becker
Hi!
> So yes, (string)0.3 should return 0.3 in any locale.
If we designed it now, without any doubt. But since we have 20 years of
history behind... I'm not so sure.
> Finally, I don't think that the global locale is the real problem for
> PHP. Rather it's PHP locale handling and the fact that setlocale()
> works per process (and not per thread). When PHP starts up, no locale
That's part of locale being global. Though even in environments where
threads are not involved, many apps do not account for locale quirks.
--
Stas Malyshev
smalyshev@gmail.com
On Wed, Jan 2, 2019 at 12:30 AM Stanislav Malyshev smalyshev@gmail.com
wrote:
> Hi!
> > So yes, (string)0.3 should return 0.3 in any locale.
> If we designed it now, without any doubt. But since we have 20 years of
> history behind... I'm not so sure.
> > Finally, I don't think that the global locale is the real problem for
> > PHP. Rather it's PHP locale handling and the fact that setlocale()
> > works per process (and not per thread). When PHP starts up, no locale
> That's part of locale being global. Though even in environment where
> threads are not involved, many apps do not account for locale quirks.
We have a rather hard policy against ini options that influence language
behavior. Locale-dependent language behavior is essentially the same issue,
just worse due to the mentioned issues, in particular the lack of
thread-safety and the possibility that the locale is changed by third-party
libraries at runtime.
We have removed existing ini flags controlling language behavior in the
past. I would say these removals were much more significant than what is
proposed here, but we did them anyway, and I think we are now in a better
place for it.
Regards,
Nikita
> We have a rather hard policy against ini options that influence language
> behavior. Locale-dependent language behavior is essentially the same issue,
> just worse due to the mentioned issues, in particularly lack of
> thread-safety and the possibility that the locale is changed by third-party
> libraries at runtime.
> We have removed existing ini flags controlling language behavior in the
> past. I would say these removals were much more significant than what is
> proposed here, but we did them anyway, and I think we are now in a better
> place for it.
Unless I'm missing something, changing this behavior would require a full,
line-by-line audit of the code - with no Search & Replace patterns that can
find these instances with any reasonable level of reliability. Every place
where a floating-point number (which could come from anywhere, so not very
easy to track) is used in a string context (which too can happen in countless
different contexts, virtually impossible to track) would be affected.
Sounds pretty nightmarish to me. I for one fail to recall a behavioral
change that was quite as significant as this one in terms of the complexity
of finding instances that must be updated. Like Stas, I'm not disputing
that this is not an ideal behavior or that we'd do it differently if we
were starting from scratch - but I also agree with him that it's pretty
much out of the question to simply change it at this point.
Can you point out a change you believe is as or more significant than this
one that we did? I think the only one that comes close is
magic_quotes_runtime, and even that was significantly easier to handle in
terms of the cost of auditing the code (again, unless I'm missing
something, which is of course very much a possibility).
The solution for this might be a very unholy one - actually going against
our practices by adding a new INI entry that would disable the locale-awareness
for float->string conversions. But for upgrade considerations, I don't
think we can even consider simply changing this behavior and forcing
virtually everyone using a non-dot decimal separator to undergo a full code
audit.
My 2c.
Zeev
> […]
> Can you point out a change you believe is as or more significant than this
> one that we did? I think the only one that comes close is
> magic_quotes_runtime, and even that was significantly easier to handle in
> terms of the cost of auditing the code (again, unless I'm missing
> something, which is of course very much a possibility).
Wasn't the removal of register_globals a similar change? Not so long
ago I stumbled upon a script which counteracted this by extract()ing
the superglobals manually (surely, a very bad practice, but at least
some kind of workaround to keep legacy scripts going). However, the
introduction of “Uniform Variable Syntax”[1] may have caused similar
issues; likely without any possible workaround.
> The solution for this might be a very unholy one - actually going against
> our practices adding a new INI entry to would disable the locale-awareness
> for float->string conversions; But for upgrade considerations, I don't
> think we can even consider simply changing this behavior and forcing
> virtually everyone using a non-dot decimal separator to undergo a full code
> audit.
Would it be a sensible option to trigger a warning or notice whenever a
float is converted to string yielding a different result than before,
using an ini directive to control this? Or perhaps even throw a
deprecation notice in this case, without even introducing an ini directive?
[1] https://wiki.php.net/rfc/uniform_variable_syntax
--
Christoph M. Becker
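The notice idea above is about engine behavior, but its effect can be approximated in userland to get a feel for it. The sketch below is only that approximation: float_to_string_checked() is a hypothetical helper, and it assumes that comparing the cast result against localeconv()'s decimal point is enough to detect an affected conversion.

```php
<?php
// Userland approximation only; the thread discusses doing this in the engine.
// float_to_string_checked() is a hypothetical helper name.

function float_to_string_checked(float $value): string
{
    $string = (string) $value;
    $point  = localeconv()['decimal_point'];

    // Warn only when the locale's separator actually ended up in the result.
    if ($point !== '' && $point !== '.' && strpos($string, $point) !== false) {
        trigger_error(
            "float to string conversion used locale separator '$point'",
            E_USER_DEPRECATED
        );
    }
    return $string;
}

setlocale(LC_NUMERIC, 'C');
echo float_to_string_checked(0.5), "\n";  // 0.5, no notice in the C locale
```

Routing conversions through such a helper would still require touching every call site, which is the audit cost discussed in the thread.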
> Wasn't the removal of register_globals a similar change? Not so long
> ago I've stumbled upon a script which counteracted this by extract()ing
> the superglobals manually (surely, a very bad practise, but at least
> some kind of workaround to keep legacy scripts going). However, the
> introduction of “Uniform Variable Syntax”[1] may have caused similar
> issues; likely without any possible workaround.
Well, the removal of register_globals was a very big deal - and was done
for arguably much more pressing reasons (security). So I wouldn't refer to
it as a basis to illustrate that this isn't a big deal... That said - as you
pointed out yourself, there was a very easy workaround for those that
didn't want or couldn't afford to do a full code audit - a few lines of
userland code that emulated it.
Regarding Uniform Variable Syntax - the cases where the behavior changed
there were truly edge cases, that nobody in his right mind should be using
anyway, and that can probably also be searched for using a clever regex.
This isn't the case here. Unless I'm missing something, code as simple
as $x = 3.99; print "Price: $x"; would be affected.
So, I think it has a much bigger impact than the UVS incompatibility, it's
much more difficult to find, and does not have a userland workaround unless
we introduce a language level one.
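For reference, the one-liner above can be put next to an explicit alternative. This sketch assumes sprintf()'s %F conversion, which PHP documents as the non-locale-aware counterpart of %f; it illustrates the difference and is not a proposed migration path.

```php
<?php
$x = 3.99;

// Implicit conversion through string interpolation. Per the thread, with a
// comma-based LC_NUMERIC under PHP 7 this would read "Price: 3,99".
print "Price: $x\n";

// %F is the non-locale-aware float conversion, so a dot is always used.
print sprintf("Price: %.2F\n", $x);
```

Code that already formats explicitly like the second statement should behave the same before and after the proposed change.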
> Would it be a sensible option to trigger a warning or notice whenever a
> float is converted to string yielding a different result than before,
> using an ini directive to control this? Or perhaps even throw a
> deprecation notice in this case, without even introducing an ini directive?
It would be technically possible, but given the context these conversions
often occur in I think it would look awful... Also, one would have to run
their software through all possible code flows in order to know for sure
it’s safe to turn it off and move to the new behavior. And legend has it,
that not all PHP users (or developers in general) have 100% testing
coverage :)
If we do end up adding a new INI entry - maybe it can be a tristate -
legacy, legacy+notice, or new. Just a thought. And I wouldn’t commit to
actually removing it at any time by officially deprecating it...
Zeev
> If we do end up adding a new INI entry - maybe it can be a tristate -
> legacy, legacy+notice, or new. Just a thought. And I wouldn’t commit to
> actually removing it at any time by officially deprecating it...
I have some doubts that an INI setting would be an appropriate solution.
If it's PHP_INI_SYSTEM (or such), libraries may have a hard time
dealing with this. If it's PHP_INI_ALL, the same issues as now could
still happen.
--
Christoph M. Becker
> Unless I'm missing something, changing this behavior would require a full,
> line-by-line audit of the code - with no Search & Replace patterns that can
> find these instances in any reasonable level of reliability.
> […]
> But for upgrade considerations, I don't think we can even consider simply
> changing this behavior and forcing virtually everyone using a non-dot
> decimal separator to undergo a full code audit.
I don't expect this to be a particularly large issue for two reasons:
- Not many people use this. I'm sure that there are people who use this
  and use it intentionally, but I've only ever seen reference to this issue
  as a bug or a gotcha.
- Even if somebody is using this functionality, the only thing that's
  going to happen is that their number display switches from 1,5 to 1.5.
  That's a minor UX regression, not a broken application. It's something that
  will have to be fixed, but it's also not critical, and for a legacy
  application one might even not bother.
I think we should just put this to an RFC vote. We regularly have these
types of discussions, and people just disagree about level of anticipated
BC break relative to benefit of the change.
Nikita
> I don't expect this to be a particularly large issue for two reasons:
> - Not many people use this. I'm sure that there are people who use this
>   and use it intentionally, but I've only ever seen reference to this issue
>   as a bug or a gotcha.
> - Even if somebody is using this functionality, the only thing that's
>   going to happen is that their number display switches from 1,5 to 1.5.
>   That's a minor UX regression, not a broken application. It's something that
>   will have to be fixed, but it's also not critical, and for a legacy
>   application one might even not bother.
FWIW, neither of these is very convincing for me:
- I'm not sure what you mean by "not many people use this". People don't
  convert floats to strings?
- Perhaps you meant they weren't proactively relying on this behavior,
  which could be true - but it doesn't matter whether people were expecting
  or otherwise desiring this behavior when they wrote the code. Whatever the
  current behavior is - they adjusted for it, and ended up using it,
  consequently relying on it.
- I view a UX change as a big deal. As we should in a language that is
  very commonly used to create UI.
- This could affect not only UX, but also integration code. You could
  have PHP output feeding into something else - and suddenly, the format
  breaks. With the fix HAVING to be on the other side, no less.
I fail to understand how we could consider changing such a fundamental
element (to-string behavior of floats) without an in-depth discussion. We
mustn't.
> I think we should just put this to an RFC vote. We regularly have these
> types of discussions, and people just disagree about level of anticipated
> BC break relative to benefit of the change.
The point of an RFC is, in fact, to have these discussions. This is what
we're doing right now. This is what the RFC process is all about - not the
vote. It sounds to me as if you're saying "what's the point of discussing,
let's just vote" (waiting out the two weeks as needed), with which I would
wholeheartedly disagree. Apologies if you meant something else.
Zeev
> FWIW, neither of these are very convincing for me:
> - I'm not sure what you mean "not many people use this"? People don't
>   convert floats to strings?
No, that's not what I meant. Of course, many people convert floats to
strings. But the vast majority of them expect to get back a floating point
number in the standard format.
What I mean is that there are not many people who use float to string
conversion with the express intention of receiving a locale-dependent
result (and use a locale where the question is relevant). Those are the
only people who would be (negatively) affected by such a change.
> - Perhaps you meant they weren't proactively relying on this behavior,
>   which could be true - but it doesn't matter whether people were expecting
>   or otherwise desiring this behavior when they wrote the code. Whatever the
>   current behavior is - they adjusted for it, and ended up using it,
>   consequently relying on it.
As said, I'm sure there are people relying on this. What I'm saying is that
the number of people who rely on float conversions not being
locale-sensitive is vastly, orders of magnitude larger than the number of
people who do rely on them being locale-sensitive.
The only saving grace is that this issue only turns up relatively rarely,
because it requires you to explicitly call setlocale, as the locale is not
inherited from the environment. Or more likely, you're not going to call
setlocale, but discover this wonderful behavior because something else does
for entirely unrelated reasons.
> - I view a UX change as a big deal. As we should in a language that is
>   very commonly used to create UI.
> - This could effect not only UX, but also integration code. You could
>   have PHP output feeding into something else - and suddenly, the format
>   breaks. With the fix HAVING to be in the other side, no less.
The fix doesn't have to be on the other side. Most likely you'd prefer to
fix it on your side by explicitly formatting the float in the desired
manner.
It's usually the other way around. The current behavior is prone to
breaking integration code, because data interchange layers generally do not
expect floats to use comma separators. The reason why things don't break
quite as terribly as they could is that PHP has introduced a number of
workarounds over time, as these issues have been reported. That's why you
usually don't run into this when inserting float values into a DB query, at
least when using prepared statements. This issue is not handled everywhere
though (one recent example I remember is passing floats to bcmath) and I
don't think that introducing more of these special cases is how we should
be approaching this problem.
Nikita
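One way to keep interchange data out of the locale's reach is to let a serializer produce the number instead of relying on string conversion. The sketch below assumes, to my understanding, that json_encode() writes floats with a dot regardless of LC_NUMERIC; encode_measurement() and its field names are made up for the illustration.

```php
<?php
// encode_measurement() and the field names are hypothetical.

/** Serialize a reading for another system without going through (string). */
function encode_measurement(string $sensor, float $value): string
{
    return json_encode(['sensor' => $sensor, 'value' => $value]);
}

echo encode_measurement('t1', 0.5), "\n";  // {"sensor":"t1","value":0.5}
```

The same reasoning applies to prepared statements, which the message above mentions as a place where PHP already works around the issue.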
> What I mean is that there are not many people who use float to string
> conversion with the express intention of receiving a locale-dependent
> result (and use a locale where the question is relevant). Those are the
> only people who would be (negatively) affected by such a change.
While you may very well be correct that some (maybe even most, not sure)
people don't have the express intention of receiving this behavior -
nonetheless, this is the behavior they've been seeing in the last 20
years. Many, arguably most developers code based on the behavior they see
in practice. Whether or not they thought this behavior is sensible, once
they saw this is the behavior in practice - it's likely that they relied on
it. Of course, some may have been put off by what they saw and decided to
use something else (e.g. avoiding setlocale() altogether) - but I doubt
this is anywhere close to 100% of the developers.
- Perhaps you meant they weren't proactively relying on this behavior,
which could be true - but it doesn't matter whether people were expecting
or otherwise desiring this behavior when they wrote the code. Whatever the
current behavior is - they adjusted for it, and ended up using it,
consequently relying on it.
As said, I'm sure there are people relying on this. What I'm saying is
that the number of people who rely on float conversions to not be
locale-sensitive is vastly, orders of magnitude larger than the number of
people who do rely on it being locale sensitive.
The only saving grace is that this issue only turns up relatively rarely,
because it requires you to explicitly call setlocale, as the locale is not
inherited from the environment. Or more likely, you're not going to call
setlocale, but discover this wonderful behavior because something else does
for entirely unrelated reasons.
I agree, but the real question is how many of those who are explicitly
calling setlocale() are relying on this behavior - as the change we're
proposing affects only them anyway. So while the fact those who are using
setlocale() are likely a small minority is a given, the real question is
within this subgroup - what's the breakdown of people relying on it. I'd
argue that within that group, those relying on it are likely a majority,
even if when they first bumped into this behavior they thought to
themselves "Huh, that's funny, I didn't expect that.". Ultimately, their
code now relies on it.
There shouldn't be any developers who are using setlocale() and are relying
on a behavior that never existed in PHP (which doesn't mean they don't
exist - but I can't imagine they're a sizable subgroup let alone a
majority).
- I view a UX change as a big deal. As we should in a language that is
very commonly used to create UI.
- This could affect not only UX, but also integration code. You could
have PHP output feeding into something else - and suddenly, the format
breaks. With the fix HAVING to be on the other side, no less.
The fix doesn't have to be on the other side. Most likely you'd prefer to
fix it on your side by explicitly formatting the float in the desired
manner.
I agree here, it doesn't have to be on the "other side" like I claimed. It
may still be easier in many cases, as at least fixing it on the PHP side
would be quite difficult (again, involving a line by line code audit).
It's usually the other way around. The current behavior is prone to
breaking integration code, because data interchange layers generally do not
expect floats to use comma separators. The reason why things don't break
quite as terribly as they could is that PHP has introduced a number of
workarounds over time, as these issues have been reported. That's why you
usually don't run into this when inserting float values into a DB query, at
least when using prepared statements. This issue is not handled everywhere
though (one recent example I remember is passing floats to bcmath) and I
don't think that introducing more of these special cases is how we should
be approaching this problem.
Again, I'm not disputing that the current behavior isn't desired. I am
disputing that it's a "no big deal" to change it 20+ years after the fact,
and I am arguing that while many may not be fond of this behavior - they
can still have code that has grown to rely on it over the years.
I do think that if we do decide to change it, it should be while providing
users a long-term (and probably permanent) language level way to keep the
current behavior. Yes, it's against our motto - but then, so is such a
widescale compatibility breakage without an easy forward path that does not
involve a full line by line code audit.
Zeev
I agree, but the real question is how many of those who are explicitly
calling setlocale() are relying on this behavior
It occurs to me that we could ask the opposite question: what other
reasons do people have for explicitly calling setlocale()? If we can
provide alternatives to the majority of use cases, we can put a big fat
warning on the setlocale() manual page suggesting that people completely
avoid it - more prominent than the current one on thread-safety, mentioning
some of the other undesirable side effects, and explicitly recommending
alternatives.
Obviously, that advice won't be followed overnight, but it would at least
give ammunition for people to raise PRs against libraries saying "hey,
please use this instead of setlocale() because you broke my float
conversions".
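One candidate alternative for the display use case, sketched under assumptions: number_format() takes its separators as explicit arguments, so the presentation is chosen per call instead of through process-wide state. The separator table and the function name below are hypothetical and cover only two conventions.

```php
<?php
// Hypothetical per-call formatting: no setlocale(), no global state.
const DISPLAY_SEPARATORS = [
    'en' => ['.', ','],  // decimal point, thousands separator
    'de' => [',', '.'],
];

function formatForDisplay(float $value, string $convention, int $decimals = 2): string
{
    [$decimalPoint, $thousandsSep] = DISPLAY_SEPARATORS[$convention];
    return number_format($value, $decimals, $decimalPoint, $thousandsSep);
}
```

With this table, formatForDisplay(1234.5, 'de') gives "1.234,50" and formatForDisplay(1234.5, 'en') gives "1,234.50", while plain float-to-string casts elsewhere in the process are left alone.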
Regards,
Rowan Collins
[IMSoP]
On Thu, Jan 3, 2019 at 3:30 PM Rowan Collins rowan.collins@gmail.com
wrote:
I agree, but the real question is how many of those who are explicitly
calling setlocale() are relying on this behavior
It occurs to me that we could ask the opposite question: what other
reasons do people have for explicitly calling setlocale()? If we can
provide alternatives to the majority of use cases, we can put a big fat
warning on the setlocale() manual page suggesting that people completely
avoid it - more prominent than the current one on thread-safety, mentioning
some of the other undesirable side effects, and explicitly recommending
alternatives.
Obviously, that advice won't be followed overnight, but it would at least
give ammunition for people to raise PRs against libraries saying "hey,
please use this instead of setlocale() because you broke my float
conversions".
Very interesting direction, I definitely think it's worthwhile to try and
answer the question you raise and depending on what we find - implement
your proposal on deprecating or semi-deprecating setlocale().
Zeev
AFAIK, gettext functions do depend on setlocale.
I wish so much that it wasn't the case (as you then need to have the
locale installed on the system), but it is, so setlocale definitely is
quite used in the wild and deprecating it seems a bit far-fetched
unless we can actually replace it with something else (better).
But gettext has other issues related to being cached in the current
process, as you need to restart apache if the compiled .mo files have
changed to get the new strings :(
Another function that is influenced by setlocale is strftime. This is
often the common way to display a date in a different language.
So I'm all for deprecating setlocale but before that we would need to
have something better for everything that's currently depending on it :)
BohwaZ
Hi,
On 03.01.2019 at 10:21, Zeev Suraski wrote:
It's usually the other way around. The current behavior is prone to
breaking integration code, because data interchange layers generally do not
expect floats to use comma separators. The reason why things don't break
quite as terribly as they could is that PHP has introduced a number of
workarounds over time, as these issues have been reported. That's why you
usually don't run into this when inserting float values into a DB query, at
least when using prepared statements. This issue is not handled everywhere
though (one recent example I remember is passing floats to bcmath) and I
don't think that introducing more of these special cases is how we should
be approaching this problem.
Again, I'm not disputing that the current behavior isn't desired. I am
disputing that it's a "no big deal" to change it 20+ years after the fact,
and I am arguing that while many may not be fond of this behavior - they
can still have code that has grown to rely on it over the years.
I do think that if we do decide to change it, it should be while providing
users a long-term (and probably permanent) language level way to keep the
current behavior. Yes, it's against our motto - but then, so is such a
widescale compatibility breakage without an easy forward path that does not
involve a full line by line code audit.
Would it be possible to write a patch where every
float-to-string-conversion that changes the representation because of
the locale-setting, would produce a DEPRECATION or NOTICE or similar?
The conversion output stays as it is today. Then people could run that
patch against their own private projects. Or run against most popular
100 PHP Github projects. Run against 1000 random PHP projects from
Github. Or create a specific PHP version that could run on Travis-CI or
so...
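A userland approximation of that idea, for experimenting before any engine patch exists: localeconv() reports the decimal point of the current numeric locale, so a wrapper can flag conversions that the locale might alter. This is only a sketch; the function names are hypothetical, and a real measurement would need the check inside the engine's own conversion routine.

```php
<?php
// True when the active LC_NUMERIC locale uses a decimal separator other than '.'.
function localeMayAlterFloatCast(): bool
{
    return localeconv()['decimal_point'] !== '.';
}

// Hypothetical wrapper: same output as a plain cast, plus a deprecation notice
// whenever the current locale could change the representation.
function castFloatWithNotice(float $value): string
{
    if (localeMayAlterFloatCast()) {
        trigger_error(
            'float-to-string conversion may depend on the current locale',
            E_USER_DEPRECATED
        );
    }
    return (string) $value;
}
```

Because the conversion output is unchanged, such a wrapper could be dropped into a test suite to count affected call sites without altering behaviour.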
Currently there is a lot of guessing about the potentially affected
libraries and projects, maybe it's better to measure. Then we get
numbers of how many affected projects there might be. Maybe it's a huge
problem, maybe only 0.001% of the projects are affected, and if they are
affected, most likely it's considered a bug, not "intended behaviour".
Nobody should rely on the current behaviour; best practice is to use
number_format() if needed, I think.
If such a DEPRECATION/NOTICE would be emitted in 7.4, 8.0, and 8.x, then
people have many years to fix it before the change finally is done in
9.0 or 10.0. We don't have to hurry, this might be a feature with a long
period of DEPRECATION/NOTICE, potential bugs can be fixed during the
years, before the final change happens. If there are really huge
problems that we see during those years (but didn't see while
measuring), thousands of developers complaining, in the worst case the
change could be reverted.
We should measure first, and then hopefully fix strange behaviour of PHP
in the long run.
Michael
- I'm not sure what you mean "not many people use this"? People don't
convert floats to strings?
People don't format their floats using setlocale + echo. People use
things like sprintf and number_format to get the right number of
decimals, and many use number_format or even str_replace to change the
decimal separator because setlocale has weird side effects (like the one
being discussed).
- I view a UX change as a big deal. As we should in a language that is
very commonly used to create UI.
Then what about existing UI bugs? Thanks to this discussion, I found
exactly one instance of setlocale in my whole PHP code base (used to
format a printf nicely), and I also found a bug where "stringparam=$x"
was ill-formatted because of this and produced a visible error in a
generated image (although not critical and thus unnoticed until now). I
was certainly not relying on this behaviour. It was just bad luck that
the output (at least when I saw it – don't know about other users!) was
”good enough” that I didn't notice the bug.
- This could affect not only UX, but also integration code. You could
have PHP output feeding into something else - and suddenly, the format
breaks. With the fix HAVING to be on the other side, no less.
It's reasonable to expect that a float (with known range) is a valid
number in most programming contexts such as CSS (width: <?= $float ?>px)
or HTML (input type=number value=<?= $float ?>) or JavaScript (var width
= <?= $float; ?>). Using number_format to fix these would feel almost as
bad as using number_format before every arithmetic operation.
Because of this behaviour, using setlocale will break many libraries
which output floating-point values in any other context than
user-visible text.
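For the JavaScript case, one way to sidestep the cast, as a sketch with a hypothetical function name: json_encode() emits numbers in JSON syntax, which always uses a dot as the decimal separator.

```php
<?php
// Hypothetical helper: build a JavaScript assignment from a PHP float.
// json_encode() writes JSON number syntax, so the separator is always '.'.
function jsNumberAssignment(string $name, float $value): string
{
    return 'var ' . $name . ' = ' . json_encode($value) . ';';
}
```

Here jsNumberAssignment('width', 12.5) gives "var width = 12.5;". Note that json_encode() cannot represent NAN or INF, so this only suits floats with a known finite range.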
With the fix HAVING to be on the other side, no less.
How so? If you send floats, you can format them yourself (and you
certainly should, if they are locale-dependent!). If you receive floats,
you can parse them yourself. No need to change ”the other side”.
--
Lauri Kenttä
Hi!
- Even if somebody is using this functionality, the only thing that's
going to happen is that their number display switches from 1,5 to 1.5.
That's a minor UX regression, not a broken application. It's something
that will have to be fixed, but it's also not critical, and for a legacy
application one might even not bother.
If this is part of a data pipeline, the difference between 1,500 and
1.500 can be huge (about 1000 times ;). With luck, there would be unit
tests, so instead of broken bank account we'd have broken unit tests,
but we all know how unit test coverage tends to lag behind...
Number formatting difference may be a funny quirk in an average website
context, but could be absolutely disastrous in scientific or financial
application context.
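To make the factor of 1000 concrete, here is a sketch of a hypothetical downstream consumer that follows the English convention and treats a comma as a thousands separator:

```php
<?php
// Hypothetical consumer: strips commas as thousands separators, then parses.
function parseEnglishNumber(string $text): float
{
    return (float) str_replace(',', '', $text);
}
```

Such a consumer reads "1.500" as 1.5 but "1,500" as 1500.0, so a changed separator alters the value silently instead of raising an error.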
I think we should just put this to an RFC vote. We regularly have these
types of discussions, and people just disagree about level of
anticipated BC break relative to benefit of the change.
I do not object to the RFC vote. What we're doing now is something that
comes before the vote - laying out arguments for and against it. I think
that'd be prerequisite to having an informed vote. I don't think this
change would absolutely ruin PHP if voted in, but I think I'd vote
against it, given the arguments laid out so far.
Stas Malyshev
smalyshev@gmail.com
If this is part of a data pipeline, the difference between 1,500 and
1.500 can be huge (about 1000 times ;). With luck, there would be unit
tests, so instead of broken bank account we'd have broken unit tests,
but we all know how unit test coverage tends to lag behind...
Number formatting difference may be a funny quirk in an average website
context, but could be absolutely disastrous in scientific or financial
application context.
Using floats for currency calculations may have more subtle issues. And
for scientific applications, one may not want to have
echo 123456789012345678.9; // 1,2345678901235E+17
--
Christoph M. Becker
Finally, I don't think that the global locale is the real problem for
PHP. Rather it's PHP locale handling and the fact that setlocale()
works per process (and not per thread). When PHP starts up, no locale
That's part of locale being global. Though even in environment where
threads are not involved, many apps do not account for locale quirks.
Like many things that originated in the 'Personal' age of PHP, the
'Server' nature is somewhat inconsistent in many areas. Working with
'time' while some people still insist on using LOCAL time on their
servers, the more consistent method is to use UTC and then identify the
CLIENT'S preferred locale. Displaying other numbers has exactly the same
problem and it should be a client locale setting that decides how to
display them, with a global base of something ASCII based. Making
validation client specific removes the need to mess up the server by
trying to run multiple locales with the possible conflicts between that,
just as trying to manage multiple times is complicated if the server is
running yet another locale?
--
Lester Caine - G8HFL
Contact - https://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - https://lsces.co.uk
EnquirySolve - https://enquirysolve.com/
Model Engineers Digital Workshop - https://medw.co.uk
Rainbow Digital Media - https://rainbowdigitalmedia.co.uk
cmbecker69@gmx.de ("Christoph M. Becker") wrote:
[...] I tend to prefer the non-locale aware behavior, i.e. float to
string conversion should always produce a decimal point. Users still
can explicitly use number_format() or NumberFormatter if they wish.
We all agree that the basic features of the language should NOT be
locale-aware, to make error reporting and logging, data file writing
and parsing, session management, and library portability easier. But I
would like to restate this goal more clearly:
FLOAT TO STRING CAST CONVERSION REPLACEMENT
Given a floating-point value, retrieve its canonical PHP source-code
string representation. By "canonical" I mean something that can be
parsed by the PHP interpreter like a floating-point number, not like
an int or anything else. Then, for example, 123.0 must be rendered as
"123.0", not as "123", which looks like an int; the non-finite values
NAN and INF must also be rendered as "NAN" and "INF". The "(string)"
cast and the embedded variable in string "$f" are locale-aware, and so
are all the printf() & Co. functions, including var_dump() (the latter a
big surprise; is anyone willing to send a data structure dump to the end user?).
The simplest way I found to get such canonical representation is
$s = var_export($f, TRUE);
which returns exactly what I expect, does not depend on the current
locale, does not depend on exotic libraries, and it is very short and
simple. It depends only on the current serialize_precision php.ini
parameter, which should already be set right (or you are going to have
problems elsewhere).
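A sketch of that approach, wrapped in a hypothetical helper. The results in the comments assume the default serialize_precision of -1:

```php
<?php
// Hypothetical wrapper around var_export() for floats only.
function canonicalFloat(float $f): string
{
    return var_export($f, true);
}

// canonicalFloat(123.0) gives "123.0"
// canonicalFloat(NAN)   gives "NAN"
// canonicalFloat(-INF)  gives "-INF"
```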
STRING TO FLOAT CAST CONVERSION REPLACEMENT
Given a string carrying the canonical representation of a floating-point
number, retrieve the floating-point number. Syntax errors must be
detectable. The result must be "float", not int or anything else.
Unsure about how strict the parser should be in these edge cases:
"+1.2" (redundant plus sign)
"123" (looks like int, not a float)
"0123" (looks like int octal base)
Getting all this is a bit more tricky. The "(float)" cast does not work
because it does not support the non-finite values NAN and INF and does not
allow errors to be detected. The simplest way I found is by using the
unserialize() function:
/**
 * Parses the PHP canonical representation of a floating point number. This
 * function parses any valid PHP source code representation of a "float",
 * including NAN, INF, -INF and -0 (IEEE 754 zero negative). Not locale aware.
 * @param string $s String to parse. No spaces allowed, apply trim() if needed.
 * @return float Parsed floating-point number.
 * @throws InvalidArgumentException Invalid syntax.
 */
function parseFloat($s)
{
    // Security: untrusted strings must be checked against a basic syntax before
    // being blindly submitted to unserialize():
    if( preg_match("/^[-+]?(NAN|INF|[-+.0-9eE]++)$/sD", $s) !== 1 )
        throw new InvalidArgumentException("cannot parse as a floating point number: '$s'");
    // unserialize() raises an E_NOTICE on parse error and then returns FALSE.
    $m = @unserialize("d:$s;");
    if( is_int($m) )
        return (float) $m; // always return what we promised
    if( is_float($m) )
        return $m;
    throw new InvalidArgumentException("cannot parse as a floating point number: '$s'");
}
Here again, only core libraries are involved, with no dependency on the
locale; not so short, but the best I have found up to now. Things like
NumberFormatter require the 'intl' extension to be enabled, and often it isn't.
By using these functions all the possible "float" values pass the
round-trip back and forth, including NAN, INF, -INF, -0 (negative zero,
for what it's worth) at the highest accuracy possible of the IEEE 754
representation.
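The round-trip claim can be checked with a small self-contained sketch. The helper names are hypothetical, and the parser is a condensed restatement of the approach above, not a replacement for it:

```php
<?php
// Condensed, hypothetical restatement of the two conversions described above.
function toCanonical(float $f): string
{
    return var_export($f, true);
}

function fromCanonical(string $s): float
{
    if (preg_match('/^[-+]?(NAN|INF|[-+.0-9eE]++)$/sD', $s) !== 1) {
        throw new InvalidArgumentException("cannot parse as a floating point number: '$s'");
    }
    $m = @unserialize("d:$s;");
    if (!is_float($m)) {
        throw new InvalidArgumentException("cannot parse as a floating point number: '$s'");
    }
    return $m;
}

// A value survives the round trip when its canonical string is reproduced.
// Comparing strings also covers NAN, which is never equal to itself as a float.
function survivesRoundTrip(float $f): bool
{
    return toCanonical(fromCanonical(toCanonical($f))) === toCanonical($f);
}
```

Comparing the canonical strings rather than the floats also distinguishes -0.0 from 0.0, which compare equal numerically.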
Regards,
/|\ Umberto Salsi
/_/ www.icosaedro.it
Regarding the decimal separator (aka. decimal point), the behavior of
casting float to string is inconsistent with casting string to float.
While the former regards the current locale, the latter always expects a
decimal point regardless of the locale. This breaks round-trips for
locales which use something else than a dot as decimal separator (e.g.
German, which uses a comma), for instance:
$float = 1/3; // float(0,33333333333333)
$string = (string) $float; // string(16) "0,33333333333333"
$float = (float) $string; // float(0)
Well, there's a special case, though:
$string = (1/3) . 'foo'; // string(19) "0.33333333333333foo"
This is caused by a compile-time optimization, which obviously occurs
before any locale can be set.
--
Christoph M. Becker