Hi,
I'd like to get people's feedback for the idea of making setlocale be
either deprecated and to be removed eventually or just increasing the
warning level against people using it.
The short version of why we should do this is that setlocale breaks
things. It is a process wide operation, that is not thread safe on
most platforms. Although people use it to alter the way strings are
output, it also breaks libraries that are assuming that they are
running in a "C" locale.
https://bugs.php.net/bug.php?id=59571 - breaks Imagick
https://bugs.php.net/bug.php?id=54538 - breaks NumberFormatter
https://bugs.php.net/bug.php?id=66108 - breaks constants
https://bugs.php.net/bug.php?id=67127 - breaks DateTime
https://bugs.php.net/bug.php?id=69348 - breaks MySQL
And there are quite a few other bug reports where setlocale is doing
exactly what it is meant to do, but the unexpected side-effects are
catching people out.
We have libraries that ship with PHP for formatting dates, numbers etc
that actually work, and don't break other libraries.
So two questions:
i) Are there any reasons why we couldn't or shouldn't plan for
removing setlocale at some point in the future? i.e. does it do
something that isn't supported by other libraries in PHP?
ii) If it's not possible (or desirable) to remove it, how could we
increase the warning level against using it?
cheers
Dan
Hi,
I'd like to get people's feedback for the idea of making setlocale be
either deprecated and to be removed eventually or just increasing the
warning level against people using it.The short version of why we should do this is that setlocale breaks
things. It is a process wide operation, that is not thread safe on
most platforms. Although people use it to alter the way strings are
output, it also breaks libraries that are assuming that they are
running in a "C" locale.https://bugs.php.net/bug.php?id=59571 - breaks Imagick
https://bugs.php.net/bug.php?id=54538 - breaks NumberFormatter
https://bugs.php.net/bug.php?id=66108 - breaks constants
https://bugs.php.net/bug.php?id=67127 - breaks DateTime
https://bugs.php.net/bug.php?id=69348 - breaks MySQLAnd there are quite a few other bug reports where setlocale is doing
exactly what it is meant to do, but the unexpected side-effects are
catching people out.We have libraries that ship with PHP for formatting dates, numbers etc
that actually work, and don't break other libraries.So two questions:
i) Are there any reasons why we couldn't or shouldn't plan for
removing setlocale at some point in the future? i.e. does it do
something that isn't supported by other libraries in PHP?
-1
I don’t hink that we should restrict access to standard library functions just because something might break due to usage under wrong assumptions.
ii) If it's not possible (or desirable) to remove it, how could we
increase the warning level against using it?
Hi,
I'd like to get people's feedback for the idea of making setlocale be
either deprecated and to be removed eventually or just increasing the
warning level against people using it.The short version of why we should do this is that setlocale breaks
things. It is a process wide operation, that is not thread safe on
most platforms. Although people use it to alter the way strings are
output, it also breaks libraries that are assuming that they are
running in a "C" locale.https://bugs.php.net/bug.php?id=59571 - breaks Imagick
https://bugs.php.net/bug.php?id=54538 - breaks NumberFormatter
https://bugs.php.net/bug.php?id=66108 - breaks constants
https://bugs.php.net/bug.php?id=67127 - breaks DateTime
https://bugs.php.net/bug.php?id=69348 - breaks MySQLAnd there are quite a few other bug reports where setlocale is doing
exactly what it is meant to do, but the unexpected side-effects are
catching people out.We have libraries that ship with PHP for formatting dates, numbers etc
that actually work, and don't break other libraries.So two questions:
i) Are there any reasons why we couldn't or shouldn't plan for
removing setlocale at some point in the future? i.e. does it do
something that isn't supported by other libraries in PHP?-1
I don’t hink that we should restrict access to standard library functions
just because something might break due to usage under wrong assumptions.ii) If it's not possible (or desirable) to remove it, how could we
increase the warning level against using it?
maybe we could try something to mark/warn extension which aren't thread
safe when using in a TS build, but other than that I think it is just an
education/documentation problem.
(and removing something which works without having an alternative is bad in
my book)
--
Ferenc Kovács
@Tyr43l - http://tyrael.hu
Am 01.04.2015 um 18:15 schrieb Dan Ackroyd:
Hi,
I'd like to get people's feedback for the idea of making setlocale be
either deprecated and to be removed eventually or just increasing the
warning level against people using it.
What if the system is configured with a different locale?
OK we could change the locale on startup to "C" but this could break
embedded behaviour and doesn't solve the issue at all.
There are some cases where we don't have a good alternative like fgetcsv
/ fputcsv.
The short version of why we should do this is that setlocale breaks
things. It is a process wide operation, that is not thread safe on
most platforms. Although people use it to alter the way strings are
output, it also breaks libraries that are assuming that they are
running in a "C" locale.
From my perspective
Functionality not related to locale should not be effected by this global
-> The following bug reports are valid bugs and should be resolved
https://bugs.php.net/bug.php?id=59571 - breaks Imagick
https://bugs.php.net/bug.php?id=69348 - breaks MySQL
-> If it's part of the wrapped library it's a bug of this library
and should be fixed there
Basic PHP functions should be possible to be TS and therefore should
ignoring such global configuration else it's impossible to have a TS
version of PHP
-> if there is no alternative functionality we should add one
-> Non-TS functions should be moved into an extension and don't
allow to enable on TS build
-> Non-TS functions not movable to an extension should be rethink like:
- removing the locale based conversion on simple float to string
casts
- make strtoupper/strtolower etc. to work with ascii only
Is't there a very similar issue with fs operations and umask?
-> Is it possible to have fs operations not related on this non-TS
global configuration?
https://bugs.php.net/bug.php?id=59571 - breaks Imagick
-> It's a real bug and should be fixed (not locale based functionality)
https://bugs.php.net/bug.php?id=54538 - breaks NumberFormatter
-> It's a real bug and should be fixed (based on a different "safe" locale)
https://bugs.php.net/bug.php?id=66108 - breaks constants
-> see comment above about strtoupper
https://bugs.php.net/bug.php?id=67127 - breaks DateTime
-> It's a bug and should be fixed (not locale based functionality)
https://bugs.php.net/bug.php?id=69348 - breaks MySQL
-> It's a bug and should be fixed (non locale based functionality)
-> Couldn't this one be a security issueAnd there are quite a few other bug reports where setlocale is doing
exactly what it is meant to do, but the unexpected side-effects are
catching people out.We have libraries that ship with PHP for formatting dates, numbers etc
that actually work, and don't break other libraries.So two questions:
i) Are there any reasons why we couldn't or shouldn't plan for
removing setlocale at some point in the future? i.e. does it do
something that isn't supported by other libraries in PHP?ii) If it's not possible (or desirable) to remove it, how could we
increase the warning level against using it?cheers
Dan
Marc
Hi!
- make strtoupper/strtolower etc. to work with ascii only
This would be a bad idea, however much better idea is to make system
case folding (i.e. methods, classes, etc.) use ascii-only. Which we
already mostly do (see zend_operators.c it explains what's going on). Of
course, there can be instances of using the wrong function.
--
Stas Malyshev
smalyshev@gmail.com
Hi
Am 01.04.2015 um 21:02 schrieb Stanislav Malyshev:
Hi!
- make strtoupper/strtolower etc. to work with ascii only
This would be a bad idea, however much better idea is to make system
case folding (i.e. methods, classes, etc.) use ascii-only. Which we
already mostly do (see zend_operators.c it explains what's going on). Of
course, there can be instances of using the wrong function.
In a dynamic language like PHP it's a very normal and basic case
handling language features dynamically including the one described in
the bug report (Find an upper case constant for the given input). What I
mean is that we should have basic functions addressing this. So for
example we need a basic function to upper/lower case only ascii characters.
Marc
Functionality not related to locale should not be effected by this global
This is issue is that that the locale affects all of the C library
that use it, which includes printf()
;
-> The following bug reports are valid bugs and should be resolved
https://bugs.php.net/bug.php?id=59571 - breaks Imagick
For Imagick, the issue is that a SVG string was being generated with
something like:
sprintf(svg, "pos: %f %f", x, y);
When the locale is set to a locale that uses commas for the decimal
point, this generates an invalid SVG string.
I don't believe there is any possible work around for this. If you
know of one, please can you say it?
Functionality not related to locale should not be effected by this global
Basic PHP functions should be possible to be TS and therefore should
ignoring such global configuration else it's impossible to have a TS
version of PHP
The global is held by the C libraries, not in PHP code. And yes, the
implication of that is that setlocale can not be thread safe.
cheers
Dan
Hi!
For Imagick, the issue is that a SVG string was being generated with
something like:sprintf(svg, "pos: %f %f", x, y);
Yeah that is a problem... Two ways I see to fix it:
- Reset the locale before it and restore afterwards (non-TS and in
general may be problematic) - Use other formatting functions like Zend engine ones or some other
library.
Global state sucks, and that's the reason locale sucks :(
Stas Malyshev
smalyshev@gmail.com
https://bugs.php.net/bug.php?id=69348 - breaks MySQL
-> It's a bug and should be fixed (non locale based functionality)
-> Couldn't this one be a security issue
No this is not a bug in this function and no not a security issue.
The user asks to escape a string and provides a double. The double is
therefore converted to a string according to PHP's rules and then
correctly escaped. Now PHP's rules are a bit unfortunate and might lead
to wrong data being stored.
From MySQL perspective the correct usage is not to escape numeric types.
Those can be put in the query directly. (Code like
$d = (double)$foo; $sql = "SELECT * FROM t WHERE d=$d";
is safe.) or maybe better use prepared statements.
Also mind: Locale not only has impact on number->string conversion but
also different string operations like uppercase/lowercase conversion:
(see Turkish i->I->y)
Revising locale might be a good idea, however not by removing a function
but by finding a way to make the behavior more explicit to the user.
Removing the function will cause trouble when interacting with external
libraries and programs which are locale dependent.
johannes
ORACLE Deutschland B.V. & Co. KG | Riesstraße 25 | 80992 München
Registergericht: Amtsgericht München, HRA 95603
Komplementärin: ORACLE Deutschland Verwaltung B.V.
Hertogswetering 163/167, 3543 AS Utrecht, Niederlande
Handelsregister der Handelskammer Midden-Niederlande, Nr. 30143697
Geschäftsführer: Alexander van der Ven, Astrid Kepper, Val Maher
Hi!
The short version of why we should do this is that setlocale breaks
things. It is a process wide operation, that is not thread safe on
most platforms. Although people use it to alter the way strings are
output, it also breaks libraries that are assuming that they are
running in a "C" locale.
Library should not assume it runs in C locale, since this may be not
true regardless of whether we ever used setlocale in PHP code, since
locale is a system setting. That said, locale is a remanant of the times
where the assumption that all processes on the machine do approximately
the same thing and service the same goal for a single client or set of
clients small enough that they are all likely be in the same place was
entirely reasonable. Of course, for a webserver this sounds like utter
madness.
That said, there may be scenarios where this is still useful (such as,
small server-side scripts, etc.) so I'm not sure we should really
deprecate it. I'd rather add a large warning to its man page that
setlocale does more than you think and you should be careful with it (of
course, not in these words :)
https://bugs.php.net/bug.php?id=59571 - breaks Imagick
https://bugs.php.net/bug.php?id=54538 - breaks NumberFormatter
This one is a bug in ICU.
https://bugs.php.net/bug.php?id=66108 - breaks constants
This seems to be a bug. I though I've fixed the case conversion to use
ascii, but maybe I missed a spot in constant()
. I'll check it.
https://bugs.php.net/bug.php?id=67127 - breaks DateTime
This looks like misunderstanding how float-to-string works. If you asked
it to put commas as decimal separator, don't be surprised it puts commas
as decimal separator.
https://bugs.php.net/bug.php?id=69348 - breaks MySQL
This doesn't seem to be a mysql problem but rather misunderstanding how
type conversion works. Same as above.
i) Are there any reasons why we couldn't or shouldn't plan for
removing setlocale at some point in the future? i.e. does it do
something that isn't supported by other libraries in PHP?
I don't think we should remove it. I do think we should warn people that
in this day and age, setlocale()
is very bad way of doing
internationalization except for very specific cases when you know what
you are doing.
--
Stas Malyshev
smalyshev@gmail.com
Am 01.04.2015 um 20:58 schrieb Stanislav Malyshev:
Hi!
https://bugs.php.net/bug.php?id=67127 - breaks DateTime
This looks like misunderstanding how float-to-string works. If you asked
it to put commas as decimal separator, don't be surprised it puts commas
as decimal separator.
This should be true for number formatting but it's not true for date format.
-> I'm a german with comma (",") as decimal separator and I have never
seen such a comma in microtime separation
-> There is no way to define this behavior in a date format
Marc
Hi!
This should be true for number formatting but it's not true for date
format.
Yes, date format is not internationalized. There are a lot of fixed
formats, so this should not be much of a surprise. One just has to know
which ones aren't. And yes, handling transition between fixed and
localized formats is a pain, that's why setlocale()
is usually not a
good idea when you use both.
--
Stas Malyshev
smalyshev@gmail.com
Hi!
Stanislav Malyshev wrote:
Library should not assume it runs in C locale,
Because it is not thread-safe, it is not safe for libraries to alter the locale.
I can't see any workaround to make this 'just work' - when printf
cannot be relied on to work in a particular way, there is nothing the
library could do to run safely.
The only sensible thing a library could do in a non-C locale is to not
run.....but as setlocale is not thread safe, that is not possible to
detect correctly.
I don't think we should remove it. I do think we should warn people that
in this day and age,setlocale()
is very bad way of doing
internationalization except for very specific cases when you know what
you are doing.
Is there anything we can do that makes people more aware of it's side
effects, other than to make the warning larger on the manual page?
cheers
Dan
Hi,
I'd like to get people's feedback for the idea of making setlocale be
either deprecated and to be removed eventually or just increasing the
warning level against people using it.The short version of why we should do this is that setlocale breaks
things. It is a process wide operation, that is not thread safe on
most platforms. Although people use it to alter the way strings are
output, it also breaks libraries that are assuming that they are
running in a "C" locale.
The PHP setlocale()
wrapper can be made threadsafe in a portable manner
via newlocale()/uselocale(), so that part isn't an issue. Someone just
needs to write the code for the threaded SAPIs. It hasn't been a high
priority due to how few people use non-Windows ZTS mode so far.
Obviously within the same process there may be issues with changing a
locale to something unexpected by subsequent code, but that is really a
bug in that code then. If some library function expects and only works
in the "C" locale, it should make sure it is in that locale before doing
anything and then restoring the locale back appropriately. And with a
bit of symbol trickery libraries that do actually call setlocale to
change it and restore it can be made to use our newlocale/uselocale
thread-local locale mechanism.
setlocale()
is used quite a bit and for non-threaded, which is the bulk
of php usage, it tends to work quite well. It is annoying that it is
process-wide and as you noted that can cause issues, sure, and that the
current implementation isn't threadsafe, but it isn't something that is
technically unsolvable. Rather than throwing our hands in the air and
just pulling the rug out from under people using it, we should take a
look at fixing the problems associated with it instead.
-Rasmus
Hi,
I'd like to get people's feedback for the idea of making setlocale be
either deprecated and to be removed eventually or just increasing the
warning level against people using it.The short version of why we should do this is that setlocale breaks
things. It is a process wide operation, that is not thread safe on
most platforms. Although people use it to alter the way strings are
output, it also breaks libraries that are assuming that they are
running in a "C" locale.The PHP
setlocale()
wrapper can be made threadsafe in a portable manner
via newlocale()/uselocale(), so that part isn't an issue. Someone just
needs to write the code for the threaded SAPIs. It hasn't been a high
priority due to how few people use non-Windows ZTS mode so far.Obviously within the same process there may be issues with changing a
locale to something unexpected by subsequent code, but that is really a
bug in that code then. If some library function expects and only works
in the "C" locale, it should make sure it is in that locale before doing
anything and then restoring the locale back appropriately. And with a
bit of symbol trickery libraries that do actually call setlocale to
change it and restore it can be made to use our newlocale/uselocale
thread-local locale mechanism.
setlocale()
is used quite a bit and for non-threaded, which is the bulk
of php usage, it tends to work quite well. It is annoying that it is
process-wide and as you noted that can cause issues, sure, and that the
current implementation isn't threadsafe, but it isn't something that is
technically unsolvable. Rather than throwing our hands in the air and
just pulling the rug out from under people using it, we should take a
look at fixing the problems associated with it instead.-Rasmus
To Rasmus's point, here's a PR for HHVM to provide a thread-safe setlocale
implementation: https://github.com/facebook/hhvm/pull/4736/files
It should be fairly easy to refactor the thread-safe-setlocale.(h/cpp)
files for Zend.
To Rasmus's point, here's a PR for HHVM to provide a thread-safe setlocale
implementation: https://github.com/facebook/hhvm/pull/4736/filesIt should be fairly easy to refactor the thread-safe-setlocale.(h/cpp) files
for Zend.
Ok, that' pretty awesome. So assumming that we incorporated that new
thread safe version of locale, how would we expose it? Most people who
are calling setlocale are unaware of it's side effects, and so should
be using the new safe version by default.
Some people who are calling setlocale will actually be using the
cross-thread behaviour and so that still needs to work.
setlocale is a variadic function, so it's not possible to hack in a
flag parameter. As much as I dislike ini settings, it seems like
adding one here would be sensible e.g. 'thread_safe_setlocale'
-
If it's enabled the setlocale function calls the new thread safe
functionality. -
If it's disabled the setlocale function calls the current non-TS C
setlocale function.
Does anyone have a better suggestion on exposing a thread safe version?
cheers
Dan
To Rasmus's point, here's a PR for HHVM to provide a thread-safe setlocale
implementation: https://github.com/facebook/hhvm/pull/4736/filesIt should be fairly easy to refactor the thread-safe-setlocale.(h/cpp) files
for Zend.Ok, that' pretty awesome. So assumming that we incorporated that new
thread safe version of locale, how would we expose it? Most people who
are calling setlocale are unaware of it's side effects, and so should
be using the new safe version by default.Some people who are calling setlocale will actually be using the
cross-thread behaviour and so that still needs to work.setlocale is a variadic function, so it's not possible to hack in a
flag parameter. As much as I dislike ini settings, it seems like
adding one here would be sensible e.g. 'thread_safe_setlocale'
If it's enabled the setlocale function calls the new thread safe
functionality.If it's disabled the setlocale function calls the current non-TS C
setlocale function.Does anyone have a better suggestion on exposing a thread safe version?
What about just adding another function: setlocale_global(), or
similar? I don't want a new INI setting any more than you do.
Adam
To Rasmus's point, here's a PR for HHVM to provide a thread-safe
setlocale
implementation: https://github.com/facebook/hhvm/pull/4736/filesIt should be fairly easy to refactor the thread-safe-setlocale.(h/cpp)
files
for Zend.Ok, that' pretty awesome. So assumming that we incorporated that new
thread safe version of locale, how would we expose it? Most people who
are calling setlocale are unaware of it's side effects, and so should
be using the new safe version by default.Some people who are calling setlocale will actually be using the
cross-thread behaviour and so that still needs to work.setlocale is a variadic function, so it's not possible to hack in a
flag parameter. As much as I dislike ini settings, it seems like
adding one here would be sensible e.g. 'thread_safe_setlocale'
If it's enabled the setlocale function calls the new thread safe
functionality.If it's disabled the setlocale function calls the current non-TS C
setlocale function.Does anyone have a better suggestion on exposing a thread safe version?
What about just adding another function: setlocale_global(), or
similar? I don't want a new INI setting any more than you do.Adam
--
I like the idea of an INI actually, but I would make it default to the
current bahaviour and user education so those who need TS know to change it.
Ryan
To Rasmus's point, here's a PR for HHVM to provide a thread-safe
setlocale
implementation: https://github.com/facebook/hhvm/pull/4736/filesIt should be fairly easy to refactor the thread-safe-setlocale.(h/cpp)
files
for Zend.Ok, that' pretty awesome. So assumming that we incorporated that new
thread safe version of locale, how would we expose it? Most people who
are calling setlocale are unaware of it's side effects, and so should
be using the new safe version by default.Some people who are calling setlocale will actually be using the
cross-thread behaviour and so that still needs to work.setlocale is a variadic function, so it's not possible to hack in a
flag parameter. As much as I dislike ini settings, it seems like
adding one here would be sensible e.g. 'thread_safe_setlocale'
If it's enabled the setlocale function calls the new thread safe
functionality.If it's disabled the setlocale function calls the current non-TS C
setlocale function.Does anyone have a better suggestion on exposing a thread safe version?
What about just adding another function: setlocale_global(), or
similar? I don't want a new INI setting any more than you do.Adam
--
I like the idea of an INI actually, but I would make it default to the
current bahaviour and user education so those who need TS know to change it.
The thread-safe version sets the locale globally for each thread and any
code executing in that thread will be impacted. So the semantics are
similar to how setlocale(3) works in a process. The thread-safe code would
be wrapped in #ifdefs and only take effect in a right context.
Ryan
Adam Harvey wrote:
What about just adding another function: setlocale_global(), or similar?
Do you mean setlocale would become the thread safe one, and
setlocale_global() would be the current behaviour? If so, that would
be a BC break for the small number of people who are deliberately
using setlocale globally. We could do that at a major version I guess.
I think the ini setting is still better - it allows people to use code
that was written assuming that setlocale was safe to use, rather than
forcing them to change it.
It might be best to introduce both setlocale_ts() and
setlocale_global(), so that users could call either explicitly, and
have the ini setting control which one setlocale()
points to. If we
did that we could also then plan to eventually deprecate both the ini
setting and the plain setlocale without _ts/_global.
Ryan Pallas wrote:
I like the idea of an INI actually, but I would make it default to the current behaviour and user education so those who need TS know to change it.
The problem with that is that for people who aren't aware of the
setting, they wouldn't know that they need to change the setting.
Instead they just want their code to work properly 'out of the box'.
For the vast majority of people calling the thread safe version of
setlocale would be what they problem want it to do, and so PHP should
default to the right thing for the majority of people.
Of course there is a separate argument about changing the behaviour at
major/minor version, and if this was introduced then having to
thread_safe_setlocale=off
for 7.1 and thread_safe_setlocale=on
for
8 could be correct.
cheers
Dan
Adam Harvey wrote:
What about just adding another function: setlocale_global(), or similar?
Do you mean setlocale would become the thread safe one, and
setlocale_global() would be the current behaviour? If so, that would
be a BC break for the small number of people who are deliberately
using setlocale globally. We could do that at a major version I guess.I think the ini setting is still better - it allows people to use code
that was written assuming that setlocale was safe to use, rather than
forcing them to change it.It might be best to introduce both setlocale_ts() and
setlocale_global(), so that users could call either explicitly, and
have the ini setting control which onesetlocale()
points to. If we
did that we could also then plan to eventually deprecate both the ini
setting and the plain setlocale without _ts/_global.Ryan Pallas wrote:
I like the idea of an INI actually, but I would make it default to the current behaviour and user education so those who need TS know to change it.
The problem with that is that for people who aren't aware of the
setting, they wouldn't know that they need to change the setting.
Instead they just want their code to work properly 'out of the box'.
For the vast majority of people calling the thread safe version of
setlocale would be what they problem want it to do, and so PHP should
default to the right thing for the majority of people.Of course there is a separate argument about changing the behaviour at
major/minor version, and if this was introduced then having to
thread_safe_setlocale=off
for 7.1 andthread_safe_setlocale=on
for
8 could be correct.
A step back here, please. Which concrete problem are you actually trying
to solve? HHVM needed this fix because of its single-process threaded
architecture on UNIX. PHP is not typically threaded on UNIX, so it
doesn't seem like an issue affecting very many people. It has been on
our radar for ages and has a big prominent red warning box about it in
the documentation. Threaded use of PHP on UNIX tends to be rather
specialized cases by people who understand that things like setlocale()
and putenv()
are process-wide.
That doesn't mean there is anything wrong with implementing thread-local
setlocale for our threadsafe builds, and we should definitely fix any
in-process setlocale interference issues, but the thread safety part of
it just doesn't seem like something very many people are going to care
about. Why do you?
-Rasmus