Hello,
Some of my users & contributors have run into an issue with files containing
UTF-8 on certain Windows configurations (but they have not actually found
what differs between those configurations). Any idea why?
The issue does not appear on Linux, BSD or Mac OS systems, only on
certain Windows configurations.
What do we need to check? --enable-zend-multibyte, some php.ini magic
parameters, some ENV variables?
Best regards.
--
Ivan Enderlin
Developer of Hoa
http://hoa.42/ or http://hoa-project.net/
PhD. student at DISC/Femto-ST (Vesontio) and INRIA (Cassis)
http://disc.univ-fcomte.fr/ and http://www.inria.fr/
Member of HTML and WebApps Working Group of W3C
http://w3.org/
What kind of issue? Perhaps they are leaving in the BOM? Tell them to
configure their editors to not add a BOM.
-Rasmus
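A minimal sketch of how the BOM hypothesis could be checked, assuming the file contents have already been read into a string. The helper names has_utf8_bom() and strip_utf8_bom() are hypothetical and not part of PHP.

```php
<?php
// Hypothetical helper: true when the string starts with the 3-byte
// UTF-8 byte order mark (EF BB BF).
function has_utf8_bom(string $bytes): bool
{
    return strncmp($bytes, "\xEF\xBB\xBF", 3) === 0;
}

// Hypothetical helper: drop a leading UTF-8 BOM, leave everything else as is.
function strip_utf8_bom(string $bytes): string
{
    return has_utf8_bom($bytes) ? substr($bytes, 3) : $bytes;
}
```

For a file on disk, something like has_utf8_bom(file_get_contents($path)) would tell whether the editor left a BOM in.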
It's not from the editor. The filename itself contains UTF-8 characters and the
include/require fails.
Best regards.
--
Ivan Enderlin
Ivan Enderlin @ Hoa wrote:
It's not from the editor. The filename itself contains UTF-8 characters and the
include/require fails.
Windows does not store UTF-8 characters in file names, does it? It uses its own
'sort of' UTF-16 wide strings. Mapping between these and UTF-8 DOES result in
different 'names', at least on older versions of Windows such as XP.
There is also the fun of Windows displaying file names in a different case, which
sometimes causes problems when working cross-OS. I still manage to get hit on
Linux by files with the same name differing only in case, and where Unicode
case conversion steps in, the number of characters can change, just to add to
the fun.
It's just another part of the Unicode minefield that PHP has to try and navigate,
but it's not necessarily PHP's error, and it may be that a name HAS to be
changed when using Windows. At least that is the current 'fix'.
( I've not played with character sets on W7, as I'm STILL running W2k/XP Windows
servers on private intranets, so the 'problem' may have changed again? )
--
Lester Caine - G8HFL
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
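The case problem described above can be checked for before deploying to a case-insensitive file system. This is a rough sketch only: find_case_collisions() is a hypothetical helper, it assumes the mbstring extension, and lower-casing with mb_strtolower() only approximates whatever comparison a given file system really performs.

```php
<?php
// Hypothetical helper: return groups of names that differ only by case.
// Assumes the names are UTF-8 strings and that mbstring is available.
function find_case_collisions(array $names): array
{
    $groups = [];
    foreach ($names as $name) {
        // Names that lower-case to the same key land in the same group.
        $key = mb_strtolower($name, 'UTF-8');
        $groups[$key][] = $name;
    }
    // Keep only the groups holding more than one name.
    $collisions = array_filter($groups, function (array $group): bool {
        return count($group) > 1;
    });
    return array_values($collisions);
}
```

Called with ['Readme.txt', 'README.txt', 'index.php'], it reports the two readme files as one colliding group.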
hi!
On Tue, Aug 21, 2012 at 3:10 PM, Ivan Enderlin @ Hoa
ivan.enderlin@hoa-project.net wrote:
What do we need to check? --enable-zend-multibyte, some php.ini magic
parameters, some ENV variables?
It can't be fixed easily. Basically, avoid at any price using Unicode
names for the filenames of your projects.
While it may (very often) work smoothly on most Unices, it will never
work on Windows with current releases or master. One has to set the
correct codepage and do the conversion from/to UTF-8.
Cheers,
Pierre
@pierrejoye | http://blog.thepimp.net | http://www.libgd.org
Pierre Joye wrote:
It can't be fixed easily. Basically, avoid at any price using Unicode
names for the filenames of your projects.
I see.
While it may (very often) work smoothly on most Unices, it will never
work on Windows with current releases or master. One has to set the
correct codepage and do the conversion from/to UTF-8.
Exactly. I can detect whether PHP is running on Windows and do the
conversion on the fly, but what kind of conversion? I have tried
utf8_decode(), but it also seems to fail. Any other ideas?
Cheers.
--
Ivan Enderlin
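One plausible reason utf8_decode() does not help: it only converts UTF-8 to ISO-8859-1, while Windows code pages such as 1252 are not identical to ISO-8859-1. The sketch below assumes the mbstring extension and uses the euro sign purely as an illustration. It uses mb_convert_encoding() with an ISO-8859-1 target as a stand-in for utf8_decode().

```php
<?php
$euro = "\xE2\x82\xAC"; // the euro sign (U+20AC) encoded as UTF-8

// ISO-8859-1, the only target utf8_decode() knows, has no euro sign,
// so the character cannot survive this conversion.
$latin1 = mb_convert_encoding($euro, 'ISO-8859-1', 'UTF-8');

// Windows-1252 does contain it, as the single byte 0x80.
$cp1252 = mb_convert_encoding($euro, 'Windows-1252', 'UTF-8');

// Show both results as hex so the difference is visible.
var_dump(bin2hex($latin1), bin2hex($cp1252));
```

On a system whose code page is not 1252, the target encoding would have to be a different one again.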
Ivan Enderlin @ Hoa wrote:
Exactly. I can detect if PHP is running on Windows and making the conversion
on-the-fly but what kind of conversion? I have tried with utf8_decode() but it
seems to also fail. Any other idea?
Oh, if only it were that simple ;)
This is perhaps one of the reasons development of PHP6 ground to a halt?
'Detecting and fixing' something which seems to change at random is somewhat
difficult. For a long time I used only lowercase file names when I was
heavily into cross-OS working. Fortunately most of the customer sites who would
not allow Linux servers back then are now installing them themselves :)
This makes life a little easier now ... but add more than European character sets
into the equation, and Windows is still the sticking point simply because of
'code page conversion'. When PHP6 was being roadmapped, personally I was looking
to it as the solution to these problems, but without the OS getting it right in
the first place there is not a lot we can do except take a lot of care with
how we use Unicode characters :(
--
Lester Caine - G8HFL
hi Lester,
This is perhaps one of the reasons development of PHP6 ground to a halt?
No, it was not.
Also, Unicode support on Windows for IO operations (on the paths, not
the contents) is planned for php-next+1 (i.e. not 5.5 but the one
after).
I'd also ask you not to hijack this thread with a php6/whatever-else
rant and to keep focusing on answering Ivan's questions instead.
Cheers,
Pierre
@pierrejoye | http://blog.thepimp.net | http://www.libgd.org
Pierre Joye wrote:
I'd also ask you not to hijack this thread with a php6/whatever-else
rant and to keep focusing on answering Ivan's questions instead.
THERE WAS NOTHING OF A RANT !!!!
I am simply expressing, in a different way, the same problem you have also
expressed. Unicode and Windows still don't play well together, which is what is
causing problems in a lot of places, not just PHP.
I would ask if any of this HAS changed in Windows 7? Does Windows do anything
different Unicode-wise? I'm going to have to move Windows development work to a
Windows 7 platform soon, so DO the same rules apply? On XP I know the holes and
can avoid them, including the very hole Ivan is falling into.
--
Lester Caine - G8HFL
Lester Caine wrote:
I would ask if any of this HAS changed in Windows 7? Does Windows do
anything different Unicode-wise? I'm going to have to move Windows
development work to a Windows 7 platform soon, so DO the same rules apply?
On XP I know the holes and can avoid them, including the very hole Ivan is
falling into.
It is not a Windows OS problem but a problem with the API we use in PHP. We
use the ANSI APIs, while the wide-character ones should be used to fully
support Unicode filenames. So no matter which Windows version you use, the
problem remains the same.
--
Pierre
@pierrejoye | http://blog.thepimp.net | http://www.libgd.org
Pierre Joye wrote:
It is not a Windows OS problem but a problem with the API we use in PHP. We
use the ANSI APIs, while the wide-character ones should be used to fully
support Unicode filenames. So no matter which Windows version you use, the
problem remains the same.
Ok. I will remove Unicode characters from my filenames and all will be
ok. Thanks for clarifying the situation :-).
Cheers.
--
Ivan Enderlin
On Wed, Aug 22, 2012 at 9:04 AM, Ivan Enderlin @ Hoa
ivan.enderlin@hoa-project.net wrote:
Exactly. I can detect if PHP is running on Windows and making the conversion
on-the-fly but what kind of conversion? I have tried with utf8_decode() but
it seems to also fail.
It is not yet possible to detect the codepage automatically on
Windows. It may also be changed during the request (unlikely but
possible). But if you know it, say through a configuration parameter,
then convert from UTF-8 to this codepage (and back) using mbstring and
pass the result to the file functions, require & co. But that's really
tricky and buggy.
Any other idea?
Yes, don't use UTF-8 in your filenames; that's not portable and brings
all kinds of issues (not necessarily only on Windows). I already told you
that back then on Twitter, too, when you asked me about this issue.
Cheers,
Pierre
@pierrejoye | http://blog.thepimp.net | http://www.libgd.org
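The workaround Pierre outlines (a known code page taken from configuration, with conversion through mbstring before calling the file functions) might look roughly like the sketch below. The function name to_filesystem_path() and its parameters are hypothetical. The round-trip check is an added safeguard, not something from the thread. As Pierre warns, this approach remains fragile.

```php
<?php
// Hypothetical helper: convert a UTF-8 path to a code page supplied by the
// caller (for example read from the application's own configuration).
// $isWindows can be forced for testing; by default it is guessed from
// the directory separator.
function to_filesystem_path(string $utf8Path, string $codepage, ?bool $isWindows = null): string
{
    if ($isWindows === null) {
        $isWindows = DIRECTORY_SEPARATOR === '\\';
    }
    if (!$isWindows) {
        // Elsewhere the path is passed through untouched.
        return $utf8Path;
    }
    $converted = mb_convert_encoding($utf8Path, $codepage, 'UTF-8');
    // If converting back does not restore the original, some character
    // has no equivalent in the code page and the file could not be found.
    if (mb_convert_encoding($converted, 'UTF-8', $codepage) !== $utf8Path) {
        throw new InvalidArgumentException(
            "Path cannot be represented in $codepage: $utf8Path"
        );
    }
    return $converted;
}
```

A call such as require to_filesystem_path('lib/Données.php', $config['fs_codepage']); would then replace a plain require, where the 'fs_codepage' key is an assumed configuration entry.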
Pierre Joye wrote:
Yes, don't use UTF-8 in your filenames; that's not portable and brings
all kinds of issues (not necessarily only on Windows).
I had tried :-).
I already told you that back then on Twitter, too, when you asked me about
this issue.
On Twitter? I don't remember.
Thanks!
--
Ivan Enderlin
ivan.enderlin@hoa-project.net ("Ivan Enderlin @ Hoa") wrote:
What do we need to check? --enable-zend-multibyte, some php.ini magic
parameters, some ENV variables?
If you are experiencing problems with file paths, where files exist on one
system but apparently not on another (that is, source files containing non-ASCII
chars in either their name or their path), the issue may stem from
differences in the locale configuration.
Under Unix/Linux, check the environment variable LC_CTYPE, whose value might be
something like "language_country.UTF-8", where "UTF-8" is the encoding of file
names and paths. File names and file paths in every include*() and require*()
must then be UTF-8 encoded as well.
Windows uses the UTF-16 encoding for file names. Programs unaware of this
encoding (as the PHP interpreter still is) MUST use the current Windows code
page as set in the "Language and Regional Settings" of the control panel.
Typically LC_CTYPE evaluates to something like "language_country.1252" on
systems configured for Western countries, where 1252 is a Windows code page
very similar, but not equal, to ISO-8859-1; so file names and paths specified
in require*() and include*() must be encoded accordingly. Under Windows,
UTF-8 IS NOT a valid code page.
More details about this issue are in my comment on
bug 47096: https://bugs.php.net/bug.php?id=47096
Regards,
/|\ Umberto Salsi
/_/ www.icosaedro.it
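To see what LC_CTYPE evaluates to on a given machine, a small sketch along the following lines could be run on both the working and the failing systems. locale_codeset() is a hypothetical helper that assumes locale names of the form "language_country.codeset", optionally followed by an "@modifier". Composite locale strings are not handled.

```php
<?php
// Hypothetical helper: extract the codeset part of a locale name such as
// "en_US.UTF-8" or "English_United States.1252". Returns null when the
// name carries no codeset (for example "C").
function locale_codeset(string $locale): ?string
{
    if (preg_match('/\.([^.@]+)(?:@.*)?$/', $locale, $m) === 1) {
        return $m[1];
    }
    return null;
}

// Passing "0" asks setlocale() for the current value without changing it.
$current = setlocale(LC_CTYPE, '0');
$codeset = ($current === false) ? null : locale_codeset($current);

echo 'LC_CTYPE: ', var_export($current, true), "\n";
echo 'codeset:  ', var_export($codeset, true), "\n";
```

Comparing the printed codeset across machines gives a first hint about which encoding the paths passed to include*() and require*() are expected to use.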