18 years ago by Scott MacVicar

I have to agree, I think giving the ability to toggle Unicode support is
going to add more confusion and grief for application developers,
especially with it being PHP_INI_SYSTEM.

The reason for the option, as far as I can remember, was to do with
performance when you were working with binary strings; this may no
longer be valid.

Scott

Jani Taskinen wrote:

During Derick's talk about PHP 6 at PHP Vikinger, I started to wonder
what exactly was the reasoning behind adding something like
"unicode.semantics" option. Derick didn't remember, neither did I.

Apparently it's another one of these "register_globals" or
"magic_quotes_*" directives we'll remove in PHP 7? :D

I mean, if PHP 6 is about unicode, why upgrade to PHP 6 and disable it?
Just stay with PHP 5 then..

--Jani

18 years ago by Pierre

Hi Jani,

During Derick's talk about PHP 6 at PHP Vikinger, I started to wonder
what exactly was the reasoning behind adding something like
"unicode.semantics" option. Derick didn't remember, neither did I.

Apparently it's another one of these "register_globals" or
"magic_quotes_*" directives we'll remove in PHP 7? :D

I mean, if PHP 6 is about unicode, why upgrade to PHP 6 and disable it?
Just stay with PHP 5 then..

Apparently there is a certain number of users (undefined/undefinable
masses) who would like to use PHP6's new features (?) without unicode. The
arguments were about the incompatibilities (the zend fatal errors are
likely to cause more trouble :) and performance.

I was one of those who would like to have a single mode: unicode.

--Pierre

18 years ago by Derick Rethans

Hi Jani,

During Derick's talk about PHP 6 at PHP Vikinger, I started to wonder
what exactly was the reasoning behind adding something like
"unicode.semantics" option. Derick didn't remember, neither did I.

Apparently it's another one of these "register_globals" or
"magic_quotes_*" directives we'll remove in PHP 7? :D

I mean, if PHP 6 is about unicode, why upgrade to PHP 6 and disable it?
Just stay with PHP 5 then..

Apparently there is a certain number of users (undefined/undefinable
masses) who would like to use PHP6's new features (?) without unicode.

I guess you're using the "?" to point out that there are no new features
(besides Unicode)? :)

The arguments were about the incompatibilities (the zend fatal errors
are likely to cause more trouble :) and performance.

I was one of those who would like to have a single mode: unicode.

Yup, here as well.

regards,
Derick

18 years ago by Pierre

Hi Jani,

During Derick's talk about PHP 6 at PHP Vikinger, I started to wonder
what exactly was the reasoning behind adding something like
"unicode.semantics" option. Derick didn't remember, neither did I.

Apparently it's another one of these "register_globals" or
"magic_quotes_*" directives we'll remove in PHP 7? :D

I mean, if PHP 6 is about unicode, why upgrade to PHP 6 and disable it?
Just stay with PHP 5 then..

Apparently there is a certain number of users (undefined/undefinable
masses) who would like to use PHP6's new features (?) without unicode.

I guess you're using the "?" to point out that there are no new features
(besides Unicode)? :)

You guessed right :)

There will be new features besides unicode (and related ones, like unicode
text mode in streams or a unicode FS), but nothing that will not work with
php5 as well, as far as I can see (and know for "my" extensions).

--Pierre

18 years ago by Tomas Kuliavas

During Derick's talk about PHP 6 at PHP Vikinger, I started to wonder
what exactly was the reasoning behind adding something like
"unicode.semantics" option. Derick didn't remember, neither did I.

Apparently it's another one of these "register_globals" or
"magic_quotes_*" directives we'll remove in PHP 7? :D

I mean, if PHP 6 is about unicode, why upgrade to PHP 6 and disable it?
Just stay with PHP 5 then..

Apparently there is a certain number of users (undefined/undefinable
masses) who would like to use PHP6's new features (?) without unicode. The
arguments were about the incompatibilities (the zend fatal errors are
likely to cause more trouble :) and performance.

I was one of those who would like to have a single mode: unicode.

Changes made in PHP6 unicode_semantics=on are not backwards compatible
with PHP4 and PHP5. The same scripts (for example, ones reading strings as
bytes) work in PHP4 and PHP5, but they won't work with unicode_semantics=on.
PHP6 places very strict checks on string variables, and PHP script writers
are not used to that.

If developers have to maintain compatibility with PHP6
unicode_semantics=on, they will have to do that in a separate code branch.
PHP6 code will break with E_PARSE errors in older PHP installs.

I don't like it, because I can't turn that thing off. The interpreter is
trying to outsmart me without knowing my coding environment.

If you need multibyte string support, you have the mbstring extension. If
you want unicode function and variable names, you should remember that they
won't work in an international coding environment. International developers
must use something they all understand, which means ASCII and English
function names. Will you understand the purpose of a function when its
name is written in Russian, Chinese or Arabic?

--
Tomas
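
For illustration, a minimal sketch of the byte-oriented string access that PHP 4/5 scripts rely on. It runs on PHP 5 and later, where a string is a sequence of bytes; the helper first_bytes() is hypothetical. Under PHP6 with unicode_semantics=on, as described in this thread, lengths and offsets would be counted in code points instead, which is why code like this would need changes.

```php
<?php
// Sketch: byte-level string access as PHP 4/5 code does it.
// A UTF-8 "ą" (U+0105) is stored as the two bytes 0xC4 0x85,
// so strlen() reports 2 and $s[0] is only the first byte.
function first_bytes($s, $n)
{
    $out = array();
    $len = strlen($s);
    for ($i = 0; $i < $n && $i < $len; $i++) {
        // $s[$i] is a one-byte string; ord() returns its value 0-255
        $out[] = ord($s[$i]);
    }
    return $out;
}

$s = "\xC4\x85"; // UTF-8 encoding of U+0105
echo strlen($s), "\n";                       // 2
echo implode(' ', first_bytes($s, 2)), "\n"; // 196 133
```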

18 years ago by johannes@php.net

Hi Jani,

During Derick's talk about PHP 6 at PHP Vikinger, I started to wonder
what exactly was the reasoning behind adding something like
"unicode.semantics" option. Derick didn't remember, neither did I.

The reason was to "keep BC".

Apparently it's another one of these "register_globals" or
"magic_quotes_*" directives we'll remove in PHP 7? :D

I mean, if PHP 6 is about unicode, why upgrade to PHP 6 and disable it?
Just stay with PHP 5 then..

The ini setting changes the behaviour of the code in a quite
drastic way. This is even worse than magic_quotes, which could
be fixed by using a prepended file removing/adding slashes
depending on the setting.
The UG(unicode) checks in the code make maintenance way harder.
This feature doesn't bring BC - there will still be enough BC
breaks.
I guess we're adding a few thousand UG(unicode) checks during
each request, which certainly costs a bit of performance.

Conclusion: Let's remove that damn setting.

johannes
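
Johannes's point that magic_quotes could be neutralised with a prepended file can be sketched as below, assuming the file is loaded through the auto_prepend_file php.ini directive. The helper name strip_slashes_deep() is hypothetical, and the sketch does not unescape array keys. get_magic_quotes_gpc() was deprecated in PHP 7.4 and removed in PHP 8.0, hence the function_exists() guard.

```php
<?php
// Sketch of a prepend file that undoes magic_quotes_gpc when it is on.
// Intended to be loaded via auto_prepend_file; array keys are left as-is.
function strip_slashes_deep($value)
{
    if (is_array($value)) {
        // recurse into nested arrays such as name[]=... form fields
        return array_map('strip_slashes_deep', $value);
    }
    return stripslashes($value);
}

// get_magic_quotes_gpc() no longer exists in PHP 8, so guard the call
if (function_exists('get_magic_quotes_gpc') && get_magic_quotes_gpc()) {
    $_GET    = strip_slashes_deep($_GET);
    $_POST   = strip_slashes_deep($_POST);
    $_COOKIE = strip_slashes_deep($_COOKIE);
}
```

This is the kind of per-setting workaround a unicode.semantics toggle does not allow, since that setting changes how the engine itself treats strings.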

18 years ago by Derick Rethans

Hi Jani,

During Derick's talk about PHP 6 at PHP Vikinger, I started to wonder
what exactly was the reasoning behind adding something like
"unicode.semantics" option. Derick didn't remember, neither did I.

The reason was to "keep BC".

Apparently it's another one of these "register_globals" or
"magic_quotes_*" directives we'll remove in PHP 7? :D

I mean, if PHP 6 is about unicode, why upgrade to PHP 6 and disable it?
Just stay with PHP 5 then..

The ini setting changes the behaviour of the code in a quite
drastic way. This is even worse than magic_quotes, which could
be fixed by using a prepended file removing/adding slashes
depending on the setting.

The UG(unicode) checks in the code make maintenance way harder.

This feature doesn't bring BC - there will still be enough BC
breaks.

I guess we're adding a few thousand UG(unicode) checks during
each request, which certainly costs a bit of performance.

Conclusion: Let's remove that damn setting.

Just to state the obvious... I agree here.

Derick

18 years ago by Rasmus Lerdorf

Jani Taskinen wrote:

During Derick's talk about PHP 6 at PHP Vikinger, I started to wonder
what exactly was the reasoning behind adding something like
"unicode.semantics" option. Derick didn't remember, neither did I.

Apparently it's another one of these "register_globals" or
"magic_quotes_*" directives we'll remove in PHP 7? :D

I mean, if PHP 6 is about unicode, why upgrade to PHP 6 and disable it?
Just stay with PHP 5 then..

That's exactly why we need the toggle. We don't want to encourage
people to stay with an older version. We have enough trouble getting
people from 4 to 5 today, why build in an automatic excuse for people to
stay with 5 when all development moves to 6? If all their PHP 5-based
code works flawlessly in PHP 6, the adoption of PHP 6 will be quicker.

-Rasmus

18 years ago by Pierre

Jani Taskinen wrote:

During Derick's talk about PHP 6 at PHP Vikinger, I started to wonder
what exactly was the reasoning behind adding something like
"unicode.semantics" option. Derick didn't remember, neither did I.

Apparently it's another one of these "register_globals" or
"magic_quotes_*" directives we'll remove in PHP 7? :D

I mean, if PHP 6 is about unicode, why upgrade to PHP 6 and disable it?
Just stay with PHP 5 then..

That's exactly why we need the toggle. We don't want to encourage
people to stay with an older version. We have enough trouble getting
people from 4 to 5 today, why build in an automatic excuse for people to
stay with 5 when all development moves to 6? If all their PHP 5-based
code works flawlessly in PHP 6, the adoption of PHP 6 will be quicker.

As a side note, we had the same thoughts about php5, and it did not work.

--Pierre

18 years ago by Rasmus Lerdorf

Pierre wrote:

Jani Taskinen wrote:

During Derick's talk about PHP 6 at PHP Vikinger, I started to wonder
what exactly was the reasoning behind adding something like
"unicode.semantics" option. Derick didn't remember, neither did I.

Apparently it's another one of these "register_globals" or
"magic_quotes_*" directives we'll remove in PHP 7? :D

I mean, if PHP 6 is about unicode, why upgrade to PHP 6 and disable it?
Just stay with PHP 5 then..

That's exactly why we need the toggle. We don't want to encourage
people to stay with an older version. We have enough trouble getting
people from 4 to 5 today, why build in an automatic excuse for people to
stay with 5 when all development moves to 6? If all their PHP 5-based
code works flawlessly in PHP 6, the adoption of PHP 6 will be quicker.

As a side note, we had the same thoughts about php5, and it did not work.

Not really. Nothing in PHP 5 was designed to break compatibility with
PHP 4. However in PHP 6 there are just some things that cannot be made
backward compatible in Unicode mode without being completely
inconsistent with how Unicode should work.

-Rasmus

18 years ago by Pierre

Hi Rasmus,

As a side note, we had the same thoughts about php5, and it did not work.

Not really. Nothing in PHP 5 was designed to break compatibility with
PHP 4.

In theory I agree with you here; we were very careful about BC.
However in practice, there were a few problems which made a smooth
migration (like running php4 code directly in php5) a bit painful
until 5.1. But then the new fatal errors brought other troubles...

However my point was about making design choices to ease the
migration. It did not work for PHP5. It worked well (I may be wrong,
it was a long time ago) for php3 to php4 because there was a really
huge improvement in the core language (the beginning of "OO" support,
even if it was not perfect) and its features in general (extensions).

I fear that Unicode is not that appealing to almost all users, even
those who actually need it; they already rely on mbstring and other
home-made solutions. But I tend to be too realistic/pessimistic.

--Pierre

18 years ago by Stanislav Malyshev

I mean, if PHP 6 is about unicode, why upgrade to PHP 6 and disable it?

To get late static binding and namespaces, of course ;)

Stanislav Malyshev, Zend Products Engineer
stas@zend.com http://www.zend.com/

18 years ago by Cristian Rodriguez

I mean, if PHP 6 is about unicode, why upgrade to PHP 6 and disable it?

I think if unicode.semantics remains PHP_INI_SYSTEM it is useless, as
most users (people who run on shared hosting servers) simply will
not be able to turn it on, and hosting companies will keep it off
because turning it on will break applications.

So, either make it at least PHP_INI_PERDIR or remove it altogether
(i.e. always behave like unicode.semantics=On).
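
The changeable modes being debated here (PHP_INI_SYSTEM versus a per-directory setting) are visible from userland through ini_get_all(). A minimal sketch, assuming PHP 5.3 or later for the $details argument; the helper ini_access_modes() is hypothetical, and since unicode.semantics only existed in PHP 6 development builds, memory_limit serves as the example directive.

```php
<?php
// Sketch: decode the "access" bitmask PHP reports for an ini directive.
// INI_USER (1): ini_set() in scripts; INI_PERDIR (2): .htaccess/.user.ini;
// INI_SYSTEM (4): php.ini or httpd.conf only; INI_ALL (7): all three.
function ini_access_modes($directive)
{
    $all = ini_get_all(null, true);
    if (!isset($all[$directive])) {
        return null; // directive not known to this build
    }
    $access = $all[$directive]['access'];
    $modes = array();
    if ($access & INI_USER)   { $modes[] = 'user'; }
    if ($access & INI_PERDIR) { $modes[] = 'perdir'; }
    if ($access & INI_SYSTEM) { $modes[] = 'system'; }
    return $modes;
}

print_r(ini_access_modes('memory_limit'));
var_dump(ini_access_modes('unicode.semantics')); // NULL outside PHP 6 dev builds
```

A PHP_INI_SYSTEM directive would report only 'system', which is why shared-hosting users could not flip it themselves.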

18 years ago by Rasmus Lerdorf

Cristian Rodriguez wrote:

I mean, if PHP 6 is about unicode, why upgrade to PHP 6 and disable it?

I think if unicode.semantics remains PHP_INI_SYSTEM it is useless, as
most users (people who run on shared hosting servers) simply will
not be able to turn it on, and hosting companies will keep it off
because turning it on will break applications.

So, either make it at least PHP_INI_PERDIR or remove it altogether
(i.e. always behave like unicode.semantics=On).

Those same shared hosting companies would never upgrade to PHP 6 if we
forced unicode semantics on them breaking legacy apps and that would
force us to maintain PHP 5 forever.

We recognize that some people are simply not going to make the effort to
make their apps unicode aware anytime soon and that fact could easily
splinter the project across the unicode line. If that unicode line
becomes synonymous with PHP 5 vs. PHP 6 we are in trouble. I would much
rather have people on the same codebase so we can move everyone ahead on
other features and keep the unicode vs. non-unicode battle to a
configuration setting within that one codebase. I really don't want to
get into the situation where we are backporting features to PHP 5 a
couple of years from now.

And we obviously did consider making it PER_DIR, but that is really
complicated. The current cries that the unicode.semantics check is
complicating code are mild compared to what they would be if we allowed
a single process to switch back and forth potentially on the same scripts.

-Rasmus

18 years ago by Peter Brodersen

On Fri, 15 Jun 2007 02:55:16 +0100, in php.internals
rasmus@lerdorf.com (Rasmus Lerdorf) wrote:

Those same shared hosting companies would never upgrade to PHP 6 if we
forced unicode semantics on them breaking legacy apps and that would
force us to maintain PHP 5 forever.

On the other hand I feel a bit sad that if I want to write perfectly
good portable PHP 6 code that is only intended to work under PHP 6, I
still have to check for different configuration settings.

I think we were really close to getting out of the
if(get_magic_quotes_gpc()) "requirement", but now it has been replaced
with a new one, even if a developer writes (portable) PHP 6-only
code.

Of course, configurations could contain a lot of other obscure
settings that might have an influence on the script, but none as
widespread as the difference in magic_quotes settings.

--

Peter Brodersen

18 years ago by Rasmus Lerdorf

Peter Brodersen wrote:

On Fri, 15 Jun 2007 02:55:16 +0100, in php.internals
rasmus@lerdorf.com (Rasmus Lerdorf) wrote:

Those same shared hosting companies would never upgrade to PHP 6 if we
forced unicode semantics on them breaking legacy apps and that would
force us to maintain PHP 5 forever.

On the other hand I feel a bit sad that if I want to write perfectly
good portable PHP 6 code that is only intended to work under PHP 6, I
still have to check for different configuration settings.

I think we were really close to getting out of the
if(get_magic_quotes_gpc()) "requirement", but now it has been replaced
with a new one, even if a developer writes (portable) PHP 6-only
code.

Of course, configurations could contain a lot of other obscure
settings that might have an influence on the script, but none as
widespread as the difference in magic_quotes settings.

But this is no different from writing code that will work on both PHP 5
and PHP 6. The only difference is that instead of checking for PHP 5
you will be checking for Unicode. Like I said, we don't want the
Unicode decision to be synonymous with PHP 5 vs. PHP 6 because then the
non-Unicode folks will never get the benefits of the non-Unicode
improvements in PHP 6 and we would be forced to support PHP 5 for a lot
longer. We really stretch our already thin resources in order to
support multiple branches, so anything we can do to get as many people
as possible onto the same codebase helps us a lot.

-Rasmus
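
Rasmus's suggestion to check for Unicode instead of the PHP version might look like the sketch below. The helper uses_unicode_semantics() is hypothetical; it relies only on ini_get() returning false for a directive the running build does not know, so on anything other than a PHP 6 development build it reports byte mode.

```php
<?php
// Sketch: branch on the runtime mode rather than on PHP_VERSION.
// unicode.semantics only existed in PHP 6 development builds; elsewhere
// ini_get() returns false because the directive is not registered.
function uses_unicode_semantics()
{
    $v = ini_get('unicode.semantics');
    if ($v === false) {
        return false; // directive unknown to this build
    }
    // boolean directives come back as strings such as "1", "0", "" or "Off"
    return !in_array(strtolower($v), array('', '0', 'off'), true);
}

if (uses_unicode_semantics()) {
    $mode = 'unicode'; // strings counted in code points
} else {
    $mode = 'bytes';   // PHP 4/5 style byte strings
}
echo $mode, "\n";
```

As Peter notes in the replies, this check would be needed even in code that targets PHP 6 only.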

18 years ago by Peter Brodersen

On Tue, 19 Jun 2007 07:26:57 -0700, in php.internals
rasmus@lerdorf.com (Rasmus Lerdorf) wrote:

I think we were really close to getting out of the
if(get_magic_quotes_gpc()) "requirement", but now it has been replaced
with a new one, even if a developer writes (portable) PHP 6-only
code.

But this is no different from writing code that will work on both PHP 5
and PHP 6. The only difference is that instead of checking for PHP 5
you will be checking for Unicode.

Yeah, but that's pretty much the problem :)

If I want to create code that has to work under different versions of
PHP then it's a no-brainer that I have to check for different settings
(and maybe even create userland functions to emulate native functions
in later versions of PHP). The more BC the code has to maintain, the
more settings one has to check for.

But if I want to code for one specific version of PHP, it bothers me
that even here I have to take into consideration that PHP still comes in
different flavors, even within one version.

If the unicode setting were on for most installations, I might just
make assumptions about this setting - pretty much the same way that I
have dared to assume that magic_quotes_runtime and magic_quotes_sybase
are off (as I have never encountered them on in general setups such as
webhosting companies).

But my fear is that exactly the webhosting companies would end up with
different settings and might even have to create two products for
their customers: "Do you want the PHP 6 with unicode setting on or
off?".

I'm just worried that PHP 6 is the new NULL: PHP6 != PHP6 :-)

--

Peter Brodersen

18 years ago by Pierre

But this is no different from writing code that will work on both PHP 5
and PHP 6. The only difference is that instead of checking for PHP 5
you will be checking for Unicode. Like I said, we don't want the
Unicode decision to be synonymous with PHP 5 vs. PHP 6 because then the
non-Unicode folks will never get the benefits of the non-Unicode
improvements in PHP 6 and we would be forced to support PHP 5 for a lot
longer. We really stretch our already thin resources in order to
support multiple branches, so anything we can do to get as many people
as possible onto the same codebase helps us a lot.

Just as a last (hopefully) comment, even if nothing seems to have an
influence, no matter how many of us prefer a unicode-only mode (so
far only you are in favour of the toggle, maybe Andree too but I don't
remember his opinion on this topic :).

The gain we hope to have by keeping a non unicode mode is about having
more users moving to PHP6. I would like to know why it will work
better than with php5, any thoughts?

And let's not forget that maintaining (and developing/implementing)
these two modes will obviously take more time.

Cheers,
--Pierre

18 years ago by Lukas Kahwe Smith

Pierre wrote:

But this is no different from writing code that will work on both PHP 5
and PHP 6. The only difference is that instead of checking for PHP 5
you will be checking for Unicode. Like I said, we don't want the
Unicode decision to be synonymous with PHP 5 vs. PHP 6 because then the
non-Unicode folks will never get the benefits of the non-Unicode
improvements in PHP 6 and we would be forced to support PHP 5 for a lot
longer. We really stretch our already thin resources in order to
support multiple branches, so anything we can do to get as many people
as possible onto the same codebase helps us a lot.

Just as a last (hopefully) comment, even if nothing seems to have an
influence, no matter how many of us prefer a unicode-only mode (so
far only you are in favour of the toggle, maybe Andree too but I don't
remember his opinion on this topic :).

The gain we hope to have by keeping a non unicode mode is about having
more users moving to PHP6. I would like to know why it will work
better than with php5, any thoughts?

And let's not forget that maintaining (and developing/implementing)
these two modes will obviously take more time.

I agree, we tried our best in PHP5 to provide support for PHP4 and it
seems this has not been overly successful. Why will this be any easier
for PHP6? Maybe we should try a different approach. Let's not hold
ourselves back. Let's break BC where it makes sense. Projects like PEAR
etc. could always claim that they are also PHP5 compatible without truly
moving over. If we make a clean cut, this will not work anymore and
instead we have an opportunity to clean up, improve performance a bit by
removing unicode-off hacks, and let users migrate or stay. Sooner or
later they will be attracted by new shiny stuff and then they will make
a true effort to migrate over instead of these half migrations that
happened with PHP5 "adoption".

regards,
Lukas

18 years ago by Tomas Kuliavas

Pierre wrote:

But this is no different from writing code that will work on both PHP 5
and PHP 6. The only difference is that instead of checking for PHP 5
you will be checking for Unicode. Like I said, we don't want the
Unicode decision to be synonymous with PHP 5 vs. PHP 6 because then the
non-Unicode folks will never get the benefits of the non-Unicode
improvements in PHP 6 and we would be forced to support PHP 5 for a lot
longer. We really stretch our already thin resources in order to
support multiple branches, so anything we can do to get as many people
as possible onto the same codebase helps us a lot.

Just as a last (hopefully) comment, even if nothing seems to have an
influence, no matter how many of us prefer a unicode-only mode (so
far only you are in favour of the toggle, maybe Andree too but I don't
remember his opinion on this topic :).

The gain we hope to have by keeping a non unicode mode is about having
more users moving to PHP6. I would like to know why it will work
better than with php5, any thoughts?

And let's not forget that maintaining (and developing/implementing)
these two modes will obviously take more time.

I agree, we tried our best in PHP5 to provide support for PHP4 and it
seems this has not been overly successful. Why will this be any easier
for PHP6? Maybe we should try a different approach. Let's not hold
ourselves back. Let's break BC where it makes sense. Projects like PEAR
etc. could always claim that they are also PHP5 compatible without truly
moving over. If we make a clean cut, this will not work anymore and
instead we have an opportunity to clean up, improve performance a bit by
removing unicode-off hacks, and let users migrate or stay. Sooner or
later they will be attracted by new shiny stuff and then they will make
a true effort to migrate over instead of these half migrations that
happened with PHP5 "adoption".

Nope. If I can't turn off unicode_semantics, I will ask end users to use
PHP5 or write a manual about running two PHP versions on one host. I can't
update code to work on PHP6 unicode_semantics=on, because it affects lots
of functions and the updates will break backwards compatibility with PHP4
and some PHP5 versions. I suspect that I won't be able to set the stream
encoding, because I will have to read the stream in order to know the
encoding of 8bit data. In some cases data is provided by a third party and
the character set information can be incorrect. In some cases the same
stream can output data in different character sets. In some cases
automatic charset conversions performed by PHP libraries might break
verification of received data.

I won't be attracted by new shiny stuff. The last shiny stuff that I can
use is available in PHP 5.1.0.

I don't care about Unicode support, because it breaks things. I suspect
that the PHP6 Unicode extension won't give me the control that I have over
PHP5 and PHP4 strings. PHP6 Unicode support is not designed for an
international environment. It is designed for nationalized environments
and allows PHP script developers to code in their native language. Code
written in French, Russian, Arabic, Japanese or Chinese is not
international. Only some people can read it. Only some people can see the
difference between ァ() and ィ(). If I have to debug code written in
Japanese or Arabic, language is the main barrier to understanding the code.

--
Tomas

18 years ago by Stefan Walk

I don't care about Unicode support, because it breaks things. I suspect
that the PHP6 Unicode extension won't give me the control that I have over
PHP5 and PHP4 strings. PHP6 Unicode support is not designed for an
international environment. It is designed for nationalized environments
and allows PHP script developers to code in their native language. Code
written in French, Russian, Arabic, Japanese or Chinese is not
international. Only some people can read it. Only some people can see the
difference between ァ() and ィ(). If I have to debug code written in
Japanese or Arabic, language is the main barrier to understanding the code.

You're spreading FUD about PHP6's unicode support. Writing code in your
own native language has nothing to do with the unicode support in
PHP6. You can already do that in PHP4 if you use utf-8, since any
sequence of codepoints > 127 translates to a byte-sequence that is a
valid identifier for php.
Have a look at http://www.icu-project.org/ to see what the unicode
features in php6 are really about.

Regards,
Stefan (who also thinks that the switch is of no use,
unicode_semantics should be on all the time. And at least it shouldn't
be off in php.ini-dist and php.ini-recommended)
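
Stefan's point about identifiers can be checked against the label pattern the PHP manual documents for function and variable names, [a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff]*. A minimal sketch; the helper is_php_label() is hypothetical. Every byte of a multi-byte UTF-8 sequence is 0x80 or above, so any UTF-8 name passes the pattern when it is matched as raw bytes.

```php
<?php
// Sketch: why UTF-8 names already worked as identifiers in PHP 4/5.
// The label pattern below is the one the PHP manual gives for names;
// without the /u modifier PCRE matches it against raw bytes.
function is_php_label($name)
{
    return preg_match('/^[a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff]*$/', $name) === 1;
}

$name = "\xC4\x85\xE3\x82\xA1"; // UTF-8 bytes for U+0105 U+30A1
var_dump(is_php_label($name));   // bool(true)
var_dump(is_php_label('1abc'));  // bool(false): starts with a digit
```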

18 years ago by Tomas Kuliavas

I don't care about Unicode support, because it breaks things. I suspect
that the PHP6 Unicode extension won't give me the control that I have over
PHP5 and PHP4 strings. PHP6 Unicode support is not designed for an
international environment. It is designed for nationalized environments
and allows PHP script developers to code in their native language. Code
written in French, Russian, Arabic, Japanese or Chinese is not
international. Only some people can read it. Only some people can see the
difference between ァ() and ィ(). If I have to debug code written in
Japanese or Arabic, language is the main barrier to understanding the code.

You're spreading FUD about PHP6's unicode support. Writing code in your
own native language has nothing to do with the unicode support in
PHP6. You can already do that in PHP4 if you use utf-8, since any
sequence of codepoints > 127 translates to a byte-sequence that is a
valid identifier for php.
Have a look at http://www.icu-project.org/ to see what the unicode
features in php6 are really about.

Regards,
Stefan (who also thinks that the switch is of no use,
unicode_semantics should be on all the time. And at least it shouldn't
be off in php.ini-dist and php.ini-recommended)

And you are trying to make sure that new features that you like are turned
on by default even when they break things for others. Some people are
proposing changes that will enforce your preferred options without leaving
any options to others. If I don't complain, PHP developers might do what
you are proposing, just like they switched unicode_semantics to
PHP_INI_SYSTEM. I can't understand the performance reasons without numbers
that prove them, but I can see when somebody is trying to break things in
my code and leaves me without any good options for fixing the code.

It is possible that I am not correct. Writing functions in your native
language is one of the key points in Andrei's presentation.
http://www.gravitonic.com/talks/, "Unicoding With PHP 6" php|tek 2007
Chicago. Slides 49, 50

Other key points are about evaluating string length (slide 25) and offsets
(slide 47), collation (slide 60,61) and strtoupper/strtolower/strcasecmp
(slide 66). String length and offsets can be implemented with PHP5
mbstring extension. If I use PHP5 strtolower/strtoupper/strcasecmp, I must
assume that they are locale aware. These functions don't follow LC_CTYPE=C
rules when the locale is not C.

It is possible that I am not correct and I will be able to update code and
make it work in PHP6, but in order to do that I will have to use language
constructs that are not backwards compatible with older PHP versions
(slide 26, 30). I won't be able to mix PHP6 code with PHP5 code and will
have to maintain two different library versions for lots of string
functions and lots of stream operations. I will have to spend my time
making sure that code works when I am not the one who broke it. I
have other bugs to fix and they have higher priority than fixing code
broken by others.

I have already tested code in PHP6 unicode_semantics=on. Things broke on
password encryption and the fix was to do something with binary
typecasting. I need more than the information currently available in
http://www.php.net/manual in order to fix it. Then fputs calls freak out
with notices about downcoded buffers and I can't leave those notices
unfixed due to error_reporting = 2047 + display_errors = on coding
requirements. And I still haven't reached the point where the code does
8bit string decoding.

We are working on different code. You have code with some specific
character set and you can control all strings. My code works with
different character sets and different sources of 8bit data, and I don't
control those 8bit strings. My experience with PHP4/5 shows that I can
work with 8bit strings better than the PHP interpreter. The interpreter
wins only when I have to work with large mapping tables, and even then it
is not stable (iconv), not enabled by default (recode), limited (mbstring)
or very limited (utf8_decode).

Some day I will take some standalone library and will try to make it work
in PHP6 unicode_semantics=on. Maybe then I'll stop spreading the FUD. But
for now don't expect that I will remain silent if you propose changes
that break things in PHP5 - PHP6 backwards compatibility. Either
unicode_semantics=on breaks things in drastic ways, or my experience is
based only on unstable PHP6 development code and RC versions will be
better.

--
Tomas

18 years ago by francois.laupretre@ratp.fr

From: Tomas Kuliavas [mailto:tokul@users.sourceforge.net]

We are working on different code. You have code with some specific
character set and you can control all strings.

Tomas, stop arguing about this. As a library maintainer, I agree with you and I don't understand where the
'killer feature' is (I heard that Yahoo China asked for it, or is it because Zend is established in
Israel, I don't know...), but, now, if people don't switch to PHP 6 (and I am sure they won't), it will
be your fault, because of your supposed FUD ;)

Francois

18 years ago by Tomas Kuliavas

We are working on different code. You have code with some specific
character set and you can control all strings.

Tomas, stop arguing about this. As a library maintainer, I agree with you and
I don't understand where the 'killer feature' is (I heard that Yahoo China
asked for it, or is it because Zend is established in Israel, I don't
know...), but now, if people don't switch to PHP 6 (and I am sure they
won't), it will be your fault, because of your supposed FUD ;)

/**
 * @param string $string utf8 string
 * @return string html encoded string
 */
function test_convert_utf8ToHtml($string) {
    // removed 0xE0-0xFD decoding

    // decode two byte utf8 characters
    $string = preg_replace("/([\300-\337])([\200-\277])/e",
        "'&#'.((ord('\\1')-192)*64+(ord('\\2')-128)).';'",
        $string);

    // remove broken utf8
    $string = preg_replace("/[\200-\237]|\240|[\241-\377]/",'?',$string);

    return $string;
}
// \u0105\u30A1
$string = 'ąァ';
// expected result '&#261;???' or '&#261;&#12449;'
echo test_convert_utf8ToHtml($string);

Please show how to do this in PHP6 with unicode.semantics=on, without mbstring,
recode or other character set conversion extensions, and without the
htmlentities() function: only core functions and the pcre extension. Then
make the updated function compatible with PHP 5.2.0.

test_convert_utf8ToHtml() is based on code from a modular library. I know
that I can split it into PHP5 and PHP6 code, but I can find functions that
are not modularized and can't be replaced with unicode_encode(), for example
MIME Q encoding or 8bit string detection.
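
The following is a minimal sketch of the same two-byte decoding without the /e modifier. That modifier was deprecated in PHP 5.5 and removed in PHP 7.0. The sketch assumes PHP 5.3 or later for the closure, and PHP 5-style byte strings rather than unicode.semantics=on. The name convert_utf8ToHtml_callback() is hypothetical. The arithmetic is the same as in the original: a lead byte 110xxxxx contributes (byte - 192) * 64, and a continuation byte 10xxxxxx contributes (byte - 128). For example, "\xC4\x85" gives 4 * 64 + 5 = 261.

```php
<?php
// Hypothetical rewrite of test_convert_utf8ToHtml() using a callback
// instead of the /e modifier. Operates on byte strings (no /u flag).
// As in the original, 3- and 4-byte sequences are not decoded.
function convert_utf8ToHtml_callback($string)
{
    // Two-byte UTF-8: lead byte 0xC0-0xDF, continuation byte 0x80-0xBF.
    $string = preg_replace_callback(
        '/([\xC0-\xDF])([\x80-\xBF])/',
        function ($m) {
            $codePoint = (ord($m[1]) - 192) * 64 + (ord($m[2]) - 128);
            return '&#' . $codePoint . ';';
        },
        $string
    );

    // Any byte still >= 0x80 is treated as broken or undecoded UTF-8
    // (same byte range as [\200-\237]|\240|[\241-\377] in the original).
    return preg_replace('/[\x80-\xFF]/', '?', $string);
}

// U+0105 is "\xC4\x85"; U+30A1 is the 3-byte sequence "\xE3\x82\xA1".
echo convert_utf8ToHtml_callback("\xC4\x85\xE3\x82\xA1"), "\n"; // &#261;???
```

The single-quoted pattern leaves the \xC0-style escapes to PCRE rather than to PHP string parsing. Those escapes are exactly what the thread argues about under unicode.semantics=on.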

--
Tomas

18 years ago by Derick Rethans

It is possible that I am not correct. Writing functions in a native language
is one of the key points in Andrei's presentation.

It's not one of his key points; it was meant as a joke. And I know, because
I've seen him give this presentation close to a dozen times.

regards,
Derick

18 years ago by Stanislav Malyshev

PHP4 strings. PHP6 Unicode support is not designed for international
environment. It is designed for nationalized environments and allows PHP
script developers to code in their native language. Code written in

The goal of PHP 6 unicode support is definitely to allow
internationalized development. Japanese function names are of much less
importance than global support for unicode data and i18n/l10n functions
such as collation, localized formats etc.
--
Stanislav Malyshev, Zend Software Architect
stas@zend.com http://www.zend.com/
(408)253-8829 MSN: stas@zend.com

18 years ago by Andrei Zmievski

No one is going to write code in their own native language and
distribute it worldwide.

How can you say that "PHP6 Unicode support is not designed for
international environment"? Have you even tried it?

-Andrei

I don't care about Unicode support, because it breaks things. I suspect
that PHP6 Unicode extension won't give me controls that I have in PHP5 and
PHP4 strings. PHP6 Unicode support is not designed for international
environment. It is designed for nationalized environments and allows PHP
script developers to code in their native language. Code written in
French, Russian, Arabic, Japanese or Chinese is not international. Only
some people can read it. Only some people can see difference between ァ()
and ィ(). If I have to debug code written in Japanese or Arabic, language
is the main barrier in understanding the code.

18 years ago by Tomas Kuliavas

No one is going to write code in their own native language and
distribute it worldwide.

How can you say that "PHP6 Unicode support is not designed for
international environment"? Have you even tried it?

Ok. International environment.

Do you have strtoupper|strtolower|strcasecmp functions operating in
LC_CTYPE=C without switching locale? If I remember correctly, PHP does not
use those even internally, and developers keep triggering the same
Turkish|Kurdish|Azerbaijani bug in different functions. If I want case
conversion or case-insensitive comparison functions to follow C rules and
not LC_CTYPE=some_translation, I am forced to use my own functions, because
strtoupper|strtolower are definitely locale aware in PHP.
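
Here is a minimal sketch of such "own functions", assuming byte strings as in PHP 4/5. When strtr() gets two equal-length strings, it maps bytes one-to-one and does not consult LC_CTYPE. As a result, 'i' always maps to 'I' whatever locale is active, which sidesteps the Turkish dotless-i problem. The ascii_* names are hypothetical and are not PHP built-ins.

```php
<?php
// Hypothetical C-locale case helpers: only ASCII A-Z / a-z are changed,
// every other byte (including 8bit data in any charset) passes through.
function ascii_strtoupper($s)
{
    return strtr($s, 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ');
}

function ascii_strtolower($s)
{
    return strtr($s, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz');
}

// Case-insensitive comparison under C rules: fold both sides, then compare
// bytes. Returns < 0, 0 or > 0 like strcmp().
function ascii_strcasecmp($a, $b)
{
    return strcmp(ascii_strtolower($a), ascii_strtolower($b));
}

// Even if a Turkish locale is available and active, 'title' becomes 'TITLE'.
setlocale(LC_CTYPE, 'tr_TR.ISO-8859-9', 'tr_TR');
echo ascii_strtoupper('title'), "\n"; // TITLE
```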

PHP6 unicode.semantics=on reduces my options and forces me to recode all
8bit string operations. After recoding, the functions are not backwards
compatible with anything lower than 5.2.1. Your slides show that unicode
characters are defined with \u, yet you mess with octals (\300) and
hexadecimals (\xC0).

It is possible that I am not right and I will be able to do everything
more efficiently in PHP6. But for now I have broken password encryption
handling, broken work with binary strings and overly noisy stream functions.
And I still haven't checked how it will handle streams with data encoded
in different character sets. I will be forced to rewrite my code if PHP6
forces me to work with unicode.semantics=on. Don't expect me to praise
PHP6 for that. You are helping the same people who ask others to turn on
mbstring.func_overload in php.ini in order to get unicode support. You are
not helping people who already have code working with 8bit strings in
different character sets.

--
Tomas

18 years ago by Rasmus Lerdorf

Pierre wrote:

But this is no different from writing code that will work on both PHP 5
and PHP 6. The only difference is that instead of checking for PHP 5
you will be checking for Unicode. Like I said, we don't want the
Unicode decision to be synonymous with PHP 5 vs. PHP 6 because then the
non-Unicode folks will never get the benefits of the non-Unicode
improvements in PHP 6 and we would be forced to support PHP 5 for a lot
longer. We really stretch our already thin resources in order to
support multiple branches, so anything we can do to get as many people
as possible onto the same codebase helps us a lot.

Just as a last (hopefully) comment, even if nothing seems to have any
influence, no matter how many of us prefer a unicode only mode (so
far only you are in favour of keeping both modes, maybe Andrei too, but I
don't remember his opinion on this topic :).

Uh, this was agreed upon by everyone involved in the design of the
Unicode support. So saying I am the only one is extremely misleading.
I may be the only one explaining why the decision was reached, but I am
certainly not the only one in favour of it.

The gain we hope to have by keeping a non unicode mode is about having
more users moving to PHP6. I would like to know why it will work
better than with php5, any thoughts?

By not providing it, we ensure that a large number of people will not
move to PHP 6. At least by providing it we give ourselves a chance. I
think if we drop it we are basically giving up and we will be
maintaining 2 code bases for the next 10 years. Do we really want that?

And let's not forget that maintaining (and developing/implementing) these
two modes will obviously take more time.

More time than maintaining separate Unicode and non-Unicode code bases
in different branches?

-Rasmus

18 years ago by Pierre

Pierre wrote:

But this is no different from writing code that will work on both PHP 5
and PHP 6. The only difference is that instead of checking for PHP 5
you will be checking for Unicode. Like I said, we don't want the
Unicode decision to be synonymous with PHP 5 vs. PHP 6 because then the
non-Unicode folks will never get the benefits of the non-Unicode
improvements in PHP 6 and we would be forced to support PHP 5 for a lot
longer. We really stretch our already thin resources in order to
support multiple branches, so anything we can do to get as many people
as possible onto the same codebase helps us a lot.

Just as a last (hopefully) comment, even if nothing seems to have any
influence, no matter how many of us prefer a unicode only mode (so
far only you are in favour of keeping both modes, maybe Andrei too, but I
don't remember his opinion on this topic :).

Uh, this was agreed upon by everyone involved in the design of the
Unicode support. So saying I am the only one is extremely misleading.
I may be the only one explaining why the decision was reached, but I am
certainly not the only one in favour of it.

Sorry, I did not know that there were many "external" people involved
in the unicode design. Who was involved? Almost all "core" developers
I asked are against it (anyone with a different view, please step in).

The gain we hope to have by keeping a non unicode mode is about having
more users moving to PHP6. I would like to know why it will work
better than with php5, any thoughts?

By not providing it, we ensure that a large number of people will not
move to PHP 6. At least by providing it we give ourselves a chance. I
think if we drop it we are basically giving up and we will be
maintaining 2 code bases for the next 10 years. Do we really want that?

We have been maintaining three branches for a couple of years; having only
two is real progress. The PHP4 one is dying, and it was about time. I can
live with two branches and code bases for php-src; I have to maintain three
or more branches in many pecl extensions anyway :)

And let's not forget that maintaining (and developing/implementing) these
two modes will obviously take more time.

More time than maintaining separate Unicode and non-Unicode code bases
in different branches?

Yes, the code base is cleaner when the two are in separate branches, and
it is easier to merge too. That's only a feeling; I never timed each
merge or change :)

--Pierre

18 years ago by Jani Taskinen

Rasmus Lerdorf wrote:

Uh, this was agreed upon by everyone involved in the design of the
Unicode support. So saying I am the only one is extremely misleading.
I may be the only one explaining why the decision was reached, but I am
certainly not the only one in favour of it.

Yesterday's decisions don't necessarily apply today. ;)
(to be "agile")

By not providing it, we ensure that a large number of people will not
move to PHP 6. At least by providing it we give ourselves a chance. I

Yes, you assume that this happens. I've got a hunch too that there will be more
PHP 6 users than there ever were with PHP 5, just thinking of how many Asian,
Arabic, etc. people there are in the world... a lot more than Western ones anyway.

think if we drop it we are basically giving up and we will be
maintaining 2 code bases for the next 10 years. Do we really want that?

Yes, if it ensures we can actually drop the other one at some point.

More time than maintaining separate Unicode and non-Unicode code bases
in different branches?

Having 2 versions in the same branch sounds like a very bad idea. For a simple
change, you might have to do it 2 different ways in 2 places. And also run 'make test' and
'make utest' (or whatever it was called again..). Also, having 2 sets of tests
for the same stuff is gonna be a huge PITA.

By having 2 branches of code, you make the change in one branch and merge it,
and with any luck, the patch applies cleanly to both. :)

I think these issues weren't foremost in people's minds when the decision on the
"unicode.semantics" setting was made.

--Jani

18 years ago by Rasmus Lerdorf

Jani Taskinen wrote:

Rasmus Lerdorf wrote:

Uh, this was agreed upon by everyone involved in the design of the
Unicode support. So saying I am the only one is extremely misleading.
I may be the only one explaining why the decision was reached, but I am
certainly not the only one in favour of it.

Yesterday's decisions don't necessarily apply today. ;)
(to be "agile")

Fair enough, but it would be nice if the folks involved in the decision,
including yourself, would then clearly state their reasoning for making,
or at least supporting, the decision in the first place and then explain
what has changed to make you change your mind.

By not providing it, we ensure that a large number of people will not
move to PHP 6. At least by providing it we give ourselves a chance. I

Yes, you assume that this happens. I've got a hunch too that there will
be more PHP 6 users than there ever were with PHP 5, just thinking of
how many Asian, Arabic, etc. people there are in the world... a lot more
than Western ones anyway.

It comes down to predicting the future. Whichever way we go, the
decision is going to be second-guessed. If we have critical mass for a
clean BC break, then I am ok with it. For me personally it would make
things a bit easier, but I think it would be a long long time before we
saw any large hosts out there switch to a PHP 6 that can't run common
PHP 5 apps.

-Rasmus

18 years ago by Ilia Alshanetsky

Sorry to interject, but just a quick slightly off topic note.

In your earlier e-mail you've said that

I actually don't have a problem with 95% of PHP 6 installations turning
off Unicode support and this being the default setting for ISPs.

Full Unicode support in an application is a big commitment and it will
take quite a bit of work. I just don't think that many people will
invest the time and effort into doing this, but at the same time there
will be large applications and services that have full control over
their server settings that will make use of it. Think Flickr, Yahoo,
Facebook, etc.

Since 95% of installations will not be using PHP6 (php6 without
unicode is pretty much a slower version of php5) for whatever reason,
we need a common version for the other 95%. I think it is inevitable
that there will be 2 continually developed versions of PHP out there,
one for people who need unicode support in the way that is envisioned
by PHP6 and one for people who don't need it.

--

Ilia Alshanetsky

18 years ago by Rasmus Lerdorf

Ilia Alshanetsky wrote:

Sorry to interject, but just a quick slightly off topic note.

In your earlier e-mail you've said that

I actually don't have a problem with 95% of PHP 6 installations turning
off Unicode support and this being the default setting for ISPs.

Full Unicode support in an application is a big commitment and it will
take quite a bit of work. I just don't think that many people will
invest the time and effort into doing this, but at the same time there
will be large applications and services that have full control over
their server settings that will make use of it. Think Flickr, Yahoo,
Facebook, etc.

Since 95% of installations will not be using PHP6 (php6 without unicode
is pretty much a slower version of php5) for whatever reason,
we need a common version for the other 95%. I think it is inevitable
that there will be 2 continually developed versions of PHP out there,
one for people who need unicode support in the way that is envisioned by
PHP6 and one for people who don't need it.

Well, PHP is going to evolve and get more features and performance
enhancements. Those are all going to go into PHP 6 and above. People
stuck on PHP 5 won't see any of these, so I don't see PHP 6 without
unicode as just a slower version of PHP 5. Namespaces and some of the
other PHP 6 planned features are probably quite interesting to a number
of people some of whom may not be interested in Unicode.

-Rasmus

18 years ago by Lukas Kahwe Smith

Hi,

I think that PHP6 adoption will obviously come down to exactly how much
slower unicode support will make things. If it's a 10% drop, I think we
will probably find ways to smooth out the kinks with some nice tweaks
here and there. If it gets considerably above 10%, then it will be
trickier.

But at any rate, I think there are a fair number of people on PHP4
holding out for PHP6 to do their next redesign etc. All in all, I think
we have enough pain points even without unicode for people to have
excuses to not support PHP6.

So imho, if we make PHP6 as fast as we can for unicode only and still
find that it's too slow for people to adopt, then we might be better off
backporting a few nice features to PHP5 than making PHP6 a dual
version internally. Then again, I am likely talking out of my a.. here,
since I do not work on the source.

regards,
Lukas

18 years ago by Stefan Priebsch

Hi list,

IMHO PHP6 might need more new features to attract developers to migrate
their code / write new code for PHP6. Unicode support is great for those
who need it, but is likely to cause work for those that "just have to
live with it". Nobody likes to do extra work, and it's hard to get
managers to pay for refactoring and updating software without a
visible/measurable benefit. Especially if the new version is going to be
slower, it's tough to push it through.

Namespaces in PHP 6.0 might be a very interesting feature for many
developers. Currently, it seems that unicode will be the main thing, which
might result in lots of developers not caring about it and just waiting for
namespaces or another feature they need in 6.1 (or whenever).

Regarding the BC break: maybe breaking BC can be cleverly used as a
marketing statement. "We have cleaned up PHP, to get rid of some sins of
the past". If users do not upgrade because there are just one or two
small issues that require them to put work into their existing code, then
why keep BC?

Those who are planning to upgrade to PHP6 will always have to put
some work into their code, or at least test it on PHP6.

Those who stick with older versions do not really care whether the new
version breaks BC "just a little" or "really a lot" - they are not going
to touch their existing code anyway and stick with older PHP versions.

Kind regards,

Stefan

--

e-novative> - We make IT work for you.

e-novative GmbH - HR: Amtsgericht München HRB 139407
Sitz: Wolfratshausen - GF: Dipl. Inform. Stefan Priebsch

http://www.e-novative.de

18 years ago by Jani Taskinen

What I think Ilia said (between the lines) is that basically we're forking PHP.

Perhaps we really need to accept the fact that this has already happened..
It started with the CPR for the PHP_4_4 branch and the same is now continuing with
the PHP_5_2 branch. If support for PHP 4 had been officially dropped with the release
of PHP 5, the adoption of PHP 5 would have been quicker than it has been so far.

--Jani

18 years ago by Pierre

What I think Ilia said (between the lines) is that basically we're forking PHP.

Perhaps we really need to accept the fact that this has already happened..
It started with the CPR for the PHP_4_4 branch and the same is now continuing with
the PHP_5_2 branch. If support for PHP 4 had been officially dropped with the release
of PHP 5, the adoption of PHP 5 would have been quicker than it has been so far.

You are right, that's one of the only ways to "force" the move, the
only one I can imagine at least.

A realistic way to do it is to say that 5.x will not be supported 2
years (or so) after the first stable release of php6. That's still
~5-10 years with a maintained php5.x (ok, 5.0.x was born dead ;).

(Needless to say, php4 support should have been stopped already)

--Pierre

18 years ago by Derick Rethans

What I think Ilia said (between the lines) is that basically we're forking PHP.

Perhaps we really need to accept the fact that this has already happened..
It started with the CPR for the PHP_4_4 branch and the same is now continuing with
the PHP_5_2 branch. If support for PHP 4 had been officially dropped with the release
of PHP 5, the adoption of PHP 5 would have been quicker than it has been so far.

You are right, that's one of the only ways to "force" the move, the
only one I can imagine at least.

A realistic way to do it is to say that 5.x will not be supported 2
years (or so) after the first stable release of php6. That's still
~5-10 years with a maintained php5.x (ok, 5.0.x was born dead ;).

(Needless to say, php4 support should have been stopped already)

End of the year :)

Derick

18 years ago by Pierre

(Needless to say, php4 support should have been stopped already)

End of the year :)

Although I like (love, even) the idea, I completely missed any discussion
about this deadline apart from your April Fools' post on your blog/site. It is
not exactly the way I had in mind (and not how we should do it) :)

--Pierre

18 years ago by Ilia Alshanetsky

IMHO the big difference between the 4.x to 5.x migration and the one
from 5.x to 6.x is whom the changes benefit. I think Rasmus made a
very true and correct statement: PHP 6, whose main offering (at least
right now) is unicode support, is mostly for the 3-4% of the user base
inside large companies like Yahoo that need to deploy multi-language
applications and have full control over their environment. For the
average joe, PHP 6 is not needed because as a rule they develop for 1
locale, which is something PHP can already do quite well, if the #s
of PHP based sites are to be taken into account. This means that
there is absolutely nothing of value that the average user has to gain
by moving from 5.x, aside from a drop in speed (which I am sure will be
a winner with hosting companies) and guaranteed BC breaks. From that
perspective, I think PHP 6 adoption will be very slow, even compared
to the lackluster 5.x adoption rates, which only in the last year have
begun to pick up steam.

Given that this is the case, I think PHP 5 will be supported for a very
long time and may eventually even take on a life of its own, simply due
to the large user base it will have, which has nothing to gain by
moving to PHP 6. Keep in mind that, aside from unicode, any other
features/additions of PHP 6 could easily be ported to PHP 5 by anyone
who is interested in them.

Ilia Alshanetsky

18 years ago by Jani Taskinen

Rasmus Lerdorf wrote:

Jani Taskinen wrote:

Yesterday's decisions don't necessarily apply today. ;)
(to be "agile")

Fair enough, but it would be nice if the folks involved in the decision,
including yourself, would then clearly state their reasoning for making,

I think I've pretty clearly stated why I don't like the idea (anymore).

or at least supporting, the decision in the first place and then explain
what has changed to make you change your mind.

What has changed is the fact that HEAD is becoming a mess (or already is).
Separate tests for different modes, code duplication, etc.
Maintaining all this is gonna be hell..and I'm not gonna volunteer to do it
especially if it's totally unnecessary (IMO).

It comes down to predicting the future. Whichever way we go, the
decision is going to be second-guessed. If we have critical mass for a
clean BC break, then I am ok with it. For me personally it would make
things a bit easier, but I think it would be a long long time before we
saw any large hosts out there switch to a PHP 6 that can't run common
PHP 5 apps.

To be totally honest, as long as nobody pays us to do something the way they
want, we should do it exactly how WE want. After all, isn't the group doing most of
the work Y! (or former Y!) people? (not to forget 1 or 2 Zend folks)
The rest of us, like Ilia for example, are not getting paid for this.

We have an opportunity to finally get rid of the burden of past "mistakes"
but lets not add any replacements for them. ;)

--Jani

18 years ago by Matt Wilmas

Hi all,

I haven't thought about this too much, just came to mind after following
this thread, so ignore any stupidity. :-)

Wanting to preserve BC where possible, and figuring that code to take
advantage of PHP 6's Unicode support will be either new or rewritten... Is
it possible to always have Unicode support "there" (enabled, no
unicode.semantics setting), as some need/want, but keep the behavior of PHP 5
code exactly the same? I mean nothing Unicode is actually used unless
explicitly specified, keeping everything IS_STRING, etc.

For Unicode support in just part of your code, use a [new] u"string prefix"
or (unicode) cast. Functions would still handle either type, of course, but
there shouldn't be unexpected changes in them, because if they're getting
Unicode strings, it's because YOU did it.

For everything Unicode (like semantics=On now), instead of adding u prefixes
and (unicode) casts everywhere, use declare() at the beginning of the
top-most file (setting couldn't be changed after that).

This seems a bit different to me than having unicode.semantics
PHP_INI_PERDIR... Am I wrong, or was this considered (e.g. needing to
specify anything Unicode) and I'm missing major issues...? :-/

Matt

18 years ago by Richard Lynch

It comes down to predicting the future. Whichever way we go, the
decision is going to be second-guessed. If we have critical mass for a
clean BC break, then I am ok with it. For me personally it would make
things a bit easier, but I think it would be a long long time before we
saw any large hosts out there switch to a PHP 6 that can't run common
PHP 5 apps.

If they switch to 6 with unicode off, and never ever get around to
turning unicode on, will it really be any better?

They'll just be running some weird-o setup that causes all kinds of
bugs and issues and you'll have users with php 6 apps that won't work
in php 6 and who submit bogus bug reports about it, because of the
setting.

A clean break is probably better, especially if it makes php 6 much
more maintainable.

Large-scale hosts won't switch to 6 any faster than they switched to
5, unless there are ZERO BC breaks.

And nobody can guarantee zero breaks, because there are always buglets.

The effort to have unicode off in 6 is probably larger than the effort
to document what needs to be done to a PHP 5 app to make it
6-friendly, or even to write tools to auto-convert the bulk of a script.

If unicode semantics are "on" what exactly is borked in PHP 5?

Can that be fixed to be BC without resorting to this toggle?

--
Some people have a "gift" link here.
Know what I want?
I want you to buy a CD from some indie artist.
http://cdbaby.com/browse/from/lynch
Yeah, I get a buck. So?

18 years ago by Tomas Kuliavas

It comes down to predicting the future. Whichever way we go, the
decision is going to be second-guessed. If we have critical mass for a
clean BC break, then I am ok with it. For me personally it would make
things a bit easier, but I think it would be a long long time before we
saw any large hosts out there switch to a PHP 6 that can't run common
PHP 5 apps.

If they switch to 6 with unicode off, and never ever get around to
turning unicode on, will it really be any better?

They'll just be running some weird-o setup that causes all kinds of
bugs and issues and you'll have users with php 6 apps that won't work
in php 6 and who submit bogus bug reports about it, because of the
setting.

A clean break is probably better, especially if it makes php 6 much
more maintainable.

Large-scale hosts won't switch to 6 any faster than they switched to
5, unless there are ZERO BC breaks.

And nobody can guarantee zero breaks, because there are always buglets.

buglet = a small break, not something that requires a massive code rewrite.
Rewritten code is no longer backwards compatible, so developers have to
maintain two code branches or two different sets of libraries. If code is
maintained in one branch, scripts will need wrapper functions for most
PHP string and stream function calls. Instead of a performance loss in
the interpreter, you will force a performance loss in portable scripts.

The effort to have unicode off in 6 is probably larger than the effort
to document what needs to be done to a PHP 5 app to make it
6-friendly, or even to write tools to auto-convert the bulk of a script.

If unicode semantics are "on" what exactly is borked in PHP 5?

In Unicode mode, \[0-7]{1,3} and \x[0-9A-Fa-f]{1,2} refer to unicode code
points and not to octal or hexadecimal byte values. The fix is not backwards
compatible.

Scripts can't match bytes. How are they supposed to check whether a string
is plain ASCII or 8-bit? Convert it to ASCII and check for errors instead
of looking for 8-bit byte values? How can scripts replace 8-bit bytes with
other strings? The ISO-8859-2 decoding table contains 95 entries written
and evaluated as binary strings. The same applies to the other ISO-8859
and Windows-125x character sets. ISO-8859-1 and UTF-8 decoding do not use
mapping tables and perform complex calculations with byte values.
Multibyte character set decoding might actually benefit from
unicode_encode(), if Table 325 (http://www.php.net/unicode) provided more
information about U_INVALID_SUBSTITUTE and the other unicode.* settings.

PHP6 does not provide backwards compatible functions to work with bytes.
The provided constructs are not backwards compatible. If scripts want to do
MIME Q encoding, they must work with bytes. Doing Q encoding with the
provided PHP extensions adds extra dependencies.
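MIME Q encoding (RFC 2047) shows why byte access matters: the encoder escapes the bytes of the chosen charset, not characters. A minimal Python 3 sketch follows; `q_encode` is a hypothetical helper, the safe-character set is a conservative assumption (RFC 2047 allows encoding more than strictly required), and the 75-character limit on encoded words is not handled.

```python
# Conservative set of bytes left unescaped; everything else becomes =XX.
SAFE = set(b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!*+-/")


def q_encode(text: str, charset: str = "utf-8") -> str:
    """Hypothetical RFC 2047 Q-encoder producing a single encoded word."""
    data = text.encode(charset)  # Q encoding operates on bytes, not code points
    parts = []
    for b in data:
        if b == 0x20:
            parts.append("_")            # space is written as underscore
        elif b in SAFE:
            parts.append(chr(b))         # safe ASCII byte passes through
        else:
            parts.append("=%02X" % b)    # any other byte: =XX, uppercase hex
    return "=?%s?Q?%s?=" % (charset.upper(), "".join(parts))


print(q_encode("Zaj\u0119\u0107 A"))                # =?UTF-8?Q?Zaj=C4=99=C4=87_A?=
print(q_encode("Zaj\u0119\u0107", "iso-8859-2"))    # =?ISO-8859-2?Q?Zaj=EA=E6?=
```

The same text yields different escapes per charset, because the encoder works on the encoded byte sequence.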

ICU does not support an HTML target. Text conversion to iso-8859-x or
windows-125x targets will be lossy.
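The lossy-conversion point can be seen with Python 3's codec error handlers, used here as a stand-in for ICU converters (this is not a claim about ICU's own options). With `errors="replace"`, characters missing from ISO-8859-1 become `?`, while `errors="xmlcharrefreplace"` turns them into HTML/XML numeric character references, roughly what an HTML target would provide.

```python
city = "\u0141\u00f3d\u017a"  # "Łódź", written with escapes to keep the source ASCII

# Ł (U+0141) and ź (U+017A) do not exist in ISO-8859-1; ó (U+00F3) does.
lossy = city.encode("iso-8859-1", errors="replace")
html_safe = city.encode("iso-8859-1", errors="xmlcharrefreplace")

print(lossy)      # b'?\xf3d?' : the two missing characters are lost
print(html_safe)  # b'&#321;\xf3d&#378;' : they survive as numeric references
```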

Can that be fixed to be BC without resorting to this toggle?

Unicode and binary typecasts cause an E_PARSE error in PHP 5.2.0 and older.

PHP6 could have introduced new Unicode-aware functions, but the Unicode
implementation chose to modify existing ones. All low-level string
operations ($string[1]) are Unicode-aware by default, not only when the
script actually asks for it. Such an implementation is designed for
developers who don't care about Unicode support and want it out of the
box without any changes to their Unicode-unaware scripts. It is not
designed for developers who actually need it and want their code to work
in PHP6 and PHP4/5.

Unicode code points can be defined with \u, but PHP6 breaks existing octal
and hex escape sequences.

PHP6 is very noisy ("Notice: fwrite(): 13 character unicode buffer
downcoded for binary stream runtime_encoding", "Warning: base64_encode()
expects parameter 1 to be strictly a binary string, Unicode string given")
about data stream and string operations, even when fwrite() or
base64_encode() only work with plain ASCII data. PHP script developers
are not used to strict variable type checks in string functions. Which
functions are modified to require binary typecasting? Do I have to make a
list myself every time some function freaks out?

--
Tomas

18 years ago by Richard Quadling

It comes down to predicting the future. Whichever way we go, the
decision is going to be second-guessed. If we have critical mass for a
clean BC break, then I am ok with it. For me personally it would make
things a bit easier, but I think it would be a long long time before we
saw any large hosts out there switch to a PHP 6 that can't run common
PHP 5 apps.

If they switch to 6 with unicode off, and never ever get around to
turning unicode on, will it really be any better?

They'll just be running some weird-o setup that causes all kinds of
bugs and issues and you'll have users with php 6 apps that won't work
in php 6 and who submit bogus bug reports about it, because of the
setting.

A clean break is probably better, especially if it makes php 6 much
more maintainable.

Large-scale hosts won't switch to 6 any faster than they switched to
5, unless there are ZERO BC breaks.

And nobody can guarantee zero breaks, because there are always buglets.

buglet = small break and not something that requires massive code rewrite.
Rewritten code is no longer backwards compatible. So developers have to
maintain two code branches or two different sets of libraries. If code is
maintained in one branch, scripts will need wrapper functions for most of
PHP string and stream function calls. Instead of having performance loss
in interpreter, you will force performance loss in portable scripts.

The effort to have unicode off in 6 is probably larger than the effort
to document what needs to be done to a PHP 5 app to make it be
6-friendly, or even write tools to auto-convert the bulk of a script.

If unicode semantics are "on", what exactly is borked in PHP 5?

In Unicode mode \[0-7]{1,3} and \x[0-9A-Fa-f]{1,2} refer to Unicode code
points and not to octal or hexadecimal byte values. The fix is not
backwards compatible.

Scripts can't match bytes. How are they supposed to check whether a string
is plain ASCII or 8-bit? Convert it to ASCII and check for errors instead
of looking for 8-bit byte values? How can scripts replace 8-bit bytes with
other strings? The ISO-8859-2 decoding table contains 95 entries written
and evaluated as binary strings. The same applies to the other ISO-8859
and Windows-125x character sets. ISO-8859-1 and UTF-8 decoding do not use
mapping tables and perform complex calculations with byte values.
Multibyte character set decoding might actually benefit from
unicode_encode(), if Table 325 (http://www.php.net/unicode) provided more
information about U_INVALID_SUBSTITUTE and the other unicode.* settings.

PHP6 does not provide backwards compatible functions to work with bytes.
The provided constructs are not backwards compatible. If scripts want to do
MIME Q encoding, they must work with bytes. Doing Q encoding with the
provided PHP extensions adds extra dependencies.

ICU does not support an HTML target. Text conversion to iso-8859-x or
windows-125x targets will be lossy.

Can that be fixed to be BC without resorting to this toggle?

Unicode and binary typecasts cause an E_PARSE error in PHP 5.2.0 and older.

PHP6 could have introduced new Unicode-aware functions, but the Unicode
implementation chose to modify existing ones. All low-level string
operations ($string[1]) are Unicode-aware by default, not only when the
script actually asks for it. Such an implementation is designed for
developers who don't care about Unicode support and want it out of the
box without any changes to their Unicode-unaware scripts. It is not
designed for developers who actually need it and want their code to work
in PHP6 and PHP4/5.

Unicode code points can be defined with \u, but PHP6 breaks existing octal
and hex escape sequences.

PHP6 is very noisy ("Notice: fwrite(): 13 character unicode buffer
downcoded for binary stream runtime_encoding", "Warning: base64_encode()
expects parameter 1 to be strictly a binary string, Unicode string given")
about data stream and string operations, even when fwrite() or
base64_encode() only work with plain ASCII data. PHP script developers
are not used to strict variable type checks in string functions. Which
functions are modified to require binary typecasting? Do I have to make a
list myself every time some function freaks out?

--
Tomas

The more I read about what is in place for PHP6 with regard to
Unicode, the more I feel Unicode should have been an extension included
in the core, rather than a rewrite of the core. Provide a series of useful
classes and functions. It is there if you want it, and as more and more
people get used to it, more use will be made of it. It almost looks
like all the time and energy (thank you to you all) that has been put
into PHP6 to make it Unicode-aware will be wasted if it is disabled by
default. I also feel that if it is enabled by default and breaks BC that
badly, no one will upgrade.

--

Richard Quadling
Zend Certified Engineer : http://zend.com/zce.php?c=ZEND002498&r=213474731
"Standing on the shoulders of some very clever giants!"

18 years ago by Stanislav Malyshev

Unicode code points can be defined with \u, but PHP6 breaks existing octal
and hex escape sequences.

What do you mean? Doesn't \x20 create a U+0020 character? Or do you mean
you'd expect it to create just the one byte 0x20? Doesn't a binary string do that?

PHP6 is very noisy ("Notice: fwrite(): 13 character unicode buffer
downcoded for binary stream runtime_encoding", "Warning: base64_encode()
expects parameter 1 to be strictly a binary string, Unicode string given")

Well, exporting and importing to and from non-unicode contexts are
tricky, and fwrite and base64_encode do exactly that. Maybe some
functions need to be less noisy, I don't know - but when people work
with unicode they must be aware that interoperating with non-unicode
contexts brings some complexity, I don't see how that can be avoided.
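Python 3 took the strict route described here: text passed where bytes are expected is rejected rather than silently converted. The sketch below is an analogy only (PHP 6 instead warns and downcodes); it shows the error and the explicit conversion at the boundary that avoids it. The message text and the ASCII encoding are arbitrary choices.

```python
import base64
import io

message = "plain ascii data"   # a text (Unicode) string

# Handing text to byte-oriented APIs fails loudly instead of downcoding.
for action in (lambda: base64.b64encode(message),
               lambda: io.BytesIO().write(message)):
    try:
        action()
    except TypeError as exc:
        print("rejected:", exc)

# Converting once, explicitly, at the boundary makes the encoding visible.
payload = message.encode("ascii")
encoded = base64.b64encode(payload)
stream = io.BytesIO()
stream.write(encoded)
print(base64.b64decode(stream.getvalue()).decode("ascii") == message)  # True
```

Whether such mismatches should be errors, warnings, or silent conversions is the design question; the explicit encode step is the same in every case.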

--
Stanislav Malyshev, Zend Software Architect
stas@zend.com http://www.zend.com/
(408)253-8829 MSN: stas@zend.com

18 years ago by Pierre

Hi,

PHP6 is very noisy ("Notice: fwrite(): 13 character unicode buffer
downcoded for binary stream runtime_encoding", "Warning: base64_encode()
expects parameter 1 to be strictly a binary string, Unicode string given")

Well, exporting and importing to and from non-unicode contexts are
tricky, and fwrite and base64_encode do exactly that. Maybe some
functions need to be less noisy, I don't know - but when people work
with unicode they must be aware that interoperating with non-unicode
contexts brings some complexity, I don't see how that can be avoided.

It can't (and should not be).

And it is not possible to make old releases compatible with the newly
introduced binary cast (noop in 5.2). I thought we made it clear
already and I'm unsure why Tomas brought it back to the list of
griefs.

--Pierre

18 years ago by Alexey Zakhlestin

The thing which I don't understand is: why do people want backward
compatibility that much?
For me, PHP6 is the new environment and I don't want to move my
existing solutions there. That just won't be right.
Just like I didn't reuse code written for PHP4 in PHP5 projects.

This is a major version, so I create a new major version of my code.

The situation is the same everywhere: Perl5 vs Perl6, Cocoa of
MacOS-X 10.3/10.4/10.5…

If one needs to use some older scripts, they can run them using an older
version of PHP. It is quite common to have both PHP4 and PHP5
installed on hosting servers these days.

Call to developers:
Create new versions of your apps/libraries which use new features of
language. Make your users interested in upgrading. If users want it,
hosting-owners will consider upgrades faster. It's all about
marketing ;)

--
Alexey Zakhlestin
http://blog.milkfarmsoft.com/

18 years ago by Tomas Kuliavas

Call to developers:
Create new versions of your apps/libraries which use new features of
language. Make your users interested in upgrading. If users want it,
hosting-owners will consider upgrades faster. It's all about
marketing ;)

It also depends on your marketing policy. A PHP 5.2.1 requirement is not
acceptable when marketing says that the product runs on any PHP version and
the current product version works fine on PHP 4.1-5.x with only one quirk,
in PHP 4.4.1.

--
Tomas

18 years ago by Richard Lynch

The thing which I don't understand is: why do people want backward
compatibility that much?

Because if you run a webhost with a zillion users, half of whom are
screaming for PHP 6, and half of whom are screaming because something
broke, you're a very unhappy company.

Because if you have plug-in web-apps like forums, guestbooks, ... you
don't want to re-write the damn things.

Call to developers:
Create new versions of your apps/libraries which use new features of
language. Make your users interested in upgrading. If users want it,
hosting-owners will consider upgrades faster. It's all about
marketing ;)

And your code will be used by a tiny tiny fraction of the potential
audience...

--
Some people have a "gift" link here.
Know what I want?
I want you to buy a CD from some indie artist.
http://cdbaby.com/browse/from/lynch
Yeah, I get a buck. So?

18 years ago by johannes@php.net

The thing which I don't understand is: why do people want backward
compatibility that much?

Because if you run a webhost with a zillion users, half of whom are
screaming for PHP 6, and half of whom are screaming because something
broke, you're a very unhappy company.

It is as easy to install two different PHP versions as one with
unicode.semantics On and one with unicode.semantics Off since it's
PHP_INI_SYSTEM. But depending on that setting you get incompatible,
different products with the same name which will just produce way more
problems to hosters and developers of software for "PHP 6".

johannes

18 years ago by Cristian Rodriguez

which will just produce way more
problems to hosters and developers of software for "PHP 6".

yes :-( .. So if unicode.semantics cannot be set at runtime with
ini_set() or at least "per-dir", it is complete nonsense to have it,
as the vast majority of users will not be able to turn it on or off, and
it will certainly be off in most configurations, as otherwise it would
break too much code.

I'm sorry but I don't see this ending as a good thing... It looks pretty
much like more of the same old mistakes (magic_quotes, safe_mode
anyone? this may be even worse...)

18 years ago by Derick Rethans

which will just produce way more
problems to hosters and developers of software for "PHP 6".

yes :-( .. So if unicode.semantics cannot be set at runtime with
ini_set() or at least "per-dir", it is complete nonsense to have it,
as the vast majority of users will not be able to turn it on or off, and
it will certainly be off in most configurations, as otherwise it would
break too much code.

I'm sorry but I don't see this ending as a good thing... It looks pretty
much like more of the same old mistakes (magic_quotes, safe_mode
anyone? this may be even worse...)

This is worse because with magic_quotes you can at least work around it
in userland. Not so much with this setting.

Derick

18 years ago by Rasmus Lerdorf

Derick Rethans wrote:

which will just produce way more
problems to hosters and developers of software for "PHP 6".

yes :-( .. So if unicode.semantics cannot be set at runtime with
ini_set() or at least "per-dir", it is complete nonsense to have it,
as the vast majority of users will not be able to turn it on or off, and
it will certainly be off in most configurations, as otherwise it would
break too much code.

I'm sorry but I don't see this ending as a good thing... It looks pretty
much like more of the same old mistakes (magic_quotes, safe_mode
anyone? this may be even worse...)

This is worse because with magic_quotes you can at least work around it
in userland. Not so much with this setting.

It comes down to whether we want a true Unicode mode for PHP. As far as
I am concerned, anything short of that is rather half-assed and feels
bolted on like in other languages. The huge difficulty, and the reason
it is bolted on after the fact in most languages, is that it is
extremely difficult to transition from non-unicode to full unicode
without breaking everything.

The suggestion has been to just have a bunch of Unicode functions you
can call so you explicitly control when you are doing Unicode stuff and
the rest of the time you are working in binary mode. That's exactly
what we have with the Unicode semantics turned off. The idea is for all
the Unicode functionality to be available in this mode and like has been
stated many times, this is the mode most ISPs are going to run their
shared servers in and as such this is the mode a portable PHP script
needs to be written for.

However, does this mean we shouldn't even attempt to get it right? 5
years from now, are we still going to limp along having to call explicit
functions to compare and iterate over unicode strings? Or heaven
forbid, we end up with a mess of various string classes. A string is
just a string, it isn't a class and it shouldn't be complicated. It
should have carried a charset with it from day one, but it didn't, so we
are where we are.

So yes, the only real customers for this full Unicode mode in PHP 6 are
going to be the folks that have full control over their servers and
their software which will likely limit it to hosted services and exclude
large PHP software packages that will necessarily need to be written to
be portable. That of course creates a split right down the middle and
makes code sharing harder, and maybe it won't work, but the hope is we
can minimize these issues enough that the amount of code that
realistically needs to be written twice will be rather limited. If we
can't get it down to a manageable set of known things that people need
to watch out for, then our full unicode attempt has failed and we need
to stick with the half-assed approach. I'm not convinced we are there
yet and I'd hate to see us give up before we have taken a decent stab at
it. We need to think big and long-term, not small and short-term here.

-Rasmus

18 years ago by Lukas Kahwe Smith

Rasmus Lerdorf wrote:

So yes, the only real customers for this full Unicode mode in PHP 6 are
going to be the folks that have full control over their servers and
their software which will likely limit it to hosted services and exclude
large PHP software packages that will necessarily need to be written to
be portable. That of course creates a split right down the middle and
makes code sharing harder, and maybe it won't work, but the hope is we
can minimize these issues enough that the amount of code that
realistically needs to be written twice will be rather limited. If we
can't get it down to a manageable set of known things that people need
to watch out for, then our full unicode attempt has failed and we need
to stick with the half-assed approach. I'm not convinced we are there
yet and I'd hate to see us give up before we have taken a decent stab at
it. We need to think big and long-term, not small and short-term here.

To me it boils down to how we want to maintain the "fork":

1) PHP5 and PHP6
2) PHP6 unicode off/on (with PHP5 in maintenance mode)

Considering that people will not jump on PHP6 immediately anyways, I
think 1) is more realistic, if we make best efforts to back port new
features to PHP5, but still require that new features go into PHP6
first. Some features might not get back ported and that is a somewhat
unfriendly nudge towards PHP6. So it goes.

This way the PHP6 code base stays lean and people can realistically code
against PHP6. Hosters will hopefully offer both PHP5 and PHP6. I doubt
that many hosters would be interested in offering 3 versions at once
(PHP5, PHP6 unicode on/off).

regards,
Lukas

18 years ago by Antony Dovgal

To me it boils down to how we want to maintain the "fork":

1) PHP5 and PHP6
2) PHP6 unicode off/on (with PHP5 in maintenance mode)

Considering that people will not jump on PHP6 immediately anyways, I
think 1) is more realistic, if we make best efforts to back port new
features to PHP5, but still require that new features go into PHP6
first. Some features might not get back ported and that is a somewhat
unfriendly nudge towards PHP6. So it goes.

I tend to agree with this POV more and more.

Especially considering this:

Rasmus Lerdorf wrote:

So yes, the only real customers for this full Unicode mode in PHP 6 are
going to be the folks that have full control over their servers and
their software which will likely limit it to hosted services and exclude
large PHP software packages that will necessarily need to be written to
be portable.
--

If we admit that we release a special PHP version for a very limited set
of users then keeping that On/Off switch makes no sense to me.
And it's not about choice, customers DO have a choice - either it's PHP5 (which will
still be there for the next 10 years at the very least) or PHP6 aka Unicode PHP.

You don't buy a Porsche if you need a taxi, so why would you install PHP6 if you don't need Unicode?
New features? Let's just agree that we can (and definitely will) backport all the fancy looking
new features from PHP6 to PHP5 and both these branches can live together happily.

This way the PHP6 code base stays lean and people can realistically code
against PHP6. Hosters will hopefully offer both PHP5 and PHP6. I doubt
that many hosters would be interested in offering 3 versions at once
(PHP5, PHP6 unicode on/off).

--
Wbr,
Antony Dovgal

18 years ago by Richard Quadling

To me it boils down to how we want to maintain the "fork":

1) PHP5 and PHP6
2) PHP6 unicode off/on (with PHP5 in maintenance mode)

Considering that people will not jump on PHP6 immediately anyways, I
think 1) is more realistic, if we make best efforts to back port new
features to PHP5, but still require that new features go into PHP6
first. Some features might not get back ported and that is a somewhat
unfriendly nudge towards PHP6. So it goes.

I tend to agree with this POV more and more.

Especially considering this:

Rasmus Lerdorf wrote:

So yes, the only real customers for this full Unicode mode in PHP 6 are
going to be the folks that have full control over their servers and
their software which will likely limit it to hosted services and exclude
large PHP software packages that will necessarily need to be written to
be portable.
--

If we admit that we release a special PHP version for a very limited set
of users then keeping that On/Off switch makes no sense to me.
And it's not about choice, customers DO have a choice - either it's PHP5 (which will
still be there for the next 10 years at the very least) or PHP6 aka Unicode PHP.

You don't buy a Porsche if you need a taxi, so why would you install PHP6 if you don't need Unicode?
New features? Let's just agree that we can (and definitely will) backport all the fancy looking
new features from PHP6 to PHP5 and both these branches can live together happily.

This way the PHP6 code base stays lean and people can realistically code
against PHP6. Hosters will hopefully offer both PHP5 and PHP6. I doubt
that many hosters would be interested in offering 3 versions at once
(PHP5, PHP6 unicode on/off).

If Unicode had been an extension (one of those that are part of the
core and cannot be disabled) with its own
classes/exceptions/functions/etc, then everyone would have been happy.
Unicode is a great idea. I don't use Unicode at the moment, but
I'd still like to have PHP6 when it is officially released without
having to do major work to make my code compliant AND without having
to turn Unicode off.

For those that need it, then they can code for it. For those that
don't they still get all the other improvements in PHP6 and without
the reported speed issues as they are not using the extension.

This seems like a winner to me.

Richard Quadling

18 years ago by Antony Dovgal

On 06.07.2007 15:32, Richard Quadling wrote:

If Unicode had been an extension (one of those that are part of the
core and cannot be disabled) with its own
classes/exceptions/functions/etc, then everyone would have been happy.

Moreover, we do have such an extension, it's called "mbstring" and you can use it even in PHP4.
But the point is that it's just an extension, hence the Unicode support is far far from full.

Unicode is a great idea. I don't use Unicode at the moment, but
I'd still like to have PHP6 when it is officially released without
having to do major work to make my code compliant AND without having
to turn Unicode off.

If you don't need Unicode, you don't need PHP6.
It's that simple.

For those that need it, then they can code for it. For those that
don't they still get all the other improvements in PHP6 and without
the reported speed issues as they are not using the extension.

This seems like a winner to me.

--
Wbr,
Antony Dovgal

18 years ago by Richard Quadling

If Unicode had been an extension (one of those that are part of the
core and cannot be disabled) with its own
classes/exceptions/functions/etc, then everyone would have been happy.

Moreover, we do have such an extension, it's called "mbstring" and you can use it even in PHP4.
But the point is that it's just an extension, hence the Unicode support is far far from full.

Unicode is a great idea. I don't use Unicode at the moment, but
I'd still like to have PHP6 when it is officially released without
having to do major work to make my code compliant AND without having
to turn Unicode off.

If you don't need Unicode, you don't need PHP6.
It's that simple.

So, all the time and effort going into PHP6 is for 1 maybe-used set of
functionality which also seems to slow down the entire system. I know
I MUST be missing something here.

--

Richard Quadling

18 years ago by Lukas Kahwe Smith

Richard Quadling wrote:

So, all the time and effort going into PHP6 is for 1 maybe-used set of
functionality which also seems to slow down the entire system. I know
I MUST be missing something here.

Yes, you are missing the point both Antony and I made: if we remove
the unicode switch, we would commit to backporting most non-unicode
features to PHP5.

regards,
Lukas

18 years ago by Richard Quadling

Richard Quadling wrote:

So, all the time and effort going into PHP6 is for 1 maybe-used set of
functionality which also seems to slow down the entire system. I know
I MUST be missing something here.

Yes, you are missing the point both Antony and I made: if we remove
the unicode switch, we would commit to backporting most non-unicode
features to PHP5.

Which would be great for PHP5 and stone cold killer for PHP6 surely?
Unless you needed Unicode. Hmmm.

So what's the expected future of PHP? PHP4 is old now, PHP5 has much life
yet, and PHP6 has obscure functionality only for those that know/need it.
ISPs will think it's great to stay with PHP4/5 as the little guys don't
need or ask for Unicode, and therefore there will be very little PHP6
take-up.

--

Richard Quadling

18 years ago by Stefan Priebsch

IMHO backporting a lot of features to PHP4 is a major reason for the
slow PHP5 adoption. Basically, it seems that everybody who is not using
OOP feels that PHP4 is fine for them.

I'd say committing to backporting stuff from PHP6 to PHP5 will yield a
similar situation: very slow or no PHP6 adoption.

BTW, can't the unicode switch be done at compile time? Then one could
compile PHP6 Unicode and PHP6 non-Unicode. If there is a clever way of
running both engines in parallel, there should be no performance impact
inside the non-unicode engine. Since both versions of the engine exist
(maybe even selectable by a certain statement in the main PHP file of the
application), unicode and non-unicode users are happy. And there is only
one version of PHP in the market, to conquer it all.

There must be a reason to upgrade to a new PHP version (usually
features, maybe performance increase etc.). But there also must be no
reason not to upgrade. But you all know this, it has been said before.

Kind regards,

Stefan

--

e-novative> - We make IT work for you.

e-novative GmbH - HR: Amtsgericht München HRB 139407
Sitz: Wolfratshausen - GF: Dipl. Inform. Stefan Priebsch

http://www.e-novative.de

18 years ago by Lukas Kahwe Smith

Stefan Priebsch wrote:

IMHO backporting a lot of features to PHP4 is a major reason for the
slow PHP5 adoption. Basically, it seems that everybody who is not using
OOP feels that PHP4 is fine for them.

What was backported aside from the memory corruption fix, which I am
sure even pushed a few people to update to PHP5?

There must be a reason to upgrade to a new PHP version (usually
features, maybe performance increase etc.). But there also must be no
reason not to upgrade. But you all know this, it has been said before.

Native unicode is a feature. Also, I would mandate that all new features
must go into PHP6 first.

regards,
Lukas

18 years ago by Pierre

There must be a reason to upgrade to a new PHP version (usually
features, maybe performance increase etc.). But there also must be no
reason not to upgrade. But you all know this, it has been said before.

Namespace is one very important reason. If we need a "marketing"
argument for PHP6 outside unicode, it is the one. I would also prefer
not to backport it (but we can backport it as well; my main problem is
only this flag).

--Pierre

18 years ago by Jochem Maas

Pierre wrote:

There must be a reason to upgrade to a new PHP version (usually
features, maybe performance increase etc.). But there also must be no
reason not to upgrade. But you all know this, it has been said before.

Namespace is one very important reason. If we need a "marketing"
argument for PHP6 outside unicode, it is the one. I would also prefer
not to backport it (but we can backport it as well; my main problem is
only this flag).

late static binding is another reason (are we still going to get that?)

rgds,
Jochem

PS - as an average Joe, I find this whole unicode semantics debate confusing
to a T and scary with a capital S. :-)

--Pierre

18 years ago by Lukas Kahwe Smith

Jochem Maas wrote:

Pierre wrote:

There must be a reason to upgrade to a new PHP version (usually
features, maybe performance increase etc.). But there also must be no
reason not to upgrade. But you all know this, it has been said before.
Namespace is one very important reason. If we need a "marketing"
argument for PHP6 outside unicode, it is the one. I would also prefer
not to backport it (but we can backport it as well; my main problem is
only this flag).

late static binding is another reason (are we still going to get that?)

well .. last I heard we are still stuck on this one, since it would
require expanding the general zval structure.

regards,
Lukas

18 years ago by Jochem Maas

Lukas Kahwe Smith wrote:

Jochem Maas wrote:

Pierre wrote:

There must be a reason to upgrade to a new PHP version (usually
features, maybe performance increase etc.). But there also must be no
reason not to upgrade. But you all know this, it has been said before.
Namespace is one very important reason. If we need a "marketing"
argument for PHP6 outside unicode, it is the one. I would also prefer
not to backport it (but we can backport it as well; my main problem is
only this flag).

late static binding is another reason (are we still going to get that?)

well .. last I heard we are still stuck on this one, since it would
require expanding the general zval structure.

oh, I see (well kind of), does this mean it may get taken off the table?
or is it slated as definite (assuming a satisfactory implementation can be
created)?

sorry to be a bore about LSB, it's just that it's the thing I look forward to
most :-), I have missed it since php5 was still in RC and I really believe that
LSB would improve php's OO model.

thank you for your feedback,
regards,
Jochem

regards,
Lukas

18 years ago by Jochem Maas

Jochem Maas wrote:

Lukas Kahwe Smith wrote:

Jochem Maas wrote:

Pierre wrote:

There must be a reason to upgrade to a new PHP version (usually
features, maybe performance increase etc.). But there also must be no
reason not to upgrade. But you all know this, it has been said before.
Namespace is one very important reason. If we need a "marketing"
argument for PHP6 outside unicode, it is the one. I would also prefer
not to backport it (but we can backport it as well; my main problem is
only this flag).
late static binding is another reason (are we still going to get that?)
well .. last I heard we are still stuck on this one, since it would
require expanding the general zval structure.

oh, I see (well kind of), does this mean it may get taken off the table?
or is it slated as definite (assuming a satisfactory implementation can be
created)?

I'll answer myself, as I've just come across Derick's meeting notes ... it seems
LSB is in and Marcus has the honor of suggesting an implementation.

I wish him well with that and hope he succeeds!
if he does I'll have to make him my hero for a day. :-)
and if he doesn't then at least he tried to do what I wish I could.

sorry to be a bore about LSB, it's just that it's the thing I look forward to
most :-), I have missed it since php5 was still in RC and I really believe that
LSB would improve php's OO model.

thank you for your feedback,
regards,
Jochem

regards,
Lukas

18 years ago by Stefan Priebsch

Pierre wrote:

Namespace is one very important reason. If we need a "marketing"

I agree. But AFAIK namespaces were not supposed to be in PHP6, at least
not in PHP 6.0. Is there an official position on whether namespaces will
be in PHP 6.0?

Kind regards,

Stefan

--

e-novative> - We make IT work for you.

e-novative GmbH - Commercial register: Amtsgericht München HRB 139407
Registered office: Wolfratshausen - Managing director: Dipl. Inform. Stefan Priebsch

http://www.e-novative.de

18 years ago by johannes@php.net

Pierre wrote:

Namespace is one very important reason. If we need a "marketing"

I agree. But AFAIK namespaces were not supposed to be in PHP6, at least
not in PHP 6.0. Is there an official position on whether namespaces will
be in PHP 6.0?

http://www.php.net/~derick/meeting-notes.html#name-spaces replace Marcus
with Dmitry in the "Conclusion" and check the other thread.

johannes

18 years ago by Stefan Priebsch

Johannes Schlüter wrote:

http://www.php.net/~derick/meeting-notes.html#name-spaces replace Marcus
with Dmitry in the "Conclusion" and check the other thread.

While we edit the document, can we also drop that if printed in italics?

;-)

Kind regards,

Stefan

18 years ago by Andrei Zmievski

As we see now, yes they will be in PHP 6.

-Andrei

Pierre wrote:

Namespace is one very important reason. If we need a "marketing"

I agree. But AFAIK namespaces were not supposed to be in PHP6, at least
not in PHP 6.0. Is there an official position on whether namespaces will
be in PHP 6.0?

Kind regards,

Stefan

18 years ago by Stefan Priebsch

Andrei Zmievski wrote:

As we see now, yes they will be in PHP 6.

:-))

18 years ago by Andrei Zmievski

Yes, backporting major features from PHP 6 to 5 will slow down PHP 6
adoption, and I'd like to avoid it if possible.

There is a way to run two engines side by side, by the way: in
separate instances of Apache. It's really not that complicated.

-Andrei

IMHO backporting a lot of features to PHP4 is a major reason for the
slow PHP5 adoption. Basically, it seems that everybody who is not using
OOP feels that PHP4 is fine for them.

I'd say committing to backporting stuff from PHP6 to PHP5 will yield a
similar situation: very slow or no PHP6 adoption.

BTW, can't the unicode switch be done at compile time? So one can
compile PHP6 Unicode and PHP6 non-Unicode. Then if there is a clever way
of running both engines in parallel, there should be no performance
impact inside the non-unicode engine. Since there are both versions of
the engine (which could maybe even be selected by a certain statement in
the main PHP file of the application), unicode and non-unicode users are
happy. And there is only one version of PHP in the market, to conquer it
all.

There must be a reason to upgrade to a new PHP version (usually
features, maybe performance increase etc.). But there also must be no
reason not to upgrade. But you all know this, it has been said before.

Kind regards,

Stefan

18 years ago by David Coallier

IMHO backporting a lot of features to PHP4 is a major reason for the
slow PHP5 adoption. Basically, it seems that everybody who is not using
OOP feels that PHP4 is fine for them.

I'd say committing to backporting stuff from PHP6 to PHP5 will yield a
similar situation: very slow or no PHP6 adoption.

BTW, can't the unicode switch be done at compile time? So one can
compile PHP6 Unicode and PHP6 non-Unicode. Then if there is a clever way
of running both engines in parallel, there should be no performance
impact inside the non-unicode engine. Since there are both versions of
the engine (which could maybe even be selected by a certain statement in
the main PHP file of the application), unicode and non-unicode users are
happy. And there is only one version of PHP in the market, to conquer it
all.

There must be a reason to upgrade to a new PHP version (usually
features, maybe performance increase etc.). But there also must be no
reason not to upgrade. But you all know this, it has been said before.

Kind regards,

Stefan

Time to put gas on the fire.

Is this flag going to be removed or what? What is happening here in
the background that we are not seeing ? :)

--
D
Do I get a buck? No so ?

18 years ago by Andrei Zmievski

Nothing is happening, as far as I can tell. We are at an impasse,
basically. Personally, I am fine with removing the damn switch and
going forward with PHP 6 as Unicode-only. God knows it will rid us of
at least one headache - having to discuss it anymore.

-Andrei
http://10fathoms.org/vu - daily photoblog

Time to put gas on the fire.

Is this flag going to be removed or what? What is happening here in
the background that we are not seeing ? :)

18 years ago by Jani Taskinen

FINALLY we're getting somewhere. Now where to start removing all the crap that
was necessary for the non-unicode mode? (I'd say the tests..)

--Jani

Andrei Zmievski wrote:

Nothing is happening, as far as I can tell. We are at an impasse,
basically. Personally, I am fine with removing the damn switch and going
forward with PHP 6 as Unicode-only. God knows it will rid us of at least
one headache - having to discuss it anymore.

-Andrei
http://10fathoms.org/vu - daily photoblog

Time to put gas on the fire.

Is this flag going to be removed or what? What is happening here in
the background that we are not seeing ? :)

18 years ago by Stanislav Malyshev

FINALLY we're getting somewhere. Now where to start removing all the

I don't see how we are getting somewhere - as before, there are people
for removing it and against removing it. Nothing changed, as far as I
see. Why suddenly should we start removing anything?

Stanislav Malyshev, Zend Software Architect
stas@zend.com http://www.zend.com/
(408)253-8829 MSN: stas@zend.com

18 years ago by Jani Taskinen

Stanislav Malyshev wrote:

FINALLY we're getting somewhere. Now where to start removing all the

I don't see how we are getting somewhere - as before, there are people
for removing it and against removing it. Nothing changed, as far as I
see. Why suddenly should we start removing anything?

For some reason only totally opposing people have Z in their email address domain..

And if Andrei, who is mostly behind (afaict) the whole thing, says he's okay
with getting rid of the totally useless option then in my eyes that's the death
sentence for the option.

--Jani

18 years ago by Tomas Kuliavas

FINALLY we're getting somewhere. Now where to start removing all the

I don't see how we are getting somewhere - as before, there are people
for removing it and against removing it. Nothing changed, as far as I
see. Why suddenly should we start removing anything?

For some reason only totally opposing people have Z in their email address domain..

add U, S, N and T. :)

And if Andrei, who is mostly behind (afaict) the whole thing, says he's okay
with getting rid of the totally useless option then in my eyes that's the death
sentence for the option.

Some people want to use PHP6 unicode options. Other people want to be able
to run PHP5 scripts on PHP6.

Restarting the same discussion about removal does not lead to anything useful.
Both sides have made their arguments and no one wants to lose.

The option is not useless. It allows running both modes in some setups without
having to install two PHP versions. php_admin_flag works in Apache, but I
suspect that other setups can override it by setting a different php.ini
location.

--
Tomas

18 years ago by Stanislav Malyshev

For some reason only totally opposing people have Z in their email
address domain..

Even if it were true (which it isn't) - so what?

And if Andrei, who is mostly behind (afaict) the whole thing, says he's
okay with getting rid of the totally useless option then in my eyes
that's the death sentence for the option.

In your eyes - fine. But besides your personal eyes, there is also such
thing as consensus, and it wasn't achieved.

18 years ago by Pierre

In your eyes - fine. But besides your personal eyes, there is also such
thing as consensus, and it wasn't achieved.

Excuse me but it is achieved, you only don't see it, or refuse to see it.

--Pierre

18 years ago by Andi Gutmans

Before we continue this discussion I think there are a couple of things
which would be useful data points:
a) What is the performance difference between an implicit Unicode app
and non-Unicode. If we have 3-4 apps ported over to Unicode_semantics=on
with only true binary strings cast to binary and real strings as UTF-16,
then I volunteer to put it through our performance lab and come up with
some real numbers. The lab is very well setup and is quite accurate. Do
we have volunteers to port some of those apps? I suggested some apps a
few emails ago.
b) We should try and figure out whether a script could automatically
migrate an application. I'll look into this but if anyone has time to
work on something like that it'd be very helpful input.

I think if we get more clarity on these two then it'll go a long way in
making this discussion more productive...

Andi

18 years ago by Lukas Kahwe Smith

Andi Gutmans wrote:

Before we continue this discussion I think there are a couple of things
which would be useful data points:
a) What is the performance difference between an implicit Unicode app
and non-Unicode. If we have 3-4 apps ported over to Unicode_semantics=on

Honestly I do not see anyone on this list having the time to do this.
Maybe it needs to be turned into a coding contest of sorts, but the
bottom line is that it's not a task that anyone but a bored student could
do. Maybe there are a few on this list, but I doubt it.

So we might need some marketing bla bla. With a bit of blogging, a few
goodies thrown in and a website: "Help determine the future of PHP6 -
port a popular PHP application to PHP6"

regards,
Lukas

18 years ago by Andi Gutmans

Well that's the problem. We have put months (if not more) of work into
PHP 6 but most people who are complaining aren't willing to take a stab
at actually helping figure this out.
No matter what we end up doing, the worst is if we make an arbitrary
decision because no one had time to get the right data. I consider that
part of implementing Unicode. If we can't figure it out in detail
then we should maybe dump it altogether (and that's not what I'm
suggesting).

Andi

-----Original Message-----
From: Lukas Kahwe Smith [mailto:mls@pooteeweet.org]
Sent: Monday, August 20, 2007 11:22 PM
To: Andi Gutmans
Cc: internals@lists.php.net
Subject: Re: [PHP-DEV] What is the use of "unicode.semantics" in PHP 6?

Andi Gutmans wrote:

Before we continue this discussion I think there are a couple of things
which would be useful data points:
a) What is the performance difference between an implicit Unicode app
and non-Unicode. If we have 3-4 apps ported over to Unicode_semantics=on

Honestly I do not see anyone on this list having the time to do this.
Maybe it needs to be turned into a coding contest of sorts, but the
bottom line is that it's not a task that anyone but a bored student could
do. Maybe there are a few on this list, but I doubt it.

So we might need some marketing bla bla. With a bit of blogging, a few
goodies thrown in and a website: "Help determine the future of PHP6 -
port a popular PHP application to PHP6"

regards,
Lukas

18 years ago by Richard Lynch

Andi Gutmans wrote:

Before we continue this discussion I think there are a couple of things
which would be useful data points:
a) What is the performance difference between an implicit Unicode app
and non-Unicode. If we have 3-4 apps ported over to Unicode_semantics=on

I also seem to recall that the "get rid of it" camp was pushing for
reasons of maintainability rather than performance...

Which is not to say that horrible performance either way wouldn't be
quite useful to one camp or the other, but it's kind of an irrelevant
experiment for maintainability.

For those who think it's easier to have PHP 6 and PHP 6--, could you
explain how that's different in real-world setup from PHP 6 and PHP 5
instead?

It's not like you can just flip the switch mid-script or even per-dir.

So you're kind of stuck with two pools of servers, no?

You only save a "./configure --with-dir=/foo; make; make install;"
afaict...

Am I being stoopid again?

I honestly thought this had been resolved to "get rid of it"... [shrug]

--
Some people have a "gift" link here.
Know what I want?
I want you to buy a CD from some indie artist.
http://cdbaby.com/browse/from/lynch
Yeah, I get a buck. So?

18 years ago by Derick Rethans

And if Andrei, who is mostly behind (afaict) the whole thing, says he's okay
with getting rid of the totally useless option then in my eyes that's the
death sentence for the option.

In your eyes - fine. But besides your personal eyes, there is also such thing
as consensus, and it wasn't achieved.

And there won't be any consensus in this case either.

regards,
Derick

--
Derick Rethans
http://derickrethans.nl | http://ez.no | http://xdebug.org

18 years ago by Andrei Zmievski

Because we can't stay in the stasis forever? What concrete steps do
you propose to change the current situation?

-Andrei
http://10fathoms.org/vu - daily photoblog

FINALLY we're getting somewhere. Now where to start removing all the

I don't see how we are getting somewhere - as before, there are
people for removing it and against removing it. Nothing changed, as
far as I see. Why suddenly should we start removing anything?

18 years ago by Andi Gutmans

No, I'm absolutely not OK with removing this switch. We did most of the
implementation for it and are maintaining it, so I see no reason to
remove it. 95% of our users couldn't care less about native Unicode
support except for the performance hit they'd take due to the slower
functions and increased memory usage. Most of them gain nothing and
only lose.
Anyway, I don't want to reignite the thread here. I will take it offline
and discuss it further with the people who have been involved in this
project. The mailing list here isn't exactly working.

Andi

-----Original Message-----
From: Andrei Zmievski [mailto:andrei@gravitonic.com]
Sent: Monday, August 20, 2007 12:02 PM
To: David Coallier
Cc: Stefan Priebsch; RQuadling@googlemail.com; Lukas Kahwe Smith;
Antony Dovgal; Rasmus Lerdorf; Derick Rethans; Cristian Rodriguez;
internals@lists.php.net
Subject: Re: [PHP-DEV] What is the use of "unicode.semantics" in PHP 6?

Nothing is happening, as far as I can tell. We are at an impasse,
basically. Personally, I am fine with removing the damn switch and
going forward with PHP 6 as Unicode-only. God knows it will rid us of
at least one headache - having to discuss it anymore.

-Andrei
http://10fathoms.org/vu - daily photoblog

Time to put gas on the fire.

Is this flag going to be removed or what? What is happening here in
the background that we are not seeing ? :)

18 years ago by Derick Rethans

No, I'm absolutely not OK with removing this switch. We did most of the
implementation for it and are maintaining it, so I see no reason to
remove it. 95% of our users couldn't care less about native Unicode
support except for the performance hit they'd take due to the slower
functions and increased memory usage. Most of them gain nothing and
only lose.
Anyway, I don't want to reignite the thread here. I will take it offline
and discuss it further with the people who have been involved in this
project. The mailing list here isn't exactly working.

What makes you think that any other group can agree on this?

regards,
Derick

18 years ago by Andi Gutmans

I don't think it's a matter of agreeing but rather we can try and figure out how to get out of this stalemate. This includes going down the path I suggested, which includes doing some more homework to figure this out. I am keeping an open mind and am willing to be convinced, but I feel there's still significant work to be done to figure out both the upgrade path and the performance piece. I am more than willing to put some work into this, but we shouldn't just make impulsive decisions without seriously considering the consequences.
Also, if at the end of this we still feel like we'll need to maintain two different code bases, then I see only disadvantages compared with maintaining one with two different ways of running. It'll be much less work for maintainers. Sure, neither situation is great, but it's the lesser of two evils.

Anyway, as I suggested, let's do more homework. We started and it wasn't a pretty sight. But still lots to do. There seem to be enough passionate people on this list to actually port 3-4 apps over and give us some more input on the answers we really need.

Andi

-----Original Message-----
From: Derick Rethans [mailto:derick@php.net]
Sent: Monday, August 20, 2007 11:19 PM
To: Andi Gutmans
Cc: Andrei Zmievski; Lukas Kahwe Smith; Antony Dovgal; Rasmus Lerdorf;
PHP Developers Mailing List
Subject: RE: [PHP-DEV] What is the use of "unicode.semantics" in PHP 6?

No, I'm absolutely not OK with removing this switch. We did most of the
implementation for it and are maintaining it, so I see no reason to
remove it. 95% of our users couldn't care less about native Unicode
support except for the performance hit they'd take due to the slower
functions and increased memory usage. Most of them gain nothing and
only lose.
Anyway, I don't want to reignite the thread here. I will take it offline
and discuss it further with the people who have been involved in this
project. The mailing list here isn't exactly working.

What makes you think that any other group can agree on this?

regards,
Derick

18 years ago by Derick Rethans

No, I'm absolutely not OK with removing this switch. We did most of the
implementation for it and are maintaining it, so I see no reason to
remove it. 95% of our users couldn't care less about native Unicode
support except for the performance hit they'd take due to the slower
functions and increased memory usage. Most of them gain nothing and
only lose.
Anyway, I don't want to reignite the thread here. I will take it offline
and discuss it further with the people who have been involved in this
project. The mailing list here isn't exactly working.

What makes you think that any other group can agree on this?

I don't think it's a matter of agreeing but rather we can try and
figure out how to get out of this stalemate. This includes going down
the path I suggested, which includes doing some more homework to figure
this out.

[snip]

Anyway, as I suggested, let's do more homework. We started and it
wasn't a pretty sight. But still lots to do. There seem to be enough
passionate people on this list to actually port 3-4 apps over and give
us some more input on the answers we really need.

And the homework being porting applications to see if this works? In
order to find all the issues you'd need a fairly big application and
there would be nobody willing to port 100,000 LoC just to see whether it
works.

regards,
Derick

PS. And ffs, can you please stop the top posting and the mangling of
quoted text? I assume that we, as techies, can deal with e-mail in a
sensible way.

18 years ago by Andi Gutmans

-----Original Message-----
From: Derick Rethans [mailto:derick@php.net]
Sent: Monday, August 20, 2007 11:40 PM
To: Andi Gutmans
Cc: Andrei Zmievski; Lukas Kahwe Smith; Antony Dovgal; Rasmus Lerdorf;
PHP Developers Mailing List
Subject: RE: [PHP-DEV] What is the use of "unicode.semantics" in PHP 6?

And the homework being porting applications to see if this works? In
order to find all the issues you'd need a fairly big application and
there would be nobody willing to port 100,000 LoC just to see whether it
works.

So you're suggesting to just pull the trigger and let's just see if we get lucky?
Yes, I see no reason not to port some applications. We started porting Zend Framework, which is more than that, and that is why we stumbled on a lot of the issues.
Again, I think part of the porting exercise is also figuring out what can be automated. Preferably we'll have some docs and scripts available for our users with PHP 6 and not just a bunch of bits with a "good luck" message.

Maybe you guys can try with ezComponents?
Andi

18 years ago by Lukas Kahwe Smith

Andi Gutmans wrote:

Maybe you guys can try with ezComponents?

So what's your target with this BC flag .. make it possible to have
PHP4-PHP6 (unicode off) apps?

Keep in mind that the camp that is suggesting to remove the unicode flag
is at the same time committing to back porting more things to PHP5 on a
case-by-case basis. As a result users will not be left in the dust with
PHP5.

Derick also suggested on IRC that we should focus on making PHP 5.3 as
much forward compatible as possible, to make this even more feasible.

Remember that several people have pointed out that maintaining the
unicode flag is more or less like maintaining two branches (in some
respects it's even harder .. in others it's less .. which probably
evens out more or less ..). At the same time we will need to maintain
PHP5 for quite some time anyway as PHP6 matures and people get more RAM :)

This porting effort will undoubtedly benefit PHP6 in the ways you
describe. It will help us find issues, it will help us improve the
migration documentation. However binding this decision to actually
porting a BIG PHP4 and a BIG PHP5 app is not feasible. We know the
increased effort in maintenance, we do not know the performance impact and
the migration time. So how can the default be that we increase the
maintenance effort in order to speed up something we do not know?

regards,
Lukas

18 years ago by Andi Gutmans

-----Original Message-----
From: Lukas Kahwe Smith [mailto:mls@pooteeweet.org]
Sent: Tuesday, August 21, 2007 1:30 AM
To: Andi Gutmans
Cc: Derick Rethans; Andrei Zmievski; Antony Dovgal; Rasmus Lerdorf; PHP
Developers Mailing List
Subject: Re: [PHP-DEV] What is the use of "unicode.semantics" in PHP 6?

Andi Gutmans wrote:

Maybe you guys can try with ezComponents?

So what's your target with this BC flag .. make it possible to have
PHP4-PHP6 (unicode off) apps?

It means only having to maintain one code base and also making it easier for people to adopt PHP 6 (Unicode isn't everything we'd have there). It also gives users the choice between Unicode PHP (which will be a headache for some) and the easy PHP. Also performance is something which I think is important but we don't know the full impact yet. If it's just 20% it's not a big deal; if it's much more, then it probably is. Also memory usage is a big issue today with scalable Web apps.

Keep in mind that the camp that is suggesting to remove the unicode flag
is at the same time committing to back porting more things to PHP5 on a
case-by-case basis. As a result users will not be left in the dust with
PHP5.

Yes, this is definitely a good idea, but it'd require more maintenance because we'd have to maintain PHP 5 for much longer. That said, it's probably the best alternative.

Derick also suggested on IRC that we should focus on making PHP 5.3 as
much forward compatible as possible, to make this even more feasible.

Yes, definitely. For starters we should have the (binary) cast operator there.

Remember that several people have pointed out that maintaining the
unicode flag is more or less like maintaining two branches (in some
respects it's even harder .. in others it's less .. which probably
evens out more or less ..). At the same time we will need to maintain
PHP5 for quite some time anyway as PHP6 matures and people get more RAM :)

This isn't quite true. You will have to maintain two code paths in all internal functions anyway because you will support both UTF-16 and binary strings. The biggest problems are in the engine and most of the maintenance has been done by us. So it's not going to affect most people.

This porting effort will undoubtedly benefit PHP6 in the ways you
describe. It will help us find issues, it will help us improve the
migration documentation. However binding this decision to actually
porting a BIG PHP4 and a BIG PHP5 app is not feasible. We know the
increased effort in maintenance, we do not know the performance impact and
the migration time. So how can the default be that we increase the
maintenance effort in order to speed up something we do not know?

I think that we are having this discussion based on very little data. People here are saying that the upgrade path is quite easy. We've had very different experiences.
I'd like some more time to look into that and see if we can make automated scripts to help people convert (like we did from php/fi 2 to php 3). It'd also shed some more light onto the performance piece and see if it truly is significant.
Btw, not sure why the maintenance piece is being discounted so much. I still think that PHP 6 with two modes is less maintenance efforts than PHP 5 and PHP 6 for many more years. At least we stick to one code base even if the modes aren't compatible. In that sense it would be no different from just PHP 5 and PHP 6.

Anyway, I am listening. I will try and help find some data points and hopefully find an acceptable solution but this takes some time. Which is why I said that if part of the community helped in figuring some of this out it'd be great.

Andi

18 years ago by Lester Caine

Andi Gutmans wrote:

Anyway, as I suggested, let's do more homework. We started and it wasn't a pretty sight. But still lots to do. There seem to be enough passionate people on this list to actually port 3-4 apps over and give us some more input on the answers we really need.

I have kept out of this debate - lack of time - and while I'm probably one of
the '95%' I DO understand that not having unicode available is MORE of a
headache so I need to be more pro-active in using it.
How much work do people think IS involved in porting a large application
over to PHP6? Reading between the lines it looks like we are talking file
conversion to UTF-16 + what? What is currently a show stopper to simply
running a PHP5 application? I used to keep a PHP6 setup here but that had to
go with all the crap involved in the different versions of PHP 5.x, so I
haven't had a PHP6 copy running for some time :( - but the bitweaver framework
does allow easy code management so I would be prepared to spend time at least
starting to have a look!

No, I'm absolutely not OK with removing this switch. We did most of the
implementation for it and are maintaining it, so I see no reason to
remove it. 95% of our users couldn't care less about native Unicode
support except for the performance hit they'd take due to the slower
functions and increased memory usage. Most of them gain nothing and
only lose.
( Top posting sucks ;) )
I suppose this is the difference between native UTF-8 and UTF-16? If you only
use the 128-character ASCII set then there is no difference between UTF-8 and
ASCII? So I assume that the alternative halfway house is currently being
discussed and everything will be UTF-16 and 16 bit characters? Given that
there are now two branches to MOST operating systems (32 bit and 64 bit) I see
no reason that there should not be two builds of PHP6, but to be honest I am
probably in the native unicode camp. Keep PHP5.x for 8bit character sets, and
develop PHP6 as 32bit ready for the processors that are already available to
use it. It will be another 10 years before PHP5 reaches a point where it may
want to die, by which time who knows how much memory and how many processor
cores we will have in our PDAs :)

The one thing to avoid is the situation that happened with PHP4, where many of
the good reasons for changing were simply back ported. Either we see PHP6 as a
natural progression to PHP5, nothing gets 'back ported', and PHP6 runs PHP5
applications out of the box, or PHP6 requires a 'conversion package' to
transfer PHP5 applications and provides a clean unicode environment. In the
first case we need the switch - in the second I would be looking for 32 bit
character clean code. A UTF-16 halfway house may as well have the switch,
since all the code to manage multi-byte characters is still messing up the
code base - and we start looking at PHP7 :(

--
Lester Caine - G8HFL

Contact - http://home.lsces.co.uk/lsces/wiki/?page=contact
L.S.Caine Electronic Services - http://home.lsces.co.uk
MEDW - http://home.lsces.co.uk/ModelEngineersDigitalWorkshop/
Firebird - http://www.firebirdsql.org/index.php

18 years ago by johannes@php.net

How much work do people think IS involved in porting a large application
over to PHP6? Reading between the lines it looks like we are talking file
conversion to UTF-16 + what? What is currently a show stopper to simply
running a PHP5 application? I

I'm bored of the unicode.semantics discussion, but a few words on this:
No, UTF-16 is the internal encoding of (textual) strings in PHP 6 (with
u.s.=On); as a user you should never ever see any text in that encoding.

Your scripts use the encoding specified by "unicode.script_encoding",
which defaults to UTF-8, or the one specified in a declare() statement.
Internally they will then be converted to UTF-16. When printed to the
output stream they will be converted from UTF-16 to the encoding
specified using unicode.output_encoding (again UTF-8 by default).
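
As a rough illustration of the conversion steps just described, the sketch below mimics them with PHP 5's mbstring extension. It is an assumption-laden stand-in, not how the PHP 6 engine does it: none of the unicode.* settings are involved, and UTF-16BE is picked arbitrarily as the "internal" form.

```php
<?php
// Stand-in for the PHP 6 flow using ext/mbstring; the unicode.* ini
// settings are NOT used here, this only mimics the two conversions.

$source = "Schlüter"; // bytes as stored in a UTF-8 encoded script file

// Script encoding (UTF-8) -> internal representation (UTF-16)
$internal = mb_convert_encoding($source, 'UTF-16BE', 'UTF-8');

// Internal representation (UTF-16) -> output encoding (UTF-8)
$output = mb_convert_encoding($internal, 'UTF-8', 'UTF-16BE');

var_dump(strlen($source));     // int(9): "ü" takes two bytes in UTF-8
var_dump(strlen($internal));   // int(16): 8 characters, 2 bytes each in UTF-16
var_dump($output === $source); // bool(true): the round trip is lossless
```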

From my experience with quite small applications, there aren't that many
problems when porting a PHP 5 application (though I only did simple tests,
mainly for checking basic stuff in PHP...)

You might have some files in encodings other than the configured
one. For example, some applications I'm involved with have my last name
with an ISO-8859-1 encoded umlaut in some DocComment or string. Either
the files have to be converted (using recode/iconv should do the
trick) or you need a declare statement. (These declare statements,
btw., are compatible [as in being ignored] with PHP 4 and PHP 5.)
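The file conversion just described (what recode or iconv would do) amounts to decoding with the old encoding and re-encoding as UTF-8. A Python sketch with a hypothetical `convert_source` helper (file reading and writing left out), which also shows why an unconverted ISO-8859-1 file fails when read as UTF-8:

```python
def convert_source(data: bytes, from_enc: str = "iso-8859-1", to_enc: str = "utf-8") -> bytes:
    """Re-encode a file's bytes, the same job as `iconv -f ISO-8859-1 -t UTF-8`."""
    return data.decode(from_enc).encode(to_enc)

legacy = b"/** @author M\xfcller */"  # umlaut stored as one ISO-8859-1 byte (0xFC)

try:
    legacy.decode("utf-8")             # reading it as UTF-8 without converting...
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)  # ...fails: 0xFC never occurs in UTF-8

converted = convert_source(legacy)
print(converted)                       # b'/** @author M\xc3\xbcller */'
```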

You might get a few warnings on stream operations if you're not giving
a specific encoding. The problem there is that some stream operations
expect different numbers of parameters, so running the same thing with
PHP 5 and 6 might give a few warnings, but well, most of them can be
ignored.

Some functions want only binary strings and won't convert Unicode
strings themselves (which would be done by using
unicode.runtime_encoding), or the other way round. Most of these places
will be fixed; some of these will need a specific cast by the user.

An example is rawurlencode(), which for good reasons expects a binary
string. In such a case a (binary) cast, which also exists as a no-op in
PHP 5.2, might be enough. Sometimes you might need a
unicode_[en|de]code() call.
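Why rawurlencode() wants bytes: percent-encoding is defined over bytes, so the result depends on which encoding produced them. Python's `urllib.parse.quote` stands in for rawurlencode() here as an analogy, not a statement about PHP 6 behavior:

```python
from urllib.parse import quote

# quote() accepts bytes and percent-encodes them one by one, much like
# rawurlencode() does for a binary string; safe="" so "/" is encoded too.
word = "M\u00fcller"                                # "Müller"
print(quote(word.encode("utf-8"), safe=""))       # M%C3%BCller
print(quote(word.encode("iso-8859-1"), safe=""))  # M%FCller
```

The explicit `encode()` call plays the part that a (binary) cast or unicode_encode() would play.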

This might need some work.
A bit more work might be involved when you expect to work on bytes
when doing string operations. If your applications only use English
texts using ASCII characters, that's no issue at all; if not, the
results of operations like
strlen("äöü");
or
$a = "äöü"; echo $a{2};
might be different depending on the version. But as said, in ASCII text
it's no issue since a single character takes a single byte.
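The strlen() and offset difference, modelled in Python with `bytes` standing in for PHP 5 binary strings and `str` for PHP 6 Unicode strings. This is an analogy that assumes the script file is UTF-8 and the characters are precomposed:

```python
s = "\u00e4\u00f6\u00fc"   # "äöü", precomposed characters
raw = s.encode("utf-8")    # the bytes a UTF-8 script file actually contains

# Byte semantics (PHP 5 strlen) vs. character semantics (PHP 6 with u.s. On)
print(len(raw), len(s))    # 6 3
# Offset 2: as a byte it is only the first half of "ö"; as a character it is "ü"
print(raw[2:3], s[2] == "\u00fc")   # b'\xc3' True

ascii_text = "abc"
print(len(ascii_text.encode("utf-8")) == len(ascii_text))  # True: 1 byte per char
```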

A really, really bad workaround for most issues related to this is
using an encoding like ISO-8859-1 for all unicode*_encoding settings.
Then most byte sequences can be converted to UTF-16, a single byte
becomes a single UTF-16 character, and everything "seems" to work. But
well, that's bad and shouldn't be advertised.
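Why the ISO-8859-1 workaround "seems" to work: ISO-8859-1 maps every byte value to exactly one character, so byte sequences decode and round-trip unchanged, but multibyte UTF-8 text turns into the wrong characters. A Python illustration, assuming the input is UTF-8:

```python
utf8_bytes = "\u00e4\u00f6\u00fc".encode("utf-8")  # "äöü" as 6 UTF-8 bytes

# With ISO-8859-1 declared everywhere, each byte decodes to one character...
as_latin1 = utf8_bytes.decode("iso-8859-1")
print(len(as_latin1))                                # 6, byte-like counts as in PHP 5
print(as_latin1.encode("iso-8859-1") == utf8_bytes)  # True: output bytes are unchanged

# ...but the characters are wrong: the text is now "Ã¤Ã¶Ã¼" (mojibake).
print(ascii(as_latin1))                              # '\xc3\xa4\xc3\xb6\xc3\xbc'
```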

Well, these are most of the things I saw when porting simple
applications from PHP 5 to PHP 6 half a year ago (so maybe I forgot
something important I did...). Some of them are even still compatible with
all PHP versions from 4 to 6 (with u.s. On), while not really making use of
the benefits of the Unicode support.

So for porting: a good first step is simply installing PHP 6, making
sure u.s. is On, and then fixing the errors that appear :-)

And as a final statement, from my experience with rather small apps:
It's possible to make applications run with PHP 5 and PHP 6 with u.s.
On... (where "run" means "it works but won't benefit from the
Unicode stuff")

johannes

18 years ago by Jani Taskinen

Anyway, don't want to reignite the thread here. I will take it offline
to discuss with the people who have been involved in this project and
discuss further. The mailing list here isn't exactly working.

What makes you think that any other group can agree on this?

When you can't get people to agree with you, choose the people who
already agree with you... or can't afford not to agree. ;)

--Jani

18 years ago by Richard Quadling

Anyway, don't want to reignite the thread here. I will take it offline
to discuss with the people who have been involved in this project and
discuss further. The mailing list here isn't exactly working.

What makes you think that any other group can agree on this?

When you can't get people to agree with you, choose the people who
already agree with you... or can't afford not to agree. ;)

--Jani

I'll agree with anyone who makes it worth my while!

--

Richard Quadling
Zend Certified Engineer : http://zend.com/zce.php?c=ZEND002498&r=213474731
"Standing on the shoulders of some very clever giants!"

18 years ago by Jani Taskinen

So what happened to the "Open" in "OpenSource", or is PHP now something
else?

btw. 95% of Zend users probably don't need unicode, but there are a lot
more people out there who can't really use PHP right now since it
doesn't have full unicode support. The percentages pulled out of a sleeve
would be rather the opposite with all Japanese/Chinese/etc. Asian
countries included.. ;)

--Jani

No, I'm absolutely not OK with removing this switch, and as we did
most of the implementation for it and are currently maintaining it, I see no
reason to remove it. 95% of our users couldn't care less about native
Unicode support except for the performance hit they'd take due to the
slower functions and increased memory usage. Most of them gain
nothing and only lose.
Anyway, don't want to reignite the thread here. I will take it offline
to discuss with the people who have been involved in this project and
discuss further. The mailing list here isn't exactly working.

Andi

-----Original Message-----
From: Andrei Zmievski [mailto:andrei@gravitonic.com]
Sent: Monday, August 20, 2007 12:02 PM
To: David Coallier
Cc: Stefan Priebsch; RQuadling@googlemail.com; Lukas Kahwe Smith;
Antony Dovgal; Rasmus Lerdorf; Derick Rethans; Cristian Rodriguez;
internals@lists.php.net
Subject: Re: [PHP-DEV] What is the use of "unicode.semantics" in PHP
6?

Nothing is happening, as far as I can tell. We are at an impasse,
basically. Personally, I am fine with removing the damn switch and
going forward with PHP 6 as Unicode-only. God knows it will rid us of
at least one headache - having to discuss it anymore.

-Andrei
http://10fathoms.org/vu - daily photoblog

Time to put gas on the fire.

Is this flag going to be removed or what? What is happening here in
the background that we are not seeing? :)

18 years ago by Tomas Kuliavas

So what happened to the "Open" in "OpenSource", or is PHP now something
else?

btw. 95% of Zend users probably don't need unicode, but there are a lot
more people out there who can't really use PHP right now since it
doesn't have full unicode support. The percentages pulled out of a sleeve
would be rather the opposite with all Japanese/Chinese/etc. Asian
countries included.. ;)

PHP has supported Japanese since 4.0.6, and Chinese since 4.3.0. Text
length evaluation, case-insensitive strings and substr should work. What
else do you need in PHP scripts for "full unicode support" in CJK languages?
Reading symbols with $string[$position]? The ones that do such things are
not your normal users, and this can be done with mb_substr. Want to make
sure that CJK support is part of the core? What is wrong with requiring
mbstring support?

If you go down the "think about users" path, then remember that PHP does
not work for 110 million Turks, Kurds and Azerbaijanis in the Middle East.
The bug was closed with "Won't fix", and locale-insensitive
tolower()|toupper() functions take fewer than 10 lines in C. I am not a C
programmer. My tests show that if I change zend_tolower() to work in a
locale-insensitive way, strtolower() remains locale-sensitive, and
class_exists and case-insensitive method names do not fail.
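The Turkish locale problem in short: under a Turkish locale, lowercasing "I" yields dotless "ı", so a case-insensitive lookup of an identifier containing "I" misses. The Python sketch below contrasts ASCII-only lowercasing (the locale-insensitive approach argued for above) with a hypothetical Turkish-style mapping; it is not PHP's zend_tolower() code:

```python
def ascii_tolower(name: str) -> str:
    """Locale-insensitive lowercasing: only 'A'-'Z' change, which is all
    identifier lookup needs."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in name)

def turkish_tolower(name: str) -> str:
    """Hypothetical stand-in for tolower() under a Turkish locale, where
    'I' lowercases to dotless U+0131 and U+0130 (dotted I) to 'i'."""
    return name.replace("I", "\u0131").replace("\u0130", "i").lower()

# Simplified case-insensitive class table keyed by lowercased names.
classes = {ascii_tolower("Iterator"): "Iterator"}

print(ascii_tolower("ITERATOR") in classes)    # True in any locale
print(turkish_tolower("ITERATOR") in classes)  # False: "ıterator" != "iterator"
```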

--
Tomas

18 years ago by Andi Gutmans

Uhm, what makes you think we don't have Asian users? I also don't recall
suggesting that we should not have Unicode support.
Don't forget how much we invested in implementing this in the engine. It
was far from trivial...

Andi

-----Original Message-----
From: Jani Taskinen [mailto:jani.taskinen@sci.fi]
Sent: Tuesday, August 21, 2007 2:38 AM
To: Andi Gutmans
Cc: internals@lists.php.net
Subject: RE: [PHP-DEV] What is the use of "unicode.semantics" in PHP
6?

So what happened to the "Open" in "OpenSource", or is PHP now something
else?

btw. 95% of Zend users probably don't need unicode, but there are a lot
more people out there who can't really use PHP right now since it
doesn't have full unicode support. The percentages pulled out of a sleeve
would be rather the opposite with all Japanese/Chinese/etc. Asian
countries included.. ;)

--Jani

18 years ago by Lukas Kahwe Smith

Hi,

Ok, so I think it's becoming clear that BC is not the main issue we will
be addressing with the unicode switch. I know Zeev's mantra that BC is
not binary, but from the people that have posted feedback on the topic
from actual experience it seems that making code work on PHP5 (and even
PHP4) as well as PHP6 is possible with a bit of work, but without a rewrite.

So at this point the main argument can only revolve around performance.
So the question is how much performance a user gains by turning off
Unicode in PHP6. We might be able to figure this out without porting an
actual application. Maybe with a few synthetic benchmarks, along with
some code analysis (maybe for 3 different categories of applications:
data processing intensive, web blog, database heavy application) of how
often particularly slow functions are called in an average application,
we could extrapolate a ballpark figure of what kind of slowdown to expect.
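One possible shape for the measurement proposed above: time the same operation on text and on bytes, then weight each category's slowdown by the share of runtime that code analysis attributes to it (an Amdahl-style estimate). This is a Python sketch with hypothetical helper names and a placeholder runtime share; Python's str/bytes timings say nothing about PHP 6's actual numbers:

```python
import timeit

def best_time(fn, arg, repeat: int = 5, number: int = 200) -> float:
    """Best-of-`repeat` seconds for `number` calls of fn(arg)."""
    return min(timeit.repeat(lambda: fn(arg), repeat=repeat, number=number))

def extrapolate(runtime_share: dict, slowdown: dict) -> float:
    """Whole-application time factor from per-category measurements.

    runtime_share: fraction of total runtime per category (from code analysis)
    slowdown:      measured Unicode/binary time ratio for that category
    Runtime outside the listed categories is assumed to be unaffected.
    """
    affected = sum(runtime_share.values())
    scaled = sum(share * slowdown[name] for name, share in runtime_share.items())
    return scaled + (1.0 - affected)

# Synthetic micro-benchmark: the "same" operation on text vs. raw bytes.
text = "abcd\u00e9fgh" * 1000
raw = text.encode("utf-8")
ratio = best_time(str.upper, text) / best_time(bytes.upper, raw)

# The 5% runtime share is a made-up placeholder, not a measurement.
print(f"upper(): text/bytes ratio {ratio:.2f}")
print(f"estimated overall factor {extrapolate({'upper': 0.05}, {'upper': ratio}):.3f}")
```

Keeping the micro-benchmarks and the weighting separate lets each application category plug in its own measured shares.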

regards,
Lukas

18 years ago by Andi Gutmans

If there's overwhelming support for removing the switch then I guess
that's where it is. I still think it's a mistake and we are risking a
big split in the user base going forward, but time will tell. Long term,
PHP may not recover from that split unless we truly manage to help the
most popular PHP applications make the leap. They have been some of
the biggest drivers behind PHP adoption, probably just as much as the
technology itself.

The burden of maintenance will definitely be higher. Right now we have
to identify which features we think need back porting to PHP 5.3, which in
my opinion loses a lot of good energy that could go into new features, but
that's where we're at. The PHP 5 user base is strong and growing, and
will require a lot of these features.

I still think that the community and internals@ should invest
significantly in making migration as easy as possible and in making clear
what the performance attributes are. As I said, I'll be more than happy
to pitch in when the time comes, running benchmarks and trying to
figure out if a good migration methodology/scripts can be done. There
are also some features we could back port which may make it easier for
people to do a slower migration, like the (binary) cast (which would be a
no-op) and some other things. I still need to think about this further,
but there may be some things that can help.

Andi

-----Original Message-----
From: Lukas Kahwe Smith [mailto:mls@pooteeweet.org]
Sent: Tuesday, August 21, 2007 7:26 AM
To: Andi Gutmans
Cc: jani.taskinen@iki.fi; internals@lists.php.net
Subject: Re: [PHP-DEV] What is the use of "unicode.semantics" in PHP
6?

Hi,

Ok, so I think it's becoming clear that BC is not the main issue we will
be addressing with the unicode switch. I know Zeev's mantra that BC is
not binary, but from the people that have posted feedback on the topic
from actual experience it seems that making code work on PHP5 (and even
PHP4) as well as PHP6 is possible with a bit of work, but without a rewrite.

So at this point the main argument can only revolve around performance.
So the question is how much performance a user gains by turning off
Unicode in PHP6. We might be able to figure this out without porting an
actual application. Maybe with a few synthetic benchmarks, along with
some code analysis (maybe for 3 different categories of applications:
data processing intensive, web blog, database heavy application) of how
often particularly slow functions are called in an average application,
we could extrapolate a ballpark figure of what kind of slowdown to expect.

regards,
Lukas

18 years ago by Larry Garfield

Andi,

Is there a guide somewhere for those who are PHP developers, not C developers,
who would want to try existing code under PHP 6 but don't know all the ins
and outs of the new Unicode system? It sounds like there are 3-4 Unicode
switches in php.ini, but maybe I'm missing some and I'm sure I don't fully
understand what they're all supposed to do.

I mean a single-sourced guide along the lines of:

Get a PHP-free system.
Download this file:
Untar, run make install.
Do X to get the mysql(i) driver in there too (since more apps use that than
PDO right now).
Tour of the new php.ini switches to play with and what they mean/do:
Try running your app. Known places where there may be issues: (Johannes'
earlier post is a great starting point)
Please direct reports on your successes and what broke to:

A "PHP 6 testers' instruction manual" would probably make it a lot easier to
be a PHP 6 tester. :-)

--
Larry Garfield AIM: LOLG42
larry@garfieldtech.com ICQ: 6817012

"If nature has made any one thing less susceptible than all others of
exclusive property, it is the action of the thinking power called an idea,
which an individual may exclusively possess as long as he keeps it to
himself; but the moment it is divulged, it forces itself into the possession
of every one, and the receiver cannot dispossess himself of it." -- Thomas
Jefferson

18 years ago by DaveTheAve

Why can't the unicode switch be turned on/off by the application when needed?
Perhaps have an on/off/auto setting, where auto means it'll remain off
unless the application explicitly asks for it.

--
Thank you,
David Branco (Neoelite Web Consultant)
http://www.NeoeliteUSA.com

18 years ago by Stanislav Malyshev

Why can't the unicode switch be turned on/off by the application when needed?
Perhaps have an on/off/auto setting, where auto means it'll remain off
unless the application explicitly asks for it.

Because it's very hard to implement since we'd have to keep 2 copies of
all symbol tables.

Stanislav Malyshev, Zend Software Architect
stas@zend.com http://www.zend.com/
(408)253-8829 MSN: stas@zend.com

18 years ago by Robert Lemke

Hi Andi,

On 21.08.2007 at 21:32, Andi Gutmans wrote:

If there's overwhelming support for removing the switch then I guess
that's where it is. I still think it's a mistake and we are risking a
big split in the user base going forward, but time will tell. Long term,
PHP may not recover from that split unless we truly manage to help the
most popular PHP applications make the leap. They have been some of
the biggest drivers behind PHP adoption, probably just as much as the
technology itself.

FWIW, I agree. We at TYPO3 will switch to PHP6 / Unicode-only in the
near future, and that means a few hundred thousand PHP developers are
affected by this decision.

At the same time, we will maintain our current branch of TYPO3, which
will only run with PHP5. In our opinion, if someone wants PHP6, he must
upgrade to TYPO3 5.0. If he needs TYPO3 4.x, he's got to be satisfied
with PHP 5.

robert

18 years ago by nicobn@php.net

Hi everybody,

I first want to personally thank everybody who has voiced their opinions on
this subject as it shows how much concern you all have for PHP.

To give a bit of background, I am one of the PHP Google Summer of Code
students and part of my project was to port/create a PHP 6 application,
namely, the Jaws CMS. My experience proved to be very instructive in terms
of evaluating the new functionalities of PHP 6.

My experience with unicode.semantics has been very frustrating. Most of the
mainstream projects cannot control their environments and have to be as
portable as possible. To create a portable application for PHP 6, you have
to consider the two different unicode.semantics scenarios AND the
possibility that the switch, for some reason, might be turned on or off at
any time in the future. Even if you don't care about Unicode and/or have
never heard about it, that is very important for you.

I have one specific example of where this can be a HUGE headache. Let's say
you have some serialized configuration files, saved with
unicode.semantics = 1, containing a single configuration array. Now, for
some reason, the administrator decides to set unicode.semantics = 0 and
your configuration file does not work anymore. That's because
$arr[(unicode) 'key'] and $arr[(binary) 'key'] are different. Hence, all
the keys in the configuration array have to be explicitly accessed with
either (binary) or (unicode).
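A Python analogy for the configuration-file problem: `str` and `bytes` keys never compare equal, so a value stored under one type is invisible to a lookup with the other. `lookup` is a hypothetical helper, not a proposed API:

```python
# Python analogy: str and bytes keys never compare equal, just as the post
# says (unicode) 'key' and (binary) 'key' are different array keys.
config = {"key": "value"}      # written while unicode.semantics was on

print("key" in config)         # True
print(b"key" in config)        # False: same letters, different string type

def lookup(cfg: dict, name: str):
    """Hypothetical defensive helper that tries both string types, the kind
    of extra code the post expects portable applications to need."""
    for candidate in (name, name.encode("ascii")):
        if candidate in cfg:
            return cfg[candidate]
    raise KeyError(name)

print(lookup(config, "key"), lookup({b"key": 1}, "key"))  # value 1
```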

Now, make your own examples of how the switch will affect the life of PHP
programmers in the future and you will discover some disastrous scenarios.
From now on, all the strings will have to be explicitly set/cast because
you never know if you'll get a binary string or a Unicode string when you
ask for a string. Literally, when PHP 6 becomes mainstream, all the
programmers will have to be re-educated to face these issues. As far as I'm
concerned, this is unprecedented in the history of the PHP project. PHP -
loosely typed, except for strings.

In the end, the switch benefits only the developers of very specialized
applications that run on big websites that can control their environments.

I've seen the 95%-who-don't-care figure circulating on this thread. If it
were really the case, why was the feature implemented with so much care in
PHP in the first place? I, for one, do not agree with the 95% figure. Most,
if not all, of the fastest growing markets in the world are non-English
speaking, and native Unicode support makes PHP a prime choice for them. We're
not talking about 10 folks in Iowa there. We're talking about billions.

Overall, I think the damn switch is simply not a good idea. It removes a
good part of what made PHP a success: simplicity. Get rid of it once and for
all.

--
Nicolas Bérard-Nault (nicobn@gmail.com)
D.E.C. student, Sciences, Letters & Arts
Cégep de Sherbrooke

Homepage: http://nicobn.googlepages.com

18 years ago by Stanislav Malyshev

portable as possible. To create a portable application for PHP 6, you have
to consider the two different unicode.semantics scenarios AND the
possibility that the switch, for some reason, might be turned on or off at
any time in the future. Even if you don't care about Unicode and/or have
never heard about it, that is very important for you.

I really don't see, however, how removing the switch is going to make
your life easier. So, we'd have PHP 5 and PHP 6, and once you want to
support both (you couldn't afford not supporting PHP 5 for many years
from now - many still support PHP 4), you'd have exactly the same issues.
So if you don't deal with them, you'd just say "we don't run on PHP 6".
Is it better than saying "we run on PHP 6 only with that specific setting"?

Stanislav Malyshev, Zend Software Architect
stas@zend.com http://www.zend.com/
(408)253-8829 MSN: stas@zend.com

18 years ago by Richard Lynch

So if you don't deal with them, you'd just say "we don't run on PHP 6".
Is it better than saying "we run on PHP 6 only with that specific setting"?

Yes, it is better, imho.

--
Some people have a "gift" link here.
Know what I want?
I want you to buy a CD from some indie artist.
http://cdbaby.com/browse/from/lynch
Yeah, I get a buck. So?

18 years ago by Antony Dovgal

Time to put gas on the fire.

Is this flag going to be removed or what? What is happening here in
the background that we are not seeing? :)

Nothing.
It's stuck.

--
Wbr,
Antony Dovgal

18 years ago by Richard Quadling

If Unicode had been an extension (one of those that are part of the
core and cannot be disabled) with its own
classes/exceptions/functions/etc, then everyone would have been happy.

Moreover, we do have such an extension, it's called "mbstring" and you can use it even in PHP4.
But the point is that it's just an extension, hence the Unicode support is far far from full.

Why couldn't mbstring be upgraded to offer "full" Unicode support?

--

Richard Quadling
Zend Certified Engineer : http://zend.com/zce.php?c=ZEND002498&r=213474731
"Standing on the shoulders of some very clever giants!"

18 years ago by Derick Rethans

If Unicode had been an extension (one of those that are part of the
core and cannot be disabled) with its own
classes/exceptions/functions/etc, then everyone would have been happy.

Moreover, we do have such an extension, it's called "mbstring" and
you can use it even in PHP4. But the point is that it's just an
extension, hence the Unicode support is far far from full.

Why couldn't mbstring be upgraded to offer "full" Unicode support?

Because to support Unicode the engine needs to be able to work with
it. That is not something you can do with an extension.

Derick

18 years ago by Richard Quadling

If Unicode had been an extension (one of those that are part of the
core and cannot be disabled) with its own
classes/exceptions/functions/etc, then everyone would have been happy.

Moreover, we do have such an extension, it's called "mbstring" and
you can use it even in PHP4. But the point is that it's just an
extension, hence the Unicode support is far far from full.

Why couldn't mbstring be upgraded to offer "full" Unicode support?

Because to support Unicode the engine needs to be able to work with
it. That is not something you can do with an extension.

Derick

Ah. OK. Thanks for clarifying this. And because the engine needs it,
those that don't want it need to disable it and now we're back to the
unicode.semantics option. Ho hum.

--

Richard Quadling
Zend Certified Engineer : http://zend.com/zce.php?c=ZEND002498&r=213474731
"Standing on the shoulders of some very clever giants!"

18 years ago by Stanislav Malyshev

Moreover, we do have such an extension, it's called "mbstring" and you
can use it even in PHP4.
But the point is that it's just an extension, hence the Unicode
support is far far from full.

mbstring is very, very far from unicode support. Look at ICU API
description to see how far :)

Stanislav Malyshev, Zend Software Architect
stas@zend.com http://www.zend.com/
(408)253-8829 MSN: stas@zend.com

18 years ago by Stanislav Malyshev

If Unicode had been an extension (one of those that are part of the
core and cannot be disabled) with its own
classes/exceptions/functions/etc, then everyone would have been happy.

It will be. I.e., most of ICU functionality will be implemented as an
extension - collators, formatters, etc. etc.

Stanislav Malyshev, Zend Software Architect
stas@zend.com http://www.zend.com/
(408)253-8829 MSN: stas@zend.com

18 years ago by Jani Taskinen

Antony Dovgal wrote:

To me it boils down to how we want to maintain the "fork":

1) PHP5 and PHP6

2) PHP6 unicode off/on (with PHP5 in maintenance mode)

Considering that people will not jump on PHP6 immediately anyway, I
think 1) is more realistic, if we make best efforts to back port new
features to PHP5, but still require that new features go into PHP6
first. Some features might not get back ported and that is a somewhat
unfriendly nudge towards PHP6. So it goes.

I tend to agree with this POV more and more.

Well, that's what I've tried to say for a long time. We're forking whether we
want it or not. So why not make it official? I'd rather maintain 2 branches than
2 "versions" in the same branch. And 2 sets of tests... and, and...

--Jani

18 years ago by Lukas Kahwe Smith

Jani Taskinen wrote:

Antony Dovgal kirjoitti:

To me it boils down how we want to maintain the "fork":

PHP5 and PHP6

PHP6 unicode off/on (with PHP5 in maintenance mode)

Considering that people will not jump on PHP6 immediately anyways, I
think 1) is more realistic, if we make best efforts to back port new
features to PHP5, but still require that new features go into PHP6
first. Some features might not get back ported and that is a somewhat
unfriendly nudge towards PHP6. So it goes.

I tend to agree with this POV more and more.

Well, that's what I've tried to say for a long time. We're forking
whether we want it or not. So why not make it official? I'd rather
maintain 2 branches than 2 "versions" in the same branch. And 2 sets of
tests... and, and...

Yup, I did not get what you mean when you said it first, but now I get it :)

regards,
Lukas

18 years ago by Stanislav Malyshev

You don't buy a Porsche if you need a taxi, why would you install PHP6 if
you don't need Unicode?

Namespaces ;)


18 years ago by Antony Dovgal

You don't buy a Porsche if you need a taxi, why would you install PHP6 if
you don't need Unicode?

Namespaces ;)

This reason is only valid if we don't backport such things from PHP6 to PHP5
(5.3, 5.5 or whatever it would be), which I think we should do.

--
Wbr,
Antony Dovgal

18 years ago by Richard Lynch

You don't buy a Porsche if you need a taxi, why would you install
PHP6 if you don't need Unicode?

Namespaces ;)

This reason is only valid if we don't backport such things from PHP6
to PHP5 (5.3, 5.5 or whatever it would be), which I think we should do.

Then PHP 6 is going to be a very weird beast...

It adds only the Unicode feature that a tiny niche market needs,
because everything else will be back-ported to PHP 5.

So the only adopters of PHP 6 will be:
Users who need Unicode
Users who just cannot wait for that new feature to get back-ported
Masochists :-)

For how long will this back-port to PHP 5 policy be in effect?

The whole lifetime of PHP 6?

Until PHP 7 is "stable"?

Until the Unicode advocates get tired of back-porting? :-v

Will users then jump from PHP 5 to 7?

--
Some people have a "gift" link here.
Know what I want?
I want you to buy a CD from some indie artist.
http://cdbaby.com/browse/from/lynch
Yeah, I get a buck. So?

18 years ago by Stanislav Malyshev

It adds only the Unicode feature that a tiny niche market needs,
because everything else will be back-ported to PHP 5.

I'm not sure the assumption that unicode is needed only for a "tiny niche
market" is entirely correct.


18 years ago by Andrei Zmievski

And I think that we shouldn't, since it removes a big incentive for
people to move to PHP 6.

Really, we need to get folks to use Unicode natively as much as
possible. It is the way of the future, and not some "obscure
feature", as some here have suggested. This kind of attitude is
precisely why we've had and continue to have such an
internationalization mess when it comes to building applications.

-Andrei

You don't buy a Porsche if you need a taxi, why would you install
PHP6 if you don't need Unicode?
Namespaces ;)
This reason is only valid if we don't backport such things from
PHP6 to PHP5 (5.3, 5.5 or whatever it would be), which I think we
should do.

--
Wbr, Antony Dovgal

18 years ago by Antony Dovgal

And I think that we shouldn't, since it removes a big incentive for
people to move to PHP 6.

I don't really see much sense in forcing people to use PHP6 if we accept the "PHP5 = PHP6 - Unicode" formula.
They are just different things, period.

Really, we need to get folks to use Unicode natively as much as possible.

Andrei, I personally don't need Unicode at all.
I know, that may sound weird, but that's true.

This kind of attitude is
precisely why we've had and continue to have such an
internationalization mess when it comes to building applications.

What attitude are you talking about here?

I'm trying to be honest with myself in the first place.
Do I like that horrible IS_STRING/IS_UNICODE mess we have atm? No.
Do I want to maintain this mess in the future just because of some bad design decision in the past? No way, we had enough of that already.

I would love to have clean and easy PHP6 without all the "compatibility", which creates gazillion problems to both users and developers.
Please notice that I didn't call Unicode useless crap or whatever others may think about it,
I just want PHP6 to be a Unicode-only release because it would make my personal life much easier
without complicating others' lives.

--
Wbr,
Antony Dovgal

18 years ago by nicobn@php.net

Permit me to give my 2 cents on that and share my small bit of experience
with PHP 6.

First of all, I totally agree with you Antony. I'm currently working on
deploying a big codebase in PHP 6 (for those of you who didn't know, I'm the
GSoC student working on refactoring Jaws for PHP 6) and my head started to
ache when I began understanding all the complications of the unicode
implementation as it is right now. Basically, having that
unicode.semantics PHP_INI switch just totally kills the fun because I
have to have a working application whether it is ON or OFF. Long story
short, this forces me to
explicitly define each string as either binary or unicode, which doesn't
make any "PHP sense". It's actually the first time I'm forced to explicitly
specify a variable type in PHP and I'm not sure I'm the only one who's not
happy about this. I like the unicode support and really appreciate all the
work that's been done on it but I absolutely think it should be implemented
without that headache/pain in the ass switch that'll make transition even
tougher for everyone.

In that case, I can say simplicity is certainly not dumb.

And I think that we shouldn't, since it removes a big incentive for
people to move to PHP 6.

I don't really see much sense in forcing people to use PHP6 if we accept
the "PHP5 = PHP6 - Unicode" formula.
They are just different things, period.

Really, we need to get folks to use Unicode natively as much as
possible.

Andrei, I personally don't need Unicode at all.
I know, that may sound weird, but that's true.

This kind of attitude is
precisely why we've had and continue to have such an
internationalization mess when it comes to building applications.

What attitude are you talking about here?

I'm trying to be honest with myself in the first place.
Do I like that horrible IS_STRING/IS_UNICODE mess we have atm? No.
Do I want to maintain this mess in the future just because of some bad
design decision in the past? No way, we had enough of that already.

I would love to have clean and easy PHP6 without all the "compatibility",
which creates gazillion problems to both users and developers.
Please notice that I didn't call Unicode useless crap or whatever others
may think about it, I just want PHP6 to be a Unicode-only release because
it would make my personal life much easier without complicating others'
lives.

--
Wbr,
Antony Dovgal

--
Nicolas Bérard-Nault (nicobn@gmail.com)
Étudiant D.E.C. Sciences, Lettres & Arts
Cégep de Sherbrooke

Homepage: http://nicobn.googlepages.com

18 years ago by Derick Rethans

Permit me to give my 2 cents on that and share my small bit of experience
with PHP 6.

First of all, I totally agree with you Antony. I'm currently working on
deploying a big codebase in PHP 6 (for those of you who didn't know, I'm the
GSoC student working on refactoring Jaws for PHP 6) and my head started to
ache when I began understanding all the complications of the unicode
implementation as it is right now. Basically, having that
unicode.semantics PHP_INI switch just totally kills the fun because I
have to have a working application whether it is ON or OFF.

Why? Just state that it only works when it is turned ON - I am pretty
sure that that's the way we'll go.

Long story short, this forces me to
explicitly define each string as either binary or unicode, which doesn't
make any "PHP sense". It's actually the first time I'm forced to explicitly
specify a variable type in PHP and I'm not sure I'm the only one who's not
happy about this. I like the unicode support and really appreciate all the
work that's been done on it but I absolutely think it should be implemented
without that headache/pain in the ass switch that'll make transition even
tougher for everyone.

That I agree with :)

Derick

--
Derick Rethans
http://derickrethans.nl | http://ez.no | http://xdebug.org

18 years ago by Stanislav Malyshev

Do I like that horrible IS_STRING/IS_UNICODE mess we have atm? No.

I don't think there's any way of having both unstructured character data
and Unicode text represented without having two distinct types. Either
that or you'd have to tell on each step which one it is, and that would
suck much more.

I would love to have clean and easy PHP6 without all the
"compatibility", which creates gazillion problems to both users and
developers.

Fixing unicode=on does not remove the IS_STRING/IS_UNICODE duality. We
still have two kinds of data - unstructured bit stream and structured
text. If we want strlen("превед") to return 6 - since that Russian word
has 6 characters - then we have no choice but to recognize that it's not
just a collection of bits but Unicode text, and that would require a
separate type, as I see it. And as I see it, this is the source of the problems
when people try to operate on text as on bit stream and vice versa.

Unless I totally missed what mess you are referring to...

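Python 3 draws the same line Stas describes, with separate str (Unicode text) and bytes (raw data) types, so it offers a quick way to see the character-versus-byte difference. This is a minimal illustration in Python, not PHP 6 code; the variable names are arbitrary.

```python
# Unicode text and raw bytes as two distinct types (Python 3 str vs bytes),
# loosely mirroring the IS_UNICODE / IS_STRING split discussed above.
word = "превед"                 # Unicode text
raw = word.encode("utf-8")      # the same word as a UTF-8 byte stream

print(len(word))  # 6 characters
print(len(raw))   # 12 bytes: each Cyrillic letter needs 2 bytes in UTF-8

# Treating one as the other is an explicit error, not a silent reinterpretation.
try:
    word + raw
except TypeError as exc:
    print("cannot mix text and bytes:", exc)
```

Because the two types never mix implicitly, code has to decode on input and encode on output, which is the cost of keeping text and binary data apart.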

18 years ago by johannes@php.net

Hi,

Fixing unicode=on does not remove the IS_STRING/IS_UNICODE duality. We
still have two kinds of data - unstructured bit stream and structured
text.

But we still have the mess that most internal structures (function
tables, class tables, ...) hold either an IS_STRING or an IS_UNICODE
depending on a configuration option - just count the
UG(unicode)?IS_UNICODE:IS_STRING checks (that one even got its own
macro, ZEND_STR_TYPE) - these make the code much harder to read and
maintain.

And again: it is as easy to run PHP 5 and PHP 6 on the same host as it
is to run PHP 6 with unicode and PHP 6 without, so I can't see a BC
benefit of that setting, but I can see that this gives us two products
with the same name - PHP 6.

And that's bad.

johannes

18 years ago by Larry Garfield

Do I like that horrible IS_STRING/IS_UNICODE mess we have atm? No.

I don't think there's any way of having both unstructured character data
and Unicode text represented without having two distinct types. Either
that or you'd have to tell on each step which one it is, and that would
suck much more.

I would love to have clean and easy PHP6 without all the
"compatibility", which creates gazillion problems to both users and
developers.

Fixing unicode=on does not remove the IS_STRING/IS_UNICODE duality. We
still have two kinds of data - unstructured bit stream and structured
text. If we want strlen("превед") to return 6 - since that Russian word
has 6 characters - then we have no choice but to recognize that it's not
just a collection of bits but Unicode text, and that would require a
separate type, as I see it. And as I see it, this is the source of the problems
when people try to operate on text as on bit stream and vice versa.

Unless I totally missed what mess you are referring to...

I am coming into this discussion decidedly late here, so please thwap me
gently if this is a FAQ. Do we have any idea of what percentage of strings
in the "wild" would break if treated as Unicode vs. not?

If 90% of the strings in use would work fine if treated as unicode, then it
would make sense to just always assume Unicode unless explicitly specified
otherwise.

If 90% of the strings in use would die if treated as Unicode, then Unicode
should probably be the exception and only when explicitly defined.

I'm not liking the ghosts of magic_quotes I'm seeing implied here with
different modes for the server to be in. That sounds like it would make
writing code that works the same everywhere and is not ugly to read (crapload
of markers or lots of conditionals) quite difficult.

As I said, feel free to assuage my fear if appropriate. :-)

--
Larry Garfield AIM: LOLG42
larry@garfieldtech.com ICQ: 6817012

"If nature has made any one thing less susceptible than all others of
exclusive property, it is the action of the thinking power called an idea,
which an individual may exclusively possess as long as he keeps it to
himself; but the moment it is divulged, it forces itself into the possession
of every one, and the receiver cannot dispossess himself of it." -- Thomas
Jefferson
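One rough way to put numbers on Larry's question for a particular application would be to sample the byte strings it actually handles and bucket them by whether they are plain ASCII, already valid UTF-8, or neither. The sketch below is hypothetical (the classify helper and the sample data are made up for illustration) and uses Python only to show the check.

```python
from collections import Counter

def classify(data: bytes) -> str:
    """Hypothetical helper: bucket a byte string by how it would fare as Unicode text."""
    if data.isascii():
        return "ascii"        # 7-bit ASCII means the same thing in UTF-8
    try:
        data.decode("utf-8")
        return "utf-8"        # non-ASCII, but already valid UTF-8
    except UnicodeDecodeError:
        return "other"        # legacy 8-bit encoding or genuinely binary data

samples = [
    b"hello",                       # plain ASCII
    "превед".encode("utf-8"),       # UTF-8 text
    "café".encode("latin-1"),       # ISO-8859-1 text: b"caf\xe9"
    b"\x89PNG\r\n\x1a\n",           # start of a PNG file
]
print(Counter(classify(s) for s in samples))
```

Assuming input is decoded as UTF-8, only the "other" bucket would need an explicit encoding or an explicit binary type.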

18 years ago by Richard Lynch

If 90% of the strings in use would work fine if treated as unicode,
then it would make sense to just always assume Unicode unless
explicitly specified otherwise.

If that 10% includes enough users who have written millions of lines of
code in a self-consistent manner that voids ALL their work, you may
want to re-think this 90% number you have chosen...

And of course you need 2 distinct data types for Unicode and strings.

What I don't understand is why you'd lock things down so that:

a) the default "string" is Unicode, breaking XX% of existing applications

b) the end user can't readily change a) in a huge percentage of
existing install base (read: non-dedicated hosting or mixed-user
servers with shared httpd.conf settings)

I realize it's far too late by now to do anything about it, most
likely, but why in the world didn't you just choose a new keyword to
define/declare a string as Unicode?

And did I dream the thread on this way back when where it was stated
that Unicode was backwards-compatible, so this wouldn't be a problem?

Yet now it seems that UTF-16 is not backwards-compatible, and this
seems like a pretty big problem to me.

Oh well. I guess I'll just shut up and hope most of my code doesn't
break when I go copying/pasting it into new sites that are locked into
Unicode mode with no way for me to change that...


18 years ago by Larry Garfield

And did I dream the thread on this way back when where it was stated
that Unicode was backwards-compatible, so this wouldn't be a problem?

Yet now it seems that UTF-16 is not backwards-compatible, and this
seems like a pretty big problem to me.

Oh well. I guess I'll just shut up and hope most of my code doesn't
break when I go copying/pasting it into new sites that are locked into
Unicode mode with no way for me to change that...

AFAIK, UTF-8 is backward compatible with ASCII. UTF-16 is not. That's why
Microsoft defaults to UTF-16 (when they don't default to Windows-1252 or
whatever crap it is) and the rest of the universe (at least the parts of it
that I've seen) defaults to UTF-8.


18 years ago by Stanislav Malyshev

AFAIK, UTF-8 is backward compatible with ASCII. UTF-16 is not. That's why

Well, with 7-bit ASCII - yes. With 8-bit "extended ASCII", whatever that
means - not exactly. You can have 8-bit strings that aren't valid UTF-8
and can't be translated to UTF-8 without specifying the encoding
(iso-8859-1 or something like that).

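Both points, Larry's about UTF-16 and Stas's about 8-bit data, can be checked directly. A minimal Python sketch; the sample strings are arbitrary.

```python
text = "PHP"

# 7-bit ASCII bytes are byte-for-byte identical in UTF-8...
assert text.encode("ascii") == text.encode("utf-8")
# ...but not in UTF-16, where every ASCII letter gains a zero byte.
print(text.encode("utf-16-le"))  # b'P\x00H\x00P\x00'

# An 8-bit string in a legacy encoding is generally not valid UTF-8.
legacy = b"na\xefve"  # "naïve" stored as ISO-8859-1
try:
    legacy.decode("utf-8")
except UnicodeDecodeError:
    print("not valid UTF-8: the source encoding has to be specified")
print(legacy.decode("iso-8859-1"))  # naïve
```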

18 years ago by Rasmus Lerdorf

Richard Lynch wrote:

If 90% of the strings in use would work fine if treated as unicode,
then it would make sense to just always assume Unicode unless
explicitly specified otherwise.

If that 10% includes enough users who have written millions of lines of
code in a self-consistent manner that voids ALL their work, you may
want to re-think this 90% number you have chosen...

And of course you need 2 distinct data types for Unicode and strings.

What I don't understand is why you'd lock things down so that:

a) the default "string" is Unicode, breaking XX% of existing applications

b) the end user can't readily change a) in a huge percentage of
existing install base (read: non-dedicated hosting or mixed-user
servers with shared httpd.conf settings)

I realize it's far too late by now to do anything about it, most
likely, but why in the world didn't you just choose a new keyword to
define/declare a string as Unicode?

And did I dream the thread on this way back when where it was stated
that Unicode was backwards-compatible, so this wouldn't be a problem?

Yet now it seems that UTF-16 is not backwards-compatible, and this
seems like a pretty big problem to me.

Richard, you are rather confused on this Unicode stuff. The fact that
PHP and ICU use UTF-16 internally has absolutely nothing to do with
what is exposed at the scripting level.

The only things that will break in a standard application are those
that rely on strings being binary. Normal text passing back and forth
between the browser and the server will work just fine.

The breakages, apart from various bugs at this early stage, are limited
to places where the code is expecting to see a binary string and PHP
hasn't been able to determine this automatically. And hopefully we can
come up with ways to automatically determine when something should
default to a binary string.

But if you write:

$a = "マニュアル";
echo $a[1];

and you expect to have that spew out 0xe3, then yes, it will break
because it will result in ニ which is what it really should do.

And yes, I know a lot of people reading this list don't care much for
other charsets, but people reading an english mailing list are rather
self-selecting.

-Rasmus
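The difference Rasmus describes, indexing text by character versus by byte, can be reproduced in Python 3, where indexing a str returns a character and indexing a bytes object returns a raw byte value. This illustrates the concept only; it is not PHP 6 itself.

```python
a = "マニュアル"             # "manual" written in katakana
utf8 = a.encode("utf-8")     # how a byte-oriented string would store it

print(a[1])                        # ニ  (character indexing: the 2nd character)
print(hex(utf8[0]), hex(utf8[1]))  # 0xe3 0x83  (byte indexing: raw UTF-8 bytes)
print(len(a), len(utf8))           # 5 15  (each katakana character is 3 bytes in UTF-8)
```

In UTF-8 every one of these characters starts with the lead byte 0xe3, so byte-oriented code sees 0xe3 at offsets 0, 3, 6 and so on, with continuation bytes in between.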

18 years ago by Richard Lynch

Richard, you are rather confused on this Unicode stuff.

I'm 100% certain we can all agree on that point. :-)

The fact that PHP and ICU use UTF-16 internally has absolutely nothing
to do with what is exposed at the scripting level.

But somebody has just said that it will, didn't they?

That GPC data will be Unicode, and trying to use it as ASCII will break?

The only things that will break in a standard application are those
that rely on strings being binary. Normal text passing back and forth
between the browser and the server will work just fine.

The breakages, apart from various bugs at this early stage, are
limited to places where the code is expecting to see a binary string
and PHP hasn't been able to determine this automatically. And
hopefully we can come up with ways to automatically determine when
something should default to a binary string.

But if you write:

$a = "ããã¥ã¢ã«";
echo $a[1];

Whoa.

That was weird...

It was just a bunch of question marks when I read it, and now it's a
bunch of symbols (variants on afz mostly) in my reply...

and you expect to have that spew out 0xe3, then yes, it will break
because it will result in ã which is what it really should do.

You have me beat at the "...if you write" part, because I have no idea
how to make my keyboard make those symbols... :-v

My only concern is that:

http://example.com/?foo=bar
echo $_GET['foo'][1];
should still print out 'a' just like it always has.

And:
http://example.com/?mask=100110
echo $_GET['mask'] & '110010';
should print out 100010 just like it always has

Folks keep saying that bit-string manipulation makes no sense in
Unicode, and that's fine, I guess...

If a scripter is trying to do that, then see if the string is ASCII
[01]* and typecast it to binary string or whatever and just move on
with life in the old way.
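Richard's fallback, checking that a value matches [01]* and then handling it as binary, can be sketched as follows. The helper name and_bit_strings is hypothetical, and Python stands in for PHP. Per the PHP manual, & works character by character only when both operands are strings; with an integer operand such as 110010, both sides are converted to integers first, which gives a very different result.

```python
import re

def and_bit_strings(a: str, b: str) -> str:
    """Hypothetical helper: AND two strings of '0'/'1' character by character,
    like PHP's & when both operands are strings (result is as long as the shorter input)."""
    if not (re.fullmatch(r"[01]*", a) and re.fullmatch(r"[01]*", b)):
        raise ValueError("expected strings made only of 0 and 1")
    # Treat the input as binary: AND the raw ASCII bytes ('1' is 0x31, '0' is 0x30).
    anded = bytes(x & y for x, y in zip(a.encode("ascii"), b.encode("ascii")))
    return anded.decode("ascii")

print(and_bit_strings("100110", "110010"))  # 100010
print(100110 & 110010)                       # 99594: decimal integers ANDed bit by bit
```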

And yes, I know a lot of people reading this list don't care much for
other charsets, but people reading an english mailing list are rather
self-selecting.

I love the idea of users being able to write things in their own
language, and somehow it magically all just "looks right" when I slam
it into the database with mysql_real_escape_string and spew it back
out to the browser with htmlentities!

But it never quite seems to work out, in my limited experience,
because some software somewhere always manages to mangle it...

And I realize the whole point of Unicode in PHP 6 is to keep PHP 6 from
being that piece of software that mangles it, and I'm sure you guys are
getting that bit right. Well, I hope so anyway. :-)

I especially hope so, because if you don't get it right, I'll never be
able to tell, as I wouldn't notice the difference if it's broken or
not just by looking at the text in anything other than English.

I just get real concerned when it seems to me like a lot of scripts
are going to break, based on what folks who should know post here...


18 years ago by Tomas Kuliavas

But if you write:

$a = "マニュアル";
echo $a[1];

Whoa.

That was weird...

It was just a bunch of question marks when I read it, and now it's a
bunch of symbols (variants on afz mostly) in my reply...

Your browser or operating system does not support Japanese symbols, and
the translation selected in your Hostbaby Webmail (you could just call
it by its real name, SquirrelMail) does not support Japanese characters
in replies.

According to Google Translate, the $a variable stores the word 'Manual'
written in Japanese.

--
Tomas

18 years ago by Rasmus Lerdorf

Richard Lynch wrote:

$a = "ãƒžãƒ‹ãƒ¥ã‚¢ãƒ«";
echo $a[1];

Whoa.

That was weird...

Right, your mail client doesn't handle Unicode correctly. You might
want to do something about that.

-Rasmus
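The garbled string quoted here follows a common mojibake pattern: the UTF-8 bytes of マニュアル displayed as if they were Windows-1252. That the mail client used Windows-1252 is an assumption inferred from the characters, not something stated in the thread; the Python sketch below only shows that the two are consistent.

```python
original = "マニュアル"
# Encode as UTF-8, then wrongly decode those bytes as Windows-1252 (cp1252).
mangled = original.encode("utf-8").decode("cp1252")
print(mangled)  # ãƒžãƒ‹ãƒ¥ã‚¢ãƒ«

# The damage is reversible as long as no bytes were dropped along the way.
repaired = mangled.encode("cp1252").decode("utf-8")
assert repaired == original
```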

18 years ago by Uwe Schindler

That sounds "good" to my ears.

Software that relies on "old" non-unicode behaviour must be written in a way to handle non-unicode and Unicode behaviour in two different ways.
But for example a rewritten "Squirrelmail" that runs exclusively on PHP6 would be a good thing.

So you could write in your release notes: "We have this new version SquirrelMail++ that’s running only on hosts running PHP6. Using this would be a great speed and performance increase, because the Unicode addons are only available here. If you need an old non-unicode version, you have to stay with our old historic version." The old historic Squirrelmail version without Unicode support would stay supported for some time. But all users would know: if I want to have new features, I should think about a change to PHP6; all other users could stay on the old version.

In the case of the fantastic software "SquirrelMail++PHP6-only" (which I would use on my servers, too) I would think in this direction!

Uwe Schindler
thetaphi@php.net - http://www.php.net
NSAPI SAPI developer
Bremen, Germany

-----Original Message-----
From: Rasmus Lerdorf [mailto:rasmus@lerdorf.com]
Sent: Saturday, July 14, 2007 4:00 PM
To: ceo@l-i-e.com
Cc: internals@lists.php.net
Subject: Re: [PHP-DEV] What is the use of "unicode.semantics" in PHP 6?

Richard Lynch wrote:

$a = "ãƒžãƒ‹ãƒ¥ã‚¢ãƒ«";
echo $a[1];

Whoa.

That was weird...

Right, your mail client doesn't handle Unicode correctly. You might
want to do something about that.

-Rasmus

18 years ago by Uwe Schindler

In the case of the fantastic software "SquirrelMail++PHP6-only" (which I
would use on my servers, too) I would think in this direction!

My last post was specific to the complaining guy from SquirrelMail:
Squirrelmail is a fantastic example of software that would, in a rewritten
form, make use of PHP6 at many points (there are many bugs with Unicode in
it...) and profit from it!

Uwe


18 years ago by Tomas Kuliavas

That sounds "good" to my ears.

Software that relies on "old" non-unicode behaviour must be written in a
way to handle non-unicode and Unicode behaviour in two different ways.
But for example a rewritten "Squirrelmail" that runs exclusively on PHP6
would be a good thing.

So you could write in your release notes: "We have this new version
SquirrelMail++ that’s running only on hosts running PHP6. Using this would
be a great speed and performance increase, because the Unicode addons are
only available here. If you need an old non-unicode version, you have to
stay with our old historic version." The old historic Squirrelmail version
without Unicode support would stay supported for some time. But all
users would know: if I want to have new features, I should think about a
change to PHP6; all other users could stay on the old version.

In the case of the fantastic software "SquirrelMail++PHP6-only" (which I
would use on my servers, too) I would think in this direction!

There is nothing in the current PHP6 version that SquirrelMail can use.
The newest features it relies on are provided by PHP 5.1.0. Limiting the
code to PHP6 would reduce the user base. SquirrelMail can work on PHP6 with
unicode.semantics=off if two lines in one script are fixed.

P.S. I am not a SquirrelMail guy. I am a former SquirrelMail developer and I
use my own modified SquirrelMail version. It does not have issues with
Japanese.

--
Tomas

18 years ago by Richard Lynch

Richard Lynch wrote:

$a = "Ã£ÆÅ¾Ã£Æâ¹Ã£ÆÂ¥Ã£âÂ¢Ã£ÆÂ«";
echo $a[1];

Whoa.

That was weird...

Right, your mail client doesn't handle Unicode correctly. You might
want to do something about that.

Or not, since I don't have any chance of reading Japanese even if the
characters "look right"...

I was in Paris once, and using a French keyboard didn't improve my
French either. :-)

I could switch to a "real" mail client instead of the webhost supplied
SquirrelMail, I suppose...

Last time I tried to do that, the Linux mail client "ate" a bunch of
my email and really messed things up badly. I think it was KMail...

Other times I found the sync time of an IMAP mail client to be rather
abysmal compared to the web-based email...

Maybe I just store too much old email or something, but I'm not seeing
much reason to switch, since neither of the two renderings were
readable.

It was only interesting that it "switched" in read mode and compose
mode is all.

PS I suspect a newer SquirrelMail would handle Unicode just fine, and
I'm sure my webhost will upgrade long before I need them to.

Or I can install a new SquirrelMail on my own server and run whatever
version I want, if I go learn Japanese first...


18 years ago by Jani Taskinen

Antony Dovgal kirjoitti:

And I think that we shouldn't, since it removes a big incentive for
people to move to PHP 6.

I don't really see much sense in forcing people to use PHP6 if we accept
the "PHP5 = PHP6 - Unicode" formula.
They are just different things, period.

Really, we need to get folks to use Unicode natively as much as possible.

Andrei, I personally don't need Unicode at all.
I know, that may sound weird, but that's true.

This kind of attitude is precisely why we've had and continue to have
such an internationalization mess when it comes to building
applications.

What attitude are you talking about here?

I'm trying to be honest with myself in the first place.
Do I like that horrible IS_STRING/IS_UNICODE mess we have atm? No.
Do I want to maintain this mess in the future just because of some bad
design decision in the past? No way, we had enough of that already.

I would love to have clean and easy PHP6 without all the
"compatibility", which creates gazillion problems to both users and
developers.
Please notice that I didn't call Unicode useless crap or whatever others
may think about it, I just want PHP6 to be a Unicode-only release because
it would make my personal life much easier without complicating others'
lives.

Thank you Antony. This is exactly how I think too.

--Jani

18 years ago by Andi Gutmans

A large amount of the dual IS_UNICODE/IS_STRING code will need to stay in
the code base anyway as we will be supporting binary strings in PHP 6.
So it's not accurate that all these maintenance issues will be resolved by
not supporting unicode_semantics=off.

Unlike what Andrei said, I believe that for a large part of our community
(probably the majority) default unicode_semantics=on will not be of
interest (we don't live in a purists' world). Many won't want to run it
because it's going to be significantly slower and will be harder for
them to work with. This community will be best served by being able to run
in native 8-bit mode and having some Unicode functionality available
if/when needed. Having dual mode in PHP 6 is not the same as forking two
code bases. Features like namespaces will still automatically reach both
audiences.

If we're talking from a pure "what is most useful to the majority of our
users" I'd actually argue that explicit Unicode strings would be the
most convenient, i.e. instead of doing b"8bitstring" you'd do
U"unicodestring". Other languages do the same and there are reasons for
that. As we've decided on a more aggressive (and risky) approach, I
think having this dual mode is extremely important. It will also make
the upgrade path easier.

Btw, I don't know how many of you have actually tried to port PHP 5 apps
to PHP 6 but it's quite a disaster. We made some fixes in the past 2-3
weeks and it's getting better but it still requires a lot of work. If we
don't make this easy then this is all not worth too much.

This project has never been a purist project, which is why it's been so
successful, let's not start now...

Andi
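The two literal conventions Andi contrasts map onto Python's history: Python 2 used the model he prefers (plain literals were 8-bit strings and Unicode needed an explicit u"" prefix), while Python 3 uses the model PHP 6 chose (plain literals are Unicode and binary strings need b""). A minimal Python 3 sketch, offered only as an analogy:

```python
# Python 3: unprefixed literals are Unicode text; binary data needs an explicit b"" prefix.
text = "unicodestring"
binary = b"8bitstring"
# The u"" prefix is still accepted (since Python 3.3) and simply means str.
legacy_u = u"unicodestring"

print(type(text).__name__, type(binary).__name__, type(legacy_u).__name__)  # str bytes str
```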

18 years ago by Andi Gutmans

I was thinking a bit more about this yesterday. Even if I'd agree with
this discussion (which I don't at this point in time) I think it is
being had far too early. We currently have a very big problem with
the ability to upgrade to PHP 6, and making decisions without people actually
getting their feet wet and seeing what the issues are is not a good
idea. Purist decisions tend to fail when they meet the real world.

What I really think we need to do for this release, which we haven't
been good at doing in the past, is build a PHP Compatibility Team which
tries to port many applications to PHP 6 and finds the issues in doing
this port (both with unicode_semantics=on/off). We can then learn from
this experience and have good documentation on how to upgrade to both
modes and in some cases, like we have done in the past 2-3 weeks, tweak
PHP 6 to not break backwards compatibility. It is possible in many
cases.

It's something we are willing to spend time on and as I mentioned
already started to do but it would really require a larger amount of
volunteers to pick various apps and do it.

This kind of information would be far more valuable to the project at
this point than a prolonged thread about a piece of software which isn't
finished (and would also give more information for a discussion like the
one we've been having). No one really knows how good or bad a situation
we are in right now. I know from my end it doesn't look great yet.

Andi

-----Original Message-----
From: Andi Gutmans [mailto:andi@zend.com]
Sent: Monday, July 09, 2007 7:39 PM
To: Antony Dovgal; Andrei Zmievski
Cc: Stas Malyshev; internals@lists.php.net
Subject: RE: [PHP-DEV] What is the use of "unicode.semantics"
in PHP 6?

A large amount of the dual IS_UNICODE/IS_STRING code will need
to stay in the code base anyway, as we will be supporting
binary strings in PHP 6.
So it's not accurate that all these maintenance issues will be
resolved by not supporting unicode_semantics=off.

I believe that, unlike what Andrei said, for a large community of
ours (probably the majority) a default of unicode_semantics=on
will not be of interest (we don't live in a purists' world).
Many won't want to run it because it's going to be
significantly slower and harder for them to work
with. This community will be best served by being able to run in
native 8-bit mode and having some Unicode functionality
available if/when needed. Having dual mode in PHP 6 is not
the same as forking two code bases. Things like namespaces
still automatically reach both audiences.

If we're talking from a pure "what is most useful to the
majority of our users" perspective, I'd actually argue that explicit
Unicode strings would be the most convenient, i.e. instead of
doing b"8bitstring" you'd do U"unicodestring". Other
languages do the same and there are reasons for that. As
we've decided on a more aggressive (and risky) approach, I
think having this dual mode is extremely important. It will
also make the upgrade path easier.

Btw, I don't know how many of you have actually tried to port
PHP 5 apps to PHP 6, but it's quite a disaster. We made some
fixes in the past 2-3 weeks and it's getting better, but it
still requires a lot of work. If we don't make this easy, then
this is all not worth too much.

This project has never been a purists project which is why
it's been so successful, let's not start now...

Andi

--
To unsubscribe, visit: http://www.php.net/unsub.php

18 years ago by Evert | Rooftop

Andi Gutmans wrote:

What I really think we need to do for this release, which we haven't
been good at doing in the past, is build a PHP Compatibility Team which
tries to port many applications to PHP 6 and finds the issues in doing
this port (both with unicode_semantics=on/off). We can then learn from
this experience and have good documentation on how to upgrade to both
modes and in some cases, like we have done in the past 2-3 weeks, tweak
PHP 6 to not break backwards compatibility. It is possible in many
cases.

I'd volunteer for this. Does it help you guys to get started with this
today, or should I be waiting till there's more agreement on some of
this stuff..

Evert

18 years ago by Andi Gutmans

I think the sooner the better as it's valuable information for the dev
team.
It'd probably be a good idea to have a Wiki where we can document issues
and common use cases that we encounter.
Maybe we should have a Wiki on one of the php.net servers for such
purposes?
Andi

-----Original Message-----
From: Evert | Rooftop [mailto:evert@rooftopsolutions.nl]
Sent: Tuesday, July 10, 2007 9:40 AM
To: Andi Gutmans
Cc: Antony Dovgal; Andrei Zmievski; Stas Malyshev;
internals@lists.php.net
Subject: Re: [PHP-DEV] What is the use of "unicode.semantics"
in PHP 6?

Andi Gutmans wrote:

What I really think we need to do for this release, which we haven't
been good at doing in the past, is build a PHP Compatibility Team which
tries to port many applications to PHP 6 and finds the issues in doing
this port (both with unicode_semantics=on/off). We can then learn from
this experience and have good documentation on how to upgrade to both
modes and in some cases, like we have done in the past 2-3 weeks, tweak
PHP 6 to not break backwards compatibility. It is possible in many
cases.

I'd volunteer for this. Does it help you guys to get started
with this today, or should I be waiting till there's more
agreement on some of this stuff..

Evert

18 years ago by Evert | Rooftop

Andi Gutmans wrote:

I think the sooner the better as it's valuable information for the dev
team.
It'd probably be a good idea to have a Wiki where we can document issues
and common use cases that we encounter.
Maybe we should have a Wiki on one of the php.net servers for such
purposes?
Andi

Is anyone aware of a list of, say, the top 10 PHP applications?

When such a wiki is set up, how would you suggest writing such
documents? At least a generic guide would be good (e.g. common pitfalls).
Should I be documenting the per-project specifics as well?

Evert

18 years ago by Larry Garfield

Andi Gutmans wrote:

I think the sooner the better as it's valuable information for the dev
team.
It'd probably be a good idea to have a Wiki where we can document issues
and common use cases that we encounter.
Maybe we should have a Wiki on one of the php.net servers for such
purposes?
Andi

Is anyone aware of a list of, say, the top 10 PHP applications?

When such a wiki is set up, how would you suggest writing such
documents? At least a generic guide would be good (e.g. common pitfalls).
Should I be documenting the per-project specifics as well?

Evert

Top 10 by what metric? If I had to guess based on market share, I'd say
(unordered):

Drupal
Squirrelmail
WordPress
phpMyAdmin
MediaWiki
Joomla
PHPBB

And I run out of steam here. :-) That's just my guess, though.

Probably a better place to look would be to see what is commonly
pre-installable or pre-installed at shared hosts. phpMyAdmin and
Squirrelmail seem to be everywhere. WordPress, Drupal, Joomla, and PHPBB
seem to turn up in "free scripts!" lists a lot.

--
Larry Garfield AIM: LOLG42
larry@garfieldtech.com ICQ: 6817012

"If nature has made any one thing less susceptible than all others of
exclusive property, it is the action of the thinking power called an idea,
which an individual may exclusively possess as long as he keeps it to
himself; but the moment it is divulged, it forces itself into the possession
of every one, and the receiver cannot dispossess himself of it." -- Thomas
Jefferson

18 years ago by Evert | Rooftop

Larry Garfield wrote:

Top 10 by what metric? If I had to guess based on market share, I'd say
(unordered):

Drupal
Squirrelmail
WordPress
phpMyAdmin
MediaWiki
Joomla
PHPBB

That will keep me busy =)

Evert

18 years ago by Robert Lemke

On 11.07.2007 at 07:20, Evert|Rooftop wrote:

Top 10 by what metric? If I had to guess based on market share,
I'd say (unordered):

Drupal
Squirrelmail
WordPress
phpMyAdmin
MediaWiki
Joomla
PHPBB

hey, and what about TYPO3? ;-)

Honestly, I've tried the current version of TYPO3 (4.x) with PHP6, and
it seems it is not very difficult to adapt it. Most of the errors
were of type E_STRICT, and with unicode.semantics off it probably
needs only a few changes, because we don't rely on PHP functions for Unicode
support.

I'm currently working on TYPO3 5.0 which comes with a new codebase
specifically written for PHP6 and I agree with Nicolas, that the
unicode.semantics switch spoils the fun a little. We just have to
hope that enough hosting companies offer PHP6 based webspaces with
unicode.semantics turned on. And if they don't, we'll have to start
an initiative and ask hosters specifically to offer such a product.

Robert

http://typo3.org/gimmefive

18 years ago by Richard Quadling

Larry Garfield wrote:

Top 10 by what metric? If I had to guess based on market share, I'd say
(unordered):

Drupal
Squirrelmail
WordPress
phpMyAdmin
MediaWiki
Joomla
PHPBB

That will keep me busy =)

Evert

Would it also be worth checking some of the frameworks too? Prado, eZ, Zend?

Richard Quadling
Zend Certified Engineer : http://zend.com/zce.php?c=ZEND002498&r=213474731
"Standing on the shoulders of some very clever giants!"

18 years ago by Richard Lynch

Larry Garfield wrote:

Top 10 by what metric? If I had to guess based on market share, I'd say
(unordered):

Drupal
Squirrelmail
WordPress
phpMyAdmin
MediaWiki
Joomla
PHPBB

I saw a reference in this thread to webhosts that don't upgrade
because cPanel didn't work, no?
[Larry said that, I think...]

So, I dunno, maybe the various panels that all those webhosters use
should be a candidate...

I mean, they all seem to have those panel thingies, even if I
personally use them as rarely as humanly possible...

[Talk about making easy things impossible... :-)]

I got no idea which ones are the most common, though.

--
Some people have a "gift" link here.
Know what I want?
I want you to buy a CD from some indie artist.
http://cdbaby.com/browse/from/lynch
Yeah, I get a buck. So?

18 years ago by M. Sokolewicz

Richard Lynch wrote:

Larry Garfield wrote:

Top 10 by what metric? If I had to guess based on market share, I'd say
(unordered):

Drupal
Squirrelmail
WordPress
phpMyAdmin
MediaWiki
Joomla
PHPBB

I saw a reference in this thread to webhosts that don't upgrade
because cPanel didn't work, no?
[Larry said that, I think...]

So, I dunno, maybe the various panels that all those webhosters use
should be a candidate...

I mean, they all seem to have those panel thingies, even if I
personally use them as rarely as humanly possible...

[Talk about making easy things impossible... :-)]

I got no idea which ones are the most common, though.

Cpanel
Plesk
Ensim

18 years ago by Derick Rethans

Larry Garfield wrote:

Top 10 by what metric? If I had to guess based on market share, I'd say
(unordered):

Drupal
Squirrelmail
WordPress
phpMyAdmin
MediaWiki
Joomla
PHPBB

That will keep me busy =)

Would it also be worth checking some of the frameworks too? Prado, eZ, Zend?

I did test things a couple of months ago for the eZ Components, and it
didn't seem that bad. Now it's more "messy", but I didn't really
check why.

regards,
Derick

--
Derick Rethans
http://derickrethans.nl | http://ez.no | http://xdebug.org

18 years ago by Lukas Kahwe Smith

Andi Gutmans wrote:

I think the sooner the better as it's valuable information for the dev
team.
It'd probably be a good idea to have a Wiki where we can document issues
and common use cases that we encounter.
Maybe we should have a Wiki on one of the php.net servers for such
purposes?
Andi

Is anyone aware of a list of, say, the top 10 PHP applications?

When such a wiki is set up, how would you suggest writing such
documents? At least a generic guide would be good (e.g. common pitfalls).
Should I be documenting the per-project specifics as well?

Evert

Top 10 by what metric? If I had to guess based on market share, I'd say
(unordered):

Drupal
Squirrelmail
WordPress
phpMyAdmin
MediaWiki
Joomla
PHPBB

And I run out of steam here. :-) That's just my guess, though.

Probably a better place to look would be to see what is commonly
pre-installable or pre-installed at shared hosts. phpMyAdmin and
Squirrelmail seem to be everywhere. WordPress, Drupal, Joomla, and PHPBB
seem to turn up in "free scripts!" lists a lot.

we tried to get most of the top php OSS projects into the primary
testers group:
http://oss.backendmedia.com/PhP4yz
http://oss.backendmedia.com/PhP5yz
http://oss.backendmedia.com/PhP6yz

regards,
Lukas

18 years ago by Jani Taskinen

we tried to get most of the top php OSS projects into the primary
testers group:
http://oss.backendmedia.com/PhP4yz
http://oss.backendmedia.com/PhP5yz
http://oss.backendmedia.com/PhP6yz

Emphasis on the word "tried"? :D
Is there some procedure to follow for releases regarding those testers
anyway?

--Jani

18 years ago by Tomas Kuliavas

I think the sooner the better as it's valuable information for the dev
team.
It'd probably be a good idea to have a Wiki where we can document issues
and common use cases that we encounter.
Maybe we should have a Wiki on one of the php.net servers for such
purposes?
Andi

Is anyone aware of a list of, say, the top 10 PHP applications?

When such a wiki is set up, how would you suggest writing such
documents? At least a generic guide would be good (e.g. common pitfalls).
Should I be documenting the per-project specifics as well?

Evert

Top 10 by what metric? If I had to guess based on market share, I'd say
(unordered):

Drupal
Squirrelmail

SquirrelMail

remove session_unregister call
fix get_magic_quotes_gpc() call
Turn off unicode.semantics in webserver configuration or php.ini
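
The first two items are ordinary porting chores. A minimal sketch of one common way to handle them, assuming the code already stores session data in $_SESSION: the PHP manual recommends unset() on $_SESSION in place of session_unregister(), and a function_exists() guard keeps the magic quotes check working on releases where get_magic_quotes_gpc() no longer exists. The helper names are hypothetical, not SquirrelMail's actual code.

```php
<?php
// Hypothetical helper: true only when the running PHP still provides
// get_magic_quotes_gpc() and magic_quotes_gpc is enabled.
function magic_quotes_gpc_enabled()
{
    return function_exists('get_magic_quotes_gpc') && get_magic_quotes_gpc();
}

// Hypothetical helper: undo magic quotes on an input value only when
// the directive is actually active.
function clean_input($value)
{
    return magic_quotes_gpc_enabled() ? stripslashes($value) : $value;
}

// Hypothetical replacement for session_unregister($name):
// remove the entry from $_SESSION directly.
function session_forget($name)
{
    unset($_SESSION[$name]);
}
```

Keeping the guard in one helper means the rest of the code never calls get_magic_quotes_gpc() directly.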

SquirrelMail scripts are designed to work with binary strings. Lots of
SquirrelMail functions are not compatible with unicode.semantics=on. Some
calls are not prepared to handle the changes in crc32(), base64_encode(),
fputs() and fwrite(). If the scripts keep backwards compatibility, they will
need wrappers for most of the affected string and stream functions.

Some unicode.semantics=on side effects can be fixed without splitting
functions between PHP5 and PHP6, but unicode.script_encoding can't be set
with ini_set() and must be declared at the top of every affected script.

--
Tomas

18 years ago by Sebastian Mendel

Larry Garfield wrote:

Andi Gutmans wrote:

I think the sooner the better as it's valuable information for the dev
team.
It'd probably be a good idea to have a Wiki where we can document issues
and common use cases that we encounter.
Maybe we should have a Wiki on one of the php.net servers for such
purposes?
Andi

Is anyone aware of a list of, say, the top 10 PHP applications?

When such a wiki is set up, how would you suggest writing such
documents? At least a generic guide would be good (e.g. common pitfalls).
Should I be documenting the per-project specifics as well?

Evert

Top 10 by what metric? If I had to guess based on market share, I'd say
(unordered):

Drupal
Squirrelmail
WordPress
phpMyAdmin

phpMyAdmin runs fine with PHP 6, apart from masses of notices/E_STRICT
messages (due to PHP 4 compatibility until the 2.11 release this year).

If you find problems, tell me.

--
Sebastian Mendel

www.sebastianmendel.de

18 years ago by Evert | Rooftop

One final question..

Should I assume while converting code that "unicode.semantics" is on or off?

If it's on, I would make sure everything is properly cast to binary
strings where needed; if it's off, the focus would be on making
sure the application runs on both PHP5 and PHP6.

What makes the most sense here? I would personally try it
assuming it's off, as this is the most likely target for the development
teams..

Evert

18 years ago by Richard Lynch

Seems to me...

Both need to be done.

Do both, or pick one if you can't do both, and somebody else will do
the other. That's how FLOSS works. :-)

One final question..

Should I assume while converting code that "unicode.semantics" is on or off?

If it's on, I would make sure everything is properly cast to binary
strings where needed; if it's off, the focus would be on making
sure the application runs on both PHP5 and PHP6.

What makes the most sense here? I would personally try it
assuming it's off, as this is the most likely target for the development
teams..

Evert

18 years ago by Olivier Hill

Is there a reason why the last 10 messages on this thread are coming from you?

It might just be me, but answering in the same email would be great.

Olivier

Seems to me...

Both need to be done.

Do both, or pick one if you can't do both, and somebody else will do
the other. That's how FLOSS works. :-)

18 years ago by Larry Garfield

Because he's Richard. He always does that. You should see him on
php-general. :-)

Is there a reason why the last 10 messages on this thread are coming from you?

It might just be me, but answering in the same email would be great.

Olivier

Seems to me...

Both need to be done.

Do both, or pick one if you can't do both, and somebody else will do
the other. That's how FLOSS works. :-)

18 years ago by Richard Lynch

What I really think we need to do for this release, which we haven't
been good at doing in the past, is build a PHP Compatibility Team which
tries to port many applications to PHP 6 and finds the issues in doing
this port (both with unicode_semantics=on/off). We can then learn from
this experience and have good documentation on how to upgrade to both
modes and in some cases, like we have done in the past 2-3 weeks, tweak
PHP 6 to not break backwards compatibility. It is possible in many
cases.

This all sounds great...

Where are all the developers you need going to come from? :-v

Is it time yet for, say, the squirrelMail developers to try to run
their app in PHP 6 and tell you what all broke?

You wanna announce that somewhere and take a flood of bug reports in
bugs.php.net?

Just tossing out the idea...

18 years ago by johannes@php.net

Btw, I don't know how many of you have actually tried to port PHP 5 apps
to PHP 6, but it's quite a disaster. We made some fixes in the past 2-3
weeks and it's getting better, but it still requires a lot of work. If we
don't make this easy, then this is all not worth too much.

I'd prefer that we fix the tests failing under make test first. These
tests already show some problems we might have, which might require some
general improvement.

johannes

18 years ago by Christopher Jones

I also think we shouldn't backport features to PHP5. We should

(i) keep PHP5 a stable release with a known feature set for developers
to use.

(ii) have a smaller code base to maintain in PHP5, reducing the
overhead of merging.

(iii) avoid exacerbating the future situation with uptake of PHP6 vs
PHP5 that we now face with PHP5 vs PHP4.

Chris

Andrei Zmievski wrote:

And I think that we shouldn't, since it removes a big incentive for
people to move to PHP 6.

Really, we need to get folks to use Unicode natively as much as
possible. It is the way of the future, and not some "obscure feature",
as some here have suggested. This kind of attitude is precisely why
we've had and continue to have such an internationalization mess when it
comes to building applications.

-Andrei

You don't buy a Porsche if you need a taxi, so why would you install
PHP6 if you don't need Unicode?
Namespaces ;)
This reason is only valid if we don't backport such things from PHP6
to PHP5 (5.3, 5.5 or whatever it would be), which I think we should do.

--Wbr, Antony Dovgal

--
Christopher Jones, Oracle
Email: christopher.jones@oracle.com Tel: +1 650 506 8630
Blog: http://blogs.oracle.com/opal/ PHP Book: http://tinyurl.com/f8jad

18 years ago by Richard Lynch

I also think we shouldn't backport features to PHP5. We should

I believe the only serious reason FOR this is if you want to drop the
semantics OFF in PHP 6...

If getting new features requires upgrading to 6 and taking the Unicode
stuff that we theorize will break a great deal of code...

18 years ago by Jani Taskinen

This is exactly why I started this thread. I'm afraid the setting causes more
trouble than it solves..

--Jani

Cristian Rodriguez wrote:

which will just produce way more
problems to hosters and developers of software for "PHP 6".

yes :-( .. So if unicode.semantics cannot be set at runtime with
ini_set(), or at least per directory, it is complete nonsense to have it,
as the vast majority of users will not be able to turn it on/off, and it
will certainly be off in most configurations, as otherwise it would
break too much code.

I'm sorry, but I don't see this ending as a good thing.. It looks pretty
much like more of the same old mistakes (magic_quotes, safe_mode,
anyone? This may be even worse..)

18 years ago by Tomas Kuliavas

Unicode code points can be defined with \u, but PHP6 breaks existing octal
and hex escape sequences.

What do you mean? Doesn't \x20 create a U+0020 character? Or do you mean you'd
expect it to create just the single byte 0x20? Doesn't a binary string do that?

Try values higher than 0x7F.

If I write "\xA0", I expect one byte with the hex value A0, not the two
bytes 0xC2 0xA0 (\u00A0 in UTF-8). If I use the \x80-\xFF range, I expect
functions to match bytes, not only \u0080-\u00FF.

Binary strings can do that, but they are not backwards compatible. In
order to do the same thing in PHP4/5 and PHP6, I'll have to move code into
separate libraries.
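
Tomas's expectation matches how PHP 5 treats string literals: a \x escape produces exactly one raw byte. A small sketch contrasting that byte with the two-byte UTF-8 encoding of U+00A0. It assumes PHP 7 or later, because it uses the \u{...} escape added in PHP 7.0 (which did not exist at the time of this thread) only to spell out the UTF-8 bytes.

```php
<?php
// Byte-string semantics (PHP 5, and PHP 7+ which kept them):
// "\xA0" is exactly one byte with the value 0xA0.
$byte = "\xA0";
var_dump(strlen($byte));   // int(1)
var_dump(bin2hex($byte));  // string(2) "a0"

// U+00A0 (NO-BREAK SPACE) encoded as UTF-8 is two bytes, 0xC2 0xA0.
// The \u{...} escape requires PHP 7.0 or later.
$utf8 = "\u{00A0}";
var_dump(strlen($utf8));   // int(2)
var_dump(bin2hex($utf8));  // string(4) "c2a0"

// A byte-oriented comparison does not treat the two as equal.
var_dump($byte === $utf8); // bool(false)
```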

PHP6 is very noisy ("Notice: fwrite(): 13 character unicode buffer
downcoded for binary stream runtime_encoding", "Warning: base64_encode()
expects parameter 1 to be strictly a binary string, Unicode string
given")

Well, exporting to and importing from non-Unicode contexts is
tricky, and fwrite and base64_encode do exactly that. Maybe some
functions need to be less noisy, I don't know - but when people work
with unicode they must be aware that interoperating with non-unicode
contexts brings some complexity, I don't see how that can be avoided.

For me it means that I have to maintain wrappers for fwrite,
base64_encode, ord, crc32 and all other unicode aware functions. Any
direct PHP string or stream function call can cause compatibility issues
or notices. Any function working with binary data will need separate
version for PHP6. Instead of having unicode switches in interpreter
itself, I'll have to implement them in scripts. Talk about performance
issues after that.

--
Tomas

18 years ago by Pierre

Hi,

For me it means that I have to maintain wrappers for fwrite,
base64_encode, ord, crc32 and all other unicode aware functions. Any
direct PHP string or stream function call can cause compatibility issues
or notices. Any function working with binary data will need separate
version for PHP6. Instead of having unicode switches in interpreter
itself, I'll have to implement them in scripts. Talk about performance
issues after that.

Using the right mode (and stream mode) for the right task solves this
problem. If it does not, there is a bug and it has to be fixed.

But it is possible to work with binary data in php6 just like it was
in php5. The "only" difference is that we have now a working unicode
support which includes true binary data and true unicode data instead
of the all-in-one storage used in previous versions (5.x and earlier).

It is certainly not perfect (see the recent discussions about
file_put_contents in our bug tracker for example), but we are on the
right track. If you have issues/bugs with the way binary strings are
handled (in stream or any other functions), please report them and we
will do our best to fix them.

But not being able to use the same code with php6 and 5.1 is not a
bug. It is simply not possible. You need two files and the performance
impact is minimal here (same interface, two implementations).

--Pierre

18 years ago by Andrei Zmievski

It sounds like your libraries are definitely oriented towards working
with binary strings, rather than Unicode strings. So, I am not sure
why you have unicode.semantics turned on then. If you turn it off,
you will get backwards compatibility with PHP 5. And if you do that,
you can still create and work on Unicode strings, programmatically.

-Andrei

If I write "\xA0", I expect one byte with the hex value A0, not the two
bytes 0xC2 0xA0 (\u00A0 in UTF-8). If I use the \x80-\xFF range, I expect
functions to match bytes, not only \u0080-\u00FF.

Binary strings can do that, but they are not backwards compatible. In
order to do the same thing in PHP4/5 and PHP6, I'll have to move code into
separate libraries.

18 years ago by Tomas Kuliavas

It sounds like your libraries are definitely oriented towards working
with binary strings, rather than Unicode strings. So, I am not sure
why you have unicode.semantics turned on then. If you turn it off,
you will get backwards compatibility with PHP 5. And if you do that,
you can still create and work on Unicode strings, programmatically.

I've never asked to turn on unicode.semantics. I've asked to give controls
of unicode.semantics to scripts (PHP_INI_ALL) or at least give me some
options to turn it off within a script. I don't control PHP version used
by end user and there is a theoretical possibility that end user will use
PHP6 with unicode.semantics=on. So I've tested scripts in
unicode.semantics=on setup. They broke. Lots of notices, broken
authentication functions, etc.

I want to make sure that I have enough controls to reduce side effects of
unicode.semantics=on. Currently I can only ask end user to turn it off
with php_admin_flag or in php.ini.

--
Tomas

18 years ago by Andrei Zmievski

Yes, unfortunately the end user needs to be aware enough of his
environment to either control the unicode.semantics flag or choose
the right server to run it on. Believe me, we've tried making
unicode.semantics controllable on a per-request basis, and after a
long and hard debate after partial implementation we realized it
would make both the code and the applications running on top of it
very fragile.

-Andrei

18 years ago by Richard Lynch

If unicode semantics are "on" what exactly is borked in PHP 5?

In Unicode mode \[0-7]{1,3} and \x[0-9A-Fa-f]{1,2} refer to Unicode code
points and not to octal or hexadecimal byte values. The fix is not
backwards compatible.

Gak.

You mean this will break:

<?php
$mask = 0xf0;
$value = $_POST['foo'] & $mask;
?>

because of Unicode?

That's nuts.

That can't be right...

Scripts can't match bytes. How are they supposed to check if a string is in
plain ascii or in 8bit? Do conversion to ASCII and check for errors
instead of looking for 8bit byte values? How can scripts replace 8bit
bytes with some other strings? ISO-8859-2 decoding table contains 95
entries written and evaluated as binary strings. Same thing applies to
other iso-8859 and windows-125x character sets. iso-8859-1 and utf-8
decoding does not use mapping tables and performs complex calculations
with byte values. multibyte character set decoding might actually benefit
from unicode_encode(), if Table 325 (http://www.php.net/unicode) provides
more information about U_INVALID_SUBSTITUTE and other unicode.* settings.

I don't even understand this.

But if I haven't done something new-fangled to make a string be some
new-fangled Unicode thingie, then it's just plain old ASCII, no?

Or PHP can just assume that anyway...

PHP6 does not provide backwards compatible functions to work with bytes.
Provided constructs are not backwards compatible. If scripts want to do
MIME Q encoding, they must work with bytes. Doing Q encoding with provided
PHP extensions adds extra dependencies.

Another one I don't understand...

But since I believe MIME emails are a blight on the universe, I
suspect I just don't care either. :-)

ICU does not support HTML target. Text conversion to iso-8859-x or
windows-125x targets will be lossy.

Well, yeah, if you down-sample UTF-* to a character set that doesn't
have the characters you typed in UTF-*, then those characters won't
make it through the translation.

Output your HTML in UTF-* or accept the loss.

Can that be fixed to be BC without resorting to this toggle?

Unicode and binary typecasting causes E_PARSE error in PHP 5.2.0 and
older.

That's fine.

PHP 6 code that uses new PHP 6 features needs PHP 6.

If that surprises somebody, they have a fundamental misunderstanding
of major release version.

PHP6 could introduce new Unicode aware functions, but the Unicode
implementation chose to modify existing ones. All low level string
operations ($string[1]) are Unicode aware by default, rather than only
when the script actually asks for it. Such an implementation is designed
for developers who don't care about Unicode support and want it out of
the box without any changes in their Unicode unaware scripts. It is not
designed for developers that actually need it and want to have code
working in PHP6 and PHP4/5.

But an old script ought to just work...

Unicode code points can be defined with \u, but PHP6 breaks existing octal
and hex escape sequences.

If you're saying what I think you're saying, that's just daft...

Nobody [*] will switch to PHP 6 if I am interpreting these statements
correctly...

[*] Nobody == even a slower adoption rate than the glacial PHP 5.

PHP6 is very noisy ("Notice: fwrite(): 13 character unicode buffer
downcoded for binary stream runtime_encoding", "Warning: base64_encode()
expects parameter 1 to be strictly a binary string, Unicode string given")
about data stream and string operations, even when fwrite() or
base64_encode() works only with plain ascii data. PHP script developers
are not used to strict variable type checks in string functions. Which
functions are modified to require binary typecasting? Do I have to make a
list myself every time some function freaks out?

Hopefully these are going away as the Unicode stuff is finished?...

--
Some people have a "gift" link here.
Know what I want?
I want you to buy a CD from some indie artist.
http://cdbaby.com/browse/from/lynch
Yeah, I get a buck. So?

18 years ago by Matt Wilmas

Hi Richard,

----- Original Message -----
From: "Richard Lynch"
Sent: Thursday, July 05, 2007 10:43 PM

If unicode semantics are "on" what exactly is borked in PHP 5?

In Unicode mode \[0-7]{1,3} and \x[0-9A-Fa-f]{1,2} refer to unicode code
points and not to octal or hexadecimal byte values. Fix is not backwards
compatible.

Gak.

You mean this will break:

<?php
$mask = 0xf0;
$value = $_POST['foo'] & $mask;
?>

because of Unicode?

That's nuts.

That can't be right...

No, that shouldn't break. $mask is an int, and the other operand with &
etc. would also be converted to int, so it should be the same whether
$_POST['foo'] is a binary string or Unicode.
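A minimal sketch of Matt's point, assuming PHP 7 or later (plain strings there are byte strings, as in PHP 5). The value "255" is a made-up stand-in for $_POST['foo']. With an int on one side, & converts the string to an int first; only string & string works byte by byte, and that byte-wise case is the one whose meaning depends on how the string is stored internally.

```php
<?php
// Made-up stand-in for $_POST['foo']; form values always arrive as strings.
$input = "255";
$mask  = 0xf0;

// int & string: the numeric string is converted to int(255) first.
$value = $input & $mask;
var_dump($value); // int(240)

// string & string: & is applied to each pair of bytes instead.
$bytes = "\xFF\x0F" & "\xF0\xF0";
echo bin2hex($bytes), "\n"; // f000
```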

And I don't understand the previous message about \[0-7]{1,3} and
\x[0-9A-Fa-f]{1,2} (inside of strings, that means) referring to Unicode code
points. I think octal and hex escapes work the same in Unicode mode...

Scripts can't match bytes. How are they supposed to check if a string is in
plain ascii or in 8bit? Do conversion to ASCII and check for errors
instead of looking for 8bit byte values? How can scripts replace 8bit
bytes with some other strings? ISO-8859-2 decoding table contains 95
entries written and evaluated as binary strings. Same thing applies to
other iso-8859 and windows-125x character sets. iso-8859-1 and utf-8
decoding does not use mapping tables and performs complex calculations
with byte values. multibyte character set decoding might actually benefit
from unicode_encode(), if Table 325 (http://www.php.net/unicode) provides
more information about U_INVALID_SUBSTITUTE and other unicode.* settings.

I don't even understand this.

But if I haven't done something new-fangled to make a string be some
new-fangled Unicode thingie, then it's just plain old ASCII, no?

Or PHP can just assume that anyway...

No, that's basically the issue that this thread is about -- that when
unicode.semantics=On, even though you haven't done anything new-fangled
with Unicode, it IS Unicode regardless (unless binary strings are explicitly
used). That's how things may behave differently all of a sudden.

Did you see my message a couple weeks ago?:
http://marc.info/?l=php-dev&m=118234541809801&w=2 Seems to me it would be
great if any new Unicode stuff had to be explicitly specified, though
internally Unicode would always be there ready to use, regardless of a
setting, and old code would continue to work as before.

What do you think? I'd hoped for some replies about it, since I also have
some ideas about possible internals concerns...

[...]

PHP6 could introduce new Unicode aware functions, but the Unicode
implementation chose to modify existing ones. All low level string
operations ($string[1]) are Unicode aware by default, rather than only
when the script actually asks for it. Such an implementation is designed
for developers who don't care about Unicode support and want it out of
the box without any changes in their Unicode unaware scripts. It is not
designed for developers that actually need it and want to have code
working in PHP6 and PHP4/5.

But an old script ought to just work...

Again, not necessarily if the Unicode switch is on.

Matt

18 years ago by Stanislav Malyshev

You mean this will break:

<?php
$mask = 0xf0;
$value = $_POST['foo'] & $mask;
?>

because of Unicode?

I'd say it won't do what it did before. Though I'm not sure bit
operations on unicode make any sense at all... The problem here is the
requirement conflict - how PHP can possibly know if $_POST['foo'] is a
bit field or unicode string?

But if I haven't done something new-fangled to make a string be some
new-fangled Unicode thingie, then it's just plain old ASCII, no?

Or PHP can just assume that anyway...

It can't if we want to keep UTF-16. UTF-16 unlike UTF-8 is not
compatible with ascii. We could have some "smart downgrade" attempt -
Python 2 currently does something like this - but it won't work in all
situations.
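The following sketch shows why UTF-16 is not ASCII-compatible. It is written for PHP 7+ and uses a hypothetical helper, utf16be_encode(), that is built only on pack(). ASCII text keeps the same bytes in UTF-8, but in UTF-16 every character in the Basic Multilingual Plane, ASCII included, takes two bytes. Byte-oriented code written for ASCII therefore cannot read UTF-16 unchanged.

```php
<?php
// Hypothetical helper: encode a list of Unicode code points as UTF-16BE.
function utf16be_encode(array $codePoints): string
{
    $out = '';
    foreach ($codePoints as $cp) {
        if ($cp < 0x10000) {
            // BMP code point: a single 16-bit unit, big-endian.
            $out .= pack('n', $cp);
        } else {
            // Supplementary code point: split into a surrogate pair.
            $cp -= 0x10000;
            $out .= pack('n', 0xD800 | ($cp >> 10));
            $out .= pack('n', 0xDC00 | ($cp & 0x3FF));
        }
    }
    return $out;
}

$ascii = "Hi";                                 // bytes 48 69 (same in UTF-8)
$utf16 = utf16be_encode([ord('H'), ord('i')]); // bytes 00 48 00 69

echo bin2hex($ascii), "\n"; // 4869
echo bin2hex($utf16), "\n"; // 00480069
```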

But an old script ought to just work...

Sometimes it's not possible - if you use the same variable as string and
bitfield, and bit representation of the string changes, it can't just
work anymore, something needs to be done to bring them together.

Unicode code points can be defined with \u, but PHP6 breaks existing octal
and hex escape sequences.

I don't understand what this means...

Stanislav Malyshev, Zend Software Architect
stas@zend.com http://www.zend.com/
(408)253-8829 MSN: stas@zend.com

18 years ago by Tomas Kuliavas

Unicode code points can be defined with \u, but PHP6 breaks existing
octal and hex escape sequences.

I don't understand what this means...

PHP6.0-200707060630

unicode.fallback_encoding => 'utf-8' => 'utf-8'
unicode.filesystem_encoding => no value => no value
unicode.http_input_encoding => 'utf-8' => 'utf-8'
unicode.output_encoding => 'utf-8' => 'utf-8'
unicode.runtime_encoding => 'utf-8' => 'utf-8'
unicode.script_encoding => 'utf-8' => 'utf-8'
unicode.semantics => On => On
unicode.stream_encoding => UTF-8 => UTF-8

--- test.php ---
<?php
$string1 = "ą";
$string2 = "\xC4\x85";
var_dump($string1 == $string2);
var_dump(preg_match("/[\240-\377]/",$string1));
var_dump(preg_match("/[\240-\377]/",$string2));
?>

ą is in utf-8 (latin small letter a with ogonek, latin extended-a range).
It contains two bytes with 0xC4 0x85 values.

Expected result and actual result for php 5.2.0:

bool(true)
int(1)
int(1)

"/[\240-\377]/" range should match 0xC4 byte.

Actual result (PHP6):

bool(false)
int(0)
int(1)
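Current PHP releases have no unicode.semantics switch. However, PCRE's /u modifier shows the same split between bytes and code points. This is a sketch for PHP 7+, not a reproduction of the PHP 6 snapshot. Without /u, Tomas's octal range \240-\377 (bytes 0xA0 to 0xFF) matches the lead byte 0xC4. With /u, the subject is read as UTF-8 and ą is the single code point U+0105. The equivalent range \x{A0}-\x{FF} then does not match, which parallels the int(0) above.

```php
<?php
$s = "\u{105}"; // "ą" (U+0105), stored as the UTF-8 bytes C4 85

// Byte mode: PHP turns \240 and \377 into the raw bytes 0xA0 and 0xFF.
$byteMatch = preg_match("/[\240-\377]/", $s);
var_dump($byteMatch); // int(1): the byte 0xC4 is in range

// UTF-8 mode: the class now covers code points U+00A0..U+00FF.
$charMatch = preg_match('/[\x{A0}-\x{FF}]/u', $s);
var_dump($charMatch); // int(0): U+0105 is outside that range
```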

18 years ago by Stanislav Malyshev

--- test.php ---
<?php
$string1 = "ą";
$string2 = "\xC4\x85";
var_dump($string1 == $string2);

How do you expect a one-character string to be equal to a two-character string?

ą is in utf-8 (latin small letter a with ogonek, latin extended-a range).
It contains two bytes with 0xC4 0x85 values.

It contains two bytes in the filesystem. It however contains one
character in PHP. In unicode mode, bytes and characters are different
things. You could make $string2 as binary and then convert it from utf-8
to unicode, but without explicitly saying otherwise that string contains
two characters - U+00C4 (LATIN CAPITAL LETTER A WITH DIAERESIS) and
U+0085 (control character, no name). It doesn't mean escape sequences
stop working, it means characters and bytes are no more the same. That's
the price one has to pay for doing unicode.
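Stas's distinction between bytes and characters can be sketched in current PHP, where strings remain byte strings. The sketch needs PHP 7+ for the \u{...} escape, which produces UTF-8. The escapes "\xC4\x85" give exactly the bytes of ą. strlen() counts two bytes, while a /u regex counts one code point. Reading C4 and 85 as code points, as unicode mode does, yields the different two-character string U+00C4 U+0085.

```php
<?php
$literal  = "\u{105}";      // "ą" (U+0105), UTF-8 encoded as C4 85
$escaped  = "\xC4\x85";     // the same two bytes written as hex escapes
$twoChars = "\u{C4}\u{85}"; // code points U+00C4 and U+0085, UTF-8 encoded

var_dump($literal === $escaped);             // bool(true): identical bytes
var_dump(strlen($literal));                  // int(2): strlen() counts bytes
var_dump(preg_match_all('/./su', $literal)); // int(1): one code point
var_dump($literal === $twoChars);            // bool(false)
echo bin2hex($twoChars), "\n";               // c384c285
```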

18 years ago by Tomas Kuliavas

--- test.php ---
<?php
$string1 = "ą";
$string2 = "\xC4\x85";
var_dump($string1 == $string2);

How do you expect a one-character string to be equal to a two-character string?

In PHP4/5 \xC4 and \x85 are not characters. They are bytes.

ą is in utf-8 (latin small letter a with ogonek, latin extended-a
range). It contains two bytes with 0xC4 0x85 values.

It contains two bytes in the filesystem. It however contains one
character in PHP. In unicode mode, bytes and characters are different
things. You could make $string2 as binary and then convert it from utf-8
to unicode, but without explicitly saying otherwise that string contains
two characters - U+00C4 (LATIN CAPITAL LETTER A WITH DIAERESIS) and
U+0085 (control character, no name). It doesn't mean escape sequences
stop working, it means characters and bytes are no more the same. That's
the price one has to pay for doing unicode.

I can't pay such price. You are reducing available coding options and want
me to rely on your functions when existing code was doing fine without
unicode support and your functions are not documented
(http://www.php.net/unicode) and don't provide the way to see the
difference between 7bit and 8bit string. Theoretically I might call
unicode_encode() with ascii target, but doing charset conversions just to
detect 8bit is a hack and not a solution.
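For byte strings (PHP 4/5, or binary strings in PHP 6), the 7-bit versus 8-bit test Tomas asks about needs no charset conversion. Here is a minimal sketch with a hypothetical helper name. Any byte in the range 0x80-0xFF means the string is not plain ASCII.

```php
<?php
// Hypothetical helper: true if the byte string contains any byte >= 0x80.
function has_8bit_bytes(string $s): bool
{
    // No /u modifier, so PCRE compares raw bytes.
    return preg_match('/[\x80-\xFF]/', $s) === 1;
}

var_dump(has_8bit_bytes("plain ascii")); // bool(false)
var_dump(has_8bit_bytes("\xC4\x85"));    // bool(true)
```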

If I take a look at ext/unicode/unicode.c, I see more PHP_FUNCTION
functions. I don't know PHP6 release schedule. If PHP6 is approaching RC
stage, maybe docs can be updated to inform about these functions. PHP
provides API for PHP scripts developers. Strongest API part is good
documentation. I shouldn't have to dig through C sources in order to learn
about available interpreter features. If you write code now and document
it later, you won't document it or it will take some time and lots of bug
reports to sync sources with manual.

I think I'll be able to port scripts to PHP6 unicode.semantics=on.
Currently I am not sure only about POP3 and IMAP streams with data encoded
in different character sets and MIME Q encoding.

--
Tomas

18 years ago by Stanislav Malyshev

In PHP4/5 \xC4 and \x85 are not characters. They are bytes.

They are both. In PHP 5, character and byte is the same. In Unicode,
it's not.

I can't pay such price. You are reducing available coding options and want

Then you can't use Unicode, at least not directly - you would have to
convert all your unicode data back to bytes and work with them on that
level. Unicode works on character level, you want to work on byte level,
so somewhere on the way translation should happen. We will try to make
it easier, but I don't think it's reasonable to expect that code based
on this assumption would work without any changes whatsoever in php 6.

If I take a look at ext/unicode/unicode.c, I see more PHP_FUNCTION
functions. I don't know PHP6 release schedule. If PHP6 is approaching RC

ext/unicode as it is now is very incomplete. It will be improved quite
soon. I don't want to announce things prematurely, but please just have
a little patience and you'll see the improvement.

stage, maybe docs can be updated to inform about these functions. PHP
provides API for PHP scripts developers. Strongest API part is good
documentation. I shouldn't have to dig through C sources in order to learn
about available interpreter features. If you write code now and document
it later, you won't document it or it will take some time and lots of bug
reports to sync sources with manual.

Nobody expects you to dig through C sources, and of course documentation
is important. However the basic assumption of Unicode that characters
and bytes are not the same is something that I wouldn't expect to change.
Of course, having docs that describe common unicode pitfalls and how to
work around them is very important too. I think once we are closer to
releasing it would become higher priority.

18 years ago by Andrei Zmievski

Once again, you're trying to work with bytes inside Unicode strings,
which just does not make sense. What do you propose we do, somehow
automatically detect that you used \x inside a Unicode string and
turn it into a binary one? Or simply allow one to stick any byte
sequence inside what is supposed to be a valid UTF-16 string?

If you're trying to generate a UTF-8 string on a byte by byte basis,
then it needs to be a binary string, I'm sorry. Whether you do this
via being in unicode.semantics=off mode or via using b"" prefix is up
to you.

-Andrei
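The sketch below shows what building UTF-8 "on a byte by byte basis" involves. The helper name is hypothetical. The code targets PHP 7+, where ordinary strings are already byte strings. It assumes a valid code point (no surrogates, nothing above U+10FFFF). Under PHP 6 with unicode.semantics=on, the result would have to be a binary string, as Andrei says.

```php
<?php
// Hypothetical helper: build the UTF-8 byte sequence for one valid code point.
function utf8_bytes_for(int $cp): string
{
    if ($cp < 0x80) {        // 1 byte:  0xxxxxxx
        return chr($cp);
    }
    if ($cp < 0x800) {       // 2 bytes: 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {     // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

echo bin2hex(utf8_bytes_for(0x105)), "\n"; // c485, the bytes of "ą"
```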

unicode.fallback_encoding => 'utf-8' => 'utf-8'
unicode.filesystem_encoding => no value => no value
unicode.http_input_encoding => 'utf-8' => 'utf-8'
unicode.output_encoding => 'utf-8' => 'utf-8'
unicode.runtime_encoding => 'utf-8' => 'utf-8'
unicode.script_encoding => 'utf-8' => 'utf-8'
unicode.semantics => On => On
unicode.stream_encoding => UTF-8 => UTF-8

--- test.php ---
<?php
$string1 = "ą";
$string2 = "\xC4\x85";
var_dump($string1 == $string2);
var_dump(preg_match("/[\240-\377]/",$string1));
var_dump(preg_match("/[\240-\377]/",$string2));
?>

ą is in utf-8 (latin small letter a with ogonek, latin extended-a range).
It contains two bytes with 0xC4 0x85 values.

Expected result and actual result for php 5.2.0:

bool(true)
int(1)
int(1)

"/[\240-\377]/" range should match 0xC4 byte.

Actual result (PHP6):

bool(false)
int(0)
int(1)

18 years ago by Richard Lynch

Once again, you're trying to work with bytes inside Unicode strings,
which just does not make sense.

From our perspective, you've gone and changed a fundamental data
structure out from under us, in a non-backwards-compatible way, and
broken a whole bunch of working code, for a feature we don't use, and
can't turn off [*]

This is said without rancor nor animosity, but to explain why we (and
many users) are going to have a very high wtf factor with this.

[*] Assuming a shared-host environment budget and external factors make
moving to a different host impossible.

I think the PHP core developers frequently forget that there are a LOT
of PHP developers/users out there with severe budget constraints that
just don't have the kinds of resources you are presuming are available
to "solve" the problems being created here...

I can always find a host who will do what I want with enough effort,
but a LOT of users will just give up on PHP 6 and stick with 5 (or 4
even) rather than do that...

18 years ago by Stanislav Malyshev

From our perspective, you've gone and changed a fundamental data
structure out from under us, in a non-backwards-compatible way, and
broken a whole bunch of working code, for a feature we don't use, and
can't turn off [*]

Supporting unicode requires such change. It is a big deal - Unicode does
change the way one thinks about textual information. Text is not a
collection of 8-bit integers anymore. But this step needs to be made if
we want to be able to write applications that deal with modern
environments requiring multi-language and multi-locale support. So PHP 6
is to make this step.

I can always find a host who will do what I want with enough effort,
but a LOT of users will just give up on PHP 6 and stick with 5 (or 4
even) rather than do that...

Maybe. But we have unicode=off option to give them a chance for smoother
transition.

18 years ago by Richard Lynch

You mean this will break:

<?php
$mask = 0xf0;
$value = $_POST['foo'] & $mask;
?>

because of Unicode?

I'd say it won't do what it did before. Though I'm not sure bit
operations on unicode make any sense at all... The problem here is the
requirement conflict - how PHP can possibly know if $_POST['foo'] is a
bit field or unicode string?

I'm starting to be quite concerned about PHP 6 Unicode, then...

Maybe strings should be UTF-8 until declared otherwise or something,
because this just won't fly...

As for how it knows?

I dunno. Aren't there headers to indicate what kind of data is coming
in?

Should there be?

If there aren't, or can't be, then you have to let ME tell you what it
is.

You can't just go assuming I've got UTF-16 data coming in --
especially not when the entire Internet has been built and subsisted
on ASCII (more or less) for over a decade.

But if I haven't done something new-fangled to make a string be some
new-fangled Unicode thingie, then it's just plain old ASCII, no?

Or PHP can just assume that anyway...

It can't if we want to keep UTF-16. UTF-16 unlike UTF-8 is not
compatible with ascii. We could have some "smart downgrade" attempt -
Python 2 currently does something like this - but it won't work in all
situations.

This is nuts.

Anybody who actually NEEDS Unicode ought to be the ones who have to
type a new keyword or something, not the bazillion users who have no
need for Unicode and likely never will...

But an old script ought to just work...

Sometimes it's not possible - if you use the same variable as string and
bitfield, and bit representation of the string changes, it can't just
work anymore, something needs to be done to bring them together.

It's just an ASCII string, same as it's always been.

Don't go changing that out from under users for the zillion lines of
code already written.

If you need some new-fangled UTF-16 datatype stringie, then go ahead
and give yourself one.

But don't change all MY data to UTF-16 when it isn't UTF-16!!!

You've got 10 YEARS of legacy data built up being managed by billions
of scripts.

In what sane world do you suddenly declare all that data isn't ASCII
any more and claim that it's UTF-16 when UTF-16 isn't backwards
compatible with ASCII?

Unicode code points can be defined with \u, but PHP6 breaks existing octal
and hex escape sequences.

I don't understand what this means...

I think I know...

I have code like this, somewhere:

if (preg_match("|[\xF0-\xFF]|", $data)){
$data = un_microsuck($data);
}

un_microsuck() basically detects and converts any of the goof-ball
extended ASCII from MS products (Word, Outlook, etc) to an HTML
equivalent character.
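un_microsuck() itself is not shown in the thread. The following is a hypothetical, deliberately partial sketch of that kind of conversion for byte strings. It maps a few Windows-1252 bytes to HTML numeric entities. In Windows-1252, these "smart" punctuation characters sit in the range 0x80-0x9F. The sketch only works while $data is a byte string.

```php
<?php
// Hypothetical, partial sketch: Windows-1252 bytes -> HTML numeric entities.
function un_microsuck_sketch(string $data): string
{
    static $map = [
        "\x80" => '&#8364;', // euro sign
        "\x85" => '&#8230;', // horizontal ellipsis
        "\x91" => '&#8216;', // left single quotation mark
        "\x92" => '&#8217;', // right single quotation mark
        "\x93" => '&#8220;', // left double quotation mark
        "\x94" => '&#8221;', // right double quotation mark
        "\x96" => '&#8211;', // en dash
        "\x97" => '&#8212;', // em dash
    ];
    // strtr() with an array replaces bytes once, without re-scanning output.
    return strtr($data, $map);
}

echo un_microsuck_sketch("\x93Hi\x94"), "\n"; // &#8220;Hi&#8221;
```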

But now \xF0 isn't going to be ASCII 128 anymore, is it?

Or maybe \xF0 will "work" but the octal \360 won't?

Yikes.

You think PHP 5 adoption rate was slow?

PHP 6 will be GLACIAL if you're changing that much out from under people.

Changing the definition of a string, arguably the most basic data type
in PHP, is not a Good Idea.

I'm sorry not to have spoken up earlier -- I simply failed to
understand what it was anybody was talking about before. :-(

Cripes, now I have to be the curmudgeon who won't let go of PHP 5. :-(

18 years ago by Stanislav Malyshev

Maybe strings should be UTF-8 until declared otherwise or something,
because this just won't fly...

UTF8 would not help you with bits (since nobody guarantees you incoming
data is valid UTF-8) and it's impossible to do any unicode stuff on
utf-8 - you'd have to convert it to utf-16 and back on every step.

I dunno. Aren't there headers to indicate what kind of data is coming
in?

I know of no headers that can tell you "parameter 'foo' in a form is a
bitmask so please do not try to see it as text".

If there aren't, or can't be, then you have to let ME tell you what it
is.

You can. Use binary strings and explicit conversions.

You can't just go assuming I've got UTF-16 data coming in --
especially not when the entire Internet has been built and subsisted
on ASCII (more or less) for over a decade.

Actually, there's an INI parameter that says which encoding the incoming
data is in. The problem is not that - the problem is that PHP can't know
that you pass bit fields inside textual information (and in HTTP all
parameters are textual) so you have to work with it manually.

Anybody who actually NEEDS Unicode ought to be the ones who have to
type a new keyword or something, not the bazillion users who have no
need for Unicode and likely never will...

If they have no need for unicode, why run unicode-enabled PHP? Turn it
off and get all your strings untouched.

It's just an ASCII string, same as it's always been.

IS_STRING

If you need some new-fangled UTF-16 datatype stringie, then go ahead
and give yourself one.

IS_UNICODE

But don't change all MY data to UTF-16 when it isn't UTF-16!!!

Then you can't use unicode mode. Because in Unicode mode the text string
is UTF-16. If it's not a text string, you should tell so, PHP doesn't
have any way to know.

In what sane world do you suddenly declare all that data isn't ASCII
any more and claim that it's UTF-16 when UTF-16 isn't backwards
compatible with ASCII?

Python tried that. In Python 3 they are moving to the model PHP 6 uses.
Must not be that silly an idea, I guess.

But now \xF0 isn't going to be ASCII 128 anymore, is it?

ASCII doesn't have any characters beyond 0x7f AFAIK, but it doesn't
matter, I get what you mean. \xF0 in unicode mode would be U+00F0 of
course. Now how preg_match should handle it depends on preg_match.

18 years ago by Richard Lynch

But now \xF0 isn't going to be ASCII 128 anymore, is it?

ASCII doesn't have any characters beyond 0x7f AFAIK, but it doesn't
matter, I get what you mean. \xF0 in unicode mode would be U+00F0 of
course. Now how preg_match should handle it depends on preg_match.

I should have said "Extended ASCII".

And, unfortunately, there are at least 3 commonly-used "Extended
ASCII" out there, and, yes, this is exactly what Unicode is trying to
solve.

Only problem is, the data coming into most web apps is usually NOT
UTF-16, nor even UTF-8, but "Windows Extended ASCII" (more or less)
and most end users of PHP do not have the luxury of being able to have
a dedicated server.

So they are going to be stuck with their data getting totally munged
into UTF-16 on new PHP installations and, if I'm following this thread
correctly, NOT going to be able to get back to the actual data that
came IN to their web application.

So the ISPs aren't going to install PHP 6 because their users are
going to be screaming at them that it broke their applications.

Or they'll all install it with this goofy non-Unicode mode, in which
case, there's not much point to them having installed it, and y'all
will be effectively maintaining 3 branches:
PHP 5
PHP 6 ASCII
PHP 6 Unicode

Unless you drop PHP 6 ASCII, in which case even fewer will bother to
install PHP 6, not even in unicode.semantics off mode.

Seems to me we're painted into a corner where the number of people who
actually install PHP 6 is going to be abysmally small...

But maybe I'm just being pessimistic.

18 years ago by Tomas Kuliavas

Unicode code points can be defined with \u, but PHP6 breaks
existing octal and hex escape sequences.

I don't understand what this means...

I think I know...

I have code like this, somewhere:

if (preg_match("|[\xF0-\xFF]|", $data)){
$data = un_microsuck($data);
}

un_microsuck() basically detects and converts any of the goof-ball
extended ASCII from MS products (Word, Outlook, etc) to an HTML
equivalent character.

But now \xF0 isn't going to be ASCII 128 anymore, is it?

\xF0 never was ASCII. ASCII (ISO-646) is 7bit character set. \xF0 is
decimal 240. It is 8bit.

Or maybe \xF0 will "work" but the octal \360 won't?

Are you sure that you can't do that by setting unicode.something_encoding
to iso-8859-1 or windows-1252?

--
Tomas

18 years ago by Richard Lynch

Unicode code points can be defined with \u, but PHP6 breaks
existing octal and hex escape sequences.

I don't understand what this means...

I think I know...

I have code like this, somewhere:

if (preg_match("|[\xF0-\xFF]|", $data)){
$data = un_microsuck($data);
}

un_microsuck() basically detects and converts any of the goof-ball
extended ASCII from MS products (Word, Outlook, etc) to an HTML
equivalent character.

But now \xF0 isn't going to be ASCII 128 anymore, is it?

\xF0 never was ASCII. ASCII (ISO-646) is 7bit character set. \xF0 is
decimal 240. It is 8bit.

Don't tell me.

Tell Microsoft.

Cuz I sure as heck get a LOT of input data >> \x7f and I have to do
something reasonable with it...

And I did say "extended ASCII" in the other paragraph, after all...

Or maybe \xF0 will "work" but the octal \360 won't?

Are you sure that you can't do that by setting unicode.something_encoding
to iso-8859-1 or windows-1252?

I dunno.

Doesn't really matter if I can't set those in .htaccess, that's for sure.

[joke type="semi"]
All this work going into Unicode, and nobody is pushing to replace
(CR|CRLF|LF) with a new Unicode all-platform newline character?
[/joke]

18 years ago by Tomas Kuliavas

Unicode code points can be defined with \u, but PHP6 breaks
existing octal and hex escape sequences.

I don't understand what this means...

I think I know...

I have code like this, somewhere:

if (preg_match("|[\xF0-\xFF]|", $data)){
$data = un_microsuck($data);
}

un_microsuck() basically detects and converts any of the goof-ball
extended ASCII from MS products (Word, Outlook, etc) to an HTML
equivalent character.

But now \xF0 isn't going to be ASCII 128 anymore, is it?

\xF0 never was ASCII. ASCII (ISO-646) is 7bit character set. \xF0 is
decimal 240. It is 8bit.

Don't tell me.

Tell Microsoft.

Cuz I sure as heck get a LOT of input data >> \x7f and I have to do
something reasonable with it...

And I did say "extended ASCII" in the other paragraph, after all...

Or maybe \xF0 will "work" but the octal \360 won't?

Are you sure that you can't do that by setting
unicode.something_encoding to iso-8859-1 or windows-1252?

I dunno.

Doesn't really matter if I can't set those in .htaccess, that's for sure.

All unicode. settings except unicode.semantics are PHP_INI_ALL.

From README.UNICODE

Script Encoding

...
If you cannot change the encoding system wide, you can use a pragma to
override the INI setting in a local script:

<?php declare(encoding = 'Shift-JIS'); ?>

--
Tomas

18 years ago by Alexey Zakhlestin

Anybody who actually NEEDS Unicode ought to be the ones who have to
type a new keyword or something, not the bazillion users who have no
need for Unicode and likely never will...

I wonder whom you mean here.
I can't remember many non-unicode internet-sites built during the last 5 years.

German, Spanish, Japanese, Russian…
Internet-shops have titles in these languages, communities have users
with nicknames (at least) in these languages, company-sites are
multilingual these days, etc.

ASCII is probably ok only for adult-sites (where people do not care
about texts) and some intranet-sites.

--
Alexey Zakhlestin
http://blog.milkfarmsoft.com/

18 years ago by Richard Lynch

Anybody who actually NEEDS Unicode ought to be the ones who have to
type a new keyword or something, not the bazillion users who have no
need for Unicode and likely never will...

I wonder whom you mean here.
I can't remember many non-unicode internet-sites built during the last 5 years.

Errrr.

Maybe you've only looked at really humungous corporate sites?

Cuz there are a few million sites in the past 5 years that wouldn't
know what to do with Unicode if it walked up and bit them...

What is the use of "unicode.semantics" in PHP 6?

To get late static binding and namespaces, of course ;)

--

Especially considering this:

Especially considering this:

--

--

I don't see how we are getting somewhere - as before, there are people for removing it and against removing it. Nothing changed, as far as I see. Why suddenly should we start removing anything?

In your eyes - fine. But besides your personal eyes, there is also such thing as consensus, and it wasn't achieved.

I don't see how we are getting somewhere - as before, there are people for removing it and against removing it. Nothing changed, as far as I see. Why suddenly should we start removing anything?

-- Lester Caine - G8HFL

--

Because it's very hard to implement since we'd have to keep 2 copies of all symbol tables.

--

--

mbstring is very, very far from unicode support. Look at ICU API description to see how far :)

It will be. I.e., most of ICU functionality will be implemented as an extension - collators, formatters, etc. etc.

Unless I totally missed what mess you are referring to...

Well, with 7-bit ASCII - yes. With 8-bit "extended ASCII", whatever that means - not exactly. You can have 8-bit strings that aren't valid UTF-8 and can't be translated to UTF-8 without specifying the encoding (iso-8859-1 or something like that).

Robert
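
The point above about 8-bit strings can be made concrete with a minimal sketch in plain PHP (no mbstring or iconv assumed). It checks whether a byte string is valid UTF-8 using PCRE's `u` modifier, under which `preg_match()` returns `false` rather than `0` on malformed input, and converts ISO-8859-1 to UTF-8 by hand, which only works because Latin-1 maps every byte 0xNN to code point U+00NN. The helper names `is_valid_utf8()` and `latin1_to_utf8()` are hypothetical, chosen for illustration.

```php
<?php
// Hypothetical helper: true if $bytes is well-formed UTF-8.
// With the /u modifier, preg_match() returns false (not 0) on invalid UTF-8.
function is_valid_utf8(string $bytes): bool
{
    return preg_match('//u', $bytes) === 1;
}

// Hypothetical helper: read $bytes as ISO-8859-1 and re-encode as UTF-8.
// Latin-1 byte 0xNN is code point U+00NN, so each high byte becomes the
// two-byte UTF-8 sequence 110000xx 10xxxxxx.
function latin1_to_utf8(string $bytes): string
{
    $out = '';
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {
            $out .= chr($b);                 // ASCII passes through unchanged
        } else {
            $out .= chr(0xC0 | ($b >> 6))    // lead byte (0xC2 or 0xC3)
                  . chr(0x80 | ($b & 0x3F)); // continuation byte
        }
    }
    return $out;
}

$latin1 = "caf\xE9";                 // "café" stored as ISO-8859-1
var_dump(is_valid_utf8($latin1));    // bool(false): a lone 0xE9 is not UTF-8
$utf8 = latin1_to_utf8($latin1);
var_dump(bin2hex($utf8));            // string(10) "636166c3a9"
var_dump(is_valid_utf8($utf8));      // bool(true)
```

In practice `mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1')` or `iconv('ISO-8859-1', 'UTF-8', $s)` does the same job; the loop just makes explicit that the source encoding has to be known up front, since the same byte 0xE9 would decode to a different character under, say, Windows-1251.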

Would it also be worth checking some of the frameworks too? Prado, eZ, Zend?

I don't understand what this means...

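--- test.php ---
<?php
// assumes test.php itself is saved as UTF-8
$string1 = "ą";
$string2 = "\xC4\x85";   // the UTF-8 bytes of "ą" (U+0105)
var_dump($string1 == $string2);
// "\240" and "\377" are octal escapes: bytes 0xA0 and 0xFF
var_dump(preg_match("/[\240-\377]/", $string1));
var_dump(preg_match("/[\240-\377]/", $string2));
?>

Expected result, and actual result with PHP 5.2.0:

bool(true)
int(1)
int(1)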

Actual result (PHP6):

Maybe. But we have the unicode=off option to give them a chance at a smoother transition.

ASCII doesn't have any characters beyond 0x7f AFAIK, but it doesn't matter, I get what you mean. \xF0 in unicode mode would be U+00F0 of course. Now how preg_match should handle it depends on preg_match.
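
The test script's result hinges on byte versus code-point matching. The double-quoted pattern "/[\240-\377]/" contains the bytes 0xA0 through 0xFF, and "ą" (U+0105) is stored as the UTF-8 bytes C4 85, so the class matches the lead byte 0xC4. A minimal sketch of the difference, using PCRE's `u` modifier in PHP 5 and later as a stand-in for code-point semantics (an assumption for illustration; it is not how PHP 6's unicode mode was implemented):

```php
<?php
// "ą" is U+0105; in UTF-8 it is the two bytes C4 85.
$a_ogonek = "\xC4\x85";
// "ð" is U+00F0; in UTF-8 it is the two bytes C3 B0.
$eth = "\xC3\xB0";

// Byte semantics (what the test script relies on): [\xA0-\xFF] matches
// any single byte from 0xA0 to 0xFF, and the lead byte 0xC4 qualifies.
var_dump(preg_match('/[\xA0-\xFF]/', $a_ogonek));       // int(1)

// Code-point semantics via /u: the class now means U+00A0..U+00FF.
// U+0105 lies outside that range, so there is no match...
var_dump(preg_match('/[\x{A0}-\x{FF}]/u', $a_ogonek));  // int(0)
// ...while U+00F0 lies inside it.
var_dump(preg_match('/[\x{A0}-\x{FF}]/u', $eth));       // int(1)
```

So the same class can match "ą" under byte semantics and miss it under code-point semantics, which is exactly where the two modes can disagree on an unchanged script.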

From README.UNICODE

Script Encoding

<?php declare(encoding = 'Shift-JIS'); ?>
