Hi,
I agree that having such a switch is not going to be a good strategy. The main reason is the headache application authors are going to have with compatibility especially when it comes to hosted pre-configured environments and/or dedicated servers that run more than one application.
I think these are the main reasons, not maintenance costs, because at least 80%, if not more, of the dual code in PHP today will still be needed after removing the switch: we will still support both the binary and Unicode types. So for those who think this means we only support one data type in PHP extensions and don't need to port 4000 PHP functions, that is not the reality. We will still need to deal with IS_STRING and IS_UNICODE in the extensions.
As we have discussed in the past the migration path may be extremely hard moving from PHP 5 to PHP 6. Therefore the community has to come together and really invest in the migration path more than we have in the past (like we did from version 2 to 3). This means that during the development process we do port applications to PHP 6 and make sure to capture that process for our users. Preferably we can have automated migration scripts, php.ini with defaults that allow for easy migration and well-documented migration steps.
Depending on how much pain we discover I think we should still agree to be open to the idea that Unicode strings would be explicit (e.g. u"foo") as opposed to the default (someone else mentioned it on this thread). Obviously this would depend on what we find when we work on the migration methodology as there's not enough input to make that judgment call today. Also we will see a lot of pain when it comes to anything which is coming in from outside PHP, so we may very well see changes in those areas as the breakages become more apparent (e.g. file system, GPC ...). Performance benchmarks may also come in handy as a data point for whether we go down the explicit or implicit route for Unicode strings.
I also think that one thing we are doing right which Perl didn't is the notion of PHP 5.3 where we are able to more incrementally release a subset of the major features. This reduces the amount of moving parts, the migration path and ultimately the risk for delivering a high quality PHP 6. Also users get incremental value while we work on stabilizing PHP 6 and deal with issues like the migration path.
Net, net - I agree that the unicode semantics switch is going to bring along more harm than good in the long term. Let's just not fool ourselves with what we have to get done in order to make PHP 6 a success. Hopefully a lot of the new active people on this list can step up and help the rest of the core team both with porting functions to PHP 6 and also help with the methodology.
Andi
Hi Andi,
> As we have discussed in the past the migration path may be extremely hard
> moving from PHP 5 to PHP 6. Therefore the community has to come together
> and really invest in the migration path more than we have in the past
> (like we did from version 2 to 3). This means that during the development
> process we do port applications to PHP 6 and make sure to capture that
> process for our users. Preferably we can have automated migration scripts,
> php.ini with defaults that allow for easy migration and well-documented
> migration steps.
We're still getting people through the idea of switching to PHP 5 from where
I stand. There is plenty of time to think about stuff and etc. for PHP 6.
> Depending on how much pain we discover I think we should still agree to be
> open to the idea that Unicode strings would be explicit (e.g. u"foo") as
> opposed to the default (someone else mentioned it on this thread).
'Unicode strings would be explicit' is one thing, a Unicode mode that messes
up existing code is quite another. So you're looking at keeping the support
dual but changing the userland approach to it, did I hear you right?
> Obviously this would depend on what we find when we work on the migration
> methodology as there's not enough input to make that judgment call today.
> Also we will see a lot of pain when it comes to anything which is coming
> in from outside PHP, so we may very well see changes in those areas as
> the breakages become more apparent (e.g. file system, GPC ...).
Add to that 'Apache, ICU'. Speaking just from the doze build perspective,
we're caught between a rock and a hard place there.
<deleted weird stuff />
> Performance benchmarks may also come in handy as a data point for whether
> we go down the explicit or implicit route for Unicode strings.
> Net, net - I agree that the unicode semantics switch is going to bring
> along more harm than good in the long term. Let's just not fool ourselves
> with what we have to get done in order to make PHP 6 a success. Hopefully
> a lot of the new active people on this list can step up and help the rest
> of the core team both with porting functions to PHP 6 and also help with
> the methodology.
So can it go away now please? It's the switch everybody definitely wants to
lose, not the code itself. The code itself, the dev team probably won't want
but the rest just might. (Opinion.) And as you mentioned earlier... until
people try it out a bit, nobody can draw any sane conclusions with regard to
its usefulness anyway. The biggest point is that The Switch actually has
been tested and found wanting, even at this early stage. Read: it's too
sudden and too soon for it to be implicit. (Opinion. Oh that's two. Guess
I'm opinionated then. Or does that require eight?)
- Steph
> Andi
> 'Unicode strings would be explicit' is one thing, a Unicode mode that
> messes up existing code is quite another. So you're looking at keeping
> the support dual but changing the userland approach to it, did I hear
> you right?
I think the idea was "no php.ini switch, but the question what "foo"
should produce - IS_UNICODE or IS_STRING is still open for consideration".
Stanislav Malyshev, Zend Software Architect
stas@zend.com http://www.zend.com/
(408)253-8829 MSN: stas@zend.com
> I think the idea was "no php.ini switch, but the question what "foo"
> should produce - IS_UNICODE or IS_STRING is still open for consideration".
"foo" alone should produce IS_STRING. The real question IMHO is how far back
do you backport tolerance for a unicode cast.
- Steph
I see I may not have been clear in my previous email.
Indeed as Stas mentioned I agree we should not have a php.ini switch, i.e. unicode.semantics goes away.
At the same time I propose:
a) We invest considerable energy in figuring out and documenting the migration path.
b) We build automated migration scripts when possible (like we did in PHP/FI 2 -> PHP 3 ) to make the migration easier.
c) We keep an open mind regarding some areas which could make it easier to adopt especially when the net benefits only affect a smaller audience or audiences which already today are used to working harder with various languages (e.g. fgets() returning IS_STRING by default, "foo" produces IS_STRING, and u"foo" produces IS_UNICODE, ...). These are just examples and I think reality will depend on (a).
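As a rough illustration of the explicit-versus-default question in (c), this is how Python 3 (not PHP) draws the same line between the two string types. There a plain literal is a Unicode string and binary data needs an explicit b prefix, which is the mirror image of the u"foo" option. The variable names, the helper and the sample word are made up for the example; none of this is PHP 6 syntax or behaviour.

```python
# Analogy only: Python 3 semantics, not PHP 6.

# A plain literal is a Unicode string (str) by default.
# The escapes spell the Swedish word "blåbär" (6 code points).
text = "bl\u00e5b\u00e4r"

# Binary strings (bytes) need the explicit b"" prefix.
# These are the UTF-8 bytes of the same word (8 bytes).
raw = b"bl\xc3\xa5b\xc3\xa4r"


def describe(value):
    """Return the type name and length of a string-like value."""
    return type(value).__name__, len(value)


text_info = describe(text)     # length counts code points
raw_info = describe(raw)       # length counts bytes
decoded = raw.decode("utf-8")  # crossing from binary to Unicode is explicit
```

The two lengths differ for the same word, which is the kind of difference code coming from "outside" (files, request data) would run into under either default.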
I think the current core development team will definitely need help from all the lurkers on this list. There seems to be plenty of energy for long email threads so let's convert that energy into some productive contributions around function upgrades and migration path :)
I don't think this affects PHP 5.3 (http://wiki.pooteeweet.org/PhP53VoteResult) which I believe we're making good progress on. It allows us to get some of those features out earlier including things like namespaces which the various framework communities badly need. It will also allow many users who need ICU functionality to start using APIs which will be forward compatible with PHP 6.
Andi
-----Original Message-----
From: Steph Fox [mailto:steph@zend.com]
Sent: Monday, January 21, 2008 5:30 PM
To: Stas Malyshev
Cc: Andi Gutmans; Antony Dovgal; internals@lists.php.net
Subject: Re: [PHP-DEV] why we must get rid of unicode.semantics switch ASAP
> I think the idea was "no php.ini switch, but the question what "foo"
> should produce - IS_UNICODE or IS_STRING is still open for
> consideration".
"foo" alone should produce IS_STRING. The real question IMHO is how far
back do you backport tolerance for a unicode cast.
- Steph
> I see I may not have been clear in my previous email.
> Indeed as Stas mentioned I agree we should not have a php.ini
> switch, i.e. unicode.semantics goes away.
> At the same time I propose:
> a) We invest considerable energy in figuring out and documenting
> the migration path.
> b) We build automated migration scripts when possible (like we did
> in PHP/FI 2 -> PHP 3 ) to make the migration easier.
> c) We keep an open mind regarding some areas which could make it
> easier to adopt especially when the net benefits only affect a
> smaller audience or audiences which already today are used to
> working harder with various languages (e.g. fgets() returning
> IS_STRING by default, "foo" produces IS_STRING, and u"foo" produces
> IS_UNICODE, ...). These are just examples and I think reality will
> depend on (a).
a) - Absolutely
b) - Absolutely
c) - I really, really, really, really x 100 do not believe that
binary strings should be the default and that Unicode strings need
explicit syntax. I've written about this many times before, so I
won't reiterate, but please let's be progressive in this area. Even
Python is moving towards Unicode strings being the default.
-Andrei
See below:
-----Original Message-----
From: Andrei Zmievski [mailto:andrei@gravitonic.com]
Sent: Monday, January 21, 2008 8:23 PM
To: Andi Gutmans
Cc: Steph Fox; Stas Malyshev; Antony Dovgal; internals@lists.php.net
Subject: Re: [PHP-DEV] why we must get rid of unicode.semantics switch ASAP
> > I see I may not have been clear in my previous email.
> > Indeed as Stas mentioned I agree we should not have a php.ini
> > switch, i.e. unicode.semantics goes away.
> > At the same time I propose:
> > a) We invest considerable energy in figuring out and documenting
> > the migration path.
> > b) We build automated migration scripts when possible (like we did
> > in PHP/FI 2 -> PHP 3 ) to make the migration easier.
> > c) We keep an open mind regarding some areas which could make it
> > easier to adopt especially when the net benefits only affect a
> > smaller audience or audiences which already today are used to
> > working harder with various languages (e.g. fgets() returning
> > IS_STRING by default, "foo" produces IS_STRING, and u"foo" produces
> > IS_UNICODE, ...). These are just examples and I think reality will
> > depend on (a).
> a) - Absolutely
> b) - Absolutely
> c) - I really, really, really, really x 100 do not believe that
> binary strings should be the default and that Unicode strings need
> explicit syntax. I've written about this many times before, so I
> won't reiterate, but please let's be progressive in this area. Even
> Python is moving towards Unicode strings being the default.
Hi Andrei,
I understand there are also pros to this approach which is why I think
this is so closely dependent on (a) and (b). For now we can leave it
like this and strive to make it work, I am just noting that we should
keep an open mind.
By the way, I especially met with Guido van Rossum to discuss this and
also heard Python's story from the source. They are still in the process
of actually going down this route so it's still early to tell what'll
happen.
Andi
> I don't think this affects PHP 5.3
> (http://wiki.pooteeweet.org/PhP53VoteResult) which I believe we're
> making good progress on. It allows us to get some of those features
> out earlier including things like namespaces which the various
> framework communities badly need. It will also allow many users who
> need ICU functionality to start using APIs which will be forward
> compatible with PHP 6.
Yeah, I really think our focus should be to put as many of the
non-BC-breaking features originally targeted for 6.0 into PHP 5.3 as
we can. This seems like the best thing for our users, even if that
means people will be slower in migrating to PHP 6.
@Johannes: Are we ready to start building some momentum around 5.3
(and thereby indirectly towards 6.0)?
regards,
Lukas
Andi Gutmans wrote:
> I think the current core development team will definitely need help from all the lurkers on this list.
Abstract:
From the view of someone whose main job with PHP is teaching it to
absolute beginners, with little or no previous programming experience,
and who are relatively unaware of encoding issues as well...
And who is Swedish, where we daily use å, ä and ö. And who has students
immigrated from all over Europe as well as from Asia (think French and
Spanish cedillas, Cyrillic, Arabic, Kurdish and Hindi...)
PHP 5.3: Invest as much as possible in it!
PHP 6.0: Release with no switch. Default either way, but no switch, IMHO!
Maybe, just maybe, release 6.0 defaulting to ISO as a transitional
release, and then 6.1 defaulting to Unicode...
Argument:
(Having read every single mail in this thread as a lurker...)
There are benefits to both sides. No switch makes for a more homogeneous
environment, but defaulting to Unicode will be troublesome as well.
However, teaching three settings (PHP 5, 6/off, 6/on) will be the worst
nightmare: "This is a string, BTW, it might be Unicode, it might not be... "
Changing default from ISO to Unicode in a point release is a much easier
concept to teach, than having two settings in php.ini. Not to mention my
nightmare of having students working at school on the server I control,
which might have a different setting from the server they experimentally
have set up at home, or from a server on a web hotel they are using,
because someone they know, or someone who answered in a forum
recommended it...
Besides teaching I also have my own libraries. My main problem with
those will not be maintaining different versions. I will choose to work
only with web hotels that have the setting I prefer.
I also work on a Swedish language project that currently sits (and I
have no say in this) on a patched PHP 4.1.7 server. However, upgrading
that project to Unicode is a much smaller issue than upgrading to the
object model in PHP 5.
http://www.youtube.com/watch?v=juOQhTuzDQ0
</rant>
Lars Gunther
Who is also against MS for inserting a stupid metatag switch in IE8.
I'd never have thought that in a single week I would disagree with both
Jeffrey Zeldman and Rasmus Lerdorf! What is the world coming to?