6 reasons why we must get rid of The Switch ASAP
- it gives users a false sense of "compatibility" when no compatibility is even planned;
- it's supposed to mean compatibility, but can be changed only in php.ini, which
  means users would still have to maintain 2 versions of their software:
  one for On and a second for Off.
- 2+ times bigger codebase [1] (with lots of duplicates because we have to do
  the same things in native and unicode modes);
- increases the maintenance costs a lot [2];
- this is yet another reincarnation of the ze1_compatibility switch.
  I believe most of the people here agree it was a total failure - an untested, unneeded and,
  most importantly, non-working thing that complicates users' and developers' lives.
  Those who want compatibility may and will stay with PHP5 forever,
  those who need Unicode support will use PHP6.
- we need to remove the switch ASAP and make PHP6 Unicode-only before people spend
  their time doing useless "compatibility ports" of their applications.
[1] http://cvs.php.net/viewvc.cgi/php-src/ext/standard/string.c?revision=1.664&view=markup
Don't click this link if you want to sleep well today.
[2] Here is a typical problem: http://bugs.php.net/bug.php?id=42861
Try to fix it without looking at the solution and you'll see what I mean.
--
Wbr,
Antony Dovgal
> 6 reasons why we must get rid of The Switch ASAP
Couldn't agree more!
Regards
Marco
> - it's supposed to mean compatibility, but can be changed only in php.ini, which
> means users would still have to maintain 2 versions of their software:
> one for On and second for Off.
I think the biggest issue for anyone writing software is the
fact that it can only be changed in php.ini. It may well be fine if
it can be set on a per-request basis (though there will still be
issues with that, for example software libraries that have to cope
with both types of request).
> - 2+ bigger codebase [1] (with lots of duplicates because we have to do
> same things in native and unicode modes);
From the cross-reference I assume you mean PHP's codebase. We still
need binary string support — Unicode is only suitable for textual
content. Images, for example, are binary data and we still need binary
strings for them. Not everything people deal with in PHP is a textual
string.
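To make the text/binary distinction concrete, here is a minimal sketch. It uses Python 3 as a stand-in, since its `bytes` and `str` types are already separate; it is not PHP 6 code. The byte values are the standard 8-byte PNG signature, and the helper name `looks_like_text` is hypothetical.

```python
# PNG files start with this fixed 8-byte signature; it is not valid UTF-8.
PNG_SIGNATURE = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])


def looks_like_text(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Textual content: one character can occupy several bytes once encoded.
word = "na\u00efve"                 # 5 characters, one of them non-ASCII
word_bytes = word.encode("utf-8")   # 6 bytes in UTF-8

# Binary content has a byte length only; there is no character count to ask for.
signature_length = len(PNG_SIGNATURE)
```

Treating the image bytes as text fails at the first byte, which is why a binary string type still has to exist next to a Unicode one.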
--
Geoffrey Sneddon
<http://gsnedders.com/>
Hello Antony,
+1 + thanks, it is simply a pain in the ass to develop with
- It alone is responsible for at least 10% slowdown.
marcus
Monday, January 21, 2008, 3:38:00 PM, you wrote:
> - this is yet another reincarnation of ze1_compatibility switch.
Which is the worst mistake ever imo - If you wanted PHP 4 you would simply
use PHP 4. Now if you want PHP 5 just damn use PHP 5.
Best regards,
Marcus
> > - this is yet another reincarnation of ze1_compatibility switch.
> Which is the worst mistake ever imo - If you wanted PHP 4 you would simply
> use PHP 4. Now if you want PHP 5 just damn use PHP 5.
And if you don't control PHP version used by end user? Only bad in-house
apps are written for one specific PHP version and setup.
--
Tomas
Hello Tomas,
your point being? Without the requested change here you would have one
more version, resulting in PHP 5.*, PHP 6.*-unicode, PHP6.*-native.
marcus
Best regards,
Marcus
> > And if you don't control PHP version used by end user? Only bad in-house
> > apps are written for one specific PHP version and setup.
> your point being? Without the requested change here you would have one
> more version, resulting in PHP 5.*, PHP 6.*-unicode, PHP6.*-native.
there is only a little difference between php5 and php6 with
unicode.semantics off. The design of php6 with unicode.semantics on is broken. It
replaces standard functions that worked the same way through all PHP4-5
versions and forces use of code that is totally backwards incompatible.
binary and unicode typecasting triggers E_PARSE errors in older PHP
versions. I can't mix PHP6 and older PHP code in one script or library or
function.
PHP introduced changes similar to unicode.semantics=on with mbstring
function overloading. When I learned about it, I stopped trusting ereg
and string functions. With mbstring overloading I still have options to
disable the broken design. With unicode semantics I am forced to use features
provided by the interpreter instead of doing things the way I want and having
better control over the script.
PHP with unicode.semantics on is more suitable for novice developers who
are not familiar with character sets and for lazy developers who want their
PHP4-5 code to become Unicode aware without any changes on their side. If
PHP4-5 code works with multiple charsets and 8bit data, it will break in
PHP6.
I don't care if you remove this setting. I'll find a way to make my code
work, but don't expect me to remain silent if you say that it is a good
thing. It is good for the PHP codebase. It is not good for developers of
portable PHP scripts. Removing the setting forces developers to drop 5.2.0 and older
versions or to maintain two library versions. If the setting remains,
developers can ask to turn it off. PHP_INI_SYSTEM is php.ini and
httpd.conf.
--
Tomas
Tomas Kuliavas wrote:
> Removal of setting forces developers to drop 5.2.0 and older
> versions or to maintain two library versions. If setting remains,
> developers can ask to turn it off. PHP_INI_SYSTEM is php.ini and
> httpd.conf.
And most people on shared servers don't have access to httpd.conf. As
long as it's not PHP_INI_PERDIR, unicode.semantics will never be an
acceptable solution. In my opinion, even if it were PERDIR, it still
wouldn't be an acceptable solution as you'll still have portability
problems, either way.
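The scope argument above can be sketched as a small lookup table. This is an illustration in Python, not PHP code: the mode names are PHP's changeability modes, but the location labels, the `can_change` helper, and the assumption that a shared-hosting customer can reach only `.htaccess` and `ini_set()` are all introduced here for the example.

```python
# Where a setting may be changed, by changeability mode.
# Location names are descriptive labels for this sketch, not PHP identifiers.
ALLOWED_LOCATIONS = {
    "PHP_INI_SYSTEM": {"php.ini", "httpd.conf"},
    "PHP_INI_PERDIR": {"php.ini", "httpd.conf", ".htaccess"},
    "PHP_INI_ALL": {"php.ini", "httpd.conf", ".htaccess", "ini_set()"},
}

# Assumption: what a typical shared-hosting customer is able to edit.
SHARED_HOST_ACCESS = {".htaccess", "ini_set()"}


def can_change(mode: str, reachable: set) -> bool:
    """True if at least one reachable location is allowed for this mode."""
    return bool(ALLOWED_LOCATIONS[mode] & reachable)


system_ok = can_change("PHP_INI_SYSTEM", SHARED_HOST_ACCESS)
perdir_ok = can_change("PHP_INI_PERDIR", SHARED_HOST_ACCESS)
```

Under these assumptions a PHP_INI_SYSTEM setting is out of reach for the shared-hosting customer, while a PHP_INI_PERDIR one is reachable. Code that must run on both kinds of host still has to cope with either value, which is the portability problem described above.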
--
Jeremy Privett
C.E.O. & C.S.A.
Omega Vortex Corporation
Web: http://www.omegavortex.net
E-Mail: jeremy@omegavortex.net
> there is only a little difference between php5 and php6 with
> unicode.semantics off. php6 with unicode.semantics on design is broken. It
> replaces standard functions that worked same way through all PHP4-5
> versions and forces use of code that is totally backwards incompatible.
I wholly agree that this is idiotic — I'd much rather see an approach
to Unicode strings like there is in Python, namely that any string is
binary data unless explicitly prefixed by u (e.g., gettype('a') ==
'binary' && gettype(u'a') == 'unicode'). We already have support for
similar syntax in the form of b'…'. This alternative, unlike the
current situation, allows code written for PHP < 6 to run on PHP 6
without issue.
> binary and unicode typecasting triggers E_PARSE errors in older PHP
> versions. I can't mix PHP6 and older PHP code in one script or library or
> function.
(binary) is fine on PHP 5.2.1 and above, though it's still a far from
ideal situation (as it relies on unicode.runtime_encoding unlike
b'…'). It is most certainly possible to get code working on both PHP 5
and PHP 6, but it requires a lot of verbose code (that can, on the
upside, all be abstracted into unicode/binary classes that each deal
with a different string type (which also gives an excuse for a PHP 5
userland implementation of unicode, so you can rely on unicode support
on both PHP 5 and PHP 6)).
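The unicode/binary class idea above might look roughly like this. It is a sketch in Python 3 rather than PHP, because Python's separate `bytes` and `str` types make it runnable today. The class and method names (`BinaryString`, `UnicodeString`, `to_unicode`, `to_binary`) are hypothetical and not part of any PHP API.

```python
class BinaryString:
    """Wraps raw bytes; length is counted in bytes."""

    def __init__(self, data: bytes):
        self.data = data

    def length(self) -> int:
        return len(self.data)

    def to_unicode(self, encoding: str) -> "UnicodeString":
        # The caller must name the encoding; nothing is guessed from a global setting.
        return UnicodeString(self.data.decode(encoding))


class UnicodeString:
    """Wraps decoded text; length is counted in code points."""

    def __init__(self, text: str):
        self.text = text

    def length(self) -> int:
        return len(self.text)

    def to_binary(self, encoding: str) -> BinaryString:
        return BinaryString(self.text.encode(encoding))


raw = BinaryString("h\u00e9llo".encode("utf-8"))
decoded = raw.to_unicode("utf-8")
round_trip = decoded.to_binary("utf-8")
```

Every conversion passes the encoding explicitly, so the wrapper does not depend on a runtime setting the way the (binary) cast is said to depend on unicode.runtime_encoding.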
> PHP introduced changes similar to unicode.semantics=on with mbstring
> function overloading. When I learned about it, I've stopped trusting ereg
> and string functions. With mbstring overloading I still have options to
> disable broken design. With unicode semantics I am forced to use features
> provided by interpreter instead doing things the way I want and having
> better controls over script.
I already refuse to support mbstring.func_overload as there is
absolutely no way to get back to dealing with binary strings. If PHP 6
continues on how it is I may well end up not supporting it either.
If PHP 6 breaks backwards compatibility I'd rather it did so far more
sweepingly, without options to give the illusion of it being
compatible with previous versions.
--
Geoffrey Sneddon
<http://gsnedders.com/>
> 6 reasons why we must get rid of The Switch ASAP
Amen!
Derick
Quoting Antony Dovgal tony@daylessday.org:
> 6 reasons why we must get rid of The Switch ASAP
Having maintained a huge Unicode compatible codebase in PHP4 for the
last few years, I know what a PITA it already is today, having to
consider the availability of mbstring and iconv, or dealing with
different input, processing, backend, and output charsets. I don't
even want to start thinking about adding unicode semantics to that
equation. Drop it.
Jan.
--
Do you need professional PHP or Horde consulting?
http://horde.org/consulting/
Antony Dovgal wrote:
> 6 reasons why we must get rid of The Switch ASAP
> - it gives users false sense of "compatibility" when no compatibility is even planned;
> - it's supposed to mean compatibility, but can be changed only in php.ini, which
> means users would still have to maintain 2 versions of their software:
> one for On and second for Off.
+1, I'm not looking forward to implementing support for both values in
MediaWiki.
As for PHP 6 generally: there needs to be a solid migration path, such
as forwards-compatible syntax introduced to PHP 5. MediaWiki has
extensive support for unicode in PHP 5, including a pure PHP
implementation of NFC, cross-script and confusable character checks,
extensive parsing of UTF-8 text using regexes both with and without /u,
and megabytes of localisations in the form of PHP source files with
UTF-8 string literals.
Porting all this to a UTF-16-based environment would be a hassle, and we
don't gain anything from it in terms of features for our users. I'd hate
to end up in an adversarial situation, where developers working in PHP
are forced to boycott or fork PHP 6. That's why a simple migration path
is important.
-- Tim Starling
> As for PHP 6 generally: there needs to be a solid migration path,
> such as forwards-compatible syntax introduced to PHP 5. MediaWiki
> has extensive support for unicode in PHP 5, including a pure PHP
> implementation of NFC, cross-script and confusable character
> checks, extensive parsing of UTF-8 text using regexes both with and
> without /u, and megabytes of localisations in the form of PHP
> source files with UTF-8 string literals.
> Porting all this to a UTF-16-based environment would be a hassle,
> and we don't gain anything from it in terms of features for our
> users. I'd hate to end up in an adversarial situation, where
> developers working in PHP are forced to boycott or fork PHP 6.
> That's why a simple migration path is important.
I don't think that "porting to a UTF-16 environment" in your case is
that hard at all. UTF-8 source files will work transparently with
proper script encoding setting, PCRE regexes work the same way, and
you could replace your own implementations of NFC, etc with PHP
provided ones.
-Andrei
Without repeating too much of what has already been said, phpBB3 runs
with its own normalizer (NF[CD]K?) and a full implementation of case
folding along with all sorts of other goodies. For us, it would be best
if semantics were off. Then we could trivially determine whether or not
we should use functions built into PHP or if we should use functions we
have written. unicode.semantics saves us nothing...
2008/1/21, Antony Dovgal tony@daylessday.org:
> - we need to remove the switch ASAP
Yes :) I urge you to do this, the introduction of this setting is
probably the worst design mistake in PHP history after safe_mode and
register_globals.
Please withdraw this insanity before it is too late; if you don't, it
will probably not work correctly anyway.
Deeply concerned,
Cristian.
> 6 reasons why we must get rid of The Switch ASAP
I was +1 months ago, I'm still +1 now :)
--
Pierre
http://blog.thepimp.net | http://www.libgd.org
> > 6 reasons why we must get rid of The Switch ASAP
> I was +1 months ago, I'm still +1 now :)
I'll throw in my +1 too. That's right, I'm still alive! :)
-- Gwynne, Daughter of the Code
"This whole world is an asylum for the incurable."
> 6 reasons why we must get rid of The Switch ASAP
I was +1...
Until folks started posting that old PHP scripts won't run as-is in
PHP 6?...
That's just daft...
When my webhost upgrades to PHP 6, I need all my old scripts to just
keep on chugging away, as much as possible...
I really think we're stuck with the default "string" being an
old-school binary string, unless you want to lose a LOT of users in a
hurry, or have PHP 5 stick around forever and ever.
--
Some people have a "gift" link here.
Know what I want?
I want you to buy a CD from some indie artist.
http://cdbaby.com/from/lynch
Yeah, I get a buck. So?
I partially agree, I have been watching this discussion and it's funny
how we have such a class of high end developers saying to break old
PHP code. But the majority of the success of PHP is not due to this
small class of high end developers, it's due to its availability in
shared hosting environments, the ease of use for beginners, and the
oodles of fairly poor quality code that is easy to copy and paste onto
people's websites.
Look at the adoption of php4, many webhosts haven't even updated to
PHP5 completely due to things like register_globals and small
backwards compatibility breakage. The list of problems is small and
correctable, if you give system engineers at all of these hosting
companies the choice of A. Upgrade to php6 and drive support calls
through the roof, or B. Stay at PHP4/5 for eternity until a more
(insert your complaints / rants here) language comes along to dethrone
PHP.
Problem is, PHP has been built to great success based on its early
foundation, but now a group of high class developers want it to be
more than what PHP was built for. You will sacrifice its success if
backwards compatibility is not just broken, but obliterated. Why
change PHP's philosophy? Keep it easy for the new user, keep it
successful, and make me work a little more when I want to implement my
"high class" development methodologies. I don't mind, I do it already.
I write this as a "high class" developer.
-1
-Chris
I don't disagree with this, and that is actually why I insisted on
having the unicode-semantics switch from the early days of the Unicode
discussions, so you can blame me, again, if you consider it a bad design
decision.
My take on it was that just about all ISPs would run with Unicode
semantics off and that the Unicode semantics on mode was more geared for
large standalone applications and sites that wanted the luxury of
working natively in their chosen character set without needing to always
jump through hoops.
If we get rid of the switch, then I agree that we can't make the default
string IS_UNICODE. We would be crippling the implementation and taking
a step backwards in terms of leading the way in Unicode adoption. The
longterm goal for just about everyone has got to be a "Unicode
everywhere" approach. It used to be that the Web was primarily a
Western single-byte charset phenomenon, but that hasn't been the case for
years. All major applications out there have implemented various hacks
to deal with these issues, some with more success than others.
This is what PHP does. We take common Web development pains and try to
reduce them. Think back to the pains of XML parsing in PHP 3 and even
in PHP 4 compared to today.
Ultimately we need to get to Unicode everywhere, and the Unicode
semantics switch was an acknowledgement that the world isn't quite ready
for that yet. But it sounds like the world isn't ready for the switch
either. Without it, I am afraid we will never get there, and that may
just be something we have to live with.
-Rasmus
> I don't disagree with this, and that is actually why I insisted on
> having the unicode-semantics switch from the early days of the Unicode
> discussions, so you can blame me, again, if you consider it a bad design
> decision.
Would the world really end for people who write NEW apps in a NEW
version of PHP, #6, if they had to put u"foo" to get their nifty
new-fangled Unicode strings?...
Surely that is better than making a BC break of gigantic proportions
for the unwashed masses that don't know a charset from a croquette and
having NOBODY move to PHP 6 except a handful of large corporations...
--
Some people have a "gift" link here.
Know what I want?
I want you to buy a CD from some indie artist.
http://cdbaby.com/from/lynch
Yeah, I get a buck. So?
Richard Lynch wrote:
> > I don't disagree with this, and that is actually why I insisted on
> > having the unicode-semantics switch from the early days of the Unicode
> > discussions, so you can blame me, again, if you consider it a bad design
> > decision.
> Would the world really end for people who write NEW apps in a NEW
> version of PHP, #6, if they had to put u"foo" to get their nifty
> new-fangled Unicode strings?...
> Surely that is better than making a BC break of gigantic proportions
> for the unwashed masses that don't know a charset from a croquette and
> having NOBODY move to PHP 6 except a handful of large corporations...
Like I said, without the unicode semantics switch, we can't make unicode
strings default for BC reasons. The switch was there to allow not just
large corporations, but also smaller companies and projects not
restricted by portability or BC concerns to build stuff from the ground
up entirely in Unicode. u"foo" is a hack that will eventually disappear
from the various languages that have it or something similar. 10 years
from now I doubt anybody could even imagine that you could have a string
that didn't carry its character set with it. Unfortunately 10 years
ago, I wasn't very concerned about that.
-Rasmus
> Like I said, without the unicode semantics switch, we can't make unicode
> strings default for BC reasons. The switch was there to allow not just
> large corporations, but also smaller companies and projects not
> restricted by portability or BC concerns to build stuff from the ground
> up entirely in Unicode. u"foo" is a hack that will eventually disappear
> from the various languages that have it or something similar. 10 years
> from now I doubt anybody could even imagine that you could have a string
> that didn't carry its character set with it. Unfortunately 10 years
> ago, I wasn't very concerned about that.
Does the switch just change u"foo" to "foo" and "foo" to b"foo" or
vice-versa?
Or could it be MADE that simple instead of whatever it currently does?...
Or maybe it is that simple on the surface, but causes all manner of
grief under the hood?
--
Some people have a "gift" link here.
Know what I want?
I want you to buy a CD from some indie artist.
http://cdbaby.com/from/lynch
Yeah, I get a buck. So?
> u"foo" is a hack that will eventually disappear from the various
> languages that have it or something similar.
I think we need to have binary strings as default with u"…" for a
while (whenever that gets merged into the default string type it is
probably the real time to break everything without too much regard for
backwards compatibility at all — I don't think this time has come yet,
though), without any switch to change that.
To note: Python introduced support for Unicode in 2.0 (released 2000),
and Unicode strings are the default in 3.0 (as of 2007-12-07 alpha 2)
— the current plan is to release 3.0 final in mid-2008; this is eight
years between adding Unicode support and it becoming the default —
following this precedent would result in Unicode becoming default in
PHP at the very earliest of 2016, though I think quite such a long
delay in the case of PHP wouldn't be the best for the language, but I
doubt we can do it much quicker than around four years (without the
benefits outweighing the issues it will cause).
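The timeline arithmetic above, spelled out as a small sketch. The Python dates come from the message itself; the assumption that PHP gains Unicode support in 2008 (that is, that PHP 6 ships that year) is implied by the 2016 figure but is not stated outright.

```python
PYTHON_UNICODE_ADDED = 2000    # Python 2.0, per the message above
PYTHON_UNICODE_DEFAULT = 2008  # planned Python 3.0 final, per the message above
PHP_UNICODE_ADDED = 2008       # assumption: PHP 6 ships in 2008

# Gap between "Unicode available" and "Unicode by default" in Python.
python_gap_years = PYTHON_UNICODE_DEFAULT - PYTHON_UNICODE_ADDED

# Following Python's pace versus the four-year minimum suggested above.
php_default_at_python_pace = PHP_UNICODE_ADDED + python_gap_years
php_default_at_four_years = PHP_UNICODE_ADDED + 4
```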
> Unfortunately 10 years ago, I wasn't very concerned about that.
10 years ago, not many people working with scripting languages were,
which is part of the reason why (scripting) languages with native
Unicode support are only just starting to really take off. Even though
we will have Unicode support in PHP 6, they will have more mature
implementations than we will.
--
Geoffrey Sneddon
<http://gsnedders.com/>
Hi Rasmus, Chris,
I agree with you which is why I suggested to not have a switch but to
make the default string binary and require u"foo" for Unicode strings.
It supports the existing community incl. hosters and as Chris and you
pointed out, the broad community of non-"high class" developers to whom
we owe PHP's success.
As you rightfully pointed out the broader world isn't ready for it yet
and we have to evolve at the same pace. I think that going down that
route incrementally is exactly what's going to support that need.
Let's face it, the people who are struggling today will have an immense
relief when they get native Unicode capabilities in PHP 6 including a
large amount of ICU's functionality. The u"foo" isn't what's going to
take that away and PHP 6 will likely lead the pack in many regards when
it comes to Unicode support. For the people who don't care today and
will have to care tomorrow, they get to move to PHP 6 without much pain,
they continue to benefit from their existing code and performance
characteristics, and as they slowly evolve and find out that they do
need to deal with it, it's there and readily available to them.
Andi
-----Original Message-----
From: Rasmus Lerdorf [mailto:rasmus@lerdorf.com]
Sent: Wednesday, January 23, 2008 11:29 AM
To: Chris Stockton
Cc: php-dev
Subject: Re: [PHP-DEV] why we must get rid of unicode.semantics switch ASAP
> Ultimately we need to get to Unicode everywhere, and the Unicode
> semantics switch was an acknowledgement that the world isn't quite ready
> for that yet. But it sounds like the world isn't ready for the switch
> either. Without it, I am afraid we will never get there, and that may
> just be something we have to live with.
> -Rasmus
How about allowing b"foo" in 5.3 (so people can start migrating their
code early) and making unicode strings default in PHP7? :D
David
On 23.01.2008 at 22:30, Andi Gutmans wrote:
Hi Rasmus, Chris,
I agree with you, which is why I suggested to not have a switch but to
make the default string binary and require u"foo" for Unicode strings.
It supports the existing community incl. hosters and, as Chris and you
pointed out, the broad community of non-"high class" developers to whom
we owe PHP's success.
As you rightfully pointed out, the broader world isn't ready for it yet
and we have to evolve at the same pace. I think that going down that
route incrementally is exactly what's going to support that need.
Let's face it, the people who are struggling today will have an immense
relief when they get native Unicode capabilities in PHP 6, including a
large amount of ICU's functionality. The u"foo" isn't what's going to
take that away, and PHP 6 will likely lead the pack in many regards when
it comes to Unicode support. For the people who don't care today and
will have to care tomorrow, they get to move to PHP 6 without much pain,
they continue to benefit from their existing code and performance
characteristics, and as they slowly evolve and find out that they do
need to deal with it, it's there and readily available to them.
Andi
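To make the "binary by default, marked Unicode literals" idea concrete, here is a minimal sketch in Python rather than PHP. It is only an analogy: Python's bytes and str types stand in for a binary string and a Unicode string, and the helper join_as_text is a hypothetical name introduced for illustration. Nothing here describes how PHP 6 itself would behave.

```python
# Analogy only: bytes ~ binary string, str ~ Unicode string.
binary_literal = b"foo"    # explicitly marked byte string
unicode_literal = u"foo"   # explicitly marked text string (the u prefix is optional in Python 3)


def join_as_text(raw: bytes, text: str, encoding: str = "utf-8") -> str:
    """Hypothetical helper: decode the binary part explicitly, then concatenate."""
    return raw.decode(encoding) + text


# Mixing the two kinds of string without a conversion is rejected.
try:
    binary_literal + unicode_literal
    mixed_ok = True
except TypeError:
    mixed_ok = False

joined = join_as_text(binary_literal, unicode_literal)
print(mixed_ok, joined)
```

The point of the sketch is that once both literal kinds exist, every place where they meet needs a decision about encoding, whichever of the two is the default.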
It seems we're only talking about literals here. What about the rest of
the places where the unicode.semantics switch matters right now, like
streams (which work in binary or Unicode mode), incoming request decoding,
etc.? It would be a shame to go back to a binary-by-default mode and have
to jump through more hoops to get proper Unicode support.
Ugh.
-Andrei
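The streams point can be illustrated with a small sketch, again in Python as an analogy and not as a description of PHP 6 streams. It assumes UTF-8 input and uses io.BytesIO and io.TextIOWrapper as stand-ins for a stream opened in binary mode and in Unicode mode.

```python
import io

# "naïve café" written with escapes so the code points are unambiguous.
payload = "na\u00efve caf\u00e9".encode("utf-8")

# Binary mode: read(3) counts bytes and cuts the two-byte "ï" in half.
binary_stream = io.BytesIO(payload)
chunk = binary_stream.read(3)

# Text mode: read(3) counts characters and returns whole code points.
text_stream = io.TextIOWrapper(io.BytesIO(payload), encoding="utf-8")
chars = text_stream.read(3)

print(len(payload), chunk, chars)
```

The same read call gives different units depending on the mode of the stream, which is why a change to literals alone would not cover this case.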
It seems we're only talking about literals here. What about the rest of
the places where unicode.semantics switch matters right now, like
streams (works in binary or unicode mode), incoming request decoding,
etc? It would be a shame to go back to binary by default mode and have
to jump through more hoops to get proper Unicode support.
You tell us.
What's going to break in age-old PHP scripts?
Ugh.
BC break == Ugh.
Blimey. I agree with Rasmus. That's twice now!
I think PHP 6 should be an interim period with support for both scenarios,
but with the default being bog-standard as-we-know-it IS_STRING and anything
IS_UNICODE needing to be marked.
Perhaps PHP 7 can drop the IS_STRING stuff and have it all IS_UNICODE, by
removing the need to mark unicode text and taking it all that way. I think
doing this in PHP 6 will make for a white elephant situation (and we like
purple-blue, no?)
- Steph
What's going to make PHP 7 different than PHP 6? We'll be back to the
same discussion then. PHP 5 people have had a long time to work with
mbstring, etc., and Unicode is still a big scary beast.
-Andrei
Unicode is a 'big scary beast' because people don't know what impact it will
or won't have on their applications. If they're ISO-8859-1 apps there
shouldn't be an issue - but where has anyone ever said that?
There are two options open at this point for PHP 6: unicode-only and a
MASSIVE push for user education way before it even becomes available, or
you hold it all back and force 'non-standard' (sorry rest of the world)
languages to use markers.
- Steph
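On the ISO-8859-1 remark, one fact can be checked independently of PHP: every ISO-8859-1 byte maps to the Unicode code point with the same numeric value, so the conversion itself loses nothing. The sketch below shows this using Python's "iso-8859-1" codec as a stand-in for a legacy single-byte application. It says nothing about whether such an application would run unchanged on PHP 6.

```python
# Assumption: Python's "iso-8859-1" codec stands in for a legacy
# single-byte application; this is not a statement about PHP 6 internals.
all_bytes = bytes(range(256))

# Each ISO-8859-1 byte decodes to the code point with the same number.
decoded = all_bytes.decode("iso-8859-1")
same_values = [ord(ch) for ch in decoded] == list(all_bytes)

# Encoding back yields the original bytes, so nothing is lost.
round_trip_ok = decoded.encode("iso-8859-1") == all_bytes

# Re-encoded as UTF-8 the same text is longer: code points 128-255
# take two bytes each.
utf8_length = len(decoded.encode("utf-8"))

print(same_values, round_trip_ok, utf8_length)
```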
top-posting because it's already messed up...
You will need a massive education effort in PHP 6 to start using
b"foo" for all non-UTF-8 strings because PHP 7 default will be UTF-8.
Or, yes, you will be in the same boat for PHP 7.
Or you can just start the education effort now and not release PHP 6
for over a year...
I don't think you really want the latter option, but it's there...
Did you mean to say "can't make the default string IS_STRING"? Because
that's the only reading that makes sense given the rest of the message.
-Andrei
Rasmus Lerdorf wrote:
If we get rid of the switch, then I agree that we can't make the default
string IS_UNICODE. We would be crippling the implementation and taking
a step backwards in terms of leading the way in Unicode adoption. The
long-term goal for just about everyone has got to be a "Unicode
everywhere" approach. It used to be that the Web was primarily a
Western single-byte charset phenomenon, but that hasn't been the case for
years. All major applications out there have implemented various hacks
to deal with these issues, some with more success than others.
Hey Andrei,
You can't just say that without giving full details.
We've seen all your 'this will cope with Russian, Hebrew, Greek, Japanese
and Icelandic' demos. We haven't seen what happens to English, French or
German - ever.
So what happens if I pass in "Hello World", in English, and it's regarded as
an IS_UNICODE string? Would I know about it? Is there anything special I
should do? Or does it just happen as always, and
what-was-all-the-fuss-about?
- Steph
----- Original Message -----
From: "Andrei Zmievski" andrei@gravitonic.com
To: "Rasmus Lerdorf" rasmus@lerdorf.com
Cc: "Chris Stockton" chrisstocktonaz@gmail.com; "php-dev"
internals@lists.php.net
Sent: Thursday, January 24, 2008 12:53 AM
Subject: Re: [PHP-DEV] why we must get rid of unicode.semantics switch ASAP
Did you mean to say "can't make the default string IS_STRING"? Because
that's the only reading that makes sense given the rest of the message.
-Andrei
Pass in "Hello World" where? And yes, you shouldn't have to do anything
special (especially for English). The functions will work transparently.
-Andrei
Right, and that's something that does NOT appear in any notes anywhere.
----- Original Message -----
From: "Andrei Zmievski" andrei@gravitonic.com
To: "Steph Fox" steph@zend.com
Cc: "Rasmus Lerdorf" rasmus@lerdorf.com; "Chris Stockton"
chrisstocktonaz@gmail.com; "php-dev" internals@lists.php.net
Sent: Thursday, January 24, 2008 1:03 AM
Subject: Re: [PHP-DEV] why we must get rid of unicode.semantics switch ASAP
Pass in "Hello World" where? And yes, you shouldn't have to do anything
special (especially for English). The functions will work transparently.
-Andrei
Hey, I can't do everything.
-Andrei
Well maybe half the problem with this is that people aren't really aware of
what is or isn't the issue. As I (now) understand it, the only people
affected by Unicode support will be those currently using mbstring, is that
correct?
- Steph
----- Original Message -----
From: "Andrei Zmievski" andrei@gravitonic.com
To: "Steph Fox" steph@zend.com
Cc: "Rasmus Lerdorf" rasmus@lerdorf.com; "Chris Stockton"
chrisstocktonaz@gmail.com; "php-dev" internals@lists.php.net
Sent: Thursday, January 24, 2008 1:33 AM
Subject: Re: [PHP-DEV] why we must get rid of unicode.semantics switch ASAP
Hey, I can't do everything.
-Andrei
Or people that worry too much about characters being bytes.
-Andrei
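An illustrative sketch of the "characters being bytes" point, not PHP 6 code. Python 3 is used purely as an analogy, on the assumption that its `bytes` type behaves like a binary string (IS_STRING) and its `str` type like a Unicode string (IS_UNICODE); the helper `describe` is hypothetical. It suggests why "Hello World" needs no special handling, while non-ASCII text is where code that treats bytes as characters starts to differ.

```python
# Analogy only: bytes ~ binary string (IS_STRING), str ~ Unicode string (IS_UNICODE).

def describe(text):
    """Return character-level and byte-level views of the same text."""
    raw = text.encode("utf-8")       # the byte view a binary string would hold
    return {
        "chars": len(text),          # length counted in characters
        "bytes": len(raw),           # length counted in bytes
        "upper": text.upper(),       # case mapping applied to characters
        "first3_chars": text[:3],    # slice by character
        "first3_bytes": raw[:3],     # slice by byte; may cut a character in half
    }

english = describe("Hello World")
russian = describe("\u041f\u0440\u0438\u0432\u0435\u0442")  # "Привет"

# Pure ASCII: both views agree, so byte-oriented code keeps working.
print(english["chars"], english["bytes"])   # 11 11
# Cyrillic: each character takes two bytes in UTF-8, so the views diverge.
print(russian["chars"], russian["bytes"])   # 6 12
```

For the English string a three-byte slice and a three-character slice are the same text; for the Russian string the three-byte slice ends in the middle of a character and is no longer valid UTF-8.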
ACTUALLY - how many people use more than the 127 character set in English
anyway? I'd be more than happy to force UTF8 mode as standard and get away
from 'code page hell'.
This IS more about education and we are having similar 'discussions' in
Firebird over adding Unicode for field and table names. As long as there is a
clean 'conversion' path that takes care of characters that DO need mapping
then simplifying things by only having one method of work makes perfect sense?
Most of those who complain about 'taking more space' will never see a two byte
character ;)
--
Lester Caine - G8HFL
Contact - http://home.lsces.co.uk/lsces/wiki/?page=contact
L.S.Caine Electronic Services - http://home.lsces.co.uk
MEDW - http://home.lsces.co.uk/ModelEngineersDigitalWorkshop/
Firebird - http://www.firebirdsql.org/index.php
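A small sketch of the storage point made above, assuming the text is held as UTF-8 as the message suggests. It is written in Python for illustration and says nothing about how PHP 6 stores strings internally; `utf8_width` and `overhead` are hypothetical helper names.

```python
def utf8_width(ch):
    """Number of bytes one character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))

def overhead(text):
    """Extra bytes UTF-8 needs beyond one byte per character."""
    return len(text.encode("utf-8")) - len(text)

# ASCII letter, Latin-1 letter, Cyrillic letter, euro sign, emoji
samples = ["A", "\u00e9", "\u0416", "\u20ac", "\U0001F600"]
widths = [utf8_width(ch) for ch in samples]
print(widths)                      # [1, 2, 2, 3, 4]

# Plain English text costs nothing extra; one accented letter costs one byte.
print(overhead("Hello World"))     # 0
print(overhead("caf\u00e9"))       # 1
```

Text that stays within 7-bit ASCII is byte-for-byte identical in UTF-8, which is the case the message argues most English-only users are in.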
No, sorry, I agree that was a badly written statement where the
"crippling" part didn't refer to the previous sentence. I meant that if
we remove the unicode semantics switch, then we are crippling the
implementation because we wouldn't be able to make the default string
literal IS_UNICODE which, I think we all agree, is where we will have to
eventually get to.
The question here isn't so much where we are going, but exactly how we
will get there and how long that might take. In our early discussions
on this stuff we did mull over this and figured we'd go ahead with the
two-mode approach with the idea that PHP 6 would be the transition
version where people could start off with Unicode-light (semantics off)
and eventually turn them on to get ready for PHP 7 which would be all
Unicode all the time.
Apart from the string literals, I think setting the encoding for streams
at runtime isn't that big a deal. The input encoding is a bit trickier
since it happens before the script is run, but we did explore delaying
the decoding until access time which again means it should be something
the script can trigger at runtime which gives people the portability and
BC they are aching for.
-Rasmus
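A rough sketch of the "delay the decoding until access time" idea described above, written in Python for illustration only. Nothing here is PHP's actual input handling: the class `LazyInput`, its `encoding` attribute, and the sample field names are all assumptions made for this example. The point it tries to show is that if raw request bytes are kept untouched until first read, a script can still choose the input encoding at runtime.

```python
class LazyInput:
    """Holds raw request bytes and decodes a field only when it is first read."""

    def __init__(self, raw, encoding="utf-8"):
        self._raw = raw              # mapping of bytes keys to bytes values
        self.encoding = encoding     # the script may change this before first access
        self._decoded = {}           # cache of fields decoded so far

    def __getitem__(self, key):
        if key not in self._decoded:
            value = self._raw[key.encode("ascii")]
            # Decoding happens here, at access time, using the current setting.
            self._decoded[key] = value.decode(self.encoding)
        return self._decoded[key]


# The request arrived as Latin-1 bytes, before any script code ran.
form = LazyInput({b"name": "Jos\u00e9".encode("latin-1")})

# The script declares the input encoding at runtime, then reads the field.
form.encoding = "latin-1"
print(form["name"] == "Jos\u00e9")   # True
```

One consequence of this design is that a field already read stays decoded with whatever encoding was in effect at that moment, so the encoding would need to be set before the first access.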
Rasmus Lerdorf wrote:
The question here isn't so much where we are going, but exactly how we
will get there and how long that might take.
Absolutely.
- Steph (who has taken several queries over this today thank you)
Is it possible to take a page out of the database engine's handbook and
tie a charset to a namespace like charsets are tied to tables?
namespace myNamespace charset=utf8
{
...
}
Then when no charset is defined it defaults to current PHP semantics.
Win-win?
Cheers,
Rob.
I don't disagree with this, and that is actually why I insisted on
having the unicode-semantics switch from the early days of the Unicode
discussions, so you can blame me, again, if you consider it a bad design
decision.
My take on it was that just about all ISPs would run with Unicode
semantics off and that the Unicode semantics on mode was more geared for
large standalone applications and sites that wanted the luxury of
working natively in their chosen character set without needing to always
jump through hoops.
If we get rid of the switch, then I agree that we can't make the default
string IS_UNICODE. We would be crippling the implementation and taking
a step backwards in terms of leading the way in Unicode adoption. The
long-term goal for just about everyone has got to be a "Unicode
everywhere" approach. It used to be that the Web was primarily a
Western single-byte charset phenomenon, but that hasn't been the case for
years. All major applications out there have implemented various hacks
to deal with these issues, some with more success than others.
This is what PHP does. We take common Web development pains and try to
reduce them. Think back to the pains of XML parsing in PHP 3 and even
in PHP 4 compared to today.
Ultimately we need to get to Unicode everywhere, and the Unicode
semantics switch was an acknowledgement that the world isn't quite ready
for that yet. But it sounds like the world isn't ready for the switch
either. Without it, I am afraid we will never get there, and that may
just be something we have to live with.
-Rasmus
Chris Stockton wrote:
I partially agree, I have been watching this discussion and it's funny
how we have such a class of high end developers saying to break old
PHP code. But, the majority of the success of PHP is not due to this
small class of high end developers, it's due to its availability in a
shared hosting environment, and the ease of use for beginners, and the
oodles of fairly poor quality code that is easy to copy and paste onto
people's websites.
Look at the adoption of php4, many webhosts haven't even updated to
PHP5 completely due to things like register_globals and small
backwards compatibility breakage. The list of problems is small and
correctable, if you give system engineers at all of these hosting
companies the choice of A. Upgrade to php6 and drive support calls
through the roof, or B. Stay at PHP4/5 for eternity until a more
(insert your complaints / rants here) language comes along to dethrone
PHP.
Problem is, PHP has been built to great success based on its early
foundation, but now a group of high class developers want it to be
more than PHP was built onto. You will sacrifice its success if
backwards compatibility is not just broken, but obliterated. Why
change PHP's philosophy? Keep it easy for the new user, keep it
successful, and make me work a little more when I want to implement my
"high class" development methodologies. I don't mind, I do it already.
I write this as a "high class" developer.
-1
-Chris
6 reasons why we must to get rid of The Switch ASAP
I was +1...
Until folks started posting that old PHP scripts won't run as-is in
PHP 6?...
That's just daft...
When my webhost upgrades to PHP 6, I need all my old scripts to just
keep on chugging away, as much as possible...
I really think we're stuck with the default "string" being an
old-school binary string, unless you want to lose a LOT of users in a
hurry, or have PHP 5 stick around forever and ever.
--
Some people have a "gift" link here.
Know what I want?
I want you to buy a CD from some indie artist.
http://cdbaby.com/from/lynch
Yeah, I get a buck. So?
--
...........................................................
SwarmBuy.com - http://www.swarmbuy.com
Leveraging the buying power of the masses!
...........................................................
+1
remove switch
make unicode strings default
--
Alexey Zakhlestin
http://blog.milkfarmsoft.com/
+1
Remove switch.
Make unicode strings default.