POSIX regex

18 years ago by Jani Taskinen

Please read about the decision made on this, and why it was made,
at: http://derickrethans.nl/files/meeting-notes.html#move-ereg-to-pecl

This is getting quite boring. You have had over 2 years to read about
this and complain... and this wasn't the first time you've made your usual
comment, "will break a huge amount of applications", about anything we're
trying to improve. <removed usual rant about BC>

--Jani

Even in PHP 6 I am not sure it's a good idea. There are a huge amount of
apps that use them and it'll be very hard for people to upgrade.
Anyway, let's do some more research on that once we get closer to PHP 6
and see what the migration path looks like. We'll have to check with a
few popular apps + google code search :)
No need to decide on that right now without having more info.

Andi

-----Original Message-----
From: Ilia Alshanetsky [mailto:ilia@prohost.org]
Sent: Monday, July 16, 2007 6:48 AM
To: Andi Gutmans
Cc: jani.taskinen@iki.fi; internals@lists.php.net
Subject: Re: [PHP-DEV] POSIX regex

> Why move it to PECL? I agree that PCRE is the preferred way but not
> having ereg() will break a huge amount of applications for very little
> gain.

I tend to agree, unless we provide wrappers via PCRE that
emulate ereg functionality I don't think we can remove posix
regex until PHP 6.

Ilia Alshanetsky

18 years ago by Pierre

> Even in PHP 6 I am not sure it's a good idea.

As far as I know, Jani is referring to PHP6 only. And it was "decided"
in the "php6 notes".

I'm in favour of removing ereg in php6, and the sooner we decide the
better. Users will know about this change and will finally understand
the PCRE superiority and why they should use it instead, starting today.

As for 5.x (5.2.x or 5.3.x), I would rather deprecate it in 5.3
(if at all), but I don't think we should remove it in 5.x.

Cheers,
--Pierre

18 years ago by Derick Rethans

>> Even in PHP 6 I am not sure it's a good idea.
>
> As far as I know, Jani is referring to PHP6 only. And it was "decided"
> in the "php6 notes".

Unfortunately that is not true. It's only the title of the agenda point,
it's not part of the "conclusions".

> I'm in favour to remove ereg in php6, and the sooner we decide the
> better.

Yes, I agree.

> Users will know about this change and will finally understand
> the PCRE superiority and why they should use it instead, and today.

However, users should learn how to use the new regexp engine
as that will support Unicode :)

regards,
Derick

18 years ago by Nuno Lopes

PCRE has a POSIX API, so it is possible to use PCRE as a drop-in replacement
for the engine behind ereg(). What I don't know is how compatible it is with
the current engine. But I think it is worth investigating.

Nuno

P.S.: this POSIX PCRE layer isn't currently bundled with PHP, because it
wasn't needed so far.

18 years ago by Jani Taskinen

> PCRE has a POSIX API, so it is possible to use PCRE as a drop-in replacement
> for the engine behind ereg(). What I don't know is how compatible it is with
> the current engine. But I think it worth investigating.

Worked fine when I tested it. But it's quite pointless, since it's still not
Unicode-friendly. It's better just to use the system POSIX regex funcs if
ext/ereg/ is to stay... which is stupid, since all the functions it provides
can easily be replaced with Unicode-friendly preg_* funcs. Nobody should
use ereg_*() for anything if they want to use Unicode. If they don't
need Unicode, they don't need PHP 6 either.

> P.S.: this POSIX PCRE layer isn't currently bundled with PHP, because it
> wasn't needed so far.

It is bundled, just isn't compiled, see above.. :)

--Jani

18 years ago by Derick Rethans

> Even in PHP 6 I am not sure it's a good idea. There are a huge amount of
> apps that use them and it'll be very hard for people to upgrade.

Their apps are breaking anyway, and having three regex engines doesn't make
sense.

Derick

--
Derick Rethans
http://derickrethans.nl | http://ez.no | http://xdebug.org

18 years ago by Lukas Kahwe Smith

Andi Gutmans wrote:

> Even in PHP 6 I am not sure it's a good idea. There are a huge amount of
> apps that use them and it'll be very hard for people to upgrade.
> Anyway, let's do some more research on that once we get closer to PHP 6
> and see what the migration path looks like. We'll have to check with a
> few popular apps + google code search :)
> No need to decide on that right now without having more info.

I disagree with this approach. The thing is that we need to get a clear
message out ASAP. This all ties into topics like whether we will have a
unicode off/on switch or not. Delaying these decisions will hurt our
userbase. We need to prepare them early.

IMHO we should use PHP6 as the clean up release. Drop the unicode on/off
switch, accept that the bulk of all code will need to be rewritten from
scratch. The benefit will be that it will truly be cleaned up, people
will still be able to leverage the bulk of their PHP programming
background and they can enjoy the fastest possible unicode engine we can
provide them.

PHP5 will be for the people that cannot make the jump. We will back port
whatever we can reasonably get into PHP5. People will linger on PHP5,
just as they are doing now with PHP4. So it goes. At least we will not
punish the early adopters for the sake of those that are unwilling to move
to the new version in the near future anyway.

At any rate .. the time is now to make a decision on what it's gonna be.
PHP6 with BC hacks or not.

regards,
Lukas

18 years ago by Antony Dovgal

Thank you Lukas for expressing exactly my thoughts on this.

--
Wbr,
Antony Dovgal

18 years ago by Jani Taskinen

Thank you Lukas and Antony. Could not agree more..

18 years ago by Pierre

> Thank you Lukas and Antony. Could not agree more..

But .... we all agree, don't we? :)

18 years ago by David Coallier

Another thing to mention is that without GLOBALS (PHP6), most
applications and <cough>php4-developers</cough> will have far more
problems than they will without POSIX regexes.

D

18 years ago by Andi Gutmans

I disagree with this view of the world.
It doesn't have to be a complete either/or decision and labeling
everything as a "bc hacks" decision is an inaccurate and populist way
of building FUD.

There are clear things we want to change (like register_globals) because
we believe that ultimately they have a significant benefit to our users
with controllable downside (there is an easy one line workaround which
we can document for people to get their old apps to work). There are
other areas where breaking BC makes sense. But saying we should just
break it across the board and not even consider having a good upgrade
path for our users is unreasonable. I believe we can have a very good
PHP 6, which is pretty much in sync with many of your feelings, but that
provides a well documented and reasonable upgrade path (unlike VB ->
VB.NET).

If you want to break everything and anything and don't want to be
limited whatsoever by our huge user-base then maybe you should write a
new language which fits exactly what your preference would be. The fact
is though, that even after these discussions and the Paris discussions,
the bulk of the idiosyncrasies which make PHP what it is today will
remain (as per agreement). So there must have been some kind of view
even by the folks here that they don't want to create a new language but
improve on what we have. And it's a trade-off between bang for the buck;
sometimes it really brings high returns to break BC especially when it
comes to security; but sometimes except for making 10 PHP devs happy who
are not the bulk of our users it doesn't.

So let's not oversimplify this situation. We have to continue to make
trade-offs.

Btw, one of PHP's strengths has been in high performance sites and with
a Unicode=on only mode this would take quite a hit (but it's not the
only reason why I think we need choice). In any case, I think on this
question it does make sense that we start making "informed" decisions by
understanding the migration path better, as opposed to just basing
decisions on gut feelings. Maybe that kind of learning experience will
prove me wrong (which may be so).

Andi

18 years ago by Larry Garfield

Non-core PHP developer speaking, so read with that in mind:

One of the things that held back PHP 5 adoption for so long, IMO, is the large
amount of FUD that surrounded it. Even now, 3 years after it was released, I
keep seeing the argument that "I can't drop PHP 4 and use PHP 5, then I have
to rewrite everything to use objects. I hate objects." That is, of
course, completely untrue, and if you're paying even moderate attention it's
not at all difficult to write code that runs just fine in both PHP 4 and PHP
5, with and without register_globals and magic_quotes. All it takes is a
little forethought and not letting yourself be sloppy.

Writing PHP 5/6 compatible code needs to be just as easy, if not easier, in
addition to having better marketing to head off the FUD. Taking a stance
of "you'll have to start from scratch if you want to be PHP 6 compatible, oh
well" is an absolutely sure-fire way to guarantee that no one uses PHP 6 for
anything except niche markets.

If people are still relying on register_globals at this point, sure, they're
screwed no matter what they do. But code written to PHP 5 E_STRICT standards
with a recommended configuration (register_globals off, etc.) should be
possible to make run successfully in PHP 6 without gutting and starting from
scratch (even if you can't use the new-and-cool features). If not, "GoPHP6"
will be a failure before it even gets started. :-)

(And yes, I'm already pondering how to do GoPHP6 in order to make the 5/6
transition smoother.)

--
Larry Garfield AIM: LOLG42
larry@garfieldtech.com ICQ: 6817012

"If nature has made any one thing less susceptible than all others of
exclusive property, it is the action of the thinking power called an idea,
which an individual may exclusively possess as long as he keeps it to
himself; but the moment it is divulged, it forces itself into the possession
of every one, and the receiver cannot dispossess himself of it." -- Thomas
Jefferson

18 years ago by Lukas Kahwe Smith

Larry Garfield wrote:

> Non-core PHP developer speaking, so read with that in mind:
>
> One of the things that held back PHP 5 adoption for so long, IMO, is the large
> amount of FUD that surrounded it. Even now, 3 years after it was released, I
> keep seeing the argument that "I can't drop PHP 4 and use PHP 5, then I have
> to rewrite everything to use objects. I hate objects." That is, of
> course, completely untrue, and if you're paying even moderate attention it's
> not at all difficult to write code that runs just fine in both PHP 4 and PHP
> 5, with and without register_globals and magic_quotes. All it takes is a
> little forethought and not letting yourself be sloppy.

I have seen little of that. But I have seen issues due to array_merge()
changes. But more importantly our handling of E_STRICT has made it
difficult for many.

> Writing PHP 5/6 compatible code needs to be just as easy, if not easier, in
> addition to having better marketing to head off the FUD. Taking a stance
> of "you'll have to start from scratch if you want to be PHP 6 compatible, oh
> well" is an absolutely sure-fire way to guarantee that no one uses PHP 6 for
> anything except niche markets.

I see it more as a question of being open about what's going on. If we
had had the upgrading guides from the beginning of 5.0.z, I think
things would have been easier. The fact that our x.0.z releases are not
particularly popular is another issue.

I think the biggest challenge PHP5 faced, however, was that it was mainly
about making developers' lives easier, since PHP4 already enables you to
do pretty much any kind of web site if you are willing to put in the
required time. Native unicode to me feels a bit more like adding
something that was not really doable before (sure you can, but that would
mean writing every lib yourself, so the time required is beyond the vast
majority of dev teams). Then again, it's not like all developers will jump
on unicode the second it's released (mainly because not all end users are
asking for this). But the point is, getting very high adoption rates for
new PHP releases will always be hard.

regards,
Lukas

18 years ago by Lukas Kahwe Smith

Andi Gutmans wrote:

> There are clear things we want to change (like register_globals) because
> we believe that ultimately they have a significant benefit to our users
> with controllable downside (there is an easy one line workaround which
> we can document for people to get their old apps to work). There are
> other areas where breaking BC makes sense. But saying we should just
> break it across the board and not even consider having a good upgrade
> path for our users is unreasonable. I believe we can have a very good
> PHP 6, which is pretty much in sync with many of your feelings, but that
> provides a well documented and reasonable upgrade path (unlike VB ->
> VB.NET).

I never said we should break BC just for the hell of it. The goal must
be that PHP6 feels and behaves like PHP. It's not about hijacking PHP
to come up with the language we all wanted instead.

> So let's not oversimplify this situation. We have to continue to make
> trade-offs.

Sure, but you are suggesting to delay decisions indefinitely. Either you
are saying this because you already decided that you don't want this
change, or you are accepting that our users will be unable to prepare
themselves for what happens in the future. This of course will make it
that much harder for them to take the plunge into PHP6.

> Btw, one of PHP's strengths has been in high performance sites and with
> a Unicode=on only mode this would take quite a hit (but it's not the
> only reason why I need we need choice). In any case, I think on this
> question it does make sense that we start making "informed" decisions by
> understanding the migration path better, as opposed to just basing
> decisions on gut feelings. Maybe that kind of learning experience will
> proove me wrong (which may be so).

I have not seen any proposed way of finding out this migration path
besides "let's wait". "Let's wait" is not the answer. What I asked for was
exactly a decision on how far we are willing to go with the breakage and,
more importantly, the fundamental decision about how we approach unicode
in PHP6. The on/off switch is not something that makes sense to delay
forever. It's a big decision, and once it's decided other things will
become much easier (like PHP6 development or deciding the impact of
other potential BC breaks).

regards,
Lukas

18 years ago by Andi Gutmans

A few months ago we agreed that we will give our users the choice of
both modes. The burden of maintenance has mainly been on us btw as the
majority of the differences here are in the Zend Engine and the
extensions don't have as much work associated with them.

Here's my proposed way of figuring how to make migration easier. Port
the following applications to PHP 6 and let's see what we can learn from
it:

MediaWiki
SugarCRM
Drupal
WordPress

I don't think we can have more of a reality check than actually going
through this exercise and understanding the issues. As I mentioned from
the small work we have done up to now it seems like there really is no
migration path except for applications to be almost completely
rewritten when unicode_semantics=on. I don't think this is a feasible
way to go. But if volunteers can work on this porting and it allows us
to fix some things (if they are fixable) then that would change the
situation.

I believe that people who actually do this exercise and want to have a
migration path will understand that there's no other way except to
support unicode_semantics=off. Btw, most languages deliver Unicode in
this way and it works pretty well.

Andi

18 years ago by Jani Taskinen

Just FYI: I did not agree with that choice. And IIRC, neither did
several other people here.

--Jani

18 years ago by Andi Gutmans — view source — reply

I thought you were retired at the time...

-----Original Message-----
From: Jani Taskinen [mailto:jani.taskinen@sci.fi]
Sent: Tuesday, July 17, 2007 7:37 AM
To: Andi Gutmans
Cc: internals@lists.php.net
Subject: RE: [PHP-DEV] POSIX regex

Just FYI: I did not agree with that choice. And IIRC, neither
did several other people here.

--Jani

A few months ago we agreed that we will give our users the choice of
both modes. The burden of maintenance has mainly been on us btw as the
majority of the differences here are in the Zend Engine and the
extensions don't have as much work associated with them.

Here's my proposed way of figuring how to make migration easier. Port
the following applications to PHP 6 and let's see what we can learn from
it:

mediaWiki

SugarCRM

Drupal

Wordpress

I don't think we can have more of a reality check than actually going
through this exercise and understanding the issues. As I mentioned
from the small work we have done up to now it seems like there really
is no migration path except for applications to be almost completely
rewritten when unicode_semantics=on. I don't think this is a feasible
way to go. But if volunteers can work on this porting and it allows us
to fix some things (if they are fixable) then that would change the
situation.

I believe that people who actually do this exercise and want to have a
migration path will understand that there's no other way except to
support unicode_semantics=off. Btw, most languages deliver Unicode in
this way and it works pretty well.

Andi

18 years ago by Pierre — view source — reply

I thought you were retired at the time...

Others were not. Some others were not even present. And those who were
present seem to have different interpretations of the decisions. I
also have to say that this meeting was held when we were not actually
informed.

--Pierre

18 years ago by Lukas Kahwe Smith — view source — reply

Andi Gutmans wrote:

Here's my proposed way of figuring how to make migration easier. Port
the following applications to PHP 6 and let's see what we can learn from
it:

mediaWiki

SugarCRM

Drupal

Wordpress

IIRC Wordpress is a good example of bad source code to fix. Drupal would
be a good example of a PHP4 style fairly procedural app to port.
mediaWiki also seems like a worthy cause since it's one of those apps
that would actually benefit quite a bit from unicode support, but I
guess you are talking about porting with unicode==off, right?

SugarCRM would be a good example of a gigantic horrible horrible source
code to fix and I am not sure if I would put it on the list considering
the limited open source release they do. I think it would be cool if
they would do it themselves or sponsor whoever is doing it.

We also have an SoC project where someone is implementing a PHP6 version
of Jaws.

regards,
Lukas

18 years ago by Andi Gutmans — view source — reply

Hmm I don't quite understand what role bad code vs. good code plays here.
Wordpress is one of the most popular applications out there so it's got
huge value to our community. I bet there's a huge number of PHP
applications whose source code is of the same quality or worse. Anyway,
the issues I have seen would also be relevant to what you call "good"
code but again, when it comes to compatibility, I don't quite know why
that will play a big role.

I am talking about porting to both unicode_semantics=off and on. This
will give us a good understanding of the difference of the modes and
where we're at. I bet most people who are voicing their opinions have
actually not tried to write a sizeable application with PHP 6 and also
tried to run an existing one on PHP 6 (unicode_semantics=on). I can also
do some performance testing in our performance lab once we have both
working. I haven't yet mentioned how companies building high-performance
sites would probably take a huge hit by moving to Unicode, to the point
where I think they will not adopt for a long time and then will be faced
with the choice to migrate off of PHP or bite the bullet. For some of the
companies I know that have huge server farms, adding 50% capacity (or
whatever the number is) could be a good enough reason to migrate off, as
they are paying huge fees for the servers...

Andi

18 years ago by Lukas Kahwe Smith — view source — reply

Andi Gutmans wrote:

Hmm I don't quite understand what bad code vs. good code plays here.
Wordpress is one of the most popular applications out there so it's got
huge value to our community. I bet there's a huge amount of PHP
applications who's source code is of the same quality or worse. Anyway,
the issues I have seen would also be relevant to what you call "good"
code but again, when it comes to compatibility, I don't quite know why
that will play a big role.

Bad/good in the sense that it's messy. But what I was getting at is that I
find your proposed list good with the exception of SugarCRM. It might be good
to also include a php5 only app, so that we have a good idea of how
messy code, fairly procedural, E_STRICT compliant etc code ports to PHP6
unicode==off.

I am talking about porting to both unicode_semantics=off and on. This
will give us a good understanding of the difference of the modes and
where we're at. I bet most people who are voicing their opinions have
actually not tried to write a sizeable application with PHP 6 and also
tried to run an existing one on PHP 6
(unicode_semantics=on). I can also do some performance testing in our

ok .. this makes this quite a large undertaking indeed.

regards,
Lukas

18 years ago by Pierre — view source — reply

Hmm I don't quite understand what bad code vs. good code plays here.
Wordpress is one of the most popular applications out there so it's got
huge value to our community. I bet there's a huge amount of PHP
applications who's source code is of the same quality or worse. Anyway,
the issues I have seen would also be relevant to what you call "good"
code but again, when it comes to compatibility, I don't quite know why
that will play a big role.

Using PHP4 as a base to test the compatibility of PHP6 is a bad idea.
The entry point should be PHP5+ (even if the troubles begin between
5.1 and 5.2).

Having apps running on 5.2 with E_STRICT without notices would be a
good indicator about how it will work with php6 without unicode (or
php 5.3 for php6/Off .... and php6 with unicode only).

I am talking about porting to both unicode_semantics=off and on. This
will give us a good understanding of the difference of the modes and
where we're at. I bet most people who are voicing their opinions have
actually not tried to write a sizeable application with PHP 6 and also
tried to run an existing one on PHP 6 (unicode_semantics=on).

I did. And please (for God's sake...), can you stop making bad
assumptions about what others know or not?

With all my apps, and I'm well aware of the work I will need to do to
port them. But this work is required as long as I'm interested in Unicode.
Unicode off? No interest sorry, I do not care about Namespace for my
existing apps.

Don't get me wrong: I love them but I don't consider this feature as
critical for my existing applications. They have worked without it for
years, and they will continue to work without it for a couple more years.
Using Namespace will require more work anyway.

I can also do some performance testing in our
performance lab once we have both working. I haven't yet mentioned how
companies building high-performance sites would probably take a huge hit
by moving to Unicode to the point where I think they will not adopt for
a long time and then will be faced with the choice to migrate off of PHP
or bite the bullet. With some of the companies I know that have huge
server farms adding 50% capacity (or whatever the number is) could be a
good enough reason to migate off as they are paying huge fees for the
servers...

50% increase sounds off base. But I did not bench php6 yet. When all
the new features are implemented, it will make more sense to work on
the performance problem. For now, it is simply premature.

Regards,
--Pierre

18 years ago by Jani Taskinen — view source — reply

Pierre wrote:

50% increase sounds off base. But I did not bench php6 yet. When all
the new features are implemented, it will make more sense to work on
the performance problem. For now, it is simply premature.

If Moore's law stands for the coming years, this argument is moot anyway.
By the time PHP 6 is out the door, any performance issues are insignificant. :)

And by the time people actually start using PHP 6, it's probably already antique
tech anyway..(around 2015 or so) :D

--Jani

18 years ago by Tomas Kuliavas — view source — reply

50% increase sounds off base. But I did not bench php6 yet. When all
the new features are implemented, it will make more sense to work on
the performance problem. For now, it is simply premature.

If Moore's law stands for the coming years, this argument is moot anyway.
By the time PHP 6 is out the door, any performance issues are
insignificant. :)

If you have a setup with 10 machines and the new interpreter works 10%
faster, you can serve the same number of users with 9 machines. Plus, Moore's
law talks about the number of transistors, not about performance or power
consumption.

--
Tomas
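
The capacity arithmetic traded back and forth in this part of the thread can be checked with a short sketch. It is only an illustration: the function names are made up here, and it assumes load scales linearly with machine count. Note that "10% faster" gives 9 machines only if it means each request costs 10% less CPU; if it means 10% more requests per machine, 10 / 1.1 rounds up to 10. The 50% figure is the one Andi explicitly qualified with "or whatever the number is".

```python
def machines_for_cpu_cost(current_machines: int, cost_percent: int) -> int:
    """Machines needed when each request costs cost_percent% of its old CPU time."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-current_machines * cost_percent // 100)


def machines_for_throughput(current_machines: int, throughput_percent: int) -> int:
    """Machines needed when each machine serves throughput_percent% of its old load."""
    return -(-current_machines * 100 // throughput_percent)


print(machines_for_cpu_cost(10, 90))     # 10% less CPU per request
print(machines_for_throughput(10, 110))  # 10% more requests per machine
print(machines_for_cpu_cost(10, 150))    # hypothetical 50% extra CPU per request
```

Under these assumptions the three cases come out to 9, 10 and 15 machines respectively.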

18 years ago by Jani Taskinen — view source — reply

Nitpicking, are we? :)

Tomas Kuliavas wrote:

50% increase sounds off base. But I did not bench php6 yet. When all
the new features are implemented, it will make more sense to work on
the performance problem. For now, it is simply premature.

If Moore's law stands for the coming years, this argument is moot anyway.
By the time PHP 6 is out the door, any performance issues are
insignificant. :)

If you have setup with 10 machines and new interpreter works 10% faster,
you can serve same amount of users with 9 machines. Plus Moore talks about
number of transistors and not about performance or power consumption.

18 years ago by Pierre — view source — reply

50% increase sounds off base. But I did not bench php6 yet. When all
the new features are implemented, it will make more sense to work on
the performance problem. For now, it is simply premature.

If Moore's law stands for the coming years, this argument is moot anyway.
By the time PHP 6 is out the door, any performance issues are
insignificant. :)

If you have setup with 10 machines and new interpreter works 10% faster,
you can serve same amount of users with 9 machines. Plus Moore talks about
number of transistors and not about performance or power consumption.

Three cores in one processor consume less than three separate
processors. More CPUs in one host will also be faster than many hosts
(processing power). Sorry, but Jani's reference to Moore is correct.
But that's definitely not the topic :)

18 years ago by Derick Rethans — view source — reply

Hmm I don't quite understand what bad code vs. good code plays here.
Wordpress is one of the most popular applications out there so it's got
huge value to our community. I bet there's a huge amount of PHP
applications who's source code is of the same quality or worse. Anyway,
the issues I have seen would also be relevant to what you call "good"
code but again, when it comes to compatibility, I don't quite know why
that will play a big role.

I am talking about porting to both unicode_semantics=off and on. This
will give us a good understanding of the difference of the modes and
where we're at. I bet most people who are voicing their opinions have
actually not tried to write a sizeable application with PHP 6 and also
tried to run an existing one on PHP 6
(unicode_semantics=on).

I hope you are not suggesting to port them to both modes? Why on earth
should an application support both unicode=off and unicode=on? That's
exactly the thing that some of us are so afraid of and want to prevent
as this just annoys more and more PHP users that have to deal with this
stuff. And as mentioned before, having both modes is way worse than
having to deal with register_globals on/off or magic_quotes, as those
two cases could at least be handled in user space.

regards,
Derick

--
Derick Rethans
http://derickrethans.nl | http://ez.no | http://xdebug.org

18 years ago by Richard Lynch — view source — reply

I hope you are not suggesting to port them to both modes? Why on earth
should an application support both unicode=off and unicode=on? That's
exactly the thing that some of us are so afraid of and want to prevent
as this just annoys more and more PHP users that have to deal with this
stuff. And as mentioned before, having both modes is way worse than
having to deal with register_globals on/off or magic_quotes, as those
two cases could at least be handled in user space.

I suspect some apps can only be reasonably ported one way or the other.

But one would hope that an app could make the choice to go either way,
and not have a nightmare experience.

The purpose of the PHP Devs doing a port is not to release both
versions, or either version, but to find out if it can actually be
done without major grief for either version.

--
Some people have a "gift" link here.
Know what I want?
I want you to buy a CD from some indie artist.
http://cdbaby.com/browse/from/lynch
Yeah, I get a buck. So?

18 years ago by Stanislav Malyshev — view source — reply

that would actually benefit quite a bit from unicode support, but I
guess you are talking about porting with unicode==off, right?

unicode=off doesn't mean no unicode support, btw.

Stanislav Malyshev, Zend Software Architect
stas@zend.com http://www.zend.com/
(408)253-8829 MSN: stas@zend.com

18 years ago by Derick Rethans — view source — reply

that would actually benefit quite a bit from unicode support, but I guess
you are talking about porting with unicode==off, right?

unicode=off doesn't mean no unicode support, btw.

Of course that's what it means, as none of the string functions work
properly with unicode if you turn it off. And that's just the whole
selling point of Unicode support.

Derick

18 years ago by Andi Gutmans — view source — reply

Functions would work properly with Unicode, but you would explicitly
create Unicode strings, e.g. u"foobar". This is not an uncommon practice
and many other languages actually go down this route, incl. Python and
various versions of C++ frameworks.

Andi
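
For readers unfamiliar with the explicit-prefix model being debated, here is a minimal Python sketch of the difference between a byte string and an explicitly created Unicode string. It is an analogy only, not the PHP 6 behaviour under either unicode_semantics setting: it uses Python 3, where the u"" prefix is still accepted and bytes play the role of binary strings, and the sample text is made up for illustration.

```python
text = u"na\u00efve caf\u00e9"   # explicit Unicode string: 10 code points
raw = text.encode("utf-8")       # the same text as a UTF-8 byte string: 12 bytes

# Length counts code points for the Unicode string, bytes for the byte string.
print(len(text), len(raw))

# Case conversion understands non-ASCII letters only on the Unicode string.
print(text.upper())
print(raw.upper().decode("utf-8"))  # only the ASCII letters were uppercased

# Slicing a byte string can cut a multi-byte character in half.
try:
    raw[:3].decode("utf-8")
    cut_is_valid = True
except UnicodeDecodeError:
    cut_is_valid = False
print(text[:3], cut_is_valid)
```

The byte-string results are what "string functions do not work properly with Unicode" looks like in practice when the text is not explicitly marked as Unicode.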

18 years ago by Andrei Zmievski — view source — reply

Python did go down that road, but take a look at the Python 3000 effort
and you will see that what they are trying to do is exactly what we
have: native Unicode strings, without prefixes.

-Andrei

Functions would work properly with Unicode, but you would explicitly
create Unicode strings e.g. u"foobar". This is not uncommon practice and
many other languages actually go down this route incl. Python and
various versions of C++ frameworks.

18 years ago by Stanislav Malyshev — view source — reply

Python did go down that road, but take a look at Python 3000 effort and
you will see that what they are trying to do is exactly what we have:
native Unicode strings, without prefixes.

Maybe still having u"" - that always produces unicode, regardless of
semantics - could be helpful...

--
Stanislav Malyshev, Zend Software Architect
stas@zend.com http://www.zend.com/
(408)253-8829 MSN: stas@zend.com

18 years ago by scott.mcnaught@synergy8.com — view source — reply

I don't like the idea of having a "u" prefix for Unicode strings. It may
improve performance, and give you some level of fine grain control, but...

It breaks your "keep php simple" policy by introducing a lot of new
functions (ugly).
I (plus a lot of others) have an existing php5 application which I wish to
eventually use with Unicode, and like others, I don't want to spend time
refactoring.
It will also introduce bugs when programmers accidentally forget to add
the "u" prefix when working with unicode.

If you always want to produce Unicode, I think it's best to always use a cast
or a conversion function.

E.g.

$str = (unicode)(strtoupper($str));
Or
$str = unicode_val(strtoupper($str));

My 2c :)

18 years ago by Andrei Zmievski — view source — reply

On Jul 19, 2007, at 4:14 PM, scott.mcnaught@synergy8.com wrote:

I don't like the idea of having a "u" prefix for Unicode strings. It may
improve performance, and give you some level of fine grain control, but...

It breaks your "keep php simple" policy by introducing a lot of new
functions (ugly).

I (plus a lot of others) have an existing php5 application which I wish to
eventually use with Unicode, and like others, I don't want to spend time
refactoring.

It will also introduce bugs when programmers accidentally forget to add
the "u" prefix when working with unicode.

If you always want to produce Unicode, I think its best to always use a cast
or a conversion function.

Eg

$str = (unicode)(strtoupper($str));
Or
$str = unicode_val(strtoupper($str));

Good idea and it will totally work, except that it won't. strtoupper()
operates in different ways according to the type of the string that it
gets.

-Andrei

18 years ago by scott.mcnaught@synergy8.com — view source — reply

I don't really know much about unicode, and to be honest, I don't really
know much about the internal workings of php.
But I assume that there are going to be different implementations of string
functions depending on whether the string is unicode or not.

I'm going to suggest an implementation... Keep in mind I haven't
hacked around with php source, so my variable naming etc will be wrong...
and it's all pseudocode, so it's not...

// The object type used when php creates a string
class ZendString
{
    char *strPtr; // however strings are stored in php
    ZendStringFunctions *pFunctions;
};

abstract class ZendStringFunctions
{
    abstract function strtolower(ZendString *pStr);
    abstract function strtoupper(ZendString *pStr);
    abstract function substr(ZendString *pStr);

    // All functions that differ depending on unicode / non-unicode
    // implementation
    // ...
};

// A set of string functions for unicode strings
class ZendStringFunctionsUnicode extends ZendStringFunctions
{
    function strtolower(ZendString *pStr)
    {
        // unicode implementation
    }

    function strtoupper(ZendString *pStr)
    {
        // unicode implementation
    }

    function substr(ZendString *pStr)
    {
        // unicode implementation
    }
};

// A set of string functions for non-unicode strings
class ZendStringFunctionsNonUnicode extends ZendStringFunctions
{
    function strtolower(ZendString *pStr)
    {
        // non-unicode implementation
    }

    function strtoupper(ZendString *pStr)
    {
        // non-unicode implementation
    }

    function substr(ZendString *pStr)
    {
        // non-unicode implementation
    }
};

// the strtolower implementation: dispatch through the string's function table
ZEND_FUNC strtolower(ZendString *pStr)
{
    return pStr->pFunctions->strtolower(pStr);
}

// the strtoupper implementation: dispatch through the string's function table
ZEND_FUNC strtoupper(ZendString *pStr)
{
    return pStr->pFunctions->strtoupper(pStr);
}

ZEND_FUNC unicode_val(ZendString *pStr)
{
    // do something with pStr->strPtr
    delete pStr->pFunctions;
    pStr->pFunctions = new ZendStringFunctionsUnicode();
}

Anyway - the point I'm trying to make is to use function pointers to switch
between implementations.

You could even make the ZendStringFunctions singletons and just set
pStr->pFunctions to an instance of the singleton.

I think this would provide a very fast implementation of what is trying to
be done.
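
A runnable approximation of this idea, written in Python rather than engine C, might look like the sketch below. All names (PhpString, BYTE_FUNCS, UNICODE_FUNCS and so on) are invented for illustration and do not correspond to anything in the Zend Engine. Each string carries a reference to a shared function table, and unicode_val() swaps the table after decoding. The last lines also reproduce the objection raised earlier in the thread: because strtoupper() dispatches on the type of string it receives, casting after the call is too late.

```python
class ByteStringFunctions:
    """Function table for non-Unicode (binary) strings."""

    def strtoupper(self, data: bytes) -> bytes:
        return data.upper()  # only ASCII letters are changed

    def substr(self, data: bytes, start: int, length: int) -> bytes:
        return data[start:start + length]  # counts bytes


class UnicodeStringFunctions:
    """Function table for Unicode strings."""

    def strtoupper(self, data: str) -> str:
        return data.upper()  # full Unicode case mapping

    def substr(self, data: str, start: int, length: int) -> str:
        return data[start:start + length]  # counts code points


# One shared instance per implementation (the "singleton" variant).
BYTE_FUNCS = ByteStringFunctions()
UNICODE_FUNCS = UnicodeStringFunctions()


class PhpString:
    """Hypothetical string value: payload plus a pointer to its function table."""

    def __init__(self, data, functions):
        self.data = data
        self.functions = functions


def strtoupper(s: PhpString) -> PhpString:
    # Dispatch through whichever table the string carries.
    return PhpString(s.functions.strtoupper(s.data), s.functions)


def unicode_val(s: PhpString, encoding: str = "utf-8") -> PhpString:
    if s.functions is UNICODE_FUNCS:
        return s
    return PhpString(s.data.decode(encoding), UNICODE_FUNCS)


binary = PhpString("caf\u00e9".encode("utf-8"), BYTE_FUNCS)
cast_last = unicode_val(strtoupper(binary))   # uppercased as bytes, then cast
cast_first = strtoupper(unicode_val(binary))  # cast, then uppercased as Unicode
print(cast_last.data)
print(cast_first.data)
```

With the cast applied last, the accented letter stays lowercase; with the cast applied first, it is uppercased along with the rest.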

I'm just making a suggestion, and feel free to ignore/criticise me if I'm
wrong. I don't know anything about PHP's internals... Just an idea

Scott

18 years ago by scott.mcnaught@synergy8.com — view source — reply

Sorry if you are using outlook, turn off the thing that says "Extra line
breaks in this message were removed" at the top of my previous message.

Scott

18 years ago by Jani Taskinen — view source — reply

Python did go down that road, but take a look at Python 3000 effort
and you will see that what they are trying to do is exactly what we
have: native Unicode strings, without prefixes.

So maybe we should learn from mistakes others have already made and not
do the same.. and remove that stupid option before it's too late.

--Jani

18 years ago by Derick Rethans — view source — reply

Functions would work properly with Unicode, but you would explicitly
create Unicode strings e.g. u"foobar". This is not uncommon practice and
many other languages actually go down this route incl. Python and
various versions of C++ frameworks.

That's what I meant, Unicode is not implied so it doesn't work "by
default".

Derick

18 years ago by Derick Rethans — view source — reply

Andi Gutmans wrote:

There are clear things we want to change (like register_globals) because
we believe that ultimately they have a significant benefit to our users
with controllable downside (there is an easy one line workaround which
we can document for people to get their old apps to work). There are
other areas where breaking BC makes sense. But saying we should just
break it across the board and not even consider having a good upgrade
path for our users is unreasonable. I believe we can have a very good
PHP 6, which is pretty much in sync with many of your feelings, but that
provides a well documented and reasonable upgrade path (unlike VB ->
VB.NET).

I never said we should break BC just for the hell of it. The goal must be that
PHP6 feels and behaves like PHP. It's not about hijacking PHP to come up
with the language we all wanted instead.

So let's not oversimplify this situation. We have to continue to make
trade-offs.

Sure, but you are suggesting to delay decisions indefinitely. Either you are
saying this because you already decided that you don't want this change,

Doh, isn't that obvious?

regards,
Derick

--
Derick Rethans
http://derickrethans.nl | http://ez.no | http://xdebug.org

18 years ago by Pierre — view source — reply

Hi Andi,

I disagree with this view of the world.

Well, we seem to all agree on this view, but let's forget this
insignificant fact :)

It doesn't have to be a complete either/or decision and labeling
everything as a "bc hacks" decision is an inaccurate and populist way
of building FUD.

You persist in telling me (I use "me" as I'm not in the position
to talk for the other developers) that my way is populist, a source of
FUD, or whatever else comes to your mind at a given moment. Fine,
if it helps you to make your point. However, may I suggest that you
seriously consider the (legitimate) voices outside your world (no matter
how huge it is)? It would be much appreciated.

There are
other areas where breaking BC makes sense. But saying we should just
break it across the board and not even consider having a good upgrade
path for our users is unreasonable.

From what I see in the various code I can fgrep, pcre is already used
much more than ereg. Migrating from ereg to pcre is a very small task
and it only brings advantages (cache, unicode support if
required,...). Ironically, a little pcre based script or grep should
do the job, if any regexp fan likes to play with that :)
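
The "little pcre based script" is not spelled out in the thread. As a minimal sketch of what the core of such a conversion might look like, the hypothetical helper below (the name ereg_pattern_to_preg is an assumption, not an existing PHP or PEAR function) wraps a POSIX pattern in "/" delimiters so that preg_match() accepts it. It only covers simple patterns; one that already contains an escaped slash, for instance, would need more care.

```php
<?php
// Hypothetical helper: turn a POSIX ERE pattern into a delimited PCRE pattern.
function ereg_pattern_to_preg(string $pattern, bool $caseInsensitive = false): string
{
    // "/" becomes the PCRE delimiter, so literal slashes must be escaped.
    $escaped = str_replace('/', '\/', $pattern);

    // eregi() matched case-insensitively; the "i" modifier does the same.
    return '/' . $escaped . '/' . ($caseInsensitive ? 'i' : '');
}

// Old style: ereg('^[0-9]+/[0-9]+$', $input)
$matched = preg_match(ereg_pattern_to_preg('^[0-9]+/[0-9]+$'), '12/34') === 1;
var_dump($matched);
```

A migration script would still have to locate the ereg() and eregi() calls and rewrite the arguments, which is the part a grep can help with.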

Other changes in the engine will bring much more trouble (because
they are not obvious), just like they did in the past between two
minor PHP versions.

I believe we can have a very good
PHP 6, which is pretty much in sync with many of your feelings, but that
provides a well documented and reasonable upgrade path (unlike VB ->
VB.NET).

It is comparing apples and oranges. As far as I remember, VB.NET was not
really planned, they only realized how much their users liked VB and
why they would not move to c* or whatever else :)

If you want to break everything and anything

It is not about breaking everything just for the fun of it but about
creating a sane base to create portable and maintainable applications
and libraries.

and don't want to be
limited whatsoever by our huge user-base then maybe you should write a
new language which fits exactly what your preference would be. The fact
is though, that even after these discussions and the Paris discussions,
the bulk of the idiosyncrasies which make PHP what it is today will
remain (as per agreement).

You seem to have a clear view of what PHP6 should be, so why don't you
publish it (we have a wiki for this exact purpose) and let's see what we
(as PHP internals developers) think about it, the sooner the better
(and once and for all).

Waiting indefinitely is not a solution, and neither is taking quick
decisions a week before the final release. Taking decisions early will
let us adapt or change them if necessary. Our users will have the
time to think about the consequences and tell us their needs or fears.

So let's not oversimplify this situation. We have to continue to make
trade-offs.

Let's not complicate it either.

Btw, one of PHP's strengths has been in high performance sites and with
a Unicode=on only mode this would take quite a hit (but it's not the
only reason why I think we need choice).

In any case, I think on this
question it does make sense that we start making "informed" decisions by
understanding the migration path better, as opposed to just basing
decisions on gut feelings. Maybe that kind of learning experience will
prove me wrong (which may be so).

At the risk of repeating myself, we already learned from PHP5. Nothing
can make users migrate quicker than they want to (read: quicker than
they need to) unless the benefits are enormous, and that's not the case
(they are, but not for a large number of users).

We can keep dreaming about a short migration path for PHP6 or we can
simply take the right decisions. Saying that we are not informed is a
poor excuse to delay any critical decisions. We are informed, we use
php daily and we have to deal "every day" with the issues we try to
solve now.

Cheers,
--Pierre

18 years ago by Zeev Suraski — view source — reply

At 00:21 17/07/2007, Pierre wrote:

Hi Andi,

I disagree with this view of the world.

Well, we seem to all agree on this view, but let's forget this
insignificant fact :)

Pierre,

I wanted to send my 2c even though I'm not really involved in
internals@ any longer - because in reality it doesn't really have
much to do with such decisions. internals@ makes decisions that
affect the entire PHP userbase.

We all need to remember that the people on this mailing list are not
close to something that represents the userbase. We do have some
very opinionated people on this list, some of them with a lot of
commit-karma - which are not very open to feedback from regular
users. I'm not saying I represent the PHP userbase, and I don't
think Andi is saying this either - but both of us try to take the end
user's view when we think about stuff like this, as opposed to the
internals@ PHP developer view. I would go as far as saying that I
think we do it (as well as some others, like Rasmus) more so than
some others on this list.

For that reason I suspect that if you moved the discussion to, say,
php-general - you'd see a much more balanced view of the
world. Unfortunately it will probably not be very
manageable. Something more practical would be trying to think about
things from the end user's perspective as opposed to our perspective
as the developers and maintainers of the language.

Finally, at the risk of sounding like a broken record, we always need
to remember that BC breakage accumulates, and it's not binary. Every
cleanup we do in PHP 6 will further slow migration, and as Andi
pointed out a few days ago, things don't look too well as it is.

As for ereg - especially in light of the discontinuation of PHP 4 we
shouldn't even consider removing it in PHP 5. I agree with Andi that
I'm not sure it's a good idea for PHP 6 either, but I'm not sure it
isn't either. As long as it's easy enough to turn it back on (i.e.
have it bundled but disabled) I think it's not unreasonable.

Zeev

18 years ago by Pierre — view source — reply

Pierre,

I wanted to send my 2c even though I'm not really involved in
internals@ any longer - because in reality it doesn't really have
much to do with such decisions. internals@ makes decisions that
affect the entire PHP userbase.

We all need to remember that the people on this mailing list are not
close to something that represents the userbase. We do have some
very opinionated people on this list, some of them with a lot of
commit-karma - which are not very open to feedback from regular
users. I'm not saying I represent the PHP userbase, and I don't
think Andi is saying this either - but both of us try to take the end
user's view when we think about stuff like this, as opposed to the
internals@ PHP developer view. I would go as far as saying that I
think we do it (as well as some others, like Rasmus) more so than
some others on this list.

For that reason I suspect that if you moved the discussion to, say,
php-general - you'd see a much more balanced view of the
world. Unfortunately it will probably not be very
manageable. Something more practical would be trying to think about
things from the end user's perspective as opposed to our perspective
as the developers and maintainers of the language.

Finally, at the risk of sounding like a broken record, we always need
to remember that BC breakage accumulates, and it's not binary. Every
cleanup we do in PHP 6 will further slow migration, and as Andi
pointed out a few days ago, things don't look too well as it is.

As for ereg - especially in light of the discontinuation of PHP 4 we
shouldn't even consider removing it in PHP 5. I agree with Andi that
I'm not sure it's a good idea for PHP 6 either, but I'm not sure it
isn't either. As long as it's easy enough to turn it back on (i.e.
have it bundled but disabled) I think it's not unreasonable.

My answer to Andi was not only about ereg but php6 in general (the
unicode flag being a much more important problem than ereg, for
example).

I fully agree with you. Each individual here does not represent the
user base but only a relatively small part.

However, my problem here is not about that but about the respect of
our voices. It is understandable that you think you have a brighter
customer base, but it is not necessarily the case, neither historically
nor practically. Conference attendees are also a very small part of our
users.

All in all, internals developers, with their customers, coworkers or
users (Ez, PEAR, linux package maintainers, etc.) do represent what I
consider as a good representation of what our users are or like to
have.

About the migration path, we should not forget our PHP5 lessons. All
Andi is trying to do is what was done with PHP5. Many cleanups were
not done then, to avoid BC breaks and migration troubles. We
know now that it does not matter. Users migrate when they have to or
need to, not just for the fun of it.

Finally, you are right to say that an opinion has little to do with
the commit karma.

Cheers,
--Pierre

18 years ago by Zeev Suraski — view source — reply

At 00:55 18/07/2007, Pierre wrote:

My answer to Andi was not only about ereg but php6 in general (the
unicode flag being a much more important problem than ereg, for
example).

I fully agree with you. Each individual here does not represent the
user base but only a relatively small part.

However, my problem here is not about that but about the respect of
our voices. It is understandable that you think you have a brighter
customer base, but it is not necessarily the case, neither historically
nor practically. Conference attendees are also a very small part of our
users.

All in all, internals developers, with their customers, coworkers or
users (Ez, PEAR, linux package maintainers, etc.) do represent what I
consider as a good representation of what our users are or like to
have.

I think that they're still quite far away from a real coverage of the
entire userbase. Each of them sees a certain part of the userbase
through a different prism. I think that some of us get to see people
through some more prisms than others, and you may very well be one of
them - but they are still prisms, and I think that most of us don't
get to meet some of the lower 'average' developers. The ones that
don't respond to blogs, go to conferences, let alone participate in
internals@. The ones who constitute the vast majority of PHP
developers around the world - those using it to get their job done.
If you noticed, I didn't just speak about the users that I meet, but
trying to put myself in the average user's place using a simple
thought experiment. I think using this approach (the famous 'WTF
factor' is a part of that) helped PHP tremendously and was one of the
key reasons for its success.
That's why I'm pretty confident you'd get a very different (much more
balanced) view of the world if you ask the question in a more neutral
environment - such as php-general (and even that list arguably
includes people with above-average interest in PHP - given that we're
talking about millions of developers and only thousands of
subscribers). Can I realize, from an end-user's point of view, why
the removal of a certain feature that I'm using would help me? Or
will it be much easier for me to imagine the pain involved with
working around it?

Other than the theological views some people on this list have
(either very pro-BC or anti-BC), what did keeping BC cost us?

About the migration path, we should not forget our PHP5 lessons. All
Andi is trying to do is what was done with PHP5. Many cleanups were
not done then, to avoid BC breaks and migration troubles. We
know now that it does not matter. Users migrate when they have to or
need to, not just for the fun of it.

I think we're learning very different lessons from the same facts.

PHP 5 migration stalled because of several reasons, the key of which
are (IMHO):

1. Misperception about the level of compatibility breakage.
2. Correct perception that moving to PHP 5 requires a full QA cycle
   of your entire codebase with full code coverage (assuming you're
   running a critical app that you can't afford to break, which needless
   to say thousands and thousands of users do); and contrary to popular
   belief, that's actually a very very big deal.

In the shared hosting arena there's supposedly also lack of support
for PHP 5 deployment, although the big hosters I've been in touch
with have provided PHP 5 support (as an option) a couple of months
after its release, so I'm not sure how much this had to do with it.

Is the lesson we should learn that we need to turn #1 into a correct
perception, requiring substantial changes and potentially a full code
audit, and make the migration much more difficult? Would we ever be
able to discontinue PHP 5 if migrating to PHP 6 is a truly tough
task, like we just did with PHP 4?

The less undue compatibility breakage we introduce the better. I
hope we can agree on that - turning the discussion into what's
exactly 'due' and what is 'undue'.

IMHO - if we remove the unicode=off mode, we'll have to support PHP 5
(unlike we supported PHP 4 with bugfixes only for the most part - but
with true backporting of all key features, apps & frameworks running
properly on both versions, etc.) or seriously risk losing our
userbase. Given that we managed to nail it fairly well already, I
can't understand why we would want to do that and increase the
chances of PHP 6 being a flop quite significantly.

Zeev

18 years ago by Keryx Web — view source — reply

Zeev Suraski wrote:

Other than the theological views some people on this list have
(either very pro-BC or anti-BC), what did keeping BC cost us?

Hey that must be me he is talking about - as I am a real theologian!

So for a theologian's 2c on Unicode:

Teaching unicode and PHP

As stated elsewhere I am working as a teacher. I follow this list for
one main purpose and that is I am trying to remedy the extremely sad
situation when it comes to books and other teaching material about PHP
in Sweden. All books we have got by Swedish authors are so bad that I
actively discourage people from reading them!

I am trying to write an "advanced newbie" book that will focus on PHP 6
(+ some HTML 5, CSS 3 and JS 2), with an emphasis on best practice.

In Sweden we can do nicely with iso-8859-1 (we do not even need the
stinkin' euro-symbol!) But I have students that have developed websites
in Arabic, Kurdish and Hindi!

I am appalled to see some comments even seemingly questioning if Unicode
is worthwhile at all. That's a no brainer! i18n is the next big move on
the web. But what technique would be easier to grasp when it comes to
"switching" it on or off? Considering that PHP's main strength always
has been its low entry barrier, I think this is a reasonable
consideration. And maybe I am the only one on this list that deals daily
with newbies...?

From this POV I would definitely say that it would be easier to teach
that in PHP 6 unicode is always on and in PHP 5 it's N/A. I do however
find the arguments compelling that such an ideal would be impractical.

My second best option would be something that can be turned on or off
within the scripts, i.e. with ini_set or per directory with .htaccess

From the low end user perspective I think this would be great from
another POV. Let's imagine for a second that Wordpress will only work
with unicode semantics off and that phpBB will only work with the switch
"on". What if someone would want to run both on a shared server?

But as my "commit karma" is zero I do not know if this is feasible at all.

User base.

There is not one voice on this list as far as I can tell that is from
the CJK-language hemisphere. Is it part of the PHP way to be
Europe/America ethnocentric?

I think it would be a noble thing to actively try to engage PHP
developers from Asia in this discussion. (Well, besides the Israeli
ones... who are doing a great job!)

Adoption rate.

When PHP 5 was new we got two books in Sweden claiming to teach this
version. When I read them there was so little PHP 5 in there that it was
scary. Even today most resources that newbies read tend to teach PHP 4.
Most discussion fora - at least in Sweden - discuss PHP 4 solutions to
people's problems.

This spring I actually taught my students PDO - but then my wife got ill
and had a heart transplant. When I got back to school and started
grading my students' work, all but two had switched to the mysql
extension. I asked why, and all said that they had found tutorials and
help in a discussion forum, all teaching the old way.

I undertook a study: All four totally dominant sites in Sweden where a
young developer would turn, all teach PHP 4. (Two of them also teach
table-based-layout, unsemantic, inaccessible, proprietary HTML and
obtrusive browser-sniffing old school DHTML.)

Conclusion: Every advance in PHP internally has to be communicated to us
who teach PHP and the easier something is, the more likely it is that it
will be picked up.

Lars Gunther

18 years ago by Tomas Kuliavas — view source — reply

From the low end user perspective I think this would be great from
another POV. Let's imagine for a second that Wordpress will only work
with unicode semantics off and that phpBB will only work with the switch
"on". What if someone would want to run both on a shared server?

from httpd.conf

<Directory /var/www/example.org/www/phpbb>
php_admin_flag unicode.semantics on
</Directory>
<Directory /var/www/example.org/www/wp>
php_admin_flag unicode.semantics off
</Directory>

Code written to work in unicode.semantics = off, can work in
unicode.semantics=on. It just has to deal with functions that expect
binary strings instead of PHP5 strings. Other side effects of
unicode.semantics=on can be switched off without breaking backwards
compatibility.

--
Tomas

18 years ago by Jani Taskinen — view source — reply

From the low end user perspective I think this would be great from
another POV. Let's imagine for a second that Wordpress will only work
with unicode semantics off and that phpBB will only work with the switch
"on". What if someone would want to run both on a shared server?

from httpd.conf

<Directory /var/www/example.org/www/phpbb>
php_admin_flag unicode.semantics on
</Directory>
<Directory /var/www/example.org/www/wp>
php_admin_flag unicode.semantics off
</Directory>

Hmm..I forgot that this works for ZEND_INI_SYSTEM type of options.
Live and learn I guess. :)

Too bad it only works for Apache module.. ;)

--Jani

18 years ago by Stanislav Malyshev — view source — reply

Too bad it only works for Apache module.. ;)

I think on Windows you can do something with the registry per-dir too.
On unix there's no registry though. Maybe we need some generic solution
to this (like for FastCGI users)? Anybody has good ideas?

Stanislav Malyshev, Zend Software Architect
stas@zend.com http://www.zend.com/
(408)253-8829 MSN: stas@zend.com

18 years ago by Pierre — view source — reply

Too bad it only works for Apache module.. ;)

I think on Windows you can do something with the registry per-dir too.
On unix there's no registry though. Maybe we need some generic solution
to this (like for FastCGI users)? Anybody has good ideas?

Yes, merge htscanner (pecl) into the core (sapi hooks or something
like that). Doing so will also kill the couple of limitations due to
the init order in php. It is on my todos, but I would appreciate any
help :)

Cheers,
--Pierre

18 years ago by Alexey Zakhlestin — view source — reply

I think on Windows you can do something with the registry per-dir too.
On unix there's no registry though. Maybe we need some generic solution
to this (like for FastCGI users)? Anybody has good ideas?

FastCGI users already can have their own php.ini for every application

--
Alexey Zakhlestin
http://blog.milkfarmsoft.com/

18 years ago by Pierre — view source — reply

I think on Windows you can do something with the registry per-dir too.
On unix there's no registry though. Maybe we need some generic solution
to this (like for FastCGI users)? Anybody has good ideas?

FastCGI users already can have their own php.ini for every application

Having 100 FCGI instances only because you have 100 different configs is not an option.

--Pierre

18 years ago by Richard Lynch — view source — reply

I think on Windows you can do something with the registry per-dir too.
On unix there's no registry though. Maybe we need some generic solution
to this (like for FastCGI users)? Anybody has good ideas?

FastCGI users already can have their own php.ini for every application

Perhaps the OP just needs a link to a good HowTo FastCGI reference...

http://www.fastcgi.com/docs/faq.html#PHP

It would be nice if it were a bit more specific about the CLI install
hack...

Or if PHP out of the box compiled --with-fastcgi as a different binary
name so there was no hack... :-v

--
Some people have a "gift" link here.
Know what I want?
I want you to buy a CD from some indie artist.
http://cdbaby.com/browse/from/lynch
Yeah, I get a buck. So?

18 years ago by Richard Lynch — view source — reply

From the low end user perspective I think this would be great from
another POV. Let's imagine for a second that Wordpress will only work
with unicode semantics off and that phpBB will only work with the switch
"on". What if someone would want to run both on a shared server?

from httpd.conf

<Directory /var/www/example.org/www/phpbb>
php_admin_flag unicode.semantics on
</Directory>
<Directory /var/www/example.org/www/wp>
php_admin_flag unicode.semantics off
</Directory>

Hmm..I forgot that this works for ZEND_INI_SYSTEM type of options.
Live and learn I guess. :)

Too bad it only works for Apache module.. ;)

Maybe I'm being stupid, but why would this "work" when .htaccess isn't
supposed to work for Unicode on/off because it would require too much
gnarly ifdef-type code in PHP source?

Maybe this doesn't really really work at all and it's going to be a
problem?

18 years ago by Jani Taskinen — view source — reply

one main purpose and that is I am trying to remedy the extremely sad
situation when it comes to books and other teaching material about PHP
in Sweden. All books we have got by Swedish authors are so bad that I
actively discourage people from reading them!

Perhaps you should teach the students English? And encourage them to
read English books which are widely available.. :D

I really thought most Swedes do learn English in school? Like we Finns
do.. :)

another POV. Let's imagine for a second that Wordpress will only work
with unicode semantics off and that phpBB will only work with the switch
"on". What if someone would want to run both on a shared server?

Very good point.

--Jani

18 years ago by Derick Rethans — view source — reply

At 00:21 17/07/2007, Pierre wrote:

I disagree with this view of the world.

Well, we seem to all agree on this view, but let's forget this
insignificant fact :)

I wanted to send my 2c even though I'm not really involved in internals@ any
longer - because in reality it doesn't really have much to do with such
decisions. internals@ makes decisions that affect the entire PHP userbase.

We all need to remember that the people on this mailing list are not close to
something that represents the userbase. We do have some very opinionated
people on this list, some of them with a lot of commit-karma - which are not
very open to feedback from regular users.

This sounds like a broken record, this sounds like a broken record, this
sounds like a broken record. I've heard this so many times now, it
gets boring. You seem to think that none of the people on the internals
list are part of the user base - that is incorrect. Most of my opinions
come forth out of my involvement with an extremely large code base.

I'm not saying I represent the PHP userbase, and I don't think Andi is
saying this either - but both of us try to take the end user's view
when we think about stuff like this, as opposed to the internals@ PHP
developer view. I would go as far as saying that I think we do it (as
well as some others, like Rasmus) more so than some others on this
list.

Regarding the unicode on/off modes, I don't think you put yourself in
the developer's view at all. Users are not going to be better off having
to deal with both modes.

For that reason I suspect that if you moved the discussion to, say,
php-general - you'd see a much more balanced view of the world.

I really doubt that, as that list does not include many people that use
PHP for internal projects. It's mostly the "geeks" that have time to
discuss on the list. I know that many PHP users either don't know
about this list, or simply can't be bothered with it.

As for ereg - especially in light of the discontinuation of PHP 4 we
shouldn't even consider removing it in PHP 5.

I don't think anybody wanted to remove it in PHP 5 - just make it
possible to disable as an extension.

regards,
Derick

Derick Rethans
http://derickrethans.nl | http://ez.no | http://xdebug.org

18 years ago by Jani Taskinen — view source — reply

As for ereg - especially in light of the discontinuation of PHP 4 we
shouldn't even consider removing it in PHP 5.

I don't think anybody wanted to remove it in PHP 5 - just make it
possible to disable as an extension.

I guess it was misunderstood: All the talk about it concerns HEAD only,
not PHP 5. But I will MFH the move to ext in PHP_5_3 though. Helps
future merges around when the changes are in both branches.

--Jani

18 years ago by Zeev Suraski — view source — reply

At 01:20 18/07/2007, Derick Rethans wrote:

This sounds like a broken record, this sounds like a broken record, this
sounds like a broken record. I've heard this so many times now, it
gets boring.

I'm not surprised, but it doesn't change the fact that it's true, though.
No matter how many times this will be discussed or disputed, the more
we break - the harder it is for our users to move. It's an axiom,
and we have to live with it, even if it gets easy to repress it and
take all sorts of opportunities for an end-of-the-season
compatibility breakage sale.

You seem to think that none of the people on the internals
list are part of the user base - that is incorrect. Most of my opinions
come forth out of my involvement with an extremely large code base.

I didn't say that, I did say that they (myself included) don't
represent the PHP userbase at large and I fully stand behind that statement.
Read my other post from a couple of minutes ago for an explanation as
to what I mean.

I'm not saying I represent the PHP userbase, and I don't think Andi is
saying this either - but both of us try to take the end user's view
when we think about stuff like this, as opposed to the internals@ PHP
developer view. I would go as far as saying that I think we do it (as
well as some others, like Rasmus) more so than some others on this
list.

Regarding the unicode on/off modes, I don't think you put yourself in
the developer's view at all. Users are not going to be better off having
to deal with both modes.

Well, I tend to agree with you that they shouldn't have to handle
BOTH modes (write code that works with both settings). But they will
definitely be better off if they can choose one of these modes and
develop/deploy for it.

For someone for whom PHP 6 is a non-item (no interest in Unicode),
moving to PHP 6 and being forced to audit his code will be a
completely unreasonable cost of migration. A clear 'not worth it' situation.

For that reason I suspect that if you moved the discussion to, say,
php-general - you'd see a much more balanced view of the world.

I really doubt that, as that list does not include many people that use
PHP for internal projects. It's mostly the "geeks" that have time to
discuss on the list. I know that many PHP users either don't know
about this list, or simply can't be bothered with it.

You know what, I agree. I wrote something to that effect in my post
from a few minutes ago. The vast userbase is mostly comprised of
people we hardly even get to see.

As for ereg - especially in light of the discontinuation of PHP 4 we
shouldn't even consider removing it in PHP 5.

I don't think anybody wanted to remove it in PHP 5 - just make it
possible to disable as an extension.

Great, I misunderstood.

Zeev

18 years ago by johannes@php.net — view source — reply

Hi Zeev,

Regarding the unicode on/off modes, I don't think you put yourself in
the developer's view at all. Users are not going to be better off having
to deal with both modes.

Well, I tend to agree with you that they shouldn't have to handle
BOTH modes (write code that works with both settings). But they will
definitely be better off if they can choose one of these modes and
develop/deploy for it.

For someone for whom PHP 6 is a non-item (no interest in Unicode),
moving to PHP 6 and being forced to audit his code will be a
completely unreasonable cost of migration. A clear 'not worth it' situation.

The question here in my opinion is: How much harm should we do to users
who develop new things in order to make life simpler for those who need
BC?

The first thing I see is: Having these two modes is a pita for everybody
who wants to write portable code. PHP acts differently depending on
that switch and some parts of PHP work quite differently. Some of these
changes can be worked around in a quite simple way, others not that
easily, but it is still possible (since the engine still knows unicode
and you still can make it all think there's some more unicode in there).
But for a new application it's imo bad to need such compatibility hacks.

If you want clean code you might concentrate on one of these two modes -
but which? The faster or the better? Well, that depends on what
hosters will configure, but how should they know?

For hosters it's hard to decide which road to go. Offer both? Offering
both is, in terms of complexity, the same as hosting PHP 5 and PHP 6,
since unicode.semantics is PHP_INI_SYSTEM, meaning you need independent
PHP instances (FastCGI, individual hosts, whatever). Another possibility
is offering just PHP 6 with unicode.semantics off. In my opinion a hoster
doing that might not advertise it as offering PHP 6, since with that mode
off it's only offering half of PHP 6 (namespaces, goto, maybe LSB, ...).
Or they offer PHP 6 + unicode and PHP 5 for BC. To me this feels like the
sanest way in terms of BC: on the one hand you have full BC by using
PHP 5, on the other hand you're offering full PHP 6 for the ones who need
this feature.
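
Whether a setting can be changed from a script depends on its access level. The sketch below shows one way to inspect that level with ini_get_all(). Because unicode.semantics never appeared in a released PHP, allow_url_fopen is used here as a stand-in for a system-level setting, and the helper name is hypothetical.

```php
<?php
// Hypothetical helper: report whether ini_set() may change a setting from
// within a script. Returns null if the setting is unknown to this build.
function ini_changeable_at_runtime(string $name): ?bool
{
    $all = ini_get_all(null, true); // details include an 'access' bitmask
    if (!isset($all[$name])) {
        return null;
    }
    // The INI_USER bit means user scripts may change the value.
    return ($all[$name]['access'] & INI_USER) !== 0;
}

// Stand-in for a system-level option such as unicode.semantics was.
var_dump(ini_changeable_at_runtime('allow_url_fopen'));
// An option that scripts are allowed to change.
var_dump(ini_changeable_at_runtime('display_errors'));
```

A system-level option has to be fixed before the script runs, which is why a hoster cannot leave the choice to each application through ini_set().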

Talking about BC: Except for the unicode stuff, PHP 6 will most likely
have around the same amount of BC breaks as PHP 5 had compared to PHP 4
(there are already a few tiny ones, like you can't call your functions
"goto" anymore and such stuff). PHP 5 offers a compatibility mode for
PHP 4; the benefit of that mode, compared to PHP 6's BC mode, was that
one might change it even at runtime, which might help with the migration
(while making the code ugly, but hopefully such hacks are temporary).

Another argument for that setting I read was performance. I didn't do
proper benchmarks comparing both modes, so I don't know how relevant the
impact is. But if performance of the unicode mode really is a big problem
for most users, we are really going to have a big problem, since then we
have to keep the mode forever. And I, who can really live with using
ISO-8859-1, am wondering whether it really makes sense to change half the
engine for a mode which is too slow for most cases and only needed by a
minority of users (some in these discussions mentioned numbers like 10%
unicode mode on, 90% off ...), and whether it wouldn't be better to
concentrate on the intl and mbstring extensions to improve the tools for
the ones needing better support in this area without harming most users.
But well, as said: Here I'm just wondering after reading the previous
discussions.

This all gives me the conclusion that we really should consider removing
the mode, but well, that's my opinion.

As for ereg - especially in light of the discontinuation of PHP 4 we
shouldn't even consider removing it in PHP 5.

I don't think anybody wanted to remove it in PHP 5 - just make it
possible to disable as an extension.

Great, I misunderstood.

This gives me the possibility to come back to the original topic of this
thread, which wasn't about the unicode.semantics mode: Since I think we
should remove that setting, I think we should disable ereg with PHP 6,
since as far as I know ereg won't work with unicode data. Regular
expressions which won't work on the main data type are pointless in my
opinion.

Besides that there are two other reasons I see:

- ereg functions have been marked as deprecated for ages, so users should
  be prepared
- ereg functions aren't binary safe - most uses I've seen were most likely
  insecure, since people didn't know you can bypass ereg-based input
  checking by inserting null bytes, so removing these functions helps
  writing more secure code
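
To make the binary-safety point concrete, here is a minimal sketch of the same kind of input check written against preg. The helper name is_lowercase_word() is hypothetical, and the ereg() call is only described in a comment rather than executed:

```php
<?php
// Hypothetical validator, for illustration only.
function is_lowercase_word($input)
{
    // An ereg('^[a-z]+$', $input) check treats the subject as a C string,
    // so matching stops at the first NUL byte and anything after it goes
    // unchecked. That is the bypass described above.
    //
    // preg_match() sees the subject with its full length. The D modifier
    // keeps "$" from matching before a trailing newline.
    return preg_match('/^[a-z]+$/D', $input) === 1;
}

var_dump(is_lowercase_word("abc"));            // bool(true)
var_dump(is_lowercase_word("abc\0<script>"));  // bool(false)
```

The second call is the interesting one: the NUL byte and everything after it are part of what gets matched, so the check fails as it should.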

In most cases a workaround, via PHP_Compat or something similar, can be
offered by escaping slashes in the pattern, adding slashes as delimiters
and passing that to preg - this won't work in all cases but I'm sure it
works in most cases.
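
A rough sketch of that workaround, under stated assumptions: ereg_pattern_to_pcre() is a hypothetical name (not an existing PHP or PHP_Compat function), and the second parameter is an assumed way to cover eregi()-style case-insensitive matching. It only handles the delimiter problem; it does not translate POSIX features that PCRE treats differently, such as backslashes inside bracket expressions.

```php
<?php
// Hypothetical helper: wraps a POSIX ERE pattern so preg_* accepts it.
function ereg_pattern_to_pcre($pattern, $caseInsensitive = false)
{
    $out = '';
    $len = strlen($pattern);
    for ($i = 0; $i < $len; $i++) {
        $ch = $pattern[$i];
        if ($ch === '\\' && $i + 1 < $len) {
            // Copy an existing escape sequence through unchanged.
            $out .= $ch . $pattern[++$i];
        } elseif ($ch === '/') {
            // Escape the character we are about to use as delimiter.
            $out .= '\\/';
        } else {
            $out .= $ch;
        }
    }
    // Add the delimiters, plus the "i" modifier for eregi()-style calls.
    return '/' . $out . '/' . ($caseInsensitive ? 'i' : '');
}

$pattern = ereg_pattern_to_pcre('^[0-9]+/[0-9]+$');
var_dump($pattern);                         // string(18) "/^[0-9]+\/[0-9]+$/"
var_dump(preg_match($pattern, '12/2007'));  // int(1)
```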

Ah, another thing kind of related to this thread: We really need a
proper way of having decisions declared as being made. Recently it
happened quite often that many developers thought some decision was
made (for example from reading the Paris meeting notes) and then some
developers come and say nothing was finally decided yet. But
imo it's important to decide some things (like removal of possibly often
used functionality) soon, so users can be informed and prepare their
code, and developers here can spend time on these tasks knowing that
they are following decisions. Maybe this should be discussed independently
from this thread - but it's a good example of the need... (while there
might be reasons to change a decision - but that shouldn't happen too
often)

johannes

18 years ago by Rasmus Lerdorf — view source — reply

Derick Rethans wrote:

Regarding the unicode on/off modes, I don't think you put yourself in
the developer's view at all. Users are not going to be better off having
to deal with both modes.

Have you guys really thought this through?

Let's look at this from two angles.

First, from our perspective maintaining and developing PHP. Without
the Unicode switch, and as has already been suggested, PHP 5 will never
die. Anything new in PHP 6 that isn't related to Unicode will be
backported to PHP 5. Or, a slight variation of that, any developer with
no interest in Unicode will only work on the PHP 5 branch and not bother
worrying about whether it works in PHP 6 forcing others to do that work.
I don't think we have the resources to do this, and I think it is
likely to either create 2 classes of developers and potentially
diverging trees, or it may simply kill off the Unicode effort altogether
if not enough developers bother looking at PHP 6 since PHP 5 will live
forever and is free of all this annoying Unicode stuff that is just too
complicated to deal with.

Second, from the user space PHP developers' perspective. There are two
groups of those out there. There is the group that builds apps for
controlled environments. Yahoo, Facebook, and the hundreds, if not
thousands of smaller companies out there that will define a certain PHP
configuration and code against that. To them such a switch isn't a big
deal except when it comes to re-using external code. Which brings us to
the second group which is the group that strives to build portable apps
designed to run on as many unknown PHP configs as possible. This is the
group that will get hit by this, and here is where we need to figure out
how to cause them the least amount of pain. They are going to feel some
pain in order to get their heads around Unicode no matter how we handle
this. For the portion of these folks who don't want to worry about
Unicode at all and they actually have code that does stuff on binary
strings that will break, their stuff just won't work no matter what we
do. The difference comes down to whether it gets marked as PHP5-only or
it gets marked as non-Unicode-only. And the other camp who do want to
make sure their stuff supports Unicode will need to write the Unicode
and non-Unicode versions and check to see if the system they are running
on supports Unicode or not. Whether they check the PHP version number,
or the Unicode switch, or probe directly for the features they need, it
ends up being about the same amount of pain.
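
The three checks named above could look roughly like this. It is only a sketch: the function names are hypothetical, the unicode.semantics directive is the one discussed earlier in this thread, and mb_strlen() stands in for "a feature you need".

```php
<?php
// Hypothetical helper: is this interpreter running with Unicode semantics?
function unicode_semantics_enabled()
{
    // Check 1: the PHP version number. Before 6.0.0 there is no switch.
    if (version_compare(PHP_VERSION, '6.0.0', '<')) {
        return false;
    }
    // Check 2: the switch itself. ini_get() returns false when the
    // directive does not exist in this build.
    $value = ini_get('unicode.semantics');
    if ($value === false) {
        return false;
    }
    $on = array('1', 'on', 'true', 'yes');
    return in_array(strtolower((string) $value), $on, true);
}

// Check 3: probe directly for the feature instead of a version or setting.
// Hypothetical helper; assumes the input is UTF-8 when mbstring is present.
function portable_strlen($text)
{
    return function_exists('mb_strlen')
        ? mb_strlen($text, 'UTF-8')
        : strlen($text);
}

echo unicode_semantics_enabled() ? "unicode on\n" : "unicode off\n";
echo portable_strlen('abc'), "\n";
```

Whichever of the three checks an application picks, it ends up as a branch like the ones above somewhere in the code base.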

What may be somewhat lost in all this, that I hope nobody here is
forgetting, is that smooth Unicode support is really important. Being
able to work directly in your native charset with your native strings
without having to deal with iconv and other crap is the goal here. And
let's also not forget that a lot of code will actually work unchanged in
PHP 6 Unicode-mode and suddenly be Unicode-capable where they weren't
before. I would love to see all this energy put toward making sure as
much code as possible falls into this category instead of arguing about
where to put the Unicode switch. It's still a switch whether you put it
in the version number or in the .ini file. In the version number it is
simply easier for people to ignore from all sides of the discussion
here, but where does that leave us 4 years from now?

Perhaps the real argument here is whether we should be doing Unicode at all?

-Rasmus

18 years ago by William A. Rowe — view source — reply

Rasmus Lerdorf wrote:

Perhaps the real argument here is whether we should be doing Unicode at all?

I've watched this debate with tremendous interest. i18n is one of my
pure 'hobbies' (my 'clients' are all quite happy with ISO-8859-1, and
one of my backgrounds is WinNT where everything became unicode within
the OS.)

I'm pondering if utf-8 as the 'default' encoding wouldn't have been a
more effective approach than pure unicode wide-chars, but no matter how
you slice this, there will be several points of pain in the transition.

Rethinking in terms of utf-8 might be an interesting exercise, just to
draw up a comparison of 'what is broken' when sliding between a PHP5 ISO
charset and a PHP6 Unicode or utf-8 charset.

18 years ago by Lukas Kahwe Smith — view source — reply

Rasmus Lerdorf wrote:

Derick Rethans wrote:

Regarding the unicode on/off modes, I don't think you put yourself in
the developer's view at all. Users are not going to be better off having
to deal with both modes.

Have you guys really thought this through?

Let's look at this from two angles.

First, from our perspective maintaining and developing PHP. Without
the Unicode switch, and as has already been suggested, PHP 5 will never
die. Anything new in PHP 6 that isn't related to Unicode will be
backported to PHP 5. Or, a slight variation of that, any developer with
no interest in Unicode will only work on the PHP 5 branch and not bother
worrying about whether it works in PHP 6 forcing others to do that work.
I don't think we have the resources to do this, and I think it is
likely to either create 2 classes of developers and potentially
diverging trees, or it may simply kill off the Unicode effort altogether
if not enough developers bother looking at PHP 6 since PHP 5 will live
forever and is free of all this annoying Unicode stuff that is just too
complicated to deal with.

Second, from the user space PHP developers' perspective. There are two
groups of those out there. There is the group that builds apps for
controlled environments. Yahoo, Facebook, and the hundreds, if not
thousands of smaller companies out there that will define a certain PHP
configuration and code against that. To them such a switch isn't a big
deal except when it comes to re-using external code. Which brings us to
the second group which is the group that strives to build portable apps
designed to run on as many unknown PHP configs as possible. This is the
group that will get hit by this, and here is where we need to figure out
how to cause them the least amount of pain. They are going to feel some
pain in order to get their heads around Unicode no matter how we handle
this. For the portion of these folks who don't want to worry about
Unicode at all and they actually have code that does stuff on binary
strings that will break, their stuff just won't work no matter what we
do. The difference comes down to whether it gets marked as PHP5-only or
it gets marked as non-Unicode-only. And the other camp who do want to
make sure their stuff supports Unicode will need to write the Unicode
and non-Unicode versions and check to see if the system they are running
on supports Unicode or not. Whether they check the PHP version number,
or the Unicode switch, or probe directly for the features they need, it
ends up being about the same amount of pain.

What may be somewhat lost in all this, that I hope nobody here is
forgetting, is that smooth Unicode support is really important. Being
able to work directly in your native charset with your native strings
without having to deal with iconv and other crap is the goal here. And
let's also not forget that a lot of code will actually work unchanged in
PHP 6 Unicode-mode and suddenly be Unicode-capable where they weren't
before. I would love to see all this energy put toward making sure as
much code as possible falls into this category instead of arguing about
where to put the Unicode switch. It's still a switch whether you put it
in the version number or in the .ini file. In the version number it is
simply easier for people to ignore from all sides of the discussion
here, but where does that leave us 4 years from now?

I guess the question (which I am unable to answer) is whether it's easier
to maintain PHP6 with the switch or be forced to backport to PHP5 without
the switch in PHP6. If it does end up that a lot of devs prefer to work
on PHP5 and as a result PHP6 is left dangling, I wonder if with the
switch things will be any easier, as devs will work on/test only the
non-unicode side of things? I think this was the key point that was
brought up: that it will not be easier, and that it was instead deemed
more error prone to handle all the ifs in a single tree, versus having a
"clean" separation.

Also I wonder how a unicode on/off switch will be handled on the
documentation side. It would add more permutations in the documentation
to have the switch. From my understanding the situation is already fairly
non-trivial in how to handle all the version dependent differences.
Philip, what's your take on this?

regards,
Lukas

18 years ago by Philip Olson — view source — reply

<snip>

Also I wonder how a unicode on/off switch will be handled on the
documentation side. It would add more permutations in the documentation
to have the switch. From my understanding the situation is already fairly
non-trivial in how to handle all the version dependent differences.
Philip, what's your take on this?

I don't think it matters for documentation because both routes have
hurdles and planning requirements. But, it's exciting that we're
worrying about this because it's time we educate the world to
understand why unicode is useful, and why it's needed today. Andrei
asked the documentation team to start the unicode documentation
process long ago but given that nobody knows what PHP 6 will be, it
makes that tough so we've (for time reasons too) done little.
However, each function has a unicode section dedicated to it and
general unicode feature sections planned. I don't know if a PHP 6
version of the manual would be a good route to take, but it's possible,
although I prefer shoving information into a user's face, both past
and present, so said user knows what to look for and worry about in
all directions. Each function now has a changelog for that.

In reply to removing the directive, I fear that PHP 6 would be
discussed as === PHP 5 + Unicode when this won't be true... yet this
idea could persist and cause confusion so let's be sure everyone
realizes this from day #.01. It's the main new (and big) feature
only, so that's all we can promise. And in this scenario please
decide what PHP 7 could be. Would we have 5/7, 7/8, or just 7 with
unicode? In other words, coupled PHP versions forever? Or just once?
And regardless, we need an effective marketing strategy via PHP.net
that does not solely rely on third parties, word of mouth, or PHP's
greatness like we've done in the past. This includes the website and
documentation, and this includes strong efforts by everyone. Like,
explaining ways to be forward compatible. And perhaps PHP 6 will
bring with it a new web design, with pictures of little children from
all around the world happily holding hands... :-)

So unless something truly innovative seeps up (maybe it has) then
stealing ideas from other languages' experiences and growing pains
(like Python and Java) sounds good. If a document existed that
compared the situation in many programming languages, the pros and
cons, that would be great and might shed light in many of the right
places. At least, for me. And/or an update deciphering where we're at
after all these lengthy unicode threads. If it's time to go old
school with two sides presenting official statements/arguments, then
a vote, then so be it. But I don't feel we're quite there yet.

Regards,
Philip

18 years ago by Jani Taskinen — view source — reply

What may be somewhat lost in all this, that I hope nobody here is
forgetting, is that smooth Unicode support is really important. Being

Smooth it will be only if it's the only option. Otherwise it's just PITA
for both the camps. I'm all for unicode support as long as it's always
there.

where to put the Unicode switch. It's still a switch whether you put it
in the version number or in the .ini file. In the version number it is
simply easier for people to ignore from all sides of the discussion
here, but where does that leave us 4 years from now?

With a bone in hand? ;) Or most likely with actually working PHP with
full Unicode support rather than half-assed one..

Why not just rename the beast to uPHP. :D

--Jani

18 years ago by Larry Garfield — view source — reply

Second, from the user space PHP developers' perspective. There are two
groups of those out there. There is the group that builds apps for
controlled environments. Yahoo, Facebook, and the hundreds, if not
thousands of smaller companies out there that will define a certain PHP
configuration and code against that. To them such a switch isn't a big
deal except when it comes to re-using external code. Which brings us to
the second group which is the group that strives to build portable apps
designed to run on as many unknown PHP configs as possible. This is the
group that will get hit by this, and here is where we need to figure out
how to cause them the least amount of pain. They are going to feel some
pain in order to get their heads around Unicode no matter how we handle
this. For the portion of these folks who don't want to worry about
Unicode at all and they actually have code that does stuff on binary
strings that will break, their stuff just won't work no matter what we
do. The difference comes down to whether it gets marked as PHP5-only or
it gets marked as non-Unicode-only. And the other camp who do want to
make sure their stuff supports Unicode will need to write the Unicode
and non-Unicode versions and check to see if the system they are running
on supports Unicode or not. Whether they check the PHP version number,
or the Unicode switch, or probe directly for the features they need, it
ends up being about the same amount of pain.

Disclaimer again: PHP commit karma of 0, PHP development karma of some
positive integer, PHP support karma of "depends if you like gophp5.org or
not". :-)

Permit me to offer a concrete example. I am a Drupal developer; that is, I
work on the Drupal CMS core and also get paid to build sites with Drupal
professionally. Drupal has made a huge push for internationalization in the
past year and a half or so. It's currently UTF-8 through and through,
complete with user-space UTF-8-safe implementations of various string
manipulation functions. Native Unicode support would be awesome.

Drupal is used by a huge number of people on dedicated boxes where they
control the environment. It's also used by an even huger number of people on
shared hosts where they get almost no control over the environment. Right
now it handles both quite well, under PHP 4.3.6-5.2.3. (PHP 4 to be dropped
in version 7.)

Now, when PHP 6 is released we are going to want to be able to run in PHP 6,
and likely at some point in the future switch to PHP 6 only just as we're now
(finally) moving to PHP 5 only. That means that, for a time, we'll have to
be able to run with the same code base on PHP 5 and PHP 6.

A great many people will want to run it on a PHP 6 unicode=on server, so they
can leverage native Unicode support. A great many people will want to run it
on shared hosts, which means either PHP 5 or PHP 6 unicode=off (because I
don't expect shared hosts to default to unicode=on any more readily than they
accepted the default of register_globals=off). And unlike register_globals,
it won't be something we can change in the

So there will be a prolonged period where we will have to be able to run on
PHP 5.2, PHP 6 unicode=off, and PHP 6 unicode=on, even if we don't explicitly
use PHP 6-only features yet. Simply excluding one of those three completely
will not be a viable option. Maintaining two or three separate trees is also
not an option. We simply don't have anywhere close to the resources to do
that. (Plus Drupal is a plugin-based system, and asking plugin authors to do
that is completely unreasonable.)

So, just how much hair should we plan to pull out in order to make that
happen? That's the million dollar question for me, and for, I suspect, most
of the open source application developers out there. How can we minimize
that hair loss?

Right now I really don't know what the answer is. That's why I'm asking the
question, because as C is really not a comfortable language for me anymore I
have little ability to affect it directly.

--
Larry Garfield AIM: LOLG42
larry@garfieldtech.com ICQ: 6817012

"If nature has made any one thing less susceptible than all others of
exclusive property, it is the action of the thinking power called an idea,
which an individual may exclusively possess as long as he keeps it to
himself; but the moment it is divulged, it forces itself into the possession
of every one, and the receiver cannot dispossess himself of it." -- Thomas
Jefferson

18 years ago by Lukas Kahwe Smith — view source — reply

Zeev Suraski wrote:

Finally, at the risk of sounding like a broken record, we always need to
remember that BC breakage accumulates, and it's not binary. Every
cleanup we do in PHP 6 will further slow migration, and as Andi pointed
out a few days ago, things don't look too well as it is.

Agreed, it's not binary, but it's also not the simple addition of all
issues either. The effort does diminish as you can cover multiple BC
breaks in one pass over your code. The key thing that we screwed up
with PHP 5.x was not providing enough documentation on the BC breaks.
Doing this better this time (the migration guides are a good start,
porting some major apps and documenting the issues is another) could
help us ease the transition as well. But as you point out, there is the
fixed overhead of having to do the QA'ing at any rate.

regards,
Lukas

18 years ago by Pierre — view source — reply

Zeev Suraski wrote:

Finally, at the risk of sounding like a broken record, we always need to
remember that BC breakage accumulates, and it's not binary. Every
cleanup we do in PHP 6 will further slow migration, and as Andi pointed
out a few days ago, things don't look too well as it is.

Agreed, it's not binary, but it's also not the simple addition of all
issues either. The effort does diminish as you can cover multiple BC
breaks in one pass over your code. The key thing that we screwed up
with PHP 5.x was not providing enough documentation on the BC breaks.
Doing this better this time (the migration guides are a good start,
porting some major apps and documenting the issues is another) could
help us ease the transition as well. But as you point out, there is the
fixed overhead of having to do the QA'ing at any rate.

What we really screwed up are the breakages after 5.0, between 5.0
and now. Everyone expects changes and breakages between two
major versions, no matter the language.

--Pierre

18 years ago by Lukas Kahwe Smith — view source — reply

Pierre wrote:

Zeev Suraski wrote:

Finally, at the risk of sounding like a broken record, we always
need to
remember that BC breakage accumulates, and it's not binary. Every
cleanup we do in PHP 6 will further slow migration, and as Andi pointed
out a few days ago, things don't look too well as it is.

Agreed, it's not binary, but it's also not the simple addition of all
issues either. The effort does diminish as you can cover multiple BC
breaks in one pass over your code. The key thing that we screwed up
with PHP 5.x was not providing enough documentation on the BC breaks.
Doing this better this time (the migration guides are a good start,
porting some major apps and documenting the issues is another) could
help us ease the transition as well. But as you point out, there is the
fixed overhead of having to do the QA'ing at any rate.

What we really screwed up are the breakages after 5.0, between 5.0
and now. Everyone expects changes and breakages between two
major versions, no matter the language.

True that ... the way E_STRICT was handled did not help either. Still
looking forward to E_DEPRECATED.

regards,
Lukas

18 years ago by Zeev Suraski — view source — reply

At 04:47 18/07/2007, Lukas Kahwe Smith wrote:

Zeev Suraski wrote:

Finally, at the risk of sounding like a broken record, we always
need to remember that BC breakage accumulates, and it's not
binary. Every cleanup we do in PHP 6 will further slow migration,
and as Andi pointed out a few days ago, things don't look too well as it is.

Agreed, it's not binary, but it's also not the simple addition of all
issues either. The effort does diminish as you can cover multiple BC
breaks in one pass over your code. The key thing that we screwed up
with PHP 5.x was not providing enough documentation on the BC
breaks. Doing this better this time (the migration guides are a good
start, porting some major apps and documenting the issues is
another) could help us ease the transition as well. But as you point
out, there is the fixed overhead of having to do the QA'ing at any rate.

Well I don't think it really diminishes, but I agree that 1+1 is
maybe 1.9 and not 2. On the other hand, if you remember that
perception is everything (or at least very important), 1+1 can easily
be perceived as 3, and in a negative sense.

Zeev

18 years ago by Pierre — view source — reply

Well I don't think it really diminishes, but I agree that 1+1 is
maybe 1.9 and not 2. On the other hand, if you remember that
perception is everything (or at least very important), 1+1 can easily
be perceived as 3, and in a negative sense.

Exactly. And many people lost much more time hunting down "smaller"
things like the "Indirect modification of overloaded property.." or
the numerous other annoying (but sometimes required) changes. And
those mean 1+1+1+1=2^32/F* php for most of them.

A dropped extension, function or feature, when known (and done) soon
enough, is by far easier (planning is possible, migration, etc.).

--Pierre

18 years ago by Richard Lynch — view source — reply

I also was thinking the other day, like Ze'ev, that PHP Devs aren't
really in touch with the unwashed masses of the userbase...

There are a zillion websites "out there" that run on shared hosts with
copy/pasted code and all these scripters will get burned big-time if
ereg is suddenly unavailable.

They don't really care about PCRE versus POSIX, so long as they can
get the job done.

I suspect all the shared webhosts will just install ereg once they
figure out that their users who never re-factor need it, but they'll
be pretty cranky with you for nuking it and making them jump through
an extra hoop to bring it back.

And all the distro package-maintainers will probably just bundle it
right into their packages.

And there will be tutorials on how to compile PHP with ereg in it, or
how to add it back into windows, or how to install PECL ereg.

So just yanking ereg will cause a fair amount of grief, followed by
the dubious benefit of thousands of users figuring out how to install
a PECL module.

Any gurus really offended by ereg can --disable-ereg or whatever it
is, no?

At least just spit out an E_DEPRECATED in PHP 6, and move it to PECL
in PHP 7.

Give people enough warning that it's going away before nuking it, so
that you can at least say "You've been warned for a whole major
release that it was going away."

I suspect you'll still end up with people just installing it rather
than re-writing their code, though, so it's not serving any real
purpose to any real users to move it.

The people who need to use PCRE exclusively can do that already.

The people who need their legacy code to work will just have to jump
through an extra hoop.

What purpose is served, then, in moving ereg out? None, really.

PS
I'm working on the PostgreSQL POSIX->PCRE patch, as I don't think PHP
itself should need ereg.

--
Some people have a "gift" link here.
Know what I want?
I want you to buy a CD from some indie artist.
http://cdbaby.com/browse/from/lynch
Yeah, I get a buck. So?

18 years ago by Lukas Kahwe Smith — view source — reply

Richard Lynch wrote:

Any gurus really offended by ereg can --disable-ereg or whatever it
is, no?

So in a dream world, Rasmus would have shipped all the features of PHP
42 as his first release.

In a slightly less dreamy, but still unrealistic world, we would have
infinite development resources to maintain all the BC hacks in the world.

In reality, we have limited resources, so it's not about being
"offended", it's about yet another redundant extension that needs
to be supported. This is the point with a lot of this: how do we set the
priorities in managing the scarce resources? For the most part, this is
pretty automatic: whatever people do is what we prioritize, the other
stuff is left for someone else to pick up if they care. Obviously it's
not quite that extreme, since there are several people that are willing
to do stuff they do not need (or they have a company sponsoring them),
just to move PHP forward.

regards,
Lukas

unicode=off doesn't mean no unicode support, btw.

I think on Windows you can do something with the registry per-dir too.
On unix there's no registry though. Maybe we need some generic solution
to this (like for FastCGI users)? Anybody has good ideas?

regards,
Derick

It's different things. Casting means "create string as binary, then in
runtime cast it to unicode", u"" means "this string is unicode".
