All,
I’ve been working with François and several other people from internals@
and the PHP community to create a single-mode Scalar Type Hints proposal.
I think the RFC is a bit premature and could benefit from a bit more
time, but given the time pressure, as well as the fact that a not fully
compatible subset of that RFC was published and has people already
discussing it, it made the most sense to publish it sooner rather than
later.
The RFC is available here:
Comments welcome!
Zeev
hi,
This does not provide what I consider the best compromise: an
optional per-file/package strict mode, plus a mode that is fully
compatible with existing behavior.
However here are some comments:
Integer STH (int):
“42.0” should not be accepted. This is a float not an integer. It
introduces edge cases I would rather avoid (precision setting f.e.,
imagine "42.000001" or "42.0001"?)
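To make the edge case concrete, here is a minimal sketch of how PHP's existing explicit casts treat such strings. It shows current cast behavior only, not the coercion rules the RFC proposes, and the variable names are purely illustrative.

```php
<?php
// How explicit casts treat float-looking numeric strings.
// This is existing cast behavior, not the RFC's proposed coercion rules.
$inputs = ['42.0', '42.0001', '42.000001'];

foreach ($inputs as $s) {
    $asInt   = (int) $s;    // the fractional part is silently dropped
    $asFloat = (float) $s;  // the fractional part is kept
    // The string is "integer-like" only if truncation loses nothing.
    $lossless = ((float) $asInt === $asFloat);
    printf("%-10s int=%d lossless=%s\n", $s, $asInt, var_export($lossless, true));
}
```

Only "42.0" survives the round trip unchanged; the other two lose their fractional part when cast to int.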
Boolean STH (bool):
This is by far too weak. How could strings be considered valid?
"true" -> Boolean true? I suppose then "false" will be boolean false?
What is the boolean value of float 0.5?
At the very least only integers should be accepted: 0 -> false, anything >= 1 -> true.
Changes to Internal Functions:
I am generally speaking against changing them by default; this is
too big a BC break.
This RFC is also not complete. A test should be provided to validate the
changes against existing applications. I suspect the impact may not be
as small as we think. I can be wrong here but tests will tell me how
wrong I could be :)
And finally, this RFC only proposes one solution, so competitive RFCs
are still required to actually represent alternatives.
Cheers,
Pierre
Pierre,
And finally, this RFC only proposes one solution, so competitive RFCs
are still required to actually represent alternatives.
That is a good thing. It should only propose one solution. Making a
single RFC proposing two solutions would be a MASSIVE mistake IMHO as
these proposals are complex and charged enough without trying to make
a voter read a single piece of text that goes back and forth between
two options.
We need to be simplifying, not making more difficult. Only about 25%
of my RFC is dedicated to the proposal. The other ~75% is dedicated to
summarizing the conversation and acting as a FAQ. That could well live
off-RFC if we used discussion pages or something of the like (which we
likely should).
Anthony
I am not saying this is good or bad.
I only fear the php7-Zend-like RFCs being repeated with this one.
Let's prevent that from happening. We need consensus and well-thought-out
compromises. I do not see any of that in this one.
--
Pierre
@pierrejoye | http://www.libgd.org
Zeev,
First off, thanks for putting forward a proposal. I look forward to a
patch that can be experimented with.
There are a few concerns that I have about the proposal however:
Proponents of Strict STH cite numerous advantages, primarily around code safety/security. In their view, the conversion rules proposed by Dynamic STH can easily allow ‘garbage’ input to be silently converted into arguments that the callee will accept – but that may, in many cases, hide difficult-to-find bugs or otherwise result in unexpected behavior.
I think that's partially mis-stating the concern. It's less about
"garbage input" and more about unpredictable behavior. You can't look
at code and know that it will not produce an error with dynamic
typing. That's one of the big advantages of strict typing that many
people want. In reality the reasons are complex, varied and important
to each person.
Proponents of Dynamic STH bring up consistency with the rest of the language, including some fundamental type-juggling aspects that have been key tenets of PHP since its inception. Strict STH, in their view, is inconsistent with these tenets.
Dynamic STH is apparently consistent with the rest of the language's
treatment of scalar types. It's inconsistent with the rest of the
language's treatment of parameters.
However there's an important point to make here: a lot of best
practice has been pushing against the way PHP treats scalar types in
certain cases. Specifically around == vs === and using strict
comparison mode in in_array, etc.
So while it appears consistent with the rest of PHP, it only does so
if you ignore a large part of both the language and the way it's
commonly used.
In reality, the only thing PHP's type system is consistent at is being
inconsistent.
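A small sketch of the loose-versus-strict distinction mentioned above. It sticks to comparisons involving numeric strings, whose results should be the same across PHP versions; the sample values are made up for illustration.

```php
<?php
// Two different strings that are both numeric: loose comparison
// compares them as numbers, strict comparison compares them as strings.
var_dump('10' == '1e1');   // bool(true)
var_dump('10' === '1e1');  // bool(false)

// in_array() is loose by default; the third argument turns on strict mode.
$ids = ['10', '20'];
var_dump(in_array('1e1', $ids));        // bool(true)
var_dump(in_array('1e1', $ids, true));  // bool(false)
var_dump(in_array(10, $ids));           // bool(true): int 10 == '10'
var_dump(in_array(10, $ids, true));     // bool(false): an int is not a string
```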
In the "Changes To Internal Functions" section, I think all three
types are significantly flawed:
- "Just Do It" - This is problematic because a very large chunk of
code that worked in 5.x will all of a sudden not work in 7.0. This
will likely create a python 2/3 issue, as it would require a LOT of
code to be changed to make it compatible.
- "Emit E_DEPRECATED" - This is problematic because raising errors
(even if suppressed) is not cheap. And the potential for raising one
for a non-trivial percentage of every native function call has the
potential to have a MASSIVE performance impact for code designed for
5.x. Without a patch to test, it can't really be codified, but it
would be a shame to lose the performance gains made with 7 because
we're triggering 100's, 1000's or 10000's of errors in a single
application run...
- "Just Do It but give users an option to not" - This has the
problems that E_DEPRECATED has, but it also gets us back to having
fundamental code behavior controlled by an INI setting, which for a
very long time this community has generally seen as a bad thing
(especially for portability and code re-use).
Moving along,
Further, the two sets can cause the same functions to behave differently depending on where they're being called
I think that's misleading. The functions will always behave the same.
The difference is how you get data into the function. The behavior
difference is in your code, not the end function.
For example, a “32” (string) value coming back from an integer column in a database table, would not be accepted as valid input for a function expecting an integer.
There's an important point to consider here. You're relying on
information outside of the program to determine program correctness.
So to say "coming back from an integer column" requires concrete
knowledge and information that you can't possibly have in the program.
What happens when some DBA changes the column type to a string type?
The data will still work for a while, but then suddenly break without
warning when a non-integer value comes in. Because the
value-information comes from outside.
With strict mode, you'd have to embed a cast (smart or explicit) to
convert to an integer at the point the data comes in. So semantic
information about the value is placed right at the point of entry
(forcing the code to be more explicit and clear).
Additionally, with the dual-mode proposal DB interactions can be in
weak mode and have the exact behavior you're describing here. Giving
the user the choice, rather than making assumptions.
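The "cast at the point of entry" idea can be sketched roughly as follows. This is an illustration only, written with the scalar type declaration syntax that later shipped in PHP 7; `intFromColumn()` and `nextPage()` are hypothetical names, and the array stands in for a database row.

```php
<?php
// Hypothetical helper: validate a raw column value once, at the point
// where it enters the program, and fail loudly if it is not an integer.
function intFromColumn(string $raw): int
{
    $value = filter_var($raw, FILTER_VALIDATE_INT);
    if ($value === false) {
        throw new InvalidArgumentException("Not an integer: {$raw}");
    }
    return $value;
}

// Hypothetical consumer that is declared to work on integers only.
function nextPage(int $page): int
{
    return $page + 1;
}

$row = ['page' => '32']; // stands in for a row fetched from the database
echo nextPage(intFromColumn($row['page'])), "\n"; // prints 33
```

If the column later starts returning non-integer strings, the failure surfaces in `intFromColumn()`, where the data enters, rather than somewhere deeper in the call chain.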
Strict zval.type based STH effectively eliminates this behavior, moving the burden of worrying about type conversion to the user.
Correct. And you say that as if it's a bad thing. Being explicit about
type conversions isn't what you'd do in a 10 line-of-code script where
you can realize what the types are by just thinking about it. But on
large scale systems exposing the type conversions to the user gives
the power to actually understand the codebase when you can't fit the
whole thing in your head at the same time.
So what you cite here as a disadvantage many consider to be an advantage.
Performance
I find it funny how the non-strict crowd keeps bringing up performance...
It is our position that there is no difference at all between strict and coercive typing in terms of potential future AOT/JIT development - none at all
So really what you're saying is that you disagree with me publicly. A
statement which I said on the side, and I said should not impact RFC
or voting in any way. And is in no part in my RFC at all. Yet brought
up again.
Static Analysis. It is the position of several Strict STH proponents that Strict STH can help static analysis in certain cases. For the same reasons mentioned above about JIT, we don't believe that is the case
This is patently false. Keep not believing it all you want, but
static analysis requires statically looking at code. Which means you
have no value information. So static analysis can't possibly happen in
cases where you need to know about value information (because it's not
there). Yes, at function entry you know the types. But static analysis
isn't about analyzing a single function (in fact, that's the least
interesting case). It's more about analyzing a series of functions, a
function call graph. And in that case strict typing (based only on
type) does make a big difference.
In short, I think the concerns around the handling of internal
functions are significant enough to cause major concern about this
proposal.
Thanks
Anthony
-----Original Message-----
From: Anthony Ferrara [mailto:ircmaxell@gmail.com]
Sent: Saturday, February 21, 2015 8:12 PM
To: Zeev Suraski
Cc: PHP internals
Subject: Re: [PHP-DEV] Coercive Scalar Type Hints RFC
I think that's partially mis-stating the concern.
I don't think it is, based
The sentence stresses garbage in too much to read as accurate. To
clarify, there is a) garbage in due to weak coercion and b) a function
being called with a string when the typehint says int. Both are
separate concerns around error detection. Stricter coercion can enable
only one of these two, for example. That's better than neither, of
course! The coercion rules were stricter than I expected based on
previous emails. Stressing one too much might suggest to a reader that
the second concern does not exist.
Other pedantic comment: "numerous" is probably too strong a word
there. The advantages may vary by person, but usually fit within basic
five-finger math. It would be more important to enumerate them rather
than selecting one as primary.
On the RFC rules themselves, a few comments:
- Happy to see leading/trailing spaces excluded.
- Rules don't make mention of leading zeroes, e.g. 0003
- "1E07" might be construed as overly generous assuming we are
excluding stringy integers like hex, oct and binary
- I'm assuming the stringy ints are rejected?
- Is ".32" coerced to float or only "0.32"? Merely for clarification.
- Boolean coercion from other types... Not entirely sure myself.
Completely off the cuff: <=0: false, >0:true, floats and strings need
not apply.
- In string to float, only capital E or also small e?
- I'll never stop calling them "stringy" ints.
Paddy
--
Pádraic Brady
http://blog.astrumfutura.com
http://www.survivethedeepend.com
-----Original Message-----
From: Pádraic Brady [mailto:padraic.brady@gmail.com]
Sent: Saturday, February 21, 2015 9:56 PM
To: Zeev Suraski
Cc: Anthony Ferrara; PHP internals
Subject: Re: [PHP-DEV] Coercive Scalar Type Hints RFC
The sentence stresses garbage in too much to read as accurate. To clarify,
there is a) garbage in due to weak coercion and b) a function being called
with a string when the typehint says int. Both are separate concerns around
error detection. Stricter coercion can enable only one of these two, for
example. That's better than neither, of course! The coercion rules were
stricter than I expected based on previous emails. Stressing one too much
might suggest to a reader that the second concern does not exist.
As I told Anthony, based on what I saw on the list, the former appeared to
be a much more widely-held concern than the latter.
Other pedantic comment: "numerous" is probably too strong a word there.
The advantages may vary by person, but usually fit within basic
five-finger math. It would be more important to enumerate them rather
than selecting one as primary.
Thanks, changed to 'several'. Double thanks, as I always thought 'Numerous'
was more or less equivalent to 'Several', and you made me look it up :)
On the RFC rules themselves, a few comments:
- Happy to see leading/trailing spaces excluded.
Happy to see you happy!
- Rules don't make mention of leading zeroes, e.g. 0003
- "1E07" might be construed as overly generous assuming we are
excluding stringy integers like hex, oct and binary
- I'm assuming the stringy ints are rejected?
It's up for discussion. I personally don't have very strong feelings on how
we deal with these cases, as the main thing I care about is for the common
cases to work. That said, if it was entirely up to me, I'd accept "0003"
but not "1E07" - as the latter is considered floating point:
$x = "1e7";
print gettype($x+0); // would print double
But if rejecting leading zeros is what's needed to get a lot more people to
support it, I can live with that.
- Is ".32" coerced to float or only "0.32"? Merely for clarification.
With the same disclaimer as the one for the previous answer, I'd go with
accepting ".32" in the same way is_numeric() accepts it.
- In string to float, only capital E or also small e?
I think both, same is_numeric() rationale - same disclaimer.
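For reference, a quick sketch of what is_numeric() reports for the strings raised in these questions. It shows existing is_numeric() behavior only, not the RFC's final rules; the list of candidates, including the extra "12abc" counter-example, is my own.

```php
<?php
// Strings raised in the questions above, checked with is_numeric().
$candidates = ['.32', '0.32', '1e7', '1E7', '0003', '12abc'];

foreach ($candidates as $s) {
    printf("%-6s %s\n", $s, is_numeric($s) ? 'numeric' : 'not numeric');
}

// Exponent notation yields a float, a zero-padded string an integer.
var_dump(gettype('1e7' + 0));   // string(6) "double"
var_dump(gettype('0003' + 0));  // string(7) "integer"
```

Everything except "12abc" is reported as numeric, with both "e" and "E" accepted as the exponent marker.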
- I'll never stop calling them "stringy" ints.
I can live with that :)
Thanks for the feedback!
Zeev
- Happy to see leading/trailing spaces excluded.
Fixed length fields may well be a data source so having to strip them
before using them just seems a backward step. The basic C library simply
strips the white space so are we looking at using an alternative?
- Rules don't make mention of leading zeroes, e.g. 0003
Again data may well be zero padded. This is more likely to be historic
material, but it's yet another extra processing step. If we have to
process data before then asking if it's a valid number what is the
advantage of this? However of course the C library switches to octal mode
and needs pre-processing of leading zeros anyway.
- "1E07" might be construed as overly generous assuming we are
excluding stringy integers like hex, oct and binary
Yet again ... If we have to add a lot of extra checks then why can't we
simply ask if the data is a usable integer. At the end of the day it
does depend on where the data was sourced? Binary data is only usable if
we know that it is binary, or we will be converting some other format
anyway?
- I'm assuming the stringy ints are rejected?
Source material may be 'stringy ints', so all that does is say "we can't
use the original variable it has to be converted" rather than we can and
use its 'non-stringy' view.
- Is ".32" coerced to float or only "0.32"? Merely for clarification.
Omitting the leading zero is normal when doing hand keyed data entry.
.32 is a valid data entry ... Omitting perhaps 20% of characters speeds
data entry.
- Boolean coercion from other types... Not entirely sure myself.
Completely off the cuff: <=0: false, >0:true, floats and strings need
not apply.
That would be a major BC break!
- In string to float, only capital E or also small e?
lower case e is common ...
- I'll never stop calling them "stringy" ints.
For some funny reason we tend to prefer to view numbers on the screen as
strings ... having the original string with a value which can be used
for calculation is simply how I thought PHP worked. A variable is more
than just a 'value', we need its name which is a string, and a viewable
string could be useful, along with other flags. I had assumed up until
now that PHP was using that model but it seems not :(
--
Lester Caine - G8HFL
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
From: Lester Caine [mailto:lester@lsces.co.uk]
Fixed length fields may well be a data source so having to strip them
before using them just seems a backward step. The basic C library simply
strips the white space so are we looking at using an alternative?
I think you misread it: we'll ignore leading and trailing blanks, as well as leading 0s. We call it a restriction because, where we previously accepted any trailing characters, we now accept only blanks. So fixed-length fields will work as before.
- Rules don't make mention of leading zeroes, e.g. 0003
Again data may well be zero padded. This is more likely to be historic
material, but it's yet another extra processing step. If we have to
process data before then asking if it's a valid number what is the
advantage of this? However of course the C library switches to octal mode
and needs pre-processing of leading zeros anyway.
Leading 0s are skipped. Same as PHP 5.
- "1E07" might be construed as overly generous assuming we are
excluding stringy integers like hex, oct and binary
Yet again ... If we have to add a lot of extra checks then why can't we
simply ask if the data is a usable integer. At the end of the day it
does depend on where the data was sourced? Binary data is only usable if
we know that it is binary, or we will be converting some other format
anyway?
I asked you for rules to recognize octal. I am all in favor of recognizing other numeric strings, for instance hex with a leading 0x (detecting a trailing h is more ambiguous and slower). It was removed some weeks ago for consistency reasons, but it can be reintroduced in a consistent way in the future if needed.
- I'm assuming the stringy ints are rejected?
Source material may be 'stringy ints', so all that does is say "we can't
use the original variable it has to be converted" rather than we can and
use its 'non-stringy' view.
Can you explain? I don't understand.
- Is ".32" coerced to float or only "0.32"? Merely for clarification.
Omitting the leading zero is normal when doing hand keyed data entry.
.32 is a valid data entry ... Omitting perhaps 20% of characters speeds
data entry.
No difference with PHP 5.
- Boolean coercion from other types... Not entirely sure myself.
Completely off the cuff: <=0: false, >0:true, floats and strings need
not apply.
That would be a major BC break!
I am personally more in favor of reintroducing it. Test patch will show BC breaks and confirm.
- In string to float, only capital E or also small e?
lower case e is common ...
Don't know. Check on php5. No change.
- I'll never stop calling them "stringy" ints.
For some funny reason we tend to prefer to view numbers on the screen as
strings ... having the original string with a value which can be used
for calculation is simply how I thought PHP worked. A variable is more
than just a 'value', we need its name which is a string, and a viewable
string could be useful, along with other flags. I had assumed up until
now that PHP was using that model but it seems not :(
It is mostly using that model. Unfortunately, history demands that $a[32] and $a["32"] behave in different ways. There are more cases but this one is the most common one.
Regards
François
- "1E07" might be construed as overly generous assuming we are
excluding stringy integers like hex, oct and binary
Yet again ... If we have to add a lot of extra checks then why can't we
simply ask if the data is a usable integer. At the end of the day it
does depend on where the data was sourced? Binary data is only usable if
we know that it is binary, or we will be converting some other format
anyway?
I asked you for rules to recognize octal. I am all in favor of recognizing other numeric strings, for instance hex with a leading 0x (detecting a trailing h is more ambiguous and slower). It was removed some weeks ago for consistency reasons, but it can be reintroduced in a consistent way in the future if needed.
I've given the 'c' ones, but others have already blocked the use of
them. Octal escape is allowed, but a simple '0' or '0o' prefix has the same
problem as 0h or 0x? Those of us still using 'c/c++' get used to the
old standards.
- I'm assuming the stringy ints are rejected?
Source material may be 'stringy ints', so all that does is say "we can't
use the original variable it has to be converted" rather than we can and
use its 'non-stringy' view.
Can you explain? I don't understand.
I'm working with a 'variable' which may be an integer but that is only
enforced by reference to how I am looking at it. I understand that some
people want to ONLY allow it to be an integer and that may be the
constraint added to it, but you may also need the string view, or the
origin may be a string already, so one maintains both a binary view and
a 'string' view of the same variable. So I don't subscribe to 'stringy
ints' because all variables have a stringy element naturally.
If I had any need for 'strict' constraints on a variable it has to cover
both the size and type of the data, simply saying int is of little use
if the data is too big anyway.
--
Lester Caine - G8HFL
From: Lester Caine [mailto:lester@lsces.co.uk]
Fixed length fields may well be a data source so having to strip them
before using them just seems a backward step. The basic C library simply
strips the white space so are we looking at using an alternative?
I think you misread it: we'll ignore leading and trailing blanks, as well as leading 0s. We call it a restriction because, where we previously accepted any trailing characters, we now accept only blanks. So fixed-length fields will work as before.
Actually the Coercive STH RFC currently takes the position that leading or trailing spaces will not be accepted.
In general, the idea is to support the conversions which are likely to be right and safe in the vast majority of cases, and if there's any doubt - reject it. It's of course up for discussion, but it's that concept that I believe can bridge the gap between proponents of strict and proponents of dynamic typing.
Thanks,
Zeev
- Happy to see leading/trailing spaces excluded.
Fixed length fields may well be a data source so having to strip them
before using them just seems a backward step. The basic C library simply
strips the white space so are we looking at using an alternative?
They also may not be a data source. Big Universe. Many possibilities.
Bear in mind that one of the assumptions of the strict camp is
programmers will make errors. Taking a known field from a database,
and throwing it at a known parameter, and doing so by design? Not an
error. C may do it a certain way, but I can't really take that as
anything more than C having its own preference, which doesn't tell us
in absolute terms whether it's correct or not. YMMV.
- Rules don't make mention of leading zeroes, e.g. 0003
Again data may well be zero padded. This is more likely to be historic
material, but it's yet another extra processing step. If we have to
process data before then asking if it's a valid number what is the
advantage of this? However, of course the C library switches to octal mode
and needs preprocessing of leading zeros anyway.
Mentioned to clarify. The RFC has a table of various coercion rules. I
like my tables complete.
- "1E07" might be construed as overly generous assuming we are
excluding stringy integers like hex, oct and binary
Yet again ... If we have to add a lot of extra checks then why can't we
simply ask if the data is a usable integer. At the end of the day it
does depend on where the data was sourced? Binary data is only usable if
we know that it is binary, or we will be converting some other format
anyway?
Or perhaps have a typehint for a string which definitely must be
interpreted as an integer/float. The stringyint!
Fine: numeric? I'm not sure if that was ever a well supported thing,
but it would be nice to clarify if it's a runner and, if so, not to
jump the gun by making int/float overly permissive.
You may yet have your cake. Zeev, any chance? No chance?
- I'm assuming the stringy ints are rejected?
Source material may be 'stringy ints', so all that does is say "we can't
use the original variable it has to be converted" rather than we can and
use its 'non-stringy' view.
See above
- Is ".32" coerced to float or only "0.32"? Merely for clarification.
Omitting the leading zero is normal when doing hand keyed data entry.
.32 is a valid data entry ... Omitting perhaps 20% of characters speeds
data entry.
Again, merely a clarification for my accurate tables fetish.
- Boolean coercion from other types... Not entirely sure myself.
Completely off the cuff: <=0: false, >0:true, floats and strings need
not apply.
That would be a major BC break!
We should definitely put that in a major PHP version then. Seriously,
that was off the cuff and intentionally conservative. It would need
way more discussion.
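For concreteness, the off-the-cuff rule above (integers only, <=0 false, >0 true) could be sketched as a userland helper. The function name coerceToBoolConservative is hypothetical, and this is neither the RFC's rule nor what PHP does today:

```php
<?php
// Hypothetical helper illustrating the "off the cuff" rule from this thread:
// only integers (and real booleans) are accepted; floats and strings are rejected.
function coerceToBoolConservative($value): bool
{
    if (is_bool($value)) {
        return $value;
    }
    if (is_int($value)) {
        return $value > 0; // <= 0 maps to false, > 0 maps to true
    }
    throw new InvalidArgumentException('Only int or bool accepted, got ' . gettype($value));
}
```

Note that -1 maps to false under this rule, whereas (bool) -1 is true in PHP today, which is the kind of BC break objected to above.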
- In string to float, only capital E or also small e?
lower case e is common ...
Yes, and not mentioned in the RFC table... So it bugged me.
- I'll never stop calling them "stringy" ints.
For some funny reason we tend to prefer to view numbers on the screen as
strings
My term is glorious in its perfection. Disagreement will not be permitted!
having the original string with a value which can be used
for calculation is simply how I thought PHP worked. A variable is more
than just a 'value', we need its name which is a string, and a viewable
string could be useful, along with other flags. I had assumed up until
now that PHP was using that model but it seems not :(
A string is always a string. Until someone decides it represents an
integer. At the heart of our current RFC faceoff, is that some people
expect that throwing a string at an int expecting parameter should
transform the string to an int automatically. Other people reason that
since we cannot inspect the string personally, the automatic
assumption might be erroneous. Perhaps the string really was just a
string. It was never intended to be an integer.
Yes, if you throw a string at an int deliberately, then your vision is
correct. The assumption others make is that programmers are not
perfect, and may do the same thing entirely in error. In those cases,
a silent permissive coercion equates to an error going undetected.
Thereafter we can have some fun debating whether the receiving
parameter should be typed to the expected input (string) or the
expected final input (integer). On the balance of not annoying
upstream users, many would find it attractive to do one thing on the
public API and the other internally where control of parameter flows
is more absolute. Back on the topic of your cake, it may be possible
to resolve that dilemma with a different type specific to that
purpose.
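Until such a type exists, the string cases raised in this message can be pinned down in a small userland sketch. The two function names and the exact patterns are assumptions chosen for illustration (a deliberately conservative reading), not the rules in the RFC's table:

```php
<?php
// Hypothetical "stringy int" check: optional minus sign, no leading zeros,
// no surrounding whitespace, no exponent, no hex/octal/binary prefixes.
function isStringyInt(string $s): bool
{
    return preg_match('/^-?(0|[1-9][0-9]*)$/D', $s) === 1;
}

// Hypothetical "stringy float" check: accepts ".32" as well as "0.32",
// and both "e" and "E" exponents; still rejects surrounding whitespace.
function isStringyFloat(string $s): bool
{
    return preg_match('/^-?([0-9]+(\.[0-9]*)?|\.[0-9]+)([eE][+-]?[0-9]+)?$/D', $s) === 1;
}
```

Under these assumed patterns "0003" and "1E07" are not stringy ints, while ".32", "1e07" and "1E07" are all stringy floats. Loosening either rule is a one-line change to the pattern.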
Paddy
--
Pádraic Brady
http://blog.astrumfutura.com
http://www.survivethedeepend.com
Sorry for the previous prematurely sent email, looks like I found a new
keyboard shortcut :)
-----Original Message-----
From: Anthony Ferrara [mailto:ircmaxell@gmail.com]
Sent: Saturday, February 21, 2015 8:12 PM
To: Zeev Suraski
Cc: PHP internals
Subject: Re: [PHP-DEV] Coercive Scalar Type Hints RFC
Zeev,
First off, thanks for putting forward a proposal. I look forward to a patch
that can be experimented with.
There are a few concerns that I have about the proposal however:
Proponents of Strict STH cite numerous advantages, primarily around code
safety/security. In their view, the conversion rules proposed by Dynamic STH
can easily allow ‘garbage’ input to be silently converted into arguments that
the callee will accept – but that may, in many cases, hide difficult-to-find
bugs or otherwise result in unexpected behavior.
I think that's partially mis-stating the concern.
I don't think it's mis-stating the key concern. At least not based on what
I've heard from most people here over the last few months.
It's less about "garbage input" and more about unpredictable behavior. You
can't look at code and know that it will not produce an error with dynamic
typing. That's one of the big advantages of strict typing that many people
want. In reality the reasons are complex, varied and important to each person.
Your ability to look at code and know whether or not it will produce errors
is very similar in both strict and coercive typing. But that goes back to
what we already decided to agree to disagree on - whether or not strict types
give you any tangible extra data when you look at code - aka Static
Analysis.
Note that Strict Typing would produce all the errors of coercive typing and
then some. So knowing whether code will produce errors is arguably more
difficult in strict typing, although I think that at the end of the day,
it's pretty much equivalent.
Again, I did see Static Analysis being brought up by just a handful of
people, perhaps not even that. For most people, it was the silent
acceptance of input that's likely to be invalid.
Proponents of Dynamic STH bring up consistency with the rest of the
language, including some fundamental type-juggling aspects that have been
key tenets of PHP since its inception. Strict STH, in their view, is
inconsistent with these tenets.
Dynamic STH is apparently consistent with the rest of the language's
treatment of scalar types. It's inconsistent with the rest of the language's
treatment of parameters.
Not in the way Andrea proposed it, IIRC. She opted to go for consistency
with internal functions. Either way, at the risk of being shot for talking
about spiritual things, Dynamic STH is consistent with the dynamic spirit of
PHP, even if there are some discrepancies between its rule-set and the
implicit typing rules that govern expressions. Note that in this RFC I'm
actually suggesting a possible way forward that will align all aspects of
PHP, including implicit casting - and have them all governed by a single set
of rules.
However there's an important point to make here: a lot of best practice has
been pushing against the way PHP treats scalar types in certain cases.
Specifically around == vs === and using strict comparison mode in in_array,
etc.
I think you're correct on comparisons, but not so much on the rest. Dynamic
use of scalars in expressions is still exceptionally common in PHP code.
Even with comparisons, == is still very common - and you'd use == vs. ===
depending on what you need.
So while it appears consistent with the rest of PHP, it only does so if you
ignore a large part of both the language and the way it's commonly used.
Let's agree to disagree. That's one thing we can always agree on! :)
In reality, the only thing PHP's type system is consistent at is being
inconsistent.
I'd have to partially agree with you here; But if you read the RFC through
including its future recommendations, you'd see it's perhaps the first
attempt in 20 years to fix that. Instead of doing that through the
introduction of a 3rd (albeit simplistic) rule-set that only pays attention
to zval.type, it proposes the creation of a single set of rules that will be
consistent across the whole language, beginning with userland and internal
functions.
In the "Changes To Internal Functions" section, I think all three types are
significantly flawed:
- "Just Do It" - This is problematic because a very large chunk of code that
worked in 5.x will all of a sudden not work in 7.0. This will likely create a
python 2/3 issue, as it would require a LOT of code to be changed to make it
compatible.
- "Emit E_DEPRECATED" - This is problematic because raising errors (even if
suppressed) is not cheap. And the potential for raising one for a non-trivial
percentage of every native function call has the potential to have a MASSIVE
performance impact for code designed for 5.x. Without a patch to test, it
can't really be codified, but it would be a shame to lose the performance
gains made with 7 because we're triggering 100's, 1000's or 10000's of errors
in a single application run...
- "Just Do It but give users an option to not" - This has the problems that
E_DEPRECATED has, but it also gets us back to having fundamental code
behavior controlled by an INI setting, which for a very long time this
community has generally seen as a bad thing (especially for portability and
code re-use).
I do too, and I was upfront about their cons, not just pros. And yet, they
all bring us to a much better outcome within a relatively short period of
time (in the lifetime of a language) than the Dual Mode will.
Further, the two sets can cause the same functions to behave
differently depending on where they're being called
I think that's misleading. The functions will always behave the same.
The difference is how you get data into the function. The behavior difference
is in your code, not the end function.
I'll be happy to get a suggestion from you on how to reword that.
Ultimately, from the layman user's point of view, she'd be calling foo()
from one place and have it accept her arguments, and foo() from another
place and have it reject the very same arguments.
For example, a “32” (string) value coming back from an integer column in a
database table, would not be accepted as valid input for a function expecting
an integer.
There's an important point to consider here. You're relying on information
outside of the program to determine program correctness.
So to say "coming back from an integer column" requires concrete
knowledge and information that you can't possibly have in the program.
What happens when some DBA changes the column type to a string type?
The data will still work for a while, but then suddenly break without warning
when a non-integer value comes in. Because the value-information comes
from outside.
Of course we're relying on information coming from outside, as we all know,
this is one of the most common use cases for PHP.
While theoretically you're right, in practice, in the vast majority of cases
it wouldn't play out like that. The string column won't be tested
exclusively with "123" inputs. As soon as there's a non-numeric-string
input, it'll fail. That's likely to happen very early in the process, and
that's before considering that if there's such a huge mismatch between the
semantic meaning of the column and what the function expects - the problem
is likely to be found even sooner, since the function will simply not
perform its intended job.
On the flip-side, imagine that same developer using strict types. Feeding
the function that integer in string form gets rejected. What are her
options? The developer is likely to just explicitly cast the value into an
int, giving up on any and all sanitization that coercive types would offer
her, happily accepting "Apples" and "100 Dalmatians" as valid inputs. That,
on the other hand, is a very likely scenario.
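The casting concern is easy to demonstrate with PHP's existing explicit cast, which never rejects a string and simply keeps any leading digits. A minimal sketch using the inputs named above:

```php
<?php
// An explicit (int) cast performs no validation: non-numeric strings become 0
// and trailing garbage is silently dropped.
$inputs = ['32', 'Apples', '100 Dalmatians'];

$results = array_map(function ($input) {
    return (int) $input;
}, $inputs);

var_dump($results); // three integers: 32, 0 and 100
```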
With strict mode, you'd have to embed a cast (smart or explicit) to convert to
an integer at the point the data comes in.
First, I'm not aware of smart/safe casts being available or proposed at this
point.
Secondly, why at the point the data comes in? That would be ideal for
static analyzers, but it's probably a lot more common that it will be done
at the first point in time where it gets rejected.
Additionally, with the dual-mode proposal DB interactions can be in weak
mode and have the exact behavior you're describing here. Giving the user the
choice, rather than making assumptions.
This is bound to be misquoted and used against me, but I don't think it's a
good idea to give the user the choice in such a way. I could have sworn
that you tweeted the quote about perfection being not when there's nothing
left to add, but nothing left to remove, but perhaps it was someone else.
Either way, two modes are worse than one, if we can come up with a good
single unified mode that addresses most cases.
Remember you can always implement custom type checking to your heart's
content. You can easily implement if (!is_int($foo)) { exit; } in the
not-so-common-cases where accepting "42" as 42 might be disastrous.
However, on the caller side, forcing people to clutter their code with
casts - many casts - either explicit casts or custom ones - is going to
affect a lot more developers in a lot more places. The bang for the buck of
adding strict mode is just not there, in my humble opinion of course.
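As a sketch of the callee-side check mentioned above, a function can insist on a real integer itself. The function name setRetryLimit is hypothetical, and an exception is used here in place of exit so that callers can handle the failure:

```php
<?php
// Hypothetical callee that opts into strictness on its own, regardless of
// how permissive the language's parameter coercion is.
function setRetryLimit($limit): int
{
    if (!is_int($limit)) {
        throw new InvalidArgumentException('Expected int, got ' . gettype($limit));
    }
    return $limit;
}
```

With this in place setRetryLimit(42) succeeds while setRetryLimit("42") throws, and no caller elsewhere needs to add casts.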
Strict zval.type based STH effectively eliminates this behavior, moving the
burden of worrying about type conversion to the user.
Correct. And you say that as if it's a bad thing. Being explicit about type
conversions isn't what you'd do in a 10 line-of-code script where you can
realize what the types are by just thinking about it. But on large scale
systems exposing the type conversions to the user gives the power to actually
understand the codebase when you can't fit the whole thing in your head at
the same time.
I have a hard time connecting to the 'power' approach. I think developers
want their code to work, with minimal effort, and be secure. Coercive
scalar type hints will do an excellent job at that. Strict type hints will
be more work, are bound to trigger a lot of "Oh come on" responses, and as a
special bonus - proliferate the use of explicit casts. Let me top that -
you'd have developers who think they're security conscious, because they're
using strict mode - with code that's full of explicit casts.
So what you cite here as a disadvantage many consider to be an advantage.
Perhaps, but I used the proper verb at the top ("We believe").
It is our position that there is no difference at all between strict
and coercive typing in terms of potential future AOT/JIT development -
none at all
So really what you're saying is that you disagree with me publicly. A
statement which I said on the side, and I said should not impact RFC or voting
in any way. And is in no part in my RFC at all. Yet brought up again.
We listed all what we believe to be misconceptions that were brought up on
internals. As recently as yesterday, you had a PHP power user (Larry) that
was under the strong impression Strict STH would yield substantial
performance benefits. Given that it was claimed in the past, and since we
can't assume every voter reads every last word that's written on internals@
threads, it was important to list that here even if it's not mentioned in
the Strict/Dual mode RFC.
It's also worth mentioning that there are people who assume that strict
type hints can somehow help performance, without being domain experts in
either the engine or JIT, even if they weren't exposed to the explicit
statements that suggested that on blogs and on internals@ - adding to the
importance of making it clear that there are no performance benefits to that
approach.
Static Analysis. It is the position of several Strict STH proponents
that Strict STH can help static analysis in certain cases. For the
same reasons mentioned above about JIT, we don't believe that is the
case
This is patently false.
It's actually patently true. We don't believe that is the case. QED.
While at it, can we stop using that 'patently false', and stick to
constructive wording such as 'I disagree'?
Also, I think that if you quoted the rest of the sentence you chose to trim,
it would appear a lot less confrontational:
"Static Analysis. It is the position of several Strict STH proponents that
Strict STH can help static analysis in certain cases. For the same reasons
mentioned above about JIT, we don't believe that is the case - although
it's possible that Strict Typing may be able to help static analysis in
certain edge cases."
That's still under 'we (don't) believe', so again, it's "patently true".
You can disagree, but that's our opinion.
I'll also add the most important part of that paragraph for the sake of
completeness:
"It is our belief that even if that is true, Static Analyzers need to be
designed for Languages, rather than Languages being designed for Static
Analyzers."
Keep not believing it all you want, but static analysis
requires statically looking at code. Which means you have no value
information. So static analysis can't possibly happen in cases where you need
to know about value information (because it's not there). Yes, at function
entry you know the types. But static analysis isn't about analyzing a single
function (in fact, that's the least interesting case). It's more about
analyzing a series of functions, a function call graph. And in that case
strict typing (based only on type) does make a big difference.
I think it's fair to say that while we were unable to convince you there's
no tangible extra value in Strict STH compared to any other kind of STH that
guarantees the type of value a function will get, you were also unable to
convince Dmitry, Stas or myself - all of whom independently discussed it
with you. Again, despite that, I'm not saying that you're "patently wrong",
just that I don't believe you're right.
Thanks for the feedback!
Zeev
Zeev,
I won't nit-pick every point, but there are a few I think need to be clarified.
Proponents of Dynamic STH bring up consistency with the rest of the
language, including some fundamental type-juggling aspects that have been
key tenets of PHP since its inception. Strict STH, in their view, is
inconsistent with these tenets.
Dynamic STH is apparently consistent with the rest of the language's
treatment of scalar types. It's inconsistent with the rest of the language's
treatment of parameters.
Not in the way Andrea proposed it, IIRC. She opted to go for consistency
with internal functions. Either way, at the risk of being shot for talking
about spiritual things, Dynamic STH is consistent with the dynamic spirit of
PHP, even if there are some discrepancies between its rule-set and the
implicit typing rules that govern expressions. Note that in this RFC I'm
actually suggesting a possible way forward that will align all aspects of
PHP, including implicit casting - and have them all governed by a single set
of rules.
The point I was making up to there is that we currently have 2 type
systems: user-land object and ZPP-scalar. So in any given function you
have 2 type systems interacting. The current ZPP scalar type is
dynamic, and user-land object static.
With the proposal here, you'd unify user-land scalar to behave as
zpp-scalar. So you'd have two type systems in any given function:
scalar and object (which behave differently).
My proposal gives you the same two by default (scalar and object) and
a strict switch to collapse them into a single, unified type system.
This is even more apparent with the int-float acceptance, because we
can mentally model Float as an object that extends Int. Then it makes
perfect sense why you'd accept ints where you see floats, but not the
opposite.
However there's an important point to make here: a lot of best practice has
been pushing against the way PHP treats scalar types in certain cases.
Specifically around == vs === and using strict comparison mode in in_array,
etc.
I think you're correct on comparisons, but not so much on the rest. Dynamic
use of scalars in expressions is still exceptionally common in PHP code.
Even with comparisons, == is still very common - and you'd use == vs. ===
depending on what you need.
So while it appears consistent with the rest of PHP, it only does so if you
ignore a large part of both the language and the way it's commonly used.
Let's agree to disagree. That's one thing we can always agree on! :)
I'm talking about the object system. I don't think you're disagreeing
that it's static. Hence coercive scalars are consistent only if you
look at 1/2 the type system. That was the point I was making there.
- "Just Do It but give users an option to not" - This has the problems that
E_DEPRECATED has, but it also gets us back to having fundamental code
behavior controlled by an INI setting, which for a very long time this
community has generally seen as a bad thing (especially for portability and
code re-use).
I do too, and I was upfront about their cons, not just pros. And yet, they
all bring us to a much better outcome within a relatively short period of
time (in the lifetime of a language) than the Dual Mode will.
Let's agree to disagree that an ini setting will be better than a
per-file setting.
In fact, I personally think this is major enough of an issue that I
will vote no simply on this reason alone (type behavior depending on
an ini setting in any way shape or form).
Further, the two sets can cause the same functions to behave
differently depending on where they're being called
I think that's misleading. The functions will always behave the same.
The difference is how you get data into the function. The behavior difference
is in your code, not the end function.
I'll be happy to get a suggestion from you on how to reword that.
Ultimately, from the layman user's point of view, she'd be calling foo()
from one place and have it accept her arguments, and foo() from another
place and have it reject the very same arguments.
Let me think on it and I will come up with something.
With strict mode, you'd have to embed a cast (smart or explicit) to convert to
an integer at the point the data comes in.
First, I'm not aware of smart/safe casts being available or proposed at this
point.
Secondly, why at the point the data comes in? That would be ideal for
static analyzers, but it's probably a lot more common that it will be done
at the first point in time where it gets rejected.
By "smart cast" I was referring to a function which checked is_numeric().
Not a new language construct.
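A minimal sketch of such a "smart cast" helper, built on is_numeric(). The name smartIntCast and the choice to reject fractional values are assumptions made for illustration, not something specified in this thread:

```php
<?php
// Hypothetical "smart cast": accept ints and numeric strings with an
// integral value; reject everything else instead of silently truncating.
// Very large values lose precision through the float step in this sketch.
function smartIntCast($value): int
{
    if (is_int($value)) {
        return $value;
    }
    if (is_string($value) && is_numeric($value)) {
        $asFloat = (float) $value;
        $isIntegral = is_finite($asFloat) && $asFloat == floor($asFloat);
        if ($isIntegral && abs($asFloat) < PHP_INT_MAX) {
            return (int) $asFloat;
        }
    }
    throw new InvalidArgumentException('Not an integer value: ' . var_export($value, true));
}
```

Unlike a bare (int) cast, this rejects "Apples", "100 Dalmatians" and "1.5" rather than quietly producing a number.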
I have a hard time connecting to the 'power' approach. I think developers
want their code to work, with minimal effort, and be secure. Coercive
scalar type hints will do an excellent job at that. Strict type hints will
be more work, are bound to trigger a lot of "Oh come on" responses, and as a
special bonus - proliferate the use of explicit casts. Let me top that -
you'd have developers who think they're security conscious, because they're
using strict mode - with code that's full of explicit casts.
I agree we should have users avoid explicit casts. That's why the
dual-mode proposal exists. If users don't want to control their types,
they should use the default mode. And everything works fine.
If they know what they want, then the explicit cast becomes a
documenting piece of information that "this is supposed to happen".
Ex:
function takesInt(int $a) {}
function foo(float $b) {
    return takesInt($b);
}
In weak mode, that "just works". But is it supposed to just work? You
have no idea. The next developer who comes will look at it and ask "is
that supposed to truncate, or was that an oversight?" and have no
idea. But in strict mode, placing an explicit cast before $b shows the
next developer who comes there "the truncation was intentional".
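With the per-file declare(strict_types=1) switch that PHP 7 eventually shipped, the documented version of that example might look like the following sketch. The return types and the body of takesInt are added here for illustration only:

```php
<?php
declare(strict_types=1);

function takesInt(int $a): int
{
    return $a;
}

function foo(float $b): int
{
    // The explicit cast documents that truncation is intended.
    return takesInt((int) $b);
}
```

Without the cast, the call takesInt($b) raises a TypeError in strict mode, which is the prompt for the author to decide whether truncation was really meant.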
Static Analysis. It is the position of several Strict STH proponents
that Strict STH can help static analysis in certain cases. For the
same reasons mentioned above about JIT, we don't believe that is the
case
This is patently false.
It's actually patently true. We don't believe that is the case. QED.
To understand why "we don't believe" can be false, let's make an
analogy: I can say that I don't believe in gravity. That doesn't mean
that the opinion isn't patently false just because it was stated as an
opinion (or rather the "believe" is true, but the implication of the
belief is false)...
Keep not believing it all you want, but static analysis
requires statically looking at code. Which means you have no value
information. So static analysis can't possibly happen in cases where you need
to know about value information (because it's not there). Yes, at function
entry you know the types. But static analysis isn't about analyzing a single
function (in fact, that's the least interesting case). It's more about
analyzing a series of functions, a function call graph. And in that case
strict typing (based only on type) does make a big difference.
I think it's fair to say that while we were unable to convince you there's
no tangible extra value in Strict STH compared to any other kind of STH that
guarantees the type of value a function will get, you were also unable to
convince Dmitry, Stas or myself - all of whom independently discussed it
with you. Again, despite that, I'm not saying that you're "patently wrong",
just that I don't believe you're right.
I've built a static analyzer that's public. I've talked to people who
build them for a living. I don't claim to be an expert in them (far
from it), but what I've seen and learned is that what you're talking
about here either isn't possible (yet) or is difficult enough to be
impractical (in terms of computing resources necessary).
You can disagree with me all you want. You don't even need to convince
me. All you need to do is disprove me. Show me a static analyzer for a
sufficiently dynamic language (Scalar PHP or full JS - not ASM.js -
would work) and I'll happily apologize and retract the comment. But so
far all I've seen are people saying it's possible even in presence of
arguments to the contrary (why it's not possible).
Thanks,
Anthony
-----Original Message-----
From: Anthony Ferrara [mailto:ircmaxell@gmail.com]
Sent: Saturday, February 21, 2015 10:08 PM
To: Zeev Suraski
Cc: PHP internals
Subject: Re: [PHP-DEV] Coercive Scalar Type Hints RFC
Zeev,
I won't nit-pick every point, but there are a few I think need to be
clarified.
Proponents of Dynamic STH bring up consistency with the rest of the
language, including some fundamental type-juggling aspects that have
been key tenets of PHP since its inception. Strict STH, in their
view, is inconsistent with these tenets.
Dynamic STH is apparently consistent with the rest of the language's
treatment of scalar types. It's inconsistent with the rest of the
language's treatment of parameters.
Not in the way Andrea proposed it, IIRC. She opted to go for
consistency with internal functions. Either way, at the risk of being
shot for talking about spiritual things, Dynamic STH is consistent
with the dynamic spirit of PHP, even if there are some discrepancies
between its rule-set and the implicit typing rules that govern
expressions. Note that in this RFC I'm actually suggesting a possible
way forward that will align all aspects of PHP, including implicit
casting - and have them all governed by a single set of rules.
The point I was making up to there is that we currently have 2 type
systems: user-land object and ZPP-scalar. So in any given function you have 2
type systems interacting. The current ZPP scalar type is dynamic, and
user-land object static.
Objects and scalars are fundamentally different, definitely in PHP.
There's no standard way to convert an instance of a certain class to an
instance of another class, at least not in PHP. There is, however, a
standard way of converting different types of scalars to one another, and
this behavior is extremely ubiquitous in PHP. That is the reason that when
class type hints were introduced, it was at the condition that we'll not
have strict scalar type hints, because scalars were (and still are) so
fundamentally different than objects and in many cases can losslessly change
type in a well-defined manner. Perhaps if we had the coercive STH idea back
then we could have saved us all a lost decade without scalar type hints.
I'll say it again - scalars and objects (and for that matter any non-scalar
type including arrays and resources) - are inherently different, and
striving for consistency between them should not be a goal IMHO.
Let's agree to disagree that an ini setting will be better than a per-file
setting.
We can agree to disagree on that, but not on the fact that the Coercive STH
RFC provides a roadmap, at the end of which we'd have a single mode,
consistently behaving(*) and more secure-for-everyone language - and one
without that INI entry (it's a temporary measure that will be removed). With the
Dual Mode RFC, on the other hand, we're introducing two modes that we'll
have to support for all eternity, with two potential 'camps' of developers
forming up, and with no security benefits for anybody who doesn't choose to
flip on the strict switch.
(*) With the exception of Scalar/non-Scalar consistency, which again, I find
hard to accept as a valid goal.
Further, the two sets can cause the same functions to behave
differently depending on where they're being called
I think that's misleading. The functions will always behave the same.
The difference is how you get data into the function. The behavior
difference is in your code, not the end function.
I'll be happy to get a suggestion from you on how to reword that.
Ultimately, from the layman user's point of view, she'd be calling
foo() from one place and have it accept her arguments, and foo() from
another place and have it reject the very same arguments.
Let me think on it and I will come up with something.
Thanks.
I agree we should have users avoid explicit casts. That's why the dual-mode
proposal exists. If users don't want to control their types, they should use
the default mode. And everything works fine.
This ignores the reasonable guesstimate - raised by several people here -
that many will flip Strict mode on because they'd think it makes them
inherently safer, faster, or just a good thing to do. Everything I've
learned about PHP in the last couple of decades tells me there are going to
be plenty of those. Those are the same people who would quickly then add
explicit casts to make the code work again; Those are also people who would
benefit a lot more from a single mode system that would point out real
problems in their code, with likely very few false positives or false
negatives.
Static Analysis. It is the position of several Strict STH
proponents that Strict STH can help static analysis in certain
cases. For the same reasons mentioned above about JIT, we don't
believe that is the case
This is patently false.
It's actually patently true. We don't believe that is the case. QED.
To understand why "we don't believe" can be false, let's make an
analogy: I can say that I don't believe in gravity. That doesn't mean that the
opinion isn't patently false just because it was stated as an opinion (or
rather the "believe" is true, but the implication of the belief is false)...
Of course an opinion can be patently false, however, you were talking about
what I wrote, not the opinion.
Let me illustrate it. Watch youtu.be/kjvQ9eT_t-U - it's worth the one
minute of your time if you haven't already seen it. Now, apparently, that
guy absolutely believes in what he says. What he says is absolute nonsense,
but not the fact he believes in it.
Now, I took an extreme case to illustrate my point and I obviously don't aim
to be compared with this guy, but hopefully that gets the nitpicking point
across. Sorry, I've become a bit sensitive to the explosion of 'patently
false' statements here, I find them somewhat offensive and aiming to shut
discussion down.
To the point itself, with all due respect, the Static Analysis gains you
believe exist in Strict STH aren't even remotely in the same bucket as
gravity, which has been proven with scientific method. More on that below.
I think it's fair to say that while we were unable to convince you
there's no tangible extra value in Strict STH compared to any other
kind of STH that guarantees the type of value a function will get, you
were also unable to convince Dmitry, Stas or myself - all of which
independently discussed it with you. Again, despite that, I'm not
saying that you're "patently wrong", just that I don't believe you're
right.
I've built a static analyzer that's public. I've talked to people who
build them for a living. I don't claim to be an expert in them (far from
it), but what I've seen and learned is that what you're talking about here
either isn't possible (yet) or is difficult enough to be impractical (in
terms of computing resources necessary).
You can disagree with me all you want. You don't even need to convince me.
All you need to do is disprove me.
Actually, that's not the scientific method I know. In the scientific method
I know, a theory needs to be proven for it to be accepted as true - and not
as you seem to suggest, that a theory is deemed true unless it's proven
untrue.
Show me a static analyzer for a sufficiently dynamic language (Scalar PHP
or full JS - not ASM.js - would work) and I'll happily apologize and
retract the comment.
First, you don't need to retract the comment as evidently it's clear you
believe in it. It's fine that we hold different views. But we should both
be allowed to say what we think and have this difference of opinion be
widely known.
Secondly, there you are: http://www.checkmarx.com/ - they've been
developing pretty amazing static analyzers for PHP for years, and without
any type of scalar hints - strict or weak.
Last - I'm not sure how whether or not static analyzers exist for PHP proves
or disproves whether the delta between Strict/Dynamic type hints helps
Static Analysis in any way. You could still be right even though static
analyzers already exist for PHP (they could be made better, perhaps), and
you could still be wrong if there were no static analyzers for PHP in
existence.
But so far all I've seen are people saying it's possible even in presence
of arguments to the contrary (why it's not possible).
My recommendation is that we again agree to disagree, as you suggested.
Agreeing to disagree doesn't mean I accept your position as I accept
gravity, or that I start hiding the fact I disagree (and vice versa). It
means we both disagree with each other respectfully. In the RFC I
intentionally didn't say that Static Analysis being helped by Strict STH is
nonsense or 'patently false' - but instead, pointed out that we don't
believe there are substantial gains to be had there.
Thanks,
Zeev
Zeev,
-----Original Message-----
From: Anthony Ferrara [mailto:ircmaxell@gmail.com]
Sent: Saturday, February 21, 2015 10:08 PM
To: Zeev Suraski
Cc: PHP internals
Subject: Re: [PHP-DEV] Coercive Scalar Type Hints RFC
Zeev,
I won't nit-pick every point, but there are a few I think need to be
clarified.
Proponents of Dynamic STH bring up consistency with the rest of the
language, including some fundamental type-juggling aspects that have
been key tenets of PHP since its inception. Strict STH, in their
view, is inconsistent with these tenets.
Dynamic STH is apparently consistent with the rest of the language's
treatment of scalar types. It's inconsistent with the rest of the
language's treatment of parameters.
Not in the way Andrea proposed it, IIRC. She opted to go for
consistency with internal functions. Either way, at the risk of being
shot for talking about spiritual things, Dynamic STH is consistent
with the dynamic spirit of PHP, even if there are some discrepancies
between its rule-set and the implicit typing rules that govern
expressions. Note that in this RFC I'm actually suggesting a possible
way forward that will align all aspects of PHP, including implicit
casting - and have them all governed by a single set of rules.
The point I was making up to there is that we currently have 2 type
systems: user-land object and ZPP-scalar. So in any given function you
have 2 type systems interacting. The current ZPP scalar type is dynamic,
and user-land object static.
Objects and scalars are fundamentally different, definitely in PHP.
There's no standard way to convert an instance of a certain class to an
instance of another class, at least not in PHP. There is, however, a
standard way of converting different types of scalars to one another, and
this behavior is extremely ubiquitous in PHP. That is the reason that when
class type hints were introduced, it was at the condition that we'll not
have strict scalar type hints, because scalars were (and still are) so
fundamentally different than objects and in many cases can losslessly change
type in a well-defined manner. Perhaps if we had the coercive STH idea back
then we could have saved us all a lost decade without scalar type hints.
I'll say it again - scalars and objects (and for that matter any non-scalar
type including arrays and resources) - are inherently different, and
striving for consistency between them should not be a goal IMHO.
They are inherently different because we (PHP) say they are. Plenty of
other languages out there say they are not inherently different.
You want to keep them separate? That's fine. But it can be useful to
model a language without making them separate. So I don't agree that
they are "fundamentally" different.
Is it worth going into that now? No. But I think it's a valid viewpoint.
Let's agree to disagree that an ini setting will be better than a per-file
setting.
We can agree to disagree on that, but not on the fact that the Coercive STH
RFC provides a roadmap, at the end of which we'd have a single mode,
consistently behaving(*) and more secure-for-everyone language - and one
without that INI entry (it's a temporary measure that will be removed). With
the Dual Mode RFC, on the other hand, we're introducing two modes that we'll
have to support for all eternity, with two potential 'camps' of developers
forming up, and with no security benefits for anybody who doesn't choose to
flip on the strict switch.
The other RFC comes from the standpoint that there's nothing wrong
with coercive. So there's no need for a roadmap. There's no need for
people's code to break when they upgrade, ever. There's no need for an
upgrade path because everything will just work. It introduces a new
mode that users can opt into. And in practice, they should use both
modes because different needs have different solutions.
"with two potential 'camps' of developers forming up"
Have you looked at the community lately? That's been happening for a
decade. One camp likes to engineer everything out using classes and
libraries. The other keeps using PHP procedurally and ignoring
changing "best practices" (and I do mean the quotes). Does that make
one better than the other? NO.
And that's PHP's strength. It gives both sides the power to keep doing
what they want to be doing without having to give up or be burdened by
the other side.
And that's what my RFC provides. It doesn't force behavior or belief
on anyone, it gives the choice.
I agree we should have users avoid explicit casts. That's why the
dual-mode proposal exists. If users don't want to control their types,
they should use the default mode. And everything works fine.
This ignores the reasonable guesstimate - raised by several people here -
that many will flip Strict mode on because they'd think it makes them
inherently safer, faster, or just a good thing to do. Everything I've
learned about PHP in the last couple of decades tells me there are going to
be plenty of those. Those are the same people who would quickly then add
explicit casts to make the code work again; Those are also people who would
benefit a lot more from a single mode system that would point out real
problems in their code, with likely very few false positives or false
negatives.
Sure, some people will do that. Just like people still use single
quotes because they are faster.
But the way you fix that is not to handicap the language (we do have
eval() and goto, no?) but to educate the users.
First, you don't need to retract the comment as evidently it's clear you
believe in it. It's fine that we hold different views. But we should both
be allowed to say what we think and have this difference of opinion be
widely known.
Secondly, there you are: http://www.checkmarx.com/ - they've been
developing pretty amazing static analyzers for PHP for years, and without
any type of scalar hints - strict or weak.
Vulnerability scanning is a subset of static analysis. One that does
not rely on typing, but tainting. All it needs to do is watch variable
flow with explicit "markers" to make the input safe. There are
definitely other techniques involved as well.
What I'm talking about is correctness provers. Static analyzers that
look at an entire codebase and ensure that information flow through
it is correct and will not error at runtime. Precisely what HHVM's
type checker does.
Two more things regarding the competing RFC – it’s still alive, and being
promoted for PHP 7.0; And while it doesn’t create a huge BC break, it
allows developers to selectively create localized BC breaks, on a per file
basis.
No, it does not. A BC break is something where existing code works,
and you do nothing more than upgrade and have the new code not work
anymore.
With the other dual-mode RFC, if a user opts-in (enables strict mode),
if code doesn't work that's not a BC break. That's a case of "you told
us explicitly you don't want this code to work if it's invalid, and
guess what, it's invalid".
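The opt-in argument above can be made concrete with a minimal sketch. It assumes the per-file switch is spelled declare(strict_types=1), the form that later shipped in PHP 7.0; add_ints() and try_add() are hypothetical names used only for illustration.

```php
<?php
declare(strict_types=1); // opt-in: affects only calls made from this file

// A function with scalar type declarations.
function add_ints(int $a, int $b): int
{
    return $a + $b;
}

// Hypothetical helper: reports the sum, or 'TypeError' if the call is rejected.
function try_add($a, $b): string
{
    try {
        return (string) add_ints($a, $b);
    } catch (TypeError $e) {
        return 'TypeError';
    }
}

echo try_add(2, 3), "\n";   // real ints are accepted in either mode
echo try_add("2", 3), "\n"; // rejected only because this file opted in
```

Removing the declare line should make the second call succeed again under the default coercive rules, which is why the failure is a consequence of opting in rather than of upgrading.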
Anthony
-----Original Message-----
From: Anthony Ferrara [mailto:ircmaxell@gmail.com]
Sent: Sunday, February 22, 2015 12:25 AM
To: Zeev Suraski
Cc: PHP internals
Subject: Re: [PHP-DEV] Coercive Scalar Type Hints RFC
Zeev,
"with two potential 'camps' of developers forming up"
Have you looked at the community lately? That's been happening for a
decade. One camp likes to engineer everything out using classes and
libraries. The other keeps using PHP procedurally and ignoring changing
"best practices" (and I do mean the quotes). Does that make one better
than the other? NO.
I don't think these two camps map to the strict and dynamic camps,
definitely not for the userbase at large.
Which means that whatever camps we have today, we'd have more - one extra
point of division.
And that's PHP's strength. It gives both sides the power to keep doing
what they want to be doing without having to give up or be burdened by
the other side.
I explained both in the RFC and here why I don't think providing two modes
for doing something quite similar is good. We can agree to disagree :)
Sure, some people will do that. Just like people still use single quotes
because they are faster.
Not quite IMHO - that example actually requires some relatively advanced
knowledge. Strict, with the amount of noise that Scalar Type Hints are
bound to get (and are already getting) - is bound to have a lot more
exposure than single quotes being faster ever had. And it's therefore very
likely it'll be used by people who shouldn't really be using it.
Let's leave the Static Analysis part aside, and agree to disagree as we
already did numerous times but failed to implement.
Two more things regarding the competing RFC – it’s still alive, and
being promoted for PHP 7.0; And while it doesn’t create a huge BC
break, it allows developers to selectively create localized BC breaks,
on a per file basis.
No, it does not. A BC break is something where existing code works, and
you do nothing more than upgrade and have the new code not work anymore.
With the other dual-mode RFC, if a user opts-in (enables strict mode), if
code doesn't work that's not a BC break. That's a case of "you told us
explicitly you don't want this code to work if it's invalid, and guess
what, it's invalid".
That's splitting hairs IMHO. The bottom line is that many people will
undergo the same process Rasmus did as he experimented, flipping the switch
on because it's a best practice, and start having to fix their code to work.
But we can also agree on what we always agree here too :)
Thanks,
Zeev
Zeev,
Two more things regarding the competing RFC – it’s still alive, and
being promoted for PHP 7.0; And while it doesn’t create a huge BC
break, it allows developers to selectively create localized BC breaks,
on a per file basis.
No, it does not. A BC break is something where existing code works, and
you do nothing more than upgrade and have the new code not work anymore.
With the other dual-mode RFC, if a user opts-in (enables strict mode),
if code doesn't work that's not a BC break. That's a case of "you told us
explicitly you don't want this code to work if it's invalid, and guess
what, it's invalid".
That's splitting hairs IMHO. The bottom line is that many people will
undergo the same process Rasmus did as he experimented, flipping the
switch on because it's a best practice, and start having to fix their
code to work.
But we can also agree on what we always agree here too :)
Saying that turning on an optional and previously unavailable option inside
code, causing code to break, is in any way a "BC" break is pure FUD.
It is not a BC break by any definition that we have ever used on this list,
nor is it one based on semver or any other community-accepted definition.
Let's please avoid FUD and continue to discuss the proposals at hand...
Anthony
From: Anthony Ferrara [mailto:ircmaxell@gmail.com]
Saying that turning on an optional and previously unavailable option inside
code, causing code to break, is in any way a "BC" break is pure FUD.
Who talked of a BC break for this?
I probably missed something because there's no BC break here, just an extremely probable disaster scenario.
If people massively turn strict types on because they see on twitter/google/fantasm that it detects more errors (and history shows it will happen), we have a big problem, because the result can only be massive casting. Nothing exaggerated here, and history again shows that 'people know what they're doing' is not a serious argument.
"Strict types are sexy. I want to use them on my old codebase. It will help me find bugs". Human. Basic. Don't think that's FUD.
Regards
François
From: Anthony Ferrara [mailto:ircmaxell@gmail.com]
Saying that turning on an optional and previously unavailable option
inside code, causing code to break, is in any way a "BC" break is pure FUD.
Who talked of a BC break for this?
I probably missed something because there's no BC break here, just an
extremely probable disaster scenario.
If people massively turn strict types on because they see on
twitter/google/fantasm that it detects more errors (and history shows
it will happen), we have a big problem, because the result can only be
massive casting. Nothing exaggerated here, and history again shows that
'people know what they're doing' is not a serious argument.
"Strict types are sexy. I want to use them on my old codebase. It will
help me find bugs". Human. Basic. Don't think that's FUD.
Saying that everyone will turn it on in every possible situation or in
legacy code is FUD and pure speculation.
My gut feeling, having quite a large user base to back my rough
estimation, is that most won't even know or notice it.
Regards
François
My gut feeling, having quite a large user base to back my rough
estimation, is that most won't even know or notice it.
The majority of users have no need for it at all ... so is there any
need for it to be even compiled in for a general user ?
--
Lester Caine - G8HFL
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
"with two potential 'camps' of developers forming up"
Have you looked at the community lately? That's been happening for a
decade. One camp likes to engineer everything out using classes and
libraries. The other keeps using PHP procedurally and ignoring changing
"best practices" (and I do mean the quotes). Does that make one better
than the other? NO.
I don't think these two camps map to the strict and dynamic camps,
definitely not for the userbase at large.
Actually yes, it is slightly like that.
Which means that whatever camps we have today, we'd have more - one extra
point of division.
We do, while one RFC allows a smooth, BC-break-free (100% safe)
migration path to 7 for existing apps or code.
And that's PHP's strength. It gives both sides the power to keep doing
what they want to be doing without having to give up or be burdened by
the other side.
I explained both in the RFC and here why I don't think providing two modes
for doing something quite similar is good. We can agree to disagree :)
And I have explained here many times why changing the casting rules is a
bad, risky idea to begin with.
Sure, some people will do that. Just like people still use single quotes
because they are faster.
Not quite IMHO - that example actually requires some relatively advanced
knowledge. Strict, with the amount of noise that Scalar Type Hints are
bound to get (and are already getting) - is bound to have a lot more
exposure than single quotes being faster ever had. And it's therefore very
likely it'll be used by people who shouldn't really be using it.
Let's leave the Static Analysis part aside, and agree to disagree as we
already did numerous times but failed to implement.
Pure suppositions and again extrapolating from nothing. It is as
always a documentation matter and as usual many users won't even know
it exists and simply update PHP. It should work smoothly, without BC
and that's what will happen by default if we don't change the default
way to deal with casts.
Two more things regarding the competing RFC – it’s still alive, and
being promoted for PHP 7.0; And while it doesn’t create a huge BC
break, it allows developers to selectively create localized BC breaks,
on a per file basis.
No, it does not. A BC break is something where existing code works, and
you do nothing more than upgrade and have the new code not work anymore.
With the other dual-mode RFC, if a user opts-in (enables strict mode), if
code doesn't work that's not a BC break. That's a case of "you told us
explicitly you don't want this code to work if it's invalid, and guess
what, it's invalid".
That's splitting hairs IMHO. The bottom line is that many people will
undergo the same process Rasmus did as he experimented, flipping the switch
on because it's a best practice, and start having to fix their code to work.
But we can also agree on what we always agree here too :)
It is not splitting hairs, it is the key point of the dual mode. A
given user can simply ignore the type hinting and keep using what his
apps do, and it will work. He can as well focus on one area, or an addon
to his app, to use strict mode (and it will only apply to the files used
by this specific addon and nowhere else!! ).
With the other RFC, which changes the casting modes, I wish everyone
good luck. I may be wrong, can happen ;), but we simply do not know
and will not know before 7.0.0 is out. Good luck to change them again
to "adapt and tweak", and good luck to the apps developers to adapt
their apps with plenty of patch versions checks. This is the reason #2
why I am against your RFC, the #1 being the total lack of actual non
magic casting (read: strict), optionally enabled.
Cheers,
Pierre
@pierrejoye | http://www.libgd.org
With the other RFC, which changes the casting modes, I wish everyone
good luck. I may be wrong, can happen ;), but we simply do not know
and will not know before 7.0.0 is out. Good luck to change them again
to "adapt and tweak", and good luck to the apps developers to adapt
their apps with plenty of patch versions checks. This is the reason #2
why I am against your RFC, the #1 being the total lack of actual non
magic casting (read: strict), optionally enabled.
Like you I don't see the point of the casting changes but similarly I
don't see why we have to have strict mode bolted in to everybody's
systems as well. Both RFCs seem to be trying to push the bitter pill
through with the main payload ... Scalar Type Hinting. There does seem
to be a general assumption that everybody wants STH therefore it has to
be included and I am sure that that simple question would get a
substantial majority. As with other 'hints' there is a difference in
opinion on just what 'everybody' wants. My objection is perhaps to all
targeting different rule sets, none of which actually do the whole
job, and now that I am starting to get into just how the code works, I
think I can see openings for hook points where just how a particular
style of working can be accommodated. Just as the problem with
case-sensitive core, if the right 'filter' can be made selectable we can
all have what we want. Yes it will result in confusion over what runs
where, but equally we can tailor hinting to match a particular set of
rules rather than having to live with some other person's preference such
as the 'new' casting rules ...
--
Lester Caine - G8HFL
Hi Pierre,
From: Pierre Joye [mailto:pierre.php@gmail.com]
With the other RFC, which changes the casting modes, I wish everyone
good luck. I may be wrong, can happen ;), but we simply do not know
and will not know before 7.0.0 is out. Good luck to change them again
to "adapt and tweak", and good luck to the apps developers to adapt
their apps with plenty of patch versions checks. This is the reason #2
why I am against your RFC, the #1 being the total lack of actual non
magic casting (read: strict), optionally enabled.
About #1: if, one day, a majority decides that we absolutely need strict types, it will be very easy. Just define four additional type hints, something like 'int!', 'float!', 'string!' 'bool!' or any other syntax. These types would be defined as accepting only their native zval type and, of course, performing no conversion. And that's it. Nothing more. You have your strict types, function by function, argument by argument, return type too if you want. Isn't it nice ? We decided not to propose such types because we think it would bring more bad than good. That's just an opinion. Do you see the difference ? we don't propose strict types for 7.0 but, if you gather a majority favoring it, they may be present in 7.1.
About #2: The risk is not so terrible. As the default is to turn off E_DEPRECATED messages in production, it is even very low. The highest risk we take is to see a small performance hit. Probably negligible compared to the phpng positive impact on performance. And temporary because developers will quietly fix their code and the hidden messages will disappear. So, we probably won't have to rely on 'good luck'.
FUD apart, every test Dmitry ran using his upcoming patch (which will implement E_DEPRECATED) on existing PHP software raised very few new errors. Moreover, after analysis, all these messages except 1, I believe, correspond to undetected bugs in the PHP code. Like the bug I detected in the PHP code to build phar.phar. So, I can go further: we are not breaking anything and we are helping users to find undetected bugs in their codebase. Nice side effect, isn't it?
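How such notices can be collected without ever being displayed is sketched below. This is an illustration under stated assumptions, not Dmitry's patch: legacy_to_int() is a hypothetical stand-in that raises E_USER_DEPRECATED via trigger_error(), because the RFC's own E_DEPRECATED notices are not reproduced here.

```php
<?php
// Collect deprecation notices in memory instead of displaying them.
$seen = [];
set_error_handler(function (int $errno, string $errstr) use (&$seen): bool {
    $seen[] = $errstr;
    return true; // handled: PHP's default reporting is skipped
}, E_DEPRECATED | E_USER_DEPRECATED);

// Hypothetical stand-in for a conversion that would be flagged.
function legacy_to_int($value): int
{
    if (is_string($value) && !is_numeric($value)) {
        trigger_error('non-numeric string used as int', E_USER_DEPRECATED);
    }
    return (int) $value; // the old conversion logic is kept as-is
}

$result = legacy_to_int('100kg');

echo $result, "\n";      // the call still returns a value
echo count($seen), "\n"; // one notice was recorded for later review
```

The behaviour of the call itself is unchanged; only the recorded notice tells the developer that the input was suspicious.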
Regards
François
About #1: if, one day, a majority decides that we absolutely need strict types, it will be very easy. Just define four additional type hints, something like 'int!', 'float!', 'string!' 'bool!' or any other syntax. These types would be defined as accepting only their native zval type and, of course, performing no conversion. And that's it. Nothing more. You have your strict types, function by function, argument by argument, return type too if you want. Isn't it nice ? We decided not to propose such types because we think it would bring more bad than good. That's just an opinion. Do you see the difference ? we don't propose strict types for 7.0 but, if you gather a majority favoring it, they may be present in 7.1.
I do not think this will even happen, not even in a distant future.
About #2: The risk is not so terrible. As the default is to turn off E_DEPRECATED messages in production, it is even very low. The highest risk we take is to see a small performance hit. Probably negligible compared to the phpng positive impact on performance.
As I said earlier, E_DEPRECATED will only catch what it sees; we miss
what it does not see and actually casts happily. There is no performance
hit, let's get over that, we have published numbers to show that
already.
And temporary because developers will quietly fix their code and the hidden messages will disappear. So, we probably won't have to rely on 'good luck'.
Again. It seems you only see issues with these insignificant
E_DEPRECATED messages; it will be just like the others, or E_NOTICE: they
will disable it and move on. However my point is in a totally
different area, the one where we have no idea what is going on because it
is now cast while it was not before. We do not have data to cover these
cases. It is not a lack of will but it is simply impossible to have
for all apps out there in production (as you said before). And it is
exactly why I am against changing the casting rules, even by a single
iota beyond what we have done already (or close to this area).
FUD apart, every test Dmitry ran using his upcoming patch (which will implement E_DEPRECATED) on existing PHP software raised very few new errors. More : after analysis, all these messages except 1, I believe, correspond to undetected bugs in the PHP code. Like the bug I detected in the PHP code to build phar.phar. So, I can go further : we are not breaking anything and we are helping users to find undetected bugs in their codebase. Nice side effect, isn't it ?
Again, see my previous comment in this mail.
Cheers,
Pierre
@pierrejoye | http://www.libgd.org
From: Pierre Joye [mailto:pierre.php@gmail.com]
As I said earlier, E_DEPRECATED will only catch what it sees; we miss
what it does not see and actually casts happily.
Sorry, Pierre, I don't see what you mean. You're probably right but I don't understand what you mean with 'casting'. AFAIK, we are not touching casting rules, implicit or explicit. We just take the ZPP code and raise some E_DEPRECATED while keeping the whole logic.
Can you try to explain again or give a scenario I can understand.
And temporary because developers will quietly fix their code and the
hidden messages will disappear. So, we probably won't have to rely on 'good
luck'.
Again. It seems you only see issues with these insignificant
E_DEPRECATED messages; it will be just like the others, or E_NOTICE: they
will disable it and move on. However my point is in a totally
different area, the one where we have no idea what is going on because it
is now cast while it was not before.
Do you mean that adding type hints will break existing code ? If someone can explain...
We do not have data to cover these
cases. It is not a lack of will but it is simply impossible to have
for all apps out there in production (as you said before). And it is
exactly why I am against changing the casting rules, even by a single
iota beyond what we have done already (or close to this area).
FUD apart, every test Dmitry ran using his upcoming patch (which will
implement E_DEPRECATED) on existing PHP software raised very few new
errors. Moreover, after analysis, all these messages except 1, I believe,
correspond to undetected bugs in the PHP code. Like the bug I detected in
the PHP code to build phar.phar. So, I can go further: we are not breaking
anything and we are helping users to find undetected bugs in their
codebase. Nice side effect, isn't it?
Again, see my previous comment in this mail.
Cheers,
Pierre
@pierrejoye | http://www.libgd.org
You're probably right but I don't understand what you mean with 'casting'. AFAIK, we are not touching casting rules, implicit or explicit.
BUT ... While Coercive Type Rules don't actually cast, they fail in
different ways to what the cast would have been, so someone who HAS cast
the value before calling then fails simply because the rules are
different? I have never used things like 100kg, and I don't see an easy
way to convert that to perhaps 100000 gms, but it is a basic part of
current PHP so ANY dilution of that should be covered properly.
Introducing new rules should mirror across all relevant areas, and I
find this a big negative to Coercive STH. But Strict STH is equally bad
since again it is not equally applied across all code. One 'cherry
picks' when to switch on Strict, one changes the rules that don't fit
for Coercive. Neither deserves acceptance because both are only creating
more divergence.
--
Lester Caine - G8HFL
Hi,
I intended on replying to this thread at a later stage (and probably
will), but I just can't ignore this one:
Secondly, there you are: http://www.checkmarx.com/ - they've been
developing pretty amazing static analyzers for PHP for years, and without
any type of scalar hints - strict or weak.
Having used that product, I can tell you that it's not at all that
amazing for PHP (maybe it works well for other languages, don't know),
and that's an understatement. In fact, a lot of what it does is to
force you into explicit casts.
Cheers,
Andrey.
I agree we should have users avoid explicit casts. That's why the
dual-mode proposal exists. If users don't want to control their types,
they should use the default mode. And everything works fine.
This ignores the reasonable guesstimate - raised by several people here -
that many will flip Strict mode on because they'd think it makes them
inherently safer, faster, or just a good thing to do. Everything I've
learned about PHP in the last couple of decades tells me there are going
to be plenty of those. Those are the same people who would quickly then add
explicit casts to make the code work again; Those are also people who
would benefit a lot more from a single mode system that would point out
real problems in their code, with likely very few false positives or false
negatives.
This is not even remotely comparable to a guess.
The users I talk to like it because they see it as a way to improve their
code, not the performance or security, but as cleaner, more
readable, less error prone, more predictable.
Like the Drupal case, I would not suggest them to move to strict
everywhere, but modules would be free to do so, or part of the core. And
that without any impact on the 3rd party users. I repeat, without any
impact.
On Sat Feb 21 2015 at 21:08:39 Anthony Ferrara ircmaxell@gmail.com wrote:
Zeev,
I won't nit-pick every point, but there are a few I think need to be
clarified.
Proponents of Dynamic STH bring up consistency with the rest of the
language, including some fundamental type-juggling aspects that have
been key tenets of PHP since its inception. Strict STH, in their view, is
inconsistent with these tenets.
Dynamic STH is apparently consistent with the rest of the language's
treatment of scalar types. It's inconsistent with the rest of the
language's treatment of parameters.
Not in the way Andrea proposed it, IIRC. She opted to go for consistency
with internal functions. Either way, at the risk of being shot for
talking about spiritual things, Dynamic STH is consistent with the dynamic
spirit of PHP, even if there are some discrepancies between its rule-set
and the implicit typing rules that govern expressions. Note that in this
RFC I'm actually suggesting a possible way forward that will align all
aspects of PHP, including implicit casting - and have them all governed by
a single set of rules.
The point I was making up to there is that we currently have 2 type
systems: user-land object and ZPP-scalar. So in any given function you
have 2 type systems interacting. The current ZPP scalar type is
dynamic, and user-land object static.
With the proposal here, you'd unify user-land scalar to behave as
zpp-scalar. So you'd have two type systems in any given function:
scalar and object (which behave differently).
My proposal gives you the same two by default (scalar and object) and
a strict switch to collapse them into a single, unified type system.
This is even more apparent with the int-float acceptance, because we
can mentally model Float as an object that extends Int. Then it makes
perfect sense why you'd accept ints where you see floats, but not the
opposite.
However there's an important point to make here: a lot of best practice
has been pushing against the way PHP treats scalar types in certain cases.
Specifically around == vs === and using strict comparison mode in
in_array, etc.
I think you're correct on comparisons, but not so much on the rest.
Dynamic use of scalars in expressions is still exceptionally common in PHP
code. Even with comparisons, == is still very common - and you'd use == vs.
=== depending on what you need.
So while it appears consistent with the rest of PHP, it only does so if
you ignore a large part of both the language and the way it's commonly used.
Let's agree to disagree. That's one thing we can always agree on! :)
I'm talking about the object system. I don't think you're disagreeing
that it's static. Hence coercive scalars are consistent only if you
look at 1/2 the type system. That was the point I was making there.
- "Just Do It but give users an option to not" - This has the problems that
E_DEPRECATED has, but it also gets us back to having fundamental code
behavior controlled by an INI setting, which for a very long time this
community has generally seen as a bad thing (especially for portability and
code re-use).
I do too, and I was upfront about their cons, not just pros. And yet, they
all bring us to a much better outcome within a relatively short period of
time (in the lifetime of a language) than the Dual Mode will.
Let's agree to disagree that an ini setting will be better than a
per-file setting.
In fact, I personally think this is major enough of an issue that I
will vote no simply on this reason alone (type behavior depending on
an ini setting in any way shape or form).
Further, the two sets can cause the same functions to behave
differently depending on where they're being called
I think that's misleading. The functions will always behave the same.
The difference is how you get data into the function. The behavior
difference is in your code, not the end function.
I'll be happy to get a suggestion from you on how to reword that.
Ultimately, from the layman user's point of view, she'd be calling foo()
from one place and have it accept her arguments, and foo() from another
place and have it reject the very same arguments.
Let me think on it and I will come up with something.
With strict mode, you'd have to embed a cast (smart or explicit) to convert
to an integer at the point the data comes in.
First, I'm not aware of smart/safe casts being available or proposed at
this point.
Secondly, why at the point the data comes in? That would be ideal for
static analyzers, but it's probably a lot more common that it will be done
at the first point in time where it gets rejected.
By "smart cast" I was referring to a function which checked is_numeric().
Not a new language construct.
I have a hard time connecting to the 'power' approach. I think developers
want their code to work, with minimal effort, and be secure. Coercive
scalar type hints will do an excellent job at that. Strict type hints will
be more work, are bound to trigger a lot of "Oh come on" responses, and as
a special bonus - proliferate the use of explicit casts. Let me top that -
you'd have developers who think they're security conscious, because they're
using strict mode - with code that's full of explicit casts.
I agree we should have users avoid explicit casts. That's why the
dual-mode proposal exists. If users don't want to control their types,
they should use the default mode. And everything works fine.
If they know what they want, then the explicit cast becomes a
documenting piece of information that "this is supposed to happen".
Ex:
function takesInt(int $a) {}
function foo(float $b) {
    return takesInt($b);
}
In weak mode, that "just works". But is it supposed to just work? You
have no idea. The next developer who comes will look at it and ask "is
that supposed to truncate, or was that an oversight?" and have no
idea. But in strict mode, placing an explicit cast before $b shows the
next developer who comes there "the truncation was intentional".
Static Analysis. It is the position of several Strict STH proponents
that Strict STH can help static analysis in certain cases. For the
same reasons mentioned above about JIT, we don't believe that is the
case
This is patently false.
It's actually patently true. We don't believe that is the case. QED.
To understand why "we don't believe" can be false, let's make an
analogy: I can say that I don't believe in gravity. That doesn't mean
that the opinion isn't patently false just because it was stated as an
opinion (or rather the "believe" is true, but the implication of the
belief is false)...
Keep not believing it all you want, but static analysis requires statically
looking at code. Which means you have no value information. So static
analysis can't possibly happen in cases where you need to know about value
information (because it's not there). Yes, at function entry you know the
types. But static analysis isn't about analyzing a single function (in
fact, that's the least interesting case). It's more about analyzing a
series of functions, a function call graph. And in that case strict typing
(based only on type) does make a big difference.
I think it's fair to say that while we were unable to convince you there's
no tangible extra value in Strict STH compared to any other kind of STH
that guarantees the type of value a function will get, you were also unable
to convince Dmitry, Stas or myself - all of whom independently discussed it
with you. Again, despite that, I'm not saying that you're "patently wrong",
just that I don't believe you're right.
I've built a static analyzer that's public. I've talked to people who
build them for a living. I don't claim to be an expert in them (far
from it), but what I've seen and learned is that what you're talking
about here either isn't possible (yet) or is difficult enough to be
impractical (in terms of computing resources necessary).
You can disagree with me all you want. You don't even need to convince
me. All you need to do is disprove me. Show me a static analyzer for a
sufficiently dynamic language (Scalar PHP or full JS - not ASM.js -
would work) and I'll happily apologize and retract the comment. But so
far all I've seen are people saying it's possible even in presence of
arguments to the contrary (why it's not possible).
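One way to picture the distinction drawn above between type information and value information is a check that sees only the declared types at a call site. The C++ sketch below is a hypothetical, heavily simplified model: the Type and Verdict enums and both rule functions are invented for illustration, and the coercive rules are an assumption rather than the RFC's actual conversion table. It does not settle the disagreement in this thread; it only shows what kind of answer each rule set can give when no values are available.

```cpp
#include <cassert>

// Hypothetical model: the only facts a purely static check has at a call site.
enum class Type { Int, Float, String };
enum class Verdict { Accept, Reject, DependsOnValue };

// Strict rule as described in the thread: types must match, except that an
// int is accepted where a float is declared.
Verdict check_strict(Type arg, Type param) {
    if (arg == param) return Verdict::Accept;
    if (arg == Type::Int && param == Type::Float) return Verdict::Accept;
    return Verdict::Reject;
}

// Coercive rule (simplified assumption): conversions are attempted, and some
// of them succeed or fail depending on the runtime value.
Verdict check_coercive(Type arg, Type param) {
    if (arg == param) return Verdict::Accept;
    if (arg == Type::String) return Verdict::DependsOnValue;  // "42" vs "abc"
    if (arg == Type::Float && param == Type::Int)
        return Verdict::DependsOnValue;                       // 42.0 vs 42.5
    return Verdict::Accept;                                   // e.g. int to float
}

// Small usage demo; call it from main() to exercise the rules.
void demo_static_check() {
    // Passing a string where an int is declared:
    assert(check_strict(Type::String, Type::Int) == Verdict::Reject);
    assert(check_coercive(Type::String, Type::Int) == Verdict::DependsOnValue);
    // Passing an int where a float is declared is fine under both rule sets.
    assert(check_strict(Type::Int, Type::Float) == Verdict::Accept);
    assert(check_coercive(Type::Int, Type::Float) == Verdict::Accept);
}
```

Under these assumptions the strict rule always yields a definite verdict from types alone, while the coercive rule sometimes has to defer to the value. How much that matters in practice is exactly what the participants disagree on.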
There have been several attempts:
for JS: http://users-cs.au.dk/simonhj/tajs2009.pdf
or similar techniques applied to PHP, quite outdated though:
https://github.com/colder/phantm
You are right that the lack of static information about types is one of
the main issues. Recovering the types typically has a huge performance
cost, or is unreliable.
But seriously, time is getting wasted on this argument; it's actually a
no-brainer: more static information helps tools that rely on static
information. Yes. Absolutely. 100%.
The question is rather: at what weight should we take (potential/future)
external tools into account when developing language features?
From: Etienne Kneuss [mailto:colder@php.net]
Sent: Sunday, February 22, 2015 3:00 PM
To: Anthony Ferrara; Zeev Suraski
Cc: PHP internals
Subject: Re: [PHP-DEV] Coercive Scalar Type Hints RFC
The question is rather: at what weight should we take (potential/future)
external tools into account when developing language features?
Agreed! My answer - "Static Analyzers need to be designed for Languages,
rather than Languages being designed for Static Analyzers".
Will send additional thoughts on Static Analysis in a separate, off-list
email to put this argument to rest, as both Anthony and I agreed to.
Zeev
There have been several attempts:
for JS: http://users-cs.au.dk/simonhj/tajs2009.pdf
or similar techniques applied to PHP, quite outdated though:
https://github.com/colder/phantm
You are right that the lack of static information about types is one of
the main issues. Recovering the types typically has a huge performance
cost, or is unreliable.
But seriously, time is getting wasted on this argument; it's actually a
no-brainer: more static information helps tools that rely on static
information. Yes. Absolutely. 100%.
The question is rather: at what weight should we take (potential/future)
external tools into account when developing language features?
Previously on the list, the nodejs JIT engine was mentioned as a working
example of a JIT for a language without any sort of type information.
While this is true, I think the amount of checks and resources required by
the generated machine code to achieve this should also be considered. In
these tests you can see that in most situations the JavaScript V8 engine
used by nodejs uses much more memory than current PHP (compare it to C
also)
(benchmarksgame.alioth.debian.org/u64/compare.php?lang=v8&lang2=php)
Yes, it is faster, but it consumes much more CPU and RAM in most
situations, and I'm sure that this is related to the dynamic nature of the
language.
A JIT or AOT machine code generator IMHO will never make decent use of
system resources without some sort of strong/strict typing rules; somebody
please explain if that's not the case.
As I see it, here is an example, supposing the JIT generated C++ code that
is then compiled to machine code:
function calc(int $val1, int $val2) : int {return $val1 + $val2;}
In weak mode I imagine the generated code would be something like this:
Variant* calc(Variant& val1, Variant& val2) {
    if(val1.isInt() && val2.isInt())
        return new Variant(val1.toInt() + val2.toInt());
    else if(val1.isFloat() && val2.isFloat())
        return new Variant(val1.toInt() + val2.toInt());
    else
        throw new RuntimeError();
}
while in strict mode the generated code could be:
int calc(int val1, int val2) {
    return val1 + val2;
}
So in this scenario it is clear that strict mode performance and memory
usage would be better. Conversion code would be required only in some
cases, for example:
calc(1, 5) // No need for casting
calc((int) "12", (int) "15") // Needs casting depending on how the parser deals with it
If my example is right, it means strict would be better for achieving good
performance than weak, which is almost the same situation we have now with
zvals. Also, I think it is wrong to say that casting will always take
place in strict mode.
So I have some questions floating in my mind about the coercive RFC.
- Could weak mode provide the required rules to implement a JIT with a
sane level of memory and CPU usage?
- I see that the proponents of dual weak/strict modes are offering to
write an AOT implementation if strict makes it, and the impressive work of
Joe (JITFU) and Anthony on recki-ct with the strict mode could be taken
to another level of integration with PHP and performance. IMHO it is harder
and more resource hungry to implement a JIT/AOT using weak mode. With
that said, if a JIT implementation is developed, will the story of the
ZendOptimizer being a commercial solution be repeated, or would this
JIT implementation be part of the core?
That's all that comes to mind now, and while many people don't care about
performance, IMHO a programming language mainly targeted at the web
should show some care in this department.
- Could weak mode provide the required rules to implement a JIT with a
sane level of memory and CPU usage?
There is no objective answer to the question while it has the clause "with
a sane level of ...".
The assertion in the RFC that says there is no difference between strict
and weak types, in the context of a JIT/AOT compiler, is wrong.
function add(int $l, int $r) {
    return $l + $r;
}
The first instruction in the interpreted code is not ZEND_ADD; first,
parameters must be received from the stack.
If that's a strict function, then un-stacking parameters is relatively
easy; if it's a dynamic function then you have to generate code that is
considerably more complicated.
This is an inescapable difference, of the kind that definitely does
have a negative impact on implementation complexity, runtime, and
maintainability.
To me, it only makes sense to compile strict code AOT or JIT; if you want
dynamic behaviour, we have an extremely mature platform for that.
- ... With that said, if a JIT implementation is developed, will the
story of the ZendOptimizer being a commercial solution be repeated, or
would this JIT implementation be part of the core?
There should hopefully be no need to complicate the core with the
implementation; the number of people that are capable of maintaining Zend
is low enough already, and the number of people able to maintain something
as new (for us) and complex as a JIT/AOT engine is even lower, I fear.
I think it's likely that Anthony and I, and Dmitry want different things
for a JIT/AOT engine. I think Anthony and I are preferring an engine that
requires minimal inference because type information is present (or
implicit), while Dmitry probably favours the kind that can infer at
runtime, the dynamic kind, like Zend is today. They are a world apart, I
think, I'll be happy to be proven wrong about that.
I like to think that even if Dmitry wrote it all by himself, it would be
opensource from the start, in fact I don't think that will happen. I'm
hoping we'll all work on the same solution together.
Cheers
Joe
This is an inescapable difference, of the kind that definitely does
have a negative impact on implementation complexity, runtime, and
maintainability.
To me, it only makes sense to compile strict code AOT or JIT; if you want
dynamic behaviour, we have an extremely mature platform for that.
So basically weak mode coupled with a JIT will be almost the same thing
we have today; the only difference is that the opcache would be replaced
with machine code (for a bit more performance), but the same logic and
code used in the Zend engine would also be used in the code generated by
the JIT (more bloat).
On the other hand, a strict type mode would allow the generation of
machine code that is much cleaner, similar to a C to machine code
translation, meaning it would be more efficient and less resource
hungry, to the point that function code generated by the AOT or JIT
compiler would be more efficient than the functions provided by the Zend
engine, which does lots of type checking/parsing (less bloat).
There should hopefully be no need to complicate the core with the
implementation, the number of people that are capable of maintaining Zend
is low enough already, the number of people able to maintain something as
new (for us) and complex as a JIT/AOT engine is even less, I fear.
And that's why I asked about the commercial stuff, because, the way things
are looking, from a technical perspective the strict mode opens the
door to an easier implementation of AOT or JIT, while a weak mode would
only make it harder for others in the community to work on such things.
This again raises the question: does this whole idea of favoring a weak
model by the minority serve as an impediment/complication for others so
that they (those who favor weak) can force a commercial solution?
I think it's likely that Anthony and I, and Dmitry want different things
for a JIT/AOT engine. I think Anthony and I are preferring an engine that
requires minimal inference because type information is present (or
implicit), while Dmitry probably favours the kind that can infer at
runtime, the dynamic kind, like Zend is today. They are a world apart, I
think, I'll be happy to be proven wrong about that.
I like to think that even if Dmitry wrote it all by himself, it would be
opensource from the start, in fact I don't think that will happen. I'm
hoping we'll all work on the same solution together.
And from a community point of view, it would be ideal to have the most
capable people developing this solution work as a single team. IMHO a
dual weak/strict mode is the best way of getting people to work together
in a way that benefits the community. Otherwise, one person working
single-handedly on a solution can serve as a justification to commercialize
something that is currently being offered by others (HHVM).
- Could weak mode provide the required rules to implement a JIT with a
sane level of memory and CPU usage?
There is no objective answer to the question while it has the clause "with
a sane level of ...".
The assertion in the RFC that says there is no difference between strict
and weak types, in the context of a JIT/AOT compiler, is wrong.
function add(int $l, int $r) {
    return $l + $r;
}
The first instruction in the interpreted code is not ZEND_ADD; first,
parameters must be received from the stack.
The PHP7 interpreter skips RECV instructions without type hints, because
they do nothing.
With type hints, they may perform checks and conversion.
If that's a strict function, then un-stacking parameters is relatively
easy; if it's a dynamic function then you have to generate code that is
considerably more complicated.
In both cases the code is similar - an additional call on the slow path:
if (UNEXPECTED(!valid_type(arg))) {
    // slow path
    zend_error(...) or convert_to_type(arg);
}
The problem with a run-time switch is the fact that we will have to
generate an additional check and both calls in the slow path. This is why
I'm against the run-time declare() switch.
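The slow-path argument above can be sketched in C++. This is an illustration only, not engine code: Value stands in for a zval, convert_to_int and both recv_int_* functions are hypothetical names, and the caller_is_strict flag is an assumed way of modelling the per-call-site mode. The point it tries to show is that the fast path is identical in every variant, and that a run-time mode switch adds one more branch plus both outcomes (error and conversion) to the slow path.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <variant>

// Stand-in for a zval that holds either an integer or a string.
using Value = std::variant<long, std::string>;

// Branch-prediction hint in the spirit of the UNEXPECTED() macro quoted above.
#if defined(__GNUC__) || defined(__clang__)
#define UNEXPECTED(x) __builtin_expect(!!(x), 0)
#else
#define UNEXPECTED(x) (x)
#endif

// Hypothetical coercion helper: accepts only strings that are entirely an
// integer; std::stol throws std::invalid_argument on input such as "abc".
long convert_to_int(const Value& v) {
    const std::string& s = std::get<std::string>(v);
    std::size_t pos = 0;
    long n = std::stol(s, &pos);
    if (pos != s.size()) throw std::invalid_argument("not a well-formed integer");
    return n;
}

// Single (coercive) mode: one check, one slow-path call.
long recv_int_coercive(const Value& arg) {
    if (UNEXPECTED(!std::holds_alternative<long>(arg))) {
        return convert_to_int(arg);  // slow path
    }
    return std::get<long>(arg);      // fast path
}

// Run-time switch: the slow path needs an extra mode check and must contain
// both the error and the conversion.
long recv_int_dual(const Value& arg, bool caller_is_strict) {
    if (UNEXPECTED(!std::holds_alternative<long>(arg))) {
        if (caller_is_strict) throw std::runtime_error("type error");
        return convert_to_int(arg);
    }
    return std::get<long>(arg);
}

// Small usage demo; call it from main().
void demo_recv() {
    assert(recv_int_coercive(Value{5L}) == 5);
    assert(recv_int_coercive(Value{std::string("12")}) == 12);
    assert(recv_int_dual(Value{std::string("12")}, false) == 12);
    bool threw = false;
    try {
        recv_int_dual(Value{std::string("12")}, true);
    } catch (const std::runtime_error&) {
        threw = true;
    }
    assert(threw);
}
```

When the argument already has the declared type, all variants do the same single type test, which is consistent with the claim that the two modes differ only on the slow path.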
This is an inescapable difference, of the kind that definitely does
have a negative impact on implementation complexity, runtime, and
maintainability.
To me, it only makes sense to compile strict code AOT or JIT; if you want
dynamic behaviour, we have an extremely mature platform for that.
Strictness is defined at the call site. So when you compile a function, you
don't know with which semantics it's going to be called. Now think about
different declare(strict) settings in different files and call chains. Do
you see the mess? :)
- ... With that said, if a JIT implementation is developed, will the
story of the ZendOptimizer being a commercial solution be repeated, or
would this JIT implementation be part of the core?
There should hopefully be no need to complicate the core with the
implementation; the number of people that are capable of maintaining Zend
is low enough already, and the number of people able to maintain something
as new (for us) and complex as a JIT/AOT engine is even lower, I fear.
I think it's likely that Anthony and I, and Dmitry want different things
for a JIT/AOT engine. I think Anthony and I are preferring an engine that
requires minimal inference because type information is present (or
implicit), while Dmitry probably favours the kind that can infer at
runtime, the dynamic kind, like Zend is today. They are a world apart, I
think, I'll be happy to be proven wrong about that.
I think our approaches were very similar and led us to similar decisions;
however, we already tried JIT-ing bench.php with type hints (strict). While
the JIT itself makes bench.php about 5 times faster, type hints add about
2%. Strict or weak makes no difference at all.
I like to think that even if Dmitry wrote it all by himself, it would be
opensource from the start, in fact I don't think that will happen. I'm
hoping we'll all work on the same solution together.
We are going to open our work soon, but we are not going to work on it
actively, because all our efforts are going into PHP7.
Thanks. Dmitry.
-----Original Message-----
From: Dmitry Stogov [mailto:dmitry@zend.com]
Sent: Monday, February 23, 2015 1:54 PM
To: Joe Watkins
Cc: Etienne Kneuss; Jefferson Gonzalez; PHP internals; Anthony Ferrara;
Zeev Suraski
Subject: Re: [PHP-DEV] Coercive Scalar Type Hints RFC
I think our approaches were very similar and led us to similar decisions;
however, we already tried JIT-ing bench.php with type hints (strict). While
the JIT itself makes bench.php about 5 times faster, type hints add about
2%. Strict or weak makes no difference at all.
Woops, it was Mandelbrot that was 25x faster with JIT. bench.php as a whole
was 'only' 5 times faster as Dmitry stated.
Zeev
Hi!
A JIT or AOT machine code generator IMHO will never have a decent use of
system resources without some sort of strong/strict typed rules,
somebody explain if thats not the case.
Yes, that's not the case, at least nobody ever showed that to be the
case. In general, as JS example (among many others) shows, it is
completely possible to have JIT without strict typing. In particular,
coercive typing provides as much information as strict typing about
variable type after passing the function boundary - the only difference
is what happens at the boundary and how the engine behaves when the
types do not match, but I do not see where big performance difference
would come from - the only possibility for different behavior would be
if your app requires constant type juggling (checks are needed in strict
mode anyway, since variables are not typed) - but in this case in strict
mode you'd have to do manual type conversions, which aren't in any way
faster than engine type conversions.
So the case for JIT being somehow better with strict typing so far
remains a myth without any substantiation.
while on strict mode the generated code could be:
int calc(int val1, int val2) {
return val1 + val2;
}
No, it can't be (at least it can't be the entire code of this
function), since the user still can pass non-int into this function -
nothing introducing strict typing in functions, as it is proposed now,
prevents it. What strict typing does is to ensure the error in this
case, but to generate the error you still need the checks!
BTW, your weak mode code is wrong too - there's no need to generate
Variants if you typed the variables as int. You know once coercion is
done they are ints. At least in the model that was now proposed.
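The point being made here can be sketched in C++, in the same spirit as the generated-code examples earlier in the thread. Everything below is hypothetical: Value stands in for an untyped PHP variable, and calc_body, to_int_strict, to_int_coercive and the two entry points are invented names, not anything a real engine or JIT emits. The sketch assumes that a check at the function boundary is needed in both modes, and that after it the body works on plain integers either way.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <variant>

// Stand-in for an untyped PHP variable (int or string only, for brevity).
using Value = std::variant<long, std::string>;

// Typed body: identical under both modes, no Variant-style boxing inside.
long calc_body(long a, long b) { return a + b; }

// Strict boundary: still a check, because callers can pass anything.
long to_int_strict(const Value& v) {
    if (const long* p = std::get_if<long>(&v)) return *p;
    throw std::runtime_error("argument must be of type int");
}

// Coercive boundary: same check, but a well-formed integer string converts.
long to_int_coercive(const Value& v) {
    if (const long* p = std::get_if<long>(&v)) return *p;
    const std::string& s = std::get<std::string>(v);
    std::size_t pos = 0;
    long n = std::stol(s, &pos);  // throws std::invalid_argument on "abc"
    if (pos != s.size()) throw std::runtime_error("argument cannot be coerced to int");
    return n;
}

// The two entry points differ only in the boundary step.
long calc_strict(const Value& a, const Value& b) {
    return calc_body(to_int_strict(a), to_int_strict(b));
}
long calc_coercive(const Value& a, const Value& b) {
    return calc_body(to_int_coercive(a), to_int_coercive(b));
}

// Small usage demo; call it from main().
void demo_boundary() {
    assert(calc_strict(Value{1L}, Value{5L}) == 6);
    assert(calc_coercive(Value{1L}, Value{5L}) == 6);
    assert(calc_coercive(Value{std::string("12")}, Value{std::string("15")}) == 27);
    bool threw = false;
    try {
        calc_strict(Value{std::string("12")}, Value{std::string("15")});
    } catch (const std::runtime_error&) {
        threw = true;
    }
    assert(threw);
}
```

In this model the only mode-dependent code is the conversion helper at the boundary, which matches the description of where the two approaches differ.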
If my example is right it means strict would be better to achieve good
Unfortunately, your example is not right.
to another level of integration with PHP and performance. IMHO is harder
and more resource hungry to implement a JIT/AOT using weak mode. With
Please provide a substantiation for this opinion. So far what was
provided was not correct.
That's all that comes to mind now, and while many people don't care about
performance, IMHO a programming language mainly targeted at the web
should show some care in this department.
Please do not strawman. A lot of people here care about performance, and
you have not yet made the case that strict typing has any benefit on
performance, so implying that opponents of strict typing somehow don't
care about performance while you champion it does not match the real
situation.
Stas Malyshev
smalyshev@gmail.com
2015-02-22 16:38 GMT-04:00 Stanislav Malyshev smalyshev@gmail.com:
Yes, that's not the case, at least nobody ever showed that to be the
case. In general, as JS example (among many others) shows, it is
completely possible to have JIT without strict typing. In particular,
coercive typing provides as much information as strict typing about
variable type after passing the function boundary - the only difference
is what happens at the boundary and how the engine behaves when the
types do not match, but I do not see where big performance difference
would come from - the only possibility for different behavior would be
if your app requires constant type juggling (checks are needed in strict
mode anyway, since variables are not typed) - but in this case in strict
mode you'd have to do manual type conversions, which aren't in any way
faster than engine type conversions.
So the case for JIT being somehow better with strict typing so far
remains a myth without any substantiation.
Well, the benefit of strictness in a JIT environment may not have been
proven, but it surely has been proven in statically compiled languages like
C. Currently, a JIT in most cases can't compete with the bare performance
of a statically compiled language, in either resource usage or CPU, so how
is non-strict better in that sense? You can argue a lot about nodejs, but
as I said in a previous message, at runtime it consumes more memory and
CPU, and this is mostly due to all the type checking it requires. In that
sense, if the strict proposal could improve that situation, it would be a
benefit.
No, it can't be (at least it can't be the entire code of this
function), since the user still can pass non-int into this function -
nothing introducing strict typing in functions, as it is proposed now,
prevents it. What strict typing does is to ensure the error in this
case, but to generate the error you still need the checks!
BTW, your weak mode code is wrong too - there's no need to generate
Variants if you typed the variables as int. You know once coercion is
done they are ints. At least in the model that was now proposed.
I thought those checks could be optional if generated at call time, that's
why I gave these 2 examples:
calc(1, 5) -> no need for type checking or conversion, do a direct call
calc("12", "15") -> calc(strToInt(value1), strToInt(value2))
calc($var1, $var2) -> needs type checking and conversion if required
My thinking was that type checking, and conversion if required, could take
place before calling a function, but maybe that's even more
complicated...
So what you are saying is that there is no way of determining the type of a
variable (only at runtime), as Zeev explained in the previous messages,
since variables aren't typed, checks are mandatory either way.
Please provide a substantiation for this opinion. So far what was
provided was not correct.
Static typed languages -> Direct conversion to machine code
Dynamic typed languages with JIT -> Intermediate representation -> Checks
-> Conversion to machine code with checks.
Please do not strawman. A lot of people here care about performance, and
you have not yet made case that strict typing has any benefit on
performance, so implying that opponents of strict typing somehow don't
care about performance while you champion it does not match the real
situation.
My intention is just that, to clear up the doubts. I thought, and may still
think, that strict has some advantages, but I've been proven wrong, and many
people might be as well, given all this insightful information.
-----Original Message-----
From: Jefferson González [mailto:jgmdev@gmail.com]
Sent: Sunday, February 22, 2015 11:59 PM
To: Stanislav Malyshev
Cc: Etienne Kneuss; Anthony Ferrara; Zeev Suraski; PHP internals
Subject: Re: [PHP-DEV] Coercive Scalar Type Hints RFC
2015-02-22 16:38 GMT-04:00 Stanislav Malyshev smalyshev@gmail.com:
Yes, that's not the case, at least nobody ever showed that to be the
case. In general, as JS example (among many others) shows, it is
completely possible to have JIT without strict typing. In particular,
coercive typing provides as much information as strict typing about
variable type after passing the function boundary - the only difference
is what happens at the boundary and how the engine behaves when the
types do not match, but I do not see where big performance difference
would come from - the only possibility for different behavior would be
if your app requires constant type juggling (checks are needed in strict
mode anyway, since variables are not typed) - but in this case in strict
mode you'd have to do manual type conversions, which aren't in any way
faster than engine type conversions.
So the case for JIT being somehow better with strict typing so far
remains a myth without any substantiation.
Well, strict on a JIT environment may haven't been proved, but it surely
has been proved on statically compiled languages like C.
Jefferson,
Strict type hints will not make PHP even remotely similar or otherwise
comparable to a statically typed language like C. It will take totally
different facilities being added to PHP, which are not being considered.
Currently, a JIT in the
most cases can't compete to the bare performance of a static compiled
language, both in resources and CPU, so how is non strict better in that
sense?
Nobody is saying it's better in that sense. Nobody is also suggesting that
a statically compiled language like C isn't a lot easier to optimize than a
dynamic language with JIT. What we are saying is that Strict type hints are
of no help making JIT any better or easier, compared to Dynamic/Coercive
hints. I provided what I believe to be detailed proof for that in my email
titled 'JIT'.
I thought those checks could be optional if generated at call time, thats
why I gave these 2 examples:
calc(1, 5) -> no need for type checking or conversion, do a direct call
calc("12", "15") -> calc(strToInt(value1), strToInt(value2))
calc($var1, $var2) -> needs type checking and conversion if required
That's the wrong comparison to make. We should be comparing the same calls
with the two systems, rather than one call in one system and a different one
in a different system.
Taking your example:
calc(1, 5) -> no need for type checking or conversion in either Strict
type hints or Coercive/Dynamic type hints, do a direct call. Identical
performance.
calc("1", "5") -> fails in strict type hints, succeeds in Dynamic/Coercive
type hints (cannot be optimized)
Again, this illustrates that the difference between the two is that of
functionality, not performance.
If you're saying that calc("1", "5") is slower than calc(1, 5) when using
dynamic type hints - then that would be correct, but also pretty meaningless
from a performance standpoint, if what you have are string values. And if
you have integer values, well then, we've already established there's no
difference between the two type hinting systems.
Typically, you obtain the data you need in a type that's not under your
control. You're getting data from the browser, database, filesystem, web
service or some algorithm - the type of the values you get is determined by
the API functions you're using to get the data from.
So what are your options if what you have in your hand is "1" and "5",
because that's how the APIs provided the data to you, as opposed to 1 and 5?
Before they can be added, they need to be converted to integer format,
whether this is done by explicitly casting them (likely outcome in case of a
strict type hint), casting them through a safe coercive STH, or letting
PHP's + operator implementation do it for you. The data needs to be
converted somewhere.
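The conversion routes named above can be sketched in a few lines of PHP. This is only an illustration with made-up variable names; the coercive type hint route is left out because its exact rules are what the RFC is still defining.

```php
<?php
// Values as they typically arrive from a database or a request: strings.
$a = "1";
$b = "5";

// Route 1: explicit casts at the call site.
$viaCast = (int) $a + (int) $b;

// Route 2: let the + operator convert the numeric strings itself.
$viaOperator = $a + $b;

var_dump($viaCast);     // int(6)
var_dump($viaOperator); // int(6)
```

Either way a string-to-integer conversion runs before the addition; the routes differ in who writes it, not in whether it happens.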
So what you are saying is that there is no way of determining the type of
a variable (only at runtime), as Zeev explained on the previous messages,
since variables aren't typed, checks are mandatory either way.
There are ways to infer typing information both during compile time and also
create 'educated guess' as to what the data type is going to be based on
runtime information, but:
- No, it's absolutely not possible to always determine the type of
variables during compile-time, you'd often (perhaps more often than not)
know the data type with absolute certainty only at runtime
- Whatever you CAN infer, you can infer equally regardless of whether a
piece of code uses strict type hints or dynamic ones.
Please provide a substantiation for this opinion. So far what was
provided was not correct.
Static typed languages -> Direct conversion to machine code
Dynamic typed languages with JIT -> Intermediate representation ->
Checks -> Conversion to machine code with checks.
True statements (more or less) but irrelevant to the discussion. Strict
type hints do not make PHP in any way even remotely similar to a statically
typed language. I don't believe Anthony, Joe or anybody are claiming
otherwise.
Zeev
Hi!
Well, strict on a JIT environment may haven't been proved, but it surely
has been proved on statically compiled languages like C. Currently, a
I understand that using the same concept of typing in both cases can be
confusing, but that's pretty much where the similarity ends. Strict
typing in C has very little to do with what is proposed as strict typing
in PHP, and so far nobody is considering making PHP strictly typed in
the way C is (let alone more strict languages than C are). So bringing C
into the discussion is misleading.
JIT in the most cases can't compete to the bare performance of a static
compiled language, both in resources and CPU, so how is non strict
better in that sense?
Dynamic typing is not better in that sense. That's my whole point - from
the JIT perspective, they are the same, so the claim that strict typing,
as proposed, provides performance benefits, is incorrect.
previous message, at runtime it consumes more memory and cpu and this is
mostly due to all the type checking it requires. In that sense if the
As I already mentioned, current strict proposal requires type checking
too. The only one that doesn't is complete strict typing at compile-time
- which nobody is proposing.
strict proposal could improve that situation it would be a benefit.
You keep repeating that, but that claim does not become more true
because it is repeated more times. It still is as unsubstantiated and
lacking base as it was the first time it was introduced. Please provide
some proof (logical or experimental) as to why it must happen (yes, this
includes the "if" too since it is pointless to bring it as a possibility
if we do not have any way for this possibility to be realized).
I thought those checks could be optional if generated at call time,
thats why I gave these 2 examples:
I don't see how they can be "optional" with strict typing.
calc(1, 5) -> no need for type checking or conversion, do a direct call
calc("12", "15") -> calc(strToInt(value1), strToInt(value2))
calc($var1, $var2) -> needs type checking and conversion if required
The same can be said about dynamic typing, with exactly the same words.
The only difference is what happens after checking - but this is only
relevant if the code relies on conversions, in which case in strict case
it just won't work - hardly a performance improvement worth considering.
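A userland sketch of that point, with the caveat that it only emulates the two boundary behaviours and is not how the engine implements either proposal. The names calc_strict(), calc_coercive() and to_int() are hypothetical, and filter_var() merely stands in for whatever coercion rules are finally adopted.

```php
<?php
// "Strict" boundary: check the type tag, raise an error on mismatch.
function calc_strict($a, $b)
{
    if (!is_int($a) || !is_int($b)) {
        throw new InvalidArgumentException('calc_strict() expects integers');
    }
    return $a + $b; // both operands are known to be int from here on
}

// Helper for the "coercive" boundary: the same is_int() check runs first,
// and only a mismatch leads to a conversion attempt (or an error).
function to_int($value)
{
    if (is_int($value)) {
        return $value; // same fast path as the strict case
    }
    $converted = filter_var($value, FILTER_VALIDATE_INT);
    if ($converted === false) {
        throw new InvalidArgumentException('value is not integer-like');
    }
    return $converted;
}

// "Coercive" boundary: after to_int() both operands are int as well.
function calc_coercive($a, $b)
{
    return to_int($a) + to_int($b);
}
```

For calc(1, 5) both variants do the same check and the same addition. They only diverge for calc("12", "15"), where the strict variant throws and the coercive one converts.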
I was thinking on the sense that before calling a function, type
checking could take place and conversion if required, but may be thats
even more complicated...
This can be done in dynamic case too, provided the type information is
present (i.e. constants). No current proposal does this, though, AFAIK.
Static typed languages -> Direct conversion to machine code
Dynamic typed languages with JIT -> Intermediate representation ->
Checks -> Conversion to machine code with checks.
We're not talking about making PHP a statically typed language, are we? So
this advantage - while without any doubt real - does not apply to PHP.
Stas Malyshev
smalyshev@gmail.com
Hi Stas,
It seems the actual problem is that we have too many compiler / code analysis experts in the community ;)
(don't get me wrong, I am not saying that for you, I just admire your patience explaining the same again and again to people who never read one line from PHP core source).
Regards
François
-----Original Message-----
From: Stanislav Malyshev [mailto:smalyshev@gmail.com]
Sent: Sunday, February 22, 2015 21:39
To: Jefferson Gonzalez; Etienne Kneuss; Anthony Ferrara; Zeev Suraski
Cc: PHP internals
Subject: Re: [PHP-DEV] Coercive Scalar Type Hints RFC
Hi!
A JIT or AOT machine code generator IMHO will never have a decent use of
system resources without some sort of strong/strict typed rules,
somebody explain if thats not the case.
Yes, that's not the case, at least nobody ever showed that to be the
case. In general, as JS example (among many others) shows, it is
completely possible to have JIT without strict typing. In particular,
coercive typing provides as much information as strict typing about
variable type after passing the function boundary - the only difference
is what happens at the boundary and how the engine behaves when the
types do not match, but I do not see where big performance difference
would come from - the only possibility for different behavior would be
if your app requires constant type juggling (checks are needed in strict
mode anyway, since variables are not typed) - but in this case in strict
mode you'd have to do manual type conversions, which aren't in any way
faster than engine type conversions.
So the case for JIT being somehow better with strict typing so far
remains a myth without any substantiation.
while on strict mode the generated code could be:
int calc(int val1, int val2) {
    return val1 + val2;
}
No, it can't be (at least it can't be the entire code of this
function), since the user still can pass non-int into this function -
nothing introducing strict typing in functions, as it is proposed now,
prevents it. What strict typing does is to ensure the error in this
case, but to generate the error you still need the checks!
BTW, your weak mode code is wrong too - there's no need to generate
Variants if you typed the variables as int. You know once coercion is
done they are ints. At least in the model that was now proposed.
If my example is right it means strict would be better to achieve good
Unfortunately, your example is not right.
to another level of integration with PHP and performance. IMHO is harder
and more resource hungry to implement a JIT/AOT using weak mode. With
Please provide a substantiation for this opinion. So far what was
provided was not correct.
Thats all that comes to mind now, and while many people doesn't care for
performance, IMHO a programming language mainly targeted for the web
should have some caring on this department.
Please do not strawman. A lot of people here care about performance, and
you have not yet made case that strict typing has any benefit on
performance, so implying that opponents of strict typing somehow don't
care about performance while you champion it does not match the real
situation.
Stas Malyshev
smalyshev@gmail.com
Hi Stas,
It seems the actual problem is that we have too many compiler / code analysis experts in the community ;)
(don't get me wrong, I am not saying that for you, I just admire your patience explaining the same again and again to people who never read one line from PHP core source).
Well, I have never worked on a JIT/AOT and I have to admit I haven't made
any contributions to the PHP engine (and it seems I do not have any
right to write a couple of messages expressing concerns/views
because of that).
On the other side I took the wxwidgets extension in an effort to revive
it (because I believe PHP can have other use cases). Improved its code
generator (and other stuff that involved a relation with the PHP source
code) which now generates more than 905941 lines of code that constitute
the extension (github.com/wxphp/wxphp/tree/master/src).
So I have indeed read source from PHP core. In any case, sorry if I have
annoyed some, that never was my intention. We as humans can't possess all
the knowledge of the world, so that's why we always learn from somebody
else. What's the purpose of a community without participation? :)
Cheers!
Sorry for the previous prematurely sent email, looks like I found a new
keyboard shortcut :)
-----Original Message-----
From: Anthony Ferrara [mailto:ircmaxell@gmail.com]
Sent: Saturday, February 21, 2015 8:12 PM
To: Zeev Suraski
Cc: PHP internals
Subject: Re: [PHP-DEV] Coercive Scalar Type Hints RFC
Zeev,
First off, thanks for putting forward a proposal. I look forward to a
patch that can be experimented with.
There are a few concerns that I have about the proposal however:
Proponents of Strict STH cite numerous advantages, primarily around code
safety/security. In their view, the conversion rules proposed by Dynamic
STH can easily allow ‘garbage’ input to be silently converted into
arguments that the callee will accept – but that may, in many cases, hide
difficult-to-find bugs or otherwise result in unexpected behavior.
I think that's partially mis-stating the concern.
I don't think it's mis-stating the key concern. At least not based on what
I've heard from most people here over the last few months.
I think this argument should be avoided from now on.
We surely can go wild and provide names and numbers, all being better
than other. But at the end of the day, we vote on a proposal. A
proposal written by one or many persons (name them or leave them)
with a clear specification, patch, impact (backed by tests), etc. Any
other random estimation or popularity contests are pointless and
counter productive.
Proponents of Dynamic STH bring up consistency with the rest of the
language, including some fundamental type-juggling aspects that have been
key tenets of PHP since its inception. Strict STH, in their view, is
inconsistent with these tenets.
Dynamic STH is apparently consistent with the rest of the language's
treatment of scalar types. It's inconsistent with the rest of the
language's treatment of parameters.
Not in the way Andrea proposed it, IIRC. She opted to go for consistency
with internal functions. Either way, at the risk of being shot for talking
about spiritual things, Dynamic STH is consistent with the dynamic spirit of
PHP, even if there are some discrepancies between its rule-set and the
implicit typing rules that govern expressions. Note that in this RFC I'm
actually suggesting a possible way forward that will align all aspects of
PHP, including implicit casting - and have them all governed by a single set
of rules.
You did not answer my questions about BC. Changing the way we do it is
much more likely to break things than providing a choice to move to
another mode. I am in favor of not breaking things by default and giving
the option to actually use strict typing when desired (yes, I repeat
myself here).
However there's an important point to make here: a lot of best practice
has been pushing against the way PHP treats scalar types in certain cases.
Specifically around == vs === and using strict comparison mode in
in_array, etc.
I think you're correct on comparisons, but not so much on the rest. Dynamic
use of scalars in expressions is still exceptionally common in PHP code.
Even with comparisons, == is still very common - and you'd use == vs. ===
depending on what you need.
I do not think using legacy codes to determine which (optional)
features should be implemented in php is the right way. Really not.
So while it appears consistent with the rest of PHP, it only does so if
you ignore a large part of both the language and the way it's commonly used.
Let's agree to disagree. That's one thing we can always agree on! :)
I am not sure there is something to agree on but something to actually
validate against existing codes. We can't do it until this RFC has a
patch.
In reality, the only thing PHP's type system is consistent at is being
inconsistent.
I'd have to partially agree with you here; But if you read the RFC through
including its future recommendations, you'd see it's perhaps the first
attempt in 20 years to fix that. Instead of doing that through the
introduction of a 3rd (albeit simplistic) rule-set that only pays attention
to zval.type - a creation of a single set of rules that will be consistent
across the whole language, beginning with userland and internal functions.
I agree we should fix that. I however disagree that the fix may break
BC. Many proposed that back to 5.0 and we did not agree on changing
that. The situation now is no different.
In the "Changes To Internal Functions" section, I think all three types
are significantly flawed:
"Just Do It" - This is problematic because a very large chunk of code
that worked in 5.x will all of a sudden not work in 7.0. This will likely
create a python 2/3 issue, as it would require a LOT of code to be changed
to make it compatible.
"Emit E_DEPRECATED" - This is problematic because raising errors (even
if suppressed) is not cheap. And the potential for raising one for a
non-trivial percentage of every native function call has the potential to
have a MASSIVE performance impact for code designed for 5.x. Without a
patch to test, it can't really be codified, but it would be a shame to lose
the performance gains made with 7 because we're triggering 100's, 1000's or
10000's of errors in a single application run...
"Just Do It but give users an option to not" - This has the problems
that E_DEPRECATED has, but it also gets us back to having fundamental code
behavior controlled by an INI setting, which for a very long time this
community has generally seen as a bad thing (especially for portability
and code re-use).
I do too, and I was upfront about their cons, not just pros. And yet, they
all bring us to a much better outcome within a relatively short period of
time (in the lifetime of a language) than the Dual Mode will.
Further, the two sets can cause the same functions to behave
differently depending on where they're being called
I think that's misleading. The functions will always behave the same.
The difference is how you get data into the function. The behavior
difference is in your code, not the end function.
I'll be happy to get a suggestion from you on how to reword that.
Ultimately, from the layman user's point of view, she'd be calling foo()
from one place and have it accept her arguments, and foo() from another
place and have it reject the very same arguments.
For example, a “32” (string) value coming back from an integer column in
a database table, would not be accepted as valid input for a function
expecting an integer.
There's an important point to consider here. You're relying on information
outside of the program to determine program correctness.
So to say "coming back from an integer column" requires concrete
knowledge and information that you can't possibly have in the program.
What happens when some DBA changes the column type to a string type.
The data will still work for a while, but then suddenly break without
warning when a non-integer value comes in. Because the value-information
comes from outside.
Of course we're relying on information coming from outside, as we all know,
this is one of the most common use cases for PHP.
While theoretically you're right, in practice, in the vast majority of cases
it wouldn't play out like that. The string column won't be tested
exclusively with "123" inputs. As soon as there's a non-numeric-string
input, it'll fail. That's likely to happen very early in the process, and
that's before considering that if there's such a huge mismatch between the
semantic meaning of the column and what the function expects - the problem
is likely to be found even sooner, since the function will simply not
perform its intended job.
On the flip-side, imagine that same developer using strict types. Feeding
the function that integer in string form gets rejected. What are her
options? The developer is likely to just explicitly cast the value into an
int, giving up on any and all sanitization that coercive types would offer
her, happily accepting "Apples" and "100 Dalmatians" as valid inputs. That,
on the other hand, is a very likely scenario.
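The behaviour of explicit casts described here is easy to check in PHP as it stands. In the sketch below, to_int_or_null() is a hypothetical helper and filter_var() is only a stand-in for a validating conversion; it is not the rule set proposed in the RFC.

```php
<?php
// Explicit casts never fail: non-numeric input silently becomes a number.
var_dump((int) "32");             // int(32)
var_dump((int) "Apples");         // int(0)
var_dump((int) "100 Dalmatians"); // int(100)

// A validating conversion, by contrast, rejects the last two inputs.
function to_int_or_null($value)
{
    $result = filter_var($value, FILTER_VALIDATE_INT);
    return $result === false ? null : $result;
}

var_dump(to_int_or_null("32"));             // int(32)
var_dump(to_int_or_null("Apples"));         // NULL
var_dump(to_int_or_null("100 Dalmatians")); // NULL
```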
You are underestimating the knowledge and experiences of our users. I
do not think developers looking for strict types will do what you are
suggesting.
With strict mode, you'd have to embed a cast (smart or explicit) to
convert to an integer at the point the data comes in.
First, I'm not aware of smart/safe casts being available or proposed at this
point.
Secondly, why at the point the data comes in? That would be ideal for
static analyzers, but it's probably a lot more common that it will be done
at the first point in time where it gets rejected.
Additionally, with the dual-mode proposal DB interactions can be in weak
mode and have the exact behavior you're describing here. Giving the user
the choice, rather than making assumptions.
This is bound to be misquoted and used against me, but I don't think it's a
good idea to give the user the choice in such a way. I could have sworn
that you tweeted the quote about perfection being not when there's nothing
left to add, but nothing left to remove, but perhaps it was someone else.
Either way, two modes are worse than one, if we can come up with a good
single unified mode that addresses most cases.
I disagree. One mode remains untouched and fully BC, which ensures a smooth
and fast (or less slow) migration to 7. The 2nd mode will attract the non
negligible amount of users looking forward to having such a mode.
On the other hand, changing the default mode is very likely to be a
real pain during migration. This is not the kind of thing that is
easy to catch, and it cannot be automated (like code conversions and the
likes).
Remember you can always implement custom type checking to your heart's
content. You can easily implement if (!is_int($foo)) { exit; } in the
not-so-common-cases where accepting "42" as 42 might be disastrous.
However, on the caller side, forcing people to clutter their code with
casts - many casts - either explicit casts or custom ones - is going to
affect a lot more developers in a lot more places. The bang for the buck of
adding strict mode is just not there, in my humble opinion of course.
Now you are misleading readers about what is proposed. A user of a
library will never ever be forced to cast. This is not what the dual
proposal does and it is not what it will do. The strict mode is
confined to the given library files and code, where the library
authors decided to enable it. I would very much appreciate it if we stopped
using this as an argument as it is simply not correct. We do not need more
confusion about the respective proposals.
Strict zval.type based STH effectively eliminates this behavior, moving
the burden of worrying about type conversion to the user.
Correct. And you say that as if it's a bad thing. Being explicit about
type conversions isn't what you'd do in a 10 line-of-code script where you
can realize what the types are by just thinking about it. But on large
scale systems exposing the type conversions to the user gives the power to
actually understand the codebase when you can't fit the whole thing in your
head at the same time.
I have a hard time connecting to the 'power' approach. I think developers
want their code to work, with minimal effort, and be secure. Coercive
scalar type hints will do an excellent job at that. Strict type hints will
be more work, are bound to trigger a lot of "Oh come on" responses, and as a
special bonus - proliferate the use of explicit casts. Let me top that -
you'd have developers who think they're security conscious, because they're
using strict mode - with code that's full of explicit casts.
Again, speculations on random numbers or code reviews.
It is our position that there is no difference at all between strict
and coercive typing in terms of potential future AOT/JIT development -
none at all
So really what you're saying is that you disagree with me publicly. A
statement which I said on the side, and I said should not impact RFC or
voting in any way. And is in no part in my RFC at all. Yet brought up again.
We listed all what we believe to be misconceptions that were brought up on
internals. As recently as yesterday, you had a PHP power user (Larry) that
was under the strong impression Strict STH would yield substantial
performance benefits.
We agreed that performance is totally irrelevant to this discussion.
And this time my team has provided numbers to back this statement,
after Dmitry's reply worrying (a bit) about the performance impact
(for the 1st time in this discussion). So let's move on from this aspect,
it is a waste of time :)
Given that it was claimed in the past, and since we
can't assume every voter reads every last word that's written on internals@
threads, it was important to list that here even if it's not mentioned in
the Strict/Dual mode RFC.
We could add this statement in all related RFCs: "performance is not
impacted by this RFC, in any direction". And move on.
It's also worth mentioning that there are people who assume that strict
type hints can somehow help performance, without being domain experts in
either the engine or JIT, even if they weren't exposed to the explicit
statements that suggested that on blogs and on internals@ - adding to the
importance of making it clear that there are no performance benefits to that
approach.
Everyone, even "experts", is pretty much assuming a lot of things
about type hinting. We should focus on the design and concept behind
that, not on potential advantages or other "upcoming" new features but
the actual benefits of each proposal from an implementation, clarity,
taste or application requirements point of view.
Static Analysis. It is the position of several Strict STH proponents
that Strict STH can help static analysis in certain cases. For the
same reasons mentioned above about JIT, we don't believe that is the
case
This is patently false.
It's actually patently true. We don't believe that is the case. QED.
Both are true and false, let's call it the Schroedinger Question of the
day. Refer to my previous line for this question as well. This is not
in the scope of these RFCs.
While at it, can we stop using that 'patently false', and stick for
constructive wording such as 'I disagree'?
I see nothing wrong with "patently". I see much more wrong to totally
ignore feedback, ideas, replies etc. while playing the politically
correct writer. But I am being OT again.
Cheers,
Pierre
@pierrejoye | http://www.libgd.org
Hi Pierre,
I'm currently writing a patch to test the proposed ruleset, as BC is the key point.
If BC breaks are so massive that we have to revert to the original 'weak' mode, I will be the first one to vote for dual-mode.
There will be two slightly different branches : one totally disabling (IS_LONG|IS_FLOAT|IS_STRING to bool) and one implementing the current way of converting these (with the small modification about numeric strings recognized as false). There may be a first level without support for trailing blanks but it is not important as it is an extension of the current behavior, not a restriction.
More when something is ready.
Regards
François
Zeev,
First off, thanks for putting forward a proposal. I look forward to a
patch that can be experimented with.There are a few concerns that I have about the proposal however:
Proponents of Strict STH cite numerous advantages, primarily around
code safety/security. In their view, the conversion rules proposed by
Dynamic STH can easily allow ‘garbage’ input to be silently converted into
arguments that the callee will accept – but that may, in many cases, hide
difficult-to-find bugs or otherwise result in unexpected behavior.I think that's partially mis-stating the concern. It's less about
"garbage input" and more about unpredictable behavior. You can't look
at code and know that it will not produce an error with dynamic
typing. That's one of the big advantages of strict typing that many
people want. In reality the reasons are complex, varied and important
to each person.Proponents of Dynamic STH bring up consistency with the rest of the
language, including some fundamental type-juggling aspects that have been
key tenets of PHP since its inception. Strict STH, in their view, is
inconsistent with these tenets.Dynamic STH is apparently consistency with the rest of the language's
treatment of scalar types. It's inconsistent with the rest of the
languages treatment of parameters.However there's an important point to make here: a lot of best
practice has been pushing against the way PHP treats scalar types in
certain cases. Specifically around == vs === and using strict
comparison mode in in_array, etc.
So while it appears consistent with the rest of PHP, it only does so
if you ignore a large part of both the language and the way it's
commonly used.
In reality, the only thing PHP's type system is consistent at is being
inconsistent.
In the "Changes To Internal Functions" section, I think all three
types are significantly flawed:
"Just Do It" - This is problematic because a very large chunk of
code that worked in 5.x will all of a sudden not work in 7.0. This
will likely create a python 2/3 issue, as it would require a LOT of
code to be changed to make it compatible.
"Emit E_DEPRECATED" - This is problematic because raising errors
(even if suppressed) is not cheap. And the potential for raising one
for a non-trivial percentage of every native function call has the
potential to have a MASSIVE performance impact for code designed for
5.x. Without a patch to test, it can't really be codified, but it
would be a shame to lose the performance gains made with 7 because
we're triggering 100's, 1000's or 10000's of errors in a single
application run...
"Just Do It but give users an option to not" - This has the
problems that E_DEPRECATED has, but it also gets us back to having
fundamental code behavior controlled by an INI setting, which for a
very long time this community has generally seen as a bad thing
(especially for portability and code re-use).
Moving along,
Further, the two sets can cause the same functions to behave
differently depending on where they're being called
I think that's misleading. The functions will always behave the same.
The difference is how you get data into the function. The behavior
difference is in your code, not the end function.
For example, a “32” (string) value coming back from an integer column
in a database table, would not be accepted as valid input for a function
expecting an integer.
There's an important point to consider here. You're relying on
information outside of the program to determine program correctness.
So to say "coming back from an integer column" requires concrete
knowledge and information that you can't possibly have in the program.
What happens when some DBA changes the column type to a string type?
The data will still work for a while, but then suddenly break without
warning when a non-integer value comes in. Because the
value-information comes from outside.
With strict mode, you'd have to embed a cast (smart or explicit) to
convert to an integer at the point the data comes in. So semantic
information about the value is placed right at the point of entry
(forcing the code to be more explicit and clear).
Additionally, with the dual-mode proposal DB interactions can be in
weak mode and have the exact behavior you're describing here. Giving
the user the choice, rather than making assumptions.
Strict zval.type based STH effectively eliminates this behavior, moving
the burden of worrying about type conversion to the user.
Correct. And you say that as if it's a bad thing. Being explicit about
type conversions isn't what you'd do in a 10 line-of-code script where
you can realize what the types are by just thinking about it. But on
large scale systems exposing the type conversions to the user gives
the power to actually understand the codebase when you can't fit the
whole thing in your head at the same time.
So what you cite here as a disadvantage many consider to be an advantage.
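To make the "convert at the point of entry" idea concrete, here is a minimal sketch. It assumes the `int` parameter syntax being discussed in this thread; `to_int_or_fail()` and `set_quantity()` are hypothetical names invented for illustration, not part of any RFC.

```php
<?php
// Hypothetical helper: converts a raw database value to int at the point of
// entry, failing loudly instead of silently juggling types.
function to_int_or_fail($raw): int
{
    // FILTER_VALIDATE_INT accepts "32" but rejects values such as "abc" or "3.5".
    $value = filter_var($raw, FILTER_VALIDATE_INT);
    if ($value === false) {
        throw new InvalidArgumentException(
            'Expected an integer, got ' . var_export($raw, true)
        );
    }
    return $value;
}

// Hypothetical callee that declares an int parameter.
function set_quantity(int $quantity): int
{
    return $quantity;
}

// A row as a database driver might return it: the column value is a string.
$row = ['quantity' => '32'];
$result = set_quantity(to_int_or_fail($row['quantity']));
```

If the column later starts returning non-numeric strings, the failure surfaces at the conversion call rather than somewhere deep inside the callee.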
Performance
I find it funny how the non-strict crowd keeps bringing up performance...
It is our position that there is no difference at all between strict
and coercive typing in terms of potential future AOT/JIT development - none
at all
So really what you're saying is that you disagree with me publicly. A
statement which I said on the side, and I said should not impact RFC
or voting in any way. And is in no part in my RFC at all. Yet brought
up again.
Static Analysis. It is the position of several Strict STH proponents
that Strict STH can help static analysis in certain cases. For the same
reasons mentioned above about JIT, we don't believe that is the case
This is patently false. Keep not believing it all you want, but
static analysis requires statically looking at code. Which means you
have no value information. So static analysis can't possibly happen in
cases where you need to know about value information (because it's not
there). Yes, at function entry you know the types. But static analysis
isn't about analyzing a single function (in fact, that's the least
interesting case). It's more about analyzing a series of functions, a
function call graph. And in that case strict typing (based only on
type) does make a big difference.
Strict and weak type hints provide exactly the same information for static
analyzers - they guarantee the types of arguments at function entry. Having
this information it's possible to infer types of other variables inside the
function and even across functions (by analyzing the call graph).
I don't see how the strict semantic of hints may change this guarantee in
some way.
Thanks. Dmitry.
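The entry-point guarantee described above can be sketched as follows. This assumes the coercive behaviour that PHP 7 later adopted as its default mode (no `declare(strict_types=1)`); `describe()` is a hypothetical function used only for illustration.

```php
<?php
// No declare(strict_types=1) here, so the default coercive mode applies.
function describe(int $n): string
{
    // Whatever the caller passed, $n is an int by the time the body runs.
    return gettype($n);
}

$fromString = describe("42"); // numeric string is converted before entry
$fromInt    = describe(42);   // already an int
```

The sketch only illustrates what is known inside the callee. It says nothing about what an analyzer can conclude at the call site, which is the part of the argument the two sides disagree on.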
In short, I think the concerns around the handling of internal
functions is significant enough to cause major concern about this
proposal.
Thanks
Anthony
Hi,
2015-02-21 14:22 GMT-03:00 Zeev Suraski zeev@zend.com:
All,
I’ve been working with François and several other people from internals@
and the PHP community to create a single-mode Scalar Type Hints proposal.
I think it’s the RFC is a bit premature and could benefit from a bit more
time, but given the time pressure, as well as the fact that a not fully
compatible subset of that RFC was published and has people already
discussing it, it made the most sense to publish it sooner rather than
later.
The RFC is available here:
Comments welcome!
Zeev
Thanks for your effort and for proposing the RFC so quickly. But indeed, as
you said, this is really premature.
According to my interpretation the RFC proposes a potentially huge BC break
on internal functions usage while the competing (recently withdrawn) RFC was
BC compatible regarding this. Having this proposal without a working patch,
so we can try the real impact it could have before forming opinions, frankly,
doesn't sound like a good idea.
I know it's still a draft but, considering the RFC is aimed at v7.0, I hope
to see a working implementation before the feature freeze, just as was done
in exemplary fashion with the previous withdrawn proposals.
Thanks,
Márcio
Márcio,
I hope to be able to work on an actual implementation and have something by
the end of the upcoming week, allowing us all to experiment.
Other than tweaking the conversion table, which based on the feedback I
*believe* can be done in a way everyone can live with – I agree that the
biggest open question is how we deal with internal functions.
Also note that the 2nd proposed option in the RFC (“Emit E_DEPRECATED
in 7.0, move to E_RECOVERABLE_ERROR in v7.1 or v8.0”) – marking
newly-rejected conversions as E_DEPRECATED – is not a BC break
IMHO. The code would still work, and the likelihood that something bad was
legitimately found is pretty high.
Given the compressed timeline, I wanted to gauge the general response as
soon as possible, and also start the RFC discussion process as soon as
possible, which meant we couldn’t include the implementation to go along
with it.
Two more things regarding the competing RFC – it’s still alive, and being
promoted for PHP 7.0; And while it doesn’t create a huge BC break, it
allows developers to selectively create localized BC breaks, on a per file
basis.
Thanks for the feedback!
Zeev
Hi Zeev,
I’ve been working with François and several other people from internals@
and the PHP community to create a single-mode Scalar Type Hints proposal.
I think it’s the RFC is a bit premature and could benefit from a bit more
time, but given the time pressure, as well as the fact that a not fully
compatible subset of that RFC was published and has people already
discussing it, it made the most sense to publish it sooner rather than
later.
Thanks for the refreshed approach. Although there are already oh-so-many
RFCs and discussions, I found the summary to be very clear. Thanks to you
et al.
- About impact / BC
There's one thing I never understood, even not for 0.3 which almost went
through: why have this impact on internal
function/zend_parse_parameters/ZPP at all?
Why not just keep this strictly user-land /for now/ ?
The user-land change is much more controllable. The internal function
change is ... not sure how to say. Has such a big impact and I'm not
sure what should be the gain at all here.
- I'm with Pierre regarding accepting e.g. 42.0 for an int. Just
imagine you pass the result of a calculation which happens to be a
result with a clean .0 fractional part and everything works. The next
time you calculate with a different input and boom, your value is 42.1
and thus rejected. That's quite a POLA [1] violation right there.
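The surprise described here can be sketched with a small userland emulation of the rule being debated. `accepts_as_int()` is a hypothetical helper, not the RFC's implementation, and it is simplified: range checks against PHP_INT_MAX are left out.

```php
<?php
// Hypothetical emulation of the rule under discussion: a float is accepted
// for an int parameter only when it has no fractional part.
function accepts_as_int($value): bool
{
    if (is_int($value)) {
        return true;
    }
    if (is_float($value)) {
        // floor() returns a float, so strict comparison is safe here.
        return is_finite($value) && floor($value) === $value;
    }
    return false;
}

// The same calculation with two different inputs.
$cleanResult = accepts_as_int(21.0 * 2);  // product is 42.0
$messyResult = accepts_as_int(21.05 * 2); // product is about 42.1
```

The call site is identical in both cases; only the runtime value decides whether the argument is accepted, which is the behaviour being objected to.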
I know I will derail this whole thing when I write the next point but I
really think one of the best ways to move forward is to
- only support strict types, e.g. function foo(string $bar)
- in user-land code
That way there's almost no BC impact sans the possible clashes of
classnames (which I probably just missed how this will be resolved?).
Future RFCs could still reason about other types of hints (initial weak
STH or now this coercive STH) and expand on the syntax.
thanks,
- Markus
[1] http://en.wikipedia.org/wiki/Principle_of_least_astonishment
Hi Zeev,
All,
I’ve been working with François and several other people from internals@
and the PHP community to create a single-mode Scalar Type Hints proposal.
I think it’s the RFC is a bit premature and could benefit from a bit more
time, but given the time pressure, as well as the fact that a not fully
compatible subset of that RFC was published and has people already
discussing it, it made the most sense to publish it sooner rather than
later.
The RFC is available here:
Comments welcome!
I was really looking forward to the RFC. However the dependence on an INI
setting and the question of massive internal BC break are a bit too much I
think.
They are necessary, as you explain, to allow internal functions/ZPP to have
the same conversion rules as userland functions. This INI setting is quite
similar to introducing two modes as well, but at the server configuration
level instead of in the much more fine-grained way of scalar type hints v0.5.
I see many more compatibility problems for third party libraries here.
Now if I had to decide between having two modes on a granular file level or
an INI setting (option 2/3) or massive BC breaks (option 1) to get scalar
type hints in PHP, then two modes with declare() doesn't look bad, because I
can pick the mode I want, and no-one can force it on me.
Zeev
Benjamin,
There’s a fundamental difference between the two RFCs that goes beyond
whether using a global INI setting and the other per-file setting. The
fundamental difference is that the endgame of the Dual Mode RFC is having
two modes – and whatever syntax we’ll add, will be with us forever; and in
the Coercive STH RFC – the endgame is a single mode with no INI entries,
and opening the door to changing the rest of PHP to be consistent with the
same rule-set as well (implicit casts). The challenge with the Coercive
STH RFC is figuring out the best transition strategy, but the endgame is
superior IMHO.
Regarding the proposed migration options, yes, if we pick option #2 or #3 –
we’d be introducing an INI entry governing runtime behavior, which is something
nobody here wants, myself included – but given that it’ll be time limited
(perhaps for as short as 1 or 2 years) – perhaps that’s something we can
live with.
Additional options we could entertain are:
4. Not applying the rules for internal functions at all. Personally I’m
not very fond of this option.
5. Go for just E_DEPRECATED in 7.0. Change E_DEPRECATED to
E_RECOVERABLE_ERROR in 7.1/7.2/8.0 (TBD).
6. Same as #5, but also provide a mechanism similar to declare() as the
temporary measure for strict campers to explicitly ask for
E_RECOVERABLE_ERROR’s to be triggered. They will no longer be necessary
when we change E_DEPRECATED to E_RECOVERABLE_ERROR in 7.1/7.2/8.0.
Thoughts?
Zeev
There’s a fundamental difference between the two RFCs that goes beyond
whether using a global INI setting and the other per-file setting. The
fundamental difference is that the endgame of the Dual Mode RFC is having
two modes – and whatever syntax we’ll add, will be with us forever; and in
the Coercive STH RFC – the endgame is a single mode with no INI entries,
and opening the door to changing the rest of PHP to be consistent with the
same rule-set as well (implicit casts). The challenge with the Coercive
STH RFC is figuring out the best transition strategy, but the endgame is
superior IMHO.
Hello,
the two modes was something that I didn't like, at all, as a userland
developer. It seems really scary that decision to add 2 modes would
mean that every PHP code could have been written in any of these 2
ways and it would stick forever with PHP, because removing it again if
it proved to be a bad feature would be IMHO really painful.
So a single mode is infinitely better than 2 modes.
Also, personally, I would prefer #1 or #2 version for internal
functions, but definitely without an INI switch. Not being able to
change it on some hostings could make development for the transition
period kinda painful.
Regards
Pavel Kouril
As a userland developer I will chime in and say that I love this rfc
better than the dual mode. In my opinion, having dual mode will
put unnecessary cognitive burden on me especially when reading other
people's code and libraries. While the current RFC's conversion rules also
differ from what exists in PHP, I can get used to them and they are
more in line with what should ideally be there (except contentious ones like
bool).
I will certainly love to see the type coercion rules being unified and
slightly tightened up
over time and this is a good first step for that.
I am not a lang expert but of the languages I have learnt
and developed in, I can't think of any that allow dual mode type coercion
rules in such an explicit manner as the one proposed by Andrea/Anthony.
Thanks
Shashank
Zeev,
Benjamin,
There’s a fundamental difference between the two RFCs that goes beyond
whether using a global INI setting and the other per-file setting. The
fundamental difference is that the endgame of the Dual Mode RFC is having
two modes – and whatever syntax we’ll add, will be with us forever; and in
the Coercive STH RFC – the endgame is a single mode with no INI entries,
and opening the door to changing the rest of PHP to be consistent with the
same rule-set as well (implicit casts). The challenge with the Coercive
STH RFC is figuring out the best transition strategy, but the endgame is
superior IMHO.
Yes, I confirm this difference. So let's say I am totally against INI and
hopefully many others are as well, then we introduce massive BC breaks in
PHP 7. We get ourselves a Python 2/3, Ruby 1.8/2 situation here that nobody
wants. I don't like the casting rules that PHP has now, but subtly breaking
them everywhere (instead of with declare, by choice) is something I can't
support.
I can foresee WordPress will not work with this kind of BC break anymore, so
your (Zend's) foremost goal for PHP 7, to get everyone to upgrade because the
code is as fast as HHVM, suddenly vanishes into thin air.
Regarding the proposed migration options, yes, if we pick option #2 or #3
– we’d be introducing an INI entry governing runtime behavior is something
nobody here wants, myself included – but given that it’ll be time limited
(perhaps for as short as 1 or 2 years) – perhaps that’s something we can
live with.
1-2 years in php-src has absolutely no relation to how long this will be in
userland, probably 2-5 times as long (2-10 years). You are against dual
mode because people need to learn two different styles, however with some
change like this lurking in old hosters and whatnot, how do we not have to
learn about this as well? It's exactly the same thing.
Additional options we could entertain are:
4. Not applying the rules for internal functions at all. Personally I’m
not very fond of this option.
5. Go for just E_DEPRECATED in 7.0. Change E_DEPRECATED to
E_RECOVERABLE_ERROR in 7.1/7.2/8.0 (TBD).
6. Same as #5, but also provide a mechanism similar to declare() as the
temporary measure for strict campers to explicitly ask for
E_RECOVERABLE_ERROR’s to be triggered. They will no longer be necessary
when we change E_DEPRECATED to E_RECOVERABLE_ERROR in 7.1/7.2/8.0.
- Agree with you here, not an option.
- BC Break too big leading to a Python2/3 situation, low adoption of PHP7,
something that especially you couldn't want given that the perf
improvements are what keeps PHP on level with HHVM. This automatically
applies to all internal functions regardless if userland uses typehints or
not, wordpress will not work with this anymore, no performance comparisons
anymore.
- Ok, so this is now getting very similar like v5 of the STH, but still
with much more BC breaks.
Thoughts?
Zeev
-----Original Message-----
From: Benjamin Eberlei [mailto:kontakt@beberlei.de]
Sent: Monday, February 23, 2015 6:54 PM
To: Zeev Suraski
Cc: PHP internals
Subject: Re: [PHP-DEV] Coercive Scalar Type Hints RFC
Zeev,
Additional options we could entertain are:
5. Go for just E_DEPRECATED in 7.0. Change E_DEPRECATED to
E_RECOVERABLE_ERROR in 7.1/7.2/8.0 (TBD).
6. Same as #5, but also provide a mechanism similar to declare() as
the temporary measure for strict campers to explicitly ask for
E_RECOVERABLE_ERROR’s to be triggered. They will no longer be necessary
when we change E_DEPRECATED to E_RECOVERABLE_ERROR in 7.1/7.2/8.0.
- BC Break too big leading to a Python2/3 situation, low adoption of
PHP7,
something that especially you couldn't want given that the perf
improvements are what keeps PHP on level with HHVM. This automatically
applies to all internal functions regardless if userland uses typehints or
not,
wordpress will not work with this anymore, no performance comparisons
anymore.
I don't see how that would affect PHP 7 adoption at all actually. You can
just disable E_DEPRECATED and your upgrade would be clean. Technically it
would have to wait until PHP 8 before we change it to E_RECOVERABLE_ERROR,
giving users several years to migrate. Given we're talking about coercive rules
and not strict rules, I actually don't expect that many failures (initial
tests of Francois' patch result in 8% failures in our unit tests, and
that's before tweaking the rules in any way, and apparently at least some
of them have to do with bugs in internal functions that can be easily
fixed). It doesn't strike me as a worse migration than the one we did getting
rid of magic_quotes or safe_mode. It can be done.
I'm personally now leaning towards this option as the most viable one.
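For reference, "disabling E_DEPRECATED" at runtime can be sketched with the standard `error_reporting()` function. The variable names are illustrative only, and whether the proposed conversions would actually be reported at this level is exactly what the RFC leaves open.

```php
<?php
// Report everything except deprecation notices; the call returns the
// previous reporting level so it can be restored later.
$previousLevel = error_reporting(E_ALL & ~E_DEPRECATED);

// Read back the active level and check that the E_DEPRECATED bit is cleared.
$currentLevel = error_reporting();
$deprecatedMasked = ($currentLevel & E_DEPRECATED) === 0;
$warningsStillOn  = ($currentLevel & E_WARNING) === E_WARNING;
```

The same mask can be set in php.ini via the `error_reporting` directive instead of at runtime.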
- Ok, so this is now getting very similar like v5 of the STH, but still
with much
more BC breaks.
Not at all. With the Dual Mode RFC, we're going to have declare() and two
distinct modes even in 10 and 20 years from now, for all eternity. It's a
new feature that's here to stay. With this, these declare()'s are a
migration feature, that will be gone in several years. The endgame is
completely different.
Zeev
Just disabling E_DEPRECATED is hiding bugs now. The error gets triggered
because somebody uses the wrong coercion rules; if I hide this, how am I
going to fix it?
Compare this to Smarty/FPDF/TCPDF. Extremely widely adopted libraries, still
PHP 4 code.
They throw strict errors and notices like nothing you have ever seen
before, but nobody talks about turning them into errors in PHP 7/8.
Furthermore, you said before this will be only 1-2 years, now until 8 we
have another 8-10 years.
Given we're talking about coercive rules
and not strict rules, I actually don't expect that many failures (initial
tests of Francois' patch results in 8% failures in our unit tests, and
that's before tweaking the rules in any way (and apparently, at least some
of them have to do with bugs in internal functions that can be easily
fixed). It doesn't strike me as a worse migration than we did getting rid
of magic_quotes or safe_mode. It can be done.
8% is a lot. Now you are lucky that you have unit tests. The kind of code
that will rely on this conversion probably does not have a single test.
magic quotes has a workaround you can put into an auto_prepend_file,
magically making it work the way it did before.
The changes here have no workaround to keep the old mode.
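The kind of auto_prepend_file workaround referred to here might look roughly like the sketch below. It is an illustration of the argument, not a recommendation; `emulate_magic_quotes()` is a hypothetical name, and real magic_quotes_gpc behaviour had further details that are not reproduced.

```php
<?php
// Hypothetical shim: re-applies addslashes() to request data, roughly the way
// magic_quotes_gpc used to, so legacy code sees the input it expects.
function emulate_magic_quotes($value)
{
    return is_array($value)
        ? array_map('emulate_magic_quotes', $value) // recurse into nested arrays
        : addslashes((string) $value);
}

// Applied once, before any application code runs.
$_GET    = emulate_magic_quotes($_GET);
$_POST   = emulate_magic_quotes($_POST);
$_COOKIE = emulate_magic_quotes($_COOKIE);
```

A file like this would be wired in through the `auto_prepend_file` INI directive. The point being made in the thread is that no comparable shim exists for changed argument-conversion rules in internal functions.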
I'm personally now leaning towards this option as the most viable one.
I agree if we can avoid an INI setting, but this will surely lead to a
Python 2/3 situation.
- Ok, so this is now getting very similar like v5 of the STH, but still
with much more BC breaks.
Not at all. With the Dual Mode RFC, we're going to have declare() and two
distinct modes even in 10 and 20 years from now, for all eternity. It's a
new feature that's here to stay. With this, these declare()'s are a
migration feature, that will be gone in several years. The endgame is
completely different.
Yes, I know, but it will not ever break BC with PHP 4/5 compatible code.
No upgrade problem for anybody.
The problem with the coercive STH RFC is, IMHO, that on top of introducing
typehints, which we all want, it changes the ZPP rules.
I would rather have a vote on v1 of the STH vs v5. And then changes to ZPP
handling can be a completely different task, maybe for PHP 8.
Zeev
-----Original Message-----
From: Benjamin Eberlei [mailto:kontakt@beberlei.de]
Sent: Monday, February 23, 2015 7:20 PM
To: Zeev Suraski
Cc: PHP internals
Subject: Re: [PHP-DEV] Coercive Scalar Type Hints RFC
Just disabling E_DEPRECATED is hiding bugs now.
It's an extreme interpretation. Deprecated features do not become bugs
overnight. We never considered it that way. E_DEPRECATED notices aren't
bugs - they're friendly messages that the feature you relied on has a limited
lifespan left in it.
The error gets triggered
because somebody uses the wrong coercion rules, if i hide this, how am i
going to fix it?
You're going to have several years to fix it. You could do it during a bug
squash, at your leisure every once in a while, or the first day after PHP 7
comes out. Just like with any other deprecated feature.
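One way to keep deprecations out of the output while still being able to fix them later is to collect them with a custom handler. This is a sketch under assumptions: the message text is invented, and `E_USER_DEPRECATED` stands in for the engine-level `E_DEPRECATED` that the RFC would raise, since the latter cannot be triggered from userland.

```php
<?php
// Collect deprecation notices in memory instead of displaying them.
$deprecations = [];

set_error_handler(
    function ($errno, $errstr, $errfile, $errline) use (&$deprecations) {
        $deprecations[] = "$errstr at $errfile:$errline";
        return true; // handled here; skip PHP's default reporting
    },
    E_DEPRECATED | E_USER_DEPRECATED // only intercept deprecation-level errors
);

// Stand-in for a newly deprecated conversion (hypothetical message).
trigger_error('hypothetical: lossy float-to-int conversion', E_USER_DEPRECATED);

restore_error_handler();
```

In a real application the handler would more likely write to a log file, so the list of call sites can be worked through gradually.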
Compare this to Smarty/FPDF/TCPDF. Extremly wide adoption libraries, still
php v4 code.
They throw strict errors and notices like nothing you have ever seen
before,
but nobody talks about turning them into errors in PHP7/8.
First, I've seen some pretty horrible stuff, so don't assume :)
But to the point, this:
wiki.php.net/rfc/reclassify_e_strict
I'm not saying I necessarily support this RFC, but there's at least one
person talking about exactly the stuff supposedly nobody's talking about,
and that's without anything remotely close to the tangible benefits that
doing the internal function migration would bring.
Furthermore, you said before this will be only 1-2 years, now until 8 we
have
another 8-10 years.
There's no rulebook saying a major version needs to come out every 10 years.
PHP 3 came out in 1998, PHP 4 came out in 2000, and PHP 5 came out in 2004.
PHP 5.3, which arguably should have been a major version, came out in 2009.
I think we should announce our plans for PHP 8 around the same
time as we release PHP 7. Not a concrete timeline like the one we have for
7 right now, but high level plans. Given we're likely going to work on JIT
right after 7 comes out, we're going to have a trigger for 8 much sooner
than 2025 (not that personally I think we need a huge trigger for a major
version, but that's a different story).
The 1 year figure is in the case we decide to change E_DEPRECATED to
E_RECOVERABLE_ERROR in 7.1 (which is an unlikely outcome). The 2 year
figure is realistic for PHP 8 given JIT, but if it's optimistic, it may be 3
and not 2. I doubt PHP 8 will take much longer than that, and can't imagine
10 years.
8% is alot. Now you are lucky that you have unit tests. The kind of code
that
will rely on this conversion probably does not have a single test.
I'm talking about the PHP unit test suite, and while 8% failure rate on a
test suite is generally a lot, I wouldn't consider it a high number for such
an impactful patch that has not yet been tuned. Also assume we have until
the end of the year to potentially tweak certain internal functions to be
more lax than the rules they currently expose, which may make sense in some
cases where we may find widespread breakage associated with specific APIs.
I believe we can get it much lower than 8% by tweaking the rules and
tweaking some internal functions.
Regarding other apps - many of the major apps today do have unit test
suites, and I'm seeing unit and system tests a lot more today than 5 or 10
years ago. We'll only truly know where we stand once we actually try
running some of the major apps and real world code and see how many
E_DEPRECATED messages we're getting. If we see a few dozen E_DEPRECATED's
for a decent size app - the feasibility of migrating it (whether immediately, during the
2-3 year term or when people move to 8 and are forced to do so) is there.
Not unlike many of the countless other features we've deprecated in 7 - most
of which for no reason other than code tidiness.
magic quotes has a workaround you can put into an auto_prepend_file,
magically making it work this way before again.
This changes here have no workaround to keep the old mode.
magic_quotes_gpc did, magic_quotes_runtime did not. safe_mode didn't
either, and it was extremely widely used.
I'm personally now leaning towards this option as the most viable
one.
I agree if we can avoid an INI setting, but this will surely lead to a
Python 2/3 situation.
I don't think it will. But instead of guessing, we should try the patch
with some real world apps and find out. I think that if we find out we can
migrate Drupal (or whatever) in a couple of days or even a couple of weeks
to be E_DEPRECATED free - this approach is very viable. If it requires
months and months of updates, it'll be a different story.
Zeev
I agree if we can avoid an INI setting, but this will surely lead to a
Python 2/3 situation.
I don't think it will. But instead of guessing, we should try the patch
with some real world apps and find out. I think that if we find out we can
migrate Drupal (or whatever) in a couple of days or even a couple of weeks
to be E_DEPRECATED free - this approach is very viable. If it requires
months and months of updates, it'll be a different story.
I have to disagree with this point. Changing the casting rule is not
something that can be easily "fixed".
It is why i like the dual mode choices. There is nothing to port (except
the few things we had to break already), unless one app explicitly wants to
move to strict internally, partially or totally (don't see why one would do
that).
Cheers,
Pierre
-----Original Message-----
From: Benjamin Eberlei [mailto:kontakt@beberlei.de]
Sent: Monday, February 23, 2015 7:20 PM
To: Zeev Suraski
Cc: PHP internals
Subject: Re: [PHP-DEV] Coercive Scalar Type Hints RFC
Just disabling E_DEPRECATED is hiding bugs now.
It's an extreme interpretation. Deprecated features are not bugs
over-night. We never considered it that way. E_DEPRECATED notices aren't
bugs - they're friendly messages that the feature you relied on has limited
lifespan left in it.
The error gets triggered because somebody uses the wrong coercion rules, if I
hide this, how am I going to fix it?
You're going to have several years to fix it. You could do it during a bug
squash, at your leisure every once in a while, or the first day after PHP 7
comes out. Just like with any other deprecated feature.
This is not how legacy code migrations work at all. You don't migrate
100 KLOC of code just because the PHP version isn't compatible anymore.
You just don't upgrade. Now people might upgrade to PHP 7 because it's
"just" E_DEPRECATED.
But they will not upgrade their code to PHP 7.1 or 8 then.
Compare this to Smarty/FPDF/TCPDF. Extremely widely adopted libraries,
still PHP 4 code.
They throw strict errors and notices like nothing you have ever seen
before, but nobody talks about turning them into errors in PHP 7/8.
First, I've seen some pretty horrible stuff, so don't assume :)
But to the point, this:
wiki.php.net/rfc/reclassify_e_strict
I'm not saying I necessarily support this RFC, but there's at least one
person talking about exactly the stuff supposedly nobody's talking about,
and that's without anything remotely close to the tangible benefits that
doing the internal function migration would bring.
How can you compare them?
This changes E_STRICT to an E_WARNING. This is completely different than
changing E_DEPRECATED to E_RECOVERABLE_ERROR.
The first only requires wrapping an old library in $old =
error_reporting(0); library(); error_reporting($old);
The second leads to a fatal error.
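The wrapping described above can be sketched as follows. This is a minimal illustration, not code from the thread: `legacy_library()` and `call_quietly()` are hypothetical names, and `E_USER_NOTICE` stands in for whatever notice-level message an old library would emit.

```php
<?php
// Hypothetical stand-in for an old library that emits a notice-level message.
function legacy_library(): string
{
    trigger_error('legacy behaviour', E_USER_NOTICE);
    return 'done';
}

// Silence reporting only for the duration of the call, then restore the
// previous level even if the callable throws.
function call_quietly(callable $fn)
{
    $old = error_reporting(0);
    try {
        return $fn();
    } finally {
        error_reporting($old);
    }
}

$before = error_reporting();
$result = call_quietly('legacy_library');
$after  = error_reporting();
```

This only works while the message is non-fatal: once the same condition is raised as an error that stops execution, there is nothing left to silence.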
Furthermore, you said before this will be only 1-2 years, now until 8 we
have another 8-10 years.
There's no rulebook saying a major version needs to come out every 10
years.
PHP 3 came out in 1998, PHP 4 came out in 2000, and PHP 5 came out in 2004.
PHP 5.3, which arguably should have been a major version, came out in 2009.
I think we should announce our plans for PHP 8 around the same
time as we release PHP 7. Not a concrete timeline like the one we have for
7 right now, but high level plans. Given we're likely going to work on JIT
right after 7 comes out, we're going to have a trigger for 8 much sooner
than 2025 (not that personally I think we need a huge trigger for a major
version, but that's a different story).
The 1 year figure is in the case we decide to change E_DEPRECATED to
E_RECOVERABLE_ERROR in 7.1 (which is an unlikely outcome). The 2 year
figure is realistic for PHP 8 given JIT, but if it's optimistic, it may be
3 and not 2. I doubt PHP 8 will take much longer than that, and can't
imagine 10 years.
Ok, let's say 4-5 years. Given the lagging adoption of PHP major versions,
double this situation to 8-10 years in the wild.
8% is a lot. Now you are lucky that you have unit tests. The kind of code
that will rely on this conversion probably does not have a single test.
I'm talking about the PHP unit test suite, and while 8% failure rate on a
test suite is generally a lot, I wouldn't consider it a high number for
such an impactful patch that has not yet been tuned. Also assume we have until
the end of the year to potentially tweak certain internal functions to be
more lax than the rules they currently expose, which may make sense in some
cases where we may find widespread breakage associated with specific APIs.
I believe we can get it much lower than 8% by tweaking the rules and
tweaking some internal functions.
Regarding other apps - many of the major apps today do have unit test
suites, and I'm seeing unit and system tests a lot more today than 5 or 10
years ago.
Speaking as a consultant who helps people fix their legacy code I am
biased. In my world nobody has tests.
And even those that have them on their proprietary apps seldom have higher
coverage than 30-50%. Only very few have 80%.
You can't compare open source code to what is out there, WordPress/Drupal is
good code compared to all the proprietary code out there.
We'll only truly know where we stand once we actually try
running some of the major apps and real world code and see how many
E_DEPRECATED messages we're getting. If we see a few dozen E_DEPRECATEDs
for a decent size app - the feasibility of migrating it (whether
immediately, during the 2-3 year term or when people move to 8 and are
forced to do so) is there.
Not unlike many of the countless other features we've deprecated in 7 -
most of which for no reason other than code tidiness.
magic quotes has a workaround you can put into an auto_prepend_file,
magically making it work the way it did before.
The changes here have no workaround to keep the old mode.
magic_quotes_gpc did, magic_quotes_runtime did not. safe_mode didn't
either, and it was extremely widely used.
Fair point.
I'm personally now leaning towards this option as the most viable
one.
I agree if we can avoid an INI setting, but this will surely lead to a
Python 2/3 situation.
I don't think it will. But instead of guessing, we should try the patch
with some real world apps and find out. I think that if we find out we can
migrate Drupal (or whatever) in a couple of days or even a couple of weeks
to be E_DEPRECATED-free - this approach is very viable. If it requires
months and months of updates, it'll be a different story.
"I don't think it will" is pretty subjective opinion. You are betting on
this based on a hunch, without any data or way back. It's an extremely drastic
change, you can't deny that.
"We can migrate Drupal"? Are you volunteering to migrate the code of
everyone else as well?
Drupal Core is maybe 0.0001% of all php code out there, probably less. The
man hours spent reviewing and changing code add up to massive man days of
work.
Why are you suddenly so in favour of breaking BC so much when you were
strictly against this before?
My question to you:
Why didn't you just repropose v1? According to your mails before it might
have run with a majority, so I don't see why these changes are necessary that
are alienating more people than converting to the RFC.
Zeev
-----Original Message-----
From: Benjamin Eberlei [mailto:kontakt@beberlei.de]
Sent: Monday, February 23, 2015 9:35 PM
To: Zeev Suraski
Cc: PHP internals
Subject: Re: [PHP-DEV] Coercive Scalar Type Hints RFC
You're going to have several years to fix it. You could do it during a
bug squash, at your leisure every once in a while, or the first day after
PHP 7 comes out. Just like with any other deprecated feature.
This is not how legacy code migrations work at all. You don't migrate
100 KLOC of code just because the PHP version isn't compatible anymore.
You just don't upgrade. Now people might upgrade to PHP 7 because it's
"just" E_DEPRECATED.
Benjamin,
If you've been for a while on internals, you should know I'm a downwards
compatibility freak. You don't need to explain to me about the nature of
upgrades and how much downwards compatibility is important.
But here, I disagree with you, for several reasons.
If we implement this in PHP 7, assuming the most aggressive 2 year timeline
for 8, 7.x will be supported until late 2019, and if it's a 3 year cycle - late
2020. That's actually 5 years to migrate. Ample time for those who care
about upgrades.
Secondly, for some reason you appear very opinionated that this will result
in huge breakage, even though we don't yet have any evidence to suggest it'd
be the case, at least not yet.
Thirdly, there are major reasons to upgrade to PHP 7, and we'd hopefully
have some major reasons for people to upgrade for 8. There are going to be
substantial carrots next to that breakage stick.
Last, you appear to represent the 'can't be bothered' crowd, which I agree,
is fairly large. But that crowd typically just doesn't upgrade, unless
something or someone twists their arm. They don't upgrade even when there's
no or very little compatibility breakage. That's why 5.3 is still so
popular, and 5.2 isn't rare to find. Let alone 5.4.
We have a track record that suggests that compatibility breakage slows
adoption down, but it does not kill it as you suggest.
But they will not upgrade their code to PHP 7.1 or 8 then.
Sooner or later they'd have to, if they want to maintain security. If they
don't - it will not be the first time in history that people are negligent
about their deployment (as the fact there are so many 5.2/5.3 deployments
still out there proves), nothing unique for 7/7.1.
How can you compare them?
This changes E_STRICT to an E_WARNING. This is completely different than
changing E_DEPRECATED to E_RECOVERABLE_ERROR.
If you read the RFC thoroughly, you'd see it contains elements that are
exactly the same thing:
- Promote to E_DEPRECATED if there is intent to remove this functionality in
the future.
// Non-static method Foo::method() should not be called statically
Proposed resolution: Convert to E_DEPRECATED
Possible alternative: Convert to E_DEPRECATED, if we intend to make this a
fatal error in the future.
The first only requires wrapping an old library in $old =
error_reporting(0); library(); error_reporting($old);
The second leads to a fatal error.
Are you seriously explaining to me how zend_error() works? :)
Ok, let's say 4-5 years. Given the lagging adoption of PHP major versions,
double this situation to 8-10 years in the wild.
I do believe a 2-3 year timeline for 8 (from when 7 is released) is a lot more
likely than 4-5.
The last time we had a major version was, as you know, 11 years ago. We
can't deduce anything about adoption numbers from what we had back then.
We're also on a much more aggressive timeline nowadays, with support for
each version stopping 3 years from when it gets initially released. At
least based on what I'm seeing, adoption cycles appear to be significantly
shorter than they used to be (1-3 years, not 4-5).
Of course, much like there are still users of 5.3, we're likely to have
users of 7.0 even in 2025. That's not the point, and shouldn't play a role
in the decision - it'd happen regardless.
Speaking as a consultant who helps people fix their legacy code I am
biased.
In my world nobody has tests.
And even those that have them on their proprietary apps seldom have higher
coverage than 30-50%. Only very few have 80%.
You can't compare open source code to what is out there, WordPress/Drupal
is good code compared to all the proprietary code out there.
I agree. But I am seeing better testing even in the darker corners of our
tech world.
Regardless, your customers either stick around with the old versions, or
find a way, one way or another, to test the app - be it manually or otherwise.
The former camp won't upgrade even to 7 - which isn't going to be a trivial
upgrade thanks to already a fair amount of compatibility breakage in it.
The latter camp will figure out ways to upgrade to 8, be it during the
prolonged period of time where they can do it at their leisure, or at some
distant point in the future where they are forced to do it because it's
unsupported and a security exploit is discovered.
I agree if we can avoid an INI setting, but this will surely lead to a
Python 2/3 situation.
I don't think it will. But instead of guessing, we should try the patch
with some real world apps and find out. I think that if we find out we
can migrate Drupal (or whatever) in a couple of days or even a couple of
weeks to be E_DEPRECATED-free - this approach is very viable. If it requires
months and months of updates, it'll be a different story.
"I don't think it will" is pretty subjective opinion.
So is "this will surely lead to a Python 2/3 situation". I'd say this
pushes the subjectiveness a couple of notches, too :)
You are betting on this
based on a hunch, without any data or way back. It's an extremely drastic
change, you can't deny that.
No, I'm not betting. And in fact I suggested a way to test your subjective
absolute confidence statement vs. my subjective belief - test it out.
"We can migrate Drupal"? Are you volunteering to migrate the code of
everyone else as well?
Obviously not, but then, no RFC author that introduced breakage into PHP
ever did, and there has been a lot of that going on, also in 7.0. I'm
obviously taking this as a test case, not as a service to the Drupal
community. It could be WordPress or Drupal or whatever app we want to test,
and it could be several of those. You appear fairly confident that we're
going to see an end-of-the-world situation. I believe it will actually not
take a lot of time to migrate. If we see that it takes a short time to
migrate such apps, it's an excellent indicator as to the level of real world
breakage that change would induce. Better than your guesstimate and surely
better than mine.
Drupal Core is maybe 0.0001% of all php code out there, probably less. The
man hours spent reviewing and changing code add up to massive man days
of work.
That's correct, but with substantial benefits - unlike most of our typical
compatibility breakage.
Why are you suddenly so in favour of breaking BC so much when you were
strictly against this before?
OK, so you do know my track record :)
Simple - because it helps create a more consistent language, that's safer,
and caters to the key needs of most of the people of both the strict and
dynamic camps using a single rule-set. It's a very big deal, and I think
it's at the very least worth exploring the price we'd have to pay
for it. Most of our compatibility breakage is introduced for no other
reason than warm fuzzy feeling or code cleanliness. It's really about a
good bang for the buck ratio. Yes, it's probably more bucks than we usually
pay for compatibility breakage, but it gives us a heck of a lot more bang in
return.
My question to you:
Why didn't you just repropose v1? According to your mails before it might
have run with a majority, so I don't see why these changes are necessary
that are alienating more people than converting to the RFC.
I actually don't see that much opposition to it. I saw quite a bit of
support, with lots of feedback on how to fine tune the conversion rules -
and a bunch of "PLEASE OH PLEASE NO INI ENTRIES", which I'm going to listen
to (planning to go for the E_DEPRECATED option in 7.0, E_RECOVERABLE_ERROR
in the future). Not saying there isn't opposition, but so far I've heard a
lot more support than opposition.
I think that voting for v0.1 right now - after all that's happened, would be
divisive for the community. It would have been different if we had v0.3
split into v0.1 and then the v0.3 extras as a subsequent vote from the get
go, but that's water under the bridge and isn't a good idea IMHO. It's
clear we need a solution that pleases most people in both camps. I believe
the Coercive STH RFC does a better job at that than the dual mode RFC. v0.1
doesn't.
Thanks,
Zeev
Zeev,
Secondly, for some reason you appear very opinionated that this will result
in huge breakage, even though we don't yet have any evidence to suggest it'd
be the case, at least not yet.
So I tried to compile the patch to get a real-world idea of the breaks.
The first thing, PHP's own phar generator included in the make file
fails due to the new restricted hints (based on the patch provided).
(I had to rebuild with --disable-phar to get it to build).
Let me make that perfectly clear: under these changes you cannot do a
default compile of PHP due to BC breaks.
Let's take a look at the numbers. I just ran a ./configure
--enable-mbstring --disable-phar --enable-debug, and got the following
test results:
Number of tests : 13653              9008
Tests skipped   :  4645 ( 34.0%) --------
Tests warned    :     0 (  0.0%) (  0.0%)
Tests failed    :   697 (  5.1%) (  7.7%)
Expected fail   :    32 (  0.2%) (  0.4%)
Tests passed    :  8279 ( 60.6%) ( 91.9%)
Time taken      :   954 seconds
7.7% of the internal test suite is extremely significant. That's just
short of 700 tests on a reasonably default install.
So I recompiled with a bunch more extensions:
Number of tests : 13716             10343
Tests failed    :   811 (  5.9%) (  7.8%)
And that's for PHP's suite. What about for the hundreds of millions of
lines of code of untested apps out there?
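The two percentage columns in those results use different denominators: the first is relative to all tests, the second to the tests that actually ran (total minus skipped). A small sketch that reproduces the quoted failure rates from the raw counts; the array layout and the `pct()` helper are illustrative only.

```php
<?php
// Counts copied from the two test runs quoted above.
$runs = [
    'default build'   => ['total' => 13653, 'executed' => 9008,  'failed' => 697],
    'more extensions' => ['total' => 13716, 'executed' => 10343, 'failed' => 811],
];

// Percentage rounded to one decimal, as in the run-tests summary.
function pct(int $part, int $whole): float
{
    return round(100 * $part / $whole, 1);
}

foreach ($runs as $name => $r) {
    printf(
        "%s: %.1f%% of all tests, %.1f%% of executed tests\n",
        $name,
        pct($r['failed'], $r['total']),
        pct($r['failed'], $r['executed'])
    );
}
```

The 7.7% and 7.8% figures discussed in the thread are the second kind, which is why they are higher than the 5.1% and 5.9% shown next to them.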
I tried running Symfony 2's test suite, but couldn't. Because:
$ ../php-src/sapi/cli/php vendor/bin/phpunit
Warning: debug_backtrace() expects parameter 1 to be integer, boolean
given in Symfony2/vendor/phpunit/phpunit/src/Util/ErrorHandler.php on
line 58
Warning: array_shift() expects parameter 1 to be array, null given in
Symfony2/vendor/phpunit/phpunit/src/Util/ErrorHandler.php on line 59
Warning: Invalid argument supplied for foreach() in
Symfony2/vendor/phpunit/phpunit/src/Util/ErrorHandler.php on line 61
Fatal error: Uncaught strpos() expects parameter 1 to be string, boolean given
Symfony2/src/Symfony/Bridge/PhpUnit/DeprecationErrorHandler.php:38
thrown in Symfony2/vendor/phpunit/phpunit/src/Framework/TestSuite.php
on line 871
So yeah, to say "major BC issues" is the literal definition of an
understatement.
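The strpos() failure above is the typical shape of the problem: a value that is sometimes a string and sometimes false is handed straight to a string function. A minimal sketch of the kind of manual fix such code would need, assuming hypothetical helpers `find_message()` and `mentions_deprecation()`; this is not the actual Symfony or PHPUnit code.

```php
<?php
// Hypothetical helper: returns the message, or false when there is none,
// mirroring the "boolean given" case in the output above.
function find_message(array $trace)
{
    return $trace['message'] ?? false;
}

function mentions_deprecation(array $trace): bool
{
    $msg = find_message($trace);
    // Guard explicitly instead of relying on false being coerced to ''.
    if (!is_string($msg)) {
        return false;
    }
    return strpos($msg, 'deprecated') !== false;
}
```

With the guard in place the call behaves the same under the current rules and under stricter ones, but someone has to find each such call site and decide what the non-string case should mean.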
Thirdly, there are major reasons to upgrade to PHP 7, and we'd hopefully
have some major reasons for people to upgrade for 8. There are going to be
substantial carrots next to that breakage stick.
Yes, and if we can avoid giving people breaks they can't trivially
avoid (via a "compatibility checker" script, like minor syntax
changes), then we're just making an artificial barrier for the sake of
it.
Rather than incentivizing the pain, why not avoid it?
Last, you appear to represent the 'can't be bothered' crowd, which I agree,
is fairly large. But that crowd typically just doesn't upgrade, unless
something or someone twists their arm. They don't upgrade even when there's
no or very little compatibility breakage. That's why 5.3 is still so
popular, and 5.2 isn't rare to find. Let alone 5.4.
We have a track record that suggests that compatibility breakage slows
adoption down, but it does not kill it as you suggest.
But they will not upgrade their code to PHP 7.1 or 8 then.
Sooner or later they'd have to, if they want to maintain security. If they
don't - it will not be the first time in history that people are negligent
about their deployment (as the fact there are so many 5.2/5.3 deployments
still out there proves), nothing unique for 7/7.1.
Sure there is. We can make it easier for people. We can make their
experience with 7 significantly easier than 5.2. We can remove the
stigma and the fear associated with upgrading. We've been doing that
since 5.4. Smaller, easier and less painful breaks when they do exist.
Why not continue that?
Ok, let's say 4-5 years. Given the lagging adoption of PHP major versions,
double this situation to 8-10 years in the wild.
I do believe a 2-3 year timeline for 8 (from when 7 is released) is a lot more
likely than 4-5.
The last time we had a major version was, as you know, 11 years ago. We
can't deduce anything about adoption numbers from what we had back then.
We're also on a much more aggressive timeline nowadays, with support for
each version stopping 3 years from when it gets initially released. At
least based on what I'm seeing, adoption cycles appear to be significantly
shorter than they used to be (1-3 years, not 4-5).
I fully agree with you here. And if we take that stance, then each
major should get "lighter" in terms of what it breaks. To spur people
to take the latest at all times, because they know it's stable and a
minimal amount of pain.
Of course, much like there are still users of 5.3, we're likely to have
users of 7.0 even in 2025. That's not the point, and shouldn't play a role
in the decision - it'd happen regardless.
I think it absolutely plays a role in the decision. We should be
making lives easier for our users, not making more work. Especially
for the class of user that doesn't stay up to date.
Obviously not, but then, no RFC author that introduced breakage into PHP
ever did, and there has been a lot of that going on, also in 7.0. I'm
Hang on a second. All of the breaks that have gone into 7.0 were of
two classes up until now:
- Those that were using deprecated functionality already
- Those that are 100% statically fixable.
You can write a "migration tool" to automatically fix the BC breaks in
Nikita's patch. And in the PHP4 constructors patch. And in the
reserved types patch. Together they are minor (with the exception of
the PHP4 constructors patch, they don't affect a lot of production code),
but they are also 100% statically fixable.
This break is fundamentally different. It's by definition not
statically fixable (at least without injecting casts everywhere, which
is exactly the point of the patch, no?). So it's going to require a
person physically fixing bugs (easier if there's a comprehensive test
suite, really not easy if it's manual testing).
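One way to see why this class of break resists static fixing: the type that reaches a function can depend on runtime data rather than on anything visible in the source. A small sketch, with hypothetical `lookup()` and `describe()` helpers, where the same call site yields a string on one input and a boolean on another.

```php
<?php
// Hypothetical lookup: yields the stored value when the key exists,
// false otherwise.
function lookup(array $config, string $key)
{
    return array_key_exists($key, $config) ? $config[$key] : false;
}

// Reports the runtime type that would be handed on to a string function.
function describe($value): string
{
    return gettype($value);
}

// Same call site, different runtime types depending only on the data.
$types = [
    describe(lookup(['name' => 'php'], 'name')),
    describe(lookup([], 'name')),
];
```

A migration tool that only reads the source sees one expression here. Whether it needs a cast, a guard, or nothing at all depends on what the data looks like in production.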
Thanks
Anthony
Anthony,
Thanks for testing, but it's a bit premature to jump to conclusions.
First, disabling phar is in the patch instructions at
github.com/php/php-src/pull/1110 - it's a bug in phar that needs to be
fixed. We'll address it.
Secondly, as was obvious both from Francois' email and mine, this is just an
initial patch, and yes, we know it presently fails 8% of the test cases (as
I stated a few hours ago in an email to Benjamin). I still think it's not a
bad start at all; It would be a pretty bad ending, but I think we can tweak
the rules to get a lot less breakage. If you're up for it, you can
experiment with the 12 configuration switches that govern this
patch - but even that is a bit premature. Let people who actually want this
RFC to pass try and tweak the patch first :)
Thirdly, as I shared earlier, the RFC was updated to go for E_DEPRECATED in
PHP 7, which means there would be zero breakage in PHP 7, and ample time for
people to migrate whatever issues this would introduce to their code until
PHP 8. All functionality changes will go through the E_DEPRECATED cycle,
exactly like the stuff that got removed in PHP 7.
Last, we don't yet have an answer to your question about the billions of
lines of code out there. But as I told Benjamin, we have every intention
to try the patch out on some real world apps and see how it performs.
Let me assure you that if we find that there are hundreds of issues trying
to get common apps to work, after we tweak the rules - I'll either retract
the RFC or at the very least rethink the internal functions part of it.
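Counting issues in a real app, as proposed here, could be done with an error handler that tallies deprecation notices per call site. The sketch below is an assumption about how such a measurement might look, not part of the patch: `app_step()` is a hypothetical piece of application code, and `E_USER_DEPRECATED` stands in for the E_DEPRECATED notices the patched engine would raise.

```php
<?php
// Tally deprecation notices per call site so the number of distinct places
// needing a fix can be separated from the raw number of messages.
$tally = [];

set_error_handler(
    function (int $errno, string $errstr, string $file, int $line) use (&$tally): bool {
        $key = $file . ':' . $line;
        $tally[$key] = ($tally[$key] ?? 0) + 1;
        return true; // handled; skip PHP's built-in handler
    },
    E_DEPRECATED | E_USER_DEPRECATED
);

// Hypothetical application code that triggers the same notice repeatedly.
function app_step(): void
{
    trigger_error('argument relies on a deprecated conversion', E_USER_DEPRECATED);
}

for ($i = 0; $i < 3; $i++) {
    app_step();
}

restore_error_handler();

$distinctSites = count($tally);
$totalMessages = array_sum($tally);
```

Distinct call sites are probably the more useful number for estimating migration effort, since one line inside a loop can produce thousands of messages.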
Zeev
Zeev,
Anthony,
Thanks for testing, but it's a bit premature to jump to conclusions.
First, disabling phar is in the patch instructions at
github.com/php/php-src/pull/1110 - it's a bug in phar that needs to be
fixed. We'll address it.
Well, my point was more that it's an indication of the scale of breaks
to expect.
Secondly, as was obvious both from Francois' email and mine, this is just an
initial patch, and yes, we know it presently fails 8% of the test cases (as
I stated a few hours ago in an email to Benjamin). I still think it's not a
bad start at all; It would be a pretty bad ending, but I think we can tweak
the rules to get a lot less breakage. If you're up for it, you can
experiment with the 12 configuration switches that govern this
patch - but even that is a bit premature. Let people who actually want this
RFC to pass try and tweak the patch first :)
My concern though is that if you do tweak the rules to get a
reasonable amount of breakage, you're also going to remove all of the
benefit of the type conversions you proposed.
For example: A number of the issues that I saw with phpunit were
related to passing bool to string and null to string.
So to prevent those, you'd need to accept bool and null for strings.
Which brings strings back precisely to the way they are today. So no
changes.
I saw the same things with int (specifically around
debug_backtrace(false), where the first parameter is a bitmap).
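For the debug_backtrace() case, the first parameter is an integer bitmask built from the DEBUG_BACKTRACE_* constants, so the explicit equivalent of passing false is passing 0. A minimal sketch; `caller_frames()` and `outer()` are hypothetical wrappers used only to produce a call stack.

```php
<?php
// The first parameter is a bitmask of DEBUG_BACKTRACE_* constants.
// Passing 0 requests neither option; it is the integer that `false`
// is converted to under the current weak rules.
function caller_frames(): array
{
    return debug_backtrace(0);
}

function outer(): array
{
    return caller_frames();
}

$frames = outer();
$innermost = $frames[0]['function'];
```

This particular call is easy to fix by hand, but it still has to be found first, which is the point being made about code without tests.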
I couldn't get it to go far enough in (without spending 30 minutes
debugging phpunit) to actually run a test. I tried disabling all
bool/null checks in the patch, but am simply getting segfaults now.
Will look at it a bit more later.
Thirdly, as I shared earlier, the RFC was updated to go for E_DEPRECATED in
PHP 7, which means there would be zero breakage in PHP 7, and ample time for
people to migrate whatever issues this would introduce to their code until
PHP 8. All functionality changes will go through the E_DEPRECATED cycle,
exactly like the stuff that got removed in PHP 7.
Well, I am concerned that, at the error rate we're seeing, we will
cause significant perf degradation due to the errors (even with
reporting=0). And that's not mentioning log files.
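The concern about cost with reporting turned off can be illustrated with a user-defined error handler: per the set_error_handler() documentation, such a handler is called regardless of the error_reporting() level. The sketch below only demonstrates that the dispatch still happens; it does not measure how expensive it is, and `E_USER_DEPRECATED` again stands in for engine-raised E_DEPRECATED notices.

```php
<?php
$seen = 0;

// A user-defined handler is invoked for matching error levels even when
// error_reporting() is 0; it would have to check the level itself.
set_error_handler(function (int $errno, string $errstr) use (&$seen): bool {
    $seen++;
    return true; // handled; skip PHP's built-in handler
}, E_USER_DEPRECATED);

$old = error_reporting(0);
for ($i = 0; $i < 1000; $i++) {
    trigger_error('old coercion used', E_USER_DEPRECATED);
}
error_reporting($old);
restore_error_handler();
```

After the loop `$seen` is 1000, so every notice was raised and dispatched even though nothing was displayed or logged by PHP itself.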
Last, we don't yet have an answer to your question about the billions of
lines of code out there. But as I told Benjamin, we have every intention
to try the patch out on some real world apps and see how it performs.
Let me assure you that if we find that there are hundreds of issues trying
to get common apps to work, after we tweak the rules - I'll either retract
the RFC or at the very least rethink the internal functions part of it.
Sounds great. My concern here though is that my instinct says that
either the rules are going to come out so similar to today's rules as
to not be effective, or break enough code that it's not worth it.
Please continue the patch, and let's test iterations. I just don't see
where the magic line will be (not breaking too much while providing
enough benefit to please the proponents of strict types).
Thanks
Anthony
-----Original Message-----
From: Anthony Ferrara [mailto:ircmaxell@gmail.com]
Sent: Tuesday, February 24, 2015 1:12 AM
To: Zeev Suraski
Cc: PHP internals
Subject: Re: [PHP-DEV] Coercive Scalar Type Hints RFC
Well, I am concerned that, at the error rate we're seeing, we will cause
significant perf degradation due to the errors (even with reporting=0).
And that's not mentioning log files.
I'd say that's a feature - excellent motivation to fix these warnings! :)
Last, we don't yet have an answer to your question about the billions
of lines of code out there. But as I told to Benjamin, we have every
intention to try the patch out on some real world apps and see how it
performs.
Let me assure you that if we find that there are hundreds of issues
trying to get common apps to work, after we tweak the rules - I'll
either retract the RFC or the very least rethink the internal functions
part of it.
Sounds great. My concern here though is that my instinct says that either
the rules are going to come out so similar to today's rules as to not be
effective, or break enough code that it's not worth it.
Please continue the patch, and let's test iterations. I just don't see
where the magic line will be (not breaking too much while providing enough
benefit to please the proponents of strict types).
I hope we can find that middle ground, triggering a reasonable number of
issues, that will actually gradually push people towards stricter code. But
I agree that the jury is still out on whether that's possible.
I think diving into some of the found issues is as interesting as counting
the number of failures. Are we (with some tweaks) onto a gold mine of
potential issues here, or a torrent of noise? Anyway, we'll see.
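One way to look at the found issues rather than just count them is to tally deprecation notices by message while an app's test suite runs. The following is only a minimal sketch: the messages are made up and raised with trigger_error() to simulate what a patched engine might emit as E_DEPRECATED, and nothing here comes from the patch itself.

```php
<?php
// Tally deprecation notices by message so that recurring issues stand out.
$tally = [];
set_error_handler(function ($errno, $errstr) use (&$tally) {
    $tally[$errstr] = ($tally[$errstr] ?? 0) + 1;
    return true; // handled here; skip PHP's default error handler
}, E_DEPRECATED | E_USER_DEPRECATED);

// Simulated notices (hypothetical messages, assumed for illustration only).
for ($i = 0; $i < 3; $i++) {
    trigger_error('bool passed where string expected', E_USER_DEPRECATED);
}
trigger_error('float with fraction passed where int expected', E_USER_DEPRECATED);

restore_error_handler();

// Most frequent messages first.
arsort($tally);
foreach ($tally as $message => $count) {
    echo $count, "x ", $message, "\n";
}
```

A few messages with very high counts would point at a handful of root causes; a long tail of unrelated messages would look more like noise.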
Thanks,
Zeev
Thanks for testing, but it's a bit premature to jump to conclusions.
First, disabling phar is in the patch instructions at
github.com/php/php-src/pull/1110 - it's a bug in phar that needs to be
fixed. We'll address it.
Secondly, as was obvious both from Francois' email and mine, this is just an
initial patch, and yes, we know it presently fails 8% of the test cases (as
I stated a few hours ago in an email to Benjamin). I still think it's not a
bad start at all; it would be a pretty bad ending, but I think we can tweak
the rules to get a lot less breakage. If you're up for it, you can
experiment with the 12 configuration switches that govern this
patch - but even that is a bit premature. Let people who actually want this
RFC to pass try and tweak the patch first :)
I understand it is an initial patch. However, I totally fail to see how
changing the casting rules by default does us and our users any good.
Yes, they are inconsistent, and nobody can remember or even know all of
them or their edge cases. But after a decade or more, most of them (rare
"magic" edge cases are partially covered already) are part of what a
huge amount of code relies on. As Benjamin pointed out, most of this
code is not tested. And for the parts being tested, I can say that
it is nearly impossible to extrapolate the impact of such a change in
userland. It is not a lack of desire to do so, but the very large and
difficult surface we have to cover. I will be against any change in
this area not related to extreme edge cases, if done by default
without any possibility to disable the new casting behaviors. I
consider it an expected disaster, an order of magnitude bigger than what
we see with Python 2 > 3.
Thirdly, as I shared earlier, the RFC was updated to go for E_DEPRECATED in
PHP 7, which means there would be zero breakage in PHP 7, and ample time for
people to migrate whatever issues this would introduce to their code until
PHP 8. All functionality changes will go through the E_DEPRECATED cycle,
exactly like the stuff that got removed in PHP 7.
Well, E_DEPRECATED as it is now has no meaning. But why not.
That being said, again, I do not see how changing casting rules won't
introduce BC breaks. Please enlighten me.
Last, we don't yet have an answer to your question about the billions of
lines of code out there. But as I told to Benjamin, we have every intention
to try the patch out on some real world apps and see how it performs.
Patch applications to test something that does not break BC? I
understand the need to test a new feature, and we do similar things
for strict typing interacting with legacy code, but one must-do test
case is legacy code left untouched, to actually see the impact,
with or without the addition of new modules or code relying on a new
mode.
Let me assure you that if we find that there are hundreds of issues trying
to get common apps to work, after we tweak the rules - I'll either retract
the RFC or the very least rethink the internal functions part of it.
The common apps are a very small but loud part of the PHP code out there. I
worry about the rest more than about WordPress, for example.
From: Anthony Ferrara [mailto:ircmaxell@gmail.com]
The first thing, PHP's own phar generator included in the make file
fails due to the new restricted hints (based on the patch provided).
(I had to rebuild with --disable-phar to get it to build).
Let me make that perfectly clear: under these changes you cannot do a
default compile of PHP due to BC breaks.
Wrong point. In this case, restrictions don't break existing code, but allow detecting a bug (SplFileInfo(NULL)).
The PR comment explains this already.
So yeah, to say "major BC issues" is the literal definition of an
understatement.
I'm not sure, as it seems all these errors are probably coming from 2 bugs in Symfony code. And don't tell me sending a bool as first arg to strpos()
is not a bug. So, we're just helping them to find bugs in their code. Nothing so terrible, imo.
Additionally, in the final implementation, these errors will be generated as E_DEPRECATED, which will make it much clearer for the developer.
Regards
François
They don't upgrade even when there's
no or very little compatibility breakage. That's why 5.3 is still so
popular, and 5.2 isn't rare to find. Let alone 5.4.
You know my position here but not only are the problems of areas that
were deprecated in 5.3 and removed in 5.4, trying to provide a hosted
version of PHP that works for the older code while keeping the 5.4+ code
happy is a problem. I've just been hit with a change of ownership of one
of my legacy services and the new owner has 'upgraded' resulting in
several sites now having problems. Yes they run but not clean and out of
8 sites I've acquired 5 completely different frameworks. Add in a change
from Apache to Nginx ... you get the picture ...
I've moved them all to one of my own machines with a 'compatible' build
of PHP and we are working again. Should I have reworked each to get them
running, yes, but which customer do you keep happy first? They WILL all
be moved to a single modern framework ... and I've been saying that
since 2012 ... but at least finally I HAVE a stable system on 5.4 which
does seem to work with 5.6 and also with the current PHP7 build. However
in the process I've lost eaccelerator which used to work well on the
PHP5.2 hosting, so I need an alternative simply to stand still ... and
while PHP7 is giving a speed improvement it still lags behind what I
have on the PHP5.4 machines with eaccelerator. So first step get up to
the same platform with all sites, but currently there is no incentive to
switch to a slower system ... Your speed table from earlier shows a
marked improvement up to PHP5.4 with eaccelerator loaded.
PHP7 reduces memory usage and execution time against a basic PHP5.x but
not against my PHP5.4/eaccelerator base ...
--
Lester Caine - G8HFL
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
I don't think it will. But instead of guessing, we should try the patch
with some real world apps and find out. I think that if we find out we can
migrate Drupal (or whatever) in a couple of days or even a couple of weeks
to be E_DEPRECATED free - this approach is very viable. If it requires
months and months of updates, it'll be a different story.
There are still a number of things that need agreement before moving
forward?
1/ The 'strict hint' camp can't see why even if 'disabled' so that we
don't have to bother with it they can have what they want, but it's
another layer of unnecessary code even if it can be optimized out.
2/ Drop the 'strict' but keep STH and we still have another layer of
complexity that may not fit with other methods of handling variable
validation.
3/ Changing the rules for converting types is in my list a separate
question to adding STH at all. Even if STH is not being used the rules
change? At least some alternative means of producing the current style
of working should be retained.
4/ There IS still the alternative option of proper annotation which will
provide an alternate method of including type hinting without needing to
change the run time code at all. I don't see why this is being rejected
as vehemently as not including strict or incorporating annotation IN the
code. Keeping development time information in a separate wrapper both
allows everyone to follow their own preferences and develop validation
tools that work for PHP rather than forcing the use of some other
language that is flavour of the month.
Neither strict, Coercive processing nor François's alternative provide a
complete solution to the validation requirements. Something which could
be provided if the right hooks are available to identify hints be that
the current rules or some third party strict rule set. There was a query
about making hints user definable and that fits in better than any of
the current 'single' solutions? Currently nothing is the perfect
solution so should any be made compulsory?
--
Lester Caine - G8HFL
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
Hi Zeev,
-----Original Message-----
From: Zeev Suraski [mailto:zeev@zend.com]
Sent: Saturday, February 21, 2015 18:22
To: PHP internals
Subject: [PHP-DEV] Coercive Scalar Type Hints RFC
All,
I’ve been working with François and several other people from internals@ and the PHP community to create a single-mode Scalar Type Hints proposal.
I think the RFC is a bit premature and could benefit from a bit more time, but given the time pressure, as well as the fact that a not fully compatible subset of that RFC was published and has people already discussing it, it made the most sense to publish it sooner rather than later.
The RFC is available here:
Comments welcome!
Zeev
First of all, thank you and all others working on this RFC but also people working on another RFC related to scalar type hints. It is good that PHP will get scalar type hints eventually.
Although I think the strict mode as proposed in the v0.5 RFC is nice as such, I prefer this RFC simply by the fact that it does not introduce different modes. I genuinely believe that different modes will be very harmful for PHP.
Yet, this RFC is not perfect either. IMO PHP is not ready for scalar type hints, at least not in PHP 7.0, and we should instead focus on clearing the way for their introduction in PHP 7.x. That means introducing in PHP 7.0 all the BC breaks that are necessary so that scalar type hints can be added in PHP 7.x later on.
I am provoking on purpose of course. But rightly so, because I think in all the debate about strict/weak scalar type hints we lost focus on what really matters in language growth, namely its maturation.
For instance, Pierre and others complained that string -> bool and float -> bool are accepted by this RFC. While I agree that it is a bad idea to apply implicit conversions to such input (I would not even allow int -> bool, to be honest), it makes total sense for PHP to behave like this at the moment. I would even claim that null, array, object, literally everything should be accepted as well, since the explicit (bool) cast accepts everything and some implicit conversions, such as the one in an if statement, also accept everything.
From questions like these:
Boolean STH (bool):
this is by far too weak. How strings could be consider as valid, how?
"true" > Boolean true? I suppose then "false" will be boolean false?
What's is the boolean value of float 0.5?
At the very least only integer should be accepted, 0 > false, anything >=1 true
I get the impression that even internals are starting to get confused about the conversion rules PHP has. Implicitly converting something to bool should behave exactly the same as an explicit conversion (and is thus straightforward to verify: http://3v4l.org/nVgbG ).
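For readers who want to check the explicit rules locally, here is a small sketch (not the script behind the link above) that prints what the (bool) cast yields for the values questioned in the quote. The sample list is my own choice.

```php
<?php
// Values raised in the discussion, keyed by a printable label.
$samples = [
    '"true"'  => "true",
    '"false"' => "false",
    '"0"'     => "0",
    '""'      => "",
    '0.5'     => 0.5,
    '0.0'     => 0.0,
    '[]'      => [],
    'null'    => null,
];

foreach ($samples as $label => $value) {
    // var_export() renders the boolean as the literal text true/false.
    printf("%-8s => %s\n", $label, var_export((bool) $value, true));
}
```

Only "0", "", 0.0, [] and null come out as false here; "false" and 0.5 both convert to true.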
We should start to eliminate the different behaviour of implicit/explicit castings [1], to have a consistent and predictable/obvious behaviour in the long run. Or in other words, and that is what I meant above, PHP's type system needs to mature. While I can understand that it looks beneficial to have all kind of reliefs for the beginner, it is rather harmful in the long run. PHP has so many inconsistencies and requires a user to be aware of all kind of edge cases that I think bugs are introduced more frequently than necessary. We already have different conversion mechanisms in PHP and I guess the reason why https://wiki.php.net/rfc/safe_cast was declined is based on the fact that most people did not want to see yet another group of conversion rules.
There were people claiming that PHP follows the philosophy that a user does not need to know anything about scalar types. PHP will deal with it via type juggling. A function/operator requires an int? Just pass a scalar and PHP will convert it automatically via type juggling to int.
That is long gone (probably was never there) because the user had to know exactly what type can be passed or rather what values, otherwise bugs are inevitable. Consider the following:
"a" % 1;
fmod("a", 0.5);
Kind of logical that % accepts any kind of scalar where fmod does not, right? I do not want to exaggerate too much on this, but I think you get my position: PHP needs to get rid of these inconsistencies rather than add yet another obstacle that impedes reaching consistency. Once scalar type hints are in place, they should follow the conversion rules we want PHP to have in the long run; otherwise the BC impact of changing them later would be too big, and we would have to wait at least until PHP 8, if not PHP 9.
So what does that mean for scalar types?
IMO it means that agreeing on a new set of conversion rules for the long run is far more important than adding scalar type hints to PHP 7.0. PHP should strive to have one consistent set of conversion rules which applies in all places where implicit or explicit conversions are used. Hence, the way (bool) works needs to change in the future as well [2]. That is my opinion. I am aware that such a change would have way too big a BC impact for PHP 7.0 (likewise option 1 of this RFC). But it can be introduced step by step [3] (kind of my choice of option), and migration tools could facilitate the migration to a new version. I think I am not alone in this opinion (I just read Shashank's email).
I see the migration plan roughly as follows:
PHP 7.0:
- reserve keywords: bool, int, float including alternatives
- deprecate alternative type names such as boolean, integer etc.
- introduce new conversion functions which reflect the current behaviour of (bool), (int) etc.
  --> as mentioned above, they could be named oldSchoolBoolConversion etc.
  --> encourage users to use these functions instead of (bool), (int) etc., since (bool) etc. will change with PHP 8.0. Also mention that these functions should only be used if the weakness is really required; otherwise use the new conversion functions from below
- introduce new conversion functions which reflect the newly defined conversion rule set (which shall be the only one encouraged in the future). Those functions shall trigger an E_RECOVERABLE_ERROR
  --> encourage users to use these functions instead of (bool), (int) and oldSchoolBoolConversion etc. (unless the weakness is really required, then use oldSchoolBoolConversion)
- update the docs in order to reflect the new encouraged way. Also mention that:
  - (bool), (int) etc. will change their behaviour in PHP 8.0
  - internal functions will use the new conversion rules if not already done this way in PHP 8.0 (for instance, strstr will no longer accept a scalar as third parameter in the case where we do not support implicit casts to bool)
  - operators will use the new conversion rules if not already done this way in PHP 8.0
  - (control structures will use the new conversion rules if not already done this way in PHP 8.0) => Maybe this is too strict for most of you and goes against the spirit of PHP (I suppose some of you will say that - fair enough, I guess you are right). In this case, I would at least use the term "loose comparison" as mentioned here: http://php.net/manual/en/types.comparisons.php#types.comparisions-loose instead of using the term "conversion", then it is compatible with the changes introduced in PHP 8.0
PHP 7.1: necessary bug-fixes introduced with PHP 7.0
PHP 7.x: deprecate even more if required
PHP 8:
- introduce scalar type hints which reflect the conversion rules as defined (adding strict type hints as well is possible of course, whether with an ini-setting, a declare statement or individually with a modifier something like "strict int" for a single parameter or strict function for all parameters incl. return type or strict class for every type defined in the class is up to discussion)
- exchange the behaviour of (bool), (int) etc. -> use the new conversion rules instead
- change internal functions which do not yet obey to the new conversion rules
- change the operators which do not yet obey to the new conversion rules (for instance, + would also emit an E_RECOVERABLE_ERROR for "a" + 1)
- (change the control structures in order that they obey the new conversion rules as well) => as mentioned above, probably too strict for PHP
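To make the two families of conversion functions from the plan more concrete, here is a userland sketch. Everything in it is an assumption: the name newIntConversion is hypothetical, the rule "accept only values that convert without loss" is just one possible reading of a new rule set, and an exception stands in for E_RECOVERABLE_ERROR, which userland code cannot raise itself.

```php
<?php
// Hypothetical: keeps today's behaviour of the explicit (int) cast.
function oldSchoolIntConversion($value)
{
    return (int) $value;
}

// Hypothetical: accepts only values that convert to int without loss.
function newIntConversion($value)
{
    if (is_int($value)) {
        return $value;
    }
    // Floats: must be finite, have no fractional part and fit into an int.
    if (is_float($value) && is_finite($value) && floor($value) === $value
        && $value >= PHP_INT_MIN && $value < PHP_INT_MAX) {
        return (int) $value;
    }
    // Strings: must look like a plain integer, e.g. "42" but not "42.0" or "a".
    if (is_string($value) && filter_var($value, FILTER_VALIDATE_INT) !== false) {
        return (int) $value;
    }
    // Stand-in for the E_RECOVERABLE_ERROR mentioned in the plan.
    throw new InvalidArgumentException(
        'Cannot convert ' . var_export($value, true) . ' to int without loss'
    );
}

var_dump(oldSchoolIntConversion("a")); // int(0)
var_dump(newIntConversion("42"));      // int(42)
var_dump(newIntConversion(42.0));      // int(42)

try {
    newIntConversion("a");
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), "\n";
}
```

The point of the sketch is only the split: one function that never fails and silently loses information, and one that refuses anything it cannot represent.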
Back to this RFC. I think this RFC goes in the right direction with the specified conversion rules. The only thing to get rid of, IMO, are the implicit conversions to bool from string, float and int.
Moreover, I like that the RFC already has different steps for adding the new behaviour. Yet, I think it should slow down a little bit as shown. I think we need more time to come up with a very good strategic solution.
Thoughts?
Cheers,
Robert
[1] for instance, that implicit/explicit conversion behaves differently and that the implicit conversion in an if statement uses yet another behaviour. Operator signatures and function signatures behave differently, control structures and functions behave differently etc.
[2] (bool) would no longer accept all kinds of input data - IMO it would not accept anything, since PHP would not support conversions to bool. I would even change the way if works and disallow if(new Foo()){} for instance (since no conversion to bool exists anymore). PHP could add a function which behaves like the current (bool), maybe oldSchoolBoolConversion (an ugly name so that no one is encouraged to use it), and users would need to write if(oldSchoolBoolConversion(new Foo())){} to get the same behaviour as today.
[snip]
Thoughts?
+1 - good analysis - a single mode approach with consistent type
coercion rules across the board makes absolute sense even if STH are put
back until PHP 8.x
[snip]
Hello,
Am I understanding correctly that you are suggesting changes to type
casting? This seems like a bad idea. Explicit and implicit conversions
are something really different. Generally, implicit conversions are OK
only when no data is lost and explicit conversions (casts) are used
when you realize some information can get lost and you still want to
proceed with the conversion. Having only one type of conversion is
IMHO weird.
Also, I'm not a fan of having to wait for scalar type hints for few
more years. :(
Regards
Pavel Kouril
Hi Pavel,
-----Original Message-----
From: Pavel Kouřil [mailto:pajousek@gmail.com]
Sent: Sunday, February 22, 2015 15:54
To: Robert Stoll
Cc: Zeev Suraski; PHP internals
Subject: Re: [PHP-DEV] Coercive Scalar Type Hints RFC
[snip]
Hello,
Am I understanding correctly that you are suggesting changes to type casting? This seems like a bad idea. Explicit and
implicit conversions are something really different. Generally, implicit conversions are OK only when no data is lost and
explicit conversions (casts) are used when you realize some information can get lost and you still want to proceed with the
conversion. Having only one type of conversion is IMHO weird.
Yes, I am suggesting to make conversions behave the same regardless if it is implicit or explicit. The only difference between the two should be that one is stated explicitly by the user where the other is applied implicitly. Other programming languages behave like this and are more predictable for users as well as developers because one does not need to learn two sets of conversion rules.
Also, I'm not a fan of having to wait for scalar type hints for few more years. :(
Regards
Pavel Kouril
Hi Pavel,
Yes, I am suggesting to make conversions behave the same regardless if it is implicit or explicit. The only difference between the two should be that one is stated explicitly by the user where the other is applied implicitly. Other programming languages behave like this and are more predictable for users as well as developers because one does not need to learn two sets of conversion rules.
Actually this is not true. Other languages have differences between
explicit conversions (aka casting) and implicit conversions as well.
C# is the language I use the most after PHP, so I'll bring that one up
(see https://msdn.microsoft.com/en-us/library/ms173105.aspx), but I
believe other languages (probably Java?) act the same way.
Regards
Pavel Kouril
-----Original Message-----
From: Pavel Kouřil [mailto:pajousek@gmail.com]
Sent: Sunday, February 22, 2015 20:02
To: Robert Stoll
Cc: Zeev Suraski; PHP internals
Subject: Re: [PHP-DEV] Coercive Scalar Type Hints RFC
Hi Pavel,
Yes, I am suggesting to make conversions behave the same regardless if it is implicit or explicit. The only difference between the two should be that one is stated explicitly by the user where the other is applied implicitly. Other programming languages behave like this and are more predictable for users as well as developers because one does not need to learn two sets of conversion rules.
Actually this is not true. Other languages have differences between explicit conversions (aka casting) and implicit conversions as well.
C# is the language I use the most after PHP, so I'll bring that one up (see https://msdn.microsoft.com/en-us/library/ms173105.aspx), but I believe other languages (probably Java?) act the same way.
Regards
Pavel Kouril
Hm... I reconsidered my statements and that is a good thing :)
I am not sure if I got your view point. I will try to elaborate more on mine and explain how I interpret your statement.
Probably it is a philosophical question how to look at it. IMO the only difference in C# (as well as in Java) lies in the way the conversions are applied. Implicit conversions are applied automatically by the compiler, whereas explicit conversions are applied by the user. C# is statically typed, and implicit conversions are only applied when it is certainly safe to apply one. However, implicit conversions in C# behave the same as explicit conversions, since implicit conversions which fail simply do not exist (there is no implicit conversion from double to int, for instance). That is the way I look at it. You probably look at it from another point of view and would claim that an implicit conversion from double to int exists in C# but just fails all the time => ergo implicit and explicit are different (that is my interpretation of your statement above). In this sense I would agree. But even when you think in these terms, you have to admit that they are fundamentally different in that implicit conversions which differ from the explicit conversion always fail, in all cases - pretty much as if they did not exist. There are no cases that I am aware of, neither in C# nor in Java, where an implicit conversion succeeds for some values but not for all while the explicit conversion succeeds for more values than the implicit one. Hence, something like "a" should IMO also not work in an explicit conversion in PHP if it is not supported by the implicit conversion (otherwise strict mode is useless, btw.).
Try out the following C# code:
dynamic d1 = 1.0;
int d = d1;
You will get the error "Cannot implicitly convert type double to int" at runtime.
We see a fundamental difference between C# and PHP here. PHP is dynamically typed and relies on values rather than types (in contrast to C#). Because C# decides by type, the above code emits a runtime error even though the value could be converted to int without precision loss.
This shall be different in PHP according to this RFC and I think that is perfectly fine. Yet it seems even more important to me that implicit and explicit conversions behave the same way.
At first it might seem strange to have just one conversion rule set in PHP since PHP is not known to be a language which shines due to its consistency...
OK, I am serious again. Think about it from the following point of view: a user writes an explicit conversion in order to state explicitly that some value will be converted (this is something which will be necessary in a strict mode). Why should this explicit conversion be different from the implicit one? There should not be any difference between explicit knowledge and implicit one. That is my opinion. If you really do not care about data loss and just want to squeeze a float/string into an int no matter what the value really is, then you can use the @ in conjunction with ?? and provide the desired default value to fall back on if the conversion fails. If conversions like "a" to int really matter that much to the users of PHP, then we could keep the oldSchoolIntConversion function (as proposed in my first email) even in PHP 10 (I would probably get rid of it at some point).
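The idea of one lossless rule set plus an explicit fallback could be sketched roughly as follows. This is only an illustration: toIntOrFail() and toIntOr() are hypothetical names that are not part of PHP or of the RFC, the fallback helper merely stands in for the "@ in conjunction with ??" idea, and the exact rules (whole floats within ±2^53, strings accepted by FILTER_VALIDATE_INT) are assumptions made for the example.

```php
<?php
// Hypothetical helper: converts only when no information is lost.
function toIntOrFail($value): int
{
    if (is_int($value)) {
        return $value;
    }
    // Floats qualify only if they are finite whole numbers within +/- 2^53.
    if (is_float($value) && is_finite($value)
        && floor($value) === $value && abs($value) <= 2 ** 53) {
        return (int) $value;
    }
    // Strings qualify only if FILTER_VALIDATE_INT accepts them (e.g. "42").
    if (is_string($value)) {
        $parsed = filter_var($value, FILTER_VALIDATE_INT);
        if ($parsed !== false) {
            return $parsed;
        }
    }
    throw new InvalidArgumentException('value is not losslessly convertible to int');
}

// Hypothetical stand-in for the "@ plus ??" idea: use a default on failure.
function toIntOr($value, int $default): int
{
    try {
        return toIntOrFail($value);
    } catch (InvalidArgumentException $e) {
        return $default;
    }
}

var_dump(toIntOrFail("1"));   // int(1)
var_dump(toIntOrFail(1.0));   // int(1)
var_dump(toIntOr("a", 0));    // int(0)
var_dump(toIntOr(1.5, -1));   // int(-1)
```

With a single helper like this, the caller who does not care about data loss has to say so by choosing the default, instead of relying on a second, more permissive rule set.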
Cheers,
Robert
Well,
I look at it this way (in a simplified manner). Hopefully this will
make you understand my point of view more.
- Implicit conversions work only when you are sure you won't lose stuff
- Explicit conversions are for forcing (casting) a variable to become
another type, and when you as the user call one explicitly, you are
aware you can lose values
Sure, the literal meaning in C# and PHP differs a little bit (because
of static and dynamic typed language differences and stuff), but the
intent is IMHO the same; implicit conversions can happen in the
"background" safely, while for "dangerous" conversions, you have to
cast by hand. And I see use cases for both of these types of
conversions.
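The two kinds of conversion described above can be sketched in PHP. This uses the behaviour of PHP 7.0 and later in the default (non-strict) mode as released, which is an assumption relative to this thread since the ruleset was still under discussion here; half() is a hypothetical function made up for the illustration.

```php
<?php
// Assumes PHP 7.0 or later in the default (non-strict) mode.
function half(float $x): float
{
    return $x / 2;
}

// Implicit and safe: the int 5 is widened to float 5.0 without loss.
var_dump(half(5));          // float(2.5)

// Explicit and lossy: the caller asks for the cast and accepts what it drops.
var_dump((int) 2.9);        // int(2)
var_dump((int) "12abc");    // int(12)
```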
Also, you are assuming that there will be a "strict" mode; I sincerely
hope there won't. Since the introduction of "2 modes", I have always
been saying that there should be only one mode - I don't really care
whether it would be strict or weak, but just only one.
Regards
Pavel Kouril
-----Original Message-----
From: Pavel Kouřil [mailto:pajousek@gmail.com]
Sent: Sunday, 22 February 2015 22:18
To: Robert Stoll
Cc: Zeev Suraski; PHP internals
Subject: Re: [PHP-DEV] Coercive Scalar Type Hints RFC
Well,
I look at it this way (in a simplified manner). Hopefully this will make you understand my point of view more.
- Implicit conversions work only when you are sure you won't lose stuff
- Explicit conversions are for forcing (casting) a variable to become another type, and when you as the user call one explicitly, you are aware you can lose values
I see, and I think you are not alone in this opinion. Let me give you another example in the hope that you reconsider your position (it is up to you what position you take afterwards, of course).
Consider the following in C#:
class A{}
class B : A{}
class C : A{}
A a = new B();
B b = a; // will fail, needs a conversion
C c1 = a; // will fail, needs a conversion
C c2 = (C) a; //will fail at runtime
And now imagine C# were based not on types but on values. Then the following would be perfectly legal as well:
B b = a; // is fine since a is of type B
C c1 = a; // fails since a is not of type C
C c2 = (C) a; // still fails since a is not of type C
Or to illustrate it differently. Imagine you have a shop and your main currency is $. However, you accept € as well as long as they are banknotes. In this case the customer can insert the banknotes in a currency exchange machine at the till.
Now imagine the following four use cases:
- A customer buys something with $ -> everything is fine because that is your main currency.
- Another customer buys in € and pays in banknotes, also no problem at all.
- A third customer wants to pay in € but with coins. Your implicit rule does not hold and you reject the customer.
- Another customer comes by and mentions explicitly that she wants to pay in € with coins. Suddenly you do not care about it any longer and try to squeeze the coins into your machine. Unfortunately it does not work at all and what you get out of the machine is nothing.
To me, that is exactly how PHP behaves today. Just take int instead of $ and string instead of €; banknotes are ints wrapped in strings (e.g. "1") and coins are "a", "b" etc.
I really do not think it is a clever idea to have such a big difference between implicit and explicit conversion. They should behave the same way. I do not care if you rename explicit casts to something different (I proposed oldSchoolIntConversion, but maybe forceIntConversion would be good enough as well).
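The four customers from the shop analogy can be replayed in PHP as a rough sketch. It assumes PHP 7.0 or later in the default (non-strict) mode, i.e. the behaviour that was eventually released rather than anything settled in this thread; pay() and tryToPay() are hypothetical names invented for the illustration.

```php
<?php
// Assumes PHP 7.0 or later in the default (non-strict) mode.
// pay() is a hypothetical function standing in for the till.
function pay(int $dollars): int
{
    return $dollars;
}

// Returns a short description of what happened for a given payment.
function tryToPay($payment): string
{
    try {
        return 'accepted as ' . var_export(pay($payment), true);
    } catch (TypeError $e) {
        return 'rejected';
    }
}

echo tryToPay(5), "\n";     // accepted as 5  ($, the main currency)
echo tryToPay("5"), "\n";   // accepted as 5  (banknote: an int wrapped in a string)
echo tryToPay("a"), "\n";   // rejected       (coin, implicit conversion)
var_dump((int) "a");        // int(0)         (coin, explicit cast: the machine returns nothing)
```

The last two lines show the gap being criticised: the same value is refused by the implicit rule but silently turned into 0 by the explicit cast.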
Sure, the literal meaning in C# and PHP differs a little bit (because of static and dynamic typed language differences and stuff), but the intent is IMHO the same; implicit conversions can happen in the "background" safely, while for "dangerous" conversions, you have to cast by hand. And I see use cases for both of these types of conversions.
C# also has the "as" syntax which is another form of conversion. But it is another form of conversion, uses another syntax and hence can have other rules. PHP could also have different conversion rules but they should be named differently. Using implicit/explicit and having a different behaviour is inconsistent IMO (ok, I said it too often in this mail, I'll stop now)
Also, you are assuming that there will be a "strict" mode; I sincerely hope there won't. Since the introduction of "2 modes", I have always been saying that there should be only one mode - I don't really care whether it would be strict or weak, but just only one.
I do not assume that there will be one but I leave the door open rather than close it (unnecessarily).
Regards
Pavel Kouril
Hi Robert,
So what does that mean for scalar types?
IMO it means that way more important than adding scalar type hints to PHP
7.0 is to agree on a new set of conversion rules for the long run. PHP should
strive to have one consistent set of conversion rules which apply in all places
where implicit or explicit conversion are used.
That's exactly what I mean. I think people should keep in mind, when talking about enabling/disabling a given conversion, that the implicit scope is every explicit or implicit conversion implemented in PHP.
In an ideal world, we would proceed in reverse order. We wouldn't start considering modifications to the ZPP ruleset before having aligned every implicit/explicit conversion existing in PHP on a single ruleset. Unfortunately, if we want to keep a chance of getting STH into 7.0, we cannot do that. So, we will probably evaluate potential BC breaks on ZPP ruleset modifications only, meaning we'll make decisions without a good evaluation of the BC breaks introduced by aligning other PHP conversions on the newly proposed ruleset. So, we'll need to extrapolate from ZPP-only results.
Regards
François
Hi,
For those interested in evaluating the impact of ZPP ruleset modifications on internal and userland code, a pull request is now available:
https://github.com/php/php-src/pull/1110
Please note that this is not a mere implementation of the RFC ruleset, although it comes preconfigured this way. It contains a set of 12 configurable options, each one enabling/disabling a particular ruleset modification. This allows for a much more powerful exploration of potential modifications and BC breaks against the existing codebase. Every combination of individual behaviors is possible, providing a theoretical number of about 3,000 potential rulesets. Of course, a lot of these are not consistent, but it still allows for creative thinking.
Given the time I had to write it, I didn't perform extensive testing. I just ensured that the ruleset described in the RFC and the one you get when activating every possible change both compile and seem to work as expected. I'll test more cases tomorrow. So, code review is a key priority, and every error (compile or runtime) you may get should be reported as fast as possible.
Overall configuration possibilities include and go beyond the STH RFC, with the exception of numeric strings, whose proposed restrictions are not implemented yet, but will be soon.
So, I hope you'll enjoy the new toy. And thoughts are welcome, as usual.
Regards
François
The object at the call site should remain an object (if it's not
passed by reference); however, the called function will receive a string.
It works in PHP-5 and PHP-7. Nothing should be changed.
$ sapi/cli/php -r 'class X {function __toString(){return "abc";}} $x=new X; var_dump(strlen($x)); var_dump($x);'
int(3)
object(X)#1 (0) {
}
However, declare(strict_types=1) will break this. see
https://github.com/ircmaxell/php-src/compare/scalar_type_hints_v5#diff-ef5bf53d1412b50f85d125ca4fe84741R1182
Thanks. Dmitry.
Dmitry, I was talking about passing an object to a function that expects a
reference to a string:
'class X { function __toString() { return spl_object_hash($this); } } $x = new X; function foo(string &$x){} foo($x); var_dump($x);'
What this would produce under Coercive STH is still an open question, I
guess, as there is no implementation yet.
This will change $x because it's passed by reference.
This is the common "problem" for all current proposals, because for
parameters passed by reference we check only "input" types, and don't
guarantee anything about output.
<?php
declare(strict_types=1);
function foo(string &$x) {
    $x = 321;
}
$x = "123";
foo($x);
var_dump($x); // -> int(321)
?>
Thanks. Dmitry.
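The input side of by-reference parameters can be sketched in the same way. This assumes PHP 7.0 or later in the default (non-strict) mode as eventually released, not the Coercive STH proposal, which had no implementation at this point in the thread; takesInt() and takesString() are hypothetical, deliberately empty functions.

```php
<?php
// Assumes PHP 7.0 or later in the default (non-strict) mode.
// takesInt() and takesString() are hypothetical, deliberately empty functions.
function takesInt(int &$n) {}
function takesString(string &$s) {}

class X
{
    public function __toString()
    {
        return "abc";
    }
}

$number = "5";
takesInt($number);
var_dump($number);   // int(5): the input coercion is written back through the reference

$object = new X;
takesString($object);
var_dump($object);   // string(3) "abc": the caller no longer holds the object
```

So even an empty function body changes the caller's variable when the argument is passed by reference and has to be coerced on entry.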