Hi,
Last evening I put together a quick proposal for a weak and strict
checking approach, since I felt that things were being concluded a bit
prematurely. More importantly I detailed the issues I see with a pure
strict type checking only approach.
I am publishing it a bit prematurely IMHO, but it's not without merit
at this stage either, and since I will be busy playing frisbee all
weekend, I thought I'd get it out there for people to comment on right now:
http://wiki.php.net/rfc/typecheckingstrictandweak
As Paul insisted that my initial proposal did not sufficiently
highlight the fact that there are other proposals, I moved my original
proposal to the above location so that we can have a disambiguation
page that lists all the various related proposals:
http://wiki.php.net/rfc/typechecking
Most of what is in there has been said/proposed on the list, so I am
just pasting the key section on why I think strict checking is
dangerous:
Strict type checking does have the advantage that subtle bugs will be
noticed more quickly and that function/method signatures will become
yet more self-documenting and therefore more expressive. Doing
these type checks based on the signature also means less code and
better performance compared to hand-coding the validation.
Proponents of providing only strict type checking say that for the
most part variables are defined with the proper type unless they come
from an outside source, which usually requires validation anyway,
which is a perfect opportunity to type cast.
That is, to define a variable that contains a boolean, a developer will
probably write “$is_foo = true” and not “$is_foo = 0”. While this may be
true, it does mean that developers using such strictly type-checked
APIs now require that their users understand data types, which
beginning developers currently do not necessarily need to.
Furthermore, quite often developers need to parse content out of
strings and pass it to other methods. With strict type checking one
is now forced to type cast explicitly. While it's certainly doable, it's
also additional work that needs to be done while writing the code
(“$foo_int = (int)substr($bar, 3, 10)”). Then again, some might argue
that this makes the code clearer.
It also means that users of such strictly typed APIs will tend to
simply cast and, due to laziness (PHP is used for rapid development
after all), might forgo validating first whether the content is really
what they expected. Without type checking the burden would be on the
developer providing the API. Since it's usually expected that an API is
used fairly often, it seems illogical to move this burden to the API
users. Moreover, because of this, a new kind of bug will be introduced
through overuse of casts instead of the hand-coded parameter validation
that is currently necessary. This could lead to even higher bug rates.
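The cast-without-validation concern can be illustrated with a minimal sketch. The helper name parse_quantity() is hypothetical and not part of any proposal; it only contrasts a bare cast with validating first:

```php
<?php
// A bare cast never fails: non-numeric input silently becomes 0,
// and trailing garbage is silently dropped.
$from_cast = (int) "abc";     // 0
$partial   = (int) "12abc";   // 12

// Hypothetical helper: validate first, then cast.
function parse_quantity($raw)
{
    if (!is_string($raw) || !ctype_digit($raw)) {
        throw new InvalidArgumentException(
            'not an integer string: ' . var_export($raw, true)
        );
    }
    return (int) $raw;
}

$quantity = parse_quantity("42");   // 42
```

A caller who only writes the cast gets a plausible-looking integer either way, which is exactly the kind of bug that is hard to notice later.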
As for outside sources needing validation: this is not always the case,
as most people do trust that the data returned from a database is in
the expected format, even though for most RDBMS it will always be
returned as a string. The same applies to configuration files, which, if
defined in something other than PHP code, will most likely only return
strings, but whose values will usually not be validated.
regards
Lukas Kahwe Smith
mls@pooteeweet.org
> Last evening I put together a quick proposal for a weak and strict checking
> approach, since I felt that things were being concluded a bit prematurely.
> More importantly I detailed the issues I see with a pure strict type
> checking only approach.
Thanks for the effort, I just wanted to add one thing though.
I'm not sure how this would really work with PHP internals, but you
didn't mention arrays at all in your examples. The way I see it, with
the current conversion code, a weak bool type check would take array()
as a valid false value, and while I agree that it's sometimes a useful
behavior in conditionals, in the case of type checks it feels
horribly wrong.
So I hope it's not mentioned just because it's not a problem;
otherwise I think it should be clarified.
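For reference, a short sketch of the behavior in question. The first two lines show PHP's existing conversion rules; passes_weak_bool() is a hypothetical userland check (not the RFC's implementation) that accepts only booleans, 0/1 and '0'/'1' and refuses arrays outright:

```php
<?php
// Existing conversion rules: an empty array converts to false,
// a non-empty one to true.
$empty_as_bool    = (bool) array();    // false
$nonempty_as_bool = (bool) array(0);   // true

// Hypothetical weak bool check that rejects arrays instead of
// falling back on the conversion rules above.
function passes_weak_bool($value)
{
    if (is_array($value)) {
        return false;
    }
    return is_bool($value)
        || $value === 0 || $value === 1
        || $value === '0' || $value === '1';
}
```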
Cheers,
Jordi
Hello Internals.
I'm a userland PHP developer. I read all the proposals about type
hinting and wrote some code with type hinting to see what it
looks like. Well, to my understanding it's a mess if both types of
type hinting are implemented.
Here is an example of a function:
    function calculate_age($birthday_date, $year) {}
How should it be type hinted?
Strictly or softly? It really depends on the case. If you use it
somewhere deep inside your API, then it probably should be strict. But
if not? You could get your params from anywhere: a database (as
strings), POST or GET (again strings); $year can simply be date('Y')
as a string, or just some value from the config as an int. So what we
see is that strict type hinting is just too much here. Weak type
hinting is more appropriate. But do we need it at all? If I get my data
from POST or GET, I'll certainly validate it: trim $year and check it
with is_numeric(); $birthday_date will be checked by preg_match() to be
in a valid date format and then by checkdate() to see if it's a valid
date. That leaves even weak type hinting without any real work to do.
If I get it from the database, I'll just pass it to the function,
because that data is already validated and contains proper values. In
this case strict type hints will give an error for $year, because it's
a string from the database, while soft hints will just convert it to
int, so the conversion would happen earlier than the calculations.
Strict type hints are useless when working with database results. And
on the web we mostly work with database results and data from GET/POST
or other sources like XML, JSON APIs and so on. There the data always
arrives as strings. Do you want that data to be converted to the
appropriate types by userland code, something like this?
    $res = $mysqli->query($sql);
    if ($res && $res->num_rows) {
        while ($row = $res->fetch_assoc()) {
            foreach ($row as $k => $v) {
                if (is_numeric($v)) {
                    $row[$k] = (int)$v;
                }
            }
        }
    }
    // Now we can use strict type hints!
Looks like it's the weak type hinting that will be used in most cases.
The question I have is: what will happen with weak type hints in a case
like this?
    function test(array $data) {}
    test(1);
For me that should be an error, but PHP's type conversion mechanics
would convert that 1 to array(1). The same goes for objects. In the
object or array case most people will probably want a big error so
they can see that there is a bug and fix it. But with strings and
integers/floats I'd prefer just a silent type conversion. But you people
like consistency, and I'd predict you would do int to array conversion
with weak type hinting, the same as string to int/float.
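A small sketch of what that conversion would look like, assuming a weak array hint simply followed the existing cast rules. The userland test() below is a hypothetical stand-in that emulates such a hint and returns the element count for illustration only:

```php
<?php
// The existing cast rules wrap a scalar in a one-element array.
$converted = (array) 1;   // array(0 => 1)

// Hypothetical emulation of a weak "array" hint that follows the
// cast rules instead of raising an error.
function test($data)
{
    $data = (array) $data;
    return count($data);
}

$count = test(1);   // the bug in the caller goes unnoticed
```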
A little bit messy, but I believe you got the point. My opinion is that
(a) there can't be a consistent way for all type conversions, and (b)
strict type hinting is out of the question.
> I am publishing it a bit prematurely IMHO, but it's not without merit
> at this stage either, and since I will be busy playing frisbee all
> weekend, I thought I'd get it out there for people to comment on right now:
> http://wiki.php.net/rfc/typecheckingstrictandweak
Like Ilia mentioned on IRC, there is no reason why the strict type
checking which he proposes necessarily conflicts with the weak type
checking.
So I would much rather have seen an RFC where you try to make a case
for why we need weak type checking, instead of saying that strict type
checking is bad. You might not agree with strict type checking, but
people ask for it, people have been using it, and there are plenty of
reasons why it is useful, whether you see it or not.
So I would propose to:
- have Ilia's strict typing patch (minus scalar and numeric)
- have a patch that also adds the casting type hints from your RFC.
Those could (and should) be considered as two new features.
As for syntax, I believe the following would be best:
    function add_user(string $name, string $phone_number, (int) $age) { .. }
because:
- the casting type hint "(int) $var" is already used for normal casting
- the strict type hint "int $var" uses the form already used for class names
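Since the proposed syntax does not exist yet, the intended difference between the two hint styles can only be emulated in userland. The following is a rough sketch under that assumption: the is_string() checks stand in for the strict "string $var" hints and the cast stands in for "(int) $age". The returned array is purely illustrative.

```php
<?php
// Userland emulation of the two proposed hint styles.
// "string $name" (strict): reject anything that is not already a string.
// "(int) $age" (casting): accept the value and convert it.
function add_user($name, $phone_number, $age)
{
    if (!is_string($name) || !is_string($phone_number)) {
        throw new InvalidArgumentException(
            'name and phone number must be strings'
        );
    }
    $age = (int) $age;

    return array(
        'name'         => $name,
        'phone_number' => $phone_number,
        'age'          => $age,
    );
}

// The age arrives as a string (e.g. from a form) and is cast.
$user = add_user('Ann', '555-0100', '42');
```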
regards,
Derick
--
http://derickrethans.nl | http://ezcomponents.org | http://xdebug.org
twitter: @derickr
> So I would propose to:
> - have Ilia's strict typing patch (minus scalar and numeric)
> - have a patch that also adds the casting type hints from your RFC.
> Those could (and should) be considered as two new features.
> As for syntax, I believe the following would be best:
>     function add_user(string $name, string $phone_number, (int) $age) { .. }
> because:
> - the casting type hint "(int) $var" is used for normal casting already
> - the strict type hint "int $var" is already used for class names
+1
Hi Lukas,
> Last evening I put together a quick proposal for a weak and strict checking
> approach, since I felt that things were being concluded a bit prematurely.
> More importantly I detailed the issues I see with a pure strict type
> checking only approach.
I can't see the difference between your proposal and the conclusion I
reached yesterday?
(which was that there is a near consensus around strict checks by
default, with casts allowed with some syntax).
Thanks,
Paul
--
Paul Biggar
paul.biggar@gmail.com
> Hi Lukas,
>
> On Sat, Jul 4, 2009 at 7:20 AM, Lukas Kahwe Smith <mls@pooteeweet.org> wrote:
>> Last evening I put together a quick proposal for a weak and strict
>> checking approach, since I felt that things were being concluded a bit
>> prematurely. More importantly I detailed the issues I see with a pure
>> strict type checking only approach.
>
> I can't see the difference between your proposal and the conclusion I
> reached yesterday? (which was that there is a near consensus around
> strict checks by default, with casts allowed with some syntax).
Well, to me it sounded like you wanted to rely on standard type
juggling, and what I am proposing is more strict than that. Moreover, I
am not convinced that strict should be the default.
Regards,
Lukas
hi.
> Well, to me it sounded like you wanted to rely on standard type juggling,
> and what I am proposing is more strict than that. Moreover, I am not
> convinced that strict should be the default.
The default is what we have now (implicit mixed). The (type) syntax is
known already, it would be rather confusing to use yet another syntax
in this case.
I like the current proposal with "int foo" for strict and "(int) foo"
for an automatic cast. As Gwynne suggested on IRC, I would also like
to have "mixed", for completeness. It brings nothing but it makes the
code clearer. Another useful type would be "callback".
Cheers,
Pierre
>> I can't see the difference between your proposal and the conclusion I
>> reached yesterday? (which was that there is a near consensus around
>> strict checks by default, with casts allowed with some syntax).
>
> Well, to me it sounded like you wanted to rely on standard type juggling,
> and what I am proposing is more strict than that. Moreover, I am not
> convinced that strict should be the default.
>
> Regards,
> Lukas
Just wanted to note that the weak string check in the RFC is the same as
the strict one, so it appears to be incomplete. It should encompass any
type that can be cast to a string. That includes any scalar, and also
objects with a __toString method.
Also, if this is introduced, it needs to account for mixed type
arguments, such as null/bool/int or null/numeric etc. The RFC doesn't
mention null at all, leaving me to assume null is always allowed, no
matter what.
I'm a supporter of strict typing for languages in general, however, in PHP
in particular, strict checks don't make sense to me, given its current
behavior.
This is a language where summing up an integer and a string can result in a
float. API authors will start assuming strictness from their clients that is
hard to achieve, and in the end result in peppering function/method calls
with explicit type casting on each argument. We do not want that, I suppose.
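A two-line illustration of that behavior under PHP's existing type juggling (the variable names are only for this example):

```php
<?php
// The result type depends on the content of the string operand.
$sum_int   = 1 + "1";     // int(2)
$sum_float = 1 + "1.5";   // float(2.5)
```

An API that strictly demands an int would therefore reject $sum_float even though the caller only ever wrote integer and string literals.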
There is one type of check that can be added right now, however, while this
discussion continues:
    function add_user(scalar $name, scalar $phone_number, scalar $age) {}
I think we can all agree that if we can check for a class and an array, then
checking for a scalar is a natural constraint. It's both strict and weak at
the same time, since it's generic enough not to need that separation. Scalar
here is defined by the same rules as is_scalar(), but also allows null as a
passed value to be consistent with the other check types.
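The rule described above can be sketched in userland as follows. The function name passes_scalar_hint() is hypothetical; it only restates "is_scalar() plus null" as code:

```php
<?php
// Hypothetical userland version of the proposed "scalar" hint:
// the is_scalar() rules (int, float, string, bool), plus null.
function passes_scalar_hint($value)
{
    return $value === null || is_scalar($value);
}
```

Under this rule strings, numbers, booleans and null pass, while arrays and objects do not.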
Regards,
Stan Vassilev
>> I can't see the difference between your proposal and the conclusion I
>> reached yesterday? (which was that there is a near consensus around
>> strict checks by default, with casts allowed with some syntax).
>
> Well, to me it sounded like you wanted to rely on standard type juggling,
> and what I am proposing is more strict than that. Moreover, I am not
> convinced that strict should be the default.
I don't know what you mean by standard type-juggling. Your proposal
really does not outline what you want very much, just what you're
against. As for strictness, if your proposal suggests that strict
typing is the default, I cannot see where.
As I see it, each proposal is a very minor variation on the other. My
proposal had an extra layer, but it wasn't well received, so I've
withdrawn it.
Ilia has a patch now that does what I understand you want, using the
(int) syntax, with strict by default. I may be wrong, but I believe
the only thing left to argue about is strict versus weak by default. I
was originally of the opinion that weak typing should be the default.
However, it had barely any support, whereas there was great support
for strict by default.
Thanks,
Paul
--
Paul Biggar
paul.biggar@gmail.com
> On Sat, Jul 4, 2009 at 7:12 PM, Lukas Kahwe Smith <mls@pooteeweet.org> wrote:
>>> I can't see the difference between your proposal and the conclusion I
>>> reached yesterday? (which was that there is a near consensus around
>>> strict checks by default, with casts allowed with some syntax).
>>
>> Well, to me it sounded like you wanted to rely on standard type
>> juggling, and what I am proposing is more strict than that. Moreover,
>> I am not convinced that strict should be the default.
>
> I don't know what you mean by standard type-juggling. Your proposal
> really does not outline what you want very much, just what you're
> against. As for strictness, if your proposal suggests that strict
> typing is the default, I cannot see where.
I did not specify what doesn't match in the RFC. I will fix that
omission on Monday. I assumed it was clear that I tried to provide
complete examples for what will pass. So passing a string with
anything but 1 or 0 would not pass a bool type check.
The other thing that I wanted to make clear is that a cast should
happen after the weak type check.
Finally, I wanted to clarify my concerns about strict typing, to
ensure that the short syntax is a weak check and that only with
additional characters, like the proposed ! or something like that, can
one get a strict type check.
Regards,
Lukas
>> On Sat, Jul 4, 2009 at 7:12 PM, Lukas Kahwe Smith <mls@pooteeweet.org> wrote:
>>>> I can't see the difference between your proposal and the conclusion I
>>>> reached yesterday? (which was that there is a near consensus around
>>>> strict checks by default, with casts allowed with some syntax).
>>>
>>> Well, to me it sounded like you wanted to rely on standard type
>>> juggling, and what I am proposing is more strict than that. Moreover,
>>> I am not convinced that strict should be the default.
>>
>> I don't know what you mean by standard type-juggling. Your proposal
>> really does not outline what you want very much, just what you're
>> against. As for strictness, if your proposal suggests that strict
>> typing is the default, I cannot see where.
>
> I did not specify what doesn't match in the RFC. I will fix that
> omission on Monday. I assumed it was clear that I tried to provide
> complete examples for what will pass. So passing a string with
> anything but 1 or 0 would not pass a bool type check.
Ok, I have updated the RFC now with a table that shows what I expect
to pass and fail. It's fairly late, so I might have missed something.
In general what I am proposing is more lax than is_*() for the most
part, especially when it comes to checking strings.
I do not understand why it's suddenly so urgent to get this feature
in(*), to the point that people already speak about frustration over
the process after just a few days. We don't need years and usually not
months, but this is not the addition of some function. It's an
extension to the language syntax, so I think it's totally normal that
we talk about this for at least a month, though of course we do not
need a daily exchange of 100 emails about it in this period. Obviously
things can still be refined after the commit, but we should give
everybody a bit of time to let this stuff sink in before we do the
initial commit. Also remember that plenty of people who contribute a
fair bit to PHP internals do not read internals actively every week. So
again, a month between the initial proposal and a commit of the feature
isn't such a bad idea.
regards,
Lukas Kahwe Smith
mls@pooteeweet.org
(*) Even if the patch Ilia proposed doesn't break binary compatibility
anymore, do we really want to start adding such stuff in 5.3.2?
Shouldn't we rather talk about finding a better release process (maybe
building on top of recent developments in the version control world)
that enables us to get x.y releases out more quickly without preventing
bigger features like unicode from ever maturing?
Hi Lukas,
> Ok, I have updated the RFC now with a table that shows what I expect to pass
> and fail. It's fairly late, so I might have missed something. In general what
> I am proposing is more lax than is_*() for the most part, especially when it
> comes to checking strings.
I hope you have missed some things (or that they are typos) because
otherwise a good chunk of this is plain lunacy.
> value             string  float  int   numeric  scalar  bool  array
> 0 (integer)       fail    fail   pass  pass     pass    pass  fail
> 1 (integer)       fail    pass   pass  pass     pass    pass  fail
0 fails conversion to a float, but 1 and 12 succeed?
> 12 (double)       fail    pass   pass  pass     pass    fail  fail
It may seem that this is a good idea (12.0 passing the int check), but
what if 12.0 is OK, but the result of 144.0/12 is not (which might not
be 12.0 due to floating point error)?
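A sketch of this floating point concern, assuming a hypothetical rule in which a float passes the int check only when it has no fractional part. The function name is invented for illustration. Note that 144.0/12 itself happens to be exact in IEEE 754 doubles, so the sketch uses (0.1 + 0.7) * 10, an expression whose stored result is known to fall just short of the mathematical value:

```php
<?php
// Hypothetical rule: a float passes the int check only if it has
// no fractional part.
function float_passes_int_check($value)
{
    return is_float($value) && floor($value) == $value;
}

$literal  = 12.0;
$computed = (0.1 + 0.7) * 10;   // mathematically 8, stored slightly below 8

$literal_ok  = float_passes_int_check($literal);    // true
$computed_ok = float_passes_int_check($computed);   // false
$truncated   = (int) $computed;                     // 7, not 8
```

Whether a value passes would then depend on how it was computed, not on what the caller intended.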
> '0' (string)      pass    fail   fail  pass     pass    pass  fail
> '1' (string)      pass    fail   fail  pass     pass    pass  fail
> '12' (string)     pass    pass   pass  pass     pass    fail  fail
Absolute lunacy. Please let this be a typo.
> '12.0' (string)   pass    pass   pass  pass     pass    fail  fail
> '12.34' (string)  pass    pass   fail  pass     pass    fail  fail
As above.
I think you need to present this information better. One advantage of
Ilia's proposal is that it is very clear. It does two things: strong
type check, or the same casts that currently exist. I think you need
to say what changes you are introducing, and why they are introduced.
The same flaw existed with my proposal: I don't think anyone wants a
3rd set of rules.
> I do not understand why it's suddenly so urgent to get this feature in(*),
> that people already speak about frustration over the process after just a
I think it's because this same issue has been going on for so long, and
we seem to be so very close now. This idea has been punted around in
various forms and patches for years at this stage.
> few days. We don't need years and usually not months, but this is not the
> addition of some function. It's an extension to the language syntax, so I
> think it's totally normal that we talk about this for at least a month.
Well yes. But with near consensus, there is a danger of a 90%
good-enough patch being derailed by new proposals, and I get the
impression most people would be happier with the 90% patch.
> shouldn't we rather talk about finding a better release process (maybe
> build on top of recent developments in the version control world) that
> enables us to more quickly get x.y releases out without preventing bigger
> features like unicode from ever maturing?
I've heard you mention this before. Please roll it into an RFC so we
can think about it (FWIW, the idea that newer version control systems
will somehow change everything makes little sense, so I think a lot of
detail is required here).
Thanks,
Paul
--
Paul Biggar
paul.biggar@gmail.com
> Hi Lukas,
>
> On Mon, Jul 6, 2009 at 11:03 PM, Lukas Kahwe Smith <mls@pooteeweet.org> wrote:
>> Ok, I have updated the RFC now with a table that shows what I expect to
>> pass and fail. It's fairly late, so I might have missed something. In
>> general what I am proposing is more lax than is_*() for the most part,
>> especially when it comes to checking strings.
>
> I hope you have missed some things (or that they are typos) because
> otherwise a good chunk of this is plain lunacy.
Thank you for taking the time to review this. I am feeling kinda
rushed here (I guess Ilia's call for votes proves me right .. and
apparently wrong at the same time).
>> value             string  float  int   numeric  scalar  bool  array
>> 0 (integer)       fail    fail   pass  pass     pass    pass  fail
>> 1 (integer)       fail    pass   pass  pass     pass    pass  fail
>
> 0 fails conversion to a float, but 1 and 12 succeed?
fixed.
>> 12 (double)       fail    pass   pass  pass     pass    fail  fail
>
> It may seem that this is a good idea (12.0 passing the int check), but
> what if 12.0 is OK, but 144.0/12 does not (which might not be 12.0 due
> to floating point error)?
Right .. and the use case of this coming out of a config file is
nonexistent. So that leaves the potentially slowly emerging use case of
this coming out of a database, and in this case people should just use
the numeric check.
>> '0' (string)      pass    fail   fail  pass     pass    pass  fail
>> '1' (string)      pass    fail   fail  pass     pass    pass  fail
I presume you see the '0'/'1' pass as bools as lunacy. I disagree.
>> '12' (string)     pass    pass   pass  pass     pass    fail  fail
>
> Absolute lunacy. Please let this be a typo.
That part was indeed lunacy.
>> '12.0' (string)   pass    pass   pass  pass     pass    fail  fail
>> '12.34' (string)  pass    pass   fail  pass     pass    fail  fail
>
> As above.
Fixed as well.
> I think you need to present this information better. One advantage of
> Ilia's proposal is that it is very clear. It does two things: strong
> type check, or the same casts that currently exist. I think you need
> to say what changes you are introducing, and why they are introduced.
> The same flaw existed with my proposal: I don't think anyone wants a
> 3rd set of rules.
My proposals have a tendency to get this feedback, probably because I
write too much text and because I cannot produce a patch myself.
Given that I have two large sections of text, one explaining the
shortcomings of Ilia's approach and one explaining why I think that weak
type checking solves this, I have taken this proposal as far as I can.
If someone sees merit in it, please suggest improvements.
>> I do not understand why it's suddenly so urgent to get this feature
>> in(*), that people already speak about frustration over the process
>> after just a
>
> I think because this same issue has been going on for so long, and
> seem to be so very close now. This idea has been punted around in
> various forms and patches for years at this stage.
So the solution is to sneak it in during the summer, right after a
major release for which some have even delayed their vacations? Then
again, given the fact that within a few hours this proposal has gotten
five +1s, maybe I am being too paranoid .. well, or Ilia is just doing a
perfect job at orchestrating the masses. Either way we do not have
processes for this .. so anything goes.
>> few days. We don't need years and usually not months, but this is
>> not the addition of some function. It's an extension to the language
>> syntax, so I think it's totally normal that we talk about this for at
>> least a month.
>
> Well yes. But with near consensus, there is a danger of a 90%
> good-enough patch being derailed by new proposals, and I get the
> impression most people would be happier with the 90% patch.
I have actually not felt much of an attempt to derail in the negative
sense. But seeing that Ilia has labeled the "fairly detailed
discussion" as "complaining for the sake of complaining" on his blog,
I think that Ilia might be feeding this paranoia.
>> shouldn't we rather talk about finding a better release process (maybe
>> build on top of recent developments in the version control world) that
>> enables us to more quickly get x.y releases out without preventing
>> bigger features like unicode from ever maturing?
>
> I've heard you mention this before. Please roll it into an RFC so we
> can think about it (FWIW, the idea that newer version control systems
> will somehow change everything makes little sense, so I think a lot of
> detail is required here).
Thanks.
regards,
Lukas Kahwe Smith
mls@pooteeweet.org