A different user perspective on scalar type declarations

10 years ago by Theodore Brown

I am a full-time PHP developer responsible for maintaining several large
enterprise applications, as well as a number of libraries and personal apps.
I have been following the scalar type proposals quite closely, as along with
return type declarations, scalar types have the potential to reduce errors,
simplify API documentation, and improve static code analysis.

I am in favor of Anthony's Scalar Type Declarations RFC, for two simple reasons:

It doesn't change the behavior of existing weak types.

PHP has long had an emphasis on backwards compatibility, and I'm worried that
those not in favor of strict types are treating backwards compatibility more
recklessly than they otherwise would in their fervor to avoid two ways of
handling scalar types. In my experience dealing with large enterprise apps,
however, there are hundreds of places where code relies on GET/POST parameters
being automatically trimmed when passed to a function expecting an integer.
The current coercive proposal would deprecate this and later make it an error.
To avoid these notices/errors when upgrading, developers may take the "easy"
route of casting any input passed to a function expecting an int or float.
This is the same "too strict may lead to too lax" problem pointed out by the
coercive RFC itself. There's a reason that integer handling was actually
relaxed back in PHP 5.1 (see http://php.net/manual/en/migration51.integer-parameters.php).
Why suddenly make the default more strict again?

I am not against tightening up some of the default weak conversions (e.g. to
not allow "99 bugs" for an int type), but in my opinion this should be done
very carefully, and separately from any scalar type declaration proposal.
Major changes to the casting rules have the potential to seriously harm PHP 7
adoption, especially in enterprises with large amounts of legacy code. The
Scalar Type Declarations v0.5 RFC has the advantage here because it "just
works" when type hints are added to existing code in the default weak mode.

Strict types are important in some cases.

When it comes to authentication and financial calculations (a couple of areas
I routinely deal with) it is extremely important that errors are caught and
fixed early in the development process. In financial or security-sensitive
code, I would want any value with the wrong type (even a string like "26")
to be flagged as an error when passed to a function expecting an integer.
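A minimal sketch of the behaviour described above, assuming PHP 7 or later (where declare(strict_types=1) is available): in a strict-mode file, a numeric string such as "26" passed to an int parameter is rejected with a TypeError instead of being converted. chargeCents() is a hypothetical name used only for illustration.

```php
<?php
declare(strict_types=1);

// Hypothetical payment routine that should only ever receive a real int.
function chargeCents(int $amountInCents): string
{
    return "Charging {$amountInCents} cents";
}

echo chargeCents(2600), "\n";          // accepted: an actual int

try {
    echo chargeCents("26"), "\n";      // numeric string, but the wrong type
} catch (TypeError $e) {
    // In strict mode the call is rejected before the function body runs.
    echo "Rejected: ", get_class($e), "\n"; // Rejected: TypeError
}
```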

The option for type-based (rather than value-based) validation is equally
important when it comes to return types. Unless I have missed something, the
"Coercive Types for Function Arguments" RFC currently doesn't deal with return
types at all (they aren't mentioned in the RFC). Would it handle scalar return
types the same way as it does function arguments? If I declare a function to
return an int, and I return a string instead (even if the string is numeric),
there are many cases where it would be an unintentional error. And if it
errors depending on the value, rather than the type, it often wouldn't be
possible to catch the problem statically.

Here's a simple example of the advantage offered by strict types and static
analysis in the Scalar Type Declarations v0.5 RFC:

<?php
declare(strict_types=1);

function getCustomerName(int $customerId): string
{
    // look up customer name from database and return
}

function getInvoiceByCustomer(int $customerId): Invoice
{
    // retrieve invoice data and return object
}

$id = filter_input(INPUT_GET, 'customer_id', FILTER_VALIDATE_INT);

if ($id === false) {
    echo 'Customer ID must be an integer';
} else {
    $customer = getCustomerName($id);
    $invoice = getInvoiceByCustomer($customer);
    // display invoice
}

Strict types + static analysis can tell you that this will fail (because it's
based purely on types, and a string is being passed to a function expecting
an integer). Coercive typing cannot statically tell you that it will fail,
because it doesn't know whether the string passed to getInvoiceByCustomer
is acceptable as an integer without also knowing its value.

To those who are worried that the addition of a strict mode will split the
community into separate camps, I would say "It's too late!" The community has
already been split over this issue for years. Conceptually, the optional
strict mode proposed in Anthony's RFC is not very different from == vs. ===,
or in_array with the $strict argument set to true. And I certainly am
glad that PHP offers these options!
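The in_array analogy can be sketched in a few lines of plain PHP: the loose and strict forms differ only in whether types may be juggled during the comparison. The $codes values are made up for the example.

```php
<?php
// Both values are numeric strings, so loose comparison treats them as numbers.
$codes = ["10", "20"];

// Loose (default): "1e1" == "10" because both are read as the number 10.
var_dump(in_array("1e1", $codes));        // bool(true)

// Strict: the values must be identical, and "1e1" !== "10".
var_dump(in_array("1e1", $codes, true));  // bool(false)
```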

Theodore Brown

10 years ago by francois@php.net

Hi Theodore,

From: Theodore Brown [mailto:theodorejb@outlook.com]

however, there are hundreds of places where code relies on GET/POST parameters
being automatically trimmed when passed to a function expecting an integer.
The current coercive proposal would deprecate this and later make it an error.

Instead of rejecting the whole RFC, you can ask us to keep supporting leading
and trailing blanks in numeric strings. That's something that is really
still under discussion, even among the authors. To be clear, I prefer
allowing leading and trailing blanks (but only blanks), Zeev prefers
restricting it, and I am not sure about Dmitry's preference. So, as you can
see, it is still open!

About BC, I don't think we are being too lax, because developers would have
years to fix a couple of lines in their code. Running WordPress and other
large codebases turned up just 10 or 20 places where developers would have to
change something. So that's really minimal work, and they have years to do it.

To avoid these notices/errors when upgrading, developers may take the "easy"
route of casting any input passed to a function expecting an int or float.

That's possible, but they would do it much less often than they would with
'strict mode'.

This is the same "too strict may lead to too lax" problem pointed out by the
coercive RFC itself. There's a reason that integer handling was actually
relaxed back in PHP 5.1 (see http://php.net/manual/en/migration51.integer-parameters.php).
Why suddenly make the default more strict again?

Agreed. But the decision is still open.

I am not against tightening up some of the default weak conversions (e.g. to
not allow "99 bugs" for an int type), but in my opinion this should be done
very carefully, and separately from any scalar type declaration proposal.
Major changes to the casting rules have the potential to seriously harm PHP 7
adoption, especially in enterprises with large amounts of legacy code. The
Scalar Type Declarations v0.5 RFC has the advantage here because it "just
works" when type hints are added to existing code in the default weak mode.

Our STH does not break anything. Maybe the newly-published version is
clearer. We propose implementing exactly the PHP5 logic and just raising
E_DEPRECATED messages that won't stop execution at all. Ignoring these
messages, there will be absolutely no difference in behavior from PHP 5.

Strict types are important in some cases.

When it comes to authentication and financial calculations (a couple of areas
I routinely deal with) it is extremely important that errors are caught and
fixed early in the development process. In financial or security-sensitive
code, I would want any value with the wrong type (even a string like "26")
to be flagged as an error when passed to a function expecting an integer.

Agreed. That's why, in the future, we may add new 'strict' type hints that
accept nothing but their native type. We didn't add them here because
it would have been too confusing for a first release, and the whole
discussion is already complex enough. But I agree they can have some use.
The difference with the other RFC is that we can add strict types in the
future, but if you choose the other RFC, it's a much more radical decision
because dual mode will remain forever.

The option for type-based (rather than value-based) validation is equally
important when it comes to return types. Unless I have missed something, the
"Coercive Types for Function Arguments" RFC currently doesn't deal with return
types at all (they aren't mentioned in the RFC). Would it handle scalar return
types the same way as it does function arguments?

Maybe it's not clear in the RFC, but return types are handled exactly the
same way.

If I declare a function to return an int, and I return a string instead (even
if the string is numeric), there are many cases where it would be an
unintentional error. And if it errors depending on the value, rather than the
type, it often wouldn't be possible to catch the problem statically.

Right. That's an additional argument in favor of introducing strict scalar
types in the future, as I said above.

Here's a simple example of the advantage offered by strict types and static
analysis in the Scalar Type Declarations v0.5 RFC:

<?php
declare(strict_types=1);

function getCustomerName(int $customerId): string
{
    // look up customer name from database and return
}

function getInvoiceByCustomer(int $customerId): Invoice
{
    // retrieve invoice data and return object
}

$id = filter_input(INPUT_GET, 'customer_id', FILTER_VALIDATE_INT);

if ($id === false) {
    echo 'Customer ID must be an integer';
} else {
    $customer = getCustomerName($id);
    $invoice = getInvoiceByCustomer($customer);
    // display invoice
}

Strict types + static analysis can tell you that this will fail (because it's
based purely on types, and a string is being passed to a function expecting
an integer). Coercive typing cannot statically tell you that it will fail,
because it doesn't know whether the string passed to getInvoiceByCustomer
is acceptable as an integer without also knowing its value.

If strict type hints are defined in the future, and if you use them in your
example, you will get exactly the same result, and static analysis will still
be able to find your bug. The difference is that you will decide to use them
only where you need to make things more strict, function by function,
argument by argument.

To those who are worried that the addition of a strict mode will split the
community into separate camps, I would say "It's too late!" The community has
already been split over this issue for years. Conceptually, the optional
strict mode proposed in Anthony's RFC is not very different from == vs. ===,
or in_array with the $strict argument set to true. And I certainly am
glad that PHP offers these options!

Those choices are just additional options you may or may not use in your
code, not a global switch between two global modes.

And, even if you consider the community is already split, we are very
idealistic and consider that it can be unified again ;)

Thanks for everything. Let me know where I was not clear.

François

10 years ago by Zeev Suraski

-----Original Message-----
From: Theodore Brown [mailto:theodorejb@outlook.com]
Sent: Thursday, February 26, 2015 5:29 PM
To: internals@lists.php.net
Subject: [PHP-DEV] A different user perspective on scalar type declarations

I am a full-time PHP developer responsible for maintaining several large
enterprise applications, as well as a number of libraries and personal apps.
I have been following the scalar type proposals quite closely, as along with
return type declarations, scalar types have the potential to reduce errors,
simplify API documentation, and improve static code analysis.

I am in favor of Anthony's Scalar Type Declarations RFC, for two simple reasons:

It doesn't change the behavior of existing weak types.

PHP has long had an emphasis on backwards compatibility, and I'm worried that
those not in favor of strict types are treating backwards compatibility more
recklessly than they otherwise would in their fervor to avoid two ways of
handling scalar types. In my experience dealing with large enterprise apps,
however, there are hundreds of places where code relies on GET/POST parameters
being automatically trimmed when passed to a function expecting an integer.
The current coercive proposal would deprecate this and later make it an error.
To avoid these notices/errors when upgrading, developers may take the "easy"
route of casting any input passed to a function expecting an int or float.
This is the same "too strict may lead to too lax" problem pointed out by the
coercive RFC itself. There's a reason that integer handling was actually
relaxed back in PHP 5.1 (see http://php.net/manual/en/migration51.integer-parameters.php).
Why suddenly make the default more strict again?

I am not against tightening up some of the default weak conversions (e.g. to
not allow "99 bugs" for an int type), but in my opinion this should be done
very carefully, and separately from any scalar type declaration proposal.
Major changes to the casting rules have the potential to seriously harm PHP 7
adoption, especially in enterprises with large amounts of legacy code. The
Scalar Type Declarations v0.5 RFC has the advantage here because it "just
works" when type hints are added to existing code in the default weak mode.

You may have a point there. As Francois said, he was in favor of allowing
leading and trailing spaces. I'll definitely reconsider. Would love to
hear any additional feedback you may have about the conversion rules!
My goal is to balance the 'Just works' aspect with the strict aspect, and
still be able to put it into one rule-set, because I believe this has some
inherent advantages.

Strict types are important in some cases.

When it comes to authentication and financial calculations (a couple of areas
I routinely deal with) it is extremely important that errors are caught and
fixed early in the development process. In financial or security-sensitive
code, I would want any value with the wrong type (even a string like "26")
to be flagged as an error when passed to a function expecting an integer.

I agree completely; however, use cases like this are a lot less
common than the situations where you do want sensible coercion to be
allowed. Not introducing language constructs to support strict typing
doesn't mean I think it's never useful. I think it's at the level where
it's better to leave it up to (very, very simple) custom code, in the form
of if (!is_int($foo)) errorout();, as opposed to introducing a whole second
mode into the language, with the cognitive burden it brings. When I read
Anthony's comment about the random number generators a couple of days ago:
"I think the case you have to look at here is the target audience. Are you
looking to be all things to all users? Or are you attempting to be an
opinionated tool to help the 99%. Along with password_hash, I think this
random library serves the 99%."
I couldn't help but think the very same could be said about strict type
hints (paraphrasing it myself, "I think the case we have to look at here
is the target audience. Are we looking to be all things to all users? Or
are we attempting to be an opinionated tool to help the 99%. With coercive
types I think we serve the 99%." - whether it's 99% or 95% or 90% is
negotiable - but it doesn't change the takeaway, I think). Now, the same
can't be said when we use weak types. Weak type hints are completely
useless for developers who want strict type hints, as their behavior is
completely off from what they expect, and they'd never use them. But with
the newly proposed coercive type hints - the gap narrows radically. The
most common real world use cases strict campers brought up in the past as
problematic with weak types - are gone. We're still left with some useful
use cases for strict, but not at the level where it makes sense to add
language-level support, especially in the form of dual mode, with all its
downsides.
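The userland alternative suggested above, a check in the form if (!is_int($foo)) errorout();, might look roughly like this sketch. requireInt() and applyDiscount() are hypothetical names, and throwing InvalidArgumentException is just one possible way to "error out", not something either RFC prescribes.

```php
<?php
// No strict_types declaration: this file uses PHP's default (weak) mode.

// Hypothetical helper standing in for "if (!is_int($foo)) errorout();".
function requireInt($value, string $name): int
{
    if (!is_int($value)) {
        throw new InvalidArgumentException(
            sprintf('%s must be an int, %s given', $name, gettype($value))
        );
    }
    return $value;
}

// The parameter is left untyped so the guard, not the engine, does the checking.
function applyDiscount($invoiceId): string
{
    $invoiceId = requireInt($invoiceId, '$invoiceId');
    return "Discount applied to invoice {$invoiceId}";
}

echo applyDiscount(26), "\n";

try {
    applyDiscount("26");
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), "\n"; // $invoiceId must be an int, string given
}
```

The parameter of applyDiscount() is deliberately untyped: with an int declaration in weak mode, "26" would already have been converted to 26 before the guard ran.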

The option for type-based (rather than value-based) validation is equally
important when it comes to return types. Unless I have missed something, the
"Coercive Types for Function Arguments" RFC currently doesn't deal with return
types at all (they aren't mentioned in the RFC). Would it handle scalar return
types the same way as it does function arguments? If I declare a function to
return an int, and I return a string instead (even if the string is numeric),
there are many cases where it would be an unintentional error. And if it
errors depending on the value, rather than the type, it often wouldn't be
possible to catch the problem statically.

We'll update the RFC to explicitly cover return types. Yes, return values
will be validated using the same coercive rules as function arguments -
similarly to how they're dealt with in the v0.5 RFC.

Here's a simple example of the advantage offered by strict types and static
analysis in the Scalar Type Declarations v0.5 RFC:

<?php
declare(strict_types=1);

function getCustomerName(int $customerId): string
{
    // look up customer name from database and return
}

function getInvoiceByCustomer(int $customerId): Invoice
{
    // retrieve invoice data and return object
}

$id = filter_input(INPUT_GET, 'customer_id', FILTER_VALIDATE_INT);

if ($id === false) {
    echo 'Customer ID must be an integer';
} else {
    $customer = getCustomerName($id);
    $invoice = getInvoiceByCustomer($customer);
    // display invoice
}

Strict types + static analysis can tell you that this will fail (because it's
based purely on types, and a string is being passed to a function expecting
an integer). Coercive typing cannot statically tell you that it will fail,
because it doesn't know whether the string passed to getInvoiceByCustomer
is acceptable as an integer without also knowing its value.

Correct. But a static analyzer can tell you it MAY fail, just as easily
as a static analyzer for strict types can tell you it will fail. Now,
which is better is up for debate. Personally I think the latter is
better, or at the very least just as good. If, in fact, the string you're
passing is really a numeric string (which, if I'm reading your code
correctly, it probably is), then in the strict case, seeing the error in
the static analyzer - or at runtime - you're likely to resort to explicit
casting. Explicit casting that may hide data loss if - for whatever
reason - what you get (in some error situation or unexpected flow) ends up
being a non-numeric string. In the coercive case - seeing the warning in
the static analyzer - you're likely to take a look at it and verify that
it's indeed getting the right value, but you'd keep it as-is, and let the
language do a better job at converting the string to an int than an
explicit cast would. This will actually result in more robust code that,
in the unexpected event that a non-numeric string is received in the
future - would reject it, instead of happily accepting it silently.
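The point about explicit casts hiding data loss can be illustrated with a short sketch: an (int) cast never fails, while a validating conversion reports bad input. filter_var() with FILTER_VALIDATE_INT is used here only as an available stand-in for value-based validation; it is not the coercive RFC's own conversion rule set.

```php
<?php
// Explicit casts never fail: they silently turn bad input into some int.
var_dump((int) "42");       // int(42)
var_dump((int) "99 bugs");  // int(99): trailing garbage is dropped
var_dump((int) "abc");      // int(0): the input is lost entirely

// A validating conversion rejects the same bad inputs instead.
var_dump(filter_var("42", FILTER_VALIDATE_INT));       // int(42)
var_dump(filter_var("99 bugs", FILTER_VALIDATE_INT));  // bool(false)
var_dump(filter_var("abc", FILTER_VALIDATE_INT));      // bool(false)
```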

Conceptually, the optional strict mode proposed in Anthony's RFC is not very
different from == vs. ===, or in_array with the $strict argument set to true.
And I certainly am glad that PHP offers these options!

Happy you like it :) But === is very different than strict mode. When we
added it, it allowed you to do something that was just not possible to do
before - and that was actually a perfect fit for a fairly common use case
(being able to differentiate between NULL and false and 0 in return
values, for instance). The same cannot be said about strict type hints.
They can be done easily today (using is_int() and friends), and - with the
presence of coercive type hints - they're not nearly as commonly needed as
===.
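The NULL/false/0 case mentioned above is easy to reproduce: a function can return either a legitimate 0 or false, and only === tells them apart. A minimal sketch using strpos(), with a made-up haystack:

```php
<?php
// strpos() returns the match offset, which can legitimately be 0,
// or false when there is no match at all.
$haystack = "apple pie";

$pos = strpos($haystack, "apple");   // int(0): found at the very start

// Loose comparison: 0 == false is true, so a real match looks like "not found".
if ($pos == false) {
    echo "loose check: not found (wrong)\n";
}

// Strict comparison tells 0 and false apart.
if ($pos !== false) {
    echo "strict check: found at offset {$pos}\n";
}
```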

community into separate camps, I would say "It's too late!" The community
has already been split over this issue for years.

Splitting isn't a binary thing. Of course, there are already lots of
different camps in the PHP community - procedural vs. OO, frameworks vs.
lean, etc. This would add additional fragmentation, as it doesn't
cleanly map onto any of the splits that already exist.

Thanks for the feedback! It took me a while to answer this. I'm
definitely leaning towards accepting leading and trailing whitespace for
numeric strings now :)

Zeev

10 years ago by Dan Ackroyd

From: Theodore Brown [mailto:theodorejb@outlook.com]

Strict types are important in some cases.

I would want any value with the wrong type (even a string like "26")
to be flagged as an error when passed to a function expecting an integer.

I agree completely; however, use cases like this are a lot less
common than the situations where you do want sensible coercion to be
allowed.

That's just not true on medium to large code bases, and if you think
that's true it's possibly an explanation of why you don't see why
people want strict types so much.

In most applications, the part of the code that is exposed to the
outside world and has to convert strings or unknown types into known
types is a very small layer at the outside edge of the application.

The vast majority of code written for non-trivial applications has no
contact with the outside world. Instead it only communicates to other
layers inside the application where the types required are fully
known, and so the parameters passed should already be in the correct
type. And so type coercion is at best unneeded, and usually not
wanted.
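The layering described above can be sketched roughly as follows, assuming PHP 7+ with strict types: untyped input is converted once at the edge, and inner functions only ever receive the declared types. parseCustomerId() and loadCustomerName() are hypothetical names, and the lookup is a stand-in for real data access.

```php
<?php
declare(strict_types=1);

// Outer layer: the only place that deals with untyped input.
function parseCustomerId(array $query): ?int
{
    $id = filter_var($query['customer_id'] ?? null, FILTER_VALIDATE_INT);
    return $id === false ? null : $id;
}

// Inner layer: types are already known, so no coercion is wanted here.
function loadCustomerName(int $customerId): string
{
    // Stand-in for a database lookup.
    return "Customer #{$customerId}";
}

$id = parseCustomerId(['customer_id' => '26']); // e.g. values taken from $_GET
if ($id === null) {
    echo "customer_id must be an integer\n";
} else {
    echo loadCustomerName($id), "\n"; // Customer #26
}
```

Only parseCustomerId() ever looks at strings; everything behind it can rely on strict int parameters.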

I can understand why people might only want to use weak types for
their code base, but for you to continually dismiss people's desire
for strict types after all this has been explained to you multiple
times is very depressing.

cheers
Dan

10 years ago by Mike Willbanks

Hello,

On Thu, Feb 26, 2015 at 12:49 PM, Dan Ackroyd <danack@basereality.com> wrote:

From: Theodore Brown [mailto:theodorejb@outlook.com]

Strict types are important in some cases.

I would want any value with the wrong type (even a string like "26")
to be flagged as an error when passed to a function expecting an integer.

I agree completely; however, use cases like this are a lot less
common than the situations where you do want sensible coercion to be
allowed.

That's just not true on medium to large code bases, and if you think
that's true it's possibly an explanation of why you don't see why
people want strict types so much.

I've worked on several very large and medium sized code bases and I would
prefer sensible coercion to be allowed here. This is likely more a matter
of preference. More information below.

In most applications, the part of the code that is exposed to the
outside world and has to convert strings or unknown types into known
types is a very small layer at the outside edge of the application.

This is true; however, the types that you are receiving back from a
multitude of data sources might be in a mixed format (databases for example
often provide representation back as a string, non-JSON based web services
provide mainly as a string, etc). While I know what my data looks like
and I know I am always going to get a "string" integer back I do not want
to have to type cast this each and every time. Or that I have a boolean
integer representation that is in a string... You get the point. Sure, I
could certainly go in and take 5 minutes and cast each one but I'm not
certain what the purpose is... It specifically changes the
determination that PHP is a weakly typed language and all of a sudden I now
need to care that my string integer boolean is not actually a boolean.
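The database case above corresponds to PHP's default weak mode. This sketch assumes a row in which every column arrives as a string, as many drivers can return it; the $row contents and describeUser() are hypothetical.

```php
<?php
// No declare(strict_types=1): calls from this file use PHP's default weak mode.

function describeUser(int $id, bool $isActive): string
{
    return "User {$id} is " . ($isActive ? 'active' : 'inactive');
}

// Hypothetical row: numeric and boolean columns both arrive as strings.
$row = ['id' => '42', 'is_active' => '1'];

// Weak mode converts '42' to int(42) and '1' to bool(true) at the call
// boundary, so no explicit casts are needed.
echo describeUser($row['id'], $row['is_active']), "\n"; // User 42 is active
```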

The vast majority of code written for non-trivial applications has no
contact with the outside world. Instead it only communicates to other
layers inside the application where the types required are fully
known, and so the parameters passed should already be in the correct
type. And so type coercion is at best unneeded, and usually not
wanted.

Yes, we're talking about a service-oriented architecture, service buses, etc.
Even then, between multiple layers of the onion you might have things that
are going to [a]synchronous message queues, coming from extensions, going
against a file system, etc. A non-trivial application has to go against
all of these things and the types might be different for each one of them.
For instance, PHP's integer does not handle a big integer from a database
once it exceeds a certain number of digits, you have to speak binary to a
screen but you have an integer representation and then you might even need
to convert to an octal string at some points. Overall, what I am saying
here is that it is a mixed bag of tricks and each developer happens to have
their preference to how the type system should and should not work.

I can understand why people might only want to use weak types for
their code base, but for you to continually dismiss people's desire
for strict types after all this has been explained to you multiple
times is very depressing.

I am not sure that Zeev is dismissing it so much as that he does not agree
with it and therefore he is doing his best to find an alternative that
remains within his vision of the PHP landscape. That is why we have
multiple options on the table at this point.

Regards,

Mike

10 years ago by Anthony Ferrara

Mike,

One point of clarification:

This is true; however, the types that you are receiving back from a
multitude of data sources might be in a mixed format (databases for example
often provide representation back as a string, non-JSON based web services
provide mainly as a string, etc). While I know what my data looks like
and I know I am always going to get a "string" integer back I do not want
to have to type cast this each and every time. Or that I have a boolean
integer representation that is in a string... You get the point. Sure, I
could certainly go in and take 5 minutes and cast each one but I'm not
certain what the purpose is... It specifically changes the
determination that PHP is a weakly typed language and all of a sudden I now
need to care that my string integer boolean is not actually a boolean.

It's funny that you bring up boolean...

With the current coercive proposal, you will still need to worry about
the types: https://wiki.php.net/rfc/coercive_sth#coercion_rules

Passing boolean(false) where an integer is expected will generate an
error. This is a common practice, specifically around internal
functions. Example:
https://github.com/sebastianbergmann/phpunit/blob/a4e23a10d4eeea5fd9fe8916859a07430b94cf42/src/Util/ErrorHandler.php#L58

So yes, you'll still need to go in and cast each one in both RFCs
(or handle the errors properly).
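The boolean(false) case can be reproduced in a few lines. In PHP's default weak mode, false passed to an int parameter is silently converted to 0; under the coercion rules linked above it would be flagged, and a strict-mode file would throw a TypeError. toCents() and the false-returning "lookup" are hypothetical.

```php
<?php
// No declare(strict_types=1): calls from this file use PHP's default weak mode.

// Hypothetical function with an int parameter.
function toCents(int $amount): int
{
    return $amount * 100;
}

// Hypothetical lookup result: false standing in for "nothing found".
$quantity = false;

// Weak mode accepts the bool and converts false to int(0) at the call boundary.
var_dump(toCents($quantity)); // int(0)
var_dump(toCents(3));         // int(300)
```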

The difference is with the dual-mode RFC you can choose not to have to
cast and keep everything as-is today (or more specifically, you need
to explicitly choose strict mode). And you can have user-land behave
identically to internals in both cases.

Anthony

10 years ago by Zeev Suraski

-----Original Message-----
From: Anthony Ferrara [mailto:ircmaxell@gmail.com]
Sent: Thursday, February 26, 2015 9:29 PM
To: Mike Willbanks
Cc: Dan Ackroyd; Zeev Suraski; Theodore Brown; internals@lists.php.net
Subject: Re: [PHP-DEV] A different user perspective on scalar type declarations

Mike,

One point of clarification:

This is true; however, the types that you are receiving back from a
multitude of data sources might be in a mixed format (databases for example
often provide representation back as a string, non-JSON based web services
provide mainly as a string, etc). While I know what my data looks like
and I know I am always going to get a "string" integer back I do not want
to have to type cast this each and every time. Or that I have a boolean
integer representation that is in a string... You get the point. Sure, I
could certainly go in and take 5 minutes and cast each one but I'm not
certain what the purpose is... It specifically changes the
determination that PHP is a weakly typed language and all of a sudden I now
need to care that my string integer boolean is not actually a boolean.

It's funny that you bring up boolean...

With the current coercive proposal, you will still need to worry about the
types: https://wiki.php.net/rfc/coercive_sth#coercion_rules

That's true, but a lot, lot less.

Passing boolean(false) where an integer is expected will generate an
error. This is a common practice, specifically around internal
functions. Example:
https://github.com/sebastianbergmann/phpunit/blob/a4e23a10d4eeea5fd9fe8916859a07430b94cf42/src/Util/ErrorHandler.php#L58

It's actually not nearly as common as one might think, and arguably it
may be hiding a bug, or at least some misunderstanding about the semantics of
the argument (not always, but sometimes). Warning about it (as deprecated)
makes perfect sense. I'm going to send some data from real world apps later
today or early tomorrow. Spoiler - not a lot of breakage at all, and
coercive rules seem to have a remarkably good signal to noise ratio.

The difference is with the dual-mode RFC you can choose not to have to cast
and keep everything as-is today (or more specifically, you need to explicitly
choose strict mode). And you can have user-land behave identically to
internals in both cases.

Put another way, coercive gets to keep a single, sensible conversion
rule-set in PHP - with relatively minor updates needed over the course of
several years. And contrary to what might be implied here, it would require
a LOT less casting - while still taking advantage of much better data
sanitation.

Zeev

10 years ago by Anthony Ferrara

Zeev,

With the current coercive proposal, you will still need to worry about the
types: https://wiki.php.net/rfc/coercive_sth#coercion_rules

That's true, but a lot, lot less.

We apparently have a different definition of "less". Your proposal
requires you to worry about every type in every line of code that ever
existed. Yes, there are fewer dangerous type change errors, but you
need to look at every line of your application to find them.

In my dual-mode proposal, the only place you need to worry about this
is code that's explicitly opted-in to the rules via a per-file switch.
So by default, nobody gets any change. If you opt-in however, then you
get the full rules.
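The per-file switch can be seen with two files. This sketch assumes PHP 7+ and writes a small strict-mode library to a temporary file purely for demonstration; takesInt() and callFromStrictFile() are hypothetical names. The mode that applies to a call is the mode of the file the call is made from, not the file where the function is defined.

```php
<?php
// This file has no strict_types declaration, so calls made from it use weak mode.

// Hypothetical strict-mode library, written to a temporary file for the demo.
$libSource = <<<'PHP'
<?php
declare(strict_types=1);

function takesInt(int $n): int
{
    return $n;
}

// Calls made from inside this file are checked strictly.
function callFromStrictFile($value): int
{
    return takesInt($value);
}
PHP;

$libFile = tempnam(sys_get_temp_dir(), 'sth');
file_put_contents($libFile, $libSource);
require_once $libFile;
unlink($libFile); // the functions stay defined after the file is removed

// Called from this weak-mode file: "5" is converted to int(5).
var_dump(takesInt("5")); // int(5)

// Same argument, but the call to takesInt() now happens in the strict file.
try {
    callFromStrictFile("5");
} catch (TypeError $e) {
    echo "TypeError: the call was made from a strict_types=1 file\n";
}
```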

Passing boolean(false) where an integer is expected will generate an
error. This is a common practice, specifically around internal
functions. Example:
https://github.com/sebastianbergmann/phpunit/blob/a4e23a10d4eeea5fd9fe8916859a07430b94cf42/src/Util/ErrorHandler.php#L58

It's actually not nearly as common as one might think, and arguably it
may be hiding a bug, or at least some misunderstanding about the semantics of
the argument (not always, but sometimes). Warning about it (as deprecated)
makes perfect sense. I'm going to send some data from real world apps later
today or early tomorrow. Spoiler - not a lot of breakage at all, and
coercive rules seem to have a remarkably good signal to noise ratio.

If Symfony or PHPUnit didn't error in these cases in more than one
place, I'd be inclined to agree with you (about not being as common as
one might think). But considering two of the best architected and
tested applications in the ecosystem both error in non-trivial
amounts, I think it's fair to say...

The difference is with the dual-mode RFC you can choose not to have to cast
and keep everything as-is today (or more specifically, you need to explicitly
choose strict mode). And you can have user-land behave identically to
internals in both cases.

Put another way, coercive gets to keep a single, sensible conversion
rule-set in PHP - with relatively minor updates needed over the course of
several years. And contrary to what might be implied here, it would require
a LOT less casting - while still taking advantage of much better data
sanitation.

I think you're REALLY downplaying the level of effort that's required
for the updates you're requiring users make. And that scares me.

Anthony

10 years ago by Zeev Suraski

-----Original Message-----
From: Anthony Ferrara [mailto:ircmaxell@gmail.com]
Sent: Thursday, February 26, 2015 9:54 PM
To: Zeev Suraski
Cc: Mike Willbanks; Dan Ackroyd; Theodore Brown; internals@lists.php.net
Subject: Re: [PHP-DEV] A different user perspective on scalar type declarations

Zeev,

With the current coercive proposal, you will still need to worry about the
types: https://wiki.php.net/rfc/coercive_sth#coercion_rules

That's true, but a lot, lot less.

We apparently have a different definition of "less". Your proposal requires
you to worry about every type in every line of code that ever existed. Yes,
there are fewer dangerous type change errors, but you need to look at every
line of your application to find them.

Realistically, that's not how it's going to be done, but rather through
testing (either running a test suite, manual tests, or even running it in
production and seeing deprecation warnings in the log, for which you'd have
years to fix). I don't see application audits being invoked for this, not
by a long shot.

When I say you need to worry about them a lot less - I mean that you can get
90%+ of the benefits of strict mode, for ALL of your code, with a tiny
fraction of the hassle.
From past posts, it's very clear you believe that large projects would
gradually migrate to being strict across the board. If we compared the
efforts involved in both cases, clearly, the amount of effort with coercive
is a lot smaller - as you let the language do most of the work for you. In
the few cases where it might be overzealous - it'll alert you to it, and you
can easily fix it. In the strict case, yes, you would pretty much have to
audit your entire app. Yes, you can do it gradually file by file over the
course of a few years - but the combined cost is a lot higher, there would
be a lot more code changes, and the resulting code would - in all
likelihood - be hiding more bugs due to a lot more explicit casting.

Passing boolean(false) where an integer is expected will generate an
error. This is a common practice, specifically around internal functions.
Example:

https://github.com/sebastianbergmann/phpunit/blob/a4e23a10d4eeea5fd9fe8916859a07430b94cf42/src/Util/ErrorHandler.php#L58

It's actually not nearly as common as one might think, and arguably - it
may be hiding a bug, or at least some misunderstanding of the semantics of
the argument (not always, but sometimes). Warning about it (as deprecated)
makes perfect sense. I'm going to send some data from real world apps later
today or early tomorrow. Spoiler - not a lot of breakage at all, and the
coercive rules seem to have a remarkably good signal-to-noise ratio.

If Symfony or PHPUnit didn't error in these cases in more than one place,
I'd be inclined to agree with you (about not being as common as one might
think). But considering two of the best architected and tested applications
in the ecosystem both error in non-trivial amounts, I think it's fair to
say...

I think that once we see some more conclusive test results from these
projects, that go slightly deeper than just running the code and seeing
deprecation warnings - a lot more people would be inclined to agree.
Perhaps even you :)

What constitutes major breakage in your mind, for a project as large as
PHPUnit, that's acceptable to fix over the course of several years? 20
issues? 100? 200? 500? More?
What if many of these issues are actually pointing out potential real world
problems, and changing them would result in better, more robust code?

The difference is that with the dual-mode RFC you can choose not to have
to cast and keep everything as-is today (or more specifically, you
need to explicitly choose strict mode). And you can have user-land
behave identically to internals in both cases.

Put another way, coercive gets to keep a single, sensible conversion
rule-set in PHP - with relatively minor updates needed over the course
of several years. And contrary to what might be implied here, it
would require a LOT less casting - while still taking advantage of
much better data sanitization.

I think you're REALLY downplaying the level of effort that's required for
the updates you're requiring users to make. And that scares me.

Given that I actually ran Magento, Drupal, WordPress and both the Symfony
and ZF2 skeleton apps with the new coercive ruleset, I already know
there's no reason at all to be scared. More on that later today or early
tomorrow - running more tests.

Thanks,

Zeev

10 years ago by Anthony Ferrara

Zeev,

We apparently have a different definition of "less". Your proposal requires
you to worry about every type in every line of code that ever existed. Yes,
there are fewer dangerous type change errors, but you need to look at every
line of your application to find them.

Realistically, that's not how it's going to be done, but rather through
testing (either running a test suite, manual tests, or even running it in
production and seeing deprecation warnings in the log, for which you'd have
years to fix). I don't see application audits being invoked for this, not
by a long shot.

When I say you need to worry about them a lot less - I mean that you can get
90%+ of the benefits of strict mode, for ALL of your code, with a tiny
fraction of the hassle.
From past posts, it's very clear you believe that large projects would
gradually migrate to being strict across the board. If we compared the

Well then, you've heard things I've never said.

I've said, time and time again, that weak and strict should live side
by side in the same application. That you'd keep the outside code
weak, and the critical parts will be strict. That's why dual mode is
good. Not because it gives a "migration path", but because it lets people
choose what's appropriate for their needs.

So no, I don't believe that large projects would or should migrate to
be strict across the board. If I did, I wouldn't make it per-file.

If Symfony or PHPUnit didn't error in these cases in more than one place,
I'd be inclined to agree with you (about not being as common as one might
think). But considering two of the best architected and tested applications
in the ecosystem both error in non-trivial amounts, I think it's fair to
say...

I think that once we see some more conclusive test results from these
projects, that go slightly deeper than just running the code and seeing
deprecation warnings - a lot more people would be inclined to agree.
Perhaps even you :)

What constitutes major breakage in your mind, for a project as large as
PHPUnit, that's acceptable to fix over the course of several years? 20
issues? 100? 200? 500? More?
What if many of these issues are actually pointing out potential real world
problems, and changing them would result in better, more robust code?

The difference is that with the dual-mode RFC you can choose not to have
to cast and keep everything as-is today (or more specifically, you
need to explicitly choose strict mode). And you can have user-land
behave identically to internals in both cases.

Put another way, coercive gets to keep a single, sensible conversion
rule-set in PHP - with relatively minor updates needed over the course
of several years. And contrary to what might be implied here, it
would require a LOT less casting - while still taking advantage of
much better data sanitization.

I think you're REALLY downplaying the level of effort that's required for
the updates you're requiring users to make. And that scares me.

Given that I actually ran Magento, Drupal, WordPress and both the Symfony
and ZF2 skeleton apps with the new coercive ruleset, I already know
there's no reason at all to be scared. More on that later today or early
tomorrow - running more tests.

They run without triggering E_DEPRECATED errors? Because that means
they will break with 8 (which by your own words is closer to 2-3 years
out than 9-10).

Anthony

10 years ago by Zeev Suraski

-----Original Message-----
From: Anthony Ferrara [mailto:ircmaxell@gmail.com]
Sent: Thursday, February 26, 2015 10:24 PM
To: Zeev Suraski
Cc: internals@lists.php.net
Subject: Re: [PHP-DEV] A different user perspective on scalar type declarations

Zeev,

When I say you need to worry about them a lot less - I mean that you
can get 90%+ of the benefits of strict mode, for ALL of your code,
with a tiny fraction of the hassle.
From past posts, it's very clear you believe that large projects would
gradually migrate to being strict across the board. If we compared the

Well then, you've heard things I've never said.

I'm referring to this:

A few hundred LOC script would likely never enable strict mode, and would
be just fine because of it (you can mentally keep a few hundred LOC in
your head at one time).

The larger the project, the more the contributors, the more the benefits
to using strict mode. That's not to say that large projects would
immediately go full strict. It's just pointing out that the tradeoffs
would need to be weighed by the authors.

The way I read it, "not immediately going full strict" implied they'd
gradually go full strict, as opposed to not at all (in which case saying
"That's not to say that large projects will necessarily want to go full
strict" would perhaps have been more appropriate). Apologies if I
misunderstood you.

Given that I actually ran Magento, Drupal, WordPress and both the
Symfony and ZF2 skeleton apps with the new coercive ruleset, I already
know there's no reason at all to be scared. More on that
later today or early tomorrow - running more tests.

They run without triggering E_DEPRECATED errors? Because that means they
will break with 8 (which by your own words is closer to 2-3 years out than
9-10).

Not without, but at least in my testing so far - the things that are picked
up by the coercive rule-set are quite precise and have an excellent
signal-to-noise ratio. We're still working on the patch, so I don't want to provide
the results just yet because they might be bogus, but I'll share them as
soon as they're stable.

Thanks,

Zeev

10 years ago by francois@php.net

From: Anthony Ferrara [mailto:ircmaxell@gmail.com]

They run without triggering E_DEPRECATED errors? Because that means
they will break with 8 (which by your own words is closer to 2-3 years
out than 9-10).

Absolutely no date is planned to switch E_DEPRECATED to E_RECOVERABLE_ERROR. It must be clear for everyone that we don't have to hurry there. My personal opinion is that it cannot be before at least 5 years, and maybe more. If I had to announce a major version, I would say 9, not 8. And no decision will be made before we have a large consensus about it. Once again, there will be no good reason to hurry for it.

Zeev, can't we put a sentence in the RFC stating that the E_DEPRECATED stage cannot be ended before, say, at least 5 years? Once that statement is voted on with the RFC, users are protected.

Regards

François

10 years ago by francois@php.net

The RFC is now updated to state that changing E_DEPRECATED to a fatal error cannot, in any way, happen until at least 5 years after the first stable PHP release containing the STH feature.

-----Original Message-----
From: François Laupretre [mailto:francois@php.net]
Sent: Friday, February 27, 2015 2:41 AM
To: 'Anthony Ferrara'; 'Zeev Suraski'
Cc: internals@lists.php.net
Subject: RE: [PHP-DEV] A different user perspective on scalar type declarations

10 years ago by Mike Willbanks

Anthony,

On Thu, Feb 26, 2015 at 1:29 PM, Anthony Ferrara ircmaxell@gmail.com wrote:

Mike,

One point of clarification:

This is true; however, the types that you are receiving back from a
multitude of data sources might be in a mixed format (databases, for
example, often return values as strings, non-JSON web services mainly
provide strings, etc). While I know what my data looks like and I know I
am always going to get a "string" integer back, I do not want to have to
type cast this each and every time. Or that I have a boolean integer
representation that is in a string... You get the point. Sure, I could
certainly go in and take 5 minutes and cast each one, but I'm not certain
what the purpose is... It specifically changes the notion that PHP is a
weakly typed language, and all of a sudden I now need to care that my
string integer boolean is not actually a boolean.

It's funny that you bring up boolean...

With the current coercive proposal, you will still need to worry about
the types: https://wiki.php.net/rfc/coercive_sth#coercion_rules

For some unknown reason, in my head I was going: OK, I have a string
integer, so that would become an integer and then a boolean.
Thank you for pointing out my obvious miss there :)

Passing boolean(false) where an integer is expected will generate an
error. This is a common practice, specifically around internal
functions. Example:

https://github.com/sebastianbergmann/phpunit/blob/a4e23a10d4eeea5fd9fe8916859a07430b94cf42/src/Util/ErrorHandler.php#L58

So yes, you'll still need to go in and cast each one in both RFCs
(or handle the errors properly).

This is certainly a common case; quite often, for database purposes, we
need to handle boolean-to-integer conversions where the integer comes back
as a string (depending on the extension, of course, and on the field type).

The difference is that with the dual-mode RFC you can choose not to have to
cast and keep everything as-is today (or more specifically, you need
to explicitly choose strict mode). And you can have user-land behave
identically to internals in both cases.

Anthony

Mike

10 years ago by Zeev Suraski

-----Original Message-----
From: Mike Willbanks [mailto:pencap@gmail.com]
Sent: Thursday, February 26, 2015 9:46 PM
To: Anthony Ferrara
Cc: Dan Ackroyd; Zeev Suraski; Theodore Brown; internals@lists.php.net
Subject: Re: [PHP-DEV] A different user perspective on scalar type declarations

Anthony,

On Thu, Feb 26, 2015 at 1:29 PM, Anthony Ferrara ircmaxell@gmail.com wrote:

Mike,

One point of clarification:

This is true; however, the types that you are receiving back from a
multitude of data sources might be in a mixed format (databases, for
example, often return values as strings, non-JSON web services mainly
provide strings, etc). While I know what my data looks like and I know I
am always going to get a "string" integer back, I do not want to have to
type cast this each and every time. Or that I have a boolean integer
representation that is in a string... You get the point. Sure, I could
certainly go in and take 5 minutes and cast each one, but I'm not certain
what the purpose is... It specifically changes the notion that PHP is a
weakly typed language, and all of a sudden I now need to care that my
string integer boolean is not actually a boolean.

It's funny that you bring up boolean...

With the current coercive proposal, you will still need to worry about
the types: https://wiki.php.net/rfc/coercive_sth#coercion_rules

For some unknown reason, in my head I was going: OK, I have a string
integer, so that would become an integer and then a boolean.
Thank you for pointing out my obvious miss there :)

Passing boolean(false) where an integer is expected will generate an
error. This is a common practice, specifically around internal
functions. Example:
https://github.com/sebastianbergmann/phpunit/blob/a4e23a10d4eeea5fd9fe8916859a07430b94cf42/src/Util/ErrorHandler.php#L58

So yes, you'll still need to go in and cast each one in both RFCs
(or handle the errors properly).

This is certainly a common case; quite often, for database purposes, we
need to handle boolean-to-integer conversions where the integer comes back
as a string (depending on the extension, of course, and on the field type).

Can you explain that in a bit more detail? What's the data flow exactly, in
both directions?

Thanks!

Zeev

10 years ago by Mike Willbanks

Zeev,

Can you explain that in a bit more detail? What's the data flow exactly, in
both directions?

Here is the most basic example and something that people are going to often
run into. You see this type of code with hydrators, mappers, etc.
Ultimately the end result is going to be the same:

https://gist.github.com/mwillbanks/04e3be68f737c25984ab

I'm not certain there is a need to explain it in more detail, but a string
"1" should work as a bool just as a string "0" does. For instance, today we
have the following for the strings "0" and "1":

$bool = "0";
var_dump($bool); // string(1) "0"
var_dump($bool == false); // true
var_dump($bool == true); // false
var_dump($bool == 0); // true
var_dump($bool == 1); // false

10 years ago by Zeev Suraski

-----Original Message-----
From: Mike Willbanks [mailto:pencap@gmail.com]
Sent: Thursday, February 26, 2015 10:43 PM
To: Zeev Suraski
Cc: PHP Internals
Subject: Re: [PHP-DEV] A different user perspective on scalar type declarations

Here is the most basic example and something that people are going to often
run into. You see this type of code with hydrators, mappers, etc. Ultimately
the end result is going to be the same:

https://gist.github.com/mwillbanks/04e3be68f737c25984ab

I'm not certain there is a need to explain it in more detail, but a string
"1" should work as a bool just as a string "0" does. For instance, today we
have the following for the strings "0" and "1":

$bool = "0";
var_dump($bool); // string(1) "0"
var_dump($bool == false); // true
var_dump($bool == true); // false
var_dump($bool == 0); // true
var_dump($bool == 1); // false

OK, so essentially you're saying that you expect "1" and "0" to be coerced
into booleans. This is something we've been wondering about in the Coercive
RFC, and in the original version we allowed all scalars to be coerced into
bool - but not the other way around. Right now the RFC only allows for
integer->bool coercion, but the database use case seems to be a pretty
strong one. The options we considered were int only, int+string or none at
all. Float is the one that really stands out as pretty meaningless.
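For comparison, here is a minimal sketch of how a bool parameter treats such database strings in weak (non-strict) mode in PHP 7 and later, where the conversion follows the same rules as a (bool) cast. This shows later PHP behavior, not either proposal under discussion; setActiveFlag() is a hypothetical setter of the kind a hydrator might call.

```php
<?php
// Weak (non-strict) mode: this file has no declare(strict_types=1).
// setActiveFlag() is a hypothetical setter, e.g. one called by a hydrator
// with a flag value that a database driver returned as a string.
function setActiveFlag(bool $active): string
{
    return $active ? 'active' : 'inactive';
}

// Strings are converted with the same rules as a (bool) cast:
// "" and "0" become false, any other string becomes true.
$fromDb = ['1', '0', ''];
$labels = array_map('setActiveFlag', $fromDb);

// "0.0" is a non-empty string other than "0", so it converts to true.
$edgeCase = setActiveFlag('0.0');
```

Only "" and "0" become false here, so a string such as "0.0" still ends up true.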

I think the opposite side is a lot trickier - converting from bool into
string (or any other scalar type for that matter) is quite likely to hide
bugs. We've found a bunch of bugs like that today, where return values of
strpos() are fed without validation into other APIs that expect an offset,
and similar use cases. Such code patterns are bound to be hiding bugs, at
least in some cases. I'm guessing that direction is less of an issue in
your mind, correct?
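A minimal sketch of the strpos() pattern described above, with hypothetical helper names. strpos() returns false rather than an int when the needle is absent; in weak mode that false is silently converted to offset 0, while in a file with declare(strict_types=1) the same substr() call would throw a TypeError.

```php
<?php
// Hypothetical helpers; strpos() returns false (not an int) when ':' is absent.

// Unchecked: false is silently converted to offset 0 in weak mode,
// so a missing ':' returns the whole string instead of signalling a problem.
function tailFromColonUnchecked(string $s): string
{
    return substr($s, strpos($s, ':'));
}

// Checked: handle the "not found" case explicitly before using the offset.
function tailFromColonChecked(string $s): ?string
{
    $pos = strpos($s, ':');
    if ($pos === false) {
        return null;
    }
    return substr($s, $pos);
}
```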

Zeev

10 years ago by Mike Willbanks

OK, so essentially you're saying that you expect "1" and "0" to be coerced
into booleans. This is something we've been wondering about in the Coercive
RFC, and in the original version we allowed all scalars to be coerced into
bool - but not the other way around. Right now the RFC only allows for
integer->bool coercion, but the database use case seems to be a pretty
strong one. The options we considered were int only, int+string or none at
all. Float is the one that really stands out as pretty meaningless.

Yes, the database use case and exterior data has been my main concern over
the type hint proposals. Now, this could also be changed (fixed, etc) on a
different layer (aka database extensions to deal with native types) but
that is likely far more to bite off than one would want at this point. It
is relatively painless to go in and cast all of those types, but the amount
of code out there where people are going to just 'expect' this to work will
be fairly large, and it is one of those cases that could cause migration
concerns.

I think the opposite side is a lot trickier - converting from bool into
string (or any other scalar type for that matter) is quite likely to hide
bugs. We've found a bunch of bugs like that today, where return values of
strpos() are fed without validation into other APIs that expect an offset,
and similar use cases. Such code patterns are bound to be hiding bugs, at
least in some cases. I'm guessing that direction is less of an issue in
your mind, correct?

Yes, that direction is less of an issue.

10 years ago by Zeev Suraski

Yes, the database use case and exterior data has been my main concern over
the type hint proposals. Now, this could also be changed (fixed, etc) on a
different layer (aka database extensions to deal with native types) but
that is likely far more to bite off than one would want at this point. It
is relatively painless to go in and cast all of those types, but the amount
of code out there where people are going to just 'expect' this to work will
be fairly large, and it is one of those cases that could cause migration
concerns.

Thanks a lot for the input! We'll reconsider accepting "1"/"0" as valid
Booleans as the original proposal did.

Zeev

10 years ago by Lester Caine

Yes, the database use case and exterior data has been my main concern over
the type hint proposals. Now, this could also be changed (fixed, etc) on a
different layer (aka database extensions to deal with native types) but
that is likely far more to bite off than one would want at this point. It
is relatively painless to go in and cast all of those types, but the amount
of code out there where people are going to just 'expect' this to work will
be fairly large, and it is one of those cases that could cause migration
concerns.

Thanks a lot for the input! We'll reconsider accepting "1"/"0" as valid
Booleans as the original proposal did.

Using a 'char' or other binary field type as multiple boolean flags also
gives values that resolve to 1 and 0 when pulled apart. The debate from the
other side is whether there is a need for a 'boolean' field type.
http://www.firebirdmanual.com/firebird/en/firebird-manual/2/simulating-boolean-in-firebird/51
and http://www.firebirdfaq.org/faq12/ show some options used to get
around the various input problems. So, like PHP, there is no agreement on
what BOOL is.

FB3.0 is still in development, but adds a bool field for which IS_TRUE
and IS_FALSE are not a comfortable fit because for any database a field
can have a value or be null (not set) ... this therefore requires using
a zval other than IS_TRUE/IS_FALSE to store a boolean value properly!

--
Lester Caine - G8HFL

Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk

10 years ago by francois@php.net

From: Lester Caine [mailto:lester@lsces.co.uk]

FB3.0 is still in development, but adds a bool field for which IS_TRUE
and IS_FALSE are not a comfortable fit because for any database a field
can have a value or be null (not set) ... this therefore requires using
a zval other than IS_TRUE/IS_FALSE to store a boolean value properly!

Couldn't you consider converting to 'boolean or null', giving three possible values?
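One way to read the 'boolean or null' suggestion, sketched in PHP 8 (match and throw expressions need 8.0): a mapper returns ?bool so that a NULL column stays distinct from false. The function name and the assumption that the driver delivers the strings "1"/"0" or null are illustrative only.

```php
<?php
// Hypothetical mapper for a nullable boolean column: SQL NULL stays distinct
// from false because the return type is ?bool. Assumes the driver returns
// the value as the string "1" or "0", or as null. Requires PHP 8.0+.
function dbBoolOrNull(?string $raw): ?bool
{
    return match ($raw) {
        null => null,  // field not set
        '1' => true,
        '0' => false,
        default => throw new UnexpectedValueException("Unexpected boolean value: '$raw'"),
    };
}
```

Rejecting anything other than "1", "0" or null keeps unexpected driver output from silently becoming true.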

10 years ago by francois@php.net

From: Zeev Suraski [mailto:zeev@zend.com]

Thanks a lot for the input! We'll reconsider accepting "1"/"0" as valid
Booleans as the original proposal did.

Yes. Same conversion rules: the empty string and "0" are false, everything else is true.

For consistency reasons, we can extend the "0" case to accept leading zeroes and leading and trailing blanks, as for a numeric string.

There is probably no need to go as far as the general numeric-string case, where "0.0" would be false too, but that is to be discussed. The rule may be more intuitive if we say 'any numeric string that converts to zero is considered false'.
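A minimal sketch of the broader variant of this rule, where the empty string and any numeric string equal to zero (after trimming blanks) are false and every other string is true. proposedStringToBool() is a hypothetical name; this illustrates the proposal and is not the RFC's implementation.

```php
<?php
// Hypothetical helper sketching the broader variant of the proposed rule:
// "" and any numeric string whose value is zero are false; all other strings are true.
function proposedStringToBool(string $s): bool
{
    if ($s === '') {
        return false;
    }
    // Leading/trailing blanks are tolerated, as for a numeric string.
    $trimmed = trim($s);
    if (is_numeric($trimmed)) {
        // "0", "00", " 0 ", "0.0" and "-0" all have the value zero.
        return (float) $trimmed != 0.0;
    }
    // Non-numeric strings (including whitespace-only ones) are true.
    return true;
}
```

For contrast, PHP's own (bool) cast treats only "" and "0" as false, so "00" and "0.0" are true there. Under this sketch a bit-field string made only of '0's, such as "0000", also comes out false, while "0010" is true. The narrower variant (leading zeros and blanks only, "0.0" still true) would instead accept only trimmed strings made entirely of '0' digits.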

Just a detail.

Regards

François

10 years ago by Lester Caine

Yes. Same conversion rules: the empty string and "0" are false, everything else is true.

For consistency reasons, we can extend the "0" case to accept leading zeroes and leading and trailing blanks, as for a numeric string.

I've just been checking, and yes: if a multi-bit binary field is a string
of '0's it's false - nothing to process - but any bit set gives true and
one then scans for the set bit. This is a case where leading zeros may
affect the numeric conversion, but you can read it as an array of 1s and
0s...

--
Lester Caine - G8HFL

10 years ago by francois@php.net

Hi Anthony,

Passing boolean(false) where an integer is expected will generate an
error. This is a common practice, specifically around internal
functions. Example:

I think he was talking about receiving an integer as a boolean, which we support, not a boolean as an integer.

Regards

François

10 years ago by Rasmus Lerdorf

In most applications, the part of the code that is exposed to the
outside world and has to convert strings or unknown types into known
types is a very small layer at the outside edge of the application.

The vast majority of code written for non-trivial applications has no
contact with the outside world. Instead it only communicates to other
layers inside the application where the types required are fully
known, and so the parameters passed should already be in the correct
type. And so type coercion is at best unneeded, and usually not
wanted.

Looking through some very large code bases I am involved with, this
argument falls down on two main points:

1. There is a lot of data coming from memcache/mysql/pgsql as strings
   that is not typed.
2. There are quite a few objects being shuffled around interchangeably
   with strings and making use of __toString magic.

Type coercion is needed in both cases here. This is in the backend code
far from user input. It would take a whole lot of refactoring to be able
to turn on strict for this code. Especially getting rid of all use of
__toString objects. It would require force-casts to make this backend
code work in strict mode and then we are back to square one.
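A sketch, in PHP 8 syntax, of the two backend cases above in a file that opts into strict mode. CacheKey, fetchCached() and loadUser() are hypothetical stand-ins; the point is that declare(strict_types=1) rejects both the __toString object and the numeric string from a data store, so each call site needs an explicit cast.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for an object passed around interchangeably with strings.
final class CacheKey
{
    public function __construct(private string $key) {}

    public function __toString(): string
    {
        return $this->key;
    }
}

// Hypothetical backend functions with scalar type declarations.
function fetchCached(string $key): string
{
    return 'cached:' . $key;
}

function loadUser(int $id): string
{
    return 'user#' . $id;
}

$key = new CacheKey('user:42');
$idFromDb = '42'; // e.g. a value a MySQL/PgSQL/memcache client returned as a string

$errors = 0;
try {
    fetchCached($key);   // strict mode does not convert objects via __toString
} catch (TypeError $e) {
    $errors++;
}
try {
    loadUser($idFromDb); // strict mode does not convert numeric strings to int
} catch (TypeError $e) {
    $errors++;
}

// What strict mode forces instead: explicit casts at each call site.
$cached = fetchCached((string) $key);
$user = loadUser((int) $idFromDb);
```

Those casts are also lossy: (int) '42abc' quietly yields 42, which is why forcing casts can leave the code no safer than before.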

-Rasmus

10 years ago by Zeev Suraski

-----Original Message-----
From: Dan Ackroyd [mailto:danack@basereality.com]
Sent: Thursday, February 26, 2015 8:49 PM
To: Zeev Suraski
Cc: Theodore Brown; internals@lists.php.net
Subject: Re: [PHP-DEV] A different user perspective on scalar type declarations

From: Theodore Brown [mailto:theodorejb@outlook.com]

2. Strict types are important in some cases.

I would want any value with the wrong type (even a string like "26")
to be flagged as an error when passed to a function expecting an
integer.

I agree completely; however, use cases like this are a lot less
common than the situations where you do want sensible coercion to be
allowed.

That's just not true on medium to large code bases, and if you think that's
true it's possibly an explanation of why you don't see why people want strict
types so much.

In most applications, the part of the code that is exposed to the outside
world and has to convert strings or unknown types into known types is a very
small layer at the outside edge of the application.

The vast majority of code written for non-trivial applications has no contact
with the outside world. Instead it only communicates to other layers inside
the application where the types required are fully known, and so the
parameters passed should already be in the correct type. And so type
coercion is at best unneeded, and usually not wanted.

I can understand why people might only want to use weak types for their
code base, but for you to continually dismiss people's desire for strict
types after all this has been explained to you multiple times is very
depressing.

First, I'd like to point out that I'm not talking about 'weak' (dynamic)
typing, but the new coercive typing rules. The coercive typing rules are a
lot stricter than the dynamic conversion rules that are employed throughout
PHP.

Now, when you mention the outside world - what are you referring to? If
you only refer to user input, then I agree. But if you add data sources -
such as databases, web services, filesystem and others - then I disagree.
Even in large projects, PHP interacts predominantly with such data sources
(as well as user input, of course) - and these all provide data
predominantly as strings. Pure computations with no feed of outside data
are not nearly as common a use case for PHP (and web apps/services in
general). Coercive typing is a much better fit not just for handling user
data, but also for handling all kinds of input data, regardless of where it
comes from. Strict is useful in narrow cases - mainly math-intensive
computations of different sorts. That's a valid but not nearly as common a
use case for PHP. Given that a 2nd mode comes at a price (complexity,
people assuming it is what it's not, further technological division) - it's
just not worth it. We need to aim for the one mode that caters to the 90+%,
and not try to be everything to everyone at the cost of much increased
complexity, not when those who want strict checks can very easily implement
them in very simplistic and easily optimizable custom code.

Zeev
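Zeev's remark that strict checking can be implemented in custom code might look something like the sketch below. It is a minimal illustration, not code from the thread: `requireInt()` and `setQuantity()` are hypothetical names, and the example assumes PHP 7 or later for the `string` parameter and `int` return type declarations.

```php
<?php
// Hypothetical userland check: accept only a genuine int, with no coercion.
function requireInt($value, string $name = 'value'): int
{
    if (!is_int($value)) {
        throw new InvalidArgumentException(
            sprintf('%s must be of type int, %s given', $name, gettype($value))
        );
    }
    return $value;
}

// Hypothetical API function: the parameter is deliberately left untyped so
// that the check above, not the engine, decides what is accepted.
function setQuantity($quantity): int
{
    return requireInt($quantity, 'quantity');
}

try {
    setQuantity("26");
} catch (InvalidArgumentException $e) {
    // prints: quantity must be of type int, string given
    echo $e->getMessage(), "\n";
}
```

If the parameter were instead declared as `int` under PHP 7's default (weak) mode, `"26"` would already have been coerced to `26` before the function body ran, so a check like this would never see the string.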

10 years ago by Yasuo Ohgaki

Hi Zeev,

You may have a point there. As Francois said, he was in favor of allowing
leading and trailing spaces. I'll definitely reconsider.

If we consider existing code, leading/trailing spaces may need to be
allowed. Without considering compatibility issues, leading/trailing spaces
should be validated or removed by user input validation/filter code in the
first place.

I think many users use "$" to match the end of the string in regular
expressions, but "$" also matches just before a trailing newline. To be
precise, "\z" should be used for both PCRE and mbregex.

http://perldoc.perl.org/perlre.html

A trailing newline is invalid data. To be strict, leading/trailing spaces
may be considered the same kind of invalid data.

That was my reasoning on this, but I don't have a strong opinion.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net
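Yasuo's point about "$" versus "\z" can be checked with PHP's PCRE functions. This sketch covers PCRE only (not mbregex) and uses an illustrative input string; the `D` modifier on the last line is PCRE's "dollar end only" option.

```php
<?php
// A numeric string with a trailing newline, e.g. from a line-based source.
$input = "123\n";

// Without the D modifier, "$" also matches just before a final newline,
// so the trailing "\n" slips through.
var_dump(preg_match('/^[0-9]+$/', $input));  // int(1)

// "\z" matches only at the very end of the subject string.
var_dump(preg_match('/^[0-9]+\z/', $input)); // int(0)

// The D modifier makes "$" match only at the very end as well.
var_dump(preg_match('/^[0-9]+$/D', $input)); // int(0)
```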

10 years ago by francois@php.net

Hi,

If we allow for trailing blanks, we'll allow the same set of chars that is already allowed for leading blanks.

I say 'blanks' and not 'whitespace', because here is the list of characters currently allowed as leading blanks (with ASCII values): space (32), tab (9), linefeed (10), carriage return (13), vertical tab (11), and form feed (12).
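A minimal sketch of the rule François describes, in which trailing blanks come from the same six characters as leading blanks. `isIntStringWithBlanks()` is a hypothetical helper for illustration only; it is not how the engine parses numeric strings, and it handles plain decimal integers only (no floats or exponents).

```php
<?php
// The six blank characters listed above: space, tab, LF, CR, VT, FF.
const NUMERIC_BLANKS = " \t\n\r\v\f";

// Hypothetical helper: true if $s is a decimal integer, optionally signed,
// surrounded by any number of the listed blanks on either side.
function isIntStringWithBlanks(string $s): bool
{
    // Strip only the listed blanks (PHP's default trim() set differs:
    // it includes NUL and omits form feed).
    $core = trim($s, NUMERIC_BLANKS);

    // "\z" anchors at the true end of the remaining string.
    return preg_match('/^[+-]?[0-9]+\z/', $core) === 1;
}
```

Under this sketch, `"42 "` is accepted, while `"99 bugs"` and a blank-only string are rejected.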

-----Original Message-----
From: yohgaki@gmail.com [mailto:yohgaki@gmail.com] On Behalf Of Yasuo Ohgaki
Sent: Friday, February 27, 2015 11:25
To: Zeev Suraski
Cc: Theodore Brown; internals@lists.php.net
Subject: Re: [PHP-DEV] A different user perspective on scalar type declarations

Hi Zeev,

You may have a point there. As Francois said, he was in favor of allowing
leading and trailing spaces. I'll definitely reconsider.

If we consider existing code, leading/trailing spaces may need to be
allowed. Without considering compatibility issues, leading/trailing spaces
should be validated or removed by user input validation/filter code in the
first place.

I think many users use "$" to match the end of the string in regular
expressions, but "$" also matches just before a trailing newline. To be
precise, "\z" should be used for both PCRE and mbregex.

http://perldoc.perl.org/perlre.html

A trailing newline is invalid data. To be strict, leading/trailing spaces
may be considered the same kind of invalid data.

That was my reasoning on this, but I don't have a strong opinion.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

10 years ago by Lester Caine

If we allow for trailing blanks, we'll allow the same set of chars
that is already allowed for leading blanks.

I say 'blanks' and not 'whitespace', because here is the list of
characters currently allowed as leading blanks (with ASCII values): space
(32), tab (9), linefeed (10), carriage return (13), vertical tab (11), and
form feed (12).

Depending on the way a database is configured, one may be using char
fields, which are 'blank packed' to a fixed length, or varchar, which would
normally only include white space when it is actually added. It's not
uncommon, though, to cast numeric fields to 'char' to create fixed-length
records, and I would not like to say how many legacy systems still use
that approach for building tables of data.

--
Lester Caine - G8HFL

10 years ago by Yasuo Ohgaki

Hi Francois,

On Fri, Feb 27, 2015 at 10:12 PM, François Laupretre francois@php.net wrote:

If we allow for trailing blanks, we'll allow the same set of chars that
is already allowed for leading blanks.

I say 'blanks' and not 'whitespace', because here is the list of characters
currently allowed as leading blanks (with ASCII values): space (32), tab (9),
linefeed (10), carriage return (13), vertical tab (11), and form feed (12).

I agree.
Emitting E_STRICT and encouraging users to have proper validation/filter
code is far better, IMHO.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

10 years ago by Matthew Leverton

I am a full-time PHP developer responsible for maintaining several large
enterprise applications, as well as a number of libraries and personal apps.
I have been following the scalar type proposals quite closely, as along with
return type declarations, scalar types have the potential to reduce errors,
simplify API documentation, and improve static code analysis.

I am in favor of Anthony's Scalar Type Declarations RFC, for two simple reasons:

1. It doesn't change the behavior of existing weak types.

2. Strict types are important in some cases.

After carefully reviewing both proposals, testing the implementations, and
switching opinions many times, I've come to the same conclusion as this.
(I'm glad the declare() syntax was simplified.)

With more and more discussion, I feel like the coercive version is
just degrading back into the same weak casts as we already have. If I
can pass null or " 1 " or bool to an int, then it becomes less like a
different RFC and more like the same strict types RFC but without the
strict mode!

To be clear, I'd actually be fine with a weak-only implementation that
follows the same exact rules as the explicit casts. And I'm okay with
strict mode optionally tacked on top of that -- because it can be
useful and won't get in my way. But I'm no longer in favor of any
in-between coercive implementation.

Not a voter, so just my 2 cents.

--
Matthew Leverton
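For reference, the explicit `(int)` cast rules Matthew mentions behave as follows for the kinds of input discussed in the thread. The sample list is illustrative and covers only strings, `null`, booleans and one float; arrays, objects and exponent-style strings are not covered here.

```php
<?php
// For these samples, explicit (int) casts never reject anything: a string
// with a leading number (after any leading whitespace) yields that number,
// other strings yield 0, null and false become 0, true becomes 1, and
// floats are truncated toward zero.
$samples = [" 1 ", "99 bugs", "abc", null, true, false, 3.9];

foreach ($samples as $value) {
    echo var_export($value, true), ' => ', (int) $value, "\n";
}
```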
