scalar type-casting

8 years ago by Rasmus Schultz

Since PHP 7.0, I've started using scalar type-hints a lot, and it got me
thinking about scalar type-casting.

After poking through existing RFCs, it turns out I was going to propose
precisely the thing that Anthony Ferrara proposed in 2012:

https://wiki.php.net/rfc/object_cast_to_types

In the light of scalar type-hints, I feel this RFC is now much more
relevant and useful than it was then.

One thing in this RFC jumps out at me though:

"when an internal function accepts a typed parameter by reference, if the
magic cast method is defined on the passed in object, an error is raised as
the cast cannot be performed because of the reference."

I'm afraid this is inconsistent with the behavior of built-in scalar
type-casts.

For example:

function add_one(int &$value) { $value += 1; }
$one = "1";
add_one($one);
var_dump($one); // int(2)

That is, an implicit type-cast made by passing e.g. a string to an int
reference argument has the side-effect of overwriting the input variable.

This behavior may be "okay" in the case of scalars, which, for the most
part, can just ping-pong between actual types - like, if someone were to
subsequently append to what they thought was a string in the above example,
the string-turned-integer would just convert itself back to a string.

The situation would be very different with an object being passed as
argument and cast to integer - if the object was simply replaced with an
integer as a side-effect, clearly this would have much more serious
ramifications than with scalars which can probably be cast back and forth
between various scalar types.

I'm guessing, at the time when scalar type-hints were introduced, you
likely weighed the pros and cons while designing this behavior and decided
it's "good enough", since it's damn near impossible to define another
rational behavior that is side-effect free and would also do something
meaningful with references (?)

It seems that references are once again the culprit that inspired "weird"
design-decisions such as side-effects.

I would call again for the deprecation/removal of references, but I know
that's a major language BC break and very unlikely to bear fruit, so I
won't suggest that.

Instead, I would like you to consider another, much smaller BC break, much
less likely to affect most code: rather than type-casting values when
passed by reference, instead type-check them, and error on mismatch.

That is, in the example above, add_one($one) would trigger an error,
because the variable you passed by reference isn't of the correct type.

I would need to refactor that code slightly, and introduce an intermediary
variable that is actually an integer, then call the function - or in other
words, I would need to write code that expresses what really happens, the
fact that the function operates on an integer variable in the calling scope:

$one = "1";
$one_int = (int) $one;
add_one($one_int);

This is much safer and much more transparent than the potentially very
surprising side-effect of having your local variable overwritten with a
different type.
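
The check-instead-of-cast behavior can be approximated in userland today. This is only a rough sketch: add_one_checked() is a hypothetical name, and the parameter is deliberately left untyped so that the check is done by hand rather than by the engine.

```php
<?php
// Hypothetical userland approximation of the proposed behavior:
// the reference parameter is untyped and checked manually, so a
// mismatched variable is rejected instead of being overwritten.
function add_one_checked(&$value): void
{
    if (!is_int($value)) {
        throw new TypeError(
            'add_one_checked() expects an int variable, got ' . gettype($value)
        );
    }
    $value += 1;
}

$one = "1";
try {
    add_one_checked($one); // rejected: $one is a string
} catch (TypeError $e) {
    // $one is untouched here; it is still the string "1"
}

$one_int = (int) $one;     // the conversion is now explicit in the caller
add_one_checked($one_int);

var_dump($one);     // string(1) "1"
var_dump($one_int); // int(2)
```

The caller's original variable keeps its type, and the integer the function operates on is visible in the calling scope.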

The problem I'm describing is pretty serious for the one type-cast that
exists at present: __toString()

Example:

class Foo { public function __toString() { return "foo"; } }
function append_to(string &$str) { $str .= "_bar"; }
$foo = new Foo();
append_to($foo);
var_dump($foo); // string(7) "foo_bar"

In this example, the caller's instance of Foo gets wiped out and replaced
by a string - the "ping pong type-casting" that saved us in the previous
example won't save us this time.

While the side-effects for scalars being replaced by scalars may be "okay"
under most circumstances, I think this kind of side-effect is pretty
unnatural and surprising for any non-scalar type.

Most of the time, arguments are not by-reference, so I think changing
this will likely have a pretty minimal impact on real-world code - and the
workaround (as in the previous example) is pretty easy to implement, and
could likely be fully automated by e.g. PhpStorm, PHP_CodeSniffer's phpcbf
tool, etc.
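
Applied to the __toString() case, the same workaround might look like the following sketch. It reuses Foo and append_to() from the example above; $foo_str is just an illustrative variable name.

```php
<?php
class Foo { public function __toString() { return "foo"; } }
function append_to(string &$str) { $str .= "_bar"; }

$foo = new Foo();
$foo_str = (string) $foo; // explicit cast into a separate variable
append_to($foo_str);      // the function now modifies the string, not $foo

var_dump($foo instanceof Foo); // bool(true)
var_dump($foo_str);            // string(7) "foo_bar"
```

The object survives the call because the reference is bound to the intermediary string rather than to $foo.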

With this change, what Anthony proposed in 2012 becomes feasible, I think?

(And perhaps it becomes feasible to (later) think about completing the
type-casting feature with support for casting between class/interface
types, but that's another subject...)

8 years ago by Rowan Collins

> Example:
>
> class Foo { public function __toString() { return "foo"; } }
> function append_to(string &$str) { $str .= "_bar"; }
> $foo = new Foo();
> append_to($foo);
> var_dump($foo); // string(7) "foo_bar"
>
> In this example, the caller's instance of Foo gets wiped out and
> replaced by a string

While this looks surprising in the form you've written it, it should only really be a surprise to the function author, not the caller. If the caller sees only the signature, then the function can do literally anything to their passed by reference variable. The caller is giving full control and "ownership" of that variable, and shouldn't make any assumptions about what it will look like when it comes back.

For example, you don't even need PHP7 to do this:

class Foo { public function __toString() { return "foo"; } }
function append_to(&$str) { $str = (string)$str . "_bar"; }
$foo = new Foo();
append_to($foo);
var_dump($foo); // string(7) "foo_bar"

I don't think it's any more unreasonable for a reference parameter to change type in the parameter handling of your example function (with strict_types off) than inside the body of my example function. Of course, by setting strict_types=1, the caller can change the implicit cast to an implicit assertion, and get an error in your example; it won't save them from my example, though.
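
To illustrate the strict_types point, here is a minimal sketch reusing the Foo and append_to() definitions from the thread. It assumes the caller and the function live in the same file, so the declaration governs this call.

```php
<?php
declare(strict_types=1); // must be the first statement in the calling file

class Foo { public function __toString() { return "foo"; } }
function append_to(string &$str) { $str .= "_bar"; }

$foo = new Foo();
try {
    append_to($foo); // strict mode: no implicit __toString() conversion
} catch (TypeError $e) {
    // the call is rejected before the reference is written to
}

var_dump($foo instanceof Foo); // bool(true)
```

With the declaration removed, the same call succeeds and $foo ends up holding the string "foo_bar".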

Regards,

--
Rowan Collins
[IMSoP]

8 years ago by Rasmus Schultz

> If the caller sees only the signature, then the function can do
> literally anything to their passed by reference variable. The caller is
> giving full control and "ownership" of that variable, and shouldn't make
> any assumptions about what it will look like when it comes back.

Understood, only this isn't the function doing something - it's the
language.

But I guess the best thing under any circumstances is to simply avoid using
references entirely - you won't have these problems then.

The main point was, I'd still very much like to see Anthony's RFC revived
:-)

So I guess the RFC's wording on reference parameters should be updated,
perhaps to:

"when a function receives a type-hinted parameter by reference, if the
magic cast method is defined on the passed in object, the variable in the
calling scope is immediately overwritten with the result of the type-cast."

I don't happen to like it, but that's consistent with the existing
behavior, right?

On Sun, Apr 9, 2017 at 1:07 PM, Rowan Collins rowan.collins@gmail.com
wrote:

8 years ago by Yasuo Ohgaki

Hi Rasmus,

Although DbC is not what you asked for, it could solve your issue
more efficiently, i.e. with faster execution rather than shorter code.

https://wiki.php.net/rfc/dbc2

With DbC, the caller is responsible for passing correct parameters.

> $one = "1";
> $one_int = (int) $one;
> add_one($one_int);

function add_one(&$value)
    require (is_int($value))
{
    $value += 1;
}

// The caller is responsible for passing correct parameters.
$one = filter_validate($_GET['var'], FILTER_VALIDATE_INT);
add_one($one);

> class Foo { public function __toString() { return "foo"; } }
> function append_to(string &$str) { $str .= "_bar"; }
> $foo = new Foo();
> append_to($foo);
> var_dump($foo); // string(7) "foo_bar"

class Foo { public function __toString() { return "foo"; } }

function append_to(&$str)
    require (is_string($str))
{
    $str .= "_bar";
}

$foo = new Foo();

// The caller is responsible for passing correct parameters, but does not here
append_to($foo); // Error at the DbC precondition check in append_to()
var_dump($foo); // Not reached in dev mode

I really like parameter type checks.
The problem is that type checks make execution slower.
Another problem is that type checks alone are not enough for much code.

With DbC support, we can specify any expression. Therefore, we can
check much more complex requirements for functions/methods at
development time.

e.g.
function save_age($user_age)
    require (is_int($user_age))
    require ($user_age >= 0)
    require ($user_age < 150)
{
    save_to_somewhere($user_age);
}
// Note: All input parameters must be validated to be correct values for the
// app, e.g. use filter_validate()/etc.
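
The `require (...)` clause above is the syntax proposed in the DbC RFC and is not valid in current PHP. As a rough approximation only, the same preconditions could be written with a hypothetical precondition() helper. Unlike the proposed DbC feature, this sketch always runs its checks and is not stripped outside development. Returning the value stands in for the persistence step, which is not shown in the thread.

```php
<?php
// Hypothetical helper standing in for the RFC's `require (...)` clause.
function precondition(bool $ok, string $message): void
{
    if (!$ok) {
        throw new InvalidArgumentException($message);
    }
}

function save_age($user_age): int
{
    precondition(is_int($user_age), 'age must be an int');
    precondition($user_age >= 0, 'age must not be negative');
    precondition($user_age < 150, 'age must be below 150');

    return $user_age; // stand-in for actually persisting the value
}

var_dump(save_age(42)); // int(42)

try {
    save_age("42"); // a numeric string is not an int, so the first check fails
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), "\n"; // age must be an int
}
```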

What you really need might be DbC.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

8 years ago by Rasmus Schultz

My concern is actually neither performance nor brevity - my concern is, can
you read the code and actually understand what it does, can you write code
without running into surprising side-effects, and so on.

DbC might have merit in terms of performance, but perhaps not so much in a
scripting language - if performance was critical to a given project, I
would not be using a scripting language. The addition of DbC and marginally
better performance for certain specific use-cases wouldn't change that for
me.

8 years ago by Yasuo Ohgaki

Hi Rasmus,

> My concern is actually neither performance nor brevity - my concern is,
> can you read the code and actually understand what it does, can you write
> code without running into surprising side-effects, and so on.

Users must not write conditions that have side effects, just as they must
not with assert().

DbC has 2 main merits:

 - ensuring program correctness by pre/post conditions (and invariants)
   during development
 - better performance and security

With DbC, it's easy to write and maintain all "necessary and sufficient
conditions" for every function/method, which ensures program correctness.

Unit tests can't achieve what DbC can, i.e. it is not feasible to write
unit tests covering all "necessary and sufficient conditions" for every
single function/method. Invariant checks are even more difficult.

The most important DbC merit is ensured program correctness, then
security. Performance would be the least important for PHP, as you
mentioned.

P.S. DbC is not a unit test replacement. Without unit tests,
pre/post/invariant conditions cannot be checked easily or repeatedly.

--
Yasuo Ohgaki
yohgaki@ohgaki.net