[RFC] Safe Casting Functions

10 years ago by Haralan Dobrev

In general I like this RFC.

I don't see why to_string would not accept and cast integers and floats to
strings. And even if there is a valid reason it is not mentioned.

If this gets accepted you should consider the naming in the user
documentation. Beginners should not consider these functions "safe" as in
"security", but only "safe" as in "no data loss".

Good evening,

I am presenting a new RFC to add a set of three functions to do validated
casts for scalar types:

https://wiki.php.net/rfc/safe_cast

Please read it.

Thanks!

Andrea Faulds
http://ajf.me/

10 years ago by Andrea Faulds

In general I like this RFC.

I don't see why to_string would not accept and cast integers and floats to strings. And even if there is a valid reason it is not mentioned.

It does accept and cast integers and floats to strings… I made an error in the Proposal section, my bad. If you look at the examples table and the tests in the patch, they are indeed accepted.

If this gets accepted you should consider the naming in the user documentation. Beginners should not consider these functions "safe" as in "security", but only "safe" as in "no data loss".

Right.

--
Andrea Faulds
http://ajf.me/

10 years ago by Lars Strojny

Hi Andrea,

Good evening,

I am presenting a new RFC to add a set of three functions to do validated casts for scalar types:

https://wiki.php.net/rfc/safe_cast

Please read it.

I like the proposal except for one thing: the functions returning false in case of an error. As the next logical function would be "to_bool()", I foresee a lot of trouble with regards to API design as returning false there either means successful cast to false or an error while casting. What about changing the functions to take an $isError variable as a second argument?

$value = to_string(1.2, $isError);
if ($isError) {
...
}

Alternatively one could do the same thing with an $isSuccess variable:

$value = to_string(1.2, $isSuccess);
if (!$isSuccess) {
...
}

Thoughts?

cu,
Lars

10 years ago by Andrea Faulds

I like the proposal except for one thing: the functions returning false in case of an error. As the next logical function would be "to_bool()", I foresee a lot of trouble with regards to API design as returning false there either means successful cast to false or an error while casting. What about changing the functions to take an $isError variable as a second argument?

$value = to_string(1.2, $isError);
if ($isError) {
...
}

Alternatively one could do the same thing with an $isSuccess variable:

$value = to_string(1.2, $isSuccess);
if (!$isSuccess) {
...
}

Thoughts?

This is an interesting question. It’s actually one I’d considered myself; I think I briefly mention it in the RFC. However, I don’t expect it to be a problem, and the reason is really quite simple: there’s no clear-cut, one-size-fits-all boolean casting function you could make, and booleans are trivial to validate yourself.

Adding something like to_bool() would be possible, the problem is what behaviour it would have, and there are too many possibilities. Do you accept 1 and 0? If so, do you accept all non-zero values, or not? Is “true” TRUE or is it FALSE? Is “yes” TRUE or is it FALSE? Do “true” and “yes” validate? Do they not? Does the empty string validate? Does it not? Bear in mind that boolean casting was one of the most problematic areas of the abandoned Scalar Type Hinting RFC: there’s no clear set of definitely boolean values, nor a clear idea of which mean which boolean value. Basically, I don’t think we’d ever be able to draw the line.

The other thing is, well, there’s not really a need. You can quite simply do something like this: if ($value !== "true" && $value !== "false") { /* error! */ } - and while this is less convenient than to_bool, as I mentioned before, there’s no real agreement on what to_bool should do so you’d have to do this anyway. A thought: A weird, but workable solution for one particular style is ["true" => TRUE, "false" => FALSE][$value] ?? NULL - whether that’s a wonderful or horrible use of PHP is up to you. ;)
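To make the lookup-table idea concrete, here is a minimal sketch wrapped in a function. The name parse_strict_bool() is hypothetical and not part of the RFC, and the sketch assumes PHP 7.1 or later (for the ?? operator and the nullable return type). It accepts only the exact strings "true" and "false", which is just one of the many possible policies discussed above.

```php
<?php
// Hypothetical helper, not part of the RFC: accepts only the strings
// "true" and "false" and returns null for everything else.
function parse_strict_bool($value): ?bool
{
    $map = ['true' => true, 'false' => false];

    // is_string() keeps arrays, floats and other types away from the array lookup.
    return is_string($value) ? ($map[$value] ?? null) : null;
}

var_dump(parse_strict_bool('true'));  // bool(true)
var_dump(parse_strict_bool('false')); // bool(false)
var_dump(parse_strict_bool('yes'));   // NULL
```

Because the failure value here is NULL, a successful cast to FALSE stays distinguishable from a rejected input.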

So I don’t think using FALSE here is a problem, since I doubt to_bool would ever be added.

Regarding using a reference variable for the output, I don’t like that idea much. It doesn’t chain very well… and would it fail a strict type hint, if we added that? Even if it somehow did, I don’t think it is a good idea. Since I don’t think to_bool will happen, there’s also no real need, either.
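The chaining concern can be illustrated with a rough userland sketch. Both helpers below (str_or_false() and str_with_flag()) are hypothetical names invented for this illustration, and their acceptance rules are deliberately simplistic; they are not the RFC's rules.

```php
<?php
// Hypothetical helpers for illustration only; neither is part of the RFC.

// Style 1: failure is reported through the return value.
function str_or_false($value)
{
    return (is_string($value) || is_int($value) || is_float($value))
        ? (string) $value
        : false;
}

// Style 2: failure is reported through a by-reference flag.
function str_with_flag($value, &$isError)
{
    $result = str_or_false($value);
    $isError = ($result === false);
    return $isError ? null : $result;
}

// The return-value style nests inside a larger expression.
$length = strlen(str_or_false(1.5)); // 3

// The flag style needs a statement of its own before the flag can be read.
$text = str_with_flag([], $isError);
if ($isError) {
    echo "conversion failed\n";
}
```

The flag variant cannot be used inline without losing the flag, which is the trade-off being weighed here.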

Andrea Faulds
http://ajf.me/

10 years ago by Stas Malyshev

Hi!

I am presenting a new RFC to add a set of three functions to do validated casts for scalar types:

https://wiki.php.net/rfc/safe_cast

Please read it.

The main problem that happened with scalar typing RFC remains here:
third set of rules for casting types. Of course, since it's "just"
functions and not language constructs, we can have a set of functions
for any set of casting rules anybody wants. But I think it's still the
same problem here - having three sets of casting rules is not good.
Wait, we actually already have FILTER_VALIDATE_INT and
FILTER_VALIDATE_FLOAT, so that would be the fourth set of rules, and the
second set of validation rules, despite already having an extension
specially dedicated to filtering and validation. I think we should not
multiply entities needlessly - if we need some different validation
primitives, why not add them to filter, for example?

Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/

10 years ago by Andrea Faulds

Hi!

The main problem that happened with scalar typing RFC remains here:
third set of rules for casting types. Of course, since it's "just"
functions and not language constructs, we can have a set of functions
for any set of casting rules anybody wants. But I think it's still the
same problem here - having three sets of casting rules is not good.

We have more than three sets of casting rules, but at least these ones are rather straightforward.

Wait, we actually already have FILTER_VALIDATE_INT and
FILTER_VALIDATE_FLOAT,

Actually, to_int is very close to FILTER_VALIDATE_INT, and I’m not sure, but I think to_float may be close to FILTER_VALIDATE_FLOAT. The main difference with integers is rejection of whitespace and toString-able objects.

so that would be the fourth set of rules, and the
second set of validation rules, despite already having an extension
specially dedicated to filtering and validation. I think we should not
multiply entities needlessly - if we need some different validation
primitives, why not add them to filter, for example

The main point of the RFC is to add casting functions as an alternative to the built-in explicit casts. Currently, the easiest way to convert to an integer is (int), but this is quite dangerous as it performs no validation whatsoever and cannot fail. to_int() and the like are intended to be just as convenient as an explicit cast, so that doing the safer thing (failing on garbage input) is not any more difficult. The hope is that the lazy developer will use to_int() instead of (int) when they need to explicitly cast, and avoid the problems of the latter.
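A short illustration of the difference, using only behaviour that exists in PHP today. Since to_int() is only proposed, filter_var() with FILTER_VALIDATE_INT stands in here as an example of a conversion that can fail; its rules are not identical to the RFC's.

```php
<?php
// (int) never fails: garbage silently becomes a number.
var_dump((int) 'abc');   // int(0)
var_dump((int) '12abc'); // int(12)

// A validating conversion rejects the same input by returning false.
var_dump(filter_var('abc', FILTER_VALIDATE_INT));   // bool(false)
var_dump(filter_var('12abc', FILTER_VALIDATE_INT)); // bool(false)
var_dump(filter_var('12', FILTER_VALIDATE_INT));    // int(12)
```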

--
Andrea Faulds
http://ajf.me/

10 years ago by Stas Malyshev

Hi!

Wait, we actually already have FILTER_VALIDATE_INT and
FILTER_VALIDATE_FLOAT,

Actually, to_int is very close to FILTER_VALIDATE_INT, and I’m not
sure, but I think to_float may be close to FILTER_VALIDATE_FLOAT. The
main difference with integers is rejection of whitespace and
toString-able objects.

So essentially we have a number of rules, which all are only slightly
different. And when somebody wants to skip spaces, but not tabs, we'd
have yet another set of functions? I don't think it's good for the
language to have a set of functions doing exactly the same, but in
slightly different way, because some people had different preferences.
The language should not be just a bag of use-cases that somebody wanted
to implement. We're getting a lot of criticism for parts of the language
not always playing cohesively together, why make it worse?

The main point of the RFC is to add casting functions as an
alternative to the built-in explicit casts. Currently, the easiest
way to convert to an integer is (int), but this is quite dangerous as
it performs no validation whatsoever and cannot fail. to_int() and

So, why not have filter_var($suspected_int, FILTER_VALIDATE_INT) and
filter_var($suspected_int, FILTER_CONVERT_INT)? We already have
infrastructure for that, why ignore it completely and build another
solution that does exactly the same but treats whitespace differently
and has couple of other tweaks? OK, you want to treat the whitespace
differently, I get it - but why ignore whole filter infrastructure?

the like are intended to be just as convenient as an explicit cast,
so that doing the safer thing (failing on garbage input) is not any
more difficult. The hope is that the lazy developer will use to_int()
instead of (int) when they need to explicitly cast, and avoid the
problems of the latter.

The lazy developer won't check the return value anyway and would get 0
as the result of false-to-int conversion, thus making the whole exercise
pointless anyway :)
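A sketch of that failure mode, again with filter_var() standing in for the proposed to_int() (an assumption made only so the example runs on current PHP):

```php
<?php
// The validation result is never checked, so the failure value is cast away.
$id = (int) filter_var('abc', FILTER_VALIDATE_INT);
var_dump($id); // int(0)

// Checking the result first keeps the failure visible.
$checked = filter_var('abc', FILTER_VALIDATE_INT);
if ($checked === false) {
    echo "invalid id\n";
}
```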

Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/

10 years ago by Andrea Faulds

Actually, to_int is very close to FILTER_VALIDATE_INT, and I’m not
sure, but I think to_float may be close to FILTER_VALIDATE_FLOAT. The
main difference with integers is rejection of whitespace and
toString-able objects.

So essentially we have a number of rules, which all are only slightly
different. And when somebody wants to skip spaces, but not tabs, we'd
have yet another set of functions?

No, not quite. One of the nice things about rejecting whitespace is it lets you handle it however you want. Want to reject all whitespace? to_int($foo). Want to accept all whitespace? to_int(trim($foo)). Want to trim only tabs? to_int(trim($foo, "\t")). This actually allows more flexibility than filter_var does with its maze of flags.
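The composition can be tried out with a userland stand-in. strict_int() below is a hypothetical approximation written for this example; its exact rules (for instance rejecting leading zeros) are assumptions and may differ from the RFC's to_int().

```php
<?php
// Hypothetical userland stand-in for the proposed to_int(); the real rules
// are defined by the RFC. Accepts ints and plain decimal integer strings
// with no surrounding whitespace.
function strict_int($value)
{
    if (is_int($value)) {
        return $value;
    }
    // \A and \z anchor the whole string, so a trailing newline is rejected too.
    if (is_string($value) && preg_match('/\A-?[0-9]+\z/', $value)) {
        $int = (int) $value;
        // The round-trip check rejects overflow and forms like "-0" or "007".
        return ((string) $int === $value) ? $int : false;
    }
    return false;
}

var_dump(strict_int(' 42 '));             // bool(false): whitespace rejected
var_dump(strict_int(trim(' 42 ')));       // int(42): caller opted in to trimming
var_dump(strict_int(trim("\t42", "\t"))); // int(42): only tabs trimmed
var_dump(strict_int(trim(' 42', "\t")));  // bool(false): the space remains

// For comparison, ext/filter trims the string itself before validating.
var_dump(filter_var(' 42 ', FILTER_VALIDATE_INT)); // int(42)
```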

The main point of the RFC is to add casting functions as an
alternative to the built-in explicit casts. Currently, the easiest
way to convert to an integer is (int), but this is quite dangerous as
it performs no validation whatsoever and cannot fail. to_int() and

So, why not have filter_var($suspected_int, FILTER_VALIDATE_INT) and
filter_var($suspected_int, FILTER_CONVERT_INT)? We already have
infrastructure for that, why ignore it completely and build another
solution that does exactly the same but treats whitespace differently
and has couple of other tweaks? OK, you want to treat the whitespace
differently, I get it - but why ignore whole filter infrastructure?

It would be possible to make to_int() merely an alias of filter_var(…, FILTER_VALIDATE_INT). Though that would make it depend on ext/filter or otherwise duplicate functionality.

Also, treating whitespace differently wasn’t actually my idea. I’m perfectly fine with ignoring it, but Nikita convinced me I shouldn’t tolerate it. He has a point - it opens more possibilities than allowing it.

the like are intended to be just as convenient as an explicit cast,
so that doing the safer thing (failing on garbage input) is not any
more difficult. The hope is that the lazy developer will use to_int()
instead of (int) when they need to explicitly cast, and avoid the
problems of the latter.

The lazy developer won't check the return value anyway and would get 0
as the result of false-to-int conversion, thus making the whole exercise
pointless anyway :)

Not quite. With strict type hints, FALSE would fail for an integer parameter. Even without them, this still makes validation more straightforward. I suppose there are varying degrees of laziness.

--
Andrea Faulds
http://ajf.me/

10 years ago by Stas Malyshev

Hi!

No, not quite. One of the nice things about rejecting whitespace is
it lets you handle it however you want. Want to reject all
whitespace? to_int($foo). Want to accept all whitespace?
to_int(trim($foo)). Want to trim only tabs? to_int(trim($foo, "\t")).
This actually allows more flexibility than filter_var does with
its maze of flags.

This looks like "if you don't like my rules, you can easily apply string
handling functions to implement any rule you like". Which makes total
sense, except if I wanted to play with string handling functions, why
not just write a regexp and be done with it? Of course you can do any
transformation you like to the data, but if you propose it as a default
solution, then "you can add more transformations" is not an advantage -
you could always do that.

It would be possible to make to_int() merely an alias of
filter_var(…, FILTER_VALIDATE_INT). Though that would make it depend
on ext/filter or otherwise duplicate functionality.

You seem to sound like depending on something already existing in PHP is
a bad thing.

Also, treating whitespace differently wasn’t actually my idea. I’m
perfectly fine with ignoring it, but Nikita convinced me I shouldn’t
tolerate it. He has a point - it opens more possibilities than
allowing it.

The point is not whether to ignore the whitespace or not. The point
is that having multiple APIs doing the same but differing in one small
detail because somebody thinks this detail is not exactly to his liking
in existing API - IMO is not a good thing.

Not quite. With strict type hints, FALSE would fail for an integer

PHP is not a strictly typed language, so arguing as if it were or just
about to become one doesn't sound like a very good selling point. For
me, it's exactly the reverse - you're emphasizing the fact that for it to
work, you need to change the whole nature of PHP not being a strictly
typed language.

parameter. Even without them, this still makes validation more
straightforward. I suppose there are varying degrees of laziness.

I'm not sure how using function starting with to_ is "more
straightforward" than using function starting with filter_ - which seems
to be the only thing different here. Except, of course, for whitespace
handling. Which in my opinion, still does not explain why we should
ignore existing API and create a completely new one.

Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/

10 years ago by Rowan Collins

It would be possible to make to_int() merely an alias of
filter_var(…, FILTER_VALIDATE_INT). Though that would make it depend
on ext/filter or otherwise duplicate functionality.

You seem to sound like depending on something already existing in PHP is
a bad thing.

I think the point was that the ext/filter extension is optional (it's enabled
by default, but not mandatory) whereas

I'm not sure how using function starting with to_ is "more
straightforward" than using function starting with filter_

As I mentioned in a previous thread, the design of ext/filter makes it
feel like a Swiss Army knife when what I'm looking for is a pair of
scissors - it will work, but I have to figure out how to select the
right component (both a function and a filter type), and whether the
other bits will get in my way (do I need to set any options? what is the
actual return type?).

On the other hand, the myriad of options it provides covers off most of
the issues in this discussion, so maybe that complexity is the price you
pay for pleasing all the people, and it's just a matter of documentation.

--
Rowan Collins
[IMSoP]

10 years ago by Josh Watzman

Good evening,

I am presenting a new RFC to add a set of three functions to do validated casts for scalar types:

https://wiki.php.net/rfc/safe_cast

Please read it.

Thanks!

I think this is pretty cool, but I'm really worried about some of its typing implications. In particular:

The functions return FALSE on failure instead of NULL because:

Throwing an exception or even returning NULL seems so much better than returning "false" -- "false" is a boolean, not an error, and despite some historical cases of PHP using "false" as a poor person's error code, it really isn't. If you want error codes, use error codes, or use exceptions, but having two kinds of failure, null vs false, is really confusing.

Addressing your arguments one by one:

• If strict type hinting were added, they would fail for a nullable typehint

Throwing an exception also addresses this, in a much cleaner way. If you're worried about strict typing, then returning "false" is even worse, since the return type of, for example, to_int is now "int | bool" as opposed to "nullable-int" or "int" (if you throw).

• FALSE is a traditional error value in PHP

Since this is a new function, one that doesn't interoperate in any complicated way with the existing library or affect BC, this doesn't seem that important. IMO a language should have one failure/absence-of-value, in most cases "null", and having a weird second case seems, well, weird. If you have more interesting failure cases, just throw an exception, or return null if you want, don't continue propagating a weird second kind of null (that isn't actually null, it's a boolean false).

Also, that we couldn't introduce a meaningful to_bool function, even if we decided we wanted it, indicates to me that returning false is the wrong thing to do.

• NULL is used to signify the absence of a value - but what we want to signify is an invalid value (similar to 0 vs NaN)

I might argue that failure is indeed absence-of-value, but an exception seems like a much better way to convey this.

It's also interesting to look at how other languages handle failures of this kind. Most imperative languages I know of either throw or return null; in particular, Python, which has a fairly similar type system to PHP in a lot of ways, throws. Functional languages do more interesting things; Haskell does something complicated but morally equivalent to returning null. But I'm not aware of any language which specifically returns false.

10 years ago by Andrea Faulds

Throwing an exception or even returning NULL seems so much better than returning "false" -- "false" is a boolean, not an error, and despite some historical cases of PHP using "false" as a poor person's error code, it really isn’t.

Why isn’t FALSE an error value? NULL signifies an absence of a value, not a bad value. We use FALSE in a lot of places to indicate an error. Heck, filter_var uses it for errors here.

If you want error codes, use error codes, or use exceptions, but having two kinds of failure, null vs false, is really confusing.

What two kinds of error? We’re only using FALSE here.

• If strict type hinting were added, they would fail for a nullable typehint

Throwing an exception also addresses this, in a much cleaner way. If you're worried about strict typing, then returning "false" is even worse, since the return type of, for example, to_int is now "int | bool" as opposed to "nullable-int" or "int" (if you throw).

Exceptions make chaining more difficult. There’s also no precedent for using them.

Returning NULL would mean it’s valid for a nullable type hint, which is bad, but also, NULL signifies a lack of a value. There isn’t a lack of a value, there’s an invalid value. We should not use NULL here.

• FALSE is a traditional error value in PHP

Since this is a new function, one that doesn't interoperate in any complicated way with the existing library or affect BC, this doesn't seem that important. IMO a language should have one failure/absence-of-value, in most cases "null", and having a weird second case seems, well, weird.

Failure and absence of value are different things and should not use the same value. Otherwise, you are liable to confuse missing data and errors.

If you have more interesting failure cases, just throw an exception, or return null if you want, don't continue propagating a weird second kind of null (that isn't actually null, it's a boolean false).

It’s not a “weird second kind of null”, it’s a value FALSE.

Also, that we couldn't introduce a meaningful to_bool function, even if we decided we wanted it, indicates to me that returning false is the wrong thing to do.

It would be possible to use a special value in that specific case. Though, again, booleans have their own problems. I don’t see the point in to_bool().

It's also interesting to look at how other languages handle failures of this kind. Most imperative languages I know of either throw or return null; in particular, Python, which has a fairly similar type system to PHP in a lot of ways, throws. Functional languages do more interesting things; Haskell does something complicated but morally equivalent to returning null. But I'm not aware of any language which specifically returns false.

JavaScript returns NaN here and C sets errno to an error value. PHP uses FALSE, in some respects, like JavaScript uses NaN.

--
Andrea Faulds
http://ajf.me/

10 years ago by Josh Watzman

Throwing an exception or even returning NULL seems so much better than returning "false" -- "false" is a boolean, not an error, and despite some historical cases of PHP using "false" as a poor person's error code, it really isn’t.

Why isn’t FALSE an error value? NULL signifies an absence of a value, not a bad value. We use FALSE in a lot of places to indicate an error. Heck, filter_var uses it for errors here.

FALSE is not an error value because it is already something else. It's already a value, it's boolean false. The fact that you couldn't build a reasonable to_bool even if you wanted to, as I as well as others pointed out on this thread, indicates the very real problem with the conflation of purposes here. Derick Rethans said it pretty well:

Then to_bool() would return false... or true? So hence, it should be
NULL, and that would also be consistent with ext/filter.

You also said:

Also, that we couldn't introduce a meaningful to_bool function, even if we decided we wanted it, indicates to me that returning false is the wrong thing to do.

It would be possible to use a special value in that specific case. Though, again, booleans have their own problems. I don’t see the point in to_bool().

Whether or not we actually should add to_bool is a separate issue; we, in principle, should be able to cleanly add it, since it's a very natural extension to what you've already designed here. The fact that, if we used FALSE for errors, to_bool would have to be a special case, inconsistent with the rest, indicates to me that using FALSE is the wrong thing -- again, it's the conflation of the boolean value FALSE, which already has this meaning, with some error code. It shouldn't be conflated; FALSE should not be an error return, it means something else.
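The same conflation already shows up in ext/filter's boolean validator, which is a convenient way to see the problem on current PHP. The example uses only existing filter_var() behaviour; it is offered as an illustration of the argument, not as what a to_bool() would do.

```php
<?php
// Without the flag, a valid "no" and an invalid "maybe" are indistinguishable.
var_dump(filter_var('no', FILTER_VALIDATE_BOOLEAN));    // bool(false)
var_dump(filter_var('maybe', FILTER_VALIDATE_BOOLEAN)); // bool(false)

// FILTER_NULL_ON_FAILURE separates the two cases.
var_dump(filter_var('no', FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE));    // bool(false)
var_dump(filter_var('maybe', FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE)); // NULL
```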

If you want error codes, use error codes, or use exceptions, but having two kinds of failure, null vs false, is really confusing.

What two kinds of error? We’re only using FALSE here.

But FALSE is a different kind of error as you're trying to use it here, different than NULL which is specifically designed to be an error / lack of value. NULL doesn't mean anything else, it's a specially-designated value. And if you don't want to use NULL for failure because you don't think this is quite the sort of failure that NULL expresses, then throw an exception -- FALSE is just totally the wrong thing to return.

• If strict type hinting were added, they would fail for a nullable typehint

Throwing an exception also addresses this, in a much cleaner way. If you're worried about strict typing, then returning "false" is even worse, since the return type of, for example, to_int is now "int | bool" as opposed to "nullable-int" or "int" (if you throw).

Exceptions make chaining more difficult. There’s also no precedent for using them.

I think the argument about chaining was addressed by someone else. With precedent, just because there's no precedent doesn't mean we can't start now :) They really feel like the right thing to do here and elsewhere, and we have to start somewhere.

• FALSE is a traditional error value in PHP

Since this is a new function, one that doesn't interoperate in any complicated way with the existing library or affect BC, this doesn't seem that important. IMO a language should have one failure/absence-of-value, in most cases "null", and having a weird second case seems, well, weird.

Failure and absence of value are different things and should not use the same value. Otherwise, you are liable to confuse missing data and errors.

But if you want to make that distinction, then you shouldn't use a value that means something else to indicate failure. You should throw an exception. Returning FALSE is liable to replace confusion between missing data and errors with confusion between some other data (i.e., the boolean FALSE) and errors.

I might also argue that missing the data makes it clear there was an error, and then if someone has a nullable typehint they expressly said they accept a lack-of-int, but an exception seems a much cleaner way to deal with this anyways.

If you have more interesting failure cases, just throw an exception, or return null if you want, don't continue propagating a weird second kind of null (that isn't actually null, it's a boolean false).

It’s not a “weird second kind of null”, it’s a value FALSE.

Which is being used as a weird second kind of failure, as opposed to NULL which is the typical failure lack-of-value, as I argue above.

It's also interesting to look at how other languages handle failures of this kind. Most imperative languages I know of either throw or return null; in particular, Python, which has a fairly similar type system to PHP in a lot of ways, throws. Functional languages do more interesting things; Haskell does something complicated but morally equivalent to returning null. But I'm not aware of any language which specifically returns false.

JavaScript returns NaN here and C sets errno to an error value. PHP uses FALSE, in some respects, like JavaScript uses NaN.

But it probably shouldn't -- NaN is a special number which is not a number (and which would probably pass a float type annotation, no?). FALSE isn't a special integer which isn't an integer -- it is already something else, it's a boolean. JS has a few special values which are specifically designated as different failures or lack-of-values -- undefined, null, NaN, etc. PHP's FALSE does not fall into that category -- it does mean something, it means something else, the boolean value false. (FWIW, I don't think that the way JS does this feels quite right either, I don't like that there are multiple sorts of non-exception failure values, but at least they return a designated failure value that doesn't have another totally different non-error meaning.)

Josh Watzman

10 years ago by Dmitry Stogov

Both the idea and the implementation look fine.

Personally, I hate the existing type conversion rules, and I would like to
use something like this by default (maybe with an exception for
null/false/true handling), but it may be a big compatibility break, so
introducing to_int() is better than nothing.

some notes:

• it probably makes sense to implement these functions as new opcode(s)
  in the VM
• in case of conversion failure it's better to throw an exception (to_int()
  returning FALSE is a pain).

Thanks. Dmitry.

Good evening,

I am presenting a new RFC to add a set of three functions to do validated
casts for scalar types:

https://wiki.php.net/rfc/safe_cast

Please read it.

Thanks!

Andrea Faulds
http://ajf.me/

10 years ago by Alexander Lisachenko

Hello, internals!

Good evening,

I am presenting a new RFC to add a set of three functions to do validated casts for scalar types:

https://wiki.php.net/rfc/safe_cast

Please read it.

Personally I don't like this RFC because it introduces one more way
to cast values in PHP. We already have the boolval(), intval() and strval()
functions, which are not widely used in source code because of the
dynamic nature of PHP. Developers just use the value as is, assuming that
it will be cast automatically where needed. This kind of casting is
typically used to prevent attacks, like this: $id =
intval($_GET['id']); But this is an ugly implementation from my point of
view. Binding and sanitization can do this much better.

There is also one more way to cast values, with explicit casting: $id =
(int) $_GET['id']. I think that this way of casting is more
natural for developers to read, because many languages use the same
scheme to cast values into other types. Instead of implementing new
to_xxxx() functions, it would be nice to reuse the casting logic of
"(type) $value" and follow
https://wiki.php.net/rfc/scalar_type_hinting_with_cast#conversion_rules
which looks great.

Besides this, there is casting with settype($value, $type) and one
more with filter sanitization.

If this RFC is accepted there will be one more way, with its own casting
logic. And this is not so good from the userland point of view.

It can be good only with OOP support for primitive types, for example
$value = '1234'; $number = $value->toInt(); $float =
$value->toFloat(), etc..

Thanks!

10 years ago by Andrea Faulds

some notes:

it probably makes sense to implement these functions as new opcode(s) in the VM

That could be an optimisation later, yes. I note that you’ve added function replacement with opcodes for certain commonly-used functions. We could make these functions use that. Then they’d still be usable as callbacks.

in case of conversion failure it's better to throw exception (to_int() returning FALSE is a pain).

I am gradually warming to throwing an exception. An interesting idea I’ve had suggested on Twitter was by Matt Parker, who suggested that I add an optional 2nd argument. Without the argument, it throws an exception. With an argument, it returns that value (as a default) instead of throwing an exception.
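A rough userland sketch of that suggestion follows. The function name to_int_or_throw() is hypothetical, the exception class is an arbitrary choice, and filter_var() is used as the validator only for brevity, so its acceptance rules (for example, trimming whitespace) are ext/filter's rather than the RFC's.

```php
<?php
// Hypothetical sketch of the "optional default" idea; names and validation
// rules are illustrative only and do not come from the RFC.
function to_int_or_throw($value, $default = null)
{
    $result = filter_var($value, FILTER_VALIDATE_INT);
    if ($result !== false) {
        return $result;
    }
    // func_num_args() distinguishes "no default given" from an explicit null.
    if (func_num_args() >= 2) {
        return $default;
    }
    throw new InvalidArgumentException('Value cannot be converted to an integer');
}

var_dump(to_int_or_throw('12'));        // int(12)
var_dump(to_int_or_throw('abc', null)); // NULL

try {
    to_int_or_throw('abc');
} catch (InvalidArgumentException $e) {
    echo "rejected\n";
}
```

With this shape the caller chooses between the exception and a sentinel of their own, so the library does not have to pick FALSE or NULL.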

--
Andrea Faulds
http://ajf.me/

10 years ago by Dmitry Stogov

some notes:

it probably makes sense to implement these functions as new
opcode(s) in the VM

That could be an optimisation later, yes. I note that you’ve added
function replacement with opcodes for certain commonly-used functions. We
could make these functions use that. Then they’d still be usable as
callbacks.

I can help with the implementation if the RFC is accepted.
(It would have been better if ZEND_STRLEN, ZEND_TYPE_CHECK, etc. had been introduced in
the first place instead of strlen(), is_long() and family.)

in case of conversion failure it's better to throw exception (to_int()
returning FALSE is a pain).

I am gradually warming to throwing an exception. An interesting idea I’ve
had suggested on Twitter was by Matt Parker, who suggested that I add an
optional 2nd argument. Without the argument, it throws an exception. With
an argument, it returns that value (as a default) instead of throwing an
exception.

Together with https://wiki.php.net/rfc/engine_exceptions_for_php7 throwing
exceptions would be the best choice. I would make it permanent (without
optional arguments).

Thanks. Dmitry.

--
Andrea Faulds
http://ajf.me/

10 years ago by Stas Malyshev

Hi!

it probably makes sense to implement these functions as new
opcode(s) in the VM

That could be an optimisation later, yes. I note that you’ve added
function replacement with opcodes for certain commonly-used functions. We
could make these functions use that. Then they’d still be usable as
callbacks.

If those are opcodes, those rules will require 2/3 majority for
acceptance, since those will be the engine rules for type conversion,
not just a set of functions. And, of course, the rules not matching the
other engine rules for type conversion, sorry for sounding like broken
record.

--
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/

10 years ago by Zeev Suraski

-----Original Message-----
From: Stas Malyshev [mailto:smalyshev@sugarcrm.com]
Sent: Wednesday, October 22, 2014 11:18 AM
To: Dmitry Stogov; Andrea Faulds
Cc: PHP Internals
Subject: Re: [PHP-DEV] [RFC] Safe Casting Functions

Hi!

it probably makes sense to implement these functions as new
opcode(s) in the VM

That could be an optimisation later, yes. I note that you’ve added
function replacement with opcodes for certain commonly-used
functions. We could make these functions use that. Then they’d still
be usable as callbacks.

If those are opcodes, those rules will require 2/3 majority for acceptance,
since those will be the engine rules for type conversion, not just a set of
functions. And, of course, the rules not matching the other engine rules for
type conversion, sorry for sounding like broken record.

Regardless of how we implement it, this requires a 2/3 majority - it'll be
perceived as an integral part of the core language in the same way that
gettype() and is_array() are considered parts of the core language.
Introducing a new set of typing rules into PHP cannot be a 50%+1 decision.

Zeev

10 years ago by Andrea Faulds

If those are opcodes, those rules will require 2/3 majority for
acceptance, since those will be the engine rules for type conversion,
not just a set of functions. And, of course, the rules not matching the
other engine rules for type conversion, sorry for sounding like broken
record.

No, it wouldn’t require a 2/3 majority. The optimisation me and Dmitry are referring to is merely an optimisation, it’s an implementation detail. This doesn’t touch any of the language spec or the language as understood by users.

Or would you argue that the fact is_long is now an opcode is a language change, and Dmitry should’ve made an RFC before making a change that is completely non-user-facing?!

Regardless of how we implement it, this requires a 2/3 majority - it'll be
perceived as an integral part of the core language in the same way that
gettype() and is_array() are considered parts of the core language.

Sure, but so is all of ext/standard really. If you got rid of ext/standard, PHP wouldn’t really be PHP. Yet there is no such barrier to entry for ext/standard.

Introducing a new set of typing rules into PHP cannot be a 50%+1 decision.

It’s not a “new set of typing rules”. It’s a set of convenient conversion functions.

Would adding ext/filter have required 2/3 majority? Because it has its own set of “typing rules”.

Thinking a bit more on this, if we don't want the 2/3 hurdle and perhaps
make this a bit (or actually a lot) less controversial, we should change the
names of these functions. to_float() strongly implies that this function
represents PHP's standard typing ruleset, which these functions do not.

I’m wary of making the names much longer. The less convenient they are, the less likely they’ll be used… so the less likely the primary goal of the RFC would be achieved.

Also, we already have intval, floatval and strval. I think people will notice the fact they were introduced in PHP 7 and the fact they’re not aliases of those, and perhaps realise they’re different. If they just followed the “standard conversion rules”, why would they exist given the existing functions for that?

Andrea Faulds
http://ajf.me/

10 years ago by Stas Malyshev

Hi!

No, it wouldn’t require a 2/3 majority. The optimisation me and
Dmitry are referring to is merely an optimisation, it’s an
implementation detail. This doesn’t touch any of the language spec or
the language as understood by users.

Sorry, it's not "merely an optimization", it's making it an engine
primitive, like (int) or empty() are.

Or would you argue that the fact is_long is now an opcode is a
language change, and Dmitry should’ve made an RFC before making a
change that is completely non-user-facing?!

is_long existed long before that, and nothing changed for it. You
propose to add completely new type conversion rules into the engine, in
addition to ones already present and used there. It's not the same as
merely changing how the engine internally runs pre-existing code. The
new rules are definitely becoming major part of the language, not an
implementation detail of some random function like str_pad.

Saying "oh, we just add it like a random function and only then we'll
make it an opcode and it will be implementation detail" sounds a lot
like gaming the system to me.

Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/

10 years ago by Andrea Faulds

No, it wouldn’t require a 2/3 majority. The optimisation me and
Dmitry are referring to is merely an optimisation, it’s an
implementation detail. This doesn’t touch any of the language spec or
the language as understood by users.

Sorry, it's not "merely an optimization", it's making it an engine
primitive, like (int) or empty() are.

Yes, that’s still merely an implementation detail. If HHVM decides to make explode() into an opcode, it’s not a language change. It is not any different if PHP does the same.

Or would you argue that the fact is_long is now an opcode is a
language change, and Dmitry should’ve made an RFC before making a
change that is completely non-user-facing?!

is_long existed long before that, and nothing changed for it.

So why is it a language change?

You propose to add completely new type conversion rules into the engine, in
addition to ones already present and used there. It's not the same as
merely changing how the engine internally runs pre-existing code. The
new rules are definitely becoming major part of the language, not an
implementation detail of some random function like str_pad.

No, they’re just a set of new validation functions.

Saying "oh, we just add it like a random function and only then we'll
make it an opcode and it will be implementation detail" sounds a lot
like gaming the system to me.

…what?

Andrea Faulds
http://ajf.me/

10 years ago by Stas Malyshev

Hi!

Yes, that’s still merely an implementation detail. If HHVM decides to
make explode() into an opcode, it’s not a language change. It is not
any different if PHP does the same.

If HHVM decides to introduce new type handling rules, however, it is.
Even if they are going to be called using ( and ).

You propose to add completely new type conversion rules into the
engine, in addition to ones already present and used there. It's
not the same as merely changing how the engine internally runs
pre-existing code. The new rules are definitely becoming major part
of the language, not an implementation detail of some random
function like str_pad.

No, they’re just a set of new validation functions.

No, they are not. They are new engine primitives for handling type
conversions.

Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/

10 years ago by Andrea Faulds

You propose to add completely new type conversion rules into the
engine, in addition to ones already present and used there. It's
not the same as merely changing how the engine internally runs
pre-existing code. The new rules are definitely becoming major part
of the language, not an implementation detail of some random
function like str_pad.

No, they’re just a set of new validation functions.

No, they are not. They are new engine primitives for handling type
conversions.

They’re not engine primitives. They’re a set of validation functions.

Now, later as an optimisation, these might end up becoming opcodes if they’re used enough. Or, heck, this patch could do it. It doesn’t make a blind bit of difference how something’s implemented internally. If I get rid of all opcodes and replace them with function calls internally, that’s not a language change. The language has not changed. The way the Zend engine implements the language, in a way that is not user-facing, has changed.

Are you opposed to the existence of ext/filter given it has FILTER_VALIDATE_INT, a “primitive for handling type conversions”?

--
Andrea Faulds
http://ajf.me/

10 years ago by Stas Malyshev

Hi!

Are you opposed to the existence of ext/filter given it has
FILTER_VALIDATE_INT, a “primitive for handling type conversions”?

FILTER_VALIDATE_INT is an option for a filter_var function, and it is
not introducing any new rules for handling types in the engine. What you
are proposing is not like FILTER_VALIDATE_INT, it's like (int).

--
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/

10 years ago by Andrea Faulds

Hi!

Are you opposed to the existence of ext/filter given it has
FILTER_VALIDATE_INT, a “primitive for handling type conversions”?

FILTER_VALIDATE_INT is an option for a filter_var function, and it is
not introducing any new rules for handling types in the engine.

Nor is this.

What you are proposing is not like FILTER_VALIDATE_INT, it's like (int).

I don’t see the difference:

$x = filter_var($foo, FILTER_VALIDATE_INT);

$x = to_int($foo);

They’re very similar, except the latter has slightly different rules, is shorter, and if some people (possibly me) get their way, might throw an exception.

Andrea Faulds
http://ajf.me/
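
The comparison in this message can be tried in userland. What follows is a minimal sketch, assuming a hypothetical helper named to_int_sketch() that simply wraps ext/filter. The helper name and its exact rules are illustrative assumptions, not the RFC's implementation, whose rules differ slightly from FILTER_VALIDATE_INT.

```php
<?php
// Hypothetical userland stand-in for the proposed to_int(); the name and the
// rules are assumptions for illustration, not the RFC's implementation.
function to_int_sketch($value)
{
    if (is_int($value)) {
        return $value;
    }
    if (is_string($value)) {
        // FILTER_VALIDATE_INT rejects partially numeric strings such as "12abc".
        return filter_var($value, FILTER_VALIDATE_INT);
    }
    return false;
}

var_dump((int) "12abc");                            // int(12): the cast silently truncates
var_dump(filter_var("12abc", FILTER_VALIDATE_INT)); // bool(false)
var_dump(to_int_sketch("12abc"));                   // bool(false)
var_dump(to_int_sketch("123"));                     // int(123)
```

The point of contrast is the first line: an explicit (int) cast always yields an integer, while the validating calls report failure.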

10 years ago by Zeev Suraski

-----Original Message-----
From: Andrea Faulds [mailto:ajf@ajf.me]
Sent: Thursday, October 23, 2014 12:42 AM
To: Stas Malyshev
Cc: Zeev Suraski; Dmitry Stogov; PHP Internals
Subject: Re: [PHP-DEV] [RFC] Safe Casting Functions

On 22 Oct 2014, at 22:31, Stas Malyshev smalyshev@sugarcrm.com wrote:

Hi!

Are you opposed to the existence of ext/filter given it has
FILTER_VALIDATE_INT, a “primitive for handling type conversions”?

FILTER_VALIDATE_INT is an option for a filter_var function, and it is
not introducing any new rules for handling types in the engine.

Nor is this.

What you are proposing is not like FILTER_VALIDATE_INT, it's like (int).

I don’t see the difference:

$x = filter_var($foo, FILTER_VALIDATE_INT);

$x = to_int($foo);

They’re very similar, except the latter has slightly different rules, is
shorter, and if some people (possibly me) get their way, might throw an
exception.

Andrea,

You're analyzing this from a purely technical perspective. There's a world
of difference between these two if we take into account the human element.

The former provides a very clear idea that you're in the context of
filtering and validation. The whole point of filtering is to, well, filter
out stuff.

The latter, on the other hand, looks like 'PHP, please convert this to an
int for me, the way you do these things, please.' I know what it does
without looking at the docs. But actually, I'm wrong - it does something
different, and has a behavior which is fundamentally inconsistent with the
rest of the language that could easily result in broken apps. to_int()
doing one thing, and (int) doing something completely different? That's a
major #fail.

If we use more explicit naming, this argument goes away completely.
lossless_int() does a fairly good job at self-describing what it does, and
completely prevents creating the false assumption that it's identical to
PHP's type conversion rules. The chances the average developer would think
that lossless_int() and (int) are identical is slim to non-existent.

Regarding the fact that it's longer - I think we've made the decision to be
explicit and not try to shorten things up a very long time ago
(unfortunately not long enough ago). Clarity is a heck of a lot more
important than a few extra chars. Not to mention that in order to properly
use this new mechanism - people will have to write a lot of defensive error
handling code around it - otherwise, what's the point. People who have a
good reason to use these lossless conversions would not be deterred by 6
extra characters - even in the most minimalistic error handling scenario
we're talking about dozens of extra characters compared to a simple (int)
cast.

Zeev

10 years ago by Zeev Suraski

-----Original Message-----
From: Andrea Faulds [mailto:ajf@ajf.me]
Sent: Wednesday, October 22, 2014 9:20 PM
To: Stas Malyshev; Zeev Suraski
Cc: Dmitry Stogov; PHP Internals
Subject: Re: [PHP-DEV] [RFC] Safe Casting Functions

On 22 Oct 2014, at 09:17, Stas Malyshev smalyshev@sugarcrm.com wrote:

If those are opcodes, those rules will require 2/3 majority for
acceptance, since those will be the engine rules for type conversion,
not just a set of functions. And, of course, the rules not matching
the other engine rules for type conversion, sorry for sounding like
broken record.

No, it wouldn’t require a 2/3 majority.

Andrea,

It absolutely would, regardless of the optimization. Whether or not it's
implemented as functions or opcodes indeed matters very little. What
matters is what it will be perceived as by the average developer. And a set of
functions named to_int() will absolutely be perceived as a part of the core
language, in the same way is_int() is.

The RFC itself makes an assertion that fundamentally contradicts the notion
that these are 'just functions'. The RFC reads 'They also prevent any
suggestion of strict type hinting for scalar types, because if that were to
be added, users would simply use dangerous explicit casts to get around
errors and the result would be code that is buggier than it would have been
without type hinting at all.' While it doesn't explicitly say so, it's
clear that one of the goals of the RFC is to make it easier for a 'strict
typing' RFC to be accepted in the future. This is a clear indication this
constitutes a fundamental change to the core language.

Changing the function names so as to defuse confusion on whether or not
they represent the official typing rules of PHP can change this into a
50%+1 RFC. Otherwise, this is a very, very clear 2/3 RFC.

Zeev

10 years ago by Andrey Andreev

Hi,

The RFC itself makes an assertion that fundamentally contradicts the notion
that these are 'just functions'. The RFC reads 'They also prevent any
suggestion of strict type hinting for scalar types, because if that were to
be added, users would simply use dangerous explicit casts to get around
errors and the result would be code that is buggier than it would have been
without type hinting at all.' While it doesn't explicitly say so, it's
clear that one of the goals of the RFC is to make it easier for a 'strict
typing' RFC to be accepted in the future. This is a clear indication this
constitutes a fundamental change to the core language.

I'd argue that it has the exact opposite goal - to be able to say "use
to_string(), to_int(), etc; we don't need strict type hinting".
Though, I'd also argue that this isn't a valid argument - it's a
different feature.

Cheers,
Andrey.

10 years ago by Zeev Suraski

-----Original Message-----
From: Andrey Andreev [mailto:narf@devilix.net]
Sent: Thursday, October 23, 2014 1:15 AM
To: Zeev Suraski
Cc: Andrea Faulds; Stas Malyshev; Dmitry Stogov; PHP Internals
Subject: Re: [PHP-DEV] [RFC] Safe Casting Functions

While it doesn't explicitly say so, it's clear that one of the goals
of the RFC is to make it easier for a 'strict typing' RFC to be accepted
in the future. This is a clear indication this constitutes a
fundamental change to the core language.

I'd argue that it has the exact opposite goal - to be able to say "use
to_string(), to_int(), etc; we don't need strict type hinting".

You may be right, I may have misinterpreted the rationale - but I think the
consequences are still the same. If this (in the eyes of the author) undoes
the need for a feature as fundamental as strict typing, how can it be just a
bunch of simple functions?

Zeev

10 years ago by Andrey Andreev

-----Original Message-----
From: Andrey Andreev [mailto:narf@devilix.net]
Sent: Thursday, October 23, 2014 1:15 AM
To: Zeev Suraski
Cc: Andrea Faulds; Stas Malyshev; Dmitry Stogov; PHP Internals
Subject: Re: [PHP-DEV] [RFC] Safe Casting Functions

While it doesn't explicitly say so, it's clear that one of the goals
of the RFC is to make it easier for a 'strict typing' RFC to be accepted
in the future. This is a clear indication this constitutes a
fundamental change to the core language.

I'd argue that it has the exact opposite goal - to be able to say "use
to_string(), to_int(), etc; we don't need strict type hinting".

You may be right, I may have misinterpreted the rationale - but I think the
consequences are still the same. If this (in the eyes of the author) undoes
the need for a feature as fundamental as strict typing, how can it be just a
bunch of simple functions?

Well, that might be the author's rationale, but you (probably)
misinterpreted that exactly because it's not a technical limitation
for strict type hints in the future. Personally, I only care about the
technical side of it.

As far as intentions and politics are concerned - if we don't play
politics for this one, we shouldn't play them for scalar hints
either. I've said this about every similar proposal so far: It is
useful, so I don't mind having it, but I also want strict scalar
type hints with the same syntax as for objects. So that's that - it's
just not the same thing.

Cheers,
Andrey.

10 years ago by Lester Caine

I am gradually warming to throwing an exception. An interesting idea I’ve had suggested on Twitter was by Matt Parker, who suggested that I add an optional 2nd argument. Without the argument, it throws an exception. With an argument, it returns that value (as a default) instead of throwing an exception.

Now that sounds like a nice compromise in a few places where exceptions
have been introduced, but I still think this is a little chicken
and egg. If it is going to 'fail' then the reason for the failure is
more important than simply producing no answer. Especially if it is
going to fail on different inputs than other methods.

Again forcing an exception only solution sidesteps that debate!

--
Lester Caine - G8HFL

Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk

10 years ago by Derick Rethans

Good evening,

I am presenting a new RFC to add a set of three functions to do validated casts for scalar types:

https://wiki.php.net/rfc/safe_cast

Please read it.

The functions return FALSE on failure instead of NULL because:

- If strict type hinting were added, they would fail for a nullable
  typehint
- FALSE is a traditional error value in PHP
- NULL is used to signify the absence of a value - but what we want to
  signify is an invalid value (similar to 0 vs NaN)

But what about if we also would like a to_bool, which would accept
"true", "false", "0", "1", true, false, 1 and 0?

Then to_bool() would return false... or true? Hence, it should be
NULL, and that would also be consistent with ext/filter.

cheers,
Derick
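
The ambiguity described here, and the consistency with ext/filter, can be seen with the existing filter_var() API. This is a minimal sketch that uses only ext/filter; to_bool() is just a suggestion in this thread and is not used. Note that FILTER_VALIDATE_BOOLEAN accepts a wider set of strings (for example "yes", "no", "on", "off") than the list above.

```php
<?php
// With FILTER_NULL_ON_FAILURE, a successful parse to false and a failed
// parse are distinguishable: failure yields NULL instead of false.
$flags = FILTER_NULL_ON_FAILURE;

var_dump(filter_var('true', FILTER_VALIDATE_BOOLEAN, $flags));  // bool(true)
var_dump(filter_var('false', FILTER_VALIDATE_BOOLEAN, $flags)); // bool(false)
var_dump(filter_var('0', FILTER_VALIDATE_BOOLEAN, $flags));     // bool(false)
var_dump(filter_var('maybe', FILTER_VALIDATE_BOOLEAN, $flags)); // NULL
```

Without the flag, the last call would return false as well, which is exactly the case where a FALSE error value cannot be told apart from a valid result.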

10 years ago by Lars Strojny

Hi Derick,

But what about if we also would like a to_bool, which would accept
"true", "false", "0", "1", true, false, 1 and 0?

Yep, I think that totally makes sense. "yes" and "no" would be further candidates but that’s probably already too much.

cu,
Lars

10 years ago by Bob Weinand

I know we have already discussed this a lot, but I’d like to lay out my points on the return value here:

I imagine code like (supposing that we ever will have scalar typehints):

function acceptsInt(int $i = null) {
    if ($i === null) {
        $i = 2 /* default value */;
    }
    /* do something with $i */
}

When we return false:
acceptsInt(($tmp = to_int($_GET["userinput"])) === false ? null : $tmp);

When we throw an exception:
try {
    acceptsInt(to_int($_GET["userinput"]));
} catch (CastingException $e) {
    acceptsInt(null);
}

When we just return null:
acceptsInt(to_int($_GET["userinput"]));

Also, when we want to pass a default value defined outside of the function, it’s a lot easier now with the coalesce operator:
acceptsInt(to_int($_GET["userinput"]) ?? 2 /* default value */);

Also, independently of possible scalar typehints:

Generally exceptions are also a bad idea as the casts will probably be used on external input and exceptions are not a way to handle malformed user input. Really not.
Furthermore, false is a bad idea in the same sense (if we ever get scalar type hints), because people then might just catch the EngineException…

Also, null means "no value"; that’s exactly what we need. If the to_{type}() functions cannot return a meaningful value, just return "no value", that means null. And not false, which is a real value.

That’s why I strongly feel that null is the only true thing to return here.

Thanks,
Bob
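
The null-returning variant argued for in this message can be sketched in userland. The helper name to_int_or_null() and its rules are assumptions made for illustration (it wraps ext/filter and is not the RFC's to_int()), and the $input array stands in for $_GET.

```php
<?php
// Hypothetical null-returning conversion; name and rules are assumptions.
function to_int_or_null($value): ?int
{
    if (is_int($value)) {
        return $value;
    }
    if (!is_string($value)) {
        return null;
    }
    // FILTER_NULL_ON_FAILURE makes filter_var() return null instead of false.
    return filter_var($value, FILTER_VALIDATE_INT, FILTER_NULL_ON_FAILURE);
}

function acceptsInt(?int $i = null): int
{
    return $i ?? 2; // 2 is the function's own default value
}

$input = ['userinput' => 'abc']; // stand-in for $_GET

echo acceptsInt(to_int_or_null($input['userinput'])), "\n";      // prints 2
echo acceptsInt(to_int_or_null($input['userinput']) ?? 7), "\n"; // prints 7
```

The second call shows the caller-side default: because failure is null, the ?? operator supplies the fallback without any temporary variable.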

10 years ago by Dmitry Stogov

"null" or "false" return value would make these functions not really
useful, because they won't guarantee to return desired type.

printf("%d\n", to_int("abcd")); // will print 0

The only reliable option to support wrong input is exceptions.
On the other hand, exceptions may be difficult to use or inefficient.
We may avoid throwing exceptions if we provide a default value:

function to_int(mixed $a, int $default_value = null): int;
function to_double(mixed $a, double $default_value = null): double;
function to_string(mixed $a, string $default_value = null): string;

Thanks. Dmitry.
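
The "throw unless a default is given" behaviour can be sketched in userland as follows. The function name to_int_strict() and the CastingException class are hypothetical, and the sketch uses a non-nullable int default together with func_num_args() rather than the nullable defaults in the signatures above, so that an omitted argument can be told apart from a passed one.

```php
<?php
// Hypothetical exception type and function name, used only for this sketch.
class CastingException extends Exception {}

function to_int_strict($value, int $default = 0): int
{
    $result = is_int($value)
        ? $value
        : (is_string($value) ? filter_var($value, FILTER_VALIDATE_INT) : false);

    if ($result !== false) {
        return $result;
    }
    // func_num_args() distinguishes an omitted default from a passed one.
    if (func_num_args() >= 2) {
        return $default;
    }
    throw new CastingException('Value cannot be converted to int without loss');
}

printf("%d\n", false);                    // prints 0: a FALSE result is silently coerced
printf("%d\n", to_int_strict('abcd', -1)); // prints -1: the explicit default is used

try {
    to_int_strict('abcd');                // no default given, so this throws
} catch (CastingException $e) {
    echo "conversion failed\n";
}
```

Either way the declared int return type holds, which is the guarantee this message asks for.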

10 years ago by Weinand Bob

So, what exactly changes here if we have a second parameter or just return null by default?
It doesn’t make any difference, it’s just another way to write it:

to_int($a, $default)
or
to_int($a) ?? $default

Also, if you want exceptions, you can always wrap a userland function around it, but I’d rather not wrap a userland function around something throwing an exception… inefficient and weird.

Thanks,
Bob

10 years ago by Dmitry Stogov

for me it's weird that to_int(), which must return "int", may return something that is not an "int".
NULL with ?? seems better than FALSE :)

but if we talk about safety, we should be able to rely on to_int()'s return
value without additional checks.

Thanks. Dmitry.

10 years ago by Weinand Bob

If we really want an integer at any price we can just use a simple (int) cast. That’s AFAIK not the point of this RFC.

And at the point where we can add a default as a second parameter, we can also just use NULL with ??. The latter is at the same time more powerful and less restrictive.

Also, with a second parameter, we have no way to check whether the conversion was successful or the fallback was used.

Bob

Am 22.10.2014 um 14:49 schrieb Dmitry Stogov dmitry@zend.com:

for me it's weird that to_int() that must return "int" may return not "int".
NULL with ?? seems better than FALSE :)

but if we talk about safety, we should be able to relay on to_int() return value without additional checks.

Thanks. Dmitry.

So, what exactly changes here if we have a second parameter or just return null by default?
It doesn’t make any difference, it’s just another way to write it:

to_int($a, $default)
or
to_int($a) ?? $default

Also, if you want exceptions, you always can wrap a userland function around it — but I’d rather not wrap an userland function around something throwing an exception… inefficient and weird.

Thanks,
Bob

Am 22.10.2014 um 12:27 schrieb Dmitry Stogov <dmitry@zend.com mailto:dmitry@zend.com>:

"null" or "false" return value would make these functions not really
useful, because they won't guarantee to return desired type.

printf("%d\n", to_int("abcd")); // will print 0

The only reliable option to support wrong input is exceptions.
On the other hand, exceptions maybe difficult to use or inefficient.
We may avoid exceptions throwing, if provide a default value:

function to_int(mixed $a , int $default_value = null): int;
function to_double(mixed $a , double $default_value = null): double;
function to_string(mixed $a, string $default_value = null): string;

Thanks. Dmitry.

I know we have that already discussed a lot now, but I’d like to expose my
points on the return value here:

I imagine code like (supposing that we ever will have scalar typehints):

function acceptsInt (int $i = null) {
    if ($i === null) {
        $i = 2 /* default value */;
    }
    /* do something with $i */
}

When we return false:
acceptsInt(($tmp = to_int($_GET["userinput"])) === false ? null : $tmp);

When we throw an exception:
try {
    acceptsInt(to_int($_GET["userinput"]));
} catch (CastingException $e) {
    acceptsInt(null);
}

When we just return null:
acceptsInt(to_int($_GET["userinput"]));

Also, when we want to pass a default value defined outside of the
function, it’s a lot easier now with the coalesce operator:
acceptsInt(to_int($_GET["userinput"]) ?? 2 /* default value */);

Also, independently of possible scalar typehints:

Generally exceptions are also a bad idea as the casts probably will be
used on external input and exceptions are not a way to handle malformed
user input. Really not.
Furthermore, false is a bad idea in the same sense (if we get scalar type
hints once), because people then might just catch the EngineException…

Also, null means "no value"; that’s exactly what we need. If the
to_{type}() functions cannot return a meaningful value, just return "no
value", that means null. And not false, which is a real value.

That’s why I strongly feel that null is the only true thing to return here.

Thanks,
Bob

On 21.10.2014 at 00:57, Andrea Faulds <ajf@ajf.me> wrote:

Good evening,

I am presenting a new RFC to add a set of three functions to do
validated casts for scalar types:

https://wiki.php.net/rfc/safe_cast

Please read it.

Thanks!

Andrea Faulds
http://ajf.me/

10 years ago by Robert Stoll

-----Original Message-----
From: Weinand Bob [mailto:bobwei9@hotmail.com]
Sent: Wednesday, 22 October 2014 16:16
To: Dmitry Stogov
Cc: Andrea Faulds; PHP Internals
Subject: Re: [PHP-DEV] [RFC] Safe Casting Functions

If we really want an integer at any price, we can just use a simple (int) cast. That’s AFAIK not the point of this RFC.

And at the point where we can add a default as a second parameter, we can also just use NULL with ??. The latter is at the same time more powerful and less restrictive.

Also, with a second parameter, we don’t have any possibility to check if the conversion was successful or if the fallback was used.

Bob

I believe the point of this RFC is to have safe casts in the sense of type-safe casts. Under these circumstances I would give this RFC a +1.
In my opinion, and as mentioned by Dmitry, the only way to achieve this is using exceptions (or triggering an E_RECOVERABLE_ERROR) when something fails.
Consider the following example:

$total = to_int($_GET['a']) + to_int($_GET['b']);

Regardless of whether false or NULL is returned, $total will be 0 and the error will be ignored. I think that should not be the purpose of this RFC. Otherwise it is merely another way of casting.
Sure, introducing safe casts does not free the user from input validation (and I merely used $_GET as an example), but that is another topic. I am sure someone will now argue that if the user validates the input anyway, then such safe casts are not necessary. I would argue that they are still useful and necessary, if not even mandatory. Personally, I would only use to_int, even in cases where I believe I know that the input is valid by design (say it comes from a config file), because you never know whether someone (a hacker) was able to manipulate your config value and can thus exploit your code in some way or other. I cannot come up with a concrete scenario, but I guess you get my point. If an exception is thrown in such a case, the request terminates unexpectedly, but for the better.
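
To illustrate the concern, here is a small sketch. The null values stand in for what a null-returning to_int() would yield on bad input, and to_int_or_throw() is a hypothetical exception-based variant (name, rules and exception type are assumptions):

```php
<?php
// Both inputs were rejected by a (hypothetical) null-returning converter.
$a = null;
$b = null;

// null is silently treated as 0 in arithmetic, so the failure disappears.
$total = $a + $b;

// Hypothetical throwing variant; the validation rule is borrowed from
// FILTER_VALIDATE_INT purely for illustration.
function to_int_or_throw(string $value): int
{
    $result = filter_var($value, FILTER_VALIDATE_INT);
    if ($result === false) {
        throw new InvalidArgumentException("Not an integer: " . $value);
    }
    return $result;
}

try {
    $sum = to_int_or_throw("3") + to_int_or_throw("abc");
    $failed = false;
} catch (InvalidArgumentException $e) {
    // The bad operand cannot go unnoticed here.
    $failed = true;
}
```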

Cheers,
Robert

10 years ago by Zeev Suraski

Trying to think what real world example may look like. With exceptions:

try {
$i = lossless_int($sth);
} catch (SomeException $e) {
// error out / provide default / custom error handling
}

If we use FALSE:

$i = lossless_int($sth);
if ($i === false) {
// error out / provide default / custom error handling
}

At first glance they look mostly equivalent, but it really depends on the
use case people will use it for. For instance, if you use it for input
validation for several input fields, the try/catch approach allows you to
put several conversions in one block, whereas the false approach doesn't,
and requires separate checks for every input field. There's still the
question on whether or not the exception would provide enough context for a
meaningful error message or not.
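
A sketch of the grouped try/catch case, to make the "enough context" question concrete. lossless_int() is only a name floated in this message; the extra $field argument and the exception type used here are assumptions added for illustration:

```php
<?php
// Hypothetical exception-based converter. The $field parameter is an
// assumption: it is one way to give the handler some context.
function lossless_int(string $value, string $field): int
{
    $result = filter_var($value, FILTER_VALIDATE_INT);
    if ($result === false) {
        throw new UnexpectedValueException("Field '$field' is not an integer");
    }
    return $result;
}

$input = ['age' => '31', 'qty' => 'many'];

try {
    // One block covers every field...
    $age = lossless_int($input['age'], 'age');
    $qty = lossless_int($input['qty'], 'qty');
    $error = null;
} catch (UnexpectedValueException $e) {
    // ...but the handler only knows what the exception tells it.
    $error = $e->getMessage();
}
```

Without something like the field name in the exception, the handler could not say which of the grouped conversions failed.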

I think a key missing piece in the RFC is an explanation for the assertion
that the current type conversion system "[] makes it difficult to write
robust applications which handle user data.", and more importantly, how
these new functions will fix it. It focuses on how these functions
would behave, but not so much on what real world usage of these functions
would look like.

My assumption is that with handling user input being the main goal, in case
of failed conversion the likely handling would be erroring out in case of
'garbage' input, as it's impossible to safely determine what were the remote
user's intentions (if we were to 'guess' what he meant with some default,
I'd argue that the current conversion rules are as good a guess as any). If
that's the case, wouldn't it make more sense to change the semantics of this
new family of functions from functions that do the conversion, to ones that
just check whether it can be done?

Something like:

if (!int_convertible($sth)) { // open to new ideas about the name :)
// error out
}
$i = (int) $sth;

Or even safe_to_convert($sth, 'int') (i.e. one function that can handle all
types)?
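
A rough sketch of the check-then-cast idea. int_convertible() is the placeholder name from the message; the rule it applies here (whatever FILTER_VALIDATE_INT accepts) is an assumption and not necessarily what the RFC would specify:

```php
<?php
// Placeholder name from the message above; the acceptance rule is an
// assumption for illustration.
function int_convertible(mixed $value): bool
{
    return (is_int($value) || is_string($value))
        && filter_var($value, FILTER_VALIDATE_INT) !== false;
}

$sth = "12abc";

if (!int_convertible($sth)) {
    // error out; here we only record that the input was rejected
    $i = null;
} else {
    // The check passed, so the plain cast cannot silently drop anything.
    $i = (int) $sth;
}
```

The design choice is that validation and conversion become two separate steps, so the existing (int) cast keeps doing the conversion.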

Zeev


10 years ago by Lester Caine

Something like:

if (!int_convertible($sth)) { // open to new ideas about the name :)
// error out
}
$i = (int) $sth;

And this allows each failure to have its own response, while pushing that
problem to an exception requires one to build a tree of responses in the
exception handler. There is not a single best case solution, but putting the
problem this way around looks tidy and keeps the processing of each check
grouped ... but it is probably not how an exception based user would think?
Adding additional processing to handle a failed $sth test does not break
the work flow!

--
Lester Caine - G8HFL

Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk

10 years ago by Marc Bennewitz

I know we have that already discussed a lot now, but I’d like to expose my points on the return value here:

I imagine code like (supposing that we ever will have scalar typehints):

function acceptsInt (int $i = null) {
    if ($i === null) {
        $i = 2 /* default value */;
    }
    /* do something with $i */
}

NULL isn't a pointer to a default value - it's simply a type with no
value - no more - no less.
From your example: why do you accept NULL if you need an integer default
value?

function acceptsInt (int $i = 2) { ...

Marc

10 years ago by Rowan Collins

Marc Bennewitz wrote on 22/10/2014 20:12:

I know we have that already discussed a lot now, but I’d like to expose my points on the return value here:

I imagine code like (supposing that we ever will have scalar typehints):

function acceptsInt (int $i = null) {
    if ($i === null) {
        $i = 2 /* default value */;
    }
    /* do something with $i */
}

NULL isn't a pointer to a default value - it's simply a type with no
value - no more - no less.
From your example: why do you accept NULL if you need an integer default
value?

function acceptsInt (int $i = 2) { ...

Those are two distinct types of default:

- the default value to use when a programmer calls the function with
  fewer parameters (e.g. acceptsInt())
- the default value to use when a value is provided by the programmer,
  but turns out to be null at runtime (e.g. acceptsInt($_GET['foo']))
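
The two cases can be told apart in a short sketch. describe() is a hypothetical function written only to make the difference visible, using the nullable type syntax of current PHP versions:

```php
<?php
// Hypothetical function: the signature default covers an omitted
// argument, but an explicit null bypasses it and must be handled
// inside the body.
function describe(?int $i = 2): string
{
    return $i === null ? 'null at runtime' : "value $i";
}

$omitted  = describe();      // the signature default applies
$explicit = describe(null);  // the signature default does not apply
```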

10 years ago by Marc Bennewitz

I really like the strictness of these casting rules, except that "010" will
be a valid integer / float.
After all, you don't allow "0x" or trailing white space in valid numbers,
and you don't allow float-like syntax for integers even if it results in
a mathematical integer.

Allowing a "0" prefix in valid numbers:
- results in a small data loss ("010" !== to_string(to_int("010"))),
- is simple to address in userland with ltrim("010", "0") (same argument
  as for trailing whitespace),
- and collides with octal notation (010 !== to_int("010")).

Marc
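
The three points can be checked with existing PHP casts. This sketch uses (int) and (string) as stand-ins for the proposed to_int() and to_string(), which is an assumption since the proposed functions are not available:

```php
<?php
// Round trip: the leading zero does not survive.
$roundTrip = (string)(int) "010";

// An integer literal with a leading zero in source code is octal...
$asLiteral = 010;

// ...while a string cast reads the same digits as decimal.
$asCast = (int) "010";

// The userland workaround mentioned above.
$trimmed = ltrim("010", "0");

// Caveat: a lone "0" is trimmed to an empty string and needs a special case.
$lonelyZero = ltrim("0", "0");
```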

Good evening,

I am presenting a new RFC to add a set of three functions to do validated casts for scalar types:

https://wiki.php.net/rfc/safe_cast

Please read it.

Thanks!

Andrea Faulds
http://ajf.me/

10 years ago by Marc Bennewitz

You address data loss when converting float to int.
Do you also address data loss when converting int to float?

to_float(9223372036854774784) -> pass, as it results in 9223372036854774784
to_float(9223372036854774785) -> fail, as it results in 9223372036854774784

Marc


10 years ago by Andrea Faulds

You address data loss when converting float to int.
Do you also address data loss when converting int to float?

to_float(9223372036854774784) -> pass, as it results in 9223372036854774784
to_float(9223372036854774785) -> fail, as it results in 9223372036854774784

Floats aren’t expected to be precise, so I don’t see why this shouldn’t pass. It’s a loss of data, sure, but merely of precision, which is expected here. The reason I have to_int fail is because float overflow to int completely mangles your input.
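
A small sketch of the precision loss in question, using the plain (float) cast as a stand-in for the proposed to_float(). It assumes a 64-bit PHP build, where both literals fit into an int:

```php
<?php
// Assumes 64-bit integers (PHP_INT_SIZE === 8).
$a = (float) 9223372036854774784; // exactly representable as a double
$b = (float) 9223372036854774785; // not representable, rounds to the neighbour

// Both integers end up as the same float: precision is lost, but the
// magnitude of the value is preserved.
$same = ($a === $b);

// The representable value survives a round trip back to int.
$roundTrips = ((int) $a === 9223372036854774784);
```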

Andrea Faulds
http://ajf.me/

10 years ago by Andrea Faulds

Good evening,

I am presenting a new RFC to add a set of three functions to do validated casts for scalar types:

https://wiki.php.net/rfc/safe_cast

Please read it.

Thanks!

After some discussions, the RFC has been revised and simplified in a few places. I would suggest you all re-read it.

I may take this to a vote soon.

Thanks!

--
Andrea Faulds
http://ajf.me/

10 years ago by Yasuo Ohgaki

Hi Andrea,

I am presenting a new RFC to add a set of three functions to do validated
casts for scalar types:

https://wiki.php.net/rfc/safe_cast

I like this RFC overall. Precise parameter checks are always good for
security.
I would like to have DbC to harden app security as well.
I'm looking for something like the D language.

http://dlang.org/contracts.html

With DbC, checks of parameter types/ranges/etc. happen only during development.
Therefore, the app runs faster in production. Not all runtime checks can be
removed from app code by DbC, so this RFC is nice to have even with DbC.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

10 years ago by Andrea Faulds

I would like to have DbC to harden app security as well.
I'm looking for something like D language.

http://dlang.org/contracts.html

With DbC, checking parameter types/range/etc happen only when development.
Therefore, app runs faster for production. All of runtime checks cannot be removed
from app code by DbC, so this RFC is nice to have even with DbC.

Something like contracts is actually an idea I’ve had before. Even some of the most expressive type systems (like Haskell’s) seem to neglect that only a certain range of values of a primitive type may be desired. Having something like that in PHP might be useful.

But I wouldn’t say it’s really on-topic.

--
Andrea Faulds
http://ajf.me/

10 years ago by Anatol Belski

Hi Andrea,

I would like to have DbC to harden app security as well.
I'm looking for something like D language.

http://dlang.org/contracts.html

With DbC, checking parameter types/range/etc happen only when
development. Therefore, app runs faster for production. All of runtime
checks cannot be removed from app code by DbC, so this RFC is nice to
have even with DbC.

Something like contracts is actually an idea I’ve had before. Even some
of the most expressive type systems (like Haskell’s) seem to neglect that
only a certain range of values of a primitive type may be desired. Having
something like that in PHP might be useful.

But I wouldn’t say it’s really on-topic.

--
Andrea Faulds
http://ajf.me/

while briefly looking through the conversion examples, i see some weird
results

string(5) “31e+7” - shouldn't this be valid for int?
string(4) “0x10” - hex, but that's int, no?
string(3) “010” - octal, but that's int, no?
string(4) “10.0” - this would be casting to 10, so int valid
object(Stringable)#2 (0) {} - and similar actually, what if __toString()
returns some int/float literal? that should pass as well, no?

Generally I'd say no to this RFC. The current casting is not perfect, but
as for me - the one suggested is highly questionable as well. IMO as long
as there are no proper strict types in PHP, any other rule set for casting
would be just another coordinate system for the same, which isn't worth
while at least.

Regards

Anatol

10 years ago by Andrea Faulds

while briefly looking through the conversion examples, i see some weird
results

string(5) “31e+7” - shouldn't this be valid for int?

The trend seems to be to consider things with exponents or decimal points as floats. Even though there’s a case for supporting it for ints, (int) and intval() don’t work with exponents, so to_int() shouldn’t either.

string(4) “0x10” - hex, but that's int, no?

Supporting hex is a rather obscure use-case. Also, (int) and intval() don’t support it.

string(3) “010” - octal, but that's int, no?

While allowing leading zeroes would be nice, octal causes problems. In particular, 0-prefixed strings aren’t handled consistently. Some things deal with them as decimal, others deal with them as octal. Because the user’s intent isn’t clear, we can’t support them, and I assume this is why FILTER_VALIDATE_INT doesn’t support them.

string(4) “10.0” - this would be casting to 10, so int valid

Allowing .0 for an int doesn’t feel right. What do we do for “10.01”? Reject it? That seems rather arbitrary when we would be allowing “10.00”. So it’s not accepted.

object(Stringable)#2 (0) {} - and similar actually, what if __toString()
returns some int/float literal? that should pass as well, no?

__toString() always errors if it doesn’t return a string, I see no reason to change that.

Generally I'd say no to this RFC. The current casting is not perfect, but
as for me - the one suggested is highly questionable as well. IMO as long
as there are no proper strict types in PHP, any other rule set for casting
would be just another coordinate system for the same, which isn't worth
while at least.

Something like this RFC is a necessary prerequisite for strict types. Without it, there’s not a convenient way to do a safe conversion. If we just add strict types, people will blindly use (int) or intval() and magically, garbage input will be transformed (through the magic of ignoring everything in the string that doesn’t look like an int) into apparently sane input and apps will do dangerous things when presented with bad user input.
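
A short sketch of the "garbage in, sane-looking value out" behaviour described above. The strict side uses filter_var() with FILTER_VALIDATE_INT as an existing approximation of a validating conversion; that substitution is an assumption, since the proposed to_int() is not available:

```php
<?php
$samples = ["N/A", "12abc", "0x10"];

// The plain cast keeps whatever leading digits it finds and drops the rest.
$cast = array_map(fn($s) => (int) $s, $samples);

// A validating conversion rejects every one of these inputs.
$strict = array_map(
    fn($s) => filter_var($s, FILTER_VALIDATE_INT),
    $samples
);
```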

--
Andrea Faulds
http://ajf.me/

10 years ago by Anatol Belski

while briefly looking through the conversion examples, i see some weird
results

string(5) “31e+7” - shouldn't this be valid for int?

The trend seems to be to consider things with exponents or decimal points
as floats. Even though there’s a case for supporting it for ints, (int)
and intval() don’t work with exponents, so to_int() shouldn’t either.

string(4) “0x10” - hex, but that's int, no?

Supporting hex is a rather obscure use-case. Also, (int) and intval()
don’t support it.

string(3) “010” - octal, but that's int, no?

While allowing leading zeroes would be nice, octal causes problems. In
particular, 0-prefixed strings aren’t handled consistently. Some things
deal with them as decimal, others deal with them as octal. Because the
user’s intent isn’t clear, we can’t support them, and I assume this is why
FILTER_VALIDATE_INT doesn’t support them.

string(4) “10.0” - this would be casting to 10, so int valid

Allowing .0 for an int doesn’t feel right. What do we do for “10.01”?
Reject it? That seems rather arbitrary when we would be allowing “10.00”.
So it’s not accepted.

object(Stringable)#2 (0) {} - and similar actually, what if __toString()
returns some int/float literal? that should pass as well, no?

__toString() always errors if it doesn’t return a string, I see no reason
to change that.

But in the other cases it converts strings to numbers. I mean like
class A {function __toString(){return '10';}} $a = (string) (new A);
//numeric literal

Generally I'd say no to this RFC. The current casting is not perfect,
but as for me - the one suggested is highly questionable as well. IMO as
long as there are no proper strict types in PHP, any other rule set for
casting would be just another coordinate system for the same, which
isn't worth while at least.

Something like this RFC is a necessary prerequisite for strict types.
Without it, there’s not a convenient way to do a safe conversion. If we
just add strict types, people will blindly use (int) or intval() and
magically, garbage input will be transformed (through the magic of
ignoring everything in the string that doesn’t look like an int) into
apparently sane input and apps will do dangerous things when presented
with bad user input.

--
Andrea Faulds
http://ajf.me/

IMHO it's a new rule set around the old thing. There's no way to foresee
all the scenarios. Say I expect an input to be less than 3. It's up to
the programmer whether to check that the input is (int)'3' > 3 and give up,
or to try sscanf('2e+22', '%f')[0] > 3. Not even talking about regex.
There are already mechanisms allowing one to implement that, customizable to a
high level, and usually one can come up with them. Maybe that rule set
would sometimes spare a line, but it still depends on the concrete use case.

Regards

Anatol

10 years ago by Andrea Faulds

object(Stringable)#2 (0) {} - and similar actually, what if __toString()
returns some int/float literal? that should pass as well, no?

__toString() always errors if it doesn’t return a string, I see no reason
to change that.

But in the other cases it converts strings to numbers. I mean like
class A {function __toString(){return '10';}} $a = (string) (new A);
//numeric literal

?

I see no numbers?

IMHO it's a new rule set around the old thing. There's no way to foresee
all the scenarios. Say I expect an an input to be less than 3. It's up to
a programmer whether to check that the input is (int)'3' > 3 and give up,
or to try sscanf('2e+22', '%f')[0] > 3. Even not talking about regex.
There are already mechanisms allowing to implement that, customizable to a
high level and usually one can come up with them. Maybe that rule set
would sometimes let spare a line, still it depends on concrete use case.

It’s not meant to cover all scenarios. It’s meant to failsafe on edge cases, rather than producing apparently valid values which the user didn’t actually intend (so “N/A” won’t become 0).

--
Andrea Faulds
http://ajf.me/

10 years ago by Anatol Belski

__toString() always errors if it doesn’t return a string, I see no
reason to change that.

But in the other cases it converts strings to numbers. I mean like
class A {function __toString(){return '10';}} $a = (string) (new A);
//numeric literal

?

I see no numbers?

Yeah, try_string(new A) == (string)(new A), but try_int(new A) !=
(int)(string)(new A) in the RFC, whereas '10' would be perfectly valid for
int, no?

Regards

Anatol

10 years ago by Andrea Faulds

Yeah, try_string(new A) == (string)(new A), but try_int(new A) !=
(int)(string)(new A) in the RFC. Whereby '10' were pretty valid for int,
no?

Oh, you’re saying how to_int/to_float don’t accept __toString objects. I suppose they could, but (int) and (float) don’t accept that, nor do we accept that anywhere else.
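
For reference, a sketch of the existing behaviour being compared here, using Anatol's class A. It only exercises the (string) and (int) casts that exist today; the proposed to_int() and to_string() are not used:

```php
<?php
class A
{
    public function __toString(): string
    {
        return '10';
    }
}

$a = new A;

// The string cast goes through __toString().
$asString = (string) $a;

// Going via the string first gives the number Anatol has in mind.
$viaString = (int)(string) $a;

// A direct (int) $a is not routed through __toString(); current PHP
// versions raise a warning for it, so it is left out of this sketch.
```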

Andrea Faulds
http://ajf.me/

10 years ago by Stanislav Malyshev

Hi!

I like this RFC overall. Precise parameter checks is good for security
always.

I don't see how it matters for security at all. If you need an int,
(int) works as well as any proposed check, security-wise. You may want
different diagnostics, etc. but this doesn't have to do much with
security. In other words, if the security depends on any differences
between (int) and to_int, it's probably not done right.

Stas Malyshev
smalyshev@gmail.com

10 years ago by Yasuo Ohgaki

Hi Stas,

On Thu, Nov 20, 2014 at 10:28 AM, Stanislav Malyshev smalyshev@gmail.com
wrote:

I like this RFC overall. Precise parameter checks is good for security
always.

I don't see how it matters for security at all. If you need an int,
(int) works as well as any proposed check, security-wise. You may want
different diagnostics, etc. but this doesn't have to do much with
security. In other words, if the security depends on any differences
between (int) and to_int, it's probably not done right.

Please refer to CWE/SANS TOP 25, Monster Mitigation especially.

http://cwe.mitre.org/top25/#Mitigations

and ISO 27000. (I cannot provide link to it, since one should buy the
document to read)

Programmers should control all inputs as the most important security
measure.
There are two strategies in general.

1. Convert inputs to secure values and ignore possible attacks.
   (Sanitization)
2. Validate inputs to reject malformed values and record possible attacks.
   (Validation and logging)

(int) is sanitization. It works, but it cannot log/detect a possible attack
(or bug).

to_int can be used as validation. It has the advantage of making it
possible to record a possible attack (or bug). Logging is an important
security feature. Therefore, validation could be said to be more secure
than sanitization.

Which strategy to adopt depends on organization/application policy.
Public web sites may ignore invalid inputs due to the large number of
attacks, while private web sites may require all possible attacks (or bugs)
to be recorded, for example.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

10 years ago by Yasuo Ohgaki

Hi Stas,

Please refer to CWE/SANS TOP 25, Monster Mitigation especially.

http://cwe.mitre.org/top25/#Mitigations

and ISO 27000. (I cannot provide link to it, since one should buy the
document to read)

Programmers should control all inputs as the most important security
measure.
There are two strategies in general.

1. Convert inputs to secure values and ignore possible attacks.
   (Sanitization)
2. Validate inputs to reject malformed values and record possible attacks.
   (Validation and logging)

(int) is sanitization. It works, but it cannot log/detect a possible attack
(or bug).

to_int can be used as validation. It has the advantage of making it
possible to record a possible attack (or bug). Logging is an important
security feature. Therefore, validation could be said to be more secure
than sanitization.

Which strategy to adopt depends on organization/application policy.
Public web sites may ignore invalid inputs due to the large number of
attacks, while private web sites may require all possible attacks (or bugs)
to be recorded, for example.

We know people do things like

$id = $_GET['id'];
pg_query("SELECT * FROM some_table WHERE id = $id;");

(int) works mostly. to_int is better as it may detect a possible attack or
bug.

I always implement user error/exception handlers to detect possible
attacks/bugs.
to_int may be used as validation to detect internal logic inconsistencies
as well as for user input validation.
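
A sketch of the sanitization versus validation-with-logging distinction, under stated assumptions: validated_id() is a hypothetical helper, the in-memory $log array stands in for a real log, and filter_var() approximates the proposed to_int():

```php
<?php
// Hypothetical helper: validate an id and record rejected input.
function validated_id(string $raw, array &$log): ?int
{
    $id = filter_var($raw, FILTER_VALIDATE_INT);
    if ($id === false) {
        // Record the suspicious value so it can be audited later.
        $log[] = "rejected id: " . json_encode($raw);
        return null;
    }
    return $id;
}

$log = [];
$raw = "1 OR 1=1";

// Sanitization: the cast yields a harmless-looking id and leaves no trace.
$sanitized = (int) $raw;

// Validation: the input is rejected and the attempt is recorded.
$validated = validated_id($raw, $log);
```

Either way, this only concerns how the input is checked; how the value later reaches the database is a separate question.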

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

10 years ago by Lester Caine

$id = $_GET['id'];
pg_query("SELECT * FROM some_table WHERE id = $id;");

Anybody using that method of passing parameters to a database needs much
better education. This particular proposal just adds yet another 'how
not to' rather than actually fixing the underlying security problems.

Tidy up what exists - don't create yet another set of functions that can
still be abused.

--
Lester Caine - G8HFL

Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk

10 years ago by Stanislav Malyshev

Hi!

Please refer to CWE/SANS TOP 25, Monster Mitigation especially.

http://cwe.mitre.org/top25/#Mitigations

and ISO 27000. (I cannot provide link to it, since one should buy the
document to read)

Could you please be more specific about how this is relevant to this
specific case? "Buy an ISO standard and read it whole" is not exactly a
good argument when discussing the specific issue.

Programmers should control all inputs as the most important security
measure.
There are two strategies in general.

1. Convert inputs to secure values and ignore possible attacks.
   (Sanitization)
2. Validate inputs to reject malformed values and record possible
   attacks. (Validation and logging)

Thank you, I am aware of what sanitizing and validating input is.

to_int can be used as validation. It has advantage to record possible
attack (or bug). Logging is
one of important security feature. Therefore, validation could be said
more secure than sanitization.

I don't see how it can be said. Logging by itself is not a security
feature, and if you need logging, it could be established independently,
and should be anyway since to_* log nothing. So claiming to_* is a
security feature by itself is like saying fopen() is a security feature
by itself because you could use it to open a log file to which you'd
write security-relevant data.

Which strategy to adopt is that depends on organization/application
policy.

Right. So how can one claim one is more secure than the other? Where is
the lack of security?

(int) works mostly

What do you mean by "mostly"? Could you describe the cases where it does
not work and to_* does?

Stas Malyshev
smalyshev@gmail.com

10 years ago by Yasuo Ohgaki

Hi Stas,

On Thu, Nov 20, 2014 at 4:38 PM, Stanislav Malyshev smalyshev@gmail.com
wrote:

Please refer to CWE/SANS TOP 25, Monster Mitigation especially.

http://cwe.mitre.org/top25/#Mitigations

and ISO 27000. (I cannot provide link to it, since one should buy the
document to read)

Could you please be more specific about how this is relevant to this
specific case? "Buy an ISO standard and read it whole" is not exactly a
good argument when discussing the specific issue.

I brought up ISO 27000 as the definition of IT security, since there are
many definitions of security. ISO 27000 does not define what a "security
measure" is, but it defines "risk treatment". Most people use ISO 27000's
"risk treatment" as "security measure" more or less, I believe. ISMS is
common now, too.

ISO/IEC 27000:2014(E)
2.79
risk treatment
process (2.61) to modify risk (2.68)

Note 1 to entry: Risk treatment can involve:
— avoiding the risk by deciding not to start or continue with the activity
that gives rise to the risk;
— taking or increasing risk in order to pursue an opportunity;
— removing the risk source;
— changing the likelihood;
— changing the consequences;
— sharing the risk with another party or parties (including contracts and
risk financing); and
— retaining the risk by informed choice.
Note 2 to entry: Risk treatments that deal with negative consequences
are sometimes referred to as “risk
mitigation", "risk elimination", "risk prevention" and "risk reduction".
Note 3 to entry: Risk treatment can create new risks or modify existing
risks

Programmers should control all inputs as the most important security
measure.
There are two strategies in general.

1. Convert inputs to secure values and ignore possible attacks.
   (Sanitization)
2. Validate inputs to reject malformed values and record possible
   attacks. (Validation and logging)

Thank you, I am aware of what sanitizing and validating input is.

to_int can be used as validation. It has advantage to record possible
attack (or bug). Logging is
one of important security feature. Therefore, validation could be said
more secure than sanitization.

I don't see how it can be said. Logging by itself is not a security
feature, and if you need logging, it could be established independently,
and should be anyway since to_* log nothing. So claiming to_* is a
security feature by itself is like saying fopen() is a security feature
by itself because you could use it to open a log file to which you'd
write security-relevant data.

Logging for accounting is important security feature and defined as part of
risk treatment in the ISO standard. So I consider logging related to risk
treatment is security measure.

As CWE/SANS TOP 25's Monster Mitigation says, developers must
control inputs. If there is something wrong in the input, it is better to
record it for later auditing, as that is part of the ISMS requirements.

(int) cannot catch errors, but to_int can. Therefore, I think to_* is good
for security, i.e. good for stricter input control.

Please note that an (int) cast may increase risk, but it is still part of
risk treatment (~= security measure), as mentioned in "Note 3" of the risk
treatment definition.

Which strategy to adopt depends on organization/application
policy.

Right. So how can one claim one is more secure than the other? Where is
the lack of security?

As I described above, accounting, which requires logging, is a security
measure for me.

(int) works mostly

What do you mean by "mostly"? Could you describe the cases where it does
not work and to_* does?

A cast doesn't allow logging of a possible attack/bug. An integer cast
involves truncation.
Casting a 64-bit database ID to a 32-bit int causes problems, for example.
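
A minimal sketch of the difference described above, not the RFC's implementation. validate_int_or_log() is a hypothetical helper standing in for the proposed to_int; it is built on PHP's existing filter_var() with FILTER_VALIDATE_INT, and the log message format is an assumption.

```php
<?php
// Hypothetical helper standing in for the proposed to_int(); it is NOT the
// RFC's implementation. It rejects anything that is not a whole integer
// string and records the rejected value so it can be audited later.
function validate_int_or_log(string $input, array &$log): ?int
{
    $value = filter_var($input, FILTER_VALIDATE_INT);
    if ($value === false) {
        // Assumed log format; a real application would use its own logger.
        $log[] = 'rejected integer input: ' . var_export($input, true);
        return null;
    }
    return $value;
}

$log = [];

// A plain cast silently truncates and leaves no trace of the bad input.
$castA = (int) "12abc";   // 12
$castB = (int) "1.9";     // 1

// Validation rejects the same inputs and leaves an audit record.
$okA = validate_int_or_log("12abc", $log);  // null, logged
$okB = validate_int_or_log("1.9", $log);    // null, logged
$okC = validate_int_or_log("12", $log);     // 12, nothing logged
```

The 32-bit versus 64-bit case is left out of the sketch because the result depends on the platform's integer size.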

Anyway, security measures are not only direct risk elimination; they also
include risk mitigation such as logging, which helps to evaluate the impact
of incidents. I.e. the auditability of an attack/incident is a risk concern, too.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

10 years ago by Stanislav Malyshev

Hi!

I brought up ISO 27000 as the definition of IT security, since there are
many definitions of security. ISO 27000 does not define what a "security
measure" is,

That's exactly the issue. You bring very generic definitions from
standards and best practices, and then you bring your personal opinion
on how to implement a specific case, and make it sound like the standard
endorses your personal preference. But it is not so - both filtering and
validation can be perfectly secure when properly used (or insecure when
not). There's no requirement in the standards for any of them, at least
you haven't demonstrated any.

As I described above, accounting, which requires logging, is a security
measure for me.

And that's fine for your use cases, but it doesn't mean all use cases
must be like yours. So making it sound like sanitizing data is somehow
insecure is not right - unless you can show some actual security
problem, not mismatch with your use case.

Stas Malyshev
smalyshev@gmail.com

10 years ago by Stas Malyshev

Hi!

Please refer to CWE/SANS TOP 25, Monster Mitigation especially.

http://cwe.mitre.org/top25/#Mitigations

and ISO 27000. (I cannot provide a link to it, since one has to buy the
document to read it)

Could you please be more specific about how this is relevant to this
specific case? "Buy an ISO standard and read it whole" is not exactly a
good argument when discussing a specific issue.

Programmers should control all inputs as the most important security
measure.
There are two strategies in general.

Convert inputs to secure values and ignore possible attacks.
(Sanitization)

Validate inputs to reject malformed values and record possible
attacks. (Validation and logging)

Thank you, I am aware of what sanitizing and validating input is.

to_int can be used for validation. It has the advantage of recording a
possible attack (or bug). Logging is an important security feature.
Therefore, validation could be said to be more secure than sanitization.

This is just your personal opinion. Logging is not a security feature,
and if it were, it could be established independently, and should be
anyway since to_* log nothing. So claiming to_* is a security feature is
just wrong - it's like saying fopen() is a security feature because you
could use it to open a log file to which you'd write security-relevant
data.

Which strategy to adopt depends on organization/application
policy. Public web sites may ignore

This is right. So your claim that one is more secure than the other is
not correct.

Stas Malyshev
smalyshev@gmail.com

10 years ago by Yasuo Ohgaki

Hi Stas,

Please refer to CWE/SANS TOP 25, Monster Mitigation especially.

http://cwe.mitre.org/top25/#Mitigations

and ISO 27000. (I cannot provide a link to it, since one has to buy the
document to read it)

Could you please be more specific about how this is relevant to this
specific case? "Buy an ISO standard and read it whole" is not exactly a
good argument when discussing a specific issue.

I don't insist that you read the whole ISO 27000 standard. However, it is
important to agree on a definition of "security" at least. Otherwise, one
side says "it's security" and the other says "it's not security".

Once there is agreement on the definition of security, what counts as
security is not important. What is important is "is it effective in
achieving better security".

to_int can be used for validation. It has the advantage of recording a
possible attack (or bug). Logging is an important security feature.
Therefore, validation could be said to be more secure than sanitization.

This is just your personal opinion. Logging is not a security feature,
and if it were, it could be established independently, and should be
anyway since to_* log nothing. So claiming to_* is a security feature is
just wrong - it's like saying fopen() is a security feature because you
could use it to open a log file to which you'd write security-relevant
data.

It's your personal opinion. ISO 27000 (and ISMS) requires treating
accounting (logging) as a security feature. The standard defines three
major areas of security: confidentiality, integrity and availability. It
also adds reliability, authenticity and accountability. This is not my
own opinion.

Which strategy to adopt depends on organization/application
policy. Public web sites may ignore

This is right. So your claim that one is more secure than the other is
not correct.

We need to look closely at the details.

Validation is better than sanitization for accounting.
Validation can generate too many log entries, which may cause DoS (e.g. a
disk filled up by logs) and may disturb the administrator who checks
security logs.

Validation (and logging) is better for accounting for sure. However, the logs
generated by validation may do more harm than good depending on the situation.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net

The lazy developer won't check the return value anyway and would get 0
as the result of false-to-int conversion, thus making the whole exercise
pointless anyway :)
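
To illustrate the point above, a short sketch assuming a to_int-like function that returns false on failure. to_int_sketch() is a hypothetical stand-in built on filter_var(), not the patch's code.

```php
<?php
// Hypothetical stand-in for a to_int() that returns false on failure.
function to_int_sketch(string $input)
{
    return filter_var($input, FILTER_VALIDATE_INT);
}

// Unchecked use: the false result is silently converted to 0 in arithmetic.
$page = to_int_sketch("abc");   // false
$offset = $page * 20;           // 0, same as if the input had been "0"

// Checked use: the failure is distinguishable from a valid 0.
$checked = to_int_sketch("abc");
$isValid = ($checked !== false);  // false
```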

Saying "oh, we just add it like a random function and only then we'll
make it an opcode and it will be implementation detail" sounds a lot
like gaming the system to me.

…what?

No, they are not. They are new engine primitives for handling type
conversions.

They’re very similar, except the latter has slightly different rules, is shorter, and if some people (possibly me) get their way, might throw an exception.

Floats aren’t expected to be precise, so I don’t see why this shouldn’t pass. It’s a loss of data, sure, but merely of precision, which is expected here. The reason I have to_int fail is because float overflow to int completely mangles your input.

Oh, you’re saying how to_int/to_float don’t accept __toString objects. I suppose they could, but (int) and (float) don’t accept that, nor do we accept that anywhere else.

Oh dear, I repeated myself accidentally. I think I meant to say, “There isn’t a lack of a value, there’s a bad value”.

Floats are special, they are not expected to be precise. If we reject this, then perhaps we should also reject 0.1, because it can’t be precisely represented by a float?
