Scalar Type Hints v0.4

10 years ago by Nikita Popov

Based on conversations here and elsewhere on the internet, I'd like to
put forward a rough gameplan for scalar types which I hope addresses
most concerns. This is back-of-the-napkin and I'm not asking for a
committed yes/no, just a pre-RFC set of thoughts.

Please don't get hung up on specific names, we can debate those in the
coming week(s), I'm only looking for large architectural issues.

1) Introduce scalar types for primitives: bool, int, float, string,
resource, object (we already have array)
1a) Introduce meta-types as pre-defined unions (we can add custom
unions in a later rfc). A possible list may be as follows (again, we
can argue what's in this list separately):

mixed: any type

scalar: (null|bool|int|float|string)

numeric: (int|float|numeric-string)

stringish: (string or object with __toString())

boolish: (like mixed, but coerces to bool)

etc...

2) Define a way to enable "strict mode" (we'll be weak by default).

2a) Userspace impact: Strict mode will throw a recoverable error on
type mismatch. Weak mode will coerce the type according to conversion
rules (See #3), throwing a recoverable error if coercion isn't
possible.

2b) Internal impact: The same rules apply to internal functions as
userspace functions HOWEVER, we use the types present in ZEND_ARG_INFO
structures, not zpp. This has the net effect that every internal
function remains effectively untyped unless specifically opted in by
means of updating their arg info struct. In weak mode, internal
functions coerce according to conversion rules.

3) Tighten up coercion rules for weak mode, i.e. "10 dogs" for an int
is a violation, but "10" is acceptable.

3a) Userspace impact: We're in a clean slate state, so this is "safe"
from a BC perspective.

3b) Internal impact: Again, behavior remains unchanged unless the
ZEND_ARG_INFO struct has been modified to add proper typehints. If
typehints have been added, then the more aggressive coercion rules
apply during typehint validation.
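
A minimal sketch of the tightened weak-mode rule from point 3, written for PHP 8. The helper name weak_int() is hypothetical, TypeError stands in for the "recoverable error" mentioned above, and integer overflow is ignored; none of this is taken from the proposal itself.

```php
<?php
// Hypothetical helper, for illustration only: approximates how a weak-mode
// int parameter might treat its argument under the tightened rules.
function weak_int(mixed $value): int
{
    if (is_int($value)) {
        return $value;
    }
    // Accept only strings made up entirely of an optionally signed run of digits.
    if (is_string($value) && preg_match('/^-?\d+$/', $value) === 1) {
        return (int) $value;
    }
    // Assumption: TypeError stands in for the proposal's "recoverable error".
    throw new TypeError('cannot coerce to int: ' . var_export($value, true));
}

var_dump(weak_int("10"));   // int(10)

try {
    weak_int("10 dogs");
} catch (TypeError $e) {
    echo $e->getMessage(), "\n";
}
```

Under the strict mode from point 2, the first call would fail as well, since "10" is a string and not an int.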

I really want to underline the design expressed in #2b and #3b.
zend_parse_parameters()'s types have been removed from the equation in
this proposal. This means that, until someone audits a given function
and makes the decision to give it a type, it will effectively behave
as though always weak, regardless of the caller's flags. This enables
us to give the same contractual behavior internally and externally,
while still implicitly treating internal functions as a bit special
for the purpose of moving forward.

I don't like the way this is heading with regards to internal functions.
Apart from better inter-compatibility, the primary appeal of Andrea's
proposal was that we have the option to make not only userland function
calls strict, but internal ones as well. With these modifications this is
lost for all practical purposes. (*)

I don't buy into Rasmus's arguments about internal functions. They concern
one particular edge case (int->float coercion) and I doubt they have much
relevance if applied to codebases with pervasive use of typehints (where
you can be reasonably sure of the types of your variables). Even if, for
the sake of argument, we acknowledge the concern as valid we should be
discussing that particular case (int->float coercion) rather than dropping
the strict typing for internal functions altogether.

I'd personally prefer to just go back to Andrea's proposal with a tweak
to fix the declare() issues.

Nikita

(*) Where "practical purposes" refers to my assumption that it is very
unlikely that we'll add arginfo typehints to the entirety of all bundled
functions and the added typehints will not be heavily colored by people
trying to shove in weak typing even when strict mode is enabled. For our
own good, obviously.

10 years ago by Rasmus Lerdorf

I don't buy into Rasmus's arguments about internal functions. They concern
one particular edge case (int->float coercion) and I doubt they have much
relevance if applied to codebases with pervasive use of typehints (where
you can be reasonably sure of the types of your variables). Even if, for
the sake of argument, we acknowledge the concern as valid we should be
discussing that particular case (int->float coercion) rather than dropping
the strict typing for internal functions altogether.

int->float is actually secondary to "123"->int. And while they may be
edge-cases there are enough of them that we would be pushing people
towards casting by default which should be a last-resort thing, not the
first thing you do.

-Rasmus

10 years ago by Nikita Popov

I don't buy into Rasmus's arguments about internal functions. They concern
one particular edge case (int->float coercion) and I doubt they have much
relevance if applied to codebases with pervasive use of typehints (where
you can be reasonably sure of the types of your variables). Even if, for
the sake of argument, we acknowledge the concern as valid we should be
discussing that particular case (int->float coercion) rather than dropping
the strict typing for internal functions altogether.

int->float is actually secondary to "123"->int. And while they may be
edge-cases there are enough of them that we would be pushing people
towards casting by default which should be a last-resort thing, not the
first thing you do.

The inability to implicitly cast "123" to int is pretty much the KEY
distinction between weak and strict scalar typehints (those pesky
value-dependent type checks). If the strict typing mode doesn't offer this,
what's the point at all?

This is exactly what I fear will happen with an arginfo based approach. If
even fundamental aspects like the "123" vs 123 (or true vs 1) distinction
are suppressed for internal functions, this isn't a strict typing mode,
it's just a weak typing mode with slightly different rules.

Nikita

10 years ago by Sara Golemon

This is exactly what I fear will happen with an arginfo based approach. If
even fundamental aspects like the "123" vs 123 (or true vs 1) distinction
are suppressed for internal functions, this isn't a strict typing mode, it's
just a weak typing mode with slightly different rules.

By the way, I realize I wasn't clear in my previous reply to you. I
don't mean to dismiss your position and the proposal I put forth was
just to get a feel for people's gut reactions to it. Your gut
reaction is clearly negative and that will be taken into account when
I put up 0.4 of the RFC which may or may not look like this proposal,
depending on what others have to say about it.

-Sara

10 years ago by Dmitry Stogov

On Wed, Feb 18, 2015 at 1:53 AM, Rasmus Lerdorf <rasmus@lerdorf.com> wrote:

I don't buy into Rasmus's arguments about internal functions. They concern
one particular edge case (int->float coercion) and I doubt they have much
relevance if applied to codebases with pervasive use of typehints (where
you can be reasonably sure of the types of your variables). Even if, for
the sake of argument, we acknowledge the concern as valid we should be
discussing that particular case (int->float coercion) rather than dropping
the strict typing for internal functions altogether.

int->float is actually secondary to "123"->int. And while they may be
edge-cases there are enough of them that we would be pushing people
towards casting by default which should be a last-resort thing, not the
first thing you do.

The inability to implicitly cast "123" to int is pretty much the KEY
distinction between weak and strict scalar typehints (those pesky
value-dependent type checks). If the strict typing mode doesn't offer this,
what's the point at all?

This is exactly what I fear will happen with an arginfo based approach. If
even fundamental aspects like the "123" vs 123 (or true vs 1) distinction
are suppressed for internal functions, this isn't a strict typing mode,
it's just a weak typing mode with slightly different rules.

The difference between true and 1 is even stricter than the rules of
statically typed languages.
Could you write a short list of cases where strict types are really useful?
In my opinion it's only program verification, but for this case we may
enable strict typing through a tool and not in the language definition (we may
provide a callback in the core).

Thanks. Dmitry.

Nikita

10 years ago by Zeev Suraski

-----Original Message-----
From: Nikita Popov [mailto:nikita.ppv@gmail.com]
Sent: Wednesday, February 18, 2015 3:06 AM
To: Rasmus Lerdorf
Cc: Sara Golemon; PHP internals
Subject: Re: [PHP-DEV] Scalar Type Hints v0.4

The inability to implicitly cast "123" to int is pretty much the KEY
distinction between weak and strict scalar typehints (those pesky
value-dependent type checks). If the strict typing mode doesn't offer this,
what's the point at all?

I am wondering what the point is indeed with preventing "123" to 123. So
far, all the concrete use cases people brought up had to do with "Apple" or
"100 dogs", but nobody ever seems to be able to explain why converting "123"
to 123 is likely to be a problem in the real world. Is it really just static
analyzers?

Zeev

10 years ago by Andrey Andreev

Hi,

-----Original Message-----
From: Nikita Popov [mailto:nikita.ppv@gmail.com]
Sent: Wednesday, February 18, 2015 3:06 AM
To: Rasmus Lerdorf
Cc: Sara Golemon; PHP internals
Subject: Re: [PHP-DEV] Scalar Type Hints v0.4

The inability to implicitly cast "123" to int is pretty much the KEY
distinction between weak and strict scalar typehints (those pesky
value-dependent type checks). If the strict typing mode doesn't offer this,
what's the point at all?

I am wondering what the point is indeed with preventing "123" to 123. So
far, all the concrete use cases people brought up had to do with "Apple" or
"100 dogs", but nobody ever seems to be able to explain why converting "123"
to 123 is likely to be a problem in the real world. Is it really just static
analyzers?

I too am curious about the potential issue with "123" to 123
specifically, although it could be seen as a subset of another problem
that is solved with strict hints - numeric-character string
identifiers being erroneously treated as integers.

That is especially bad when such identifiers are in fact generated as
integers first so that they are incremental, but the
program/database/business logic requires them to be fixed-length
strings and/or in hexadecimal format. In such cases, even silently
discarding leading zeros can prove to be problematic, while in the
case of hexadecimal representations you'd need more than 10 data
samples to notice the problem if you don't use a strict hint.
Obviously, that would be solved with automated testing, but
unfortunately even code with high test coverage % often lacks depth in
its test cases.

I believe a reduced number of necessary unit tests was already brought
up as an advantage of strict type hints, so I'm just explaining one
such use case in detail here.

Cheers,
Andrey.

10 years ago by Lester Caine

That is especially bad when such identifiers are in fact generated as
integers first so that they are incremental, but the
program/database/business logic requires them to be fixed-length
strings and/or in hexadecimal format. In such cases, even silently
discarding leading zeros can prove to be problematic, while in the
case of hexadecimal representations you'd need more than 10 data
samples to notice the problem if you don't use a strict hint.
Obviously, that would be solved with automated testing, but
unfortunately even code with high test coverage % often lacks depth in
its test cases.

Octal is something that can often be misconverted since it IS the same
as an integer with only a '0' in front in PHP. But that is not something
that can be fixed with the current proposals? Again we have to ensure
that the pre-processing takes care of the problem and how would static
analysis even know there was a problem? A type hint following the SQL
standards would be more helpful than the JavaScript approach of giving
an error in strict mode.

--
Lester Caine - G8HFL

Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk

10 years ago by Andrey Andreev

Hi,

That is especially bad when such identifiers are in fact generated as
integers first so that they are incremental, but the
program/database/business logic requires them to be fixed-length
strings and/or in hexadecimal format. In such cases, even silently
discarding leading zeros can prove to be problematic, while in the
case of hexadecimal representations you'd need more than 10 data
samples to notice the problem if you don't use a strict hint.
Obviously, that would be solved with automated testing, but
unfortunately even code with high test coverage % often lacks depth in
its test cases.

Octal is something that can often be misconverted since it IS the same
as an integer with only a '0' in front in PHP. But that is not something
that can be fixed with the current proposals? Again we have to ensure
that the pre-processing takes care of the problem and how would static
analysis even know there was a problem? A type hint following the SQL
standards would be more helpful than the JavaScript approach of giving
an error in strict mode.

I'm not talking about octal, and nobody's talking about SQL here.

Consider the following signature:

function foo(int $bar) {}

In the case of a string representation of a hexadecimal number, the
following would error only on the last iteration with a weak hint, and
on the very first if it was a strict hint:

for ($i = 0; $i < 11; $i++)
{
    foo(base_convert($i, 10, 16));
}

And when I said leading zeros, I was talking about fixed-length string
identifiers such as '001', '002', etc. where you may unintentionally
pass such a value to a function that deals with ... quantities, for
example. A strict hint in that case would immediately catch this
logical error while a weak hint would silently ignore the leading
zeros and will happily treat the value as an integer. Again, the
precondition here is that it's not an integer value that happens to be
stored as a string, but a non-integer value that just looks like an
integer.
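
Both failure modes can be checked directly. A small sketch, assuming PHP 8: hex_ids() is a hypothetical helper, and is_numeric() is only a stand-in for "a weak int hint would accept this string", since the exact weak-mode rules are still under discussion.

```php
<?php
// Hypothetical helper: the strings the loop above would pass to foo().
function hex_ids(int $count): array
{
    $ids = [];
    for ($i = 0; $i < $count; $i++) {
        $ids[] = base_convert((string) $i, 10, 16);
    }
    return $ids;
}

foreach (hex_ids(11) as $id) {
    // Assumption: is_numeric() approximates what a weak int hint accepts.
    echo $id, ' => ', is_numeric($id) ? 'accepted' : 'rejected', "\n";
}

// A fixed-length identifier loses its leading zeros once treated as an int.
var_dump((int) '001');   // int(1)
```

Only the eleventh value ("a") is rejected by the weak check, while a strict hint would reject every one of them because they are all strings.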

Cheers,
Andrey.

10 years ago by Zeev Suraski

Consider the following signature:
function foo(int $bar) {}
In the case of a string representation of a hexadecimal number, the
following would error only on the last iteration with a weak hint, and
on the very first if it was a strict hint:
for ($i = 0; $i < 11; $i++)
{
    foo(base_convert($i, 10, 16));
}
And when I said leading zeros, I was talking about fixed-length string
identifiers such as '001', '002', etc. where you may unintentionally
pass such a value to a function that deals with ... quantities, for
example. A strict hint in that case would immediately catch this
logical error while a weak hint would silently ignore the leading
zeros and will happily treat the value as an integer. Again, the
precondition here is that it's not an integer value that happens to be
stored as a string, but a non-integer value that just looks like an
integer.

Thanks for the example, Andrey, it's helpful.

My goal with asking for these use cases isn't to claim that they never
exist, and I certainly don't want to get into a theological discussion. My
goal is to try and figure out whether many, if not all, proponents of strict
typing would be willing to live with a compromise on a single set of rules,
that on one hand would be a lot stricter than what was proposed for weak
typing in the v0.3 RFC (bool->anything fails, any string that's not strictly
looking like a number incl. "100 dogs" -> int/float fails, float->int
fails), but on the other hand, would allow certain lossless conversions
(numeric string -> int/float, int->float, __toString() to string, etc.) to
work.
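
A minimal sketch of such a single rule set, assuming PHP 8. The function name compromise_coerce(), the use of TypeError, and the decision to reject every conversion not listed above (for example int -> string) are assumptions made for illustration; they are not part of any RFC text.

```php
<?php
// Hypothetical sketch of one stricter-than-v0.3 rule set that still allows
// lossless conversions. Conversions not mentioned above are rejected.
function compromise_coerce(string $target, mixed $value): int|float|string
{
    $ok = match ($target) {
        // int: real ints, or strings that look exactly like an integer
        'int' => is_int($value)
            || (is_string($value) && preg_match('/^-?\d+$/', $value) === 1),
        // float: floats, ints, or numeric strings ("100 dogs" is not numeric)
        'float' => is_float($value) || is_int($value)
            || (is_string($value) && is_numeric($value)),
        // string: strings, or objects that implement __toString()
        'string' => is_string($value) || $value instanceof Stringable,
        default => throw new InvalidArgumentException("unsupported target: $target"),
    };
    if (!$ok) {
        // bool -> anything and float -> int both end up here.
        throw new TypeError('cannot pass ' . get_debug_type($value) . " as $target");
    }
    return match ($target) {
        'int' => (int) $value,
        'float' => (float) $value,
        'string' => (string) $value,
    };
}

var_dump(compromise_coerce('int', '123'));   // int(123)
var_dump(compromise_coerce('float', 3));     // float(3)
```

Passing true, 1.5 or "100 dogs" for an int target throws in this sketch, which matches the three failing cases listed above.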

When people from both schools of thought (strict/weak) come to evaluate this
possibility, remember that we don't need a solution that works in 100.0% of
the cases. Edge cases, again, be they strict or weak, can always be
implemented with a bit of custom code inside the function - the important
thing is to get a system that addresses the vast majority of cases.

Thanks!

Zeev

10 years ago by Andrey Andreev

Hi Zeev,

Consider the following signature:
function foo(int $bar) {}
In the case of a string representation of a hexadecimal number, the
following would error only on the last iteration with a weak hint, and
on the very first if it was a strict hint:
for ($i = 0; $i < 11; $i++)
{
    foo(base_convert($i, 10, 16));
}
And when I said leading zeros, I was talking about fixed-length string
identifiers such as '001', '002', etc. where you may unintentionally
pass such a value to a function that deals with ... quantities, for
example. A strict hint in that case would immediately catch this
logical error while a weak hint would silently ignore the leading
zeros and will happily treat the value as an integer. Again, the
precondition here is that it's not an integer value that happens to be
stored as a string, but a non-integer value that just looks like an
integer.

Thanks for the example, Andrey, it's helpful.

My goal with asking for these use cases isn't to claim that they never
exist, and I certainly don't want to get into a theological discussion. My
goal is to try and figure out whether many, if not all, proponents of strict
typing would be willing to live with a compromise on a single set of rules,
that on one hand would be a lot stricter than what was proposed for weak
typing in the v0.3 RFC (bool->anything fails, any string that's not strictly
looking like a number incl. "100 dogs" -> int/float fails, float->int
fails), but on the other hand, would allow certain lossless conversions
(numeric string -> int/float, int->float, __toString() to string, etc.) to
work.

I didn't imply that you meant any of that, not in this thread anyway
... If I've done it previously, it's only because your choice of words
had made it appear that way. You don't need to defend yourself
every time I quote you. :)

When people from both schools of thought (strict/weak) come to evaluate this
possibility, remember that we don't need a solution that works in 100.0% of
the cases. Edge cases, again, be they strict or weak, can always be
implemented with a bit of custom code inside the function - the important
thing is to get a system that addresses the vast majority of cases.

Well, that's usually the case because you simply can't provide a 100% solution.

In this case, I believe we can satisfy if not 100%, then 99% of the
use cases by simply providing both weak and strict hints
simultaneously. From my POV, further restricting conversion rules for
weak hints is definitely an improvement, but still a limited one.
There's no technical limitation to including two solutions to the
problem, so I'd rather do that and be practical instead of following a
belief for what does or doesn't belong in PHP.

I know you're on the flip side, so we shall agree to disagree on that.

Cheers,
Andrey.

10 years ago by francois@php.net

From: Andrey Andreev [mailto:narf@devilix.net]

Consider the following signature:
function foo(int $bar) {}
In the case of a string representation of a hexadecimal number, the
following would error only on the last iteration with a weak hint, and
on the very first if it was a strict hint:
for ($i = 0; $i < 11; $i++)
{
    foo(base_convert($i, 10, 16));
}

You're right. A hexadecimal string with no leading '0x' and containing only decimal digits cannot be recognized as hexadecimal. But I keep thinking that, balancing pros and cons, it's not enough to justify strict mode. Maybe I'm wrong and additional use cases will change my mind, but I consider unprefixed hexadecimal an edge case.

I'm not saying it's the right solution, but the problem can be solved at the base_convert() level. If we support '0x' strings as hexadecimal numbers, it can generate the '0x' prefix, which removes the ambiguity for PHP and any other software that has to interpret the string. Unfortunately, because of BC, it would probably have to be requested explicitly through an option. No perfect solution here.

Another argument, which some may consider weak: I'm also afraid that, in your example, a user seeing an error raised by strict mode could change the code to 'foo((int)base_convert', permanently hiding the real bug, even for 11 and up. So Rasmus is right when he says strict mode can sometimes, indirectly, be counter-productive. Debugging shouldn't be treated as mere error suppression, but more often than not it is.
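
The cast concern can be seen in a couple of lines. This is only an illustration in current PHP of how an explicit (int) cast behaves on the values from the loop; it says nothing about how often users would actually reach for it.

```php
<?php
// An explicit cast never fails: a non-numeric string simply becomes 0.
$ten    = base_convert('10', 10, 16);   // "a"
$eleven = base_convert('11', 10, 16);   // "b"

var_dump((int) $ten);      // int(0)
var_dump((int) $eleven);   // int(0)
```

Both identifiers collapse to 0, so the error disappears while the underlying bug remains.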

Regards

François

10 years ago by Andrey Andreev

Hi François,

From: Andrey Andreev [mailto:narf@devilix.net]

Consider the following signature:
function foo(int $bar) {}
In the case of a string representation of a hexadecimal number, the
following would error only on the last iteration with a weak hint, and
on the very first if it was a strict hint:
for ($i = 0; $i < 11; $i++)
{
    foo(base_convert($i, 10, 16));
}
You're right. A hexadecimal string with no leading '0x' and containing only decimal digits cannot be recognized as hexadecimal. But I keep thinking that, balancing pros and cons, it's not enough to justify strict mode. Maybe I'm wrong and additional use cases will change my mind, but I consider unprefixed hexadecimal an edge case.

I'm not saying it's the right solution, but the problem can be solved at the base_convert() level. If we support '0x' strings as hexadecimal numbers, it can generate the '0x' prefix, which removes the ambiguity for PHP and any other software that has to interpret the string. Unfortunately, because of BC, it would probably have to be requested explicitly through an option. No perfect solution here.

In real-world applications, base_convert() would hardly ever be the
culprit and I don't want to change its current behavior. I only used
base_convert() in the example because that allowed the least amount of
code written to display the problem.

Another argument, which some may consider weak: I'm also afraid that, in your example, a user seeing an error raised by strict mode could change the code to 'foo((int)base_convert', permanently hiding the real bug, even for 11 and up. So Rasmus is right when he says strict mode can sometimes, indirectly, be counter-productive. Debugging shouldn't be treated as mere error suppression, but more often than not it is.

Sorry, but I do consider that to be a weak argument ... We can't help
users whose only concern is eliminating error messages, we can only
help those that understand them.

Cheers,
Andrey.

10 years ago by francois@php.net

Hi,

Octal is something that can often be misconverted since it IS the same
as an integer with only a '0' in front in PHP. But that is not something
that can be fixed with the current proposals?

What do you propose? Treating a leading zero as an octal indicator is not an option, IMO. If you have another way, why not.

Again we have to ensure
that the pre-processing takes care of the problem and how would static
analysis even know there was a problem? A type hint following the SQL
standards.

Please give conversion rules and supported syntax for the 'SQL' type you have in mind.

would be more helpful than the JavaScript approach of giving
an error in strict mode.

Regards

François

10 years ago by Lester Caine

Octal is something that can often be misconverted since it IS the same
as an integer with only a '0' in front in PHP. But that is not something
that can be fixed with the current proposals?

What do you propose? Treating a leading zero as an octal indicator is not an option, IMO. If you have another way, why not.

0o, 0 and \ are the usual flags for an octal value, and we have functions
for octal strings, but they are not user friendly in their output as they
tend to omit a leading tag altogether. But my favourite is still
'\143\141\164' == "\143\141\164" which is false, but I doubt many would
know why?
Yes, it only becomes a problem when one is accessing material like
historic data dumps, and rejecting the numeric string may be 'strictly'
correct, but it's one of those 'what the' moments if one gets an error
where for years it has run perfectly.

Again we have to ensure
that the pre-processing takes care of the problem and how would static
analysis even know there was a problem? A type hint following the SQL
standards.
Please give conversion rules and supported syntax for the 'SQL' type you have in mind.

'octal' just expects a base 8 string. I know there are some examples in
the SQL standards, but since they are paid-for documents it's pointless
trying to reference them :(

( Andrey - there may not be plans to support a full range of hints -
weak or strict, but this is all valid material that PHP handles daily
and passes around )

--
Lester Caine - G8HFL

Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk

10 years ago by Leigh

But my favourite is still
'\143\141\164' == "\143\141\164" which is false, but I doubt many would
know why?

Pretty sure one of the first things PHP devs learn is that single-quoted
strings only accept \' and \\ as escape sequences.
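
The difference is easy to see when both literals are inspected side by side. A short illustration in PHP:

```php
<?php
// Single quotes: the backslashes and digits are kept literally.
$single = '\143\141\164';
// Double quotes: each \NNN is an octal escape (143, 141, 164 are c, a, t).
$double = "\143\141\164";

var_dump(strlen($single));      // int(12)
var_dump($double);              // string(3) "cat"
var_dump($single == $double);   // bool(false)
```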

10 years ago by francois@php.net

Hi Andrey,

From: Andrey Andreev [mailto:narf@devilix.net]

I too am curious about the potential issue with "123" to 123
specifically, although it could be seen as a subset of another problem
that is solved with strict hints - numeric-character string
identifiers being erroneously treated as integers.

Please give use cases. Do you want to support '0xhexa' strings? That's possible. We are not only restricting possible conversions, we can also support additional syntaxes. Just give use cases for what you think should be enabled or disabled, compared to the current behavior.

The only change on our list so far for (string -> number) is rejecting trailing characters (while accepting trailing blanks).

We are in the process of changing these rules, so please give examples of 'numeric-character string identifiers being erroneously treated as integers'. If you mean '7 years', it's on the list already. If you mean others, tell us.

That is especially bad when such identifiers are in fact generated as
integers first so that they are incremental, but the
program/database/business logic requires them to be fixed-length
strings and/or in hexadecimal format. In such cases, even silently
discarding leading zeros can prove to be problematic, while in the
case of hexadecimal representations you'd need more than 10 data
samples to notice the problem if you don't use a strict hint.

Do you mean we should accept hexadecimal strings as int? Why not? Give the exact syntax(es) you want to support (except leading/trailing blanks, which are the default now).

Regards

François

10 years ago by Andrey Andreev

Hi,

Hi Andrey,

From: Andrey Andreev [mailto:narf@devilix.net]

I too am curious about the potential issue with "123" to 123
specifically, although it could be seen as a subset of another problem
that is solved with strict hints - numeric-character string
identifiers being erroneously treated as integers.

Please give use cases. Do you want to support '0xhexa' strings? That's possible. We are not only restricting possible conversions, we can also support additional syntaxes. Just give use cases for what you think should be enabled or disabled, compared to the current behavior.

The only change on our list so far for (string -> number) is rejecting trailing characters (while accepting trailing blanks).

We are in the process of changing these rules, so please give examples of 'numeric-character string identifiers being erroneously treated as integers'. If you mean '7 years', it's on the list already. If you mean others, tell us.

That is especially bad when such identifiers are in fact generated as
integers first so that they are incremental, but the
program/database/business logic requires them to be fixed-length
strings and/or in hexadecimal format. In such cases, even silently
discarding leading zeros can prove to be problematic, while in the
case of hexadecimal representations you'd need more than 10 data
samples to notice the problem if you don't use a strict hint.

Do you mean we should accept hexadecimal strings as int? Why not? Give the exact syntax(es) you want to support (except leading/trailing blanks, which are the default now).

No, I meant the opposite ... I was trying to explain cases where a
weak hint would be insufficient. Sorry for not including examples in
my first mail, I did that in my next reply:

Consider the following signature:
function foo(int $bar) {}
In the case of a string representation of a hexadecimal number, the
following would error only on the last iteration with a weak hint, and
on the very first if it was a strict hint:
for ($i = 0; $i < 11; $i++)
{
    foo(base_convert($i, 10, 16));
}
And when I said leading zeros, I was talking about fixed-length string
identifiers such as '001', '002', etc. where you may unintentionally
pass such a value to a function that deals with ... quantities, for
example. A strict hint in that case would immediately catch this
logical error while a weak hint would silently ignore the leading
zeros and will happily treat the value as an integer. Again, the
precondition here is that it's not an integer value that happens to be
stored as a string, but a non-integer value that just looks like an
integer.

Cheers,
Andrey.

10 years ago by padraic.brady@gmail.com

Hi all,

Hi,

I am wondering what the point is indeed with preventing "123" to 123. So
far, all the concrete use cases people brought up had to do with "Apple" or
"100 dogs", but nobody ever seems to be able to explain why converting "123"
to 123 is likely to be a problem in the real world. Is it really just static
analyzers?

I too am curious about the potential issue with "123" to 123
specifically, although it could be seen as a subset of another problem
that is solved with strict hints - numeric-character string
identifiers being erroneously treated as integers.

If I may interject briefly (doing it anyway!), there are so many
concepts being munged together that there's bound to be confusion (on
the part of idle readers like me). For this specific case, would I as
someone who wants strict/strong typing really care whether “123” was
coerced to an integer? No. Others are free to disagree. I actually
don’t mind there being a certain amount of logical coercion between
types where it makes sense. That’s not, per se, fully in accordance
with typing that is strict to the strictest degree, which was Nikita’s point.
Coercion is itself a symptom of weak typing, so the more coercion one
introduces, the weaker the typing.

However, “123” is exceptional. It’s redefining an integer as “an
integer or a string comprised wholly of digits without leading zeroes,
with an optional leading hyphen, and representing an integer up to
PHP_INT_MAX”, i.e. an integer or a string with a real number that can be
made an integer without loss. No other string need apply. That’s not
strict-strict typing (there’s coercion) but it’s probably strict
enough to pass muster (it’s one single obvious coercion under limited
circumstances).
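
That long sentence translates into a fairly short predicate. A minimal sketch, assuming PHP 8; the function name is_integer_like() is hypothetical, and the round-trip comparison is just one possible way to enforce the PHP_INT_MAX bound.

```php
<?php
// Hypothetical predicate for "an integer, or a string that can be made an
// integer without loss", following the wording above.
function is_integer_like(mixed $value): bool
{
    if (is_int($value)) {
        return true;
    }
    if (!is_string($value)) {
        return false;
    }
    // Optional leading hyphen, then "0" or digits with no leading zero.
    if (preg_match('/^-?(0|[1-9]\d*)$/', $value) !== 1) {
        return false;
    }
    // Round-trip check: strings outside the platform's int range change
    // when cast to int and back, so they are rejected.
    return (string) (int) $value === $value;
}

var_dump(is_integer_like('123'));    // bool(true)
var_dump(is_integer_like('0123'));   // bool(false)
var_dump(is_integer_like('12abc'));  // bool(false)
```

Octal-looking strings such as '0123' fall outside the definition here, which is one reading of the "no-go" assumption in the next paragraph.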

Then again, it’s an exception that requires a long sentence. It’s just
not clear, to me, if this is the sole intended exception, or if that
sentence needs to be expanded to a paragraph. A section? Are we going
to need a chapter? I’m assuming octals-in-a-string are a no-go to be
coerced whereas others might just see an integer with superfluous
leading zeroes.

In a future RFC revision, it might be nice to have a table of the
specific coercion rules applicable to a weak/strong/single-unified
option. Granted, final implementation details may be not 100% certain,
but in this case any variation in implementation can have significance
as to whether something is weak/strict/or lies somewhere else on the
spectrum in between.

So, I agree with Nikita that this is less than strict typing, but one
single logical exception doesn’t instantly demote it to extreme weak
typing if it’s sufficiently narrow in scope. We are compromising, no?

It’s imperfect in other ways, but I’ll let others debate if those are
significant or not.

Paddy

--
Pádraic Brady

http://blog.astrumfutura.com
http://www.survivethedeepend.com

10 years ago by Zeev Suraski — view source

-----Original Message-----
From: Pádraic Brady [mailto:padraic.brady@gmail.com]
Sent: Wednesday, February 18, 2015 4:50 PM
To: Andrey Andreev
Cc: Zeev Suraski; Nikita Popov; Rasmus Lerdorf; Sara Golemon; PHP
internals
Subject: Re: [PHP-DEV] Scalar Type Hints v0.4

Hi all,

Hi,

I am wondering what the point is indeed with preventing "123" to 123.
So far, all the concrete use cases people brought up had to do with
"Apple" or "100 dogs", but nobody ever seems to be able to explain why
converting "123" to 123 is likely to be a problem real world. Is it
really just static analyzers?

I too am curious about the potential issue with "123" to 123
specifically, although it could be seen as a subset of another problem
that is solved with strict hints - numeric-character string
identifiers being erroneously treated as integers.

However, “123” is exceptional. It’s redefining an integer as “an integer
or a string comprised wholly of digits without leading zeroes, with an
optional leading hyphen, and representing an integer up to PHP_INT_MAX”,
i.e. an integer or a string with a real number that can be made an integer
without loss. No other string need apply. That’s not strict-strict typing
(there’s coercion) but it’s probably strict enough to pass muster (it’s one
single obvious coercion under limited circumstances).

Then again, it’s an exception that requires a long sentence.

There are shorter ways to define this exception - not in a formal way
perhaps, but that gets the point across - something like 'a string that
looks like an integer', and we can throw in 'and that can be converted with
no data loss'. Of course, writing docs isn't one of my strong points :)

It’s just not clear, to me, if this is the sole intended exception, or if
that sentence needs to be expanded to a paragraph. A section? Are we going
to need a chapter? I’m assuming octals-in-a-string are a no-go to be
coerced whereas others might just see an integer with superfluous leading
zeroes.

It's not the sole exception, but it's one of the most important ones in my
opinion. The rationale would be conversions that can be made without
meaningful data loss or creating data that wasn't there.

That means that "42" can coerce into int, and "42.2" can coerce into float,
and int can coerce into float - but not vice versa.
That means that boolean will not coerce into anything, because turning it
into an integer, float or string 'invents' data that wasn't there.
A somewhat sticky point is coercion into boolean. We need to figure this
one out, and hopefully we can come up with something most people can agree
to.
Another open question would be coercion of float/int into string. This
particular point might be good for a secondary vote, as I imagine people
from both camps won't see this particular conversion as a major deal-breaker
for them (could be wrong).

Getting the point across could be made using a table very similar to the one
in wiki.php.net/rfc/scalar_type_hints#behaviour_of_weak_type_checks (as you
seemed to suggest) - of course modified to reflect the much stricter ruleset
we're talking about here. I don't imagine that even a formal declaration
would require a whole chapter.

Thanks!

Zeev

10 years ago by francois@php.net — view source

From: Zeev Suraski [mailto:zeev@zend.com]

That means that "42" can coerce into int, and "42.2" can coerce into float,
and int can coerce into float - but not vice versa.

I was wondering: should we systematically reject float to int, or should we accept it when the value fits the int range and the fractional part is zero?

In short: can 7.0 be considered an integer?

Example: if we completely disable float to int:

function foo(int $arg) {}

foo(ceil(<any number>)); // fails, although the result of ceil() is always a whole number
foo(abs(<float>)); // same here

On one hand, it depends on the value, which is not so good. On the other hand, we must consider that the number behind the representation is an integer. And PHP math functions too often return integers as floats (mostly as a matter of range).

Anyway, this is one more use case against strict mode, as the examples above, while intuitively and technically correct, would fail. And there are several more. Would the solution be to create another exception :) ?
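
The "range fits and fractional part is zero" test asked about above could look roughly like this. It is a sketch only: float_is_integral() is a hypothetical name, and the upper bound assumes a 64-bit build, where (float) PHP_INT_MAX rounds up to 2^63.

```php
<?php
// Hypothetical helper: true when a float holds a whole number that also
// fits into PHP's int range. Assumes a 64-bit build (see the bound below).
function float_is_integral(float $f): bool
{
    return is_finite($f)                 // rejects NAN and +/-INF
        && floor($f) === $f              // no fractional part
        && $f >= (float) PHP_INT_MIN     // -2^63 is exactly representable
        && $f < (float) PHP_INT_MAX;     // (float) PHP_INT_MAX is 2^63 here
}

var_dump(float_is_integral(7.0));       // bool(true)
var_dump(float_is_integral(ceil(3.2))); // bool(true)
var_dump(float_is_integral(7.5));       // bool(false)
```

The check is value-dependent, which is precisely the objection raised in the message: the same call can pass or fail depending on the data.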

A somewhat sticky point is coercion into boolean. We need to figure this
one out, and hopefully we can come up with something most people can
agree to.

As I already said, with such a C-like syntax, we probably cannot disable (int -> bool), and the same goes for 'integer string' to bool, since it is natural to give it the same rule as a native int. I am not sure about float, but I would say OK too, as C habits prevail IMO.

Another open question would be coercion of float/int into string. This
particular point might be good for a secondary vote, as I imagine people
from both camps won't see this particular conversion as a major deal-breaker
for them (could be wrong).

I don't see the point here. It creates data but there's no ambiguity in the way to represent it as a string. I think bidirectional 'numeric string' <-> int/float conversion is a concept easier to understand and remember.

Regards

François

10 years ago by francois@php.net — view source

From: Pádraic Brady [mailto:padraic.brady@gmail.com]

However, “123” is exceptional. It’s redefining an integer as “an
integer or a string comprised wholly of digits without leading zeroes,
with an optional leading hyphen, and representing an integer up to
PHP_INT_MAX”

Add leading zeros, and leading/trailing blanks and I think it is still strict enough.

Then again, it’s an exception that requires a long sentence. It’s just
not clear, to me, if this is the sole intended exception, or if that
sentence needs to be expanded to a paragraph. A section? Are we going
to need a chapter? I’m assuming octals-in-a-string are a no-go to be
coerced whereas others might just see an integer with superfluous
leading zeroes.

Leading zeros to recognize an octal string are not an option, but an alternative, unambiguous syntax is possible, in theory.

'0x'-prefixed hex is possible too. Thoughts?

In a future RFC revision, it might be nice to have a table of the
specific coercion rules applicable to a weak/strong/single-unified
option.

I am currently writing this.

Regards

François

10 years ago by Robert Stoll — view source

-----Original Message-----
From: Zeev Suraski [mailto:zeev@zend.com]
Sent: Wednesday, February 18, 2015 08:00
To: Nikita Popov; Rasmus Lerdorf
Cc: Sara Golemon; PHP internals
Subject: RE: [PHP-DEV] Scalar Type Hints v0.4

-----Original Message-----
From: Nikita Popov [mailto:nikita.ppv@gmail.com]
Sent: Wednesday, February 18, 2015 3:06 AM
To: Rasmus Lerdorf
Cc: Sara Golemon; PHP internals
Subject: Re: [PHP-DEV] Scalar Type Hints v0.4

The inability to implicitly cast "123" to int is pretty much the KEY
distinction between weak and strict scalar typehints (those pesky
value-dependent type checks). If the strict typing mode doesn't offer
this, what's the point at all?

I am wondering what the point is indeed with preventing "123" to 123. So far, all the concrete use cases people brought up had to do with "Apple" or "100 dogs", but nobody ever seems to be able to explain why converting "123" to 123 is likely to be a problem real world. Is it really just static analyzers?

Strict mode is useful in the sense that it prevents unnecessary implicit conversions (which are costly) and it improves readability.
Following an example:

function foo(string $x, int $y){
bar(1);
return strstr($x,"hello", $y);
}

function bar(float $a){}

After adding the implicit conversions the code would look as follows:

function foo(string $x, int $y){
bar((float) 1);
return strstr($x, "hello", (bool) $y);
}

function bar(float $a){}

In strict mode the original code would not be valid (rightly so IMO). Just from reading the original code I would suspect that strstr expects some kind of an offset (hence the int), therefore strict mode probably revealed a bug.
And if not, then one can add the conversion manually. However, this is not as trivial as it sounds. Personally I think it would only make sense to have strict mode in PHP if the user had stricter conversion functions at hand. What is the benefit of the following? If the conversion to int is as sloppy as today, then one does not gain anything from the strict mode IMO:

function foo(int $x){}
foo( (int)$_GET["bla"]);

10 years ago by Zeev Suraski — view source

-----Original Message-----
From: Robert Stoll [mailto:php@tutteli.ch]
Sent: Wednesday, February 18, 2015 1:14 PM
To: 'Zeev Suraski'; 'Nikita Popov'; 'Rasmus Lerdorf'
Cc: 'Sara Golemon'; 'PHP internals'
Subject: AW: [PHP-DEV] Scalar Type Hints v0.4

-----Original Message-----
From: Zeev Suraski [mailto:zeev@zend.com]
Sent: Wednesday, February 18, 2015 08:00
To: Nikita Popov; Rasmus Lerdorf
Cc: Sara Golemon; PHP internals
Subject: RE: [PHP-DEV] Scalar Type Hints v0.4

I am wondering what the point is indeed with preventing "123" to 123.
So far, all the concrete use cases people brought up had to do with
"Apple" or "100 dogs", but nobody ever seems to be able to explain why
converting "123" to 123 is likely to be a problem real world. Is it
really just static analyzers?

Strict mode is useful in the sense that it prevents unnecessary implicit
conversions (which are costly) and it improves readability.
Following an example:

function foo(string $x, int $y){
bar(1);
return strstr($x,"hello", $y);
}

function bar(float $a){}

After adding the implicit conversions the code would look as follows:

function foo(string $x, int $y){
bar((float) 1);
return strstr($x, "hello", (bool) $y);
}

function bar(float $a){}

In strict mode the original code would not be valid (rightly so IMO). Just
from reading the original code I would suspect that strstr expects some
kind of an offset (hence the int), therefore strict mode probably revealed
a bug.

There are two things I'm not so clear about in what you're saying.
It seems that the 2nd sample adds explicit casts and not implicit casts.
Explicit casts actually use a much more aggressive ruleset than even the
ruleset in the v0.3 RFC, in the sense that they'd happily convert "Apple"
into (float) 0.0, if you do an explicit (float) cast. They (almost) can't
fail.
Secondly, I think there aren't any common situations where strict typing (in
the form of zval.type comparison) would be any less costly than weak typing.
The difference is really between failure (abort in case there's a type
mismatch in strict) and success (convert to the requested type). The
conversion that may happen in the weak scenario is no costlier than an
explicit cast, probably a tiny bit less actually.

Again, in my opinion pushing users towards explicit casts - which have much
more lax rules than the ones proposed in v0.3, let alone the ones we're
currently considering, will defeat the purpose and actually make finding
bugs harder.
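
To make the point about lax explicit casts concrete, here is a short sketch of how today's casts behave on the strings mentioned in this thread. It only uses built-in casts and is_numeric(); nothing here is proposed API.

```php
<?php
// Explicit casts never reject their input; they silently produce a value.
var_dump((float) "Apple");   // float(0)
var_dump((int) "Apple");     // int(0)
var_dump((int) "32 dogs");   // int(32), the trailing text is dropped

// A value-based check, by contrast, can tell these inputs apart.
var_dump(is_numeric("32 dogs")); // bool(false)
var_dump(is_numeric("32"));      // bool(true)
```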

And if not, then one can add the conversion manually. However, this is not
as trivial as it sounds. Personally I think it would only make sense to have
strict mode in PHP if the user had more strict conversion functions at hand.
What is the benefit of the following? If the conversion to int is as sloppy
as today then one does not gain anything from the strict mode IMO:

function foo(int $x){}
foo( (int)$_GET["bla"]);

I agree, but changing the rules of explicit casts is a huge change and BC
break. If we do implement the single-mode, stricter-than-pure-weak and
weaker-than-pure-strict ruleset, we could introduce a new set of conversion
functions, along the lines of safe_int(), that would follow the same rules
as the corresponding type hints (i.e. accept (int) 32, (string) "32", but
not (float) 32.7, or (string) "32 dogs").
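
A conversion function along those lines might look like the sketch below. The name safe_int() comes from the message above, but the body is an assumption: it leans on PHP's existing filter_var() with FILTER_VALIDATE_INT, whose rules are close to, but not necessarily identical with, whatever the type hints would end up using (for instance, it trims surrounding whitespace). The bool check is added because the thread suggests booleans should not coerce.

```php
<?php
// Sketch of a userland safe_int(): returns an int or throws.
// Built on filter_var(); the exact rule set is an assumption, not the RFC's.
function safe_int($value): int
{
    if (is_bool($value)) {
        // filter_var() would turn true into 1; the thread argues against that.
        throw new InvalidArgumentException('bool cannot be converted to int');
    }
    $result = filter_var($value, FILTER_VALIDATE_INT);
    if ($result === false) {
        throw new InvalidArgumentException('value is not a lossless int');
    }
    return $result;
}

var_dump(safe_int(32));   // int(32)
var_dump(safe_int("32")); // int(32)

try {
    safe_int("32 dogs");
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), "\n"; // value is not a lossless int
}
```

Throwing rather than returning a sentinel keeps the failure visible at the call site, which is the behaviour explicit casts lack.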

Zeev

10 years ago by Robert Stoll — view source

-----Original Message-----
From: Zeev Suraski [mailto:zeev@zend.com]
Sent: Wednesday, February 18, 2015 14:03
To: Robert Stoll
Cc: Sara Golemon; PHP internals
Subject: RE: [PHP-DEV] Scalar Type Hints v0.4

-----Original Message-----
From: Robert Stoll [mailto:php@tutteli.ch]
Sent: Wednesday, February 18, 2015 1:14 PM
To: 'Zeev Suraski'; 'Nikita Popov'; 'Rasmus Lerdorf'
Cc: 'Sara Golemon'; 'PHP internals'
Subject: AW: [PHP-DEV] Scalar Type Hints v0.4

-----Original Message-----
From: Zeev Suraski [mailto:zeev@zend.com]
Sent: Wednesday, February 18, 2015 08:00
To: Nikita Popov; Rasmus Lerdorf
Cc: Sara Golemon; PHP internals
Subject: RE: [PHP-DEV] Scalar Type Hints v0.4

I am wondering what the point is indeed with preventing "123" to 123.
So far, all the concrete use cases people brought up had to do with
"Apple" or "100 dogs", but nobody ever seems to be able to explain why
converting "123" to 123 is likely to be a problem real world. Is it
really just static analyzers?

Strict mode is useful in the sense that it prevents unnecessary
implicit conversions (which are costly) and it improves readability.
Following an example:

function foo(string $x, int $y){
bar(1);
return strstr($x,"hello", $y);
}

function bar(float $a){}

After adding the implicit conversions the code would look as follows:

function foo(string $x, int $y){
bar((float) 1);
return strstr($x, "hello", (bool) $y);
}

function bar(float $a){}

In strict mode the original code would not be valid (rightly so IMO).
Just from reading the original code I would suspect that strstr
expects some kind of an offset (hence the int), therefore strict mode
probably revealed a bug.

There are two things I'm not so clear about in what you're saying.
It seems that the 2nd sample adds explicit casts and not implicit casts.

[Robert Stoll]
Sorry, I was probably not clear enough. I just tried to illustrate what happens internally, what the code could look like after the implicit conversions were added. I am aware that this was oversimplified, since the implicit conversions do not correspond to the explicit ones (which is inconsistent IMO, but this is another story).
The point I tried to make is that the following code is not very readable (it is misleading) and that the lack of strictness results in unnecessary implicit conversions:

function foo(string $x, int $y){
bar(1);
return strstr($x,"hello", $y);
}

The implicit conversion of 1 to float is unnecessary (and thus an extra cost) - the user should have written 1.0
The implicit conversion from $y to bool hides that strstr expects a bool as third argument and hence hides a potential bug
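
The hidden bug can be reproduced with a small sketch. It assumes weak (coercing) argument handling for internal functions, which is how PHP behaves without any strict mode; the sample strings are made up for illustration.

```php
<?php
// strstr()'s third parameter is a bool ($before_needle), not an offset.
// With weak typing an int is silently coerced: 0 => false, non-zero => true.
function foo(string $x, int $y)
{
    return strstr($x, "hello", $y);
}

var_dump(foo("say hello world", 0)); // string(11) "hello world"
var_dump(foo("say hello world", 5)); // string(4) "say "
```

A caller who believes $y is an offset gets a result that changes shape as soon as $y is non-zero, with no error to point at the misunderstanding.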

Two use cases where strict mode would be beneficial. However, the first case is not so dramatic. PHP is not a high performance language and thus it is perfectly fine IMO to have unnecessary implicit conversions. Yet, the latter point is a use case where I would like to have strict mode. Same for the following:

$a = strpos("hello","h");
if($a){}

If the condition of the if statement were strict, I could not have introduced this very common bug.
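
For readers unfamiliar with the bug: strpos() returns int 0 for a match at the start of the string, and 0 is falsy. A minimal sketch follows; the two helper names are hypothetical.

```php
<?php
// Loose check: relies on int -> bool coercion, so a match at offset 0
// is indistinguishable from "not found".
function found_loose(string $haystack, string $needle): bool
{
    return (bool) strpos($haystack, $needle);
}

// Strict check: compares against false without any coercion.
function found_strict(string $haystack, string $needle): bool
{
    return strpos($haystack, $needle) !== false;
}

var_dump(found_loose("hello", "h"));  // bool(false), the bug
var_dump(found_strict("hello", "h")); // bool(true)
```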

Explicit casts actually use a much more aggressive ruleset than even the ruleset in the v0.3 RFC, in the sense that they'd happily convert "Apple" into (float) 0.0, if you do an explicit (float) cast. They (almost) can't fail.

[Robert Stoll]
Aye, a pity that safe casting functions were rejected: https://wiki.php.net/rfc/safe_cast
But maybe this discussion revives the RFC.

Secondly, I think there aren't any common situations where strict typing (in the form of zval.type comparison) would be
any less costly than weak typing.
The difference is really between failure (abort in case there's a type mismatch in strict) and success (convert to the
requested type). The conversion that may happen in the weak scenario is no costlier than an explicit cast, probably a tiny
bit less actually.

[Robert Stoll]
I was not clear enough, I merely meant unnecessary casts are costly

Again, in my opinion pushing users towards explicit casts - which have much more lax rules than the ones proposed in v0.3,
let alone the ones we're currently considering, will defeat the purpose and actually make finding bugs harder.

[Robert Stoll]
I share this view, as I stated below, strict mode only makes sense with safer casts IMO. I think it would be clever to agree on the way "safe casts" should work, expose them to userland and if we should not get scalar type hints into PHP 7.0 then we have at least safe cast functions which allow almost the same but with a bit more code.

And if not, then one can add the conversion manually. However, this is
not as trivial as it sounds. Personally I think it would only make
sense to have strict mode in PHP if the user had more strict
conversion functions at hand. What is the benefit of the following? If
the conversion to int is as sloppy as today then one does not gain
anything from the strict mode IMO:

function foo(int $x){}
foo( (int)$_GET["bla"]);

I agree, but changing the rules of explicit casts is a huge change and BC break. If we do implement the single-mode,
stricter-than-pure-weak and weaker-than-pure-strict ruleset, we could introduce a new set of conversion functions, along
the lines of safe_int(), that would follow the same rules as the corresponding type hints (i.e. accept (int) 32, (string) "32",
but not (float) 32.7, or (string) "32 dogs").

Zeev

10 years ago by francois@php.net — view source

From: Zeev Suraski [mailto:zeev@zend.com]

If we do implement the single-mode, stricter-than-pure-weak and
weaker-than-pure-strict ruleset, we could introduce a new set of conversion
functions, along the lines of safe_int(), that would follow the same rules
as the corresponding type hints (i.e. accept (int) 32, (string) "32", but
not (float) 32.7, or (string) "32 dogs").

Nice. I like it: create a set of userland functions aligned on zend_parse_arg_xxx().

It just remains to bikeshed the function names :)

Regards

François

10 years ago by francois@php.net — view source

From: Robert Stoll [mailto:php@tutteli.ch]

Strict mode is useful in the sense that it prevents unnecessary implicit
conversions (which are costly) and it improves readability.
Following an example:

function foo(string $x, int $y){
bar(1);
return strstr($x,"hello", $y);
}

function bar(float $a){}

After adding the implicit conversions the code would look as follows:

function foo(string $x, int $y){
bar((float) 1);
return strstr($x, "hello", (bool) $y);
}

function bar(float $a){}

In strict mode the original code would not be valid (rightly so IMO).

Actually, your example is partially invalid because strict-typing radicals now propose to add a (int -> float) exception to so-called strict mode (which proves the approach is flawed, IMHO).

You don't propose a strict-mode alternative in your example. OK, it generates conversions and should fail. Now, how would you write the same code using strict mode, without adding casts that would do exactly the same thing, only slower?

If you just mean there's an undetected bug, you're right, but, IMO, a C-like syntax like PHP's cannot disable (int -> bool) implicit conversion.

A partial solution can be brought by a set of strict types I am planning to define in the single (not so weak) mode approach I am working on (something like 'int!', 'float!',...). This would allow people who know what they're doing to demand zval-type-based strict checks, arg by arg. It can be used for performance reasons and for the rare cases where zval type really matters (sorting, for instance). This would be available to internal and userland functions.

Regards

François

10 years ago by padraic.brady@gmail.com — view source

Actually, your example is partially invalid because strict-typing radicals now propose to add a (int -> float) exception to so-called strict mode (which proves the approach is flawed, IMHO).

Careful, it helps not to call folk "radicals" if you intend to pursue
a compromise with them ;).

I wouldn't necessarily mind int->float - it's lossless assuming one way only.

Paddy

--
Pádraic Brady

http://blog.astrumfutura.com
http://www.survivethedeepend.com
Zend Framework Community Review Team
Zend Framework PHP-FIG Representative

10 years ago by Lester Caine — view source

I wouldn't necessarily mind int->float - it's lossless assuming one way only.

Assuming int is not 64 bit ;)

--
Lester Caine - G8HFL

Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk

10 years ago by francois@php.net — view source

From: Pádraic Brady [mailto:padraic.brady@gmail.com]

Careful, it helps not to call folk "radicals" if you intend to pursue
a compromise with them ;).

Sorry, English is not my native language, and 'radical' may be offensive.

I was just looking for a word for people who consider providing two modes is a pre-requisite to any discussion.

I wouldn't necessarily mind int->float - it's lossless assuming one way only.

It's lossless but it kills the 'strict' position. It can be claimed, one hand on the heart, this will be the only exception but, as use cases and side effects accumulate, we all know it will finish as a bunch of exceptions to a no-more strict mode, adding confusion where it is not needed. I guess the next one would be (int -> bool), and the rest would follow.

I am taking the problem the other way round, determining from scratch the filtering/conversions I want to enable and disable. The result will probably be the same, but not with the same wasted energy and not in the same time.

Regards

François

10 years ago by Rasmus Lerdorf — view source

From: Pádraic Brady [mailto:padraic.brady@gmail.com]

Careful, it helps not to call folk "radicals" if you intend to pursue
a compromise with them ;).

Sorry, English is not my native language, and 'radical' may be offensive.

I was just looking for a word for people who consider providing two modes is a pre-requisite to any discussion.

I wouldn't necessarily mind int->float - it's lossless assuming one way only.

It's lossless but it kills the 'strict' position. It can be claimed, one hand on the heart, this will be the only exception but, as use cases and side effects accumulate, we all know it will finish as a bunch of exceptions to a no-more strict mode, adding confusion where it is not needed. I guess the next one would be (int -> bool), and the rest would follow.

We need to keep in mind that int->float isn't technically lossless. We
have a 53-bit IEEE754 mantissa to account for here, so it is only
lossless for values below 9007199254740992 (2^53) or so.
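
The precision limit can be checked directly. The sketch below assumes a 64-bit PHP build; with a 53-bit significand every integer up to 2^53 = 9007199254740992 converts exactly, and 2^53 + 1 is the first one that does not. The helper name is hypothetical.

```php
<?php
// Hypothetical helper: does an int survive a round trip through float?
// Assumes a 64-bit build, where ints are wider than the float significand.
function int_survives_float(int $i): bool
{
    return (int) (float) $i === $i;
}

$limit = 2 ** 53; // int(9007199254740992)

var_dump(int_survives_float($limit));     // bool(true)
var_dump(int_survives_float($limit + 1)); // bool(false), rounds back to 2^53
```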

-Rasmus

10 years ago by Zeev Suraski — view source

From: Pádraic Brady [mailto:padraic.brady@gmail.com]

Careful, it helps not to call folk "radicals" if you intend to pursue
a compromise with them ;).

Sorry, English is not my native language, and 'radical' may be offensive.

I was just looking for a word for people who consider providing two modes is a pre-requisite to any discussion.

I wouldn't necessarily mind int->float - it's lossless assuming one way only.

It's lossless but it kills the 'strict' position. It can be claimed, one hand on the heart, this will be the only exception but, as use cases and side effects accumulate, we all know it will finish as a bunch of exceptions to a no-more strict mode, adding confusion where it is not needed. I guess the next one would be (int -> bool), and the rest would follow.

We need to keep in mind that int->float isn't technically lossless. We
have a 53-bit IEEE754 mantissa to account for here, so it is only
lossless for values below 9007199254740992 (2^53) or so.

We can limit ourselves to values below that limit. If you deal with values above it, be explicit about casting.

Zeev

10 years ago by Lester Caine — view source

We can limit ourselves to values below that limit. If you deal with values above it, be explicit about casting.

This is exactly my problem ...
Databases are using 64bit primary keys more and more, and having to
worry about going over some limit is the very thing that any 'hinting'
should be taking care of! This is the very area where using 32bit builds
at least provides a level of protection currently.

--
Lester Caine - G8HFL

Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk

10 years ago by Pierre Joye — view source

I don't buy into Rasmus arguments about internal functions. They concern
one particular edge case (int->float coercion) and I doubt they have much
relevance if applied to codebases with pervasive use of typehints (where
you can be reasonably sure of the types of your variables). Even if, for
the sake of argument, we acknowledge the concern as valid we should be
discussing that particular case (int->float coercion) rather than
dropping the strict typing for internal functions altogether.

int->float is actually secondary to "123"->int. And while they may be
edge-cases there are enough of them that we would be pushing people
towards casting by default which should be a last-resort thing, not the
first thing you do.

The inability to implicitly cast "123" to int is pretty much the KEY
distinction between weak and strict scalar typehints (those pesky
value-dependent type checks). If the strict typing mode doesn't offer this,
what's the point at all?

This is exactly what I fear will happen with an arginfo based approach. If
even fundamental aspects like the "123" vs 123 (or true vs 1) distinction
are suppressed for internal functions, this isn't a strict typing mode,
it's just a weak typing mode with slightly different rules.

I totally agree with you here, and with your next more verbose reply.

I am astonished to see this discussion simply redo what we already
discussed to death, with basically no progress toward a compromise,
only a way to get weak typing in place. I do not see much value in
arguing in circles forever, and I will actually support what I
consider good once there is an RFC in place. Weak typing only won't
be the one I would choose. I remain a fervent supporter of the
previously proposed dual mode, which actually covers all we need. Yes,
there are implementation details (I repeat: yes, I do consider most of
the raised issues to be implementation details), but generally it is
the compromise and the way I see as the way to go.

Cheers,

Pierre

@pierrejoye | http://www.libgd.org

10 years ago by Sara Golemon — view source

I don't like the way this is heading with regards to internal functions.
Apart from better inter-compatibility, the primary appeal of Andrea's
proposal was that we have the option to make not only userland function
calls strict, but internal ones as well. With these modifications this is
lost for all practical purposes. (*)

Personally, I agree with you. I liked Andrea's proposal a lot, and if
we put this back up for vote it probably would pass the 2/3rd majority
needed. That said, if we can build a better consensus, we should try
to.

I don't buy into Rasmus arguments about internal functions. They concern one
particular edge case (int->float coercion) and I doubt they have much
relevance if applied to codebases with pervasive use of typehints (where you
can be reasonably sure of the types of your variables). Even if, for the
sake of argument, we acknowledge the concern as valid we should be
discussing that particular case (int->float coercion) rather than dropping
the strict typing for internal functions altogether.

I don't fully buy into it either, particularly if we apply
meta-types/unions. I do have to acknowledge his point about
"encouraging the use of blind casts" though. Having a mass of PHP
standard library functions suddenly look typed or apply a different
set of coercion rules than one is used to is potentially dangerous.

-Sara

10 years ago by Rowan Collins — view source

Nikita Popov wrote on 18/02/2015 00:35:

I don't like the way this is heading with regards to internal functions.
Apart from better inter-compatibility, the primary appeal of Andrea's
proposal was that we have the option to make not only userland function
calls strict, but internal ones as well. With these modifications this is
lost for all practical purposes.

I actually rather like this idea - it allows internal functions to
gradually introduce scalar type hints just as userland code will
gradually introduce them, rather than assuming a new meaning of existing
metadata. It gives a chance to look, for each function, what hints make
sense, separate from the technical requirements of translating to a
strongly typed set of C variables, which is the main role of ZPP.

Note that, however natural it may seem to core/extension devs, the
workings of ZPP are not at all obvious to users, so the more
similarities between internal and user-defined type hints, the better.

Where "practical purposes" refers to my assumption that it is very
unlikely that we'll add arginfo typehints to the entirety of all bundled
functions and the added typehints will not be heavily colored by people
trying to shove in weak typing even when strict mode is enabled.

I see it working well if a large batch of uncontroversial functions have
type hints mechanically added straight away, and then more complex cases
are looked at in detail and discussed via pull requests or similar.
What you have identified there is a danger to be watched out for,
certainly, but it is not an inevitable outcome, if we come up with a
decent set of guidelines of how the feature should be used.

Regards,

Rowan Collins
[IMSoP]

10 years ago by francois@php.net — view source

Hi Nikita,

I don't like the way this is heading with regards to internal functions.
Apart from better inter-compatibility, the primary appeal of Andrea's
proposal was that we have the option to make not only userland function
calls strict, but internal ones as well. With these modifications this is
lost for all practical purposes. (*)

Please refer to my other posts proposing a single mode with the addition of four 'strict' scalar types at ZPP level (they would apply to internal and userland functions). These will be chosen by internal function implementors when they decide to, not by the caller. I hope it can answer your question.

The idea is that internal functions are treated as weak using the current ZPP types. Then, they can be made strict, one by one, and argument by argument. No need to duplicate type hinting to arg_info IMO.

Rasmus' (int -> float) coercion question does not exist if we find a single-mode consensus. And I am determined to take the time needed to find one, as I think it can lead to a much more consistent design.

Can you just tell me if what I described above addresses your concerns? Thanks.

Regards

François

10 years ago by Michael Wallner — view source

Introduce scalar types for primitives: bool, int, float, string,
resource, object (we already have array)
1a) Introduce meta-types as pre-defined unions (we can add custom
unions in a later rfc). A possible list may be as follows (again, we
can argue what's in this list separately):

mixed: any type

scalar: (null|bool|int|float|string)

Hold on, usually, type checking functions don't identify NULL as scalar.

--
Regards,
Mike

10 years ago by Rowan Collins

Michael Wallner wrote on 18/02/2015 11:19:

Introduce scalar types for primitives: bool, int, float, string,
resource, object (we already have array)
1a) Introduce meta-types as pre-defined unions (we can add custom
unions in a later rfc). A possible list may be as follows (again, we
can argue what's in this list separately):

mixed: any type

scalar: (null|bool|int|float|string)
Hold on, usually, type checking functions don't identify NULL as scalar.

No need for anyone to hold on; as it says in the section you've quoted
"we can argue what's in this list separately". Sara's after reactions to
the principle of having meta-types/unions in general, not their
definitions, right now.

Regards,

Rowan Collins
[IMSoP]

10 years ago by Michael Wallner

Michael Wallner wrote on 18/02/2015 11:19:

Introduce scalar types for primitives: bool, int, float, string,
resource, object (we already have array)
1a) Introduce meta-types as pre-defined unions (we can add custom
unions in a later rfc). A possible list may be as follows (again, we
can argue what's in this list separately):

mixed: any type

scalar: (null|bool|int|float|string)
Hold on, usually, type checking functions don't identify NULL as scalar.

No need for anyone to hold on; as it says in the section you've quoted
"we can argue what's in this list separately". Sara's after reactions to
the principle of having meta-types/unions in general, not their
definitions, right now.

Did you already incorporate "strict mode" yourself?
SCNR

I'm not a native speaker, so "hold on" might mean something different to
you than to me.

--
Regards,
Mike

10 years ago by Rowan Collins

Michael Wallner wrote on 18/02/2015 11:39:

Michael Wallner wrote on 18/02/2015 11:19:

Introduce scalar types for primitives: bool, int, float, string,
resource, object (we already have array)
1a) Introduce meta-types as pre-defined unions (we can add custom
unions in a later rfc). A possible list may be as follows (again, we
can argue what's in this list separately):

mixed: any type

scalar: (null|bool|int|float|string)
Hold on, usually, type checking functions don't identify NULL as scalar.
No need for anyone to hold on; as it says in the section you've quoted
"we can argue what's in this list separately". Sara's after reactions to
the principle of having meta-types/unions in general, not their
definitions, right now.
Did you already incorporate "strict mode" yourself?
SCNR

I'm not a native speaker, so "hold on" might mean something different to
you than to me.

Sorry, what I meant was, "don't worry, this isn't the kind of issue that
we need to worry about yet".

Sara's e-mail made clear that these were quick examples, and she wasn't
expecting feedback on the details at this stage.

Regards,

Rowan Collins
[IMSoP]

10 years ago by Lazare Inepologlou

2015-02-18 12:45 GMT+01:00 Rowan Collins rowan.collins@gmail.com:

Michael Wallner wrote on 18/02/2015 11:39:

Michael Wallner wrote on 18/02/2015 11:19:

Introduce scalar types for primitives: bool, int, float, string,

resource, object (we already have array)
1a) Introduce meta-types as pre-defined unions (we can add custom
unions in a later rfc). A possible list may be as follows (again, we
can argue what's in this list separately):

mixed: any type

scalar: (null|bool|int|float|string)

Hold on, usually, type checking functions don't identify NULL as scalar.

No need for anyone to hold on; as it says in the section you've quoted
"we can argue what's in this list separately". Sara's after reactions to
the principle of having meta-types/unions in general, not their
definitions, right now.

Did you already incorporate "strict mode" yourself?
SCNR

I'm not a native speaker, so "hold on" might mean something different to
you than to me.

Sorry, what I meant was, "don't worry, this isn't the kind of issue that
we need to worry about yet".

Sara's e-mail made clear that these were quick examples, and she wasn't
expecting feedback on the details at this stage.

Still, nullable types have been proposed several times on
this list, and fit nicely with the introduction of union types.

So yes, it would be nice for null not to be included in scalar. Instead, we
could have a union type like scalar? = scalar|null

Lazare INEPOLOGLOU
Software Engineer

Regards,

Rowan Collins
[IMSoP]

10 years ago by francois@php.net

From: Lazare Inepologlou [mailto:linepogl@gmail.com]

So, yes it would be nice null not to be included in scalars. Instead, we
could have a union type like scalar? = scalar|null

That's roughly the idea. However, IMO, the general mechanism for union types needs to be defined before we start defining union type aliases.

Another option is to create new ZPP types. These won't be union type aliases, technically, but can be defined now. From a user's point of view, the usage would be roughly the same, and we can propose them for 7.0.

Regards

François

10 years ago by francois@php.net

Hi Michael,

The case of null is a little special.

As a type hint, we need it for return and union types only.

Considering union types, they were clearly left out of scope for 7.0 and I personally won't propose pre-defined union types before the general case for unions is designed. Conversion to union type, in particular, is complex, and requires more thinking and discussion.

Hold on, usually, type checking functions don't identify NULL as scalar.

I don't understand your point, as parameter parsing currently accepts null as a scalar (converting it to 0 or an empty string).

If you mean the is_xxx() functions, that's irrelevant because these functions, except for a few such as is_numeric(), are based purely on the zval type and perform no conversion.

I'll take this opportunity to propose:

no longer accepting null as a number, string, or bool (it is already rejected for other types).
also disabling implicit conversions of any type to null (a function declared to return int couldn't return null without adding 'null' to its return type, which requires union types).

A consequence is that, until we have union types, a function returning int or null, for instance, cannot have an explicit type.

This matters because disabling implicit conversion to null makes it possible to catch functions that end without an explicit 'return' statement when they are supposed to return a value.

Example:

function foo(int $a): int
{
    if ($a > 0) return $a;
} // <- Error: received null while expecting int

While this will be OK when union types exist:

function foo(int $a): int|null
{
    if ($a > 0) return $a;
}
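
For illustration only, here is a minimal sketch of the two cases above. It assumes the return types and int|null unions that later shipped in PHP 7.0 and 8.0, neither of which existed when this message was written; strictFoo() and nullableFoo() are made-up names.

```php
<?php
// Sketch only: relies on return type declarations (PHP 7.0+) and
// union types (PHP 8.0+). Function names are hypothetical.

function strictFoo(int $a): int
{
    if ($a > 0) {
        return $a;
    }
    // Falling off the end produces no value, which does not satisfy "int".
}

function nullableFoo(int $a): int|null
{
    if ($a > 0) {
        return $a;
    }
    return null; // returning null explicitly in this sketch
}

try {
    strictFoo(0);
    $trapped = false;
} catch (TypeError $e) {
    $trapped = true; // the missing return was caught
}

var_dump($trapped);        // bool(true)
var_dump(nullableFoo(0));  // NULL
var_dump(nullableFoo(5));  // int(5)
```

The point of the sketch is the first function: the forgotten return path is reported instead of silently yielding null.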

Regards

François

10 years ago by Leigh

Based on conversations here and elsewhere on the internet, I'd like to
put forward a rough gameplan for scalar types which I hope addresses
most concerns. This is back-of-the-napkin and I'm not asking for a
committed yes/no, just pre-rfc set of thoughts.

Please don't get hung up on specific names, we can debate those in the
coming week(s), I'm only looking for large architectural issues.

Introduce scalar types for primitives: bool, int, float, string,
resource, object (we already have array)

Can we keep a 0) option of "reserve names for future use in case of RFC
failure"?

1a) Introduce meta-types as pre-defined unions (we can add custom
unions in a later rfc). A possible list may be as follows (again, we
can argue what's in this list separately):

mixed: any type

scalar: (null|bool|int|float|string)

numeric (int|float|numeric-string)

stringish (string or object with __toString())

boolish (like mixed, but coerces to bool)

etc...

How do you propose weak typing works with these? Does it only allow
one of the union of types through (thus making it strict), or does it
try and coerce to one if it can? Which one does it pick?
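
To make the question concrete, here is one possible answer sketched for the "numeric" union, purely as an assumption: keep a value that is already a member, otherwise let a numeric string decide between int and float. coerceToNumeric() is a hypothetical helper, not a PHP function and not something the proposal specifies.

```php
<?php
// Hypothetical helper showing one possible weak-mode rule for a
// "numeric" (int|float) union. Not part of PHP or of the proposal.
function coerceToNumeric($value)
{
    if (is_int($value) || is_float($value)) {
        return $value;      // already a member of the union
    }
    if (is_string($value) && is_numeric($value)) {
        return $value + 0;  // "10" becomes int, "1.5" becomes float
    }
    throw new InvalidArgumentException('Value cannot be coerced to int|float');
}

var_dump(coerceToNumeric('10'));   // int(10)
var_dump(coerceToNumeric('1.5'));  // float(1.5)
```

The stricter alternative in the question would simply drop the string branch and reject anything that is not already an int or float.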

Define a way to enable "strict mode" (we'll be weak by default).

Please give the option to enable strict by default. This is all many
of us have been asking for. Personally I don't care if this cannot be
changed from a script to prevent it being forced on users (yes I'd be
willing to have it as an ini setting even...). Just the option, that's
all we want.

2a) Userspace impact: Strict mode will throw a recoverable error on
type mismatch. Weak mode will coerce the type according to conversion
rules (See #3), throwing a recoverable error if coercion isn't
possible.

2b) Internal impact: The same rules apply to internal functions as
userspace functions HOWEVER, we use the types present in ZEND_ARG_INFO
structures, not zpp. This has the net effect that every internal
function remains effectively untyped unless specifically opted in by
means of updating their arg info struct. In weak mode, internal
functions coerce according to conversion rules.

Tighten up coercion rules for weak mode. i.e. "10 dogs" for an int
is a violation, but "10" is acceptable.

3a) Userspace impact: We're in a clean slate state, so this is "safe"
from a BC perspective.

3b) Internal impact: Again, behavior remains unchanged unless the
ZEND_ARG_INFO struct has been modified to add proper typehints. If
typehints have been added, then the more aggressive coercion rules
apply during typehint validation.

This leaves us in a state where some functions will have defined types
with their aggressive coercion rules and some will not, and we can't
expect users to remember which set of functions have been updated or
not. I think the rules need to apply to everything or nothing.

I really want to underline the design expressed in #2b and #3b.
zend_parse_parameters()'s types have been removed from the equation in
this proposal. This means that, until someone audits a given function
and makes the decision to give it a type, it will effectively behave
as though always weak, regardless of the caller's flags. This enables
us to give the same contractual behavior internally and externally,
while still implicitly treating internal functions as a bit special
for the purpose of moving forward.

So there will be potential ongoing breaks for every kind of caller as
and when functions receive their types. Untyped is the existing
behaviour(?), typed + weak is "aggressive coercion" that may fail
where untyped did not (from 3b), and typed + strict won't throw until
the function is typed.

Unrelated to the specifics of this proposal, I've had a quick search
(for things like "locale" and "LC_NUMERIC") but didn't see anything.
How do locale settings affect weak typing (specifically thinking
string -> float)?

10 years ago by Rowan Collins

Leigh wrote on 18/02/2015 13:10:

3b) Internal impact: Again, behavior remains unchanged unless the

ZEND_ARG_INFO struct has been modified to add proper typehints. If
typehints have been added, then the more aggressive coercion rules
apply during typehint validation.
This leaves us in a state where some functions will have defined types
with their aggressive coercion rules and some will not, and we can't
expect users to remember which set of functions have been updated or
not.

That's precisely the case for every existing user-defined function.
Switching to PHP 7 won't suddenly add type hints to every function in
every library and every existing bespoke code base, so there is no way
to avoid that thought process.

I think the rules need to apply to everything or nothing.

The rules will apply to everything in the same way - if a function is
typehinted, it behaves like so; if it's not, it behaves the same way it
did in PHP 5.

Regards,

Rowan Collins
[IMSoP]

10 years ago by Leigh

This leaves us in a state where some functions will have defined types
with their aggressive coercion rules and some will not, and we can't
expect users to remember which set of functions have been updated or
not.

That's precisely the case for every existing user-defined function.
Switching to PHP 7 won't suddenly add type hints to every function in every
library and every existing bespoke code base, so there is no way to avoid
that thought process.

Of course, and some people may opt to avoid type hints altogether in
their own code to avoid this, but they can't avoid changes to internal
functions. How do we plan to release these incremental changes? We try
to minimise BC breaks as much as possible, so a patch release might not be
possible. Do we restrict it to minor versions, i.e. the yearly release
schedule?

I think the rules need to apply to everything or nothing.

The rules will apply to everything in the same way - if a function is
typehinted, it behaves like so; if it's not, it behaves the same way it did
in PHP 5.

That's not really what I meant. So a user doesn't have to keep track
of which internal functions are now typed and which are not, all
functions should be typed at the same time, or none at all.

10 years ago by Rowan Collins

Leigh wrote on 18/02/2015 13:31:

This leaves us in a state where some functions will have defined types
with their aggressive coercion rules and some will not, and we can't
expect users to remember which set of functions have been updated or
not.
That's precisely the case for every existing user-defined function.
Switching to PHP 7 won't suddenly add type hints to every function in every
library and every existing bespoke code base, so there is no way to avoid
that thought process.
Of course, and some people may opt to avoid type hints altogether in
their own code to avoid this, but they can't avoid changes to internal
functions. How do we plan to release these incremental changes? We try
to minimise BC breaks as much as possible, so a patch release might not be
possible. Do we restrict it to minor versions, i.e. the yearly release
schedule?

Yes, I think that would be sensible. The aim would still be for most
internal functions to have type hints by 7.0, since most of them have
trivial signatures, particularly if we can agree a suitable set of union
types. Internal function "signatures" (which are currently only
signatures in the manual, and a bunch of procedural ZPP magic in the
source) change between minor versions fairly often, so "function x will
now raise errors in strict typing mode for values that would previously
have been accepted" seems a reasonable change note for 7.1, 7.2, etc.

I can't foresee any reason why we'd urgently want to add a typehint in a
patch release. I can, though, see a rush to get every internal function
typehinted by using ZPP causing unintended consequences, and us having
to rush out fixes in 7.0.1.

I think the rules need to apply to everything or nothing.
The rules will apply to everything in the same way - if a function is
typehinted, it behaves like so; if it's not, it behaves the same way it did
in PHP 5.
That's not really what I meant. So a user doesn't have to keep track
of which internal functions are now typed and which are not, all
functions should be typed at the same time, or none at all.

Yes, I'm sorry, I deliberately took the sentence more generally than it
was intended in order to make a different point. I find the lack of
consistency between internal and user-defined functions really
frustrating as a user, so am always on the look out for rules that can
apply neatly to them both.

Regards,

Rowan Collins
[IMSoP]

10 years ago by francois@php.net

From: Rowan Collins [mailto:rowan.collins@gmail.com]
That's precisely the case for every existing user-defined function.
Switching to PHP 7 won't suddenly add type hints to every function in
every library and every existing bespoke code base, so there is no way
to avoid that thought process.

I think the rules need to apply to everything or nothing.

The rules will apply to everything in the same way - if a function is
typehinted, it behaves like so; if it's not, it behaves the same way it
did in PHP 5.

I understand your point of view but, in mine, there's no reason to artificially consider internal functions as untyped by adding another, mostly redundant mechanism for internal type hinting. If nothing existed, this would be fine, but if we split ZPP and type hinting info, we're creating a redundancy we'll drag behind us forever.

Regards

François

10 years ago by francois@php.net

From: Leigh [mailto:leight@gmail.com]

Can we keep a 0) of "reserve names for future use in-case of RFC
failure" option.

Reserving names is only needed as long as keywords share the same naming space as classes. This is a mistake from the past and, as long as we keep it, each new keyword is a pain. Reserving keywords in advance can only lead to reserving too few or too many. So, IMO, deprecating bare class names as hints comes first. Then we can reserve a limited set of keywords.

How do you propose weak typing works with these? Does it only allow
one of the union of types through (thus making it strict), or does it
try and coerce to one if it can? Which one does it pick?

That's exactly the problem we need to solve before going down the union type road. A limited set can be implemented now as new zpp types, but none that requires questionable conversions (while useful, we are not ready for 'int|float', for example).

Regards

François

10 years ago by Rowan Collins

François Laupretre wrote on 18/02/2015 15:47:

From: Leigh [mailto:leight@gmail.com]

Can we keep a 0) of "reserve names for future use in-case of RFC
failure" option.
Reserving names is only needed as long as keywords share the same naming space as classes. This is a mistake from the past and, as long as we keep it, each new keyword is a pain. Reserving keywords in advance can only lead to reserving too few or too many. So, IMO, deprecating bare class names as hints comes first. Then we can reserve a limited set of keywords.

What if we defined the types as names in the \PHP namespace, but defined
a slightly different algorithm for resolving typehints vs other uses:

function foo(\PHP\types\numeric $a) // unambiguous but unwieldy
function foo(\My\Namespace\numeric $a) // unambiguously not a built-in type
function foo(numeric $a) // ambiguous, resolved at compile time

The name would be resolved as follows:

Given a typehint $type:

If $type begins with '\PHP\types\', interpret it as an internal type.
ElseIf $type contains '\', interpret it as a class name, and proceed
with normal class resolution at runtime.
ElseIf \PHP\types\$type is the name of a built-in type, interpret it
as that internal type.
Else, interpret it as a class name, and proceed with normal class
resolution at runtime.
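
The four steps above can be sketched roughly as follows. This is a toy model under stated assumptions: resolveTypeHint(), BUILT_IN_TYPES and TYPES_PREFIX are hypothetical names, the list of built-in types is only an example, matching is assumed to be case-insensitive, and namespace-relative names are ignored.

```php
<?php
// Hypothetical sketch of the resolution order described above.
// None of these names exist in PHP; the type list is illustrative.

const BUILT_IN_TYPES = ['bool', 'int', 'float', 'string', 'numeric', 'scalar', 'mixed'];
const TYPES_PREFIX = '\\PHP\\types\\';

function resolveTypeHint(string $type): array
{
    // 1. Explicit \PHP\types\ prefix: always an internal type.
    if (stripos($type, TYPES_PREFIX) === 0) {
        return ['internal', strtolower(substr($type, strlen(TYPES_PREFIX)))];
    }
    // 2. Any other name containing a backslash: a class, resolved at runtime.
    if (strpos($type, '\\') !== false) {
        return ['class', $type];
    }
    // 3. Bare name that matches a built-in type: the internal type.
    if (in_array(strtolower($type), BUILT_IN_TYPES, true)) {
        return ['internal', strtolower($type)];
    }
    // 4. Anything else: a class name.
    return ['class', $type];
}

var_dump(resolveTypeHint('numeric')[0]);                   // string(8) "internal"
var_dump(resolveTypeHint('\\My\\Namespace\\numeric')[0]);  // string(5) "class"
```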

Basically, this means you can do the following:

class String {}
function accept_scalar_string(string $string) { ... }
function accept_string_object(\String $string) { ... }

The nice thing about this is that if we ever allow users to define
"basic" types - copy-on-write structs, range types, enums, etc - they
could "extend" these built-in types.

Does that make sense to anyone, or am I over-complicating things?

Regards,

Rowan Collins
[IMSoP]

10 years ago by francois@php.net

From: Rowan Collins [mailto:rowan.collins@gmail.com]

What if we defined the types as names in the \PHP namespace, but defined
a slightly different algorithm for resolving typehints vs other uses:

function foo(\PHP\types\numeric $a) // unambiguous but unwieldy
function foo(\My\Namespace\numeric $a) // unambiguously not a built-in type
function foo(numeric $a) // ambiguous, resolved at compile time

The name would be resolved as follows:

Given a typehint $type:

If $type begins with '\PHP\types\', interpret it as an internal type.

ElseIf $type contains '\', interpret it as a class name, and proceed
with normal class resolution at runtime.

ElseIf \PHP\types\$type is the name of a built-in type, interpret it
as that internal type.

Else, interpret it as a class name, and proceed with normal class
resolution at runtime.

Basically, this means you can do the following:

class String {}
function accept_scalar_string(string $string) { ... }
function accept_string_object(\String $string) { ... }

The nice thing about this is that if we ever allow users to define
"basic" types - copy-on-write structs, range types, enums, etc - they
could "extend" these built-in types.

Does that make sense to anyone, or am I over-complicating things?

Well, that's a solution, at least theoretically speaking.

Frankly, I find it unreadable and I don't see reserving \PHP\types as a clean solution. IMO, adding semi-virtual namespaces would be mostly confusing and would just hide the initial name clash issue.

There are a lot of other ways to extend type hinting to user-defined types. Maybe we will reserve namespaces for this case. In your design, where would these user types go ? in \PHP\usertype\ ? or would they share \PHP\types with built-in types (which recreates the same issue) ?

Regards

François

Regards,

Rowan Collins
[IMSoP]

10 years ago by Rowan Collins

François Laupretre wrote on 18/02/2015 18:05:

From: Rowan Collins [mailto:rowan.collins@gmail.com]

What if we defined the types as names in the \PHP namespace, but defined
a slightly different algorithm for resolving typehints vs other uses:

function foo(\PHP\types\numeric $a) // unambiguous but unwieldy
function foo(\My\Namespace\numeric $a) // unambiguously not a built-in type
function foo(numeric $a) // ambiguous, resolved at compile time

The name would be resolved as follows:

Given a typehint $type:

If $type begins with '\PHP\types\', interpret it as an internal type.

ElseIf $type contains '\', interpret it as a class name, and proceed
with normal class resolution at runtime.

ElseIf \PHP\types\$type is the name of a built-in type, interpret it
as that internal type.

Else, interpret it as a class name, and proceed with normal class
resolution at runtime.

Basically, this means you can do the following:

class String {}
function accept_scalar_string(string $string) { ... }
function accept_string_object(\String $string) { ... }

The nice thing about this is that if we ever allow users to define
"basic" types - copy-on-write structs, range types, enums, etc - they
could "extend" these built-in types.

Does that make sense to anyone, or am I over-complicating things?
Well, that's a solution, at least theoretically speaking.

Frankly, I find it unreadable and I don't see reserving \PHP\types as a clean solution. IMO, adding semi-virtual namespaces would be mostly confusing and would just hide the initial name clash issue.

There are a lot of other ways to extend type hinting to user-defined types. Maybe we will reserve namespaces for this case. In your design, where would these user types go ? in \PHP\usertype\ ? or would they share \PHP\types with built-in types (which recreates the same issue) ?

They'd use whatever (non-reserved) namespace the implementer wanted. e.g.

namespace Symfony\Component\TypeChecking;

basicType nonNegativeInt extends PHP\types\int {
public function isValid(int $value) {
return $value >= 0;
}
}

basicType PositiveInt extends nonNegativeInt {
public function isValid(nonNegativeInt $value) {
return $value != 0;
}
}

Obviously the format of the actual definition is made up off the top of
my head, but it shows how the namespacing would work. There's no need to
reserve a namespace for the user-defined types, because it's no worse a
burden to say "you can't name both a type and a class Foo\Bar" than to
say "you can't name two different classes Foo\Bar".

Regards,

Rowan Collins
[IMSoP]

10 years ago by francois@php.net

From: Rowan Collins [mailto:rowan.collins@gmail.com]

They'd use whatever (non-reserved) namespace the implementer wanted.
e.g.

namespace Symfony\Component\TypeChecking;

basicType nonNegativeInt extends PHP\types\int {
public function isValid(int $value) {
return $value >= 0;
}
}

basicType PositiveInt extends nonNegativeInt {
public function isValid(nonNegativeInt $value) {
return $value != 0;
}
}

Interesting. But, if I understand correctly, these are not classes, as they still deal with scalars. Otherwise we would have to create an OO API for scalars, which is a very complex project.

Regards

François

10 years ago by Patrick ALLAERT

Hi Sara (and thanks for continuing the work!)

On Tue Feb 17 2015 at 23:04:20, Sara Golemon pollita@php.net wrote:
[...]

Define a way to enable "strict mode" (we'll be weak by default).

[...]

Tighten up coersion rules for weak mode. i.e. "10 dogs" for an int
is a violation, but "10" is acceptable.

Regarding 2) and 3):
An option might be to implement "weak mode" only and configure the coercion
rules "reporting" in a similar way to the error_reporting
configuration entry.

Not focusing on the details here, but I'm thinking of something like:

function foo(int $a) {
    var_dump($a);
}

ini_set("coercion_reporting", 0); // current PHP 5.x behaviour

foo(7);
// int(7)

foo("7");
// int(7)

foo("7 dogs");
// int(7)


ini_set("coercion_reporting", COERCION_WARNING); // Warn, but do not fail in case of potentially bad coercion

foo(7);
// int(7)

foo("7");
// int(7)

foo("7 dogs");
// Warning: Unsafe coercion transforming "7 dogs" to "7".
// int(7)


ini_set("coercion_reporting", COERCION_ERROR); // Fail in case of potentially bad coercion

foo(7);
// int(7)
foo("7");
// int(7)
foo("7 dogs");
// Catchable fatal error: Unsafe coercion transforming "7 dogs" to "7".
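
As a rough userland approximation of the three levels above: the constants and coerceToInt() below are hypothetical, no "coercion_reporting" ini setting exists, and the "safe string" rule (optional sign followed by digits) is an assumption made only for this sketch.

```php
<?php
// Hypothetical emulation of the proposed reporting levels.
// Constants and function are illustrative, not PHP APIs.

const COERCION_SILENT = 0;
const COERCION_WARNING = 1;
const COERCION_ERROR = 2;

function coerceToInt($value, int $reporting): int
{
    if (is_int($value)) {
        return $value;
    }
    // Assumed rule: a string is "safe" only if it is an optional sign
    // followed by digits; real rules would be richer.
    $safe = is_string($value) && preg_match('/^[+-]?\d+$/', $value) === 1;
    if (!$safe) {
        $message = 'Unsafe coercion transforming ' . var_export($value, true) . ' to int';
        if ($reporting === COERCION_ERROR) {
            throw new UnexpectedValueException($message);
        }
        if ($reporting === COERCION_WARNING) {
            trigger_error($message, E_USER_WARNING);
        }
    }
    return (int) $value; // same result as an explicit cast
}

var_dump(coerceToInt('7', COERCION_ERROR));        // int(7)
var_dump(coerceToInt('7 dogs', COERCION_SILENT));  // int(7)
```

With COERCION_ERROR, coerceToInt('7 dogs', COERCION_ERROR) throws instead of returning 7, which mirrors the last case in the listing.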

The biggest advantage, IMHO, is that you get the exact same result whether
you do:

foo((int) $value);

or:

foo($value);

... whatever the mode you are in.

Basically, this is weak type hints + something similar to what I
contributed in the past with the "Array to string conversion" notice (see:
https://github.com/php/php-src/commit/d81ea16e).

Care to share the pros/cons you see with a solution like that?

Thanks in advance.

Cheers,
Patrick

10 years ago by francois@php.net

From: Patrick ALLAERT [mailto:patrickallaert@php.net]
ini_set("coercion_reporting", COERCION_ERROR); // Fail in case of potentially bad coercion
foo(7);
// int(7)
foo("7");
// int(7)
foo("7 dogs");
// Catchable fatal error: Unsafe coercion transforming "7 dogs" to "7".
The biggest advantage, IMHO, is that you get the exact same result whether
you do:
foo((int) $value);
or:
foo($value);
... whatever the mode you are in.

Basically, this is weak type hints + something similar to what I
contributed in the past with the "Array to string conversion" notice (see:
https://github.com/php/php-src/commit/d81ea16e).

Care to share the pro's/con's you see with a solution like that?

That's a good idea, IMO. We can add an optional message when a conversion is executed in ZPP macros. It can help. The main drawback is that it is a pure runtime check, not suitable for static analysis and related tools.

Regards

François

10 years ago by Patrick ALLAERT

On Wed Feb 18 2015 at 18:35:02, François Laupretre francois@php.net wrote:

From: Patrick ALLAERT [mailto:patrickallaert@php.net]
ini_set("coercion_reporting", COERCION_ERROR); // Fail in case of potentially bad coercion
foo(7);
// int(7)
foo("7");
// int(7)
foo("7 dogs");
// Catchable fatal error: Unsafe coercion transforming "7 dogs" to
"7".
The biggest advantage, IMHO, is that you get the exact same result
whether
you do:
foo((int) $value);
or:
foo($value);
... whatever the mode you are in.

Basically, this is weak type hints + something similar to what I
contributed in the past with the "Array to string conversion" notice
(see:
https://github.com/php/php-src/commit/d81ea16e).

Care to share the pro's/con's you see with a solution like that?
That's a good idea, IMO. We can add an optional message when a conversion
is executed in ZPP macros. It can help. The main drawback is that it is
pure runtime check, not suitable for static analysis and related tools.

Does it sound like a "compromise"? ;)

More seriously, I'm not sure that should be a prerequisite for accepting an
RFC.

Patrick

10 years ago by francois@php.net

From: Patrick ALLAERT [mailto:patrickallaert@php.net]

The biggest advantage, IMHO, is that you get the exact same result whether
you do:
foo((int) $value);
or:
foo($value);
... whatever the mode you are in.

Wrong. Parameter parsing rules are much more restrictive than casting rules.

Only 'foo((int)'orange')' would (erroneously) succeed.
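
A small sketch of that difference, assuming the scalar parameter types that later shipped in PHP 7.0 in their default (weak) mode, so it postdates this message; foo() here is just a stand-in.

```php
<?php
// Sketch assuming PHP 7.0+ scalar parameter types, weak (default) mode.
function foo(int $a): int
{
    return $a;
}

// The cast runs before the call and turns a non-numeric string into 0.
$viaCast = foo((int) 'orange');

try {
    foo('orange');     // parameter parsing refuses a non-numeric string
    $rejected = false;
} catch (TypeError $e) {
    $rejected = true;
}

var_dump($viaCast);   // int(0)
var_dump($rejected);  // bool(true)
```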

François

10 years ago by Patrick ALLAERT

On Wed Feb 18 2015 at 19:10:08, François Laupretre francois@php.net wrote:

From: Patrick ALLAERT [mailto:patrickallaert@php.net]

The biggest advantage, IMHO, is that you get the exact same result
whether
you do:
foo((int) $value);
or:
foo($value);
... whatever the mode you are in.
Wrong. Parameter parsing rules are much more restrictive than casting
rules.

Only 'foo((int)'orange')' would (erroneously) succeed.

Francois, I'm very aware of the distinction between cast mechanism and ZPP,
but I obviously haven't been clear about the fact that the
(conversion|coercion|type juggling|...) reporting configuration I proposed
would have to be used in ZPP AND casting mechanism (and anywhere else
where some conversion applies).

With:

$value = "foo";
foo((int) $value);

it is: "(int) $value" that would generate a warning/error depending on the
reporting, not while parsing the parameter of function foo(), which would
receive an int (0) in this precise case.

And this would address the cases:

http://example.org/foo.php?id=42
http://example.org/foo.php?id=bar

foo.php:
<?php
function fetchById(int $id) {
// ...
}
fetchById($_GET["id"]);

even if $_GET["id"] is replaced by (int) $_GET["id"];

Cheers,
Patrick

10 years ago by francois@php.net

Hi Patrick,

I understand, but we cannot include casting rules in the scope. And, while the idea is attractive, I think ZPP and casting cannot share the same ruleset, at least as long as casting is defined as being as permissive as possible.

François

From: Patrick ALLAERT [mailto:patrickallaert@php.net]
Sent: Thursday, 19 February 2015 13:46
To: francois@php.net; Sara Golemon; PHP internals
Subject: Re: [PHP-DEV] Scalar Type Hints v0.4

On Wed Feb 18 2015 at 19:10:08, François Laupretre francois@php.net wrote:

From: Patrick ALLAERT [mailto:patrickallaert@php.net]

The biggest advantage, IMHO, is that you get the exact same result whether
you do:
foo((int) $value);
or:
foo($value);
... whatever the mode you are in.

Wrong. Parameter parsing rules are much more restrictive than casting rules.

Only 'foo((int)'orange')' would (erroneously) succeed.

Francois, I'm very aware of the distinction between cast mechanism and ZPP, but I obviously haven't been clear about the fact that the (conversion|coercion|type juggling|...) reporting configuration I proposed would have to be used in ZPP AND casting mechanism (and anywhere else where some conversion applies).

With:

$value = "foo";

foo((int) $value);

it is: "(int) $value" that would generate a warning/error depending on the reporting, not while parsing the parameter of function foo(), which would receive an int (0) in this precise case.

And this would address the cases:

http://example.org/foo.php?id=42

http://example.org/foo.php?id=bar

foo.php:

<?php

function fetchById(int $id) {

    // ...

}

fetchById($_GET["id"]);

even if $_GET["id"] is replaced by (int) $_GET["id"];

Cheers,

Patrick

10 years ago by Sara Golemon

> Regarding 2) and 3):
> An option might be to implement "weak mode" only and configure the coercion
> rules "reporting" in a similar way than with the error_reporting
> configuration entry.
>
> ini_set("coercion_reporting", 0); // current PHP 5.x behaviour
>

The significant problem with this is that it affects not only the
current script, but also all callees (until the next time someone
flips the bit).

So imagine LibraryA.php was written for PHP5, no scalar type hints,
and all that comes with it.

Your application turns on COERSION_WARNING and calls LibraryA::doStuff(1,2,3);

That call is valid because you're a good programmer who reads the
manual and knows how to pass the right args. The author of LibraryA,
however, wrote it while drunk at a ruby meetup and is depending on
weak conversions all over the place. They even need md5(array()) to
output 4410ec34d9e6c1a68100ca0ce033fb17 (yes, I know we don't allow
that anymore, enjoy the metaphor)

My point is that it potentially imposes new warnings on foreign code.
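
The concern can be sketched with the existing error_reporting setting standing in for the proposed coercion_reporting, which is hypothetical and not an actual ini entry. library_join() is an invented stand-in for LibraryA code that leans on a loose conversion, and the handler is only there to count what gets reported.

```php
<?php
// Invented stand-in for "foreign" library code relying on a loose conversion.
function library_join(array $parts): string
{
    return "parts: " . $parts; // triggers "Array to string conversion"
}

$seen = [];
set_error_handler(function (int $errno, string $errstr) use (&$seen): bool {
    // User handlers are invoked regardless of error_reporting(),
    // so honour the global setting explicitly.
    if (!(error_reporting() & $errno)) {
        return true;
    }
    $seen[] = $errstr;
    return true;
});

error_reporting(0);      // the library was written against a quiet setting
library_join([1, 2]);
$before = count($seen);

error_reporting(E_ALL);  // the application flips the global switch
library_join([1, 2]);    // same library code, unchanged
$after = count($seen);

restore_error_handler();
var_dump($before, $after, $seen);
```

The library code is identical in both calls; only the caller's process-wide setting differs, which is how new diagnostics end up attributed to code the application author did not write.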

-Sara

10 years ago by Patrick ALLAERT — view source

On Wed Feb 18 2015 at 19:10:54, Sara Golemon wrote:

> On Wed, Feb 18, 2015 at 7:34 AM, Patrick ALLAERT
> wrote:
> > Regarding 2) and 3):
> > An option might be to implement "weak mode" only and configure the coercion
> > rules "reporting" in a similar way than with the error_reporting
> > configuration entry.
> >
> > ini_set("coercion_reporting", 0); // current PHP 5.x behaviour
> >
>
> The significant problem with this is that it affects not only the
> current script, but also all callees (until the next time someone
> flips the bit).
>

That's precisely my intention.

> So imagine LibraryA.php was written for PHP5, no scalar type hints,
> and all that comes with it.
>
> Your application turns on COERSION_WARNING and calls
> LibraryA::doStuff(1,2,3);
>
> That call is valid because you're a good programmer who reads the
> manual and knows how to pass the right args. The author of LibraryA,
> however, wrote it while drunk at a ruby meetup and is depending on
> weak conversions all over the place. They even need md5(array()) to
> output 4410ec34d9e6c1a68100ca0ce033fb17 (yes, I know we don't allow
> that anymore, enjoy the metaphor)
>

So, by turning it on, I would realise there is something wrong in my
"LibraryA"?

> My point is that it potentially imposes new warnings on foreign code.
>

Eureka :)

That's what happened when I introduced the "Array to string conversion"
notice: a lot of people complained about it, and many frameworks had to fix
various issues where it happened under the hood (e.g. with `array_diff()` on
multidimensional arrays).

My point is that the same is true when adding E_NOTICE, E_WARNING,
E_DEPRECATED,... to the error_reporting: it might prevent libraries from
working correctly (read: without extra PHP errors).

Why can't strictness follow that path?

PS: your feedback makes me feel it would be, even more, a viable option :)

Cheers,
Patrick

10 years ago by francois@php.net — view source

From: Patrick ALLAERT [mailto:patrickallaert@php.net]

My point is that it potentially imposes new warnings on foreign code.

Eureka :)

That's what happened when I introduced the "Array to string conversion"
notice: a lot of people complained about it, and many frameworks had to fix
various issues where it happened under the hood (e.g. with array_diff() on
multidimensional arrays).

My point is that the same is true when adding E_NOTICE, E_WARNING,
E_DEPRECATED,... to the error_reporting: it might prevent libraries from
working correctly (read: without extra PHP errors).

Why can't strictness follow that path?

Because strictness is not the overall objective the PHP language is aiming for. If it were, your mechanism would be fine, but deprecating ZPP conversion would be simpler and fine too. This is definitely not the same case as generating a notice on array-to-string conversion (and why did you generate a notice instead of E_DEPRECATED? We would be rid of this crap by now).

That's what I hate in this 'weak' vs 'strict' terminology. It implies that 'strict' is the natural future and improvement of 'weak'. That's absolutely not the case, as 'weak' mode is not as negative as the name suggests, and 'strict' is not so positive either. So you may stop assuming that the natural path for 'weak'-typed software is to migrate to strict types.

When we decide to encourage migration to strict mode by deprecating ZPP conversion, I hope I'll be far away...

PS: your feedback makes me feel it would be; even more; a viable option :)

Fine. But may I remind you that the so-called great benefit you underlined in your post is totally wrong and shows total ignorance of the difference between casting and ZPP conversion rules, an understanding of which, IMO, is a fundamental prerequisite before laughing at people working on this.

Regards

François

10 years ago by Patrick ALLAERT — view source

On Thu Feb 19 2015 at 00:38:25, François Laupretre francois@php.net wrote:

This is definitely not the same case as generating a notice on array to
string (and why did you generate a notice instead of E_DEPRECATED, we would
be rid of this crap now).

I didn't decide that without discussing it [1].

E_DEPRECATED is meant for something that may or will stop working in the
future, and there was no plan to stop converting arrays to the string
"Array" when that happens.
Moreover, no other "bad" conversion used E_DEPRECATED, so it would have
been inconsistent IMO.

Note that there is room for improvement, considering that "weird"
conversions raise one of E_NOTICE, E_WARNING, E_ERROR
or E_RECOVERABLE_ERROR depending on the case.

Feel free to propose something in this regard. I'm all for consistency,
provided that it preserves a sufficient amount of BC.

Regards,
Patrick

[1] http://marc.info/?l=php-internals&m=130709981705863

10 years ago by Patrick ALLAERT — view source

On Thu Feb 19 2015 at 00:38:25, François Laupretre francois@php.net wrote:

Why can't strictness follow that path?

Because strictness is not the overall objective the PHP language is aiming
to.

I cannot agree more with that.

If it was the case, your mechanism would be fine, but deprecating ZPP
conversion would be simpler and fine too.

I'm not so sure about the "simpler".

This is definitely not the same case as generating a notice on array to
string.

Sure, I just wanted to point out that "because strictness is not the overall
objective of the PHP language", we may consider a weak approach accompanied
by a switchable (configurable?) mechanism that would notify us of bad
types, bad coercion, lossy conversion,...

That's what I hate in this 'weak' vs 'strict' terminology. It makes
implicit that 'strict' is the natural future and improvement of 'weak'.
That's absolutely not the case as 'weak' mode is not as negative as name
suggests, and 'strict' is not so positive either. So, you may stop
considering that the natural path for 'weak'-typed software is to migrate
to strict types.

I never implied something like this, quite the opposite since I feel I am
completely aligned with you!

When we decide encouraging migrating to strict mode with a deprecation on
ZPP conversion, I hope I'll be far away...

+1

PS: your feedback makes me feel it would be; even more; a viable option
:)

Fine. But may I remind you the so-called great benefit you underlined in
your post is totally wrong and shows total ignorance of the difference
between casting and ZPP conversion rules which, IMO, is a fundamental
pre-requisite before laughing at people working on this.

I never laughed at anyone here. Sorry if someone felt that way because of
the simple use of a smiley.

10 years ago by francois@php.net — view source

Hi Patrick,

We already plan a similar mechanism: raising an E_DEPRECATED on conversions that would have succeeded in PHP 5 and will fail under the proposed new ‘PHP 7’ ZPP ruleset.

Beyond that, it is technically possible to raise a notice on non-strict conversion, but this must be discussed in depth because it can be very confusing: E_NOTICE and even E_STRICT are typically associated with ‘bad practice’, and that’s not the case here. A lot of people would therefore assume these conversions are something ‘clean’ code should avoid. Maybe another error type would be needed, but I don’t see the need as that important.

Regards

François

From: Patrick ALLAERT [mailto:patrickallaert@php.net]
Sent: Thursday, 19 February 2015 11:07
To: francois@php.net; Sara Golemon
Cc: PHP internals
Subject: Re: [PHP-DEV] Scalar Type Hints v0.4

On Thu Feb 19 2015 at 00:38:25, François Laupretre francois@php.net wrote:

Why can't strictness follow that path?

Because strictness is not the overall objective the PHP language is aiming to.

I cannot agree more with that.

If it was the case, your mechanism would be fine, but deprecating ZPP conversion would be simpler and fine too.

I'm not so sure about the "simpler".

This is definitely not the same case as generating a notice on array to string.

Sure, I just wanted to pinpoint that "because strictness is not the overall objective of the PHP language", we may consider a weak approach accompanied by an activable (configurable?) mechanism that would notices us of bad types, bad coercion, conversion with loss,...

That's what I hate in this 'weak' vs 'strict' terminology. It makes implicit that 'strict' is the natural future and improvement of 'weak'. That's absolutely not the case as 'weak' mode is not as negative as name suggests, and 'strict' is not so positive either. So, you may stop considering that the natural path for 'weak'-typed software is to migrate to strict types.

I never implied something like this, quite the opposite since I feel I am completely aligned with you!

When we decide encouraging migrating to strict mode with a deprecation on ZPP conversion, I hope I'll be far away...

+1

PS: your feedback makes me feel it would be; even more; a viable option :)

Fine. But may I remind you the so-called great benefit you underlined in your post is totally wrong and shows total ignorance of the difference between casting and ZPP conversion rules which, IMO, is a fundamental pre-requisite before laughing at people working on this.

I never laughed at any one here. Sorry if someone felt that way by the simple use of a smiley.
