Hi François,
Hi Yasuo,
On 17/09/2015 00:10, Yasuo Ohgaki wrote:
Hi all,
Assigning a different type to an already initialized variable is most
likely a bug. There may be cases where a variable should have several
types, e.g. return INT for success and FALSE for failure, but programmers
can use a different variable, return the value directly, or make BOOL/NULL
an exception.
While I don't like the idea of 'auto-typing' a variable (assigning it an
immutable type depending on the first assignment), the idea is quite
similar to one I proposed when we were discussing scalar type hinting:
the possibility of assigning a type hint to every variable/property. To
be really usable, it requires compound types, but that's a detail
because we will need compound types anyway.
The problem is that, AFAIK, it implies a mechanism to attach an optional
type hint to a zval and check it each time a value is assigned. While
this would be extremely powerful and could dramatically change the way
PHP types are considered and handled, it is a huge and complex piece of
work, especially in terms of performance. IMO, this may be an idea for
8.0, not before. Anyway, I may be wrong; if you see a simpler way to
implement your concept, I'm interested.
Regards
François
Can you explain your statement that this would be a huge and complex
work? You must mean that there would be multiple places in the php
source code where variables are assigned? I'm not yet discussing
performance, but only the aspect of adding the feature.
I'm not familiar enough with the code base, but I imagine that there
is one primary location where variables are assigned, and maybe a very
small, finite number of other cases, such as list() and references. It
is possible that a C function or extension could modify the variable
passed to it, thereby adding a multitude of possible locations. Yet
I don't see that this is the case in any standard C implementations.
For example, php_array_walk, taking a reference, should work on the array
in place and in any case not change the value type.
As to adding an additional member to the zval struct, that could be
done with a simple bool value. The type of the object could be set
inside of the value struct. I believe that this is why Yasuo suggested
that the first set value should be chosen.
I also disagree with the idea that the first-init value is unclear. If
I am not mistaken, with an additional zend_uchar, i.e. IS_NULL_TYPEDEF =
"t", the value member union of the zval could contain the identifier of
the class. Then update IS_NULL to return true for "t" as well. I am not
qualified to say whether that is possible or to do it, but it sounds like a
"finite" amount of work. I don't see a particularly large performance
impact on the language. We have an additional 1 byte per zval and some
type checks. Admittedly, the function needs to go a few levels deep in
the struct to get the type of an object (when already defined), but
performance-wise it should be minor. In the case of assigning a null
value to an object/string or assigning any value to a basic type, I
would imagine the impact to be extremely small.
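A minimal sketch of the placeholder idea above, assuming a much simplified stand-in for the zval. The names sketch_zval, SK_NULL_TYPEDEF, sketch_is_null and sketch_declare are hypothetical and are not the engine's real definitions; the point is only that a "typed null" can reuse the value union to hold a class identifier while still reading as null.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type tags; the real engine defines its own IS_* constants. */
enum sketch_type {
    SK_NULL = 0,
    SK_LONG,
    SK_OBJECT,
    SK_NULL_TYPEDEF   /* the proposed "t" tag: no value yet, class known */
};

/* Simplified stand-in for a zval: a value union plus a one-byte tag. */
typedef struct {
    union {
        long     lval;      /* used by SK_LONG */
        uint32_t class_id;  /* used by SK_OBJECT and SK_NULL_TYPEDEF */
    } value;
    unsigned char type;
} sketch_zval;

/* A typed placeholder still reads as null to user code. */
static bool sketch_is_null(const sketch_zval *zv)
{
    assert(zv != NULL);
    return zv->type == SK_NULL || zv->type == SK_NULL_TYPEDEF;
}

/* Models "MyObject $object;" without a value: remember only the class. */
static sketch_zval sketch_declare(uint32_t class_id)
{
    sketch_zval zv;
    zv.value.class_id = class_id;
    zv.type = SK_NULL_TYPEDEF;
    return zv;
}
```

Calling sketch_declare(7) yields a value for which sketch_is_null() is true while value.class_id still holds 7, which is the behaviour described for "t".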
Hi Elijah,
Thank you Elijah. I was planning to discuss this next year.
The following paragraph is for others, to prevent confusion.
I've implemented session.use_strict_mode, but this discussion is about a
"strict type mode".
I was proposing a type check for variable conversions,
e.g. raise an error/exception for a "string" to "int" conversion.
Can you explain your statement that this would be a huge and complex
work? You must mean that there would be multiple places in the php
source code where variables are assigned? I'm not yet discussing
performance, but only the aspect of adding the feature.
Performance will increase overall because unnecessary type conversions
will be removed. E.g. $_GET/$_POST variables are "string", but numeric IDs
in these arrays are converted to "int" over and over.
$something = get_something_by_int_id($_GET['id']); // $_GET['id'] is a "string"
// $_GET['id'] is converted to "int" by the type hint
All inputs must be validated anyway, so the initial type conversion is
unavoidable overhead.
I'm expecting an overall performance increase, but I cannot be sure without
benchmarks. There are many cases where treating a "numeric string"
as a "string" is fastest, e.g. many native database APIs assume
"string" data and return "string" data.
I'm not familiar enough with the code base, but I imagine that there
is one primary location where variables are assigned. Maybe a very
small finite number of other cases, such as list() and references. It
is possible that a c-function or extension could modify the variable
assigned to it, thereby adding a multitude of possible locations. Yet
I don't see that this is the case in any standard c implementations.
For example, php_array_walk taking a reference, should sort the array
in place and in any case not change the value-type.
As to adding an additional member to the zval struct, that could be
done with a simple bool value. The type of the object could be set
inside of the value struct. I believe that this is why Yasuo suggested
that the first set value should be chosen.
I'm sorry, I cannot be sure which part you're discussing. Your reply
started a new thread.
My proposal does not require a new member in the zval structure, but
it does require an error/exception on automatic variable type conversions.
(The NULL type could be a special case, as in SQL. This should be discussed
fully.)
I also disagree with the idea that the first-init value is unclear. If
I am not mistaken, with an additional zend_uchar, i.e. IS_NULL_TYPEDEF =
"t", the value member union of the zval could contain the identifier of
the class. Then update IS_NULL to return true for "t" as well. I am not
qualified to say whether that is possible or to do it, but it sounds like a
"finite" amount of work. I don't see a particularly large performance
impact on the language. We have an additional 1 byte per zval and some
type checks. Admittedly, the function needs to go a few levels deep in
the struct to get the type of an object (when already defined), but
performance-wise it should be minor. In the case of assigning a null
value to an object/string or assigning any value to a basic type, I
would imagine the impact to be extremely small.
IIRC, my proposal had 2 parts:
- Introduce type affinity to $_GET/$_POST/$_COOKIE and
affinity functions for external data like SQL, JSON, XML, etc.
(Similar to SQLite's type affinity. SQLite stores any value in
columns, but converts integer-like strings to native int/long.)
https://wiki.php.net/rfc/introduce-type-affinity
- Introduce error/exception for type conversions.
No RFC yet.
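To illustrate the first part, here is a minimal sketch of an affinity-style conversion, assuming a hypothetical helper affinity_to_long. It is not taken from the RFC; it only shows the idea of converting an integer-like string once and leaving everything else as a string.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper: if `s` is a plain decimal integer (optional sign,
 * digits only), store it in *out and return true. Otherwise leave *out
 * untouched and return false so the caller keeps the original string.
 */
static bool affinity_to_long(const char *s, long *out)
{
    char *end;
    long v;

    assert(s != NULL && out != NULL);
    /* strtol would skip leading whitespace; this sketch rejects it. */
    if (*s == '\0' || isspace((unsigned char)*s)) {
        return false;
    }
    errno = 0;
    v = strtol(s, &end, 10);
    if (errno == ERANGE || end == s || *end != '\0') {
        return false;   /* out of range, no digits, or trailing characters */
    }
    *out = v;
    return true;
}
```

With this helper, "123" and "-7" become native longs, while "12abc", " 5" and the empty string stay strings. Whether signs or whitespace should be accepted is a design choice the thread does not settle.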
Bool type handling is subject to discussion. The way INI converts values to
bool may be used. Or let programmers handle the appropriate conversions
as part of input validation, e.g.
$_GET['bool'] = $_GET['bool'] === 't' ? TRUE : FALSE; // Treat 't' as TRUE, otherwise FALSE.
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Hi,
On 26/12/2015 21:35, Elijah Johnson wrote:
Can you explain your statement that this would be a huge and complex
work? You must mean that there would be multiple places in the php
source code where variables are assigned? I'm not yet discussing
performance, but only the aspect of adding the feature.
There may be other options I don't know of but, AFAIK, this implies adding an
optional type hint at the zval level. This type hint should be verified
at least before each conversion. Copy-on-write is another issue, as it
is currently not compatible with zval type hints. Seeing only variables
with well-defined names, and focusing on arrays, only scratches the
surface. Everything happens at the zval level. So, IMO, attaching type
hints to variables and properties is a huge and complex piece of work.
Regards
François
Thanks,
I think I see what you are saying. Copy on write takes a reference to the
entire z-val in two local variables, so a type-hint at the z-val level
would be shared.
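One way to picture this reading of the copy-on-write concern is the simplified model below. It is not the engine's actual layout; shared_value, variable, assign_by_sharing, set_hint and the type_hint field are all hypothetical names used for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified shared value: a refcount plus a hypothetical per-value hint. */
typedef struct {
    int  refcount;
    int  type_hint;   /* 0 = no hint; hypothetical field */
    long payload;
} shared_value;

/* In this model a variable is only a pointer to a possibly shared value. */
typedef struct {
    shared_value *val;
} variable;

/* Models "$b = $a": no copy is made, both variables point at one value. */
static void assign_by_sharing(variable *dst, const variable *src)
{
    assert(dst != NULL && src != NULL && src->val != NULL);
    dst->val = src->val;
    dst->val->refcount++;
}

/* Attaching a hint through one variable writes to the shared value. */
static void set_hint(variable *var, int hint)
{
    assert(var != NULL && var->val != NULL);
    var->val->type_hint = hint;
}
```

After assign_by_sharing(&b, &a), a call to set_hint(&a, 5) is also visible through b, because the hint lives on the value and not on the variable.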
This returns us to the insight of the original mailer who suggested that an
object or array variable should be typed by its first-assigned object. This
would simply need to be a global mode - stack-mode-legacy,
stack-mode-static-object-types, and stack-mode-super-strict for those who
want basic types also.
I'm not saying that this is ideal, just that we need to compromise a bit to
accommodate our existing code base and performance issues.
The proposal for an additional z-val type, which stores the class name in the
zval.value member and is counted as null, could accommodate the case where
the user wants to assign the type before he has an object.
I think this is the best option. Another theory I had was to store the
types at the context level in some kind of array, but it's really too much.
The idea just now proposed of 3 global modes will eliminate the issue of
storing at the z-val level. The mode "stack-mode-static-object-types" is
even already compatible with every line of code that I have properly
written in PHP.
Some additional observations -
The mode "stack-mode-static-object-types" would ideally also prevent
assignment of an object to a variable that currently holds a string value
or a string placeholder value. What I mean by placeholder value is an
additional z-val type "t" returning true for IS_NULL, where the class name
id is stored in the value union. This would be declared by a type hint,
e.g. MyObject $object; If the variable is assigned at the same time,
MyObject $object = ...; then potentially the same applies.
The check on assignment would look something like this (pseudo-code):
bool checkType(zVal, newZVal)
{
    bool throw_error = false;
    if (MODE_STACK_TYPE_OBJECT)
    {
        if (isObjectType(zVal))
        {
            // i.e. zVal.value of a placeholder, or the class of the current object
            int class_id = GET_PLACEHOLDER_CLASS(zVal);
            if (class_id != GET_CLASS(newZVal))
            {
                throw_error = true;
            }
        }
        else if (IS_OBJECT(newZVal) && !IS_NULL(zVal))
        {
            // assigning an object to a non-null, non-object
            throw_error = true;
        }
    }
    if (throw_error)
    {
        // assign null, generate TypeError
        // i.e. Warning: assign of type to type, assigned null value
        return false; // prevent assignment by caller
    }
    return true;
}
Some additional checks would likely be necessary to prevent placeholders
from being assigned and returned, and if strict mode was implemented, there
would need to be a second placeholder type (or some other means of
identification such as a constant).
Implementing placeholders isn't a necessary step, but it would make for
very readable code.
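For anyone who wants to try the check outside the engine, here is a compilable sketch of the same logic. It is only a model: sk_val, sk_mode and sk_check_assign are hypothetical names, only two of the three modes are modelled, and rejecting a non-object (including null) assigned to a typed slot is an assumption, since the pseudo-code leaves that case open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sk_type { SK_NULL, SK_STRING, SK_OBJECT, SK_PLACEHOLDER };
enum sk_mode { MODE_LEGACY, MODE_STATIC_OBJECT_TYPES };

typedef struct {
    enum sk_type type;
    int class_id;   /* meaningful for SK_OBJECT and SK_PLACEHOLDER only */
} sk_val;

/* True when the slot already carries a class: an object or a placeholder. */
static bool sk_has_class(const sk_val *v)
{
    return v->type == SK_OBJECT || v->type == SK_PLACEHOLDER;
}

/* Placeholders count as null, as in the proposal. */
static bool sk_is_null(const sk_val *v)
{
    return v->type == SK_NULL || v->type == SK_PLACEHOLDER;
}

/* Return true if `incoming` may be assigned over `current` in `mode`. */
static bool sk_check_assign(enum sk_mode mode, const sk_val *current,
                            const sk_val *incoming)
{
    assert(current != NULL && incoming != NULL);
    if (mode == MODE_LEGACY) {
        return true;    /* legacy mode performs no checks */
    }
    if (sk_has_class(current)) {
        /* Typed slot: accept only an object of the same class (assumption:
           anything else, including null, is rejected). */
        return incoming->type == SK_OBJECT
            && incoming->class_id == current->class_id;
    }
    if (incoming->type == SK_OBJECT && !sk_is_null(current)) {
        return false;   /* object assigned over a non-null non-object */
    }
    return true;
}
```

The caller would raise the TypeError and skip the assignment when the function returns false, as in the pseudo-code.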
Hi Elijah,
Now I understand why you think zval modification is needed and why you are
concerned about performance. My proposal simply checks basic types, i.e. it
only checks whether a variable is an object type or not. Simple and quick.
An object's class is a type, so it makes sense to check class consistency. If
we check an object's class, not only the class but also its ancestor classes
should be checked. This may affect performance.
I'm not sure if this is worth the effort.
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Hi all,
An object's class is a type, so it makes sense to check class consistency. If
we check an object's class, not only the class but also its ancestor classes
should be checked. This may affect performance.
I'm not sure if this is worth the effort.
That sounds negative, so let me correct it.
Since we have class type hints, it is better to support class checks, IMO.
Almost all cases are simple class equality checks, so it wouldn't hurt
performance much, I suppose.
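A rough sketch of such a check, assuming a hypothetical sk_class with only a parent link (interfaces are ignored here). The common case is settled by a single pointer comparison, and the ancestor walk only runs when that fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical class entry: only the parent link matters for this sketch. */
typedef struct sk_class {
    const struct sk_class *parent;   /* NULL for a root class */
} sk_class;

/* True if `cls` is `expected` or inherits from it. */
static bool sk_instanceof(const sk_class *cls, const sk_class *expected)
{
    assert(expected != NULL);
    if (cls == expected) {
        return true;                 /* fast path: exact class match */
    }
    /* Slow path: walk up the ancestor chain. */
    for (cls = cls ? cls->parent : NULL; cls != NULL; cls = cls->parent) {
        if (cls == expected) {
            return true;
        }
    }
    return false;
}
```

The cost of the slow path grows with the depth of the inheritance chain, which is usually small.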
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Hi (all),
I think that it would be worthwhile to get either an RFC or some test code
on this, the latter depending on how you would assess the difficulty. The
feature itself is in high demand and goes along the lines of current
development.
If something could be developed, then we could assess performance. I would
estimate the impact to be small; in any case, the feature is optional. We
are no longer considering increasing the size of the z-val.
The feature has 2 stages, one of which would be drastically easier to
implement and would tell us about performance.
Stage 1 - typing only objects that are already set, by comparing classes upon
assignment, if a particular mode is set
Stage 2 - adding some form of language hint, which will require a parser
change and some method of storing the class hint for null objects. The latter
has a proposed solution that does not sound hard to implement, but modifying
the parser sounds like a more difficult step.