Hi!
I would like to present to your attention an RFC about using objects as keys:
https://wiki.php.net/rfc/objkey
It was discussed in the past on the list:
http://marc.info/?t=141145969600001&r=1&w=2
and I think it makes sense to propose a formal RFC for it. Both the text
and the code in the patch include bits done by myself and Joe Watkins.
The patch does not cover 100% of cases but should work for most
reasonable scenarios. If something is wrong or you have ideas how to
make it better, please tell me.
The name __hash is not final, I am open to using __toKey instead or any
reasonable alternative, we may also include a couple of options in the
vote if that will be a point of disagreement.
Thanks,
Stas
Hi Stas!
I’m trying to wrap my head around a real-world use-case with this. We have spl_object_hash, which effectively provides a unique hash for an object. If the intent is to give individual classes the opportunity to decide what their hash is, couldn’t they provide that via __toString? I know many frameworks use __toString to build out some implementation of an object (Zend form for example), but the point of __toString is to provide a string representation of an object.
I want to say, I’m not at all against this - rather I support it. I’m just looking for the RFC to provide an example that I and others can relate to.
Hi!
I’m trying to wrap my head around a real-world use-case with this.
We have spl_object_hash, which effectively provides a unique hash for
This hash has nothing to do with object's contents. But imagine number
GMP("42") and imagine you actually want two GMP objects expressing "42"
actually represent the same hash key. Or imagine you want to generate
the key somehow in a way related to object's content and not just a
random number. As I said in the RFC, evidence that so many languages
implement it shows that this use case is quite real. Of course, you can
always default to spl_object_hash, but now you have control over it.
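To make the distinction concrete, here is a minimal sketch in current PHP. BigNumber and toKey() are hypothetical stand-ins (for something like GMP and for the proposed __hash()), not part of the RFC or the patch; the key is derived by an explicit method call because the engine hook does not exist yet.

```php
<?php
// Hypothetical value object standing in for something like GMP("42").
final class BigNumber
{
    private $digits;

    public function __construct($digits)
    {
        $this->digits = (string) $digits;
    }

    // Explicit stand-in for the proposed __hash(): a key derived from content.
    public function toKey()
    {
        return 'num:' . $this->digits;
    }
}

$a = new BigNumber('42');
$b = new BigNumber('42');

// Identity-based: two live objects always get different hashes.
$identityDiffers = spl_object_hash($a) !== spl_object_hash($b);

// Content-based: equal values land in the same array slot.
$seen = [];
$seen[$a->toKey()] = true;
$seen[$b->toKey()] = true;
$slotCount = count($seen);
```

With spl_object_hash the two objects would occupy two slots; with the content-derived key they share one.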
an object. If the intent is to provide an opportunity of individual
classes to decide what their hash is, couldn’t they provide that via
__toString? I know many frameworks use __toString to build out some
implementation of an object (Zend form for example), but the point of
__toString is to provide a string representation of an object.
This is covered in the RFC, right in the introduction.
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
Thank you for your clarity. With this new approach, wouldn’t we best be served by renaming/deprecating/removing spl_object_hash? I’m concerned these different approaches will introduce quite a bit of confusion with object hashing. This RFC’s approach gives the user more power to determine what’s best in this case, so I’d lean more towards renaming spl_object_hash to something that reflects getting a unique ID per object (e.g. spl_unique_object_id, etc).
On Oct 26, 2014, at 9:43 PM, Stas Malyshev smalyshev@gmail.com
wrote:
Hi!
I’m trying to wrap my head around a real-world use-case with
this. We have spl_object_hash, which effectively provides a
unique hash for
This hash has nothing to do with object's contents. But imagine
number GMP("42") and imagine you actually want two GMP objects
expressing "42" actually represent the same hash key. Or imagine
you want to generate the key somehow in a way related to object's
content and not just a random number. As I said in the RFC,
evidence that so many languages implement it shows that this use
case is quite real. Of course, you can always default to
spl_object_hash, but now you have control over it.
Thank you for your clarity. With this new approach, wouldn’t we best
be served by renaming/deprecating/removing spl_object_hash? I’m
concerned these different approaches will introduce quite a bit of
confusion with object hashing. This RFC’s approach gives the user
more power to determine what’s best in this case, so I’d lean more
towards renaming spl_object_hash to something that reflects getting a
unique ID per object (e.g. spl_unique_object_id, etc).
Actually, I see spl_object_hash($this) as the 90% implementation of
__hash(), so how about making it the default for any object?
--
Regards,
Mike
Actually, I see spl_object_hash($this) the 90% implementation of
__hash(), so how about making it the default for any object?
I agree - it might be a good default.
To Will's question: It is not sufficient for all cases. Having a custom
implementation allows the reverse.
$u = new ustring(".....");
$a[$u] = 42;
unset($u);
foreach ($a as $k => $v) {
    $u = u($k); // u() is a bad name, see other thread for that debate
}
Also, spl_object_hash is a unique identifier, but with value objects
used as array keys one might not want identity but grouping by
value ...
function get_date_for_item($item) {
    return new DateTime(...);
}
foreach ($items as $item) {
    $counts[get_date_for_item($item)]++;
}
Here we create a new DateTime instance for each item, while multiple
items might share the same date. In the loop we're counting distinct
dates, not DateTime instances.
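For comparison, a sketch of how the same counting has to be written in current PHP, where the key is derived by hand. The item layout and the 'Y-m-d' granularity are assumptions made for illustration; under the RFC the DateTime-like class would supply such a key itself.

```php
<?php
// Hypothetical items; only the 'created' field matters here.
$items = [
    ['id' => 1, 'created' => '2014-10-26 09:00:00'],
    ['id' => 2, 'created' => '2014-10-26 17:30:00'],
    ['id' => 3, 'created' => '2014-10-27 08:15:00'],
];

function get_date_for_item(array $item)
{
    return new DateTime($item['created'], new DateTimeZone('UTC'));
}

$counts = [];
foreach ($items as $item) {
    // Derive the key by hand; a new DateTime per item still groups by value.
    $key = get_date_for_item($item)->format('Y-m-d');
    $counts[$key] = isset($counts[$key]) ? $counts[$key] + 1 : 1;
}
```

Three DateTime instances are created, but only two distinct dates are counted.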
johannes
Morning Stas,
Nicely done.
Whether SPL classes, or any other classes, should use the functionality
should be left to another discussion.
I wonder if it might be feasible to try and define what the contract of
this method is, in the same kind of way as the Java docs do for
Object.hashCode ? We can't have the exact same contract perhaps, but it
might be useful to try to define it at this stage.
It seems __toScalar might be a good name, this is what the method
actually does, the engine then coerces to a type suitable for use as a
key, but you can return a double.
It might be more forward thinking therefore to use the name __toScalar,
while today we'd only be using it for this, if we come up against the
requirement to treat an object as a scalar for anything else, we have
the machinery already and we don't have to add another magic method at
that time.
Not sure what others think about that ... I liked the name __hash
better than __toKey, I like __toScalar better than those because it
describes what the method is meant to do the best.
Cheers
Joe
Hi!
It seems __toScalar might be a good name, this is what the method
actually does, the engine then coerces to a type suitable for use as a
key, but you can return a double. It might be more forward thinking
therefore to use the name __toScalar, while today we'd only be using
it for this,
__toScalar does not express the fact why we are calling it - to use it
as a key. It's not just a scalar conversion, it's conversion for the
purpose of hashing.
I think that’s the general confusion around this. Generally, the examples provided (Java/Ruby/Python) are based on two things:
- The value returned is a signed 32-bit (debatable) integer
- Each hash is calculated based on the properties available (and assigned).
With this in mind, should we not expect, at the very least, an integer on each return? For example:
class Company
{
    private $companyId;
    private $clientId;

    public function __construct($companyId, $clientId)
    {
        $this->companyId = (int) $companyId;
        $this->clientId = (int) $clientId;
    }

    public function __toHash()
    {
        return $this->companyId * $this->clientId;
    }
}
This is a trivial example, but it can be used to define a simple approach for object “uniqueness” and potentially prevent duplicate hashes in a structure.
That said, with PHP’s typeless approach, I understand the case could be made for combining values into a string. To me, it seems more logical and uniform to require an integer - whether signed or unsigned.
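One caveat worth sketching: if the returned value is used directly as the array key, as the RFC describes, then two objects returning the same value share one slot, since there is no separate equality check to tell them apart. The class and method names below are hypothetical; the sketch only compares the product from the example with a delimited composite key.

```php
<?php
// Same shape as the Company example, with two candidate key functions.
final class CompanyKey
{
    private $companyId;
    private $clientId;

    public function __construct($companyId, $clientId)
    {
        $this->companyId = (int) $companyId;
        $this->clientId = (int) $clientId;
    }

    // The product from the example: compact, but (2, 3) and (3, 2) both give 6.
    public function productKey()
    {
        return $this->companyId * $this->clientId;
    }

    // A delimiter keeps (2, 3) and (3, 2) apart.
    public function compositeKey()
    {
        return $this->companyId . ':' . $this->clientId;
    }
}

$x = new CompanyKey(2, 3);
$y = new CompanyKey(3, 2);

$productCollides = $x->productKey() === $y->productKey();
$compositeCollides = $x->compositeKey() === $y->compositeKey();
```

This is one argument for allowing strings as return values rather than requiring an integer.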
Morning Stas,
True, I was just thinking that the next time we need to represent an
object as a scalar for whatever reason we might end up having to add
another magic method.
It's easy enough to document that __toWhatever is used in the context we
intend to use it in.
Not super bothered about it ... it's obviously a +1 from me whatever the
name.
About restricting to signed/unsigned ints ... this doesn't seem
sensible, the PHP programmer doesn't care about signs, or types.
It makes sense to allow the programmer to return anything we would
currently coerce to a valid key, don't think that needs to change.
Cheers
Joe
It would also be used by Anthony’s Object Cast To Types (https://wiki.php.net/rfc/object_cast_to_types) RFC if revived, and __toScalar makes more sense there.
Personally, I think __toKey is best. It’s not a hash as such, it’s an equivalent string or integer key value. It fits nicely with __toString, and would fit with other __to* functions like __toInt if that RFC was revived. People would look at __hash and think it’s spl_object_hash, which it isn’t.
--
Andrea Faulds
http://ajf.me/
On 27.10.2014 at 02:37, "Stas Malyshev" smalyshev@gmail.com wrote:
I would like to present to your attention an RFC about using object as
keys:
I don't like this, mainly because it blocks a future direct use and storage
of objects as keys in an array, i.e. what SplObjectStorage does.
It is also badly named, because it does NOT implement objects as keys. It
implements deriving surrogate keys from objects when used in an array key
context.
This is not what other languages do. Python, for example, uses a
combination of __hash__ and __eq__ (or __cmp__), the former to locate the
hash slot, and the latter to then check for equality when running the hash
chain anchored in that slot. The thing stored as a key is the real object.
See https://wiki.python.org/moin/DictionaryKeys
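A rough userland sketch of that model, written in PHP for comparison with the RFC. Everything here (HashEq, Point, ObjectKeyMap) is hypothetical and only illustrates the idea of a hash picking the bucket and an equality check resolving collisions; it is not how Python or the proposed patch is implemented.

```php
<?php
// Hypothetical contract, loosely modelled on Python's __hash__/__eq__.
interface HashEq
{
    public function hashCode();      // int or string, picks the bucket
    public function equals($other);  // bool, resolves collisions
}

final class Point implements HashEq
{
    public $x;
    public $y;

    public function __construct($x, $y)
    {
        $this->x = $x;
        $this->y = $y;
    }

    public function hashCode()
    {
        return $this->x + $this->y;  // deliberately weak: (1,2) and (2,1) collide
    }

    public function equals($other)
    {
        return $other instanceof Point
            && $other->x === $this->x
            && $other->y === $this->y;
    }
}

final class ObjectKeyMap
{
    private $buckets = [];  // hash => list of [key object, value]

    public function set(HashEq $key, $value)
    {
        $h = $key->hashCode();
        if (isset($this->buckets[$h])) {
            foreach ($this->buckets[$h] as $i => $pair) {
                if ($pair[0]->equals($key)) {
                    $this->buckets[$h][$i][1] = $value;  // equal key: overwrite
                    return;
                }
            }
        }
        $this->buckets[$h][] = [$key, $value];
    }

    public function get(HashEq $key, $default = null)
    {
        $h = $key->hashCode();
        if (isset($this->buckets[$h])) {
            foreach ($this->buckets[$h] as $pair) {
                if ($pair[0]->equals($key)) {
                    return $pair[1];
                }
            }
        }
        return $default;
    }

    // Returns the stored key objects themselves, not surrogate keys.
    public function keys()
    {
        $keys = [];
        foreach ($this->buckets as $bucket) {
            foreach ($bucket as $pair) {
                $keys[] = $pair[0];
            }
        }
        return $keys;
    }
}

$map = new ObjectKeyMap();
$map->set(new Point(1, 2), 'a');
$map->set(new Point(2, 1), 'b');  // same bucket, different key
$map->set(new Point(1, 2), 'c');  // equal key, overwrites 'a'
```

Note that every lookup calls user code (hashCode() and possibly equals()), which is the cost discussed further down the thread.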
best regards
Patrick
Hi!
I don't like this, mainly because it blocks a future direct use and storage
of objects as keys in an array, i.e. what SplObjectStorage does.
It does not. It just allows the objects to control how they are seen
when they are used as keys in regular PHP arrays. That does not prevent
SplObjectStorage or anything else from doing whatever one wants. I
personally would say SplObjectStorage probably should respect __hash if
provided, but we can discuss it separately.
It is also badly named, because it does NOT implement objects as keys. It
implements deriving surrogate keys from objects when used in an array key
context.
That's how the objects are used as keys. Storing objects in the
hashtable would really require rewriting the whole hashtable and would
probably be very inefficient as you'd have to call PHP code each time
you compare two objects in the chain. Moreover, modifying such objects
would probably create havoc. So having an immutable value sounds like
the best solution for the practical use cases.
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
On 27.10.2014 at 08:09, "Stas Malyshev" smalyshev@sugarcrm.com wrote:
I don't like this, mainly because it blocks a future direct use and
storage
of objects as keys in an array, i.e. what SplObjectStorage does.
It does not. It just allows the objects to control how they are seen
when they are used as keys in regular PHP arrays. That does not prevent
SplObjectStorage or anything else from doing whatever one wants.
I did not mean SplObjectStorage would be affected negatively. Just used it
as the best known example for something that directly supports objects
themselves as keys.
Once your proposal is in the language, you will never, in the future, be
able to add real support for objects as keys, because the semantics is
blocked.
I do understand where your proposal is coming from and what it is trying to
accomplish. But I think, at least, that it should clearly spell out that
any ambition to really support objects as array keys in the base language,
will then be given up.
I personally would say SplObjectStorage probably should respect __hash if
provided, but we can discuss it separately.
I hardly see how that would make sense. SplObjectStorage operates with
object identity as determined by spl_object_hash, right? Changing that to
use your key derivation method, would break that contract. Okay, maybe it
could make sense by adding a second set of attach/detach methods that work
this way when the objects support the new hash method.
Storing objects in the hashtable would really require rewriting the whole
hashtable and would probably be very inefficient as you'd have to call
PHP code each time you compare two objects in the chain.
Right. Somehow python manages to live quite fine with that fact.
Moreover, modifying such objects would probably create havoc.
Right. You don't do that when your object implements __hash__ and __eq__.
best regards
Patrick
Hi!
Once your proposal is in the language, you will never, in the future, be
able to add real support for objects as keys, because the semantics is
blocked.
This implies this support is not "real" and we want some other support.
I don't think I agree with either.
I do understand where your proposal is coming from and what it is trying
to accomplish. But I think, at least, that it should clearly spell out
that any ambition to really support objects as array keys in the base
language, will then be given up.
Was there ever such an ambition? Does somebody have any idea how to
properly do it and a reason why?
I hardly see how that would make sense. SplObjectStorage operates with
object identity as determined by spl_object_hash, right? Changing that
Now, it does, since there's no other option. However, in the future
there may be other options - i.e. objects whose identity is not the same
as their memory address. For example, for an object representing number
(GMP) or string (UString) their identity is their content, not their
memory address. Thus, if you want to use UString as an index, or have an
unique set of strings, spl_object_hash would not be your friend.
Of course, there's always an option of just telling it - e.g. by
providing an option to the ctor.
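For what it is worth, SplObjectStorage already exposes a hook of this kind: getHash() can be overridden in a subclass. A minimal sketch follows, assuming PHP 7 or later for the return type; Text and TextSet are hypothetical names, and Text merely stands in for a value class such as UString.

```php
<?php
// Hypothetical value object; only its content defines identity.
final class Text
{
    public $value;

    public function __construct($value)
    {
        $this->value = (string) $value;
    }
}

// SplObjectStorage::getHash() is the existing hook for custom identity.
class TextSet extends SplObjectStorage
{
    public function getHash($object): string
    {
        return $object instanceof Text
            ? 'text:' . $object->value
            : parent::getHash($object);
    }
}

// Default behaviour: two objects with equal content are two entries.
$byIdentity = new SplObjectStorage();
$byIdentity[new Text('abc')] = true;
$byIdentity[new Text('abc')] = true;

// Overridden getHash(): equal content collapses into one entry.
$byValue = new TextSet();
$byValue[new Text('abc')] = true;
$byValue[new Text('abc')] = true;
```

The difference from the RFC is that here the container decides the identity, while __hash would let the object decide it.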
Right. Somehow python manages to live quite fine with that fact.
For some definitions of "quite fine", I presume - I can't see how a
hashtable that repeatedly calls user code can be efficient. It probably
isn't.
Right. You don't do that when your object implements __hash__ and __eq__.
Or, more precisely, you're asked nicely not to do it - because there's
no way to actually ensure it.
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
Morning,
As already pointed out by Stas, using objects as array keys would
require a rewrite of the HashTable API, something integral to PHP.
I'm not sure that is realistic, or possible.
I think probably, the best way we can support objects as collection
keys is with the introduction of generics, something totally
disconnected from this RFC, that nobody has a working patch for.
Cheers
Joe
Do we want to consider letting core use spl_object_hash() if ::__hash()
is not implemented, so objects can always be used as keys?
--
Regards,
Mike
The name __hash is not final, I am open to using __toKey instead or any
reasonable alternative, we may also include a couple of options in the
vote if that will be a point of disagreement.
I don't think it's clear from the RFC ...
Is the resulting value intended always to return the same object
independent of what has been done to the object in the mean time? A
fixed 'address' for the physical data when used as an array key?
If that is the case, then __toKey seems more comfortable. To me a hash
is used to establish if the item I am looking at has changed since it
was last accessed. There are two use cases: checking that the contents are
the same, and accessing the same object later even if it has been updated.
That said, the EKey value IS intended to identify objects which have
changed since last being accessed ... so you can't win either way?
--
Lester Caine - G8HFL
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
Hi!
Is the resulting value intended always to return the same object
independent of what has been done to the object in the mean time?
That's on you to decide. If you have immutable value object, then yes.
If you have mutable value object (which usually isn't a good idea, but
who knows) then no. If it's not a value object, then probably you don't
want __hash on it at all.
A fixed 'address' for the physical data when used as an array key? If
that is the case, then __toKey seems more comfortable. To me a hash is
I'm fine with __toKey, if that's what seems better to the majority.
Hello, internals!
The name __hash is not final, I am open to using __toKey instead or any
reasonable alternative, we may also include a couple of options in the
vote if that will be a point of disagreement.
I like this idea of a custom hash implementation because spl_object_hash()
is not reliable when objects are quickly created/destroyed, so hashes can
be the same for several different objects. However, would it be better to
introduce an interface for that? For example, Hashable can be a good name
(like Traversable). A default implementation can then be a simple trait
that will be added later to the concrete class.
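A sketch of what such an interface plus default trait could look like in userland. All names (Hashable, IdentityKey, toKey(), key_of()) are hypothetical, since nothing was settled in the thread, and key_of() only stands in for what the engine would do on $array[$object].

```php
<?php
// Hypothetical names; the thread had not settled on any of them.
interface Hashable
{
    public function toKey();
}

// Default implementation: fall back to object identity.
trait IdentityKey
{
    public function toKey()
    {
        return spl_object_hash($this);
    }
}

final class Session implements Hashable
{
    use IdentityKey;
}

final class Isbn implements Hashable
{
    private $code;

    public function __construct($code)
    {
        $this->code = (string) $code;
    }

    // Value-based key: equal codes mean the same key.
    public function toKey()
    {
        return 'isbn:' . $this->code;
    }
}

// Helper standing in for what the engine would do on $array[$object].
function key_of(Hashable $object)
{
    return $object->toKey();
}

$s1 = new Session();
$s2 = new Session();
$index = [];
$index[key_of($s1)] = 's1';
$index[key_of($s2)] = 's2';
$index[key_of(new Isbn('978-0'))] = 'first';
$index[key_of(new Isbn('978-0'))] = 'second';
```

The identity-based default inherits the limitation mentioned above: once an object is destroyed its spl_object_hash value may be reused, so it is only safe while the keyed objects stay alive.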
From: Alexander Lisachenko [mailto:lisachenko.it@gmail.com], Sent: Monday, October 27, 2014 11:18 AM
Hello, internals!
The name __hash is not final, I am open to using __toKey instead or any
reasonable alternative, we may also include a couple of options in the
vote if that will be a point of disagreement.
I like this idea with custom hash implementation because spl_object_hash()
is not reliable when objects are quickly created/destroyed, so hashes can
be the same for several different objects. However, will it be better to
introduce an interface for that? For example, Hashable can be a good name
(like Traversable one). Default implementation then can be a simple trait
that will be added later to the concrete class.
I like the idea of introducing an interface for this functionality, instead
of adding a further magic method. But I think anything like "hash" or
"hashable" is confusing for users.
Maybe something like
interface ArrayKeyConvertable
{
    function toArrayKey();
}
Christian
The magic method is more of a PHP approach while an interface would be more appropriate. That said, this RFC is a true representation of a hash vs something like spl_object_hash. That’s what causes user confusion. spl_object_hash would’ve been better served as a name like spl_object_id or spl_object_hash_id. Something that indicates uniqueness regardless of the values of a particular object.
The bit of the jigsaw I'm still missing here is just what the object of
the exercise is?
In the good old days one would have created an array of 'boxes' and we
could put anything in the boxes. The key would be something suitable to
describe the box, and the box could contain a 'sub-array' of material,
or a handle to content. My real life situation today is that the key is
always a simple 64bit number and I need a means of making that work
transparently between 32bit installations and 64bit ones. No need for
'magic methods' or any processing of the content of the object being
handled. Since in many cases the object IS just a record from the
database, in some circumstances there is little point even creating an
object, one just passes by reference the 'sub-array' values to a static
set of code and that spews out a set of html for the output. And on a
persistent connection all that can be held in memory.
Since the id is always unique one can check if the content has already
been loaded simply by looking for the id in the base array. The core
design of the system solves many problems that can arise. If one has a
new version of an object one either creates a new bucket ( this is what
the underlying database will do anyway ) or if history is not important
simply apply the data direct. It depends on how one needs to go back
through the data if you create a new id, although if one is recording
for example addresses, then each old record needs to maintain its own data.
If one is simply reading raw data from an unmanaged source, then some
means of identifying the data is important, and producing a 'hash' of
the material is a valid approach to checking if a previous copy of the
same data has been processed. But even here there is no 'magic' way of
handling the data, it needs a certain amount of existing knowledge to
identify items in the data that can be matched.
So an 'interface' that allows the creating of a suitable ID is what we
are talking about? And one that provides a link to the underlying
content whatever it is? So arrays do exactly what they are good at and
allow a bucket of content to be managed as required? Trying to use the
'object' as a key is just wrong however one looks at it?
--
Lester Caine - G8HFL
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
The magic method is more of a PHP approach while an interface would
be more appropriate. That said, this RFC is a true representation of a
hash vs something like spl_object_hash. That’s what causes user
confusion. spl_object_hash would’ve been better served as a name like
spl_object_id or spl_object_hash_id. Something that indicates
uniqueness regardless of the values of a particular object.
The bit of the jigsaw I'm still missing here is just what the object of
the exercise is?
There are many different use cases for hash keys, and PHP's "array" type is particularly flexible.
For instance, de-duplicating a list of items ($foo[$x] = $x), or counting occurrences of each ($foo[$x]++)
In the case of a simple DB model, the primary key value is the obvious choice for the hash key, but not all objects are as simple as that.
Being able to store the object itself would save having to recreate it from the key, e.g. looking the ID back up for a DB model, or re-parsing UTF8 for a UString.
But that statement makes no sense!
Accessing the object is via a handle, so you will always have a pointer
to it. Adding some abstract identifier to describe the pointer is always
going to be secondary to the pointer itself? You need a name to create
an object and creating an array of objects needs keys which can either
just be the PHP default location in the array, or you add a name to that
key in some way. The content of the location in the array is the handle
to the object. It will never be 'the object'?
To look up an object you need to know something about the object and
that is either some subset of the object or an abstract reference which
one looks up in order to access the pointer.
--
Lester Caine - G8HFL
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
Lester Caine wrote on 30/10/2014 16:44:
But that statement makes no sense!
Accessing the object is via a handle, so you will always have a pointer
to it. Adding some abstract identifier to describe the pointer is always
going to be secondary to the pointer itself? You need a name to create
an object and creating an array of objects needs keys which can either
just be the PHP default location in the array, or you add a name to that
key in some way. The content of the location in the array is the handle
to the object. It will never be 'the object'?
To look up an object you need to know something about the object and
that is either some subset of the object or an abstract reference which
one looks up in order to access the pointer.
I think you may be confusing two things here - one is storing objects as
the values of an associative array, and choosing an appropriate key to
use for each object; the other is using objects (or some representation
of objects) as the keys of an associative array, with something
completely different as the value.
Yes, by "the object itself" I mean a pointer to the object, as most
people do when talking colloquially about "passing objects around", etc.
The point is to store a live reference, as opposed to a string or
integer which could, programmatically, be used to look up/recreate the
object.
The use case which came up recently was UString objects, which can
easily be converted to and from plain PHP strings, but would be useful
as keys in their own right. With current PHP you could do $foo[
(string)$u_str ] = $bar; with the proposed RFC, you could do $foo[
$u_str ] = $bar with the same result; but either way, you would still
need to convert back to an object in order to use any of the UString
methods in a foreach(), array_walk(), etc.
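A minimal sketch of the current-PHP variant described above, to make the round trip concrete. The `UStr` class is a hypothetical stand-in for UString (it is not the real extension class, and `length()` only counts bytes for illustration). Only the explicit `(string)` cast is shown, since the implicit form proposed in the RFC is not available in released PHP.

```php
<?php
// Hypothetical stand-in for a UString-like value object (not the real class).
final class UStr
{
    private $value;

    public function __construct(string $value)
    {
        $this->value = $value;
    }

    public function __toString(): string
    {
        return $this->value;
    }

    // Byte length only; a real Unicode string class would count characters.
    public function length(): int
    {
        return strlen($this->value);
    }
}

$foo = [];
$u_str = new UStr('hello');

// Current PHP: the conversion to a scalar key is written out explicitly.
$foo[(string) $u_str] = 'bar';

foreach ($foo as $key => $value) {
    // $key is a plain string here, so it has to be wrapped again
    // before any UStr method can be called on it.
    $again = new UStr($key);
    $len = $again->length();
}
```

Either way the array only ever holds the scalar key; the object has to be rebuilt from it.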
--
Rowan Collins
[IMSoP]
The use case which came up recently was UString objects, which can
easily be converted to and from plain PHP strings, but would be useful
as keys in their own right. With current PHP you could do $foo[
(string)$u_str ] = $bar; with the proposed RFC, you could do $foo[
$u_str ] = $bar with the same result; but either way, you would still
need to convert back to an object in order to use any of the UString
methods in a foreach(), array_walk(), etc.
I would still like to see the debate on proper use of unicode IN PHP
strings. Pushing that problem into objects is just as wrong as this
debate. If I'm using a unicode collation to sort a set of array keys then
the KEY should be unicode, not yet another handle to a secondary object
which then needs to be sorted ... but then perhaps we need this fiddle to
avoid the complexity that unicode can create?
--
Lester Caine - G8HFL
I agree it’s high time Unicode be in PHP, but we are calling this PHP 7 and not 6 for a reason. The lessons learned in the difficulties of implementing Unicode directly in the language were a direct player in the death of PHP 6. We need to keep that in mind as we’re continuing these types of discussions.
The only time I see this being an issue related to Unicode (short-term) is if we do not implement a language-integrated collation option. For example, if the solution for collation in general is to use a built-in class or similar function, we will run into these issues.
However, if the assumption is we will allow collations to be configurable - or assume Unicode in some fashion, the weight of this effort will be on the implementation of Unicode and not so much this.
As it stands right now, the RFC wiki seems to have no references to Unicode implementation beyond 2010’s end of PHP6.
From: Alexander Lisachenko [mailto:lisachenko.it@gmail.com], Sent: Monday, October 27, 2014 11:18 AM
Hello, internals!
The name __hash is not final, I am open to using __toKey instead or any
reasonable alternative, we may also include a couple of options in the
vote if that will be a point of disagreement.
I like this idea with custom hash implementation because spl_object_hash()
is not reliable when objects are quickly created/destroyed, so hashes can
be the same for several different objects. However, will it be better to
introduce an interface for that? For example, Hashable can be a good name
(like Traversable one). Default implementation then can be a simple trait
that will be added later to the concrete class.
I like the idea introducing an interface for this functionality, instead
of adding a further magic method. But I think anything like "hash" or
"hashable" is confusing for users.
The magic method is more of a PHP approach while an interface would be more appropriate. That said, this RFC is a true representation of a hash vs something like spl_object_hash. That’s what causes user confusion. spl_object_hash would’ve been better served as a name like spl_object_id or spl_object_hash_id. Something that indicates uniqueness regardless of the values of a particular object.
Put another way, I think a key question here is:
class Foo {
public $bar;
}
$a = new Foo;
$a->bar = 'baz';
$b = new Foo;
$b->bar = 'baz';
$arr[$a] = true;
$arr[$b] = true;
Does $arr now contain one value or two?
If the desire is for it to contain 2, then spl_object_hash($a) already
has that covered, I think.
If the desire is for it to contain 1, then the proposal sounds like a
way to normalize Foo::stringifiedValueObjectEquivalent().
--Larry Garfield
Hi!
Put another way, I think a key question here is:
class Foo {
public $bar;
}
$a = new Foo;
$a->bar = 'baz';
$b = new Foo;
$b->bar = 'baz';
$arr[$a] = true;
$arr[$b] = true;
Does $arr now contain one value or two?
That depends on the semantics of class Foo. If Foo is something like
UString, then it should contain one value, since UString is a value
object (https://en.wikipedia.org/wiki/Value_object) - or at least it
should be. However, if Foo represents something having separate identity
- i.e. it's a Person class and 'bar' represents name - then of course it
should contain two values, since the name is not the sole source of
Person's identity. So the decision is on you as a programmer.
And to give you the tool to make this decision and let PHP engine know
about it is exactly the point of this RFC.
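The two outcomes can be reproduced in current PHP by choosing the key explicitly. This is only a sketch of the semantics under discussion, not of the patch: `$byValue` and `$byIdentity` are illustrative names, and the key is computed by hand rather than by a magic method.

```php
<?php
class Foo
{
    public $bar;
}

$a = new Foo;
$a->bar = 'baz';
$b = new Foo;
$b->bar = 'baz';

// Value semantics: key on the state, so equal objects share one slot.
$byValue = [];
$byValue[$a->bar] = true;
$byValue[$b->bar] = true;

// Identity semantics: key on the instance, so each live object gets its own slot.
$byIdentity = [];
$byIdentity[spl_object_hash($a)] = true;
$byIdentity[spl_object_hash($b)] = true;

echo count($byValue), "\n";    // 1
echo count($byIdentity), "\n"; // 2
```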
If the desire is for it to contain 2, then spl_object_hash($a) already
has that covered, I think.
If the desire is for it to contain 1, then the proposal sounds like a
way to normalize Foo::stringifiedValueObjectEquivalent().
You can describe it as such, but in a proper standardized way and with
syntax that allows you to use the same syntax constructs on all keys and not
have to check each time if it's a special value and call a special
function on it (same purpose as for __toString).
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
My only concern at this point is the default value of the hash. If we were to use spl_object_hash, we could be setting a precedent that a hash must be unique to each object.
Any thoughts on that?
Hi!
My only concern at this point is the default value of the hash. If we
were to use spl_object_hash, we could be setting a precedent that a
hash must be unique to each object.
I don't think there should be any default value. Most objects are not
suitable as keys, only some of them are (they must be immutable and have
easily derived identity). So the default is to not allow it for
arbitrary object. The programmer should specially designate the objects
to be suitable for hashing by creating the hash function - and there the
suitable function can be used, spl_object_hash or stringification or
anything else, depending on the actual case.
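A small sketch of why mutability matters for key objects. `Point` is a hypothetical class, and `__hash()` is called explicitly here because released PHP does not invoke it automatically; the explicit call only emulates what the RFC would do implicitly.

```php
<?php
// Hypothetical mutable class; __hash() is invoked by hand in this sketch.
class Point
{
    public $x;
    public $y;

    public function __construct(int $x, int $y)
    {
        $this->x = $x;
        $this->y = $y;
    }

    public function __hash(): string
    {
        return $this->x . ',' . $this->y;
    }
}

$p = new Point(1, 2);
$seen = [];
$seen[$p->__hash()] = 'first';

// Mutating the object after insertion changes the key it produces...
$p->x = 10;

// ...so the same instance no longer finds its own entry,
// while the entry under the old key is still in the array.
$found = isset($seen[$p->__hash()]);
$stale = array_keys($seen);

var_dump($found); // bool(false)
```

An immutable class, or a key derived only from fields that never change, avoids the stale entry.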
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
My only concern at this point is the default value of the hash. If we were to use spl_object_hash, we could be setting a precedent that a hash must be unique to each object.
In addition to what Stas says above, well, not all objects work like that. What if I want objects to hash by value, not by identity? Don’t force the identity model.
Andrea Faulds
http://ajf.me/
I’m not suggesting we force anything. The tone of the conversation earlier referred to using spl_object_hash as a default. If we don’t do this, that’s totally fine.
We don’t have an “__equals” equivalent to Java’s equals(). While I understand the intent of this RFC is not to address that, perhaps it should be considered in the future.
Larry Garfield wrote on 30/10/2014 18:07:
On Oct 30, 2014, at 2:13 AM, Christian Stoller stoller@leonex.de wrote:
From: Alexander Lisachenko [mailto:lisachenko.it@gmail.com], Sent: Monday, October 27, 2014 11:18 AM
Hello, internals!
The name __hash is not final, I am open to using __toKey instead or any
reasonable alternative, we may also include a couple of options in the
vote if that will be a point of disagreement.
I like this idea with custom hash implementation because spl_object_hash()
is not reliable when objects are quickly created/destroyed, so hashes can
be the same for several different objects. However, will it be better to
introduce an interface for that? For example, Hashable can be a good name
(like Traversable one). Default implementation then can be a simple trait
that will be added later to the concrete class.
I like the idea introducing an interface for this functionality, instead
of adding a further magic method. But I think anything like "hash" or
"hashable" is confusing for users.
The magic method is more of a PHP approach while an interface would
be more appropriate. That said, this RFC is a true representation of
a hash vs something like spl_object_hash. That’s what causes user
confusion. spl_object_hash would’ve been better served as a name
like spl_object_id or spl_object_hash_id. Something that indicates
uniqueness regardless of the values of a particular object.
Put another way, I think a key question here is:
class Foo {
public $bar;
}
$a = new Foo;
$a->bar = 'baz';
$b = new Foo;
$b->bar = 'baz';
$arr[$a] = true;
$arr[$b] = true;
Does $arr now contain one value or two?
If the desire is for it to contain 2, then spl_object_hash($a) already
has that covered, I think.
If the desire is for it to contain 1, then the proposal sounds like a
way to normalize Foo::stringifiedValueObjectEquivalent().
--Larry Garfield
The desire is for the author of Foo to be able to choose which behaviour
is appropriate for them.
--
Rowan Collins
[IMSoP]
I would like to present to your attention an RFC about using object as keys:
A few points I have against this proposal, as I understand it:
- It does not store the object, only the result of __hash.
  Without the actual object this is not helpful for any use-case I have.
- Using a method in the object prevents you from hashing on
  different members of the objects for different uses. For example,
  there is a User object. Most of the time the User::id will be used for
  the hash. However, sometimes the hash needs to be on User::username.
  If the hashing code is pushed inside the object and called
  automatically then you can't hash on the different values.
In summary 1) the hashing function should be external to the object,
2) should not be invoked magically, and 3) the object needs to be
stored. This can already be done in user-land; here is one such
example that I have created but there are others:
https://github.com/morrisonlevi/Ardent/blob/master/src/HashMap.php
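A minimal user-land sketch of the three points in the summary: the hashing function is passed in from outside, it is called explicitly, and the object is stored alongside the value. `KeyedMap` and `User` are hypothetical names made up for this illustration; this is not the Ardent `HashMap` API.

```php
<?php
// Hypothetical helper for illustration only; not the Ardent HashMap API.
final class KeyedMap
{
    private $hasher;
    private $entries = [];

    public function __construct(callable $hasher)
    {
        $this->hasher = $hasher;
    }

    public function set($object, $value): void
    {
        $key = ($this->hasher)($object);
        // Keep the object next to the value so it can be retrieved later.
        $this->entries[$key] = [$object, $value];
    }

    public function get($object)
    {
        $key = ($this->hasher)($object);
        return isset($this->entries[$key]) ? $this->entries[$key][1] : null;
    }

    public function objects(): array
    {
        return array_map(function ($entry) {
            return $entry[0];
        }, array_values($this->entries));
    }
}

final class User
{
    public $id;
    public $username;

    public function __construct(int $id, string $username)
    {
        $this->id = $id;
        $this->username = $username;
    }
}

$alice = new User(1, 'alice');

// The same class is hashed on different members by different maps.
$byId = new KeyedMap(function (User $u) { return $u->id; });
$byName = new KeyedMap(function (User $u) { return $u->username; });

$byId->set($alice, 'admin');
$byName->set($alice, 'admin');
```

Because the hasher belongs to the map rather than to `User`, each map can pick its own notion of equality, and `objects()` hands back the original instances.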
2014-10-27 16:54 GMT+01:00 Levi Morrison levim@php.net:
I would like to present to your attention an RFC about using object as
keys:
A few points I have against this proposal, as I understand it:
- It does not store the object, only the result of __hash.
  Without the actual object this is not helpful for any use-case I have.
- Using a method in the object prevents you from hashing on
  different members of the objects for different uses. For example,
  there is a User object. Most of the time the User::id will be used for
  the hash. However, sometimes the hash needs to be on User::username.
  If the hashing code is pushed inside the object and called
  automatically then you can't hash on the different values.
In summary 1) the hashing function should be external to the object,
2) should not be invoked magically, and 3) the object needs to be
stored. This can already be done in user-land; here is one such
example that I have created but there are others:
https://github.com/morrisonlevi/Ardent/blob/master/src/HashMap.php
Although I don't care very much for the first two points, the third point
is very alarming. Yes, we should have the option to get the objects used as
keys, otherwise the new functionality does not offer much.
Lazare INEPOLOGLOU
Ingénieur Logiciel
Hi!
I would like to present to your attention an RFC about using object as
keys:
I think it should be made clear that the target of your RFC is not to
support objects as keys; what you propose instead is an implicit
translation from:
$a[$obj]
to
$a[$obj->__hash()]
This is clearly different. And has at least one major drawback:
The object itself is not in the array, you cannot retrieve it by
foreach($arr as $obj => $v) { }.
=> This is quite counter-intuitive and counter-productive: one would expect
to find the object there, not its "hash".
As others noted, it also prevents a full-fledged objects-as-key
implementation in the future.
In the end it causes issues and brings very little compared to an explicit
call to convert an object to a key.
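For comparison, released PHP already ships SplObjectStorage, which does hand the object back on iteration but keys strictly by identity. It is not part of the RFC; the sketch below only illustrates the "object is retrievable" behaviour that the implicit translation does not give.

```php
<?php
class Foo
{
    public $bar = 'baz';
}

$a = new Foo;
$b = new Foo;

$storage = new SplObjectStorage();
$storage[$a] = 'first';
$storage[$b] = 'second';

$keys = [];
foreach ($storage as $obj) {
    // Iteration yields the objects themselves, not a scalar stand-in.
    $keys[] = $obj;
}

// Two equal-valued but distinct instances are two separate entries.
echo count($storage), "\n"; // 2
```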
-1
--
Etienne Kneuss
Hi!
As others noted, it also prevents a full-fledged objects-as-key
implementation in the future.
You do realize to have such implementation we'd need to rewrite our hash
layer completely and also introduce the concept of immutable object,
otherwise changing the object would make the hash completely broken?
Which means it would probably never happen unless PHP engine is
radically rewritten.
In the end it causes issues and brings very little compared to an
explicit call to convert an object to a key.
Same as __toString obviously is useless as you could just call a method
explicitly.
Hi!
As others noted, it also prevents a full-fledged objects-as-key
implementation in the future.
You do realize to have such implementation we'd need to rewrite our hash
layer completely and also introduce the concept of immutable object,
otherwise changing the object would make the hash completely broken?
Which means it would probably never happen unless PHP engine is
radically rewritten.
Regarding the immutability, I was wondering if it would be possible to treat the hash not as a magic method but a magic property, and then enforce immutability by saying that it could only be written once in a given instance. Generally speaking, a value or model object would be able to write to it in the constructor, and any incidental changes to the object (e.g. to lazy load data fields) would not affect it.
Potentially, the hash implementation would "only" have to store an object reference next to the existing string (or int) key - all the logic of assigning hash buckets, performing equality checks within buckets, etc, would ignore the object and deal with the scalar value in exactly the same way as if it had been implicitly converted as with the current RFC.
If two non-identical objects with the same hash identity were assigned to the same array, there would have to be a decision of which ended up referenced, but as long as it was defined and documented, I can't see either option being a massive problem.
I realise that when it came down to brass tacks, it would be more complex than I'm implying - the memory impact of adding a pointer field on every key might be a problem, for instance - but it would be good not to dismiss it as impossible too quickly, and use up the syntax for a less powerful feature.
In the end it causes issues and brings very little compared to an
explicit call to convert an object to a key.
Same as __toString obviously is useless as you could just call a method
explicitly.
You're putting words into someone else's mouth there. There's a big difference between "brings very little" and "is useless". There are certainly cases where an implicit call would be useful, but they are probably rather rarer than those where the user would want the object back afterwards.
Hi!
As others noted, it also prevents a full-fledged objects-as-key
implementation in the future.
You do realize to have such implementation we'd need to rewrite our hash
layer completely and also introduce the concept of immutable object,
otherwise changing the object would make the hash completely broken?
Which means it would probably never happen unless PHP engine is
radically rewritten.
You argue two things here:
- it would need a big rewrite of the hash api, so it's unlikely to ever
happen
It would work almost exactly like string keys, except the hash would be
computed in userland. As such I don't think it is fair to say it would
require a complete rewrite. Non-trivial changes? Sure.
But even if it did require a complete rewrite (and I have no strong
evidence that it wouldn't), recent history has shown that much bigger
rewrites are definitely possible and have happened (e.g. interned strings,
ast, uniform syntax, ...).
- Hash needs to be stable, so we need the concept of immutable objects
Having non-stable hashes is always an issue, and these requirements would
have to be made clear in the documentation.
Your implementation suffers from the same usability problems if the hash is
not stable (However it is true that it offers a simpler way to "recover").
So I don't think we would need to introduce the concept of immutable
objects, or at least not more than we would with your implementation.
In the end it causes issues and brings very little compared to an
explicit call to convert an object to a key.
Same as __toString obviously is useless as you could just call a method
explicitly.
Let me rephrase:
The small expressivity gain of omitting a call comes at a quite high (IMO)
cost in your implementation. I believe it is not worth it.
Comparing only one side of this equation with another feature makes no
sense.
Best,
Etienne Kneuss
Hi Stas,
I think it should be made clear that the target of your RFC is not to
support objects as keys; what you propose instead is an implicit
translation from:
$a[$obj]
to
$a[$obj->__hash()]
This is clearly different.
I agree the RFC should make it very clear that foreach, key(),
array_keys(), etc. will just return the hash, not the object instance.
I for one was a bit confused and couldn't tell if that was the case
after reading the entire RFC.
Cheers
Matteo Beccati
Development & Consulting - http://www.beccati.com/