Since I was the one who started this discussion, I'd like to reply to
some of these points.
First off, let me say - as you pointed out, when the values are unique,
they are best represented as keys... however, this of course applies
only to value-types, which isn't the problem, and not why I started this
conversation in the first place. Objects don't work as keys. And objects
do not always have a value that you can use as keys.
I believe I understand the nature of arrays quite well - the situations
you covered in this e-mail simply do not address the issues I was trying
to tackle. I too have used the key/value duplication strategy you
mentioned:
array("val1" => "val1", "val2" => "val2");
I stopped doing that many years ago - what's the benefit of storing
everything twice? At some point I started doing this instead:
array("val1" => true, "val2" => true);
You're storing the same amount of information, but now there's no
confusion about which is the key and which is the value - and for
longer keys, you don't need twice the memory. And when modifying the
array, you don't have to maintain both the key and the value.
Effectively this is a set, and as such it works fine - but it works only
for scalar values, not for objects.
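For illustration, a minimal PHP sketch of the `true`-valued set described above; the variable names are made up for the example. Insertion, membership and removal all go through the key. It also hints at why the trick stops at scalars: PHP array keys can only be integers or strings (a numeric string such as "42" is even cast to an integer key), so an object cannot be used directly as a key.

```php
<?php
// A "set" of strings stored as array keys; the value true is only a placeholder.
$set = ["val1" => true, "val2" => true];

// Adding is idempotent: assigning an existing key again creates no duplicate.
$set["val3"] = true;
$set["val1"] = true;

// Membership test is a hash lookup on the key, not a scan over the values.
$hasVal2 = isset($set["val2"]);

// Removal is also done by key.
unset($set["val2"]);

// The members of the set are the keys.
$members = array_keys($set);
echo implode(", ", $members), "\n"; // val1, val3
```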
function create_set($arr) {
return array_combine($arr, $arr);
}
$set = create_set(array("val1","val2"));
This seems very misleading to me, as what comes out of create_set() is
not a set, but just another array - which happens to contain a set. I
get what you're trying to do here, but... it just doesn't work.
I don't think there's any "work-around" for not having sets, and I'm
not sure if that's what you're suggesting either, but there's no way
you can educate anyone out of that issue if or when someone needs a set
of objects that (by definition) do not have unique keys.
You can of course implement a set as a class - using array_search()
internally, the class can check whether a given value (or object) is
present in the set. I've done this on one occasion, and it actually
performs acceptably for a small (~1000) set of objects, so this problem
is by no means a roadblock. But it is a solution with obvious
performance deficiencies and trade-offs...
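A rough sketch of the class-based set described above, assuming PHP 7.4 or later (typed property); the class name `ObjectSet` and its methods are hypothetical, not an existing API. It passes `true` as the third (`strict`) argument to `array_search()` so that objects are compared by identity (`===`) rather than by equal properties (`==`). Every lookup is still a linear scan, which is the performance trade-off mentioned.

```php
<?php
// Hypothetical set class: members live in a plain list and are found with
// a linear array_search(), so contains/add/remove are all O(n).
final class ObjectSet implements Countable, IteratorAggregate
{
    /** @var list<object> */
    private array $items = [];

    public function contains(object $item): bool
    {
        // strict = true compares objects by identity (===); without it,
        // distinct objects with equal properties would match via ==.
        return array_search($item, $this->items, true) !== false;
    }

    public function add(object $item): void
    {
        if (!$this->contains($item)) {
            $this->items[] = $item;
        }
    }

    public function remove(object $item): void
    {
        $index = array_search($item, $this->items, true);
        if ($index !== false) {
            unset($this->items[$index]);
            // Reindex so the internal list stays sequential.
            $this->items = array_values($this->items);
        }
    }

    public function count(): int
    {
        return count($this->items);
    }

    public function getIterator(): Iterator
    {
        return new ArrayIterator($this->items);
    }
}

$a = new stdClass();
$b = new stdClass(); // equal properties to $a, but a different instance

$set = new ObjectSet();
$set->add($a);
$set->add($a); // ignored: the same instance is already present
$set->add($b);
echo count($set), "\n"; // 2
```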
From: Adam Jon Richardson adamjonr@gmail.com
To: internals@lists.php.net
Cc:
Date: Thu, 4 Oct 2012 12:48:53 -0400
Subject: Arrays which have properties of sets
A while back, there was a thread discussing adding a specific function
for removing elements from an array by value. Rasmus L. noted that
when the values are unique, they would be better represented as keys:
The problem is that it is a rather unusual thing to do. I don't mean
removing an element, that is very common, but you are implying that you
know you don't have duplicates and you want a fast way to remove an
element by value in that case. To me that means you built your array
badly. If the values are unique then they should be the keys, or at
least a representation of the value should be the key such that you
don't need to scan the entire hash for the element when you need to
access it. And removal is just like any other access.
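To make the point concrete, a small PHP sketch (data and variable names are illustrative) contrasting removal by value from a plain list, which needs a scan with `array_search()`, with removal from an array that stores the values as keys, which is a single `unset()`.

```php
<?php
// Values stored in a plain list: removing by value means scanning first.
$list = ["apple", "banana", "cherry"];
$index = array_search("banana", $list, true); // linear scan, O(n)
if ($index !== false) {
    unset($list[$index]);
}

// The same data stored as keys: removal is a direct hash access.
$keyed = ["apple" => true, "banana" => true, "cherry" => true];
unset($keyed["banana"]); // no scan needed

echo implode(", ", $list), "\n";              // apple, cherry
echo implode(", ", array_keys($keyed)), "\n"; // apple, cherry
```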
When I come across situations where an array's properties match those
of a set, I create array "sets" by duplicating keys and values:
array("val1" => "val1", "val2" => "val2");
This approach has several benefits:
- Form the union of values using the "+" operator (which forms unions
through use of keys)
- Fast, easy removal of values through unset (e.g., unset($elements["val1"]))
- Standard handling of values in foreach loops:
foreach ($elements as $element) {
echo $element;
}
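The three benefits listed above can be sketched with two small duplicated-key arrays (illustrative data only).

```php
<?php
$a = ["val1" => "val1", "val2" => "val2"];
$b = ["val2" => "val2", "val3" => "val3"];

// Union: "+" keeps every key of $a and adds the keys of $b that $a lacks.
$union = $a + $b;

// Removal by value is removal by key, since key and value coincide.
unset($union["val1"]);

// Iteration yields the values directly.
foreach ($union as $element) {
    echo $element, "\n"; // prints val2, then val3
}
```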
Writing a userland function to create array sets from
numerically-indexed arrays (avoiding the duplication of typing the key
and value) is indeed trivial:
function create_set($arr) {
return array_combine($arr, $arr);
}
$set = create_set(array("val1","val2"));
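Here is the same `create_set()` helper with PHP 7+ type declarations added, as a sketch; the input is made up. When `array_combine()` meets a repeated key, the later entry overwrites the earlier one in place, so duplicate values collapse into a single entry; the result is still an ordinary array.

```php
<?php
function create_set(array $arr): array
{
    // Use each value as its own key; repeated values collapse into one entry.
    return array_combine($arr, $arr);
}

$set = create_set(["val1", "val2", "val1"]);
var_dump($set === ["val1" => "val1", "val2" => "val2"]); // bool(true)
```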
However, I find that the use of array sets is very common and helpful
to me, and I see many situations where a developer would have been
better served by leveraging the key to store the value. I suspect many
PHP users don't understand the nature of PHP arrays. I'm wondering if
adding a function (or language construct) for creating array sets
would prove helpful (just because of the vast amount of code which
could make use of it) and facilitate the education of developers so
they might better leverage the benefits of the underlying hash map
structure when their array is essentially a set and they wish to
perform operations such as removing an item by value.
Adam
Since I was the one who started this discussion, I'd like to reply to
some of these points.
First off, let me say - as you pointed out, when the values are unique,
they are best represented as keys... however, this of course applies
only to value-types, which isn't the problem, and not why I started this
conversation in the first place. Objects don't work as keys. And objects
do not always have a value that you can use as keys.
Object hash may be what you are looking for.
http://php.net/manual/en/function.spl-object-hash.php
Cheers,
Pierre
Yeah, on that note - I've never understood what use this function is,
as it reuses object IDs... it will return the same hash for two
different objects during the same script execution - so it's unusable
as far as getting unique keys for objects... and I don't know what
else you could really use it for?
Sorry, forgot to reply to all here...
No it's not. "Reuses object IDs" here just means when an object is
destroyed it no longer exists in memory and thus its hash becomes
reusable, which makes absolutely perfect sense to me. This does not
make it "unusable as far as getting unique keys for objects". The keys
are unique for all the objects that exist in memory at that time. If
an object is destroyed (it no longer exists in memory), why would you
still need to retain a unique key for it?
Do you keep keys for nonexistent doors as well?
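A short sketch of the behavior described above, assuming any reasonably modern PHP; variable names are illustrative. While two objects are alive their hashes differ, the hash does not change when the object's state changes, and the hash can therefore serve as an array key for an object set.

```php
<?php
$first = new stdClass();
$second = new stdClass(); // same class and (empty) state, different instance

$h1 = spl_object_hash($first);
$h2 = spl_object_hash($second);

// Two objects alive at the same time never share a hash.
var_dump($h1 !== $h2); // bool(true)

// The hash follows the instance, not its state.
$first->name = "changed";
var_dump(spl_object_hash($first) === $h1); // bool(true)

// Keying an array by the hash gives a set of objects while they stay alive.
$set = [$h1 => $first, $h2 => $second];
var_dump(isset($set[spl_object_hash($second)])); // bool(true)
```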
-----Original Message-----
From: Rasmus Schultz [mailto:rasmus@mindplay.dk]
Sent: 07 October 2012 02:10
To: internals@lists.php.net
Subject: [PHP-DEV] Re: Arrays which have properties of sets
First off, let me say - as you pointed out, when the values
are unique, they are best represented as keys... however,
this of course applies only to value-types, which isn't the
problem, and not why I started this conversation in the first
place. Objects don't work as keys. And objects do not always
have a value that you can use as keys.
See SPL's SplObjectStorage. That allows object instances as keys.
Jared
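A minimal sketch of `SplObjectStorage` used as a set of objects. It uses the class's `ArrayAccess` interface; `attach()`, `contains()` and `detach()` are the equivalent method-call forms. Variable names are illustrative.

```php
<?php
$a = new stdClass();
$b = new stdClass(); // equal state to $a, but a different instance

$set = new SplObjectStorage();
$set[$a] = null; // add (same as attach())
$set[$a] = null; // adding the same instance again creates no duplicate
$set[$b] = null;

echo count($set), "\n";    // 2
var_dump(isset($set[$a])); // bool(true), same as contains()

unset($set[$a]);           // remove, same as detach()
var_dump(isset($set[$a])); // bool(false)

// Iterating yields the stored objects themselves.
foreach ($set as $object) {
    var_dump($object === $b); // bool(true)
}
```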
The manual states, "The implementation in SplObjectStorage returns the
same value as spl_object_hash()" - so I don't know how this would
really work any better than a custom implementation.
Perhaps it would be safer to simply implement a collection type that
requires the classes of elements in the collection to implement an
interface, exposing a unique key... and for objects that don't have a
unique key, generate sequential keys from a static counter in each
class... not ideal if you want to implement a document store that can
store "anything", since now every stored object/class needs to provide
a key...
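A sketch of the interface-based alternative proposed above, written for PHP 8 (constructor property promotion); the names `HasUniqueKey`, `Note` and `KeyedCollection` are hypothetical. The collection simply trusts whatever `uniqueKey()` returns, so its correctness depends on every implementing class getting that right, which is the objection raised in response.

```php
<?php
// Hypothetical interface: every storable object exposes its own unique key.
interface HasUniqueKey
{
    public function uniqueKey(): string;
}

// Hypothetical element class with no natural key: it draws one from a
// per-class static counter, as suggested.
final class Note implements HasUniqueKey
{
    private static int $counter = 0;
    private string $key;

    public function __construct(public string $text)
    {
        $this->key = 'note-' . ++self::$counter;
    }

    public function uniqueKey(): string
    {
        return $this->key;
    }
}

// Collection keyed by whatever uniqueKey() returns.
final class KeyedCollection
{
    /** @var array<string, HasUniqueKey> */
    private array $items = [];

    public function add(HasUniqueKey $item): void
    {
        $this->items[$item->uniqueKey()] = $item;
    }

    public function contains(HasUniqueKey $item): bool
    {
        return isset($this->items[$item->uniqueKey()]);
    }

    public function remove(HasUniqueKey $item): void
    {
        unset($this->items[$item->uniqueKey()]);
    }
}

$first = new Note("buy milk");
$second = new Note("buy milk"); // same text, but a different key
$notes = new KeyedCollection();
$notes->add($first);
var_dump($notes->contains($first));  // bool(true)
var_dump($notes->contains($second)); // bool(false)
```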
The manual states, "The implementation in SplObjectStorage returns the
same value as spl_object_hash()" - so I don't know how this would
really work any better than a custom implementation.
Perhaps it would be safer to simply implement a collection type that
requires the classes of elements in the collection to implement an
interface, exposing a unique key... and for objects that don't have a
unique key, generate sequential keys from a static counter in each
class... not ideal if you want to implement a document store that can
store "anything", since now every stored object/class needs to provide
a key...
How is that safer? If anything this introduces every aspect of being
unsafe, since it relies on a user implementation that may or may not exist
and may or may not be unique. As opposed to spl_object_hash, which can
actually identify objects uniquely by looking at the internal symbol
table that exposes all the objects in memory.
Looks like you're right - spl_object_hash() does indeed work... I
guess I was misled by some of the notes people made in the
documentation... perhaps these should be moderated, and the
documentation should be updated to clear up some of the mystery and
confusion that seems to surround this function?
http://us3.php.net/manual/en/function.spl-object-hash.php#110242
"Changing the state of an object does not affect the result" - well of
course not, since spl_object_hash() does not in fact seem to hash the
"object", or at least not its state.
http://us3.php.net/manual/en/function.spl-object-hash.php#95666
"The unique identifiers of destroyed objects can and will be reused" -
this could be demonstrated more clearly by using unset() ... though I
understand it now, this example made me think at first that keys would
just be reused at random...
http://us3.php.net/manual/en/function.spl-object-hash.php#94647
"It seems that when switching scopes, the last one is repeated" -
that's not at all what's happening in this extremely misleading
example... rather, spl_object_hash(new stdClass()) creates an object
that immediately falls out of scope and gets destroyed, at which point
the same key may be reused in a different scope OR even in the same one.
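To illustrate the scoping point, a small sketch (variable names illustrative): a temporary object passed straight to `spl_object_hash()` is destroyed immediately, so the next object is free to receive the same hash, whereas objects kept in variables are guaranteed distinct hashes for as long as they live.

```php
<?php
// Each temporary is destroyed right after spl_object_hash() returns, so its
// handle, and therefore its hash, may be reused by the next new object.
$tmp1 = spl_object_hash(new stdClass());
$tmp2 = spl_object_hash(new stdClass());
var_dump($tmp1 === $tmp2); // may be true; the engine is free to reuse the handle

// Keeping references alive guarantees distinct hashes.
$kept1 = new stdClass();
$kept2 = new stdClass();
var_dump(spl_object_hash($kept1) === spl_object_hash($kept2)); // bool(false)
```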
http://us3.php.net/manual/en/function.spl-object-hash.php#94060
this seems to demonstrate an alternative approach, but as pointed out
by Sherif, this is not the same thing at all.
http://us3.php.net/manual/en/function.spl-object-hash.php#91164
"The spl hash will always be the same for a given object, regardless
of content" - this is definitely not the case.
http://us3.php.net/manual/en/function.spl-object-hash.php#90296
http://us3.php.net/manual/en/function.spl-object-hash.php#87422
Both attempt to implement spl_object_hash() in older versions of PHP,
but both are extremely misleading since they have essentially nothing
in common with spl_object_hash() as such.
http://us3.php.net/manual/en/function.spl-object-hash.php#76220
This actually clarifies some things that probably should be
explained in the documentation.
It's perhaps the worst case of misleading user comments to date... ;-)
Looks like you're right - spl_object_hash() does indeed work... I
guess I was misled by some of the notes people made in the
documentation... perhaps these should be moderated, and the
documentation should be updated to clear up some of the mystery and
confusion that seems to surround this function?
http://us3.php.net/manual/en/function.spl-object-hash.php#110242
"Changing the state of an object does not affect the result" - well of
course not, since spl_object_hash() does not in fact seem to hash the
"object", or at least not its state.
Well, here the user actually makes a mistake in the last line. They
var_dump(spl_object_hash($foo_obj)) instead of $foo_obj2, which is the
new object they intended to test. But yes, the hash is unique per
object and doesn't consider state.
http://us3.php.net/manual/en/function.spl-object-hash.php#95666
"The unique identifiers of destroyed objects can and will be reused" -
this could be demonstrated more clearly by using unset() ... though I
understand it now, this example made me think at first that keys would
just be reused at random...
I guess their intent here is to elaborate on the documented behavior
that the hash may be reused if the object is destroyed. Though you can't
say that when you destroy and reinitialize an object of the same
class in the same variable that it will necessarily get the same exact
hash. It likely won't, but obviously there's no guarantee of that,
which is why it's documented that way.
http://us3.php.net/manual/en/function.spl-object-hash.php#94647
"It seems that when switching scopes, the last one is repeated" -
that's not at all what's happening in this extremely misleading
example... rather, spl_object_hash(new stdClass()) creates an object
that immediately falls out of scope and gets destroyed, at which point
the same key may be reused in a different scope OR even in the same.
Sure.
http://us3.php.net/manual/en/function.spl-object-hash.php#94060
this seems to demonstrate an alternative approach, but as pointed out
by Sherif, this is not the same thing at all.
Right, this isn't a hash in the sense that a hash is normally a
fixed-length checksum of some sort. The implementation is poor in that
I could reassign the hash to the same object a dozen times over and
get a dozen different hashes every time. You don't expect to send the
string "foo" to md5() a dozen times and get a dozen different hashes,
just like you shouldn't expect that from spl_object_hash().
It doesn't uniquely identify anything. The implementation of uniqid()
without arguments is the equivalent of a hex-encoded microtime(),
because that's essentially all it is internally.
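A sketch contrasting deterministic hashing with `uniqid()`. It assumes the usual no-argument `uniqid()` format of 13 hex digits, the first 8 encoding the current Unix seconds and the last 5 the microseconds, and decodes them on that assumption; variable names are illustrative.

```php
<?php
// md5() is deterministic: the same input always yields the same digest.
$same = md5("foo") === md5("foo");

// spl_object_hash() is likewise stable for one live object.
$obj = new stdClass();
$stable = spl_object_hash($obj) === spl_object_hash($obj);

// uniqid() encodes the current time (seconds + microseconds in hex),
// i.e. "when it was called", not "which object it describes".
$id = uniqid();
$seconds = hexdec(substr($id, 0, 8));
$micros = hexdec(substr($id, 8, 5));

echo strlen($id), "\n"; // 13
```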
http://us3.php.net/manual/en/function.spl-object-hash.php#91164
"The spl hash will always be the same for a given object, regardless
of content" - this is definitely not the case.
http://us3.php.net/manual/en/function.spl-object-hash.php#90296
http://us3.php.net/manual/en/function.spl-object-hash.php#87422
Both attempt to implement spl_object_hash() in older versions of PHP,
but both are extremely misleading since they have essentially nothing
in common with spl_object_hash() as such.
Yea, there's lots of pretty bad code in user notes. That's nothing new :)
http://us3.php.net/manual/en/function.spl-object-hash.php#76220
This actually clarifies some things that probably should be
explained in the documentation.
It is: "Return Values: A string that is unique for each currently
existing object and is always the same for each object." from
php.net/spl-object-hash :)
It's perhaps the worst case of misleading user comments to date... ;-)
Nah, I've seen worse.
I've removed the notes you've mentioned that provide bad code.
Cheers!