A common issue in lots of applications is tree sorting with unlimited
depth. Phorum has used a recursive function since 1999 to solve this
problem. However, at MySQL Conference this year, we finally wrote a non
recursive function for it and acheived both memory and time savings on
very large data sets. I always knew it could be done, I had just never
stopped and worked it out. Most examples I found involved a left/right
model that our data does not use. The left/right model involves
altering rows when new rows are inserted into the tree. That is not
attractive. Phorum uses just a parent_id field to track child members.
We would like to take this to another level and make a PHP internal
funciton to do the work. In fact, since MySQL Conference, one of team
members has written the function as a PHP extension. My questions is,
if we made a generic tree sorting function, could this be a function
that we could get considered for addition to the other array sorting
functions. I have probably ported the Phorum PHP based for tree sorting
over to 100 different other applications for sorting trees in PHP. It
would be very nice to finally have an internal function for it.
Speed is not our main benefit, although there is a noticable speed
boost. In addition to sorting the function, each node would be assigned
a depth value (optionally named by a function parameter if needed).
This is where the biggest memory savings have been found for us. On
large data sets (2000 members) in PHP, we have found that altering the
input array would cause copy of the whole array, blowing up the memory
usage of the script to as much as 10x. Doing this in C helps to remove
that problem.
Basically, an array would look like:
$array = array(
1 => array(
"id" => 1,
"parent" => 0,
"name" => "item 1"
),
2 => array(
"id" => 2,
"parent" => 0,
"name" => "item 2"
),
3 =>array(
"id" => 3,
"parent" => 1,
"name" => "item 1"
),
);
The function call would look like:
array_treesort($array, "id", "parent");
The returned array would look like:
$array = array(
1 => array(
"id" => 1,
"parent" => 0,
"name" => "item 1",
"depth" => 0
),
3 =>array(
"id" => 3,
"parent" => 1,
"name" => "item 1",
"depth" => 1
),
2 => array(
"id" => 2,
"parent" => 0,
"name" => "item 2",
"depth" => 0
),
);
Its a little funky to have a function take names of fields like this,
but this is the only way we could think of to make this work for people
without them having to change their data structure.
You can see our current work at
http://www.phorum.org/tracfcgi/browser/phorum5/trunk/extension_src That
code has been written specifically for Phorum to replace our existing
PHP function. It does more than the PHP internal function would do.
But, the extra parts are not the things that we were trying to overcome.
They are just gravy.
--
Brian Moon
Senior Developer
http://dealnews.com/
It's good to be cheap =)
A common issue in lots of applications is tree sorting with unlimited
depth. Phorum has used a recursive function since 1999 to solve this
problem. However, at MySQL Conference this year, we finally wrote a non
recursive function for it and acheived both memory and time savings on
very large data sets. I always knew it could be done, I had just never
stopped and worked it out. Most examples I found involved a left/right
model that our data does not use. The left/right model involves
altering rows when new rows are inserted into the tree. That is not
attractive. Phorum uses just a parent_id field to track child members.We would like to take this to another level and make a PHP internal
funciton to do the work. In fact, since MySQL Conference, one of team
members has written the function as a PHP extension. My questions is,
if we made a generic tree sorting function, could this be a function
that we could get considered for addition to the other array sorting
functions. I have probably ported the Phorum PHP based for tree sorting
over to 100 different other applications for sorting trees in PHP. It
would be very nice to finally have an internal function for it.Speed is not our main benefit, although there is a noticable speed
boost. In addition to sorting the function, each node would be assigned
a depth value (optionally named by a function parameter if needed).
This is where the biggest memory savings have been found for us. On
large data sets (2000 members) in PHP, we have found that altering the
input array would cause copy of the whole array, blowing up the memory
usage of the script to as much as 10x. Doing this in C helps to remove
that problem.Basically, an array would look like:
$array = array(
1 => array(
"id" => 1,
"parent" => 0,
"name" => "item 1"
),
2 => array(
"id" => 2,
"parent" => 0,
"name" => "item 2"
),
3 =>array(
"id" => 3,
"parent" => 1,
"name" => "item 1"
),
);The function call would look like:
array_treesort($array, "id", "parent");
The returned array would look like:
$array = array(
1 => array(
"id" => 1,
"parent" => 0,
"name" => "item 1",
"depth" => 0
),
3 =>array(
"id" => 3,
"parent" => 1,
"name" => "item 1",
"depth" => 1
),
2 => array(
"id" => 2,
"parent" => 0,
"name" => "item 2",
"depth" => 0
),
);Its a little funky to have a function take names of fields like this,
but this is the only way we could think of to make this work for people
without them having to change their data structure.You can see our current work at
http://www.phorum.org/tracfcgi/browser/phorum5/trunk/extension_src That
code has been written specifically for Phorum to replace our existing
PHP function. It does more than the PHP internal function would do.
But, the extra parts are not the things that we were trying to overcome.
They are just gravy.--
Brian Moon
Senior Developerhttp://dealnews.com/
It's good to be cheap =)--
Take a look at http://www.php.net/manual/en/function.array-multisort.php#68689
/ http://rquadling.php1h.com/array_multisort_column.php
Richard.
--
Richard Quadling
Zend Certified Engineer : http://zend.com/zce.php?c=ZEND002498&r=213474731
"Standing on the shoulders of some very clever giants!"
Hello,
Take a look at
http://www.php.net/manual/en/function.array-multisort.php#68689
/ http://rquadling.php1h.com/array_multisort_column.php
Could you please explain how you think that multisort array would help
in doing a tree sort? AFAIK, tree sorting is not a simple sort algorithm
where you can tell which node comes before the other based on an $a vs.
$b comparison. After playing with it for quite a bit of time, I'm quite
convinced that you actually have to walk the tree to find indention
levels and node positions.
Imagine a tree like this:
node id parent id
1 0 (root node)
2 1
3 1
4 2
The tree sort with indention would look like this:
1
2
4
3
This is about as simple as it can get. So the ordered tree nodes for
this example would be array(1,2,4,3). Now try to find an array sorting
algorithm to get to this result. You'll find that node 4 is quite a
nasty one to get your head around.
Maybe we missed something obvious here, but there's most probably a
reason why a lot of applications implement tree sorting as a recursive
function call and not as an array sort of some kind.
From your followup post:
That doesn't look right. Surely ID is already sorted so the array
won't change?
And sorting by "parent" and then "id" would make no difference either.
I see you already found out the big issue with tree sorting ;-) You
cannot apply simple linear sorting to find out in what order the nodes
in a tree appear.
I hope my example cleared up some of the confusion.
With kind regards,
Maurice Makaay
Phorum.org
Hi,
A quick sketch of an idea that should work:
<?php
// test data
$nodes=array(
array('id'=>1, 'parent_id'=>0),
array('id'=>2, 'parent_id'=>1),
array('id'=>3, 'parent_id'=>1),
array('id'=>4, 'parent_id'=>2)
);
// create column data
$keys=$indents=$index=array();
$i=0;
foreach($nodes as $node)
{
$id=$node['id'];
$pid=$node['parent_id'];
$index[$id]=$i;
$keys[]=$pid ? $keys[$index[$pid]].'_'.$id : $id;
$indents[]=$pid ? $indents[$index[$pid]]+1 : 0;
++$i;
}
// sort
array_multisort($keys, $indents, $nodes);
// display
$i=0;
foreach($nodes as $node)
echo str_repeat(' ', $indents[$i++]), $node['id'], "\n";
?>
Regards,
Michael
Hello,
Take a look at
http://www.php.net/manual/en/function.array-multisort.php#68689
/ http://rquadling.php1h.com/array_multisort_column.php
Could you please explain how you think that multisort array would help
in doing a tree sort? AFAIK, tree sorting is not a simple sort algorithm
where you can tell which node comes before the other based on an $a vs.
$b comparison. After playing with it for quite a bit of time, I'm quite
convinced that you actually have to walk the tree to find indention
levels and node positions.Imagine a tree like this:
node id parent id
1 0 (root node)
2 1
3 1
4 2The tree sort with indention would look like this:
1 2 4 3
This is about as simple as it can get. So the ordered tree nodes for
this example would be array(1,2,4,3). Now try to find an array sorting
algorithm to get to this result. You'll find that node 4 is quite a
nasty one to get your head around.Maybe we missed something obvious here, but there's most probably a
reason why a lot of applications implement tree sorting as a recursive
function call and not as an array sort of some kind.From your followup post:
That doesn't look right. Surely ID is already sorted so the array
won't change?
And sorting by "parent" and then "id" would make no difference either.I see you already found out the big issue with tree sorting ;-) You
cannot apply simple linear sorting to find out in what order the nodes
in a tree appear.I hope my example cleared up some of the confusion.
With kind regards,
Maurice Makaay
Phorum.org
Hi,
A quick sketch of an idea that should work:
Yes, that would work. The problem though, is that there's still
accumulation of data going on, before the actual sorting can take place.
Remember that the main reason for writing the C-extension was to get the
memory usage down.
Here's some benchmarks that I just did between the multisort solution
and the extension (time / memory). I commented out the efree() calls in
the extension so see the actual memory usage in these benchmarks:
10 nodes
extension 4.19616699219E-05 / 756
multisort 0.000128984451294 / 3916
100 nodes
extension 0.000138998031616 / 5444
multisort 0.000992059707642 / 32536
1000 nodes
extension 0.00320601463318 / 52276
multisort 0.0115978717804 / 320988
5000 nodes
extension 0.127704143524 / 260356
multisort 0.0901730060577 / 1594448
10000 nodes
extension 0.438856840134 / 520308
multisort 0.172026872635 / 3196832
As you can see, the extension is faster and lighter on memory. At a
really large number of nodes, the extension becomes slower, but the
memory stays low. It's the memory that bothers my servers the most, not
the execution time.
With kind regards,
Maurice Makaay
Phorum.org
At a really
large number of nodes, the extension becomes slower, but the memory stays low.
That is very peculiar... it should never be slower than an
implementation in PHP - unless your algorithm isn't optimal.
regards,
Derick
Derick Rethans wrote:
At a really
large number of nodes, the extension becomes slower, but the memory stays low.That is very peculiar... it should never be slower than an
implementation in PHP - unless your algorithm isn't optimal.
Me too. But, fwiw, the multisort method does not do the same thing.
So, its apples to oranges. See my other email. In the apples to apples
comparisons with use the same algorithm in PHP and in C, the C version
is always faster.
--
Brian Moon
Senior Developer
http://dealnews.com/
It's good to be cheap =)
Derick Rethans wrote:
At a really
large number of nodes, the extension becomes slower, but the memory stays low.That is very peculiar... it should never be slower than an
implementation in PHP - unless your algorithm isn't optimal.Me too. But, fwiw, the multisort method does not do the same thing.
So, its apples to oranges. See my other email. In the apples to apples
comparisons with use the same algorithm in PHP and in C, the C version
is always faster.
I agree with Brian here, the code I referenced was an array multisort
by column which I created based upon user notes on the array_multisort
function. You do not require a trawl through to get the list of "keys"
like is done here.
So,
$sorted = array_multisort_column($nodes, 'parent_id', 'id');
It is a single pass mechanism. It uses the indexes supplied to compare
them in sequence. A simple bubble sort.
BUT this is not a tree sort as this is something quite different. The
array_multisort()
nor my array_multisort_column() functions will not
work for more than 1 level of parent/child relationships.
Richard
--
Brian Moon
Senior Developerhttp://dealnews.com/
It's good to be cheap =)
--
Richard Quadling
Zend Certified Engineer : http://zend.com/zce.php?c=ZEND002498&r=213474731
"Standing on the shoulders of some very clever giants!"
Hi,
That is very peculiar... it should never be slower than an
implementation in PHP - unless your algorithm isn't optimal.
Depends on your definition of "optimal".
In setting up the structures, I have a linear search going to find the
node for a parent. That's why it gets slower for (very) large quantities
of nodes. It would be easy enough to introduce a hashing table there for
quick lookups, but of course at the cost of extra memory. I have looked
at my forums (since like Brian wrote: the code that we have is targeted
at Phorum's message trees) and there I need sorting for up to a several
thousand nodes at most. The large majority of the sorts are several
hundreds at most. So this code needs no further optimization there.
If this would go in the PHP internals, then I asume the function code
would probably be balancing between fast processing and little memory use.
Regards,
Maurice Makaay
Phorum.org
Hi,
That is very peculiar... it should never be slower than an
implementation in PHP - unless your algorithm isn't optimal.
The timing for large numbers of nodes got slower due to a linear lookup
(as explained in a previous message too). I implemented a hash table
lookup as a replacement. This still does not use a lot of memory and the
speed does no longer decrease at large quantities of nodes.
FYI, here are some benchmarks against the multisort routine:
10 nodes
extension 8.39233398438E-05 / 1316
multisort 0.0001380443573 / 3808
100 nodes
extension 0.00014591217041 / 10416
multisort 0.000678062438965 / 32160
1000 nodes
extension 0.000765800476074 / 79868
multisort 0.00885391235352 / 298844
5000 nodes
extension 0.00524711608887 / 284572
multisort 0.110008001328 / 1596844
10000 nodes
extension 0.0113878250122 / 537368
multisort 0.136567831039 / 3252232
20000 nodes
extension 0.0704438686371 / 1042796
multisort 0.370564937592 / 6567656
With kind regards,
Maurice Makaay
Phorum.org
HI Maurice,
That is very peculiar... it should never be slower than an
implementation in PHP - unless your algorithm isn't optimal.The timing for large numbers of nodes got slower due to a linear lookup
(as explained in a previous message too). I implemented a hash table
lookup as a replacement. This still does not use a lot of memory and the
speed does no longer decrease at large quantities of nodes.
Only for the record, You may have missed it, there was some work done
in this area by Sterling a couple of years ago (well, too many ;). He
thought there was a need for such functions. He began to work on an
extension called advanced data types (adt) in pecl . It may be
something you can consider and bring it back to life :) ?
Or at least it could be a good idea to continue your work in PECL, in
a single tree extension. At some point point, the new
functions/classes may be in php or maybe not. But you will at least
have a place to stabilize/improve, publish or maintain.
Please note that these are only suggestions, I would like to have
advance data types like tree (or derived) n php but for what I see, we
are not ready yet to understand their importance :)
--Pierre
Maurice Makaay wrote:
Hi,
A quick sketch of an idea that should work:
Yes, that would work. The problem though, is that there's still
accumulation of data going on, before the actual sorting can take place.
Remember that the main reason for writing the C-extension was to get the
memory usage down.
Actually, this does not work. It uses string sorting to sort the nodes
in the $keys array. This means the nodes at a given level are now out
of order compared to the original order of the input array
Try this for making a node list:
for($x=1;$x<=40;$x++){
$parent = rand(0,$x-1);
$nodes[$x] = array(
"id" => $x,
"parent_id" => $parent,
"title" => "item $x"
);
}
You will see that the order of the nodes within a level is string
sorted. That is not desired. The order of the children should not
change from the order they appeared in the input array. In this case,
numeric, but with our function, the order is preserved. So, you could
sort the array by ANY thing you wanted before calling the tree sort
function and the nodes at any given level will be in that order.
And as Maurice said, the memory overhead is still there.
--
Brian Moon
Senior Developer
http://dealnews.com/
It's good to be cheap =)
A common issue in lots of applications is tree sorting with unlimited
depth. Phorum has used a recursive function since 1999 to solve this
problem. However, at MySQL Conference this year, we finally wrote a
non
Seems like you could just make it a custom extension and see if people
use it a lot...
Even if you just had every phorum user asking for it, that would drive
a lot of interest, no?
--
Some people have a "gift" link here.
Know what I want?
I want you to buy a CD from some indie artist.
http://cdbaby.com/browse/from/lynch
Yeah, I get a buck. So?
Richard Lynch wrote:
Seems like you could just make it a custom extension and see if people
use it a lot...Even if you just had every phorum user asking for it, that would drive
a lot of interest, no?
Well, making a custom Phorum extension is a whole other discussion for
our team to have. If we went down that road, we would do more than just
this function. And with this function in the core or not, we may still
do that at some point.
My point for wanting it in PHP is that I have used a tree sorting
algorithm OUTSIDE of Phorum a lot in my 10 years of PHP coding. There
are many many uses outside of Phorum for such a function IMO.
--
Brian Moon
Senior Developer
http://dealnews.com/
It's good to be cheap =)
Brian Moon schrieb:
Well, making a custom Phorum extension is a whole other discussion for
our team to have. If we went down that road, we would do more than just
this function.
An extension that provides efficient graph / tree algorithms (the latter
are just a special case of the former) would be usefull, IMHO.
--
Sebastian Bergmann http://sebastian-bergmann.de/
GnuPG Key: 0xB85B5D69 / 27A7 2B14 09E4 98CD 6277 0E5B 6867 C514 B85B 5D69
Richard Lynch wrote:
Seems like you could just make it a custom extension and see if
people
use it a lot...Even if you just had every phorum user asking for it, that would
drive
a lot of interest, no?Well, making a custom Phorum extension is a whole other discussion for
our team to have. If we went down that road, we would do more than
just
this function. And with this function in the core or not, we may
still
do that at some point.My point for wanting it in PHP is that I have used a tree sorting
algorithm OUTSIDE of Phorum a lot in my 10 years of PHP coding. There
are many many uses outside of Phorum for such a function IMO.
My point is that the path to getting anything into core might be to
release a generalized extension that, for your purposes, would be
almost as good as having it in core in the short term.
Long-term, one of these happens:
The extension either "grows" into a full-blow extension with more
tree/graph functions, and everybody installs it because it's so
useful, which is as good as having it in core.
The extension doesn't grow, but the function is found useful in all
the circumstances you predict and it is absorbed into core, as a
1-function extension is kinda silly.
It turns out, maybe, that it's not as generally needed as you think,
and you just have a custom extension, which you start adding
phorum/specific stuff to that optimizes phorum when it's present.
The whole PECL thing goes doesn in flames because ISPs (still) won't
install anything that's not in "core" from ./configure line.
--
Some people have a "gift" link here.
Know what I want?
I want you to buy a CD from some indie artist.
http://cdbaby.com/browse/from/lynch
Yeah, I get a buck. So?