Developer Blog

  • Blog
  • /
  • Optimizing Nested Set performance
By Dracony on 24 April 2016

Just yesterday I finished the last test for the new relationship type for PHPixie ORM - Nested Set. I thought whether to implement this or the Closure Table approach for storing trees in SQL databases. But the latter has a huge downside of the quasiquadratic size of the relationship table in the worst case (just 20 nodes can result in 190 records stored there). So the next step was to try improving the classic Nested Set approach.

The biggest problem with it is the high cost of node manipulation. The more to the left of the tree we insert a new node the more records need to be updated. For example:

Nested Set

Let’s say we need to add a Fay subcategory to Pixies, this will result in the following:

Nested Set Insertion

As we see all the records in the table need to be updated. Now imagine you use this approach to save a tree of comments to your articles. Every time somebody adds a new comment you update a whole bunch of records, furthermore you have to make sure this is done within a transaction and preferably on a SERIALIZABLE level. Otherwise even just a few active users commenting at the same time can break the entire index. And of course the performance also suffers greatly as a result.

I have found an elegant way that fixes this problem by slightly modifying the classic Nested Set. The idea is simple although the implementation does become somewhat more difficult. Every node should also store the identifier of the subtree they belong to, for example the id field of the root node. Also in each subtree we enumerate the left and right indexes separately. This way the above example would look like this:

Optimized

When inserting we only need to update the nodes in a single subtree:

Optimized Insertion

As we see the Plants subtree didn’t change at all this time. In practice this greatly reduces the amount of updated rows. If the changes are made in different subtrees they don’t interfere with each other at all, and even if one subtree gets corrupted it won’t affect the others.

As a downside the code becomes more complex. It’s easy to make a mistake and forget to update the root field when moving a node from one subtree to the other, harder to construct the object tree from such records, etc. That’s actually why so much time was spent on writing tests.

Usage with PHPixie ORM

The full doc on using Nested Set is coming soon, but for now here’s a quick guide for those who want to try it out now.

First add 4 integer fields to your table: left, right, rootId and depth (of course the field names are fully configurable, and those are just the defaults). Then add the following to /bundes/app/assets/config/orm.php:

1
2
3
4
5
6
7
8
9
return array(
     'relationships' => array(
          //....
         array(
             'model' => 'category' //model name,
             'type' => 'nestedSet'
         )
     )
);

Then it’s straightforward:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
$fairies->children->add($sprites);
$fairies->children->add($pixies);

//or like this
$pixies->parent->set($fairies);

//move a category to root
$pixies->parent->remove();

//load the entire tree from database
//note the ->where('depth', 0) condition,
//we use it to start only from the top-level nodes
$categories = $orm->query('category')
    ->where('depth', 0)
    ->find(array('children'));

By the way PHPixie will itself update the indexes when you delete entities, and prevent you from deleting entities if they would leave undeleted children.

I hope you’ll like this new feature, and even if you don’t use PHPixie may find this approach to Nested Set useful.

If you’re interested in implementation details or have any question just drop a message in our chatroom.

comments powered by Disqus