Oleh: kod34fr33 | 6/Mei/2008

Adjacency list tree on Mysql

Modelling hierarchy data in relational database is hard.

Adjacency list is one of few method to modelling this data to SQL.
if you do not know about adjacency list, or you want to know other method than adjacency list, you might wanto to read Joe Celko’s Trees and Hierarchies in SQL for Smarties

Adjacency list is known with its “easy to insert but hard to retrieve”.

In this post, i want to share my solution on hierarchy data in MySQL with Adjacency list method, in order to have these method work, you need MySQL with TRIGGER support (i use 5.0.45 version).

Lets take a look on dummy data:

Director
|-- Secretary
|
|-- Treasury Mgr
|    |-- Employee 1
|    |-- Employee 2
|
|-- Sales Mgr
     |-- Region 1 Mgr
     |    |-- Employee 3
     |    |-- Employee 4
     |    |-- Employee 5
     |
     |-- Region 2 Mgr
          |-- Subregion A Mgr
          |    |-- Employee 6
          |    |-- Employee 7
          |-- Subregion B Mgr
               |-- Employee 8
               |-- Employee 9

Now, create table

mysql> CREATE TABLE `employee` (
  > `emp_id` INT PRIMARY KEY AUTO_INCREMENT NOT NULL,
  > `emp_parent_id` INT NULL,
  > `name` VARCHAR(200) NOT NULL,
  > FOREIGN KEY(`emp_parent_id`) REFERENCES `employee`(`emp_id`)
  > ) ENGINE=INNODB DEFAULT CHARSET=UTF8;

Then, to make life easier when retrieving data we need additional table (helper table) : a path table

mysql> CREATE TABLE `employee_path`(
  > `emp_id` INT PRIMARY KEY NOT NULL,
  > `path` TEXT,
  > FOREIGN KEY(`emp_id`) REFERENCES `employee`(`emp_id`)
  > ) ENGINE=INNODB DEFAULT CHARSET=UTF8;

after that, we need trigger to automate path builder everytime employee record inserted or updated.

mysql> DELIMITER //
mysql> CREATE TRIGGER `makeEmpTree` AFTER INSERT ON `site`
  > FOR EACH ROW BEGIN
  >  DECLARE `parent_path TEXT DEFAULT '';
  >  SELECT `path` INTO `parent_path` FROM `employee_path` WHERE `emp_id` = NEW.`emp_parent_id`; --updated on 2008-06-03 thx to /dev/null
  >  INSERT INTO `employee_path` VALUES(
  >    NEW.`emp_id`,
  >    CONCAT(IF(LENGTH(`parent_path`) < 2, '/', `parent_path`), NEW.`emp_id`, '/')
  >  );
  > END;
  > //
mysql> CREATE TRIGGER `updateEmpTree` BEFORE UPDATE ON `site`
  > FOR EACH ROW BEGIN
  >  DECLARE `old_path` TEXT DEFAULT '';
  >  DECLARE `new_path` TEXT DEFAULT '';
  >  SELECT `path` INTO `old_path` FROM `employee_path` WHERE `emp_id` = NEW.`emp_id`;
  >  SELECT `path` INTO `new_path` FROM `employee_path` WHERE `emp_id` = NEW.`emp_parent_id`;
  >  UPDATE `employee_path` SET `path` =
  >    REPLACE(`path`, `old_path`,
  >      CONCAT(IF(LENGTH(`new_path`) < 2, '/', `new_path`), NEW.`emp_id`, '/') )
  >    WHERE LEFT(`path`, LENGTH(`old_path`)) = `old_path`;
  > END;
  > //
mysql> DELIMITER ;

done!

lets take hands on data:

1st – INSERTING DATA

mysql> INSERT INTO `employee` VALUES
  > (DEFAULT, NULL, 'Director'), (DEFAULT, 1, 'Secretary'),
  > (DEFAULT, 1, 'Treasury Mgr'), (DEFAULT, 3, 'Employee 1'),
  > (DEFAULT, 3, 'Employee 2'), (DEFAULT, 1, 'Sales Mgr'),
  > (DEFAULT, 6, 'Region 1 Mgr'), (DEFAULT, 7, 'Employee 3'),
  > (DEFAULT, 7, 'Employee 4'), (DEFAULT, 7, 'Employee 5'),
  > (DEFAULT, 6, 'Region 2 Mgr'), (DEFAULT, 11, 'Subregion A Mgr'),
  > (DEFAULT, 12, 'Employee 6'), (DEFAULT, 12, 'Employee 7'),
  > (DEFAULT, 11, 'Subregion B Mgr'), (DEFAULT, 15, 'Employee 8'),
  > (DEFAULT, 15, 'Employee 9');

should be returning output such as

Query OK, 17 rows affected (0.01 sec)
Records: 2  Duplicates: 0  Warnings: 0

now, check on path table, it should have content like below:

mysql> SELECT * FROM `employee_path`;
+--------+----------------+
| emp_id | path           |
+--------+----------------+
|      1 | /1/            |
|      2 | /1/2/          |
|      3 | /1/3/          |
|      4 | /1/3/4/        |
|      5 | /1/3/5/        |
|      6 | /1/6/          |
|      7 | /1/6/7/        |
|      8 | /1/6/7/8/      |
|      9 | /1/6/7/9/      |
|     10 | /1/6/7/10/     |
|     11 | /1/6/11/       |
|     12 | /1/6/11/12/    |
|     13 | /1/6/11/12/13/ |
|     14 | /1/6/11/12/14/ |
|     15 | /1/6/11/15/    |
|     16 | /1/6/11/15/16/ |
|     17 | /1/6/11/15/17/ |
+--------+----------------+
17 rows in set (0.00 sec)

2nd – BASIC SELECT QUERY
Get all root nodes
Useful when hierarchy has multiple roots. (This sample has only 1 root)

mysql> SELECT * FROM `employee` WHERE `emp_parent_id` IS NULL;

Get Sibling node
sample query: get sibling of Treasury Manager (id = 3)

mysql> SELECT e.* FROM `employee` e, `employee` e2
  > WHERE
  >   e.`emp_parent_id` = e2.`emp_parent_id` AND
  >   e2.`emp_id` = 3 AND
  >   e.`emp_id` <> e2.`emp_id`;

Get Full Tree Nodes From Known Root Node
sample query:get full tree nodes where root is director (emp_id = 1)
Just replace number ‘1′ on ‘/1/%’ with another root id if table has multiple root node

mysql> SELECT `emp_id`, `path`
  > FROM `employee_path`
  > WHERE `path` LIKE('/1/%')
  > ORDER BY `path` ASC;

Get Branch Nodes
sample query: get branch nodes of Sales Mgr (id = 6)
Actually, query below is generalization of query above ;)

mysql> SELECT
  >  e.`emp_id` AS 'Most Root Node Id',
  >  e2.`emp_id,
  >  e2.`name`
  >  p.`path`
  > FROM
  >  `employee` e, `employee` e2,
  >  `employee_path` p, `employee_path` p2
  > WHERE
  >  e.emp_id = 6 AND
  >  p2.`emp_id` = e.`emp_id AND
  >  p.`path` LIKE(CONCAT(p2.`path`,'%')) AND
  >  e2.`emp_id = p.`emp_id`
  > ORDER BY p.`path` ASC;

Note:
- This query is not optimized!
- This query is generic query, so you could have fun with it.
e.g. : replace ‘e.emp_id = 6′ with :
+ ‘e.`emp_id` = 1′ and you have same result with query “get full tree from known root node
+ ‘e.`emp_parent_id` IS NULL’ and you have all tree grouped by each root node

get hierarchy style
To have hierarchy style we could use query below:

mysql> SELECT
  >  e.`emp_id`,
  >  CONCAT(
  >    REPEAT('  ',
  >      (LENGTH(p.`path`) - LENGTH(REPLACE(p.`path`,'/','')),
  >      e.`name`
  >    ) AS tree
  > FROM `employee` e
  > LEFT JOIN `employee_path` p ON e.`emp_id` = p.`emp_id`
  > ORDER BY p.`path ASC;

+--------+---------------------------+
| emp_id | tree                      |
+--------+---------------------------+
|      1 |     Director              |
|      2 |       Secretary           |
|      3 |       Treasury Mgr        |
|      4 |         Employee 1        |
|      5 |         Employee 2        |
|      6 |       Sales Mgr           |
|     11 |         Region 2 Mgr      |
|     12 |           Subregion A Mgr |
|     13 |             Employee 6    |
|     14 |             Employee 7    |
|     15 |           Subregion B Mgr |
|     16 |             Employee 8    |
|     17 |             Employee 9    |
|      7 |         Region 1 Mgr      |
|     10 |           Employee 5      |
|      8 |           Employee 3      |
|      9 |           Employee 4      |
+--------+---------------------------+

17 rows in set (0.00 sec)

3rd. Manipulating Node
To know whether those query below success or not, you could check yourself by comparing the result from “hierarchy style query” before and after execute each query below.
Move leaf node to another node
Sample query: move employee 3 to treasury Mgr

mysql> UPDATE `employee` SET `emp_parent_id` = 3 WHERE `emp_id` = 8;

Move leaf node to become root node
Sample query: make Region 2 Mgr become a root node (actually, IMO, this action should not happen in the context of employee ;p )

mysql> UPDATE `employee` SET `emp_parent_id` = NULL WHERE `emp_id` = 11;

Move branch node to another branch
Sample query: move Subregion A Mgr to Sales Mgr

mysql> UPDATE `employee` SET `emp_parent_id` = 6 WHERE `emp_id` = 12;

4th. Note

Testing can only prove the presence of bugs, not their absence.
–Edsger W. Dijkstra

This appoarch, of course, have a limitation.
Limitation, or bug, i know was (please share, if you found another limitation/bug) :

  • There is no checking when moving a node to its decendant. (I’m still thinking about check whether the new parent node is its decendant or not in before update trigger)
  • This approach does not cover node weight yet.(if you have no idea about weight, take a look at “hierarchy style” query result and compare it with sample hierarchy on the top. and ask to yourself, why Region 1 Mgr on the result become below Region 2 Mgr? ;p)

PS: I found postgresql developer wrote about another different approach of adjacency list in his blog. you might want to read about it.


Tanggapan

  1. Dear kod34fr33,

    Thank you very much for you work.
    using a helper table quite an nice approach id one wants to go with the adjacency list model.
    However, I’ve got one question:
    In your first trigger you use the following select:
    “SELECT `path` INTO `parent_path` FROM `employee_path` WHERE `emp_id` = NEW.`emp_id`;”
    As the trigger starts “AFTER INSERT”, I ask myself how there can ever be a result for that query?
    An employee with the emp_id of the employee just inserted into the “employee”-table will never be found in the path-table.
    I suppose if you changed this to “SELECT `path` INTO `parent_path` FROM `employee_path` WHERE `emp_id` = NEW.`emp_parent_id`;” it should work.
    Thanks again,

    /dev/null

    /dev/null
    After read Joe Celko’s book and Tropashko’s article, my approach more relevant to call as materialized path a.k.a path enumeration than adjacency list.
    and you are right about the trigger. that’s my mistake. thanks for pointing that out. –kod34fr33

  2. This is great!
    I working on a project to implement Decision tree for predictions.

    Regards from india

  3. Someone is copying information, do you have two blogs? The exact same article is at http://icreativelabs.com/blog/2008/05/adjacency-list-tree-on-mysql – but it seems that your’s is the original?

  4. This is actually copied verbatim from the book he links at the beginning. You can find stolen versions of this content all over the web.

    Shameful, really.

  5. Tx for good thinking – neat, simple and efficient.

    Is the reason for the second helper table to avoid table mutating problems on the triggers? Otherwise the `path` could just be another column in the employee table.

    I think your triggers should read after insert and before update in EMPLOYEE rather than on SITE.

    Maybe your note about “node weight” to fix the sort order can be solved by adding columns to the order by clause, eg order by p.path ASC,p.name ASC

  6. Oops. order by p.path ASC,p.name ASC should be
    order by p.path ASC,e.name ASC

  7. Sorry the sort cannot be fixed as I suggested

  8. HIERARCHY SORTING SOLUTION
    Having embarrassed myself by suggesting a non workable sort solution (see above) I have found one that works.
    The most user friendly sort order for any hierarchy (eg product categories, staff positions, index in a book etc ) is to have all the same-level entries sorted alphabetically, like this:

    Director
    Sales Mgr
    Region 1 Mgr
    Employee 3
    Employee 4
    Employee 5
    Region 2 Mgr
    Subregion A Mgr
    Employee 6
    Employee 7
    Subregion B Mgr
    Employee 8
    Employee 9
    Secretary
    Treasury Mgr
    Employee 1
    Employee 2

    The level two employees are Sales Mgr, Secretary and Treasury Mgr, in that order.
    Under the Sales Mgr is Region 1 Mgr and Region 2 Mgr, in that order, and so on.
    They must be sorted alphabetically regardless of the sequence in which they were added to the table. Sorting by the primary key is not a good idea.

    The reason why kod34fr33`s Region 2 Mgr sorted before Region 1 Mgr is that /1/6/11/ is alphabetically before /1/6/7/.
    One can achieve an alphabetic sort within a depth level, by adding a `sort` column to the helper table. The sort column must be built the same way as the path column, but using employee.name rather than emp_id and adding a few lines to the triggers.

    The sort keys for Region 1 Mgr and Region 2 Mgr would then be “/Director/Sales Mgr/Region 1 Mgr” and “/Director/Sales Mgr/Region 2 Mgr”
    and when the same select is done but using “order by p.sort” the results are sorted alphabetically within each level.

    I would not be concerned about the fact that the sort column will hold long strings. These hierarchy tables are typically static lookup tables with relatively few rows (compared to business transaction tables which grow every day), so disk space is not a serious consideration.

  9. Good Article

    http://www.dealsghost.com


Beri tanggapan

Your response:

Kategori