当前位置：网站首页>Using ltree to process hierarchical data in PostgreSQL

Using ltree to process hierarchical data in PostgreSQL

2022-04-22 06:36:00 【PostgreSQLChina】

author ： alexei · Vasiliev （Alexey Vasiliev）

translator ： Lei Yanliang , Working in Hangao basic software Co., Ltd ,PostgreSQL Database technology enthusiasts ,PostgreSQL ACE、PGCM、10g &11g OCM,OGG Certified Professional .

Original address ：https://leopard.in.ua/2013/09/02/postgresql-ltree#.YEhtc2gzaUk

In this paper , We will learn how to use PostgreSQL Of ltree modular , The module allows data to be stored in a hierarchical tree structure .

What is? ltree？

Ltree yes PostgreSQL modular . It implements a data type ltree, Labels used to represent data stored in a hierarchical tree structure . Provides a wide range of tools for searching tag trees .

Why choose ltree？

ltree Realized a materialized path , about INSERT / UPDATE / DELETE It's very fast for me , And for SELECT The operation is faster
Usually , It's better than using recursion that often needs to recalculate branches CTE Or recursive functions
For example, the built-in query syntax and special operators for query and navigation tree
Indexes ！！！

Initial data

First , You should enable extensions in the database . You can do this with the following command ：

CREATE EXTENSION ltree;

Let's create a table and add some data to it ：

CREATE TABLE comments (user_id integer, description text, path ltree);
INSERT INTO comments (user_id, description, path) VALUES ( 1, md5(random()::text), '0001');
INSERT INTO comments (user_id, description, path) VALUES ( 2, md5(random()::text), '0001.0001.0001');
INSERT INTO comments (user_id, description, path) VALUES ( 2, md5(random()::text), '0001.0001.0001.0001');
INSERT INTO comments (user_id, description, path) VALUES ( 1, md5(random()::text), '0001.0001.0001.0002');
INSERT INTO comments (user_id, description, path) VALUES ( 5, md5(random()::text), '0001.0001.0001.0003');
INSERT INTO comments (user_id, description, path) VALUES ( 6, md5(random()::text), '0001.0002');
INSERT INTO comments (user_id, description, path) VALUES ( 6, md5(random()::text), '0001.0002.0001');
INSERT INTO comments (user_id, description, path) VALUES ( 6, md5(random()::text), '0001.0003');
INSERT INTO comments (user_id, description, path) VALUES ( 8, md5(random()::text), '0001.0003.0001');
INSERT INTO comments (user_id, description, path) VALUES ( 9, md5(random()::text), '0001.0003.0002');
INSERT INTO comments (user_id, description, path) VALUES ( 11, md5(random()::text), '0001.0003.0002.0001');
INSERT INTO comments (user_id, description, path) VALUES ( 2, md5(random()::text), '0001.0003.0002.0002');
INSERT INTO comments (user_id, description, path) VALUES ( 5, md5(random()::text), '0001.0003.0002.0003');
INSERT INTO comments (user_id, description, path) VALUES ( 7, md5(random()::text), '0001.0003.0002.0002.0001');
INSERT INTO comments (user_id, description, path) VALUES ( 20, md5(random()::text), '0001.0003.0002.0002.0002');
INSERT INTO comments (user_id, description, path) VALUES ( 31, md5(random()::text), '0001.0003.0002.0002.0003');
INSERT INTO comments (user_id, description, path) VALUES ( 22, md5(random()::text), '0001.0003.0002.0002.0004');
INSERT INTO comments (user_id, description, path) VALUES ( 34, md5(random()::text), '0001.0003.0002.0002.0005');
INSERT INTO comments (user_id, description, path) VALUES ( 22, md5(random()::text), '0001.0003.0002.0002.0006');

in addition , We should add some indexes ：

CREATE INDEX path_gist_comments_idx ON comments USING GIST(path);
CREATE INDEX path_comments_idx ON comments USING btree(path);

As you can see , I built comments Table with path Field , This field contains the tree All the way . As you can see , For the tree separator , I use 4 Two numbers and dots .

Let us in commenets Find... In the table path With ‘0001.0003’ The record of ：

$ SELECT user_id, path FROM comments WHERE path <@ '0001.0003';
 user_id |           path
---------+--------------------------
       6 | 0001.0003
       8 | 0001.0003.0001
       9 | 0001.0003.0002
      11 | 0001.0003.0002.0001
       2 | 0001.0003.0002.0002
       5 | 0001.0003.0002.0003
       7 | 0001.0003.0002.0002.0001
      20 | 0001.0003.0002.0002.0002
      31 | 0001.0003.0002.0002.0003
      22 | 0001.0003.0002.0002.0004
      34 | 0001.0003.0002.0002.0005
      22 | 0001.0003.0002.0002.0006
(12 rows)

Let's go through EXPLAIN Order to check this SQL：

$ EXPLAIN ANALYZE SELECT user_id, path FROM comments WHERE path <@ '0001.0003';
                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Seq Scan on comments  (cost=0.00..1.24 rows=2 width=38) (actual time=0.013..0.017 rows=12 loops=1)
   Filter: (path <@ '0001.0003'::ltree)
   Rows Removed by Filter: 7
 Total runtime: 0.038 ms
(4 rows)

Let's disable seq scan To test ：

$ SET enable_seqscan=false;
SET
$ EXPLAIN ANALYZE SELECT user_id, path FROM comments WHERE path <@ '0001.0003';
                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Index Scan using path_gist_comments_idx on comments  (cost=0.00..8.29 rows=2 width=38) (actual time=0.023..0.034 rows=12 loops=1)
   Index Cond: (path <@ '0001.0003'::ltree)
 Total runtime: 0.076 ms
(3 rows)

Now? SQL slow , But I can see SQL How to use index Of .
first SQL The sentence uses sequence scan, Because there is not much data in the table .

We can select “path <@ ‘0001.0003’” Another way to do it ：

$ SELECT user_id, path FROM comments WHERE path ~ '0001.0003.*';
user_id |           path
---------+--------------------------
       6 | 0001.0003
       8 | 0001.0003.0001
       9 | 0001.0003.0002
      11 | 0001.0003.0002.0001
       2 | 0001.0003.0002.0002
       5 | 0001.0003.0002.0003
       7 | 0001.0003.0002.0002.0001
      20 | 0001.0003.0002.0002.0002
      31 | 0001.0003.0002.0002.0003
      22 | 0001.0003.0002.0002.0004
      34 | 0001.0003.0002.0002.0005
      22 | 0001.0003.0002.0002.0006
(12 rows)

You shouldn't forget the order of the data , Here's an example ：

$ INSERT INTO comments (user_id, description, path) VALUES ( 9, md5(random()::text), '0001.0003.0001.0001');
$ INSERT INTO comments (user_id, description, path) VALUES ( 9, md5(random()::text), '0001.0003.0001.0002');
$ INSERT INTO comments (user_id, description, path) VALUES ( 9, md5(random()::text), '0001.0003.0001.0003');
$ SELECT user_id, path FROM comments WHERE path ~ '0001.0003.*';
user_id |           path
---------+--------------------------
       6 | 0001.0003
       8 | 0001.0003.0001
       9 | 0001.0003.0002
      11 | 0001.0003.0002.0001
       2 | 0001.0003.0002.0002
       5 | 0001.0003.0002.0003
       7 | 0001.0003.0002.0002.0001
      20 | 0001.0003.0002.0002.0002
      31 | 0001.0003.0002.0002.0003
      22 | 0001.0003.0002.0002.0004
      34 | 0001.0003.0002.0002.0005
      22 | 0001.0003.0002.0002.0006
       9 | 0001.0003.0001.0001
       9 | 0001.0003.0001.0002
       9 | 0001.0003.0001.0003
(15 rows)

Now sort ：

$ SELECT user_id, path FROM comments WHERE path ~ '0001.0003.*' ORDER by path;
 user_id |           path
---------+--------------------------
       6 | 0001.0003
       8 | 0001.0003.0001
       9 | 0001.0003.0001.0001
       9 | 0001.0003.0001.0002
       9 | 0001.0003.0001.0003
       9 | 0001.0003.0002
      11 | 0001.0003.0002.0001
       2 | 0001.0003.0002.0002
       7 | 0001.0003.0002.0002.0001
      20 | 0001.0003.0002.0002.0002
      31 | 0001.0003.0002.0002.0003
      22 | 0001.0003.0002.0002.0004
      34 | 0001.0003.0002.0002.0005
      22 | 0001.0003.0002.0002.0006
       5 | 0001.0003.0002.0003
(15 rows)

Can be in lquery Add a few modifiers to the end of the non asterisk label of , To make it match better than a perfect match ：
“ @”- Case-insensitive matching , for example a @ matching A
“ *”- Match any tag with that prefix , for example foo * matching foobar
“％”- Match the words that start with the underscores

$ SELECT user_id, path FROM comments WHERE path ~ '0001.*{1,2}.0001|0002.*' ORDER by path;
 user_id |           path
---------+--------------------------
       2 | 0001.0001.0001
       2 | 0001.0001.0001.0001
       1 | 0001.0001.0001.0002
       5 | 0001.0001.0001.0003
       6 | 0001.0002.0001
       8 | 0001.0003.0001
       9 | 0001.0003.0001.0001
       9 | 0001.0003.0001.0002
       9 | 0001.0003.0001.0003
       9 | 0001.0003.0002
      11 | 0001.0003.0002.0001
       2 | 0001.0003.0002.0002
       7 | 0001.0003.0002.0002.0001
      20 | 0001.0003.0002.0002.0002
      31 | 0001.0003.0002.0002.0003
      22 | 0001.0003.0002.0002.0004
      34 | 0001.0003.0002.0002.0005
      22 | 0001.0003.0002.0002.0006
       5 | 0001.0003.0002.0003
(19 rows)

We have to parent ‘0001.0003’ Find all the direct childrens, See below ：

$ SELECT user_id, path FROM comments WHERE path ~ '0001.0003.*{1}' ORDER by path;
 user_id |      path
---------+----------------
       8 | 0001.0003.0001
       9 | 0001.0003.0002
(2 rows)

by parent ‘0001.0003’ Find all childrens, See below ：

$ SELECT user_id, path FROM comments WHERE path ~ '0001.0003.*' ORDER by path;
 user_id |           path
---------+--------------------------
       6 | 0001.0003
       8 | 0001.0003.0001
       9 | 0001.0003.0001.0001
       9 | 0001.0003.0001.0002
       9 | 0001.0003.0001.0003
       9 | 0001.0003.0002
      11 | 0001.0003.0002.0001
       2 | 0001.0003.0002.0002
       7 | 0001.0003.0002.0002.0001
      20 | 0001.0003.0002.0002.0002
      31 | 0001.0003.0002.0002.0003
      22 | 0001.0003.0002.0002.0004
      34 | 0001.0003.0002.0002.0005
      22 | 0001.0003.0002.0002.0006
       5 | 0001.0003.0002.0003
(15 rows)

by children ‘0001.0003.0002.0002.0005’ find parent:

$ SELECT user_id, path FROM comments WHERE path = subpath('0001.0003.0002.0002.0005', 0, -1) ORDER by path;
 user_id |        path
---------+---------------------
       2 | 0001.0003.0002.0002
(1 row)

If your path is not unique , You'll get multiple records .

summary

It can be seen that , Use ltree The path of materialization is very simple . In this paper , I didn't list ltree All possible uses of . It's not considered a full-text search issue ltxtquery. But you can do it at PostgreSQL Official documents (http://www.postgresql.org/docs/current/static/ltree.html) Found it in .

Learn more about PostgreSQL Hot news 、 news information 、 Wonderful activities , Please visit China PostgreSQL Official website ：www.postgresqlchina.com

Solve more PostgreSQL Related knowledge 、 technology 、 Work problems , Please visit China PostgreSQL Official Q & a community ：www.pgfans.cn

Download more PostgreSQL Related information 、 Tools 、 Plug in problems , Please visit China PostgreSQL Official download site ：www.postgreshub.cn

版权声明
本文为[PostgreSQLChina]所创，转载请带上原文链接，感谢
https://yzsam.com/2022/04/202204220551004766.html