当前位置:网站首页>How tall is the B+ tree of the MySQL index?

How tall is the B+ tree of the MySQL index?

2022-08-09 10:29:00 InfoQ

一、问题

I often meet the students of the business line to ask,Since the pageI/O对MySQLQuery performance is greatly affected,那么对于一次MySQL查询,How many times the bottom layer is going to be pagedI/O呢?

为了回答这个问题,Below we simplify a few concepts:

  • h:Collectively referred to as the height of the index;
  • h1:The height of the clustered index;
  • h2:The height of the secondary secondary index;
  • k:The fan-out coefficient of the intermediate node.

二、分析

Have to say this is a great question,It is closely related to our daily inquiries.这个问题看似简单,But it's not that easy to answer.首先我们来看下
MySQL B+树索引的结构

null
我们都知道MySQL里,索引是用B+树来实现的.B+Only the leaf nodes of the tree store specific data(Clustered indexes store complete row data,The secondary secondary index stores the index value+主键,Non-covering index queries also have to return the table),Intermediate nodes are used to index leaf nodes.

2.1 索引高度h与页面I/O数的关系

It can be seen from the index structure,Each query must visit the leaf node,The number of pages accessed is exactly the height of the indexh.例如,A point query on the primary key
SELECT * FROM USER WHERE id=1
,那么要查询h1page to find the row data in the leaf node,也即进行h1次页面I/O.(另外,Secondary indexes are basically loaded in memory,Here we ignore this situation for now.)

综上,Query the corresponding pageI/OThe number is related to the index used,主要分为以下几种情况:

  • 点查询:
  • 聚族索引:h1
  • 二级索引:
  • 覆盖索引:h2
  • 回表查询:h2+h1
  • 范围查询:这种情况相对比较复杂,But the principle is similar to the point query,读者可自行分析;
  • 全表查询:B+The leaves of the tree are connected by a linked list,For full table queries,All leaf nodes need to be visited from beginning to end.

2.2 索引高度htheoretical calculations

从上可以看出,hThe size of the index is bound to become a key to index performance.Then usually the index height of the tableh是多大呢?

We assume that the fan-out coefficient of the intermediate node is k、The number of row records of the leaf node is n,则叶子结点数为
k^(h-1)
,总记录数为
k^(h-1)*n
.

在InnoDB里,每个页面默认16KB,假设主键是4B的int类型.对于中间节点,There is a page number after each primary key value4B,还有6B的其他数据(参考《MySQL技术内幕:InnoDB存储引擎》),Then the fan-out coefficientk=16KB/(4B+4B+6B)≈1170.Let's also assume that the record size of each row is 1KB,The number of records that each leaf node can holdn=16KB/1KB=16.

在高度h=3时,叶子结点数=1170^2 ≈137W,总记录数=1170^2*16=2190W!!也就是说,InnoDBBy indexing the page three timesI/O,即可索引2190W行记录.

同理,在高度h=4时,总行数=1170^3*16≈256亿条!!!

This is just our theoretical analysis,InnoDBThere is also no corresponding view provided for viewing,So how do we check the true height of the index?

三、See how to index the true height

God Jiang Chengyao is here
《查看 InnoDB表中每个的索引高度》
A method to view the true height of the index is described in this article.

null
如上图所示,InnoDB是索引组织表,每个页都包含一个PAGE_LEVEL的信息(见上图右半部分),用于表示当前页所在索引中的高度.默认叶子节点的高度为0,那么Root页的PAGE_LEVEL+1就是这棵索引的高度.

The next question is how to get all the indexes of a tableRootWhere is the page?在《MySQL技术内幕:InnoDB存储引擎》analyzed in the book<space,3>这个页(即ibd文件的第3个页面,从0开始)is a clustered indexRoot页,在《MySQL内核:InnoDB存储引擎 卷1》also analyzed,RootThe position of the page usually does not change.then other indexesRootWhere is the page?通过下面的SQLThe statement can find out the index of each index in the tableRoot页信息:

SELECT b.name, a.name, index_id, type, a.space, a.PAGE_NO
FROM information_schema.INNODB_SYS_INDEXES a,
 information_schema.INNODB_SYS_TABLES b
WHERE a.table_id = b.table_id 
 AND a.space <> 0;

运行结果如下:

null
其中<SPACE,PAGE_NO>就是索引的Root页信息,SPACEIt can be considered as a tableibd文件,PAGE_NO代表ibdThe page number in the file(从0开始).With this information, it is easy to locate,因为PAGE_LEVEL在每个Root页的偏移量64位置处,占用两个字节,这样我们通过
hexdump
You can quickly locate the height information of each index tree.例如,We check it with the following commanduserThe height of the clustered index on the table:

$hexdump -C -s 49216 -n 10 user.ibd
0000c040 00 01 00 00 00 00 00 00 03 2b

这里,49216表示的是
16384*3+64
,即从第3In-page offset64位置开始读取10个字节,前两个字节为PAGE_LEVEL,后8个字节是index_id,It's what you see in the picture aboveindex_id=811(0x032b=811)的聚簇索引,这里PAGE_LEVEL为00 01,Then the height of the index tree is 2.

综上,The methods for viewing the true height of an index are summarized below:

  • 通过
    information_schema.INNODB_SYS_INDEXES
    information_schema.INNODB_SYS_TABLES
    找到对应索引Root页在ibdThe page number of the filePAGE_NO(Clustered indexes are usually 3,Other indexes accumulate upwards);
  • 通过
    hexdump -C -s 16384*PAGE_NO+64 -n 10 user.ibd
    ,前两个字节+1is the height of the index.

四、Verify the true height of the index

我们创建一个user表,其结构如下:

CREATE TABLE `user` (
 `id` INT(11) NOT NULL AUTO_INCREMENT,
 `card_id` INT(11) NOT NULL DEFAULT '0' COMMENT '身份证号',
 `name` VARCHAR(32) NOT NULL DEFAULT '' COMMENT '英文名字',
 `age` TINYINT(2) UNSIGNED NOT NULL DEFAULT '0' COMMENT '年龄',
 `content` VARCHAR(1024) NOT NULL DEFAULT '' COMMENT '附属信息',
 `create_time` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
 `update_time` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '最近更新时间',

 PRIMARY KEY (`id`),
 UNIQUE KEY `uniq_card_id` (`card_id`),
 KEY `idx_name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='用户表';

Continue inserting data into the table,
card_id
字段跟id一样从1开始,
name
为32个随机字符,
content
for the length in[500,1000]之间的随机字符.数据示例如下:

null
我们通过
hexdump -C -s 49216 -n 10 user.ibd && hexdump -C -s 65600 -n 10 user.ibd && hexdump -C -s 81984 -n 10 user.ibd
See the true height of the three indices,汇总如下:



It can be seen from the real data,聚簇索引在2700WWhen the height rises to 4,than analyzed above2190W要慢500W左右,This should be mainly because2190Wis based on row records1KB来估算的,And the data we insert is in [0.5KB,1KB]之间,Slave pages allow leaf nodes to hold more row records.

Another interesting phenomenon was found,
idx_name
索引在2300WWhen the height rises to 4.这主要是因为
name
长度为32,扇出系数k=16KB/(32B+4B+6B)≈390,This is better than clustered indexes1170要小的多(Or to be thinner),造成索引“瘦高”的情况.

五、总结

  • page for one queryI/Onumber and index heighth呈正相关,主要分为以下几种情况:
  • 点查询:
  • 聚族索引:h1
  • 二级索引:
  • 覆盖索引:h2
  • 回表查询:h2+h1
  • 范围查询:这种情况相对比较复杂,But the principle is similar to the point query,读者可自行分析;
  • 全表查询:B+The leaves of the tree are connected by a linked list,For full table queries,All leaf nodes need to be visited from beginning to end.
  • The index height is usually 2~4层.在高度h=3、主键为int类型、The row record size is 1KB时,The total number of indexable rows is 1170^2*16=2190W.这也是
    Many big manufacturers will2000WAs a sub-library sub-table standard
    的原因;
  • Here's how to check the true height of the index:
  • 通过
    information_schema.INNODB_SYS_INDEXES
    information_schema.INNODB_SYS_TABLES
    找到对应索引Root页在ibdThe page number of the filePAGE_NO(Clustered indexes are usually 3,Other indexes accumulate upwards);
  • 通过
    hexdump -C -s 16384*PAGE_NO+64 -n 10 user.ibd
    ,前两个字节+1is the height of the index.
  • 索引高度hIt is also related to the data type of the index field.如果是int或short,扇出系数大,Indexing is more efficient,The entire index appears to belong“矮胖”型;而如果是varchar(32)等,The fan-out coefficient is lower,The entire index appears to belong“瘦高”型,The indexing efficiency is naturally lower.So when we select the type in the field,其
    The simpler the type, the better the efficiency
    .

看到这里,是不是有点心动了?Then don't hurry up and use the method of this article to see the index height of the table you are responsible for?

Comes with a quick view command:

#聚簇索引(第一个)
hexdump -C -s 49216 -n 10 user.ibd

#第二个索引
hexdump -C -s 65600 -n 10 user.ibd

#第三个索引
hexdump -C -s 81984 -n 10 user.ibd



关于作者

Du Yunjie,高级架构师,Head of Zhuanzhuan Architecture Department,Executive Chairman of Zhuanzhuan Technical Committee,腾讯云TVP.负责服务治理、MQ、云平台、APM、IM、分布式调用链路追踪、监控系统、配置中心、分布式任务调度平台、分布式ID生成器、Basic components such as distributed locks.微信号:
waterystone
,Constructive exchanges are welcome.

转转研发中心及业界小伙伴们的技术学习交流平台,定期分享一线的实战经验及业界前沿的技术话题.

关注公众号「转转技术」(综合性)、「大转转FE」(专注于FE)、「转转QA」(专注于QA),更多干货实践,欢迎交流分享~
原网站

版权声明
本文为[InfoQ]所创,转载请带上原文链接,感谢
https://yzsam.com/2022/221/202208091018162250.html