当前位置：网站首页>MySQL advanced index [classification, performance analysis, use, design principles]

Out of data , The database system also maintains a data structure that satisfies a specific search algorithm , These data structures are referenced in some way （ Point to ） data , In this way, advanced search algorithms can be implemented on these data structures , This data structure is the index .

An index is like a catalogue of books , We can quickly find the chapter we want to read through the catalogue , You don't have to look through pages .

1、 demonstration

The table structure and its data are as follows ：

If we want to implement SQL Statement for ： select * from user where age = 42;

1.1、 No index

Without index , You need to scan from the first line , Scan until the last line , We call it Full table scan , Very good performance low .

1.2、 When there is an index

If we index this table , Suppose the index structure is a binary tree , So it means , Would be right age This field establishes a binary tree index structure .

At this point, when we query , It only takes a few scans to find the data , Greatly improve the efficiency of query

notes ：

Here we just assume that the structure of the index is a binary tree , Introduce the general principle of index , It's just a sketch , and

Not the real structure of the index .

2、 characteristic

advantage ：

Improve the efficiency of data retrieval , Reduce the IO cost .

Sort data through index columns , Reduce the cost of sorting data , Reduce CPU Consumption of .

Inferiority ：

Index columns also take up space .

Indexing greatly improves query efficiency , At the same time, it also reduces the speed of updating tables , Such as on the table INSERT、UPDATE、DELETE when , Low efficiency .

3、 Index structure

MySQL The index of is implemented in the storage engine layer , Different storage engines have different index structures , It mainly includes the following ：

B+Tree Indexes ： The most common type of index , Most engines support B+ Tree index .

Hash Indexes ： The underlying data structure is realized by hash table , Only the exact index matches the columns , Range query is not supported .

R-tree( Spatial index ）： The spatial index is MyISAM A special index type of the engine , Mainly used for geospatial data types , Usually used less .

Full-text( Full-text index )： It's a way of building inverted indexes , How to quickly match documents . Be similar to Lucene,Solr,ES .

The above is MySQL All index structures supported in , Next , Let's take a look at the support of different storage engines for index structure .

Indexes	InnoDB	MyISAM	Memory
B+tree Indexes	Support	Support	Support
Hash Indexes	I won't support it	I won't support it	Support
R-tree Indexes	I won't support it	Support	I won't support it
Full-text	5.6 Support for	Support	I won't support it

Be careful ： What we usually call index , If not specified , All refer to B+ Index of tree structure organization .

3.1 Binary tree

If say MySQL The index structure of adopts the data structure of binary tree , The ideal structure is as follows ：

If the primary key is inserted sequentially , Will form a one-way linked list , The structure is as follows

therefore , If you choose a binary tree as the index structure , There will be the following disadvantages ：

On sequential insertion , It will form a linked list , Query performance is greatly reduced .

In case of large amount of data , Deeper levels , The retrieval speed is slow .

At this point, you may think , We can choose red and black trees , Red black tree is a self balanced binary tree , So even if it's sequential insertion According to the , The resulting data structure is also a balanced binary tree , The structure is as follows :

however , Even so , Because the red black tree is also a binary tree , So there will be a disadvantage ：

In case of large amount of data , Deeper levels , The retrieval speed is slow .

therefore , stay MySQL In the index structure of , Did not choose binary tree or red black tree , And the choice is B+Tree, So what is B+Tree Well ？

3.2 B-Tree

B-Tree,B A tree is a kind of multi fork road scale search tree , Relative to a binary tree ,B Each node of the tree can have multiple branches , That is, multi fork .

At a maximum degree （max-degree） by 5(5 rank ) Of b-tree For example , So this one B Each node of the tree can store up to 4 individual key,5 A pointer to the ：

notes ：

The degree of a tree refers to the number of child nodes of a node .

characteristic ：

5 Step B Trees , Each node can store up to 4 individual key, Corresponding 5 A pointer to the .

Once the node stores key The quantity has arrived 5, Will fission , The intermediate element splits up .

stay B In the tree , Both non leaf nodes and leaf nodes store data .

3.3 B+Tree

B+Tree yes B-Tree Variants , We have a maximum degree （max-degree） by 4（4 rank ） Of b+tree For example , Let's take a look at its structural diagram ：

We can see , Two parts ：

The green framed part , It's the index part , It only plays the role of indexing data , Don't store data .
The part framed in red , It's the data storage part , Specific data should be stored in its leaf node .

characteristic ：

All the data will appear in the leaf node .

Leaf nodes form a one-way linked list .

Non leaf nodes only serve to index data , The specific data is stored in the leaf node .

The structure we see above is standard B+Tree Data structure of , Next , Let's see MySQL After optimization in B+Tree.

MySQL Index data structure for classic B+Tree optimized .

In the original B+Tree On the basis of , Add a pointer to the linked list of adjacent leaf nodes , So we have a sequence pointer B+Tree, Improve the performance of interval access , Conducive to sorting .

3.4 Hash

Hash index is to use a certain hash Algorithm , Convert key values to new hash value , Map to the corresponding slot , Then stored in hash In the table .

If two ( Or more ) Key value , Map to the same slot , They produced hash Conflict （ Also known as hash Collision ）, can To solve the problem through a linked list .

characteristic ：

Hash Indexes can only be used for peer-to-peer comparisons (=,in), Range query is not supported （between,>,< ,...） .

Cannot complete sort operation with index .

High query efficiency , Usually ( non-existent hash Conflict situation ) It only needs one search , Efficiency is usually higher than B+tree Cable lead .

stay MySQL in , Support hash The index is Memory Storage engine .

and InnoDB It has adaptive function hash function ,hash The index is InnoDB The storage engine is based on B+Tree The index is automatically built under specified conditions .

Interview questions ：

Why? InnoDB The storage engine chooses to use B+tree Index structure ?

answer ：

Relative to a binary tree , Fewer levels , High search efficiency ;

about B-tree, Whether leaf nodes or non leaf nodes , Data will be saved , This results in fewer key values stored in a page , The pointer decreases , Save a lot of data as well , Can only increase the height of the tree , Resulting in reduced performance ;

relative Hash Indexes ,B+tree Support range matching and sorting operations ;

4、 Index classification

stay MySQL database , The specific types of indexes are mainly divided into the following categories ：

primary key 、 unique index 、 General index 、 Full-text index .

classification	meaning	characteristic	keyword
Primary key Indexes	The index created for the primary key in the table	Automatically created by default , Can only There is one	PRIMARY
only Indexes	Avoid duplicate values in a data column in the same table	There can be multiple	UNIQUE
General index	Quickly locate specific data	There can be multiple
Full-text index	The full-text index looks up the keywords in the text , Not more than Compare the value in the index	There can be multiple	FULLTEXT

4.1 Clustered index & Secondary indexes

And in the InnoDB In the storage engine , According to the storage form of the index , It can be divided into the following two types ：

classification	meaning	characteristic
Clustered index (ClusteredIndex)	Put data storage and index together , The leaf node of the index structure holds the row data	There has to be , And only There is one
Secondary indexes (SecondaryIndex)	Separate data from index , The leaf node of the index structure is associated with the corresponding primary key	There can be multiple

Clustered index selection rules :

If there is a primary key , A primary key index is a clustered index .

If there is no primary key , The first unique... Will be used （UNIQUE） Index as clustered index .

If the table does not have a primary key , Or there is no suitable unique index , be InnoDB It will automatically generate a rowid As a hidden clustered index .

The specific structures of clustered index and secondary index are as follows ：

The data of this row is hung under the leaf node of the clustered index .

The primary key value corresponding to the field value is hung under the leaf node of the secondary index .

When we execute the following SQL When the sentence is , What is the specific search process like .

select *
from emp
where name = ' Jin yong '

The process is as follows ：

1、 Because it's based on name Field to query , So first according to name=' Jin yong ' To name Field in the secondary index . But in the secondary index, you can only find Jin yong The corresponding primary key value 1.

2、 Because the data returned by the query is *, So at this time , You also need to use the primary key value 1, Find in the clustered index 1 Corresponding records , Finally find 1 The corresponding line row.

3、 Finally get the data of this line , Just go back .

Return to the table for query ：

This first to look up data in the secondary index , Find primary key value , Then it is added to the clustered index according to the primary key value , obtain

The way of data , It is called back to table query .

Thinking questions ：

Here are two SQL sentence , The execution efficiency is high ? Why? ?

1、 select * from emp where id = 1 ;

2、select * from user where name = ' Jin yong ' ;

remarks : id Primary key ,name Fields are indexed ;

4.2 Index Syntax

1、 Create index

CREATE [ UNIQUE | FULLTEXT ] INDEX index_name ON table_name (index_col_name,... ) ;

2、 Look at the index

SHOW INDEX FROM table_name ;

3、 Delete index

DROP INDEX index_name ON table_name ; 1

5、SQL Performance analysis

5.1 SQL Frequency of execution

MySQL After successful client connection , adopt show [session|global] status Command can provide server status information .

By the following instructions , You can view the INSERT、UPDATE、DELETE、SELECT Frequency of visits ：

notes ：

1、session Is to view the current session ;

2、global Global query is data ;

SHOW GLOBAL STATUS LIKE 'Com_______';

Com_delete: Number of deletions

Com_insert: Number of inserts

Com_select: Query times

Com_update: Number of updates

Through the above instructions , We can see whether the current database is based on query , Or mainly add, delete and modify , So as to provide data

Provide reference basis for library optimization .

If it is mainly based on addition, deletion and modification , We can consider not optimizing the index . If it is based on query , Then we should consider optimizing the index of the database .

Then through the query SQL Frequency of execution of , We can know whether the current database is mainly for addition, deletion and modification , Or mainly query . That fake For example, it is mainly based on query , How can we optimize those query statements ？

The number of times we can use the slow query log .

5.2 Slow query log

The slow query log records all execution times that exceed the specified parameters （long_query_time, Company ： second , Default 10 second ） All of the SQL Statement log .

MySQL The slow query log is not enabled by default , We can look at the system variables slow_query_log.

SHOW variables like 'slow_query_log';

If you don't turn it on , Need to be in MySQL Configuration file for （/etc/my.cnf） The following information is configured in ：

notes ：

If you don't find the location provided above my.cnf, You can go to C:\ProgramData\MySQL\MySQL Server 8.0 look for my.ini, That's what I am .

The first red box is to open MySQL Slow log query switch , The second is to set the time of slow log to 2 second ,SQL Statement execution time exceeds 2 second , It will be regarded as slow query , Log slow queries .

Once configured , Restart... With the following command MySQL The server tests .

systemctl restart mysqld

test ：

The implementation is as follows SQL sentence ：

No data is inserted now ：

select * from tb_user;

Insert this time 100w Data , Query again

select count(*) from tb_sku;

Check the slow query log ：

In the end, we found out , In slow query log , Only the execution time will be recorded, which exceeds our preset time （10s） Of SQL, Perform faster SQL It won't be recorded .

That such , Log by slow query , We can locate those with low execution efficiency SQL, Thus targeted optimization .

5.3 profile details

show profiles Be able to do SQL Optimization helps us understand where the time is spent .

adopt have_profiling Parameters , Be able to see the present MySQL Do you support profile operation ：

SELECT @@have_profiling ;

You can see , At present MySQL It's supporting profile Operation of the , If the switch is off . Can pass set Statements in

session/global Level on profiling：

SET [session/global] profiling = 1;

The switch has been turned on , Next , What we do SQL sentence , Will be MySQL Record , And record where the execution time is spent .

We directly execute the following SQL sentence ：

select * from tb_user;

select * from tb_user where id = 1;

select * from tb_user where name = ' White ';

select count(*) from tb_sku;

Execute a series of business SQL The operation of , Then check the execution time of the instruction through the following instructions ：

1、 Check each one SQL The time-consuming basic situation of

show profiles;

2、 View specified query_id Of SQL The time-consuming situation of each stage of the statement

show profile for query query_id;

3、 View specified query_id Of SQL sentence CPU Usage situation

show profile cpu for query query_id;

Check each one SQL Time consuming :

View specified SQL The time-consuming situation of each stage :

5.4 explain

EXPLAIN perhaps DESC Command acquisition MySQL How to execute SELECT Statement information , Included in SELECT How tables are joined and the order in which they are joined during statement execution .

grammar ：

Directly in select Add the keyword before the statement explain / desc

EXPLAIN SELECT  Field list  FROM  Table name  WHERE  Conditions  ;

Explain The meaning of each field in the execution plan :

6、 Use of index

6.1 Verify index efficiency

Here is a simple example , I use one with 100w Table of data , Operate on it .

In this list id Primary key , There's a primary key index , Other fields are not indexed . Let's first query one of the records , Look at the fields inside , The implementation is as follows SQL：

select * from tb_sku where id = 50000;

You can see that even if there is 100w The data of , according to id Data query , The performance is still very fast , Because the primary key id It's indexed . So next , Let's go again according to sn Field to query , The implementation is as follows SQL：

SELECT * FROM tb_sku WHERE sn = '1000000031450050000';

We can see the basis sn Field to query , The query returned a piece of data , The result is time consuming 13 s 608 ms, Because of sn No index , And the query efficiency is very low .

Then we can aim at sn Field , Build an index , After indexing , We are again based on sn The query , Let's take a look at the query time .

Create index ：

create index idx_sku_sn on tb_sku(sn) ;

And then do the same again SQL sentence , Look again SQL Time consuming .

We will obviously see ,sn After the field is indexed , Query performance is greatly improved . Before and after indexing , The query time is not an order of magnitude

6.2 The leftmost prefix rule

If you index multiple columns （ Joint index ）, Follow the leftmost prefix rule . The leftmost prefix rule means that the query starts from the leftmost column of the index , And don't skip columns in the index . If you jump a column , The index will be partially invalidated ( The following field index is invalid ).

for example , I give t_user Create a federated index on the table ,age,sex,status, If I make a query , The leftmost column age non-existent , Then all indexes are invalid .

And you can't skip a column in the middle , Otherwise, the field index behind the column will be invalidated .

Here's a question to think about ：

If I do sql Statement writing , take age and status Exchange positions , Whether the leftmost prefix rule is satisfied at this time ？

answer ：

Satisfy , The leftmost column in the leftmost prefix rule , It refers to when querying , The leftmost field of the union index ( This is the first field ) There must be , Write with us SQL when , The order in which conditions are written is irrelevant .

notes ：

In the union index , Range query appears (>,<), The column index on the right side of the range query is invalid .

When business allows , Use as much as possible similar to >= or <= This kind of range query , Avoid using > or < .

6.3 Index failure

6.3.1 Index column operation

Do not operate on index columns , Index will fail .

For example, in t_user On the table , The index column is age, If in age use AVG After the function does the average operation , Index failure .

6.3.2 String without quotes

When using string type fields , Without quotes , Index will fail .

Why? ？ Because the database has implicit type conversion , Index will fail .

6.3.3 Fuzzy query

If it's just tail blur matching , The index will not fail . If it's a fuzzy head match , Index failure .

When we are making fuzzy query '%1%' , The percent sign cannot be in front of , In this way, the sub index will become invalid ,'1%', Just like this .

6.3.4 or Connection condition

use or The conditions of separation , If or The columns in the previous condition are indexed , And there's no index in the next column , Then the indexes involved will not be used .

select * from tb_user where id = 10 or age = 23;

For example, the above statement , If id There is an index , however age No index , At this time, the index will expire , So go ahead or When inquiring , It's best to index both fields .

6.3.5 Data distribution affects

If MySQL Evaluation uses indexes more slowly than full tables , Index is not used .

select * from tb_user where phone >= '17799990005'; 
select * from tb_user where phone >= '17799990015';

For example, the above two statements , But the values passed in are different , The final implementation plan is also completely different , Why? ？

because MySQL In the query , Will evaluate the efficiency of using the index and the efficiency of full table scanning , If you walk the whole table, scanning is faster , Then abandon the index , Take a full scan . Because the index is used to index a small amount of data , If large quantities of data are returned through index query , It's not as fast as scanning the whole table , At this point, the index will be invalidated .

6.4 SQL Tips

SQL Tips , Is an important means to optimize the database , Simply speaking , Is in the SQL Add some human prompts in the statement to optimize the operation .

1、 use index ： Suggest MySQL Which index to use to complete this query （ It's just advice ,mysql Internal evaluation will be conducted again ）.

explain select * from  Table name  use index( Index name ) where  Conditions

2、 ignore index ： Ignore the specified index .

explain select * from  Table name  ignore index( Index name ) where  Conditions

3、force index ： Force index .

explain select * from  indicate  force index( Index name ) where  Conditions

6.5 Overlay index

Try to use overlay index , Reduce select *.

So what is an overlay index ？ Overlay index refers to the index used in the query , And the columns that need to be returned , All can be found in this index .

This is equivalent to , Our watch has id,age,sex These fields , then id and age They all have indexes .

If we're making a query Use it directly select * , Return all fields , At this time, the back table query will be triggered , What back to table query ？

Originally, if we only returned age, instead of * Number Return all data , This will take the secondary index , To age Field in the secondary index . Found... In the secondary index age The corresponding primary key value , Then return the data .

If the data returned from the query is *, So at this time , You also need to use the primary key value , Find the record corresponding to the primary key value in the clustered index , Finally find Primary key value The corresponding line row.

This requires two index scans , That is, you need to query back to the table , The performance is relatively poor .

In this case, how to optimize ？

That is to add the index to the field without index .

6.6 Prefix index

When the field type is string （varchar,text,longtext etc. ） when , Sometimes you need to index long strings , This makes the index big , When inquiring , Waste a lot of disk IO, Affecting query efficiency .

At this point, you can prefix only part of the string with , build Vertical index , This can greatly save index space , To improve index efficiency .

The grammar is as follows ：

create index idx_xxxx on table_name(column(n)) ;

Example ： by tb_user Tabular email Field , Set the length to 5 Prefix index of .

create index idx_email_5 on tb_user(email(5));

Prefix length ：

It can be determined according to the selectivity of the index , Selectivity refers to index values that are not repeated （ base ） And the total number of records in the data table , The higher the index selectivity, the higher the query efficiency , The only index selectivity is 1, This is the best index selectivity , Performance is also the best .

select count(distinct email) / count(*) from tb_user ;

6.7 Single column index and joint index

Single index ： That is, an index contains only a single column .
Joint index ： That is, an index contains multiple columns .

In the business scenario , If there are multiple query criteria , Consider when indexing fields , It is recommended to establish a joint index , Instead of a single column index .

Because if , We use a single column index , A query statement , There are multiple indexes ,MySQL Only one of them , In this way, the rest will certainly go back to the table query and reduce the performance .

7、 Index design principles

For large amount of data , The tables that are frequently queried are indexed .

For often used as query criteria （where）、 Sort （order by）、 grouping （group by） Index the fields of the operation .

Try to select highly differentiated columns as indexes , Try to build a unique index , The more distinguishable , The more efficient the index is .

If it is a string type field , The length of the field is long , You can focus on the characteristics of the field , Building prefix index .

Try to use a federated index , Reduce single column index , When inquiring , Joint indexes can often overwrite indexes , Save storage space , Avoid returning to your watch , Improve query efficiency .

To control the number of indexes , The index is not that more is better , More indexes , The greater the cost of maintaining the index structure , It will affect the efficiency of addition, deletion and modification .

If the index column cannot store NULL value , Please use... When creating the table NOT NULL Constrain it . When the optimizer knows whether each column contains NULL When the value of , It can better determine which index is most effectively used for queries .