当前位置:网站首页>MySQL advanced index [classification, performance analysis, use, design principles]

MySQL advanced index [classification, performance analysis, use, design principles]

2022-04-23 17:38:00 Everything will always return to plain

Catalog

1、 demonstration

1.1、 No index

1.2、 When there is an index

2、 characteristic

3、 Index structure

3.1 Binary tree

3.2 B-Tree

3.3 B+Tree

3.4 Hash

4、 Index classification

4.1 Clustered index & Secondary indexes

4.2 Index Syntax

5、SQL Performance analysis

5.1 SQL Frequency of execution

5.2 Slow query log

5.3 profile details

5.4 explain

6、 Use of index

6.1 Verify index efficiency

6.2 The leftmost prefix rule

6.3 Index failure

6.4 SQL Tips

6.5 Overlay index

6.6 Prefix index

6.7 Single column index and joint index

7、 Index design principles


Indexes (index) Help MySQL Data structure for efficient data acquisition ( Orderly ).

Out of data , The database system also maintains a data structure that satisfies a specific search algorithm , These data structures are referenced in some way ( Point to ) data , In this way, advanced search algorithms can be implemented on these data structures , This data structure is the index .

An index is like a catalogue of books , We can quickly find the chapter we want to read through the catalogue , You don't have to look through pages .

1、 demonstration

The table structure and its data are as follows :

If we want to implement SQL Statement for : select * from user where age = 42;

1.1、 No index

Without index , You need to scan from the first line , Scan until the last line , We call it Full table scan , Very good performance low .

1.2、 When there is an index

If we index this table , Suppose the index structure is a binary tree , So it means , Would be right age This field establishes a binary tree index structure .

At this point, when we query , It only takes a few scans to find the data , Greatly improve the efficiency of query

notes :

Here we just assume that the structure of the index is a binary tree , Introduce the general principle of index , It's just a sketch , and

Not the real structure of the index .

2、 characteristic

advantage :

  • Improve the efficiency of data retrieval , Reduce the IO cost .

  • Sort data through index columns , Reduce the cost of sorting data , Reduce CPU Consumption of .

Inferiority :

  • Index columns also take up space .

  • Indexing greatly improves query efficiency , At the same time, it also reduces the speed of updating tables , Such as on the table INSERT、UPDATE、DELETE when , Low efficiency .

3、 Index structure

MySQL The index of is implemented in the storage engine layer , Different storage engines have different index structures , It mainly includes the following :

  1. B+Tree Indexes : The most common type of index , Most engines support B+ Tree index .

  2. Hash Indexes : The underlying data structure is realized by hash table , Only the exact index matches the columns , Range query is not supported .

  3. R-tree( Spatial index ) : The spatial index is MyISAM A special index type of the engine , Mainly used for geospatial data types , Usually used less .

  4. Full-text( Full-text index ): It's a way of building inverted indexes , How to quickly match documents . Be similar to Lucene,Solr,ES .

The above is MySQL All index structures supported in , Next , Let's take a look at the support of different storage engines for index structure .

Indexes InnoDB MyISAM Memory
B+tree Indexes Support Support Support
Hash Indexes I won't support it I won't support it Support
R-tree Indexes I won't support it Support I won't support it
Full-text 5.6 Support for Support I won't support it

Be careful : What we usually call index , If not specified , All refer to B+ Index of tree structure organization .

3.1 Binary tree

If say MySQL The index structure of adopts the data structure of binary tree , The ideal structure is as follows :

 

  If the primary key is inserted sequentially , Will form a one-way linked list , The structure is as follows

therefore , If you choose a binary tree as the index structure , There will be the following disadvantages :

  1. On sequential insertion , It will form a linked list , Query performance is greatly reduced .

  2. In case of large amount of data , Deeper levels , The retrieval speed is slow .

At this point, you may think , We can choose red and black trees , Red black tree is a self balanced binary tree , So even if it's sequential insertion According to the , The resulting data structure is also a balanced binary tree , The structure is as follows :

however , Even so , Because the red black tree is also a binary tree , So there will be a disadvantage :

  • In case of large amount of data , Deeper levels , The retrieval speed is slow .

therefore , stay MySQL In the index structure of , Did not choose binary tree or red black tree , And the choice is B+Tree, So what is B+Tree Well ?

3.2 B-Tree

B-Tree,B A tree is a kind of multi fork road scale search tree , Relative to a binary tree ,B Each node of the tree can have multiple branches , That is, multi fork .

At a maximum degree (max-degree) by 5(5 rank ) Of b-tree For example , So this one B Each node of the tree can store up to 4 individual key,5 A pointer to the :

notes :

The degree of a tree refers to the number of child nodes of a node .

characteristic :

  1. 5 Step B Trees , Each node can store up to 4 individual key, Corresponding 5 A pointer to the .

  2. Once the node stores key The quantity has arrived 5, Will fission , The intermediate element splits up .

  3. stay B In the tree , Both non leaf nodes and leaf nodes store data .

3.3 B+Tree

B+Tree yes B-Tree Variants , We have a maximum degree (max-degree) by 4(4 rank ) Of b+tree For example , Let's take a look at its structural diagram :

We can see , Two parts :

  1. The green framed part , It's the index part , It only plays the role of indexing data , Don't store data .

  2. The part framed in red , It's the data storage part , Specific data should be stored in its leaf node .

characteristic :

  1. All the data will appear in the leaf node .

  2. Leaf nodes form a one-way linked list .

  3. Non leaf nodes only serve to index data , The specific data is stored in the leaf node .

The structure we see above is standard B+Tree Data structure of , Next , Let's see MySQL After optimization in B+Tree.

MySQL Index data structure for classic B+Tree optimized .

In the original B+Tree On the basis of , Add a pointer to the linked list of adjacent leaf nodes , So we have a sequence pointer B+Tree, Improve the performance of interval access , Conducive to sorting .

3.4 Hash

Hash index is to use a certain hash Algorithm , Convert key values to new hash value , Map to the corresponding slot , Then stored in hash In the table .

  If two ( Or more ) Key value , Map to the same slot , They produced hash Conflict ( Also known as hash Collision ), can To solve the problem through a linked list .

characteristic :

  1. Hash Indexes can only be used for peer-to-peer comparisons (=,in), Range query is not supported (between,>,< ,...) .

  2. Cannot complete sort operation with index .

  3. High query efficiency , Usually ( non-existent hash Conflict situation ) It only needs one search , Efficiency is usually higher than B+tree Cable lead .

stay MySQL in , Support hash The index is Memory Storage engine .

and InnoDB It has adaptive function hash function ,hash The index is InnoDB The storage engine is based on B+Tree The index is automatically built under specified conditions .

Interview questions :

Why? InnoDB The storage engine chooses to use B+tree Index structure ?

answer :

  • Relative to a binary tree , Fewer levels , High search efficiency ;

  • about B-tree, Whether leaf nodes or non leaf nodes , Data will be saved , This results in fewer key values stored in a page , The pointer decreases , Save a lot of data as well , Can only increase the height of the tree , Resulting in reduced performance ;

  • relative Hash Indexes ,B+tree Support range matching and sorting operations ;

4、 Index classification

stay MySQL database , The specific types of indexes are mainly divided into the following categories :

primary key 、 unique index 、 General index 、 Full-text index .

classification meaning characteristic keyword
Primary key Indexes The index created for the primary key in the table Automatically created by default , Can only There is one PRIMARY
only Indexes Avoid duplicate values in a data column in the same table There can be multiple UNIQUE
General index Quickly locate specific data There can be multiple
Full-text index The full-text index looks up the keywords in the text , Not more than Compare the value in the index There can be multiple FULLTEXT

4.1 Clustered index & Secondary indexes

And in the InnoDB In the storage engine , According to the storage form of the index , It can be divided into the following two types :

classification meaning characteristic
Clustered index (ClusteredIndex) Put data storage and index together , The leaf node of the index structure holds the row data There has to be , And only There is one
Secondary indexes (SecondaryIndex) Separate data from index , The leaf node of the index structure is associated with the corresponding primary key There can be multiple

Clustered index selection rules :

  • If there is a primary key , A primary key index is a clustered index .

  • If there is no primary key , The first unique... Will be used (UNIQUE) Index as clustered index .

  • If the table does not have a primary key , Or there is no suitable unique index , be InnoDB It will automatically generate a rowid As a hidden clustered index .

The specific structures of clustered index and secondary index are as follows :

The data of this row is hung under the leaf node of the clustered index .

The primary key value corresponding to the field value is hung under the leaf node of the secondary index .

When we execute the following SQL When the sentence is , What is the specific search process like .

select *
from emp
where name = ' Jin yong '

The process is as follows :

1、 Because it's based on name Field to query , So first according to name=' Jin yong ' To name Field in the secondary index . But in the secondary index, you can only find Jin yong The corresponding primary key value 1.

2、 Because the data returned by the query is *, So at this time , You also need to use the primary key value 1, Find in the clustered index 1 Corresponding records , Finally find 1 The corresponding line row.

3、 Finally get the data of this line , Just go back .

Return to the table for query

This first to look up data in the secondary index , Find primary key value , Then it is added to the clustered index according to the primary key value , obtain

The way of data , It is called back to table query .

Thinking questions :

Here are two SQL sentence , The execution efficiency is high ? Why? ?

1、 select * from emp where id = 1 ;

2、select * from user where name = ' Jin yong ' ;

remarks : id Primary key ,name Fields are indexed ;

4.2 Index Syntax

1、 Create index

CREATE [ UNIQUE | FULLTEXT ] INDEX index_name ON table_name (index_col_name,... ) ;

2、 Look at the index

SHOW INDEX FROM table_name ;

3、 Delete index

DROP INDEX index_name ON table_name ; 1

5、SQL Performance analysis

5.1 SQL Frequency of execution

MySQL After successful client connection , adopt show [session|global] status Command can provide server status information .

By the following instructions , You can view the INSERT、UPDATE、DELETE、SELECT Frequency of visits :

notes :

1、session Is to view the current session ;

2、global Global query is data ;

SHOW GLOBAL STATUS LIKE 'Com_______';

 

Com_delete: Number of deletions

Com_insert: Number of inserts

Com_select: Query times

Com_update: Number of updates

Through the above instructions , We can see whether the current database is based on query , Or mainly add, delete and modify , So as to provide data

Provide reference basis for library optimization .

If it is mainly based on addition, deletion and modification , We can consider not optimizing the index . If it is based on query , Then we should consider optimizing the index of the database .

Then through the query SQL Frequency of execution of , We can know whether the current database is mainly for addition, deletion and modification , Or mainly query . That fake For example, it is mainly based on query , How can we optimize those query statements ?

The number of times we can use the slow query log .

5.2 Slow query log

The slow query log records all execution times that exceed the specified parameters (long_query_time, Company : second , Default 10 second ) All of the SQL Statement log .

MySQL The slow query log is not enabled by default , We can look at the system variables slow_query_log.

SHOW variables like 'slow_query_log';

  If you don't turn it on , Need to be in MySQL Configuration file for (/etc/my.cnf) The following information is configured in :

notes :

If you don't find the location provided above my.cnf, You can go to C:\ProgramData\MySQL\MySQL Server 8.0 look for my.ini, That's what I am .

The first red box is to open MySQL Slow log query switch , The second is to set the time of slow log to 2 second ,SQL Statement execution time exceeds 2 second , It will be regarded as slow query , Log slow queries .

Once configured , Restart... With the following command MySQL The server tests .

systemctl restart mysqld

test :

The implementation is as follows SQL sentence :

No data is inserted now :

select * from tb_user; 

Insert this time 100w Data , Query again

select count(*) from tb_sku;

 

  Check the slow query log :

In the end, we found out , In slow query log , Only the execution time will be recorded, which exceeds our preset time (10s) Of SQL, Perform faster SQL It won't be recorded .

That such , Log by slow query , We can locate those with low execution efficiency SQL, Thus targeted optimization .

5.3 profile details

show profiles Be able to do SQL Optimization helps us understand where the time is spent .

adopt have_profiling Parameters , Be able to see the present MySQL Do you support profile operation :

SELECT @@have_profiling ;

You can see , At present MySQL It's supporting profile Operation of the , If the switch is off . Can pass set Statements in

session/global Level on profiling:

SET [session/global] profiling = 1;

The switch has been turned on , Next , What we do SQL sentence , Will be MySQL Record , And record where the execution time is spent .

We directly execute the following SQL sentence :

select * from tb_user;

select * from tb_user where id = 1;

select * from tb_user where name = ' White ';

select count(*) from tb_sku;

Execute a series of business SQL The operation of , Then check the execution time of the instruction through the following instructions :

1、 Check each one SQL The time-consuming basic situation of

show profiles;

2、 View specified query_id Of SQL The time-consuming situation of each stage of the statement

show profile for query query_id;

3、 View specified query_id Of SQL sentence CPU Usage situation

show profile cpu for query query_id;

Check each one SQL Time consuming :

  View specified SQL The time-consuming situation of each stage :

5.4 explain

EXPLAIN perhaps DESC Command acquisition MySQL How to execute SELECT Statement information , Included in SELECT How tables are joined and the order in which they are joined during statement execution .

grammar :

Directly in select Add the keyword before the statement explain / desc

EXPLAIN SELECT  Field list  FROM  Table name  WHERE  Conditions  ;

 Explain The meaning of each field in the execution plan :

6、 Use of index

6.1 Verify index efficiency

Here is a simple example , I use one with 100w Table of data , Operate on it .

In this list id Primary key , There's a primary key index , Other fields are not indexed . Let's first query one of the records , Look at the fields inside , The implementation is as follows SQL:

select * from tb_sku where id = 50000;

  You can see that even if there is 100w The data of , according to id Data query , The performance is still very fast , Because the primary key id It's indexed . So next , Let's go again according to sn Field to query , The implementation is as follows SQL:

SELECT * FROM tb_sku WHERE sn = '1000000031450050000'; 

We can see the basis sn Field to query , The query returned a piece of data , The result is time consuming 13 s 608 ms, Because of sn No index , And the query efficiency is very low .

Then we can aim at sn Field , Build an index , After indexing , We are again based on sn The query , Let's take a look at the query time .

Create index :

create index idx_sku_sn on tb_sku(sn) ;

And then do the same again SQL sentence , Look again SQL Time consuming .

We will obviously see ,sn After the field is indexed , Query performance is greatly improved . Before and after indexing , The query time is not an order of magnitude

6.2 The leftmost prefix rule

If you index multiple columns ( Joint index ), Follow the leftmost prefix rule . The leftmost prefix rule means that the query starts from the leftmost column of the index , And don't skip columns in the index . If you jump a column , The index will be partially invalidated ( The following field index is invalid ).

for example , I give t_user Create a federated index on the table ,age,sex,status, If I make a query , The leftmost column age non-existent , Then all indexes are invalid .

And you can't skip a column in the middle , Otherwise, the field index behind the column will be invalidated .

Here's a question to think about :

If I do sql Statement writing , take age and status Exchange positions , Whether the leftmost prefix rule is satisfied at this time ?

answer :

Satisfy , The leftmost column in the leftmost prefix rule , It refers to when querying , The leftmost field of the union index ( This is the first field ) There must be , Write with us SQL when , The order in which conditions are written is irrelevant .

notes :

In the union index , Range query appears (>,<), The column index on the right side of the range query is invalid .

When business allows , Use as much as possible similar to >= or <= This kind of range query , Avoid using > or < .

6.3 Index failure

6.3.1 Index column operation

Do not operate on index columns , Index will fail .

For example, in t_user On the table , The index column is age, If in age use AVG After the function does the average operation , Index failure .

6.3.2 String without quotes

When using string type fields , Without quotes , Index will fail .

Why? ? Because the database has implicit type conversion , Index will fail .

6.3.3 Fuzzy query

If it's just tail blur matching , The index will not fail . If it's a fuzzy head match , Index failure .

When we are making fuzzy query '%1%' , The percent sign cannot be in front of , In this way, the sub index will become invalid ,'1%', Just like this .

6.3.4 or Connection condition

use or The conditions of separation , If or The columns in the previous condition are indexed , And there's no index in the next column , Then the indexes involved will not be used .

select * from tb_user where id = 10 or age = 23;

For example, the above statement , If id There is an index , however age No index , At this time, the index will expire , So go ahead or When inquiring , It's best to index both fields .

6.3.5 Data distribution affects

If MySQL Evaluation uses indexes more slowly than full tables , Index is not used .

select * from tb_user where phone >= '17799990005'; 
select * from tb_user where phone >= '17799990015';

For example, the above two statements , But the values passed in are different , The final implementation plan is also completely different , Why? ?

because MySQL In the query , Will evaluate the efficiency of using the index and the efficiency of full table scanning , If you walk the whole table, scanning is faster , Then abandon the index , Take a full scan . Because the index is used to index a small amount of data , If large quantities of data are returned through index query , It's not as fast as scanning the whole table , At this point, the index will be invalidated .

6.4 SQL Tips

SQL Tips , Is an important means to optimize the database , Simply speaking , Is in the SQL Add some human prompts in the statement to optimize the operation .

1、 use index : Suggest MySQL Which index to use to complete this query ( It's just advice ,mysql Internal evaluation will be conducted again ).

explain select * from  Table name  use index( Index name ) where  Conditions 

2、 ignore index : Ignore the specified index .

explain select * from  Table name  ignore index( Index name ) where  Conditions 

3、force index : Force index .

explain select * from  indicate  force index( Index name ) where  Conditions 

6.5 Overlay index

Try to use overlay index , Reduce select *.

So what is an overlay index ? Overlay index refers to the index used in the query , And the columns that need to be returned , All can be found in this index .

This is equivalent to , Our watch has id,age,sex These fields , then id and age They all have indexes .

If we're making a query Use it directly select * , Return all fields , At this time, the back table query will be triggered , What back to table query ?

Originally, if we only returned age, instead of * Number Return all data , This will take the secondary index , To age Field in the secondary index . Found... In the secondary index age The corresponding primary key value , Then return the data .

If the data returned from the query is *, So at this time , You also need to use the primary key value , Find the record corresponding to the primary key value in the clustered index , Finally find Primary key value The corresponding line row.

This requires two index scans , That is, you need to query back to the table , The performance is relatively poor .

In this case, how to optimize ?

That is to add the index to the field without index .

6.6 Prefix index

When the field type is string (varchar,text,longtext etc. ) when , Sometimes you need to index long strings , This makes the index big , When inquiring , Waste a lot of disk IO, Affecting query efficiency .

At this point, you can prefix only part of the string with , build Vertical index , This can greatly save index space , To improve index efficiency .

The grammar is as follows :

create index idx_xxxx on table_name(column(n)) ;

Example : by tb_user Tabular email Field , Set the length to 5 Prefix index of .

create index idx_email_5 on tb_user(email(5));

Prefix length :

It can be determined according to the selectivity of the index , Selectivity refers to index values that are not repeated ( base ) And the total number of records in the data table , The higher the index selectivity, the higher the query efficiency , The only index selectivity is 1, This is the best index selectivity , Performance is also the best .

select count(distinct email) / count(*) from tb_user ;

6.7 Single column index and joint index

  • Single index : That is, an index contains only a single column .

  • Joint index : That is, an index contains multiple columns .

In the business scenario , If there are multiple query criteria , Consider when indexing fields , It is recommended to establish a joint index , Instead of a single column index .

Because if , We use a single column index , A query statement , There are multiple indexes ,MySQL Only one of them , In this way, the rest will certainly go back to the table query and reduce the performance .

7、 Index design principles

  1. For large amount of data , The tables that are frequently queried are indexed .

  2. For often used as query criteria (where)、 Sort (order by)、 grouping (group by) Index the fields of the operation .

  3. Try to select highly differentiated columns as indexes , Try to build a unique index , The more distinguishable , The more efficient the index is .

  4. If it is a string type field , The length of the field is long , You can focus on the characteristics of the field , Building prefix index .

  5. Try to use a federated index , Reduce single column index , When inquiring , Joint indexes can often overwrite indexes , Save storage space , Avoid returning to your watch , Improve query efficiency .

  6. To control the number of indexes , The index is not that more is better , More indexes , The greater the cost of maintaining the index structure , It will affect the efficiency of addition, deletion and modification .

  7. If the index column cannot store NULL value , Please use... When creating the table NOT NULL Constrain it . When the optimizer knows whether each column contains NULL When the value of , It can better determine which index is most effectively used for queries .

版权声明
本文为[Everything will always return to plain]所创,转载请带上原文链接,感谢
https://yzsam.com/2022/04/202204231736239240.html