当前位置:网站首页>Index: teach you index from zero basis to proficient use

Index: teach you index from zero basis to proficient use

2022-04-23 17:36:00 Crisp_ LF

Indexes :( Teach you from zero foundation to proficient use )

introduction :

​ Recently in to see mysql How it works when this book , I found that my foundation is still relatively thin and soft , So after reading the index in a muddle, I began to look for videos for further study mysql, Index is an important part of database , So I spent a long time here , But learning is a continuous process , Maybe I just finished learning the index these days , Remember clearly , But it must be forgotten after infrequent use , So write down this blog , One is to deepen your memory of the index , It's convenient for you to review and consolidate later .

​ This blog is mainly for my video viewing notes , Add your own integration .

​ The video link is :https://www.bilibili.com/video/BV1Kr4y1i7ru?p=104

​ Reading documents is the fastest way to learn , If you encounter something you won't in the process of learning , You can watch this video , The chapters of the video correspond to this blog one by one .

1. Index Overview

1.1 Introduce

​ Indexes (index) Help MySQL Data structure for efficient data acquisition ( Orderly ). Out of data , The database system also maintains a data structure that satisfies a specific search algorithm , These data structures are referenced in some way ( Point to ) data , In this way, advanced search algorithms can be implemented on these data structures , This data structure is the index .

image-20220420095237701

When it comes to data structures , Everyone will be worried , Worry that you can't understand , Can't keep up with the rhythm . But you don't have to worry about , This article will explain the index in detail , So that Xiaobai can quickly master the index .

1.2 demonstration

The table structure and its data are as follows :

image-20220420095752735
  • If we want to implement SQL Statement for : select * from user where age = 45;

    1. No index case

    image-20220420095841183
    • Without index , You need to scan from the first line , Scan until the last line , We call it Full table scan , Very low performance .

    2. When there is an index

    • If we index this table , Suppose the index structure is a binary tree , So it means , Would be right age This field establishes a binary tree index structure .[ Failed to transfer the external chain picture , The origin station may have anti-theft chain mechanism , It is suggested to save the pictures and upload them directly (img-FHBGhvM3-1650629270338)(/Users/apple/Library/Application Support/typora-user-images/image-20220420100018075.png)]

    • At this point, when we query , It only takes three scans to find the data , Greatly improve the efficiency of query .

remarks : Here we just assume that the structure of the index is a binary tree , Introduce the general principle of index , It's just a sketch , Not the real structure of the index , The real structure of the index , More on that later .

1.3 characteristic

  1. advantage :

    • Improve the efficiency of data retrieval , Reduce the IO cost

    • Sort data through index columns , Reduce the cost of sorting data , Reduce CPU Consumption of .

  2. Inferiority

    • Index columns also take up space .
    • Indexing greatly improves query efficiency , At the same time, it also reduces the speed of updating tables , Such as on the table INSERT、UPDATE、DELETE when , Low efficiency .

2. Index structure

2.1 summary

  • MySQL The index of is implemented in the storage engine layer , Different storage engines have different index structures , It mainly includes the following :
  1. Index structure :

    • B+Tree Indexes : The most common type of index , Most engines support B+ Tree index

    • Hash Indexes : The underlying data structure is realized by hash table , Only the exact index matches the columns , No

      Support range query

    • R-tree( Spatial index ): The spatial index is MyISAM A special index type of the engine , Mainly used for geospatial data types , Usually used less

    • Full-text( Full-text index ): It's a way of building inverted indexes , How to quickly match documents . similar Lucene,Solr,ES

  2. The above is MySQL All index structures supported in , Next , Let's take a look at the support of different storage engines for index structure .

    • B+tree Indexes : InnoDB : Support MyISAM : Support Memory: Support
    • Hash Indexes : InnoDB : I won't support it MyISAM : I won't support it Memory: Support
    • R-tree Indexes : InnoDB : I won't support it MyISAM : Support Memory: I won't support it
    • Full-text Indexes :InnoDB :5.6 Support for MyISAM : Support Memory: I won't support it

Be careful : What we usually call index , If not specified , All refer to B+ Index of tree structure organization .

2.2 Binary tree

  • If say MySQL The index structure of adopts the data structure of binary tree , The ideal structure is as follows :

image-20220420101227504

  • If the primary key is inserted sequentially , Will form a one-way linked list , The structure is as follows :

image-20220420101302156

  • therefore , If you choose a binary tree as the index structure , There will be the following disadvantages :
    • On sequential insertion , It will form a linked list , Query performance is greatly reduced .
    • In case of large amount of data , Deeper levels , The retrieval speed is slow .

At this point, you may think , We can choose red and black trees , Red black tree is a self balanced binary tree , So even if you insert data sequentially , The resulting data structure is also a balanced binary tree , The structure is as follows :

image-20220420101409130

  • however , Even so , Because the red black tree is also a binary tree , So there will be a disadvantage :
    • In case of large amount of data , Deeper levels , The retrieval speed is slow .

therefore , stay MySQL In the index structure of , Did not choose binary tree or red black tree , And the choice is B+Tree, So what is B+Tree Well ? In detail B+Tree Before , Let's first introduce a B-Tree

2.3 B-Tree

  • B-Tree,B A tree is a kind of multi fork road scale search tree , Relative to a binary tree ,B Each node of the tree can have multiple branches , That is, multi fork .

    At a maximum degree (max-degree) by 5(5 rank ) Of b-tree For example , So this one B Each node of the tree can store up to 4 individual key,5 A pointer to the .

    image-20220420101612094

Knowledge tips : The degree of a tree refers to the number of child nodes of a node .

  • We can use a data structure visualization website to simply demonstrate . https://www.cs.usfca.edu/~galles/visualization/BTree.htmlimage-20220420101730883

  • Insert a set of data : 100 65 169 368 900 556 780 35 215 1200 234 888 158 90 1000 88

    120 268 250 . Then observe some data insertion process , Changes in nodes .

    image-20220420101802330

  • characteristic :

    • 5 Step B Trees , Each node can store up to 4 individual key, Corresponding 5 A pointer to the .
    • Once the node stores key The quantity has arrived 5, Will fission , The intermediate element splits up
    • stay B In the tree , Both non leaf nodes and leaf nodes store data .

2.4 B+Tree

  • B+Tree yes B-Tree Variants , We have a maximum degree (max-degree) by 4(4 rank ) Of b+tree For example , Let's take a look at its structural diagram : image-20220420101903847

  • We can see , Two parts :

    • The green framed part , It's the index part , It only plays the role of indexing data , Don't store data .

    • The part framed in red , It's the data storage part , Specific data should be stored in its leaf node

  • We can use a data structure visualization website to simply demonstrate . https://www.cs.usfca.edu/~galles/visualization/BPlusTree.html image-20220420102008477

  • Insert a set of data : 100 65 169 368 900 556 780 35 215 1200 234 888 158 90 1000 88

    120 268 250 . Then observe some data insertion process , Changes in nodes .image-20220420102031119

  • In the end, we see ,B+Tree And B-Tree comparison , There are three main differences :

    • All the data will appear in the leaf node .
    • Leaf nodes form a one-way linked list .
    • Non leaf nodes only serve to index data , The specific data is stored in the leaf node .
  • The structure we see above is standard B+Tree Data structure of , Next , Let's see MySQL After optimization in B+Tree.

    • MySQL Index data structure for classic B+Tree optimized . In the original B+Tree On the basis of , Add a pointer to the linked list of adjacent leaf nodes , So we have a sequence pointer B+Tree, Improve the performance of interval access , Conducive to sorting .image-20220420102140679

2.5 Hash

  • MySQL In addition to support B+Tree Indexes , It also supports an index type —Hash Indexes .
  1. structure :

    • Hash index is to use a certain hash Algorithm , Convert key values to new hash value , Map to the corresponding slot , Then stored in hash In the table .[ Failed to transfer the external chain picture , The origin station may have anti-theft chain mechanism , It is suggested to save the pictures and upload them directly (img-cbtQfHK5-1650629270340)(https://gitee.com/hzg-sss/typora-picture/raw/master/20220420102305.png)]

    • If two ( Or more ) Key value , Map to the same slot , They produced hash Conflict ( Also known as hash Collision ), It can be solved by linked list .[ Failed to transfer the external chain picture , The origin station may have anti-theft chain mechanism , It is suggested to save the pictures and upload them directly (img-z78lMX5p-1650629270340)(https://gitee.com/hzg-sss/typora-picture/raw/master/20220420102342.png)]

  2. characteristic :

    • Hash Indexes can only be used for peer-to-peer comparisons (=,in), Range query is not supported (between,>,< ,…)
    • Cannot complete sort operation with index
    • High query efficiency , Usually ( non-existent hash Conflict situation ) It only needs one search , Efficiency is usually higher than B+tree Cable lead
  3. Storage engine support

  • stay MySQL in , Support hash The index is Memory Storage engine . and InnoDB It has adaptive function hash function ,hash The index is InnoDB The storage engine is based on B+Tree The index is automatically built under specified conditions .

Thinking questions : Why? InnoDB The storage engine chooses to use B+tree Index structure ?

  • Relative to a binary tree , Fewer levels , High search efficiency ;

  • about B-tree, Whether leaf nodes or non leaf nodes , Data will be saved , This causes a page to store

The key value of is reduced , The pointer decreases , Save a lot of data as well , Can only increase the height of the tree , Resulting in reduced performance ;

  • relative Hash Indexes ,B+tree Support range matching and sorting operations ;

3. Index classification

1. Index type

  • stay MySQL database , The specific types of indexes are mainly divided into the following categories : primary key 、 unique index 、 General index 、 Full-text index .

    • primary key :

      • meaning : The index created for the primary key in the table
      • characteristic : Automatically created by default , There can only be one
      • keyword :PRIMARY
    • unique index :

      • meaning : Avoid duplicate values in a data column in the same table
      • characteristic : There can be multiple
      • keyword :UNIQUE
    • General index :

      • meaning : Quickly locate specific data
      • characteristic : There can be multiple
      • keyword : nothing
    • Full-text index :

      • meaning : The full-text index looks up the keywords in the text , Instead of comparing the values in the index
      • characteristic : There can be multiple
      • keyword :FULLTEXT

2. Clustered index & Secondary indexes

  • And in the InnoDB In the storage engine , According to the storage form of the index , It can be divided into the following two types :

    • Clustered index (ClusteredIndex)
      • meaning : Put data storage and index together , The leaf node of the index structure holds the row data
      • characteristic : There has to be , And there's only one
    • General index :
      • meaning : Separate data from index , The leaf node of the index structure is associated with the corresponding primary key
      • characteristic : There can be multiple
  • Clustered index selection rules :

    • If there is a primary key , A primary key index is a clustered index .

    • If there is no primary key , The first unique... Will be used (UNIQUE) Index as clustered index .

    • If the table does not have a primary key , Or there is no suitable unique index , be InnoDB It will automatically generate a rowid As a hidden clustered index .

  • The specific structures of clustered index and secondary index are as follows :image-20220420103237255

    • The data of this row is hung under the leaf node of the clustered index .

    • The primary key value corresponding to the field value is hung under the leaf node of the secondary index .

  • Next , Let's analyze , When we execute the following SQL When the sentence is , What is the specific search process like .image-20220420103346614

  • The specific process is as follows :

    • Because it's based on name Field to query , So first according to name='Arm’ To name Field in the secondary index . But in the secondary index, you can only find Arm The corresponding primary key value 10.
    • Because the data returned by the query is *, So at this time , You also need to use the primary key value 10, Find in the clustered index 10 Corresponding records , Finally find 10 The corresponding line row.
    • Finally get the data of this line , Just go back .

Return to the table for query : This first to look up data in the secondary index , Find primary key value , Then it is added to the clustered index according to the primary key value , How to get the data , It is called back to table query .

Thinking questions :

Here are two SQL sentence , The execution efficiency is high ? Why? ?

  • A. select * from user where id = 10 ;

  • B. select * from user where name = ‘Arm’ ;

remarks : id Primary key ,name Fields are indexed ;

answer :

  • A The execution performance of the statement is higher than B sentence .

  • because A Statement goes directly to the clustered index , Direct return data . and B The statement needs to query name The secondary index of the field , Then query the clustered index , That is, you need to query back to the table .

4. Index Syntax

  1. Create index
CREATE [ UNIQUE | FULLTEXT ] INDEX index_name ON table_name (index_col_name,... ) ;
  1. Look at the index
SHOW INDEX FROM table_name ;
  1. Delete index
DROP INDEX index_name ON table_name ;

Case presentation

  • Let's create a table first tb_user, And query the test data .
create table tb_user(
  id int primary key auto_increment comment ' Primary key ', 
  name varchar(50) not null comment ' user name ', 
  phone varchar(11) not null comment ' cell-phone number ', 
  email varchar(100) comment ' mailbox ', 
  profession varchar(11) comment ' major ', 
  age tinyint unsigned comment ' Age ', 
  gender char(1) comment ' Gender  , 1:  male , 2:  Woman ', 
  status char(1) comment ' state ', 
  createtime datetime comment ' Creation time ' 
) comment ' System user table '; 
INSERT INTO tb_user (name, phone, email, profession, age, gender, status, createtime) VALUES (' Lyu3 bu4 ', '17799990000', '[email protected]', ' Software Engineering ', 23, '1', '6', '2001-02-02 00:00:00'); 
INSERT INTO tb_user (name, phone, email, profession, age, gender, status, createtime) VALUES (' Cao Cao ', '17799990001', '[email protected]', ' Communication Engineering ', 33, '1', '0', '2001-03-05 00:00:00'); 
INSERT INTO tb_user (name, phone, email, profession, age, gender, status, createtime) VALUES (' zhaoyun ', '17799990002', '[email protected]', ' English ', 34, '1', '2', '2002-03-02 00:00:00'); 
INSERT INTO tb_user (name, phone, email, profession, age, gender, status, createtime) VALUES (' The Monkey King ', '17799990003', '[email protected]', ' Project cost ', 54, '1', '0', '2001-07-02 00:00:00'); 
INSERT INTO tb_user (name, phone, email, profession, age, gender, status, createtime) VALUES (' Hua mu LAN ', '17799990004', '[email protected]', ' Software Engineering ', 23, '2', '1', '2001-04-22 00:00:00'); 
INSERT INTO tb_user (name, phone, email, profession, age, gender, status, createtime) VALUES (' Big Joe ', '17799990005', '[email protected]', ' dance ', 22, '2', '0', '2001-02-07 00:00:00'); 
INSERT INTO tb_user (name, phone, email, profession, age, gender, status, createtime) VALUES (' Luna ', '17799990006', '[email protected]', ' Applied Mathematics ', 24, '2', '0', '2001-02-08 00:00:00'); 12345678910111213141516171819
INSERT INTO tb_user (name, phone, email, profession, age, gender, status, createtime) VALUES (' Cheng Yaojin ', '17799990007', '[email protected]', ' chemical ', 38, '1', '5', '2001-05-23 00:00:00'); 
INSERT INTO tb_user (name, phone, email, profession, age, gender, status, createtime) VALUES (' Xiang yu ', '17799990008', '[email protected]', ' Metal material ', 43, '1', '0', '2001-09-18 00:00:00'); 
INSERT INTO tb_user (name, phone, email, profession, age, gender, status, createtime) VALUES (' White ', '17799990009', '[email protected]', ' Mechanical engineering and its automation   turn ', 27, '1', '2', '2001-08-16 00:00:00'); 
INSERT INTO tb_user (name, phone, email, profession, age, gender, status, createtime) VALUES (' Han xin ', '17799990010', '[email protected]', ' Inorganic nonmetallic material worker   cheng ', 27, '1', '0', '2001-06-12 00:00:00'); 
INSERT INTO tb_user (name, phone, email, profession, age, gender, status, createtime) VALUES (' Jingke ', '17799990011', '[email protected]', ' accounting ', 29, '1', '0', '2001-05-11 00:00:00'); 
INSERT INTO tb_user (name, phone, email, profession, age, gender, status, createtime) VALUES (' The king of Lanling ', '17799990012', '[email protected]', ' Project cost ', 44, '1', '1', '2001-04-09 00:00:00'); 
INSERT INTO tb_user (name, phone, email, profession, age, gender, status, createtime) VALUES (' Berserker iron ', '17799990013', '[email protected]', ' Applied Mathematics ', 43, '1', '2', '2001-04-10 00:00:00'); 
INSERT INTO tb_user (name, phone, email, profession, age, gender, status, createtime) VALUES (' The sable cicada ', '17799990014', '[email protected]', ' Software Engineering ', 40, '2', '3', '2001-02-12 00:00:00'); 
INSERT INTO tb_user (name, phone, email, profession, age, gender, status, createtime) VALUES (' Daji ', '17799990015', '[email protected]', ' Software Engineering ', 31, '2', '0', '2001-01-30 00:00:00'); 
INSERT INTO tb_user (name, phone, email, profession, age, gender, status, createtime) VALUES (' Mi month ', '17799990016', '[email protected]', ' industrial economy ', 35, '2', '0', '2000-05-03 00:00:00'); 
INSERT INTO tb_user (name, phone, email, profession, age, gender, status, createtime) VALUES (' Ying Zheng ', '17799990017', '[email protected]', ' chemical ', 38, '1', '1', '2001-08-08 00:00:00'); 
INSERT INTO tb_user (name, phone, email, profession, age, gender, status, createtime) VALUES (' Judge dee ', '17799990018', '[email protected]', ' International Trade ', 30, '1', '0', '2007-03-12 00:00:00');
INSERT INTO tb_user (name, phone, email, profession, age, gender, status, createtime) VALUES (' Angela ', '17799990019', '[email protected]', ' city planning ', 51, '2', '0', '2001-08-15 00:00:00'); 
INSERT INTO tb_user (name, phone, email, profession, age, gender, status, createtime) VALUES (' Dianwei ', '17799990020', '[email protected]', ' city planning ', 52, '1', '2', '2000-04-12 00:00:00'); 
INSERT INTO tb_user (name, phone, email, profession, age, gender, status, createtime) VALUES (' Lian po ', '17799990021', '[email protected]', ' Civil Engineering ', 19, '1', '3', '2002-07-18 00:00:00'); 
INSERT INTO tb_user (name, phone, email, profession, age, gender, status, createtime) VALUES (' Hou Yi ', '17799990022', '[email protected]', ' Urban landscape ', 20, '1', '0', '2002-03-10 00:00:00'); 
INSERT INTO tb_user (name, phone, email, profession, age, gender, status, createtime) VALUES (' Jiang Ziya ', '17799990023', '[email protected]', ' Project cost ', 29, '1', '4', '2003-05-26 00:00:00');
  • When the data is ready , Next , Let's complete the following requirements :

    • A. name The field is the name field , The value of this field may be repeated , Create an index for this field .
    CREATE INDEX idx_user_name ON tb_user(name);
    
    • B. phone The value of the mobile phone number field , Right and wrong , And only , Create a unique index for this field .
    CREATE UNIQUE INDEX idx_user_phone ON tb_user(phone);
    
    • C. by profession、age、status Create a federated index .
    CREATE INDEX idx_user_pro_age_sta ON tb_user(profession,age,status);
    
    • D. by email Establish an appropriate index to improve query efficiency .
    CREATE INDEX idx_email ON tb_user(email);
    
    • After completing the above requirements , Let's see tb_user All index data of the table .
    show index from tb_user;
    
image-20220420221201267

5.SQL Performance analysis

1.SQL Frequency of execution

MySQL After successful client connection , adopt show [session|global] status Command can provide server status information . By the following instructions , You can view the INSERT、UPDATE、DELETE、SELECT Frequency of visits :

-- session  Is to view the current session  ; 
-- global  Global query is data  ; 
SHOW GLOBAL STATUS LIKE 'Com_______';

image-20220420221755854

  • Com_delete: Number of deletions

  • Com_insert: Number of inserts

  • Com_select: Query times

  • Com_update: Number of updates

  • We can perform several more query operations in the current database , Then check the execution frequency again , have a look Com_select Whether the parameters will change .

image-20220421081853490

Through the above instructions , We can see whether the current database is based on query , Or mainly add, delete and modify , So as to provide data

Provide reference basis for library optimization . If it is mainly based on addition, deletion and modification , We can consider not optimizing the index . If so

Query based , Then we should consider optimizing the index of the database .

Then through the query SQL Frequency of execution of , We can know whether the current database is mainly for addition, deletion and modification , Or mainly query . If it is mainly based on query , How can we optimize those query statements ? The number of times we can use the slow query log .

2. Slow query log

  • The slow query log records all execution times that exceed the specified parameters (long_query_time, Company : second , Default 10 second ) All of the SQL Statement log .

  • MySQL The slow query log is not enabled by default , We can look at the system variables slow_query_log. image-20220421082255872

  • If you want to start slow query log , Need to be in MySQL Configuration file for (/etc/my.cnf) The following information is configured in :

    #  Turn on MySQL Slow log query switch  
    slow_query_log=1 
    #  Set the time of slow log to 2 second ,SQL Statement execution time exceeds 2 second , It will be regarded as slow query , Log slow queries  
    long_query_time=2
    
  • If you want to start slow query log , Need to be in MySQL Configuration file for (/etc/my.cnf) The following information is configured in :

    # /var/lib/mysql/localhost-slow.log.
    # Restart MySQL The server 
    systemctl restart mysqld
    
  • then , Check the switch again , The slow query log has been opened .image-20220421082645452

test :

  • The implementation is as follows SQL sentence :
--  This article SQL The execution efficiency is quite high ,  Execution time consuming  0.00sec
select * from tb_user;  
--  because tb_sku In the table ,  Pre deposited 1000w The record of , count once , Time consuming  13.35sec
select count(*) from tb_sku; 

[ Failed to transfer the external chain picture , The origin station may have anti-theft chain mechanism , It is suggested to save the pictures and upload them directly (img-d0zLtmzz-1650629270341)(https://gitee.com/hzg-sss/typora-picture/raw/master/20220421082932.png)]

  • Check the slow query log :

    • In the end, we found out , In slow query log , Only the execution time will be recorded, which exceeds our preset time (2s) Of SQL, Perform faster SQL It won't be recorded .image-20220421083030520
  • That such , Log by slow query , We can locate those with low execution efficiency SQL, Thus targeted optimization .

3.profile**** details

  • show profiles Be able to do SQL Optimization helps us understand where the time is spent . adopt have_profiling

    Parameters , Be able to see the present MySQL Do you support profile operation :

    SELECT @@have_profiling ;
    

image-20220421083309826

  • You can see , At present MySQL It's supporting profile Operation of the , But the switch is off . Can pass set Statements in

    session/global Level on profiling:

    SET profiling = 1;
    
  • The switch has been turned on , Next , What we do SQL sentence , Will be MySQL Record , And record where the execution time is spent . We directly execute the following SQL sentence :

    select * from tb_user; 
    select * from tb_user where id = 1; 
    select * from tb_user where name = ' White '; 
    select count(*) from tb_sku;
    
  • Execute a series of business SQL The operation of , Then check the execution time of the instruction through the following instructions :

    --  Check each one SQL The time-consuming basic situation of  
    show profiles; 
    
    --  View specified query_id Of SQL The time-consuming situation of each stage of the statement  
    show profile for query query_id; 
    
    --  View specified query_id Of SQL sentence CPU Usage situation  
    show profile cpu for query query_id;
    
  • Check each one SQL Time consuming : image-20220421083719165

  • View specified SQL The time-consuming situation of each stage : image-20220421083750707

4.explain

  • EXPLAIN perhaps DESC Command acquisition MySQL How to execute SELECT Statement information , Included in SELECT How tables are joined and the order in which they are joined during statement execution .

    • grammar :

      --  Directly in select Add the keyword before the statement  
      explain / desc EXPLAIN SELECT  Field list  FROM  Table name  WHERE  Conditions  ;
      
    • result

      image-20220421083951942

  • Explain The meaning of each field in the execution plan :

    • id : select The serial number of the query , Represents execution in a query select Clause or the order of the operation table (id identical , Execution order from top to bottom ;id Different , The bigger the value is. , Execute first ).
    • select_type : Express SELECT The type of , Common values are SIMPLE( A simple watch , That is, no table join or subquery is used )、PRIMARY( Main query , That is, the outer query )、UNION(UNION The second or subsequent query statement in )、SUBQUERY(SELECT/WHERE Then it contains sub queries ) etc.
    • type : Indicates the connection type , The connection types with good to poor performance are NULL、system、const、eq_ref、ref、range、 index、all .
    • possible_key : Displays the indexes that may be applied to this table , One or more .
    • key : Actual index used , If NULL, No index is used .
    • key_len : Represents the number of bytes used in the index , This value is the maximum possible length of the index field , It's not the actual length , Without losing accuracy , The shorter the length, the better .
    • rows : MySQL The number of rows that you think you must execute the query , stay innodb In the engine's table , Is an estimate , It may not always be accurate .
    • filtered : Represents the number of rows returned as a percentage of the number of rows to be read , filtered The greater the value, the better .

6. Index usage

1. Verify index efficiency

  • Before explaining the principles of index use , Let's start with a simple case , To verify the index , See if you can improve data query performance by indexing . During the demonstration , Let's use the form we prepared before tb_sku , In this table 1000w The record of .image-20220421084606428

  • In this list id Primary key , There's a primary key index , Other fields are not indexed . Let's first query one of the records , Look at the fields inside , The implementation is as follows SQL:

    select * from tb_sku where id = 1\G;
    

    image-20220421084714502

  • You can see that even if there is 1000w The data of , according to id Data query , The performance is still very fast , Because the primary key id It's indexed . So next , Let's go again according to sn Field to query , The implementation is as follows SQL:

    SELECT * FROM tb_sku WHERE sn = '100000003145001';
    

    image-20220421084848680

  • We can see the basis sn Field to query , The query returned a piece of data , The result is time consuming 20.78sec, Because of sn No index , And the query efficiency is very low . Then we can aim at sn Field , Build an index , After indexing , We are again based on sn The query , Let's take a look at the query time .

  • Create index :

    create index idx_sku_sn on tb_sku(sn) ;
    
  • And then do the same again SQL sentence , Look again SQL Time consuming .

    SELECT * FROM tb_sku WHERE sn = '100000003145001';
    

    image-20220421085228578

  • We will obviously see ,sn After the field is indexed , Query performance is greatly improved . Before and after indexing , The query time is not an order of magnitude .

2. The leftmost prefix rule

  • If you index multiple columns ( Joint index ), Follow the leftmost prefix rule . The leftmost prefix rule means that the query starts from the leftmost column of the index , And don't skip columns in the index . If you jump a column , The index will be partially invalidated ( The following field index is invalid ).

  • With tb_user Table as an example , Let's take a look at the previous tb_user The index created by the table .image-20220421085342854

  • stay tb_user In the table , There is a joint index , This joint index involves three fields , The order is :profession,age,status.

  • For the leftmost prefix, the law refers to , When inquiring , Leftmost column , That is to say profession There must be , Otherwise, all indexes will be invalid . And you can't skip a column in the middle , Otherwise, the field index behind the column will be invalidated . Next , Let's demonstrate several groups of cases , Take a look at the specific implementation plan :

  • explain select * from tb_user where profession = ' Software Engineering ' and age = 31 and status = '0';
    

    image-20220421085502536

  • explain select * from tb_user where profession = ' Software Engineering ' and age = 31;
    

    image-20220421085549878

  • explain select * from tb_user where profession = ' Software Engineering ';
    

    image-20220421085816429

  • Of the above three sets of tests , We found that as long as the leftmost field of the joint index profession There is , The index will take effect , But the length of the index is different . And tested by the above three groups , We can also speculate that profession The length of the field index is 47、age The length of the field index is 2、status The length of the field index is 5.

  • explain select * from tb_user where age = 31 and status = '0';
    

    image-20220421085949251

  • explain select * from tb_user where status = '0';
    

    [ Failed to transfer the external chain picture , The origin station may have anti-theft chain mechanism , It is suggested to save the pictures and upload them directly (img-x3K6cRZ5-1650629270343)(https://gitee.com/hzg-sss/typora-picture/raw/master/20220421090046.png)]

  • And through the above two sets of tests , We can also see that the index is not effective , The reason is that the leftmost prefix rule is not satisfied , The leftmost column of the union index profession non-existent .

  • explain select * from tb_user where profession = ' Software Engineering ' and status = '0';
    

    [ Failed to transfer the external chain picture , The origin station may have anti-theft chain mechanism , It is suggested to save the pictures and upload them directly (img-2Jvc6fUl-1650629270343)(https://gitee.com/hzg-sss/typora-picture/raw/master/20220421090126.png)]

  • Aforementioned SQL When inquiring , There is profession Field , The leftmost column exists , The index satisfies the basic condition of the leftmost prefix rule . But when querying , Skip the age This column , So the following column indexes will not be used , That is, the index part takes effect , So the length of the index is 47.

3. Range queries

  • In the union index , Range query appears (>,<), The column index on the right side of the range query is invalid .

  • explain select * from tb_user where profession = ' Software Engineering ' and age > 30 and status = '0';
    

    image-20220421090249608

  • When range queries use > or < when , Let's go , But the length of the index is 49, It indicates that the query range is on the right status Fields are not indexed .

  • explain select * from tb_user where profession = ' Software Engineering ' and age >= 30 and status = '0';
    

    image-20220421090328496

  • When range queries use >= or <= when , Let's go , But the length of the index is 54, It means that all fields are indexed .

  • therefore , When business allows , Use as much as possible similar to >= or <= This kind of range query , Avoid using > or < .

4. Index failure

1. Column operation of index

  • Do not operate on index columns , Index will fail .

  • Case study :

    • stay tb_user In the table , In addition to the union index described earlier , There is also an index , yes phone Single column index of the field .

      image-20220422190102080

    • A. When according to phone When performing equivalent matching query on fields , The index works .

      explain select * from tb_user where phone = '17799990015';
      

      image-20220422190212739

    • B. When according to phone After the function operation of the field , Index failure .

      explain select * from tb_user where substring(phone,10,2) = '15';
      

      image-20220422190307444

2. String without quotes

  • When using string type fields , Without quotes , Index will fail .

  • Case study :

    • We use two sets of examples , Let's look at fields of string type , The difference between single quotation mark and no single quotation mark :

      explain select * from tb_user where profession = ' Software Engineering ' and age = 31 and status = '0'; 
      
      explain select * from tb_user where profession = ' Software Engineering ' and age = 31 and status = 0;
      

      image-20220422190510138

      explain select * from tb_user where phone = '17799990015'; 
      
      explain select * from tb_user where phone = 17799990015;
      

      image-20220422191534513

    • After the above two sets of examples , We will clearly find , If the string is not in single quotation marks , For query results , It doesn't matter , But the database has implicit type conversion , Index will fail .

3. Fuzzy query

  • If it's just tail blur matching , The index will not fail . If it's a fuzzy head match , Index failure .

  • Case study :

    • Let's take a look at these three SQL Statement execution effect , Take a look at its implementation plan :

    • Because in the following query statement , Are all based on profession A field , Conform to the leftmost prefix rule , The union index is valid , Let's mainly take a look at , Fuzzy query ,% Add before keyword , And the influence added after the keyword .

      explain select * from tb_user where profession like ' Software %'; 
      
      explain select * from tb_user where profession like '% engineering ';
      
      explain select * from tb_user where profession like '% work %';
      

      image-20220422191834671

    • After the above tests , We found that , stay like Fuzzy query , Add the keyword after %, The index can work . And if you add... In front of the keyword %, The index will fail .

4.or Connection condition

  • Case study

    • use or The conditions of separation , If or The columns in the previous condition are indexed , And there's no index in the next column , Then the indexes involved will not be used .

      explain select * from tb_user where id = 10 or age = 23; 
      
      explain select * from tb_user where phone = '17799990017' or age = 23;
      

      image-20220422192053044

    • because age No index , So even id、phone There is an index , Indexes will also fail . So you need a needle for age Also index . then , We can age Field indexing .

      create index idx_user_age on tb_user(age);
      

      image-20220422192151900

    • After indexing , Let's do the above again SQL sentence , Look at the changes in the implementation plan before and after .

      image-20220422192247099

    • Final , We found that , When or The conditions of connection , When the left and right fields have indexes , The index will take effect .

5. Data distribution affects

  • If MySQL Evaluation uses indexes more slowly than full tables , Index is not used .

  • Case study

    select * from tb_user where phone >= '17799990005';
    
    select * from tb_user where phone >= '17799990015';
    

    image-20220422192440409image-20220422192500534

  • After testing, we found that , same SQL sentence , Only the field values passed in are different , The final implementation plan is also completely different , Why is that ?

    • Because of MySQL In the query , Will evaluate the efficiency of using the index and the efficiency of full table scanning , If you walk the whole table, scanning is faster , Then abandon the index , Take a full scan . Because the index is used to index a small amount of data , If large quantities of data are returned through index query , It's not as fast as scanning the whole table , At this point, the index will be invalidated .

5.SQL**** Tips

  • Let's follow the table above , Check the index image-20220422192830517

  • Put the above idx_user_age, idx_email The two indexes used in the previous test are deleted directly .

    drop index idx_user_age on tb_user; 
    
    drop index idx_email on tb_user;
    
  • A. perform SQL : explain select * from tb_user where profession = ‘ Software Engineering ’;

    image-20220422192959938

    • The query left the union index .
  • B. perform SQL, establish profession A single column index of :create index idx_user_pro ontb_user(profession);

    image-20220422193049182

  • C. After creating a single column index , Re execution A Medium SQL sentence , View execution plan , See which index to go .

    image-20220422193128444

  • test result , We can see ,possible_keys in idx_user_pro_age_sta,idx_user_pro Both indexes may use , Final MySQL I chose idx_user_pro_age_sta Indexes . This is a MySQL The result of automatic selection .

  • that , Can we check when , Specify which index to use by yourself ? The answer is yes , At this point, you can use MySQL Of SQL Prompt to complete . Next , Introduce to you SQL Tips .

    • SQL Tips , Is an important means to optimize the database , Simply speaking , Is in the SQL Add some human prompts in the statement to optimize the operation .

      1. use index : Suggest MySQL Which index to use to complete this query ( It's just advice ,mysql Internal evaluation will be conducted again ).
      explain select * from tb_user use index(idx_user_pro) where profession = ' Software engineering   cheng ';
      
      1. ignore index : Ignore the specified index .
      explain select * from tb_user ignore index(idx_user_pro) where profession = ' Software engineering   cheng ';
      
      1. force index : Force index .
      explain select * from tb_user force index(idx_user_pro) where profession = ' Software engineering   cheng ';
      
    • Examples demonstrate

      • A. use index

        explain select * from tb_user use index(idx_user_pro) where profession = ' Software engineering   cheng ';
        

        image-20220422193507522

      • B. ignore index

        explain select * from tb_user ignore index(idx_user_pro) where profession = ' Software engineering   cheng ';
        

        [ Failed to transfer the external chain picture , The origin station may have anti-theft chain mechanism , It is suggested to save the pictures and upload them directly (img-2zLjTrzT-1650629270345)(https://gitee.com/hzg-sss/typora-picture/raw/master/20220422193556.png)]

      • C. force index

        explain select * from tb_user force index(idx_user_pro_age_sta) where profession = ' Software Engineering ';
        

        image-20220422193635826

6 . Overlay index

  • Try to use overlay index , Reduce select *. So what is an overlay index ? Overlay index means The query uses an index , And the columns that need to be returned , All can be found in this index .

  • Let's look at a group SQL Implementation plan of , Look at the differences in the implementation plan , Then we will do a specific analysis .

    explain select id, profession from tb_user where profession = ' Software Engineering ' and age = 31 and status = '0' ; 
    
    explain select id,profession,age, status from tb_user where profession = ' Software Engineering ' and age = 31 and status = '0' ; 
    
    explain select id,profession,age, status, name from tb_user where profession = ' soft   Piece of work ' and age = 31 and status = '0' ; 
    
    explain select * from tb_user where profession = ' Software Engineering ' and age = 31 and status = '0';
    
    • The above items SQL The execution result of is :

    image-20220422193914251

  • From the above implementation plan, we can see , These four SQL All indicators in front of the execution plan of the statement are the same , I can't see the difference . But at this time , Our main focus is on the latter Extra, The first two days SQL As the result of the Using where; Using Index ; And the last two SQL As the result of the : Using index condition .

    • Using where; Using Index : Search uses index , But all the data needed can be found in the index column , So you don't need to go back to the table to query the data
    • Using index condition : Search uses index , But you need to return the table to query the data
  • because , stay tb_user There is a union index in the table idx_user_pro_age_sta, The index is associated with three fields profession、age、status, And this index is also a secondary index , Therefore, the primary key of this row is hung under the leaf node id. So when we query the returned data in id、profession、age、status In , Then go directly to the secondary index and directly return the data . If it goes beyond this range , You need to get the primary key id, Then scan the clustered index , Get additional data , This process is back to the table . And if we keep using select * The query returns all field values , It is easy to cause back to table query ( Unless the query is based on the primary key , Only the clustered index will be scanned at this time ).

  • For a clearer understanding , What is an overlay index , What is return table query , Let's take a look at the following group SQL Implementation process of .

    A. Table structure and index diagram :

    image-20220422194310303

  • id It's the primary key , Is a clustered index . name The field establishes a general index , Is a secondary index ( Secondary index ).

    B. perform SQL : select * from tb_user where id = 2;

    image-20220422194215721

  • according to id Inquire about , Go directly to the clustered index query , An index scan , Direct return data , High performance

    C. perform SQL:selet id,name from tb_user where name = ‘Arm’; image-20220422194505392

  • Although it is based on name A field , Query secondary index , But because the query returns in the field id,name, stay name In the secondary index of , These two values can be obtained directly , Because the overlay index , So there is no need to query back to the table , High performance .

    D. perform SQL:selet id,name,gender from tb_user where name = ‘Arm’;image-20220422194615174

  • Because in name In the secondary index of , It doesn't contain gender, therefore , Two index scans are required , That is, you need to query back to the table , The performance is relatively poor .

Thinking questions :

  • A watch , There are four fields (id, username, password, status), Because of the amount of data , The following SQL Statement optimization , How to proceed is the best solution :
select id,username,password from tb_user where username ='itcast';
  • answer : Aim at username, password Set up a joint index , sql by : create index

idx_user_name_pass on tb_user(username,password);

  • This can avoid the above SQL sentence , During the query , The back table query... Appears .

7. Prefix index

  • When the field type is string (varchar,text,longtext etc. ) when , Sometimes you need to index long strings , This makes the index big , When inquiring , Waste a lot of disk IO, Affecting query efficiency . At this point, you can prefix only part of the string with , Index , This can greatly save index space , To improve index efficiency .
  1. grammar :
create index idx_xxxx on table_name(column(n)) ;
  • Example :

    • by tb_user Tabular email Field , Set the length to 5 Prefix index of .

      create index idx_email_5 on tb_user(email(5));
      

      [ Failed to transfer the external chain picture , The origin station may have anti-theft chain mechanism , It is suggested to save the pictures and upload them directly (img-VbsrO5hw-1650629270346)(https://gitee.com/hzg-sss/typora-picture/raw/master/20220422195014.png)]

  1. Prefix length
  • It can be determined according to the selectivity of the index , Selectivity refers to index values that are not repeated ( base ) And the total number of records in the data table , The higher the index selectivity, the higher the query efficiency , The only index selectivity is 1, This is the best index selectivity , Performance is also the best .

    select count(distinct email) / count(*) from tb_user ; 
    
    select count(distinct substring(email,1,5)) / count(*) from tb_user ;
    
  1. The query process of prefix index

    image-20220422195234707

8. Single column index and joint index

  • Single index : That is, an index contains only a single column .

  • Joint index : That is, an index contains multiple columns .

  • Let's take a look first tb_user The current index in the table :image-20220422195352249

    • In the queried index , Existing single column index , There is also a joint index .
  • Next , Let's execute a SQL sentence , Look at its implementation plan :

    image-20220422195433512

    • From the above implementation plan, we can see , stay and Two fields connected phone、name There are single column indexes on , But in the end mysql Only one index will be selected , in other words , The index of only one field , At this time, the table query will be returned .
  • Then , Let's create another phone and name Field to query the execution plan .

    create unique index idx_user_phone_name on tb_user(phone,name);
    

    image-20220422195540705

    • here , When inquiring , Just left, joint index , And include in the joint index phone、name Information about , The corresponding primary key is hung under the leaf node id, Therefore, there is no need to query back to the table .

In the business scenario , If there are multiple query criteria , Consider when indexing fields , It is recommended to establish a joint index , Instead of a single column index .

  • If the query uses a federated index , The specific structural diagram is as follows :

    image-20220422195649885

7. Index design principles

  1. For large amount of data , The tables that are frequently queried are indexed .

  2. For often used as query criteria (where)、 Sort (order by)、 grouping (group by) Index the fields of the operation .

  1. Try to select highly differentiated columns as indexes , Try to build a unique index , The more distinguishable , The more efficient the index is .
  1. If it is a string type field , The length of the field is long , You can focus on the characteristics of the field , Building prefix index .

  2. Try to use a federated index , Reduce single column index , When inquiring , Joint indexes can often overwrite indexes , Save storage space , Avoid returning to your watch , Improve query efficiency .

  3. To control the number of indexes , The index is not that more is better , More indexes , The greater the cost of maintaining the index structure , It will affect the efficiency of addition, deletion and modification .

  4. If the index column cannot store NULL value , Please use... When creating the table NOT NULL Constrain it . When the optimizer knows whether each column contains NULL When the value of , It can better determine which index is most effectively used for queries .

版权声明
本文为[Crisp_ LF]所创,转载请带上原文链接,感谢
https://yzsam.com/2022/04/202204231727096349.html