Why shouldn't you use SELECT * as the select list in MySQL? (continuously updated)

2022-04-21 23:24:00 【Stobli】

Please note that this article discusses only the InnoDB storage engine.

1. Reduce the number of table lookups

InnoDB has two kinds of indexes, the clustered index and secondary indexes, and each index corresponds to one B+ tree. The leaf nodes of the clustered index store the complete user records. The leaf nodes of a secondary index store only the index columns plus the primary key, and they are linked together in ascending order of the index column values. For the details, consult other material on InnoDB indexes.

When we execute a query like this:

SELECT
    *
FROM
    t1
WHERE
    key1 = "a"

Suppose t1 has 2 indexes: the clustered index and a secondary index idx_key1 (key1). Setting aside the optimizer's actual execution plan, the query is executed through idx_key1: InnoDB locates the first record in idx_key1 that satisfies key1 = "a". Besides the key1 value, an idx_key1 record also stores the corresponding primary key value. But SELECT * asks for every column of the row, and the index record holds only (key1, id), so InnoDB must immediately go back to the table with the primary key value to fetch the complete row before returning it to the client. Remember that the table lookup happens immediately for each match, rather than once for all matches after the index scan has finished. InnoDB does have strategies for reducing the number of table lookups in this situation, but we ignore them here.

Going back to the table: a record matched in the secondary index holds only the index columns and the primary key value. Using that primary key value to fetch the complete user record from the clustered index (that is, the primary key index) is called going back to the table, or a table lookup.

So if idx_key1 matches 10 records, there will be 10 table lookups. Table lookups are expensive: each one can require reading another page, and that page I/O hurts performance.

If we change the query to:

SELECT
    key1, id
FROM
    t1
WHERE
    key1 = "a"

Then the matching records in the secondary index already contain everything the query asks for. No table lookup is needed, which is much friendlier to performance.
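The difference can be sketched with a toy model in Python. This is only an illustration under made-up assumptions, not how InnoDB is implemented: the clustered index is modeled as a dict keyed by id, idx_key1 as a sorted list of (key1, id) pairs, and the table contents, the extra column other, and the helper names select_star and select_covered are all hypothetical.

```python
from bisect import bisect_left

# Hypothetical table t1: the clustered index maps primary key -> full row.
clustered = {
    1: {"id": 1, "key1": "a", "other": "x1"},
    2: {"id": 2, "key1": "b", "other": "x2"},
    3: {"id": 3, "key1": "a", "other": "x3"},
    4: {"id": 4, "key1": "c", "other": "x4"},
}

# Secondary index idx_key1: (key1, id) pairs kept in key1 order.
idx_key1 = sorted((row["key1"], pk) for pk, row in clustered.items())


def matching_entries(value):
    """Yield the (key1, id) index entries whose key1 equals value."""
    pos = bisect_left(idx_key1, (value,))
    while pos < len(idx_key1) and idx_key1[pos][0] == value:
        yield idx_key1[pos]
        pos += 1


def select_star(value):
    """SELECT *: every index match needs a lookup in the clustered index."""
    rows, lookups = [], 0
    for _, pk in matching_entries(value):
        rows.append(clustered[pk])  # going back to the table
        lookups += 1
    return rows, lookups


def select_covered(value):
    """SELECT key1, id: the index entry already holds both columns."""
    rows = [{"key1": k, "id": pk} for k, pk in matching_entries(value)]
    return rows, 0


print(select_star("a")[1])     # table lookups needed for SELECT *
print(select_covered("a")[1])  # table lookups needed for SELECT key1, id
```

In this model the number of lookups for SELECT * grows with the number of index matches, while the narrower select list never touches the clustered index.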

2. Make use of the index access method

Suppose table t1 has 2 indexes: the clustered index and a composite secondary index idx_key_part (key_part1, key_part2, key_part3).

When we execute a query like this:

SELECT
    *
FROM
    t1
WHERE
    key_part2 = "a"

Although a composite index exists, the access method is still ALL, a full table scan. In idx_key_part the records are sorted by key_part1 first, then by key_part2, and finally by key_part3. When the condition is only key_part2 = "a", InnoDB cannot tell where in the index the matching records are, so it has to scan and test every record.

For instance, suppose the index idx_key_part (key_part1, key_part2, key_part3) contains 4 records:

(1,b,X)
(1,c,X)
(2,c,X)
(3,a,X)

If we ignore the first element and look only at the second, we get (b, c, c, a). As we can see, it is not in order, so the only option is to scan everything and test the records one by one.

In this case InnoDB does not use idx_key_part, because scanning the secondary index and then going back to the table for every record would cost more than scanning the clustered index directly.
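The ordering argument can be checked with a few lines of Python. This is only a sketch: the four tuples are the ones from the example, and binary search over a sorted list stands in for descending the B+ tree.

```python
from bisect import bisect_left

# The four index entries from the example, in index order.
entries = [(1, "b", "X"), (1, "c", "X"), (2, "c", "X"), (3, "a", "X")]

# The entries are ordered by (key_part1, key_part2, key_part3).
assert entries == sorted(entries)

# Taken on its own, the second element is not ordered.
second = [e[1] for e in entries]
second_is_sorted = second == sorted(second)

# A condition on key_part1 can binary-search the ordered entries ...
first_match = bisect_left(entries, (2,))

# ... but a condition on key_part2 alone has to test every entry.
scan_matches = [e for e in entries if e[1] == "a"]

print(second, second_is_sorted)
print(entries[first_match], scan_matches)
```

A condition on key_part1 finds its starting position directly, while the condition on key_part2 only works as a filter applied to every entry.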

But if we change the query to the following:

SELECT
    key_part1, key_part2, key_part3, id
FROM
    t1
WHERE
    key_part2 = "a"

then idx_key_part can be used. The WHERE clause still cannot use the index to locate records, but idx_key_part already stores key_part1, key_part2, key_part3 and id, which is everything the query needs. A clustered index record stores more columns than a secondary index record, so each page holds fewer records and scanning it takes more page I/O. InnoDB therefore chooses to scan all of idx_key_part instead. This is the index access method, and because the index contains every requested column, it is what we call a covering index.
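A rough back-of-the-envelope calculation shows why the narrower index is cheaper to scan. The 16 KB page size is InnoDB's default; the row count and the two record sizes are assumptions chosen only for illustration, and page headers and fill factor are ignored.

```python
import math

PAGE_SIZE = 16 * 1024  # InnoDB's default page size in bytes


def leaf_pages(row_count, record_bytes):
    """Rough number of leaf pages, ignoring page headers and fill factor."""
    records_per_page = PAGE_SIZE // record_bytes
    return math.ceil(row_count / records_per_page)


ROWS = 1_000_000        # assumed number of rows in t1
CLUSTERED_RECORD = 200  # assumed bytes per full row in the clustered index
SECONDARY_RECORD = 40   # assumed bytes per idx_key_part entry

clustered_pages = leaf_pages(ROWS, CLUSTERED_RECORD)
secondary_pages = leaf_pages(ROWS, SECONDARY_RECORD)
print(clustered_pages, secondary_pages)
```

With these assumed sizes, a full scan of the secondary index reads roughly a fifth of the pages that a full scan of the clustered index would.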

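To check whether InnoDB really chooses idx_key_part for this query, run it under EXPLAIN: if it does, the type column shows index rather than ALL, and the Extra column includes Using index.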

3. Speed up table joins

Put simply, a join works by first choosing a driving table and running the query against it; each time a record in the driving table matches, the driven table is accessed immediately. Take the following query as an example:

SELECT 
    *
FROM 
    t1, t2
WHERE 
    t1.m1>1 AND t1.m1=t2.m2 AND t2.n2<'d'

Suppose t1 is chosen as the driving table, and 2 records in t1 satisfy t1.m1 > 1, namely

t1.m1 = 2
t1.m1 = 3

Then for each matching record, t2 is accessed once immediately, so the following queries are executed against t2:

SELECT * FROM t2 WHERE t2.m2=2 AND t2.n2<'d';
SELECT * FROM t2 WHERE t2.m2=3 AND t2.n2<'d';
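The per-row behaviour can be sketched as a nested loop in Python. The table contents, the column n1, and the helper query_t2 are hypothetical and serve only to make the access pattern visible; they are not taken from any real schema.

```python
# Hypothetical contents of the two tables.
t1 = [{"m1": 1, "n1": "a"}, {"m1": 2, "n1": "b"}, {"m1": 3, "n1": "c"}]
t2 = [{"m2": 2, "n2": "b"}, {"m2": 3, "n2": "c"}, {"m2": 4, "n2": "d"}]

driven_table_accesses = 0


def query_t2(m2_value):
    """Stands in for: SELECT * FROM t2 WHERE t2.m2 = ? AND t2.n2 < 'd'."""
    global driven_table_accesses
    driven_table_accesses += 1
    return [r for r in t2 if r["m2"] == m2_value and r["n2"] < "d"]


result = []
for outer in t1:                             # scan the driving table
    if outer["m1"] > 1:                      # t1.m1 > 1
        for inner in query_t2(outer["m1"]):  # access t2 right away
            result.append({**outer, **inner})

print(driven_table_accesses, result)
```

The counter increases once per matching t1 record, which is why the cost of each single access to the driven table matters so much.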

Note that t2 is queried immediately each time one t1 record matches, rather than after all matching t1 records have been found. Suppose table t2 has an index idx_m1_m2_n2. Obviously, neither of the 2 queries above can use this index to locate records. But if we change the query to

SELECT 
    t1.*, t2.m2
FROM 
    t1, t2
WHERE 
    t1.m1>1 AND t1.m1=t2.m2 AND t2.n2<'d'

Although the WHERE clause still cannot use the secondary index to locate records, InnoDB will choose to scan idx_m1_m2_n2 to execute the query on t2, because the index access method is now available and costs less than ALL. To repeat, index is more efficient than ALL because a secondary index stores more records per page, so even a full scan of the secondary index generates less page I/O than scanning the clustered index.

In short, it is better not to use * as the select list. List only the columns the query actually needs, which increases the chance that the index access method can be used.

4. Fit more records in the Join Buffer

As described above, in a join every matching record in the driving table requires one access to the driven table. If the driven table holds a lot of data and cannot use an index, this amounts to scanning the whole table N times (N being the number of matching records in the driving table). That I/O cost is far too high!

To reduce the number of accesses to the driven table, MySQL uses a strategy called the Join Buffer: a fixed-size block of memory requested before the query is executed.

First, several records from the driving table's result set are loaded into the Join Buffer. Then the driven table is scanned, and each driven table record is matched against all the driving table records in the Join Buffer in one go. This matching happens in memory, which can significantly reduce the I/O cost of accessing the driven table.
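A minimal sketch of this buffering idea in Python follows. The function name block_nested_loop, the sample values, and measuring the buffer capacity in rows rather than bytes are all assumptions made for the illustration; this is not MySQL's implementation.

```python
def block_nested_loop(driving_rows, driven_rows, buffer_capacity, match):
    """Join two row lists, scanning driven_rows once per buffer load."""
    result, driven_scans = [], 0
    for start in range(0, len(driving_rows), buffer_capacity):
        # Load the next batch of driving-table rows into the buffer.
        join_buffer = driving_rows[start:start + buffer_capacity]
        driven_scans += 1
        for inner in driven_rows:      # one pass over the driven table
            for outer in join_buffer:  # compared in memory
                if match(outer, inner):
                    result.append((outer, inner))
    return result, driven_scans


driving = list(range(1, 11))  # 10 hypothetical driving-table values
driven = [2, 4, 6, 8, 20]     # hypothetical driven-table values


def same(outer, inner):
    return outer == inner


_, scans_without_buffer = block_nested_loop(driving, driven, 1, same)
pairs, scans_with_buffer = block_nested_loop(driving, driven, 5, same)
print(scans_without_buffer, scans_with_buffer, pairs)
```

A capacity of 1 reproduces the plain row-by-row join, while a larger capacity yields the same matches with fewer passes over the driven table.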

The Join Buffer does not store every column of the driving table records. It stores only the columns that appear in the select list and in the filter conditions.

So when writing a query, try not to write SELECT *. Write only the columns you really need, which improves Join Buffer utilization: the fewer columns you select, the more records fit in the buffer, and the fewer I/O operations are needed!
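The effect of row width can be estimated with simple arithmetic. In the sketch below, 256 KB (262144 bytes) is MySQL's documented default for join_buffer_size; the number of driving rows and the two per-row sizes are assumptions for illustration, and any per-row overhead inside the buffer is ignored.

```python
import math

JOIN_BUFFER_BYTES = 256 * 1024  # default join_buffer_size in bytes


def driven_table_scans(driving_rows, bytes_per_buffered_row):
    """Scans of the driven table when each load holds as many rows as fit."""
    rows_per_load = JOIN_BUFFER_BYTES // bytes_per_buffered_row
    return math.ceil(driving_rows / rows_per_load)


DRIVING_ROWS = 100_000  # assumed size of the driving table's result set
ALL_COLUMNS = 512       # assumed bytes buffered per row with SELECT *
NEEDED_COLUMNS = 16     # assumed bytes buffered per row with only needed columns

scans_select_star = driven_table_scans(DRIVING_ROWS, ALL_COLUMNS)
scans_needed_only = driven_table_scans(DRIVING_ROWS, NEEDED_COLUMNS)
print(scans_select_star, scans_needed_only)
```

Under these assumptions the narrower select list cuts the number of driven-table scans from a few hundred to a handful.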

The Join Buffer size can be configured through the system variable join_buffer_size; the default is 256 KB.