当前位置:网站首页>2022-08-09 mysql/stonedb-subquery performance improvement-introduction

2022-08-09 mysql/stonedb-subquery performance improvement-introduction

2022-08-10 00:32:00 Emperor Zun Wu Shi

目录

摘要:

开场白:

一. subquery(子查询)是什么

子查询定义:

为什么需要子查询:

子查询的种类:

二. 为何subquery会成为性能瓶颈

mysql查询处理过程:

Factors that cause performance bottlenecks:

Caused by a resource conflict:

Caused by resource exhaustion:

nested loop引发的

mysql8 Subquery optimization enhancements

Join Algorithms

Join Optimizations

主要join算法对比:

stonedbThe slow subqueries that actually occur in :

三. StoneDB对于subquery的性能提升

Query Optimization Methodology

性能分析工具:

Take a slow subquery as an example to analyze the calming point:

慢子查询explain分析:

性能瓶颈点:

热力图分析:

解决性能瓶颈

optimized slowSQL的explain分析:

Time-consuming comparison after optimization:

Optimized heatmap analysis:

结束语:


摘要:

子查询作为SQL语言中重要的组成部分, Exactly what subqueries makeSQLThe language really becomes a structured query language,使SQLThe range of applications is unprecedented.尤其是OLAP场景,由于业务的复杂性,resulting in a large number of subqueries.Subqueries have also become the most complex and time-consuming part of a relational database to execute,How to improve the performance of subqueries,对于一款OLAPThe relational database plays a decisive role.

结合我们stonedbItems experience with subqueries actually optimized for subqueries,Starts a lecture series on subquery performance improvements,I hope you can learn from the actual optimization of sub-queries in our project,Can learn from experience,It also lets you know our company's focus on technology and the refinement of products.

开场白:

我先自我介绍一下吧,My name is Wu Shi,目前在stonedb开发部,Then take up the position of database R&D engineer.So keep doing itsubquerySubquery performance improves this thing.Subqueries are very important in the whole database,It is also a relatively core place involving the performance of the entire database.So it's specifically for this,And then combined with what you are doing and the problems that should be solved the most, A series of lectures were given,The purpose is, I just have an idea: Not just doing things right,更重要的,It's about doing things right the right way.

Make this thing public.The purpose is one aspect,Let everyone see the result of this thing and an output.More important,More importantly, let everyone see how we solve this problem,Then our way method,然后呢,给大家一个参考.

一. subquery(子查询)是什么

Before solving the subquery his performance issues,First of all, clearly define what the subquery is,Where is its border?如果根据mysql自己的定义,That is, the subquery is in a statementSELECT语句.

子查询定义:

A subquery is within a statementSELECT语句

A subquery is a SELECT statement within another statement.

例如:

SELECT * FROM t1 WHERE column1 = (SELECT column1 FROM t2);

为什么需要子查询:

  1. 允许结构化查询,so that each part of the statement can be isolated;
  2. 提供了执行操作的替代方法,Otherwise these operations require complex joins and unions;
  3. Subqueries are more readable than complex joins or unions.Exactly what subqueries makeSQL成为“结构化查询语言”.

子查询的种类:

  • Subquery as Scalar Operand
  • Comparisons Using Subqueries
  • Subqueries with ANY, IN, or SOME
  • Subqueries with ALL
  • Row Subqueries
  • Subqueries with EXISTS or NOT EXISTS
  • Correlated Subqueries

二. 为何subquery会成为性能瓶颈

mysql查询处理过程:

Before explaining why subqueries can be a performance bottleneck,It is necessary to understand in generalmysqlHow the query is handled.On the basis of understanding each step of query processing,Find out what is causing performance problems.

This chart should be very clear to everyone.Complete lexical analysis and syntax analysis and grammar check,执行预处理, Then to query optimization,Then enter the executor to execute.

Factors that cause performance bottlenecks:

Caused by a resource conflict:

  • when the number of users increases, The application requires a lot of concurrency management
  • Increased lock activity increases data consistency pressure
  • Increased operating system pressure
  • 由于数据量增加, Each transaction requires an increased number of data accesses
  • Poor query optimizer design results in more logic required to return the same number of rowsIO
  • Processing that should be parallel is performed serially

Caused by resource exhaustion:

  • Hardware resources are exhausted
  • Caused by a lot of table scansIO短缺
  • Improper allocation of memory causes pages to be swapped in and out
  • Too many processes and threads overload the operating system

nested loop引发的

Until MySQL 5.6 nested loop was also the only algorithm available. As the name suggests, it works by nesting loops with one loop for each table in the join.

MySQL :: MySQL 5.6 Reference Manual :: 8.8.2 EXPLAIN Output Format

mysql到5.6版本位置,只有nested loop策略.To put it bluntly, there are actually twofor循环,For example, let's look at the picture on the right.On the left is a table on the right,Left as oneoffer,While doing an inquiry,It is convenient to check every bank on the left,All the lines on the right,See if you qualify.Then you can look at the pseudocode on the left.It is actually onen的平方的复杂度.

This is a very crude algorithm.没有经过任何优化.It is similar to the complexity of bubble sort.However, it has been optimized to a certain extent in subsequent versions.

mysql8 Subquery optimization enhancements

mysql8 Subquery optimization enhancements

MySQL :: MySQL 8.0 Reference Manual :: 8.2.1 Optimizing SELECT Statements

Join Algorithms

  • Nested Loop
  • Block Nested Loop
  • Hash Join

Join Optimizations

  • Index Merge
  • Multi-Range Read (MRR)
  • Batched Key Access (BKA)
  • Other Optimizations

主要join算法对比:

stonedbThe slow subqueries that actually occur in :

Here are some slowdowns that actually occurred in our projectSQL,基于tpch标准测试,The sign that follows is tpchsequence of statements.

You can take a look at these categories of slow subqueries,思考下mysql中是如何执行的.

三. StoneDB对于subquery的性能提升

when the system is designed, The performance of the system is also designed together

——by《Oracle方法论》

在讲之前,Then I think it is necessary to establish a concept for performance issues,就是系统.Part of the design of the system performance itself,If a system performance problem occurs.It must have been when the system was designed,Some places are not considered or some places are poorly designed,lead to an outbreak,It didn't happen by accident.
So in the beginning when designing a system,Think of it as part of a whole.

Query Optimization Methodology

  1. Verify what the problem is,This includes gathering evidence of the problem,And define what is the need to consider the solved problem
  2. Determine the cause of performance issues
  3. 确定解决方案
  4. Verify the effect of the changes

需要注意:

  1. Performance analysis must speak with data,so called slow,Must be able to be characterized, 可以借助一些工具进行
  2. The problem-solving process is an upward spiral

性能分析工具:

  • explain分析
  • phhmyadmin或mysql enterprise monitor
  • perf火焰图
  • The business log is buried

Take a slow subquery as an example to analyze the calming point:

慢子查询explain分析:

性能瓶颈点:

  1. nested loop
  2. 全表扫描

热力图分析:

解决性能瓶颈

  1. Columnar storage to reduce diskIO
  2. 避免Nested loop
  3. Enhance parallel processing capabilities

optimized slowSQL的explain分析:

Time-consuming comparison after optimization:

  1. 从3分13秒优化到了14秒

Optimized heatmap analysis:

结束语:

This lecture introduces the performance bottleneck of subqueries as an introduction,And take a simple optimized subquery example,Show that subquery optimization can be donesqlHuge boost in execution.The follow-up will focus on the technical points,敬请关注!

原网站

版权声明
本文为[Emperor Zun Wu Shi]所创,转载请带上原文链接,感谢
https://yzsam.com/2022/222/202208092208124573.html