当前位置:网站首页>HQL statement tuning
HQL statement tuning
2022-04-23 12:57:00 【Look at the data at the top of the mountain】
1、 De duplicating sentences ( use group by Instead of distinct)
Group by It also has the function of weight removal , The details are as follows
select distinct customer_id
from test_join_order;
=>
select customer_id
from test_join_order
group by customer_id;
In a huge amount of data ( And there are many duplicate values ) when , You can start with group by duplicate removal , stay count() Count , Efficiency is higher than count(distinct col)
create table if not exists test1(
id int
)
row format delimited
fields terminated by ',';
insert into test1 values(1);
insert into test1 values(2);
insert into test1 values(1);
insert into test1 values(3);
insert into test1 values(1);
insert into test1 values(2);
insert into test1 values(4);
select count(distinct(id))
from test1;
select count(t1.id)
from(
select id
from test1
group by id) t1;
2、 Aggregation skills —— Using window functions grouping sets、cube、rollup
This window function is the improvement and encapsulation of the previous one
group by quantity,sex,age
grouping sets(quantity,sex,age)group by quantity,sex,age
with cube;group by quantity,sex
with rollup;
-- Number distribution
select quantity,count(*)
from test2
group by quantity;
2 1
3 2
4 1
5 1
7 2
-- Gender distribution
select sex,count(*)
from test2
group by sex;
0 3
1 4
-- Age distribution
select age,count(*)
from test2
group by age;
16 2
19 1
21 1
23 1
34 2
Shortcomings should be written three times SQL, It needs to be performed three times , Repeat the work , And trouble
-- An optimization method ( The aggregation results are all in the same column , Classification fields are partitioned by different columns )
select quantity,sex,age,count(*)
from test2
group by quantity,sex,age
grouping sets(quantity,sex,age)
NULL NULL 16 2
NULL NULL 19 1
NULL NULL 21 1
NULL NULL 23 1
NULL NULL 34 2
NULL 0 NULL 3
NULL 1 NULL 4
2 NULL NULL 1
3 NULL NULL 2
4 NULL NULL 1
5 NULL NULL 1
7 NULL NULL 2
Is to put the results of three columns together
Here are the test data
create table if not exists test2(
quantity int,
sex int,
age int
)
row format delimited
fields terminated by ',';
insert into test2 values(5,1,16);
insert into test2 values(3,0,23);
insert into test2 values(4,1,16);
insert into test2 values(7,0,34);
insert into test2 values(2,0,19);
insert into test2 values(7,1,21);
insert into test2 values(3,1,34);
cube: according to group by Dimensional ( all ) Combine to aggregate
select quantity,sex,age,count(*)
from test2
group by quantity,sex,age
grouping sets(quantity,sex,age,(sex,age),(quantity,sex),(quantity,age),(quantity,sex,age));
(sex,age) It's the same sex and age How many of them are there
The above can be written as
select quantity,sex,age,count(*)
from test2
group by quantity,sex,age
with cube;
cube
Britain [kjuːb] beautiful [kjuːb]
n.
Cube ; Cuboid ; Something cubic ( Especially food ); cube ; third power
vt.
seek … The cube of ; hold ( food ) Cut into cubes
rollup It is mainly used for hierarchical statistics , This is equivalent to the effect of summarizing each layer
select quantity ,sex ,sum(age)
from test2
group by quantity,sex
with rollup;
2 NULL 19
2 0 19
3 NULL 57
3 0 23
3 1 34
4 NULL 16
4 1 16
5 NULL 16
5 1 16
7 NULL 55
7 0 34
7 1 21
3、 Table join optimization
1、 The watch is in front of , The big watch is at the back
hive Suppose that the last one in the query is a large table , It caches other tables , Then scan the last watch
2、 Use the same connection to build
When the 3 A table or more join When the connection , If each on The clauses all use the same join key , So there's only one MR job
3、 Filter data as early as possible
Reduce the amount of data in each stage , Partition the partition table , At the same time, only the fields that need to be used
When logic is too complex , Import intermediate table
4、 How to solve data skew
1、 Data skew performance :
The task progress is maintained for a long time 99%(100%), View the task monitoring page , Found that only a small amount (1 One or more )reduce Subtask not completed . Because of the amount of data it processes and other reduce The difference is too big
2、 Causes and solutions of data skew :
1、 Data skew caused by null value
solve : If two tables are connected , The connection conditions used have many null values , It is suggested to add filtering in the connection conditions
eg:on a.user_id = b.user_id and a.user_id is not null
2、 Size table connection ( One of the tables is big , The other table is very small )
solve : Put the small table in memory , stay map End to do join
select /*+mapjoin(a)*/ a.user_id,a.user_name
from user_info a
join join_order b
where b.customer_id = a.customer_id;
The method of putting data into memory in advance
/*+mapjoin(a)*/
3、 The field data types of the two table connection conditions are inconsistent
solve : Convert the field data type of the connection condition to a consistent
on a.user_id = cast(b.user_id as string)
版权声明
本文为[Look at the data at the top of the mountain]所创,转载请带上原文链接,感谢
https://yzsam.com/2022/04/202204230615025280.html
边栏推荐
- Summary of JVM knowledge points - continuously updated
- STM32 project transplantation: transplantation between chip projects of different models: Ze to C8
- Buuctf Web [bjdctf2020] zjctf, but so
- Remote access to raspberry pie at home (Part 1)
- php生成json处理中文
- Van uploader upload picture implementation process, using native input to upload pictures
- 98. Error s.e.errormvcautoconfiguration $staticview reported by freemaker framework: cannot render error page for request
- The quill editor image zooms, multiple rich text boxes are used on one page, and the quill editor upload image address is the server address
- Process virtual address space partition
- 31. Next arrangement
猜你喜欢
将新增和编辑的数据同步更新到列表
梳理網絡IP代理的幾大用途
Record a website for querying compatibility, string Replaceall() compatibility error
云原生KubeSphere部署Mysql
leetcode:437. 路径总和 III【dfs 选还是不选?】
Trier les principales utilisations de l'Agent IP réseau
进程虚拟地址空间区域划分
The El table horizontal scroll bar is fixed at the bottom of the visual window
The quill editor image zooms, multiple rich text boxes are used on one page, and the quill editor upload image address is the server address
Softbank vision fund entered the Web3 security industry and led a new round of investment of US $60 million in certik
随机推荐
Luogu p5540 [balkanoi2011] timeismoney | minimum product spanning tree problem solution
Kubernetes 入門教程
STM32 control stepper motor (ULN2003 + 28byj)
leetcode:437. 路径总和 III【dfs 选还是不选?】
SSM框架系列——注解开发day2-2
洛谷P5540 [BalkanOI2011] timeismoney | 最小乘积生成树 题解
Mysql8 installation
BUUCTF WEB [BJDCTF2020]The mystery of ip
Learning materials
4.DRF 权限&访问频率&过滤&排序
【csnote】ER图
云原生KubeSphere部署Mysql
leetcode:437. Path sum III [DFS selected or not selected?]
Huawei cloud MVP email
5 free audio material websites, recommended collection
How to click an object to play an animation
Softbank vision fund entered the Web3 security industry and led a new round of investment of US $60 million in certik
Go language array operation
【每日一题】棋盘问题
有趣的IDEA插件推荐,给你的开发工作增添色彩