当前位置:网站首页>How to use DBA_ hist_ active_ sess_ History analysis database history performance problems
How to use DBA_ hist_ active_ sess_ History analysis database history performance problems
2022-04-23 06:58:00 【Sebastien23】
How to use dba_hist_active_sess_history Analyze database history performance issues
- background
- Apply to
- details
-
- 1. Dump During the problem ASH data
- 2. Verify exported ASH Time range
- 3. Confirm the exact time range of the problem
- 4. Determine the of each sampling point top n event
- 5. Observe the waiting chain of each sampling point
- 6. Based on the first 5 Step principle to find out the final value of each sampling point top holder
- Other things about ASH Application
background
In many cases , When there is a performance problem with the database , We don't have a chance to gather enough diagnostic information , such as system state dump perhaps hang analyze, Even when the problem happens DBA Not at all . This makes it very difficult for us to diagnose the problem . So in this case , Can we collect some information afterwards to analyze the cause of the problem ? stay Oracle 10G Or later , The answer is yes . In this article, we will introduce a method through dba_hist_active_sess_history A method of analyzing problems based on data .
Apply to
Oracle 10G Or later , This article applies to any platform .
details
stay Oracle 10G in , We introduced AWR and ASH Sampling mechanism , There is a view gv$active_session_history Meeting Per second The clock will all nodes of the database Active Session Take a sample , and dba_hist_active_sess_history Will be gv$active_session_history The data in Every time 10 second Sample once and persist . Based on this feature , We can analyze dba_hist_active_sess_history Of Session Sampling , To locate the exact time frame in which the problem occurred , And the of each sampling point can be observed top event and top holder. Here is an example to illustrate in detail .
1. Dump During the problem ASH data
In order not to affect the production system , We can approximate the period of the problem ASH data export Come out and analyze on the tester .
be based on dba_hist_active_sess_history Create a new table t_ash, Then pass it through exp/imp Import to the tester . Execute... On the database where the problem occurred exp:
SQL> conn user/passwd
SQL> create table t_ash as select * from dba_hist_active_sess_history where SAMPLE_TIME between
TO_TIMESTAMP ('<time_begin>', 'YYYY-MM-DD HH24:MI:SS') and
TO_TIMESTAMP ('<time_end>', 'YYYY-MM-DD HH24:MI:SS');
$ exp user/passwd file=t_ash.dmp tables=(t_ash) log=t_ash.exp.log
Then import it into the tester :
$ imp user/passwd file=t_ash.dmp log=t_ash.imp.log
2. Verify exported ASH Time range
The proposal USES Oracle SQL Developer To prevent the output result from being broken into lines, which is not easy to observe .
set line 200 pages 1000
col sample_time for a25
col event for a40
alter session set nls_timestamp_format='yyyy-mm-dd hh24:mi:ss.ff';
select
t.dbid, t.instance_number, min(sample_time), max(sample_time), count(*) session_count
from t_ash t
group by t.dbid, t.instance_number
order by dbid, instance_number;
INSTANCE_NUMBER MIN(SAMPLE_TIME) MAX(SAMPLE_TIME) SESSION_COUNT
1 2015-03-26 21:00:04.278 2015-03-26 22:59:48.387 2171
2 2015-03-26 21:02:12.047 2015-03-26 22:59:42.584 36
From the above output, we can see that the database has a total of 2 Nodes , The sampling time is... In total 2 Hours , node 1 Sampling ratio of nodes 2 A lot more , The problem may occur at the node 1 On .
3. Confirm the exact time range of the problem
Refer to the following script :
select
dbid, instance_number, sample_id, sample_time, count(*) session_count
from t_ash t
group by dbid, instance_number, sample_id, sample_time
order by dbid, instance_number, sample_time;
INSTANCE_NUMBER SAMPLE_ID SAMPLE_TIME SESSION_COUNT
1 36402900 2015-03-26 22:02:50.985 4
1 36402910 2015-03-26 22:03:01.095 1
1 36402920 2015-03-26 22:03:11.195 1
1 36402930 2015-03-26 22:03:21.966 21
1 36402940 2015-03-26 22:03:32.116 102
1 36402950 2015-03-26 22:03:42.226 181
1 36402960 2015-03-26 22:03:52.326 200
1 36402970 2015-03-26 22:04:02.446 227
1 36402980 2015-03-26 22:04:12.566 242
1 36402990 2015-03-26 22:04:22.666 259
1 36403000 2015-03-26 22:04:32.846 289
1 36403010 2015-03-26 22:04:42.966 147
1 36403020 2015-03-26 22:04:53.076 2
1 36403030 2015-03-26 22:05:03.186 4
1 36403040 2015-03-26 22:05:13.296 1
1 36403050 2015-03-26 22:05:23.398 1
Observe the of each sampling point of the above output active session The number of , A sudden increase in number often means that a problem has occurred . From the above output, it can be determined that the exact time of the problem is 2015-03-26 22:03:21 ~ 22:04:42, The problem lasted about 1.5 minute .
Be careful : Observe whether the above output is out of gear , For example, there is no sampling at some time .
4. Determine the of each sampling point top n event
What we specify here is top 2 event And the sampling time is marked out to observe the situation of all sampling points . If there's a lot of data , You can also open sample_time To observe the situation in a certain period of time . Pay attention to the last column session_count It refers to the waiting time at the sampling point event Of session Number .
select t.dbid,
t.sample_id,
t.sample_time,
t.instance_number,
t.event,
t.session_state,
t.c session_count
from (select t.*,
rank() over(partition by dbid, instance_number, sample_time order by c desc) r
from (select
t.*,
count(*) over(partition by dbid, instance_number, sample_time, event) c,
row_number() over(partition by dbid, instance_number, sample_time, event order by 1) r1
from t_ash t
/*where sample_time >
to_timestamp('2013-11-17 13:59:00',
'yyyy-mm-dd hh24:mi:ss')
and sample_time <
to_timestamp('2013-11-17 14:10:00',
'yyyy-mm-dd hh24:mi:ss')*/
) t
where r1 = 1) t
where r < 3
order by dbid, instance_number, sample_time, r;
SAMPLE_ID SAMPLE_TIME INSTANCE_NUMBER EVENT SESSION_STATE SESSION_COUNT
36402900 22:02:50.985 1 ON CPU 3
36402900 22:02:50.985 1 db file sequential read WAITING 1
36402910 22:03:01.095 1 ON CPU 1
36402920 22:03:11.195 1 db file parallel read WAITING 1
36402930 22:03:21.966 1 cursor: pin S wait on X WAITING 11
36402930 22:03:21.966 1 latch: shared pool WAITING 4
36402940 22:03:32.116 1 cursor: pin S wait on X WAITING 83
36402940 22:03:32.116 1 SGA: allocation forcing component growth WAITING 16
36402950 22:03:42.226 1 cursor: pin S wait on X WAITING 161
36402950 22:03:42.226 1 SGA: allocation forcing component growth WAITING 17
36402960 22:03:52.326 1 cursor: pin S wait on X WAITING 177
36402960 22:03:52.326 1 SGA: allocation forcing component growth WAITING 20
36402970 22:04:02.446 1 cursor: pin S wait on X WAITING 204
36402970 22:04:02.446 1 SGA: allocation forcing component growth WAITING 20
36402980 22:04:12.566 1 cursor: pin S wait on X WAITING 219
36402980 22:04:12.566 1 SGA: allocation forcing component growth WAITING 20
36402990 22:04:22.666 1 cursor: pin S wait on X WAITING 236
36402990 22:04:22.666 1 SGA: allocation forcing component growth WAITING 20
36403000 22:04:32.846 1 cursor: pin S wait on X WAITING 265
36403000 22:04:32.846 1 SGA: allocation forcing component growth WAITING 20
36403010 22:04:42.966 1 enq: US - contention WAITING 69
36403010 22:04:42.966 1 latch: row cache objects WAITING 56
36403020 22:04:53.076 1 db file scattered read WAITING 1
36403020 22:04:53.076 1 db file sequential read WAITING 1
From the above output, we can find that the most serious waiting during the problem is cursor: pin S wait on X, Wait for the rush hour event Of session The number has reached 265 individual , Next is SGA: allocation forcing component growth, Fastigium session by 20 individual .
Be careful :
- Confirm again whether the above output is out of gear , Is there some time when there is no sampling .
- Pay attention to those session_state by ON CPU Output , Compare ON CPU The number of processes is the same as your OS Physics CPU The number of , If it approaches or exceeds physical CPU Number , Then you need to check OS At that time CPU Resource status , such as OSWatcher/NMON Tools such as , high CPU Run Queue May cause the problem , Of course, it may also be the result of the problem , Need to combine OSWatcher and ASH Time order to verify .
5. Observe the waiting chain of each sampling point
The principle is through dba_hist_active_sess_history.blocking_session Records of the holder To pass connect by Cascade query , Find out the final holder. stay RAC Environment , For each node ASH The sampling time is not consistent in many cases , So you can use this SQL The second paragraph of the note sample_time Make a little modification to make the difference between different nodes 1 The sampling time of seconds can be compared ( Note that it is best to also partition by Medium sample_time Make corresponding changes ). In this output isleaf=1 All is final holder, and iscycle=1 Represents deadlock ( That is, in the same sampling point a etc. b,b etc. c, and c And so on a, If this continues to happen , Then it is particularly noteworthy ). Using the following query, we can observe the blocking chain .
select
level lv,
connect_by_isleaf isleaf,
connect_by_iscycle iscycle,
t.dbid,
t.sample_id,
t.sample_time,
t.instance_number,
t.session_id,
t.sql_id,
t.session_type,
t.event,
t.session_state,
t.blocking_inst_id,
t.blocking_session,
t.blocking_session_status
from t_ash t
/*where sample_time >
to_timestamp('2013-11-17 13:55:00',
'yyyy-mm-dd hh24:mi:ss')
and sample_time <
to_timestamp('2013-11-17 14:10:00',
'yyyy-mm-dd hh24:mi:ss')*/
start with blocking_session is not null
connect by nocycle
prior dbid = dbid
and prior sample_time = sample_time
/*and ((prior sample_time) - sample_time between interval '-1'
second and interval '1' second)*/
and prior blocking_inst_id = instance_number
and prior blocking_session = session_id
and prior blocking_session_serial# = session_serial#
order siblings by dbid, sample_time;
LV ISLEAF ISCYCLE SAMPLE_TIME INSTANCE_NUMBER SESSION_ID SQL_ID EVENT SESSION_STATE BLOCKING_INST_ID BLOCKING_SESSION BLOCKING_SESSION_STATUS
1 0 0 22:04:32.846 1 1259 3ajt2htrmb83y cursor: WAITING 1 537 VALID
2 1 0 22:04:32.846 1 537 3ajt2htrmb83y SGA: WAITING UNKNOWN
Note that in order to make the output easy to read , We will wait for event Abbreviated , The same below . As can be seen from the output above , At the same sampling point (22:04:32.846), node 1 session 1259 Waiting for the cursor: pin S wait on X, It is called node 1 session 537 Blocking , And node 1 session 537 Waiting again SGA: allocation forcing component growth, also ASH No samples were collected holder, So here cursor: pin S wait on X It's just a superficial phenomenon , Here's the problem SGA: allocation forcing component growth.
6. Based on the first 5 Step principle to find out the final value of each sampling point top holder
Such as the following SQL Each sampling point is listed top 2 Of blocker session, The final blocking probability is calculated session Count ( Reference resources blocking_session_count).
select t.lv,
t.iscycle,
t.dbid,
t.sample_id,
t.sample_time,
t.instance_number,
t.session_id,
t.sql_id,
t.session_type,
t.event,
t.seq#,
t.session_state,
t.blocking_inst_id,
t.blocking_session,
t.blocking_session_status,
t.c blocking_session_count
from (select t.*,
row_number() over(partition by dbid, instance_number, sample_time order by c desc) r
from (select t.*,
count(*) over(partition by dbid, instance_number, sample_time, session_id) c,
row_number() over(partition by dbid, instance_number, sample_time, session_id order by 1) r1
from (select
level lv,
connect_by_isleaf isleaf,
connect_by_iscycle iscycle,
t.*
from t_ash t
/*where sample_time >
to_timestamp('2013-11-17 13:55:00',
'yyyy-mm-dd hh24:mi:ss')
and sample_time <
to_timestamp('2013-11-17 14:10:00',
'yyyy-mm-dd hh24:mi:ss')*/
start with blocking_session is not null
connect by nocycle
prior dbid = dbid
and prior sample_time = sample_time
/*and ((prior sample_time) - sample_time between interval '-1'
second and interval '1' second)*/
and prior blocking_inst_id = instance_number
and prior blocking_session = session_id
and prior
blocking_session_serial# = session_serial#) t
where t.isleaf = 1) t
where r1 = 1) t
where r < 3
order by dbid, sample_time, r;
SAMPLE_TIME INSTANCE_NUMBER SESSION_ID SQL_ID EVENT SEQ# SESSION_STATE BLOCKING_SESSION_STATUS BLOCKING_SESSION_COUNT
22:03:32.116 1 1136 1p4vyw2jan43d SGA: 1140 WAITING UNKNOWN 82
22:03:32.116 1 413 9g51p4bt1n7kz SGA: 7646 WAITING UNKNOWN 2
22:03:42.226 1 1136 1p4vyw2jan43d SGA: 1645 WAITING UNKNOWN 154
22:03:42.226 1 537 3ajt2htrmb83y SGA: 48412 WAITING UNKNOWN 4
22:03:52.326 1 1136 1p4vyw2jan43d SGA: 2150 WAITING UNKNOWN 165
22:03:52.326 1 537 3ajt2htrmb83y SGA: 48917 WAITING UNKNOWN 8
22:04:02.446 1 1136 1p4vyw2jan43d SGA: 2656 WAITING UNKNOWN 184
22:04:02.446 1 537 3ajt2htrmb83y SGA: 49423 WAITING UNKNOWN 10
22:04:12.566 1 1136 1p4vyw2jan43d SGA: 3162 WAITING UNKNOWN 187
22:04:12.566 1 2472 SGA: 1421 WAITING UNKNOWN 15
22:04:22.666 1 1136 1p4vyw2jan43d SGA: 3667 WAITING UNKNOWN 193
22:04:22.666 1 2472 SGA: 1926 WAITING UNKNOWN 25
22:04:32.846 1 1136 1p4vyw2jan43d SGA: 4176 WAITING UNKNOWN 196
22:04:32.846 1 2472 SGA: 2434 WAITING UNKNOWN 48
Note the above output , Like the first line , On behalf of the 22:03:32.116, node 1 Of session 1136 It finally blocked 82 individual session. Look down time , Visible nodes 1 Of session 1136 It was the most serious problem during the problem holder, It's blocked at every sampling point 100 Multiple session, And it keeps waiting SGA: allocation forcing component growth, Pay attention to its seq# You will find that event Of seq# Changing , Indicates that the session Not completely hang live , Because the time happens at night 22:00 about , This is obviously due to the automatic collection of Statistics job Lead to shared memory resize cause , So we can combine Scheduler/MMAN Of trace as well as dba_hist_memory_resize_ops The output of further determines the problem .
Be careful :
blocking_session_countRefers to a holder Finally blocked session Count , such as a ← \leftarrow ← b ← \leftarrow ← c (a By b Blocking ,b Has been c Blocking ), Only calculate c blocked 1 individual session, Because in the middle b Repetition may occur in different blocking chains .- If the final holder Has not been ash sampling ( Usually because of holder At leisure ), such as a ← \leftarrow ← c also b ← \leftarrow ← c (a By c Blocking , also b Also by c Blocking ), however c No sampling , Then the above script cannot c Statistics to the end holder in , This may lead to some omissions .
- Attention comparison
blocking_session_countThe number of is the same as 3 The total number of each sampling point queried in this step session_count Count , If the of each sampling pointblocking_session_countThe number is far less than the total session_count Count , That means most session There is no record of holder, Therefore, the results of this query do not represent . - stay Oracle 10g in ,ASH did not blocking_inst_id Column , In all the above scripts , You just need to remove the column , therefore 10g Of ASH Generally, it can only be used to diagnose the problem of single node .
Other things about ASH Application
In addition to ASH Data to find holder outside , We can also use it to get a lot of information ( Based on the database version ):
- Such as through
PGA_ALLOCATEDColumn to calculate the maximum of each sampling point PGA, total PGA To analyze ora-4030/Memory Swap Related issues ; - adopt
TEMP_SPACE_ALLOCATEDColumn to analyze temporary tablespace usage ; - adopt
IN_PARSE/IN_HARD_PARSE/IN_SQL_EXECUTIONColumn to analyze SQL be in parse Or the execution stage ; - adopt
CURRENT_OBJ#/CURRENT_FILE#/CURRENT_BLOCK#To make sure I/O Related to the object waiting to happen .
References
[1] https://blogs.oracle.com/database4cn/dbahistactivesesshistory
版权声明
本文为[Sebastien23]所创,转载请带上原文链接,感谢
https://yzsam.com/2022/04/202204230557416737.html
边栏推荐
- 【代码解析(5)】Communication-Efficient Learning of Deep Networks from Decentralized Data
- SQL学习|基础查询与排列
- Leak detection and vacancy filling (III)
- SQL学习|集合运算
- New formdata() when importing files
- Batch modify / batch update the value of a field in the database
- Kids and COVID: why young immune systems are still on top
- 数据库基本概念:OLTP/OLAP/HTAP、RPO/RTO、MPP
- DDOS攻击/防御介绍
- bcc安装和基本工具使用说明
猜你喜欢
随机推荐
offset和client獲取dom元素比特置信息
JS performance optimization
ansible模块之include_tasks:为什么加了tags后导入的任务没有执行?
tp5 报错variable type error: array解决方法
最近编程中遇到的一些问题2021/9/8
模仿扇贝短文阅读页面
修改Jupyter Notebook样式
Oracle Net Service:监听器与服务名解析方法
2021-09-18
thinkphp5 ---- object(think\response\Json)转数组
虚拟环境中使用jupyter notebook
LeetCode刷题|368最大整除子集(动态规划)
rdma 介绍
TypeScript(下)
使用百度智能云人脸检测接口实现照片质量检测
Tensorflow&&Pytorch常见报错
CentOS8搭建PHP8.0.3运行环境
ES6规范详解
1-2 JSX语法规则
Set and map








