当前位置:网站首页>How to use DBA_ hist_ active_ sess_ History analysis database history performance problems
How to use DBA_ hist_ active_ sess_ History analysis database history performance problems
2022-04-23 06:58:00 【Sebastien23】
How to use dba_hist_active_sess_history Analyze database history performance issues
- background
- Apply to
- details
-
- 1. Dump During the problem ASH data
- 2. Verify exported ASH Time range
- 3. Confirm the exact time range of the problem
- 4. Determine the of each sampling point top n event
- 5. Observe the waiting chain of each sampling point
- 6. Based on the first 5 Step principle to find out the final value of each sampling point top holder
- Other things about ASH Application
background
In many cases , When there is a performance problem with the database , We don't have a chance to gather enough diagnostic information , such as system state dump perhaps hang analyze, Even when the problem happens DBA Not at all . This makes it very difficult for us to diagnose the problem . So in this case , Can we collect some information afterwards to analyze the cause of the problem ? stay Oracle 10G Or later , The answer is yes . In this article, we will introduce a method through dba_hist_active_sess_history
A method of analyzing problems based on data .
Apply to
Oracle 10G Or later , This article applies to any platform .
details
stay Oracle 10G in , We introduced AWR and ASH Sampling mechanism , There is a view gv$active_session_history
Meeting Per second The clock will all nodes of the database Active Session Take a sample , and dba_hist_active_sess_history
Will be gv$active_session_history
The data in Every time 10 second Sample once and persist . Based on this feature , We can analyze dba_hist_active_sess_history
Of Session Sampling , To locate the exact time frame in which the problem occurred , And the of each sampling point can be observed top event and top holder. Here is an example to illustrate in detail .
1. Dump During the problem ASH data
In order not to affect the production system , We can approximate the period of the problem ASH data export Come out and analyze on the tester .
be based on dba_hist_active_sess_history
Create a new table t_ash, Then pass it through exp/imp Import to the tester . Execute... On the database where the problem occurred exp:
SQL> conn user/passwd
SQL> create table t_ash as select * from dba_hist_active_sess_history where SAMPLE_TIME between
TO_TIMESTAMP ('<time_begin>', 'YYYY-MM-DD HH24:MI:SS') and
TO_TIMESTAMP ('<time_end>', 'YYYY-MM-DD HH24:MI:SS');
$ exp user/passwd file=t_ash.dmp tables=(t_ash) log=t_ash.exp.log
Then import it into the tester :
$ imp user/passwd file=t_ash.dmp log=t_ash.imp.log
2. Verify exported ASH Time range
The proposal USES Oracle SQL Developer To prevent the output result from being broken into lines, which is not easy to observe .
set line 200 pages 1000
col sample_time for a25
col event for a40
alter session set nls_timestamp_format='yyyy-mm-dd hh24:mi:ss.ff';
select
t.dbid, t.instance_number, min(sample_time), max(sample_time), count(*) session_count
from t_ash t
group by t.dbid, t.instance_number
order by dbid, instance_number;
INSTANCE_NUMBER MIN(SAMPLE_TIME) MAX(SAMPLE_TIME) SESSION_COUNT
1 2015-03-26 21:00:04.278 2015-03-26 22:59:48.387 2171
2 2015-03-26 21:02:12.047 2015-03-26 22:59:42.584 36
From the above output, we can see that the database has a total of 2 Nodes , The sampling time is... In total 2 Hours , node 1 Sampling ratio of nodes 2 A lot more , The problem may occur at the node 1 On .
3. Confirm the exact time range of the problem
Refer to the following script :
select
dbid, instance_number, sample_id, sample_time, count(*) session_count
from t_ash t
group by dbid, instance_number, sample_id, sample_time
order by dbid, instance_number, sample_time;
INSTANCE_NUMBER SAMPLE_ID SAMPLE_TIME SESSION_COUNT
1 36402900 2015-03-26 22:02:50.985 4
1 36402910 2015-03-26 22:03:01.095 1
1 36402920 2015-03-26 22:03:11.195 1
1 36402930 2015-03-26 22:03:21.966 21
1 36402940 2015-03-26 22:03:32.116 102
1 36402950 2015-03-26 22:03:42.226 181
1 36402960 2015-03-26 22:03:52.326 200
1 36402970 2015-03-26 22:04:02.446 227
1 36402980 2015-03-26 22:04:12.566 242
1 36402990 2015-03-26 22:04:22.666 259
1 36403000 2015-03-26 22:04:32.846 289
1 36403010 2015-03-26 22:04:42.966 147
1 36403020 2015-03-26 22:04:53.076 2
1 36403030 2015-03-26 22:05:03.186 4
1 36403040 2015-03-26 22:05:13.296 1
1 36403050 2015-03-26 22:05:23.398 1
Observe the of each sampling point of the above output active session The number of , A sudden increase in number often means that a problem has occurred . From the above output, it can be determined that the exact time of the problem is 2015-03-26 22:03:21 ~ 22:04:42, The problem lasted about 1.5 minute .
Be careful : Observe whether the above output is out of gear , For example, there is no sampling at some time .
4. Determine the of each sampling point top n event
What we specify here is top 2 event And the sampling time is marked out to observe the situation of all sampling points . If there's a lot of data , You can also open sample_time To observe the situation in a certain period of time . Pay attention to the last column session_count It refers to the waiting time at the sampling point event Of session Number .
select t.dbid,
t.sample_id,
t.sample_time,
t.instance_number,
t.event,
t.session_state,
t.c session_count
from (select t.*,
rank() over(partition by dbid, instance_number, sample_time order by c desc) r
from (select
t.*,
count(*) over(partition by dbid, instance_number, sample_time, event) c,
row_number() over(partition by dbid, instance_number, sample_time, event order by 1) r1
from t_ash t
/*where sample_time >
to_timestamp('2013-11-17 13:59:00',
'yyyy-mm-dd hh24:mi:ss')
and sample_time <
to_timestamp('2013-11-17 14:10:00',
'yyyy-mm-dd hh24:mi:ss')*/
) t
where r1 = 1) t
where r < 3
order by dbid, instance_number, sample_time, r;
SAMPLE_ID SAMPLE_TIME INSTANCE_NUMBER EVENT SESSION_STATE SESSION_COUNT
36402900 22:02:50.985 1 ON CPU 3
36402900 22:02:50.985 1 db file sequential read WAITING 1
36402910 22:03:01.095 1 ON CPU 1
36402920 22:03:11.195 1 db file parallel read WAITING 1
36402930 22:03:21.966 1 cursor: pin S wait on X WAITING 11
36402930 22:03:21.966 1 latch: shared pool WAITING 4
36402940 22:03:32.116 1 cursor: pin S wait on X WAITING 83
36402940 22:03:32.116 1 SGA: allocation forcing component growth WAITING 16
36402950 22:03:42.226 1 cursor: pin S wait on X WAITING 161
36402950 22:03:42.226 1 SGA: allocation forcing component growth WAITING 17
36402960 22:03:52.326 1 cursor: pin S wait on X WAITING 177
36402960 22:03:52.326 1 SGA: allocation forcing component growth WAITING 20
36402970 22:04:02.446 1 cursor: pin S wait on X WAITING 204
36402970 22:04:02.446 1 SGA: allocation forcing component growth WAITING 20
36402980 22:04:12.566 1 cursor: pin S wait on X WAITING 219
36402980 22:04:12.566 1 SGA: allocation forcing component growth WAITING 20
36402990 22:04:22.666 1 cursor: pin S wait on X WAITING 236
36402990 22:04:22.666 1 SGA: allocation forcing component growth WAITING 20
36403000 22:04:32.846 1 cursor: pin S wait on X WAITING 265
36403000 22:04:32.846 1 SGA: allocation forcing component growth WAITING 20
36403010 22:04:42.966 1 enq: US - contention WAITING 69
36403010 22:04:42.966 1 latch: row cache objects WAITING 56
36403020 22:04:53.076 1 db file scattered read WAITING 1
36403020 22:04:53.076 1 db file sequential read WAITING 1
From the above output, we can find that the most serious waiting during the problem is cursor: pin S wait on X
, Wait for the rush hour event Of session The number has reached 265 individual , Next is SGA: allocation forcing component growth
, Fastigium session by 20 individual .
Be careful :
- Confirm again whether the above output is out of gear , Is there some time when there is no sampling .
- Pay attention to those session_state by ON CPU Output , Compare ON CPU The number of processes is the same as your OS Physics CPU The number of , If it approaches or exceeds physical CPU Number , Then you need to check OS At that time CPU Resource status , such as OSWatcher/NMON Tools such as , high CPU Run Queue May cause the problem , Of course, it may also be the result of the problem , Need to combine OSWatcher and ASH Time order to verify .
5. Observe the waiting chain of each sampling point
The principle is through dba_hist_active_sess_history.blocking_session
Records of the holder To pass connect by Cascade query , Find out the final holder. stay RAC Environment , For each node ASH The sampling time is not consistent in many cases , So you can use this SQL The second paragraph of the note sample_time Make a little modification to make the difference between different nodes 1 The sampling time of seconds can be compared ( Note that it is best to also partition by Medium sample_time Make corresponding changes ). In this output isleaf=1 All is final holder, and iscycle=1 Represents deadlock ( That is, in the same sampling point a etc. b,b etc. c, and c And so on a, If this continues to happen , Then it is particularly noteworthy ). Using the following query, we can observe the blocking chain .
select
level lv,
connect_by_isleaf isleaf,
connect_by_iscycle iscycle,
t.dbid,
t.sample_id,
t.sample_time,
t.instance_number,
t.session_id,
t.sql_id,
t.session_type,
t.event,
t.session_state,
t.blocking_inst_id,
t.blocking_session,
t.blocking_session_status
from t_ash t
/*where sample_time >
to_timestamp('2013-11-17 13:55:00',
'yyyy-mm-dd hh24:mi:ss')
and sample_time <
to_timestamp('2013-11-17 14:10:00',
'yyyy-mm-dd hh24:mi:ss')*/
start with blocking_session is not null
connect by nocycle
prior dbid = dbid
and prior sample_time = sample_time
/*and ((prior sample_time) - sample_time between interval '-1'
second and interval '1' second)*/
and prior blocking_inst_id = instance_number
and prior blocking_session = session_id
and prior blocking_session_serial# = session_serial#
order siblings by dbid, sample_time;
LV ISLEAF ISCYCLE SAMPLE_TIME INSTANCE_NUMBER SESSION_ID SQL_ID EVENT SESSION_STATE BLOCKING_INST_ID BLOCKING_SESSION BLOCKING_SESSION_STATUS
1 0 0 22:04:32.846 1 1259 3ajt2htrmb83y cursor: WAITING 1 537 VALID
2 1 0 22:04:32.846 1 537 3ajt2htrmb83y SGA: WAITING UNKNOWN
Note that in order to make the output easy to read , We will wait for event Abbreviated , The same below . As can be seen from the output above , At the same sampling point (22:04:32.846), node 1 session 1259 Waiting for the cursor: pin S wait on X
, It is called node 1 session 537 Blocking , And node 1 session 537 Waiting again SGA: allocation forcing component growth
, also ASH No samples were collected holder, So here cursor: pin S wait on X
It's just a superficial phenomenon , Here's the problem SGA: allocation forcing component growth
.
6. Based on the first 5 Step principle to find out the final value of each sampling point top holder
Such as the following SQL Each sampling point is listed top 2 Of blocker session, The final blocking probability is calculated session Count ( Reference resources blocking_session_count).
select t.lv,
t.iscycle,
t.dbid,
t.sample_id,
t.sample_time,
t.instance_number,
t.session_id,
t.sql_id,
t.session_type,
t.event,
t.seq#,
t.session_state,
t.blocking_inst_id,
t.blocking_session,
t.blocking_session_status,
t.c blocking_session_count
from (select t.*,
row_number() over(partition by dbid, instance_number, sample_time order by c desc) r
from (select t.*,
count(*) over(partition by dbid, instance_number, sample_time, session_id) c,
row_number() over(partition by dbid, instance_number, sample_time, session_id order by 1) r1
from (select
level lv,
connect_by_isleaf isleaf,
connect_by_iscycle iscycle,
t.*
from t_ash t
/*where sample_time >
to_timestamp('2013-11-17 13:55:00',
'yyyy-mm-dd hh24:mi:ss')
and sample_time <
to_timestamp('2013-11-17 14:10:00',
'yyyy-mm-dd hh24:mi:ss')*/
start with blocking_session is not null
connect by nocycle
prior dbid = dbid
and prior sample_time = sample_time
/*and ((prior sample_time) - sample_time between interval '-1'
second and interval '1' second)*/
and prior blocking_inst_id = instance_number
and prior blocking_session = session_id
and prior
blocking_session_serial# = session_serial#) t
where t.isleaf = 1) t
where r1 = 1) t
where r < 3
order by dbid, sample_time, r;
SAMPLE_TIME INSTANCE_NUMBER SESSION_ID SQL_ID EVENT SEQ# SESSION_STATE BLOCKING_SESSION_STATUS BLOCKING_SESSION_COUNT
22:03:32.116 1 1136 1p4vyw2jan43d SGA: 1140 WAITING UNKNOWN 82
22:03:32.116 1 413 9g51p4bt1n7kz SGA: 7646 WAITING UNKNOWN 2
22:03:42.226 1 1136 1p4vyw2jan43d SGA: 1645 WAITING UNKNOWN 154
22:03:42.226 1 537 3ajt2htrmb83y SGA: 48412 WAITING UNKNOWN 4
22:03:52.326 1 1136 1p4vyw2jan43d SGA: 2150 WAITING UNKNOWN 165
22:03:52.326 1 537 3ajt2htrmb83y SGA: 48917 WAITING UNKNOWN 8
22:04:02.446 1 1136 1p4vyw2jan43d SGA: 2656 WAITING UNKNOWN 184
22:04:02.446 1 537 3ajt2htrmb83y SGA: 49423 WAITING UNKNOWN 10
22:04:12.566 1 1136 1p4vyw2jan43d SGA: 3162 WAITING UNKNOWN 187
22:04:12.566 1 2472 SGA: 1421 WAITING UNKNOWN 15
22:04:22.666 1 1136 1p4vyw2jan43d SGA: 3667 WAITING UNKNOWN 193
22:04:22.666 1 2472 SGA: 1926 WAITING UNKNOWN 25
22:04:32.846 1 1136 1p4vyw2jan43d SGA: 4176 WAITING UNKNOWN 196
22:04:32.846 1 2472 SGA: 2434 WAITING UNKNOWN 48
Note the above output , Like the first line , On behalf of the 22:03:32.116, node 1 Of session 1136 It finally blocked 82 individual session. Look down time , Visible nodes 1 Of session 1136 It was the most serious problem during the problem holder, It's blocked at every sampling point 100 Multiple session, And it keeps waiting SGA: allocation forcing component growth
, Pay attention to its seq#
You will find that event Of seq# Changing , Indicates that the session Not completely hang live , Because the time happens at night 22:00 about , This is obviously due to the automatic collection of Statistics job Lead to shared memory resize cause , So we can combine Scheduler/MMAN Of trace as well as dba_hist_memory_resize_ops
The output of further determines the problem .
Be careful :
blocking_session_count
Refers to a holder Finally blocked session Count , such as a ← \leftarrow ← b ← \leftarrow ← c (a By b Blocking ,b Has been c Blocking ), Only calculate c blocked 1 individual session, Because in the middle b Repetition may occur in different blocking chains .- If the final holder Has not been ash sampling ( Usually because of holder At leisure ), such as a ← \leftarrow ← c also b ← \leftarrow ← c (a By c Blocking , also b Also by c Blocking ), however c No sampling , Then the above script cannot c Statistics to the end holder in , This may lead to some omissions .
- Attention comparison
blocking_session_count
The number of is the same as 3 The total number of each sampling point queried in this step session_count Count , If the of each sampling pointblocking_session_count
The number is far less than the total session_count Count , That means most session There is no record of holder, Therefore, the results of this query do not represent . - stay Oracle 10g in ,ASH did not blocking_inst_id Column , In all the above scripts , You just need to remove the column , therefore 10g Of ASH Generally, it can only be used to diagnose the problem of single node .
Other things about ASH Application
In addition to ASH Data to find holder outside , We can also use it to get a lot of information ( Based on the database version ):
- Such as through
PGA_ALLOCATED
Column to calculate the maximum of each sampling point PGA, total PGA To analyze ora-4030/Memory Swap Related issues ; - adopt
TEMP_SPACE_ALLOCATED
Column to analyze temporary tablespace usage ; - adopt
IN_PARSE/IN_HARD_PARSE/IN_SQL_EXECUTION
Column to analyze SQL be in parse Or the execution stage ; - adopt
CURRENT_OBJ#/CURRENT_FILE#/CURRENT_BLOCK#
To make sure I/O Related to the object waiting to happen .
References
[1] https://blogs.oracle.com/database4cn/dbahistactivesesshistory
版权声明
本文为[Sebastien23]所创,转载请带上原文链接,感谢
https://yzsam.com/2022/04/202204230557416737.html
边栏推荐
- el-date-picker限制选择范围,从当前时间到两个月前
- 【代码解析(5)】Communication-Efficient Learning of Deep Networks from Decentralized Data
- PHP background parsing after JQ serialization
- The arithmetic square root of X in leetcode
- try --finally
- 基于DPDK实现VPC和IDC间互联互通的高性能网关
- 【不积跬步无以至千里】MySQL报大量unauthenticated user连接错误
- 百度地图坐标、Google坐标、腾讯坐标相互转化
- ansible模块之include_tasks:为什么加了tags后导入的任务没有执行?
- freeCodeCamp----arithmetic_arranger练习
猜你喜欢
随机推荐
Binary sum of leetcode questions
postMan 传参总结
微信小程序
TP5 使用redis
Method of MySQL password expiration
JS realizes modal box dragging
Scientists say Australian plan to cull up to 10,000 wild horses doesn’t go far enough
mysql中sum (if)_mysql 中sum (if())
百度地图案例-修改地图样式
重启Oracle监听器会中断已有连接吗
redis 常见问题
【漏网之鱼】Ansible AWX调用playbook传参问题
Leak detection and vacancy filling (IV)
并发优化请求
JS regular matching first assertion and last assertion
EF CORE在ASP.NET CORE项目中基于数据库优先模式生成实体模型
.NET类型转移
LeetCode刷题|38外观数组
页面缓存问题解决方法(慎用)
openvswitch vlan网络实践