Oracle Wait Analysis Techniques
Oracle Tips by Burleson Consulting
Oracle provides a special views called v$session_wait and v$system_event that provide detailed information about the wait state for ongoing Oracle transactions. For the complete script downloads and more details on the concepts of this article, see my book "Oracle Tuning: The Definitive Reference". This article explores how Oracle waits can be analyzed to look for system-wide bottlenecks and individual objects that experience wait issues. Wait event analysis for Oracle cane be broken down into three areas:
- Time-based event analysis - The Oracle STATSPACK utility can be used to show trends in wait event over long time periods, and fluctuations in waits can often provide useful information.
- System-wide event analysis - We will start at a high level and show scripts that can be used to track system wide events, show events for background processes.
- Session wait events - These are real-time waits at the exact moment that you run the script. We will show techniques for locating specific database objects that may be causing the wait issue.
Let's begin with a quick review of Oracle wait events so we clearly understand how they can help us tune our database. When running the system-wide scripts, you will see areas where the database is spending most of its time waiting. However, high waits do not always indicate a bottleneck or a problem. High waits may indicate a bottleneck, but some waits are a normal part of database operations. In general, the system-wide wait events tell us where the database spends most of its time.
For example, high waits on 'db file sequential read' events may indicate a disk bottleneck, but you must check your average disk queue length for each disk spindle to be sure that these waits are abnormal. In one case, an Oracle shop ran the script for system-wide wait events and discovered that their RAID-5 configuration was causing a huge amount of disk enqueues. The shop reorganized their disk to RAID 0+1 and experienced a 3x performance improvement for the whole database.
Prerequisites for Monitoring Oracle System Waits
Collecting Oracle wait events requires that the Oracle parameter is timed_statistics=true. In Oracle9i and Oracle 10g this is the default, but Oracle9i and prior require you to use set timed_statistics.
We also have to filter out wait events that are not helpful to our tuning effort. The list below shows all system "idle" wait events that have no meaningful information:
For example, high waits on 'db file sequential read' events may indicate a disk bottleneck, but you must check your average disk queue length for each disk spindle to be sure that these waits are abnormal. In one case, an Oracle shop ran the script for system-wide wait events and discovered that their RAID-5 configuration was causing a huge amount of disk enqueues. The shop reorganized their disk to RAID 0+1 and experienced a 3x performance improvement for the whole database.
Prerequisites for Monitoring Oracle System Waits
Collecting Oracle wait events requires that the Oracle parameter is timed_statistics=true. In Oracle9i and Oracle 10g this is the default, but Oracle9i and prior require you to use set timed_statistics.
We also have to filter out wait events that are not helpful to our tuning effort. The list below shows all system "idle" wait events that have no meaningful information:
- dispatcher timer
- lock element cleanup
- Null event
- parallel query dequeue wait
- parallel query idle wait - Slaves
- pipe get
- PL/SQL lock timer
- pmon timer
- rdbms ipc message
- slave wait
- smon timer
- SQL*Net break/reset to client
- SQL*Net message from client
- SQL*Net message to client
- SQL*Net more data to client
- virtual circuit status
Once we have proper scripts ready, we can start exploring Oracle wait events. Let's start with time-series wait event analysis using the Oracle STATSPACK utility.
Event Wait Analysis with STATSPACK
In STATSPACK, you can get a snapshot of Oracle wait events every hour and plot changes in wait behavior over time. You can also set thresholds, and report only on wait events that exceed your pre-defined threshold. Here is the script that is commonly used for exception reporting of wait events.
-- prompt
-- prompt
-- prompt ***********************************************************
-- prompt Excessive waits on background events
-- prompt ***********************************************************
-- prompt
ttitle 'High waits on background events|Rollup by hour'
column mydate heading 'Yr. Mo Dy Hr' format a13;
column event format a30;
column total_waits heading 'tot waits' format 999,999;
column time_waited heading 'time wait' format 999,999;
column total_timeouts heading 'timeouts' format 9,999;
break on to_char(snap_time,'yyyy-mm-dd') skip 1;
select
to_char(snap_time,'yyyy-mm-dd HH24') mydate,
e.event,
e.total_waits - nvl(b.total_waits,0) total_waits,
e.time_waited - nvl(b.time_waited,0) time_waited,
e.total_timeouts - nvl(b.total_timeouts,0) total_timeouts
from
stats$bg_event_summary b,
stats$bg_event_summary e,
stats$snapshot sn
where
snap_time > sysdate-&1
and
e.event not like '%timer'
SEE CODE DEPOT FOR LICENSED SCRIPT DOWNLOADS
Here is the output from this script. As we can see, we get a time-series result, showing us those days and hours when our thresholds are exceeded. If we look at this listing, we see that every evening between 10:00 p.m. and 11:00 p.m., we experience high waits on the redo logs.
Wed Aug 21 page 1
High waits on background events
Rollup by hour
Yr. Mo Dy Hr EVENT tot waits time wait timeouts
------------- ------------------------------ --------- --------- --------
2002-08-18 22 LGWR wait for redo copy 9,326 1,109 286
2002-08-18 23 LGWR wait for redo copy 8,506 316 8
2002-08-18 23 buffer busy waits 214 21,388 206
2002-08-19 00 LGWR wait for redo copy 498 5 0
2002-08-19 01 LGWR wait for redo copy 497 15 0
2002-08-19 22 LGWR wait for redo copy 9,207 1,433 427
2002-08-19 22 buffer busy waits 529 53,412 515
2002-08-19 23 LGWR wait for redo copy 9,066 367 9
2002-08-19 23 buffer busy waits 250 24,479 236
2002-08-20 00 LGWR wait for redo copy 771 16 0
2002-08-20 22 LGWR wait for redo copy 8,030 2,013 634
2002-08-20 22 buffer busy waits 356 35,583 343
2002-08-20 23 LGWR wait for redo copy 8,021 579 86
2002-08-20 23 buffer busy waits 441 44,677 432
2002-08-21 00 LGWR wait for redo copy 1,013 26 1
2002-08-21 00 rdbms ipc reply 160 30,986 149
2002-08-21 01 LGWR wait for redo copy 541 17 0
Using Queries Against v$event_wait
Here is the most commonly used script for displaying system-wide events, called current_waits.sql.
set pages 999
set lines 90
column c1 heading 'Event|Name' format a30
column c2 heading 'Total|Waits' format 999,999,999
column c3 heading 'Seconds|Waiting' format 999,999
column c4 heading 'Total|Timeouts' format 999,999,999
column c5 heading 'Average|Wait|(in secs)' format 99.999
ttitle 'System-wide Wait Analysis|for current wait events'
select
event c1,
total_waits c2,
time_waited / 100 c3,
total_timeouts c4,
average_wait /100 c5
from
sys.v_$system_event
where
event not in (
'dispatcher timer',
'lock element cleanup',
'Null event',
'parallel query dequeue wait',
'parallel query idle wait - Slaves',
'pipe get',
'PL/SQL lock timer',
'pmon timer',
'rdbms ipc message',
'slave wait',
'smon timer',
'SQL*Net break/reset to client',
'SQL*Net message from client',
'SQL*Net message to client',
'SQL*Net more data to client',
'virtual circuit status',
'WMON goes to sleep'
)
AND
SEE CODE DEPOT FOR LICENSED SCRIPT DOWNLOADS
Below is the output from this script. As we see, the report shows all cumulative waits since instance startup time.
Tue Aug 20 page 1
System-wide Wait Analysis
for current wait events
Average
Event Total Seconds Total Wait
Name Waits Waiting Timeouts (in secs)
------------------------------ ------------ -------- ------------ ---------
db file sequential read 2,902,850 3,829 0 .001
latch free 2,248,864 496 1,524,235 .000
PX Deq: Table Q Normal 2,080,463 4,469 248 .002
PX Deq Credit: send blkd 1,321,019 52,251 23,032 .040
direct path read 986,951 6,931 0 .007
PX Deq Credit: need buffer 800,970 1,091 84 .001
log file parallel write 415,175 5,078 2 .012
direct path write 321,096 9,342 0 .029
PX Deq: Execution Msg 198,768 56,384 26,020 .284
log file sequential read 118,480 164 0 .001
PX Deq: Execute Reply 92,487 5,628 2,407 .061
log file sync 87,295 1,327 7 .015
db file scattered read 70,112 162 0 .002
enqueue 44,335 1,468 354 .033
PX Deq: Join ACK 42,503 77 0 .002
file open 28,545 8 0 .000
PX Deq: Signal ACK 26,753 253 6,705 .009
log file switch completion 54 18 5 .341
control file parallel write 23,785 340 0 .014
SQL*Net more data from client 19,847 878 0 .044
PX Deq: Parse Reply 17,885 26 0 .001
db file parallel write 17,745 1,293 4 .073
PX Deq: Msg Fragment 15,612 18 8 .001
rdbms ipc reply 11,013 167 66 .015
PX Deq: Table Q qref 7,778 2 0 .000
LGWR wait for redo copy 7,608 2 43 .000
control file sequential read 4,098 1 0 .000
buffer busy waits 2,970 9 0 .003
PX Deq: Table Q Sample 2,053 25 27 .012
library cache pin 1,353 6 0 .004
PX Deq: Table Q Get Keys 530 1 0 .001
local write wait 317 177 163 .560
file identify 315 0 0 .000
refresh controlfile command 214 0 0 .002
PX qref latch 135 105 102 .780
log file single write 80 2 0 .021
imm op 67 0 0 .000
process startup 61 2 0 .036
write complete waits 33 34 33 1.032
library cache load lock 25 6 2 .256
log buffer space 20 6 0 .286
db file single write 6 0 0 .007
row cache lock 5 0 0 .006
db file parallel read 2 0 0 .010
instance state change 2 0 0 .000
reliable message 1 0 0 .010
library cache lock 1 0 0 .020
This should give a list of "current" wait events, but some events are more important than others. These are the one's to watch-out for:
Event Wait Analysis with STATSPACK
In STATSPACK, you can get a snapshot of Oracle wait events every hour and plot changes in wait behavior over time. You can also set thresholds, and report only on wait events that exceed your pre-defined threshold. Here is the script that is commonly used for exception reporting of wait events.
-- prompt
-- prompt
-- prompt ***********************************************************
-- prompt Excessive waits on background events
-- prompt ***********************************************************
-- prompt
ttitle 'High waits on background events|Rollup by hour'
column mydate heading 'Yr. Mo Dy Hr' format a13;
column event format a30;
column total_waits heading 'tot waits' format 999,999;
column time_waited heading 'time wait' format 999,999;
column total_timeouts heading 'timeouts' format 9,999;
break on to_char(snap_time,'yyyy-mm-dd') skip 1;
select
to_char(snap_time,'yyyy-mm-dd HH24') mydate,
e.event,
e.total_waits - nvl(b.total_waits,0) total_waits,
e.time_waited - nvl(b.time_waited,0) time_waited,
e.total_timeouts - nvl(b.total_timeouts,0) total_timeouts
from
stats$bg_event_summary b,
stats$bg_event_summary e,
stats$snapshot sn
where
snap_time > sysdate-&1
and
e.event not like '%timer'
SEE CODE DEPOT FOR LICENSED SCRIPT DOWNLOADS
Here is the output from this script. As we can see, we get a time-series result, showing us those days and hours when our thresholds are exceeded. If we look at this listing, we see that every evening between 10:00 p.m. and 11:00 p.m., we experience high waits on the redo logs.
Wed Aug 21 page 1
High waits on background events
Rollup by hour
Yr. Mo Dy Hr EVENT tot waits time wait timeouts
------------- ------------------------------ --------- --------- --------
2002-08-18 22 LGWR wait for redo copy 9,326 1,109 286
2002-08-18 23 LGWR wait for redo copy 8,506 316 8
2002-08-18 23 buffer busy waits 214 21,388 206
2002-08-19 00 LGWR wait for redo copy 498 5 0
2002-08-19 01 LGWR wait for redo copy 497 15 0
2002-08-19 22 LGWR wait for redo copy 9,207 1,433 427
2002-08-19 22 buffer busy waits 529 53,412 515
2002-08-19 23 LGWR wait for redo copy 9,066 367 9
2002-08-19 23 buffer busy waits 250 24,479 236
2002-08-20 00 LGWR wait for redo copy 771 16 0
2002-08-20 22 LGWR wait for redo copy 8,030 2,013 634
2002-08-20 22 buffer busy waits 356 35,583 343
2002-08-20 23 LGWR wait for redo copy 8,021 579 86
2002-08-20 23 buffer busy waits 441 44,677 432
2002-08-21 00 LGWR wait for redo copy 1,013 26 1
2002-08-21 00 rdbms ipc reply 160 30,986 149
2002-08-21 01 LGWR wait for redo copy 541 17 0
Using Queries Against v$event_wait
Here is the most commonly used script for displaying system-wide events, called current_waits.sql.
set pages 999
set lines 90
column c1 heading 'Event|Name' format a30
column c2 heading 'Total|Waits' format 999,999,999
column c3 heading 'Seconds|Waiting' format 999,999
column c4 heading 'Total|Timeouts' format 999,999,999
column c5 heading 'Average|Wait|(in secs)' format 99.999
ttitle 'System-wide Wait Analysis|for current wait events'
select
event c1,
total_waits c2,
time_waited / 100 c3,
total_timeouts c4,
average_wait /100 c5
from
sys.v_$system_event
where
event not in (
'dispatcher timer',
'lock element cleanup',
'Null event',
'parallel query dequeue wait',
'parallel query idle wait - Slaves',
'pipe get',
'PL/SQL lock timer',
'pmon timer',
'rdbms ipc message',
'slave wait',
'smon timer',
'SQL*Net break/reset to client',
'SQL*Net message from client',
'SQL*Net message to client',
'SQL*Net more data to client',
'virtual circuit status',
'WMON goes to sleep'
)
AND
SEE CODE DEPOT FOR LICENSED SCRIPT DOWNLOADS
Below is the output from this script. As we see, the report shows all cumulative waits since instance startup time.
Tue Aug 20 page 1
System-wide Wait Analysis
for current wait events
Average
Event Total Seconds Total Wait
Name Waits Waiting Timeouts (in secs)
------------------------------ ------------ -------- ------------ ---------
db file sequential read 2,902,850 3,829 0 .001
latch free 2,248,864 496 1,524,235 .000
PX Deq: Table Q Normal 2,080,463 4,469 248 .002
PX Deq Credit: send blkd 1,321,019 52,251 23,032 .040
direct path read 986,951 6,931 0 .007
PX Deq Credit: need buffer 800,970 1,091 84 .001
log file parallel write 415,175 5,078 2 .012
direct path write 321,096 9,342 0 .029
PX Deq: Execution Msg 198,768 56,384 26,020 .284
log file sequential read 118,480 164 0 .001
PX Deq: Execute Reply 92,487 5,628 2,407 .061
log file sync 87,295 1,327 7 .015
db file scattered read 70,112 162 0 .002
enqueue 44,335 1,468 354 .033
PX Deq: Join ACK 42,503 77 0 .002
file open 28,545 8 0 .000
PX Deq: Signal ACK 26,753 253 6,705 .009
log file switch completion 54 18 5 .341
control file parallel write 23,785 340 0 .014
SQL*Net more data from client 19,847 878 0 .044
PX Deq: Parse Reply 17,885 26 0 .001
db file parallel write 17,745 1,293 4 .073
PX Deq: Msg Fragment 15,612 18 8 .001
rdbms ipc reply 11,013 167 66 .015
PX Deq: Table Q qref 7,778 2 0 .000
LGWR wait for redo copy 7,608 2 43 .000
control file sequential read 4,098 1 0 .000
buffer busy waits 2,970 9 0 .003
PX Deq: Table Q Sample 2,053 25 27 .012
library cache pin 1,353 6 0 .004
PX Deq: Table Q Get Keys 530 1 0 .001
local write wait 317 177 163 .560
file identify 315 0 0 .000
refresh controlfile command 214 0 0 .002
PX qref latch 135 105 102 .780
log file single write 80 2 0 .021
imm op 67 0 0 .000
process startup 61 2 0 .036
write complete waits 33 34 33 1.032
library cache load lock 25 6 2 .256
log buffer space 20 6 0 .286
db file single write 6 0 0 .007
row cache lock 5 0 0 .006
db file parallel read 2 0 0 .010
instance state change 2 0 0 .000
reliable message 1 0 0 .010
library cache lock 1 0 0 .020
This should give a list of "current" wait events, but some events are more important than others. These are the one's to watch-out for:
- async disk IO
- control file parallel write
- control file sequential read
- db file parallel write
- db file scattered read
- db file sequential read
- direct path read
- direct path write
- log file parallel write
- log file sync
For details on singular wait events, consult Appendix A of the Oracle9i Reference Manual or check MOSC.
Once we have a system wide picture of wait events, we can drill-down into wait events that are directly associated with background processes.
set pages 999
set lines 90
column c1 heading 'System|ID' format 999
column c2 heading 'Prcs' format a10
column c3 heading 'Event|Name' format a30
column c4 heading 'Total|Waits' format 999,999
column c5 heading 'Time|Waited|(in secs)' format 999,999
column c6 heading 'Avg|Wait|secs' format 999
column c7 heading 'Avg|Wait|secs' format 999
column c8 heading 'Max|Wait|(in secs)' format 999
ttitle 'System-wide Wait Analysis|Detail Analysis|for current wait events'
select
b.sid c1,
decode(b.username,NULL,c.name,b.username) c2,
event c3,
a.total_waits c4,
round((a.time_waited / 100),2) c5,
a.total_timeouts c6,
round((average_wait / 100),2) c7,
round((a.max_wait / 100),2) c8
from
sys.v_$session_event a,
sys.v_$session b,
sys.v_$bgprocess c
where
SEE CODE DEPOT FOR LICENSED SCRIPT DOWNLOADS
event NOT IN
(
'lock element cleanup',
'pmon timer',
'rdbms ipc message',
'smon timer',
'SQL*Net message from client',
'SQL*Net break/reset to client',
'SQL*Net message to client',
'SQL*Net more data to client',
'dispatcher timer',
'Null event',
'io done',
'parallel query dequeue wait',
'parallel query idle wait - Slaves',
'pipe get',
'PL/SQL lock timer',
'slave wait',
'virtual circuit status',
'WMON goes to sleep'
)
order by 4 desc
;
Here is the output from this script. Here we see each background process and the time each has spent waiting within Oracle.
Tue Aug 20 page 1
System-wide Wait Analysis
Detail Analysis
for current wait events
Time Avg Avg Max
System Event Total Waited Wait Wait Wait
ID Prcs Name Waits (in secs) secs secs (in secs)
------ ---------- ------------------------------ -------- --------- ---- ---- ---------
3 LGWR log file parallel write 415,082 5,076 2 0 1
11 ARC0 log file sequential read 108,834 139 0 0 0
4 CKPT control file parallel write 23,035 327 0 0 1
4 CKPT file open 20,842 4 0 0 0
2 DBW0 db file parallel write 17,684 1,292 4 0 1
4 CKPT direct path read 9,745 106 0 0 6
25 ARC1 log file sequential read 9,603 25 0 0 0
3 LGWR LGWR wait for redo copy 7,608 2 43 0 0
4 CKPT direct path write 3,743 2 0 0 0
5 SMON db file scattered read 1,824 7 0 0 0
4 CKPT control file sequential read 1,355 0 0 0 0
5 SMON db file sequential read 794 2 0 0 0
156 CPA1_OWNER db file sequential read 677 3 0 0 0
3 LGWR control file sequential read 468 0 0 0 0
11 ARC0 control file sequential read 433 0 0 0 0
3 LGWR control file parallel write 271 6 0 0 0
2 DBW0 control file sequential read 253 0 0 0 0
3 LGWR file open 190 0 0 0 0
160 CPA1_USER log file sync 152 2 0 0 0
25 ARC1 control file sequential read 143 0 0 0 0
2 DBW0 file open 112 0 0 0 0
5 SMON file open 111 0 0 0 0
2 DBW0 direct path read 108 0 0 0 0
3 LGWR direct path read 108 2 0 0 0
2 DBW0 file identify 108 0 0 0 0
3 LGWR direct path write 107 0 0 0 0
11 ARC0 control file parallel write 103 2 0 0 0
11 ARC0 file identify 102 0 0 0 0
12 file open 91 0 0 0 0
16 file open 90 1 0 0 0
20 file open 89 1 0 0 0
3 LGWR log file single write 80 2 0 0 0
3 LGWR file identify 80 0 0 0 0
183 CPA1_USER log file sync 73 1 0 0 0
11 ARC0 file open 70 0 0 0 0
3 LGWR imm op 67 0 0 0 0
3 LGWR log file sequential read 43 0 0 0 0
15 file open 38 0 0 0 0
6 RECO db file sequential read 36 0 0 0 0
156 CPA1_OWNER latch free 34 0 21 0 0
160 CPA1_USER latch free 30 0 20 0 0
2 DBW0 latch free 29 1 29 0 0
5 SMON latch free 17 0 11 0 0
125 SYS latch free 14 0 7 0 0
183 CPA1_USER latch free 14 0 9 0 0
3 LGWR latch free 11 0 11 0 0
156 CPA1_OWNER file open 11 0 0 0 0
25 ARC1 control file parallel write 9 0 0 0 0
183 CPA1_USER db file sequential read 9 0 0 0 0
25 ARC1 file identify 9 0 0 0 0
115 CPA1_USER log file sync 8 0 0 0 0
25 ARC1 file open 8 0 0 0 0
4 CKPT file identify 4 0 0 0 0
15 latch free 3 0 3 0 0
2 DBW0 process startup 3 0 0 0 0
160 CPA1_USER db file sequential read 3 0 0 0 0
11 ARC0 enqueue 2 4 0 2 3
132 CPA1_USER log file sync 2 0 0 0 0
125 SYS db file sequential read 2 0 0 0 0
4 CKPT latch free 1 0 1 0 0
12 latch free 1 0 1 0 0
11 ARC0 process startup 1 0 0 0 0
3 LGWR enqueue 1 3 0 3 3
6 RECO file open 1 0 0 0 0
183 CPA1_USER file open 1 0 0 0 0
125 SYS file open 1 0 0 0 0
160 CPA1_USER file open 1 0 0 0 0
166 ANAND file open 1 0 0 0 0
166 ANAND db file sequential read 1 0 0 0 0
125 SYS log file sync 1 0 0 0 0
5 SMON buffer busy waits 1 0 0 0 0
25 ARC1 enqueue 1 0 0 0 0
3 LGWR process startup 1 0 0 0 0
16 latch free 1 0 1 0 0
6 RECO latch free 1 0 1 0 0
From all of the above reports, we see the following areas for improvement:
1 - Event waits for parallel query dequeues. We need to check the default degree of parallelism for database objects and turn-off parallelism at the system level.
2 - Event waits for "db file sequential reads." This is most likely due to segment header contention on the indexes, but it could also be due to disk contention. We will start by increasing the number of freelists on the indexes. If the waits persist, the offending index should be striped across multiple disk spindles.
These are very general wait conditions, but they can sometimes be fixed by changing parameters or object characteristics. Some possible solutions include:
Once we have a system wide picture of wait events, we can drill-down into wait events that are directly associated with background processes.
set pages 999
set lines 90
column c1 heading 'System|ID' format 999
column c2 heading 'Prcs' format a10
column c3 heading 'Event|Name' format a30
column c4 heading 'Total|Waits' format 999,999
column c5 heading 'Time|Waited|(in secs)' format 999,999
column c6 heading 'Avg|Wait|secs' format 999
column c7 heading 'Avg|Wait|secs' format 999
column c8 heading 'Max|Wait|(in secs)' format 999
ttitle 'System-wide Wait Analysis|Detail Analysis|for current wait events'
select
b.sid c1,
decode(b.username,NULL,c.name,b.username) c2,
event c3,
a.total_waits c4,
round((a.time_waited / 100),2) c5,
a.total_timeouts c6,
round((average_wait / 100),2) c7,
round((a.max_wait / 100),2) c8
from
sys.v_$session_event a,
sys.v_$session b,
sys.v_$bgprocess c
where
SEE CODE DEPOT FOR LICENSED SCRIPT DOWNLOADS
event NOT IN
(
'lock element cleanup',
'pmon timer',
'rdbms ipc message',
'smon timer',
'SQL*Net message from client',
'SQL*Net break/reset to client',
'SQL*Net message to client',
'SQL*Net more data to client',
'dispatcher timer',
'Null event',
'io done',
'parallel query dequeue wait',
'parallel query idle wait - Slaves',
'pipe get',
'PL/SQL lock timer',
'slave wait',
'virtual circuit status',
'WMON goes to sleep'
)
order by 4 desc
;
Here is the output from this script. Here we see each background process and the time each has spent waiting within Oracle.
Tue Aug 20 page 1
System-wide Wait Analysis
Detail Analysis
for current wait events
Time Avg Avg Max
System Event Total Waited Wait Wait Wait
ID Prcs Name Waits (in secs) secs secs (in secs)
------ ---------- ------------------------------ -------- --------- ---- ---- ---------
3 LGWR log file parallel write 415,082 5,076 2 0 1
11 ARC0 log file sequential read 108,834 139 0 0 0
4 CKPT control file parallel write 23,035 327 0 0 1
4 CKPT file open 20,842 4 0 0 0
2 DBW0 db file parallel write 17,684 1,292 4 0 1
4 CKPT direct path read 9,745 106 0 0 6
25 ARC1 log file sequential read 9,603 25 0 0 0
3 LGWR LGWR wait for redo copy 7,608 2 43 0 0
4 CKPT direct path write 3,743 2 0 0 0
5 SMON db file scattered read 1,824 7 0 0 0
4 CKPT control file sequential read 1,355 0 0 0 0
5 SMON db file sequential read 794 2 0 0 0
156 CPA1_OWNER db file sequential read 677 3 0 0 0
3 LGWR control file sequential read 468 0 0 0 0
11 ARC0 control file sequential read 433 0 0 0 0
3 LGWR control file parallel write 271 6 0 0 0
2 DBW0 control file sequential read 253 0 0 0 0
3 LGWR file open 190 0 0 0 0
160 CPA1_USER log file sync 152 2 0 0 0
25 ARC1 control file sequential read 143 0 0 0 0
2 DBW0 file open 112 0 0 0 0
5 SMON file open 111 0 0 0 0
2 DBW0 direct path read 108 0 0 0 0
3 LGWR direct path read 108 2 0 0 0
2 DBW0 file identify 108 0 0 0 0
3 LGWR direct path write 107 0 0 0 0
11 ARC0 control file parallel write 103 2 0 0 0
11 ARC0 file identify 102 0 0 0 0
12 file open 91 0 0 0 0
16 file open 90 1 0 0 0
20 file open 89 1 0 0 0
3 LGWR log file single write 80 2 0 0 0
3 LGWR file identify 80 0 0 0 0
183 CPA1_USER log file sync 73 1 0 0 0
11 ARC0 file open 70 0 0 0 0
3 LGWR imm op 67 0 0 0 0
3 LGWR log file sequential read 43 0 0 0 0
15 file open 38 0 0 0 0
6 RECO db file sequential read 36 0 0 0 0
156 CPA1_OWNER latch free 34 0 21 0 0
160 CPA1_USER latch free 30 0 20 0 0
2 DBW0 latch free 29 1 29 0 0
5 SMON latch free 17 0 11 0 0
125 SYS latch free 14 0 7 0 0
183 CPA1_USER latch free 14 0 9 0 0
3 LGWR latch free 11 0 11 0 0
156 CPA1_OWNER file open 11 0 0 0 0
25 ARC1 control file parallel write 9 0 0 0 0
183 CPA1_USER db file sequential read 9 0 0 0 0
25 ARC1 file identify 9 0 0 0 0
115 CPA1_USER log file sync 8 0 0 0 0
25 ARC1 file open 8 0 0 0 0
4 CKPT file identify 4 0 0 0 0
15 latch free 3 0 3 0 0
2 DBW0 process startup 3 0 0 0 0
160 CPA1_USER db file sequential read 3 0 0 0 0
11 ARC0 enqueue 2 4 0 2 3
132 CPA1_USER log file sync 2 0 0 0 0
125 SYS db file sequential read 2 0 0 0 0
4 CKPT latch free 1 0 1 0 0
12 latch free 1 0 1 0 0
11 ARC0 process startup 1 0 0 0 0
3 LGWR enqueue 1 3 0 3 3
6 RECO file open 1 0 0 0 0
183 CPA1_USER file open 1 0 0 0 0
125 SYS file open 1 0 0 0 0
160 CPA1_USER file open 1 0 0 0 0
166 ANAND file open 1 0 0 0 0
166 ANAND db file sequential read 1 0 0 0 0
125 SYS log file sync 1 0 0 0 0
5 SMON buffer busy waits 1 0 0 0 0
25 ARC1 enqueue 1 0 0 0 0
3 LGWR process startup 1 0 0 0 0
16 latch free 1 0 1 0 0
6 RECO latch free 1 0 1 0 0
From all of the above reports, we see the following areas for improvement:
1 - Event waits for parallel query dequeues. We need to check the default degree of parallelism for database objects and turn-off parallelism at the system level.
2 - Event waits for "db file sequential reads." This is most likely due to segment header contention on the indexes, but it could also be due to disk contention. We will start by increasing the number of freelists on the indexes. If the waits persist, the offending index should be striped across multiple disk spindles.
These are very general wait conditions, but they can sometimes be fixed by changing parameters or object characteristics. Some possible solutions include:
- Improve the optimizers' propensity to use indexes by reducing the value of optimizer_index_cost_adj to a value less than 100, sometimes reducing unnecessary full-table scans
- Check and adjust the value of parallel_threads_per_cpu to reduce parallel query
- Check for segment header contention/waits on index headers
- Create multiple segment header blocks for stressed indexes (using alter
index xxx storage (freelists 1))
-
- Distribute heavy impact tables and indexes onto a faster disk, or stripe the object across multiple disks.
Session Detail Event Waits Once we have covered the system and background waits, we can find tasks that are currently waiting. This script is tricky because it must be running at the exact moment of a wait. Some Oracle professionals run this script every 60 seconds, sending an e-mail when an important wait occurs:
clear columns
set pages 999
set lines 100
col c1 heading 'SID' format 999
col c2 heading 'User|Name' format a16
col c3 heading 'Event|Name' format a10
col c4 heading 'Secs|Wait' format 9.99
col c5 heading 'Wait|Time' format 9.99
col c6 heading 'state' format a10
col c7 heading 'P1 text' format a10
col c8 heading 'P1|Val' format 999,999,999
col c9 heading 'P1|Raw' format a10
col c10 heading 'P2|Text' format a10
col c11 heading 'P2|Val' format 999,999,999
col c12 heading 'P2|Raw' format a10
col c13 heading 'P3|Text' format a10
col c14 heading 'P3|Val' format 999,999,999
col c15 heading 'P3|Raw' format a10
ttitle 'Individual process wait times'
SELECT
a.sid c1,
decode(
b.username,
NULL,c.name,
b.username) c2,
a.event c3,
a.seconds_in_wait c4,
a.wait_time c5,
a.state c6,
a.p1text c7,
a.p1 c8,
-- a.p1raw c9,
a.p2text c10,
a.p2 c11,
-- a.p2raw c12,
a.p3text c13,
a.p3 c14
-- a.p3raw c15
FROM
sys.v_$session_wait a,
sys.v_$session b,
sys.v_$bgprocess c
where
SEE CODE DEPOT FOR LICENSED SCRIPT DOWNLOADS
and
event NOT IN
('lock element cleanup',
'pmon timer',
'rdbms ipc message',
'smon timer',
'SQL*Net message from client',
'SQL*Net break/reset to client',
'SQL*Net message to client',
'SQL*Net more data to client',
'dispatcher timer',
'Null event',
'parallel query dequeue wait',
'parallel query idle wait - Slaves',
'pipe get',
'PL/SQL lock timer',
'slave wait',
'virtual circuit status',
'WMON goes to sleep'
)
order by
4 desc
;
Here is the output from this script. Here we see two types of wait conditions:
clear columns
set pages 999
set lines 100
col c1 heading 'SID' format 999
col c2 heading 'User|Name' format a16
col c3 heading 'Event|Name' format a10
col c4 heading 'Secs|Wait' format 9.99
col c5 heading 'Wait|Time' format 9.99
col c6 heading 'state' format a10
col c7 heading 'P1 text' format a10
col c8 heading 'P1|Val' format 999,999,999
col c9 heading 'P1|Raw' format a10
col c10 heading 'P2|Text' format a10
col c11 heading 'P2|Val' format 999,999,999
col c12 heading 'P2|Raw' format a10
col c13 heading 'P3|Text' format a10
col c14 heading 'P3|Val' format 999,999,999
col c15 heading 'P3|Raw' format a10
ttitle 'Individual process wait times'
SELECT
a.sid c1,
decode(
b.username,
NULL,c.name,
b.username) c2,
a.event c3,
a.seconds_in_wait c4,
a.wait_time c5,
a.state c6,
a.p1text c7,
a.p1 c8,
-- a.p1raw c9,
a.p2text c10,
a.p2 c11,
-- a.p2raw c12,
a.p3text c13,
a.p3 c14
-- a.p3raw c15
FROM
sys.v_$session_wait a,
sys.v_$session b,
sys.v_$bgprocess c
where
SEE CODE DEPOT FOR LICENSED SCRIPT DOWNLOADS
and
event NOT IN
('lock element cleanup',
'pmon timer',
'rdbms ipc message',
'smon timer',
'SQL*Net message from client',
'SQL*Net break/reset to client',
'SQL*Net message to client',
'SQL*Net more data to client',
'dispatcher timer',
'Null event',
'parallel query dequeue wait',
'parallel query idle wait - Slaves',
'pipe get',
'PL/SQL lock timer',
'slave wait',
'virtual circuit status',
'WMON goes to sleep'
)
order by
4 desc
;
Here is the output from this script. Here we see two types of wait conditions:
- User CPA1 is waiting to access file 116, block 51253 - We can use the get_object_by_block_nbr.sql script to see the exact block.
- IUD_READ_ONLY users are waiting on a parallel query dequeue
SID Name Name Wait Time state P1 text Val Text
---- ---------------- ---------- ----- ----- ---------- ---------- ------------ ----------
P2 P3 P3
Val Text Val
------------ ---------- ------------
75 CPA1_USER db file se .00 .00 WAITING file# 116 block#
quential r
ead
51,253 blocks 1
52 IUD1_READ_ONLY PX Deq Cre .00 .00 WAITING sleeptime/ 268,566,527 passes
dit: send senderid
blkd
284 qref 0
253 IUD1_READ_ONLY PX Deq Cre .00 .00 WAITING sleeptime/ 268,566,527 passes
dit: send senderid
blkd
284 qref 0
151 IUD1_READ_ONLY PX Deq Cre .00 .00 WAITING sleeptime/ 268,566,527 passes
dit: send senderid
blkd
284 qref 0
147 IUD1_READ_ONLY PX Deq Cre .00 .00 WAITING sleeptime/ 268,566,527 passes
dit: send senderid
blkd
284 qref 0
Methods for Using Event Wait Data
Because this is a short article, let's confine our wait discussion to the most onerous wait event, the db file sequential read wait event.
In our example from the system-wide reports we noted that the number one wait in v$system_event is "db file sequential reads." This wait event is usually for index reads. If we look at the detail listing for these wait events, we see that the script returns the file number and block number where Oracle was waiting for the event.
SID Name Name Wait Time state P1 text Val Text
---- ---------------- ---------- ----- ----- ---------- ---------- ------------ ----------
P2 P3 P3
Val Text Val
------------ ---------- ------------
75 CPA1_USER db file se .00 .00 WAITING file# 116 block#
quential r
ead
51,253 blocks 1
Here we see that the CPA1_USER was waiting on access to file number 16, block number 51,253. Given this information, we can drill down into dba_extents and dba_data_files to see the data file that is causing the waits.
Here is a simple script that will display the file name for any file number:
accept filenbr prompt 'Enter the file#: '
select
tablespace_name,
file_name
from
dba_data_files
where
file_id = &&filenbr
;
When we run the script we can quickly see the name of the file that is experiencing the "db file sequential reads" wait:
Enter the file#: 10
TABLESPACE_NAME
------------------------------
FILE_NAME
--------------------------------------
USERS
C:\ORACLE\ORADATA\DIOGENES\USERS02.DBF
To properly locate the source of the wait we need more than just the file name We can use the get_object_by_block_nbr.sql script to display all object on the target block:
set pages 999
set verify off
set lines 80
col c1 heading 'Segment|Name' format a40
col c2 heading 'Segment|Type' format a10
col c3 heading 'First|Block|of Segment' format 999,999,999,999
col c4 heading 'Last|Block|of Segment' format 999,999,999,999
accept filenbr prompt 'Enter the file number: '
accept inblock prompt 'Enter the block number: '
select
segment_name c1,
segment_type c2,
block_id c3
-- block_id+blocks c4
from
dba_extents
where
&filenbr = file_id
SEE CODE DEPOT FOR LICENSED SCRIPT DOWNLOADS
;
Here we see the exact source of the wait event is an index called IDX_EVENTCASE_STATUS_OVERAGE.
Enter the file number: 116
Enter the block number: 51253
First
Segment Segment Block
Name Type of Segment
---------------------------------------- ---------- ----------------
IDX_EVENTCASE_STATUS_OVERAGE INDEX 51,205
So, why are sessions waiting on index access? The most common cause is a freelist shortage. Prior to Oracle bitmap freelists (automatic extent management), segment header contention can happen if multiple tasks compete to simultaneously update the same index. The script can easily check the number of freelists for this object:
clear columns
col c1 heading 'Table|Name' format a20
col c2 heading 'Table|Freelists' format 99
col c3 heading 'Index|Name' format a20
col c4 heading 'Index|Freelists' format 99
select distinct
t.table_name c1,
t.freelists c2,
index_name c3,
i.freelists c4
from
dba_tables t,
dba_indexes i
where
t.table_name = i.table_name
and
i.index_name = 'IDX_EVENTCASE_STATUS_OVERAGE'
Of course, there could be other problems such as a high disk enqueue on the disk that contains the index, but we should always try to add freelists first. If more freelists do not solve the wait issue, then the index must be placed on a faster disk, a less busy disk, or striped across multiple disks.
Also, note that because we have the SID for the waiting task, we can use the SID to join into v$session and v$sql to see the SQL that was causing the wait condition.
Guy Harrison notes this query for wait events using the v$sys_time_model ASH table:
COLUMN wait_class format a20
COLUMN name format a30
COLUMN time_secs format 999,999,999,999.99
COLUMN pct format 99.99
COLUMN name format a30
COLUMN time_secs format 999,999,999,999.99
COLUMN pct format 99.99
SELECT
wait_class,
NAME,
ROUND (time_secs, 2) time_secs,
ROUND (time_secs * 100 / SUM (time_secs) OVER (), 2) pct
FROM
(SELECT
n.wait_class,
e.event NAME,
e.time_waited / 100 time_secs
FROM
v$system_event e,
v$event_name n
WHERE
n.NAME = e.event AND n.wait_class <> 'Idle'
AND
time_waited > 0
UNION
SELECT
'CPU',
'server CPU',
SUM (VALUE / 1000000) time_secs
FROM
v$sys_time_model
WHERE
stat_name IN ('background cpu time', 'DB CPU'))
ORDER BY
time_secs DESC;
wait_class,
NAME,
ROUND (time_secs, 2) time_secs,
ROUND (time_secs * 100 / SUM (time_secs) OVER (), 2) pct
FROM
(SELECT
n.wait_class,
e.event NAME,
e.time_waited / 100 time_secs
FROM
v$system_event e,
v$event_name n
WHERE
n.NAME = e.event AND n.wait_class <> 'Idle'
AND
time_waited > 0
UNION
SELECT
'CPU',
'server CPU',
SUM (VALUE / 1000000) time_secs
FROM
v$sys_time_model
WHERE
stat_name IN ('background cpu time', 'DB CPU'))
ORDER BY
time_secs DESC;
Other Wait Analysis Techniques
This article would not be complete without mentioning the 10046 trace events. Proponents of the 10046 approach to wait event tuning will say that using the 10046 approach is the best and easiest way to analyze Oracle event waits.
If you want to drill-down into trace event analysis, there are several companies that sell software to analyze trace files for wait events.
Conclusion
The script output screenshot eliminates the confusion
ReplyDelete