Problem statement:Let's assume you are supporting multiple production databases running in a multiple node rac cluster and your OEM metric alert for CPU is setup for warning and critical threshold. You can not make out anything out of that alert and you need to drill down what caused that spike.
Traditional approach is we login into the database server and try to find out top process using top command.Once we identify top process id we try to find whether those process id part of any database by joining v$session and v$process with a common filed as process address called PADDR
TOP itself utilizes lots of CPU so it is recommended not to use when your CPU utilization is more than 90%.So what is our alternative in that case?
Here oartop utility comes into play which is very handy and can be utilized efficiently with little bit scripting.
oartop is bundled with Autonomous Health Framework Aka AHF.
Algorithm to develop script
- Get location of AHF repo directory.
- Inside repo directory you will find a directory called oswtop which contains all top output already stored by OSWacther
- Read latest file and store CPU utilization from that file
- Read top oracle process and find corresponding database name
- Now use oratop utility in batch cycle mode so that you can redirect output
Copyright (c) 2011, Oracle. All rights reserved.
Connecting
Cycle 1 - oratop: Release 16.5.1 Production on Fri Aug 8 14:09:31 MST 2025
Oracle 19c - 14:08:38 Primary r/w ****** up: 164d, 16 sn, 1 ins, 3 er, 84T sz, 8G sga, 0%fra, archivelog 41.6%db
ID CPU %CPU LOAD AAS ASC ASI ASW ISW REDO TEMP IORT MBPS IOPS R/S W/S LIO GCPS %FRE PGA NETV UTPS UCPS SQRT %DBC %DBW
----------------------------------------------------------------------------------------------------------------------------------
1 2 31.5 2.5 0.8 1 0 0 15 1.1k 6M 133u 0.3 27.8 18.9 8.9 2.5 0 40.5 1.9G 193k 0 0.6 6.6m 6.5 35.1
EVENT (C) T/O WAIT TIME AVG %DBT WAIT_CLASS
----------------------------------------------------------------------------------------------------------------------------------
resmgr:cpu quantum 1.4G 4.0y 89m 85.4 Scheduler
RMAN backup & recovery I/O 113M 89d 68m 5.3 System I/O
DB CPU 70d 4.1
db file sequential read 8.2G 59d 600u 3.5 User I/O
Disk file operations I/O 72M 4.4d 5.4m 0.3 User I/O
ID SID SPID USERNAME PROGRAM SRV SERVICE OPN SQLID/BLOCKER E/T %CPU %LIO PGA STS STE WAIT_CLASS EVENT/OBJECT NAME W/T
----------------------------------------------------------------------------------------------------------------------------------
1 142 66186 CCB_STAR SQL Devel DED SYS$USE 128d 0 0 4.9M ACT RUN On CPU 41u
1 94 87860 DBSNMP JDBC Thin DED SYS$USE 11s 0 0 4.7M INA WAI Idle SQL*Net message fro 11s
1 121 75603 ETLADMIN replicat@ DED SYS$USE 9.0s 0 0 6M INA WAI Idle SQL*Net message fro 8.9s
1 146 75618 ETLADMIN replicat@ DED SYS$USE 2.0s 0 0 9.3M INA WAI Idle SQL*Net message fro 1.6s
1 95 30022 SYS python3@t DED SYS$USE 50s 0 0 5.7M INA WAI Idle SQL*Net message fro 50s
1 67 44678 EIMSOA JDBC Thin DED SYS$USE 1.4t 0 0 3.8M INA WAI Idle SQL*Net message fro 1.4t
1 907 38167 DBSNMP JDBC Thin DED SYS$USE SEL 5qdptu5gnkayq 1.1t 0 0 22M INA WAI Idle SQL*Net message fro 1.1t
1 54 6950 EIMSOA JDBC Thin DED SYS$USE 1.6t 0 0 3M INA WAI Idle SQL*Net message fro 1.7t
1 55 8469 APPS oracle@eb DED SYS$USE SEL 3zwvgah1vg9ug 2.2d 0 0 6.6M INA WAI Idle SQL*Net message fro 1.2h
1 842 30030 SYS python3@t DED SYS$USE 44s 0 0 5.1M INA WAI Idle SQL*Net message fro 44s
1 847 95526 DBSNMP JDBC Thin DED SYS$USE 1.5t 0 0 4M INA WAI Idle SQL*Net message fro 1.5t
1 856 99179 EIMSOA JDBC Thin DED SYS$USE 1.3t 0 0 3.7M INA WAI Idle SQL*Net message fro 1.3t
1 857 77324 EIMSOA JDBC Thin DED SYS$USE 1.1t 0 0 4M INA WAI Idle SQL*Net message fro 1.2t
1 865 3337 EIMSOA JDBC Thin DED SYS$USE 1.8t 0 0 3.7M INA WAI Idle SQL*Net message fro 1.8t
1 905 42297 EIMSOA JDBC Thin DED SYS$USE 1.2t 0 0 4.3M INA WAI Idle SQL*Net message fro 1.2t
1 60 52837 ETLADMIN Toad.exe DED SYS$USE 127d 0 0 4.5M KIL WAI SQL*Net message fro 8.8s
# Set environment variables explicitly
PATH=/opt/oracle/dcs/bin:/opt/oracle/oak/onecmd:/opt/oracle/dcs/bin:/opt/oracle/oak/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin
export PATH
# Get repository directory
repdir=$(/opt/oracle/dcs/oracle.ahf/tfa/bin/tfactl get repositorydir | grep repositorydir | awk -F "|" '{print $3}' | xargs)
echo "Repository directory: $repdir"
# Find oswtop directory
oswd=$(find "$repdir" -name "oswtop" -type d | head -1)
if [ -d "$oswd" ]; then
echo "Found oswtop directory: $oswd"
cd "$oswd" || exit 1
# Find the first .dat file
oswf=$(find . -name "*.dat" | sort | head -1)
echo "First .dat file: $oswf"
if [ -f "$oswf" ]; then
line=$(grep -n "top -" "$oswf" | tail -1 | awk -F ":" '{print $1}')
nline=$((line + 30))
fname="/u10/scripts/dba/output/oswtop_$(hostname)_$(date '+%Y%m%d%H%M').log"
echo "Output file: $fname"
sed -n "${line},${nline}p" "$oswf" > "$fname"
utilization=$(grep "%Cpu(s)" "$fname" | awk -F ":" '{print $2}' | awk '{print $1}')
echo "CPU Utilization: $utilization"
if (( $(echo "$utilization < 90" | bc -l) )); then
echo "CPU utilization below threshold. Exiting."
exit 0
fi
if [ -f "$fname" ]; then
sed -i '/root/d' "$fname"
spid=$(grep oracle "$fname" | head -1 | awk '{print $1}')
echo "Oracle SPID: $spid"
if [ -n "$spid" ] && grep -q ORACLE_SID "/proc/$spid/environ"; then
export dbname=$(tr '\0' '\n' < "/proc/$spid/environ" | grep ORACLE_SID | awk -F "=" '{print $2}')
echo $dbname > /home/oracle/dbname.txt
su - oracle -c 'export dbname=`cat /home/oracle/dbname.txt`;/opt/oracle/dcs/oracle.ahf/tfa/bin/tfactl analyze -comp oratop -bn1 -database "$dbname" -f' >/home/oracle/oratop.log 2 >&1
mailx -s "Check top oracle process from $(hostname)" <replace wih your emailid> < /home/oracle/oratop.log
else
echo "No ORACLE_SID found for SPID $spid"
fi
else
echo "Output file not found!"
fi
else
echo "No .dat file found!"
fi
else
echo "oswtop directory not found!"
fi
There is an youtube video you can refer that also...
Oracle Database performance tuning using oratop - tips and tricks by Sandesh Rao
Hope you have learnt something useful...