Explore Oracle Stuff

Problem statement:Let's assume you are supporting multiple production databases running in a multiple node rac cluster and your OEM metric alert for CPU is setup for warning and critical threshold. You can not make out anything out of that alert and you need to drill down what caused that spike.

Traditional approach is we login into the database server and try to find out top process using top command.Once we identify top process id we try to find whether those process id part of any database by joining v$session and v$process with a common filed as process address called PADDR

TOP itself utilizes lots of CPU so it is recommended not to use when your CPU utilization is more than 90%.So what is our alternative in that case?

Here oartop utility comes into play which is very handy and can be utilized efficiently with little bit scripting.

oartop is bundled with Autonomous Health Framework Aka AHF.

Algorithm to develop script

Get location of AHF repo directory.
Inside repo directory you will find a directory called oswtop which contains all top output already stored by OSWacther
Read latest file and store CPU utilization from that file
Read top oracle process and find corresponding database name
Now use oratop utility in batch cycle mode so that you can redirect output

Sample output from oartop

oratop: Release 16.5.1 Production on Fri Aug 8 14:09:30 MST 2025

Copyright (c) 2011, Oracle. All rights reserved.

Connecting
Cycle 1 - oratop: Release 16.5.1 Production on Fri Aug 8 14:09:31 MST 2025

Oracle 19c - 14:08:38 Primary r/w ****** up: 164d, 16 sn, 1 ins, 3 er, 84T sz,   8G sga,    0%fra, archivelog           41.6%db
ID CPU %CPU LOAD AAS ASC ASI ASW ISW REDO TEMP IORT MBPS IOPS R/S W/S LIO GCPS %FRE PGA NETV UTPS UCPS SQRT %DBC %DBW
----------------------------------------------------------------------------------------------------------------------------------
1   2 31.5 2.5 0.8    1    0    0   15 1.1k   6M 133u 0.3 27.8 18.9 8.9 2.5    0 40.5 1.9G 193k    0 0.6 6.6m 6.5 35.1

EVENT (C)                                                              T/O WAIT TIME   AVG %DBT                      WAIT_CLASS
----------------------------------------------------------------------------------------------------------------------------------
resmgr:cpu quantum                                                          1.4G 4.0y   89m 85.4                       Scheduler
RMAN backup & recovery I/O                                                  113M   89d   68m   5.3                      System I/O
DB CPU                                                                             70d         4.1
db file sequential read                                                     8.2G   59d 600u   3.5                        User I/O
Disk file operations I/O                                                     72M 4.4d 5.4m   0.3                        User I/O

ID   SID     SPID USERNAME PROGRAM   SRV SERVICE OPN SQLID/BLOCKER E/T %CPU %LIO PGA STS STE WAIT_CLASS EVENT/OBJECT NAME    W/T
----------------------------------------------------------------------------------------------------------------------------------
1   142    66186 CCB_STAR SQL Devel DED SYS$USE                   128d    0    0 4.9M ACT RUN            On CPU               41u
1    94    87860 DBSNMP   JDBC Thin DED SYS$USE                    11s    0    0 4.7M INA WAI Idle       SQL*Net message fro 11s
1   121    75603 ETLADMIN replicat@ DED SYS$USE                   9.0s    0    0   6M INA WAI Idle       SQL*Net message fro 8.9s
1   146    75618 ETLADMIN replicat@ DED SYS$USE                   2.0s    0    0 9.3M INA WAI Idle       SQL*Net message fro 1.6s
1    95    30022 SYS      python3@t DED SYS$USE                    50s    0    0 5.7M INA WAI Idle       SQL*Net message fro 50s
1    67    44678 EIMSOA   JDBC Thin DED SYS$USE                   1.4t    0    0 3.8M INA WAI Idle       SQL*Net message fro 1.4t
1   907    38167 DBSNMP   JDBC Thin DED SYS$USE SEL 5qdptu5gnkayq 1.1t    0    0 22M INA WAI Idle       SQL*Net message fro 1.1t
1    54     6950 EIMSOA   JDBC Thin DED SYS$USE                   1.6t    0    0   3M INA WAI Idle       SQL*Net message fro 1.7t
1    55     8469 APPS     oracle@eb DED SYS$USE SEL 3zwvgah1vg9ug 2.2d    0    0 6.6M INA WAI Idle       SQL*Net message fro 1.2h
1   842    30030 SYS      python3@t DED SYS$USE                    44s    0    0 5.1M INA WAI Idle       SQL*Net message fro 44s
1   847    95526 DBSNMP   JDBC Thin DED SYS$USE                   1.5t    0    0   4M INA WAI Idle       SQL*Net message fro 1.5t
1   856    99179 EIMSOA   JDBC Thin DED SYS$USE                   1.3t    0    0 3.7M INA WAI Idle       SQL*Net message fro 1.3t
1   857    77324 EIMSOA   JDBC Thin DED SYS$USE                   1.1t    0    0   4M INA WAI Idle       SQL*Net message fro 1.2t
1   865     3337 EIMSOA   JDBC Thin DED SYS$USE                   1.8t    0    0 3.7M INA WAI Idle       SQL*Net message fro 1.8t
1   905    42297 EIMSOA   JDBC Thin DED SYS$USE                   1.2t    0    0 4.3M INA WAI Idle       SQL*Net message fro 1.2t
1    60    52837 ETLADMIN Toad.exe DED SYS$USE                   127d    0    0 4.5M KIL WAI            SQL*Net message fro 8.8s

So you can easily check database session wait event details with SQL_ID and can go to more deeper analysis.

Disclaimer:Here is a sample code but please test in your environment before implementing this.Each environment is different so I can not guarantee it will run successfully.

Please mind that this has been tested from root user

#!/bin/bash
# Set environment variables explicitly
PATH=/opt/oracle/dcs/bin:/opt/oracle/oak/onecmd:/opt/oracle/dcs/bin:/opt/oracle/oak/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin
export PATH
# Get repository directory
repdir=$(/opt/oracle/dcs/oracle.ahf/tfa/bin/tfactl get repositorydir | grep repositorydir | awk -F "|" '{print $3}' | xargs)
echo "Repository directory: $repdir"
# Find oswtop directory
oswd=$(find "$repdir" -name "oswtop" -type d | head -1)
if [ -d "$oswd" ]; then
echo "Found oswtop directory: $oswd"
cd "$oswd" || exit 1
# Find the first .dat file
oswf=$(find . -name "*.dat" | sort | head -1)
echo "First .dat file: $oswf"
if [ -f "$oswf" ]; then
line=$(grep -n "top -" "$oswf" | tail -1 | awk -F ":" '{print $1}')
nline=$((line + 30))
fname="/u10/scripts/dba/output/oswtop_$(hostname)_$(date '+%Y%m%d%H%M').log"
echo "Output file: $fname"
sed -n "${line},${nline}p" "$oswf" > "$fname"
utilization=$(grep "%Cpu(s)" "$fname" | awk -F ":" '{print $2}' | awk '{print $1}')
echo "CPU Utilization: $utilization"
if (( $(echo "$utilization < 90" | bc -l) )); then
echo "CPU utilization below threshold. Exiting."
exit 0
fi
if [ -f "$fname" ]; then
sed -i '/root/d' "$fname"
spid=$(grep oracle "$fname" | head -1 | awk '{print $1}')
echo "Oracle SPID: $spid"
if [ -n "$spid" ] && grep -q ORACLE_SID "/proc/$spid/environ"; then
export dbname=$(tr '\0' '\n' < "/proc/$spid/environ" | grep ORACLE_SID | awk -F "=" '{print $2}')
echo $dbname > /home/oracle/dbname.txt
su - oracle -c 'export dbname=`cat /home/oracle/dbname.txt`;/opt/oracle/dcs/oracle.ahf/tfa/bin/tfactl analyze -comp oratop -bn1 -database "$dbname" -f' >/home/oracle/oratop.log 2 >&1
mailx -s "Check top oracle process from $(hostname)" <replace wih your emailid> < /home/oracle/oratop.log
else
echo "No ORACLE_SID found for SPID $spid"
fi
else
echo "Output file not found!"
fi
else
echo "No .dat file found!"
fi
else
echo "oswtop directory not found!"
fi

There is an youtube video you can refer that also...

Oracle Database performance tuning using oratop - tips and tricks by Sandesh Rao

Hope you have learnt something useful...

Explore Oracle Stuff

Friday, August 8, 2025

Be at top with oratop

Add OEM database targets using emcli

Report Abuse

Labels