Friday, August 8, 2025

Be at top with oratop

 Problem statement:Let's assume you are supporting multiple production databases running in a multiple node rac cluster and your OEM metric alert for CPU is setup for  warning and critical threshold. You can not make out anything out of that alert and you need to drill down what caused that spike.

Traditional approach is we login into the database server and try to find out top process using top command.Once we identify top process id we try to find whether those process id part of any database by joining v$session and v$process with a common filed as process address called PADDR

TOP itself utilizes lots of CPU so it is recommended not to use when your CPU utilization is more than 90%.So what is our alternative in that case?

Here oartop utility comes into play which is very handy and can be utilized efficiently with little bit scripting.

oartop is bundled with Autonomous Health Framework Aka AHF.

Algorithm to develop script

  1.  Get location of AHF repo directory. 
  2. Inside repo directory you will find a directory called oswtop which contains all top output already stored by OSWacther
  3. Read latest file and store CPU utilization from that file
  4. Read top oracle process and find corresponding database name
  5. Now use oratop utility in batch cycle mode so that you can redirect output
Sample output from oartop

oratop: Release 16.5.1 Production on Fri Aug  8 14:09:30 MST 2025

Copyright (c) 2011, Oracle.  All rights reserved.


Connecting
Cycle 1 - oratop: Release 16.5.1 Production on Fri Aug  8 14:09:31 MST 2025

Oracle 19c - 14:08:38 Primary  r/w ****** up: 164d,  16 sn,  1 ins,  3 er,  84T sz,   8G sga,    0%fra, archivelog           41.6%db
ID CPU  %CPU LOAD  AAS  ASC  ASI  ASW  ISW  REDO TEMP IORT MBPS IOPS  R/S  W/S  LIO GCPS  %FRE  PGA  NETV UTPS UCPS SQRT %DBC %DBW
----------------------------------------------------------------------------------------------------------------------------------
 1   2  31.5  2.5  0.8    1    0    0   15  1.1k   6M 133u  0.3 27.8 18.9  8.9  2.5    0  40.5 1.9G  193k    0  0.6 6.6m  6.5 35.1

EVENT (C)                                                              T/O  WAIT  TIME   AVG  %DBT                      WAIT_CLASS
----------------------------------------------------------------------------------------------------------------------------------
resmgr:cpu quantum                                                          1.4G  4.0y   89m  85.4                       Scheduler
RMAN backup & recovery I/O                                                  113M   89d   68m   5.3                      System I/O
DB CPU                                                                             70d         4.1                               
db file sequential read                                                     8.2G   59d  600u   3.5                        User I/O
Disk file operations I/O                                                     72M  4.4d  5.4m   0.3                        User I/O

ID   SID     SPID USERNAME PROGRAM   SRV SERVICE OPN SQLID/BLOCKER  E/T %CPU %LIO  PGA STS STE WAIT_CLASS EVENT/OBJECT NAME    W/T
----------------------------------------------------------------------------------------------------------------------------------
 1   142    66186 CCB_STAR SQL Devel DED SYS$USE                   128d    0    0 4.9M ACT RUN            On CPU               41u
 1    94    87860 DBSNMP   JDBC Thin DED SYS$USE                    11s    0    0 4.7M INA WAI Idle       SQL*Net message fro  11s
 1   121    75603 ETLADMIN replicat@ DED SYS$USE                   9.0s    0    0   6M INA WAI Idle       SQL*Net message fro 8.9s
 1   146    75618 ETLADMIN replicat@ DED SYS$USE                   2.0s    0    0 9.3M INA WAI Idle       SQL*Net message fro 1.6s
 1    95    30022 SYS      python3@t DED SYS$USE                    50s    0    0 5.7M INA WAI Idle       SQL*Net message fro  50s
 1    67    44678 EIMSOA   JDBC Thin DED SYS$USE                   1.4t    0    0 3.8M INA WAI Idle       SQL*Net message fro 1.4t
 1   907    38167 DBSNMP   JDBC Thin DED SYS$USE SEL 5qdptu5gnkayq 1.1t    0    0  22M INA WAI Idle       SQL*Net message fro 1.1t
 1    54     6950 EIMSOA   JDBC Thin DED SYS$USE                   1.6t    0    0   3M INA WAI Idle       SQL*Net message fro 1.7t
 1    55     8469 APPS     oracle@eb DED SYS$USE SEL 3zwvgah1vg9ug 2.2d    0    0 6.6M INA WAI Idle       SQL*Net message fro 1.2h
 1   842    30030 SYS      python3@t DED SYS$USE                    44s    0    0 5.1M INA WAI Idle       SQL*Net message fro  44s
 1   847    95526 DBSNMP   JDBC Thin DED SYS$USE                   1.5t    0    0   4M INA WAI Idle       SQL*Net message fro 1.5t
 1   856    99179 EIMSOA   JDBC Thin DED SYS$USE                   1.3t    0    0 3.7M INA WAI Idle       SQL*Net message fro 1.3t
 1   857    77324 EIMSOA   JDBC Thin DED SYS$USE                   1.1t    0    0   4M INA WAI Idle       SQL*Net message fro 1.2t
 1   865     3337 EIMSOA   JDBC Thin DED SYS$USE                   1.8t    0    0 3.7M INA WAI Idle       SQL*Net message fro 1.8t
 1   905    42297 EIMSOA   JDBC Thin DED SYS$USE                   1.2t    0    0 4.3M INA WAI Idle       SQL*Net message fro 1.2t
 1    60    52837 ETLADMIN Toad.exe  DED SYS$USE                   127d    0    0 4.5M KIL WAI            SQL*Net message fro 8.8s

So you can easily check database session wait event details with SQL_ID and can go to more deeper analysis.

Disclaimer:Here is a sample code but please test in your environment before implementing this.Each environment is different so I can not guarantee it will run successfully.

Please mind that this has been tested from root user

#!/bin/bash
# Set environment variables explicitly
PATH=/opt/oracle/dcs/bin:/opt/oracle/oak/onecmd:/opt/oracle/dcs/bin:/opt/oracle/oak/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin
export PATH
# Get repository directory
repdir=$(/opt/oracle/dcs/oracle.ahf/tfa/bin/tfactl get repositorydir | grep repositorydir | awk -F "|" '{print $3}' | xargs)
echo "Repository directory: $repdir"
# Find oswtop directory
oswd=$(find "$repdir" -name "oswtop" -type d | head -1)
if [ -d "$oswd" ]; then
    echo "Found oswtop directory: $oswd"
    cd "$oswd" || exit 1
    # Find the first .dat file
    oswf=$(find . -name "*.dat" | sort | head -1)
    echo "First .dat file: $oswf"
    if [ -f "$oswf" ]; then
        line=$(grep -n "top -" "$oswf" | tail -1 | awk -F ":" '{print $1}')
        nline=$((line + 30))
        fname="/u10/scripts/dba/output/oswtop_$(hostname)_$(date '+%Y%m%d%H%M').log"
        echo "Output file: $fname"
        sed -n "${line},${nline}p" "$oswf" > "$fname"
        utilization=$(grep "%Cpu(s)" "$fname" | awk -F ":" '{print $2}' | awk '{print $1}')
        echo "CPU Utilization: $utilization"
        if (( $(echo "$utilization < 90" | bc -l) )); then
            echo "CPU utilization below threshold. Exiting."
            exit 0
        fi
        if [ -f "$fname" ]; then
            sed -i '/root/d' "$fname"
            spid=$(grep oracle "$fname" | head -1 | awk '{print $1}')
            echo "Oracle SPID: $spid"
            if [ -n "$spid" ] && grep -q ORACLE_SID "/proc/$spid/environ"; then
                export dbname=$(tr '\0' '\n' < "/proc/$spid/environ" | grep ORACLE_SID | awk -F "=" '{print $2}')
                echo $dbname > /home/oracle/dbname.txt
                su - oracle -c 'export dbname=`cat /home/oracle/dbname.txt`;/opt/oracle/dcs/oracle.ahf/tfa/bin/tfactl analyze -comp oratop -bn1 -database "$dbname" -f' >/home/oracle/oratop.log 2 >&1
                mailx -s "Check top oracle process from $(hostname)" <replace wih your emailid> < /home/oracle/oratop.log
            else
                echo "No ORACLE_SID found for SPID $spid"
            fi
        else
            echo "Output file not found!"
        fi
    else
        echo "No .dat file found!"
    fi
else
    echo "oswtop directory not found!"
fi

There is an youtube video you can refer that also...

Oracle Database performance tuning using oratop - tips and tricks by Sandesh Rao

Hope you have learnt something useful...

Migrate database from ODA X8-2 to X11-2

Thank you very much for your interest.Hope you will learn something useful.   Current Configuration Future configuration ...