
Wednesday, December 11, 2024

Explore fun inside the ASM in ODA

Thanks for your interest. Today I will walk you through an interesting incident involving a part of ASM on ODA that I had not known about before.

The environment details are shown below.

odacli describe-system

Appliance Information
----------------------------------------------------------------
                     ID: ***********************
               Platform: X8-2-HA
        Data Disk Count: 48
         CPU Core Count: 4
                Created: November 21, 2022 6:03:28 PM MST

System Information
----------------------------------------------------------------
                   Name: ***************
            Domain Name: **************
              Time Zone: America/Phoenix
             DB Edition: EE
            DNS Servers: **************
            NTP Servers: ***************

Disk Group Information
----------------------------------------------------------------
DG Name                   Redundancy                Percentage
------------------------- ------------------------- ------------
DATA                      NORMAL                    90
RECO                      NORMAL                    10

To fix an issue, it was decided to reboot the nodes. We got an outage window from the application team and initiated the reboot. Both nodes came back online, but when we checked the CRS status we found that CRS was down.

crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

We tried to stop and start the cluster as shown below:

crsctl stop cluster -all

crsctl start cluster -all

When we checked the CRS resource status, we found that ASM was down.

The issue persisted, so we started digging into the ASM alert log and found the error below:

WARNING: Disk Group DATA containing voting files is not mounted
ORA-15032: not all alterations performed
ORA-15017: diskgroup "DATA" cannot be mounted


We tried to start the ASM instance manually, but it failed with the error below:

ERROR: unrecoverable error ORA-15311 raised in ASM I/O path; terminating process 45315 
Process termination requested for pid 45315 [source = rdbms], [info = 2] [request issued by pid: 45315, uid: 1000]
2024-11-21T12:41:12.327170-07:00
PMON (ospid: 45112): terminating the instance due to ORA error 493


There were also some incidents reported in the ADR repository with the ORA-07445 error below:

ORA-07445: exception encountered: core dump [kfdp_isPstDskIdeal()+777] [SIGSEGV] [ADDR:0x0] [PC:0x6A6DA29] [Address not mapped to object] []

Searching My Oracle Support (Metalink), we found the document below:

ODA: ASM Instance Crashes with ORA-07445 [KFDP_ISPSTDSKIDEAL] When Remote ASM Instance is Shutdown (Doc ID 2959654.1)

This document says that there is an ASM metadata file called asmappl.config which stores the partner status table. If its contents do not match between the nodes, the ASM instance will not come up.

Upon investigation we found that the /opt/oracle/extapi/asmappl.config file was missing disk information on both nodes.

A snippet of that file looks like this:


 cat /opt/oracle/extapi/asmappl.config

attr max_disk_count 100
attr appliance_name  ODA
attr diskstring      AFD:*
attr file_version  2
attr oda_version   3
attr jbod_count    1
attr jbod_slot_count 24
attr data_slot_count 24
attr reco_slot_count 24
attr redo_slot_count 24
attr max_missing     0
attr min_partners    2
attr agent_sql_identifier  "/*+ _OAK_AsmCookie "
attr rdbms_compatibility   12.1.0.2
attr asm_compatibility     12.2.0.1
attr _asm_hbeatiowait 100
disk  AFD:SSD_E0_S00_1908716880P1               0                     00               1      DATA
disk  AFD:SSD_E0_S00_1908716880P2               0                     00               2      RECO
disk  AFD:SSD_E0_S01_1908717424P1               0                     01               1      DATA
disk  AFD:SSD_E0_S01_1908717424P2               0                     01               2      RECO
disk  AFD:SSD_E0_S02_1908700272P1               0                     02               1      DATA

partners DATA 0.00  0.02  0.03  0.06  0.07  0.10  0.11  0.14  0.15
partners DATA 0.01  0.02  0.03  0.06  0.07  0.10  0.11  0.14  0.15
partners DATA 0.02  0.00  0.01  0.20  0.21  0.16  0.17  0.12  0.13
partners DATA 0.03  0.00  0.01  0.20  0.21  0.16  0.17  0.12  0.13
partners DATA 0.04  0.06  0.07  0.10  0.11  0.14  0.15  0.18  0.19
partners DATA 0.05  0.06  0.07  0.10  0.11  0.14  0.15  0.18  0.19

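The "partners" lines above contain the partner disk information. In each line, the first disk after the disk group name is the disk being described and the rest are its partners. For example, the first line means that disk 0.00 (slot 0) is partnered with the disks in slots 2, 3, 6, 7, 10, 11, 14 and 15, so the redundant copies of its data are placed on those disks. If the disk in slot 0 fails, the rebalance of data will happen among the partners of the lost disk.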
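To make the partner lines easier to read, here is a minimal sketch that prints the partners recorded for one disk. The function name list_partners is hypothetical and not part of any Oracle tool; it only assumes the "partners <diskgroup> <disk> <partner> ..." layout visible in the snippet above.

```shell
#!/bin/bash
# Hypothetical helper: print the partner disks recorded for one disk.
# Assumes lines of the form: partners <diskgroup> <disk> <partner> <partner> ...
# Usage: list_partners <config_file> <diskgroup> <disk_id>
list_partners() {
    local config_file="$1" diskgroup="$2" disk_id="$3"
    awk -v dg="$diskgroup" -v id="$disk_id" '
        # Appending "" forces a string comparison, so 0.10 is not treated as 0.1
        $1 == "partners" && $2 == dg && ($3 "") == (id "") {
            for (i = 4; i <= NF; i++) print $i
        }
    ' "$config_file"
}
```

For example, list_partners /opt/oracle/extapi/asmappl.config DATA 0.00 would print one partner per line. Running the same command on both nodes and comparing the output is a quick way to spot a mismatch.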

Please go through below link for more details and visualization.


We created an SR and the support engineer gave us the correct asmappl.config file. We corrected the file on both nodes, and the clusterware was then restarted successfully on both nodes.

Now you have understood how important the asmappl.config file is, and the next thought is how to restore it from a good backup. For this you can schedule a job in the crontab and take a regular backup of the asmappl.config file.
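A minimal sketch of such a backup job is shown below. The function name backup_config, the backup directory and the script path in the cron line are all assumptions chosen for illustration; adjust them to your own standards.

```shell
#!/bin/bash
# Hypothetical helper: copy a config file into a backup directory with a
# timestamp suffix, skipping the copy when the newest backup is identical.
# Usage: backup_config <source_file> <backup_dir>
backup_config() {
    local src="$1" dest_dir="$2"
    local name latest target
    [ -f "$src" ] || { echo "missing source: $src" >&2; return 1; }
    mkdir -p "$dest_dir" || return 1
    name=$(basename "$src")
    # Timestamp suffixes sort lexicographically, so the last name is the newest
    latest=$(ls -1 "$dest_dir"/"$name".* 2>/dev/null | sort | tail -n 1)
    if [ -n "$latest" ] && cmp -s "$src" "$latest"; then
        echo "unchanged: $latest"
        return 0
    fi
    target="$dest_dir/$name.$(date +%Y%m%d_%H%M%S)"
    cp -p "$src" "$target" && echo "saved: $target"
}

# Example call (the backup directory is an assumed location):
#   backup_config /opt/oracle/extapi/asmappl.config /root/asmappl_backup
# Example crontab entry, daily at 02:00 (the script path is hypothetical):
#   0 2 * * * /root/scripts/backup_asmappl.sh
```

Because a new copy is written only when the file has changed, the backup directory stays small and each saved copy marks a point where the file was modified.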

I also recommend taking a snapshot using the odabr utility. The command below creates new LVs named root_snap, u01_snap and opt_snap, which gives you a much wider backup than a single file.

The command is /opt/odabr/odabr backup -snap

Now run the lvdisplay command and you will see the newly created LVs named root_snap, u01_snap and opt_snap. The next question that comes to mind is how to restore a file from these logical volumes, since a snapshot is not presented as a file.

It is very simple. Create an empty directory to use as a mount point (any unused directory will do), for example mkdir -p /u02

Then mount the device path of the LV as shown below:

mount /dev/mapper/VolGroupSys-opt_snap /u02

Then go to the directory /u02/oracle/extapi, where you will see the asmappl.config file. After comparing it with the current file, you can easily restore it.
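The comparison step can be sketched as follows. The function name compare_config is hypothetical; it simply wraps diff so that the differences are visible before anything is overwritten.

```shell
#!/bin/bash
# Hypothetical helper: compare the snapshot copy of a file with the live copy.
# Prints a unified diff of the differences.
# Returns 0 when identical, 1 when the files differ, 2 when a file is missing.
# Usage: compare_config <snapshot_copy> <live_copy>
compare_config() {
    local snap_copy="$1" live_copy="$2"
    if [ ! -f "$snap_copy" ] || [ ! -f "$live_copy" ]; then
        echo "file not found" >&2
        return 2
    fi
    if diff -u "$live_copy" "$snap_copy"; then
        echo "identical"
        return 0
    fi
    return 1
}

# Example call, using the paths from this post:
#   compare_config /u02/oracle/extapi/asmappl.config /opt/oracle/extapi/asmappl.config
```

If the diff shows that the snapshot copy is the good one, keep a copy of the current file first, then copy the snapshot version into /opt/oracle/extapi, and finally unmount the snapshot with umount /u02.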

I hope you have learnt something useful.










