Thanks for your interest. Today I will walk you through an interesting incident that was new to me.
Here are the environment details:
odacli describe-system
Appliance Information
----------------------------------------------------------------
ID: ***********************
Platform: X8-2-HA
Data Disk Count: 48
CPU Core Count: 4
Created: November 21, 2022 6:03:28 PM MST
System Information
----------------------------------------------------------------
Name: ***************
Domain Name: **************
Time Zone: America/Phoenix
DB Edition: EE
DNS Servers: **************
NTP Servers: ***************
Disk Group Information
----------------------------------------------------------------
DG Name Redundancy Percentage
------------------------- ------------------------- ------------
DATA NORMAL 90
RECO NORMAL 10
Checking the cluster health (crsctl check crs) showed that Cluster Ready Services was down:
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
We tried stopping and starting the cluster as below:
crsctl stop cluster -all
crsctl start cluster -all
When we checked the CRS resource status, we found that ASM was down.
The issue still persisted, so we started digging into the ASM alert log and found the errors below:
ORA-15032: not all alterations performed
ORA-15017: diskgroup "DATA" cannot be mounted
Searching My Oracle Support (MetaLink), we found the following document:
ODA: ASM Instance Crashes with ORA-07445 [KFDP_ISPSTDSKIDEAL] When Remote ASM Instance is Shutdown (Doc ID 2959654.1)
This document explains that there is an ASM metadata file called asmappl.config which stores the partner status table. If there is a mismatch between the nodes, the ASM instance will not come up.
Upon investigation we found that the /opt/oracle/extapi/asmappl.config file had missing disk information on both nodes.
A snippet of that file looks like this:
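A quick way to spot this kind of damage is to count the disk entries in the file and compare the file across nodes. The helper below is only an illustrative sketch; the reasoning that a complete file on this X8-2-HA should list 48 disk lines is my assumption, based on the "Data Disk Count: 48" shown by odacli describe-system (24 slots, each split into a DATA and a RECO partition).

```shell
# count_disks: hypothetical helper that counts "disk" entries in an
# asmappl.config-style file (each physical partition gets one "disk" line)
count_disks() { grep -c '^disk ' "$1"; }

# On this appliance a complete file should report 48 (assumption: matches
# the "Data Disk Count: 48" from odacli describe-system)
count_disks /opt/oracle/extapi/asmappl.config 2>/dev/null || true

# To spot a node-to-node mismatch quickly (node name is a placeholder):
# md5sum /opt/oracle/extapi/asmappl.config
# ssh <node2> md5sum /opt/oracle/extapi/asmappl.config
```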
cat /opt/oracle/extapi/asmappl.config
attr appliance_name ODA
attr diskstring AFD:*
attr file_version 2
attr oda_version 3
attr jbod_count 1
attr jbod_slot_count 24
attr data_slot_count 24
attr reco_slot_count 24
attr redo_slot_count 24
attr max_missing 0
attr min_partners 2
attr agent_sql_identifier "/*+ _OAK_AsmCookie "
attr rdbms_compatibility 12.1.0.2
attr asm_compatibility 12.2.0.1
attr _asm_hbeatiowait 100
disk AFD:SSD_E0_S00_1908716880P1 0 00 1 DATA
disk AFD:SSD_E0_S00_1908716880P2 0 00 2 RECO
disk AFD:SSD_E0_S01_1908717424P1 0 01 1 DATA
disk AFD:SSD_E0_S01_1908717424P2 0 01 2 RECO
disk AFD:SSD_E0_S02_1908700272P1 0 02 1 DATA
We opened an SR and the support engineer gave us the correct asmappl.config file. We corrected it on both nodes and the clusterware was restarted successfully on both nodes.
Now that you understand how important the asmappl.config file is, the next question is how to restore it from a good backup. For this you can schedule a job in crontab to take a backup of the asmappl.config file.
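The cron-based backup could look like the sketch below. The source path is from this incident; the backup directory, retention count, and script path are my assumptions, so adjust them to your environment.

```shell
# Sketch of a timestamped backup for asmappl.config.
# backup_asmappl <source-file> <backup-dir> is an illustrative helper.
backup_asmappl() {
  src=$1; dir=$2
  mkdir -p "$dir" || return 1
  # copy with a timestamp suffix, preserving mode and ownership
  cp -p "$src" "$dir/$(basename "$src").$(date +%Y%m%d_%H%M%S)" || return 1
  # prune: keep only the 30 newest copies (retention count is an assumption)
  ls -1t "$dir"/"$(basename "$src")".* 2>/dev/null | tail -n +31 | xargs -r rm -f
}

backup_asmappl /opt/oracle/extapi/asmappl.config /opt/oracle/extapi/backup 2>/dev/null || true

# Example crontab entry (daily at 01:00; script path is illustrative):
# 0 1 * * * /root/scripts/backup_asmappl.sh
```

Run it on both nodes, since the file must stay consistent between them.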
I would also recommend taking a snapshot using the odabr utility. The command below creates new LVs named root_snap, u01_snap and opt_snap, which gives you a much wider range of backup options:
/opt/odabr/odabr backup -snap
Now run the lvdisplay command and you will see the newly created LVs named root_snap, u01_snap and opt_snap. The next question that comes to mind is how you restore from those logical volumes, since they are not presented as files.
Very simple: create a mount point directory, for example mkdir -p /u02
Then mount the device path of that LV like below:
mount /dev/mapper/VolGroupSys-opt_snap /u02
Then go into the directory /u02/oracle/extapi; you will see the asmappl.config file there and can easily restore it after comparing it with the live copy.
I hope you have learned something useful...
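The compare-and-restore step could be sketched like this. The helper name restore_from_snap is illustrative; the two paths are the snapshot mount and the live location from this article.

```shell
# restore_from_snap <good-copy> <live-file>: hypothetical helper that shows
# the differences, keeps the bad live copy aside, then restores the good one.
restore_from_snap() {
  good=$1; live=$2
  cmp -s "$good" "$live" && { echo "files already match"; return 0; }
  diff "$good" "$live" || true                           # show what differs
  cp -p "$live" "$live.bad.$(date +%Y%m%d)" || return 1  # keep the bad copy
  cp -p "$good" "$live"
}

# restore_from_snap /u02/oracle/extapi/asmappl.config /opt/oracle/extapi/asmappl.config
# umount /u02   # unmount the snapshot when done
```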