Thanks for your interest. Today I will walk you through an interesting incident that was new to me.
Here are the environment details:
odacli describe-system
Appliance Information
----------------------------------------------------------------
ID: ***********************
Platform: X8-2-HA
Data Disk Count: 48
CPU Core Count: 4
Created: November 21, 2022 6:03:28 PM MST
System Information
----------------------------------------------------------------
Name: ***************
Domain Name: **************
Time Zone: America/Phoenix
DB Edition: EE
DNS Servers: **************
NTP Servers: ***************
Disk Group Information
----------------------------------------------------------------
DG Name Redundancy Percentage
------------------------- ------------------------- ------------
DATA NORMAL 90
RECO NORMAL 10
Checking the cluster health (crsctl check crs) showed that Cluster Ready Services was down:
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
We tried stopping and starting the cluster as below:
crsctl stop cluster -all
crsctl start cluster -all
When we checked the CRS resource status, we found that ASM was down.
The issue still persisted, so we started digging into the ASM alert log and found the errors below:
ORA-15032: not all alterations performed
ORA-15017: diskgroup "DATA" cannot be mounted
Searching My Oracle Support (MetaLink), we found the following document:
ODA: ASM Instance Crashes with ORA-07445 [KFDP_ISPSTDSKIDEAL] When Remote ASM Instance is Shutdown (Doc ID 2959654.1)
This document explains that there is an ASM metadata file called asmappl.config which stores the partner status table. If there is a mismatch between the nodes, the ASM instance will not come up.
Upon investigation we found that the /opt/oracle/extapi/asmappl.config file had missing disk information on both nodes.
A snippet of that file looks like this:
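A quick way to spot this kind of damage is to count the disk entries in the file and compare the file across nodes. The helper below is only an illustrative sketch; the reasoning that a complete file on this X8-2-HA should list 48 disk lines is my assumption, based on the "Data Disk Count: 48" shown by odacli describe-system (24 slots, each split into a DATA and a RECO partition).

```shell
# count_disks: hypothetical helper that counts "disk" entries in an
# asmappl.config-style file (each physical partition gets one "disk" line)
count_disks() { grep -c '^disk ' "$1"; }

# On this appliance a complete file should report 48 (assumption: matches
# the "Data Disk Count: 48" from odacli describe-system)
count_disks /opt/oracle/extapi/asmappl.config 2>/dev/null || true

# To spot a node-to-node mismatch quickly (node name is a placeholder):
# md5sum /opt/oracle/extapi/asmappl.config
# ssh <node2> md5sum /opt/oracle/extapi/asmappl.config
```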
cat /opt/oracle/extapi/asmappl.config
attr appliance_name ODA
attr diskstring AFD:*
attr file_version 2
attr oda_version 3
attr jbod_count 1
attr jbod_slot_count 24
attr data_slot_count 24
attr reco_slot_count 24
attr redo_slot_count 24
attr max_missing 0
attr min_partners 2
attr agent_sql_identifier "/*+ _OAK_AsmCookie "
attr rdbms_compatibility 12.1.0.2
attr asm_compatibility 12.2.0.1
attr _asm_hbeatiowait 100
disk AFD:SSD_E0_S00_1908716880P1 0 00 1 DATA
disk AFD:SSD_E0_S00_1908716880P2 0 00 2 RECO
disk AFD:SSD_E0_S01_1908717424P1 0 01 1 DATA
disk AFD:SSD_E0_S01_1908717424P2 0 01 2 RECO
disk AFD:SSD_E0_S02_1908700272P1 0 02 1 DATA
We opened an SR and the support engineer gave us the correct asmappl.config file. We corrected it on both nodes and the clusterware was restarted successfully on both nodes.
Now that you understand how important the asmappl.config file is, the next question is how to restore it from a good backup. For this you can schedule a job in crontab to take a backup of the asmappl.config file.
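The cron-based backup could look like the sketch below. The source path is from this incident; the backup directory, retention count, and script path are my assumptions, so adjust them to your environment.

```shell
# Sketch of a timestamped backup for asmappl.config.
# backup_asmappl <source-file> <backup-dir> is an illustrative helper.
backup_asmappl() {
  src=$1; dir=$2
  mkdir -p "$dir" || return 1
  # copy with a timestamp suffix, preserving mode and ownership
  cp -p "$src" "$dir/$(basename "$src").$(date +%Y%m%d_%H%M%S)" || return 1
  # prune: keep only the 30 newest copies (retention count is an assumption)
  ls -1t "$dir"/"$(basename "$src")".* 2>/dev/null | tail -n +31 | xargs -r rm -f
}

backup_asmappl /opt/oracle/extapi/asmappl.config /opt/oracle/extapi/backup 2>/dev/null || true

# Example crontab entry (daily at 01:00; script path is illustrative):
# 0 1 * * * /root/scripts/backup_asmappl.sh
```

Run it on both nodes, since the file must stay consistent between them.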
I would also recommend taking a snapshot using the odabr utility. The command below creates new LVs named root_snap, u01_snap and opt_snap, which gives you a much wider range of backup options:
/opt/odabr/odabr backup -snap
Now run the lvdisplay command and you will see the newly created LVs named root_snap, u01_snap and opt_snap. The next question that comes to mind is how you restore from those logical volumes, since they are not presented as files.
Very simple: create a mount point directory, for example mkdir -p /u02
Then mount the device path of that LV like below:
mount /dev/mapper/VolGroupSys-opt_snap /u02
Then go into the directory /u02/oracle/extapi; you will see the asmappl.config file there and can easily restore it after comparing it with the live copy.
I hope you have learned something useful...
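The compare-and-restore step could be sketched like this. The helper name restore_from_snap is illustrative; the two paths are the snapshot mount and the live location from this article.

```shell
# restore_from_snap <good-copy> <live-file>: hypothetical helper that shows
# the differences, keeps the bad live copy aside, then restores the good one.
restore_from_snap() {
  good=$1; live=$2
  cmp -s "$good" "$live" && { echo "files already match"; return 0; }
  diff "$good" "$live" || true                           # show what differs
  cp -p "$live" "$live.bad.$(date +%Y%m%d)" || return 1  # keep the bad copy
  cp -p "$good" "$live"
}

# restore_from_snap /u02/oracle/extapi/asmappl.config /opt/oracle/extapi/asmappl.config
# umount /u02   # unmount the snapshot when done
```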