Tuesday, August 19, 2025

Vector similarity search using Oracle Database 23 ai and apex 24.2

 Thank you very much for your interest..

Problem statement: Many organizations uses various document store application to store their important process/technical documents. Those document repository grows over the time and finding right document during actual need becomes challenging. Specifically, someone who joins the team as a fresher find it difficult to find right documents for reference. This article demonstrates how to efficiently search using oracle database 23 ai as document store and APEX 24.2 as a frontend application.

Assumption: Prior knowledge in basic database administration and Apex workspace setup

Pre-requisite: -

1.       Install oracle database 23 ai software in engineered system like Exadata or Oracle database appliance

2.       Create a pluggable database and a dedicated tablespace for storing data related to documents.

3.       Create a schema(docadmin) with necessary privileges and create below 2 tables.I will explain in detail later how these tables work

CREATE TABLE "DOCSTORE"."MYDOCS1"

   ("ID" INTEGER GENERATED BY DEFAULT ON NULL AS IDENTITY ( START WITH 1 CACHE 20) PRIMARY KEY,

                    "FILENAME" VARCHAR2(500 BYTE),

                    "FILE_SIZE" NUMBER(*,0),

                    "FILE_TYPE" VARCHAR2(100 BYTE),

                    "FILE_CONTENT" BLOB

                      ) LOGGING;

CREATE TABLE "DOCSTORE"."VECTORE_STORE1"

   (               "DOCID" NUMBER(*,0) NOT NULL ENABLE,

                    "EMBED_ID" NUMBER,

                    "EMBED_DATA" VARCHAR2(4000 BYTE),

                    "EMBED_VECTOR" VECTOR,

                     FOREIGN KEY ("DOCID")

                      REFERENCES "DOCSTORE"."MYDOCS" ("ID") ENABLE);

4.        Choose a file path where we need to copy our documents and create an oracle directory based on that.

5.       Insert some document in the table mydocs using below insert statement from sqlplus or any other client using variable where FILE_TRANSFER is the name of the oracle directory

insert into docstore.mydocs(filename,file_size,file_type,file_content) values('&&filename',dbms_lob.getlength(to_blob(bfilename('FILE_TRANSFER','&&filename'))),'DOCX',to_blob(bfilename('FILE_TRANSFER','&&filename')));

commit;

Lets deep dive now how we can utilize vector database as backend and apex as a front end to support search based on RAG aka retrieval-augmented generation.

Login in your workspace and create a simple application from App builder.We want to create a search page where we can search based on similarity using RAG.To do that we have to create search configuration like below

Follow below navigation in apex

Shared Components>Search Configurations















Here we can see that provider name is showing as 'ALL_MINILM_L12_V2'.Lets take a pause and see in below steps how this is appearing in drop down list.

Ø  Download onnx model using below link

https://docs.oracle.com/en/database/oracle/oracle-database/23/vecse/onnx-pipeline-models-text-embedding.html#GUID-E7C08BA2-B2B9-4081-9050-B9EB3EA46FA6

once we unzip we can see file with .onnx model and a read me how to import that model within the database.It is self explanatory so just validate whether you can see imported model using below query.

select model_name, algorithm, mining_function from user_mining_models where  model_name='ALL_MINILM_L12_V2';

  

Ø  Apex is still not aware of this imported model.Go to Workspace Utilities and then Vector Providers. Create following entry so that Apex is aware of the imported model.Once it is done it will appear as drop down as a vector provider in search configuration.

A screenshot of a computer

AI-generated content may be incorrect.

 

Ø  Continue search configuration now. Here we provide our Label,Table/View name and column name which must be a vector column,in our case it is embed_vector.You can chose search type between exact and similar.Find below reference to know more details.

https://docs.oracle.com/en/database/oracle/oracle-database/23/vecse/understand-approximate-similarity-search-using-vector-indexes.html

 

A screenshot of a computer

AI-generated content may be incorrect.

 

Ø  If you want to use similarity search then set vector_meory_size database parameter as 1G and create below index.

CREATE VECTOR INDEX vector_store_idx ON docstore.vectore_store (embed_vector) ORGANIZATION INMEMORY NEIGHBOR GRAPH DISTANCE COSINE WITH TARGET ACCURACY 95;

Ø  Now focus on column mapping where we want to see real textual data not vectorized data.You are now ready to similarity search using vector database

 

Ø  Now,we are in the most critical steps of vectorizing our text/clob data.Every model has some limitation utpo which it can generate embeddings and that limitation depends on number of token in the text.For my case I have assumed it as 300 so I have split document content with 300 tokens and generated vector embedding based on that.Below is the code for generating vector embedding

insert into DOCSTORE.VECTORE_STORE (docid,embed_id,embed_data,embed_vector)

select id,embed_id,text_chunk,embed_vector from docstore.mydocs doc cross join table(

dbms_vector_chain.utl_to_embeddings(

dbms_vector_chain.utl_to_chunks(

                                dbms_vector_chain.utl_to_text(doc.file_content),

                                                                json('{"by":"words","max":"300","split":"sentence","normalize":"all"}')

                                                                ),

                                                                json('{"provider":"database","model":"ALL_MINILM_L12_V2"}')

                                                                )

                                                                ) embed

                                                                cross join json_table(

                                                                embed.column_value,

                                                                '$[*]' COLUMNS (

                                                                embed_id NUMBER PATH '$.embed_id',

                                                                text_chunk varchar2(4000) PATH '$.embed_data',

                                                                embed_vector CLOB PATH '$.embed_vector')) as et ;

Basically, we should write a trigger for every row insert on master table mydocs so that it can generate vector embedding and insert immediately in child table.

For more information refer below link

https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_vector_chain1.html

 

Ø  Once embedding is generated search using app like below.I have not searched with ORA- error rather I have written plain text asking oracle error

A close-up of a computer screen

AI-generated content may be incorrect.

Hope you have learned something useful.

Friday, August 8, 2025

Be at top with oratop

 Problem statement:Let's assume you are supporting multiple production databases running in a multiple node rac cluster and your OEM metric alert for CPU is setup for  warning and critical threshold. You can not make out anything out of that alert and you need to drill down what caused that spike.

Traditional approach is we login into the database server and try to find out top process using top command.Once we identify top process id we try to find whether those process id part of any database by joining v$session and v$process with a common filed as process address called PADDR

TOP itself utilizes lots of CPU so it is recommended not to use when your CPU utilization is more than 90%.So what is our alternative in that case?

Here oartop utility comes into play which is very handy and can be utilized efficiently with little bit scripting.

oartop is bundled with Autonomous Health Framework Aka AHF.

Algorithm to develop script

  1.  Get location of AHF repo directory. 
  2. Inside repo directory you will find a directory called oswtop which contains all top output already stored by OSWacther
  3. Read latest file and store CPU utilization from that file
  4. Read top oracle process and find corresponding database name
  5. Now use oratop utility in batch cycle mode so that you can redirect output
Sample output from oartop

oratop: Release 16.5.1 Production on Fri Aug  8 14:09:30 MST 2025

Copyright (c) 2011, Oracle.  All rights reserved.


Connecting
Cycle 1 - oratop: Release 16.5.1 Production on Fri Aug  8 14:09:31 MST 2025

Oracle 19c - 14:08:38 Primary  r/w ****** up: 164d,  16 sn,  1 ins,  3 er,  84T sz,   8G sga,    0%fra, archivelog           41.6%db
ID CPU  %CPU LOAD  AAS  ASC  ASI  ASW  ISW  REDO TEMP IORT MBPS IOPS  R/S  W/S  LIO GCPS  %FRE  PGA  NETV UTPS UCPS SQRT %DBC %DBW
----------------------------------------------------------------------------------------------------------------------------------
 1   2  31.5  2.5  0.8    1    0    0   15  1.1k   6M 133u  0.3 27.8 18.9  8.9  2.5    0  40.5 1.9G  193k    0  0.6 6.6m  6.5 35.1

EVENT (C)                                                              T/O  WAIT  TIME   AVG  %DBT                      WAIT_CLASS
----------------------------------------------------------------------------------------------------------------------------------
resmgr:cpu quantum                                                          1.4G  4.0y   89m  85.4                       Scheduler
RMAN backup & recovery I/O                                                  113M   89d   68m   5.3                      System I/O
DB CPU                                                                             70d         4.1                               
db file sequential read                                                     8.2G   59d  600u   3.5                        User I/O
Disk file operations I/O                                                     72M  4.4d  5.4m   0.3                        User I/O

ID   SID     SPID USERNAME PROGRAM   SRV SERVICE OPN SQLID/BLOCKER  E/T %CPU %LIO  PGA STS STE WAIT_CLASS EVENT/OBJECT NAME    W/T
----------------------------------------------------------------------------------------------------------------------------------
 1   142    66186 CCB_STAR SQL Devel DED SYS$USE                   128d    0    0 4.9M ACT RUN            On CPU               41u
 1    94    87860 DBSNMP   JDBC Thin DED SYS$USE                    11s    0    0 4.7M INA WAI Idle       SQL*Net message fro  11s
 1   121    75603 ETLADMIN replicat@ DED SYS$USE                   9.0s    0    0   6M INA WAI Idle       SQL*Net message fro 8.9s
 1   146    75618 ETLADMIN replicat@ DED SYS$USE                   2.0s    0    0 9.3M INA WAI Idle       SQL*Net message fro 1.6s
 1    95    30022 SYS      python3@t DED SYS$USE                    50s    0    0 5.7M INA WAI Idle       SQL*Net message fro  50s
 1    67    44678 EIMSOA   JDBC Thin DED SYS$USE                   1.4t    0    0 3.8M INA WAI Idle       SQL*Net message fro 1.4t
 1   907    38167 DBSNMP   JDBC Thin DED SYS$USE SEL 5qdptu5gnkayq 1.1t    0    0  22M INA WAI Idle       SQL*Net message fro 1.1t
 1    54     6950 EIMSOA   JDBC Thin DED SYS$USE                   1.6t    0    0   3M INA WAI Idle       SQL*Net message fro 1.7t
 1    55     8469 APPS     oracle@eb DED SYS$USE SEL 3zwvgah1vg9ug 2.2d    0    0 6.6M INA WAI Idle       SQL*Net message fro 1.2h
 1   842    30030 SYS      python3@t DED SYS$USE                    44s    0    0 5.1M INA WAI Idle       SQL*Net message fro  44s
 1   847    95526 DBSNMP   JDBC Thin DED SYS$USE                   1.5t    0    0   4M INA WAI Idle       SQL*Net message fro 1.5t
 1   856    99179 EIMSOA   JDBC Thin DED SYS$USE                   1.3t    0    0 3.7M INA WAI Idle       SQL*Net message fro 1.3t
 1   857    77324 EIMSOA   JDBC Thin DED SYS$USE                   1.1t    0    0   4M INA WAI Idle       SQL*Net message fro 1.2t
 1   865     3337 EIMSOA   JDBC Thin DED SYS$USE                   1.8t    0    0 3.7M INA WAI Idle       SQL*Net message fro 1.8t
 1   905    42297 EIMSOA   JDBC Thin DED SYS$USE                   1.2t    0    0 4.3M INA WAI Idle       SQL*Net message fro 1.2t
 1    60    52837 ETLADMIN Toad.exe  DED SYS$USE                   127d    0    0 4.5M KIL WAI            SQL*Net message fro 8.8s

So you can easily check database session wait event details with SQL_ID and can go to more deeper analysis.

Disclaimer:Here is a sample code but please test in your environment before implementing this.Each environment is different so I can not guarantee it will run successfully.

Please mind that this has been tested from root user

#!/bin/bash
# Set environment variables explicitly
PATH=/opt/oracle/dcs/bin:/opt/oracle/oak/onecmd:/opt/oracle/dcs/bin:/opt/oracle/oak/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin
export PATH
# Get repository directory
repdir=$(/opt/oracle/dcs/oracle.ahf/tfa/bin/tfactl get repositorydir | grep repositorydir | awk -F "|" '{print $3}' | xargs)
echo "Repository directory: $repdir"
# Find oswtop directory
oswd=$(find "$repdir" -name "oswtop" -type d | head -1)
if [ -d "$oswd" ]; then
    echo "Found oswtop directory: $oswd"
    cd "$oswd" || exit 1
    # Find the first .dat file
    oswf=$(find . -name "*.dat" | sort | head -1)
    echo "First .dat file: $oswf"
    if [ -f "$oswf" ]; then
        line=$(grep -n "top -" "$oswf" | tail -1 | awk -F ":" '{print $1}')
        nline=$((line + 30))
        fname="/u10/scripts/dba/output/oswtop_$(hostname)_$(date '+%Y%m%d%H%M').log"
        echo "Output file: $fname"
        sed -n "${line},${nline}p" "$oswf" > "$fname"
        utilization=$(grep "%Cpu(s)" "$fname" | awk -F ":" '{print $2}' | awk '{print $1}')
        echo "CPU Utilization: $utilization"
        if (( $(echo "$utilization < 90" | bc -l) )); then
            echo "CPU utilization below threshold. Exiting."
            exit 0
        fi
        if [ -f "$fname" ]; then
            sed -i '/root/d' "$fname"
            spid=$(grep oracle "$fname" | head -1 | awk '{print $1}')
            echo "Oracle SPID: $spid"
            if [ -n "$spid" ] && grep -q ORACLE_SID "/proc/$spid/environ"; then
                export dbname=$(tr '\0' '\n' < "/proc/$spid/environ" | grep ORACLE_SID | awk -F "=" '{print $2}')
                echo $dbname > /home/oracle/dbname.txt
                su - oracle -c 'export dbname=`cat /home/oracle/dbname.txt`;/opt/oracle/dcs/oracle.ahf/tfa/bin/tfactl analyze -comp oratop -bn1 -database "$dbname" -f' >/home/oracle/oratop.log 2 >&1
                mailx -s "Check top oracle process from $(hostname)" <replace wih your emailid> < /home/oracle/oratop.log
            else
                echo "No ORACLE_SID found for SPID $spid"
            fi
        else
            echo "Output file not found!"
        fi
    else
        echo "No .dat file found!"
    fi
else
    echo "oswtop directory not found!"
fi

There is an youtube video you can refer that also...

Oracle Database performance tuning using oratop - tips and tricks by Sandesh Rao

Hope you have learnt something useful...

Friday, August 1, 2025

Killer CLOB in SQL query

 Thank you very much for your intereset.I hope you will learn something new.

Problem statement: Recently I faced an query performance issue where one of the query was accessing clob column as one of the predicate but it has no index.Below steps will guide you how to troubleshoot and resolve this problem.

 

Please find the below execution plan what was using by the query

 

PLAN_TABLE_OUTPUT                                                                                                                                                                                      

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

SQL_ID  asnvud9knm10w, child number 0                                                                                                                                                                  

-------------------------------------                                                                                                                                                                  

SELECT      CRE_DTTM     , OUTMSG_ID FROM      F1_OUTMSG  WHERE                                                                                                                                        

OUTMSG_TYPE_CD = :1      AND XML_SOURCE LIKE :2  /* SQL for Zone                                                                                                                                       

'CM_RETOUTMSG' */ /* com.splwg.base.web.dataExplorer.common.DataExplorer                                                                                                                               

SQLExecuter */                                                                                                                                                                                         

                                                                                                                                                                                                       

Plan hash value: 1903859126                                                                                                                                                                            

                                                                                                                                                                                                       

-------------------------------------------------------------------------------------------------                                                                                                      

| Id  | Operation                           | Name      | Rows  | Bytes | Cost (%CPU)| Time     |                                                                                                      

-------------------------------------------------------------------------------------------------                                                                                                      

|   0 | SELECT STATEMENT                    |           |       |       | 44458 (100)|          |                                                                                                      

|*  1 |  TABLE ACCESS BY INDEX ROWID BATCHED| F1_OUTMSG |  4746 |  7791K| 44458   (1)| 00:00:02 |                                                                                                      

|*  2 |   INDEX SKIP SCAN                   | FT010S1   | 94919 |       |  2003   (1)| 00:00:01 |                                                                                                      

-------------------------------------------------------------------------------------------------                                                                                                      

                                                                                                                                                                                                       

Outline Data                                                                                                                                                                                           

-------------                                                                                                                                                                                          

                                                                                                                                                                                                       

  /*+                                                                                                                                                                                                  

      BEGIN_OUTLINE_DATA                                                                                                                                                                               

      IGNORE_OPTIM_EMBEDDED_HINTS                                                                                                                                                                      

      OPTIMIZER_FEATURES_ENABLE('19.1.0')                                                                                                                                                              

      DB_VERSION('19.1.0')                                                                                                                                                                             

      OPT_PARAM('optimizer_index_caching' 100)                                                                                                                                                         

      ALL_ROWS                                                                                                                                                                                         

      OUTLINE_LEAF(@"SEL$1")                                                                                                                                                                           

      INDEX_SS(@"SEL$1" "F1_OUTMSG"@"SEL$1" ("F1_OUTMSG"."OUTMSG_STATUS_FLG"                                                                                                                           

              "F1_OUTMSG"."OUTMSG_TYPE_CD"))                                                                                                                                                           

      BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$1" "F1_OUTMSG"@"SEL$1")                                                                                                                                        

      END_OUTLINE_DATA                                                                                                                                                                                 

  */                                                                                                                                                                                                   

                                                                                                                                                                                                       

Peeked Binds (identified by position):                                                                                                                                                                 

--------------------------------------                                                                                                                                                                 

                                                                                                                                                                                                       

   1 - :1 (CHAR(30), CSID=871): 'CM_MWM_FA'                                                                                                                                                            

   2 - :2 (CHAR(30), CSID=871): '%11547883006349%'                                                                                                                                                     

                                                                                                                                                                                                       

Predicate Information (identified by operation id):                                                                                                                                                    

---------------------------------------------------                                                                                                                                                    

                                                                                                                                                                                                       

   1 - filter("XML_SOURCE" /*+ LOB_BY_VALUE */  LIKE :2)                                                                                                                                               

   2 - access("OUTMSG_TYPE_CD"=:1)                                                                                                                                                                     

       filter("OUTMSG_TYPE_CD"=:1)                                                                                                                                                                     

                                                                                                                                                                                                       

 

47 rows selected.

 

Plan Statistics extracted using awrsqrpt shows high elapsed time,CPU time and I/O wait time

  • % Snap Total shows the % of the statistic for the SQL statement compared to the instance total

Stat Name

Statement Total

Per Execution

% Snap Total

Elapsed Time (ms)

100,290,508

33,733.77

29.16

CPU Time (ms)

67,214,569

22,608.33

28.20

Executions

2,973

1.00

0.00

Buffer Gets

913,937,049

307,412.39

4.79

Disk Reads

839,221,506

282,281.03

34.62

Parse Calls

179

0.06

0.00

Rows

2,959

1.00

 

User I/O Wait Time (ms)

37,445,098

12,595.05

33.46

Cluster Wait Time (ms)

0

0.00

0.00

Application Wait Time (ms)

0

0.00

0.00

Concurrency Wait Time (ms)

5,654

1.90

1.00

Invalidations

0

 

 

Version Count

47

 

 

Sharable Mem(KB)

1,531

 

 

We will test this manually using bind variables we have found in above steps

Created script like below to test it using bind variables

 

set serverout on

set time on

alter session set current_schema=CISUSER;

alter session set nls_date_format='/mm/dd/yyyy HH24:MI:SS';

declare

v1 CHAR(128) :='CMAMIMTRONOF';                                                 

v2 CHAR(128) :='%89928748669102%'; 

vSQL varchar2(4000);

begin                                

vSQL :='SELECT      CRE_DTTM     , OUTMSG_ID FROM      F1_OUTMSG  WHERE      OUTMSG_TYPE_CD = :1      AND XML_SOURCE LIKE :2  /* SQL for Zone ''CM_RETOUTMSG'' */ /* com.splwg.base.web.dataExplorer.common.DataExplorerSQLExecuter */';  

dbms_output.put_line(vSQL);

execute immediate vSQL using v1,v2;

end;

/

 

Recent execution highlighted below shows same plan hash value with zero rows but previous execution per rows becomes  (23312935043/1000000/676=34.48) which is very high for OLTP application.

 

CHILD_NUMBER LAST_LOAD_TIME                 PLAN_HASH_VALUE SQL_PLAN_BASELINE                  ELAPSED_TIME ROWS_PROCESSED

------------               --------------------------            ---------------------------- ------------------------------               --------------------- --------------

           0                  2025-07-16/03:02:41          1903859126                                                                              23312935043            676

           1                 2025-07-16/08:01:30           1903859126                                                                              30535504            102

           2                2025-07-16/08:19:22            1903859126                                                                              1686              0

 

Since above bind variables does not return any value so I have filtered based on highlighted predicate to get the outmsg_id and using that I have found some matching data

SELECT      CRE_DTTM     , OUTMSG_ID FROM      F1_OUTMSG  WHERE      OUTMSG_TYPE_CD = 'CMAMIMTRONOF'      AND XML_SOURCE LIKE '%58343768923659%'  /* SQL for Zone ''CM_RETOUTMSG'' */ /* com.splwg.base.web.dataExplorer.common.DataExplorerSQLExecuter */;

 

 

Created below domain index

CREATE INDEX cisadm.cm_idex_xml_source tablespace cists_01 ON cisadm.f1_outmsg(xml_source) INDEXTYPE IS CTXSYS.CONTEXT;   

 

When I rewrite the query like this I get below execution plan

SQL> SELECT      CRE_DTTM     , OUTMSG_ID FROM      F1_OUTMSG  WHERE OUTMSG_TYPE_CD = 'CMAMIMTRONOF' and contains(XML_SOURCE,'2657136589')>0;

 

Execution Plan

----------------------------------------------------------

 

---------------------------------------------------------------------------------------

| Id  | Operation                   | Name               | Rows  | Bytes | Cost (%CPU)|

---------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT            |                    |     1 |  1689 |     1   (0)|

|*  1 |  TABLE ACCESS BY INDEX ROWID| F1_OUTMSG          |     1 |  1689 |     1   (0)|

|*  2 |   DOMAIN INDEX              | CM_IDEX_XML_SOURCE | 16801 |       |     1   (0)|

---------------------------------------------------------------------------------------

 

Predicate Information (identified by operation id):

---------------------------------------------------

 

   1 - filter("OUTMSG_TYPE_CD"='CMAMIMTRONOF')

   2 - access("CTXSYS"."CONTAINS"("XML_SOURCE",'2657136589')>0)

 

So we see that I/O and CPU cost has come down significantly.

 

OLD execution plan:-

Execution Plan

----------------------------------------------------------

 

--------------------------------------------------------------------------------------

| Id  | Operation                           | Name      | Rows  | Bytes | Cost (%CPU)|

--------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT                    |           |   841 |  1377K|  9158   (1)|

|*  1 |  TABLE ACCESS BY INDEX ROWID BATCHED| F1_OUTMSG |   841 |  1377K|  9158   (1)|

|*  2 |   INDEX SKIP SCAN                   | FT010S1   | 16819 |       |  1612   (0)|

--------------------------------------------------------------------------------------

 

Predicate Information (identified by operation id):

---------------------------------------------------

 

   1 - filter("XML_SOURCE" /*+ LOB_BY_VALUE */  LIKE '%58343768923659%')

   2 - access("OUTMSG_TYPE_CD"='CMAMIMTRONOF')

       filter("OUTMSG_TYPE_CD"='CMAMIMTRONOF')

 

So you can see CPU and I/O cost is very hish compared to new plan which is found  after creating domain index and changing the query.

 

Conclusion: When there is a predicate in your query which is accessing CLOB column better create Oracle Text based index.


Migrate database from ODA X8-2 to X11-2

Thank you very much for your interest.Hope you will learn something useful.   Current Configuration Future configuration ...