Tuesday, August 19, 2025

Vector similarity search using Oracle Database 23 ai and apex 24.2

 Thank you very much for your interest..

Problem statement: Many organizations uses various document store application to store their important process/technical documents. Those document repository grows over the time and finding right document during actual need becomes challenging. Specifically, someone who joins the team as a fresher find it difficult to find right documents for reference. This article demonstrates how to efficiently search using oracle database 23 ai as document store and APEX 24.2 as a frontend application.

Assumption: Prior knowledge in basic database administration and Apex workspace setup

Pre-requisite: -

1.       Install oracle database 23 ai software in engineered system like Exadata or Oracle database appliance

2.       Create a pluggable database and a dedicated tablespace for storing data related to documents.

3.       Create a schema(docadmin) with necessary privileges and create below 2 tables.I will explain in detail later how these tables work

CREATE TABLE "DOCSTORE"."MYDOCS1"

   ("ID" INTEGER GENERATED BY DEFAULT ON NULL AS IDENTITY ( START WITH 1 CACHE 20) PRIMARY KEY,

                    "FILENAME" VARCHAR2(500 BYTE),

                    "FILE_SIZE" NUMBER(*,0),

                    "FILE_TYPE" VARCHAR2(100 BYTE),

                    "FILE_CONTENT" BLOB

                      ) LOGGING;

CREATE TABLE "DOCSTORE"."VECTORE_STORE1"

   (               "DOCID" NUMBER(*,0) NOT NULL ENABLE,

                    "EMBED_ID" NUMBER,

                    "EMBED_DATA" VARCHAR2(4000 BYTE),

                    "EMBED_VECTOR" VECTOR,

                     FOREIGN KEY ("DOCID")

                      REFERENCES "DOCSTORE"."MYDOCS" ("ID") ENABLE);

4.        Choose a file path where we need to copy our documents and create an oracle directory based on that.

5.       Insert some document in the table mydocs using below insert statement from sqlplus or any other client using variable where FILE_TRANSFER is the name of the oracle directory

insert into docstore.mydocs(filename,file_size,file_type,file_content) values('&&filename',dbms_lob.getlength(to_blob(bfilename('FILE_TRANSFER','&&filename'))),'DOCX',to_blob(bfilename('FILE_TRANSFER','&&filename')));

commit;

Lets deep dive now how we can utilize vector database as backend and apex as a front end to support search based on RAG aka retrieval-augmented generation.

Login in your workspace and create a simple application from App builder.We want to create a search page where we can search based on similarity using RAG.To do that we have to create search configuration like below

Follow below navigation in apex

Shared Components>Search Configurations















Here we can see that provider name is showing as 'ALL_MINILM_L12_V2'.Lets take a pause and see in below steps how this is appearing in drop down list.

Ø  Download onnx model using below link

https://docs.oracle.com/en/database/oracle/oracle-database/23/vecse/onnx-pipeline-models-text-embedding.html#GUID-E7C08BA2-B2B9-4081-9050-B9EB3EA46FA6

once we unzip we can see file with .onnx model and a read me how to import that model within the database.It is self explanatory so just validate whether you can see imported model using below query.

select model_name, algorithm, mining_function from user_mining_models where  model_name='ALL_MINILM_L12_V2';

  

Ø  Apex is still not aware of this imported model.Go to Workspace Utilities and then Vector Providers. Create following entry so that Apex is aware of the imported model.Once it is done it will appear as drop down as a vector provider in search configuration.

A screenshot of a computer

AI-generated content may be incorrect.

 

Ø  Continue search configuration now. Here we provide our Label,Table/View name and column name which must be a vector column,in our case it is embed_vector.You can chose search type between exact and similar.Find below reference to know more details.

https://docs.oracle.com/en/database/oracle/oracle-database/23/vecse/understand-approximate-similarity-search-using-vector-indexes.html

 

A screenshot of a computer

AI-generated content may be incorrect.

 

Ø  If you want to use similarity search then set vector_meory_size database parameter as 1G and create below index.

CREATE VECTOR INDEX vector_store_idx ON docstore.vectore_store (embed_vector) ORGANIZATION INMEMORY NEIGHBOR GRAPH DISTANCE COSINE WITH TARGET ACCURACY 95;

Ø  Now focus on column mapping where we want to see real textual data not vectorized data.You are now ready to similarity search using vector database

 

Ø  Now,we are in the most critical steps of vectorizing our text/clob data.Every model has some limitation utpo which it can generate embeddings and that limitation depends on number of token in the text.For my case I have assumed it as 300 so I have split document content with 300 tokens and generated vector embedding based on that.Below is the code for generating vector embedding

insert into DOCSTORE.VECTORE_STORE (docid,embed_id,embed_data,embed_vector)

select id,embed_id,text_chunk,embed_vector from docstore.mydocs doc cross join table(

dbms_vector_chain.utl_to_embeddings(

dbms_vector_chain.utl_to_chunks(

                                dbms_vector_chain.utl_to_text(doc.file_content),

                                                                json('{"by":"words","max":"300","split":"sentence","normalize":"all"}')

                                                                ),

                                                                json('{"provider":"database","model":"ALL_MINILM_L12_V2"}')

                                                                )

                                                                ) embed

                                                                cross join json_table(

                                                                embed.column_value,

                                                                '$[*]' COLUMNS (

                                                                embed_id NUMBER PATH '$.embed_id',

                                                                text_chunk varchar2(4000) PATH '$.embed_data',

                                                                embed_vector CLOB PATH '$.embed_vector')) as et ;

Basically, we should write a trigger for every row insert on master table mydocs so that it can generate vector embedding and insert immediately in child table.

For more information refer below link

https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_vector_chain1.html

 

Ø  Once embedding is generated search using app like below.I have not searched with ORA- error rather I have written plain text asking oracle error

A close-up of a computer screen

AI-generated content may be incorrect.

Hope you have learned something useful.

No comments:

Post a Comment

Migrate database from ODA X8-2 to X11-2

Thank you very much for your interest.Hope you will learn something useful.   Current Configuration Future configuration ...