This is a long overdue post which arose initially from a discussion with Philipp Hartenfeller way back in August 2024. I made a note to blog about it, headed off to CloudWorld and of course promptly forgot about it 😃

When you are tinkering with the cool new vector features in 23ai FREE, you might hit some problems when you try to create a vector index.



SQL> create vector index t_ix on t (v) organization neighbor partitions
  2  distance cosine
  3  with target accuracy 80;

ERROR at line 1:
ORA-51908: Centroid assignment failed for the neighbor partitions vector index.

After digging around a bit, the root cause was discovered in the alert log.

ORA-00700: soft internal error, arguments: [IVF Index Creation: failed to assign centroid id], [12954], [125815], [RAG_CHUNKS_DISTABCE_IDX],
[ORA-12954: The request exceeds the maximum allowed database size of 12 GB.]

At first glance you may be thinking “Well, it was probably a large table, and thus a large index took us over the 12G free limit“. However, you might hit this issue even if your source table is quite small. Let’s explore this with a test case on a standard 23ai database with no space constraints.

I’ll create a table with 60,000 rows with a vector column in each row. The vector in each row will have 3000 dimensions, and I’ll built it with the LATERAL join. (For details on why I think LATERAL is so cool, check out my video here on that)

 



SQL> create table t as
  2  select rownum r, vector(x) v
  3  from
  4   ( select 5 y from dual connect by level <= 60000 ),
  5   lateral
  6    (
  7    select '['||listagg(round(dbms_random.value(-y,y),5),',') ||']' x
  8    from dual
  9    connect by level <= 3072
 10   );

Table created.

You can see my table occupies ~1GB of space, being the table data itself and the two LOB segments that support the VECTOR datatype (a LOB and LOB INDEX segment respectively).



SQL> select count(*), sum(bytes)/1024/1024/1024 gb from user_segments
  2  where tablespace_name = 'VEC';

  COUNT(*)         GB
---------- ----------
         3 1.05596924

But 1GB is way under the 12GB limit on 23ai FREE, so it would seem odd that I would burn another 11G just for an index. Let’s go ahead and add that index.



SQL> create vector index t_ix on t (v) organization neighbor partitions
  2  distance cosine
  3  with target accuracy 80;

Index created.

Unlike the 23ai FREE instance, the index creates without problems, but lets take a look at the space used now in my tablespace.



SQL> select count(*), sum(bytes)/1024/1024/1024 gb from user_segments
  2  where tablespace_name = 'VEC';

  COUNT(*)         GB
---------- ----------
      1472 14.8938599

Wow! Another 13GB of space and nearly 1500 segments! This is not a bug – it is actually by design. A vector take a “divide and conquer” approach to indexing. Conceptually we are trying to arrange the vectors into logical groups, so that when searching for the vector that is closest to a presented input vector, we only need to scan a subset of the entire set of vectors in the table. When you think “logical groups” then (hopefully) partitioning is a technology to comes to your mind, and that is exactly what we do. Our vector indexes take advantage of our partitioning features.



SQL> select count(*) from user_tab_partitions
  2  where tablespace_name = 'VEC';

  COUNT(*)
----------
       489

Partitioning now “serves two masters” so to speak. It is being used to handle vector indexes, but also we have always had partitioning to handle very large databases. For the latter requirement, partitioning has an in-built storage allocation efficiency mechanism built into it, namely, when allocating extents in a tablespace, we allocate them in 8MB chunks rather than the typical 64K in an auto-allocate locally managed tablespace.

Thus for the ~500 partitions in our new vector index, we went ahead and grabbed 500 x 8MB partitions for the table, and similarly for any of the secondary segments required to support our vector index. For a large table, this is the ideal thing to do.

But for your experiments on a space limited 23ai FREE instance…it naturally is not ideal 😃

Luckily, for experiments on 23ai FREE, you can override this default behaviour for partition extent size. We’ll drop the index and recreate it after setting the hidden parameter “_partition_large_extents”.



SQL> drop index t_ix;

Index dropped.

SQL> alter session set "_partition_large_extents" = false;

Session altered.

SQL> create vector index t_ix on t (v) organization neighbor partitions
  2  distance cosine
  3  with target accuracy 80;

Index created.

SQL> select sum(bytes)/1024/1024/1024 gb from user_segments
  2  where tablespace_name = 'VEC';

        GB
----------
2.77191162

Now our index fits easily within the space constraints of 23ai FREE, and you are explore all the cool features with vectors and vector indexes in 23ai FREE.

Enjoy!

Got some thoughts? Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Trending