The new vector datatype and associated vector indexes are two of the big features in Oracle Database 23ai. The ability to use machine learning models to do quick and efficient semantic searches I feel will really change the concept of searching through data in your database. No more of those embarrassing conversations we used to have with customers: “Well, you searched for ‘HK’ not ‘Hong Kong’ which is why you got no results” and the like.
If you are exploring vector indexes in 23ai, there are currently some nuances you need to be aware of when it comes to how they are stored. To demonstrate, I’ll create a user to build some vector data.
SQL> grant db_developer_role to demo identified by demo;
Grant succeeded.
SQL> alter user demo quota unlimited on users;
User altered.
SQL> alter user demo quota unlimited on vec;
User altered.
Now I’ll create a table with the new VECTOR data type. This also shows that we don’t need to use any elaborate machine learning facilities to build vectors – they are just arrays of numbers, so I’m using DBMS_RANDOM and LISTAGG to build some random number string lists to convert to vectors. (Of course, these vector values are not of much use for anything besides this demo 🙂 )
Notice that I’ve explicitly nominated that the table should be stored in the VEC tablespace.
SQL> conn demo/demo@db23
Connected.
SQL> create table t tablespace vec as
2 select rownum r, vector(x) v
3 from
4 ( select 5 y from dual connect by level <= 1000 ),
5 lateral
6 (
7 select '['||listagg(round(dbms_random.value(-y,y),5),',') ||']' x
8 from dual
9 connect by level <= 3072
10 );
Table created.
SQL> select segment_name, tablespace_name, sum(bytes)
2 from user_segments
3 group by segment_name, tablespace_name;
SEGMENT_NAME TABLESPACE_NAME SUM(BYTES)
------------------------------ -------------------- ----------
T VEC 131072
SYS_IL0000089848C00002$ VEC 65536
SYS_LOB0000089848C00002$ VEC 25427968
Once the table is created, you can see above that (as expected) the table and the underlying LOB segments where are vectors reside, are all present in the VEC tablespace.
Now I’ll create a vector index on the vector column. Conceptually this is similar to Text indexes, in that we are creating a number of internal structures to support the index. A vector “index” consists of several tables, indexes and LOB segment to implement the functionality. I have also explicitly nominated that the index will be stored in the VEC tablespace, but look at where the resultant segments have been stored.
SQL> create vector index t_ix on t (v) organization neighbor partitions
2 distance cosine
3 with target accuracy 80
4 tablespace vec;
Index created.
SQL> select segment_name, tablespace_name, sum(bytes)
2 from user_segments
3 group by segment_name, tablespace_name;
SEGMENT_NAME TABLESPACE_NAME SUM(BYTES)
------------------------------ -------------------- ----------
SYS_C0018376 USERS 65536
SYS_C0018378 USERS 131072
SYS_IL0000089848C00002$ VEC 65536
SYS_IL0000089852C00002$ USERS 65536
SYS_IL0000089856C00003$ USERS 3997696
SYS_LOB0000089848C00002$ VEC 25427968
SYS_LOB0000089852C00002$ USERS 2359296
SYS_LOB0000089856C00003$ USERS 511705088
T VEC 131072
VECTOR$T_IX$89848_89851_0$IVF_ USERS 65536
VECTOR$T_IX$89848_89851_0$IVF_ USERS 1023410176
11 rows selected.
Currently, vector indexes will consume storage as defined by the DEFAULT TABLESPACE property for the database user that owns the index. The TABLESPACE attribute does not come into play.
Hence if you want to control the tablespace for the vector index, you can alter the default tablespace for the user before creating the index.
SQL> drop index t_ix;
Index dropped.
SQL> alter user demo default tablespace vec;
User altered.
SQL> create vector index t_ix on t (v) organization neighbor partitions
2 distance cosine
3 with target accuracy 80
4 tablespace vec;
Index created.
SQL> select segment_name, tablespace_name, sum(bytes)
2 from user_segments
3 group by segment_name, tablespace_name;
SEGMENT_NAME TABLESPACE_NAME SUM(BYTES)
------------------------------ -------------------- ----------
SYS_C0018379 VEC 65536
SYS_C0018381 VEC 131072
SYS_IL0000089848C00002$ VEC 65536
SYS_IL0000090049C00002$ VEC 65536
SYS_IL0000090053C00003$ VEC 3997696
SYS_LOB0000089848C00002$ VEC 25427968
SYS_LOB0000090049C00002$ VEC 2359296
SYS_LOB0000090053C00003$ VEC 520093696
T VEC 131072
VECTOR$T_IX$89848_90048_0$IVF_ VEC 65536
VECTOR$T_IX$89848_90048_0$IVF_ VEC 1020264448
This may change in a future release update – we’re looking at either respecting the tablespace attribute (as per a standard B-tree index) or potentially being able to set preferences (similar to a Text index). In either case, keep an eye on release updates for 23ai.




Leave a reply to iudithd5bf8e4d8d Cancel reply