Most customers I encounter nowadays are generally sticking to the defaults when creating their databases from 12c onwards. This means that they’ll be using the UTF8 class of charactersets when configuring their database. As a quick segue from the topic at hand, I think this is a good thing. Having a UTF8 characterset avoids a lot of hassles later when applications encounter a need to store extended characters, and let’s face it, the moment you need to store someone’s name, then at some stage you are going to hit this.
But back to large objects. It is important to realise that a UTF8 characterset has implications with how LOBs will be stored in the database. In a single byte characterset database, then if you need to store a 1 megabyte text file in a CLOB, then it will consume approximately 1 megabyte. But lets look at what happens when you are using a multi-byte database.
SQL> host dir /s /b c:\temp > c:\temp\file1.txt SQL> host dir c:\temp\file1.txt Volume in drive C is OS Volume Serial Number is 66D0-4AB5 Directory of c:\temp 22/03/2020 07:19 PM 635,750 file1.txt 1 File(s) 635,750 bytes 0 Dir(s) 54,663,430,144 bytes free
So I have a text file of about 630k, and I’ll now store that as a single CLOB in the database (which was built with AL32UTF8)
SQL> select property_name, property_value 2 from database_properties 3 where property_name like 'NLS%'; PROPERTY_NAME PROPERTY_VALUE ------------------------------ ----------------------------- NLS_RDBMS_VERSION 184.108.40.206.0 NLS_NCHAR_CONV_EXCP FALSE NLS_LENGTH_SEMANTICS BYTE NLS_COMP BINARY NLS_DUAL_CURRENCY $ NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH.MI.SSXFF AM TZR NLS_TIME_TZ_FORMAT HH.MI.SSXFF AM TZR NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXFF AM NLS_TIME_FORMAT HH.MI.SSXFF AM NLS_SORT BINARY NLS_DATE_LANGUAGE AMERICAN NLS_DATE_FORMAT DD-MON-RR NLS_CALENDAR GREGORIAN NLS_NUMERIC_CHARACTERS ., NLS_NCHAR_CHARACTERSET AL16UTF16 NLS_CHARACTERSET AL32UTF8 NLS_ISO_CURRENCY AMERICA NLS_CURRENCY $ NLS_TERRITORY AMERICA NLS_LANGUAGE AMERICAN SQL> create tablespace small_extents 2 datafile 'X:\ORACLE\ORADATA\DB19\PDB1\SMALL01.DBF' size 20m 3 extent management local uniform size 128k; Tablespace created. SQL> create table t ( c clob ) 2 lob ( c )store as basicfile 3 tablespace small_extents; Table created. SQL> SQL> declare 2 bf bfile; 3 cb clob; 4 5 dest int := 1; 6 src int := 1; 7 cs int := 0; 8 ctx int := 0; 9 warn int := 0; 10 begin 11 insert into t (c) values (empty_clob()) 12 returning c into cb; 13 14 bf := bfilename('TEMP', 'file1.txt'); 15 dbms_lob.fileopen(bf, dbms_lob.file_readonly); 16 dbms_lob.loadclobfromfile ( 17 dest_lob => cb, 18 src_bfile => bf, 19 amount => dbms_lob.lobmaxsize, 20 dest_offset => dest, 21 src_offset => src, 22 bfile_csid => cs , 23 lang_context => ctx, 24 warning => warn); 25 dbms_lob.fileclose(bf); 26 27 commit; 28 end; 29 / PL/SQL procedure successfully completed.
So the CLOB has been stored. Let us take a look at the size of the CLOB in the database.
SQL> select segment_name, bytes 2 from user_segments 3 where segment_name = 'T' 4 or segment_name = ( select segment_name from user_lobs where table_name = 'T' ) 5 or segment_name = ( select index_name from user_lobs where table_name = 'T' ); SEGMENT_NAME BYTES ------------------------------ ---------- SYS_IL0000082130C00001$ 131072 SYS_LOB0000082130C00001$ 1572864 T 131072 3 rows selected. SQL> SQL> select bytes 2 from user_extents 3 where segment_name = 'SYS_LOB0000082130C00001$'; BYTES ---------- 131072 131072 131072 131072 131072 131072 131072 131072 131072 131072 131072 131072 12 rows selected. SQL>
Even without the LOB index, the LOB data is just over double the size of the original text file. This might come as a surprise, but it is documented in the Large Object Guide
CLOB data is stored in a format that is compatible with UCS-2 if the database character set is multibyte
Hence, if you are doing a database sizing estimate, and you intend to store text data in CLOBs, make sure to take this into consideration when planning your size. In case you are thinking that this is an unreasonable “tax” to pay, don’t forget that all we are really doing is catering to the potential of the file containing extended characters. For example, if I open the original file with a text editor and save that file in UTF16 format, then the file size is similar to that of the CLOB.
SQL> host dir c:\temp\file1* Volume in drive C is OS Volume Serial Number is 66D0-4AB5 Directory of c:\temp 22/03/2020 07:19 PM 635,750 file1.txt 22/03/2020 07:33 PM 1,271,500 file1u.txt 2 File(s) 1,907,250 bytes 0 Dir(s) 54,639,943,680 bytes free