Most customers I encounter nowadays generally stick to the defaults when creating their databases from 12c onwards. This means they'll be using the UTF8 class of character sets when configuring their database. As a quick segue from the topic at hand, I think this is a good thing. Having a UTF8 character set avoids a lot of hassles later when applications encounter a need to store extended characters, and let's face it, the moment you need to store someone's name, at some stage you are going to hit this.
But back to large objects. It is important to realise that a UTF8 character set has implications for how LOBs will be stored in the database. In a single-byte character set database, if you need to store a 1 megabyte text file in a CLOB, it will consume approximately 1 megabyte. But let's look at what happens when you are using a multi-byte database.
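Before diving into the database demo, here is a small Python sketch (nothing Oracle-specific, just standard library string encoding) of why a multi-byte representation matters for sizing: the same ASCII text costs one byte per character in UTF-8 but two in UTF-16-LE, which approximates the UCS-2-compatible layout Oracle uses internally for CLOBs in a multibyte database.

```python
# Plain ASCII text: 1,100 characters
text = "hello world" * 100

# One byte per ASCII character in UTF-8...
utf8_size = len(text.encode("utf-8"))       # 1,100 bytes

# ...but two bytes per character in UTF-16-LE,
# a stand-in here for the UCS-2-compatible CLOB format
utf16_size = len(text.encode("utf-16-le"))  # 2,200 bytes

print(utf8_size, utf16_size)
```

So for predominantly single-byte text, the in-database character data roughly doubles, before we even consider extent allocation.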
SQL> host dir /s /b c:\temp > c:\temp\file1.txt
SQL> host dir c:\temp\file1.txt
Volume in drive C is OS
Volume Serial Number is 66D0-4AB5
Directory of c:\temp
22/03/2020 07:19 PM 635,750 file1.txt
1 File(s) 635,750 bytes
0 Dir(s) 54,663,430,144 bytes free
So I have a text file of about 630K, and I'll now store it as a single CLOB in the database (which was built with AL32UTF8).
SQL> select property_name, property_value
2 from database_properties
3 where property_name like 'NLS%';
PROPERTY_NAME PROPERTY_VALUE
------------------------------ -----------------------------
NLS_RDBMS_VERSION 19.0.0.0.0
NLS_NCHAR_CONV_EXCP FALSE
NLS_LENGTH_SEMANTICS BYTE
NLS_COMP BINARY
NLS_DUAL_CURRENCY $
NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH.MI.SSXFF AM TZR
NLS_TIME_TZ_FORMAT HH.MI.SSXFF AM TZR
NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXFF AM
NLS_TIME_FORMAT HH.MI.SSXFF AM
NLS_SORT BINARY
NLS_DATE_LANGUAGE AMERICAN
NLS_DATE_FORMAT DD-MON-RR
NLS_CALENDAR GREGORIAN
NLS_NUMERIC_CHARACTERS .,
NLS_NCHAR_CHARACTERSET AL16UTF16
NLS_CHARACTERSET AL32UTF8
NLS_ISO_CURRENCY AMERICA
NLS_CURRENCY $
NLS_TERRITORY AMERICA
NLS_LANGUAGE AMERICAN
SQL> create tablespace small_extents
2 datafile 'X:\ORACLE\ORADATA\DB19\PDB1\SMALL01.DBF' size 20m
3 extent management local uniform size 128k;
Tablespace created.
SQL> create table t ( c clob )
2 lob ( c ) store as basicfile
3 tablespace small_extents;
Table created.
SQL>
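One prerequisite assumed by the PL/SQL block that follows: the bfilename('TEMP', 'file1.txt') call requires an Oracle directory object named TEMP pointing at the folder holding the file. If you are following along at home, something like this (path and grantee are just my setup, adjust for yours) needs to exist first:

```sql
-- Directory object backing bfilename('TEMP', ...);
-- c:\temp and the grantee SCOTT are assumptions for this demo
create directory temp as 'c:\temp';
grant read on directory temp to scott;
```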
SQL> declare
2 bf bfile;
3 cb clob;
4
5 dest int := 1;
6 src int := 1;
7 cs int := 0;
8 ctx int := 0;
9 warn int := 0;
10 begin
11 insert into t (c) values (empty_clob())
12 returning c into cb;
13
14 bf := bfilename('TEMP', 'file1.txt');
15 dbms_lob.fileopen(bf, dbms_lob.file_readonly);
16 dbms_lob.loadclobfromfile (
17 dest_lob => cb,
18 src_bfile => bf,
19 amount => dbms_lob.lobmaxsize,
20 dest_offset => dest,
21 src_offset => src,
22 bfile_csid => cs ,
23 lang_context => ctx,
24 warning => warn);
25 dbms_lob.fileclose(bf);
26
27 commit;
28 end;
29 /
PL/SQL procedure successfully completed.
So the CLOB has been stored. Let us take a look at the size of the CLOB in the database.
SQL> select segment_name, bytes
2 from user_segments
3 where segment_name = 'T'
4 or segment_name = ( select segment_name from user_lobs where table_name = 'T' )
5 or segment_name = ( select index_name from user_lobs where table_name = 'T' );
SEGMENT_NAME BYTES
------------------------------ ----------
SYS_IL0000082130C00001$ 131072
SYS_LOB0000082130C00001$ 1572864
T 131072
3 rows selected.
SQL>
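The arithmetic behind those segment numbers can be sanity-checked outside the database. The file is 635,750 bytes of single-byte text, so the UCS-2-compatible internal format needs twice that, and the LOB segment reported above works out to twelve 128 KB uniform extents:

```python
file_bytes = 635_750       # size of file1.txt on disk (single-byte text)
extent = 128 * 1024        # uniform extent size of the SMALL_EXTENTS tablespace
lob_segment = 1_572_864    # SYS_LOB...$ segment size from user_segments

# Two bytes per character in the UCS-2-compatible internal format
ucs2_bytes = file_bytes * 2
print(ucs2_bytes)                          # 1271500
print(lob_segment // extent)               # 12 (extents of 128 KB)
print(round(lob_segment / file_bytes, 2))  # 2.47 (segment vs file on disk)
```

The gap between the 1,271,500 bytes of doubled character data and the 1,572,864-byte segment is allocation overhead: whole extents and LOB chunks, not more character data.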
SQL> select bytes
2 from user_extents
3 where segment_name = 'SYS_LOB0000082130C00001$';
BYTES
----------
131072
131072
131072
131072
131072
131072
131072
131072
131072
131072
131072
131072
12 rows selected.
SQL>
Even without counting the LOB index, the LOB data is more than double the size of the original text file. This might come as a surprise, but it is documented in the Large Objects Developer's Guide:
"CLOB data is stored in a format that is compatible with UCS-2 if the database character set is multibyte."
Hence, if you are doing a database sizing estimate and you intend to store text data in CLOBs, make sure to take this into consideration. In case you are thinking that this is an unreasonable "tax" to pay, don't forget that all we are really doing is catering for the possibility of the file containing extended characters. For example, if I open the original file with a text editor and save it in UTF-16 format, the file size is similar to that of the CLOB.
SQL> host dir c:\temp\file1*
Volume in drive C is OS
Volume Serial Number is 66D0-4AB5
Directory of c:\temp
22/03/2020 07:19 PM 635,750 file1.txt
22/03/2020 07:33 PM 1,271,500 file1u.txt
2 File(s) 1,907,250 bytes
0 Dir(s) 54,639,943,680 bytes free
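For those sizing estimates, the rule of thumb from this demo can be sketched as a tiny Python helper (my own illustration, not an Oracle-supplied formula): double the single-byte text size for the UCS-2-compatible format, then round up to whole extents. It deliberately ignores per-chunk LOB overhead, so treat the result as a floor rather than an exact figure.

```python
import math

def estimated_clob_bytes(text_bytes, extent_bytes=128 * 1024):
    """Rough floor for CLOB storage of mostly single-byte text in an
    AL32UTF8 database: ~2 bytes per character (UCS-2-compatible format),
    rounded up to whole extents. LOB chunk overhead is not modelled."""
    ucs2_bytes = text_bytes * 2
    return math.ceil(ucs2_bytes / extent_bytes) * extent_bytes

# The 635,750-byte file from the demo:
print(estimated_clob_bytes(635_750))  # 1310720
```

That floor of 1,310,720 bytes (ten 128 KB extents) is already over double the file; the segment observed above came in at twelve extents once chunk overhead was added.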