Loading LOB from a file

March 1, 2016

2 minutes

I observed this idiosyncracy recently when loading some lob from external files using PL/SQL:

First we’ll create a file to be loaded, and ‘wc’ tells us it is 75 bytes in size



SQL> !echo "This is line 1" > /tmp/lobfile
SQL> !echo "This is line 2" >> /tmp/lobfile
SQL> !echo "This is line 3" >> /tmp/lobfile
SQL> !echo "This is line 4" >> /tmp/lobfile
SQL> !echo "This is line 5" >> /tmp/lobfile
SQL> !wc /tmp/lobfile
       5      20      75 /tmp/lobfile

Then create a standard routine to load it into the database


SQL> create or replace directory TMP as '/tmp';

Directory created.

SQL> drop table lob_tab;

Table dropped.

SQL> create table lob_tab ( the_lob clob );

Table created.

SQL> declare
  2    v_lob     clob;
  3    l_bfile   bfile;
  4    amt       number;
  5  begin
  6    l_bfile := bfilename( 'TMP', 'lobfile' );
  7
  8    insert into lob_tab (the_lob)
  9    values ( empty_clob() )
 10    returning the_lob into v_lob;
 11
 12    amt := dbms_lob.getlength( l_bfile );
 13    dbms_lob.fileopen( l_bfile ,dbms_lob.file_readonly);
 14    dbms_lob.loadfromfile( v_lob, l_bfile ,amt);
 15
 16    dbms_output.put_line('File length is: '||dbms_lob.getlength( l_bfile ));
 17    dbms_output.put_line('Loaded length is: '||dbms_lob.getlength(v_lob));
 18    dbms_lob.fileclose( l_bfile );
 19  end;
 20  /
File length is: 75
Loaded length is: 37   <==== !!!!

PL/SQL procedure successfully completed.

So what has happened to the second half of the LOB ?!

It turns out to be related to the character set. If you’re using UTF8 (as this database is), then loadfromfile must assume the possibility of multibyte characters (since the bfile could be binary). Thus you get two bytes per character and hence “garbage” in the lob.

The workaround is to use sqlldr, or load the lob without using loadfromfile, eg you could load it piecewise:


SQL> declare
  2    v_lob     clob;
  3    l_bfile   bfile;
  4    buf       raw(32767);
  5    vc        varchar2(32767);
  6    amt       number;
  7    v_offset  number := 1;
  8  begin
  9    l_bfile := bfilename( 'TMP', 'lobfile' );
 10
 11    insert into lob_tab (the_lob)
 12    values ( empty_clob() )
 13    returning the_lob into v_lob;
 14
 15    amt := dbms_lob.getlength( l_bfile );
 16    dbms_lob.fileopen( l_bfile ,dbms_lob.file_readonly);
 17    dbms_lob.read(l_bfile,amt,v_offset,buf);
 18    vc := utl_raw.cast_to_varchar2(buf);
 19    dbms_lob.writeappend(v_lob,amt,vc);
 20
 21    dbms_output.put_line('File length is: '||dbms_lob.getlength( l_bfile ));
 22    dbms_output.put_line('Loaded length is: '||dbms_lob.getlength(v_lob));
 23    dbms_lob.fileclose( l_bfile );
 24  end;
 25  /
File length is: 75
Loaded length is: 75

PL/SQL procedure successfully completed.

Or you could use LOADCLOBFROMFILE with the extra parameters


SQL> declare
  2    v_lob     clob;
  3    l_bfile   bfile;
  4    amt       number;
  5
  6    d_offset number := 1;
  7    s_offset number := 1;
  8    csid     number := 0;
  9    lang     number := 0;
 10    warning  number;
 11  begin
 12    l_bfile := bfilename( 'TMP', 'lobfile' );
 13
 14    insert into lob_tab (the_lob)
 15    values ( empty_clob() )
 16    returning the_lob into v_lob;
 17
 18    amt := dbms_lob.getlength( l_bfile );
 19    dbms_lob.fileopen( l_bfile ,dbms_lob.file_readonly);
 20    dbms_lob.LOADCLOBFROMFILE( v_lob, l_bfile ,amt, d_offset,s_offset,csid, lang,warning );
 21
 22    dbms_output.put_line('File length is: '||dbms_lob.getlength( l_bfile ));
 23    dbms_output.put_line('Loaded length is: '||dbms_lob.getlength(v_lob));
 24    dbms_lob.fileclose( l_bfile );
 25  end;
 26  /
File length is: 75
Loaded length is: 75

PL/SQL procedure successfully completed.

It might look like a bug, but it’s not really – if you look carefully, you’ll see that this behaviour is documented in the DBMS_LOB manual.

3 responses to “Loading LOB from a file”

Tim Hall

April 26, 2018 at 1:42 am

Hi.

LOADFROMFILE is actually deprecated from 12.2 onward.

https://docs.oracle.com/en/database/oracle/oracle-database/12.2/arpls/DBMS_LOB.html#GUID-25F5EE35-1DB7-4F96-AB87-F81700484932

I only know this because someone pointed it out to me a few weeks ago, and I was forced to go back and refactor a bunch of articles… 🙂

Cheers

Tim…

Reply
Pradeep

March 2, 2020 at 3:46 am

Hi Connor,
Appreciate your efforts while sharing plenty of your knowledge to the people in need like me 🙂
I need your help to fix a problem to split a CLOB variable into multiple CLOB variables by using a common delimiter.
So far I have been using following code to satisfy my need to read large csv (coming from data warehouse tables) files through plsql and load the data into Oracle tables. This is more likely mimicking SQL Loader concept.

Since we have limitation to read only 32767 characters for each line using UTL_FILE.FOPEN, we are facing file read error when the length exceeds. So, I am trying to replace UTL_FILE with BFILE. Using BFILE I can open and read entire contents of my csv but, I am unable to chunk the CLOB into multiple CLOB variables. I tried to use DBMS_LOB.SUBSTR but, that function also returns a VARCHAR2 so, I cannot accommodate entire line.

I am assuming this can be possible if we use LOADCLOBFROMFILE with proper offset and buffer values. But, I couldnt make it work.

Can you please check and help me if you see a way to read a file (line by line) into a CLOB variable and process further?

DECLARE
l_table_name VARCHAR2(30);
l_file_handle utl_file.file_type;
l_file_exists BOOLEAN;
l_file_name VARCHAR2(35) DEFAULT UPPER(l_table_name)||’.csv’;
l_file_dir VARCHAR2(50) DEFAULT ‘DATA_FILES’;
l_column_list CLOB DEFAULT NULL;
l_values_list CLOB DEFAULT NULL;
l_file_size NUMBER DEFAULT 0;
l_block_size NUMBER DEFAULT 0;
l_curr_position NUMBER DEFAULT 0;
l_list_idx NUMBER DEFAULT 0;
l_insert_cnt NUMBER DEFAULT 0;

TYPE l_list_tab
IS
TABLE OF CLOB
INDEX BY BINARY_INTEGER;

l_list_t l_list_tab;

— Function to parse values in csv file before we insert into staging tables
FUNCTION parse_values (p_lob_str CLOB)
RETURN CLOB
IS
l_lob_str CLOB DEFAULT NULL;
l_column_val CLOB DEFAULT NULL;
l_values_list CLOB DEFAULT NULL;
l_search_pos NUMBER DEFAULT 1;
l_delim_pos NUMBER DEFAULT 0;
l_search_chr CHAR DEFAULT NULL;
l_qualifier VARCHAR2(4) DEFAULT NULL;
l_seq_value NUMBER DEFAULT 0;
BEGIN
— To remove new line and line feed characters from given string
l_lob_str := REPLACE (REPLACE (p_lob_str, CHR(10), ”), CHR(13), ”)||’,’;

LOOP
l_search_chr := dbms_lob.substr(l_lob_str, 1, l_search_pos);
EXIT WHEN dbms_lob.getlength(l_lob_str) = l_file_size;

— Current position will be 0 for the first line
IF l_curr_position = 0
THEN
utl_file.get_line(l_file_handle, l_column_list);
l_column_list := REPLACE (REPLACE (l_column_list, CHR(10), ”), CHR(13), ”);
ELSE
utl_file.get_line(l_file_handle, l_values_list);
l_list_idx := l_list_idx + 1;
l_list_t (l_list_idx) := parse_values(l_values_list);
END IF;
END LOOP;

— Close the file after reading
utl_file.fclose(l_file_handle);

— Check if we have both header and lines data available to process
IF l_list_t.COUNT > 0
THEN
FOR idx IN l_list_t.FIRST .. l_list_t.LAST
LOOP
EXECUTE IMMEDIATE ‘INSERT INTO ‘||p_table_name||’ (‘||l_column_list||’)
VALUES (‘||l_list_t(idx)||’)’;
l_insert_cnt := l_insert_cnt + SQL%ROWCOUNT;
END LOOP;
COMMIT;
log(‘No of Records Inserted Into ‘||p_table_name||’ : ‘||l_insert_cnt);
ELSE
log(‘Error: Cannot Process ‘||l_file_name||’ Due to Invalid Data !’);
END IF;

ELSE
dbms_output.put_line(‘Error: ‘||l_file_name||’ is not avaialble under ‘||l_file_dir||’ !’);
END IF;

EXCEPTION
WHEN OTHERS THEN
log(‘Exception in Main : ‘||SQLERRM);
utl_file.fclose(l_file_handle);
ROLLBACK;
END;

Reply
1. Connor McDonald
  
  June 10, 2020 at 7:55 am
  
  Find the position of your delimiter using DBMS_LOB.INSTR (say this is “x”) and then use that as an input to Use DBMS_LOB.COPY to extract that portion of the clob into another.
  
  Reply