Loading LOB from a file

Posted by

I observed this idiosyncracy recently when loading some lob from external files using PL/SQL:

First we’ll create a file to be loaded, and ‘wc’ tells us it is 75 bytes in size



SQL> !echo "This is line 1" > /tmp/lobfile
SQL> !echo "This is line 2" >> /tmp/lobfile
SQL> !echo "This is line 3" >> /tmp/lobfile
SQL> !echo "This is line 4" >> /tmp/lobfile
SQL> !echo "This is line 5" >> /tmp/lobfile
SQL> !wc /tmp/lobfile
       5      20      75 /tmp/lobfile

Then create a standard routine to load it into the database


SQL> create or replace directory TMP as '/tmp';

Directory created.

SQL> drop table lob_tab;

Table dropped.

SQL> create table lob_tab ( the_lob clob );

Table created.

SQL> declare
  2    v_lob     clob;
  3    l_bfile   bfile;
  4    amt       number;
  5  begin
  6    l_bfile := bfilename( 'TMP', 'lobfile' );
  7
  8    insert into lob_tab (the_lob)
  9    values ( empty_clob() )
 10    returning the_lob into v_lob;
 11
 12    amt := dbms_lob.getlength( l_bfile );
 13    dbms_lob.fileopen( l_bfile ,dbms_lob.file_readonly);
 14    dbms_lob.loadfromfile( v_lob, l_bfile ,amt);
 15
 16    dbms_output.put_line('File length is: '||dbms_lob.getlength( l_bfile ));
 17    dbms_output.put_line('Loaded length is: '||dbms_lob.getlength(v_lob));
 18    dbms_lob.fileclose( l_bfile );
 19  end;
 20  /
File length is: 75
Loaded length is: 37   <==== !!!!

PL/SQL procedure successfully completed.

So what has happened to the second half of the LOB ?!

It turns out to be related to the character set. If you’re using UTF8 (as this database is), then loadfromfile must assume the possibility of multibyte characters (since the bfile could be binary). Thus you get two bytes per character and hence “garbage” in the lob.

The workaround is to use sqlldr, or load the lob without using loadfromfile, eg you could load it piecewise:


SQL> declare
  2    v_lob     clob;
  3    l_bfile   bfile;
  4    buf       raw(32767);
  5    vc        varchar2(32767);
  6    amt       number;
  7    v_offset  number := 1;
  8  begin
  9    l_bfile := bfilename( 'TMP', 'lobfile' );
 10
 11    insert into lob_tab (the_lob)
 12    values ( empty_clob() )
 13    returning the_lob into v_lob;
 14
 15    amt := dbms_lob.getlength( l_bfile );
 16    dbms_lob.fileopen( l_bfile ,dbms_lob.file_readonly);
 17    dbms_lob.read(l_bfile,amt,v_offset,buf);
 18    vc := utl_raw.cast_to_varchar2(buf);
 19    dbms_lob.writeappend(v_lob,amt,vc);
 20
 21    dbms_output.put_line('File length is: '||dbms_lob.getlength( l_bfile ));
 22    dbms_output.put_line('Loaded length is: '||dbms_lob.getlength(v_lob));
 23    dbms_lob.fileclose( l_bfile );
 24  end;
 25  /
File length is: 75
Loaded length is: 75

PL/SQL procedure successfully completed.

Or you could use LOADCLOBFROMFILE with the extra parameters


SQL> declare
  2    v_lob     clob;
  3    l_bfile   bfile;
  4    amt       number;
  5
  6    d_offset number := 1;
  7    s_offset number := 1;
  8    csid     number := 0;
  9    lang     number := 0;
 10    warning  number;
 11  begin
 12    l_bfile := bfilename( 'TMP', 'lobfile' );
 13
 14    insert into lob_tab (the_lob)
 15    values ( empty_clob() )
 16    returning the_lob into v_lob;
 17
 18    amt := dbms_lob.getlength( l_bfile );
 19    dbms_lob.fileopen( l_bfile ,dbms_lob.file_readonly);
 20    dbms_lob.LOADCLOBFROMFILE( v_lob, l_bfile ,amt, d_offset,s_offset,csid, lang,warning );
 21
 22    dbms_output.put_line('File length is: '||dbms_lob.getlength( l_bfile ));
 23    dbms_output.put_line('Loaded length is: '||dbms_lob.getlength(v_lob));
 24    dbms_lob.fileclose( l_bfile );
 25  end;
 26  /
File length is: 75
Loaded length is: 75

PL/SQL procedure successfully completed.

It might look like a bug, but it’s not really – if you look carefully, you’ll see that this behaviour is documented in the DBMS_LOB manual.

3 comments

  1. Hi Connor,
    Appreciate your efforts while sharing plenty of your knowledge to the people in need like me 🙂
    I need your help to fix a problem to split a CLOB variable into multiple CLOB variables by using a common delimiter.
    So far I have been using following code to satisfy my need to read large csv (coming from data warehouse tables) files through plsql and load the data into Oracle tables. This is more likely mimicking SQL Loader concept.

    Since we have limitation to read only 32767 characters for each line using UTL_FILE.FOPEN, we are facing file read error when the length exceeds. So, I am trying to replace UTL_FILE with BFILE. Using BFILE I can open and read entire contents of my csv but, I am unable to chunk the CLOB into multiple CLOB variables. I tried to use DBMS_LOB.SUBSTR but, that function also returns a VARCHAR2 so, I cannot accommodate entire line.

    I am assuming this can be possible if we use LOADCLOBFROMFILE with proper offset and buffer values. But, I couldnt make it work.

    Can you please check and help me if you see a way to read a file (line by line) into a CLOB variable and process further?

    DECLARE
    l_table_name VARCHAR2(30);
    l_file_handle utl_file.file_type;
    l_file_exists BOOLEAN;
    l_file_name VARCHAR2(35) DEFAULT UPPER(l_table_name)||’.csv’;
    l_file_dir VARCHAR2(50) DEFAULT ‘DATA_FILES’;
    l_column_list CLOB DEFAULT NULL;
    l_values_list CLOB DEFAULT NULL;
    l_file_size NUMBER DEFAULT 0;
    l_block_size NUMBER DEFAULT 0;
    l_curr_position NUMBER DEFAULT 0;
    l_list_idx NUMBER DEFAULT 0;
    l_insert_cnt NUMBER DEFAULT 0;

    TYPE l_list_tab
    IS
    TABLE OF CLOB
    INDEX BY BINARY_INTEGER;

    l_list_t l_list_tab;

    — Function to parse values in csv file before we insert into staging tables
    FUNCTION parse_values (p_lob_str CLOB)
    RETURN CLOB
    IS
    l_lob_str CLOB DEFAULT NULL;
    l_column_val CLOB DEFAULT NULL;
    l_values_list CLOB DEFAULT NULL;
    l_search_pos NUMBER DEFAULT 1;
    l_delim_pos NUMBER DEFAULT 0;
    l_search_chr CHAR DEFAULT NULL;
    l_qualifier VARCHAR2(4) DEFAULT NULL;
    l_seq_value NUMBER DEFAULT 0;
    BEGIN
    — To remove new line and line feed characters from given string
    l_lob_str := REPLACE (REPLACE (p_lob_str, CHR(10), ”), CHR(13), ”)||’,’;

    LOOP
    l_search_chr := dbms_lob.substr(l_lob_str, 1, l_search_pos);
    EXIT WHEN dbms_lob.getlength(l_lob_str) = l_file_size;

    — Current position will be 0 for the first line
    IF l_curr_position = 0
    THEN
    utl_file.get_line(l_file_handle, l_column_list);
    l_column_list := REPLACE (REPLACE (l_column_list, CHR(10), ”), CHR(13), ”);
    ELSE
    utl_file.get_line(l_file_handle, l_values_list);
    l_list_idx := l_list_idx + 1;
    l_list_t (l_list_idx) := parse_values(l_values_list);
    END IF;
    END LOOP;

    — Close the file after reading
    utl_file.fclose(l_file_handle);

    — Check if we have both header and lines data available to process
    IF l_list_t.COUNT > 0
    THEN
    FOR idx IN l_list_t.FIRST .. l_list_t.LAST
    LOOP
    EXECUTE IMMEDIATE ‘INSERT INTO ‘||p_table_name||’ (‘||l_column_list||’)
    VALUES (‘||l_list_t(idx)||’)’;
    l_insert_cnt := l_insert_cnt + SQL%ROWCOUNT;
    END LOOP;
    COMMIT;
    log(‘No of Records Inserted Into ‘||p_table_name||’ : ‘||l_insert_cnt);
    ELSE
    log(‘Error: Cannot Process ‘||l_file_name||’ Due to Invalid Data !’);
    END IF;

    ELSE
    dbms_output.put_line(‘Error: ‘||l_file_name||’ is not avaialble under ‘||l_file_dir||’ !’);
    END IF;

    EXCEPTION
    WHEN OTHERS THEN
    log(‘Exception in Main : ‘||SQLERRM);
    utl_file.fclose(l_file_handle);
    ROLLBACK;
    END;

    1. Find the position of your delimiter using DBMS_LOB.INSTR (say this is “x”) and then use that as an input to Use DBMS_LOB.COPY to extract that portion of the clob into another.

Got some thoughts? Leave a comment

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.