A new line on NEWLINE

Posted by

Recently I was doing a simple external table load using a CSV file, but was getting an interesting error. My file looked simple enough


"ID","EMAIL","TIMESTAMP","SUBJECT","STATUS","STATUS_TS"
"2012348048","john@anon.com","05/02/2000","Subject 1","5","09/04/2007"
"2412348048","mike@anon.com","05/02/2000","Subject 1","5","09/16/2002"
"348543169051","sue@anon.com","03/10/2001","Subject 1","5","03/24/2008"
"348396029762","mary@anon.com","03/10/2001","Subject 1","5","03/10/2001"
"1212348047","sam@anon.com","05/02/2000","Subject 1","5","05/02/2000"
"1612348048","vincent@anon.com","05/02/2000","Subject 1,"5","06/02/2006"
...
...

So it should have been a fairly straightforward external table definition to access it


SQL> create table T
  2        ( ID int
  3         ,EMAIL varchar2(255)
  4         ,TIMESTAMP date
  5         ,SUBJECT varchar2(512)
  6         ,STATUS int
  7         ,STATUS_TS date
  8      )
  9      organization external
 10      ( default directory TMP
 11        access parameters
 12        ( records delimited by newline 
 13          fields terminated by ',' optionally enclosed by '"'
 14          ( ID, EMAIL, "TIMESTAMP" DATE 'MM/DD/YYYY', SUBJECT, STATUS, STATUS_TS DATE 'MM/DD/YYYY'
 15          )
 16        )
 17        location ('my_report.csv')
 18    )
 19    ;

Table created.

At which point, things went downhill rapidly 🙂 When I queried the external table, I got the following error


SQL> select * from T;
select * from T;
*
ERROR at line 1:
ORA-29913: error in executing ODCIEXTTABLEFETCH callout
ORA-29400: data cartridge error
KUP-04020: found record longer than buffer size supported,
524288, in c:\temp\my_report.csv (offset=1048576)


The clue here is that a record was seen to be longer than allowed. Since I know all my “records” are quite short, it must be the definition of what ends a record that is causing a problem. problem.  In my case, the file had come from a system with a different characterset, and hence the concept of “newline” differs between characterset/target platform. Once I included that information into the external table definition, all was fine.


SQL> create table T
  2        ( ID int
  3         ,EMAIL varchar2(255)
  4         ,TIMESTAMP date
  5         ,SUBJECT varchar2(512)
  6         ,STATUS int
  7         ,STATUS_TS date
  8      )
  9      organization external
 10      ( default directory TMP
 11        access parameters
 12        ( records delimited by '\n' characterset AL32UTF8
 13          fields terminated by ',' optionally enclosed by '"'
 14          ( ID, EMAIL, "TIMESTAMP" DATE 'MM/DD/YYYY', SUBJECT, STATUS, STATUS_TS DATE 'MM/DD/YYYY'
 15          )
 16        )
 17        location ('my_report.csv')
 18    )
 19    ;

Table created.

SQL> select * from T where rownum < 10;

...

16,308 rows selected.


One comment

  1. The power of the Oracle’s external table feature is just awesome. No one in my organization believed that I could take files from an IBM Mainframe (EBCDIC), binary sftp them to an HP-UX server (ASCII) and load them via an external table definition using the ORACEL_LOADER access driver and the proper characterset like they were plain text files. This was in 11R2.

Got some thoughts? Leave a comment

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.