Take care with regular expressions

Posted by

In an Office Hours session a couple of months back, I covered an important change that comes to regular expressions once you upgrade to 12c Release 2. You can see the video covering the issue here:

but for the TL;DR brigade reading this post: Regular expressions are not deterministic when you take NLS settings into account and thus cannot be used in constraints and/or function-based indexes.

This is just a post to quickly revisit the topic for anyone thinking of upgrading from an earlier release to 12c Release 2. An AskTOM question came in asking what would happen to such constraints during the upgrade process.

The sad answer is … nothing. For example, if you successfully (and by strict definition, incorrectly) created a constraint with a regular expression in 11g, then after the upgrade, that constraint will still be present in your new 12c Release 2 system. It will continue to work as it did in 11g, and even if you disable/enable it, or put it through a validate command to exercise the data, it will work as it did before.

To be honest, I do not like this, because it can become what I call a “sleeper” problem. If, 6 months after you upgrade, you happen to drop and recreate that constraint you’ll be most distressed to find that it cannot be added, and you will have most probably long forgotten that it was caused by an event that occurred 6 months ago, namely the upgrade. And perhaps worse, you now have an index or constraint that could potentially be corrupted by innocent tinkering with session NLS settings.

So before you upgrade, definitely take a cursory glance through your constraint definitions and take remedial action if needed.

8 comments

  1. It isn’t just regular expressions that throw up some funnies with NLS settings.
    Having function based indexes with a TO_CHAR over a date can throw a wobbler too.

    create table date_format_test (id number not null, date_val date not null);

    drop index date_format_test_ix1 ;
    create index date_format_test_ix1 on date_format_test (to_char(date_val,’DD-Mon-YYYY’));

    truncate table date_format_test;

    alter session set nls_date_format=’DD-Mon-YYYY’;
    alter session set nls_calendar=’Gregorian’;

    alter session set nls_language = ‘English’;
    insert into date_format_test (id, date_val) values (1, date ‘2018-12-01’);

    alter session set nls_language = ‘French’;
    insert into date_format_test (id, date_val) values (2, date ‘2018-12-01’);

    alter session set nls_language = ‘German’;
    insert into date_format_test (id, date_val) values (3, date ‘2018-12-01′);

    select /*+ INDEX s*/ to_char(date_val,’DD-Mon-YYYY’) from date_format_test s;

    TO_CHAR(DATE_VAL,’DD-MON-YYYY’)
    01-Dec-2018
    01-Dez-2018
    01-Déc. -2018

    Even purely numeric date masks can be non-deterministic if you want to try the Persian nls_calendar setting.

      1. Technically, the TO_CHAR can’t be made deterministic because you can only supply the language as an NLS parameter and not the calendar (which can impact the YYYY result) or the territory (which can impact the single digit D format due to disagreement on when the week starts).

        If Oracle go hardline on not allowing non-deterministic expressions, then TO_CHAR (and conversely TO_DATE) would need to be disallowed. I’ll weigh up using only user-defined functions rather than rely on what Oracle may do with built-ins.

        Anyway, thanks for the timely heads up. Got our 12.1 to 12.2 migration next week and was digging through our DBA_TAB_COLS yesterday.

  2. Hi Chris,

    Interesting post and with the explanation provided it actually makes good sense to not allow REGEXP functions in check constraints. Having a few of these in our system, I went on to try it out in our databases. I was a bit surprised that I was allowed to make this table (and check constraint) on 12.2 and 18:

    create table tab1(col1 varchar2(10) check(regexp_like(col1, ‘^[a-zA-Z]+$’)));

    In your video it is said that REGEXP_LIKE is affected as well. If that is the case, why is the above allowed. What am I missing here?

    1. Yeah…that’s a bit nasty…

      SQL> create table t ( x varchar2(100),
      2 constraint CHK check ( regexp_like(x,'[^A-Z]’)));

      Table created.

      SQL>
      SQL> alter session set nls_language = ‘AMERICAN’;

      Session altered.

      SQL> insert into t values (‘AAAÖXXX’);

      1 row created.

      SQL> alter session set nls_language = ‘GERMAN’;

      Session wurde geändert.

      SQL> insert into t values (‘AAAÖXXX’);
      insert into t values (‘AAAÖXXX’)
      *
      FEHLER in Zeile 1:
      ORA-02290: CHECK-Constraint (MCDONAC.CHK) verletzt

      1. Is there any reason why REGEXP_LIKE is allowed in creation of a check constraint? If I understand your video correct, it should not be allowed to use REGEXP_LIKE.

        Is it an oversight by development, or should we expect that it will be disallowed in future releases?

  3. Hello Connor,

    Sorry for this late comment on the same issue.

    First of all, using REGEXP_LIKE in check constraints is still allowed in 19c.

    But, regardless of this, I think that LiveSQL does have a problem with using session level language settings,
    so, when trying to reproduce in LiveSQL the test case from your last comment, the behavior is different:

    create table t ( x varchar2(100),
    constraint CHK check ( regexp_like(x,'[^A-Z]’)));

    — Table created.

    alter session set nls_language = AMERICAN;

    — Statement processed.

    insert into t values (‘AAAÖXXX’);

    — 1 row(s) inserted.

    alter session set nls_language = GERMAN;

    — Statement processed.

    insert into t values (‘AAAÖXXX’);

    — 1 row(s) inserted.

    select * from t
    /

    X
    ———————
    AAAÖXXX
    AAAÖXXX

    2 rows selected.

    Looks like in LiveSQL, setting nls_language at the session level DOES NOT have any influence on the sorting order:

    alter session set nls_language = GERMAN;

    select * from dual
    where ‘Ö’ between ‘A’ and ‘Z’
    /

    no data found

    I also tried using NLS_COMP and NLS_SORT, with the same result:

    alter session set nls_comp = LINGUISTIC;
    alter session set nls_sort = GERMAN;

    select * from dual
    where ‘Ö’ between ‘A’ and ‘Z’
    /

    no data found

    Only the following seems to work:

    select * from dual
    where ‘Ö’ between ‘A’ and ‘Z’ COLLATE GERMAN
    /

    DUMMY
    ———
    X

    Setting the NLS_LANGUAGE at session level does however influence the error messages:

    alter session set nls_language = german;

    select * from non_existing_table;

    ORA-00942: Tabelle oder View nicht vorhanden

    But, these tests lead me to discover the following even stranger behavior:

    alter session set nls_language = american;
    alter session set nls_comp = binary;
    alter session set nls_sort = binary;

    alter table t
    drop constraint chk
    /

    — Table altered.

    alter table t
    modify x collate german
    /

    — Table altered.

    alter table t
    add constraint CHK check ( regexp_like(x,'[^A-Z]’) )
    /

    — Table altered.

    select * from t
    where x = ‘AAAÖXXX’
    /

    X
    ——————-
    AAAÖXXX
    AAAÖXXX

    2 rows selected.

    insert into t values (‘AAAÖXXX’);

    ORA-02290: check constraint (SQL_YAKAQKTFVITSDZSGRYUPUMWVK.CHK) violated
    ORA-06512: at “SYS.DBMS_SQL”, line 1721

    So, I could change the column collation and recreate the constraint with existing data that violates it, without error,
    however, trying to insert identical data now fails.

    And an additional question:

    If we use different language settings for NLS_LANGUAGE and NLS_SORT, is it correct that
    in linguistic sorting (set by NLS_COMP=LINGUISTIC) the NLS_SORT setting will override the NLS_LANGUAGE setting ?

    Thanks a lot in advance if you can check this weird LiveSQL behavior, and, maybe also an Oracle bug.

    Cheers & Best Regards,
    Iudith

Got some thoughts? Leave a comment

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.