The AskTOM data model

I popped out a tweet yesterday in Throwback Thursday style showing the date of the first question we took on AskTOM – 18 years ago! Many entire computer systems don’t even last that long, and AskTOM hasn’t really needed to change that much in those 18 years. We’ve added a new skin, added the ability to have multiple AskTOM experts on the answer team, and of course, our new-for-2018 Office Hours program, which gives everyone free access to experts inside Oracle Corporation.

John replied to my tweet asking if we could show the ERD.


So here it is – I’ve taken out a few things here and there that don’t have any impact on the public facing part of AskTOM. And I’ve included some notes underneath about each table’s purpose in AskTOM.

[ERD diagram of the AskTOM schema]

ATE_ADMINS

The administrators on AskTOM. There are various privilege levels such as being able to see the admin pages, view unanswered questions and actually answer them.

ATE_HOME_MESSAGES

Any messages that we want to appear on the Questions home screen, such as upcoming conferences etc.

ATE_SUPPORTING_FILES

Files that support the answering of questions. We’ll sometimes link in scripts, images etc to better answer a question. They are all stored in here, in the database of course.

ATE_PRESENTATIONS

The presentations and other resources that you can download from the Resources page.


ATE_FEEDBACK

Our recently added facility to allow people to give us feedback on the AskTOM application.


ATE_SUBMITTED_QUESTIONS

The bread and butter of AskTOM. Your questions are stored here, along with our answer.

ATE_POSTING_STATUS

[Screenshot: question counts by posting status]

When you submit a question, we record it as “New, Never Read”. Once we start working on it, we mark it as “Read, Not Answered” so that another AskTOM administrator does not also pick it up. We might then have a little back-and-forth with the original submitter if we need more detail (which is a polite way of saying they didn’t read the Question Guidelines, and hence we didn’t get a test case, sample data, or nicely formatted code) :-) Those exchanges are covered by the “Need More Info” and “More Info Supplied” statuses.

Once we’ve answered the question, if we think the general community will benefit, it is marked “Answered, Published”, which puts it up on the home page; otherwise it’s “Answered, Not Published”, which means the original submitter will see it but no-one else. As you can see from the numbers in the picture (taken today), alongside the 18,500 questions you can see on AskTOM, there are another 5,500 we’ve answered that don’t make the front page.

“No Further Action” is what happens when we ask for more information and the submitter decides to ghost us, or when a submitter decides the best way to communicate with us is via a torrent of insults. Either way, we ain’t bothering with that question again :-)
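If you’re curious how that hangs together structurally, here’s a rough sketch of the lookup relationship between the two tables. To be clear, the column names below are my own invention for illustration – the real AskTOM columns will differ.

-- Hypothetical column names, for illustration only
create table ate_posting_status (
  status_id    number        primary key,
  description  varchar2(50)  not null   -- "New, Never Read", "Need More Info", ...
);

create table ate_submitted_questions (
  question_id  number  primary key,
  question     clob    not null,
  answer       clob,
  status_id    number  not null references ate_posting_status ( status_id )
);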

ATE_QUESTION_REVIEWS

The review comments from the community, and our follow-ups for each question that we answer.


ATE_REVIEW_STATUSES

We read every single review that comes into AskTOM. They come in with a status of “New”, we’ll “Assign” them to an administrator, and once we’re done with it, we mark it “Complete”. Yup, that’s over 125,000 reviews we’ve taken.


ATE_QUESTION_STATUS_HIST

We track our own performance on AskTOM so that we can better serve you all. Every time we change a question’s status, e.g. from “New” to “Need More Info” to “Answered”, we log the time at which the status change occurred, so we can review it later to see how we’re tracking.
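As a sketch only (continuing the made-up column names from the posting status section above), that kind of history is straightforward to capture with a trigger on the questions table:

create table ate_question_status_hist (
  question_id  number    not null,
  old_status   number,
  new_status   number    not null,
  changed_at   timestamp not null
);

create or replace trigger question_status_hist_trg
after update of status_id on ate_submitted_questions
for each row
when (old.status_id != new.status_id)
begin
  insert into ate_question_status_hist
    ( question_id, old_status, new_status, changed_at )
  values
    ( :new.question_id, :old.status_id, :new.status_id, systimestamp );
end;
/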

ATE_OBSOLETE_QUESTIONS

18 years of taking and answering questions means that some of the answers may no longer be correct or up to date. If a reviewer brings this to our attention, we’ll record it here as an action for an administrator to look at and rectify. If you’re wondering why it is not just a flag on the question, that’s because a single question may become obsolete more than once as time marches on.


ATE_QUESTION_VIEWS

Every time you click on a question in AskTOM, we record that fact. Why? Because that drives the metrics on knowing which questions are most sought after by the community. We get hundreds of thousands of question views every week in AskTOM.

ATE_QUESTIONS_TOP20

This drives our “Popular Questions” page on AskTOM.
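As a sketch of the idea (with hypothetical column names), a periodic job could rebuild that list from the raw view counts along these lines:

-- Hypothetical column names; a scheduled job could rebuild the top-20 list
insert into ate_questions_top20 ( question_id, view_count )
select question_id, view_count
from (
       select question_id, count(*) as view_count
       from   ate_question_views
       where  viewed_at > sysdate - 7        -- e.g. most viewed in the last week
       group  by question_id
       order  by count(*) desc
     )
where rownum <= 20;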


ATE_QUESTIONS_SOCIAL_MEDIA

Whenever we tweet out about a particular AskTOM question, or pop it on Facebook, we log it here – because we don’t want to be spamming you :-)

And that’s all there is to it. All driven by SQL, PL/SQL and of course the latest and greatest version of Application Express.

Please keep your foreign keys

I came across an interesting blog post the other day about whether databases should be (declaratively) enforcing the foreign key relationships between tables.  The blog post discussed the issue of foreign keys being temporarily disabled to perform data loading, and then encountering the problem of what to do when those foreign keys cannot be re-enabled due to bad data.  Perhaps they should just be omitted altogether?  I don’t want to put words in the author’s mouth, because he stressed he was not taking sides in the “should we” or “shouldn’t we” debate on declarative foreign keys, but the concluding part of the blog was:

[Quoted conclusion from the blog post]

I find that is a bit like saying:

“Well, seat belts are compulsory but people are still getting injured in car accidents, so you may as well just not bother with seat belts at all”

So here’s some information from the other perspective – why constraints are so important for your database in terms of both data correctness and performance.

Firstly, here is an AskTOM magazine article that demonstrates the benefits of declarative constraints both to the optimizer and to the availability of technologies such as materialized view rewrite.
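To give one concrete flavour of those optimizer benefits, here is a minimal sketch (the table names are mine) of join elimination: with a validated foreign key in place, the optimizer knows every child row has exactly one parent, so a join that selects no parent columns can be removed from the plan entirely.

create table dept ( deptno number primary key, dname varchar2(30) );
create table emp  ( empno  number primary key,
                    deptno number not null references dept );

-- Only EMP columns are selected, so with the foreign key enabled and
-- validated the optimizer can eliminate DEPT from the execution plan.
select e.empno
from   emp e, dept d
where  e.deptno = d.deptno;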

Also, to respond to the particular topic in the blog post about handling data loads, here is a video from the Real World Performance group about how you can validate massive sets of data for referential integrity, and then still apply the constraints efficiently to give the optimizer all those benefits for subsequent query processing.
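The short version of that approach, sketched here with generic table names: load the data first, enable the constraint without validating it so that new DML is protected immediately, and then validate the existing rows as a separate step that does not block other sessions.

alter table child add constraint child_parent_fk
  foreign key ( parent_id ) references parent ( parent_id )
  enable novalidate;   -- new and changed rows are checked from this point on

alter table child modify constraint child_parent_fk validate;
                       -- existing rows are checked without locking out DML;
                       -- once complete, the optimizer can fully trust the constraint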

But if all of that seems too complicated to read and digest, perhaps an easier way of understanding the importance of constraints is to watch my simple video about Australian wildlife :-)

Data denormalization … another take

I read an interesting article the other day about data modelling and data denormalization.

I’ll paraphrase the topic and requirement here (apologies to the original author for any omissions or excessive brevity).

We have a fictional application with chat rooms, people subscribing to those chat rooms, and posting messages in the chat rooms.  To satisfy this we have the following data model:

[Data model: USERS, ROOMS, ROOM_USERS and MSGS]

So for general queries, logging new messages etc, that’s all fine.  But an additional requirement is present.  When user “X” logs on to the application, they want to see how many rooms that they subscribe to have unread messages, where “unread” is defined as: there is a message in the MSGS table for that room with a timestamp more recent than the LAST_VIEWED timestamp for that user/room combination in ROOM_USERS.

Now if the MSGS table has millions of records, and users check them infrequently, then a query along the lines of:


select count(distinct m.room_id)
from   MSGS m,
       ROOM_USERS ru
where  m.tstamp > ru.last_viewed
and    ru.user_id = ...

is perhaps a scalability threat.  For the sake of this discussion, let’s assume it is.

The article then gets onto the topic of denormalization, and validly makes the point that this is a non-trivial task.  I’m not disputing any of that content.  The desired functionality is to be able to cheaply identify the number of unread rooms without scanning the MSGS table.

But there is also this statement:

Oracle’s materialized views seem to offer some of the most advanced functionality out there, but are still very limited.
There’s no database in the market today that can implement our desired definition of User.numUnreadFields via a materialized view, as far as I know.

So I figured I would try to help out :-)

First let’s build some objects to fit our model.




SQL> create table users ( user_id int primary key, uname varchar2(20));

Table created.

SQL> create table rooms ( room_id int primary key, rname varchar2(20));

Table created.

SQL> create table room_users (
  2    room_id int ,
  3    user_id int,
  4    last_viewed date not null,
  5    constraint room_users_pk primary key ( user_id, room_id ),
  6    constraint room_users_fk_users foreign key ( user_id  ) references users ( user_id ),
  7    constraint room_users_fk_rooms foreign key ( room_id  ) references rooms ( room_id )
  8    );

Table created.

SQL> create table msgs (
  2    msg_id int primary key,
  3    sender_user_id int not null,
  4    room_id int not null,
  5    tstamp     date not null,
  6    text varchar2(100) not null,
  7    constraint msgs_fk_membership foreign key ( sender_user_id ,room_id )
  8      references room_users ( user_id , room_id)
  9  );

Table created.

SQL> create index msgs_ix1 on msgs ( room_id );

Index created.

SQL> create index msgs_ix2 on msgs ( sender_user_id );

Index created.

SQL> create index msgs_ix3 on msgs ( tstamp );

Index created.

SQL> insert into users
  2  select rownum, 'User '||rownum
  3  from dual
  4  connect by level <= 100;

100 rows created.

SQL> insert into rooms
  2  select 10000+rownum, 'Room '||(10000+rownum)
  3  from dual
  4  connect by level <= 50;

50 rows created.

SQL> insert into room_users
  2  select 10000+rownum, rownum, trunc(sysdate) from dual connect by level <= 10;

10 rows created.

SQL> insert into room_users
  2  select 10000+rownum, rownum+2, trunc(sysdate-2) from dual connect by level <= 10;

10 rows created.

SQL> insert into room_users
  2  select 10000+rownum, rownum+7, trunc(sysdate-7) from dual connect by level <= 10;

10 rows created.

So I’ve created some objects, with the constraints and typical indexes I might expect to see in such an application. Here are our users, rooms and the subscriptions:



SQL> select * from users;

   USER_ID UNAME
---------- --------------------
         1 User 1
         2 User 2
         3 User 3
         4 User 4
         5 User 5
         6 User 6
         7 User 7
         8 User 8
         9 User 9
        10 User 10
        11 User 11
        12 User 12
        13 User 13
        14 User 14
        15 User 15
        16 User 16
        17 User 17
        18 User 18
        19 User 19
        20 User 20
        21 User 21
        22 User 22
        23 User 23
        24 User 24
        25 User 25
        26 User 26
        27 User 27
        28 User 28
        29 User 29
        30 User 30
        31 User 31
        32 User 32
        33 User 33
        34 User 34
        35 User 35
        36 User 36
        37 User 37
        38 User 38
        39 User 39
        40 User 40
        41 User 41
        42 User 42
        43 User 43
        44 User 44
        45 User 45
        46 User 46
        47 User 47
        48 User 48
        49 User 49
        50 User 50
        51 User 51
        52 User 52
        53 User 53
        54 User 54
        55 User 55
        56 User 56
        57 User 57
        58 User 58
        59 User 59
        60 User 60
        61 User 61
        62 User 62
        63 User 63
        64 User 64
        65 User 65
        66 User 66
        67 User 67
        68 User 68
        69 User 69
        70 User 70
        71 User 71
        72 User 72
        73 User 73
        74 User 74
        75 User 75
        76 User 76
        77 User 77
        78 User 78
        79 User 79
        80 User 80
        81 User 81
        82 User 82
        83 User 83
        84 User 84
        85 User 85
        86 User 86
        87 User 87
        88 User 88
        89 User 89
        90 User 90
        91 User 91
        92 User 92
        93 User 93
        94 User 94
        95 User 95
        96 User 96
        97 User 97
        98 User 98
        99 User 99
       100 User 100

100 rows selected.

SQL> select * from rooms;

   ROOM_ID RNAME
---------- --------------------
     10001 Room 10001
     10002 Room 10002
     10003 Room 10003
     10004 Room 10004
     10005 Room 10005
     10006 Room 10006
     10007 Room 10007
     10008 Room 10008
     10009 Room 10009
     10010 Room 10010
     10011 Room 10011
     10012 Room 10012
     10013 Room 10013
     10014 Room 10014
     10015 Room 10015
     10016 Room 10016
     10017 Room 10017
     10018 Room 10018
     10019 Room 10019
     10020 Room 10020
     10021 Room 10021
     10022 Room 10022
     10023 Room 10023
     10024 Room 10024
     10025 Room 10025
     10026 Room 10026
     10027 Room 10027
     10028 Room 10028
     10029 Room 10029
     10030 Room 10030
     10031 Room 10031
     10032 Room 10032
     10033 Room 10033
     10034 Room 10034
     10035 Room 10035
     10036 Room 10036
     10037 Room 10037
     10038 Room 10038
     10039 Room 10039
     10040 Room 10040
     10041 Room 10041
     10042 Room 10042
     10043 Room 10043
     10044 Room 10044
     10045 Room 10045
     10046 Room 10046
     10047 Room 10047
     10048 Room 10048
     10049 Room 10049
     10050 Room 10050

50 rows selected.

SQL> select * from room_users;

   ROOM_ID    USER_ID LAST_VIEW
---------- ---------- ---------
     10001          1 06-OCT-16
     10002          2 06-OCT-16
     10003          3 06-OCT-16
     10004          4 06-OCT-16
     10005          5 06-OCT-16
     10006          6 06-OCT-16
     10007          7 06-OCT-16
     10008          8 06-OCT-16
     10009          9 06-OCT-16
     10010         10 06-OCT-16
     10001          3 04-OCT-16
     10002          4 04-OCT-16
     10003          5 04-OCT-16
     10004          6 04-OCT-16
     10005          7 04-OCT-16
     10006          8 06-OCT-16
     10007          9 04-OCT-16
     10008         10 04-OCT-16
     10009         11 04-OCT-16
     10010         12 04-OCT-16
     10001          8 06-OCT-16
     10002          9 29-SEP-16
     10003         10 29-SEP-16
     10004         11 29-SEP-16
     10005         12 29-SEP-16
     10006         13 29-SEP-16
     10007         14 29-SEP-16
     10008         15 29-SEP-16
     10009         16 29-SEP-16
     10010         17 29-SEP-16

30 rows selected.

SQL>

Now let’s add some messages, with timestamps spanning the last month or so:


SQL> insert into msgs
  2  select rownum,
  3         mod(rownum,10)+1,
  4         10000+mod(rownum,10)+1,
  5         sysdate-32+rownum/10,
  6         'text '||rownum
  7  from dual
  8  connect by level <= 300;

300 rows created.

SQL> insert into msgs
  2  select 1000+rownum,
  3         2+mod(rownum,10)+1,
  4         10000+mod(rownum,10)+1,
  5         sysdate-32+rownum/10,
  6         'text '||(rownum+1000)
  7  from dual
  8  connect by level <= 300;

300 rows created.

SQL> insert into msgs
  2  select 2000+rownum,
  3         7+mod(rownum,10)+1,
  4         10000+mod(rownum,10)+1,
  5         sysdate-32+rownum/10,
  6         'text '||(rownum+2000)
  7  from dual
  8  connect by level <= 300;

300 rows created.

Let’s now see if we can get a materialized view to help us out. Obviously we want this view to be fast refreshable, so first we’ll put materialized view logs on the appropriate tables:


SQL> create materialized view log on msgs
  2  with primary key, rowid,
  3     sequence (room_id, sender_user_id,tstamp) including new values;

Materialized view log created.

SQL> create materialized view log on room_users
  2  with primary key, rowid,
  3     sequence (last_viewed) including new values;

Materialized view log created.

Now let’s add a materialized view to capture the information we need:


SQL> create materialized view MSGS_MV refresh fast on commit as
  2  select
  3    ru.room_id,
  4    ru.user_id,
  5    ru.last_viewed,
  6    count(*) msg_count,
  7    count(case when m.tstamp > ru.last_viewed then 1 end ) has_unread
  8  from msgs m,
  9       room_users ru
 10  where m.room_id = ru.room_id
 11  group by
 12    ru.room_id,
 13    ru.user_id,
 14    ru.last_viewed;

Materialized view created.

So we’ve got a row for each room/user pairing, plus we are bringing the LAST_VIEWED attribute along for the ride. The COUNT(*) is not needed by our application, but is required to make the materialized view fast refreshable. The second COUNT is a count of the unread messages in this room for this user. Note that this does NOT satisfy the requirement yet, because we are looking for a ROOM count, not a MESSAGE count. But let’s look at this view in action now.


SQL> select
  2       mv.room_id,
  3       mv.user_id,
  4       mv.last_viewed,
  5       case when mv.has_unread > 0 then 'Y' end unread,
  6       r.rname,
  7       u.uname
  8  from MSGS_MV mv,
  9       users u,
 10       rooms r
 11  where mv.user_id = u.user_id
 12  and   mv.room_id = r.room_id
 13  order by 2,1;

   ROOM_ID    USER_ID LAST_VIEW U RNAME                UNAME
---------- ---------- --------- - -------------------- --------------------
     10001          1 06-OCT-16   Room 10001           User 1
     10002          2 06-OCT-16   Room 10002           User 2
     10001          3 04-OCT-16 Y Room 10001           User 3
     10003          3 06-OCT-16   Room 10003           User 3
     10002          4 04-OCT-16   Room 10002           User 4
     10004          4 06-OCT-16   Room 10004           User 4
     10003          5 04-OCT-16   Room 10003           User 5
     10005          5 06-OCT-16   Room 10005           User 5
     10004          6 04-OCT-16   Room 10004           User 6
     10006          6 06-OCT-16   Room 10006           User 6
     10005          7 04-OCT-16 Y Room 10005           User 7
     10007          7 06-OCT-16   Room 10007           User 7
     10001          8 29-SEP-16 Y Room 10001           User 8
     10006          8 04-OCT-16 Y Room 10006           User 8
     10008          8 06-OCT-16   Room 10008           User 8
     10002          9 29-SEP-16 Y Room 10002           User 9
     10007          9 04-OCT-16 Y Room 10007           User 9
     10009          9 06-OCT-16   Room 10009           User 9
     10003         10 29-SEP-16 Y Room 10003           User 10
     10008         10 04-OCT-16 Y Room 10008           User 10
     10010         10 06-OCT-16   Room 10010           User 10
     10004         11 29-SEP-16 Y Room 10004           User 11
     10009         11 04-OCT-16 Y Room 10009           User 11
     10005         12 29-SEP-16 Y Room 10005           User 12
     10010         12 04-OCT-16 Y Room 10010           User 12
     10006         13 29-SEP-16 Y Room 10006           User 13
     10007         14 29-SEP-16 Y Room 10007           User 14
     10008         15 29-SEP-16 Y Room 10008           User 15
     10009         16 29-SEP-16 Y Room 10009           User 16
     10010         17 29-SEP-16 Y Room 10010           User 17

30 rows selected.

With a simple CASE expression, we can now isolate, per user/room combination, which rooms are “unread”. Normally we would run such a query for a given user_id, but I’m showing all of the data here just for explanation purposes. From here, satisfying our room count requirement is trivial:


SQL> select user_id, count(unread)
  2  from
  3  (
  4  select
  5       mv.room_id,
  6       mv.user_id,
  7       mv.last_viewed,
  8       case when mv.has_unread > 0 then 'Y' end unread,
  9       r.rname,
 10       u.uname
 11  from MSGS_MV mv,
 12       users u,
 13       rooms r
 14  where mv.user_id = u.user_id
 15  and   mv.room_id = r.room_id
 16  )
 17  group by user_id
 18  order by 1;

   USER_ID COUNT(UNREAD)
---------- -------------
         1             0
         2             0
         3             1
         4             0
         5             0
         6             0
         7             1
         8             2
         9             2
        10             2
        11             2
        12             2
        13             1
        14             1
        15             1
        16             1
        17             1

17 rows selected.

And there we have it. Once again I’m showing ALL users here, but the same applies for any single user. We have the unread room count using the materialized view and not from the MSGS table.
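For a single user the query collapses to something like this (a sketch only, with the user supplied as a bind variable):

select count(case when mv.has_unread > 0 then 1 end) as unread_rooms
from   MSGS_MV mv
where  mv.user_id = :user_id;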

So how do we keep this materialized view in sync with the detail? Well…that’s already taken care of, because it is fast refreshable on commit. We can see from the two outputs above that user 8 has two unread rooms, namely rooms 10001 and 10006. So let’s have her log in and jump into chat room 10001.


SQL> update room_users
  2  set last_viewed = sysdate
  3  where room_id = 10001
  4  and user_id = 8;

1 row updated.

SQL>
SQL> commit;

Commit complete.

and now let’s take a look at our unread room query


SQL> select user_id, count(unread)
  2  from
  3  (
  4  select
  5       mv.room_id,
  6       mv.user_id,
  7       mv.last_viewed,
  8       case when mv.has_unread > 0 then 'Y' end unread,
  9       r.rname,
 10       u.uname
 11  from MSGS_MV mv,
 12       users u,
 13       rooms r
 14  where mv.user_id = u.user_id
 15  and   mv.room_id = r.room_id
 16  )
 17  group by user_id
 18  order by 1;

   USER_ID COUNT(UNREAD)
---------- -------------
         1             0
         2             0
         3             1
         4             0
         5             0
         6             0
         7             1
         8             1  <===
         9             2
        10             2
        11             2
        12             2
        13             1
        14             1
        15             1
        16             1
        17             1

17 rows selected.

As we expect, the unread count for user 8 has dropped by 1. Now she jumps rooms…


SQL> update room_users
  2  set last_viewed = sysdate
  3  where room_id = 10006
  4  and user_id = 8;

1 row updated.

SQL>
SQL> commit;

Commit complete.

SQL>
SQL> select user_id, count(unread)
  2  from
  3  (
  4  select
  5       mv.room_id,
  6       mv.user_id,
  7       mv.last_viewed,
  8       case when mv.has_unread > 0 then 'Y' end unread,
  9       r.rname,
 10       u.uname
 11  from MSGS_MV mv,
 12       users u,
 13       rooms r
 14  where mv.user_id = u.user_id
 15  and   mv.room_id = r.room_id
 16  )
 17  group by user_id
 18  order by 1;

   USER_ID COUNT(UNREAD)
---------- -------------
         1             0
         2             0
         3             1
         4             0
         5             0
         6             0
         7             1
         8             0  <===
         9             2
        10             2
        11             2
        12             2
        13             1
        14             1
        15             1
        16             1
        17             1

17 rows selected.

And as we can see, we have cleared her room status.  It’s all been handled by the database, and the application does not need to do any of the “heavy lifting”.

Just to reiterate, I’m not trying to denounce the author’s statements – denormalization IS hard to get right, especially if you try to hand craft it in a multi-user environment. It’s very easy to end up with something that appears to work but silently corrupts the derived data over time.

But with a little care and thought, you’ll be amazed at what materialized views in Oracle are capable of.

Tightening up your data model

Having NOT NULL constraints declared on columns that are genuinely not nullable is not only good practice, but can also yield performance gains. Here’s a little routine that lists those columns that probably need a NOT NULL constraint. It looks at all columns and

  • checks whether a constraint with definition “COLUMN_NAME” IS NOT NULL is already present (i.e. whether the column is already declared NOT NULL)
  • counts the null values in the column

If the latter is zero, and no such constraint was found, then we print out ‘alter table …’ DDL for those columns that may need such a constraint defined. It works across a nominated schema and/or a single table.


SQL> create or replace
  2  procedure check_uncons_columns_for_null(
  3                   p_schema varchar2,
  4                   p_table_name varchar2 default null) is
  5    cursor c_list is
  6     select t.table_name, t.column_name, t.nullable,
  7            max(t.column_name) over (partition by t.table_name  ) as lastcol
  8     from (
  9       select a.table_name, a.column_name, a.nullable
 10       from dba_tab_columns a
 11       where a.owner = p_schema
 12       and a.table_name = nvl(upper(p_table_name),a.table_name)
 13       ) t,
 14       (
 15       select a.table_name, b.column_name, a.search_condition
 16       from dba_cons_columns b, dba_constraints a
 17       where a.owner = p_schema
 18       and   a.constraint_type = 'C'
 19       and   a.table_name = nvl(upper(p_table_name),a.table_name)
 20       and   a.table_name = b.table_name
 21       and   a.owner = b.owner
 22       and   a.constraint_name = b.constraint_name
 23       ) c
 24    where t.table_name = c.table_name(+)
 25    and   t.column_name = c.column_name(+)
 26    order by 1,2;
 27
 28    str0 varchar2(32767);
 29    str1 varchar2(32767);
 30    str2 varchar2(32767);
 31    str3 varchar2(32767);
 32
 33    search_cond varchar2(32767);
 34
 35    prev varchar2(100) := '*';
 36    cnt number;
 37    trailer varchar2(5) := ','||chr(10);
 38
 39  procedure do_sql(thesql varchar2) is
 40    tcursor integer;
 41    dummy integer;
 42  begin
 43    tcursor := dbms_sql.open_cursor;
 44    dbms_sql.parse(tcursor,thesql,2);
 45    dummy := dbms_sql.execute(tcursor);
 46    dbms_sql.close_cursor(tcursor);
 47  end;
 48
 49  begin
 50  for i in c_list loop
 51
 52    if prev != i.table_name then
 53      str0 := 'declare ';
 54      str1 := 'begin select ';
 55      str2 := ' into ';
 56      str3 := ' '; cnt := 1;
 57    end if;
 58
 59    --
 60    -- approximation only: nullable = 'Y' means no NOT NULL is declared
 61    --
 62    if i.nullable = 'Y' then
 63      str0 := str0 || 'v'||ltrim(cnt)||' number;';
 64      str1 := str1 || 'sum(decode('||i.column_name||',null,1,0))'||trailer;
 65      str2 := str2 || 'v'||ltrim(cnt)||trailer;
 66      str3 := str3 || 'if v'||ltrim(cnt)||' = 0 then '||
 67        'dbms_output.put_line(''alter table '||p_schema||'.'||i.table_name||
 68        ' modify ('||i.column_name||' not null);''); end if;'||chr(10);
 69    end if;
 70    if i.column_name = i.lastcol and str1 != 'begin select ' then
 71      str1 := rtrim(str1,trailer);
 72      str2 := rtrim(str2,trailer) ||' from '||p_schema||'.'||i.table_name||';';
 73      str3 := rtrim(str3,trailer) ||' end;';
 74      do_sql(str0||' '||str1||' '||str2||' '||str3);
 75    end if;
 76    prev := i.table_name;
 77    cnt := cnt + 1;
 78
 79
 80  end loop;
 81  end;
 82  /

Procedure created.
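To make the dynamic SQL a little easier to follow, the anonymous block the procedure assembles for a given table looks roughly like this (shown here for the SCOTT.PAR table from the demo below, using just two of its columns):

declare
  v1 number;
  v2 number;
begin
  select sum(decode(A,null,1,0)),
         sum(decode(DATA,null,1,0))
  into   v1,
         v2
  from   SCOTT.PAR;
  if v1 = 0 then dbms_output.put_line('alter table SCOTT.PAR modify (A not null);'); end if;
  if v2 = 0 then dbms_output.put_line('alter table SCOTT.PAR modify (DATA not null);'); end if;
end;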

SQL> set serverout on
SQL> exec check_uncons_columns_for_null(user)
alter table SCOTT.CHD modify (A not null);
alter table SCOTT.CHD modify (DATA not null);
alter table SCOTT.CHD modify (F not null);
alter table SCOTT.CHD modify (H not null);
alter table SCOTT.CHD modify (W1 not null);
alter table SCOTT.CHD modify (W2 not null);
alter table SCOTT.PAR modify (A not null);
alter table SCOTT.PAR modify (DATA not null);
alter table SCOTT.PAR modify (F not null);
alter table SCOTT.PAR modify (W not null);

PL/SQL procedure successfully completed.

SQL>
SQL>
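And why bother? Because NOT NULL is information the optimizer can use. Here’s a minimal sketch of the kind of benefit on offer (the table and index names are made up for illustration):

create table t ( val number, pad varchar2(100) );
create index t_val_ix on t ( val );

-- With VAL nullable, the index on VAL cannot be used to answer this count,
-- because rows where VAL is null are simply absent from the index:
select count(*) from t;              -- full table scan

alter table t modify ( val not null );

-- Now the (much smaller) index is a legal access path for the same count,
-- and a predicate like "where val is null" can be recognised as returning
-- no rows at all.
select count(*) from t;              -- can use an index fast full scan on T_VAL_IX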

A justification for referential integrity constraints

Yes, I know what you’re thinking.

“Here we go again.  Another database dude about to get stuck into us telling us about the importance of foreign keys for data integrity”

and yes, I could easily do that.  But you’ve seen that before (and by the way, it’s of course still totally valid – you should have all those integrity constraints defined!)

But here’s another reason, albeit a little tongue in cheek :-)

First I create a simple user with a couple of tables.


SQL> drop user demo1 cascade;

User dropped.

SQL>
SQL> grant create session, create table to demo1 identified by demo1;

Grant succeeded.

SQL>
SQL> alter user demo1 quota unlimited on users;

User altered.

SQL>
SQL> create table demo1.t1 ( x int, y int, constraint pk primary key ( x ) ) ;

Table created.

SQL>
SQL> create table demo1.t2 ( x int, y int) ;

Table created.

SQL>
SQL> alter table demo1.t2 add constraint pk2 primary key ( x );

Table altered.

SQL>
SQL> insert into demo1.t1 values (1,1);

1 row created.

SQL> insert into demo1.t1 values (2,2);

1 row created.

SQL> commit;

Commit complete.


And now, I take a datapump export of that schema


expdp compression=all dumpfile=DATA_PUMP_DIR:demo1.dmp logfile=DATA_PUMP_DIR:demo1.log schemas=demo1

Export: Release 12.1.0.2.0 - Production on Mon Nov 23 08:51:44 2015

Copyright (c) 1982, 2014, Oracle and/or its affiliates.  All rights reserved.

Username: / as sysdba

Connected to: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
Starting "SYS"."SYS_EXPORT_SCHEMA_01":  sys/******** AS SYSDBA compression=all dumpfile=DATA_PUMP_DIR:demo1.dmp logfile=DATA_PUMP_DIR:demo1.
log schemas=demo1
Estimate in progress using BLOCKS method...
Processing object type SCHEMA_EXPORT/TABLE/TABLE_DATA
Total estimation using BLOCKS method: 64 KB
Processing object type SCHEMA_EXPORT/USER
Processing object type SCHEMA_EXPORT/SYSTEM_GRANT
Processing object type SCHEMA_EXPORT/DEFAULT_ROLE
Processing object type SCHEMA_EXPORT/TABLESPACE_QUOTA
Processing object type SCHEMA_EXPORT/PRE_SCHEMA/PROCACT_SCHEMA
Processing object type SCHEMA_EXPORT/TABLE/TABLE
Processing object type SCHEMA_EXPORT/TABLE/INDEX/INDEX
Processing object type SCHEMA_EXPORT/TABLE/CONSTRAINT/CONSTRAINT
Processing object type SCHEMA_EXPORT/TABLE/INDEX/STATISTICS/INDEX_STATISTICS
Processing object type SCHEMA_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
Processing object type SCHEMA_EXPORT/STATISTICS/MARKER
. . exported "DEMO1"."T1"                                4.804 KB       2 rows
. . exported "DEMO1"."T2"                                    0 KB       0 rows
Master table "SYS"."SYS_EXPORT_SCHEMA_01" successfully loaded/unloaded
******************************************************************************
Dump file set for SYS.SYS_EXPORT_SCHEMA_01 is:
  C:\ORACLE\ADMIN\NP12\DPDUMP\DEMO1.DMP
Job "SYS"."SYS_EXPORT_SCHEMA_01" successfully completed at Mon Nov 23 08:52:10 2015 elapsed 0 00:00:23

Now I want to clone that schema to another, but in order to do it as efficiently as possible, I’m going to exclude indexes and constraints (they will be added later).


impdp directory=DATA_PUMP_DIR exclude=index exclude=constraint exclude=ref_constraint exclude=index_statistics exclude=table_statistics content=all dumpfile=demo1.dmp logfile=demo1.implog remap_schema=demo1:demo2 
Import: Release 12.1.0.2.0 - Production on Mon Nov 23 09:05:36 2015

Copyright (c) 1982, 2014, Oracle and/or its affiliates.  All rights reserved.

Username: /as sysdba

Connected to: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
ORA-39002: invalid operation
ORA-39168: Object path REF_CONSTRAINT was not found.

Well…this is a disaster!  Because I had no referential integrity constraints, my import failed to find any…and hence could not exclude them!  This is actually documented as expected behaviour (Datapump Import Fails With ORA-39168, Doc ID 430702.1) and is done as a safety mechanism – if the datapump file were corrupt, we don’t want to be halfway through importing a file and then find bits and pieces missing, so the checks are done up front and the import abandoned.
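If you genuinely have no referential integrity constraints in the dump file, the pragmatic workaround is simply to drop that particular EXCLUDE from the command, for example:

impdp directory=DATA_PUMP_DIR exclude=index exclude=constraint exclude=index_statistics exclude=table_statistics content=all dumpfile=demo1.dmp logfile=demo1.implog remap_schema=demo1:demo2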

So the moral of the story is … make sure you’ve got referential integrity constraints :-)

Why I love working on AskTom

Today a question came in, and Swapnasis provided a nice simple test case, which makes my job so much easier.  Here’s a snippet from it


SQL> create table TTT(
  2    testid integer not null,
  3    value integer not null,
  4    time timestamp not null,
  5    unique(testid, time)
  6  );

Table created.

See line 5?

I’ve been working with Oracle for (well…I’d rather not say :-)) let’s just say “a number of years”, dating back to version 6 of the database.  But in all of that time, I never knew that you could just write “unique”.  I have always written “constraint MY_CONSTRAINT unique (col1, col2)”.
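For the record, the two forms are equivalent apart from the constraint name – the inline version simply leaves Oracle to generate a SYS_C… name for you (the table names below are made up):

create table ttt_unnamed (
  testid integer not null,
  value  integer not null,
  time   timestamp not null,
  unique ( testid, time )                           -- system-generated constraint name
);

create table ttt_named (
  testid integer not null,
  value  integer not null,
  time   timestamp not null,
  constraint ttt_named_uk unique ( testid, time )   -- explicitly named
);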

It’s probably also the reason I don’t mind doing a few AskTOM questions like today’s, i.e. on a Saturday morning over breakfast coffee. It’s free education :-)

#neverstoplearning

Datatype discipline

(This is based on some ‘issues’ a client was having, so the example is architected to make a particular point, but is relevant nonetheless)

In the distributed database world, we often have different names for the same piece of information.

Consider two tables which hold customer information. In one table, they called the customer key “REF_NUM”, and in the other table “CUST_ID”.

That’s not ideal, but it’s not a big drama because at least the data types align.

        SQL> desc T1

                   Name                                        Null?    Type
                   ------------------------------------------- -------- ----------------------
                   REF_NUM                                     NOT NULL NUMBER(10)
                   T1_DATA                                              VARCHAR2(1000)

        SQL> desc T2

                   Name                                        Null?    Type
                   ------------------------------------------- -------- ----------------------
                   CUST_ID                                     NOT NULL NUMBER(10)
                   T2_DATA                                              VARCHAR2(1000)

When I do a join between these two tables, it costs me about 9400 I/O’s

        SQL> select *
          2  from  t1, t2
          3  where t1.ref_num = t2.cust_id
          4  and   t1.t1_data like '100%';

        Statistics
        ----------------------------------------------------------
                  0  recursive calls
                  1  db block gets
               9408  consistent gets
                  0  physical reads

Consider now the same two tables, with exactly the same data, but the data types are not aligned. In one system, they used a numeric, and in the other system they used a string

        SQL> desc T1

                   Name                                        Null?    Type
                   ------------------------------------------- -------- ----------------------
                   REF_NUM                                     NOT NULL NUMBER(10)
                   T1_DATA                                              VARCHAR2(1000)

        SQL> desc T2

                   Name                                        Null?    Type
                   ------------------------------------------- -------- ----------------------
                   CUST_ID                                     NOT NULL VARCHAR2(10)
                   T2_DATA                                              VARCHAR2(1000)

Now look what happens to my SQL

        SQL> select *
          2  from  t1, t2
          3  where t1.ref_num = t2.cust_id
          4  and   t1.t1_data like '100%';

        Statistics
        ----------------------------------------------------------
                  0  recursive calls
                  2  db block gets
              18428  consistent gets
                  0  physical reads

It costs DOUBLE the amount of I/O’s – the same join now does twice the work.

Moreover, it’s not just performance that is at risk…your whole application might just start to crash randomly.

Using the same tables above (with the conflicting data types) here’s a SQL that works fine “today”…

        SQL> select count(*)
          2  from  t1, t2
          3  where t1.ref_num = t2.cust_id
          4  and   t2.t2_data like '100%';

          COUNT(*)
        ----------
                71

Then a pesky user comes along and does what users do…adds data to one of the tables.

        SQL> insert into T2 values ('X123','100some data');

        1 row created.

And now let’s re-run that same query

        SQL> select count(*)
          2  from  t1, t2
          3  where t1.ref_num = t2.cust_id
          4  and   t2.t2_data like '100%';

        ERROR: ORA-01722: invalid number

Ker-splat….your application is toast…
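What’s going on under the covers: when Oracle compares a NUMBER to a VARCHAR2, it implicitly converts the string side, so the join predicate silently becomes to_number(t2.cust_id) = t1.ref_num – which is fine right up until a non-numeric value like ‘X123’ arrives in CUST_ID:

select to_number('X123') from dual;
-- ORA-01722: invalid number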

Not being disciplined with data type selection causes pain later.