Building community via the speaker community

I recently participated in the Oracle Developer Community Yatra tour throughout India. It is a hectic event with 7 cities covered in a mere 9 days, so you can imagine how frantic the pace was. A typical day would be:

  • 7am – breakfast
  • 8am – check out of hotel and leave for the venue
  • 9am – speak all day, host Q&A
  • 6pm – leave straight from venue to the airport
  • 8pm – dinner at airport, and fly to next city
  • 1am – get to next hotel, grab some sleep before doing it all again in 6 hours time

Yet as a speaker in this event, it never felt that the logistics of the event were out of control.  This is mainly due to the incredible work of the people in the AIOUG (All India Oracle User Group) coordinated by Sai Penumuru. The smoothness of the organization prompted me to blog about how user groups could follow the AIOUG lead in terms of running successful events.

Every time I do an event in India, not only do I come away with a stronger network and connection with the attendees, I also gain new and strong friendships with the speakers and this is in no small part due to organizational skills of Sai and the volunteers.

So from that reference point – namely, the smart way in which Sai and the user group foster a great feeling of community amongst the speakers, I thought I’d share the mechanisms via which user groups can organize events that make speakers feel welcome and keen to return.
Common accommodation

When I travel to India, Sai provides a recommended hotel for all speakers. Obviously, no speaker is compelled to stay there, but we all know that the recommendation Sai makes has taken in account:

  • facilities the hotel provides,
  • hotel price to find a balance for speakers either having company funding or funding themselves,
  • transport time to/from the conference venue.

So most of us will always use his recommendation, and it makes the decision making process easy.  But most of all, it is a catalyst for speakers to meet in a casual environment outside the conference hours, and build relationships.

Common transport

The AIOUG organizes transport to and from the venue, and from the airport as well. And for when this is not possible, they will provide a recommended transport means (such as Uber etc) so that speakers never have the risk of picking an unreliable or unsafe transport option. For multi-city events such as Yatra, the AIOUG also recommends common itinerary for flights, once again making the entire planning process much easier for speakers.

A communication mechanism

Before a conference, AIOUG sets up a Whatsapp group containing all of the speakers, and the key representatives from the user group. In this way, all communication is sent in a consistent fashion.  We don’t have to jump between email, twitter, etc to see if any correspondence has been missed. And this group also helps build the relationship between speakers and user group. Humourous stories and pictures can be shared, but most importantly, if there is an issue or problem – everyone is aware of it immediately, making resolution is fast and effective.

And perhaps most importantly, it helps accentuate the human element – messages are sent as people land or takeoff, and when people arrive at the hotel.  Organizers regularly send messages making sure no-one is having any difficulties.  All these things make the speakers feel more comfortable before, during and after the event.

A typical day

Perhaps the most valuable piece of information that is conveyed by the user group is ‘local knowledge’. For example, each evening a whatsapp message would be sent out detailing

  • hotel pickup time,
  • expected transit time,
  • who to contact/what to do when arriving at venue,
  • what identification requirements may be needed on site
    etc

So even though it may be a first visit to a city for the speakers, there is a degree of familiarity with what is expected to happen, and hence knowledge of whether things are departing from the norm.

Onsite assistance

The most stressful 5 minutes for any speaker is the time they are setting up for their talk. Making sure projectors work, internet connectivity, schedule changes etc – are all things that can sabotage a good talk for a speaker. The AIOUG always has someone visit the room in that critical 5 minutes, so a speaker does not have to go hunting for technical assistance.

 

In summary, as you can see, none of these things are particular arduous to do, but the benefits are huge for speakers.  We feel comfortable and welcome, which means a much better chance of a successful talks, and makes us much more likely to want to return.

So if you are reading this, and are part of a user group committee, please consider some of the tips above for your local events. If your speakers have a good experience, they’ll be much more keen to offer continued support for your events.

Humble pie made with NULL strings

I was helping out a client a while back with an issue where a panicked email came into my inbox along the lines of “SELECT IS BROKEN IN ORACLE!!!”, which seemed perhaps a little extreme Smile. So I pursued it further asking for some concrete details, and I must concede it had me a little bamboozled for a while. I’ve simplified the example to keep it easy to digest, but the premise is the same.

My colleague had a table with a couple of VARCHAR2 columns:

image

and the data inside the two columns for the single row was identical:

image

The panicky email was sent because a simple query to check that the two column were identical was not working as expected:

image

At this point, I was pretty relaxed about the situation because this is a “problem” I have seen many many times over the years, especially as people come to Oracle from other database management systems. So I responded to the email:

“Yeah…I know what the issue is. You have trailing spaces in one of the columns. Try RTRIM”

and filed the original email in that special folder where emails never resurface Smile

image

But as quickly as I had hit “Send”, a reply came back with “That doesn’t work either”. Working on the assumption that my email had not been explicit enough (and perhaps I was being a little smug about it) I logged in to the system to run a query on the same table, and lo and behold – my smugness was wiped off my face Smile

image

So then I tried a variation on a theme and concatenated a known value to the end of each column to see if I was missing anything:

image

So everything looks OK but obviously something is awry here. When you have doubts on the data, a good option is to use the DUMP function to see exactly what bytes are stored:

image

and the problem is revealed.  The second column in the table had a trailing ASCII code zero after the word SUCCESS. This can be a nasty (and common) issue when developers are using C or similar languages to store data in an Oracle database. In many languages, ASCII zero is used to terminate an arbitrary length string, and hence it can easily end up being incorrectly stored along with the string bytes in a VARCHAR2 column especially if you are not using the delivered string handling functions in the pre-compiler layer (such as Pro*C). This can also be due to a confusion between the concept of NULL (two L’s) in the database, and the concept of the NUL (one L) character in a programming language.

Compounding the issue is that a casual glance at the data (as per the above screen shots from SQL Developer) typically suggest all is well – the NUL character is not observed. Things can get even worse – I’ve seen some GUI database tools interpret the NUL character as a true NULL and hence when a column contains only a single NUL character, those tools will report the column as being empty (ie NULL), which just makes for even more confusion. Thankfully SQL Developer does not do that, and the null indicator column can be used to see the difference between the two:

image

So I sent a terse email back to the developer reminding them about NUL versus NULL and that if they are using the pre-compiler correctly, then everything will take care of itself. But I must admit, I had to chuckle when they replied with a link to an AskTOM question I answered recently describing the following:

We made the same mistake ourselves! Albeit in a very specific circumstance. Look what happens when you force a STOP command onto a running scheduler job:



SQL> create table t ( x timestamp, y int );

Table created.

SQL> create or replace
  2  procedure myproc is
  3  begin
  4    for i in 1 .. 20
  5    loop
  6      insert into t values (systimestamp,i);commit;
  7      dbms_lock.sleep(2);
  8    end loop;
  9  end;
 10  /

Procedure created.

SQL>
SQL> begin
  2    dbms_scheduler.create_job (
  3      job_name        => 'myjob',
  4      job_type        => 'plsql_block',
  5      job_action      => 'begin myproc; end;',
  6      start_date      => systimestamp,
  7      repeat_interval => 'freq=minutely; bysecond=0;',
  8      enabled         => true);
  9  end;
 10  /

PL/SQL procedure successfully completed.
--
-- waited for 1 successful execution, and then stop the second one in flight
--
SQL> exec dbms_scheduler.stop_job('MYJOB',force=>true)

PL/SQL procedure successfully completed.

SQL> select status, session_id, dump(session_id)
  2  from USER_SCHEDULER_JOB_RUN_DETAILS;

STATUS
------------------------------
SESSION_ID
-------------------------------------------------------------
DUMP(SESSION_ID)
-------------------------------------------------------------
SUCCEEDED
984,53037
Typ=1 Len=9: 57,56,52,44,53,51,48,51,55

STOPPED
983,28542
Typ=1 Len=10: 57,56,51,44,50,56,53,52,50,0  <<=== whoops!

I’ve logged this as a bug but in the interim, if you need the SESSION_ID from the scheduler views, you might want to add a: RTRIM(SESSION_ID, chr(0)) around the query.

Hoist by my own NULL petard Smile

Hyper-partitioned index avoidance thingamajig

As you can tell, I have no idea on a name for what I am about to describe. So let me start from the beginning, and set the scene for an idea I have to utilize a cool new 18c feature.

Often in a transactional-style system the busiest table (let us call it SALES for the sake of this discussion) is also

  • the biggest table, after all, it has all of our sales in it,
  • the most demanded for table, in that, almost every query in our application wants to access it in some way shape or form.

This is in effect the database version of the Pareto Principle. Everyone wants a slice of that SALES “pie”, and the piece of that pie that is in most demand is typically the most recent data. Your application may have pages that will be showing:

  • sales in the past hour,
  • sales in the past day,
  • products sold in the past hour,
  • largest purchase amount for sales in the past few hours,
  • suspicious activity for today,
  • A customer will want their sales for the last few days

The list goes on and on, the common theme being that the data being requested is bound by a range of time in the recent past. Our SALES table already will have a primary key, probably being some sort of unique transaction ID for each purchase, but to the satisfy the style of requests in the list above, we will probably need:

  • an index on the time(stamp) of the sale,
  • an index on the product ID that was sold,
  • an index on the customer ID who made the purchase,
  • potentially even an index on the sale amount

You can see the troubled waters into which we are sailing here. It is our biggest and busiest table, and here we are, adding index after index after index to improve query performance, whilst at the same time:

  1. adding overhead/contention to DML operations on the SALES table,
  2. increasing the size of the database,
  3. increasing the duration of the backups,
  4. increasing time for maintenance and copies to Development and Test

None of this is looking great but we might be thinking “What choice do we have?”

Here is perhaps an alternate strategy: Let’s not create any indexes.  The challenges (1) through (4) above evaporate to non-existence. But of course, now we have a remaining challenge in getting those queries to run efficiently.

I’ll tackle that in a different way – I will partition the SALES table very “aggressively”, hence my blog post title “hyper-partitioning”. Considering the typical query requirements I listed above, I will partition my SALES down to as small as a partition for every hour.


SQL> create table sales
  2    ( ts   timestamp,
  3      id   int,
  4      amt  number,
  5      product int,
  6      customer int,
  7      item_cnt int,
  8      terminal int,
  9      operator int,
 10      credit_card int,
         ...
         ...
 17    )
 18  partition by range ( ts )
 19  interval ( numtodsinterval(1,'HOUR') )
 20  (
 21    partition p1 values less than ( timestamp '2018-07-01 00:00:00' )
 22  );

Table created.

SQL>
SQL> insert /*+ APPEND */ into sales (ts,id,amt,product,customer)
  2  select date '2018-07-01' + rownum / 240, rownum, dbms_random.value(1,100),
  3        dbms_random.value(1,100),dbms_random.value(1,100)
  4  from dual
  5  connect by level 
SQL> set serverout on
SQL> declare
  2    h varchar2(1000);
  3  begin
  4  for i in (
  5    select partition_name, high_value
  6    from user_tab_partitions
  7    where table_name = 'SALES'
  8    and   interval = 'YES'
  9    and   partition_name like 'SYS_P%'
 10    order by partition_position
 11  ) loop
 12    h := i.high_value;
 13    execute immediate 'select to_char('||h||'-0.01,''yyyymmdd_hh24'') from dual' into h;
 14    execute immediate 'alter table sales rename partition '||i.partition_name||' to p'||h;
 15  end loop;
 16  end;
 17  /

PL/SQL procedure successfully completed.

SQL>
SQL> col high_value format a60
SQL> select partition_name, high_value
  2  from user_tab_partitions
  3  where table_name = 'SALES'
  4  order by partition_position;

PARTITION_NAME                 HIGH_VALUE
------------------------------ ------------------------------------------------------------
P1                             TIMESTAMP' 2018-07-01 00:00:00'
P20180701_00                   TIMESTAMP' 2018-07-01 01:00:00'
P20180701_01                   TIMESTAMP' 2018-07-01 02:00:00'
P20180701_02                   TIMESTAMP' 2018-07-01 03:00:00'
P20180701_03                   TIMESTAMP' 2018-07-01 04:00:00'
P20180701_04                   TIMESTAMP' 2018-07-01 05:00:00'
P20180701_05                   TIMESTAMP' 2018-07-01 06:00:00'
P20180701_06                   TIMESTAMP' 2018-07-01 07:00:00'
P20180701_07                   TIMESTAMP' 2018-07-01 08:00:00'
P20180701_08                   TIMESTAMP' 2018-07-01 09:00:00'
P20180701_09                   TIMESTAMP' 2018-07-01 10:00:00'
P20180701_10                   TIMESTAMP' 2018-07-01 11:00:00'
P20180701_11                   TIMESTAMP' 2018-07-01 12:00:00'
P20180701_12                   TIMESTAMP' 2018-07-01 13:00:00'
P20180701_13                   TIMESTAMP' 2018-07-01 14:00:00'
P20180701_14                   TIMESTAMP' 2018-07-01 15:00:00'
P20180701_15                   TIMESTAMP' 2018-07-01 16:00:00'
P20180701_16                   TIMESTAMP' 2018-07-01 17:00:00'
P20180701_17                   TIMESTAMP' 2018-07-01 18:00:00'
P20180701_18                   TIMESTAMP' 2018-07-01 19:00:00'
P20180701_19                   TIMESTAMP' 2018-07-01 20:00:00'
P20180701_20                   TIMESTAMP' 2018-07-01 21:00:00'
P20180701_21                   TIMESTAMP' 2018-07-01 22:00:00'
P20180701_22                   TIMESTAMP' 2018-07-01 23:00:00'
P20180701_23                   TIMESTAMP' 2018-07-02 00:00:00'
P20180702_00                   TIMESTAMP' 2018-07-02 01:00:00'
P20180702_01                   TIMESTAMP' 2018-07-02 02:00:00'
P20180702_02                   TIMESTAMP' 2018-07-02 03:00:00'
P20180702_03                   TIMESTAMP' 2018-07-02 04:00:00'
P20180702_04                   TIMESTAMP' 2018-07-02 05:00:00'
P20180702_05                   TIMESTAMP' 2018-07-02 06:00:00'
P20180702_06                   TIMESTAMP' 2018-07-02 07:00:00'
P20180702_07                   TIMESTAMP' 2018-07-02 08:00:00'
P20180702_08                   TIMESTAMP' 2018-07-02 09:00:00'
P20180702_09                   TIMESTAMP' 2018-07-02 10:00:00'
P20180702_10                   TIMESTAMP' 2018-07-02 11:00:00'
P20180702_11                   TIMESTAMP' 2018-07-02 12:00:00'
P20180702_12                   TIMESTAMP' 2018-07-02 13:00:00'
P20180702_13                   TIMESTAMP' 2018-07-02 14:00:00'
P20180702_14                   TIMESTAMP' 2018-07-02 15:00:00'
P20180702_15                   TIMESTAMP' 2018-07-02 16:00:00'

41 rows selected.

I’ve run a small anonymous block to rename the (system-named) interval partitions into some sensible names to reflect the date range the partition covers. Let’s now look at the typical queries we will now be performing on the SALES table:


SQL> set autotrace traceonly explain
SQL> select * from sales
  2  where ts > timestamp '2018-07-02 15:00:00';

--------------------------------------------------------------------------------------------------
| Id  | Operation                | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |       |     9 |   387 |     7   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE ITERATOR|       |     9 |   387 |     7   (0)| 00:00:01 |    41 |1048575|
|*  2 |   TABLE ACCESS FULL      | SALES |     9 |   387 |     7   (0)| 00:00:01 |    41 |1048575|
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("TS">TIMESTAMP' 2018-07-02 15:00:00.000000000')

SQL> select * from sales
  2  where ts > timestamp '2018-07-02 15:00:00'
  3  and product = 12;

--------------------------------------------------------------------------------------------------
| Id  | Operation                | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |       |     1 |    43 |     7   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE ITERATOR|       |     1 |    43 |     7   (0)| 00:00:01 |    41 |1048575|
|*  2 |   TABLE ACCESS FULL      | SALES |     1 |    43 |     7   (0)| 00:00:01 |    41 |1048575|
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("PRODUCT"=12 AND "TS">TIMESTAMP' 2018-07-02 15:00:00.000000000')

SQL>
SQL> select * from sales
  2  where ts > timestamp '2018-07-02 15:00:00'
  3  and customer = 25;

--------------------------------------------------------------------------------------------------
| Id  | Operation                | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |       |     1 |    43 |     7   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE ITERATOR|       |     1 |    43 |     7   (0)| 00:00:01 |    41 |1048575|
|*  2 |   TABLE ACCESS FULL      | SALES |     1 |    43 |     7   (0)| 00:00:01 |    41 |1048575|
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("CUSTOMER"=25 AND "TS">TIMESTAMP' 2018-07-02 15:00:00.000000000')

SQL>
SQL> select max(amt) from sales
  2  where ts > timestamp '2018-07-02 15:00:00';

---------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |       |     1 |    33 |     7   (0)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE           |       |     1 |    33 |            |          |       |       |
|   2 |   PARTITION RANGE ITERATOR|       |     9 |   297 |     7   (0)| 00:00:01 |    41 |1048575|
|*  3 |    TABLE ACCESS FULL      | SALES |     9 |   297 |     7   (0)| 00:00:01 |    41 |1048575|
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("TS">TIMESTAMP' 2018-07-02 15:00:00.000000000')

SQL> set autotrace off

All of them scan a tiny portion of the data, namely, just the hours of sales data relevant to the query, and the query response times will be relatively consistent for all cases no matter which customer, product or other predicate will be passed because the data to be scanned is a fixed number of hours.

But there’s a problem here. If I am partitioning to the hour, or even to the minute…then it won’t be long before I have a lot of partitions. In the latter case (minutes) I will be up to over 500,000 partitions in the first year of SALES alone! That is a lot of database metadata to store. There is the partitions themselves, plus optimizer statistics on them, plus historical optimizer statistics, plus potentially histograms on every column.  A lot of optimizer data might lead to expensive parse times because there is just so much information to wade through when optimizing queries.

But we only need the extreme granularity of partitions for the SALES table for today. Once today ticks over and becomes “yesterday”, then we might only need a partition for the entire day.  And once “yesterday” ticks over to “last week”, then maybe only weekly partitions are needed and so forth.

One of the cool things in 18c is ability to do this style of maintenance with negligible disruption to service. Because I have named my partitions in a logical fashion, here is a simple routine to merge “yesterdays” hourly partitions into a single one for the day.


SQL> set serverout on
SQL> declare
  2    d date := date '2018-07-01';
  3    ddl varchar2(4000);
  4  begin
  5    select listagg(partition_name||chr(10),',') within group ( order by partition_position )
  6    into   ddl
  7    from   user_tab_partitions
  8    where  table_name = 'SALES'
  9    and    partition_name like 'P'||to_char(d,'yyyymmdd')||'%';
 10
 11    ddl := 'alter table sales merge partitions '||ddl||' into partition p'||to_char(d,'yyyymmdd')||' online';
 12
 13    dbms_output.put_line(ddl);
 14    execute immediate ddl;
 15  end;
 16  /
alter table sales merge partitions
 P20180701_00
,P20180701_01
,P20180701_02
,P20180701_03
,P20180701_04
,P20180701_05
,P20180701_06
,P20180701_07
,P20180701_08
,P20180701_09
,P20180701_10
,P20180701_11
,P20180701_12
,P20180701_13
,P20180701_14
,P20180701_15
,P20180701_16
,P20180701_17
,P20180701_18
,P20180701_19
,P20180701_20
,P20180701_21
,P20180701_22
,P20180701_23
 into partition p20180701 online

PL/SQL procedure successfully completed.

Now I have a single partition for yesterday’s data, and hourly partitions for today’s data.


SQL> col high_value format a60
SQL> select partition_name, high_value
  2  from user_tab_partitions
  3  where table_name = 'SALES'
  4  order by partition_position;

PARTITION_NAME                 HIGH_VALUE
------------------------------ -------------------------------
P1                             TIMESTAMP' 2018-07-01 00:00:00'
P20180701                      TIMESTAMP' 2018-07-02 00:00:00'
P20180702_00                   TIMESTAMP' 2018-07-02 01:00:00'
P20180702_01                   TIMESTAMP' 2018-07-02 02:00:00'
P20180702_02                   TIMESTAMP' 2018-07-02 03:00:00'
P20180702_03                   TIMESTAMP' 2018-07-02 04:00:00'
P20180702_04                   TIMESTAMP' 2018-07-02 05:00:00'
P20180702_05                   TIMESTAMP' 2018-07-02 06:00:00'
P20180702_06                   TIMESTAMP' 2018-07-02 07:00:00'
P20180702_07                   TIMESTAMP' 2018-07-02 08:00:00'
P20180702_08                   TIMESTAMP' 2018-07-02 09:00:00'
P20180702_09                   TIMESTAMP' 2018-07-02 10:00:00'
P20180702_10                   TIMESTAMP' 2018-07-02 11:00:00'
P20180702_11                   TIMESTAMP' 2018-07-02 12:00:00'
P20180702_12                   TIMESTAMP' 2018-07-02 13:00:00'
P20180702_13                   TIMESTAMP' 2018-07-02 14:00:00'
P20180702_14                   TIMESTAMP' 2018-07-02 15:00:00'
P20180702_15                   TIMESTAMP' 2018-07-02 16:00:00'

18 rows selected.

I stress – this is not my recommendation to race out and partition every transactional table you have, and drop all of the indexes Smile. But it with so many online partitioning operations in 18c, it raises some exciting new opportunities there were not available in previous releases. So start thinking about how you can exploit this to get advantages with the partitioning option that might sit “outside the box” of the standard usage.

18.3 As easy as 1…2…3

Well, finally it’s here! 18c for on-premise installation so the world can all get stuck into the cool new features of the latest release on their own laptops Smile  At least that is what I’ll be doing!

Naturally as soon as I heard the news, I downloaded the software and got ready to set aside the day for installation and creation of an 18c database. But I didn’t need that long – I didn’t need that long at all. Just a few clicks and a few commands and there it was – my 18c database up and running.

Check out how easy it is with my three videos.

Software Installation

Listener Creation

Database Creation

It really is as easy as 1…2…3

Enjoy 18c !

Searching in Oracle Database documentation

Just a quick heads up with something I see from time to time in Chrome (but not in Firefox or any other browser).

Occasionally when doing a search, the results are not limited as per my criteria.  For example, if I am searching for information about Spatial in the Licensing Guide:

image

then when I click the Search button, the results might come back with a far broader search range:

image

If you experience this, there is an easy workaround – simply re-run the search. It only seems to occur (for me at least) on the first execution of the search.  Clicking the Search button again on the page, yielded the correct result second time around:

image

Let me know if you’ve seen similar behaviour – if it is not just me, I’ll pass it along to the Documentation team.

Happy searching!

Complex materialized views and fast refresh

Just a quick discovery that came across the AskTOM “desk” recently. We have an outstanding bug in some instances of fast refresh materialized views when the definition of the materialized view references a standard view.

Here’s a simple demo of the issue – I’ll use a simplified version of the EMP and DEPT tables, linked by a foreign key in the usual way:


SQL> create table dept(dept_id number(10) primary key, dname varchar2(20));
 
Table created.
 
SQL> create table emp(empid number(20) primary key, ename varchar2(20), sal number(10,2), dept_id number(10) references dept(dept_id));
 
Table created.
 
SQL> insert into dept values(10,'IT');
 
1 row created.
 
SQL> insert into dept values(20,'HR');
 
1 row created.
 
SQL> insert into dept values(30,'MAT');
 
1 row created.
 
SQL> insert into emp values(1,'MIKE',20000,10);
 
1 row created.
 
SQL> insert into emp values(2,'JOHN',30000,20);
 
1 row created.
 
SQL> insert into emp values(3,'SUE',20000,20);
 
1 row created.
 
SQL> insert into emp values(4,'TOM',40000,30);
 
1 row created.
 
SQL>
SQL> commit;
 
Commit complete.

One of the cool things with materialized views is that even with complicated SQL definitions (such as joins), the materialized view can still be fast refreshable as long as the materialized view logs and database constraints are correctly defined.


SQL> create materialized view log on dept
  2  with rowid, primary key, sequence
  3  including new values;
 
Materialized view log created.
 
SQL>
SQL> create materialized view log on emp
  2  with rowid, primary key, sequence
  3  including new values;
 
Materialized view log created.
 
SQL> create materialized view mv
  2  --build deferred
  3  refresh fast on demand
  4  with primary key
  5  enable query rewrite
  6  as
  7  select a.rowid erowid,b.rowid drowid ,b.dname, a.*
  8  from emp a, dept b
  9  where a.dept_id=b.dept_id;
 
Materialized view created.
 

Note: See the documentation for DBMS_MVIEW.EXPLAIN_MVIEW for how to check on the refresh characteristics of a materialized view (or potential materialized view).

Now I’ll repeat the same experiment, but I’ll wrap that SQL that joins EMP and DEPT within a standard view called VW. Since a view is just stored SQL text, and the previous usage of the same SQL worked fine, we’d expect no difference in functionality.  However, the results do not meet the expectation.


SQL> drop materialized view mv ;
 
Materialized view dropped.
 
SQL>
SQL> create or replace view vw
  2  as
  3  select a.rowid erowid,b.rowid drowid ,b.dname, a.*
  4  from emp a, dept b
  5  where a.dept_id=b.dept_id;
 
View created.
 
SQL>
SQL> create materialized view mv
  2  --build deferred
  3  refresh fast on demand
  4  with primary key
  5  enable query rewrite
  6  as
  7  select * from vw;
select * from vw
              *
ERROR at line 7:
ORA-12015: cannot create a fast refresh materialized view from a complex query

This is not a flaw in our DDL – it is a bug in the database that will be fixed in due course. So if you have standard views being used within your materialized view definitions, and you are getting unexpected restrictions on whether the materialized views can be fast refreshed, try a simple workaround of in-lining the view text directly.  You might have hit this bug.

Standard Edition–different optimizer but still cool

One cool technique that the optimizer can employ is the BITMAP CONVERSION TO ROWIDS method to take advantage of B-tree indexes in a means that we would normally associate with a bitmap index. This can be particularly useful with multiple predicates on individually indexed columns because it lets us establish the rows of interest before having to visit the heap blocks.  Here’s an example of that in action, even when the indexes in question are Text indexes.

Enterprise Edition plan


SQL> create table MY_TAB ( col1 varchar2(50), col2 varchar2(50));

Table created.

SQL>
SQL> insert into MY_TAB
  2  select dbms_random.string('x', 30), dbms_random.string('x', 30)
  3  from dual
  4  connect by level <= 100000;

100000 rows created.

SQL>
SQL> commit;

Commit complete.

SQL>
SQL> create index MY_TAB_IX1 on MY_TAB(col1) indextype is ctxsys.context;

Index created.

SQL> create index MY_TAB_IX2 on MY_TAB(col2) indextype is ctxsys.context;

Index created.

SQL>
SQL> exec  dbms_stats.gather_table_stats('','MY_TAB');

PL/SQL procedure successfully completed.

SQL> set autotrace traceonly explain
SQL> select * from MY_TAB where contains(col1, 'string1') > 0 or contains(col2, 'string2') > 0;

Execution Plan
----------------------------------------------------------
Plan hash value: 4174159475

---------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |             |     1 |    62 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| MY_TAB      |     1 |    62 |     2   (0)| 00:00:01 |
|   2 |   BITMAP CONVERSION TO ROWIDS       |             |       |       |            |          |
|   3 |    BITMAP OR                        |             |       |       |            |          |
|   4 |     BITMAP CONVERSION FROM ROWIDS   |             |       |       |            |          |
|   5 |      SORT ORDER BY                  |             |       |       |            |          |
|*  6 |       DOMAIN INDEX                  | MY_TAB_IX1  |       |       |     1   (0)| 00:00:01 |
|   7 |     BITMAP CONVERSION FROM ROWIDS   |             |       |       |            |          |
|   8 |      SORT ORDER BY                  |             |       |       |            |          |
|*  9 |       DOMAIN INDEX                  | MY_TAB_IX2  |       |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   6 - access("CTXSYS"."CONTAINS"("COL1",'string1')>0)
   9 - access("CTXSYS"."CONTAINS"("COL2",'string2')>0)

But a quick trip to the licensing guide states the following about the optimizer in Standard Edition:

The following methods are not available in SE:

Bitmapped index, bitmapped join index, and bitmap plan conversions

which make sense given that you do not have access to bitmap indexes in Standard Edition. 

But all is not lost.  Even though the bitmap conversion optimization is not available, the optimizer can still come up with alternate and intelligent mechanisms to extract the required rows. Whereas you might think that an OR condition on two different columns might force a full table scan, here is the same example as above in Standard Edition.

Standard Edition plan


SQL> set autotrace traceonly explain
SQL> select * from MY_TAB where contains(col1, 'string1') > 0 or contains(col2, 'string2') > 0;

Execution Plan
----------------------------------------------------------
Plan hash value: 1568130183

-------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                 |     2 |   108 |     2   (0)| 00:00:01 |
|   1 |  VIEW                         | VW_ORE_A5827389 |     2 |   108 |     2   (0)| 00:00:01 |
|   2 |   UNION-ALL                   |                 |       |       |            |          |
|   3 |    TABLE ACCESS BY INDEX ROWID| MY_TAB          |     1 |    62 |     1   (0)| 00:00:01 |
|*  4 |     DOMAIN INDEX              | MY_TAB_IX1      |       |       |     1   (0)| 00:00:01 |
|*  5 |    TABLE ACCESS BY INDEX ROWID| MY_TAB          |     1 |    62 |     1   (0)| 00:00:01 |
|*  6 |     DOMAIN INDEX              | MY_TAB_IX2      |       |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("CTXSYS"."CONTAINS"("COL1",'string1')>0)
   5 - filter(LNNVL("CTXSYS"."CONTAINS"("COL1",'string1')>0))
   6 - access("CTXSYS"."CONTAINS"("COL2",'string2')>0)

SQL> set autotrace off

A “UNION ALL” suggests the two separate table access paths will potentially return duplicate rows. But notice the nice touch on line 5 – when probing the rows via MY_TAB_IDX2, the optimizer threw in an additional FILTER (LNNVL(“CTXSYS”.”CONTAINS”(“COL1”,’string1′)>0)) to remove those rows that will be returned by the the “partnering” half of the UNION ALL.  In this way, we avoided a potentially expensive sort to remove duplicated results.

So you might see the occasional difference between optimizer plans for Enterprise edition versus Standard edition – but both should do a fine job at executing your queries Smile

(Both tests done on 12.2)

UTL_FILE_DIR and 18c

I wrote a blog post called The Death of UTL_FILE which attracted a comment from a reader:

“There is NO chance to stay at UTL_FILE as it is DESUPPORTED starting with database Version 18c”

This is not the case, but since I wanted to clarify what has changed in 18c, it warrants this small but separate blog post. When UTL_FILE first into existence in Oracle 7, the concept of directory object did not apply to UTL_FILE. Clearly we could not just let UTL_FILE to write to any destination, otherwise a malicious person could write a little PL/SQL block like this:


declare
  f utl_file.file_type;
begin
  for i in ( select 
                regexp_substr(name,'(.*)\\(.*)', 1, 1, 'i', 1) path,
                regexp_substr(name,'(.*)\\(.*)', 1, 1, 'i', 2) name
             from v$datafile 
             order by file# desc ) 
  loop
    f := utl_file.fopen(i.path,i.name,'W');
  end loop;
  end;
  utl_file.fclose_all;
end;
/

and voila! No more database. So an initialization parameter was created to nominate which directories UTL_FILE was allowed to access.  This parameter was called UTL_FILE_DIR, and the above code hopefully is justification enough to show that you should never ever set UTL_FILE_DIR to “*”, meaning it could write to wherever the OS permissions on the Oracle software account would allow it.  Bye Bye datafiles…bye bye database Smile

In more recent versions, UTL_FILE was improved so that the directory parameter could be supplied as a directory object. This is a much tighter implementation because read and write privileges on directory objects can be controlled from within the database.

In 18c, it is not UTL_FILE that has been de-supported, it is the ability to use the older style convention of hard-coded path names for the directory that is no longer allowed. You have to use directory objects. If you set the traditional UTL_FILE_DIR path in the spfile it will be ignored, and you’ll get a warning on startup.


SQL> alter system set utl_file_dir = 'c:\temp' scope=spfile;

System altered.


SQL> startup force
ORA-32004: obsolete or deprecated parameter(s) specified for RDBMS instance
ORACLE instance started.


--
-- alert log
--
Obsolete system parameters with specified values:
  utl_file_dir
End of obsolete system parameter listing

But even if you set UTL_FILE_DIR in the spfile, you will not be allowed to use OS directory paths in the UTL_FILE dir calls.


SQL> declare
  2    f utl_file.file_type;
  3  begin
  4    f := utl_file.fopen('c:\temp','demo.dat','W');
  5    utl_file.fclose_all;
  6  end;
  7  /
declare
*
ERROR at line 1:
ORA-29280: invalid directory object
ORA-06512: at "SYS.UTL_FILE", line 41
ORA-06512: at "SYS.UTL_FILE", line 478
ORA-06512: at line 4

If you prefer a video version of this explanation, I had some fun riding my bike when I talked about this when 18c first came out earlier this year.

More triggers are better

Yes, you heard me correctly. If you have got one trigger on a table, then you might be surprised to find that perhaps having a second one will be a better option. Then again, I also love the sweet scent of a clickbaity, inflammatory blog post title to draw the readers in Smile so you’ll just have to read on to see which is true.

As much as I am not a fan of using triggers to populate metadata in tables, I also concede that it is a popular approach taken by many of us as we code up our applications. Sure, we’d like to have our application code store that all important “who did this change” information with each database row, but more often than not, this get shoe-horned into the codebase via triggers as an afterthought to the development process.

But our well-read developer will often be thinking: “I’m a savvy developer – I’ve been reading blogs, and websites that tell me that less triggers is better, so I’ll use as few as I can” which leads to a scenario like the one I’ll present now.

I’ll start with a standard parent and child table setup, connected via an obvious foreign key


SQL> create table par (
  2    id           number(10)    not null primary key,
  3    description  varchar2(20)  not null,
  4    created_by   varchar2(20)  not null,
  5    updated_by   varchar2(20)  not null
  6    );

Table created.

SQL>
SQL> create sequence par_seq;

Sequence created.

SQL>
SQL> create sequence chd_seq;

Sequence created.

SQL>
SQL> create table chd (
  2    id           number(10)    not null primary key,
  3    par_id       number(10)    not null,
  4    constraint  fk foreign key ( par_id ) references par ( id )
  5    );

Table created.

SQL>
SQL>
SQL> create index chd_fk_idx on chd (par_id);

Index created.

I’ve got a sequence for each table, so in order to “bind” that sequence to the primary key for each table I’ll create a trigger.  So I’ll add some trigger code to implement some other common application requirements at the same time:

  • Populate the primary key with a sequence value,
  • Populate the CREATED_BY, UPDATED_BY columns on insert of a new row
  • Amend the UPDATED_BY column when I later update that row

Here is a trigger to implement that.


SQL> create or replace trigger trg_par
  2  before insert or update on par
  3  for each row
  4  begin
  5      if inserting then
  6         :new.id := par_seq.nextval;
  7         :new.created_by := user;
  8         :new.updated_by := user;
  9      end if;
 10
 11      if updating then
 12         :new.updated_by := user;
 13      end if;
 14  end;
 15  /

Trigger created.

Now my application is ready go.  I’ll insert some data to verify that my trigger has not broken any functionality.


SQL> insert into par (description) values ('test');

1 row created.

SQL> insert into par (description) values ('test2');

1 row created.

SQL> select * from par;

  ID DESCRIPTION          CREATED_BY           UPDATED_BY
---- -------------------- -------------------- --------------------
   1 test                 MCDONAC              MCDONAC
   2 test2                MCDONAC              MCDONAC

2 rows selected.

SQL>
SQL> insert into chd (id,par_id) values (chd_seq.nextval,1);

1 row created.

SQL> insert into chd (id,par_id) values (chd_seq.nextval,2);

1 row created.

SQL> select * from chd;

  ID     PAR_ID
---- ----------
   1          1
   2          2

2 rows selected.

SQL>
SQL> commit;

Commit complete.

So far so good. Let us see now what happens when I do a simple update on the parent table on the DESCRIPTION column.  Note that this column is not involved with any primary key, or index or foreign key relationship – it’s just a simple string column.


SQL> update par set description = 'anything' where id = 1;

1 row updated.

SQL> select locked_mode, object_name
  2  from v$locked_object l, all_objects ob
  3  where ob.object_id =l.object_id;

LOCKED_MODE OBJECT_NAME
----------- ----------------------------------------
          3 PAR
          3 CHD

2 rows selected.

Notice the locks that have been taken. Understandably, I had to take a lock on the PAR table because I’ve just updated a row on the table, so I need to make sure that no-one does anything like drop it, or modify the structure whilst I’ve got an outstanding transaction. But here is the interesting part – we also took a lock on the child table CHD. That seems superfluous because as I said – we are not performing an update of anything that could possibly impact the child table in any way. 

Why did the database choose to add this extra lock?

The cause is due to the way I coded the trigger.  Notice that the trigger has a reference to the primary key in the trigger body:


  5      if inserting then
  6         :new.id := par_seq.nextval;

Even though that part of the trigger body is not executed (because I am performing an update not an insert) the mere presence of it is enough for the database to cautiously add that extra lock on the CHD table because we “just might” be messing around with the parent table primary key here.

So now I’ll roll back that update and convert the single triggers to two triggers – one for insert and one dedicated for update.


SQL> rollback;

Rollback complete.

SQL>
SQL> drop trigger trg_par ;

Trigger dropped.

SQL>
SQL> create or replace trigger trg1_par
  2  before insert on par
  3  for each row
  4  begin
  5      :new.id := par_seq.nextval;
  6      :new.created_by := user;
  7      :new.updated_by := user;
  8  end;
  9  /

Trigger created.

SQL>
SQL> create or replace trigger trg2_par
  2  before update on par
  3  for each row
  4  begin
  5      if updating then
  6          :new.updated_by := user;
  7      end if;
  8  end;
  9  /

Trigger created.

Now I’ll run the same update


SQL> update par set description = 'anything' where id = 1;

1 row updated.

SQL> select locked_mode, object_name
  2  from v$locked_object l, all_objects ob
  3  where ob.object_id =l.object_id;

LOCKED_MODE OBJECT_NAME
----------- ----------------------------------------
          3 PAR

1 row selected.

Because the reference to the primary key columns for table PAR are no longer in the firing (update) trigger, I no longer longer see the lock on the CHD table.

In practice, it is unlikely that this additional lock is going to cause you a lot of harm – it will block some operations on the child table such as a truncate or DDL to make a structural change, so it is reasonably to assume that these are rare occurrences. But I’m a fan of the philosophy of: Have as few locks as possible, but always as many as required. So in this case, you might to want to consider opting for two triggers instead of one.

And of course, perhaps getting to zero triggers might be the best option Smile

DDL for constraints – subtle things

The DBMS_METADATA package is very cool. I remember the days of either hand-crafting DDL statements based on queries to the data dictionary, or many a DBA will be familiar with running “imp show=y” or “imp indexfile=…” in order to then laboriously extract the DDL required from the import log file.  DBMS_METADATA removed all of those annoyances to give us a simple API to get the true and complete DDL for a database object.

But when extracting DDL from the database using the DBMS_METADATA package, you need to be aware of some subtleties especially if you plan on executing that DDL in the database.

Consider this example – I have a few tables and I want to extract the referential integrity constraints for one of them. Being a good cautious developer Smile I’ll just output the DDL first before attempt to do any execution of the statements:


SQL> create table tab1(id number, name varchar2(100),
  2                      constraint pk_tab1_id primary key(id));

Table created.

SQL> create table tab2(id number, name varchar2(100),
  2                      constraint pk_tab2_id primary key(id));

Table created.

SQL> create table tab3(id number, name varchar2(100), int_id number,
  2                      constraint pk_tab3_id primary key(id),
  3                      constraint fk_tab1_id foreign key(int_id) references tab1(id),
  4                      constraint fk_tab2_id foreign key(int_id) references tab2(id));

Table created.

SQL>
SQL> set serverout on
SQL> begin
  2    for i in (
  3      select t.table_name,
  4          dbms_metadata.get_dependent_ddl('REF_CONSTRAINT', table_name) ddl
  5      from   user_tables t
  6      where  table_name = 'TAB3'
  7      )
  8    loop
  9            dbms_output.put_line(i.ddl);
 10            --execute immediate i.ddl;
 11    end loop;
 12  end;
 13  /

  ALTER TABLE "MCDONAC"."TAB3" ADD CONSTRAINT "FK_TAB1_ID" FOREIGN KEY ("INT_ID") REFERENCES "MCDONAC"."TAB1" ("ID") ENABLE

ALTER TABLE "MCDONAC"."TAB3" ADD CONSTRAINT "FK_TAB2_ID" FOREIGN KEY ("INT_ID") REFERENCES "MCDONAC"."TAB2" ("ID") ENABLE

PL/SQL procedure successfully completed.

That all looks fine – I have my two ALTER statements ready to go.  So now I’ll comment back in the ‘execute immediate’ command and all should be fine.


SQL> set serverout on
SQL> begin
  2    for i in (
  3      select t.table_name,
  4             dbms_metadata.get_dependent_ddl('REF_CONSTRAINT', table_name) ddl
  5      from   user_tables t
  6      where  table_name = 'TAB3'
  7      )
  8    loop
  9            dbms_output.put_line(i.ddl);
 10            execute immediate i.ddl;
 11    end loop;
 12  end;
 13  /

  ALTER TABLE "MCDONAC"."TAB3" ADD CONSTRAINT "FK_TAB1_ID" FOREIGN KEY ("INT_ID") REFERENCES "MCDONAC"."TAB1" ("ID") ENABLE

ALTER TABLE "MCDONAC"."TAB3" ADD CONSTRAINT "FK_TAB2_ID" FOREIGN KEY ("INT_ID") REFERENCES "MCDONAC"."TAB2" ("ID") ENABLE
begin
*
ERROR at line 1:
ORA-01735: invalid ALTER TABLE option
ORA-06512: at line 10
ORA-06512: at line 10

The seems an odd result. Since in this simple example I’m just running the commands straight back into the same database, I might have expected a “Constraint already exists”, or “Object name exists” style of error, but this is different. This error is telling that the statement is invalid – which obviously should not be the case if it came straight out of DBMS_METADATA. But a simple amendment to my anonymous block will reveal the answer. I will output a line of dashes each time I cycle through the cursor loop



SQL> set serverout on
SQL> begin
  2    for i in (
  3      select t.table_name,
  4             dbms_metadata.get_dependent_ddl('REF_CONSTRAINT', table_name) ddl
  5      from   user_tables t
  6      where  table_name = 'TAB3'
  7      )
  8    loop
  9            dbms_output.put_line('========================================');
 10            dbms_output.put_line(i.ddl);
 11            --execute immediate i.ddl;
 12    end loop;
 13  end;
 14  /
========================================

  ALTER TABLE "MCDONAC"."TAB3" ADD CONSTRAINT "FK_TAB1_ID" FOREIGN KEY ("INT_ID") REFERENCES "MCDONAC"."TAB1" ("ID") ENABLE

ALTER TABLE "MCDONAC"."TAB3" ADD CONSTRAINT "FK_TAB2_ID" FOREIGN KEY ("INT_ID") REFERENCES "MCDONAC"."TAB2" ("ID") ENABLE

PL/SQL procedure successfully completed.

And therein lies the issue. I got two ALTER commands back on screen, but in reality they both came back from a single row fetched from the cursor. The ALTER commands were separated by a carriage return, but if I try to execute that single row, then the statement is invalid because it is an attempt to run two ALTER commands in a single statement. The output looks like two rows from the cursor but it was not.

That is a problem if I want to store that DDL in a table or a file, because I can’t use it as it currently stands, and I don’t want to have to write some scripts to parse that DDL to add semi-colons or split it into multiple commands, because one of the motivations for DBMS_METADATA in the first place was to avoid all that irritation.

There is an easy fix to this. Rather than getting the dependent DDL for a table, we can get the “direct” DDL for the constraints themselves.  In this way, you’ll get a row from the cursor for each constraint, and hence one DDL statement per constraint as well.



SQL> set serverout on
SQL> begin
  2    for i in (
  3         select t.table_name,
  4                dbms_metadata.get_ddl('REF_CONSTRAINT', constraint_name) ddl
  5         from user_constraints t
  6         where table_name = 'TAB3'
  7         and constraint_type = 'R'
  8      )
  9    loop
 10            dbms_output.put_line('========================================');
 11            dbms_output.put_line(i.ddl);
 12            --execute immediate i.ddl;
 13    end loop;
 14  end;
 15  /
========================================

  ALTER TABLE "MCDONAC"."TAB3" ADD CONSTRAINT "FK_TAB1_ID" FOREIGN KEY ("INT_ID") REFERENCES "MCDONAC"."TAB1" ("ID") ENABLE
========================================

  ALTER TABLE "MCDONAC"."TAB3" ADD CONSTRAINT "FK_TAB2_ID" FOREIGN KEY ("INT_ID") REFERENCES "MCDONAC"."TAB2" ("ID") ENABLE

PL/SQL procedure successfully completed.

The death of UTL_FILE

In a previous post I covered a technique to improve the performance of UTL_FILE, but concluded the post with a teaser: “you probably don’t need to use UTL_FILE ever again”.

image

Time for me to back that statement up with some concrete evidence.

UTL_FILE can read and write files. This blog post will cover the writing functionality of UTL_FILE and why I think you probably don’t need UTL_FILE for this. I’ll come back to UTL_FILE to read files in a future post.

There are two possibilities when it comes to writing a file:

  • The file is being requested by a client program and the results should be stored on a client machine. In that case, you do not use UTL_FILE anyway – you use the tools available on the client, for example, SQL Developer, SQLcl, SQL*Plus or a myriad of other tools, OR
  • the file is to be written by the database server to a location on the database server itself, or to a location that is accessible to the database server. This is where UTL_FILE has been used, the main motivations being that it can be part of existing database-resident PL/SQL code, and does not require clients to have direct SQL or OS level access to the database server.

So here is my assertion – most of the time, UTL_FILE is used to write out a file because we can’t make use an easier tool like those mentioned above directly on the database server.  After all, who wants to write code like:


SQL> declare
  2    f utl_file.file_type;
  3    line varchar2(255);
  4    s timestamp;
  5  begin
  6    f := utl_file.fopen('TMP','demo.dat','w');
  7    s := systimestamp;
  8    for i in ( select * from t )
  9    loop
 10      line := i.object_id||','||i.object_name||','||i.object_type||','||i.data_object_id;
 11      utl_file.put_line(f,line);
 12    end loop;
 13    dbms_output.put_line(systimestamp-s);
 14    utl_file.fclose_all;
 15  end;
 16  /

PL/SQL procedure successfully completed.

when most of the time if I was doing this in a tool it would be simply:


SQL> set markup csv on
SQL> spool c:\tmp\demo.dat
SQL> select * from t;
SQL> spool off

But in recent versions of the database, you can do exactly this! The database scheduler has been enhanced to be able to to run SQL*Plus style scripts directly out of the database without needing to give OS access to database server, or the SQL*Plus executable. In the previous post I unloaded 40million rows to a file in CSV format and the responsibility for formatting the data into CSV format fell to me – I had to do all of the “heavy lifting”. Using the scheduler and the SQL_SCRIPT job type, it is as simple as writing a SQL*Plus script, and submitting it as a job.


SQL> declare
  2    l_script   VARCHAR2(32767) :=
  3  'conn /@db
  4  set markup csv on
  5  set arraysize 500
  6  set pages 0
  7  set lines 200
  8  set trimspool on
  9  spool c:\tmp\demo_sched.dat
 10  select * from t;
 11  spool off';
 12  begin
 13    dbms_scheduler.create_job(
 14      job_name        => 'UNLOAD_DATA',
 15      job_type        => 'SQL_SCRIPT',
 16      job_action      => l_script,
 17      credential_name => 'MY_ACCOUNT',
 18      enabled         => true
 19    );
 20  end;
 21  /

PL/SQL procedure successfully completed.

You will want to ensure that you have some controls over the usage of this feature, and what credentials the scripts will run under.  Also in my example, I’ve got a connection wallet setup so that I do not have to code any passwords into the connection string for my scheduler job. But suddenly it has become easy to get access to the scripting tools we are used to on our own client machines, and utilize them on the database server.

“Slow down there cowboy…” I hear you exclaim. “…That might be fine for simple SELECT * scripts, but my UTL_FILE procedures have a lot of complex logic to construct the file data”.

Even if you are using UTL_FILE because you are performing some complicated algorithms to generate the data that will be written to file, you can still utilize this scheduler facility.  After your complicated logic is completed, ultimately you typically will have a line of data you need to write to a file.  And if you have a line of data, then it is trivial to port that procedure to become a pipelined function.  And once you have a pipelined function, then we have a simple query mechanism that can be used to spool the output.  For example, if my original procedure is:


SQL> create or replace
  2  procedure my_procedure is
  3  begin
  4    for each_row in ( [some set] )
  5    loop
  6    ...
  7    ... my complex logic
  8    ...
  9
 10    ...
 11    ... writing each line to file with UTL_FILE
 12    ...
 13    end loop;
 14  end;
 15  /

then we change none of the logic – we only need replace all of that messy UTL_FILE code with a simple pipe command to allow querying that function as if it was source of rows, and then spool it to a file in the usual way.


SQL> create or replace
  2  function my_function return sys.odcivarchar2list pipelined is
  3  begin
  4    for each_row in ( [some set] )
  5    loop
  6    ...
  7    ... my complex logic
  8    ...
  9
 10    ...
 11    ... pipe row ( the_line );
 12    ...
 13    end loop;
 14    return;
 15  end;
 16  /

SQL> spool c:\tmp\demo.dat
SQL> select * from my_function();
SQL> spool off

So next time you’re cranking out some cumbersome UTL_FILE code to write a file, take a moment to see if the scheduler can look after some of the work for you. I’ll talk about (not) using UTL_FILE to read files in a future post.

When WHEN went faster

Yeah…try saying that blog post title 10 times in a row as fast as you can Smile

But since we’re talking about doing things fast, this is just a quick post about a conversation I had a twitter yesterday about the WHEN clause in a trigger.

 

image

That is an easy benchmark to whip up – I just need a couple of tables, each with a simple a trigger differing only by their usage of the WHEN clause.  Here is my setup:


SQL> create table t1 ( x int );

Table created.

SQL> create table t2 ( x int );

Table created.

SQL> create or replace
  2  trigger TRG1
  3  before insert on t1
  4  for each row
  5  begin
  6    if sys_context('USERENV','SID') = 0 then
  7       :new.x := 0;
  8    end if;
  9  end;
 10  /

Trigger created.

SQL> create or replace
  2  trigger TRG2
  3  before insert on t2
  4  for each row
  5  when ( sys_context('USERENV','SID') = 0 )
  6  begin
  7     :new.x := 0;
  8  end;
  9  /

Trigger created.

I’m using an always false condition so the content of the trigger will never be fired (except for the WHEN clause and the IF-test).  Now I’m ready to slam a bunch of inserts into each table and measure the performance.


SQL> set timing on
SQL> insert into t1
  2  select 1 from dual
  3  connect by level <= 1000000;

1000000 rows created.

Elapsed: 00:00:02.52

SQL> insert into t2
  2  select 1 from dual
  3  connect by level <= 1000000;

1000000 rows created.

Elapsed: 00:00:00.41

There is an immediate urge here to go screaming from the roof tops that WHEN is the all conquering hero of this benchmark. And yes, it is indeed faster but make sure you keep the results in perspective.  We gained back about 2 seconds for 1 million inserts. For me, once I’m inserting millions of rows into a table I’m starting to think not about trigger performance but whether this is the kind of table I want to be having triggers on at all.