Take care with regular expressions

In an Office Hours session a couple of months back, I covered an important change that comes to regular expressions once you upgrade to 12c Release 2. You can see the video covering the issue here:

But for the TL;DR brigade reading this post: regular expressions are not deterministic once you take NLS settings into account, and thus cannot be used in constraints or function-based indexes.

This is just a post to quickly revisit the topic for anyone thinking of upgrading from an earlier release to 12c Release 2. An AskTOM question came in asking what would happen to such constraints during the upgrade process.

The sad answer is … nothing. For example, if you successfully (and, by strict definition, incorrectly) created a constraint with a regular expression in 11g, then after the upgrade that constraint will still be present in your new 12c Release 2 system. It will continue to work as it did in 11g, even if you disable/enable it or run a VALIDATE against it to exercise the data.

To be honest, I do not like this, because it can become what I call a “sleeper” problem. If, 6 months after you upgrade, you happen to drop and recreate that constraint, you’ll be most distressed to find that it cannot be added, and you will most probably have long forgotten that the cause was an event from 6 months earlier, namely the upgrade. And perhaps worse, you now have an index or constraint that could potentially be corrupted by innocent tinkering with session NLS settings.

So before you upgrade, definitely take a cursory glance through your constraint definitions and take remedial action if needed.
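If you want a head start on finding candidates, something along these lines can act as a starting point. It is just a sketch that scans check constraint definitions for anything mentioning REGEXP – it assumes you have access to DBA_CONSTRAINTS (otherwise use USER_CONSTRAINTS per schema), and you should adjust the owner filter to suit your environment:

set serveroutput on
begin
  -- SEARCH_CONDITION is a LONG, so pull it into PL/SQL to filter on it
  for c in ( select owner, table_name, constraint_name, search_condition
             from   dba_constraints
             where  constraint_type = 'C'
             and    owner not in ('SYS','SYSTEM') )
  loop
    if upper(c.search_condition) like '%REGEXP%' then
      dbms_output.put_line(c.owner||'.'||c.table_name||' : '||c.constraint_name);
    end if;
  end loop;
end;
/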

Gooey GUIDs

Do a quick Google search and you’ll find plenty of blog posts about why GUIDs are superior to integers for a unique identifier, and of course, an equal number of posts about why integers are superior to GUIDs. In the Oracle world, most people have been using sequence numbers since they were pretty much the only option available to us in earlier versions. But developers coming from other platforms often prefer GUIDs simply due to their familiarity with them.

I’m pretty much ambivalent about which one to use. A good example is the AskTOM database, which had exclusively sequence-based primary keys at inception, but as the database has evolved and developers of different backgrounds and ages have worked on it, there is now a mix of strategies. The older tables have sequence-based primary keys, and many of the newer tables have GUIDs as primary keys. Don’t get me wrong – I’m not advocating that you should have a mix; for a given database schema I’d recommend picking one regime and sticking with it. But my point is that even with the mix of approaches in the AskTOM schema, I’ve never encountered any problems or issues with either.

However, there is one use case where I would strongly recommend some caution with GUIDs, and that is systems that load data in bulk (e.g. data warehouses).

GUIDs are not cheap to generate. A quick look at the history of the structure and generation of unique IDs at https://en.wikipedia.org/wiki/Universally_unique_identifier shows that they all come down to a common component – the need to generate a good random number, and “good” can be a struggle for computers because you need algorithms that ensure sufficient randomness and distribution of the generated values. That takes CPU cycles, and whilst that is something you will never notice when using a GUID for the 100 customers in your customer table, it definitely will be noticeable if you are going to attach a GUID to every one of your 10 million daily sales transactions, or telephone records, or similar.

Here’s a simple example where I’ll load 100 million rows into a table.  First I’ll try the conventional (when it comes to Oracle) approach of using a sequence number to uniquely identify each row.



SQL> create sequence seq cache 50000;

Sequence created.

SQL>
SQL> create table t1
  2  ( pk int,
  3    data1 int,
  4    data2 varchar2(10),
  5    data3 int,
  6    data4 varchar2(10)
  7  )
  8  tablespace demo
  9  /

Table created.


SQL>
SQL> set timing on
SQL> insert /*+ APPEND */ into t1
  2  select seq.nextval, int_val, char_val, int_val, char_val
  3  from
  4   ( select level int_val, to_char(level) char_val from dual connect by level <= 10000 ),
  5   ( select 1 from dual connect by level <= 10000 );

100000000 rows created.

Elapsed: 00:03:31.42
SQL>
SQL> commit;

Commit complete.

Elapsed: 00:00:00.01
SQL>

3 minutes 30 seconds for 100 million rows is pretty good performance for a laptop, although obviously the table structure here is very simple.

And now I’ll repeat the exercise with the same table structure, but using a raw column to hold the output of a call to SYS_GUID().



SQL>
SQL>
SQL> create table t2
  2  ( pk raw(20),
  3    data1 int,
  4    data2 varchar2(10),
  5    data3 int,
  6    data4 varchar2(10)
  7  )
  8  tablespace demo
  9  /

Table created.

SQL>
SQL> set timing on
SQL> insert /*+ APPEND */ into t2
  2  select sys_guid(), int_val, char_val, int_val, char_val
  3  from
  4   ( select level int_val, to_char(level) char_val from dual connect by level <= 10000 ),
  5   ( select 1 from dual connect by level <= 10000 );

100000000 rows created.

Elapsed: 00:30:56.78
SQL> commit;

Commit complete.

Elapsed: 00:00:00.03

That’s right – we’ve blown out to 30 minutes. As you can see, there can be a large cost when the row volumes (and hence the number of calls to generate a GUID) get large. We can even take the INSERT out of the equation and simply do a raw stress test of how quickly we can generate GUIDs from the SQL engine, using the following test harness.



SQL> create table t ( sz int, dur interval day to second );

Table created.

SQL>
SQL>
SQL> declare
  2    ts_start timestamp;
  3    ts_end   timestamp;
  4    iter int;
  5    dummy raw(32);
  6  begin
  7   for i in 1 .. 8 loop
  8    iter := power(10,i);
  9
 10    ts_start := systimestamp;
 11    if iter <= 10000 then
 12       select max(x) into dummy from
 13       (
 14       select sys_guid() x from
 15       ( select 1 from dual connect by level <= iter )
 16       );
 17    else
 18       select max(x) into dummy from
 19       (
 20       select sys_guid() x from
 21       ( select 1 from dual connect by level <= iter/10000 ),
 22       ( select 1 from dual connect by level <= 10000 )
 23       );
 24    end if;
 25
 26    ts_end := systimestamp;
 27    insert into t values (iter, ts_end - ts_start );
 28    commit;
 29
 30   end loop;
 31  end;
 32  /

PL/SQL procedure successfully completed.

SQL>
SQL> select * from t order by 1;

        SZ DUR
---------- ----------------------------------------------------------
        10 +00 00:00:00.000000
       100 +00 00:00:00.000000
      1000 +00 00:00:00.015000
     10000 +00 00:00:00.172000
    100000 +00 00:00:01.607000
   1000000 +00 00:00:16.083000
  10000000 +00 00:02:49.713000
 100000000 +00 00:26:46.570000
 
 

I’m not trying to scare you off GUIDs – but like any functionality or feature you use to build applications, make sure you test it against the business requirements you need to satisfy and make an informed decision on how best to use (or not use) it.

Let’s talk about Techiquette

I’m typing this at about 38,000 feet in the air. I’m travelling home from the Oracle Latin America tour – it was a great event, but I’ll save the details for another blog post. In reality, I should be sleeping, but as luck would have it one of the lights on our plane is malfunctioning. So whilst the rest of the plane is sleeping happily in restful darkness, my row and the rows around it are bathed in blazing artificial light :(


Ah…the joys of travel. Anyway, I digress.

Before I boarded the flight, I did what most people do in the airport: I signed my life away to a deluge of advertising emails in order to get 30 precious minutes of free wi-fi. I checked some emails, and was having a glance through Twitter when I came upon this tweet:

[screenshot: the tweet in question]

As you might expect, there’s plenty of social media activity about this tweet, much of it about the contradiction between the language and what you would expect to come out of the Office of the President of the United States. I’m not going to enter into that debate, or get into political discussion. Let’s face it – Donald Trump strikes me as more of a Sybase man (hey, don’t flame me, Sybase readers… these are just jokes, OK?)

But the tweet made me think about the kind of language we sometimes see on AskTOM and on technical forums in general.

So let’s talk about etiquette on technical forums, or “Techiquette” as I’ve coined it.

We’re actually pretty lucky in technical circles with our discussion forums, because most of the conversation revolves around topics that can be backed up with cold hard facts. In those instances, even robust discussion stays civil, because it’s pretty hard to disagree with solid evidence when it’s presented. A hypothetical example might be something like:

Forum Participant #1: “I need to have a table with 500 columns”
Forum Participant #2: “You cannot – Oracle is limited to 255 columns”
Forum Participant #3: “I disagree – here is an example DDL demonstrating 800 columns”

Participant #2 can’t really dispute the response, because the example is right there in black and white.

But here is where I think conversations can go off the rails and slide downhill into abuse and insult: when the content is both technical and emotional, for example:

Forum Participant #1: “I love using RMAN – it made our backup processes much easier”
Forum Participant #2: “RMAN sucks – it is too complex, anyone using it obviously has no clue about anything”
Forum Participant #1: “Dumb ass”

and away we go… the conversation goes from friendly to flame war.

The problem is – I don’t want my forums to be devoid of emotion, even when the topic is technical. Ever been to a presentation where the presenter does not show any joy or enthusiasm for the material? It’s like spending 45 minutes having root canal work on your teeth. We want people to be passionate about the things that interest them.

So how do we avoid forums descending into abuse whilst still encouraging emotion, and hence potentially emotionally charged content?

Here are my ground rules:

  • Expressing an affirmative or agreeing position? Positive emotions 100% allowed and encouraged.
  • Expressing a negative or disagreeing position? Express it as dispassionately as possible. Stick solely to the facts.

Simple mathematics tells us that if we can stick to the above, the average “positivity index” in a tech conversation will always be greater than or equal to zero.

I’m not claiming to be an angel here – I don’t always manage to follow my own guidelines as well as I should. But I’m always striving to improve, and hopefully if we can all follow them to the best of our ability, then in the main, we’ll have much more civil and interesting community conversations as a result.

Building community via the speaker community

I recently participated in the Oracle Developer Community Yatra tour throughout India. It was a hectic event, with 7 cities covered in a mere 9 days, so you can imagine how frantic the pace was. A typical day would be:

  • 7am – breakfast
  • 8am – check out of hotel and leave for the venue
  • 9am – speak all day, host Q&A
  • 6pm – leave straight from venue to the airport
  • 8pm – dinner at airport, and fly to next city
  • 1am – get to next hotel, grab some sleep before doing it all again in 6 hours’ time

Yet as a speaker at this event, I never felt that the logistics were out of control. This is mainly due to the incredible work of the people in the AIOUG (All India Oracle User Group), coordinated by Sai Penumuru. The smoothness of the organization prompted me to blog about how user groups could follow the AIOUG lead in running successful events.

Every time I do an event in India, not only do I come away with a stronger network and connection with the attendees, I also gain new and strong friendships with the speakers, and this is in no small part due to the organizational skills of Sai and the volunteers.

So from that reference point – namely, the smart way in which Sai and the user group foster a great feeling of community amongst the speakers – I thought I’d share the mechanisms via which user groups can organize events that make speakers feel welcome and keen to return.

Common accommodation

When I travel to India, Sai provides a recommended hotel for all speakers. Obviously, no speaker is compelled to stay there, but we all know that the recommendation Sai makes has taken into account:

  • facilities the hotel provides,
  • hotel price, striking a balance between speakers with company funding and those funding themselves,
  • transport time to/from the conference venue.

So most of us will always use his recommendation, and it makes the decision-making process easy. But most of all, it is a catalyst for speakers to meet in a casual environment outside conference hours and build relationships.

Common transport

The AIOUG organizes transport to and from the venue, and from the airport as well. When this is not possible, they provide a recommended transport option (such as Uber) so that speakers never run the risk of picking an unreliable or unsafe one. For multi-city events such as Yatra, the AIOUG also recommends a common itinerary for flights, once again making the entire planning process much easier for speakers.

A communication mechanism

Before a conference, the AIOUG sets up a WhatsApp group containing all of the speakers and the key representatives from the user group. In this way, all communication is sent in a consistent fashion. We don’t have to jump between email, Twitter, etc. to see if any correspondence has been missed. And this group also helps build the relationship between speakers and user group. Humorous stories and pictures can be shared, but most importantly, if there is an issue or problem, everyone is aware of it immediately, making resolution fast and effective.

And perhaps most importantly, it helps accentuate the human element – messages are sent as people land or take off, and when people arrive at the hotel. Organizers regularly send messages making sure no one is having any difficulties. All these things make the speakers feel more comfortable before, during and after the event.

A typical day

Perhaps the most valuable piece of information conveyed by the user group is ‘local knowledge’. For example, each evening a WhatsApp message would be sent out detailing:

  • hotel pickup time,
  • expected transit time,
  • who to contact/what to do when arriving at the venue,
  • what identification requirements may be needed on site, etc.

So even though it may be a first visit to a city for the speakers, there is a degree of familiarity with what is expected to happen, and hence knowledge of whether things are departing from the norm.

Onsite assistance

The most stressful 5 minutes for any speaker is the time they spend setting up for their talk. Projector problems, internet connectivity, schedule changes and the like can all sabotage a good talk. The AIOUG always has someone visit the room in that critical 5 minutes, so a speaker does not have to go hunting for technical assistance.

 

In summary, as you can see, none of these things are particularly arduous to do, but the benefits for speakers are huge. We feel comfortable and welcome, which means a much better chance of a successful talk, and makes us much more likely to want to return.

So if you are reading this, and are part of a user group committee, please consider some of the tips above for your local events. If your speakers have a good experience, they’ll be much more keen to offer continued support for your events.

Humble pie made with NULL strings

I was helping out a client a while back with an issue where a panicked email came into my inbox along the lines of “SELECT IS BROKEN IN ORACLE!!!”, which seemed perhaps a little extreme :) So I pursued it further, asking for some concrete details, and I must concede it had me a little bamboozled for a while. I’ve simplified the example to keep it easy to digest, but the premise is the same.

My colleague had a table with a couple of VARCHAR2 columns:

[screenshot: the table definition, with two VARCHAR2 columns]

and the data inside the two columns for the single row was identical:

[screenshot: both columns showing the value SUCCESS]

The panicky email was sent because a simple query to check that the two columns were identical was not working as expected:

[screenshot: a query comparing the two columns, returning no rows]

At this point, I was pretty relaxed about the situation because this is a “problem” I have seen many many times over the years, especially as people come to Oracle from other database management systems. So I responded to the email:

“Yeah…I know what the issue is. You have trailing spaces in one of the columns. Try RTRIM”

and filed the original email in that special folder where emails never resurface :)


But as quickly as I had hit “Send”, a reply came back with “That doesn’t work either”. Working on the assumption that my email had not been explicit enough (and perhaps I was being a little smug about it), I logged in to the system to run a query on the same table, and lo and behold – the smugness was wiped off my face :)

[screenshot: the same comparison with RTRIM applied, still returning no rows]

So then I tried a variation on a theme and concatenated a known value to the end of each column to see if I was missing anything:

[screenshot: each column with a literal concatenated on the end, still looking identical]

So everything looks OK, but obviously something is awry here. When you have doubts about the data, a good option is to use the DUMP function to see exactly what bytes are stored:

[screenshot: DUMP output revealing a trailing byte of 0 on the second column]

and the problem is revealed. The second column in the table had a trailing ASCII code zero after the word SUCCESS. This can be a nasty (and common) issue when developers are using C or similar languages to store data in an Oracle database. In many languages, ASCII zero is used to terminate an arbitrary-length string, and hence it can easily end up being incorrectly stored along with the string bytes in a VARCHAR2 column, especially if you are not using the delivered string handling functions in the pre-compiler layer (such as Pro*C). This can also be due to confusion between the concept of NULL (two L’s) in the database and the concept of the NUL (one L) character in a programming language.
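If you want to reproduce the symptom yourself, here is a minimal sketch along the lines of the (simplified) scenario above – the table and column names are just placeholders:

create table t ( col1 varchar2(10), col2 varchar2(10) );

-- the second value carries a trailing NUL (ASCII zero) character
insert into t values ('SUCCESS', 'SUCCESS'||chr(0));
commit;

-- both values look identical on screen, yet no rows match
select count(*) from t where col1 = col2;

-- DUMP exposes the extra trailing byte 0 in COL2
select dump(col1) d1, dump(col2) d2 from t;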

Compounding the issue is that a casual glance at the data (as per the above screenshots from SQL Developer) typically suggests all is well – the NUL character is not visible. Things can get even worse – I’ve seen some GUI database tools interpret the NUL character as a true NULL, and hence when a column contains only a single NUL character, those tools will report the column as being empty (i.e. NULL), which just makes for even more confusion. Thankfully SQL Developer does not do that, and the null indicator column can be used to see the difference between the two:

[screenshot: SQL Developer null indicator distinguishing a true NULL from a lone NUL character]

So I sent a terse email back to the developer reminding them about NUL versus NULL, and that if they use the pre-compiler correctly, everything will take care of itself. But I must admit, I had to chuckle when they replied with a link to an AskTOM question I had answered recently, describing the following:

We made the same mistake ourselves! Albeit in a very specific circumstance. Look what happens when you force a STOP command onto a running scheduler job:



SQL> create table t ( x timestamp, y int );

Table created.

SQL> create or replace
  2  procedure myproc is
  3  begin
  4    for i in 1 .. 20
  5    loop
  6      insert into t values (systimestamp,i);commit;
  7      dbms_lock.sleep(2);
  8    end loop;
  9  end;
 10  /

Procedure created.

SQL>
SQL> begin
  2    dbms_scheduler.create_job (
  3      job_name        => 'myjob',
  4      job_type        => 'plsql_block',
  5      job_action      => 'begin myproc; end;',
  6      start_date      => systimestamp,
  7      repeat_interval => 'freq=minutely; bysecond=0;',
  8      enabled         => true);
  9  end;
 10  /

PL/SQL procedure successfully completed.
--
-- waited for 1 successful execution, and then stop the second one in flight
--
SQL> exec dbms_scheduler.stop_job('MYJOB',force=>true)

PL/SQL procedure successfully completed.

SQL> select status, session_id, dump(session_id)
  2  from USER_SCHEDULER_JOB_RUN_DETAILS;

STATUS
------------------------------
SESSION_ID
-------------------------------------------------------------
DUMP(SESSION_ID)
-------------------------------------------------------------
SUCCEEDED
984,53037
Typ=1 Len=9: 57,56,52,44,53,51,48,51,55

STOPPED
983,28542
Typ=1 Len=10: 57,56,51,44,50,56,53,52,50,0  <<=== whoops!

I’ve logged this as a bug, but in the interim, if you need the SESSION_ID from the scheduler views, you might want to wrap it in RTRIM(SESSION_ID, chr(0)).
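Something along these lines (just a sketch against the same view used above):

select status,
       rtrim(session_id, chr(0)) as session_id
from   user_scheduler_job_run_details;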

Hoist by my own NULL petard :)

Hyper-partitioned index avoidance thingamajig

As you can tell, I have no idea on a name for what I am about to describe. So let me start from the beginning, and set the scene for an idea I have to utilize a cool new 18c feature.

Often in a transactional-style system the busiest table (let us call it SALES for the sake of this discussion) is also

  • the biggest table – after all, it has all of our sales in it,
  • the most in-demand table, in that almost every query in our application wants to access it in some way, shape or form.

This is in effect the database version of the Pareto Principle. Everyone wants a slice of that SALES “pie”, and the piece of that pie that is in most demand is typically the most recent data. Your application may have pages that will be showing:

  • sales in the past hour,
  • sales in the past day,
  • products sold in the past hour,
  • largest purchase amount for sales in the past few hours,
  • suspicious activity for today,
  • a customer’s own sales for the last few days.

The list goes on and on, the common theme being that the data being requested is bounded by a range of time in the recent past. Our SALES table will already have a primary key, probably some sort of unique transaction ID for each purchase, but to satisfy the style of requests in the list above, we will probably need:

  • an index on the time(stamp) of the sale,
  • an index on the product ID that was sold,
  • an index on the customer ID who made the purchase,
  • potentially even an index on the sale amount

You can see the troubled waters into which we are sailing here. It is our biggest and busiest table, and here we are, adding index after index after index to improve query performance, whilst at the same time:

  1. adding overhead/contention to DML operations on the SALES table,
  2. increasing the size of the database,
  3. increasing the duration of the backups,
  4. increasing time for maintenance and copies to Development and Test

None of this is looking great but we might be thinking “What choice do we have?”

Here is perhaps an alternate strategy: Let’s not create any indexes.  The challenges (1) through (4) above evaporate to non-existence. But of course, now we have a remaining challenge in getting those queries to run efficiently.

I’ll tackle that in a different way – I will partition the SALES table very “aggressively”, hence my blog post title “hyper-partitioning”. Considering the typical query requirements I listed above, I will partition my SALES table down to something as small as a partition for every hour.


SQL> create table sales
  2    ( ts   timestamp,
  3      id   int,
  4      amt  number,
  5      product int,
  6      customer int,
  7      item_cnt int,
  8      terminal int,
  9      operator int,
 10      credit_card int,
         ...
         ...
 17    )
 18  partition by range ( ts )
 19  interval ( numtodsinterval(1,'HOUR') )
 20  (
 21    partition p1 values less than ( timestamp '2018-07-01 00:00:00' )
 22  );

Table created.

SQL>
SQL> insert /*+ APPEND */ into sales (ts,id,amt,product,customer)
  2  select date '2018-07-01' + rownum / 240, rownum, dbms_random.value(1,100),
  3        dbms_random.value(1,100),dbms_random.value(1,100)
  4  from dual
  5  connect by level <= 399;

399 rows created.

SQL>
SQL> set serverout on
SQL> declare
  2    h varchar2(1000);
  3  begin
  4  for i in (
  5    select partition_name, high_value
  6    from user_tab_partitions
  7    where table_name = 'SALES'
  8    and   interval = 'YES'
  9    and   partition_name like 'SYS_P%'
 10    order by partition_position
 11  ) loop
 12    h := i.high_value;
 13    execute immediate 'select to_char('||h||'-0.01,''yyyymmdd_hh24'') from dual' into h;
 14    execute immediate 'alter table sales rename partition '||i.partition_name||' to p'||h;
 15  end loop;
 16  end;
 17  /

PL/SQL procedure successfully completed.

SQL>
SQL> col high_value format a60
SQL> select partition_name, high_value
  2  from user_tab_partitions
  3  where table_name = 'SALES'
  4  order by partition_position;

PARTITION_NAME                 HIGH_VALUE
------------------------------ ------------------------------------------------------------
P1                             TIMESTAMP' 2018-07-01 00:00:00'
P20180701_00                   TIMESTAMP' 2018-07-01 01:00:00'
P20180701_01                   TIMESTAMP' 2018-07-01 02:00:00'
P20180701_02                   TIMESTAMP' 2018-07-01 03:00:00'
P20180701_03                   TIMESTAMP' 2018-07-01 04:00:00'
P20180701_04                   TIMESTAMP' 2018-07-01 05:00:00'
P20180701_05                   TIMESTAMP' 2018-07-01 06:00:00'
P20180701_06                   TIMESTAMP' 2018-07-01 07:00:00'
P20180701_07                   TIMESTAMP' 2018-07-01 08:00:00'
P20180701_08                   TIMESTAMP' 2018-07-01 09:00:00'
P20180701_09                   TIMESTAMP' 2018-07-01 10:00:00'
P20180701_10                   TIMESTAMP' 2018-07-01 11:00:00'
P20180701_11                   TIMESTAMP' 2018-07-01 12:00:00'
P20180701_12                   TIMESTAMP' 2018-07-01 13:00:00'
P20180701_13                   TIMESTAMP' 2018-07-01 14:00:00'
P20180701_14                   TIMESTAMP' 2018-07-01 15:00:00'
P20180701_15                   TIMESTAMP' 2018-07-01 16:00:00'
P20180701_16                   TIMESTAMP' 2018-07-01 17:00:00'
P20180701_17                   TIMESTAMP' 2018-07-01 18:00:00'
P20180701_18                   TIMESTAMP' 2018-07-01 19:00:00'
P20180701_19                   TIMESTAMP' 2018-07-01 20:00:00'
P20180701_20                   TIMESTAMP' 2018-07-01 21:00:00'
P20180701_21                   TIMESTAMP' 2018-07-01 22:00:00'
P20180701_22                   TIMESTAMP' 2018-07-01 23:00:00'
P20180701_23                   TIMESTAMP' 2018-07-02 00:00:00'
P20180702_00                   TIMESTAMP' 2018-07-02 01:00:00'
P20180702_01                   TIMESTAMP' 2018-07-02 02:00:00'
P20180702_02                   TIMESTAMP' 2018-07-02 03:00:00'
P20180702_03                   TIMESTAMP' 2018-07-02 04:00:00'
P20180702_04                   TIMESTAMP' 2018-07-02 05:00:00'
P20180702_05                   TIMESTAMP' 2018-07-02 06:00:00'
P20180702_06                   TIMESTAMP' 2018-07-02 07:00:00'
P20180702_07                   TIMESTAMP' 2018-07-02 08:00:00'
P20180702_08                   TIMESTAMP' 2018-07-02 09:00:00'
P20180702_09                   TIMESTAMP' 2018-07-02 10:00:00'
P20180702_10                   TIMESTAMP' 2018-07-02 11:00:00'
P20180702_11                   TIMESTAMP' 2018-07-02 12:00:00'
P20180702_12                   TIMESTAMP' 2018-07-02 13:00:00'
P20180702_13                   TIMESTAMP' 2018-07-02 14:00:00'
P20180702_14                   TIMESTAMP' 2018-07-02 15:00:00'
P20180702_15                   TIMESTAMP' 2018-07-02 16:00:00'

41 rows selected.

I’ve run a small anonymous block to rename the (system-named) interval partitions into some sensible names that reflect the date range each partition covers. Let’s now look at the typical queries we will be performing on the SALES table:


SQL> set autotrace traceonly explain
SQL> select * from sales
  2  where ts > timestamp '2018-07-02 15:00:00';

--------------------------------------------------------------------------------------------------
| Id  | Operation                | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |       |     9 |   387 |     7   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE ITERATOR|       |     9 |   387 |     7   (0)| 00:00:01 |    41 |1048575|
|*  2 |   TABLE ACCESS FULL      | SALES |     9 |   387 |     7   (0)| 00:00:01 |    41 |1048575|
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("TS">TIMESTAMP' 2018-07-02 15:00:00.000000000')

SQL> select * from sales
  2  where ts > timestamp '2018-07-02 15:00:00'
  3  and product = 12;

--------------------------------------------------------------------------------------------------
| Id  | Operation                | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |       |     1 |    43 |     7   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE ITERATOR|       |     1 |    43 |     7   (0)| 00:00:01 |    41 |1048575|
|*  2 |   TABLE ACCESS FULL      | SALES |     1 |    43 |     7   (0)| 00:00:01 |    41 |1048575|
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("PRODUCT"=12 AND "TS">TIMESTAMP' 2018-07-02 15:00:00.000000000')

SQL>
SQL> select * from sales
  2  where ts > timestamp '2018-07-02 15:00:00'
  3  and customer = 25;

--------------------------------------------------------------------------------------------------
| Id  | Operation                | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |       |     1 |    43 |     7   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE ITERATOR|       |     1 |    43 |     7   (0)| 00:00:01 |    41 |1048575|
|*  2 |   TABLE ACCESS FULL      | SALES |     1 |    43 |     7   (0)| 00:00:01 |    41 |1048575|
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("CUSTOMER"=25 AND "TS">TIMESTAMP' 2018-07-02 15:00:00.000000000')

SQL>
SQL> select max(amt) from sales
  2  where ts > timestamp '2018-07-02 15:00:00';

---------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |       |     1 |    33 |     7   (0)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE           |       |     1 |    33 |            |          |       |       |
|   2 |   PARTITION RANGE ITERATOR|       |     9 |   297 |     7   (0)| 00:00:01 |    41 |1048575|
|*  3 |    TABLE ACCESS FULL      | SALES |     9 |   297 |     7   (0)| 00:00:01 |    41 |1048575|
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("TS">TIMESTAMP' 2018-07-02 15:00:00.000000000')

SQL> set autotrace off

All of them scan only a tiny portion of the data, namely just the hours of sales data relevant to the query. And the query response times will be relatively consistent no matter which customer, product or other predicate is passed, because the data to be scanned is always a fixed number of hours.

But there’s a problem here. If I am partitioning to the hour, or even to the minute, then it won’t be long before I have a lot of partitions. In the latter case (minutes) I will be up to over 500,000 partitions in the first year of SALES alone! That is a lot of database metadata to store: the partitions themselves, plus optimizer statistics on them, plus historical optimizer statistics, plus potentially histograms on every column. A lot of optimizer data might lead to expensive parse times, because there is just so much information to wade through when optimizing queries.
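A quick way to keep an eye on how much partition metadata you are accumulating is a simple count against the dictionary – just a sketch using the standard views:

select table_name, count(*) as partition_count
from   user_tab_partitions
group  by table_name
order  by partition_count desc;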

But we only need the extreme granularity of partitions for the SALES table for today. Once today ticks over and becomes “yesterday”, then we might only need a partition for the entire day.  And once “yesterday” ticks over to “last week”, then maybe only weekly partitions are needed and so forth.

One of the cool things in 18c is the ability to do this style of maintenance with negligible disruption to service. Because I have named my partitions in a logical fashion, here is a simple routine to merge yesterday’s hourly partitions into a single partition for the day.


SQL> set serverout on
SQL> declare
  2    d date := date '2018-07-01';
  3    ddl varchar2(4000);
  4  begin
  5    select listagg(partition_name||chr(10),',') within group ( order by partition_position )
  6    into   ddl
  7    from   user_tab_partitions
  8    where  table_name = 'SALES'
  9    and    partition_name like 'P'||to_char(d,'yyyymmdd')||'%';
 10
 11    ddl := 'alter table sales merge partitions '||ddl||' into partition p'||to_char(d,'yyyymmdd')||' online';
 12
 13    dbms_output.put_line(ddl);
 14    execute immediate ddl;
 15  end;
 16  /
alter table sales merge partitions
 P20180701_00
,P20180701_01
,P20180701_02
,P20180701_03
,P20180701_04
,P20180701_05
,P20180701_06
,P20180701_07
,P20180701_08
,P20180701_09
,P20180701_10
,P20180701_11
,P20180701_12
,P20180701_13
,P20180701_14
,P20180701_15
,P20180701_16
,P20180701_17
,P20180701_18
,P20180701_19
,P20180701_20
,P20180701_21
,P20180701_22
,P20180701_23
 into partition p20180701 online

PL/SQL procedure successfully completed.

Now I have a single partition for yesterday’s data, and hourly partitions for today’s data.


SQL> col high_value format a60
SQL> select partition_name, high_value
  2  from user_tab_partitions
  3  where table_name = 'SALES'
  4  order by partition_position;

PARTITION_NAME                 HIGH_VALUE
------------------------------ -------------------------------
P1                             TIMESTAMP' 2018-07-01 00:00:00'
P20180701                      TIMESTAMP' 2018-07-02 00:00:00'
P20180702_00                   TIMESTAMP' 2018-07-02 01:00:00'
P20180702_01                   TIMESTAMP' 2018-07-02 02:00:00'
P20180702_02                   TIMESTAMP' 2018-07-02 03:00:00'
P20180702_03                   TIMESTAMP' 2018-07-02 04:00:00'
P20180702_04                   TIMESTAMP' 2018-07-02 05:00:00'
P20180702_05                   TIMESTAMP' 2018-07-02 06:00:00'
P20180702_06                   TIMESTAMP' 2018-07-02 07:00:00'
P20180702_07                   TIMESTAMP' 2018-07-02 08:00:00'
P20180702_08                   TIMESTAMP' 2018-07-02 09:00:00'
P20180702_09                   TIMESTAMP' 2018-07-02 10:00:00'
P20180702_10                   TIMESTAMP' 2018-07-02 11:00:00'
P20180702_11                   TIMESTAMP' 2018-07-02 12:00:00'
P20180702_12                   TIMESTAMP' 2018-07-02 13:00:00'
P20180702_13                   TIMESTAMP' 2018-07-02 14:00:00'
P20180702_14                   TIMESTAMP' 2018-07-02 15:00:00'
P20180702_15                   TIMESTAMP' 2018-07-02 16:00:00'

18 rows selected.
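And when “yesterday” eventually ages into “last week”, the same technique can roll the daily partitions up into a weekly partition. Here is a sketch along the same lines – it assumes the Pyyyymmdd naming convention used above, and a hypothetical week starting on July 2:

declare
  wk_start date := date '2018-07-02';   -- hypothetical first day of the week to roll up
  ddl      varchar2(4000);
begin
  -- gather the seven daily partitions for that week, in positional order
  select listagg(partition_name||chr(10),',') within group ( order by partition_position )
  into   ddl
  from   user_tab_partitions
  where  table_name = 'SALES'
  and    partition_name in (
           select 'P'||to_char(wk_start + level - 1,'yyyymmdd')
           from   dual
           connect by level <= 7 );

  ddl := 'alter table sales merge partitions '||ddl||
         ' into partition p'||to_char(wk_start,'yyyymmdd')||'_wk online';

  dbms_output.put_line(ddl);
  execute immediate ddl;
end;
/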

I stress – this is not a recommendation to race out and partition every transactional table you have and drop all of the indexes :). But with so many online partitioning operations now available in 18c, there are some exciting new opportunities that were not available in previous releases. So start thinking about how you can exploit this to get advantages with the partitioning option that might sit “outside the box” of the standard usage.

18.3 As easy as 1…2…3

Well, finally it’s here! 18c is available for on-premises installation, so the world can get stuck into the cool new features of the latest release on their own laptops :) At least that is what I’ll be doing!

Naturally as soon as I heard the news, I downloaded the software and got ready to set aside the day for installation and creation of an 18c database. But I didn’t need that long – I didn’t need that long at all. Just a few clicks and a few commands and there it was – my 18c database up and running.

Check out how easy it is with my three videos.

Software Installation

Listener Creation

Database Creation

It really is as easy as 1…2…3

Enjoy 18c !