Connor and Chris at OpenWorld

Chris and I will be at OpenWorld next month. Our session details are below, and you can click on the links below to add entries to your calendar to make sure you don’t miss us. Don’t forget to use the official Schedule Builder to make sure you have booked your spot at our sessions!

If you can’t get to our sessions, you might catch us wandering the halls or at The Hub. Feel free to come and say Hello and talk tech!

See you at OpenWorld!

Addenda: See the slides from my talks here

Calendar ICS files

Tune ANY SQL in 20 minutes (.ics)

Fast Lane to Database Success (.ics)

Developers – Don’t Be The Person That Discombobulates Your Database (.ics)

18 Things Developers Will Love About Database 18c (.ics)

Why Isn’t My Query Using An Index (.ics)

 

Session Details

[Session detail images]

The full stack developer….is BACK for 2018!

Last year, I flexed my technology muscles :-) by building on the fine groundwork of Lucas Jellema, using some Node, some REST, and some JSON to extract the full Oracle OpenWorld speaker catalogue. I then added some JSON parsing in the database and some SQL, and slapped a nice helping of Application Express on top of it all, to end up with an Application Express rendition of the Oracle OpenWorld speaker catalogue.
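
To give a feel for the "JSON parsing in the database" step, here is a minimal sketch using JSON_TABLE. The JSON document and its attribute names (speakers, name, company) are made up for illustration and are not the real catalogue format; it also assumes a release that supports JSON_TABLE.

```sql
-- Hypothetical JSON payload: the attribute names are assumptions for this sketch,
-- not the actual structure of the OpenWorld catalogue.
select jt.speaker_name, jt.company
from   json_table(
         '{"speakers":[{"name":"Speaker One","company":"Company A"},
                       {"name":"Speaker Two","company":"Company B"}]}',
         '$.speakers[*]'
         columns (
           speaker_name varchar2(100) path '$.name',
           company      varchar2(100) path '$.company'
         )
       ) jt;
```

Once the JSON is projected as rows and columns like this, the rest is ordinary SQL, which is what makes it easy to put an Application Express report on top.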

So it is that time of year again, and whilst some of the bits and pieces have changed, I’ve managed to stumble my way through all of the different pieces once again, and made some enhancements along the way to give you the new and improved 2018 version!

It’s waiting there on apex.oracle.com for you to explore.

https://tinyurl.com/oow18catalog

Enjoy!

Modifying tables without losing materialized views

Whilst the ever increasing speed of storage and servers, and features like In-Memory, are decreasing the need for large numbers of materialized views in databases, there are still plenty of use cases where they can be a useful performance or integrity tool.

But what if that materialized view takes minutes or hours to build? Then even if it is fast refresh enabled, that fast refresh is of no use if we have to rebuild the entire materialized view from scratch due to a structural change in the underlying table.

For example, let’s say I have a table and a simple fast refresh materialized view on that table as below:


SQL> create table t(x number(8) primary key);

Table created.

SQL> insert into t values(55);

1 row created.

SQL> create materialized view log on t with primary key, rowid;

Materialized view log created.

SQL> create materialized view mv
  2  build immediate
  3  refresh fast on demand as
  4  select * from t;

Materialized view created.

SQL>
SQL> select * from t;

         X
----------
        55

1 row selected.

SQL> select * from mv;

         X
----------
        55

1 row selected.

SQL> insert into t values (10);

1 row created.

SQL> insert into t values (20);

1 row created.

SQL> commit;

Commit complete.

SQL> select * from t;

         X
----------
        10
        20
        55

3 rows selected.

SQL> select * from mv;

         X
----------
        55

1 row selected.

SQL>
SQL> exec dbms_mview.refresh('mv');

PL/SQL procedure successfully completed.

SQL> select * from mv;

         X
----------
        10
        20
        55

3 rows selected.

For the sake of this discussion, let’s assume the build of the materialized view takes hours. Naturally we want to avoid having to rebuild (or do a complete refresh of) that materialized view. But then…the inevitable happens. We need to change the table T. In this case, the values for column X now exceed the limits of the definition.


SQL> insert into t values (123123123);
insert into t values (123123123)
                      *
ERROR at line 1:
ORA-01438: value larger than specified precision allowed for this column

Obviously, this is a trivial fix for the table. We simply alter the column to make it larger (which does not require any downtime or reorganization of the data).


SQL> alter table t modify x number(10);

But all is not well… Having a materialized view means that either the materialized view log or the materialized view itself may have replicas of that column, so they remain “broken”.


SQL> insert into t values (123123123);
insert into t values (123123123)
            *
ERROR at line 1:
ORA-12096: error in materialized view log on "MCDONAC"."T"
ORA-01438: value larger than specified precision allowed for this column

Notice the subtle difference in the error message. It is not an error on table T, but an error on the materialized view log. You might have the urge to simply jump in and run the alter commands on the materialized view log and the materialized view. And you might even find that this approach works. But please note – this approach is not supported, and thus we can’t guarantee that it will (a) work, or (b) not create problems later down the track when you attempt to refresh the view or perform other operations.
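
If you want to see where the replicas of the column live, a dictionary query along these lines is one way to do it. This is a sketch only: MLOG$_T is the default name I would expect for the log table on T, so the first query is there to confirm the real name on your system.

```sql
-- Find the name of the materialized view log table for T
select log_table
from   user_mview_logs
where  master = 'T';

-- Compare the definition of column X on the table, the log and the materialized view.
-- 'MLOG$_T' is an assumption (the default naming); substitute the name returned above.
select table_name, column_name, data_type, data_precision, data_scale
from   user_tab_columns
where  table_name in ('T', 'MLOG$_T', 'MV')
and    column_name = 'X'
order  by table_name;
```

Any row still showing the old precision is a place where the larger values have nowhere to go.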

But if direct alteration is not supported, how do we solve the problem without having to rebuild the entire materialized view from scratch?

The solution here is the option to preserve the materialized view as a standalone table. We can drop the definition of the materialized view but hold on to the table that supports it. Now that it is a standalone table, we can alter the column definition so that it matches our source table T. Notice that before I drop the definition, I perform a final refresh to make sure the materialized view is totally up to date – so there is a little bit of coordination required here to make sure that you do not lose any changes that occur to table T during the process.


SQL> exec dbms_mview.refresh('mv');

PL/SQL procedure successfully completed.

SQL> drop materialized view mv preserve table;

Materialized view dropped.

SQL> drop materialized view log on t ;

Materialized view log dropped.

SQL>
SQL> alter table t modify x number(10);

Table altered.

SQL> alter table mv modify x number(10);

Table altered.

We still don’t have our materialized view back though. But we can recreate it without needing a full build cycle, using the PREBUILT table clause.


SQL> create materialized view log on t with PRIMARY KEY, rowid;

Materialized view log created.

SQL>
SQL> create materialized view mv
  2  on prebuilt table
  3  refresh fast  on demand as
  4  select * from t;

Materialized view created.

And we are done! An easy and supported means of altering the materialized view structure without a full rebuild of the data.
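
As a sanity check after the rebuild, something like the following should confirm that the materialized view is registered as prebuilt and that fast refresh still does its job. Treat it as a sketch against the demo objects T and MV used above; the row inserted is the value that failed earlier.

```sql
-- How does the dictionary describe the recreated materialized view?
select mview_name, build_mode, refresh_method, fast_refreshable, staleness
from   user_mviews
where  mview_name = 'MV';

-- The value that previously failed should now be accepted by the table and the log
insert into t values (123123123);
commit;

-- Request a fast refresh explicitly ('F') rather than relying on the default
begin
  dbms_mview.refresh('MV', method => 'F');
end;
/

select * from mv order by x;
```

Keep in mind that with a prebuilt table the database trusts that the table contents match the query at the moment the materialized view is created, which is why the final refresh before the drop matters.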

Complex materialized views? Try a table first

Just a quick post today that arose from an AskTOM question a little while back. Over the years and across the evolution of various versions of the Oracle database, the amount of “power” you could pack into a CREATE TABLE statement has grown. For example, I can do a CREATE-TABLE-AS-SELECT, along with a partitioning clause, including an explicit CREATE INDEX command, and add constraints, all in one atomic operation.



SQL> create table t
  2    (id1,
  3     id2,
  4     owner,
  5     object_name,
  6     constraint t_pk primary key ( id1) using index
  7        ( create index t_pk on t ( id1, owner) )
  8    )
  9  partition by range ( id2 )
 10  (
 11    partition p1 values less than ( 100000 ),
 12    partition p2 values less than ( 200000 )
 13  )
 14  as
 15    select
 16      object_id id1,
 17      object_id id2,
 18      owner,
 19      object_name
 20  from dba_objects
 21  where object_id is not null;

Table created.

That’s pretty cool, but one of the most common times you will be writing DDL that includes a “select * from” as part of the DDL definition is during the creation of materialized views, because the view text naturally will contain a select statement. Unfortunately, we are not quite as “generous” in accepting all of the various physical implementation options in the DDL for materialized views. Trying to create a materialized view that matches our table T above is a struggle.


SQL> create materialized view t_mv
  2    (id1,
  3     id2,
  4     owner,
  5     object_name,
  6     constraint t_mv_pk primary key ( id1) using index
  7        ( create index t_mv_pk on t_mv ( id1, owner) )
  8    )
  9  partition by range ( id2 )
 10  (
 11    partition p1 values less than ( 100000 ),
 12    partition p2 values less than ( 200000 )
 13  )
 14  refresh complete on demand
 15  as
 16    select
 17      id1,
 18      id2,
 19      owner,
 20      object_name
 21  from t;
   constraint t_mv_pk primary key ( id1) using index
              *
ERROR at line 6:
ORA-00907: missing right parenthesis


SQL>
SQL> create materialized view t_mv
  2    (id1,
  3     id2,
  4     owner,
  5     object_name,
  6     constraint t_mv_pk primary key ( id1) using index
  7        ( create index t_mv_pk on t_mv ( id1, owner) )
  8    )
  9  refresh complete on demand
 10  partition by range ( id2 )
 11  (
 12    partition p1 values less than ( 100000 ),
 13    partition p2 values less than ( 200000 )
 14  )
 15  as
 16    select
 17      id1,
 18      id2,
 19      owner,
 20      object_name
 21  from t;
   constraint t_mv_pk primary key ( id1) using index
              *
ERROR at line 6:
ORA-00907: missing right parenthesis

When you encounter this limitation, don’t forget that one of the nice things you can do with materialized view creation is pre-create the underlying table with all of its flexibility:



SQL> create table t_mv
  2    (id1,
  3     id2,
  4     owner,
  5     object_name,
  6     constraint t_mv_pk primary key ( id1) using index
  7        ( create index t_mv_pk on t_mv ( id1, owner) )
  8    )
  9  partition by range ( id2 )
 10  (
 11    partition p1 values less than ( 100000 ),
 12    partition p2 values less than ( 200000 )
 13  )
 14  as
 15    select
 16      id1,
 17      id2,
 18      owner,
 19      object_name
 20  from t;

Table created.

And then once that is done, you can use the PREBUILT TABLE clause to create your materialized view, which will now satisfy all of the underlying physical structure elements you desired.



SQL> create materialized view t_mv
  2  on prebuilt table
  3  refresh complete on demand
  4  as select
  5      id1,
  6      id2,
  7      owner,
  8      object_name
  9  from t;

Materialized view created.

SQL>
SQL>
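
To convince yourself that the materialized view really did inherit the physical design of the table, a few dictionary queries will do. This is a sketch against the demo object T_MV from above.

```sql
-- The materialized view should be recorded as built on a prebuilt table
select mview_name, build_mode, refresh_method
from   user_mviews
where  mview_name = 'T_MV';

-- The partitions defined in the CREATE TABLE are still there
select partition_name, partition_position
from   user_tab_partitions
where  table_name = 'T_MV'
order  by partition_position;

-- As is the index backing the primary key
select index_name, uniqueness
from   user_indexes
where  table_name = 'T_MV';

-- A complete refresh ('C') repopulates the same segment structure
begin
  dbms_mview.refresh('T_MV', method => 'C');
end;
/
```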

18c database creation on Windows

Hopefully you’ve followed my very simple and easy guide to downloading the 18c database software for Windows. But of course, software on its own is not much use – we need a database! So let’s get cracking and create one. Using the Start menu like I’ve done below, or using the Windows panels, locate the Database Configuration Assistant and start it.

image

 

After a few seconds the initial screen will ask what you want to do.  Choose “Create Database”.

image

 

If you like you could just go with “Typical Configuration” and you’ll be done in just a couple of clicks, but I always prefer to opt for the “Advanced Configuration” for two reasons. Firstly, even if you accept all of the defaults, it gives you a better idea of what options are going to be installed, where the files will be stored etc. And secondly…well…we all like to think of ourselves as advanced don’t we :-)

image

 

For just research and exploration on your own Windows machine, you’ll probably want to opt for just a single instance database. RAC takes a bit more setup and complexity. For a faster install, choose one of the options that has the datafiles included. This way, the installer will just copy some existing files and seed them as your database, rather than building the entire instance from scratch.

image

 

Now choose a unique and memorable name for your database. I’ve reached into the depths of my imagination and come up with “db18” for my version 18 database. Go figure :-). I’d also recommend you go with a container database configuration, because that is the strategic direction for Oracle going forward, so if you are going to have a database to skill up on, it makes sense for that database to be a container database.

image

 

To keep things simple, I’m just nominating 1 single location for all of my database files. It can be anywhere but a common convention is that wherever you house them, you’ll have a folder called “oradata” and then a folder for each database you create under that.

image

 

I’m skipping the fast recovery area and archiving at this stage. If I start to get serious with testing things like backup and recovery, then I would revisit this after database creation to enable at least archiving so that I can explore all the goodies that RMAN has to offer.

image
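
If you do decide to revisit archiving later, the usual sequence looks something like the sketch below. It assumes you are connected in SQL*Plus as SYSDBA to the root container (SHUTDOWN and STARTUP are SQL*Plus commands, not SQL). The recovery area size and the folder C:\oracle\fra are placeholders I made up; use a folder that exists on your machine.

```sql
-- Current state: expect NOARCHIVELOG if archiving was skipped at creation
select log_mode from v$database;

-- Optional: nominate a fast recovery area first (size must be set before the location).
-- Both values below are hypothetical.
alter system set db_recovery_file_dest_size = 10G scope=both;
alter system set db_recovery_file_dest = 'C:\oracle\fra' scope=both;

-- Archiving can only be switched on while the database is mounted but not open
shutdown immediate
startup mount
alter database archivelog;
alter database open;

select log_mode from v$database;
```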

 

On a brand new installation, it is likely you will not have a pre-existing listener to choose from. (If you had a previous installation, or had run the Network Configuration Assistant already, then you would see a listener to use).

I’m creating one called LISTENER18. The default port is normally 1521, but I’ve opted for 1518 just to align it with the version I’m using.

image

 

I’m skipping Data Vault and Label Security, but hopefully you can now see why it’s cool to go with the “Advanced Configuration” – you get to see all the potential functionality areas of the database that you might want to explore.

image

 

Now you choose how much of your server/desktop/laptop you’re going to allow this database to grab. My machine has plenty of RAM, but it also has a stack of other database versions running on it to handle my AskTOM daily tasks. So I’ll keep this dude at around 6G.

image

 

Now we’ll flick across the other tabs on this screen to see if there is anything of note. The default for processes seems to have gone up in this version (I think it used to be around 300 in 12c) but in any event, that’s more than enough for me on this machine.

image

 

I have a simple rule for character sets – UTF all the way. Single byte character sets are soooo last century. We live in a global village, so you should be able to handle characters from all over the world!

image

 

And for the last tab, I’m going to opt for the sample schemas, so that when my database is created I’m not just left with an empty database shell. I want some sample data there so I can jump straight in and start experimenting with this release.

image

 

If I’m a DBA, I might go for some EM management to see how that all works, but for now, I’m skipping that.

image

 

Being a sandbox for experimenting, I’m setting all the passwords to a common value. Naturally this will not be what you’ll be doing for your production databases!

image

 

So now I’m ready to go. I always tick the “Generate Scripts” option because it lets me see what is actually occurring when the creation assistant is doing its job. Even so, the scripts are probably not what I would use to automate a database install, since the ‘dbca’ command has nice command line options nowadays, along with the ‘-silent’ option, so you can create an entire database with just a single command.

image

 

I finally get a summary of what is about to occur, and we’re off! Database creation is go for launch!

image

 

The total time to create your database will depend on your hardware, in particular how fast your storage is. I’ve done a few creations now using both flash storage and conventional hard drives, and unsurprisingly the flash storage is faster. You’re probably looking at around 10 minutes to complete.

image

 

When your database creation is complete, you’ll get the standard summary screen and you are ready to go.

image

 

And here it is – the finished product! My 18c database on Windows is ready to go. By default, the underlying Windows service will have a Start status of “Automatic” which means your database will start every time Windows starts. If you are short on memory, or do not want the database started unless you explicitly want it to, you can set this to Manual via “services.msc”

image
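
Once you can connect, a handful of queries will confirm that the choices made in the assistant actually landed. This is just a sketch of a first look around, run as a privileged user in the root container; the values you see will depend on what you picked.

```sql
-- Which release am I on?
select banner from v$version;

-- Database name, whether it is a container database, and the archiving mode
select name, cdb, log_mode from v$database;

-- The pluggable databases inside the container
select con_id, name, open_mode from v$pdbs;

-- The character sets chosen on the "Character sets" tab
select parameter, value
from   nls_database_parameters
where  parameter in ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');

-- Memory and process settings from the sizing screens
select name, value
from   v$parameter
where  name in ('processes', 'sga_target', 'pga_aggregate_target', 'memory_target');
```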

 

You can watch the video version of this installation here

Enjoy your 18c Windows database !

18c Database installation on Windows

If you’re a Windows enterprise, or you want to run your 18c database on your Windows laptop/desktop for research and education, then there has been some good news this week. The software is now available to you on the OTN page. Here’s a walk through of the software installation process.

Head to the standard database downloads page

http://www.oracle.com/technetwork/database/enterprise-edition/downloads/index.html

Accept the license agreement and choose the Windows version to download

image

Note – if you want to see all of the various Windows 18c components (grid, client, examples, etc), you can get that here

Once you have downloaded the software, note that when you unzip it, you are unzipping it directly into place, not into a staging area.  So unzip to a folder that you intend to be your ORACLE_HOME location.

image

Once the unzip has been completed, in the base directory where you unzipped the files, there will be a setup.exe file.  Double click on that to launch the installer

image

The familiar java based software installer will appear. For this blog post, I opted to solely go with software configuration – I’ll cover database creation in a separate post.

image

For my use, I’m just using a home laptop, so single instance for me. If you do want RAC, you’ll be needing to download more components anyway (eg Grid)

image

You’ll then get the standard pre-installation checks on your machine. For reference, the machine I’m installing the software on is a 32G RAM machine on Windows 10, and I did not get any warnings.

image

I nominated my existing ORACLE_BASE location as the target for this ORACLE_HOME as well

image

and went with the default option of using a virtual Windows account to own the software:

image

Note: The first time I did this install, I went with the Windows Built-In account because I had an old 11g database installation under the same ORACLE_BASE, and I thought at least it would be consistent. Whilst the installation and subsequent use of 18c worked fine, and my 11g instance was fine, it totally hosed my 12c installation, which was also under the same ORACLE_BASE. The 12c installation had been done with the default virtual account, and it appears the subsequent addition of 18c using the built-in Windows SYSTEM account reset the permissions on the critical diagnostic directory paths. From that point on, my 12c installation could not start and got “permission denied” errors when trying to access various destinations.

So my advice would be – adopt a consistent approach for any software under a common ORACLE_BASE.

Next you choose the edition you want to install. Enterprise for me naturally :-)

image

And then you are ready to install. Just hit the Install button, sit back and relax.

image

You will see the familiar progress dialog box, and like all progress boxes from any vendor, the percentage complete will typically have no true bearing on how long the process will take :-) For the record, my installation took around 8-10 minutes at this stage, most of it configuring the central inventory.

image

All things going well, you finally get the confirmation screen, and voila! Your 18c database software is installed!

image

Here’s an (accelerated) video showing the above steps as they were performed on my machine.

Enjoy 18c on Windows!

Take care with regular expressions

In an Office Hours session a couple of months back, I covered an important change that comes to regular expressions once you upgrade to 12c Release 2. You can see the video covering the issue here:

but for the TL;DR brigade reading this post: Regular expressions are not deterministic when you take NLS settings into account and thus cannot be used in constraints and/or function-based indexes.

This is just a post to quickly revisit the topic for anyone thinking of upgrading from an earlier release to 12c Release 2. An AskTOM question came in asking what would happen to such constraints during the upgrade process.

The sad answer is … nothing. For example, if you successfully (and by strict definition, incorrectly) created a constraint with a regular expression in 11g, then after the upgrade, that constraint will still be present in your new 12c Release 2 system. It will continue to work as it did in 11g, and even if you disable/enable it, or put it through a validate command to exercise the data, it will work as it did before.

To be honest, I do not like this, because it can become what I call a “sleeper” problem. If, 6 months after you upgrade, you happen to drop and recreate that constraint you’ll be most distressed to find that it cannot be added, and you will have most probably long forgotten that it was caused by an event that occurred 6 months ago, namely the upgrade. And perhaps worse, you now have an index or constraint that could potentially be corrupted by innocent tinkering with session NLS settings.

So before you upgrade, definitely take a cursory glance through your constraint definitions and take remedial action if needed.
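
Here is one way that cursory glance could be scripted. It is a sketch only: it looks in the current schema, it simply searches for the text REGEXP_ in check constraints and function-based index expressions, and it relies on PL/SQL being able to fetch the LONG columns in those dictionary views as strings (which will fail for an expression longer than PL/SQL can hold).

```sql
set serveroutput on

begin
  -- Check constraints: SEARCH_CONDITION is a LONG, so it cannot be filtered
  -- with LIKE in plain SQL, but it can be examined once fetched into PL/SQL
  for c in ( select table_name, constraint_name, search_condition
             from   user_constraints
             where  constraint_type = 'C' )
  loop
    if instr(upper(c.search_condition), 'REGEXP_') > 0 then
      dbms_output.put_line('Constraint: '||c.table_name||'.'||c.constraint_name);
    end if;
  end loop;

  -- Function-based indexes: COLUMN_EXPRESSION is also a LONG
  for i in ( select table_name, index_name, column_expression
             from   user_ind_expressions )
  loop
    if instr(upper(i.column_expression), 'REGEXP_') > 0 then
      dbms_output.put_line('Index:      '||i.table_name||'.'||i.index_name);
    end if;
  end loop;
end;
/
```

Anything this reports is a candidate for review before the upgrade, while you still remember why you are looking.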

Gooey GUIDs

Do a quick Google search and you’ll find plenty of blog posts about why GUIDs are superior to integers for a unique identifier, and of course, an equal number of posts about why integers are superior to GUIDs. In the Oracle world, most people have been using sequence numbers since they were pretty much the only option available to us in earlier versions. But developers coming from other platforms often prefer GUIDs simply due to their familiarity with them.

I’m pretty much ambivalent when it comes to which one to use. In fact, a good example is the AskTOM database – which had exclusively sequence-based primary keys on inception, but as the database has evolved and developers of different backgrounds and ages have worked on it, there is now a mix of strategies. The older tables have sequence-based primary keys, and many of the newer tables have GUIDs as primary keys. Don’t get me wrong – I’m not advocating that you should have a mix – for a given database schema I’d recommend picking one regime and sticking with it. But my point is that even with the mix of approaches in the AskTOM schema, I’ve never encountered any problems or issues with either.
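
For anyone who has not seen the two regimes side by side, this is roughly what they look like. The table and sequence names are made up for illustration (this is not the AskTOM schema), and using a sequence as a column default assumes 12c or later.

```sql
-- Hypothetical demo objects
create sequence demo_seq;

-- Regime 1: sequence-based primary key
create table customers_seq
( customer_id number default demo_seq.nextval primary key,
  name        varchar2(100)
);

-- Regime 2: GUID-based primary key, stored as RAW(16)
create table customers_guid
( customer_id raw(16) default sys_guid() primary key,
  name        varchar2(100)
);

-- In both cases the application never needs to supply the key
insert into customers_seq  (name) values ('Customer One');
insert into customers_guid (name) values ('Customer One');
commit;
```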

However, there is one use case where I would strongly recommend using some caution on the use of GUIDs, and that is in the arena of systems that load data in bulk (eg data warehouses).

GUIDs are not cheap to generate. A quick look at the history of the structure and generation of unique IDs at https://en.wikipedia.org/wiki/Universally_unique_identifier shows that the approaches all come down to a common component – the need to generate a good random number, and “good” can be a struggle for computers because you need algorithms that ensure sufficient randomness and distribution of the generated numbers. That takes CPU cycles, and whilst that is something you will never notice when using a GUID for the 100 customers in your customer table, it definitely will be noticeable if you are going to attach a GUID to every one of your 10 million daily sales transactions, or telephone records, or similar.

Here’s a simple example where I’ll load 100 million rows into a table.  First I’ll try the conventional (when it comes to Oracle) approach of using a sequence number to uniquely identify each row.



SQL> create sequence seq cache 50000;

Sequence created.

SQL>
SQL> create table t1
  2  ( pk int,
  3    data1 int,
  4    data2 varchar2(10),
  5    data3 int,
  6    data4 varchar2(10)
  7  )
  8  tablespace demo
  9  /

Table created.


SQL>
SQL> set timing on
SQL> insert /*+ APPEND */ into t1
  2  select seq.nextval, int_val, char_val, int_val, char_val
  3  from
  4   ( select level int_val, to_char(level) char_val from dual connect by level <= 10000 ),
  5   ( select 1 from dual connect by level <= 10000 );

100000000 rows created.

Elapsed: 00:03:31.42
SQL>
SQL> commit;

Commit complete.

Elapsed: 00:00:00.01
SQL>

3 minutes 30 seconds for 100 million rows is pretty good performance for a laptop, although obviously the table structure here is very simple.

And now I’ll repeat the exercise with the same table structure, but using a raw column to hold the output of a call to SYS_GUID().



SQL>
SQL>
SQL> create table t2
  2  ( pk raw(20),
  3    data1 int,
  4    data2 varchar2(10),
  5    data3 int,
  6    data4 varchar2(10)
  7  )
  8  tablespace demo
  9  /

Table created.

SQL>
SQL> set timing on
SQL> insert /*+ APPEND */ into t2
  2  select sys_guid(), int_val, char_val, int_val, char_val
  3  from
  4   ( select level int_val, to_char(level) char_val from dual connect by level <= 10000 ),
  5   ( select 1 from dual connect by level <= 10000 );

100000000 rows created.

Elapsed: 00:30:56.78
SQL> commit;

Commit complete.

Elapsed: 00:00:00.03

That’s right – we’ve blown out to 30 minutes. As you can see, there can be a large cost when the row volumes (and hence number of calls to generate a GUID) get large. We can even take the INSERT out of the equation here, and simply do a raw stress test to see how many GUIDs we can call from the SQL engine using the following test harness.



SQL> create table t ( sz int, dur interval day to second );

Table created.

SQL>
SQL>
SQL> declare
  2    ts_start timestamp;
  3    ts_end   timestamp;
  4    iter int;
  5    dummy raw(32);
  6  begin
  7   for i in 1 .. 8 loop
  8    iter := power(10,i);
  9
 10    ts_start := systimestamp;
 11    if iter <= 10000 then
 12       select max(x) into dummy from
 13       (
 14       select sys_guid() x from
 15       ( select 1 from dual connect by level <= iter )
 16       );
 17    else
 18       select max(x) into dummy from
 19       (
 20       select sys_guid() x from
 21       ( select 1 from dual connect by level <= iter/10000 ),
 22       ( select 1 from dual connect by level <= 10000 )
 23       );
 24    end if;
 25
 26    ts_end := systimestamp;
 27    insert into t values (iter, ts_end - ts_start );
 28    commit;
 29
 30   end loop;
 31  end;
 32  /

PL/SQL procedure successfully completed.

SQL>
SQL> select * from t order by 1;

        SZ DUR
---------- ----------------------------------------------------------
        10 +00 00:00:00.000000
       100 +00 00:00:00.000000
      1000 +00 00:00:00.015000
     10000 +00 00:00:00.172000
    100000 +00 00:00:01.607000
   1000000 +00 00:00:16.083000
  10000000 +00 00:02:49.713000
 100000000 +00 00:26:46.570000
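
To turn those durations into a throughput figure, a query like the sketch below converts the interval to seconds and divides. NULLIF is there because the two smallest runs were too quick to register a duration.

```sql
-- GUIDs generated per second for each run in the harness table T
select sz,
       dur,
       round( sz / nullif( extract(day    from dur) * 86400
                         + extract(hour   from dur) * 3600
                         + extract(minute from dur) * 60
                         + extract(second from dur), 0 ) ) as guids_per_sec
from   t
order  by sz;
```

From the timings above, the larger runs all work out to somewhere in the region of 60,000 GUIDs per second on this machine (100 million in a little under 27 minutes), so the cost per call stays roughly flat and the total time grows in step with the row count.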
 
 

I’m not trying to scare you off GUIDs – but like any functionality or feature you’re using to build applications, make sure you test it against the business requirements you need to satisfy, and make an informed decision on how best to use (or not use) it.

Let’s talk about Techiquette

I’m typing this at about 38,000 feet in the air. I’m travelling home from the Oracle Latin America tour – it was a great event, but I’ll save the details for another blog post. In reality, I should be sleeping, but as luck would have it one of the lights on our plane is malfunctioning. So whilst the rest of the plane is sleeping happily in restful darkness, my row and the rows around it are bathed in blazing artificial light :-(

image

Ah…the joys of travel. Anyway, I digress.

Before I boarded the flight, I did what most people do in the airport. I signed my life away to a deluge of advertising emails as one does in order to get 30 precious minutes of free wi-fi. I checked some emails, and had a glance through twitter when I came upon this one.

image

As you might expect, there’s plenty of social media activity about this tweet, much of it about the contradiction between the language and what you might expect to come out of the Office of the President of the United States. I’m not going to enter into that debate, or get into political discussion. Let’s face it – Donald Trump strikes me as more a Sybase man (Hey don’t flame me Sybase readers…these are just jokes ok?)

But the tweet made me think about the kind of language we sometimes see on AskTOM and on technical forums in general.

So let’s talk about etiquette on technical forums, or “Techiquette” as I’ve coined it.

We’re actually pretty lucky in technical circles with our discussion forums because most of the conversation revolves around topics that can be backed up with cold hard facts. In those instances, even robust discussion stays civil because it’s pretty hard to disagree with solid evidence when it’s presented. A hypothetical example might be something like:

Forum Participant #1: “I need to have a table with 500 columns”
Forum Participant #2: “You cannot – Oracle is limited to 255 columns”
Forum Participant #3: “I disagree – here is an example DDL demonstrating 800 columns”

Participant #2 can’t really dispute the response, because the example is right there in black and white.

But here is where I think conversations can go off the rails and slide downhill into abuse and insult. It is when the content is both technical and emotional, for example:

Forum Participant #1: “I love using RMAN – it made our backup processes much easier”
Forum Participant #2: “RMAN sucks – it is too complex, anyone using it obviously has no clue about anything”
Forum Participant #1: “Dumb ass”

and away we go…The conversation goes from friendly to flame war.

The problem is – I don’t want my forums to be devoid of emotion, even when the topic is technical. Ever been to a presentation where the presenter does not show any joy or enthusiasm for the material? It’s like spending 45 minutes having root canal work on your teeth. We want people to be passionate about the things that interest them.

So how do we avoid forums descending into abuse whilst still encouraging emotion, and hence potentially emotionally charged content?

Here are my ground rules:

  • Expressing an affirmative or agreeing position? Positive emotions 100% allowed and encouraged.
  • Expressing a negative or disagreeing position? Express it as dispassionately as possible. Stick solely to the facts.

Simple mathematics tells us that if we can stick to the above, the average “positivity index” in a tech conversation will always be greater than or equal to zero.

I’m not claiming to be an angel here – I don’t always manage to follow my own guidelines as well as I should. But I’m always striving to improve, and hopefully if we can all follow them to the best of our ability, then in the main, we’ll have much more civil and interesting community conversations as a result.

Building community via the speaker community

I recently participated in the Oracle Developer Community Yatra tour throughout India. It is a hectic event with 7 cities covered in a mere 9 days, so you can imagine how frantic the pace was. A typical day would be:

  • 7am – breakfast
  • 8am – check out of hotel and leave for the venue
  • 9am – speak all day, host Q&A
  • 6pm – leave straight from venue to the airport
  • 8pm – dinner at airport, and fly to next city
  • 1am – get to next hotel, grab some sleep before doing it all again in 6 hours’ time

Yet as a speaker in this event, it never felt that the logistics of the event were out of control.  This is mainly due to the incredible work of the people in the AIOUG (All India Oracle User Group) coordinated by Sai Penumuru. The smoothness of the organization prompted me to blog about how user groups could follow the AIOUG lead in terms of running successful events.

Every time I do an event in India, not only do I come away with a stronger network and connection with the attendees, I also gain new and strong friendships with the speakers and this is in no small part due to organizational skills of Sai and the volunteers.

So from that reference point – namely, the smart way in which Sai and the user group foster a great feeling of community amongst the speakers, I thought I’d share the mechanisms via which user groups can organize events that make speakers feel welcome and keen to return.

Common accommodation

When I travel to India, Sai provides a recommended hotel for all speakers. Obviously, no speaker is compelled to stay there, but we all know that the recommendation Sai makes has taken into account:

  • facilities the hotel provides,
  • hotel price to find a balance for speakers either having company funding or funding themselves,
  • transport time to/from the conference venue.

So most of us will always use his recommendation, and it makes the decision making process easy.  But most of all, it is a catalyst for speakers to meet in a casual environment outside the conference hours, and build relationships.

Common transport

The AIOUG organizes transport to and from the venue, and from the airport as well. When this is not possible, they will provide a recommended means of transport (such as Uber) so that speakers never run the risk of picking an unreliable or unsafe option. For multi-city events such as Yatra, the AIOUG also recommends a common itinerary for flights, once again making the entire planning process much easier for speakers.

A communication mechanism

Before a conference, AIOUG sets up a WhatsApp group containing all of the speakers and the key representatives from the user group. In this way, all communication is sent in a consistent fashion.  We don’t have to jump between email, Twitter, etc. to see if any correspondence has been missed. This group also helps build the relationship between speakers and user group. Humorous stories and pictures can be shared, but most importantly, if there is an issue or problem – everyone is aware of it immediately, making resolution fast and effective.

And perhaps most importantly, it helps accentuate the human element – messages are sent as people land or takeoff, and when people arrive at the hotel.  Organizers regularly send messages making sure no-one is having any difficulties.  All these things make the speakers feel more comfortable before, during and after the event.

A typical day

Perhaps the most valuable piece of information that is conveyed by the user group is ‘local knowledge’. For example, each evening a WhatsApp message would be sent out detailing:

  • hotel pickup time,
  • expected transit time,
  • who to contact/what to do when arriving at venue,
  • what identification may be required on site, and so on.

So even though it may be a first visit to a city for the speakers, there is a degree of familiarity with what is expected to happen, and hence knowledge of whether things are departing from the norm.

Onsite assistance

The most stressful 5 minutes for any speaker is the time they are setting up for their talk. Projector problems, internet connectivity, schedule changes etc. are all things that can sabotage a good talk for a speaker. The AIOUG always has someone visit the room in that critical 5 minutes, so a speaker does not have to go hunting for technical assistance.

 

In summary, as you can see, none of these things are particularly arduous to do, but the benefits are huge for speakers.  We feel comfortable and welcome, which means a much better chance of a successful talk, and makes us much more likely to want to return.

So if you are reading this, and are part of a user group committee, please consider some of the tips above for your local events. If your speakers have a good experience, they’ll be much more keen to offer continued support for your events.

Humble pie made with NULL strings

I was helping out a client a while back with an issue where a panicked email came into my inbox along the lines of “SELECT IS BROKEN IN ORACLE!!!”, which seemed perhaps a little extreme. So I pursued it further asking for some concrete details, and I must concede it had me a little bamboozled for a while. I’ve simplified the example to keep it easy to digest, but the premise is the same.

My colleague had a table with a couple of VARCHAR2 columns:

image

and the data inside the two columns for the single row was identical:

image

The panicky email was sent because a simple query to check that the two columns were identical was not working as expected:

image

At this point, I was pretty relaxed about the situation because this is a “problem” I have seen many many times over the years, especially as people come to Oracle from other database management systems. So I responded to the email:

“Yeah…I know what the issue is. You have trailing spaces in one of the columns. Try RTRIM”

and filed the original email in that special folder where emails never resurface.

image

But as quickly as I had hit “Send”, a reply came back with “That doesn’t work either”. Working on the assumption that my email had not been explicit enough (and perhaps I was being a little smug about it) I logged in to the system to run a query on the same table, and lo and behold – my smugness was wiped off my face.

image

So then I tried a variation on a theme and concatenated a known value to the end of each column to see if I was missing anything:

image

So everything looks OK but obviously something is awry here. When you have doubts on the data, a good option is to use the DUMP function to see exactly what bytes are stored:

image

and the problem is revealed.  The second column in the table had a trailing ASCII code zero after the word SUCCESS. This can be a nasty (and common) issue when developers are using C or similar languages to store data in an Oracle database. In many languages, ASCII zero is used to terminate an arbitrary length string, and hence it can easily end up being incorrectly stored along with the string bytes in a VARCHAR2 column especially if you are not using the delivered string handling functions in the pre-compiler layer (such as Pro*C). This can also be due to a confusion between the concept of NULL (two L’s) in the database, and the concept of the NUL (one L) character in a programming language.
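For readers who want to reproduce the symptom, here is a minimal sketch. The table name NUL_DEMO and its column names are made up for illustration, and the CHR(0) is appended directly in SQL rather than arriving from a C program, so this only mimics the situation described above.

```sql
-- Hypothetical table and data, for illustration only
create table nul_demo ( c1 varchar2(20), c2 varchar2(20) );

insert into nul_demo values ( 'SUCCESS', 'SUCCESS' || chr(0) );

-- Returns no rows: the values differ by the trailing CHR(0)
select * from nul_demo where c1 = c2;

-- Still no rows: RTRIM without a second argument removes only trailing spaces
select * from nul_demo where rtrim(c1) = rtrim(c2);

-- DUMP shows the stored bytes; C2 carries an extra trailing byte of 0
select dump(c1) as d1, dump(c2) as d2 from nul_demo;

-- Trimming CHR(0) explicitly lets the comparison succeed
select * from nul_demo where c1 = rtrim(c2, chr(0));
```

A predicate such as instr(c2, chr(0)) > 0 is one way to hunt for affected rows in an existing table.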

Compounding the issue is that a casual glance at the data (as per the above screenshots from SQL Developer) typically suggests all is well – the NUL character is not observed. Things can get even worse – I’ve seen some GUI database tools interpret the NUL character as a true NULL and hence when a column contains only a single NUL character, those tools will report the column as being empty (i.e. NULL), which just makes for even more confusion. Thankfully SQL Developer does not do that, and the null indicator column can be used to see the difference between the two:

image

So I sent a terse email back to the developer reminding them about NUL versus NULL and that if they are using the pre-compiler correctly, then everything will take care of itself. But I must admit, I had to chuckle when they replied with a link to an AskTOM question I answered recently describing the following:

We made the same mistake ourselves! Albeit in a very specific circumstance. Look what happens when you force a STOP command onto a running scheduler job:



SQL> create table t ( x timestamp, y int );

Table created.

SQL> create or replace
  2  procedure myproc is
  3  begin
  4    for i in 1 .. 20
  5    loop
  6      insert into t values (systimestamp,i);commit;
  7      dbms_lock.sleep(2);
  8    end loop;
  9  end;
 10  /

Procedure created.

SQL>
SQL> begin
  2    dbms_scheduler.create_job (
  3      job_name        => 'myjob',
  4      job_type        => 'plsql_block',
  5      job_action      => 'begin myproc; end;',
  6      start_date      => systimestamp,
  7      repeat_interval => 'freq=minutely; bysecond=0;',
  8      enabled         => true);
  9  end;
 10  /

PL/SQL procedure successfully completed.
--
-- waited for 1 successful execution, and then stop the second one in flight
--
SQL> exec dbms_scheduler.stop_job('MYJOB',force=>true)

PL/SQL procedure successfully completed.

SQL> select status, session_id, dump(session_id)
  2  from USER_SCHEDULER_JOB_RUN_DETAILS;

STATUS
------------------------------
SESSION_ID
-------------------------------------------------------------
DUMP(SESSION_ID)
-------------------------------------------------------------
SUCCEEDED
984,53037
Typ=1 Len=9: 57,56,52,44,53,51,48,51,55

STOPPED
983,28542
Typ=1 Len=10: 57,56,51,44,50,56,53,52,50,0  <<=== whoops!

I’ve logged this as a bug, but in the interim, if you need the SESSION_ID from the scheduler views, you might want to wrap the column in RTRIM(SESSION_ID, chr(0)) in your query.
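Written out in full against the same view, that workaround might look like the following sketch (the column alias is just a convenience):

```sql
-- Strip any trailing NUL (ASCII 0) characters from SESSION_ID
select status,
       rtrim(session_id, chr(0)) as session_id
from   user_scheduler_job_run_details;
```

Rows without a trailing NUL are unaffected, because RTRIM only removes the characters listed in its second argument.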

Hoist by my own NULL petard.

Hyper-partitioned index avoidance thingamajig

As you can tell, I have no idea on a name for what I am about to describe. So let me start from the beginning, and set the scene for an idea I have to utilize a cool new 18c feature.

Often in a transactional-style system the busiest table (let us call it SALES for the sake of this discussion) is also

  • the biggest table, after all, it has all of our sales in it,
  • the most in-demand table, in that almost every query in our application wants to access it in some way, shape or form.

This is in effect the database version of the Pareto Principle. Everyone wants a slice of that SALES “pie”, and the piece of that pie that is in most demand is typically the most recent data. Your application may have pages that will be showing:

  • sales in the past hour,
  • sales in the past day,
  • products sold in the past hour,
  • largest purchase amount for sales in the past few hours,
  • suspicious activity for today,
  • a customer’s sales for the last few days

The list goes on and on, the common theme being that the data being requested is bound by a range of time in the recent past. Our SALES table will already have a primary key, probably some sort of unique transaction ID for each purchase, but to satisfy the style of requests in the list above, we will probably need:

  • an index on the time(stamp) of the sale,
  • an index on the product ID that was sold,
  • an index on the customer ID who made the purchase,
  • potentially even an index on the sale amount

You can see the troubled waters into which we are sailing here. It is our biggest and busiest table, and here we are, adding index after index after index to improve query performance, whilst at the same time:

  1. adding overhead/contention to DML operations on the SALES table,
  2. increasing the size of the database,
  3. increasing the duration of the backups,
  4. increasing time for maintenance and copies to Development and Test

None of this is looking great but we might be thinking “What choice do we have?”

Here is perhaps an alternate strategy: Let’s not create any indexes.  The challenges (1) through (4) above evaporate to non-existence. But of course, now we have a remaining challenge in getting those queries to run efficiently.

I’ll tackle that in a different way – I will partition the SALES table very “aggressively”, hence my blog post title “hyper-partitioning”. Considering the typical query requirements I listed above, I will partition my SALES table down to as small as a partition for every hour.


SQL> create table sales
  2    ( ts   timestamp,
  3      id   int,
  4      amt  number,
  5      product int,
  6      customer int,
  7      item_cnt int,
  8      terminal int,
  9      operator int,
 10      credit_card int,
         ...
         ...
 17    )
 18  partition by range ( ts )
 19  interval ( numtodsinterval(1,'HOUR') )
 20  (
 21    partition p1 values less than ( timestamp '2018-07-01 00:00:00' )
 22  );

Table created.

SQL>
SQL> insert /*+ APPEND */ into sales (ts,id,amt,product,customer)
  2  select date '2018-07-01' + rownum / 240, rownum, dbms_random.value(1,100),
  3        dbms_random.value(1,100),dbms_random.value(1,100)
  4  from dual
  5  connect by level 
SQL> set serverout on
SQL> declare
  2    h varchar2(1000);
  3  begin
  4  for i in (
  5    select partition_name, high_value
  6    from user_tab_partitions
  7    where table_name = 'SALES'
  8    and   interval = 'YES'
  9    and   partition_name like 'SYS_P%'
 10    order by partition_position
 11  ) loop
 12    h := i.high_value;
 13    execute immediate 'select to_char('||h||'-0.01,''yyyymmdd_hh24'') from dual' into h;
 14    execute immediate 'alter table sales rename partition '||i.partition_name||' to p'||h;
 15  end loop;
 16  end;
 17  /

PL/SQL procedure successfully completed.

SQL>
SQL> col high_value format a60
SQL> select partition_name, high_value
  2  from user_tab_partitions
  3  where table_name = 'SALES'
  4  order by partition_position;

PARTITION_NAME                 HIGH_VALUE
------------------------------ ------------------------------------------------------------
P1                             TIMESTAMP' 2018-07-01 00:00:00'
P20180701_00                   TIMESTAMP' 2018-07-01 01:00:00'
P20180701_01                   TIMESTAMP' 2018-07-01 02:00:00'
P20180701_02                   TIMESTAMP' 2018-07-01 03:00:00'
P20180701_03                   TIMESTAMP' 2018-07-01 04:00:00'
P20180701_04                   TIMESTAMP' 2018-07-01 05:00:00'
P20180701_05                   TIMESTAMP' 2018-07-01 06:00:00'
P20180701_06                   TIMESTAMP' 2018-07-01 07:00:00'
P20180701_07                   TIMESTAMP' 2018-07-01 08:00:00'
P20180701_08                   TIMESTAMP' 2018-07-01 09:00:00'
P20180701_09                   TIMESTAMP' 2018-07-01 10:00:00'
P20180701_10                   TIMESTAMP' 2018-07-01 11:00:00'
P20180701_11                   TIMESTAMP' 2018-07-01 12:00:00'
P20180701_12                   TIMESTAMP' 2018-07-01 13:00:00'
P20180701_13                   TIMESTAMP' 2018-07-01 14:00:00'
P20180701_14                   TIMESTAMP' 2018-07-01 15:00:00'
P20180701_15                   TIMESTAMP' 2018-07-01 16:00:00'
P20180701_16                   TIMESTAMP' 2018-07-01 17:00:00'
P20180701_17                   TIMESTAMP' 2018-07-01 18:00:00'
P20180701_18                   TIMESTAMP' 2018-07-01 19:00:00'
P20180701_19                   TIMESTAMP' 2018-07-01 20:00:00'
P20180701_20                   TIMESTAMP' 2018-07-01 21:00:00'
P20180701_21                   TIMESTAMP' 2018-07-01 22:00:00'
P20180701_22                   TIMESTAMP' 2018-07-01 23:00:00'
P20180701_23                   TIMESTAMP' 2018-07-02 00:00:00'
P20180702_00                   TIMESTAMP' 2018-07-02 01:00:00'
P20180702_01                   TIMESTAMP' 2018-07-02 02:00:00'
P20180702_02                   TIMESTAMP' 2018-07-02 03:00:00'
P20180702_03                   TIMESTAMP' 2018-07-02 04:00:00'
P20180702_04                   TIMESTAMP' 2018-07-02 05:00:00'
P20180702_05                   TIMESTAMP' 2018-07-02 06:00:00'
P20180702_06                   TIMESTAMP' 2018-07-02 07:00:00'
P20180702_07                   TIMESTAMP' 2018-07-02 08:00:00'
P20180702_08                   TIMESTAMP' 2018-07-02 09:00:00'
P20180702_09                   TIMESTAMP' 2018-07-02 10:00:00'
P20180702_10                   TIMESTAMP' 2018-07-02 11:00:00'
P20180702_11                   TIMESTAMP' 2018-07-02 12:00:00'
P20180702_12                   TIMESTAMP' 2018-07-02 13:00:00'
P20180702_13                   TIMESTAMP' 2018-07-02 14:00:00'
P20180702_14                   TIMESTAMP' 2018-07-02 15:00:00'
P20180702_15                   TIMESTAMP' 2018-07-02 16:00:00'

41 rows selected.

I’ve run a small anonymous block to rename the (system-named) interval partitions into some sensible names to reflect the date range the partition covers. Let’s now look at the typical queries we will now be performing on the SALES table:


SQL> set autotrace traceonly explain
SQL> select * from sales
  2  where ts > timestamp '2018-07-02 15:00:00';

--------------------------------------------------------------------------------------------------
| Id  | Operation                | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |       |     9 |   387 |     7   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE ITERATOR|       |     9 |   387 |     7   (0)| 00:00:01 |    41 |1048575|
|*  2 |   TABLE ACCESS FULL      | SALES |     9 |   387 |     7   (0)| 00:00:01 |    41 |1048575|
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("TS">TIMESTAMP' 2018-07-02 15:00:00.000000000')

SQL> select * from sales
  2  where ts > timestamp '2018-07-02 15:00:00'
  3  and product = 12;

--------------------------------------------------------------------------------------------------
| Id  | Operation                | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |       |     1 |    43 |     7   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE ITERATOR|       |     1 |    43 |     7   (0)| 00:00:01 |    41 |1048575|
|*  2 |   TABLE ACCESS FULL      | SALES |     1 |    43 |     7   (0)| 00:00:01 |    41 |1048575|
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("PRODUCT"=12 AND "TS">TIMESTAMP' 2018-07-02 15:00:00.000000000')

SQL>
SQL> select * from sales
  2  where ts > timestamp '2018-07-02 15:00:00'
  3  and customer = 25;

--------------------------------------------------------------------------------------------------
| Id  | Operation                | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |       |     1 |    43 |     7   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE ITERATOR|       |     1 |    43 |     7   (0)| 00:00:01 |    41 |1048575|
|*  2 |   TABLE ACCESS FULL      | SALES |     1 |    43 |     7   (0)| 00:00:01 |    41 |1048575|
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("CUSTOMER"=25 AND "TS">TIMESTAMP' 2018-07-02 15:00:00.000000000')

SQL>
SQL> select max(amt) from sales
  2  where ts > timestamp '2018-07-02 15:00:00';

---------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |       |     1 |    33 |     7   (0)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE           |       |     1 |    33 |            |          |       |       |
|   2 |   PARTITION RANGE ITERATOR|       |     9 |   297 |     7   (0)| 00:00:01 |    41 |1048575|
|*  3 |    TABLE ACCESS FULL      | SALES |     9 |   297 |     7   (0)| 00:00:01 |    41 |1048575|
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("TS">TIMESTAMP' 2018-07-02 15:00:00.000000000')

SQL> set autotrace off

All of them scan a tiny portion of the data, namely just the hours of sales data relevant to the query. Query response times will also be relatively consistent no matter which customer, product or other predicate is passed, because the data to be scanned is a fixed number of hours.
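In a live application the time boundary would normally be relative to “now” instead of a literal. Here is a minimal sketch of the “sales in the past hour” case. It assumes the TS column is populated in the session’s local time, so that LOCALTIMESTAMP (which returns a plain TIMESTAMP, matching the column datatype) is a sensible reference point:

```sql
-- Sales in the past hour; the predicate is still on the partition key TS
select *
from   sales
where  ts > localtimestamp - interval '1' hour;
```

Because the boundary is only known at execution time, pruning of this kind is typically reported as KEY in the Pstart/Pstop columns of the plan instead of as a partition number.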

But there’s a problem here. If I am partitioning to the hour, or even to the minute…then it won’t be long before I have a lot of partitions. In the latter case (minutes) I will be up to over 500,000 partitions in the first year of SALES alone! That is a lot of database metadata to store. There are the partitions themselves, plus optimizer statistics on them, plus historical optimizer statistics, plus potentially histograms on every column.  A lot of optimizer data might lead to expensive parse times because there is just so much information to wade through when optimizing queries.
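To put numbers on that, the arithmetic can be run straight from DUAL (a 365-day year is assumed), and a simple count against USER_TAB_PARTITIONS shows how many partitions the table has accumulated so far. For context, the 1048575 that appears as Pstop in the plans above is the maximum number of partitions a table can have, so minute-level partitions would reach that ceiling in roughly two years.

```sql
-- Partitions created per year at each granularity (365-day year assumed)
select 24 * 365      as hourly_partitions_per_year,   -- 8760
       24 * 60 * 365 as minute_partitions_per_year    -- 525600
from   dual;

-- Current partition count for the SALES table used in this post
select count(*) as partition_cnt
from   user_tab_partitions
where  table_name = 'SALES';
```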

But we only need the extreme granularity of partitions for the SALES table for today. Once today ticks over and becomes “yesterday”, then we might only need a partition for the entire day.  And once “yesterday” ticks over to “last week”, then maybe only weekly partitions are needed and so forth.

One of the cool things in 18c is the ability to do this style of maintenance with negligible disruption to service. Because I have named my partitions in a logical fashion, here is a simple routine to merge yesterday’s hourly partitions into a single one for the day.


SQL> set serverout on
SQL> declare
  2    d date := date '2018-07-01';
  3    ddl varchar2(4000);
  4  begin
  5    select listagg(partition_name||chr(10),',') within group ( order by partition_position )
  6    into   ddl
  7    from   user_tab_partitions
  8    where  table_name = 'SALES'
  9    and    partition_name like 'P'||to_char(d,'yyyymmdd')||'%';
 10
 11    ddl := 'alter table sales merge partitions '||ddl||' into partition p'||to_char(d,'yyyymmdd')||' online';
 12
 13    dbms_output.put_line(ddl);
 14    execute immediate ddl;
 15  end;
 16  /
alter table sales merge partitions
 P20180701_00
,P20180701_01
,P20180701_02
,P20180701_03
,P20180701_04
,P20180701_05
,P20180701_06
,P20180701_07
,P20180701_08
,P20180701_09
,P20180701_10
,P20180701_11
,P20180701_12
,P20180701_13
,P20180701_14
,P20180701_15
,P20180701_16
,P20180701_17
,P20180701_18
,P20180701_19
,P20180701_20
,P20180701_21
,P20180701_22
,P20180701_23
 into partition p20180701 online

PL/SQL procedure successfully completed.

Now I have a single partition for yesterday’s data, and hourly partitions for today’s data.


SQL> col high_value format a60
SQL> select partition_name, high_value
  2  from user_tab_partitions
  3  where table_name = 'SALES'
  4  order by partition_position;

PARTITION_NAME                 HIGH_VALUE
------------------------------ -------------------------------
P1                             TIMESTAMP' 2018-07-01 00:00:00'
P20180701                      TIMESTAMP' 2018-07-02 00:00:00'
P20180702_00                   TIMESTAMP' 2018-07-02 01:00:00'
P20180702_01                   TIMESTAMP' 2018-07-02 02:00:00'
P20180702_02                   TIMESTAMP' 2018-07-02 03:00:00'
P20180702_03                   TIMESTAMP' 2018-07-02 04:00:00'
P20180702_04                   TIMESTAMP' 2018-07-02 05:00:00'
P20180702_05                   TIMESTAMP' 2018-07-02 06:00:00'
P20180702_06                   TIMESTAMP' 2018-07-02 07:00:00'
P20180702_07                   TIMESTAMP' 2018-07-02 08:00:00'
P20180702_08                   TIMESTAMP' 2018-07-02 09:00:00'
P20180702_09                   TIMESTAMP' 2018-07-02 10:00:00'
P20180702_10                   TIMESTAMP' 2018-07-02 11:00:00'
P20180702_11                   TIMESTAMP' 2018-07-02 12:00:00'
P20180702_12                   TIMESTAMP' 2018-07-02 13:00:00'
P20180702_13                   TIMESTAMP' 2018-07-02 14:00:00'
P20180702_14                   TIMESTAMP' 2018-07-02 15:00:00'
P20180702_15                   TIMESTAMP' 2018-07-02 16:00:00'

18 rows selected.
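The same approach should extend to the next level of coarsening mentioned earlier, where a week of daily partitions is rolled up into one. This is a sketch only. It assumes a week’s worth of daily partitions named P20180701 through P20180707 already exists (only P20180701 exists in this demo), and the target name P20180701_WEEK is invented here. The range form of MERGE PARTITIONS avoids having to list each partition:

```sql
-- Hypothetical: P20180702 .. P20180707 do not exist in the demo above,
-- and P20180701_WEEK is an invented name for the merged partition
alter table sales
  merge partitions p20180701 to p20180707
  into partition p20180701_week
  online;
```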

I stress – this is not my recommendation to race out and partition every transactional table you have, and drop all of the indexes. But with so many online partitioning operations in 18c, it raises some exciting new opportunities that were not available in previous releases. So start thinking about how you can exploit this to get advantages with the partitioning option that might sit “outside the box” of the standard usage.