From file names to directory hierarchy

I had a fun request come in from a colleague the other day.  They had a simple list of fully qualified file names and they needed to present that data in the familiar hierarchical tree layout. 

To demonstrate, I took a little trip down memory lane and grabbed a subset of presentations I’ve done over the years.


SQL> create table t ( fname varchar2(1000));

Table created.

SQL>
SQL> insert into t values ('C:\Users\Connor\Presentations\2002\scene_200205.pdf');

1 row created.

SQL> insert into t values ('C:\Users\Connor\Presentations\2002\scene_200205a.pdf');

1 row created.

SQL> insert into t values ('C:\Users\Connor\Presentations\2002\254old.ppt');

1 row created.

SQL> insert into t values ('C:\Users\Connor\Presentations\2002\254_full.ppt');

1 row created.

SQL> insert into t values ('C:\Users\Connor\Presentations\2002\257.ppt');

1 row created.
...
...
...

SQL> insert into t values ('C:\Users\Connor\Presentations\2018\london\optimizer.pptx');

1 row created.

SQL> insert into t values ('C:\Users\Connor\Presentations\2018\london\optimizer_full.pptx');

1 row created.

SQL> insert into t values ('C:\Users\Connor\Presentations\2018\odtug\common_disasters_short.pptx');

1 row created.

SQL> insert into t values ('C:\Users\Connor\Presentations\2018\odtug\common_disasters_short_comp.pptx');

1 row created.

SQL> select count(*) from t;

  COUNT(*)
----------
       634

1 row selected.

So the challenge was to present this list of files in a nice tree format which mirrored my directory structure.

(Truth be told, my preference would be that the query would do a better job than I do in trying to organize my files, but that is a blog post for another day.)

Step 1 was to use a little LATERAL trickery to parse out all of the elements of the file names into one row per “element”, where an “element” is either an unqualified file name or a single-level folder name.


SQL> select id, seq, token
  2  from
  3    ( select rownum id, fname from t ) t_base,
  4    lateral(select level seq, rtrim(regexp_substr(t_base.fname || '\', '(.*?)\\', 1, level),'\') token
  5            from dual
  6            connect by level <= 1+length(fname)-length(replace(fname,'\'))
  7            );

        ID        SEQ TOKEN
---------- ---------- ------------------------------------------------------------------------------------
         1          1 C:
         1          2 Users
         1          3 Connor
         1          4 Presentations
         1          5 2002
         1          6 scene_200205.pdf
         2          1 C:
         2          2 Users
         2          3 Connor
         2          4 Presentations
         2          5 2002
         2          6 scene_200205a.pdf
         3          1 C:
         3          2 Users
         3          3 Connor
         3          4 Presentations
         3          5 2002
         3          6 254old.ppt
         4          1 C:
         4          2 Users
         4          3 Connor
         4          4 Presentations
         4          5 2002
         4          6 254_full.ppt
         ...
         ...

We can use that as input to a standard LAG function to associate each file/folder with its parent, using the SEQ column to provide the sequencing within each ID.


SQL> with data as
  2  (
  3  select id, seq, token
  4  from
  5    ( select rownum id, fname from t ) t_base,
  6    lateral(select level seq, rtrim(regexp_substr(t_base.fname || '\', '(.*?)\\', 1, level),'\') token
  7            from dual
  8            connect by level <= 1+length(fname)-length(replace(fname,'\'))
  9            )
 10  )
 11  select
 12      id,
 13      seq,
 14      token,
 15      lag(token) over ( partition by id order by seq ) as parent
 16  from data
 17  order by id, seq desc;

        ID        SEQ TOKEN                                              PARENT
---------- ---------- -------------------------------------------------- -----------------------------------
         1          6 scene_200205.pdf                                   2002
         1          5 2002                                               Presentations
         1          4 Presentations                                      Connor
         1          3 Connor                                             Users
         1          2 Users                                              C:
         1          1 C:
         2          6 scene_200205a.pdf                                  2002
         2          5 2002                                               Presentations
         2          4 Presentations                                      Connor
         2          3 Connor                                             Users
         2          2 Users                                              C:
         2          1 C:
         3          6 254old.ppt                                         2002
         3          5 2002                                               Presentations
         3          4 Presentations                                      Connor
         3          3 Connor                                             Users
         3          2 Users                                              C:
        ...
        ...

That’s looking good, but we have duplicates, in the sense that multiple files will roll up to a single folder (e.g., the “C:” root is repeated). So we DISTINCT it out to leave just the relationships we need.


SQL> with data as
  2  (
  3  select id, seq, token
  4  from
  5    ( select rownum id, fname from t ) t_base,
  6    lateral(select level seq, rtrim(regexp_substr(t_base.fname || '\', '(.*?)\\', 1, level),'\') token
  7            from dual
  8            connect by level <= 1+length(fname)-length(replace(fname,'\'))
  9            )
 10  )
 11  select
 12    distinct
 13      token,
 14      lag(token) over ( partition by id order by seq ) as parent
 15  from data
 16  /

TOKEN                                              PARENT
-------------------------------------------------- --------------------------------------------------
C:
254old.ppt                                         2002
257.ppt                                            2002
plus2.ppt                                          2002
254.ppt                                            UKOUG2002_final
error_handling.pdf                                 2003
2004                                               Presentations
mcdonald1.pdf                                      2004
sun_storage_brief.ppt                              sun_storage_presn
ausoug05_memory_x.ppt                              2005
jul05_read_consistency.ppt                         2005
saoug_oracle_home.ppt                              2005
plsql_80_a58236.pdf                                sql_plsql_evo
jun07_tas_moving_data_short.ppt                    misc
ausoug_adel                                        2008
successful_development.pdf                         ausoug_adel
apr08_11g_developers.ppt                           ausoug_apr
apr08_11g_developers_print.ppt                     ausoug_apr
mcdonald_oct08_flashback.pdf                       perth
oct08_11g_developers.ppt                           template
nov08_read_consistency.pdf                         rwwa
perf101b_rwwa.pptx                                 rwwa
sqlectric_melb                                     2009
sql_electric2.pdf                                  sqlectric_melb
sql_electric1.pptx                                 sqlectric_melb
ukoug                                              2009
...
...
...

And we’re pretty much done. At this point, we have the raw data we need in a child/parent form that we can traverse with a standard hierarchical CONNECT BY query – I’ll use LPAD to show the hierarchy relationships.


SQL> with data as
  2  (
  3  select id, seq, token
  4  from
  5    ( select rownum id, fname from t ) t_base,
  6    lateral(select level seq, rtrim(regexp_substr(t_base.fname || '\', '(.*?)\\', 1, level),'\') token
  7            from dual
  8            connect by level <= 1+length(fname)-length(replace(fname,'\'))
  9            )
 10  ),
 11  hier as(
 12    select
 13      distinct
 14        token,
 15        lag(token) over ( partition by id order by seq ) as parent
 16    from data
 17  )
 18  select lpad(' ',level*2)||token hier_list
 19  from hier
 20  start with parent is null
 21  connect by prior token = parent
 22  order siblings by token;

HIER_LIST
--------------------------------------------------------------------------------------------------------
  C:
    Users
      Connor
        Presentations
          2002
            254_full.ppt
            254old.ppt
            257.ppt
            UKOUG2002_final
              254.ppt
              257.ppt
              prelim.ppt
            dec_9i_features2.ppt
            dec_legends.ppt
            dec_legends_prelim.ppt
            jul_sig.ppt
            jul_sig_printout.ppt
            june_sig.ppt
            logo.ppt
            march_sig.ppt
            march_sig2.ppt
            plaus.ppt
            plus2.ppt
            scene_200205.pdf
            scene_200205a.pdf
            sep_unix_sig.ppt
            sep_unix_sig2.ppt
          2003
            254.ppt
            abm_succesful_development.ppt
            apr03_perth.ppt
            dec03_perth_9i_nf.ppt
            dec03_perth_9i_nf_popquiz.pdf
            dec03_perth_9i_nf_popquiz.ppt
            error_handling.pdf
            error_handling.ppt
            ims_performance_examples.pdf
            ims_performance_examples.ppt
            sep03_perth_9i_hidden.pdf
            sep03_perth_9i_hidden.ppt
            sep03_perth_9i_nf.ppt
            sep03_perth_legends.pdf
            sep03_perth_legends.ppt
            sep03_perth_prelim.ppt
            sf_oow_03.pdf
            sf_oow_03.ppt
            sf_oow_03a.ppt
            slide.ppt
            succesful_development.pdf
            succesful_development.ppt
            tools_for_performance.pdf
            tools_for_performance.ppt
          2004
            10046_scene.pdf
            bind_var.pdf
            bind_var.ppt
            dec03_perth_book.ppt
            dec03_perth_book2.ppt
            generating_test_data.pdf
            generating_test_data.ppt
            mar04_hotsos_legends.ppt
            mar04_hotsos_nf_w2k.ppt
            mar04_hotsos_nf_w2k_edit.ppt
            mcdonald1.pdf
            mcdonald2.pdf
            nov04_reorgs_rebuild.ppt
            nov04_succesful_development.ppt
            
            ...
            ...
            ...
            
          2018
            apex_connect
              apex_connect_sql_plsql.pdf
              apex_connect_sql_plsql.pptx
              apex_connect_sql_plsql_v2.pptm
              apex_connect_sql_plsql_v2.pptx
              apex_connect_sql_plsql_v3.pptm
              apex_connect_sql_plsql_v3_subset.pptm
              extensions1.pptx
              extensions2.pptx
            code_china
              better_sql_oracle_code.pptx
              better_sql_oracle_code_v2.pdf
              better_sql_oracle_code_v2.pptx
            code_hyderabad
              better_sql_oracle_code.pptx
            hong_kong
              hong_kong_122.pdf
              hong_kong_122.pptx
              hong_kong_122_no_multi.pptx
              hong_kong_partitioning_ora_template.pptx
              hong_kong_partitioning_ora_template2.pdf
              hong_kong_partitioning_ora_template2.pptx
            london
              london.pdf
              london.pptx
              optimizer.pdf
              optimizer.pptx
              optimizer_full.pptx
            odtug
              common_disasters_short.pptx
              common_disasters_short_comp.pptx      

And there we have it. From flat file listing to a nice hierarchical layout just by tackling the problem piece by piece using the trusty WITH clause.
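
As a quick sanity check (my own addition, not part of the original exercise), you can reverse the process with SYS_CONNECT_BY_PATH to rebuild each full file name from the child/parent pairs and compare the output with the source table. Like the original approach, this assumes a name is unique underneath a given parent:

with data as
(
select id, seq, token
from
  ( select rownum id, fname from t ) t_base,
  lateral(select level seq, rtrim(regexp_substr(t_base.fname || '\', '(.*?)\\', 1, level),'\') token
          from dual
          connect by level <= 1+length(fname)-length(replace(fname,'\'))
          )
),
hier as (
  select
    distinct
      token,
      lag(token) over ( partition by id order by seq ) as parent
  from data
)
select ltrim(sys_connect_by_path(token,'\'),'\') fname
from hier
where connect_by_isleaf = 1   -- the leaves are the unqualified file names
start with parent is null
connect by prior token = parent;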

Concurrency … the path to success and the path to failure

Let’s face it. Concurrency is a good thing when it comes to database applications. After all, if there is only a single user of your application, then chances are it is not a successful application. Of course there are exceptions to this rule, but by and large, most of the applications we build are going to be used by large populations of users. And given the recent publicity around user data and privacy, we can also be pretty confident that we want the data in our applications to be correct at all times.

As a developer, it is absolutely critical to keep concurrency in mind when building applications. Here is a simple demonstration to drive home the point.

The task assigned to the developer here is simple – move all of the rows satisfying a particular condition from table T1 to table T2. Let’s create our database objects first:


SQL> create table t1 ( x int, y int, z int );

Table created.

SQL> create table t2 as select * from t1 where 1=0;

Table created.

SQL>
SQL> insert into t1
  2  select rownum, rownum, rownum
  3  from dual connect by level <= 10000;

10000 rows created.

SQL> commit;

Commit complete.

So table T1 has 10,000 rows and T2 is empty, and our condition to determine which rows to transfer will be X being non-zero.  That happens to be all of the rows, but of course, it could be any filtering rule on the existing data. I’ve chosen all of the rows so I can keep the numbers easy to digest in the upcoming examples.

There is no such thing as a MOVE command in SQL, so we can implement this with two statements: 

  • INSERT-SELECT to copy the rows from T1 to T2
  • DELETE to remove the rows we just copied.

Here is the simple PL/SQL routine to do that, and a first test run of the code.


SQL>
SQL> set serverout on
SQL> begin
  2      insert into t2 select * from t1 where x != 0;
  3      delete from t1 where x != 0;
  4      commit;
  5  end;
  6  /

PL/SQL procedure successfully completed.

SQL>
SQL> select count(*) from t1;

  COUNT(*)
----------
         0

1 row selected.

SQL> select count(*) from t2;

  COUNT(*)
----------
     10000

1 row selected.

SQL>

Everything looks fine, doesn’t it? But it’s not. In fact, I’ll do what any good developer should be doing, and add some instrumentation to the code to get more information about its operations.


SQL> create table t1 ( x int, y int, z int );

Table created.

SQL> create table t2 as select * from t1 where 1=0;

Table created.

SQL>
SQL> insert into t1
  2  select rownum, rownum, rownum
  3  from dual connect by level <= 10000;

10000 rows created.

SQL> commit;

Commit complete.

SQL>
SQL> set serverout on
SQL> begin
  2      insert into t2 select * from t1 where x != 0;
  3      dbms_output.put_line(sql%rowcount);           
  4      delete from t1 where x != 0;
  5      dbms_output.put_line(sql%rowcount);
  6      commit;
  7  end;
  8  /
10000
10100      <=========== !!!!!

PL/SQL procedure successfully completed.

SQL>
SQL> select count(*) from t1;

  COUNT(*)
----------
         0

1 row selected.

SQL> select count(*) from t2;

  COUNT(*)
----------
     10000

1 row selected.

SQL>
SQL>

Look at the output from the PL/SQL block. I copied (inserted) 10,000 rows from T1 to T2, but the deletion phase removed more than 10,000 rows. How was that possible? It is because databases are about concurrency. In another session, I had a simple script running that just adds new rows to the table continuously. As a developer, it is vital that you do not assume that you are the only person accessing or manipulating a table. My code in its existing form has corrupted the database, because 100 rows have been deleted that were never copied to T2. They’re gone forever!
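
For reference, the other session was running something along these lines – a hypothetical sketch, since the original script was not shown:

-- Hypothetical concurrent workload (run in a second session): keep
-- inserting and committing new rows into T1 while the copy/delete
-- block runs in the first session.
begin
  for i in 1 .. 100 loop
    insert into t1 values (10000+i, 10000+i, 10000+i);
    commit;
    dbms_session.sleep(0.05);  -- 18c onwards; use dbms_lock.sleep in earlier releases
  end loop;
end;
/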

So… on to some potential solutions. Let’s try locking the rows first.


SQL> create table t1 ( x int, y int, z int );

Table created.

SQL> create table t2 as select * from t1 where 1=0;

Table created.

SQL>
SQL> insert into t1
  2  select rownum, rownum, rownum
  3  from dual connect by level <= 10000;

10000 rows created.

SQL> commit;

Commit complete.

SQL> set serverout on
SQL> declare
  2    cursor c is select * from t1 where x != 0 FOR UPDATE;
  3  begin
  4    open c;
  5    insert into t2 select * from t1 where x != 0;
  6    dbms_output.put_line(sql%rowcount);               
  7    delete from t1 where x != 0;
  8    dbms_output.put_line(sql%rowcount);
  9    commit;
 10    close c;
 11  end;
 12  /
10000
10100

PL/SQL procedure successfully completed.

SQL>
SQL> select count(*) from t1;

  COUNT(*)
----------
         0

1 row selected.

SQL> select count(*) from t2;

  COUNT(*)
----------
     10000

1 row selected.

As you can see, that has not yielded any benefit, because locking only applies to existing rows. The new rows arriving from a separate session are unhindered, and hence my DELETE statement still picks up the new rows when it should not. I could work around this by locking the entire table, but what if my filtering condition matched only a small percentage of the rows? It seems overly aggressive to lock everyone out of the table just to copy a small subset of its rows. In this case, I can use PL/SQL to help me out. Using the BULK COLLECT / FORALL facilities, I can restrict my DML operations to just the ROWIDs of the rows I am copying.


SQL> create table t1 ( x int, y int, z int );

Table created.

SQL> create table t2 as select * from t1 where 1=0;

Table created.

SQL>
SQL> insert into t1
  2  select rownum, rownum, rownum
  3  from dual connect by level <= 10000;

10000 rows created.

SQL> commit;

Commit complete.

SQL> set serverout on
SQL> declare
  2    cursor c is select rowid rid, t1.* from t1 where x != 0 for update;
  3    type array is table of c%rowtype index by pls_integer;
  4    l_rows array;
  5    l_idx int;
  6  begin
  7    open c;
  8    loop
  9      fetch c bulk collect into l_rows limit 20000;
 10      exit when l_rows.count = 0;
 11
 12      forall i in 1 .. l_rows.count
 13         insert into t2 values (
 14            l_rows(i).x,
 15            l_rows(i).y,
 16            l_rows(i).z );
 17      dbms_output.put_line(sql%rowcount);       
 18      forall i in 1 .. l_rows.count
 19        delete from t1
 20        where rowid = l_rows(i).rid;
 21      dbms_output.put_line(sql%rowcount);
 22    end loop;
 23  end;
 24  /
10000
10000

PL/SQL procedure successfully completed.

SQL>
SQL>
SQL> select count(*) from t1;

  COUNT(*)
----------
       147

1 row selected.

SQL> select count(*) from t2;

  COUNT(*)
----------
     10000

1 row selected.

I am collecting the 10,000 ROWIDs of interest into an array, and then deleting only those rows. So even though new rows are arriving, I will never see them, because they will not be in the set of ROWIDs that I am moving to table T2. We can see that during the operation, 147 new rows arrived in T1 from other sessions.

(Note: Normally you would choose a bulk collect size of perhaps 1000 or similar, and loop around repeatedly to avoid excessive PGA use. I bulk collected the entire set to show the rowcounts more intuitively.)

You might be thinking that perhaps we no longer need the “FOR UPDATE” clause on the cursor, but it is still a required element.  Without it, it is possible that a different session could update a row from the candidate data set, and that updated row would still be deleted.  Or a different session could even delete, commit, and insert a fresh row that happens to pick up the same ROWID as one from the original set.
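
To make that risk concrete, here is a hypothetical interleaving of two sessions (my own illustration) showing a lost update when the FOR UPDATE is omitted:

-- Session 1 (cursor WITHOUT "for update"):
--   fetch c bulk collect into l_rows;    -- captures rowid R for the row X=42
-- Session 2:
--   update t1 set y = -1 where x = 42;   -- changes that row after the fetch
--   commit;
-- Session 1:
--   insert into t2 values (...);         -- copies the OLD image of X=42
--   delete from t1 where rowid = R;      -- removes the NEW image
--   commit;                              -- session 2's update is silently lost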

Bottom line – keep concurrency front and center in your mind when you are writing code to manipulate data.

Choose your names wisely

It all seems simple enough. You pick a name for an object… and that is the only object that can have that name, right? For example:


SQL> create table MY_UNIQUE_NAME ( x int );

Table created.

SQL>
SQL> create synonym MY_UNIQUE_NAME for MY_UNIQUE_NAME;
create synonym MY_UNIQUE_NAME for MY_UNIQUE_NAME
*
ERROR at line 1:
ORA-01471: cannot create a synonym with same name as object


SQL>
SQL> create view MY_UNIQUE_NAME as select * from MY_UNIQUE_NAME;
create view MY_UNIQUE_NAME as select * from MY_UNIQUE_NAME
            *
ERROR at line 1:
ORA-00955: name is already used by an existing object


SQL>
SQL> create package MY_UNIQUE_NAME is
  2    x int;
  3  end;
  4  /
create package MY_UNIQUE_NAME is
*
ERROR at line 1:
ORA-00955: name is already used by an existing object


So naturally we’d expect an error if we continued on with other object types. But then this happens:


SQL> create or replace
  2  trigger MY_UNIQUE_NAME
  3  before insert on MY_UNIQUE_NAME
  4  begin
  5    null;
  6  end;
  7  /

Trigger created.

SQL>
SQL> create index MY_UNIQUE_NAME on MY_UNIQUE_NAME ( x );

Index created.

It might all seem a bit random, but it is all related to what is known as a NAMESPACE – the names of certain objects must be unique within a particular namespace. We can see the namespace for each object type by querying DBA_OBJECTS:



SQL> select distinct namespace, object_type from dba_Objects order by 1,2;

 NAMESPACE OBJECT_TYPE
---------- -----------------------
         1 CREDENTIAL
         1 DESTINATION
         1 FUNCTION
         1 INDEXTYPE
         1 JAVA CLASS
         1 JOB
         1 JOB CLASS
         1 LIBRARY
         1 OPERATOR
         1 PACKAGE
         1 PROCEDURE
         1 PROGRAM
         1 SCHEDULE
         1 SCHEDULER GROUP
         1 SEQUENCE
         1 SYNONYM
         1 TABLE
         1 TABLE PARTITION
         1 TABLE SUBPARTITION
         1 TYPE
         1 VIEW
         1 WINDOW
         2 PACKAGE BODY
         2 TYPE BODY
         3 TRIGGER
         4 INDEX
         4 INDEX PARTITION
         5 CLUSTER
         8 LOB
         8 LOB PARTITION
         9 DIRECTORY
        10 QUEUE
        13 JAVA SOURCE
        14 JAVA RESOURCE
        19 MATERIALIZED VIEW
        21 CONTEXT
        23 RULE SET
        24 CONSUMER GROUP
        24 RESOURCE PLAN
        25 XML SCHEMA
        32 JAVA DATA
        36 RULE
        38 EVALUATION CONTEXT
        51 UNDEFINED
        52 UNDEFINED
        64 EDITION
        88 CONTAINER
        93 UNIFIED AUDIT POLICY
       132 LOCKDOWN PROFILE
           DATABASE LINK

50 rows selected.



Because indexes and triggers are in different namespaces from tables, synonyms, procedures, packages and the like, they can share the same name.
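
If you are ever unsure why two objects can coexist with the same name, a quick query along these lines (using the demo objects above) shows the namespaces in play:

-- list every object called MY_UNIQUE_NAME along with its namespace
select object_name, object_type, namespace
from   user_objects
where  object_name = 'MY_UNIQUE_NAME'
order  by namespace;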

I’d still probably recommend unique names across all namespaces, just so there is no confusion.


NVL vs COALESCE

Jonathan Lewis just published a blog post about NVL and COALESCE and the optimizer costings for each.

There is also perhaps a significant difference between NVL and COALESCE in that the former seems to have an in-built optimization for handling bind variables and nulls. 

Consider an application where users optionally pass in search criteria and you have to query a table based on that criteria.  You have three natural choices here to implement that:

WHERE column = :search_criteria or :search_criteria is null

or

WHERE column = nvl(:search_criteria ,column)

or

WHERE column = coalesce(:search_criteria,column)

Functionally they are identical*, but the implementation detail shows a nice little optimizer trick that only works with NVL.


SQL> create table t as select * From dba_objects;

Table created.

SQL> variable search_criteria number
SQL>
SQL> exec :search_criteria := 123

PL/SQL procedure successfully completed.

SQL>
SQL> create index ix1 on t ( object_id ) ;

Index created.

SQL>
SQL> set feedback only
SQL> select *
  2  from t
  3  where object_id = nvl(:search_criteria,object_id);

1 row selected.

SQL>
SQL> set feedback on
SQL> select * from dbms_xplan.display_cursor();

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------
SQL_ID  0g820t1jw00hm, child number 0
-------------------------------------
select * from t where object_id = nvl(:search_criteria,object_id)

Plan hash value: 2258578794

----------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |                 |       |       |   430 (100)|          |
|   1 |  VIEW                                  | VW_ORE_1B35BA0F | 78868 |    36M|   430   (1)| 00:00:01 |
|   2 |   UNION-ALL                            |                 |       |       |            |          |
|*  3 |    FILTER                              |                 |       |       |            |          |
|   4 |     TABLE ACCESS BY INDEX ROWID BATCHED| T               |     1 |   132 |     2   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN                  | IX1             |     1 |       |     1   (0)| 00:00:01 |
|*  6 |    FILTER                              |                 |       |       |            |          |
|*  7 |     TABLE ACCESS FULL                  | T               | 78867 |     9M|   428   (1)| 00:00:01 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter(:SEARCH_CRITERIA IS NOT NULL)
   5 - access("OBJECT_ID"=:SEARCH_CRITERIA)
   6 - filter(:SEARCH_CRITERIA IS NULL)
   7 - filter("OBJECT_ID" IS NOT NULL)


27 rows selected.

SQL>
SQL> set feedback only
SQL> select *
  2  from t
  3  where object_id = coalesce(:search_criteria,object_id);

1 row selected.

SQL>
SQL> set feedback on
SQL> select * from dbms_xplan.display_cursor();

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------
SQL_ID  am3uvm7nvx5d9, child number 0
-------------------------------------
select * from t where object_id = coalesce(:search_criteria,object_id)

Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |   427 (100)|          |
|*  1 |  TABLE ACCESS FULL| T    |     1 |   132 |   427   (1)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("OBJECT_ID"=COALESCE(:SEARCH_CRITERIA,"OBJECT_ID"))


18 rows selected.

SQL>
SQL>
SQL>
SQL> set feedback only
SQL> select *
  2  from t
  3  where ( object_id = :search_criteria or :search_criteria is null );

1 row selected.

SQL>
SQL> set feedback on
SQL> select * from dbms_xplan.display_cursor();

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------
SQL_ID  ff0s2j51scxss, child number 0
-------------------------------------
select * from t where ( object_id = :search_criteria or
:search_criteria is null )

Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |   427 (100)|          |
|*  1 |  TABLE ACCESS FULL| T    |  3945 |   508K|   427   (1)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter((:SEARCH_CRITERIA IS NULL OR
              "OBJECT_ID"=:SEARCH_CRITERIA))


20 rows selected.

SQL>

Only NVL gets the benefit of the query being “split” into two pieces – one to handle the case where the passed criteria is null, and the other for when the criteria is not null. The FILTER operations at plan lines 3 and 6 show that only one of the two branches will actually run.

So for these particular types of queries, make sure you test all the possibilities – you might find NVL (currently) is your best bet.

 

* – Addenda:  Thanks to Jonathan for pointing out that you can get discrepancies in the results for the three strategies above for columns that may contain nulls, so as always, take care.
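
To illustrate the kind of discrepancy Jonathan refers to, here is a small sketch of my own (not from his post). Once the column itself can be null, the three predicates stop being interchangeable, because “COLUMN = COLUMN” is never true for a null:

create table t_null ( c int );
insert into t_null values (1);
insert into t_null values (null);
commit;

variable b number
exec :b := null

-- counts both rows: the "is null" branch keeps everything
select count(*) from t_null where c = :b or :b is null;

-- counts only c=1: the null row fails "c = nvl(:b,c)"
select count(*) from t_null where c = nvl(:b, c);

-- counts only c=1: same story with coalesce
select count(*) from t_null where c = coalesce(:b, c);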

Buzzword Bingo

Looking for that catchy title for your next presentation?

I took the first word from the title of 1000 Oracle OpenWorld presentations, and looked for patterns. I omitted some obvious terms that are either products or definite/indefinite articles (a hypothetical sketch of how such a table might be built follows the list):

  • Oracle
  • Peoplesoft 
  • The
  • How 
  • OAUG
  • General
  • MySQL
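
In case you want to try something similar, here is a hypothetical way the BUZZ table could be built. It assumes a TITLES table holding one session title per row, which was not part of the original post:

-- Hypothetical reconstruction: grab the leading word of each title,
-- normalise its case, and discard the omitted terms listed above.
create table buzz as
  select initcap(regexp_substr(title, '^[[:alpha:]]+')) buzz
  from   titles
  where  initcap(regexp_substr(title, '^[[:alpha:]]+'))
         not in ('Oracle','Peoplesoft','The','How','Oaug','General','Mysql');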

And here is what we end up with:


SQL> select buzz, count(*)
  2  from buzz
  3  group by buzz
  4  order by 2 desc;

BUZZ                                                                                                   COUNT(*)
---------------------------------------------------------------------------------------------------- ----------
Building                                                                                                     18
Customers                                                                                                    14
Best                                                                                                         13
Digital                                                                                                      11
Migrating                                                                                                    10
Getting                                                                                                      10
Accelerate                                                                                                    8
Modernize                                                                                                     8
Implementing                                                                                                  7
Data                                                                                                          7
Transforming                                                                                                  6
Extend                                                                                                        6
Identifying                                                                                                   6
Managing                                                                                                      6
Achieving                                                                                                     6
Leveraging                                                                                                    5
Making                                                                                                        5
Creating                                                                                                      4

So there you go… all the verbs and adjectives you need for a successful presentation.

AskTOM TV–episode 11

Just a quick note to give the supporting collateral to the latest episode of AskTOM TV.

The question I tackled is this one:

https://asktom.oracle.com/pls/apex/asktom.search?tag=want-to-retrive-numbers-in-words

which was a fun one to answer because it showcases several useful SQL techniques:

  • Using CONNECT BY to synthesize rows,
  • Using regular expressions to parse text,
  • Using MULTISET in Oracle 10g to emulate the native LISTAGG function available from 11g onwards,
  • Using the hooks into the ODCI aggregation facilities to build custom aggregates,
  • The JSP format mask as a means to generate numbers as words.

And here is the entire script from the episode if you want to run it yourself.



drop type string_agg_type;
col column_value format a60
col digit format a60
col concat_str format a60
drop table  t purge;


select to_char(to_date('7','J'),'JSP') from dual;

select to_char(to_date('0','J'),'JSP') from dual;

select
  case x
    when '0' then 'zero'
    when '1' then 'one'
    when '2' then 'two'
    when '3' then 'three'
    when '4' then 'four'
    when '5' then 'five'
    when '6' then 'six'
    when '7' then 'seven'
    when '8' then 'eight'
    when '9' then 'nine'
  end
from ( select '3' x from dual ) 

/

select
  case x
    when '0' then 'zero'
    when '1' then 'one'
    when '2' then 'two'
    when '3' then 'three'
    when '4' then 'four'
    when '5' then 'five'
    when '6' then 'six'
    when '7' then 'seven'
    when '8' then 'eight'
    when '9' then 'nine'
  end
from (
  select substr('123',rownum,1) x
  from dual
  connect by level <= 3
  ) 

/  


create or replace type string_list is table of varchar2(1000);
/

create table t ( x int );
insert into t values (101);
insert into t values (456);
insert into t values (789);

select *
from t,
     table(cast(multiset(
        select substr(to_char(t.x),rownum,1)
        from dual
connect by level <= length(to_char(t.x))) as string_list)
)

/


select
  x,
  digit
from (
  select x, column_value digit
  from t,
       table(cast(multiset(
          select 
            case substr(to_char(t.x),rownum,1)
              when '0' then 'zero'
              when '1' then 'one'
              when '2' then 'two'
              when '3' then 'three'
              when '4' then 'four'
              when '5' then 'five'
              when '6' then 'six'
              when '7' then 'seven'
              when '8' then 'eight'
              when '9' then 'nine'
            end str
          from dual
          connect by level <= length(to_char(t.x))) as string_list)
  )
)

/

create or replace type string_agg_type as object
(
   data  string_list,

   static function
        ODCIAggregateInitialize(sctx IN OUT string_agg_type )
        return number,

   member function
        ODCIAggregateIterate(self IN OUT string_agg_type ,
                             value IN varchar2 )
        return number,

   member function
        ODCIAggregateTerminate(self IN string_agg_type,
                               returnValue OUT  varchar2,
                               flags IN number)
        return number,

   member function
        ODCIAggregateMerge(self IN OUT string_agg_type,
                           ctx2 IN string_agg_type)
        return number
);
/
 
create or replace type body string_agg_type
is

static function ODCIAggregateInitialize(sctx IN OUT string_agg_type)
return number
is
begin
    sctx := string_agg_type( string_list() );
    return ODCIConst.Success;
end;

member function ODCIAggregateIterate(self IN OUT string_agg_type,
                                     value IN varchar2 )
return number
is
begin
    data.extend;
    data(data.count) := value;
    return ODCIConst.Success;
end;

member function ODCIAggregateTerminate(self IN string_agg_type,
                                       returnValue OUT varchar2,
                                       flags IN number)
return number
is
    l_data varchar2(4000);
begin
    for x in ( select column_value from TABLE(data) order by 1 )
    loop
            l_data := l_data || ',' || x.column_value;
    end loop;
    returnValue := ltrim(l_data,',');
    return ODCIConst.Success;
end;

member function ODCIAggregateMerge(self IN OUT string_agg_type,
                                   ctx2 IN string_agg_type)
return number
is
begin
    for i in 1 .. ctx2.data.count
    loop
            data.extend;
            data(data.count) := ctx2.data(i);
    end loop;
    return ODCIConst.Success;
end;

end;
/
 
CREATE or replace
FUNCTION stragg(input varchar2 )
RETURN varchar2
PARALLEL_ENABLE AGGREGATE USING string_agg_type;
/


with source_data as
(
    select
      x,
      digit
    from (
      select x, column_value digit
      from t,
           table(cast(multiset(
              select 
                case substr(to_char(t.x),rownum,1)
                  when '0' then 'zero'
                  when '1' then 'one'
                  when '2' then 'two'
                  when '3' then 'three'
                  when '4' then 'four'
                  when '5' then 'five'
                  when '6' then 'six'
                  when '7' then 'seven'
                  when '8' then 'eight'
                  when '9' then 'nine'
                end str
              from dual
              connect by level <= length(to_char(t.x))) as string_list)
      )
    )
)
select x, stragg(digit) concat_str
from   source_data
group by x
order by 1

/
 
with source_data as
(
    select
      x,
      digit
    from (
      select x, column_value digit
      from t,
           table(cast(multiset(
              select '@'||lpad(level,10,'0')||'~'||
                case substr(to_char(t.x),rownum,1)
                  when '0' then 'zero'
                  when '1' then 'one'
                  when '2' then 'two'
                  when '3' then 'three'
                  when '4' then 'four'
                  when '5' then 'five'
                  when '6' then 'six'
                  when '7' then 'seven'
                  when '8' then 'eight'
                  when '9' then 'nine'
                end str
              from dual
              connect by level <= length(to_char(t.x))) as string_list)
      )
    )
)
select x, stragg(digit)concat_str
from   source_data
group by x
order by 1

/


with source_data as
(
    select
      x,
      digit
    from (
      select x, column_value digit
      from t,
           table(cast(multiset(
              select '@'||lpad(level,10,'0')||'~'||
                case substr(to_char(t.x),rownum,1)
                  when '0' then 'zero'
                  when '1' then 'one'
                  when '2' then 'two'
                  when '3' then 'three'
                  when '4' then 'four'
                  when '5' then 'five'
                  when '6' then 'six'
                  when '7' then 'seven'
                  when '8' then 'eight'
                  when '9' then 'nine'
                end str
              from dual
              connect by level <= length(to_char(t.x))) as string_list)
      )
    )
)
select x, regexp_replace(stragg(digit),'\@[0-9]*\~') concat_str
from   source_data
group by x
order by 1


/




Partition-wise join

So just what is a “partition-wise” join? We will use a metaphor to (hopefully) explain the benefit.


Let’s say two people, Logan and Shannon, decide to move in together. If each of them already has an existing residence, they will both have a lot of the common items that you find in any household. So they have a decision to make – do they keep two of everything, or do they have a bit of a “cull” of the things they have in common? In this imaginary scenario, we will focus on household items in the bathroom and the kitchen. Logan grabs a knife block holding a set of kitchen knives, calls Shannon and asks: “Hey Shannon, do you already have a knife block?”

What do you think Shannon will do? Search the entire house for an existing knife block? Of course not. If there is a knife block, then the only place it will be located is in the kitchen. In fact, when matching up the items throughout the house, Shannon and Logan will restrict their investigation to the room that makes sense for the item in question. That is just common sense – why would anyone search in the bathroom for (say) forks and spoons? It would just be a waste of effort.

(Editor’s note: Anyone with young children will of course dispute this metaphor, stating quite correctly that you can probably find every possible household item in every possible room, and probably outside as well, but we’ll omit that possibility for the sake of this discussion.)


And that is exactly what a partition-wise join enables us to do in the database. If two tables are partitioned with the same definition, and we are joining on the partition key, then that definition guarantees that for a row in one table with partition key “K”, and hence in partition “P”, we only need to look for matching rows in the corresponding partition of the table we are joining to (where “corresponding” is based on the partitioning definition). It is the partitioning equivalent of “only searching in the kitchen and not the bathroom”. We can see this via the execution plan when doing such a join. Let’s create two tables with identical partition definitions and then join on the partition key.


SQL> --
SQL> -- Example 1
SQL> --
SQL>
SQL> drop table t1 purge;

Table dropped.

SQL> drop table t2 purge;

Table dropped.

SQL>
SQL> create table t1 ( x int, y int )
  2  partition by range ( x )
  3  (
  4  partition p_kitchen values less than (10000),
  5  partition p_bathroom values less than (20000),
  6  partition p_dining values less than (30000)
  7  );

Table created.

SQL>
SQL>
SQL> create table t2 ( x int, y int )
  2  partition by range ( x )
  3  (
  4  partition p_kitchen values less than (10000),
  5  partition p_bathroom values less than (20000),
  6  partition p_dining values less than (30000)
  7  );

Table created.

SQL>
SQL>
SQL> insert into t1 select rownum, rownum from dual connect by level < 30000;

29999 rows created.

SQL> insert into t2 select * from t1;

29999 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats('','t1')

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats('','t2')

PL/SQL procedure successfully completed.

SQL>
SQL> --
SQL> set autotrace traceonly explain
SQL> select count(t1.y), count(t2.y)
  2  from t1,t2
  3  where t1.x = t2.x;

Execution Plan
----------------------------------------------------------
Plan hash value: 3155849676

---------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |     1 |    20 |  1641   (1)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE      |      |     1 |    20 |            |          |       |       |
|   2 |   PARTITION RANGE ALL|      | 29999 |   585K|  1641   (1)| 00:00:01 |     1 |     3 |
|*  3 |    HASH JOIN         |      | 29999 |   585K|  1641   (1)| 00:00:01 |       |       |
|   4 |     TABLE ACCESS FULL| T1   | 29999 |   292K|   820   (1)| 00:00:01 |     1 |     3 |
|   5 |     TABLE ACCESS FULL| T2   | 29999 |   292K|   820   (1)| 00:00:01 |     1 |     3 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("T1"."X"="T2"."X")

SQL> set autotrace off
SQL>

The key part of the execution plan here is that the HASH JOIN occurs within (or “under”) the PARTITION RANGE ALL iteration. This can be interpreted as: “Start with the first partition in each table and do a hash join on that partition. Then move on to the next partition and do a hash join on that partition”, and so on. This is efficient on resources, because at no point are we trying (and obviously failing) to join a row from table T1 partition P_KITCHEN to table T2 partition P_BATHROOM or P_DINING. Each hash join is a smaller operation, and hence more likely to complete within the available PGA allocation for the session. Also, when it comes to running such a query in parallel, each parallel slave can tackle a partition in isolation from the other slaves, as the sketch below shows.
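
To see the parallel case, you could run something like the following (a sketch – the degree of 4 is arbitrary); with aligned partitions, the hash join should remain beneath the partition iteration, this time under a PX PARTITION RANGE ALL step:

select /*+ parallel(t1 4) parallel(t2 4) */
       count(t1.y), count(t2.y)
from   t1, t2
where  t1.x = t2.x;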

If the partitions do not align (see the Editor’s note above), then our join will not be as efficient.


SQL> --
SQL> -- Example 2
SQL> --
SQL>
SQL>
SQL> drop table t2 purge;

Table dropped.

SQL> create table t2 ( x int, y int )
  2  partition by range ( x )
  3  (
  4  partition p1 values less than (15000),
  5  partition p3 values less than (30000)
  6  );

Table created.

SQL>
SQL> --
SQL> -- all partitions do NOT align, so we do NOT see partition-wise join
SQL> --
SQL>
SQL> insert into t2 select * from t1;

29999 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats('','t2')

PL/SQL procedure successfully completed.

SQL> set autotrace traceonly explain
SQL> select count(t1.y), count(t2.y)
  2  from t1,t2
  3  where t1.x = t2.x;

Execution Plan
----------------------------------------------------------
Plan hash value: 666786458

---------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name    | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |         |     1 |    20 |  1369   (1)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE               |         |     1 |    20 |            |          |       |       |
|*  2 |   HASH JOIN                   |         | 29999 |   585K|  1369   (1)| 00:00:01 |       |       |
|   3 |    PART JOIN FILTER CREATE    | :BF0000 | 29999 |   585K|  1369   (1)| 00:00:01 |       |       |
|   4 |     PARTITION RANGE ALL       |         | 29999 |   292K|   820   (1)| 00:00:01 |     1 |     3 |
|   5 |      TABLE ACCESS FULL        | T1      | 29999 |   292K|   820   (1)| 00:00:01 |     1 |     3 |
|   6 |    PARTITION RANGE JOIN-FILTER|         | 29999 |   292K|   548   (1)| 00:00:01 |:BF0000|:BF0000|
|   7 |     TABLE ACCESS FULL         | T2      | 29999 |   292K|   548   (1)| 00:00:01 |:BF0000|:BF0000|
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("T1"."X"="T2"."X")

Note
-----
   - this is an adaptive plan

SQL> set autotrace off
SQL>
SQL>

The key element here is that the HASH JOIN now sits above the cycling through all of the partitions. In earlier releases of Oracle, you would not see the :BF0000 entries, so it would be a simple join across all the rows, as if the tables were not partitioned at all. But when the partitions do not align, things are slightly better in modern releases. We use a “Bloom filter” (hence the :BF prefix) to reduce the overhead of joining the two tables. Since I’m using metaphors in this post, think of “phoning ahead” to the cinema to see if there are seats available for your favourite movie. If the cinema owner says the movie is sold out, you have saved yourself a car trip. But just because the owner says there are seats available, it is still possible you might drive there and find that the movie has sold out in the meantime. A Bloom filter is like phoning ahead – there’s a good chance you can avoid some work, but it is not a guarantee. You can read more about Bloom filters in a great whitepaper by Christian Antognini.

Note that all of the partitions must align. Here is an example where the first three partitions are in alignment, with boundaries at 10000, 20000 and 30000, but our second table T2 has an additional partition defined. Once again, we fall back to the Bloom filter option.


SQL> --
SQL> -- Example 3
SQL> --
SQL> drop table t2 purge;

Table dropped.

SQL> create table t2 ( x int, y int )
  2  partition by range ( x )
  3  (
  4  partition p1 values less than (10000),
  5  partition p2 values less than (20000),
  6  partition p3 values less than (30000),
  7  partition p4 values less than (40000)
  8  );

Table created.

SQL>
SQL> --
SQL> -- all partitions do NOT align, so we do NOT see partition-wise join
SQL> --
SQL>
SQL> insert into t2 select rownum, rownum from dual connect by level < 40000;

39999 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats('','t2')

PL/SQL procedure successfully completed.

SQL> set autotrace traceonly explain
SQL> select count(t1.y), count(t2.y)
  2  from t1,t2
  3  where t1.x = t2.x;

Execution Plan
----------------------------------------------------------
Plan hash value: 666786458

---------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name    | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |         |     1 |    20 |  1913   (1)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE               |         |     1 |    20 |            |          |       |       |
|*  2 |   HASH JOIN                   |         | 29999 |   585K|  1913   (1)| 00:00:01 |       |       |
|   3 |    PART JOIN FILTER CREATE    | :BF0000 | 29999 |   585K|  1913   (1)| 00:00:01 |       |       |
|   4 |     PARTITION RANGE ALL       |         | 29999 |   292K|   820   (1)| 00:00:01 |     1 |     3 |
|   5 |      TABLE ACCESS FULL        | T1      | 29999 |   292K|   820   (1)| 00:00:01 |     1 |     3 |
|   6 |    PARTITION RANGE JOIN-FILTER|         | 39999 |   390K|  1093   (1)| 00:00:01 |:BF0000|:BF0000|
|   7 |     TABLE ACCESS FULL         | T2      | 39999 |   390K|  1093   (1)| 00:00:01 |:BF0000|:BF0000|
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("T1"."X"="T2"."X")

Note
-----
   - this is an adaptive plan

SQL> set autotrace off
SQL>
SQL>
SQL>


So faster queries on partitioned tables are not just about partition pruning. Partition-wise joins can also have a beneficial impact on query response times.