From file names to directory hierarchy

I had a fun request come in from a colleague the other day.  They had a simple list of fully qualified file names and they needed to present that data in the familiar hierarchical tree layout. 

To demonstrate, I took a little trip down memory lane Smile and grabbed a subset of presentations I’ve done over the years.


SQL> create table t ( fname varchar2(1000));

Table created.

SQL>
SQL> insert into t values ('C:\Users\Connor\Presentations\2002\scene_200205.pdf');

1 row created.

SQL> insert into t values ('C:\Users\Connor\Presentations\2002\scene_200205a.pdf');

1 row created.

SQL> insert into t values ('C:\Users\Connor\Presentations\2002\254old.ppt');

1 row created.

SQL> insert into t values ('C:\Users\Connor\Presentations\2002\254_full.ppt');

1 row created.

SQL> insert into t values ('C:\Users\Connor\Presentations\2002\257.ppt');

1 row created.
...
...
...

SQL> insert into t values ('C:\Users\Connor\Presentations\2018\london\optimizer.pptx');

1 row created.

SQL> insert into t values ('C:\Users\Connor\Presentations\2018\london\optimizer_full.pptx');

1 row created.

SQL> insert into t values ('C:\Users\Connor\Presentations\2018\odtug\common_disasters_short.pptx');

1 row created.

SQL> insert into t values ('C:\Users\Connor\Presentations\2018\odtug\common_disasters_short_comp.pptx');

1 row created.

SQL> select count(*) from t;

  COUNT(*)
----------
       634

1 row selected.

So the challenge was to present this list of files in a nice tree format which mirrored my directory structure.

(Truth be told, my preference would be that the query would do a better job than I do in trying to organize my files, but that is a blog post for another day Smile)

Step 1 was to use a little LATERAL trickery to parse out all of the elements of the file names into one row per “element”, where “element” an unqualified file name, or single level folder name.


SQL> select id, seq, token
  2  from
  3    ( select rownum id, fname from t ) t_base,
  4    lateral(select level seq, rtrim(regexp_substr(t_base.fname || '\', '(.*?)\\', 1, level),'\') token
  5            from dual
  6            connect by level <= 1+length(fname)-length(replace(fname,'\'))
  7            );

        ID        SEQ TOKEN
---------- ---------- ------------------------------------------------------------------------------------
         1          1 C:
         1          2 Users
         1          3 Connor
         1          4 Presentations
         1          5 2002
         1          6 scene_200205.pdf
         2          1 C:
         2          2 Users
         2          3 Connor
         2          4 Presentations
         2          5 2002
         2          6 scene_200205a.pdf
         3          1 C:
         3          2 Users
         3          3 Connor
         3          4 Presentations
         3          5 2002
         3          6 254old.ppt
         4          1 C:
         4          2 Users
         4          3 Connor
         4          4 Presentations
         4          5 2002
         4          6 254_full.ppt
         ...
         ...

We can use that as input to a standard LAG function to associate each file/folder with its parent, using the SEQ column to provide sequencing within each ID


SQL> with data as
  2  (
  3  select id, seq, token
  4  from
  5    ( select rownum id, fname from t ) t_base,
  6    lateral(select level seq, rtrim(regexp_substr(t_base.fname || '\', '(.*?)\\', 1, level),'\') token
  7            from dual
  8            connect by level <= 1+length(fname)-length(replace(fname,'\'))
  9            )
 10  )
 11  select
 12      id,
 13      seq,
 14      token,
 15      lag(token) over ( partition by id order by seq ) as parent
 16  from data
 17  order by id, seq desc;

        ID        SEQ TOKEN                                              PARENT
---------- ---------- -------------------------------------------------- -----------------------------------
         1          6 scene_200205.pdf                                   2002
         1          5 2002                                               Presentations
         1          4 Presentations                                      Connor
         1          3 Connor                                             Users
         1          2 Users                                              C:
         1          1 C:
         2          6 scene_200205a.pdf                                  2002
         2          5 2002                                               Presentations
         2          4 Presentations                                      Connor
         2          3 Connor                                             Users
         2          2 Users                                              C:
         2          1 C:
         3          6 254old.ppt                                         2002
         3          5 2002                                               Presentations
         3          4 Presentations                                      Connor
         3          3 Connor                                             Users
         3          2 Users                                              C:
        ...
        ...

That’s looking good, but we have duplicates in the sense that multiple files will roll up to a single folder (eg the “C:” root is repeated).. So we DISTINCT it out to leave just the relationships we need


SQL> with data as
  2  (
  3  select id, seq, token
  4  from
  5    ( select rownum id, fname from t ) t_base,
  6    lateral(select level seq, rtrim(regexp_substr(t_base.fname || '\', '(.*?)\\', 1, level),'\') token
  7            from dual
  8            connect by level <= 1+length(fname)-length(replace(fname,'\'))
  9            )
 10  )
 11  select
 12    distinct
 13      token,
 14      lag(token) over ( partition by id order by seq ) as parent
 15  from data
 16  /

TOKEN                                              PARENT
-------------------------------------------------- --------------------------------------------------
C:
254old.ppt                                         2002
257.ppt                                            2002
plus2.ppt                                          2002
254.ppt                                            UKOUG2002_final
error_handling.pdf                                 2003
2004                                               Presentations
mcdonald1.pdf                                      2004
sun_storage_brief.ppt                              sun_storage_presn
ausoug05_memory_x.ppt                              2005
jul05_read_consistency.ppt                         2005
saoug_oracle_home.ppt                              2005
plsql_80_a58236.pdf                                sql_plsql_evo
jun07_tas_moving_data_short.ppt                    misc
ausoug_adel                                        2008
successful_development.pdf                         ausoug_adel
apr08_11g_developers.ppt                           ausoug_apr
apr08_11g_developers_print.ppt                     ausoug_apr
mcdonald_oct08_flashback.pdf                       perth
oct08_11g_developers.ppt                           template
nov08_read_consistency.pdf                         rwwa
perf101b_rwwa.pptx                                 rwwa
sqlectric_melb                                     2009
sql_electric2.pdf                                  sqlectric_melb
sql_electric1.pptx                                 sqlectric_melb
ukoug                                              2009
...
...
...

And we’re pretty much done. At this point, we have the raw data we need in a child/parent form that we could use to traverse with a standard hierarchy CONNECT BY query – I’ll use LPAD to show the hierarchy relationships


SQL> with data as
  2  (
  3  select id, seq, token
  4  from
  5    ( select rownum id, fname from t ) t_base,
  6    lateral(select level seq, rtrim(regexp_substr(t_base.fname || '\', '(.*?)\\', 1, level),'\') toke
  7            from dual
  8            connect by level <= 1+length(fname)-length(replace(fname,'\'))
  9            )
 10  ),
 11  hier as(
 12    select
 13      distinct
 14        token,
 15        lag(token) over ( partition by id order by seq ) as parent
 16    from data
 17  )
 18  select lpad(' ',level*2)||token hier_list
 19  from hier
 20  start with parent is null
 21  connect by prior token = parent
 22  order siblings by token;

HIER_LIST
--------------------------------------------------------------------------------------------------------
  C:
    Users
      Connor
        Presentations
          2002
            254_full.ppt
            254old.ppt
            257.ppt
            UKOUG2002_final
              254.ppt
              257.ppt
              prelim.ppt
            dec_9i_features2.ppt
            dec_legends.ppt
            dec_legends_prelim.ppt
            jul_sig.ppt
            jul_sig_printout.ppt
            june_sig.ppt
            logo.ppt
            march_sig.ppt
            march_sig2.ppt
            plaus.ppt
            plus2.ppt
            scene_200205.pdf
            scene_200205a.pdf
            sep_unix_sig.ppt
            sep_unix_sig2.ppt
          2003
            254.ppt
            abm_succesful_development.ppt
            apr03_perth.ppt
            dec03_perth_9i_nf.ppt
            dec03_perth_9i_nf_popquiz.pdf
            dec03_perth_9i_nf_popquiz.ppt
            error_handling.pdf
            error_handling.ppt
            ims_performance_examples.pdf
            ims_performance_examples.ppt
            sep03_perth_9i_hidden.pdf
            sep03_perth_9i_hidden.ppt
            sep03_perth_9i_nf.ppt
            sep03_perth_legends.pdf
            sep03_perth_legends.ppt
            sep03_perth_prelim.ppt
            sf_oow_03.pdf
            sf_oow_03.ppt
            sf_oow_03a.ppt
            slide.ppt
            succesful_development.pdf
            succesful_development.ppt
            tools_for_performance.pdf
            tools_for_performance.ppt
          2004
            10046_scene.pdf
            bind_var.pdf
            bind_var.ppt
            dec03_perth_book.ppt
            dec03_perth_book2.ppt
            generating_test_data.pdf
            generating_test_data.ppt
            mar04_hotsos_legends.ppt
            mar04_hotsos_nf_w2k.ppt
            mar04_hotsos_nf_w2k_edit.ppt
            mcdonald1.pdf
            mcdonald2.pdf
            nov04_reorgs_rebuild.ppt
            nov04_succesful_development.ppt
            
            ...
            ...
            ...
            
          2018
            apex_connect
              apex_connect_sql_plsql.pdf
              apex_connect_sql_plsql.pptx
              apex_connect_sql_plsql_v2.pptm
              apex_connect_sql_plsql_v2.pptx
              apex_connect_sql_plsql_v3.pptm
              apex_connect_sql_plsql_v3_subset.pptm
              extensions1.pptx
              extensions2.pptx
            code_china
              better_sql_oracle_code.pptx
              better_sql_oracle_code_v2.pdf
              better_sql_oracle_code_v2.pptx
            code_hyderabad
              better_sql_oracle_code.pptx
            hong_kong
              hong_kong_122.pdf
              hong_kong_122.pptx
              hong_kong_122_no_multi.pptx
              hong_kong_partitioning_ora_template.pptx
              hong_kong_partitioning_ora_template2.pdf
              hong_kong_partitioning_ora_template2.pptx
            london
              london.pdf
              london.pptx
              optimizer.pdf
              optimizer.pptx
              optimizer_full.pptx
            odtug
              common_disasters_short.pptx
              common_disasters_short_comp.pptx      

And there we have it. From flat file listing to a nice hierarchical layout just by tackling the problem piece by piece using the trusty WITH clause.

Those pesky LONG columns

There was a time, many moons ago Smile when CLOB, BLOB and BFILE did not exist as data types. So if you had anything longer than a few kilobytes of data to store, you had to use a LONG or a LONG RAW.  But those data types came with all sorts of restrictions and frustrations, and we all embraced the improvements that the LOB data types brought in Oracle 8.  But of course, we carry a lot of that historical “baggage” in the data dictionary.


SQL> select owner, table_name, column_name
  2  from   dba_tab_cols
  3  where  data_type = 'LONG'
  4  and    substr(table_name,1,4) in ('DBA_','ALL_','USER')
  5  order by 1,2, 3;

OWNER                          TABLE_NAME                     COLUMN_NAME
------------------------------ ------------------------------ -------------------------
SYS                            ALL_ARGUMENTS                  DEFAULT_VALUE
SYS                            ALL_CLUSTER_HASH_EXPRESSIONS   HASH_EXPRESSION
SYS                            ALL_CONSTRAINTS                SEARCH_CONDITION
SYS                            ALL_IND_EXPRESSIONS            COLUMN_EXPRESSION
SYS                            ALL_IND_PARTITIONS             HIGH_VALUE
SYS                            ALL_IND_SUBPARTITIONS          HIGH_VALUE
SYS                            ALL_MVIEWS                     QUERY
SYS                            ALL_MVIEW_AGGREGATES           MEASURE
SYS                            ALL_MVIEW_ANALYSIS             QUERY
SYS                            ALL_NESTED_TABLE_COLS          DATA_DEFAULT
SYS                            ALL_REGISTERED_MVIEWS          QUERY_TXT
SYS                            ALL_REGISTERED_SNAPSHOTS       QUERY_TXT
SYS                            ALL_SNAPSHOTS                  QUERY
SYS                            ALL_SQLSET_PLANS               OTHER
SYS                            ALL_SUBPARTITION_TEMPLATES     HIGH_BOUND
SYS                            ALL_SUMMARIES                  QUERY
SYS                            ALL_SUMMARY_AGGREGATES         MEASURE
SYS                            ALL_TAB_COLS                   DATA_DEFAULT
SYS                            ALL_TAB_COLS_V$                DATA_DEFAULT
SYS                            ALL_TAB_COLUMNS                DATA_DEFAULT
SYS                            ALL_TAB_PARTITIONS             HIGH_VALUE
SYS                            ALL_TAB_SUBPARTITIONS          HIGH_VALUE
SYS                            ALL_TRIGGERS                   TRIGGER_BODY
SYS                            ALL_VIEWS                      TEXT
SYS                            ALL_VIEWS_AE                   TEXT
SYS                            ALL_ZONEMAPS                   QUERY
SYS                            ALL_ZONEMAP_MEASURES           MEASURE
SYS                            DBA_ADVISOR_SQLPLANS           OTHER
SYS                            DBA_ARGUMENTS                  DEFAULT_VALUE
SYS                            DBA_CLUSTER_HASH_EXPRESSIONS   HASH_EXPRESSION
SYS                            DBA_CONSTRAINTS                SEARCH_CONDITION
SYS                            DBA_IM_EXPRESSIONS             SQL_EXPRESSION
SYS                            DBA_IND_EXPRESSIONS            COLUMN_EXPRESSION
SYS                            DBA_IND_PARTITIONS             HIGH_VALUE
SYS                            DBA_IND_SUBPARTITIONS          HIGH_VALUE
SYS                            DBA_MVIEWS                     QUERY
SYS                            DBA_MVIEW_AGGREGATES           MEASURE
SYS                            DBA_MVIEW_ANALYSIS             QUERY
SYS                            DBA_NESTED_TABLE_COLS          DATA_DEFAULT
SYS                            DBA_OUTLINES                   SQL_TEXT
SYS                            DBA_REGISTERED_MVIEWS          QUERY_TXT
SYS                            DBA_REGISTERED_SNAPSHOTS       QUERY_TXT
SYS                            DBA_SNAPSHOTS                  QUERY
SYS                            DBA_SQLSET_PLANS               OTHER
SYS                            DBA_SQLTUNE_PLANS              OTHER
SYS                            DBA_SUBPARTITION_TEMPLATES     HIGH_BOUND
SYS                            DBA_SUMMARIES                  QUERY
SYS                            DBA_SUMMARY_AGGREGATES         MEASURE
SYS                            DBA_TAB_COLS                   DATA_DEFAULT
SYS                            DBA_TAB_COLS_V$                DATA_DEFAULT
SYS                            DBA_TAB_COLUMNS                DATA_DEFAULT
SYS                            DBA_TAB_PARTITIONS             HIGH_VALUE
SYS                            DBA_TAB_SUBPARTITIONS          HIGH_VALUE
SYS                            DBA_TRIGGERS                   TRIGGER_BODY
SYS                            DBA_VIEWS                      TEXT
SYS                            DBA_VIEWS_AE                   TEXT
SYS                            DBA_ZONEMAPS                   QUERY
SYS                            DBA_ZONEMAP_MEASURES           MEASURE
SYS                            USER_ADVISOR_SQLPLANS          OTHER
SYS                            USER_ARGUMENTS                 DEFAULT_VALUE
SYS                            USER_CLUSTER_HASH_EXPRESSIONS  HASH_EXPRESSION
SYS                            USER_CONSTRAINTS               SEARCH_CONDITION
SYS                            USER_IM_EXPRESSIONS            SQL_EXPRESSION
SYS                            USER_IND_EXPRESSIONS           COLUMN_EXPRESSION
SYS                            USER_IND_PARTITIONS            HIGH_VALUE
SYS                            USER_IND_SUBPARTITIONS         HIGH_VALUE
SYS                            USER_MVIEWS                    QUERY
SYS                            USER_MVIEW_AGGREGATES          MEASURE
SYS                            USER_MVIEW_ANALYSIS            QUERY
SYS                            USER_NESTED_TABLE_COLS         DATA_DEFAULT
SYS                            USER_OUTLINES                  SQL_TEXT
SYS                            USER_REGISTERED_MVIEWS         QUERY_TXT
SYS                            USER_REGISTERED_SNAPSHOTS      QUERY_TXT
SYS                            USER_SNAPSHOTS                 QUERY
SYS                            USER_SQLSET_PLANS              OTHER
SYS                            USER_SQLTUNE_PLANS             OTHER
SYS                            USER_SUBPARTITION_TEMPLATES    HIGH_BOUND
SYS                            USER_SUMMARIES                 QUERY
SYS                            USER_SUMMARY_AGGREGATES        MEASURE
SYS                            USER_TAB_COLS                  DATA_DEFAULT
SYS                            USER_TAB_COLS_V$               DATA_DEFAULT
SYS                            USER_TAB_COLUMNS               DATA_DEFAULT
SYS                            USER_TAB_PARTITIONS            HIGH_VALUE
SYS                            USER_TAB_SUBPARTITIONS         HIGH_VALUE
SYS                            USER_TRIGGERS                  TRIGGER_BODY
SYS                            USER_VIEWS                     TEXT
SYS                            USER_VIEWS_AE                  TEXT
SYS                            USER_ZONEMAPS                  QUERY
SYS                            USER_ZONEMAP_MEASURES          MEASURE
WMSYS                          ALL_WM_IND_EXPRESSIONS         COLUMN_EXPRESSION
WMSYS                          USER_WM_IND_EXPRESSIONS        COLUMN_EXPRESSION

So when we want to do some data mining against the dictionary, the following error is a common and frustrating one!


SQL> select *
  2  from   dba_views
  3  where  text like '%mystring%';
where  text like '%mystring%'
       *
ERROR at line 3:
ORA-00932: inconsistent datatypes: expected CHAR got LONG

I often hear the comment “Why not just in the next version, just change all those LONG’s to CLOB?”.  I imagine it would be possible to do so, but just pause for a second and think of the amount of regression testing that would need to occur, not just inside the database group in the Oracle organization, but for every customer that has ever coded up a mechanism in PL/SQL, or C, or Java, or any other language that is potentially using a LONG column in one of their queries.  I don’t know for sure, but I suspect that is the reason why we have added new columns to the dictionary rather than modify existing ones.  For example, if you look at DBA_VIEWS, you can see the addition of a column TEXT_VC which is a VARCHAR2 equivalent to the pre-existing TEXT column which is a LONG.


SQL> desc DBA_VIEWS
 Name                                Null?    Type
 ----------------------------------- -------- ----------------
 OWNER                               NOT NULL VARCHAR2(128)
 VIEW_NAME                           NOT NULL VARCHAR2(128)
 TEXT_LENGTH                                  NUMBER
 TEXT                                         LONG
 TEXT_VC                                      VARCHAR2(4000)
 TYPE_TEXT_LENGTH                             NUMBER
 TYPE_TEXT                                    VARCHAR2(4000)
 OID_TEXT_LENGTH                              NUMBER
 OID_TEXT                                     VARCHAR2(4000)
 VIEW_TYPE_OWNER                              VARCHAR2(128)
 VIEW_TYPE                                    VARCHAR2(128)
 SUPERVIEW_NAME                               VARCHAR2(128)
 EDITIONING_VIEW                              VARCHAR2(1)
 READ_ONLY                                    VARCHAR2(1)
 CONTAINER_DATA                               VARCHAR2(1)
 BEQUEATH                                     VARCHAR2(12)
 ORIGIN_CON_ID                                NUMBER
 DEFAULT_COLLATION                            VARCHAR2(100)
 CONTAINERS_DEFAULT                           VARCHAR2(3)
 CONTAINER_MAP                                VARCHAR2(3)
 EXTENDED_DATA_LINK                           VARCHAR2(3)
 EXTENDED_DATA_LINK_MAP                       VARCHAR2(3)

TEXT_VC is a nice touch, but it would have been nice to have that as a CLOB column to get access to the entire DDL for the view should it exceed 4000 characters.  I can’t solve that problem, but I thought perhaps I can give you a workable compromise.  In 8i, the TO_LOB function was implemented to allow customers to migrate from LONG to CLOB with minimal fuss.  I can take advantage of that to provide a CLOB-based copy of DBA_VIEWS:


SQL> create table system.t  as
  2  select
  3     owner
  4    ,view_name
  5    ,text_length
  6    ,to_lob(text) text
  7    ,text_vc
  8    ,type_text_length
  9    ,type_text
 10    ,oid_text_length
 11    ,oid_text
 12    ,view_type_owner
 13    ,view_type
 14    ,superview_name
 15    ,editioning_view
 16    ,read_only
 17    ,container_data
 18    ,bequeath
 19    ,origin_con_id
 20    ,default_collation
 21    ,containers_default
 22    ,container_map
 23    ,extended_data_link
 24    ,extended_data_link_map
 25  from dba_views;

Table created.

SQL>
SQL> create index system.t_ix on system.t ( owner, view_name );

Index created.

That is all well and good, but of course, the moment I perform maintenance on any view in the system, that table will be “stale”. That still might be a perfectly reasonably solution for you – you just refresh the table contents (say) once per day, or when you do deployments of schema changes into your database.  But perhaps we can do a little better than that.  Using a DDL event trigger, we can capture changes on views and adjust our copy accordingly.


SQL> create or replace
  2  trigger sys.view$clob$handler
  3  after create or alter or drop or rename
  4  on database
  5  when ( ora_dict_obj_type = 'VIEW' )
  6  declare
  7    l_obj_name  varchar2(128) := ora_dict_obj_name;
  8    l_obj_owner varchar2(128) := ora_dict_obj_owner;
  9
 10    l_text   varchar2(1000);
 11    sql_text ora_name_list_t;
 12    l_idx    pls_integer;
 13  begin
 14    lock table system.t in exclusive mode;
 15
 16    --
 17    -- remove the existing row for the view
 18    --
 19    delete from system.t where owner = l_obj_owner and view_name = l_obj_name;
 20
 21    --
 22    -- if it is a rename event, we will try to derive the new name
 23    -- from the sql statement by looking for a trailing "TO"
 24    --
 25    if ora_sysevent in ('RENAME') then
 26      l_idx := ora_sql_txt(sql_text);
 27      for i in 1 .. l_idx
 28      loop
 29          l_text := l_text || sql_text(i);
 30      end loop;
 31      l_idx := instr(lower(l_text),' to ');
 32      if l_idx = 0 then
 33        raise_application_error(-20000,'Could not find appropriate rename content');
 34      end if;
 35      --
 36      -- The SQL has a trailing chr(0) which we need to remove
 37      --
 38      l_text := rtrim(ltrim(substr(l_text,l_idx+4)),' '||chr(0));
 39      --
 40      -- The object name in quotes mean we preserve the case specified, otherwise
 41      -- we will normalize to upper
 42      --
 43      if l_text like '"%"' then
 44        l_obj_name := rtrim(ltrim(l_text,'"'),'"');
 45      else
 46        l_obj_name := upper(l_text);
 47      end if;
 48    end if;
 49
 50    --
 51    -- Now we insert the updated definition for the view, or
 52    -- perhaps its new name
 53    --
 54    if ora_sysevent in ('CREATE','ALTER','RENAME') then
 55      insert into system.t
 56      select
 57         owner
 58        ,view_name
 59        ,text_length
 60        ,to_lob(text) text
 61        ,text_vc
 62        ,type_text_length
 63        ,type_text
 64        ,oid_text_length
 65        ,oid_text
 66        ,view_type_owner
 67        ,view_type
 68        ,superview_name
 69        ,editioning_view
 70        ,read_only
 71        ,container_data
 72        ,bequeath
 73        ,origin_con_id
 74        ,default_collation
 75        ,containers_default
 76        ,container_map
 77        ,extended_data_link
 78        ,extended_data_link_map
 79      from dba_views
 80      where owner = l_obj_owner
 81      and   view_name = l_obj_name;
 82    end if;
 83
 84  end;
 85  /

Trigger created.

Obviously, for more serious usage you’ll be choosing a better table name and not using SYS, but this is just a demo on my laptop.  Unlike a DML trigger, where we have access to “new” and “old” images of the data, for a RENAME command, I had to probe the SQL text to try derive the new name. So lets now perform some view DDL and see how our tracking trigger accommodates the changes.


SQL> create view view1 as select * from ALL_objects;

View created.

SQL> create view view2 as select * from ALL_objects;

View created.

SQL> create view view3 as select * from view2;

View created.

SQL> drop view view1;

View dropped.

SQL> rename view2 to view2a;

Table renamed.

SQL>
SQL>
SQL> select count(*) from system.t;

  COUNT(*)
----------
      7347

SQL> select count(*) from dba_views;

  COUNT(*)
----------
      7347

SQL> select owner, view_name from dba_views
  2  minus
  3  select owner, view_name from system.t;

no rows selected

SQL>
SQL> select owner, view_name from system.t
  2  minus
  3  select owner, view_name from dba_views;

no rows selected

And there we have it – our own custom version of DBA_VIEWS where the TEXT is now exposed as a CLOB column.  So now, mining that column for information is as easy as a simple predicate


SQL> select owner, view_name
  2  from   system.t
  3  where  text like 'selec%ALL_objects';

OWNER                          VIEW_NAME
------------------------------ -------------------
MCDONAC                        VIEW2A

Dealing with IP addresses

Kris Rice made mention on Twitter about the free (but limited lifetime) IP address to Country mappings that you can get from Maxmind.

image

If you ever need to deal with IP addresses, you might find the following utility package I wrote a while back useful.  It lets you convert from the string representation of an IP address to its numeric equivalent, and vice-versa.  It handles IPv4 and IPv6 with one caveat being that that I didn’t bother with the collapsed zeros for IPv6 so I could keep the performance snappy.  Free for your use without warranty or responsibility Smile


SQL> create or replace
  2  package ip_util is
  3
  4  function ip_num_from_str(p_ip_str varchar2) return number deterministic;
  5  function ip_str_from_num(p_ipnum number) return varchar2 deterministic;
  6
  7  end;
  8  /

Package created.

SQL>
SQL>
SQL> create or replace
  2  package body ip_util is
  3
  4    --
  5    -- constants need to be fixed, not expressions if you want to avoid ora-4068
  6    --
  7    l_ip41 constant number(12)  := 256;        -- power(256,1);
  8    l_ip42 constant number(12)  := 65536;      -- power(256,2);
  9    l_ip43 constant number(12)  := 16777216;   -- power(256,3);
 10    l_ip44 constant number(12)  := 4294967296; -- power(256,4);
 11
 12    l_ip61 constant number(38)  := 65536;                              --power(65536,1);
 13    l_ip62 constant number(38)  := 4294967296;                         --power(65536,2);
 14    l_ip63 constant number(38)  := 281474976710656;                    --power(65536,3);
 15    l_ip64 constant number(38)  := 18446744073709551616;               --power(65536,4);
 16    l_ip65 constant number(38)  := 1208925819614629174706176;          --power(65536,5);
 17    l_ip66 constant number(38)  := 79228162514264337593543950336;      --power(65536,6);
 18    l_ip67 constant number(38)  := 5192296858534827628530496329220096; --power(65536,7);
 19
 20
 21  function ip_num_from_str(p_ip_str varchar2) return number deterministic is
 22    l_ip_num     number;
 23    l_dot1       pls_integer;
 24    l_dot2       pls_integer;
 25    l_dot3       pls_integer;
 26    l_dot4       pls_integer;
 27
 28    l_colon      pls_integer;
 29    l_colon_cnt  pls_integer;
 30    l_hex        varchar2(32);
 31    l_ip_str     varchar2(64);
 32  begin
 33    if p_ip_str like '%.%' then
 34      l_dot1   := instr(p_ip_str,'.');
 35      l_dot2   := instr(p_ip_str,'.',l_dot1+1);
 36      l_dot3   := instr(p_ip_str,'.',l_dot2+1);
 37      l_dot4   := instr(p_ip_str,'.',l_dot3+1);
 38      if l_dot4 > 0 then
 39         raise_application_error(-20000,'Cannot be resolved to an IP4 address');
 40      end if;
 41
 42      l_ip_num :=  l_ip43*to_number(substr(p_ip_str,1,l_dot1-1)) +
 43                   l_ip42*to_number(substr(p_ip_str,l_dot1+1,l_dot2-l_dot1-1)) +
 44                   l_ip41*to_number(substr(p_ip_str,l_dot2+1,l_dot3-l_dot2-1)) +
 45                   to_number(substr(p_ip_str,l_dot3+1));
 46
 47    elsif p_ip_str like '%:%' then
 48      --
 49      -- Note: The abbreviation of "Consecutive sections of zeroes are replaced with a double colon (::)" is not implemented.
 50      --
 51      l_colon_cnt := length(p_ip_str)-length(replace(p_ip_str,':'));
 52      if l_colon_cnt != 7 then
 53         raise_application_error(-20000,'Cannot be resolved to an IP6 address');
 54      end if;
 55
 56      l_ip_str := p_ip_str||':';
 57      loop
 58        l_colon := instr(l_ip_str,':');
 59        l_hex := l_hex || lpad(substr(l_ip_str,1,l_colon-1),4,'0');
 60        l_ip_str := substr(l_ip_str,l_colon+1);
 61        exit when l_ip_str is null;
 62      end loop;
 63      l_ip_num := to_number(l_hex,'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx');
 64    end if;
 65
 66    return l_ip_num;
 67  end;
 68
 69
 70  function ip_str_from_num(p_ipnum number) return varchar2 deterministic is
 71  begin
 72    if p_ipnum < l_ip44 then
 73      return  mod(trunc(p_ipnum/l_ip43),l_ip41) ||'.'||
 74              mod(trunc(p_ipnum/l_ip42),l_ip41) ||'.'||
 75              mod(trunc(p_ipnum/l_ip41),l_ip41) ||'.'||
 76              mod(p_ipnum,l_ip41);
 77    else
 78      --
 79      -- Note: The abbreviation of "Consecutive sections of zeroes are replaced with a double colon (::)" is not implemented.
 80      --
 81      return  to_char(mod(trunc(p_ipnum/l_ip67),l_ip61),'fmxxxx') ||':'||
 82              to_char(mod(trunc(p_ipnum/l_ip66),l_ip61),'fmxxxx') ||':'||
 83              to_char(mod(trunc(p_ipnum/l_ip65),l_ip61),'fmxxxx') ||':'||
 84              to_char(mod(trunc(p_ipnum/l_ip64),l_ip61),'fmxxxx') ||':'||
 85              to_char(mod(trunc(p_ipnum/l_ip63),l_ip61),'fmxxxx') ||':'||
 86              to_char(mod(trunc(p_ipnum/l_ip62),l_ip61),'fmxxxx') ||':'||
 87              to_char(mod(trunc(p_ipnum/l_ip61),l_ip61),'fmxxxx') ||':'||
 88              to_char(mod(p_ipnum,l_ip61),'fmxxxx');
 89    end if;
 90  end;
 91
 92  end;
 93  /

Package body created.

SQL> select ip_util.ip_num_from_str('192.168.1.2') from dual;

IP_UTIL.IP_NUM_FROM_STR('192.168.1.2')
--------------------------------------
                            3232235778

SQL> select ip_util.ip_str_from_num(3232235778) from dual;

IP_UTIL.IP_STR_FROM_NUM(3232235778)
-----------------------------------------------------------------------------------------------------------------
192.168.1.2

SQL> select ip_util.ip_num_from_str('2001:db8:0:0:0:ff00:42:8329') ip from dual;

                                          IP
--------------------------------------------
      42540766411282592856904265327123268393

SQL> select ip_util.ip_str_from_num(42540766411282592856904265327123268393) from dual;

IP_UTIL.IP_STR_FROM_NUM(42540766411282592856904265327123268393)
-----------------------------------------------------------------------------------------------------------------
2001:db8:0:0:0:ff00:42:8329


iASH–my “infinite ASH” routine

I love Active Session History (ASH) data because a lot of the work I’ve done in my consulting life was “after the fact” diagnosis.  By this I mean that many of us have been in a similar circumstance where the customer will contact you not when a problem is occurring, but only when you contact them for some other potentially unrelated reason.  At which point you hear will that dreaded sentence:

“Yeah, the Order Entry screen was really slow a couple of hours ago

And this is where ASH is an awesome resource.  With the ASH data available, there is a good chance you will be able to diagnose the issue without having to make an embarrassing request for the customer to repeat the task so that you can trace the underlying database activity.  Because no-one likes to be the person that says:

“Yeah that performance must have really sucked for you … Hey, let’s do it again!”

But even ASH has it’s limitations because sometimes the customer sentence is phrased slightly differently Smile

“Yeah, the Order Entry screen was really slow last Tuesday

In this case, it is unlikely that the ASH data will still be available.  Whilst a subset of the invaluable ASH data is retained in DBA_HIST_ACTIVE_SESS_HISTORY, I would prefer to keep the complete set of ASH data available for longer than the timespan for which it is practical (due to the limitations of memory). So I wrote myself a simple little routine that keep all of the ASH data around for longer.  It’s hardly rocket science – just a little partitioned table to capture V$ACTIVE_SESSION_HISTORY at regular intervals.  Let’s walk through it so you can understand it and adapt it for your own use.

First I’ll create a partitioned table to hold the ASH data.  I’m using partitioning to avoid the need to index the table, so the insertion cost is minimal.  I’m partitioning by day and the code assumes this, so take care if you intend to modify it.


SQL> create table ash_hist
  2  partition by range (sample_time)
  3  interval( numtodsinterval(1,'day'))
  4  (partition p1 values less than (timestamp' 2017-01-01 00:00:00'))
  5  as select * from sys.gv_$active_session_history;

Table created.

Here is my procedure to capture the data.  The essentials of the routine are:

  • Starting with the most recent partition, find the last recorded entry in ASH_HIST.  We’ll look back up to 10 days to find our starting point (hence the daily partitions).
  • If there is no data for the last 10 days, we’ll bomb out, because we haven’t been running the routine frequently enough.
  • Copy all the ASH data from this point to now into ASH_HIST using a nice efficient INSERT-APPEND, but we’ll skip the session that is doing the copying. (You can include it if you want just by removing line 8)
  • Once per week (you can control this by tinkering with the IF conditions on line 34) we’ll drop the oldest partitions.  By default I keep 90 days, but you can set this by altering “l_retention” on line 5.

SQL>
SQL> CREATE OR REPLACE procedure save_ash_hist is
  2    l_hi_val_as_string varchar2(1000);
  3    l_hi_val_as_date   date;
  4    l_max_recorded     timestamp;
  5    l_retention        number := 90;
  6
  7  begin
  8    dbms_application_info.set_module('$$SAVE_ASH$$','');
  9    -- we are looping to take advantage
 10    -- of partition elimination
 11
 12    for i in 0 .. 10 loop
 13       select max(sample_time)
 14       into   l_max_recorded
 15       from   ash_hist
 16       where  sample_time > systimestamp - i;
 17
 18       exit when l_max_recorded is not null;
 19    end loop;
 20
 21    if l_max_recorded is null then
 22      raise_application_error(-20000,'No max sample time with 10 days');
 23    end if;
 24    dbms_output.put_line('Last copied time was '||l_max_recorded);
 25
 26    insert /*+ append */ into ash_hist
 27    select *
 28    from sys.gv_$active_session_history
 29    where sample_time > l_max_recorded
 30    and   ( module != '$$SAVE_ASH$$' or module is null );
 31    dbms_output.put_line('Copied '||sql%rowcount||' rows');
 32    commit;
 33
 34    if to_char(sysdate,'DYHH24') between 'TUE01' and 'TUE06' then
 35
 36      begin
 37        execute immediate 'alter table ash_hist set interval ()';
 38      exception
 39        when others then null;
 40      end;
 41      execute immediate 'alter table ash_hist set interval (NUMTODSINTERVAL(1,''DAY''))';
 42
 43      for i in ( select *
 44                 from   user_tab_partitions
 45                 where  table_name = 'ASH_HIST'
 46                 and    partition_position > 1
 47                 order by partition_position )
 48      loop
 49        l_hi_val_as_string := i.high_value;
 50        execute immediate 'select '||l_hi_val_as_string||' from dual' into l_hi_val_as_date;
 51
 52        if l_hi_val_as_date < sysdate - l_retention then
 53          execute immediate 'alter table ash_hist drop partition '||i.partition_name;
 54        else
 55          exit;
 56        end if;
 57
 58      end loop;
 59    end if;
 60  end;
 61  /

Procedure created.

And that is all there is to it.  Each time we run the procedure, we’ll grab all the ASH data since the last time we ran and keep it in ASH_HIST.


SQL>
SQL> select count(*) from ash_hist;

  COUNT(*)
----------
       792

1 row selected.


SQL>
SQL> exec save_ash_hist

PL/SQL procedure successfully completed.

SQL>
SQL> select count(*) from ash_hist;

  COUNT(*)
----------
       819

1 row selected.

A simple scheduler job to run the routine every couple of hours (I’m assuming your SGA holds at least 2 hours of samples in V$ACTIVE_SESSION_HISTORY – if not, you’d need to adjust the frequency) and you’re off and running.


SQL>
SQL>
SQL> BEGIN
  2      dbms_scheduler.create_job (
  3         job_name           =>  'ASH_CAPTURE',
  4         job_type           =>  'PLSQL_BLOCK',
  5         job_action         =>  'save_ash_hist;',
  6         start_date         =>  CAST((TRUNC(SYSDATE,'HH') + (1/24) + (55/24/60)) AS TIMESTAMP), -- job commences at 55 mins past the next hour
  7         repeat_interval    =>  'FREQ=HOURLY; INTERVAL=2',
  8         enabled            =>  true,
  9         comments           =>  'Permanent record of ASH data');
 10  END;
 11  /

PL/SQL procedure successfully completed.

You can assume all of the standard disclaimers here. Use at own risk, blah blah, no warranty, blah blah, etc

Addenda:  I should add that you could write this complete level of detail directly to DBA_HIST_ACTIVE_SESS_HISTORY via tinkering with “_ash_disk_filter_ratio” , but please get the endorsement of Support first.

Parsing freeform data in flat files

SQL loader is a very cool utility that has existed for a long time within Oracle to load flat files into the database. However sometimes people find the control file syntax quite cryptic, and when it comes to passing very complicated structures, this can mean control files which are hard to maintain. For me the best solution here is to use an external table. That way we can combine the power of the SQL Loader control file syntax embedded within the external table definition, along with the full power of PL/SQL and SQL for additional parsing of that data.

Here is an example where the data is spread across multiple lines and the task is to bring all that data together into a natural form, namely an ID followed by text.

So here is my file that has free format text


10,some data
that is really 
annoying because it
spans lots of rows.
20,and some more than does not.
30,and a mixture of stuff and we 
will assume that numerics then comm as 
is the thing that tells us it is a new line. 
40,otherwise we will keep on going.

What I’m going to do is use an external table to bring that data in as it is within the file, and then use a PL/SQL function in pipelined mode to parse the data into the component attributes.


SQL>
SQL> create table freeform
  2  (
  3  line varchar2(200)
  4  )
  5  ORGANIZATION external
  6  (
  7  TYPE oracle_loader
  8  DEFAULT DIRECTORY temp
  9  ACCESS PARAMETERS
 10  (
 11  RECORDS DELIMITED BY NEWLINE
 12  FIELDS LDRTRIM
 13  (
 14  line char(200)
 15  )
 16  )
 17  location
 18  (
 19  'freeform.dat'
 20  )
 21  )REJECT LIMIT UNLIMITED ;

Table created.

SQL>
SQL> select * from freeform;

LINE
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
10,some data
that is really
annoying because it
spans lots of rows.
20,and some more than does not.
30,and a mixture of stuff and we
will assume that numerics then comm as
is the thing that tells us it is a new line.
40,otherwise we will keep on going.

9 rows selected.

SQL>
SQL> create or replace
  2  type output_data as object (id int, str varchar2(500))
  3  /

Type created.

SQL>
SQL> create or replace
  2  type output_list as table of output_data
  3  /

Type created.

SQL>
SQL> create or replace
  2  function cleanup_that_sucker(rc sys_refcursor) return output_list pipelined is
  3    type strlist is table of varchar2(200) index by pls_integer;
  4    s strlist;
  5    o output_data := output_data(null,null);
  6    c pls_integer := 0;
  7  begin
  8    loop
  9      fetch rc bulk collect into s limit 500;
 10
 11      for i in 1 .. s.count loop
 12        c := c + 1;
 13        if regexp_like(s(i),'^[[:digit:]]+,') then
 14          if c > 1 then
 15             pipe row (o);
 16          end if;
 17          o.id  := substr(s(i),1,instr(s(i),',')-1);
 18          o.str := substr(s(i),instr(s(i),',')+1);
 19        else
 20          o.str := o.str || ' ' || s(i);
 21        end if;
 22      end loop;
 23      exit when rc%notfound;
 24    end loop;
 25    pipe row (o);
 26    return;
 27  end;
 28  /

Function created.

SQL> sho err
No errors.
SQL>
SQL> select *
  2  from cleanup_that_sucker(cursor(
  3          select line from freeform
  4          ));

        ID STR
---------- ------------------------------------------------------------------------------------------------------------------------
        10 some data that is really annoying because it spans lots of rows.
        20 and some more than does not.
        30 and a mixture of stuff and we will assume that numerics then comm as is the thing that tells us it is a new line.
        40 otherwise we will keep on going.

4 rows selected.

SQL>
SQL>

And there we have it. We have the full power of PL/SQL and SQL at our fingertips even though we are parsing flat file data. This avoids complicated control file syntax and makes ongoing maintenance far easier. Performance is still good because we are using the bulk collect feature in PL/SQL. Also because we are taking a cursor expression as input, we have lots of flexibility in terms of what data are we passing in. We simply change the SQL expression in the cursor.

Buzzword Bingo

Looking for that catchy title for your next presentation ?

I took the first word from the title of 1000 Oracle OpenWorld presentations, and looked for patterns Smile.  I omitted some obvious terms that are either products or definite/indefinite articles:

  • Oracle
  • Peoplesoft 
  • The
  • How 
  • OAUG
  • General
  • MySQL

And here is what we end up with:


SQL> select buzz, count(*)
  2  from buzz
  3  group by buzz
  4  order by 2 desc;

BUZZ                                                                                                   COUNT(*)
---------------------------------------------------------------------------------------------------- ----------
Building                                                                                                     18
Customers                                                                                                    14
Best                                                                                                         13
Digital                                                                                                      11
Migrating                                                                                                    10
Getting                                                                                                      10
Accelerate                                                                                                    8
Modernize                                                                                                     8
Implementing                                                                                                  7
Data                                                                                                          7
Transforming                                                                                                  6
Extend                                                                                                        6
Identifying                                                                                                   6
Managing                                                                                                      6
Achieving                                                                                                     6
Leveraging                                                                                                    5
Making                                                                                                        5
Creating                                                                                                      4

So there you go…All the verbs and adjectives you need for a successful presentation Smile

AskTOM TV–episode 11

Just a quick note to give the supporting collateral to the latest episode of AskTOM TV.

The question I tackled is this one:

https://asktom.oracle.com/pls/apex/asktom.search?tag=want-to-retrive-numbers-in-words

which was a fun one to answer because it showcases several useful SQL techniques:

  • Using CONNECT to synthesize rows,
  • Using regular expressions to parse text ,
  • Using MULTICAST in Oracle 10g to emulate the native LISTAGG functions from 11g onwards ,
  • Using the hooks into the OCI aggregation facilities to build custom aggregations ,
  • The JSP format mask as a mean to generate numeric words

And here is the entire script from the episode if you want to run it yourself.



drop type string_agg_type;
col column_value format a60
col digit format a60
col concat_str format a60
drop table  t purge;


select to_char(to_date('7','J'),'JSP') from dual;

select to_char(to_date('0','J'),'JSP') from dual;

select
  case x
    when '0' then 'zero'
    when '1' then 'one'
    when '2' then 'two'
    when '3' then 'three'
    when '4' then 'four'
    when '5' then 'five'
    when '6' then 'six'
    when '7' then 'seven'
    when '8' then 'eight'
    when '9' then 'nine'
  end
from ( select '3' x from dual ) 

/

select
  case x
    when '0' then 'zero'
    when '1' then 'one'
    when '2' then 'two'
    when '3' then 'three'
    when '4' then 'four'
    when '5' then 'five'
    when '6' then 'six'
    when '7' then 'seven'
    when '8' then 'eight'
    when '9' then 'nine'
  end
from (
  select substr('123',rownum,1) x
  from dual
  connect by level <= 3
  ) 

/  


create or replace type string_list is table of varchar2(1000);
/

create table t ( x int );
insert into t values (101);
insert into t values (456);
insert into t values (789);

select *
from t,
     table(cast(multiset(
        select substr(to_char(t.x),rownum,1)
        from dual
connect by level <= length(to_char(t.x))) as string_list)
)

/


select
  x,
  digit
from (
  select x, column_value digit
  from t,
       table(cast(multiset(
          select 
            case substr(to_char(t.x),rownum,1)
              when '0' then 'zero'
              when '1' then 'one'
              when '2' then 'two'
              when '3' then 'three'
              when '4' then 'four'
              when '5' then 'five'
              when '6' then 'six'
              when '7' then 'seven'
              when '8' then 'eight'
              when '9' then 'nine'
            end str
          from dual
          connect by level <= length(to_char(t.x))) as string_list)
  )
)

/

create or replace type string_agg_type as object
(
   data  string_list,

   static function
        ODCIAggregateInitialize(sctx IN OUT string_agg_type )
        return number,

   member function
        ODCIAggregateIterate(self IN OUT string_agg_type ,
                             value IN varchar2 )
        return number,

   member function
        ODCIAggregateTerminate(self IN string_agg_type,
                               returnValue OUT  varchar2,
                               flags IN number)
        return number,

   member function
        ODCIAggregateMerge(self IN OUT string_agg_type,
                           ctx2 IN string_agg_type)
        return number
);
/
 
create or replace type body string_agg_type
is

static function ODCIAggregateInitialize(sctx IN OUT string_agg_type)
return number
is
begin
    sctx := string_agg_type( string_list() );
    return ODCIConst.Success;
end;

member function ODCIAggregateIterate(self IN OUT string_agg_type,
                                     value IN varchar2 )
return number
is
begin
    data.extend;
    data(data.count) := value;
    return ODCIConst.Success;
end;

member function ODCIAggregateTerminate(self IN string_agg_type,
                                       returnValue OUT varchar2,
                                       flags IN number)
return number
is
    l_data varchar2(4000);
begin
    for x in ( select column_value from TABLE(data) order by 1 )
    loop
            l_data := l_data || ',' || x.column_value;
    end loop;
    returnValue := ltrim(l_data,',');
    return ODCIConst.Success;
end;

member function ODCIAggregateMerge(self IN OUT string_agg_type,
                                   ctx2 IN string_agg_type)
return number
is
begin
    for i in 1 .. ctx2.data.count
    loop
            data.extend;
            data(data.count) := ctx2.data(i);
    end loop;
    return ODCIConst.Success;
end;

end;
/
 
CREATE or replace
FUNCTION stragg(input varchar2 )
RETURN varchar2
PARALLEL_ENABLE AGGREGATE USING string_agg_type;
/


with source_data as
(
    select
      x,
      digit
    from (
      select x, column_value digit
      from t,
           table(cast(multiset(
              select 
                case substr(to_char(t.x),rownum,1)
                  when '0' then 'zero'
                  when '1' then 'one'
                  when '2' then 'two'
                  when '3' then 'three'
                  when '4' then 'four'
                  when '5' then 'five'
                  when '6' then 'six'
                  when '7' then 'seven'
                  when '8' then 'eight'
                  when '9' then 'nine'
                end str
              from dual
              connect by level <= length(to_char(t.x))) as string_list)
      )
    )
)
select x, stragg(digit) concat_str
from   source_data
group by x
order by 1

/
 
with source_data as
(
    select
      x,
      digit
    from (
      select x, column_value digit
      from t,
           table(cast(multiset(
              select '@'||lpad(level,10,'0')||'~'||
                case substr(to_char(t.x),rownum,1)
                  when '0' then 'zero'
                  when '1' then 'one'
                  when '2' then 'two'
                  when '3' then 'three'
                  when '4' then 'four'
                  when '5' then 'five'
                  when '6' then 'six'
                  when '7' then 'seven'
                  when '8' then 'eight'
                  when '9' then 'nine'
                end str
              from dual
              connect by level <= length(to_char(t.x))) as string_list)
      )
    )
)
select x, stragg(digit)concat_str
from   source_data
group by x
order by 1

/


with source_data as
(
    select
      x,
      digit
    from (
      select x, column_value digit
      from t,
           table(cast(multiset(
              select '@'||lpad(level,10,'0')||'~'||
                case substr(to_char(t.x),rownum,1)
                  when '0' then 'zero'
                  when '1' then 'one'
                  when '2' then 'two'
                  when '3' then 'three'
                  when '4' then 'four'
                  when '5' then 'five'
                  when '6' then 'six'
                  when '7' then 'seven'
                  when '8' then 'eight'
                  when '9' then 'nine'
                end str
              from dual
              connect by level <= length(to_char(t.x))) as string_list)
      )
    )
)
select x, regexp_replace(stragg(digit),'\@[0-9]*\~') concat_str
from   source_data
group by x
order by 1


/