I had a fun request come in from a colleague the other day. They had a simple list of fully qualified file names and they needed to present that data in the familiar hierarchical tree layout.
To demonstrate, I took a little trip down memory lane and grabbed a subset of presentations I’ve done over the years.
SQL> create table t ( fname varchar2(1000));
Table created.
SQL>
SQL> insert into t values ('C:\Users\Connor\Presentations\2002\scene_200205.pdf');
1 row created.
SQL> insert into t values ('C:\Users\Connor\Presentations\2002\scene_200205a.pdf');
1 row created.
SQL> insert into t values ('C:\Users\Connor\Presentations\2002\254old.ppt');
1 row created.
SQL> insert into t values ('C:\Users\Connor\Presentations\2002\254_full.ppt');
1 row created.
SQL> insert into t values ('C:\Users\Connor\Presentations\2002\257.ppt');
1 row created.
...
...
...
SQL> insert into t values ('C:\Users\Connor\Presentations\2018\london\optimizer.pptx');
1 row created.
SQL> insert into t values ('C:\Users\Connor\Presentations\2018\london\optimizer_full.pptx');
1 row created.
SQL> insert into t values ('C:\Users\Connor\Presentations\2018\odtug\common_disasters_short.pptx');
1 row created.
SQL> insert into t values ('C:\Users\Connor\Presentations\2018\odtug\common_disasters_short_comp.pptx');
1 row created.
SQL> select count(*) from t;
COUNT(*)
----------
634
1 row selected.
So the challenge was to present this list of files in a nice tree format which mirrored my directory structure.
(Truth be told, my preference would be that the query would do a better job than I do in trying to organize my files, but that is a blog post for another day )
Step 1 was to use a little LATERAL trickery to parse out all of the elements of the file names into one row per “element”, where “element” an unqualified file name, or single level folder name.
SQL> select id, seq, token
2 from
3 ( select rownum id, fname from t ) t_base,
4 lateral(select level seq, rtrim(regexp_substr(t_base.fname || '\', '(.*?)\\', 1, level),'\') token
5 from dual
6 connect by level <= 1+length(fname)-length(replace(fname,'\'))
7 );
ID SEQ TOKEN
---------- ---------- ------------------------------------------------------------------------------------
1 1 C:
1 2 Users
1 3 Connor
1 4 Presentations
1 5 2002
1 6 scene_200205.pdf
2 1 C:
2 2 Users
2 3 Connor
2 4 Presentations
2 5 2002
2 6 scene_200205a.pdf
3 1 C:
3 2 Users
3 3 Connor
3 4 Presentations
3 5 2002
3 6 254old.ppt
4 1 C:
4 2 Users
4 3 Connor
4 4 Presentations
4 5 2002
4 6 254_full.ppt
...
...
We can use that as input to a standard LAG function to associate each file/folder with its parent, using the SEQ column to provide sequencing within each ID
SQL> with data as
2 (
3 select id, seq, token
4 from
5 ( select rownum id, fname from t ) t_base,
6 lateral(select level seq, rtrim(regexp_substr(t_base.fname || '\', '(.*?)\\', 1, level),'\') token
7 from dual
8 connect by level <= 1+length(fname)-length(replace(fname,'\'))
9 )
10 )
11 select
12 id,
13 seq,
14 token,
15 lag(token) over ( partition by id order by seq ) as parent
16 from data
17 order by id, seq desc;
ID SEQ TOKEN PARENT
---------- ---------- -------------------------------------------------- -----------------------------------
1 6 scene_200205.pdf 2002
1 5 2002 Presentations
1 4 Presentations Connor
1 3 Connor Users
1 2 Users C:
1 1 C:
2 6 scene_200205a.pdf 2002
2 5 2002 Presentations
2 4 Presentations Connor
2 3 Connor Users
2 2 Users C:
2 1 C:
3 6 254old.ppt 2002
3 5 2002 Presentations
3 4 Presentations Connor
3 3 Connor Users
3 2 Users C:
...
...
That’s looking good, but we have duplicates in the sense that multiple files will roll up to a single folder (eg the “C:” root is repeated).. So we DISTINCT it out to leave just the relationships we need
SQL> with data as
2 (
3 select id, seq, token
4 from
5 ( select rownum id, fname from t ) t_base,
6 lateral(select level seq, rtrim(regexp_substr(t_base.fname || '\', '(.*?)\\', 1, level),'\') token
7 from dual
8 connect by level <= 1+length(fname)-length(replace(fname,'\'))
9 )
10 )
11 select
12 distinct
13 token,
14 lag(token) over ( partition by id order by seq ) as parent
15 from data
16 /
TOKEN PARENT
-------------------------------------------------- --------------------------------------------------
C:
254old.ppt 2002
257.ppt 2002
plus2.ppt 2002
254.ppt UKOUG2002_final
error_handling.pdf 2003
2004 Presentations
mcdonald1.pdf 2004
sun_storage_brief.ppt sun_storage_presn
ausoug05_memory_x.ppt 2005
jul05_read_consistency.ppt 2005
saoug_oracle_home.ppt 2005
plsql_80_a58236.pdf sql_plsql_evo
jun07_tas_moving_data_short.ppt misc
ausoug_adel 2008
successful_development.pdf ausoug_adel
apr08_11g_developers.ppt ausoug_apr
apr08_11g_developers_print.ppt ausoug_apr
mcdonald_oct08_flashback.pdf perth
oct08_11g_developers.ppt template
nov08_read_consistency.pdf rwwa
perf101b_rwwa.pptx rwwa
sqlectric_melb 2009
sql_electric2.pdf sqlectric_melb
sql_electric1.pptx sqlectric_melb
ukoug 2009
...
...
...
And we’re pretty much done. At this point, we have the raw data we need in a child/parent form that we could use to traverse with a standard hierarchy CONNECT BY query – I’ll use LPAD to show the hierarchy relationships
SQL> with data as
2 (
3 select id, seq, token
4 from
5 ( select rownum id, fname from t ) t_base,
6 lateral(select level seq, rtrim(regexp_substr(t_base.fname || '\', '(.*?)\\', 1, level),'\') toke
7 from dual
8 connect by level <= 1+length(fname)-length(replace(fname,'\'))
9 )
10 ),
11 hier as(
12 select
13 distinct
14 token,
15 lag(token) over ( partition by id order by seq ) as parent
16 from data
17 )
18 select lpad(' ',level*2)||token hier_list
19 from hier
20 start with parent is null
21 connect by prior token = parent
22 order siblings by token;
HIER_LIST
--------------------------------------------------------------------------------------------------------
C:
Users
Connor
Presentations
2002
254_full.ppt
254old.ppt
257.ppt
UKOUG2002_final
254.ppt
257.ppt
prelim.ppt
dec_9i_features2.ppt
dec_legends.ppt
dec_legends_prelim.ppt
jul_sig.ppt
jul_sig_printout.ppt
june_sig.ppt
logo.ppt
march_sig.ppt
march_sig2.ppt
plaus.ppt
plus2.ppt
scene_200205.pdf
scene_200205a.pdf
sep_unix_sig.ppt
sep_unix_sig2.ppt
2003
254.ppt
abm_succesful_development.ppt
apr03_perth.ppt
dec03_perth_9i_nf.ppt
dec03_perth_9i_nf_popquiz.pdf
dec03_perth_9i_nf_popquiz.ppt
error_handling.pdf
error_handling.ppt
ims_performance_examples.pdf
ims_performance_examples.ppt
sep03_perth_9i_hidden.pdf
sep03_perth_9i_hidden.ppt
sep03_perth_9i_nf.ppt
sep03_perth_legends.pdf
sep03_perth_legends.ppt
sep03_perth_prelim.ppt
sf_oow_03.pdf
sf_oow_03.ppt
sf_oow_03a.ppt
slide.ppt
succesful_development.pdf
succesful_development.ppt
tools_for_performance.pdf
tools_for_performance.ppt
2004
10046_scene.pdf
bind_var.pdf
bind_var.ppt
dec03_perth_book.ppt
dec03_perth_book2.ppt
generating_test_data.pdf
generating_test_data.ppt
mar04_hotsos_legends.ppt
mar04_hotsos_nf_w2k.ppt
mar04_hotsos_nf_w2k_edit.ppt
mcdonald1.pdf
mcdonald2.pdf
nov04_reorgs_rebuild.ppt
nov04_succesful_development.ppt
...
...
...
2018
apex_connect
apex_connect_sql_plsql.pdf
apex_connect_sql_plsql.pptx
apex_connect_sql_plsql_v2.pptm
apex_connect_sql_plsql_v2.pptx
apex_connect_sql_plsql_v3.pptm
apex_connect_sql_plsql_v3_subset.pptm
extensions1.pptx
extensions2.pptx
code_china
better_sql_oracle_code.pptx
better_sql_oracle_code_v2.pdf
better_sql_oracle_code_v2.pptx
code_hyderabad
better_sql_oracle_code.pptx
hong_kong
hong_kong_122.pdf
hong_kong_122.pptx
hong_kong_122_no_multi.pptx
hong_kong_partitioning_ora_template.pptx
hong_kong_partitioning_ora_template2.pdf
hong_kong_partitioning_ora_template2.pptx
london
london.pdf
london.pptx
optimizer.pdf
optimizer.pptx
optimizer_full.pptx
odtug
common_disasters_short.pptx
common_disasters_short_comp.pptx
And there we have it. From flat file listing to a nice hierarchical layout just by tackling the problem piece by piece using the trusty WITH clause.
You can avoid the use of the lateral join by tweaking the connect by, plus the additional connect by can be avoided by using analytic functions instead, e.g.:
Whether that is faster than your query is debateable though! *{:-)
Following Juri’s comment (well spotted!!), here’s a corrected version of my query above:
Dear Connor,
many thanks for this example of using LATERAL.
When trying to rewrite your original query for use in Oracle 11g (without LATERAL) I discovered that it delivers wrong results if subfolders with the same name (like PROD and TEST in the test case below) appear in different places in the directory hierarchy.
So here the test case and corrected query:
create table t ( fname varchar2(1000));
insert into t (fname) values (‘C:\readme.txt’);
insert into t (fname) values (‘C:\PROD\prod_readme.txt’);
insert into t (fname) values (‘C:\TEST\test_readme.txt’);
insert into t (fname) values (‘C:\Data\generic_data_1.xls’);
insert into t (fname) values (‘C:\Data\generic_data_2.xls’);
insert into t (fname) values (‘C:\Data\PROD\prod_data_1.xls’);
insert into t (fname) values (‘C:\Data\PROD\prod_data_2.xls’);
insert into t (fname) values (‘C:\Data\TEST\test_data_1.xls’);
insert into t (fname) values (‘C:\Data\TEST\test_data_2.xls’);
insert into t (fname) values (‘C:\Docs\generic_doc_1.txt’);
insert into t (fname) values (‘C:\Docs\generic_doc_2.txt’);
insert into t (fname) values (‘C:\Docs\PROD\prod_doc_1.txt’);
insert into t (fname) values (‘C:\Docs\PROD\prod_doc_2.txt’);
insert into t (fname) values (‘C:\Docs\TEST\test_doc_1.txt’);
insert into t (fname) values (‘C:\Docs\TEST\test_doc_2.txt’);
insert into t (fname) values (‘C:\Programs\generic_prog_1.exe’);
insert into t (fname) values (‘C:\Programs\generic_prog_2.exe’);
insert into t (fname) values (‘C:\Programs\PROD\prod_prog_1.exe’);
insert into t (fname) values (‘C:\Programs\PROD\prod_prog_2.exe’);
insert into t (fname) values (‘C:\Programs\TEST\test_prog_1.exe’);
insert into t (fname) values (‘C:\Programs\TEST\test_prog_2.exe’);
commit;
with data as
(
select id, seq, token, parent_folder
from
( select rownum id, fname from t ) t_base,
lateral(select level seq, rtrim(regexp_substr(t_base.fname || ‘\’, ‘(.*?)\\’, 1, level),’\’) token
,case
when level > 1
then
substr(t_base.fname, 1, instr(t_base.fname, ‘\’, 1, level – 1))
end as parent_folder
from dual
connect by level <= 1+length(fname)-length(replace(fname,'\'))
)
),
hier as(
select
distinct
token,
lag(token) over ( partition by id order by seq ) as parent,
parent_folder
from data
)
select lpad(' ',level*2)||token hier_list
from hier
start with parent is null
connect by prior token = parent
and prior parent_folder||parent||'\' = parent_folder
order siblings by token;
Best regards,
Juri
Hello All,
Just for fun, here is another variant, using data from the post of Juri, above:
WITH data (fname, parent, child, lvl)
AS
(
SELECT fname,
REGEXP_SUBSTR(fname,'(.*?)\\([^\]*$)’,1,1,NULL,1) parent,
REGEXP_SUBSTR(fname,'(.*?)\\([^\]*$)’,1,1,NULL,2) child,
1
FROM t
UNION ALL
SELECT fname,
REGEXP_SUBSTR(parent,'(.*?)\\([^\]*$)’,1,1,NULL,1) parent,
REGEXP_SUBSTR(parent,'(.*?)\\([^\]*$)’,1,1,NULL,2) child,
lvl + 1
FROM data
WHERE INSTR(parent,’\’) > 0
),
hier
AS
(
SELECT DISTINCT parent, null child
FROM data
START WITH INSTR(parent,’\’) = 0
CONNECT BY parent = PRIOR parent || ‘\’ || PRIOR child
UNION ALL
SELECT parent, child
FROM data
WHERE lvl = 1
)
SELECT
CASE
WHEN child IS NULL THEN LPAD(‘.’, 2 * REGEXP_COUNT(parent,’\\’),’.’) || PARENT
ELSE LPAD(‘.’, 2 * (REGEXP_COUNT(parent,’\\’) + 1),’.’) || CHILD
END files_hierarchy
FROM HIER
ORDER BY parent, child NULLS FIRST
/
FILES_HIERARCHY
—————————-
C:
..readme.txt
..C:\Data
….generic_data_1.xls
….generic_data_2.xls
….C:\Data\PROD
……prod_data_1.xls
……prod_data_2.xls
….C:\Data\TEST
……test_data_1.xls
……test_data_2.xls
..C:\Docs
….generic_doc_1.txt
….generic_doc_2.txt
….C:\Docs\PROD
……prod_doc_1.txt
……prod_doc_2.txt
….C:\Docs\TEST
……test_doc_1.txt
……test_doc_2.txt
..C:\PROD
….prod_readme.txt
..C:\Programs
….generic_prog_1.exe
….generic_prog_2.exe
….C:\Programs\PROD
……prod_prog_1.exe
……prod_prog_2.exe
….C:\Programs\TEST
……test_prog_1.exe
……test_prog_2.exe
..C:\TEST
….test_readme.txt
33 rows selected.
Thanks a lot & Best Regards,
Iudith Mentzel