At CloudWorld, after one of my sessions an attendee came up to me and said:
“Wouldn’t it be cool if we could modularise frequent SQL commands”.
I initially thought he meant database views, but he clarified his statement further:
“For example, I regularly load data from an external source and I need to check for, and remove, duplicate data based on the intended primary key. Every time I have a different table and key columns, I need to remember the syntax of the GROUP BY or the ROW_NUMBER function etc.”
So this was not really about modular SQL but modularising the solution of common problems that we often tackle with SQL.
Enter SQL macros. The ability to pass in table and columns dynamically without any fear of SQL injection make macros the perfect tool for a build-once-reuse-many approach to handling complex SQL challenges.
I took the attendees request for a modular duplicate checker, and with a simple SQL macro we can provide a powerful yet re-usable solution.
SQL> create or replace
2 function remove_dups(
3 p_table in dbms_tf.table_t,
4 p_cols in dbms_tf.columns_t,
5 p_mode in number default 1)
6 return varchar2 sql_macro(table) as
7 l_cols varchar2(1000);
8 begin
9 for i in 1 .. p_cols.count loop
10 l_cols := l_cols || ',t.' || p_cols(i);
11 end loop;
12 l_cols := ltrim(l_cols,',');
13
14 return '
15 select '||case when p_mode = 2 then 'rid' else '*' end||
16 ' from (
17 select t.*, t.rowid rid,
18 row_number() over ( partition by '||l_cols||' order by rowid ) as rn,
19 count(*) over ( partition by '||l_cols||') as dup_cnt
20 from p_table t
21 )
22 where '||
23 case
24 when p_mode = 1 then ' dup_cnt > 1'
25 when p_mode = 2 then ' dup_cnt > 1 and rn > 1'
26 when p_mode = 3 then ' dup_cnt > 1 and rn = 1'
27 end;
28 end;
29 /
Function created.
To demonstrate it, we’ll create a copy of the EMP table and then add a couple of duplicate rows.
SQL> create table emp as select * from scott.emp;
Table created.
SQL> insert into emp
2 select * from scott.emp where rownum <= 2;
2 rows created.
Default usage is to report all rows that are duplicates.
SQL> select * from remove_dups(emp,columns(empno));
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO RID RN DUP_CNT
---------- ---------- --------- ---------- --------- ---------- ---------- ---------- ------------------ ---------- ----------
7369 SMITH CLERK 7902 17-DEC-80 800 20 AAAjAzAAMAAAAFrAAA 1 2
7369 SMITH CLERK 7902 17-DEC-80 800 20 AAAjAzAAMAAAAFvAAA 2 2
7499 ALLEN SALESMAN 7698 20-FEB-81 1600 300 30 AAAjAzAAMAAAAFrAAB 1 2
7499 ALLEN SALESMAN 7698 20-FEB-81 1600 300 30 AAAjAzAAMAAAAFvAAB 2 2
If you were then looking to remove the duplicate entries, mode 2 of the macro returns just the ROWIDs of the duplicates that you would pass into a a DELETE WHERE ROWID in ( ) statement.
SQL> select * from remove_dups(emp,columns(empno),p_mode=>2);
RID
------------------
AAAjAzAAMAAAAFvAAA
AAAjAzAAMAAAAFvAAB
And finally, mode 3 returns only the first occurrence of any row that is a duplicate
SQL> select * from remove_dups(emp,columns(empno),p_mode=>3);
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO RID RN DUP_CNT
---------- ---------- --------- ---------- --------- ---------- ---------- ---------- ------------------ ---------- ----------
7369 SMITH CLERK 7902 17-DEC-80 800 20 AAAjAzAAMAAAAFrAAA 1 2
7499 ALLEN SALESMAN 7698 20-FEB-81 1600 300 30 AAAjAzAAMAAAAFrAAB 1 2
Obviously the only limit here is your imagination. You could have a mode that returns all rows from the table but only the first occurrence of a duplicate, or you could have additional parameters to decide which columns to return etc. The possibilities are endless.
I feel that we haven’t even scratched the surface of the plethora of use cases we will ultimately find for SQL macros.




Leave a reply to jkstill Cancel reply