LISTAGG hits prime time

February 5, 2019

2–4 minutes

It’s a simple requirement. We want to transform this:


SQL> select deptno, ename
  2  from   emp
  3  order by 1,2;

    DEPTNO ENAME
---------- ----------
        10 CLARK
        10 KING
        10 MILLER
        20 ADAMS
        20 FORD
        20 JONES
        20 SCOTT
        20 SMITH
        30 ALLEN
        30 BLAKE
        30 JAMES
        30 MARTIN
        30 TURNER
        30 WARD

into this:


    DEPTNO MEMBERS
---------- -------------------------------------
        10 CLARK,KING,MILLER
        20 SMITH,JONES,SCOTT,ADAMS,FORD
        30 ALLEN,WARD,MARTIN,BLAKE,TURNER,JAMES

Dinosaurs like myself remember the terrible trouble we used to go to in order to solve this seemingly simple problem. We could use the MODEL clause,


SQL> select deptno , rtrim(ename,',') enames
  2  from ( select deptno,ename,rn
  3         from emp
  4         model
  5         partition by (deptno)
  6         dimension by (
  7            row_number() over
  8             (partition by deptno order by ename) rn
  9               )
 10         measures (cast(ename as varchar2(40)) ename)
 11         rules
 12         ( ename[any]
 13             order by rn desc = ename[cv()]||','||ename[cv()+1])
 14         )
 15   where rn = 1
 16   order by deptno;

    DEPTNO ENAMES
---------- ----------------------------------------
        10 CLARK,KING,MILLER
        20 ADAMS,FORD,JONES,SCOTT,SMITH
        30 ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD

or we could use hierarchical trickery,


SQL> select deptno,
  2         substr(max(sys_connect_by_path(ename, ',')), 2) members
  3  from (select deptno, ename,
  4               row_number ()
  5                   over (partition by deptno order by empno) rn
  6        from emp)
  7  start with rn = 1
  8  connect by prior rn = rn - 1
  9  and prior deptno = deptno
 10  group by deptno
 11  /

    DEPTNO MEMBERS
---------- ---------------------------------------------------------
        30 ALLEN,WARD,MARTIN,BLAKE,TURNER,JAMES
        20 SMITH,JONES,SCOTT,ADAMS,FORD
        10 CLARK,KING,MILLER

or we could build our own aggregation routine from the ground up,


SQL> create or replace type string_agg_type as object
  2  (
  3     total varchar2(4000),
  4  
  5     static function
  6          ODCIAggregateInitialize(sctx IN OUT string_agg_type )
  7          return number,
  8  
  9     member function
 10          ODCIAggregateIterate(self IN OUT string_agg_type ,
 11                               value IN varchar2 )
 12          return number,
 13  
 14     member function
 15          ODCIAggregateTerminate(self IN string_agg_type,
 16                                 returnValue OUT  varchar2,
 17                                 flags IN number)
 18          return number,
 19  
 20     member function
 21          ODCIAggregateMerge(self IN OUT string_agg_type,
 22                             ctx2 IN string_agg_type)
 23          return number
 24  );
 25  /

Or we had some sort of personality disorder Smile then we could resort to manipulating some XML via XMLDB.


SQL> select deptno,
  2     xmltransform
  3     ( sys_xmlagg
  4        ( sys_xmlgen(ename)
  5        ),
  6       xmltype
  7       (
  8         '<?xml version="1.0"?><xsl:stylesheet version="1.0"
  9           xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 10            <xsl:template match="/">
 11              <xsl:for-each select="/ROWSET/ENAME">
 12                <xsl:value-of select="text()"/>,</xsl:for-each>
 13            </xsl:template>
 14          </xsl:stylesheet>'
 15       )
 16    ).getstringval() members
 17  from emp
 18  group by deptno;

    DEPTNO MEMBERS
---------- --------------------------------------------------------
        10 CLARK,MILLER,KING,
        20 SMITH,FORD,ADAMS,SCOTT,JONES,
        30 ALLEN,JAMES,TURNER,BLAKE,MARTIN,WARD,

Thankfully all of these were solved once we made it to 11g with the simple LISTAGG function


SQL> select deptno,
  2         listagg( ename, ',')
  3              within group (order by empno) members
  4  from   emp
  5  group  by deptno;

    DEPTNO MEMBERS
---------- -----------------------------------------
        10 CLARK,KING,MILLER
        20 SMITH,JONES,SCOTT,ADAMS,FORD
        30 ALLEN,WARD,MARTIN,BLAKE,TURNER,JAMES

I’ve always liked LISTAGG because the function reads like the business requirement we are trying to meet:

“AGG-regate a LIST of ENAMEs in sequence of EMPNO within the grouping of DEPTNO”

But there has been once problem with LISTAGG since it’s arrival in 11g, and that is what to do with duplicate data. Duplicates can either just look messy, for example, when I swap out ENAME for JOB:


SQL> select deptno
 2          listagg(job,',') within group ( order by job) as jobs
 3   from   scott.emp
 4   group by deptno
 5   order by 1;
 
   DEPTNO JOBS
--------- --------------------------------------------------
       10 CLERK,MANAGER,PRESIDENT
       20 ANALYST,ANALYST,CLERK,CLERK,MANAGER
       30 CLERK,MANAGER,SALESMAN,SALESMAN,SALESMAN,SALESMAN

Or they could be even worse in the situation where the number of duplicates results in data exceeding allowing database limits. In the example below, there are hundreds of objects each with the same object type within a schema, and hence the aggregation blows the length limit for a varchar2.


SQL> select owner
 2          listagg(object_type,',') within group 
 3              ( order by object_id ) as types
 4   from   all_ojects
 5   group by owner
 6   order by 1;
ERROR:
ORA-01499: result of string concatenation is too long

With 19c, our (technical) prayers have been answered with the long awaited arrival of the DISTINCT extension to the syntax.


SQL> select deptno,
  2        listagg(distinct job,',') within group ( order by job ) as jobs
  3  from   scott.emp
  4  group by deptno
  5  order by 1;

    DEPTNO JOBS
---------- ------------------------------------------------------------
        10 CLERK,MANAGER,PRESIDENT
        20 ANALYST,CLERK,MANAGER
        30 CLERK,MANAGER,SALESMAN

3 rows selected.

SQL> select owner,
  2        listagg(distinct object_type,',') within group ( order by object_type ) as types
  3  from   all_objects
  4  group by owner
  5  order by 1;

OWNER
------------------------------
TYPES
------------------------------------------------------------------------------------------------------------
------------
APPQOSSYS
SYNONYM,TABLE

AUDSYS
INDEX PARTITION,LIBRARY,PACKAGE,PACKAGE BODY,TABLE,TABLE PARTITION,VIEW

CTXSYS
FUNCTION,INDEX,INDEXTYPE,LIBRARY,OPERATOR,PACKAGE,PACKAGE BODY,PROCEDURE,SEQUENCE,TABLE,TYPE,TYPE BODY,VIEW

etc

13 responses to “LISTAGG hits prime time”

Rumpi Gravenstein

February 5, 2019 at 8:52 pm

Now only if LISTAGG could be extended to support generating CLOBs…

Loading…

Reply
Mentzel Iudith

February 6, 2019 at 3:15 am

Hello Connor,

Here are 3 pre-19c solutions for “LISTAGG DISTINCT”:

select deptno, listagg(job, ‘,’) within group (order by job) jobs
from (
select deptno,
case row_number() over (partition by deptno, job order by 1)
when 1 then job
end job
from scott.emp
)
group by deptno
order by deptno
/

DEPTNO JOBS
———————————
10 CLERK,MANAGER,PRESIDENT
20 ANALYST,CLERK,MANAGER
30 CLERK,MANAGER,SALESMAN

3 rows selected.

select deptno,
( select listagg(column_value,’,’) within group (order by column_value)
from table(v.t_jobs)
) jobs
from (
select deptno,
collect(distinct job) t_jobs
from scott.emp
group by deptno
) v
order by deptno
/

DEPTNO JOBS
———————————
10 CLERK,MANAGER,PRESIDENT
20 ANALYST,CLERK,MANAGER
30 CLERK,MANAGER,SALESMAN

3 rows selected.

select deptno,
regexp_replace(listagg(job,’,’) within group (order by job), ‘([^,]+)(,\1)+’, ‘\1’) jobs
from scott.emp
group by deptno
order by deptno
/

DEPTNO JOBS
———————————
10 CLERK,MANAGER,PRESIDENT
20 ANALYST,CLERK,MANAGER
30 CLERK,MANAGER,SALESMAN

3 rows selected.

At least for self-brain-development, it sometimes help to be a “slower-upgrader” 🙂

Cheers & Best Regards,
Iudith Mentzel

Loading…

Reply
Racer I.

February 6, 2019 at 3:51 pm

Hi,

https://community.oracle.com/ideas/12533

As the last entries suggest there are still some rough spots :

I.e. handling (possibly with an exception) the case where the DISTINCT column(s) aren’t the leading ORDER BY ones.

regards,

Loading…

Reply
Racer I.

February 6, 2019 at 4:35 pm

Hi,

About my previous comment (although I can’t see it yet?) :

I just checked it and LISTAGG-DISTINCT works Ok with non-matching ORDER BY :

https://livesql.oracle.com/apex/livesql/s/hxn75ii2xjqfqhn0c1wxpu2zx

regards,

Loading…

Reply
1. Connor McDonald
  
  February 6, 2019 at 8:30 pm
  
  The “can’t see it yet” is because WordPress seems to randomly pick comments and says they need moderation from me. I’ve not yet worked out the rhyme or reason why
  
  Loading…
  
  Reply
  1. Racer I.
    
    February 6, 2019 at 10:20 pm
    
    Hi,
    
    Wild guess : could be web-links in the body?
    
    regards,
    
    Loading…
    
    Reply
2. Mentzel Iudith
  
  February 7, 2019 at 1:04 am
  
  Hello,
  Just as a curiosity, it also works without a WITHIN GROUP clause at all,
  and the results appear ordered by the DISTINCT column:
  
  select d.dname,
  listagg (DISTINCT e.job,’, ‘ on overflow truncate with count) jobs
  from scott.dept d, scott.emp e
  where d.deptno = e.deptno
  group by d.dname
  /
  
  DNAME JOBS
  ————————————————-
  ACCOUNTING CLERK, MANAGER, PRESIDENT
  RESEARCH ANALYST, CLERK, MANAGER
  SALES CLERK, MANAGER, SALESMAN
  
  3 rows selected.
  
  Loading…
  
  Reply
  1. Connor McDonald
    
    February 8, 2019 at 9:39 am
    
    I don’t think that is related to DISTINCT. I think in 18 we just allowed an optional WITHIN GROUP which effectively means you don’t care about the ordering of elements, eg
    
    SQL> select banner from v$version where rownum = 1;
    
    BANNER
    ——————————————————————————
    Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 – 64bit Production
    
    SQL> select d.dname,
    2 listagg(e.job,’,’) jobs
    3 from scott.dept d, scott.emp e
    4 where d.deptno = e.deptno
    5 group by d.dname
    6 /
    listagg (e.job,’,’) jobs
    *
    ERROR at line 2:
    ORA-02000: missing WITHIN keyword
    
    SQL> select banner from v$version where rownum = 1;
    
    BANNER
    ———————————————————————-
    Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 – Production
    
    SQL> select d.dname,
    2 listagg (e.job,’,’) jobs
    3 from scott.dept d, scott.emp e
    4 where d.deptno = e.deptno
    5 group by d.dname
    6 /
    
    DNAME JOBS
    ————– ——————————————————-
    RESEARCH MANAGER,ANALYST,CLERK,CLERK,ANALYST
    SALES SALESMAN,SALESMAN,SALESMAN,CLERK,MANAGER,SALESMAN
    ACCOUNTING MANAGER,PRESIDENT,CLERK
    
    Loading…
    
    Reply
    1. Racer I.
      
      February 8, 2019 at 3:14 pm
      
      Hi,
      
      Seems to be unsorted with or without DISTINCT (with none or mismatched ORDER BY) which is Ok. Probably the same algorithm as GROUP BY which stopped sorting decades ago 😉
      Would need a lot of meticulous experimentation to see if Oracle simply ignores mismatched ORDER BYs or tries to honor them as much as possible.
      
      regards,
      
      Loading…
      
      Reply
      1. Mentzel Iudith
        
        February 9, 2019 at 3:39 am
        
        Hello,
        I don’t think that having a DISTINCT with a mismatched ORDER BY does make too much sense logically speaking.
        For example, in a SELECT DISTINCT you cannot order by something that is NOT part
        of the SELECT list.
        So, even it technically works and even it Oracle is picking up the distinct values in the mismatched ORDER that you specify, that ORDER becomes irrelevant for the final output.
        Cheers & Best Regards,
        Iudith
        
        Loading…
nsk475gmailcom

February 6, 2019 at 5:59 pm

Having Distinct is really a cool feature. But can we get rid of this error in 19c?

ERROR:
ORA-01499: result of string concatenation is too long

Loading…

Reply
1. Connor McDonald
  
  February 6, 2019 at 8:31 pm
  
  I’ve never really understood the requirement. Why do I want >32k of data on a single “line”. Seems odd to me
  
  Loading…
  
  Reply
More 19c New Features – amitzil – Oracle DBA blog

February 27, 2019 at 1:36 am

[…] Listagg was introduced in 11.2, but now (finally) Oracle added the distinct option as well. You can read more about it in Connor’s post. […]

Loading…

Reply