LISTAGG hits prime time

Posted by

It’s a simple requirement. We want to transform this:


SQL> select deptno, ename
  2  from   emp
  3  order by 1,2;

    DEPTNO ENAME
---------- ----------
        10 CLARK
        10 KING
        10 MILLER
        20 ADAMS
        20 FORD
        20 JONES
        20 SCOTT
        20 SMITH
        30 ALLEN
        30 BLAKE
        30 JAMES
        30 MARTIN
        30 TURNER
        30 WARD

into this:


    DEPTNO MEMBERS
---------- -------------------------------------
        10 CLARK,KING,MILLER
        20 SMITH,JONES,SCOTT,ADAMS,FORD
        30 ALLEN,WARD,MARTIN,BLAKE,TURNER,JAMES

Dinosaurs like myself remember the terrible trouble we used to go to in order to solve this seemingly simple problem. We could use the MODEL clause,


SQL> select deptno , rtrim(ename,',') enames
  2  from ( select deptno,ename,rn
  3         from emp
  4         model
  5         partition by (deptno)
  6         dimension by (
  7            row_number() over
  8             (partition by deptno order by ename) rn
  9               )
 10         measures (cast(ename as varchar2(40)) ename)
 11         rules
 12         ( ename[any]
 13             order by rn desc = ename[cv()]||','||ename[cv()+1])
 14         )
 15   where rn = 1
 16   order by deptno;

    DEPTNO ENAMES
---------- ----------------------------------------
        10 CLARK,KING,MILLER
        20 ADAMS,FORD,JONES,SCOTT,SMITH
        30 ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD

or we could use hierarchical trickery,


SQL> select deptno,
  2         substr(max(sys_connect_by_path(ename, ',')), 2) members
  3  from (select deptno, ename,
  4               row_number ()
  5                   over (partition by deptno order by empno) rn
  6        from emp)
  7  start with rn = 1
  8  connect by prior rn = rn - 1
  9  and prior deptno = deptno
 10  group by deptno
 11  /

    DEPTNO MEMBERS
---------- ---------------------------------------------------------
        30 ALLEN,WARD,MARTIN,BLAKE,TURNER,JAMES
        20 SMITH,JONES,SCOTT,ADAMS,FORD
        10 CLARK,KING,MILLER

or we could build our own aggregation routine from the ground up,


SQL> create or replace type string_agg_type as object
  2  (
  3     total varchar2(4000),
  4  
  5     static function
  6          ODCIAggregateInitialize(sctx IN OUT string_agg_type )
  7          return number,
  8  
  9     member function
 10          ODCIAggregateIterate(self IN OUT string_agg_type ,
 11                               value IN varchar2 )
 12          return number,
 13  
 14     member function
 15          ODCIAggregateTerminate(self IN string_agg_type,
 16                                 returnValue OUT  varchar2,
 17                                 flags IN number)
 18          return number,
 19  
 20     member function
 21          ODCIAggregateMerge(self IN OUT string_agg_type,
 22                             ctx2 IN string_agg_type)
 23          return number
 24  );
 25  /

Or we had some sort of personality disorder Smile then we could resort to manipulating some XML via XMLDB.


SQL> select deptno,
  2     xmltransform
  3     ( sys_xmlagg
  4        ( sys_xmlgen(ename)
  5        ),
  6       xmltype
  7       (
  8         '<?xml version="1.0"?><xsl:stylesheet version="1.0"
  9           xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 10            <xsl:template match="/">
 11              <xsl:for-each select="/ROWSET/ENAME">
 12                <xsl:value-of select="text()"/>,</xsl:for-each>
 13            </xsl:template>
 14          </xsl:stylesheet>'
 15       )
 16    ).getstringval() members
 17  from emp
 18  group by deptno;

    DEPTNO MEMBERS
---------- --------------------------------------------------------
        10 CLARK,MILLER,KING,
        20 SMITH,FORD,ADAMS,SCOTT,JONES,
        30 ALLEN,JAMES,TURNER,BLAKE,MARTIN,WARD,

Thankfully all of these were solved once we made it to 11g with the simple LISTAGG function


SQL> select deptno,
  2         listagg( ename, ',')
  3              within group (order by empno) members
  4  from   emp
  5  group  by deptno;

    DEPTNO MEMBERS
---------- -----------------------------------------
        10 CLARK,KING,MILLER
        20 SMITH,JONES,SCOTT,ADAMS,FORD
        30 ALLEN,WARD,MARTIN,BLAKE,TURNER,JAMES

I’ve always liked LISTAGG because the function reads like the business requirement we are trying to meet:

“AGG-regate a LIST of ENAMEs in sequence of EMPNO within the grouping of DEPTNO”

But there has been once problem with LISTAGG since it’s arrival in 11g, and that is what to do with duplicate data. Duplicates can either just look messy, for example, when I swap out ENAME for JOB:


SQL> select deptno
 2          listagg(job,',') within group ( order by job) as jobs
 3   from   scott.emp
 4   group by deptno
 5   order by 1;
 
   DEPTNO JOBS
--------- --------------------------------------------------
       10 CLERK,MANAGER,PRESIDENT
       20 ANALYST,ANALYST,CLERK,CLERK,MANAGER
       30 CLERK,MANAGER,SALESMAN,SALESMAN,SALESMAN,SALESMAN

Or they could be even worse in the situation where the number of duplicates results in data exceeding allowing database limits. In the example below, there are hundreds of objects each with the same object type within a schema, and hence the aggregation blows the length limit for a varchar2.


SQL> select owner
 2          listagg(object_type,',') within group 
 3              ( order by object_id ) as types
 4   from   all_ojects
 5   group by owner
 6   order by 1;
ERROR:
ORA-01499: result of string concatenation is too long

With 19c, our (technical) prayers have been answered with the long awaited arrival of the DISTINCT extension to the syntax.


SQL> select deptno,
  2        listagg(distinct job,',') within group ( order by job ) as jobs
  3  from   scott.emp
  4  group by deptno
  5  order by 1;

    DEPTNO JOBS
---------- ------------------------------------------------------------
        10 CLERK,MANAGER,PRESIDENT
        20 ANALYST,CLERK,MANAGER
        30 CLERK,MANAGER,SALESMAN

3 rows selected.

SQL> select owner,
  2        listagg(distinct object_type,',') within group ( order by object_type ) as types
  3  from   all_objects
  4  group by owner
  5  order by 1;

OWNER
------------------------------
TYPES
------------------------------------------------------------------------------------------------------------
------------
APPQOSSYS
SYNONYM,TABLE

AUDSYS
INDEX PARTITION,LIBRARY,PACKAGE,PACKAGE BODY,TABLE,TABLE PARTITION,VIEW

CTXSYS
FUNCTION,INDEX,INDEXTYPE,LIBRARY,OPERATOR,PACKAGE,PACKAGE BODY,PROCEDURE,SEQUENCE,TABLE,TYPE,TYPE BODY,VIEW

etc

13 comments

  1. Hello Connor,

    Here are 3 pre-19c solutions for “LISTAGG DISTINCT”:

    select deptno, listagg(job, ‘,’) within group (order by job) jobs
    from (
    select deptno,
    case row_number() over (partition by deptno, job order by 1)
    when 1 then job
    end job
    from scott.emp
    )
    group by deptno
    order by deptno
    /

    DEPTNO JOBS
    ———————————
    10 CLERK,MANAGER,PRESIDENT
    20 ANALYST,CLERK,MANAGER
    30 CLERK,MANAGER,SALESMAN

    3 rows selected.

    select deptno,
    ( select listagg(column_value,’,’) within group (order by column_value)
    from table(v.t_jobs)
    ) jobs
    from (
    select deptno,
    collect(distinct job) t_jobs
    from scott.emp
    group by deptno
    ) v
    order by deptno
    /

    DEPTNO JOBS
    ———————————
    10 CLERK,MANAGER,PRESIDENT
    20 ANALYST,CLERK,MANAGER
    30 CLERK,MANAGER,SALESMAN

    3 rows selected.

    select deptno,
    regexp_replace(listagg(job,’,’) within group (order by job), ‘([^,]+)(,\1)+’, ‘\1’) jobs
    from scott.emp
    group by deptno
    order by deptno
    /

    DEPTNO JOBS
    ———————————
    10 CLERK,MANAGER,PRESIDENT
    20 ANALYST,CLERK,MANAGER
    30 CLERK,MANAGER,SALESMAN

    3 rows selected.

    At least for self-brain-development, it sometimes help to be a “slower-upgrader” 🙂

    Cheers & Best Regards,
    Iudith Mentzel

    1. Hello,
      Just as a curiosity, it also works without a WITHIN GROUP clause at all,
      and the results appear ordered by the DISTINCT column:

      select d.dname,
      listagg (DISTINCT e.job,’, ‘ on overflow truncate with count) jobs
      from scott.dept d, scott.emp e
      where d.deptno = e.deptno
      group by d.dname
      /

      DNAME JOBS
      ————————————————-
      ACCOUNTING CLERK, MANAGER, PRESIDENT
      RESEARCH ANALYST, CLERK, MANAGER
      SALES CLERK, MANAGER, SALESMAN

      3 rows selected.

      1. I don’t think that is related to DISTINCT. I think in 18 we just allowed an optional WITHIN GROUP which effectively means you don’t care about the ordering of elements, eg

        SQL> select banner from v$version where rownum = 1;

        BANNER
        ——————————————————————————
        Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 – 64bit Production

        SQL> select d.dname,
        2 listagg(e.job,’,’) jobs
        3 from scott.dept d, scott.emp e
        4 where d.deptno = e.deptno
        5 group by d.dname
        6 /
        listagg (e.job,’,’) jobs
        *
        ERROR at line 2:
        ORA-02000: missing WITHIN keyword

        SQL> select banner from v$version where rownum = 1;

        BANNER
        ———————————————————————-
        Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 – Production

        SQL> select d.dname,
        2 listagg (e.job,’,’) jobs
        3 from scott.dept d, scott.emp e
        4 where d.deptno = e.deptno
        5 group by d.dname
        6 /

        DNAME JOBS
        ————– ——————————————————-
        RESEARCH MANAGER,ANALYST,CLERK,CLERK,ANALYST
        SALES SALESMAN,SALESMAN,SALESMAN,CLERK,MANAGER,SALESMAN
        ACCOUNTING MANAGER,PRESIDENT,CLERK

        1. Hi,

          Seems to be unsorted with or without DISTINCT (with none or mismatched ORDER BY) which is Ok. Probably the same algorithm as GROUP BY which stopped sorting decades ago 😉
          Would need a lot of meticulous experimentation to see if Oracle simply ignores mismatched ORDER BYs or tries to honor them as much as possible.

          regards,

          1. Hello,
            I don’t think that having a DISTINCT with a mismatched ORDER BY does make too much sense logically speaking.
            For example, in a SELECT DISTINCT you cannot order by something that is NOT part
            of the SELECT list.
            So, even it technically works and even it Oracle is picking up the distinct values in the mismatched ORDER that you specify, that ORDER becomes irrelevant for the final output.
            Cheers & Best Regards,
            Iudith

Got some thoughts? Leave a comment

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.