The missing multiplication aggregation in SQL

Posted by

A few days back on Twitter, a cool little discussion arose out of the SQL Daily regular tip – the lack of an aggregate function that will return a aggregated product of a set of numbers.

blog_sql_product

Taking a look at the Lukas Eder’s post, it has a nice build up of how one can exploit the sum of logarithms to yield an implementation of a product aggregate. Things start simply enough, but once we need to handles things like zero and negatives, a little more care is needed to arrive at the final solution of:


select
  GROUPING_COL,
  case
    when sum (case when PRODUCT_COL = 0 then 1 end) > 0 then 0
    when mod(sum (case when PRODUCT_COL < 0 then -1 end),2) < 0  then -1 
    else 1 
  end * exp(sum(ln(abs(nullif(PRODUCT_COL, 0))))) product
from t
group by GROUPING_COL
order by 1;

Over on StackOverflow, someone tackled this issue in the different and dare I say Smile more elegant looking solution using recursive subquery factoring.


create or replace
view v as
select 
  GROUPING_COL, 
  PRODUCT_COL, 
  row_number() over (partition by GROUPING_COL order by rowid)  n, 
  count(1) over (partition by GROUPING_COL) cnt 
from MY_TABLE;

with 
 p(GROUPING_COL,PRODUCT_COL,n,maxn) as (
   select GROUPING_COL, PRODUCT_COL, n, cnt
   from v 
   where n = 1
   union all
   select p.GROUPING_COL, v.PRODUCT_COL * p.PRODUCT_COL, v.n, v.cnt
   from p
   join v on p.GROUPING_COL = v.GROUPING_COL and p.n + 1 = v.n
 )
select GROUPING_COL, PRODUCT_COL
from p
where n = maxn
order by GROUPING_COL;

And finally Oracle community stalwart Iudith Mentzel took advantage of the natural expression facilities in XMLTABLE to provide yet another solution.


select GROUPING_COL,
      xmlquery( 
        (listagg(PRODUCT_COL,'*')  within group (order by rownum)) 
          returning content).getnumberval()  prod
from t
group by GROUPING_COL

The lack of a native product aggregation is just a special case of any number of potential aggregation requirements that do not fall in the standard facilities offered by the common SQL dialects. Luckily, in such cases, the Oracle database lets you hook into the core level aggregation functionality in the kernel to build your own custom aggregations. This facility to build user defined aggregates has been available since way back in Oracle 8i.

Here’s how we can implement a product of numbers aggregation using user defined aggregates:


SQL> create or replace
  2  type t_product as object
  3  (
  4    g_product  number,
  5
  6    static
  7    function odciaggregateinitialize(
  8                  pctx in out t_product) return number,
  9
 10    member
 11    function odciaggregateiterate(
 12                  self  in out t_product,
 13                  value in     number) return number,
 14
 15    member
 16    function odciaggregateterminate(
 17                   self        in  t_product,
 18                   returnvalue out number,
 19                   flags       in  number) return number,
 20
 21    member
 22    function odciaggregatemerge(
 23                   self in out t_product,
 24                   mctx in     t_product) return number
 25  );
 26  /

Type created.

You can see that the type definition hooks into predefined names that map to the operations of an aggregate, namely, we initialize a structure, iterate through the data, merge the results and terminate the aggregation.

Then the body is defined and whilst it looks lengthy, the key element really is only line 17 where we multiply the current running product aggregate with the next value.


SQL> create or replace
  2  type body t_product is
  3
  4    static
  5    function odciaggregateinitialize(
  6                  pctx in out t_product) return number is
  7    begin
  8      pctx := t_product(1);
  9      return odciconst.success;
 10    end;
 11
 12    member
 13    function odciaggregateiterate(
 14                  self  in out t_product,
 15                  value in     number) return number is
 16    begin
 17      self.g_product := self.g_product * value;
 18      return odciconst.success;
 19    end;
 20
 21    member
 22    function odciaggregateterminate(
 23                   self        in  t_product,
 24                   returnvalue out number,
 25                   flags       in  number) return number is
 26    begin
 27      returnvalue := g_product;
 28      return odciconst.success;
 29    end;
 30
 31    member
 32    function odciaggregatemerge(
 33                   self in out t_product,
 34                   mctx in     t_product) return number is
 35    begin
 36      self.g_product := self.g_product * mctx.g_product;
 37      return odciconst.success;
 38    end;
 39  end;
 40  /

Type body created.


SQL> CREATE OR REPLACE FUNCTION prd(input number)
  2  RETURN number
  3  PARALLEL_ENABLE AGGREGATE USING t_product;
  4  /

Function created.

Lets take a look now at how each of these solutions perform. Obviously we need to take care what numbers we choose to aggregate, because it doesn’t take long for multiplication of numbers sitting outside the range of -1 to 1 to explode the limitations of the NUMBER datatype.

I’ll build a table of random numbers from DBA_OBJECTS with a mixture of positive and negative numbers that are relatively near zero to keep the product total in check.


SQL> create table t as
  2  select
  3    owner,
  4    decode(mod(rownum,10),
  5        0,dbms_random.value(1,1.5),
  6        1,-dbms_random.value(1,1.5),
  7        dbms_random.value(0.8,1.1)
  8    ) x
  9  from dba_objects;

First I’ll try the XMLTABLE code, which struggles right from the get-go because we are limited to the length of the string expression it must build to evaluate the product.


SQL> select owner,
  2        xmlquery(
  3          (listagg(x,'*')  within group (order by rownum))
  4            returning content).getnumberval()  prod
  5  from t
  6  group by owner
  7  /
ERROR:
ORA-01489: result of string concatenation is too long

Now I’ll move on to the recursive WITH statement. This time I get a result, but diving that deep into the recursive levels looks to be taking a toll on the performance – 30 seconds to get the answer.


SQL> create or replace
  2  view v as
  3  select
  4    owner,
  5    x,
  6    row_number() over (partition by owner order by rowid)  n,
  7    count(1) over (partition by owner) cnt
  8  from t;

View created.


SQL> with
  2   p(owner,x,n,maxn) as (
  3     select owner, x, n, cnt
  4     from v
  5     where n = 1
  6     union all
  7     select p.owner, v.x * p.x, v.n, v.cnt
  8     from p
  9     join v on p.owner = v.owner and p.n + 1 = v.n
 10   )
 11  select owner, x
 12  from p
 13  where n = maxn
 14  order by owner;

OWNER                                   X
------------------------------ ----------
ADTEST                         .916063976
APEX_190200                    -3.043E-06
APEX_200100                    -29592.895
APEX_200200                    -10.421379
APEX_LISTENER                  .867807442
APEX_MON                       -1.4860521
APEX_UTILS                     -1.6820526
APPQOSSYS                      -.84655904
ASKTOM                         .017037321
AUDSYS                         -.16824332
AUD_UTIL                       -1.5397086
AV_USER                        -1.3799817
CTXSYS                         .764967413
DBSFWUSER                      -.88050946
DBSNMP                         .589705819
DEMO                           .663070081
DVF                            .611727767
DVSYS                          .112586605
ENDUSER                        .957332413
FLOWS_FILES                    2.04009629
GSMADMIN_INTERNAL              -.76952263
HR                             -.69391381
LBACSYS                        .261528508
MCDONAC                        .496889981
MDSYS                          273.214744
OJVMSYS                        -2.0804545
OLAPSYS                        -1.9529601
ORACLE_OCM                     .680691584
ORDDATA                        -2.1255669
ORDPLUGINS                     -.92479865
ORDSYS                         -.05582792
ORDS_METADATA                  -2.8566871
OUTLN                          -1.4668778
PUBLIC                         -.00031593
REMOTE_SCHEDULER_AGENT         -.70447161
SCOTT                          .052298567
SI_INFORMTN_SCHEMA             -.84827267
SODAUSER                        .79189465
SOE                            .602934009
SYS                            2.4633E-24
SYSTEM                         .053874721
WHS_MD                         -1.0003258
WMSYS                          5.32279549
XDB                            2.13395549

44 rows selected.

Elapsed: 00:00:30.13

Next up is the sum of logarithms. It is much more efficient completing in less than a second, and is probably fine for many use cases.


SQL> select
  2    owner,
  3    case
  4      when sum (case when x = 0 then 1 end) > 0 then 0
  5      when mod(sum (case when x < 0 then -1 end),2) < 0  then -1
  6      else 1
  7    end * exp(sum(ln(abs(nullif(x, 0))))) product
  8  from t
  9  group by owner
 10  order by 1;

OWNER                             PRODUCT
------------------------------ ----------
ADTEST                         .916063976
APEX_190200                    -3.043E-06
APEX_200100                    -29592.895
APEX_200200                    -10.421379
APEX_LISTENER                  .867807442
APEX_MON                       -1.4860521
APEX_UTILS                     -1.6820526
APPQOSSYS                      -.84655904
ASKTOM                         .017037321
AUDSYS                         -.16824332
AUD_UTIL                       -1.5397086
AV_USER                        -1.3799817
CTXSYS                         .764967413
DBSFWUSER                      -.88050946
DBSNMP                         .589705819
DEMO                           .663070081
DVF                            .611727767
DVSYS                          .112586605
ENDUSER                        .957332413
FLOWS_FILES                    2.04009629
GSMADMIN_INTERNAL              -.76952263
HR                             -.69391381
LBACSYS                        .261528508
MCDONAC                        .496889981
MDSYS                          273.214744
OJVMSYS                        -2.0804545
OLAPSYS                        -1.9529601
ORACLE_OCM                     .680691584
ORDDATA                        -2.1255669
ORDPLUGINS                     -.92479865
ORDSYS                         -.05582792
ORDS_METADATA                  -2.8566871
OUTLN                          -1.4668778
PUBLIC                         -.00031593
REMOTE_SCHEDULER_AGENT         -.70447161
SCOTT                          .052298567
SI_INFORMTN_SCHEMA             -.84827267
SODAUSER                        .79189465
SOE                            .602934009
SYS                            2.4633E-24
SYSTEM                         .053874721
WHS_MD                         -1.0003258
WMSYS                          5.32279549
XDB                            2.13395549

44 rows selected.

Elapsed: 00:00:00.71

Finally I’ll compare that against the user defined aggregate. A little more code was required, but that is paid back in terms of performance benefit.


SQL> select owner, prd(x)
  2  from   t
  3  group  by owner
  4  order by 1
  5  /

OWNER                              PRD(X)
------------------------------ ----------
ADTEST                         .916063976
APEX_190200                    -3.043E-06
APEX_200100                    -29592.895
APEX_200200                    -10.421379
APEX_LISTENER                  .867807442
APEX_MON                       -1.4860521
APEX_UTILS                     -1.6820526
APPQOSSYS                      -.84655904
ASKTOM                         .017037321
AUDSYS                         -.16824332
AUD_UTIL                       -1.5397086
AV_USER                        -1.3799817
CTXSYS                         .764967413
DBSFWUSER                      -.88050946
DBSNMP                         .589705819
DEMO                           .663070081
DVF                            .611727767
DVSYS                          .112586605
ENDUSER                        .957332413
FLOWS_FILES                    2.04009629
GSMADMIN_INTERNAL              -.76952263
HR                             -.69391381
LBACSYS                        .261528508
MCDONAC                        .496889981
MDSYS                          273.214744
OJVMSYS                        -2.0804545
OLAPSYS                        -1.9529601
ORACLE_OCM                     .680691584
ORDDATA                        -2.1255669
ORDPLUGINS                     -.92479865
ORDSYS                         -.05582792
ORDS_METADATA                  -2.8566871
OUTLN                          -1.4668778
PUBLIC                         -.00031593
REMOTE_SCHEDULER_AGENT         -.70447161
SCOTT                          .052298567
SI_INFORMTN_SCHEMA             -.84827267
SODAUSER                        .79189465
SOE                            .602934009
SYS                            2.4633E-24
SYSTEM                         .053874721
WHS_MD                         -1.0003258
WMSYS                          5.32279549
XDB                            2.13395549

44 rows selected.

Elapsed: 00:00:00.11

SQL is a very mature and functionally complete language, so in my 25 years or so of working with databases, I’ve found the demand for customised aggregation facilities very low. But it is comforting to know that if the need does arise, its relatively straightforward to build your own in the Oracle database.

If you’re looking for more examples, here’s a couple more for you:

https://connor-mcdonald.com/2017/09/23/asktom-tv-episode-11/

https://asktom.oracle.com/pls/apex/asktom.search?tag=user-defined-aggregate-with-more-than-one-value-parameter

4 comments

  1. Hi Connor, solutions are endless within SQL. Which is great! You knew someone would drop a model clause in the comments:

    –model clause
    select owner, x
    from ( select owner
    , x
    , seqnr
    from ( select owner
    , x
    , row_number() over (partition by owner order by 1) seqnr
    from t
    )
    model
    partition by (owner)
    dimension by (seqnr)
    measures (x x)
    rules
    ( x[seqnr] order by seqnr desc =nvl(x[cv(seqnr)+1],1)*x[cv(seqnr)]
    )
    )
    where seqnr=1

Got some thoughts? Leave a comment

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.