Why does SUM look broken for window functions?

Posted by

TL;DR: SUM is NOT broken Smile

A quandary from a work colleague this morning about rolling totals via the SUM aggregation in a window function.

“My rolling sum works fine for the EMP table…”


SQL> SELECT empno, ename, deptno, sal, 
  2     SUM(sal) OVER (PARTITION BY deptno order by sal) AS total_sal_by_dept
  3  FROM   emp;
  
     EMPNO ENAME          DEPTNO        SAL TOTAL_SAL_BY_DEPT
---------- ---------- ---------- ---------- -----------------
      7934 MILLER             10       1300              1300
      7782 CLARK              10       2450              3750
      7839 KING               10       5000              8750
      7369 SMITH              20        800               800
      7876 ADAMS              20       1100              1900

“…but it is broken on this identical use case to see segment sizes”


SQL> select
  2      tablespace_name, segment_name, bytes,
  3      sum(bytes) over(partition by tablespace_name order by bytes) rolling_sum_bytes
  4  from  user_segments
  5  order by tablespace_name, bytes;

TABLESPACE_NAME      SEGMENT_NAME                        BYTES ROLLING_SUM_BYTES
-------------------- ------------------------------ ---------- -----------------
MYTS                 T17_PK                              65536            131072
MYTS                 T17                                 65536            131072
MYTS                 T13_PK                             131072           1441792
MYTS                 T2_PK                              131072           1441792
MYTS                 T10_PK                             131072           1441792
MYTS                 T11_PK                             131072           1441792
MYTS                 T16_PK                             131072           1441792
MYTS                 T9_PK                              131072           1441792
MYTS                 T15_PK                             131072           1441792

At first glance, it does indeed look like strange results are coming out of this query on USER_SEGMENTS. But to explain why, lets return to the EMP table and look at all of the data.


SQL> SELECT empno, ename, deptno, sal, 
  2     SUM(sal) OVER (PARTITION BY deptno order by sal) AS total_sal_by_dept
  3  FROM   emp;
  
     EMPNO ENAME          DEPTNO        SAL TOTAL_SAL_BY_DEPT
---------- ---------- ---------- ---------- -----------------
      7934 MILLER             10       1300              1300
      7782 CLARK              10       2450              3750
      7839 KING               10       5000              8750
      7369 SMITH              20        800               800
      7876 ADAMS              20       1100              1900
      7566 JONES              20       2975              4875
      7788 SCOTT              20       3000             10875   
      7902 FORD               20       3000             10875   
      7900 JAMES              30        950               950
      7654 MARTIN             30       1250              3450
      7521 WARD               30       1250              3450
      7844 TURNER             30       1500              4950
      7499 ALLEN              30       1600              6550
      7698 BLAKE              30       2850              9400

Notice what happens when we have a “tie” in the data being used to sort the window. To ensure we have a deterministic result, the SUM must “cover” the ties because otherwise (picking one of the rows at random to be the first in the rolling sum) it would mean that repeated executions might provided different results. That’s not good Smile

To ensure a deterministic result I need to have an ordering system which is deterministic, ie, add a tie-breaker to the window ordering expression, typically the primary key:


 SQL> SELECT empno, ename, deptno, sal, 
  2     SUM(sal) OVER (PARTITION BY deptno order by sal,empno) AS total_sal_by_dept  
  3  FROM   emp;

     EMPNO ENAME          DEPTNO        SAL TOTAL_SAL_BY_DEPT
---------- ---------- ---------- ---------- -----------------
      7934 MILLER             10       1300              1300
      7782 CLARK              10       2450              3750
      7839 KING               10       5000              8750
      7369 SMITH              20        800               800
      7876 ADAMS              20       1100              1900
      7566 JONES              20       2975              4875
      7788 SCOTT              20       3000              7875  
      7902 FORD               20       3000             10875  
      7900 JAMES              30        950               950
      7521 WARD               30       1250              2200
      7654 MARTIN             30       1250              3450
      7844 TURNER             30       1500              4950
      7499 ALLEN              30       1600              6550
      7698 BLAKE              30       2850              9400

The reason the results from the query against USER_SEGMENTS look particularly odd is because of local managed tablespaces, meaning that many segments will have identical sizes in bytes. So it almost looks like the rolling sum is not moving at all. But in the same way we tackled the EMP table, we need to add sufficient elements to give determinism to the sorting sequence.


SQL> select
  2      tablespace_name, segment_name, bytes,
  3      sum(bytes) over(partition by tablespace_name order by bytes, segment_name, partition_name ) rolling_sum_bytes
  4  from  user_segments
  5  order by tablespace_name, bytes;

TABLESPACE_NAME      SEGMENT_NAME                        BYTES ROLLING_SUM_BYTES
-------------------- ------------------------------ ---------- -----------------
MYTS                 T17                                 65536             65536
MYTS                 T17_PK                              65536            131072
MYTS                 T10_PK                             131072            262144
MYTS                 T11_PK                             131072            393216
MYTS                 T12_PK                             131072            524288
MYTS                 T13_PK                             131072            655360
MYTS                 T14_PK                             131072            786432
MYTS                 T15_PK                             131072            917504
MYTS                 T16_PK                             131072           1048576
MYTS                 T2_PK                              131072           1179648
MYTS                 T8_PK                              131072           1310720
MYTS                 T9_PK                              131072           1441792
MYTS                 T10                               6291456           7733248
MYTS                 T11                               6291456          14024704
MYTS                 T12                               6291456          20316160
MYTS                 T13                               6291456          26607616
MYTS                 T14                               6291456          32899072
MYTS                 T15                               6291456          39190528

Bottom Line: If you are sorting, sort well Smile

If you’re unfamiliar with window functions, here’s a complete video tutorial series on Analytic Functions

One comment

Got some thoughts? Leave a comment

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.