TL;DR: SUM is NOT broken
A quandary from a work colleague this morning about rolling totals via the SUM aggregation in a window function.
“My rolling sum works fine for the EMP table…”
SQL> SELECT empno, ename, deptno, sal,
2 SUM(sal) OVER (PARTITION BY deptno order by sal) AS total_sal_by_dept
3 FROM emp;
EMPNO ENAME DEPTNO SAL TOTAL_SAL_BY_DEPT
---------- ---------- ---------- ---------- -----------------
7934 MILLER 10 1300 1300
7782 CLARK 10 2450 3750
7839 KING 10 5000 8750
7369 SMITH 20 800 800
7876 ADAMS 20 1100 1900
“…but it is broken on this identical use case to see segment sizes”
SQL> select
2 tablespace_name, segment_name, bytes,
3 sum(bytes) over(partition by tablespace_name order by bytes) rolling_sum_bytes
4 from user_segments
5 order by tablespace_name, bytes;
TABLESPACE_NAME SEGMENT_NAME BYTES ROLLING_SUM_BYTES
-------------------- ------------------------------ ---------- -----------------
MYTS T17_PK 65536 131072
MYTS T17 65536 131072
MYTS T13_PK 131072 1441792
MYTS T2_PK 131072 1441792
MYTS T10_PK 131072 1441792
MYTS T11_PK 131072 1441792
MYTS T16_PK 131072 1441792
MYTS T9_PK 131072 1441792
MYTS T15_PK 131072 1441792
At first glance, it does indeed look like strange results are coming out of this query on USER_SEGMENTS. But to explain why, lets return to the EMP table and look at all of the data.
SQL> SELECT empno, ename, deptno, sal,
2 SUM(sal) OVER (PARTITION BY deptno order by sal) AS total_sal_by_dept
3 FROM emp;
EMPNO ENAME DEPTNO SAL TOTAL_SAL_BY_DEPT
---------- ---------- ---------- ---------- -----------------
7934 MILLER 10 1300 1300
7782 CLARK 10 2450 3750
7839 KING 10 5000 8750
7369 SMITH 20 800 800
7876 ADAMS 20 1100 1900
7566 JONES 20 2975 4875
7788 SCOTT 20 3000 10875
7902 FORD 20 3000 10875
7900 JAMES 30 950 950
7654 MARTIN 30 1250 3450
7521 WARD 30 1250 3450
7844 TURNER 30 1500 4950
7499 ALLEN 30 1600 6550
7698 BLAKE 30 2850 9400
Notice what happens when we have a “tie” in the data being used to sort the window. To ensure we have a deterministic result, the SUM must “cover” the ties because otherwise (picking one of the rows at random to be the first in the rolling sum) it would mean that repeated executions might provided different results. That’s not good
To ensure a deterministic result I need to have an ordering system which is deterministic, ie, add a tie-breaker to the window ordering expression, typically the primary key:
SQL> SELECT empno, ename, deptno, sal,
2 SUM(sal) OVER (PARTITION BY deptno order by sal,empno) AS total_sal_by_dept
3 FROM emp;
EMPNO ENAME DEPTNO SAL TOTAL_SAL_BY_DEPT
---------- ---------- ---------- ---------- -----------------
7934 MILLER 10 1300 1300
7782 CLARK 10 2450 3750
7839 KING 10 5000 8750
7369 SMITH 20 800 800
7876 ADAMS 20 1100 1900
7566 JONES 20 2975 4875
7788 SCOTT 20 3000 7875
7902 FORD 20 3000 10875
7900 JAMES 30 950 950
7521 WARD 30 1250 2200
7654 MARTIN 30 1250 3450
7844 TURNER 30 1500 4950
7499 ALLEN 30 1600 6550
7698 BLAKE 30 2850 9400
The reason the results from the query against USER_SEGMENTS look particularly odd is because of local managed tablespaces, meaning that many segments will have identical sizes in bytes. So it almost looks like the rolling sum is not moving at all. But in the same way we tackled the EMP table, we need to add sufficient elements to give determinism to the sorting sequence.
SQL> select
2 tablespace_name, segment_name, bytes,
3 sum(bytes) over(partition by tablespace_name order by bytes, segment_name, partition_name ) rolling_sum_bytes
4 from user_segments
5 order by tablespace_name, bytes;
TABLESPACE_NAME SEGMENT_NAME BYTES ROLLING_SUM_BYTES
-------------------- ------------------------------ ---------- -----------------
MYTS T17 65536 65536
MYTS T17_PK 65536 131072
MYTS T10_PK 131072 262144
MYTS T11_PK 131072 393216
MYTS T12_PK 131072 524288
MYTS T13_PK 131072 655360
MYTS T14_PK 131072 786432
MYTS T15_PK 131072 917504
MYTS T16_PK 131072 1048576
MYTS T2_PK 131072 1179648
MYTS T8_PK 131072 1310720
MYTS T9_PK 131072 1441792
MYTS T10 6291456 7733248
MYTS T11 6291456 14024704
MYTS T12 6291456 20316160
MYTS T13 6291456 26607616
MYTS T14 6291456 32899072
MYTS T15 6291456 39190528
Bottom Line: If you are sorting, sort well
If you’re unfamiliar with window functions, here’s a complete video tutorial series on Analytic Functions
I’m so glad you sorted that out …