Often what starts off as a simple aggregation…


SQL> select channel_id, promo_id, sum(amount_sold) tot
  2  from sh.sales
  3  group by channel_id, promo_id
  4  order by 1,2;

CHANNEL_ID   PROMO_ID        TOT
---------- ---------- ----------
         2        350  548779.15
         2        999 25797563.2
         3        350 1329476.49
         3        351 1208839.02
         3        999 55336945.1
         4        350  321125.26
         4        351   15664.24
         4        999 13370012.5
         9         33  277426.26

9 rows selected.

… morphs into a more complex equipment. For example, you show your customers the above summary data, and they respond with:

“Yeah that’s great, but I can get a total by channel, a total by promotion and a grand total as well”

Firstly, let me stress – please don’t write repeated passes at the data for that. You do not need to. We can take advantage of the ROLLUP, CUBE or GROUPING SETS extensions to GROUP BY to easily handle this. Here’s the easy solution to the new requirements.


SQL> select channel_id, promo_id, sum(amount_sold) tot
  2  from sh.sales
  3  group by grouping sets( (channel_id), (promo_id), () )
  4  order by 1,2;

CHANNEL_ID   PROMO_ID        TOT
---------- ---------- ----------
         2            26346342.3
         3            57875260.6
         4              13706802
         9             277426.26
                   33  277426.26
                  350  2199380.9
                  351 1224503.26
                  999 94504520.8
                      98205831.2

9 rows selected.

However, this is post is not about the ROLLUP, CUBE and GROUPING SETS extensions, but if you’ve not seen them before, congratulations you’re about to get a bonus tip! 🤣

The above result is all well and good when all the data is NOT NULL. But here’s the same original summary query run against my CUSTOM_SALES table.


SQL> select channel_id, promo_id, sum(amount_sold) tot
  2  from custom_sales
  3  group by channel_id, promo_id
  4  order by 1,2;

CHANNEL_ID   PROMO_ID        TOT
---------- ---------- ----------
         2        350  432912.72
         2        999 20690213.1
         2            5223216.46
         3        350 1063872.68
         3        351  961406.04
         3        999 44213116.3
         3            11636865.6
         4        350  260322.03
         4        351    10571.2
         4        999 10715642.9
         4            2720265.91
         9         33  219782.41
         9              57643.85

13 rows selected.

Notice that my source data contains some nulls for the PROMO_ID. Thus now when I add in my GROUPING_SETS extension…


SQL> select channel_id, promo_id, sum(amount_sold) tot
  2  from custom_sales
  3  group by grouping sets( (channel_id), (promo_id), () )
  4  order by 1,2;

CHANNEL_ID   PROMO_ID        TOT
---------- ---------- ----------
         2            26346342.3
         3            57875260.6
         4              13706802
         9             277426.26
                   33  219782.41
                  350 1757107.43
                  351  971977.24
                  999 75618972.3
                      98205831.2
                      19637991.8

10 rows selected.

…I now have a dilemma. How do I know when a NULL for PROMO_ID really means a null, or does it mean I am on one of the rolled up (sub)totals. In this simple example, I could eyeball the data to work it out, but we’d obviously like something more robust.

That is where the GROUPING and GROUPING_ID functions are invaluable.


SQL> select channel_id,
  2         promo_id,
  3         sum(amount_sold) tot,
  4         grouping(channel_id) is_channel,
  5         grouping(promo_id) is_promo,
  6         grouping_id(channel_id,promo_id) bitmap
  7  from custom_sales
  8  group by grouping sets( (channel_id), (promo_id), () )
  9  order by 1,2;

CHANNEL_ID   PROMO_ID        TOT IS_CHANNEL   IS_PROMO     BITMAP
---------- ---------- ---------- ---------- ---------- ----------
         2            26346342.3          0          1          1
         3            57875260.6          0          1          1
         4              13706802          0          1          1
         9             277426.26          0          1          1
                   33  219782.41          1          0          2
                  350 1757107.43          1          0          2
                  351  971977.24          1          0          2
                  999 75618972.3          1          0          2
                      98205831.2          1          1          3
                      19637991.8          1          0          2

10 rows selected.

GROUPING(CHANNEL_ID) is equivalent to asking the question “Does this row represent an aggregation of all CHANNEL_ID values?” If the answer to this question is YES, then we return 1 otherwise 0. So for the last 6 rows above, since we are aggregating by either PROMO_ID or the entire set, the result is 1.

GROUPING_ID is slightly different, representing a bitmap of the aggregation of the columns in the expression. In the example above, you can see that if you treated IS_CHANNEL and IS_PROMO as bits in a binary number, then the GROUPING_ID is the decimal equivalent of that bitmask

GROUPING and GROUPING_ID lets you easily distinguish the levels of aggregation in your GROUP BY queries.

Ho Ho Ho! Merry Christmas

Footnote: The appearance of the beverage companies in the thumbnail in no way implies an endorsement by them of this post or implies any particular relationship with Oracle outside of what they already might have. However, if they’d like to reach out and sponsor my Christmas drinks privately, then they are more than welcome! 🤣

Got some thoughts? Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Trending