Kris Kringle the Database – WIDTH_BUCKET – Learning is not a spectator sport

Kris Kringle the Database – WIDTH_BUCKET

December 5, 2024

1–2 minutes

Whether you call it “Kris Kringle” or “Secret Santa“, there’s much pleasure in giving small gifts to people over the Christmas period. So rather than fully fledged blog posts, I’ll wrap up this year with a series of daily quick tips on the Oracle Database. Ho Ho Ho !

If you are ever looking for a quick and easy way to get an assessment of the distribution of data in a column, a useful function is the WIDTH_BUCKET function

SQL> select width_bucket(amount_sold,20,500,10), amount_sold
  2  from sh.sales
  3  where rownum <= 20;

WIDTH_BUCKET(AMOUNT_SOLD,20,500,10) AMOUNT_SOLD
----------------------------------- -----------
                                  4      198.97
                                  0       13.95
                                  3      143.18
                                  1       61.22
                                  3      160.37
                                  0       10.15
                                 11      632.19
                                  1        32.4
                                  0        9.54
                                  1       51.96
                                  1       30.53
                                  2       112.8
                                  3      117.02
                                  2       99.52
                                  6      288.16
                                  1       55.87
                                  0        7.13
                                 11      628.89
                                  2       95.33
                                  2       95.66

Lets explore the parameters:

WIDTH_BUCKET(amount_sold,20,500,10) means

Look at AMOUNT_SOLD values
Divide those values into 10 buckets, ranging from 20 to 500, so in this case, 10 buckets each of size 48
For amounts between 20 and 20+48, this is bucket 1
For amounts between 20+48 and 20+48*2, this is bucket 2, and so forth
Anything less than 20, gets assigned bucket 0
Anything greater than 500, gets assigned bucket 11 (being the number of allowed buckets + 1 )

Typically, you will see WIDTH_BUCKET used to do some aggregation, for example:


SQL> select width_bucket(amount_sold,20,500,10) bucket, count(*)
  2  from sh.sales
  3  group by bucket
  4  order by 1;

    BUCKET   COUNT(*)
---------- ----------
         0     252397
         1     503825
         2      37715
         3      29381
         4      23127
         5       7123
         6       6394
         7        534
        10       1241
        11      57106

10 rows selected.

which gives us nice information about the distribution of the data, eg, ~252,000 values below 20, ~57,000 values above 500 and so on.

In case you are wondering how I managed to do a GROUP BY using the column alias, then 23ai brings that nice feature into play. More details on that in the video below

An alternative height balanced histogram of the data could be found using NTILE if that is more your preference



SQL> select decile, min(amount_sold), max(amount_sold), count(*)
  2  from (
  3    select ntile(10) over (order by amount_sold) as decile , amount_sold
  4    from sh.sales
  5  )
  6  group by decile
  7  order by 1;

    DECILE MIN(AMOUNT_SOLD) MAX(AMOUNT_SOLD)   COUNT(*)
---------- ---------------- ---------------- ----------
         1              6.4             9.85      91885
         2             9.85            12.95      91885
         3            12.95            21.23      91885
         4            21.23            26.45      91884
         5            26.45            34.24      91884
         6            34.24            46.72      91884
         7            46.72            50.79      91884
         8            50.79            62.45      91884
         9            62.45           176.86      91884
        10           176.86          1782.72      91884

10 rows selected.

Ho Ho Ho… Merry Christmas!

5 responses to “Kris Kringle the Database – WIDTH_BUCKET”

Melody S

December 5, 2024 at 5:57 pm

Greetings ‘Kris’!

Thank you for sharing this stocking stuffer. I have a question concerning the values assigned to bucket 1 that are under the $20.00 dollar value.

I see that $8.48 and $8.86 are located in Bucket 0… but values $10.40, $16.45, and $13.76 were placed in bucket 1. Shouldn’t ‘anything less than 20‘ be assigned to Bucket 0?

I am missing something.

Reply
1. Connor McDonald
  
  December 5, 2024 at 6:25 pm
  
  Well spotted! That was my fail – I was playing with various sets of data and parameters to get a nice distribution of data and cut-pasted the wrong results in. I’ve corrected that. Thanks
  
  Reply
Евгений Пузиков

December 6, 2024 at 5:00 pm

select width_bucket(amount_available,20,500,10) bucket, count(*)
from acnt_contract
group by bucket
order by 1;

ORA-00904: “BUCKET”: invalid identifier

should be group by width_bucket(amount_available,20,500,10)

Reply
1. Connor McDonald
  
  December 6, 2024 at 7:07 pm
  
  You did *read* the entire post ?
  
  Reply
Jared

December 7, 2024 at 2:22 pm

This is both cool and embarrassing. Cool because it makes histograms simple (I need to revise some scripts now).
Embarrassing because, well, I just now learned of width_bucket().
Here’s why:
https://docs.oracle.com/cd/B13789_01/olap.101/b10339/x_trap030.htm

Reply