Whether you call it “Kris Kringle” or “Secret Santa“, there’s much pleasure in giving small gifts to people over the Christmas period. So rather than fully fledged blog posts, I’ll wrap up this year with a series of daily quick tips on the Oracle Database. Ho Ho Ho !

If you are ever looking for a quick and easy way to get an assessment of the distribution of data in a column, a useful function is the WIDTH_BUCKET function

SQL> select width_bucket(amount_sold,20,500,10), amount_sold
  2  from sh.sales
  3  where rownum <= 20;

WIDTH_BUCKET(AMOUNT_SOLD,20,500,10) AMOUNT_SOLD
----------------------------------- -----------
                                  4      198.97
                                  0       13.95
                                  3      143.18
                                  1       61.22
                                  3      160.37
                                  0       10.15
                                 11      632.19
                                  1        32.4
                                  0        9.54
                                  1       51.96
                                  1       30.53
                                  2       112.8
                                  3      117.02
                                  2       99.52
                                  6      288.16
                                  1       55.87
                                  0        7.13
                                 11      628.89
                                  2       95.33
                                  2       95.66

Lets explore the parameters:

WIDTH_BUCKET(amount_sold,20,500,10)  means

  • Look at AMOUNT_SOLD values
  • Divide those values into 10 buckets, ranging from 20 to 500, so in this case, 10 buckets each of size 48
  • For amounts between 20 and 20+48, this is bucket 1
  • For amounts between 20+48 and 20+48*2, this is bucket 2, and so forth
  • Anything less than 20, gets assigned bucket 0
  • Anything greater than 500, gets assigned bucket 11 (being the number of allowed buckets + 1 )

Typically, you will see WIDTH_BUCKET used to do some aggregation, for example:


SQL> select width_bucket(amount_sold,20,500,10) bucket, count(*)
  2  from sh.sales
  3  group by bucket
  4  order by 1;

    BUCKET   COUNT(*)
---------- ----------
         0     252397
         1     503825
         2      37715
         3      29381
         4      23127
         5       7123
         6       6394
         7        534
        10       1241
        11      57106

10 rows selected.

which gives us nice information about the distribution of the data, eg, ~252,000 values below 20, ~57,000 values above 500 and so on.

In case you are wondering how I managed to do a GROUP BY using the column alias, then 23ai brings that nice feature into play. More details on that in the video below

 

An alternative height balanced histogram of the data could be found using NTILE if that is more your preference

 



SQL> select decile, min(amount_sold), max(amount_sold), count(*)
  2  from (
  3    select ntile(10) over (order by amount_sold) as decile , amount_sold
  4    from sh.sales
  5  )
  6  group by decile
  7  order by 1;

    DECILE MIN(AMOUNT_SOLD) MAX(AMOUNT_SOLD)   COUNT(*)
---------- ---------------- ---------------- ----------
         1              6.4             9.85      91885
         2             9.85            12.95      91885
         3            12.95            21.23      91885
         4            21.23            26.45      91884
         5            26.45            34.24      91884
         6            34.24            46.72      91884
         7            46.72            50.79      91884
         8            50.79            62.45      91884
         9            62.45           176.86      91884
        10           176.86          1782.72      91884

10 rows selected.

Ho Ho Ho… Merry Christmas!

5 responses to “Kris Kringle the Database – WIDTH_BUCKET”

  1. Greetings ‘Kris’!

    Thank you for sharing this stocking stuffer. I have a question concerning the values assigned to bucket 1 that are under the $20.00 dollar value.

    I see that $8.48 and $8.86 are located in Bucket 0… but values $10.40, $16.45, and $13.76 were placed in bucket 1. Shouldn’t ‘anything less than 20‘ be assigned to Bucket 0?

    I am missing something.

    1. Well spotted! That was my fail – I was playing with various sets of data and parameters to get a nice distribution of data and cut-pasted the wrong results in. I’ve corrected that. Thanks

  2. Евгений Пузиков Avatar
    Евгений Пузиков

    select width_bucket(amount_available,20,500,10) bucket, count(*)
    from acnt_contract
    group by bucket
    order by 1;

    ORA-00904: “BUCKET”: invalid identifier

    should be group by width_bucket(amount_available,20,500,10)

    1. You did *read* the entire post ?

  3. This is both cool and embarrassing. Cool because it makes histograms simple (I need to revise some scripts now).
    Embarrassing because, well, I just now learned of width_bucket().
    Here’s why:
    https://docs.oracle.com/cd/B13789_01/olap.101/b10339/x_trap030.htm

Leave a reply to Melody S Cancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Trending