Learning is not a spectator sport

Histograms on data (not the optimizer)

January 27, 2016

1–2 minutes

I’ve been doing a series on Analytics here but if you need to group data into ranges, you don’t necessarily need a fancy analytic. The function WIDTH_BUCKET can easily determine histogram groupings within a table.


SQL> SELECT CUST_ID, CUST_LAST_NAME, CUST_CREDIT_LIMIT,
  2     WIDTH_BUCKET(CUST_CREDIT_LIMIT, 100, 5000, 10) credit_bucket
  3     FROM sh.customers
  4     WHERE  country_id = 52787
  5     ORDER BY CUST_CREDIT_LIMIT;

   CUST_ID CUST_LAST_NAME       CUST_CREDIT_LIMIT CREDIT_BUCKET
---------- -------------------- ----------------- -------------
     44282 Justice                           1500             3
     50671 Sandoval                          1500             3
     26284 Lotto                             1500             3
      4827 Kessel                            1500             3
     27671 Lin                               1500             3
     17284 Fellows                           1500             3
      9671 Rittenour                         1500             3
     35283 Gatewood                          1500             3
      9670 Oppy                              1500             3
     35284 Thomas                            1500             3
     18670 Callihan                          1500             3
     18671 Sager                             1500             3
    102218 Hornick                           1500             3
     26283 Geiss                             1500             3
     45827 Rowe                              1500             3
     42898 Haske                             3000             6
      5519 Ballenger                         3000             6
     33898 Batterton                         3000             6
     46518 Orm                               3000             6
     24899 Lightfoot                         3000             6
     47211 Cartwright                        3000             6
     20744 Gravel                            3000             6
    102723 Myczkowski                        3000             6
    101478 Bishop                            3000             6
      1050 Overton                           3000             6
       700 Burnns                            3000             6
      1023 Newcomer                          3000             6
     38744 Tazelar                           3000             6
     29744 Durby                             3000             6
     41514 Burgess                           5000            11
     45828 Wood                              5000            11
    102129 Cay                               5000            11
     40128 Cain                              5000            11
     23515 Figgens                           5000            11
     32514 Lengel                            5000            11
    101613 Spivak                            7000            11
       470 Sandstrum                         7000            11
       763 Dutton                            7000            11
     36667 Capps                             7000            11
       446 Jeffreys                          7000            11
     42302 Everrett                          7000            11
     46519 East                              7000            11
       708 Door                              7000            11
    102178 Bloom                             7000            11
       123 Rockwell                          7000            11
    104403 Fenton                            7000            11
     13133 Colven                            7000            11
     27666 Remler                            7000            11
      4133 Carr                              9000            11
     10362 Tate                              9000            11
     19362 Rosenblum                         9000            11
     28362 Titus                             9000            11
     28458 Kohler                            9000            11
      6208 Vail                              9000            11
      2749 Kimball                           9000            11
     15208 Trimmer                           9000            11
     11748 Stokley                           9000            11
     24208 Baley                             9000            11
     26975 Baer                              9000            11
     35975 Grubb                             9000            11
     44974 Grier                             9000            11
       529 Barone                            9000            11
      2750 Jansen                           10000            11
    103571 Kennedy                          10000            11
       346 Elliott                          10000            11
     47209 Lieberman                        10000            11
     34590 Barden                           10000            11
     28469 Adams                            10000            11
      6209 Crocker                          10000            11
     43589 Eppling                          10000            11
    100761 Zwolinsky                        10000            11
     15209 Fernandez                        10000            11
    100824 Roy                              11000            11
    103845 Moy                              15000            11
     35402 Elkin                            15000            11

75 rows selected.

where the 4 arguments are:

Expression to be evaluated
Minimum (start) value
Maximum (end) value
Number of buckets in the distribution

Notice that if you extend outside the min and max value, the width bucket returns a value one less or one more then the bucket count.

2 responses to “Histograms on data (not the optimizer)”

Raj Jamadagni

January 27, 2016 at 8:39 pm

Technically I’d expect data to be grouped in 10 groups (4th param). I understand it indicates “maximum buckets”, but it would be interesting to see why Oracle decided to group them in bucket# 3, 6, 11 and not others.

Loading…

Reply
My Oracle Week ft. Jeremy | Oracle Business Intelligence

February 4, 2016 at 4:32 am

[…] 7. Histograms on data (not the optimizer) […]

Loading…

Reply

Learning is not a spectator sport

Leave a ReplyCancel reply

Trending

First dabblings with APEXlang

Vanishing objects; hello ORA-08103

A very robust query

The secret APEX performance hack