In the Oracle documentation, the specification for ORA_HASH is:

The optional max_bucket argument determines the maximum bucket value returned by the hash function. You can specify any value between 0 and 4294967295. The default is 4294967295.

So when I had a set of attributes come in, and I was looking for a simple way of mapping them to a simple unique key, ORA_HASH seemed the natural choice. Until this happened…

SQL> create table T (
  2    id varchar2(8),
  3    code varchar2(5),
  4    year varchar2(4)
  5  );

Table created.

SQL> insert into T values ('240301','21806','2004');

1 row created.

SQL> insert into T values ('276083','31808','2010');

1 row created.

SQL> select id, code, year, ora_hash(id||code||year) hash
  2  from t;

ID       CODE  YEAR       HASH
-------- ----- ---- ----------
240301   21806 2004 1018600323
276083   31808 2010 1018600323

My table has only 90,000 rows, so (assuming a roughly equal distribution of hash keys) I had a 0.002% chance of hitting that

Oracle karma 🙂

5 responses to “hash keys … almost unique”

  1. Depending if you want values 2 – 12 – 2004 to have the hash as 21 – 2 – 2004, you probably want separators between the elements being hashed. Don’t think it reduces the chance of an ‘unlucky’ collision, but if you are in a position when you are *only* storing the hash and get this wrong, you’ll be kicking yourself.

  2. Your math is awful. Please look into birthday attack at wikipedia carefully.

    In case the hash function is absolutely equal distributed, your chances to get a collision in 90’000 rows are 61% (!)
    select 1-exp(-power(90000,2)/(2*4294967295)) collision_probability from dual
    => 0.61

    The expected number of rows to get the first collision is just 77200 (select 1.1774*sqrt(4294967295) rows_to_first_collision from dual).

    1. of course, its the old “how many people in a room share a birthday” style of thing 🙂

      Thanks for stopping by.

  3. […] – not very much. So, hash collisions will occur frequently, even with small data sets (see Connor McDonald’s blog post). The probability of two input values with the same hash key is quite high. With 9300 input rows, […]

Leave a reply to connormcdonald Cancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Trending