In the Oracle documentation, the specification for ORA_HASH is:
The optional max_bucket argument determines the maximum bucket value returned by the hash function. You can specify any value between 0 and 4294967295. The default is 4294967295.
So when I had a set of attributes come in, and I was looking for a simple way of mapping them to a simple unique key, ORA_HASH seemed the natural choice. Until this happened…
SQL> create table T (
2 id varchar2(8),
3 code varchar2(5),
4 year varchar2(4)
5 );
Table created.
SQL> insert into T values ('240301','21806','2004');
1 row created.
SQL> insert into T values ('276083','31808','2010');
1 row created.
SQL> select id, code, year, ora_hash(id||code||year) hash
2 from t;
ID CODE YEAR HASH
-------- ----- ---- ----------
240301 21806 2004 1018600323
276083 31808 2010 1018600323
My table has only 90,000 rows, so (assuming a roughly equal distribution of hash keys) I had a 0.002% chance of hitting that
Oracle karma 🙂
Depending if you want values 2 – 12 – 2004 to have the hash as 21 – 2 – 2004, you probably want separators between the elements being hashed. Don’t think it reduces the chance of an ‘unlucky’ collision, but if you are in a position when you are *only* storing the hash and get this wrong, you’ll be kicking yourself.
I would try the lottery 😉
Your math is awful. Please look into birthday attack at wikipedia carefully.
In case the hash function is absolutely equal distributed, your chances to get a collision in 90’000 rows are 61% (!)
select 1-exp(-power(90000,2)/(2*4294967295)) collision_probability from dual
=> 0.61
The expected number of rows to get the first collision is just 77200 (select 1.1774*sqrt(4294967295) rows_to_first_collision from dual).
of course, its the old “how many people in a room share a birthday” style of thing 🙂
Thanks for stopping by.