Whenever you have the need to use the DISTINCT keyword, its worth just pausing for a second, and making sure that you are not hiding just a larger issue. It actually might represent either incorrect use of SQL or incorrect assumptions from the data model.

Consider the following example

SELECT DISTINCT d.dname
FROM   emp e, dept d
WHERE  e.ename = 'SMITH'
AND    e.deptno = d.deptno

The query is certainly valid, but when I see “distinct” I ask myself the following questions:

Has the DISTINCT has been added in an attempt to only return a single row ?, ie, is someone working under the assumption being that an employee name can only refer to a single department ? Unless there is a unique constraint on the ENAME column, then we can still just as easily get multiple rows back (even with the DISTINCT), so the SQL will be a “sleeping problem” in the application until the data causes it to fail.

Because the DISTINCT keyword here:

  • doesn’t give you any guarantee that you’ll only get one row, and
  • might be making the database do excessive work (get all rows, sort them and remove duplicates), every single time you run this SQL

Alternatively, perhaps there is meant to be a single department for an ENAME, ie, ENAME is unique. If that is the case, then we should change the database model/design to reflect it and not use DISTINCT to “fake” it.

I’m not saying DISTINCT is always a bug – but it’s always worth just asking yourself if you were correct to be using it.

5 responses to “Mr DISTINCT might not be your friend”

  1. All good, I just use UNIQUE 😉

    1. Amateur – I fetch all the rows and de-dup on the client with a home-grown hashmap like a real man 🙂

  2. I’ll use an EXISTS with correlated subquery to EMP, add UNIQUE constraint on EMP(DEPTNO,ENAME), and then get rid of the DISTINCT….

    😉

  3. Or shuffle it all into Hadoop in the cloud and clear out the duplicates there 🙂

  4. […] while back I posted about how sometimes the appearance of the DISTINCT keyword in a SQL statement is a red flag to […]

Leave a reply to connormcdonald Cancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Trending