Mr DISTINCT might not be your friend

Posted by

Whenever you have the need to use the DISTINCT keyword, its worth just pausing for a second, and making sure that you are not hiding just a larger issue. It actually might represent either incorrect use of SQL or incorrect assumptions from the data model.

Consider the following example

SELECT DISTINCT d.dname
FROM   emp e, dept d
WHERE  e.ename = 'SMITH'
AND    e.deptno = d.deptno

The query is certainly valid, but when I see “distinct” I ask myself the following questions:

Has the DISTINCT has been added in an attempt to only return a single row ?, ie, is someone working under the assumption being that an employee name can only refer to a single department ? Unless there is a unique constraint on the ENAME column, then we can still just as easily get multiple rows back (even with the DISTINCT), so the SQL will be a “sleeping problem” in the application until the data causes it to fail.

Because the DISTINCT keyword here:

  • doesn’t give you any guarantee that you’ll only get one row, and
  • might be making the database do excessive work (get all rows, sort them and remove duplicates), every single time you run this SQL

Alternatively, perhaps there is meant to be a single department for an ENAME, ie, ENAME is unique. If that is the case, then we should change the database model/design to reflect it and not use DISTINCT to “fake” it.

I’m not saying DISTINCT is always a bug – but it’s always worth just asking yourself if you were correct to be using it.

5 comments

  1. I’ll use an EXISTS with correlated subquery to EMP, add UNIQUE constraint on EMP(DEPTNO,ENAME), and then get rid of the DISTINCT….

    ๐Ÿ˜‰

Got some thoughts? Leave a comment

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.