GROUP BY might be distinctly better than DISTINCT

Posted by

One of the cool things with materialised* views in Oracle is their ability to be kept in sync with the source table(s) from which they are derived from, in real time or near real time. To achieve this, we typically employ mechanisms such as materialised view logs to capture modifications to the source tables, and occasionally we need to change the definition of the materialised view itself when dealing with aggregations and joins. However, some times we know that if DML on the source is incredibly rare and/or the cost of updating the materialised view is very small, we can avoid all that and simply perform a REFRESH COMPLETE whenever a transaction is committed on the source tables. This avoids any issue around the materialised view becoming stale, and also avoids the need for scheduler jobs to keep the materialised view refreshed.

But all of that is really background to set up the demo below. What I’m really talking about in this post is that although a GROUP BY can be synonymous with a DISTINCT, be aware that the database engine can treat them differently in some circumstances. For example, in most circumstances, attempting to use DISTINCT in a materialised view that is designed to be fast refreshed or refreshed on commit, is going to raise the ire of the database.



SQL> create table t as select * from dba_objects;

Table created.

SQL> create materialized view  mv
  2   compress basic logging
  3    build immediate
  4    refresh complete on commit
  5    disable query rewrite
  6    as select distinct owner from t;
  as select distinct owner from t
                                *
ERROR at line 6:
ORA-12054: cannot set the ON COMMIT refresh attribute for the materialized view

However, replacing the DISTINCT with a GROUP BY, which of course is the same net result for the data does not incur the same issue and the materialised view works just fine.



SQL> create materialized view  mv
  2   compress basic logging
  3    build immediate
  4    refresh complete on commit
  5    disable query rewrite
  6    as select owner from t group by owner;

Materialized view created.

Some (incomplete) tests show that our materialised view is indeed refreshing as intended



SQL> select count(*) from t where owner = 'Y';

  COUNT(*)
----------
         1

SQL> select * from mv where owner = 'Y';

OWNER
------------------------------
Y

SQL> delete from t where owner = 'Y';

1 row deleted.

SQL> commit;

Commit complete.

SQL> select * from mv where owner = 'Y';

no rows selected

A while back I posted about how sometimes the appearance of the DISTINCT keyword in a SQL statement is a red flag to indicate an error in the composition of the SQL, but the demo above shows that perhaps opting for GROUP BY instead of DISTINCT might be a good option as a general “rule of thumb” (shakes head in disappointment in himself for using the expression Smile). Of course, if you do opt for GROUP BY, then make sure you add some appropriate commentary lest the next developer comes along and thinks “Hey, I’ll switch this for DISTINCT” .

* – yeah, I know I’m using the the non-US spelling for “materialised”. That’s the price you pay for reading an Australian blog Smile

Got some thoughts? Leave a comment

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.