Let’s start with this somewhat odd looking result. I have an inline function that returns a random number between 0 and 20, and I call that for each row in ALL_OBJECTS and then I extract just those rows for which the generated random number is 10. Seems simple enough….but why do I get results for which the value of the second column is most certainly not 10?
SQL> with
2 function
3 getrand(pval in number) return number is
4 begin
5 return round(dbms_random.value(0,20));
6 end;
7 my_rnd as
8 ( select object_id, getrand(object_id) getrnd from all_objects )
9 select *
10 from my_rnd
11 where getrnd = 10;
12 /
OBJECT_ID GETRND
---------- ----------
29 1
53 10
42 8
30 3
78 16
87 18
89 1
145 12
155 13
175 15
183 12
198 15
...
...
This is just a new variation of the same old theme – determinism.
There is no guarantee when and how many times the database will execute a function within a single invocation of a SQL statement, or even how many times it will be executed for a single candidate row the SQL statement is processing. The optimizer is totally entitled to shift it to anywhere in the plan. What does this mean ? It means only deterministic functions make sense in SQL. Let’s model the above with a simpler example:
SQL> create table t as select * from dba_objects where rownum <= 20;
Table created.
SQL> create or replace
2 function xgetrand(pval in number) return number is
3 begin
4 return round(dbms_random.value(0,20));
5 end;
6 /
Function created.
SQL> select *
2 from (
3 select object_id, xgetrand(object_id) gg from t
4 )
5 where gg = 10;
OBJECT_ID GG
---------- ----------
38 2
6 8
Immediately that looks odd, because the SQL reads like “I only want GG=10” yet I got back “2” and “8”. But if we drill into the execution plan, we’ll see this:
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 1 | 3 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("XGETRAND"("OBJECT_ID")=10)
Notice the FILTER information. The query has been transformed to be a simple:
SQL> select object_id, xgetrand(object_id) gg from t
2 where xgetrand(object_id) = 10;
If you’re doubtful of that claim, we can validate it by dumping a 10053 trace. In that you’ll find the following:
Final query after transformations:******* UNPARSED QUERY IS *******
SELECT "T"."OBJECT_ID" "OBJECT_ID","XGETRAND"("T"."OBJECT_ID") "GG" FROM "T" "T" WHERE "XGETRAND"("T"."OBJECT_ID")=10
Already you can see the scope for the function being called twice per row – once for the WHERE clause, and once for the SELECT part. In fact, for all we know it could be called three times, or four times. You (the coder) does not have control over that decision. For the sake of discussion, let’s assume it is called only twice. The first execution of the function (on line 2 above) returned 10 twice across the 20 rows (because we got 2 rows back in the result set), but then we ran the function again (on line1) as we gave back each row, hence the counter-intuitive output.
In the original example, ALL_OBJECTS is being used as a source for rows which is a complex view. The function call could be pushed anywhere deep into this view, which means it might be executed tens of thousands of times, and many of those executions might return a result of “10”.
The bottom line remains the same: non-deterministic means trouble for functions in SQL.
“… You (the coder) does not have control over that decision. …”
not true
with function
getrand(pval in number) return number is
begin
return round(dbms_random.value(0,20));
end;
my_rnd as
( select object_id, (select getrand(object_id) from dual) getrnd from all_objects )
select *
from my_rnd
where getrnd = 10
I disagree. You are relying on an *observed behaviour*. What if we change the implementation? What if the optimizer one day decides scalar queries can be folded back into the main query. I have no issue with using scalar subquery as a *performance* initiative, but to use it as a mean of influencing the *result* of a query is asking for problems.
Thats similar to the old practice of people using GROUP BY to “force” ordering on a result, which all went south in 10g when GROUP BY HASH came along.
thanks, good to know !
Hello Connor,
When I execute your first SELECT in LiveSQL, I get a “consistent result” π
with
function
getrand(pval in number) return number is
begin
return round(dbms_random.value(0,20));
end;
my_rnd as
( select object_id, (select getrand(object_id) from dual) getrnd from all_objects )
select *
from my_rnd
where getrnd = 10
/
OBJECT_ID GETRND
———————-
1783 10
1831 10
1839 10
1895 10
1897 10
1925 10
……………….
11007136 10
11007137 10
11007156 10
2667 rows selected.
I will NOT repeat the long execution plan, but it is essential that it contains a VIEW operation
at the outermost level, and then it applies the following filter:
…
Predicate Information (identified by operation id):
—————————————————
2 – filter(“GETRND”=10)
This is because you are calling the function “getrand” in a scalar query.
If you call it directly, like this:
with
function
getrand(pval in number) return number is
begin
return round(dbms_random.value(0,20));
end;
my_rnd as
( select object_id, getrand(object_id) getrnd from all_objects )
select *
from my_rnd
where getrnd = 10
/
then indeed you get an inconsistent result:
OBJECT_ID GETRND
———————-
144 7
1799 19
1813 2
1867 19
1885 2
1905 3
1938 13
…
11006414 16
11003792 19
11006551 18
11006573 6
11006587 3
2693 rows selected.
and deep, inside the execution plan, at step 6, we have a filter like the following:
6 – filter((“GETRAND”(“O”.”OBJ#”)=10 AND “O”.”NAME”‘_NEXT_OBJECT’ AND
For the “small table” case, though, the outcome is exactly as you showed, for example:
select *
from (
select object_id, xgetrand(object_id) gg from t
)
where gg = 10
/
OBJECT_ID GG
——————
535 5
1752 8
2 rows selected.
Predicate Information (identified by operation id):
—————————————————
1 – filter(“XGETRAND”(“OBJECT_ID”)=10)
Interestingly, we get a similar (aka “unconsistent”) result if we declare function XGETRAND as DETERMINISTIC, which in fact is exactly what we want here, that is,
to have it return always the same value for the same argument.
However, this does NOT happen.!
But, if we use a scalar query, like in the first example, we get again consistent results and the filter shows it:
select *
from (
select object_id, (select xgetrand(object_id) from dual) gg from t
)
where gg = 10
/
OBJECT_ID GG
——————
1767 10
Predicate Information (identified by operation id):
—————————————————
2 – filter(“GG”=10)
Another way for getting consistent results is by using the MATERALIZE hint:
with
my_rnd as
( select /*+ MATERIALIZE */ object_id, xgetrand(object_id) gg from t )
select *
from my_rnd
where gg = 10
/
OBJECT_ID GG
——————
143 10
/
OBJECT_ID GG
——————
1752 10
So, using scalar queries is an essential step in such SELECT-s, not just for optimization,
but also for “full control” and consistency π
Cheers & Best Regards,
Iudith Mentzel
See my comments to Matthias.
Hello Connor,
I do agree with you that the implementation of scalar queries might be changed in the future,
but, however, their functionality is “documented enough” and “in a favorable way”, for this NOT to happen.
Instead, I think that the MATERIALIZE hint is a kind of “imperative” hint that the optimizer should follow, though it is not exactly documented, in spite of being very useful in many cases.
I think that Oracle should rather become “more supporting” in what concerns the DETERMINISTIC clause, or maybe even introduce a new “QUERY DETERMINISTIC” variant of it,
which would say imperatively that the function should be executed ONCE ONLY for each distinct argument set, during each query execution, regardless of how many times the function will be called
either explicitly, in the code itself, or as a result of some optimizer query transformation.
Currently, in addition to some bugs, DETERMINISTIC only works per query FETCH
and, as I have tested with the small table example, it has NO influence at all if the function is called
in a WHERE clause.
Regardless of the fact that calling PL/SQL functions from SQL is not exactly the best thing to do,
I think that this operation is used “widely enough” for Oracle to come up with a fully documented feature that supports such usage in a consistent way (and without bugs, like those which suddenly appeared with DETERMINISTIC in the last versions).
Yet another way for achieving consistency, using only fully documented features, is to cache
the function results in a package collection and reuse the values if the same function is called again with the same argument ( we may call it a “session level result cache” or a “do-it-yourself materialization” ).
I have used this technique in many real-life scenarios, and it always proved to be very useful.
Cheers & Best Regards,
Iudith
re: however, their functionality is βdocumented enoughβ and βin a favorable wayβ, for this NOT to happen
Where? Show me in the docs (https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/Scalar-Subquery-Expressions.html) where it says you can rely on how many times we’ll do any executions within them?
I liken this to the following:
select /*+ index_desc(my_table my_index_on_col) */ col from t where rownum = 1
to get the max value of ‘col’ from a table. It will *potentially* work just fine, but if it one day *stops* working (index dropped, optimizer changes, column contains nulls etc)…then guess what notification you’ll get from the database that something is wrong? Yup……NOTHING.
Scalar subqueries, MATERIALIZE etc etc – all are fine for a *performance* alteration. If you are using them to change the *function* (ie, the results), then you’re on a road that leads nowhere nice ….
Hello Connor,
Well … by “documented enough” I mean mainly the following Oracle Magazine post of Tom Kyte,
https://blogs.oracle.com/oraclemagazine/on-caching-and-evangelizing-sql
as well as having the topic discussed very often in AskTom questions.
Indeed, using scalar queries DOES NOT guarantee that the function will be called exactly once,
because this depends on the number of hash collisions in the hash table used to store the results.
In Tom Kyte’s post the hash table is described as having a default size of 255,
but there exists a hidden parameter “_query_execution_cache_max_size” that can influence this size.
Here is a seria of very nice posts dealing in depth with this issue:
http://orasql.org/2013/02/10/deterministic-function-vs-scalar-subquery-caching-part-1/
http://orasql.org/2013/02/11/deterministic-function-vs-scalar-subquery-caching-part-2/
http://orasql.org/2013/03/13/deterministic-function-vs-scalar-subquery-caching-part-3/
By the way, if it already comes to the documentation:
In the very short documentation at the link you indicated, there is an error, stating that
“scalar subquery expressions are not allowed in the returning clause of DML statements”.
This is NOT correct, as you can see for example here:
http://dboriented.com/2017/10/07/returning-into-enhancement-suggestion/
Regarding the INDEX_DESC hint … and all hints for that matter:
As Jonathan Lewis says in many of his posts (and maybe also documented somewhere),
a hint is always obeyed, as long as it is correct in the specific scenario,
and regardless of whether it does or does not improve performance.
Well … except if the hints were generally disabled by using the new parameter that allows this …
The problem is that a hint might become incorrect in many ways.
But, at least in very simple cases, when we are able to ensure their correctness,
hints SHOULD always be followed.
I think that both MATERIALIZE and INDEX_DESC (at least on a single table query)
are such “unambiguous” hints, that should be followed, and from my own experience, they are !
Of course, dropping the index changes the “game rules” for an INDEX_DESC hint,
but that’s a different problem.
Using MATERIALIZE unambiguously states that “the result of the subquery should be calculated
and stored in a temporary space, and further on, used as if it were a physical table”
and, as such, it should be followed, for better or for worse.
There is a wide “folklore” regarding alternatives to achieve the same effect,
like selecting and/or using a ROWNUM column, a.s.o.
And, yes … if already at this point, a “philosophical” thought that just comes to me …
Maybe a new simple ERROR hint could be added, with the effect of having the statement
raise an error if *any* of the other hints specified in the *same* hint group (same comment)
cannot be applied, for any reason.
I really wish that Oracle could offer “real syntax” alternatives for achieving the effect of at least some of the hints … and, a long time ago, when the first hints were introduced in Oracle 7, I thought that this would happen further on … but, instead, the number of hints just keeps increasing and the percentage of the documented ones seems to be decreasing …
Well … as a corollary to this discussion, maybe you could think of a few nice Office Hours session
topics regarding the “hints philosophy” as a whole π
Cheers & Apologizes for my so long posts π
Iudith
@Connor: where is the “like”-button to Iudith’s comment ?
@Iudith: ***** – 5 stars from me !
‘I do agree with you that the implementation of scalar queries might be changed in the future,
but, however, their functionality is βdocumented enoughβ and βin a favorable wayβ, for this NOT to happen.’ – exactly !
Connor might be right *theoretically*, but I am also quite confident that *in practice* the implementation will not change. I think Tom had the same opinion, otherwise he most likely wouldn’t have written the article you mentioned.
Any chance we will see your ideas on https://community.oracle.com/community/groundbreakers/database/database-ideas ?
Please read Tom’s article again – it is about changing the *performance* characteristics. That is totally different to using such a technique to change the *results* of the query. As I said to Ludith – go ahead and do whatever you like to get the best performance out of a query, but if you rely on techniques to change the actual result…then ultimately, that’s gonna work out badly π
Hello All,
@Matthias, thanks a lot for your favorable comments π
@Connor, you are right, of course, but, however there is an essential difference between the
approaches we discussed:
1. Using a scalar query (with today’s implementation) is almost always improving performance
and sometimes (in a high perceived percentage) can make results *to appear* as deterministic.
BUT
2. Using the MATERIALIZE hint (if followed by the optimizer, and I see no reason for NOT to be),
DOES ensure 100% deterministic result in cases like this example,
and, by avoiding repeated PL/SQL function calls, is also very likely to improve performance.
So, these two are qualitatively different.in my opinion, so cannot be put both at once
on a “black list” of the “to be avoided bad guys” !
By the way, we have not discussed at all the RESULT_CACHE solution,
which is somehow similar to the scalar query in that it will produce a “deterministic effect”
in a very high percent (misses could however happen, because of cached results being aged out from the RC), but, considering the possibly high number of distinct input arguments,
it is probably not the best option in this case.
In summary, if this was a real-life problem, requiring a 100% consistent result,
I would go for the METARIALIZE hint solution, as an alternative to storing the inner query
results in a temporary table.
Cheers & Best Regards,
Iudith
So let me pose this question to you…
a) I have a query containing a MATERIALIZE hint, and we use that hint to rely on the *result*, eg if the hint was not there then we would send the wrong person to prison because the query contents are not properly deterministic
b) We upgrade to 18c and set “optimizer_ignore_hints” = true.
What happens?
I know the answer…and (spoiler alert) luckily it just so happens the MATERIALIZE hint is exempt, but it demonstrates the folly of a hint being used to determine *result* instead of behaviour.
Hello Connor,
The only place in the 18c documentation where the OPTIMIZER_IGNORE_HINTS is specified at all
is the database reference … the SQL Tuning manual DOES NOT say a word about it …
As such, it is a very strong two-edged sword that a DBA might choose to use blindly against the developers … which is simply bad !
But its setting can also be overridden at session level … so it becomes a “power game”
rather than a tool …
Do you mean that the MATERIALIZE hint is NOT ignored, even in the presence of “optimizer_ignore_hints” = TRUE ? If so, then GREAT π
I know that it also does not influence the PARALLEL hint, for example …
The MATERIALIZE hint is just a more elegant way of achieving the same that can also be achieved by populating a (temporary) table in a separate step … which is obviously a possible solution.
I just simply don’t feel comfortable with using a “more laborious” way for achieving the same effect
as an available feature.is able to.
As I said, I would have liked more to use a SQL clause instead of a hint for achieving ANY effect,
and I think that Oracle should think indeed about enhancing the SQL syntax itself
with a few such options that can be useful in many scenarios.
For a comparison:
As an alternative to using the INDEX_DESC hint that you mentioned,
we have now the possibility to use an ORDER BY … FETCH NEXT 1 ROW in syntax
( achieving hopefully the same performance as well ).
Thanks a lot for your patience & Best Regards,
Iudith
If you’re talking docs, then perhaps you’d like point me to the reference in the docs where the MATERIALIZE hint is covered? π
I’d love a “obey-hint-or-error” setting in the database, but until that day arrives, any piece of code that relies on a hint to change the results of a query is sheer folly.
@Iudith
“… The MATERIALIZE hint is just a more elegant way of achieving the same that can also be achieved by populating a (temporary) table in a separate step …”
not quite true, I used to use MATERIALIZE quite a lot, however, it mostly sucks because it is hard to tell the optimizer the cardinalities of the data you have materialized.
Therefore, I use more and more temp-tables, as you can gather stats on them after populating.
Finally, Oracle seemed to see that this technique has its rights and adopted plenty of Microsoft’s features on temp-table SQL Server imlements since ages !
Hello All,
Yes, temporary tables are a nice feature by themselves … though the newer 18c private temporary tables even don’t allow to gather statistics …
By the way, I don’t see any logical reason that, for example if we indicate to Oracle that we want the results of a WITH subquery to be materialized, this would mean ignoring all the statistics that could be used normally for calculating the cardinality of the result of that subquery,
we just indicate that this result set is NOT to be transformed in any way (merging, unnesting, a.s.o.).
@Connor, you are right of course that the MATERIALIZE hint is not documented … just as its
counterpart INLINE hint ,,, and the so related CARDINALITY hint ,,,
Reading a lot of web posts from some of the best known optimizer specialists (Jonathan Lewis is one example, and here are many others), one learns a lot about these hints, and, in the luckiest case,
also remembers those that are discussed more frequently …
I wish that Oracle would indeed document ALL the hints, explaining clearly what they do or what they do not.
If the official intention was to entirely conceal all these undocumented hints from the entire Oracle community, then how come that so many people (yes, most of them real gurus on these topics)
do however know so much about them ?
I don’t think that exactly the undocumented hints are definitely more dangerous to use than
the undocumented ones …
I do agree that it is best to avoid ALL hints, and let the optimizer do what it is created for.
The optimization philosophy based 100% on (statically or dynamically) collected statistics
cannot cover each and every possible intricacy of the real life scenarios … Life is many times much more complicated that just joining and filtering tables (however many of them, all with equijoins, a.s.o.) … so that Oracle itself does state very clearly in each documentation version that there do really exist situations when the developer “knows more” about the data AND the problem to be solved than Oracle is able to know, by all the “officially available” techniques …
So, along the many versions, the internal optimization techniques kept becoming more and more sophisticated, and partially less and less explained in detail in the documentation, just for trying to acquire better performance with less need for overly sophisticated coding.
And, sometimes, the “logical path” taken for solving a problem correctly is also as important
as performance, so, just as more and more tools (internal as well as documented) are added for improving performance, there is an equal need to also supply enough tools for “coding for a correct logical path”.
Back to the original problem that started this thread:
I think that the developer needs a clear and convenient way to “tell Oracle” that he wants
each PL/SQL function be called ONCE ONLY for each input argument, and then reuse that calculated value as many times as necessary, because this is the logical path required for a correct solution.
It’s of course very hard, if not impossible for Oracle to deduct that such a requirement is in place,
but it can be very easy for it to obey any (wishfully documented) way (syntax, hint, a.s.o.)
that allows the developer to communicate to Oracle that this logical path should be followed.
In this particular case, I don’t indeed understand why the NO_MERGE hint ( which IS documented )
does NOT have the same effect of “keeping the inline view unmerged” …
For example, the following still returns inconsistent results π¦
with
function
getrand(pval in number) return number is
begin
return round(dbms_random.value(0,20));
end;
select *
from
( select /*+ NO_MERGE */ object_id, getrand(object_id) getrnd from all_objects )
where getrnd = 10
/
But, instead, using “the ROWNUM trick” DOES make the consistency effect,
just the same as the MATERIALIZE hint π π π
with
function
getrand(pval in number) return number is
begin
return round(dbms_random.value(0,20));
end;
select *
from
( select object_id, getrand(object_id) getrnd, ROWNUM rn from all_objects ) v
where getrnd = 10
/
Also, a “method” not exactly documented (as far as I know ), but known to many developers.
I personally learned it from Tom Kyte … maybe the first thing I learned from him in our first
( and unforgettable for me ) face-to-face meeting π
Maybe you can discuss this issue in a little detail in an Office Hours session ?
Cheers again & Best Regards,
Iudith
interesting:
we can control the Oracle’s decision also by caching object_id via scalar subquery instead of caching getrand
https://livesql.oracle.com/apex/livesql/s/im0khsvty6r8gxsnepbkfd2e5
Hello all,
This is really interesting !
Here things even become and a little “mystical”, because scalar query caching, even if “accepting”
its current implementation, only states that the SAME scalar query will not be reexecuted for the SAME argument (here object_id), but, theoretically, this is not supposed to prevent executing
GETRAND several times … because it is NOT the result of GETRAND that is cached.
Cheers & Best Regards,
Iudith