I had an AskTOM question recently about finding occurrences of a set of words within sentences stored in a table.
For example, if you have two sentences,
- The quick brown fox jumps over the lazy dog
- Never trust a fox. Looks like a dog, behaves like a cat.
and I were given the phrase “fox,dog,cat” then I need to report that the 2 of the terms occur in the first sentence, and 3 of the terms occur in the second.
To test this on a slightly larger set of rows than just 2, I extracted the plain text from the 23ai SQL Language reference, preserving the new lines, and loaded this into a database table called T. For example, some random rows.
SQL> select id, txt
2 from t
3 where rownum <= 20
4 and txt like '%match%';
ID TXT
---------- --------------------------------------------------------------------------------------------------
197 [e is friends] is an edge pattern that matches either incoming or outgoing
268 ^ matches the position before the first row in the partition. this
538 a mismatch domain_check_type raises an error.
603 a path pattern specifies a linear pattern that matches a string of vertices and edges.
611 a percent sign % in the pattern can match zero or more characters as opposed to
612 a period . does not match the newline character.
388 a dash as an alternative to schema_name indicates that if match_string matches
431 a foreign key constraint requires values in one table to match values in another table.
450 a graph table shape defines how the result of pattern matching should be transformed into
473 a is person is a vertex pattern that matches vertices labeled person and binds
474 a java name, or a wildcard that can match any name.
1490 addition, the current cdb name should match the target cdb name specified in
704 a row pattern match consists of a set of contiguous rows in a row pattern partition. each row
723 a semijoin returns rows that match an exists subquery without duplicating rows from
894 a vertex pattern is a pattern that matches vertices in a graph. the result of such matching is
1037 accept a sequence of multiple values matched by the rhs path expression. note
1039 accept for a match. in this table, n and m represent unsigned integers.
1820 aliases must match the number of expressions selected by the view. aliases must follow the
1893 all of its graph element tables are defined with a primary key that matches an
1660 after match skip to last up
20 rows selected.
Using a technique I saw on Stew Ashton’s excellent SQL blog, it’s easy to transform the list of search words into rows,
SQL> with words as
2 ( select 'create,database,privilege,select,drop,view,table' terms
3 from dual
4 )
5 select term from words w,
6 json_table(
7 replace(json_array(w.terms), ',', '","'),
8 '$[*]' columns (
9 term varchar2(50) path '$'
10 )
11 );
TERM
--------------------------------------------------
create
database
privilege
select
drop
view
table
and then of course the task becomes straightforward. To keep the SQL easy to read I’ve loaded the search terms into their own table TERMS using the technique above. I’ve also loaded both the SQL language text as well as the search terms in lower case to not bother with mixed case and the like. I’ve also not bothered with the potential intricacies of what defines a “word” in a word search, for example, Does it need to be a whole word or partial word match? What about plurals? etc etc. This is just a very simple “Does a sequence of characters exist in the larger set?” task.
With that in mind, one possible SQL to achieve this is below. To keep the output at a reasonable size, I’ve limited it to only those rows that contained 4 or more of the 7 terms.
SQL> select id, listagg(val,',') within group ( order by val) result, count(*) hits, txt
2 from terms,
3 t
4 where instr(t.txt,terms.val) > 0
5 group by id, txt
6 having count(*) > 3
7 order by 1;
ID RESULT HITS TXT
---------- --------------------------------- ---------- ------------------------------------------------------------
1319 create,drop,privilege,table 4 add privileges create any table, drop any table;
3938 create,privilege,select,table 4 any table, create any sequence, and select any sequence syst
em privileges.
8058 create,database,table,view 4 clause, then oracle database creates the materialized view i
n the default tablespace
10604 create,database,privilege,table 4 create an index on the table. you need these privileges beca
use oracle database creates an
10980 create,privilege,select,view 4 create materialized view system privilege, and the read or s
elect object privilege on
11735 create,select,table,view 4 created. a query that selects rows from two or more tables,
views, or materialized
12700 create,database,table,view 4 database creates the materialized view log in the schema of
its master table. you cannot
12728 create,database,drop,table 4 database drops the nested table segments of the current part
ition and creates new
13014 database,drop,select,table 4 database, drop tablespace, resumable, select any
15527 create,drop,privilege,table 4 drop privileges create any table
15528 create,drop,privilege,table 4 drop privileges create any table;
22941 database,drop,table,view 4 if the table has a materialized view log, then the database
drops this log and any
22950 database,drop,table,view 4 if the tablespace contains a materialized view log, then the
database drops the log and
23150 create,database,table,view 4 if you create views that refer to remote tables and views, t
hen the database links you specify
23216 create,database,drop,table,view 5 if you drop a materialized view that was created on a prebui
lt table, then the database
24223 create,select,table,view 4 in order to create a materialized view whose defining query
selects from a master table
33695 drop,privilege,select,table 4 on the table and the select privilege on the sequence. if th
e sequence is later dropped, then
37829 database,select,table,view 4 pl/sql object name. you can query a table or view in the oth
er database with the select
38528 create,drop,privilege,table 4 privileges create any table and drop any table:
38529 create,drop,privilege,table 4 privileges create any table, drop any table
38530 create,drop,privilege,table 4 privileges create any table, drop any table;
43999 privilege,select,table,view 4 select read or select privilege, insert, update, or delete r
ows from all the tables or views on
49214 create,database,table,view 4 sysdate. a materialized view log was created for the employe
e table, so oracle database
49281 create,drop,privilege,table 4 system privileges create any table and drop any table in the
current container:
52489 create,drop,privilege,table 4 the following statement adds the system privileges create an
y table and drop any table to
52796 create,drop,privilege,table 4 the following statement drops the system privilege create an
y table from unified
54243 create,privilege,table,view 4 the owner of the materialized view must have the create tabl
e system privilege. the
57316 drop,privilege,table,view 4 to drop a materialized view log, you must have the privilege
s needed to drop a table.
57413 create,privilege,select,view 4 to grant select and update privileges on the view emp_view,
which was created in creating
61995 drop,privilege,table,view 4 view system privilege. you must also have the privileges to
drop the internal table and
62596 create,database,table,view 4 when you create a materialized view, oracle database creates
one internal table and
31 rows selected.
But here’s a question for the curious out there. What other SQL techniques could be employed here to solve this problem? I stress, this isn’t me withholding a different clever solution from you, which I’ll subsequently spring upon you later. This is just a genuine curiosity to see how other minds might tackle the problem, and not specifically with a mindset of “better” or “faster”, just “different”.
You can download the population scripts for both T and TERMS from my github repo here.
I hope to see some cool ideas in the comments.
.




Leave a reply to iudithd5bf8e4d8d Cancel reply