In-memory opportunities abound

There has always been a bit of a Catch-22 with some of the really cool options in the Oracle Database. You want to explore their value, but you don’t want to run afoul of any licensing implications by doing so. Of course, you can use XE or a trial version of the software, but nothing really helps prove (or disprove) the value of some functionality as much as running it on real Production volumes with real Production data.

So I was very very excited to see this in the 20c documentation:

[Image: excerpt from the Oracle Database 20c licensing documentation, describing the free 16GB in-memory base level.]

You can now use 16GB for In-Memory without any additional license costs. All the relevant details are covered by Andy’s blog post here, along with the Office Hours video, so please check them out.

Also, I’m underplaying things by saying that this is just a means by which you can “trial” the in-memory feature. 16GB might not seem like much, but even with enormous databases, it is often the case that the “hot” data, or the data “of interest”, is a much smaller subset. For example, you might have 10 years of sales data in your 20TB database, but this week’s sales might only be 10-20 gigabytes. Naturally, in-memory facilities on the entire 20TB would deliver tremendous benefit, but even without that, throwing this week’s sales data into that 16GB in-memory store opens the door to real-time analytics on your recent data, without impacting any existing analytic facilities you already have in place for the entire data set.
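As a sketch of how you might target just the hot subset, the INMEMORY attribute can be applied at partition level rather than to a whole table. (The table and partition names below are purely illustrative, not part of the demo that follows.)


SQL> alter table sales modify partition sales_2020_w21 inmemory priority high;

Table altered.

SQL> alter table sales modify partition sales_2019 no inmemory;

Table altered.


With that, only the current week’s partition occupies space in the in-memory column store, while the historical partitions are queried from disk as usual.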

Don’t believe me? OK, how about a little demo to show you how good 16GB might be… I’ll create a table of sales transactions. Let’s load a lazy 250 million rows into that table!


SQL> create table cust_products as
  2  select rownum-1 product_id, 'Product name '||rownum prod_name
  3  from dual
  4  connect by level <= 1234;

Table created.

SQL>
SQL> create table cust_customers as
  2  select rownum-1 cust_id, 'Customer name '||rownum cust_name
  3  from dual
  4  connect by level <= 10000;

Table created.

SQL>
SQL> create table cust_sales as
  2  select
  3    mod(rownum,10000) cust_id,
  4    rownum    trans_id,
  5    mod(rownum,5000)/100 amt,
  6    sysdate+rownum/1000000 trans_ts,
  7    mod(rownum,1234) product_id,
  8    'Discount applied' commentary
  9  from
 10    ( select 1 from dual connect by level <= 1000 ),
 11    ( select 1 from dual connect by level <= 1000 ),
 12    ( select 1 from dual connect by level <= 250 )
 13  /

Table created.

SQL>
SQL> select round(blocks*8192/1024/1024/1024,2) gig
  2  from user_tables
  3  where table_name = 'CUST_SALES';

       GIG
----------
     12.85

1 row selected.

SQL>
SQL>
SQL>

As you can see, that table is around 13GB in size. Now I’ll run an analytic style query to get an assessment of product sales for a subset of customers.


SQL> set feedback only
SQL> set timing on
SQL> with summary as
  2   ( select s.product_id, sum(s.amt) tot
  3     from   cust_sales s,
  4            cust_customers c
  5     where  s.product_id > 10
  6     and    s.cust_id = c.cust_id
  7     and    c.cust_id > 100
  8     and    s.amt between 0 and 24
  9     group by s.product_id
 10   )
 11  select p1.prod_name, sm.tot
 12  from summary sm,
 13       cust_products p1
 14  where sm.product_id = p1.product_id;

1223 rows selected.

Elapsed: 00:01:15.18

To be honest, I’m pretty impressed even with this result. 75 seconds for 250 million candidate rows with joins on a single-disk home PC is nothing to scoff at, but even so, if I was hammering my OLTP system with queries that take 75 seconds, there is a good chance I would be causing some slowdown or disruption to transaction performance.

I’ve allocated 16GB of in-memory on this instance. Now I’ll pop those tables into the in-memory store and populate it with some initial seeding queries.
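For reference, here is a sketch of how that allocation was done. INMEMORY_SIZE is a static parameter, so it has to be set in the spfile and only takes effect after an instance restart.


SQL> alter system set inmemory_size = 16G scope=spfile;

System altered.

SQL> shutdown immediate
SQL> startup
SQL> show parameter inmemory_size
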


SQL> alter table cust_sales inmemory;

Table altered.

SQL> alter table cust_products inmemory;

Table altered.

SQL> alter table cust_customers inmemory;

Table altered.

SQL>
SQL> set feedback only
SQL> select * from cust_products;

1234 rows selected.

SQL> select * from cust_customers;

10000 rows selected.

SQL> select count(cust_id), count(product_id) from cust_sales;

1 row selected.

SQL> set feedback on

--
-- [wait a bit]
--

SQL> select segment_name, populate_status, bytes_not_populated
  2  from v$im_segments;

SEGMENT_NAME                   POPULATE_STAT BYTES_NOT_POPULATED
------------------------------ ------------- -------------------
CUST_PRODUCTS                  COMPLETED                       0
CUST_CUSTOMERS                 COMPLETED                       0
CUST_SALES                     COMPLETED                       0

Now let’s re-run that same query.


SQL> set feedback only
SQL> set timing on
SQL> with summary as
  2   ( select s.product_id, sum(s.amt) tot
  3     from   cust_sales s,
  4            cust_customers c
  5     where  s.product_id > 10
  6     and    s.cust_id = c.cust_id
  7     and    c.cust_id > 100
  8     and    s.amt between 0 and 24
  9     group by s.product_id
 10   )
 11  select p1.prod_name, sm.tot
 12  from summary sm,
 13       cust_products p1
 14  where sm.product_id = p1.product_id;

1223 rows selected.

Elapsed: 00:00:02.67


Oh………my……..goodness.

The performance gain is not just due to the data being located in the in-memory column store. We can also take advantage of other facilities such as in-memory vector transformations. The explain plan reveals some of the changes in play here.


--------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name                        | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |                             |   866 | 40702 | 26817  (37)| 00:00:02 |
|*  1 |  HASH JOIN                                 |                             |   866 | 40702 | 26817  (37)| 00:00:02 |
|   2 |   VIEW                                     |                             |   866 | 22516 | 26813  (37)| 00:00:02 |
|   3 |    TEMP TABLE TRANSFORMATION               |                             |       |       |            |          |
|   4 |     LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6619_C997BEF7 |       |       |            |          |
|   5 |      HASH GROUP BY                         |                             |     1 |     8 |     3  (67)| 00:00:01 |
|   6 |       KEY VECTOR CREATE BUFFERED           | :KV0000                     |     1 |     8 |     1   (0)| 00:00:01 |
|*  7 |        TABLE ACCESS INMEMORY FULL          | CUST_CUSTOMERS              |  9900 | 39600 |     1   (0)| 00:00:01 |
|   8 |     HASH GROUP BY                          |                             |   866 | 25114 | 26811  (37)| 00:00:02 |
|*  9 |      HASH JOIN                             |                             |   866 | 25114 | 26810  (37)| 00:00:02 |
|  10 |       TABLE ACCESS FULL                    | SYS_TEMP_0FD9D6619_C997BEF7 |     1 |     8 |     2   (0)| 00:00:01 |
|  11 |       VIEW                                 | VW_VT_0737CF93              |   866 | 18186 | 26808  (37)| 00:00:02 |
|  12 |        VECTOR GROUP BY                     |                             |   866 | 13856 | 26808  (37)| 00:00:02 |
|  13 |         HASH GROUP BY                      |                             |   866 | 13856 | 26808  (37)| 00:00:02 |
|  14 |          KEY VECTOR USE                    | :KV0000                     |   117M|  1799M| 26312  (36)| 00:00:02 |
|* 15 |           TABLE ACCESS INMEMORY FULL       | CUST_SALES                  |   117M|  1349M| 26290  (36)| 00:00:02 |
|  16 |   TABLE ACCESS INMEMORY FULL               | CUST_PRODUCTS               |  1234 | 25914 |     4   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------------------
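If you want further confirmation that the columnar scans are actually being used, the in-memory session statistics are one place to look. A sketch (the exact statistic names can vary by release, so treat these as examples):


SQL> select n.display_name, m.value
  2  from   v$mystat m,
  3         v$statname n
  4  where  m.statistic# = n.statistic#
  5  and    n.display_name like 'IM scan rows%'
  6  and    m.value > 0;


Non-zero values for the “IM scan rows” statistics after running the query indicate that rows were served from the in-memory column store rather than via a conventional buffer-cache scan.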

Furthermore, when you consider the way the table was populated, i.e., via the MOD function, the customer and product data is evenly scattered throughout the table. The in-memory compression and performance being obtained here is probably close to worst case! Yes… 2 seconds to analyze 250 million rows can be a worst case 🙂

So now you have a risk-free mechanism for taking advantage of one of the biggest game-changing features to come to the Oracle database. Well… I say “risk free”. Truth be told, the biggest risk here is that once you show your customers all the goodness with just a 16G in-memory store, rest assured they’ll be clamouring all over you for more! 🙂

And as Andy says – plans are afoot to bring this to a 19c release update as well.

So keep an eye out for the availability of this enhancement. If you’re getting ready for an upgrade cycle, this alone probably justifies moving to 19c from your current version when the time comes.

8 comments

  1. Hi Connor,

    That is an incredible feature. I really look forward to using it.
    Any plans to make it available on Authorized Cloud Environments?

    Regards,
    Mikhail.

  2. Thanks for socializing this info – fantastic news. Two thoughts/questions:
    So, in theory, for 19c once it is ported we could set INMEMORY_SIZE to ~15G and not worry about licensing costs or accidental overages?
    I assume products that have received a ‘free’ limited use license, such as OEM, can still use INMEMORY. Correct? This would be awesome for OEM 13c performance.

  3. Interesting. Very. But to avoid confusion, do you want to change 16G to 16GB?

    https://en.wikipedia.org/wiki/Gigabyte

    I kinda wondered if you were talking about a version of the database I hadn’t come across… 6.0.36, 7.3.4, 8.0.4, 10.2, 11.4, 12c, 19c, 20c, 16G?

    My guess is it’s an Aussie thing. After all, the rest of the world uses “dayta”, not “daata”… 🙂

  4. Hi

    This was a very interesting and valuable piece of information.
    Please also inform this community if/when this feature becomes available in 19c.
    There are so many things going on with Oracle that one truly has difficulty keeping up.

    In your example you had a 13GB table, but could it have been bigger than 16GB?
    Is the 16GB limit related to the size of the in-memory store, where the in-memory data is compressed, or is it related to the size of the original table?
    What happens when the 16GB limit is reached?

    I once read:
    <>

    lh
