The big table dilemma – what about transaction rates?

November 14, 2021

7–11 minutes

Yesterday I posted some demonstrations of the impressive power of indexes in relational databases for data retrieval even when those tables get massive in size. That resulted in a few comments on Twitter and on the post itself regarding the other half of the “equation”, namely, that my test was potentially biased in favour of query performance because I loaded the entire table before I added the indexes.

This means the indexes are nicely compacted with no “fragmentation”. This is a valid point but as I responded on Twitter, when you trying to do a weekend query test with a 2billion row table, then pre-loading the data makes a lot of sense unless you want to lose most of your weekend waiting for the table to be populated with single row inserts Smile

In particular, given that on my podcast recently I discussed why it can be good to have a mindset of “Every index is a bug”

then pre-loading a table, and indexing it to the hilt might seem somewhat hypocritical Smile

So to make sure I’m not hiding/missing any issues or distorting the test, let us now consider the performance of getting the data into this table in a way that we would expect from a real world application, ie, taking clicks from a large user base. Using the same population script, I’ll add a predicate to load only 1/3rd of the data (~670million rows) and then I’ll explore loading additional data in a real world scenario, ie, single row inserts from concurrent sessions.


SQL> create table user_likes (
  2    user_id int not null,
  3    post_id int not null,
  4    dt      date default sysdate not null
  5  ) ;

Table created.

SQL> insert /*+ APPEND */ into user_likes
  2  with
  3  u as (
  4    select /*+ materialize */ rownum user_id from dual
  5    connect by level <= 200000
  6    order by dbms_random.value ),
  7  p as (
  8    select /*+ materialize */ rownum post_id from dual
  9    connect by level <= 10000
 10    order by dbms_random.value )
 11  select user_id, post_id, sysdate
 12  from u,p
 13  where mod(user_id+post_id,3) = 0;

666666667 rows created.

I’m going to index the tables slightly differently, ie, as non-unique, because that will make my benchmark code easier Smile because I’m not going to have to worry about duplicates – I can just generate a random user and a random post and just slam a row in there. If anything, this approach should worsen my benchmark results because I’ll have greater likelihood of hitting pre-existing index data and needing to split leaf and/or branch blocks.


SQL> create index user_likes_ix1 on user_likes ( user_id, post_id );

Index created.

SQL> create index user_likes_ix2 on user_likes ( post_id, user_id );

Index created.

To benchmark concurrent application workload, I’m going to use a benchmark mechanism you may have seen previously from my session on physical design I did earlier this year. (You can see that video at the bottom of this post). I have a table called RESULTS which not only will store the benchmark results, it will serve as a semaphore mechanism to ensure all concurrent threads of my application are launched at the same time.


SQL> create table results as select rownum-1 seed, 0 tps, 0 ela, systimestamp st, systimestamp en from dual
  2  connect by level <= 6;

Table created.

Now my benchmark has two phases, an INIT phase will pre-calculates the random data to be inserted. I do this because I don’t want the cost of generating randomised data to distort the benchmark result. (In a real application, users of course know what posts they want to click “Like” on!). Then the RUN section of the code will

wait to get access to the RESULTS table (our semaphore)
perform 1,000,000 random single “Like” requests (ie, an insert into our USER_LIKES table), each as a single transaction
then store the throughput at the end of the run


SQL> create or replace
  2  package benchmark is
  3    iter int := 1000000;  --  new likes to post
  4
  5    idx  int;
  6    seed int;
  7    l_start timestamp;
  8
  9    type numlist is table of number
 10      index by pls_integer;
 11
 12    l_user      numlist;
 13    l_post      numlist;
 14
 15    procedure init(p_seed int);
 16    procedure run(p_seed int);
 17  end;
 18  /

Package created.

SQL>
SQL> create or replace
  2  package body benchmark is
  3
  4  procedure init(p_seed int) is
  5  begin
  6    seed := p_seed;
  7    dbms_random.seed(p_seed);
  8
  9    select mod(rownum,200000)+1 bulk collect into l_user
 10    from dual connect by level <= iter
 11    order by dbms_random.value;
 12
 13    select mod(rownum,10000)+1 bulk collect into l_post
 14    from dual connect by level <= iter
 15    order by dbms_random.value;
 16
 17  end;
 18
 19  procedure run(p_seed int) is
 20    delt number;
 21  begin
 22    l_start := systimestamp;
 23    for i in 1 .. iter
 24    loop
 25      insert into user_likes (user_Id, post_id)
 26      values (l_user(i), l_post(i));
 27      commit;
 28    end loop;
 29    delt := extract(second from (systimestamp-l_start)) + 60*extract(minute from (systimestamp-l_start));
 30    update results
 31      set tps = round(iter / delt,1),
 32          ela = delt,
 33          st = l_start,
 34          en = systimestamp
 35    where seed = p_seed;
 36    commit;
 37  end;
 38
 39  end;
 40  /

Package body created.

SQL>
SQL> lock table results in exclusive mode;

Table(s) Locked.

Because my PC has 8 cores (I told you it was old Smile ) I’ll run 6 of these concurrently, leaving 2 cores for the database and OS to do their thing.

So we’ll launch the 6 concurrent SQL sessions slamming this table, totalling 6million transactions and see what results we get.


--
-- launch 6 new sessions that will wait on my commit
--
SQL> host start sql_plus @benchmark.sql 0

SQL> host start sql_plus @benchmark.sql 1

SQL> host start sql_plus @benchmark.sql 2

SQL> host start sql_plus @benchmark.sql 3

SQL> host start sql_plus @benchmark.sql 4

SQL> host start sql_plus @benchmark.sql 5

SQL> commit;

Commit complete.

--- now we wait

SQL> select min(ela), max(ela), avg(ela) from results;

  MIN(ELA)   MAX(ELA)   AVG(ELA)
---------- ---------- ----------
     589.9      608.4      593.8

SQL> select sum(tps) from results;

  SUM(TPS)
----------
   10103.1

I think that’s a pretty impressive result. I can achieve about 10,000 transactions (ie, “Likes”) per second. For context, I read recently that VISA normally performs 4,000 transactions per second world-wide. I can’t vouch for the validity of that statement, but if true, I think that speaks volumes for the performance the benchmark is getting here.

And just in case you’re thinking I’m gaming the system here, here’s the memory allocation for my database on this PC


SQL> show sga

Total System Global Area 3221225240 bytes
Fixed Size                  9273112 bytes
Variable Size             889192448 bytes
Database Buffers         2298478592 bytes
Redo Buffers               24281088 bytes

Yes, only 3G of RAM in total allocated! The PC has 32G of RAM but since I run a lot of databases on this box, I’m need to be frugal on the memory that each one gets Smile

I want to stress – I’ve deliberately not exploited any of the “advanced” features of the database to gain a performance boost here. If you take a look at the video from my channel at the tail of this post, I talk about several mechanisms that could be used to perhaps bump up this performance by as much as 300%. But even without using any of those techniques, it is fair to say that a database novice, who merely creates a table, adds indexes as per their anticipated query requirements, and wants to scale up to 2 billion rows, can still get 10,000 transactions per second on modest hardware

But now that I’ve smashed in 6million rows, the question remains – what damage have I done the indexes? Do database indexes get stale? Do my previous query performance tests still hold up?

Lets repeat the test with a slight modification to take into the account that we now have a variable number of rows for each user and/or post data in the table. (Once again you can just skip to just the bits in blue if you don’t want death-from-code)


SQL> set serverout on
SQL> declare
  2    type numlist is table of number index by pls_integer;
  3    iter int := 50000;
  4    u numlist;
  5    p numlist;
  6    res numlist;
  7    s1 timestamp;
  8    s2 timestamp;
  9    s3 timestamp;
 10    delt number;
 11    lo number := 999999999;
 12    hi number := 0;
 13    cnt int;
 14  begin
 15    select trunc(dbms_random.value(1,200000) )
 16    bulk collect into u
 17    from dual
 18    connect by level <= iter;
 19
 20    select trunc(dbms_random.value(1,10000) )
 21    bulk collect into p
 22    from dual
 23    connect by level <= iter;
 24
 25    s1 := localtimestamp;
 26    for i in 1 .. iter
 27    loop
 28      s2 := systimestamp;
 29      select user_id
 30      bulk collect into res
 31      from   user_likes
 32      where  user_id = u(i);
 33      delt := extract(second from localtimestamp - s2);
 34      if delt < lo then lo := delt; end if;
 35      if delt > hi then hi := delt; cnt := res.count; end if;
 36    end loop;
 37    delt := extract(second from localtimestamp - s1) + 60*extract(minute from localtimestamp - s1);
 38
 39    dbms_output.put_line(iter||' executions: All posts for nominated user');
 40    dbms_output.put_line('=====================================');
 41    dbms_output.put_line('Total: '||delt);
 42    dbms_output.put_line('Avg: '||(delt/iter));
 43    dbms_output.put_line('Low: '||lo);
 44    dbms_output.put_line('Hi: '||hi||' (for '||cnt||' rows returned)');
 45
 46    lo  := 999999999;
 47    hi  := 0;
 48    s1 := localtimestamp;
 49    for i in 1 .. iter
 50    loop
 51      s2 := systimestamp;
 52      select user_id
 53      bulk collect into res
 54      from   user_likes
 55      where  post_id = p(i);
 56      delt := extract(second from localtimestamp - s2);
 57      if delt < lo then lo := delt; end if;
 58      if delt > hi then hi := delt; cnt := res.count; end if;
 59    end loop;
 60    delt := extract(second from localtimestamp - s1) + 60*extract(minute from localtimestamp - s1);
 61
 62    dbms_output.put_line(iter||' executions: All users for nominated post');
 63    dbms_output.put_line('=====================================');
 64    dbms_output.put_line('Total: '||delt);
 65    dbms_output.put_line('Avg: '||(delt/iter));
 66    dbms_output.put_line('Low: '||lo);
 67    dbms_output.put_line('Hi: '||hi||' (for '||cnt||' rows returned)');
 68
 69    lo  := 999999999;
 70    hi  := 0;
 71    s1 := localtimestamp;
 72    for i in 1 .. iter
 73    loop
 74      s2 := systimestamp;
 75      select user_id
 76      bulk collect into res
 77      from   user_likes
 78      where  post_id = p(i)
 79      and    user_id = u(i);
 80      delt := extract(second from localtimestamp - s2);
 81      if delt < lo then lo := delt; end if;
 82      if delt > hi then hi := delt; end if;
 83    end loop;
 84    delt := extract(second from localtimestamp - s1) + 60*extract(minute from localtimestamp - s1);
 85
 86    dbms_output.put_line(iter||' executions: Lookup (probably) single post for a nominated user');
 87    dbms_output.put_line('=======================================');
 88    dbms_output.put_line('Total: '||delt);
 89    dbms_output.put_line('Avg: '||(delt/iter));
 90    dbms_output.put_line('Low: '||lo);
 91    dbms_output.put_line('Hi: '||hi);
 92
 93  end;
 94  /
50000 executions: All posts for nominated user
=====================================
Total: 101.196
Avg: .00202392
Low: 0
Hi: .097 (for 3334 rows returned)

50000 executions: All users for nominated post
=====================================
Total: 1454.985
Avg: .0290997
Low: .007
Hi: .885 (for 66851 rows returned)

50000 executions: Lookup (probably) single post for a nominated user
=======================================
Total: 11.196
Avg: .00022392
Low: 0
Hi: .011

PL/SQL procedure successfully completed.

As you can see, for the small lookups, we’re still in the single digit millisecond region for performance, and for the worst case scenario (getting thousands of rows for a single post) we’re averaging around 30ms. This is actually better than the original benchmark, because I’m dealing with ~700million rows instead of 2billion. (Don’t flame me for using less rows – I want to have some weekend free time Smile )

Finally, if you’re thinking “What if there were multiple queries going on, what about the read contention?” then I can’t comment for other database engines, but its pretty much a non-issue with Oracle because we don’t ever do read locks. In saying this I’m not trying to throw shade on other relational databases here. Whilst I think that Oracle is the best of them, you should be able to get nearly the same performance here with any of the major players – SQL Server, DB2, Postgres, MySQL etc).

What I am saying is that … relational databases are insanely good. So next time you stumble across a blog out there on the interwebs which talks about how relational database can’t scale

then maybe point them this way and hopefully they’ll get a better understanding of why relational database have been so successful for decades.

If you are on Oracle, and want to see how you might be able to take 10,000 transactions per second to maybe 30,000 per second, then check out my physical design video below.

13 responses to “The big table dilemma – what about transaction rates?”

Vasily Suvorov

November 14, 2021 at 5:32 pm

Hi Connor! Can you please gather statistics on this table? 🙂

Reply
s2pafrica

November 14, 2021 at 6:27 pm

Hello Connor, very detailed and thorough report. I followed the initial article from Tweeter.

I cut my teeth on SQL databases myself, MSSQL 2005 and MySQL 4, so I’m well aware of the power relational databases have.

That said, I genuinely think there’s some truth that relational databases are difficult to scale horizontally.

While I miss the ACID guarantees absent in NoSQL databases, the ability to get incremental performance and/or storage gains by adding another inexpensive machine to the cluster is compelling for those on a budget.

My $.02!

Reply
1. Connor McDonald
  
  November 15, 2021 at 6:49 am
  
  Valid points and sure if I’m building Twitter or Instagram etc, I’ll need a different approach, but most of us aren’t doing that.
  
  For the vast majority of use cases out there, you can
  – take a simple hardware setup (or get the equivalent on a cloud vendor)
  – take a simple database design
  – do the normal indexing things you’d do for any database
  and a relational database will most likely well beyond your needs now and future.
  
  Reply
Connor McDonald

November 14, 2021 at 7:24 pm

The initial load is a direct load so there will already be pretty representative stats

Reply
1. Vasily Suvorov
  
  November 14, 2021 at 7:36 pm
  
  Well it looks more like exception, usually data comes from bussiness day by day, not by direct load. And statistics become stale time to time.
  
  Reply
Connor McDonald

November 14, 2021 at 9:22 pm

I’m just not sure on what you’re asking here? ie, that statistics will change things? or take a long time? or do something else?

And of course, real time stats (19c and above) can defer the need or reduce the frequency for a full re-gather

Reply
1. Vasily Suvorov
  
  November 15, 2021 at 12:16 am
  
  Usually gathering statistics on large tables takes long time and brings locking issues. I hope you show us how to gather them right on large table.
  
  Reply
  1. Connor McDonald
    
    November 15, 2021 at 7:10 am
    
    We don’t take locks with statistics gathering. Similarly, we use an aging out algorithm for query plans so we don’t end up with a parsing storm once the new stats are in place.
    
    Speed depends on how many resources you’d like to throw at it:
    
    SQL> set timing on
    SQL> exec dbms_stats.gather_table_stats(”,’USER_LIKES’)
    
    PL/SQL procedure successfully completed.
    
    Elapsed: 00:09:05.96
    SQL> exec dbms_stats.gather_table_stats(”,’USER_LIKES’,degree=>4)
    
    PL/SQL procedure successfully completed.
    
    Elapsed: 00:02:46.66
    
    Reply
Tony Johnson

November 15, 2021 at 1:24 am

Personally set up a daily interval partitioned table with additional 64 hash subpartitions per day with local Primary key index that does 10b new rows a day with 2.4 trillion rows total in the table. Most common queries are run that span multiple days and usually one subpartiion in each day returning many thousand rows in less than 5 seconds. Under Oracle 19c but started out in 12c. Only issue is full backups taking forever and many archive logs to keep managed.

Reply
1. Connor McDonald
  
  November 15, 2021 at 6:46 am
  
  True, there are plenty of ways to scale this to “infinity” but I didn’t want to be accused of “Hey, you’re cheating with all sorts of fancy tech”. This demo is to show that I can
  – take a simple hardware setup
  – take a simple database design
  – do the normal indexing things you’d do for any database
  and a relational database will most likely well beyond your needs now and future.
  
  Sure if I’m building Twitter or Instagram etc, I’ll need a different approach, but most of us aren’t doing that.
  
  Reply
Peter Crowther (@TheOzzardOfWiz)

November 28, 2021 at 11:06 pm

Looks about right. We have a hair over half a billion readings in one table, and about 100M status entries in a second, in a virtual machine that’s neither especially large nor has especially high bandwidth to disk. This lot has come in organically over the last decade. Far too large to reorganise, so page splits all over the place. SQL Server 2014. We typically see a few milliseconds from the client for inserts (some of which is network overhead), a few milliseconds for single-row retrieves, and a few tens of milliseconds for the multi-row retrieves that we care about.

Even at this scale, the physical design and application design have stayed very naive. I reckon there’s at least another order of magnitude in there, quite possibly two, before we have to think about scaling horizontally – though we might have to get a little cute about both application and physical design to get that second order of magnitude.

Reply
1. Connor McDonald
  
  November 29, 2021 at 6:37 am
  
  Rest assured someone is going to come along at some stage and look at those numbers and go … “We need a 384node cluster running sharded blah blah blah to handle that” 🙂
  
  Reply
  1. Peter Crowther (@TheOzzardOfWiz)
    
    November 30, 2021 at 7:13 pm
    
    Of course they are. And they’ve probably got shares in either a power company or a hyperscale provider ;-).
    
    Reply