Had an interesting AskTom question today about dynamically keeping partition sizes in check. The poster had a large table, already partitioned, but since there was no logical column to range partition on, it was partitioned on a surrogate primary key value. Due to the volatility of the data (the table has a BLOB column), evenly sized primary key ranges led to dramatically different partition sizes, causing manageability concerns. Partitions could be split and/or merged, but obviously if those partitions contain data, that is both a very expensive operation and one that could impact the availability of the data, depending on the type of maintenance operation performed.
So the challenge became – can we proactively have (roughly) equi-sized partitions when we don’t know how and when the data is going to arrive? Obviously it’s virtually impossible to know exactly when each extent gets allocated to a segment, and we don’t want to run things so tightly that we end up not being able to do inserts. It is like the old stock market phrase: “Past behaviour is not a reliable indicator of future behaviour“, in that all it takes is a few very large BLOBs to get you to a large partition size in just a few rows, whereas tomorrow it might take millions of rows (each with tiny BLOBs) to get you to 10GB.
So we are probably going to have to accept some sort of compromise.
That compromise might be to rely on the “fast split” optimization, where you pick a split point such that no data will exist on the “right” hand side of it, which makes the split a dictionary-only operation with no data movement.
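As a quick illustration of the idea, here is a hypothetical example: a table SALES partitioned on an ID column, where every existing row has ID below 5000. Splitting above the current data means no rows need to move, so the operation completes almost instantly.

-- hypothetical example: all existing rows have ID < 5000, so PEMPTY
-- receives no rows and Oracle can perform a "fast split"
alter table SALES split partition PBIG at (5000)
  into (partition PBIG, partition PEMPTY);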
So my logic for such a scheme is this:
- walk along the partitions in order
- if “this” partition is empty, check the previous one for its size
- if it’s greater than our allowable threshold then
- split it into two partitions at a logical point, so that one keeps all the data and the other (ptmp) is empty
- split the “maxvalue” highest partition (pmax) into two (pn and pmax), to establish a new sensible boundary point for the new partition
- merge ptmp and pn
So with some numbers, our partition strategy might go along the following lines. We start with the following setup, where we have our partition definitions and the current high water mark for the primary key column that is being used for partitioning:
partition   values less than   current hwm
p1          1000               1000
p2          2000               1500
pmax        maxvalue           -
and we think p2 is now “too large”. We need to split p2 so that it cannot grow any further.
Step 1 – split p2
partition   values less than   current hwm
p1          1000               1000
p2          1600               1500
ptmp        2000               -
pmax        maxvalue           -
So we’ve now capped p2 at 1600. We would pick something like 1600 because when we commenced the split, the primary key value was up to 1500, so we want some leeway whilst inserts still carry on.
Step 2 – split pmax
partition   values less than   current hwm
p1          1000               1000
p2          1600               1500
ptmp        2000               -
p3          3000               -
pmax        maxvalue           -
You may wonder why we split PMAX, since it appears unrelated. We’ve done this because ptmp is no good to us as it stands: it is capped at 2000, and of course it could have started from anywhere between 1500 and 1999, depending on where the primary key values were up to when we did this check. We want the new partition capped at a sensible logical value, so we’ll be using 3000, which is currently assigned to P3.
Step 3 – merge p3 and ptmp
partition   values less than   current hwm
p1          1000               1000
p2          1600               1500
p3          3000               -
pmax        maxvalue           -
Both P3 and PTMP are empty, so the merge of these two will be just a dictionary operation.
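In DDL terms, and using the example boundary values from the walkthrough above, the three operations would look something like this (in a real implementation the split points would be derived at runtime, as the demo below does):

alter table T split partition p2 at (1600) into (partition p2, partition ptmp);
alter table T split partition pmax at (3000) into (partition p3, partition pmax);
alter table T merge partitions ptmp, p3 into partition p3;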
And we’re done. All of these operations, because they would be working on empty partitions, should be virtually instantaneous. A complete implementation would include some further checks to ensure this is the case during the processing stage, but for the sake of a simple demo we’ll be optimistic.
So here’s the demo, where 30meg is the size limit for a segment. We’ll create a table partitioned on our surrogate key, and pre-load it with some data.
SQL> create table T ( x int, y char(200))
2 partition by range (x)
3 (
4 partition p1 values less than (300000),
5 partition p2 values less than (600000),
6 partition p3 values less than (900000),
7 partition p4 values less than (1200000),
8 partition pmax values less than (maxvalue)
9 );
Table created.
SQL>
SQL> insert /*+ APPEND */ into T
2 select rownum, rownum
3 from dual
4 connect by level <= 1000000;
1000000 rows created.
SQL>
SQL> create index IX on T ( x ) local;
Index created.
SQL>
SQL> alter table T add constraint T_PK primary key ( x ) using index local
2 /
Table altered.
SQL>
SQL> select bytes
2 from user_segments
3 where segment_name = 'T';
BYTES
----------
75497472
75497472
75497472
25165824
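Incidentally, a variant of that query (not run here) could also report the partition name, to make it clearer which bytes belong to which partition:

select partition_name, bytes
from   user_segments
where  segment_name = 'T'
order  by partition_name;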
You can see that the earlier partitions are approx 70meg in size, and the “current” one is 25meg in size. Now in another session I’m running inserts in batches to simulate online activity which will make the table grow as the new rows come in.
SQL> declare
2 l_start int;
3 l_cnt int := 1;
4 begin
5 select max(x) into l_start from t;
6
7 for i in 1 .. 1000 loop
8 for j in 1 .. 10000 loop
9 insert into t values (l_start+l_cnt,i);
10 l_cnt := l_cnt + 1;
11 end loop;
12 commit;
13 dbms_lock.sleep(4);
14 end loop;
15 end;
16 /
And now, here’s my routine that keeps an eye on things and automatically does some partition maintenance when the next-to-last partition gets larger than 30meg.
SQL> set serverout on
SQL> declare
2 l_has_Rows int;
3 l_hi_par varchar2(30);
4 l_hi_bytes int;
5 l_hwm int;
6
7 procedure ddl(m varchar2) is
8 begin
9 dbms_output.put_line(m);
10 execute immediate m;
11 end;
12 begin
13 for i in ( select p.partition_name, s.bytes, p.partition_position
14 from user_segments s,
15 user_tab_partitions p
16 where p.table_name= 'T'
17 and p.table_name = s.segment_name(+)
18 and p.partition_name = s.partition_name(+)
19 order by p.partition_position asc
20 )
21 loop
22 execute immediate 'select count(*) from t partition ( '||i.partition_name||') where rownum = 1' into l_has_rows;
23 dbms_output.put_line(i.partition_name||':'||i.bytes||':'||l_has_rows);
24 if l_has_rows > 0 then
25 --
26 -- we've hit a partition with rows
27 --
28 if i.partition_name = 'PMAX' then
29 raise_application_error(-20000,'We got rows in PMAX...thats a problem');
30 end if;
31
32 l_hi_par := i.partition_name;
33 l_hi_bytes := i.bytes;
34 else
35 --
36 -- see if we've hit PMAX, check size of prev one
37 --
38 if l_hi_bytes > 30*1024*1024 then
39 execute immediate 'select max(x) from t partition ( '||l_hi_par||')' into l_hwm;
40
41 ddl('alter table T split partition '||l_hi_par||' at ('||(l_hwm+10000)||') into (partition '||l_hi_par||', partition ptmp)');
42 ddl('alter table T split partition pmax at ('||(l_hwm+10000+300000)||') into (partition p'||i.partition_position||', partition pmax)');
43 ddl('alter table T merge partitions ptmp,p'||i.partition_position||' into partition p'||i.partition_position);
44
45 end if;
46 exit;
47 end if;
48 end loop;
49 end;
50 /
P1:75497472:1
P2:75497472:1
P3:75497472:1
P4:25165824:1
PMAX::0
PL/SQL procedure successfully completed.
SQL>
SQL>
SQL> /
P1:75497472:1
P2:75497472:1
P3:75497472:1
P4:33554432:1
PMAX::0
alter table T split partition P4 at (1020000) into (partition P4, partition ptmp)
alter table T split partition pmax at (1320000) into (partition p5, partition pmax)
alter table T merge partitions ptmp,p5 into partition p5
PL/SQL procedure successfully completed.
SQL> /
P1:75497472:1
P2:75497472:1
P3:75497472:1
P4:33554432:1
P5:8388608:1
PMAX::0
PL/SQL procedure successfully completed.
SQL> /
P1:75497472:1
P2:75497472:1
P3:75497472:1
P4:33554432:1
P5:16777216:1
PMAX::0
PL/SQL procedure successfully completed.
SQL> /
P1:75497472:1
P2:75497472:1
P3:75497472:1
P4:33554432:1
P5:33554432:1
PMAX::0
alter table T split partition P5 at (1140000) into (partition P5, partition ptmp)
alter table T split partition pmax at (1440000) into (partition p6, partition pmax)
alter table T merge partitions ptmp,p6 into partition p6
PL/SQL procedure successfully completed.
SQL> /
P1:75497472:1
P2:75497472:1
P3:75497472:1
P4:33554432:1
P5:33554432:1
P6:8388608:1
PMAX::0
PL/SQL procedure successfully completed.
SQL>
The first execution did nothing but report that partition P4 was 25meg in size. By the time of the next run it was 33meg in size, so we split it and created a new empty partition P5, being the merger of the two remnants we carved off the existing P4 and PMAX. The next couple of executions required nothing to be done, and finally we did the same thing to end up with P6.
So there’s the beginnings of a tool to keep partition sizes in check for random data arrival. Enjoy.
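If you wanted this to run itself, one option would be to schedule the check with DBMS_SCHEDULER. This is just a sketch: the job name and interval are arbitrary, and it assumes the anonymous block above has been wrapped into a stored procedure called PARTITION_SIZE_CHECK.

begin
  dbms_scheduler.create_job(
    job_name        => 'PARTITION_SIZE_JOB',                 -- arbitrary job name
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'begin partition_size_check; end;',   -- assumes the block above is wrapped in this procedure
    repeat_interval => 'FREQ=MINUTELY;INTERVAL=5',            -- check every 5 minutes
    enabled         => true);
end;
/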