I was reading a very interesting article on Uber’s move from Postgres to MySQL. I really like it when IT professionals and/or companies take the time to explain their technology decisions. It’s a brave thing to do, because it’s easy for people to jump on the bashing bandwagon (“Ha ha … Company X chose Y and now they’re bust” etc etc). It’s the same reason you rarely see detailed customer reference information about the technology they are using, or how they are succeeding or failing. It’s generally kept pretty quiet. So for Uber engineering to be open about it is impressive, and a lesson for us all.
Not being as familiar with either Postgres or MySQL, one statement really caught my attention (colour emphasis mine):
“if we have a table with a dozen indexes defined on it, an update to a field that is only covered by a single index must be propagated into all 12 indexes“
That blew me away. Funnily enough, as an Oracle DBA, I’ve always had that reluctance on creating indexes on tables. I find it’s always a good thing to ask for justification – the aim always being to have “just the right amount” of indexes. What is that right amount ? Enough to meet the needs of the application and the customers, and not a single one more
It strikes me as that philosophy being really really important in a Postgres environment, because if the Uber statement is true, then indexes have the potential to have an enormous overhead on any transactional system.
That got me thinking – how does Oracle handle index updates?. I was pretty sure we only update indexes where the relevant columns are modified, but how could we construct a demo to show that. Here’s one such way. We’ll create a table with some indexes, and put one of those indexes into a read only tablespace. Updates to the indexes in normal tablespaces should go ahead without error, whilst those in the read only tablespace would be expected to return an error.
SQL> create table t ( pkcol int default seq.nextval, name varchar2(30), birthdate date ) tablespace USERS;
Table created.
SQL> insert into t (name,birthdate)
2 select object_name, created
3 from dba_objects;
96744 rows created.
SQL> create index ix1 on t ( pkcol ) tablespace USERS;
Index created.
SQL> create index ix2 on t ( name ) tablespace DEMO;
Index created.
SQL> create index ix3 on t ( birthdate ) tablespace USERS;
Index created.
SQL>
SQL> alter tablespace DEMO read only;
Tablespace altered.
SQL>
SQL> update t
2 set birthdate = birthdate + 10
3 where pkcol between 100 and 200;
101 rows updated.
SQL>
SQL> insert into t (birthdate)
2 values (sysdate);
1 row created.
SQL>
SQL>
SQL> insert into t (name,birthdate)
2 values ('Connor',sysdate);
insert into t (name,birthdate)
*
ERROR at line 1:
ORA-00372: file 9 cannot be modified at this time
ORA-01110: data file 9: 'C:\ORACLE\ORADATA\NP12\DEMO.DBF'
SQL>
SQL>
SQL>
As we can see, the Oracle database does not act in the same way as Uber has reported that Postgres does. We handle index updates as efficiently as we can.
” an update to a field that is only covered by a single index must be propagated into all 12 indexes“
If the field (column) is covered by a single index what is there to propagate to the other 11 indexes when the one column is updated?
The answer is in the Uber article.
“As we mentioned earlier, row tuples are immutable. Therefore, to update the record, we add a new tuple to the table”
Adding a row touches all columns which in turn affects all indexes.
Let me know if my interpretation is incorrect.
Why would updating a column in a row add a new tuple (row) to the table?
That would be because they never actually “update a column in a row” – they always insert a new row, and the database knows which version of each row to show to users.
It’s different to Oracle which copies old rows to undo segments before updating, so only the latest version of each row is in the table.
Hi Connor,
I read Uber’s post because of a tweet mentioning your post (that mentioned Uber’s post).
In any case, I found Uber’s post very interesting. I really didn’t expect the post to touch on very low level subjects that as physical storage and retrieval from the disk.
Going back to your post, that part about locking an index was something that I’ve never seen before. I mainly focus on MySQL databases, and that’s why I have this question for you: can you advise (if possible) how to lock an index in MySQL? This will be excellent for testing and optimization.