I tweeted this snippet the other day because to be honest I was just overjoyed about being able to get the travel laptop out of its case and fire it up again.
In every virtual Office Hours session and every virtual event I’ve done in the past two years, I’ve urged people to take whatever steps are appropriate in their local city/country/etc to get past this pandemic, because I’ve always believed the true value from a user community comes from being face to face.
And soon, I’ll finally be able to get back to liaising with the Oracle community at a face to face event, which fills me with excitement, but of course the first thing I had to do was bring the lightweight travel laptop up to date with more recent versions of the database, virtualbox, SQL Developer, APEX, ORDS etc. When I’m at home, I do most of my database work for AskTom on our cloud databases, but when one is travelling, I prefer to have local software running rather than take the risk with hotel or conference wifi.
But as luck would have it, my laptop wasn’t too happy about being switched off for a couple of years and it reported some disk issues once I’d had it booted up for a while. Its an oldish laptop and I suspect the SSD has done its fair share of reads and write. A “Disk Repair” under Windows said that it had corrected any issues, and all seemed well until one of my database wasn’t happy when I tried to query some tables.
Of course, I should have paused and started documenting the steps I took to sort out this mess to make a blog post out of it, but truth be told, I was in a bit of a panic (even though I know I have full disk backups floating around) because I didn’t want to lose the time that I’d already spent freshening up the software.
So I can’t post screenshots/code of the actual fixes I did, but now that it is all “after the fact”, I can replicate what I did via a more controlled experiment. The reason I wanted to blog about this was (being a dinosaur who has been burnt by block corruption in the past) I was absolutely blown away by how easy the task of recovering from one has now become. The stuff of nightmares is now just a few commands and you’re back in business.
Here’s a demo of block recovery done from a deliberate corruption. First I’ll create a fresh tablespace and put a table T in there being multiple copies of SCOTT.EMP
SQL> create tablespace DEMO datafile 'X:\ORACLE\ORADATA\DB19S\DEMO.DBF' size 10m;
Tablespace created.
SQL> create table t tablespace demo as select * from scott.emp;
Table created.
SQL> insert into t select * from t;
14 rows created.
(repeated that lots of times)
I’ll back it up using rolling incremental copy. (I’ve chosen this mechanism because this demo also served to help an AskTom customer with their question), and just to make sure the rolling backups do not impact my ability to recover, I’ve rolled it forward incrementally as well.
RMAN> BACKUP INCREMENTAL LEVEL 1 FOR RECOVER OF COPY WITH TAG 'mydb' DATABASE FORMAT 'X:\oracle\bkp\db19s\%U';
Starting backup at 23-MAR-22
using channel ORA_DISK_1
no parent backup or copy of datafile 1 found
no parent backup or copy of datafile 3 found
no parent backup or copy of datafile 4 found
no parent backup or copy of datafile 7 found
no parent backup or copy of datafile 5 found
channel ORA_DISK_1: starting datafile copy
input datafile file number=00001 name=X:\ORACLE\ORADATA\DB19S\SYSTEM01.DBF
output file name=C:\ORACLE\PRODUCT\19\DATABASE\DATA_D-DB19S_I-2801206271_TS-SYSTEM_FNO-1_1Q0P479L tag=MYDB RECID=37 STAMP=1100094783
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:15
channel ORA_DISK_1: starting datafile copy
input datafile file number=00003 name=X:\ORACLE\ORADATA\DB19S\SYSAUX01.DBF
output file name=C:\ORACLE\PRODUCT\19\DATABASE\DATA_D-DB19S_I-2801206271_TS-SYSAUX_FNO-3_1R0P47A4 tag=MYDB RECID=38 STAMP=1100094793
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:07
channel ORA_DISK_1: starting datafile copy
input datafile file number=00004 name=X:\ORACLE\ORADATA\DB19S\UNDOTBS01.DBF
output file name=C:\ORACLE\PRODUCT\19\DATABASE\DATA_D-DB19S_I-2801206271_TS-UNDOTBS1_FNO-4_1S0P47AB tag=MYDB RECID=39 STAMP=1100094800
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:07
channel ORA_DISK_1: starting datafile copy
input datafile file number=00007 name=X:\ORACLE\ORADATA\DB19S\USERS01.DBF
output file name=C:\ORACLE\PRODUCT\19\DATABASE\DATA_D-DB19S_I-2801206271_TS-USERS_FNO-7_1T0P47AI tag=MYDB RECID=40 STAMP=1100094803
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:01
channel ORA_DISK_1: starting datafile copy
input datafile file number=00005 name=X:\ORACLE\ORADATA\DB19S\DEMO.DBF
output file name=C:\ORACLE\PRODUCT\19\DATABASE\DATA_D-DB19S_I-2801206271_TS-DEMO_FNO-5_1U0P47AJ tag=MYDB RECID=41 STAMP=1100094803
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:01
Finished backup at 23-MAR-22
Starting Control File and SPFILE Autobackup at 23-MAR-22
piece handle=C:\ORACLE\PRODUCT\19\DATABASE\C-2801206271-20220323-0F comment=NONE
Finished Control File and SPFILE Autobackup at 23-MAR-22
RMAN> BACKUP INCREMENTAL LEVEL 1 FOR RECOVER OF COPY WITH TAG 'mydb' DATABASE FORMAT 'X:\oracle\bkp\db19s\%U';
Starting backup at 23-MAR-22
using channel ORA_DISK_1
channel ORA_DISK_1: starting incremental level 1 datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00001 name=X:\ORACLE\ORADATA\DB19S\SYSTEM01.DBF
input datafile file number=00003 name=X:\ORACLE\ORADATA\DB19S\SYSAUX01.DBF
input datafile file number=00004 name=X:\ORACLE\ORADATA\DB19S\UNDOTBS01.DBF
input datafile file number=00007 name=X:\ORACLE\ORADATA\DB19S\USERS01.DBF
input datafile file number=00005 name=X:\ORACLE\ORADATA\DB19S\DEMO.DBF
channel ORA_DISK_1: starting piece 1 at 23-MAR-22
channel ORA_DISK_1: finished piece 1 at 23-MAR-22
piece handle=X:\ORACLE\BKP\DB19S\200P47AU_1_1 tag=MYDB comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:07
Finished backup at 23-MAR-22
Starting Control File and SPFILE Autobackup at 23-MAR-22
piece handle=C:\ORACLE\PRODUCT\19\DATABASE\C-2801206271-20220323-10 comment=NONE
Finished Control File and SPFILE Autobackup at 23-MAR-22
RMAN> RECOVER COPY OF DATABASE WITH TAG 'mydb';
Starting recover at 23-MAR-22
using channel ORA_DISK_1
channel ORA_DISK_1: starting incremental datafile backup set restore
channel ORA_DISK_1: specifying datafile copies to recover
recovering datafile copy file number=00001 name=C:\ORACLE\PRODUCT\19\DATABASE\DATA_D-DB19S_I-2801206271_TS-SYSTEM_FNO-1_1Q0P479L
recovering datafile copy file number=00003 name=C:\ORACLE\PRODUCT\19\DATABASE\DATA_D-DB19S_I-2801206271_TS-SYSAUX_FNO-3_1R0P47A4
recovering datafile copy file number=00004 name=C:\ORACLE\PRODUCT\19\DATABASE\DATA_D-DB19S_I-2801206271_TS-UNDOTBS1_FNO-4_1S0P47AB
recovering datafile copy file number=00005 name=C:\ORACLE\PRODUCT\19\DATABASE\DATA_D-DB19S_I-2801206271_TS-DEMO_FNO-5_1U0P47AJ
recovering datafile copy file number=00007 name=C:\ORACLE\PRODUCT\19\DATABASE\DATA_D-DB19S_I-2801206271_TS-USERS_FNO-7_1T0P47AI
channel ORA_DISK_1: reading from backup piece X:\ORACLE\BKP\DB19S\200P47AU_1_1
channel ORA_DISK_1: piece handle=X:\ORACLE\BKP\DB19S\200P47AU_1_1 tag=MYDB
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
Finished recover at 23-MAR-22
Starting Control File and SPFILE Autobackup at 23-MAR-22
piece handle=C:\ORACLE\PRODUCT\19\DATABASE\C-2801206271-20220323-11 comment=NONE
Finished Control File and SPFILE Autobackup at 23-MAR-22
Now its time to corrupt a block. I’ll close the database and I corrupted a block by slamming some junk in there. In this case, the “junk” is a block of redo which I’m pretty sure is not going to be treated as a valid data block . It’s easy to manipulate things are block boundaries just by using DD with a block size that matches your database block size.
X:\oracle\oradata\DB19S>dd if=DEMO.DBF bs=8192 count=140 > DEMO2.DBF
140+0 records in
140+0 records out
X:\oracle\oradata\DB19S>dd if=REDO01.LOG bs=8192 count=1 >> DEMO2.DBF
1+0 records in
1+0 records out
X:\oracle\oradata\DB19S>dd if=DEMO.DBF bs=8192 skip=141 >> DEMO2.DBF
1140+0 records in
1140+0 records out
X:\oracle\oradata\DB19S>move DEMO2.DBF DEMO.DBF
1 file(s) moved.
Now that the block is toast, lets start the database and run a query
SQL> select max(ename) from t;
select max(ename) from t
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 5, block # 140)
ORA-01110: data file 5: 'X:\ORACLE\ORADATA\DB19S\DEMO.DBF'
As we can see, block 140 no longer reflects a valid block in my table T and the database has detected this. The database lists the invalid block in V$DATABASE_BLOCK_CORRUPTION
SQL> SELECT * FROM V$DATABASE_BLOCK_CORRUPTION;
FILE# BLOCK# BLOCKS CORRUPTION_CHANGE# CORRUPTIO CON_ID
---------- ---------- ---------- ------------------ --------- ----------
5 140 1 0 CORRUPT 0
and that is pretty much all we need to ask RMAN to sort out the mess.
RMAN> RECOVER CORRUPTION LIST;
Starting recover at 23-MAR-22
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=1227 device type=DISK
channel ORA_DISK_1: restoring block(s) from datafile copy C:\ORACLE\PRODUCT\19\DATABASE\DATA_D-DB19S_I-2801206271_TS-DEMO_FNO-5_1U0P47AJ
starting media recovery
media recovery complete, elapsed time: 00:00:03
Finished recover at 23-MAR-22
and voila…we’re good to go
SQL> select max(ename) from t;
MAX(ENAME)
----------
WARD
Bottom Line: I’ll still be grabbing a new SSD just to make sure, but its nice to know that the days of block level recovery being a complex labour intensive task with auxiliary instances, and datapumps etc etc are a thing of the past.