Just an addenda to what I posted last week about downloading the Formula 1 motor racing data where I provided a script that would
- download the ANSI SQL compressed script
- decompress it and convert the ANSI SQL population script to an Oracle version
- drop the existing tables in your schema
- build and populate new versions of tables
This is all well and good for exploring but what if you have built some research or educational apps on top of the data and you’d rather not blow the data away every time you wanted to update the data to show the current results.
There is a full set of REST APIs available on the site to achieve this, but to achieve a generic “What has changed?” (eg drivers, circuit details, constructors etc et) I figured I would end up smashing the site with lots of REST calls and the last thing anyone wants to be is that person that is killing a site due to load.
So I’ve a taken a different approach to refresh the data without dropping/recreating. In my github repos not you’ll see a few new scripts
- f1_delta.sh
- f1_delta.sed
- f1_delta_wrapper.sql
Like the original script, I’ll download the full ANSI population script, but this time populate a series of tables prefixed with “F1DELT” (for delta). Then using a little SQL magic, we interrogate the data dictionary to dynamically generate a series of PL/SQL anonymous blocks which contain MERGE statements to refresh the pre-existing F1-prefixed tables with any new data in the F1DELT-prefixed tables. I deliberately generate two MERGE statements to separate the UPDATE phase from the INSERT phase, mainly so that its easier to digest changes from new data. For example, for the race results, the anonymous block that is generated looks like this
declare
l_rows pls_integer;
begin
merge into F1_RESULTS w
using ( select
RESULTID
,RACEID
,DRIVERID
,CONSTRUCTORID
,ORDINAL
,GRID
,POSITION
,POSITIONTEXT
,POSITIONORDER
,POINTS
,LAPS
,TIME
,MILLISECONDS
,FASTESTLAP
,RANK
,FASTESTLAPTIME
,FASTESTLAPSPEED
,STATUSID
from F1DELT_RESULTS ) p
on ( p.RESULTID = w.RESULTID
)
when matched then
update set
w.RACEID = p.RACEID
, w.DRIVERID = p.DRIVERID
, w.CONSTRUCTORID = p.CONSTRUCTORID
, w.ORDINAL = p.ORDINAL
, w.GRID = p.GRID
, w.POSITION = p.POSITION
, w.POSITIONTEXT = p.POSITIONTEXT
, w.POSITIONORDER = p.POSITIONORDER
, w.POINTS = p.POINTS
, w.LAPS = p.LAPS
, w.TIME = p.TIME
, w.MILLISECONDS = p.MILLISECONDS
, w.FASTESTLAP = p.FASTESTLAP
, w.RANK = p.RANK
, w.FASTESTLAPTIME = p.FASTESTLAPTIME
, w.FASTESTLAPSPEED = p.FASTESTLAPSPEED
, w.STATUSID = p.STATUSID
where decode(w.RACEID ,p.RACEID,0,1) = 1
or decode(w.DRIVERID ,p.DRIVERID,0,1) = 1
or decode(w.CONSTRUCTORID ,p.CONSTRUCTORID,0,1) = 1
or decode(w.ORDINAL ,p.ORDINAL,0,1) = 1
or decode(w.GRID ,p.GRID,0,1) = 1
or decode(w.POSITION ,p.POSITION,0,1) = 1
or decode(w.POSITIONTEXT ,p.POSITIONTEXT,0,1) = 1
or decode(w.POSITIONORDER ,p.POSITIONORDER,0,1) = 1
or decode(w.POINTS ,p.POINTS,0,1) = 1
or decode(w.LAPS ,p.LAPS,0,1) = 1
or decode(w.TIME ,p.TIME,0,1) = 1
or decode(w.MILLISECONDS ,p.MILLISECONDS,0,1) = 1
or decode(w.FASTESTLAP ,p.FASTESTLAP,0,1) = 1
or decode(w.RANK ,p.RANK,0,1) = 1
or decode(w.FASTESTLAPTIME ,p.FASTESTLAPTIME,0,1) = 1
or decode(w.FASTESTLAPSPEED ,p.FASTESTLAPSPEED,0,1) = 1
or decode(w.STATUSID ,p.STATUSID,0,1) = 1;
l_rows := sql%rowcount;
dbms_output.put_line('F1_RESULTS Merge phase 1 complete, updated rows='||l_rows);
merge into F1_RESULTS w
using ( select
RESULTID
,RACEID
,DRIVERID
,CONSTRUCTORID
,ORDINAL
,GRID
,POSITION
,POSITIONTEXT
,POSITIONORDER
,POINTS
,LAPS
,TIME
,MILLISECONDS
,FASTESTLAP
,RANK
,FASTESTLAPTIME
,FASTESTLAPSPEED
,STATUSID
from F1DELT_RESULTS ) p
on ( p.RESULTID = w.RESULTID
)
when not matched then
insert
( w.RESULTID
, w.RACEID
, w.DRIVERID
, w.CONSTRUCTORID
, w.ORDINAL
, w.GRID
, w.POSITION
, w.POSITIONTEXT
, w.POSITIONORDER
, w.POINTS
, w.LAPS
, w.TIME
, w.MILLISECONDS
, w.FASTESTLAP
, w.RANK
, w.FASTESTLAPTIME
, w.FASTESTLAPSPEED
, w.STATUSID
)
values (p.RESULTID
,p.RACEID
,p.DRIVERID
,p.CONSTRUCTORID
,p.ORDINAL
,p.GRID
,p.POSITION
,p.POSITIONTEXT
,p.POSITIONORDER
,p.POINTS
,p.LAPS
,p.TIME
,p.MILLISECONDS
,p.FASTESTLAP
,p.RANK
,p.FASTESTLAPTIME
,p.FASTESTLAPSPEED
,p.STATUSID
);
l_rows := sql%rowcount;
dbms_output.put_line('F1_RESULTS Merge phase 2 complete, inserted rows='||l_rows);
end;
Thus when you run f1_delta_wrapper.sql, the final phase of the script will apply these PLSQL blocks to each of the tables resulting in an output like this below
F1_CIRCUITS Merge phase 1 complete, updated rows=0
F1_CIRCUITS Merge phase 2 complete, inserted rows=0
F1_CONSTRUCTORRESULTS Merge phase 1 complete, updated rows=0
F1_CONSTRUCTORRESULTS Merge phase 2 complete, inserted rows=10
F1_CONSTRUCTORS Merge phase 1 complete, updated rows=1
F1_CONSTRUCTORS Merge phase 2 complete, inserted rows=0
F1_CONSTRUCTORSTANDINGS Merge phase 1 complete, updated rows=0
F1_CONSTRUCTORSTANDINGS Merge phase 2 complete, inserted rows=10
F1_DRIVERS Merge phase 1 complete, updated rows=0
F1_DRIVERS Merge phase 2 complete, inserted rows=0
F1_DRIVERSTANDINGS Merge phase 1 complete, updated rows=0
F1_DRIVERSTANDINGS Merge phase 2 complete, inserted rows=21
F1_LAPTIMES Merge phase 1 complete, updated rows=0
F1_LAPTIMES Merge phase 2 complete, inserted rows=1259
F1_PITSTOPS Merge phase 1 complete, updated rows=0
F1_PITSTOPS Merge phase 2 complete, inserted rows=25
F1_QUALIFYING Merge phase 1 complete, updated rows=0
F1_QUALIFYING Merge phase 2 complete, inserted rows=20
F1_RACES Merge phase 1 complete, updated rows=0
F1_RACES Merge phase 2 complete, inserted rows=0
F1_RESULTS Merge phase 1 complete, updated rows=0
F1_RESULTS Merge phase 2 complete, inserted rows=20
F1_SEASONS Merge phase 1 complete, updated rows=0
F1_SEASONS Merge phase 2 complete, inserted rows=0
F1_STATUS Merge phase 1 complete, updated rows=0
F1_STATUS Merge phase 2 complete, inserted rows=0
Changes not committed yet - don't forget to COMMIT or ROLLBACK
Now you can keep your Formula 1 racing data up to date every time new data appears on the site.
Enjoy!