I saw this post on Bluesky over the weekend.
Now before you panic, this blog post isn’t going to have any political discussion about DOGE, Trump, Republicans, Democrats and the like. I’m just going to focus on the topic of the challenges of modernising any system which has been in place for decades.
And the challenge is simple: It is really really really hard to do.
When you open up the code base for any system that has been in place for decades, your first impression is typically just plain good old fashioned horror. The code often looks like a garbled jumbled mess where, even if the structure of the code is loosely modular, the contents of those modules are a bowl of noodles of switch/if-then/case statements with hundreds of exception cases floating around the core functionality.
And here is where we all make the same mistake – we think to ourselves:
“I reckon I could do a much better job with this code”
I got some bad news for you… You are 100% wrong 🙂
Old systems are full of spaghetti code for three reasons:
- The code was written by someone that was a novice or unskilled,
- The code was written under time pressure, for example when the top priority was getting an urgent production fix in place,
- The code was written to handle some special exceptional case that was not known during the original design/development.
But my postulate is that when inspecting the code after the fact, you cannot determine which of the three above scenarios was the root cause. With our inflated developer egos, we often assume that it was (1) above, and thus we could do a much better job just by refactoring the code. That can be a fatal assumption to make, because even the smallest of changes you make to “improve” the code might result in a (perhaps tiny) functionality change. If that functionality change violates the true reason the code was written the way it was, namely (2) or (3) above, then you have probably just broken your system.
Rebuilding, reimplementing or “modernising” an entire system means tackling the above issue of refactoring code at scale, on every piece of code in your system.
It’s amazing to see the naivete out on social media when it comes to the size of this challenge. Here’s a sample of some of the replies I saw on various platforms
This is a common developer mindset: Migration = “We just need to rewrite the code”
If I had to guess a percentage metric, I would contend that migrating the code is less than 1% of the task of modernising a system. Let me put aside the enormous workload of:
- changes to the UI, even the physical devices that might present the UI (You might be going from mainframe terminal to GUI, mobile etc)
- changes to the transaction model,
- changes to the usage of the application(s),
- training all of the staff that use and administer the application(s),
- the need to migrate the historical data which might be riddled with inconsistencies
Even with all of that not taken into consideration, the biggest workload you will face is that your new code needs to be tested. For an older system, there is a very good chance no unit tests exist for your existing application. Unit testing, CI/CD, TDD etc are all relatively new styles of development (the term “new” here referring to the last couple of decades).
So now you have to write unit tests for every piece of functionality in the system. If you’re thinking “Oh, I’ll use an AI for that”, how does an AI know the expected outcomes? The only source of truth in terms of expected outcomes is in the original system itself, and it is even possible that some of those expected outcomes are actually incorrect – but are deemed “correct” by convention. Plenty of old systems return the “wrong” result according to their original design specification, but because the bug was found literally decades later, the result is so ingrained in people’s understanding of how the system works that it is now deemed the “correct” behaviour.
Building test coverage for a system that is 20, 30 or 40 years old is astoundingly hard to do because, as I mentioned above, those systems typically have thousands of “special case” exceptions to standard functionality baked into the code base.
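One way to approach that problem is a characterisation (or “golden master”) test: record what the old system actually returns for a set of inputs, quirks included, and treat those recordings as the expected outcomes for the new code. Here is a minimal Python sketch of the idea. The functions `legacy_payout` and `new_payout` are hypothetical stand-ins I have made up for illustration, not code from any real system.

```python
from typing import Callable, Dict, Iterable, Tuple

Case = Tuple[int, int]  # (pool_in_cents, number_of_winners), hypothetical inputs


def legacy_payout(pool_cents: int, winners: int) -> int:
    """Stand-in for the old system: truncates, and pays nothing if nobody won."""
    if winners == 0:
        return 0
    return pool_cents // winners


def new_payout(pool_cents: int, winners: int) -> int:
    """Stand-in for a 'cleaner' rewrite that rounds instead of truncating."""
    if winners == 0:
        return 0
    return round(pool_cents / winners)


def record_golden_master(
    system: Callable[[int, int], int], cases: Iterable[Case]
) -> Dict[Case, int]:
    """Capture what the existing system really returns for each input."""
    return {case: system(*case) for case in cases}


def find_mismatches(
    golden: Dict[Case, int], candidate: Callable[[int, int], int]
) -> Dict[Case, Tuple[int, int]]:
    """Return {case: (expected, actual)} wherever the candidate disagrees."""
    mismatches = {}
    for case, expected in golden.items():
        actual = candidate(*case)
        if actual != expected:
            mismatches[case] = (expected, actual)
    return mismatches


cases = [(100, 3), (100, 2), (200, 3), (100, 0)]
golden = record_golden_master(legacy_payout, cases)
mismatches = find_mismatches(golden, new_payout)
print(mismatches)
```

In this sketch the rewrite agrees with the old code on three of the four cases and differs by a single cent on `(200, 3)`. The catch is that the test only protects you for the inputs you thought to record, which is exactly where the decades of special cases hide.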
Let me present an example from a customer I worked with before joining Oracle. They were an online betting organisation, and I’ll keep things really simple by assuming the existing (mainframe COBOL + assembler) application simply dealt with a single horse race. Conceptually, the existing code base handled the following:
- people bet on a horse race,
- the race is run,
- the horses that finish 1st, 2nd and 3rd earn winnings
- those winnings are distributed to the people that bet on those horses.
Migrating that functionality to a new platform (a mix of C# and C++) was not difficult, and when we tested the scenario above, the new code matched the old code. But that is when you start learning the hard way 🙂 why existing code bases can look like spaghetti. Our code (thankfully not yet in production) started crashing or corrupting data because it was now encountering the experiences that the existing code base had faced over its 40 years of dealing with horse racing around the world. For example:
- If there were fewer than a certain number of horses in a race, a rule might be that only 1st and 2nd place get winnings. The code base had to handle that anomaly.
- What happens when there was a dead heat for 1st? Was the third horse deemed second or third? How do the winnings get distributed?
- What about triple dead heats? Yes, that can happen.
- There were some horse races that didn’t even have 3 horses in total running, only 2.
- There were some horse races with ONE horse! In some countries, if all horses except one were scratched (ie, did not race) the lone horse still had to complete the course to get the winnings. Rest assured our new code totally blew a fuse when it hit a single horse race 🙂
- The fundamentals of arithmetic were different – you can’t split $1 across three horses with simple division, because paying $0.33 each leaves you with an “orphan” cent. There were actually legislative rules in government for betting organisations about how that last cent is accounted for, and who should get it. (It has to be deterministic).
- If a horse started a race but did not finish it, the drop in total numbers might mean the race paid winnings only for 1st and 2nd instead of 1st, 2nd and 3rd.
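To make a couple of those cases concrete, here is a small Python sketch of the “how many places get paid” rule and the orphan cent problem. Everything specific in it is an assumption for illustration only: the threshold of 8 finishers is invented, and the rule that orphan cents go to the earliest shares in order is just one possible deterministic choice, not the rule from any actual legislation or from the customer’s system.

```python
from typing import List


def places_paid(finishers: int, min_field_for_third: int = 8) -> int:
    """Number of placings that earn winnings for a given count of finishers.

    The threshold of 8 is a made-up illustration, not a real racing rule.
    """
    if finishers <= 0:
        return 0  # nobody completed the course, so nothing is paid
    if finishers < min_field_for_third:
        return min(2, finishers)  # small field: 1st and 2nd only (or just 1st)
    return 3


def split_cents(total_cents: int, shares: int) -> List[int]:
    """Split a whole number of cents so that no cent is lost or invented.

    Assumed rule: orphan cents go one each to the earliest shares in order.
    """
    if shares <= 0:
        raise ValueError("shares must be positive")
    base, orphans = divmod(total_cents, shares)
    return [base + 1 if i < orphans else base for i in range(shares)]


print(places_paid(12))      # full field
print(places_paid(5))       # small field
print(places_paid(1))       # the single-horse race
print(split_cents(100, 3))  # $1.00 across a triple dead heat
```

Run as written, this prints 3, 2, 1 and `[34, 33, 33]`. Working in integer cents rather than floating point dollars keeps the total exact, and the fixed ordering means the same race always produces the same split. Notice that even this toy version already needs a branch for the single-horse race.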
This list of “special cases” went on and on and on. That is the reality of mature systems that evolve over time. They get littered with code changes, additions, and exceptions to handle the continuously changing nature of the functionality and data demands. And those changes are done by developers with different levels of skill, who might be under different time pressures, all of which yield a giant pile of spaghetti code which (and here is the key point) is working code!
Now, to bring this discussion back to the original Bluesky post at the top of this blog, I invite you to apply that challenge to a trillion dollar system that pays millions of people, where perhaps a single incorrect or missing unit test in your new system yields a situation where payments are wrong or missing, the consequences of which could be people not being able to pay rent, or buy food for their family, or a myriad of other personal hardships.
Modernisation of systems can be a good thing, and many times a modernisation of an existing system is warranted if only to ensure that there remains a skillset of people to maintain/enhance it, but anyone who thinks it is an easy task is deluded.



