I saw this post on Bluesky over the weekend.

[Screenshot of the Bluesky post, 2025-03-31]

Now before you panic, this blog post isn’t going to have any political discussion about DOGE, Trump, Republicans, Democrats, and the like. I’m just going to focus on the challenges of modernising any system that has been in place for decades.

And the challenge is simple: It is really really really hard to do.

When you open up the code base for any system that has been in place for decades, your first impression is typically just plain good old-fashioned horror. The code often looks like a garbled, jumbled mess where, even if the structure of the code is loosely modular, the contents of those modules are a bowl of noodles of switch/if-then/case statements with hundreds of exception cases floating around the core functionality.

And here is where we all make the same mistake – we think to ourselves:

“I reckon I could do a much better job with this code”

I’ve got some bad news for you… You are 100% wrong 🙂

Old systems are full of spaghetti code for three reasons:

  1. The code was written by a novice or unskilled developer,
  2. The code was written under time pressure, for example, to get an urgent production fix in place, or
  3. The code was written to handle some special, exceptional case that was not known during the original design/development.

But my postulate is that when inspecting the code after the fact, you cannot determine which of the three scenarios above was the root cause. With our inflated developer egos, we often assume it was (1), and thus that we could do a much better job just by refactoring the code. That can be a fatal assumption to make, because even the smallest change you make to “improve” the code might result in a (perhaps tiny) functionality change. If that functionality change violates the true reason the code was the way it was, namely (2) or (3), then you have probably just broken your system.
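To make that concrete, here is a minimal sketch (the function names and the fee rule are entirely hypothetical, not from any real system) of a refactor that looks like a pure cleanup but silently changes behaviour:

```python
# Hypothetical legacy routine: the "redundant-looking" branch is exactly the
# kind of thing a well-meaning refactor deletes.
def legacy_fee(amount_cents: int) -> int:
    fee = amount_cents * 2 // 100          # 2% fee, integer cents
    if amount_cents > 0 and fee == 0:
        fee = 1                            # minimum 1-cent fee, added years
                                           # ago for some long-forgotten rule
    return fee

def refactored_fee(amount_cents: int) -> int:
    return amount_cents * 2 // 100         # "cleaner", but the rule is gone

# The two agree on almost every input...
assert legacy_fee(500) == refactored_fee(500) == 10
# ...and differ precisely on the edge case the old branch existed for:
assert legacy_fee(30) == 1
assert refactored_fee(30) == 0
```

Nothing in the refactored version hints that a business rule was lost; only the original code, or a test derived from it, can tell you.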

Rebuilding, reimplementing or “modernising” an entire system means tackling the above issue of refactoring code at scale, on every piece of code in your system.

It’s amazing to see the naivety out on social media when it comes to the size of this challenge. Here’s a sample of some of the replies I saw on various platforms.

 

[Screenshots of replies from various platforms]

This is a common developer mindset: Migration = “We just need to rewrite the code”

If I had to put a percentage on it, I would contend that migrating the code is less than 1% of the task of modernising a system. Let me put aside the enormous workload of:

  • changes to the UI, even the physical devices that might present the UI (you might be going from a mainframe terminal to a GUI, mobile, etc.),
  • changes to the transaction model,
  • changes to the usage of the application(s),
  • training all of the staff that use and administer the application(s),
  • the need to migrate the historical data, which might be riddled with inconsistencies.

Even with all of that set aside, the biggest workload you will face is that your new code needs to be tested. For an older system, there is a very good chance no unit tests exist for your existing application. Unit testing, CI/CD, TDD, etc. are all relatively new styles of development (“new” here meaning the last couple of decades).

So now you have to write unit tests for every piece of functionality in the system. If you’re thinking “Oh, I’ll use an AI for that“, how does an AI know the expected outcomes? The only source of truth for expected outcomes is the original system itself, and it is even possible that some of those expected outcomes are actually incorrect – but are deemed “correct” by convention. Plenty of old systems return the “wrong” result based on their original design specification, but because the bug was found literally decades later, the returned result is now so ingrained in people’s understanding of how the system works that it is deemed the “correct” behaviour.
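One common way to attack this is characterisation (or “golden master”) testing: record what the old system actually does, bugs and all, and hold the new system to that recording. The sketch below is purely illustrative – `legacy_calc` is a stand-in for a call into the existing system (in reality you might replay captured production inputs against the mainframe):

```python
def legacy_calc(x):
    # Stand-in for the 40-year-old implementation, including an odd
    # negative-path quirk that users now depend on.
    return x * 2 if x >= 0 else x * 2 + 1

def new_calc(x):
    # The rewritten implementation under test.
    return x * 2

# Step 1: record the legacy outputs for a broad sample of real inputs.
golden = {x: legacy_calc(x) for x in range(-3, 4)}

# Step 2: the rewrite must match the recording exactly - including
# behaviour that looks wrong but is "correct by convention".
mismatches = {x: (golden[x], new_calc(x))
              for x in golden if new_calc(x) != golden[x]}

assert set(mismatches) == {-3, -2, -1}   # the rewrite diverges on negatives
```

Note what this does and doesn’t buy you: it only characterises the inputs you thought to record, which is exactly why the “special case” problem below bites so hard.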

Building test coverage for a system that is 20, 30 or 40 years old is astoundingly hard to do, because, as I mentioned above, those systems typically have thousands of “special case” exceptions to standard functionality baked into the code base.

Let me present an example from a customer I worked with before joining Oracle. They were an online betting organisation, and I’ll keep things really simple by assuming the existing (mainframe COBOL + assembler) application simply dealt with a single horse race. Conceptually, the existing code base handled the following:

  • people bet on a horse race,
  • the race is run,
  • the horses that finish 1st, 2nd and 3rd earn winnings
  • those winnings are distributed to the people that bet on those horses.

Migrating that functionality to a new platform (a mix of C# and C++) was not difficult, and when we tested the scenario above, the new code matched the old code. But that is when you start learning the hard way 🙂 why existing code bases can look like spaghetti. Our code (thankfully not yet in production) started crashing or corrupting data because it was now encountering the experiences that the existing code base had faced over its 40 years of dealing with horse racing around the world. For example:

  • If there were fewer than a certain number of horses in a race, a rule might be that only 1st and 2nd place get winnings. The code base had to handle that anomaly.
  • What happens when there is a dead heat for 1st? Is the third horse deemed second or third? How do the winnings get distributed?
  • What about triple dead heats? Yes, that can happen.
  • There were some horse races that didn’t even have 3 horses running in total, only 2.
  • There were some horse races with ONE horse! In some countries, if all horses except one were scratched (i.e., did not race), the lone horse still had to complete the course to get the winnings. Rest assured, our new code totally blew a fuse when it hit a single-horse race 🙂
  • The fundamentals of arithmetic were different – you can’t split $1 across three horses with simple division, because paying $0.33 each leaves you with an “orphan” cent. There were actual legislative rules in government for betting organisations about how that last cent is accounted for, and who should get it. (It has to be deterministic.)
  • If a horse started a race but did not finish it, that might change whether the race paid 1st, 2nd and 3rd winnings or just 1st and 2nd, due to the drop in total numbers.
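The orphan-cent point above can be sketched in a few lines. This is purely illustrative – the real allocation rules were set by legislation, not by this algorithm – but it shows one deterministic scheme: work in integer cents, and hand the leftover cents to the earliest finishers:

```python
# Split a pool of cents across winners with no orphan cent.
# Assumption (hypothetical rule): leftover cents go to the earliest
# finishers, making the result deterministic.
def split_pool(pool_cents: int, num_winners: int) -> list[int]:
    base, leftover = divmod(pool_cents, num_winners)
    # The first `leftover` winners (by finishing order) get one extra cent.
    return [base + (1 if i < leftover else 0) for i in range(num_winners)]

shares = split_pool(100, 3)     # $1.00 across three winners
assert shares == [34, 33, 33]   # every cent accounted for
assert sum(shares) == 100
```

The key property is that floating-point division never enters the picture: integers in, integers out, and the same inputs always produce the same split.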

This list of “special cases” went on and on and on. That is the reality of mature systems that evolve over time. They get littered with code changes, additions, and exceptions to handle the continuously changing nature of the functionality and data demands. And those changes are made by developers with different levels of skill, who might be under different time pressures, all of which yields a giant pile of spaghetti code which (and here is the key point) is working code!

Now, to bring this discussion back to the original Bluesky post at the top of this blog, I invite you to apply that challenge to a trillion-dollar system that pays millions of people. A single incorrect or missing unit test in your new system could yield wrong or missing payments, the consequences of which could be people not being able to pay rent, or buy food for their family, or a myriad of other personal hardships.

Modernisation of systems can be a good thing, and many times modernising an existing system is warranted, if only to ensure that a pool of people with the skills to maintain and enhance it remains, but anyone who thinks it is an easy task is deluded.

4 responses to “Spaghetti code is Good code”

  1. iudithd5bf8e4d8d

    Hi Connor,
    You are so, so right !
    All that changed during these last few years is that now, of course, “there is AI who can do it !”,
    while the previous mantra was “Oh, we have a tool that can do all this automatically !” … or … well … “the most” of it …

    I heard this so, so many times … and NOT just for migrating COBOL to some other tool, and not just because it was written in the “spaghetti” style … (by the way, yes, mostly because its author could not do it otherwise, because, of course, he/she was meant to become a high level manager and not stay a simple developer ) but also because someone, in his “highness”, just decided that a technology should just be replaced, without taking
    into account all the complexities of such a process …

    One such example was the (now already “classic”) migration from Oracle Forms to APEX … where each such discussion used to start with:
    well … we should start by counting “how many forms your application has ?” … to be able to evaluate the amount of work required …
    and I saw this repeated time and again, with slides showing how many projects were successful by doing this …
    No word about how complex those are, how do they integrate with the outside world’s architecture, a.s.o.

    I spent a lot of my career doing lots of migrations, from version to version and from one environment (and technology) to another,
    mostly not for a clearly defined benefit but for just being able “to preserve functionality” in a completely different architecture.
    The most difficult and challenging part of any project is to make it integrate smoothly with other projects in the enterprise,
    and many times you find yourself “fighting this battle” alone, because the managers only know at most about “migrating code”,
    (spaghetti or not), and nothing about the whole environment and whole architecture that they “manage” or were supposed to …

    By the way: sorry for my ignorance, but I don’t exactly know what is “DOGE” in the first picture …
    Anyway, the world has now “A” (one) KING … one that surely the “DOGE” (that of Venice, not that from the picture) had not dreamed of ..

    Cheers & Best Regards,
    Iudith Mentzel

  2. A lesson I learned long ago: Having a large brain capacity does not make one immune to acting stupidly.
    The idea of rewriting a large code base in months and expecting it to work is a classic example of Dunning Kruger Effect in action.

  3. My second job in IT was for Oracle in the UK, working on a project replacing an existing hospital computer system with a new one. I am pretty sure I was not taken on for my technical skills but because my first job in IT was working on the hospital system they were replacing. Hospital systems are really, really complex, and I had not only worked on the code of that older system but had also trained staff how to use it, so I knew some of the oddities and flaws of the system. THAT is what my new team wanted from me: some insight into what the system had to achieve which was not in the old and inaccurate specifications. Of course, I didn’t know all the oddities, probably not even 10% of them, but it helped. We had to find out the others by talking with a lot of end users, making a ton of mistakes, and testing.

    No AI knows how a hospital IT system works; it would utterly and dangerously fail.

  4. Old systems are full of spaghetti code for three reasons…

    I would like to add number 4:

    The code was written by many different developers over many years, each having their own style and structure of coding.

    Much harder for AI to analyze and find patterns within the code.

    /Jan (https://www.linkedin.com/in/jan-nilsson-26b07810/)
