A while back, I made another appearance on the Snack Overflow podcast, this time talking about my work with legacy code: Snack Overflow: 115. Amani älskar legacykod!.
TL;DL - in this episode, I talk about some of the things I’ve worked on in the past 2 years, much of which is taking over ownership of and rebuilding legacy solutions, and everything that that entails. I describe the process of doing a handover of a complex service from one team to another, and we talk a bit about whether we can do anything to prevent the problem of overly complex legacy systems.
The Problem in a Nutshell
The types of legacy systems I often see and need to work on are really the result of many generations of teams and individual people owning and working on a system, none of whom are still around to tell the story. I think this is pretty normal in larger software development organizations, especially when working on distributed systems like microservices.
When migrating something from one team to another, we actually want to read as little code as possible - we just want to make sure that it works the same before and after the migration. What usually follows one of these migrations is new feature development. Now we have to go one step further and read quite a lot of code to understand how something works, and even more to understand how something is intended to work. So we really need to make sure we are being efficient.
Cleanup
One of the worst parts of legacy systems is dead code - and, as an extension of that, live code which isn't actually in use.
- Developers (junior perhaps more than senior) have a tendency to copy/paste whole repos just to get a new repo kickstarted - sometimes this means introducing whole pieces of a service which technically work, but are not used.
- Developers also have a tendency not to delete code. Part of the reason someone would be afraid of deleting code is simply not knowing how to identify that something isn't in use. Since just leaving code in place is very unlikely to break anything, a lot of developers choose this option.
Both of the above are really terrible for the future maintainer of a repo. You have to be really careful not to waste time interpreting code that, it turns out, hasn't been executed in the past 2 years and is effectively dead. So one of the things I might prioritize before adding features to a legacy system is to try to remove all the live but unused code.
I can generally do the above by (programmatically) looking at metrics and logs (including access logs) of different resources. If there's no traffic, I'd try to identify where that resource is created and delete it. If I'm lucky, that also allows me to delete a bunch of code directly associated with it. Each time some piece of code, like a function, is identified as dead, I can deduce that the consumers of that function are also dead, and work my way up the call stack, ruthlessly deleting.
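To make the first step concrete, here's a minimal sketch of the idea, assuming plain-text access logs in a local directory and a hand-maintained list of routes - the directory name, the route names, and the log format are all made up for the sake of the example:

```python
import re
from collections import Counter
from pathlib import Path

# Hypothetical route list - in practice I'd pull this from the service's router/config.
KNOWN_ROUTES = ["/orders", "/orders/export", "/invoices", "/internal/reindex"]

# Hypothetical access-log format, e.g. ... "GET /orders/123 HTTP/1.1" 200 ...
REQUEST_RE = re.compile(r'"(?:GET|POST|PUT|PATCH|DELETE)\s+(\S+)')

def route_hits(log_dir: str) -> Counter:
    """Count requests per known route across all access logs in a directory."""
    hits = Counter({route: 0 for route in KNOWN_ROUTES})
    for log_file in Path(log_dir).glob("*.log"):
        for line in log_file.read_text(errors="ignore").splitlines():
            match = REQUEST_RE.search(line)
            if not match:
                continue
            path = match.group(1)
            # Attribute the hit to the longest matching route prefix, if any.
            candidates = [r for r in KNOWN_ROUTES if path == r or path.startswith(r + "/")]
            if candidates:
                hits[max(candidates, key=len)] += 1
    return hits

if __name__ == "__main__":
    for route, count in sorted(route_hits("./access-logs").items(), key=lambda kv: kv[1]):
        flag = "  <- no traffic, candidate for removal" if count == 0 else ""
        print(f"{count:>8}  {route}{flag}")
```

A zero count of course only proves there was no traffic during the window the logs cover, so rarely used admin endpoints and periodic jobs still deserve a manual sanity check before anything is actually deleted.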
Prevention
So if the problem is partially caused by multiple people and multiple teams creating and maintaining a service in the past, how do I prevent the problem from repeating in the future? To be honest, I hadn't really thought much about that at the time of recording the podcast, but I do believe that the way I work goes a long way toward mitigating the problems of legacy code that I've seen.
Some examples that I bring up in the podcast:
- Any sufficiently complicated problem that I run into when working with a legacy system, I generally have to document for myself. This usually involves drawing an architecture diagram of the current state (ideally programmatically - see the sketch after this list - but generally manually and focused on the relevant parts). This helps me share and sanity check a solution proposal with the team I'm in, but most importantly, the information is there for myself now and a maintainer in the future.
- The teams I'm in naturally grow a better understanding of the systems we work on through code review. This one might be obvious, but a code review allows us not only to share knowledge (and in doing so break down silos of information), but also to question decisions on a lower level. This somewhat simple extra step allows us to basically step into the shoes of that future maintainer who might have to read that piece of code again. If the intent and the solution aren't clear to the reviewer today, they definitely won't be clear to whoever reads the code 7 years from now, with nobody left to ask for clarification.
- KISS. We talk about the tropes of under- and overengineering. I conclude that I more often see underengineering (quick, not necessarily performant hacks that work) at the low level, but overengineering (very complicated solutions to problems that, at the very least, can be solved more easily today) at the architecture/infrastructure level. Quick hacks are fine, especially if they're POCs or truly temporary solutions, but at some point you have to put in the work to refine and clean up that technical debt. And when I'm not creating a solution from scratch, but rather maintaining something existing, this is where I'd make note of issues that I detect but don't have the bandwidth to fix right away.
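To give a flavour of the "ideally programmatically" part above, here's a minimal diagram-as-code sketch using the graphviz Python package (one option among many); the component names are made up and just stand in for whatever the relevant parts of the legacy system turn out to be:

```python
from graphviz import Digraph  # assumes the 'graphviz' package and the dot binary are installed

# Made-up components standing in for the relevant parts of a legacy system.
diagram = Digraph("legacy-overview", format="png")
diagram.attr(rankdir="LR")

diagram.node("gateway", "API Gateway")
diagram.node("orders", "Orders Service\n(legacy, being taken over)")
diagram.node("billing", "Billing Service")
diagram.node("db", "Orders DB", shape="cylinder")

diagram.edge("gateway", "orders", label="REST")
diagram.edge("orders", "billing", label="events")
diagram.edge("orders", "db")

# Writes legacy-overview.gv and legacy-overview.gv.png next to the script.
diagram.render(view=False)
```

The nice thing about keeping a diagram as code next to the service is that it can be reviewed and updated in the same pull requests that change the architecture, instead of quietly rotting in a wiki.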