DEV Community

Discussion on: What's your worst technical debt story?

Dustin King

Early 2000s: a program is written as a TCP-based server for a new-ish application-level protocol. It's a C++ program written as a prototype, runs on someone's desktop computer, and has a GUI (this is one program, not a GUI client for a non-GUI server).

2009: I'm hired to add some filtering features to this program. It's a mess. Within any function, there are no meaningful variable names: everything is just n1, n2, s1, etc. Sometimes a single variable is reused for multiple purposes within a function. I later wonder if the research group that wrote it has been passing us an obfuscated version of the code. The filtering project goes fine, though there's not enough time to refactor the program as a whole. The research group (out of state) that wrote it is still trying to keep control of the codebase, and we have other projects, so there are only so many changes we can make.

It still has a GUI, but my team runs it on a server in an always-open user session.

2014-ish: Windows 2003 is going EOL, and support for the always-on user session for the GUI doesn't exist in Windows 2008. Because my team has a lot of other projects, the research team is tasked with the upgrade. Besides splitting the GUI and server portions of the program, they also undertake a major rewrite of the codebase using modern C++ practices (such as naming your variables), Boost, etc. They lose funding and are laid off halfway through, and we inherit the code.

We test the new version in our staging environment and put it in production. The new version has significant problems with crashing and data staleness in production, but we can't revert because Windows 2003 has gone away. I'm assigned back to the project to help fix it. The codebase is a mess of the old code (basically all of it) plus the new code (as a bridge between the GUI and server portions of the old code). I have an idea for a fix. Basically, I make some things asynchronous that were synchronous (which adds complexity, so I don't envy whoever inherited it after that). I think it will take a day. It takes a month. The fix seems to help. I later learn there's still crashing, but I hadn't been hearing about it because direct communication between the dev and ops teams is no longer encouraged. (I would later have an epiphany: "They just made the queues too big!" but by this point I no longer work there.)
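
To illustrate the "queues too big" epiphany, here is a rough sketch of how queue size can drive staleness once a synchronous hand-off becomes asynchronous. It is a toy model in Python, not the actual C++ code. The story doesn't say which queues were involved or how they were sized, so the class, the `simulate` function, and all the numbers are hypothetical. The assumption is a producer that outpaces its consumer: an oversized queue lets the backlog (and the age of the data being read) grow without limit, while a small drop-oldest queue caps how stale an item can get at the cost of discarding some.

```python
from collections import deque


class BoundedLatestQueue:
    """Hypothetical hand-off buffer that discards the oldest item when full."""

    def __init__(self, max_items):
        # deque(maxlen=...) silently drops from the left once it is full.
        self._items = deque(maxlen=max_items)
        self.dropped = 0

    def put(self, item):
        if len(self._items) == self._items.maxlen:
            self.dropped += 1  # the oldest entry is about to be discarded
        self._items.append(item)

    def get(self):
        return self._items.popleft() if self._items else None


def simulate(max_items, produced=1000, consume_every=2):
    """Produce one item per tick; consume one item every `consume_every` ticks.

    Returns (worst_lag, dropped), where lag is how many ticks old an item
    was at the moment the consumer finally read it.
    """
    queue = BoundedLatestQueue(max_items)
    worst_lag = 0
    for tick in range(produced):
        queue.put(tick)  # the item value doubles as its creation time
        if tick % consume_every == 0:
            item = queue.get()
            worst_lag = max(worst_lag, tick - item)
    return worst_lag, queue.dropped


if __name__ == "__main__":
    print("huge queue :", simulate(max_items=10_000))
    print("small queue:", simulate(max_items=8))
```

In this toy model the huge queue never drops anything, but the consumer falls further and further behind, while the small queue keeps the worst-case lag below its capacity. Whether dropping data would have been acceptable for the real protocol is a separate question that the story doesn't answer.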

Meanwhile, someone on the local architecture team is allowed to start rewriting the server portion of the program (in C#), and it looks really promising. But then we get new security requirements, so all devs are tasked with remediating the flood of scan findings for about 9 months, and the rewrite is put on the back burner. Shortly after the security work dries up, I leave (as did a lot of devs after being given 6+ months of mostly sysadmin work), so I don't know if the rewrite was ever allowed to proceed.