
Sam Wiley

Strike While the (Big) Iron's Hot

Making Legacy System Transformation Simple, Fast, and Safe

Summary

Mainframe modernization projects often fail due to ever-climbing development costs and ever-receding delivery dates. These failures stem from underestimating complexity and attempting ambitious transformation before laying the groundwork for safe, efficient change.

Modernization can be efficient, effective, and safe with the right strategy and some good tools.

I propose a three-phase approach that prioritizes software malleability while minimizing risk and controlling scope:

  1. Rewrite: Translate COBOL to portable, unit-tested Java
  2. Replatform: Move Java code from z/OS to the cloud
  3. Rearchitect: Refactor and reshape for improved maintainability and new capabilities

By proceeding through these phases in tightly scoped iterations, modernization can make steady progress while maintaining system stability, enabling gradual skill and knowledge transition, and avoiding the trap of ever-expanding systems and costs.

Bottom Line Up Front

Modernization projects fail by underestimating the challenge posed by legacy systems whose defining characteristics are being difficult to maintain and test. Success comes from making the software easier to change before attempting transformation. In other words: strike only after you’ve made the iron hot.

The Problem: Why Modernization Projects Fail

Challenges of Legacy Systems

Legacy COBOL applications are hard to change. Planning new development for these applications invariably relies on the expertise of the humans who have been maintaining their particular system for decades. You are unlikely to find anything resembling a library of unit tests that can be quickly run at the press of a button, a CI/CD pipeline with quality checks and smoke tests, or any other automated safety net system that might prevent a naive developer from making an innocent-looking change with unintended consequences.

Compounding the challenge of having no software-based safety net, legacy systems:

  • Are tightly coupled monoliths where most components are multi-purpose and therefore cannot be easily isolated or extricated (or fully understood).
  • Operate with zero fault tolerance since these systems process transactions that can have life-or-death consequences (Medicare, social security, unemployment, etc.).
  • Are extremely hardware efficient - having evolved with IBM hardware over decades - running in carefully right-sized on-prem data centers, leaving very little performance budget for inefficient changes.
  • Remain under the continuous flux of regulatory changes and maintenance updates, meaning no "freeze period" is possible for shepherding major transformations.

Given that all of these challenges can be present at once, it's easy to see why development teams encountering them for the first time underestimate the difficulty of transforming legacy systems.

The Modern Development Trap

All too often, I've seen software development teams enter a legacy modernization space and mistakenly assume that waterfall change management is still in place only because nobody has thought to go faster and adopt Scrum yet. Or that the design of the legacy system is simply incidental, a relic of a time when developers didn't know any better. Or - worst of all - they reach for the metaphor of archaeology, describing their foray into understanding the legacy system as "discovery," as if the scores of people who are actively maintaining the system don't exist.

It's easy to see how a team unfamiliar with legacy systems could underestimate these challenges and simply chalk up the technology, design, and change process to a lack of imagination. In my experience, these teams are often full of bright and capable engineers who are no slouches and who have done wonders in spaces with modern technology stacks. However, they quickly find themselves in a morass of unexpected constraints, relying on far more integrations with the legacy systems than they had originally imagined. This leaves them struggling to demonstrate meaningful progress in 2-week increments as their Scrum framework quickly becomes little more than a transparency mechanism rather than a process framework for facilitating truly agile development.

Every thread they pull seems to threaten to unravel the entire sweater. Every goal or deadline set seems to soften or recede in the face of late-discovered constraints. In these moments, where bona fide top-heavy (and justifiably expensive) development teams are delving into suspiciously low-level technical details to fill up the scheduled time of their sprint demos, project managers’ alarm bells should start to ring. They should begin to ask themselves some hard questions about return on investment.

Brief Thought Experiment

For a moment, forget the mainframe aspect of this scenario. Just imagine that you have inherited a large software system - millions of lines of code - and it is your sole responsibility for the next twenty years with no alternative jobs or responsibilities to consider.

The first change request arrives in your inbox: changes to processing rules, the expansion of some data models, and the implicit assumption that you'll figure out what this means for the auditing and reporting subsystems, and so on.

You dive into the code base and quickly realize that this multi-million-line monolithic Rube Goldberg machine has no tests. Then you notice that the conditional logic you need to tweak hinges on a cryptically named flag that also appears in several other places in the sprawling pipeline, and a shiver runs down your spine: any change could have unanticipated side effects, and there is no safety net.

If you're like most engineers I know, your first instinct should be to write some tests before you start tweaking things, because you know that unintended side effects can divert millions of life-saving dollars in the wrong direction. If your thoughts are in this neighborhood, then the suggestions I'm about to make should feel like common sense.

The Solution: A Three-Phase Strategy

Successfully modernizing complex legacy systems requires laying the groundwork for making transformative changes efficient and effective. I propose a three-phase approach, each with specific goals, outputs, and success criteria, to reliably and iteratively make these systems easier to change.

Phase 1: Rewrite (COBOL to Java)

Goal: Transform COBOL programs into portable, thoroughly tested Java code.

To be clear, I’m not talking about the vast marketplace of lousy JOBOL transpilers, especially now that LLMs are starting to show more promise.

In this phase, the first and most important step toward software agility is producing function-and-interface-equivalent program replacements with platform abstraction and unit tests. Constraining these rewrites to function and interface equivalency protects the work from scope creep while enabling the safety net of production parallel testing. This keeps the process straightforward and repeatable, letting developers build speed while sidelining sources of scope creep.
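As a toy illustration of what "function-and-interface-equivalent" means in practice (the paragraph name, rate, and field sizes here are invented, not from any real system), a COBOL computation might become a Java method that preserves the legacy fixed-point arithmetic exactly:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical rewrite of a COBOL paragraph named COMPUTE-LATE-FEE.
// If the COBOL source declared PIC 9(7)V99 fields, BigDecimal with
// scale 2 and HALF_UP rounding reproduces that fixed-point behavior,
// so outputs can be compared byte-for-byte against the legacy run.
public final class LateFeeCalculator {

    private static final BigDecimal RATE = new BigDecimal("0.015");

    // Same inputs and output as the legacy paragraph: the interface
    // is kept identical to enable production parallel testing.
    public static BigDecimal computeLateFee(BigDecimal balance) {
        return balance.multiply(RATE).setScale(2, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        System.out.println(computeLateFee(new BigDecimal("1000.00"))); // prints 15.00
    }
}
```

Note the deliberate absence of redesign: no new fields, no "improved" rounding, just the old behavior behind the old interface, now unit-testable off-mainframe.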

Key Benefits

  • Test Coverage: Unit test coverage is key for dramatically accelerating change verification, making it safer to deploy changes quickly.
  • Portability: Abstraction layers for platform-specific code ensure that the code and its tests can be run anywhere, accelerating the development cycle and avoiding the need to rewrite business rules and tests when migrating off z/OS.
  • Safety through Parallel Testing: Rewriting code with identical function and interface allows for a repeatable testing safety net that avoids accidental production changes which can damage people's lives and program credibility.

Abstraction for portability is the key technical design aspect. It allows your newly written Java code to run on any platform for which you have implementing classes - including your tests! Logic that previously could only be compiled, run, and tested on z/OS can now be developed and tested on developers' local machines, on z/OS, in a CI/CD pipeline running tests as a quality gate, or on cloud infrastructure handling a production workload.
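A minimal sketch of such an abstraction layer (all names here are hypothetical): business logic depends only on an interface, and each platform supplies its own implementation - a VSAM-backed class on z/OS, a database-backed class in the cloud, and an in-memory class for local unit tests:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Illustrative platform abstraction. Rewritten business code calls
// RecordStore and never touches platform-specific I/O directly.
interface RecordStore {
    void put(String key, String record);
    Optional<String> get(String key);
}

// In-memory implementation so unit tests run anywhere - a laptop,
// a CI/CD pipeline - with no mainframe connectivity required.
final class InMemoryRecordStore implements RecordStore {
    private final Map<String, String> records = new HashMap<>();

    public void put(String key, String record) {
        records.put(key, record);
    }

    public Optional<String> get(String key) {
        return Optional.ofNullable(records.get(key));
    }
}

public final class RecordStoreDemo {
    public static void main(String[] args) {
        RecordStore store = new InMemoryRecordStore();
        store.put("CUST-0001", "WILEY,SAM");
        System.out.println(store.get("CUST-0001").orElse("NOT FOUND")); // prints WILEY,SAM
    }
}
```

On z/OS, a sibling class implementing the same interface would wrap the real dataset access; swapping implementations is a wiring change, not a business-logic change.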

This first step is by far the most important. I should note that going program-for-program is a guideline meant to enable quick re-integration of rewritten software components, keeping dual-maintenance costs low. However, you shouldn't twist yourself into technical pretzels by taking this guidance as law. Especially with recent LLM improvements enabling much faster rewrites, you should trust your judgment on what feels like right-sized bites of work. The key is defining bites of work along interface boundaries that you can keep stable, enabling production parallel testing while you implement platform abstractions.
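The parallel-testing safety net itself can be sketched simply (a simplified harness for illustration, not a production tool): feed identical inputs to the legacy path and the rewritten path, and count disagreements before trusting the rewrite with real traffic.

```java
import java.util.List;
import java.util.function.Function;

// Sketch of a production-parallel comparison harness. Because the
// rewrite preserves function and interface, the legacy path (or its
// captured outputs) and the Java path can be fed identical inputs
// and diffed before any cutover decision.
public final class ParallelRun {

    // Counts inputs where the two implementations disagree.
    public static <I, O> long countMismatches(List<I> inputs,
                                              Function<I, O> legacyPath,
                                              Function<I, O> rewrittenPath) {
        return inputs.stream()
                .filter(in -> !legacyPath.apply(in).equals(rewrittenPath.apply(in)))
                .count();
    }

    public static void main(String[] args) {
        // Stand-ins for the real paths: in practice, legacyPath would
        // replay outputs captured from the mainframe batch run.
        Function<Integer, Integer> legacy = n -> n * 2;
        Function<Integer, Integer> rewrite = n -> n + n;
        System.out.println(countMismatches(List.of(1, 2, 3, 4), legacy, rewrite)); // prints 0
    }
}
```

Zero mismatches over a full production cycle is the evidence that lets you cut over with confidence; any nonzero count points you at a concrete, reproducible divergence to fix.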

Phase 2: Replatform (z/OS to Cloud)

Goal: Move tested, portable Java code from z/OS to cloud infrastructure.

“Cloud migration” can sound like the be-all-end-all finish line for a modernization project, so extra caution and awareness of the tradeoffs is warranted here.

Migrating to the cloud can unlock myriad benefits:

  • Flexibility of pay-as-you-go hardware resources.
  • Reliability through configured redundancy that can be spun up at multiple data centers across the world for disaster recovery.
  • Ease of deployment in a pipeline that is separate from the mainframe’s deployment, even if they still need coordination.
  • Access to the vast marketplace of productivity tools for developers and program and data analytics tools to give insights to business owners.

Hybrid-platform applications also have several challenges to be aware of:

  • Complexity from fragmenting an application across multiple platforms makes it harder to understand and troubleshoot.
  • Latency from substituting a sub-millisecond local program call to a 50ms API call at a significant scale has the potential to threaten batch cycle timing requirements.
  • Cost, since new cloud infrastructure costs money without immediately letting you reduce the costs of your on-premises data center. New infrastructure also requires bringing more engineers with more skillsets into the broader program team.
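To see how quickly that latency difference can consume a batch window, a back-of-the-envelope estimate helps (the nightly call volume here is an assumption for illustration, not a measurement):

```java
// Illustrative only: the call volume and latencies are assumed
// figures chosen to show how cross-platform calls scale up.
public final class BatchWindowEstimate {

    // Wall-clock hours for a fully serial sequence of calls.
    static double hours(long calls, double msPerCall) {
        return calls * msPerCall / 3_600_000.0;
    }

    public static void main(String[] args) {
        long calls = 2_000_000; // assumed nightly batch record volume
        System.out.printf("local  (0.5 ms/call): %.1f h%n", hours(calls, 0.5));
        System.out.printf("remote (50 ms/call):  %.1f h%n", hours(calls, 50.0));
    }
}
```

At 50 ms per call, two million serial calls take roughly 28 hours - far beyond any nightly window - which is why migrating substantial components together, rather than scattering fine-grained cross-platform calls, matters so much.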

For this stage, I recommend migrating only substantial, fully translated functional components of your application. Relevant factors in deciding how much code to replatform at once include data dependencies and locality, whether your application can afford the runtime degradation, and how many times you want to set up temporary cross-platform connective tissue.

Be mindful of how easy your system is to understand, maintain, and troubleshoot as a whole. The total size of your system, the variety of its tech stack, and any temporary software whose purpose is multi-language or multi-platform integration should all be considered sources of cost, drag, and risk. Keeping modernization work simple and lean is the key to building and maintaining project speed.

Phase 3: Refactor, Rearchitect (Monolith to Modular)

Goal: Reshape your system as you see fit, whether to gain further maintainability benefits or to begin exploring new feature development.

Now that your application is much easier to change with modern testing capabilities, and much easier to deploy with modern CI/CD, you stand a much better chance at implementing more substantial system transformations. Since the Rewrite phase was constrained against any real redesign, your software team will likely have some ideas about ways that refactoring and rearchitecting could simplify your software, improve its testability, or yield other benefits.
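As a small, entirely hypothetical example of the kind of reshaping this phase enables: a multipurpose routine carried through the Rewrite phase verbatim can now be split into focused, independently testable collaborators while its outward behavior - already validated by parallel testing - stays the same:

```java
import java.math.BigDecimal;

// Hypothetical Phase 3 refactoring sketch. In the Rewrite phase this
// logic would have lived in one multipurpose method; with a test
// suite in place, it can safely be split into single-purpose parts.
interface EligibilityCheck {
    boolean isEligible(String claimantId);
}

interface PaymentCalculator {
    BigDecimal calculate(String claimantId);
}

// The original interface stays stable; only the internals change.
final class ClaimProcessor {
    private final EligibilityCheck eligibility;
    private final PaymentCalculator calculator;

    ClaimProcessor(EligibilityCheck eligibility, PaymentCalculator calculator) {
        this.eligibility = eligibility;
        this.calculator = calculator;
    }

    BigDecimal process(String claimantId) {
        // Same observable behavior as the monolithic version:
        // ineligible claimants receive a zero payment.
        return eligibility.isEligible(claimantId)
                ? calculator.calculate(claimantId)
                : BigDecimal.ZERO;
    }
}

public final class ClaimProcessorDemo {
    public static void main(String[] args) {
        ClaimProcessor processor = new ClaimProcessor(
                id -> id.startsWith("CL"),        // stub eligibility rule
                id -> new BigDecimal("250.00"));  // stub flat payment
        System.out.println(processor.process("CL-1001")); // prints 250.00
        System.out.println(processor.process("XX-9999")); // prints 0
    }
}
```

Each collaborator can now evolve and be tested on its own, which is precisely the maintainability payoff the first two phases were buying.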

Whether you want to continue investing in maintainability or would rather start filling a backlog of new features and user-experience improvements, you now stand a far better chance of success than you did before laying the technical groundwork for making change as safe and as easy as possible.

In Closing: Simplicity Enables Speed

Mainframe legacy applications are large, complex, and mission-critical. Their function and structure are deeply intertwined, with each component having evolved to serve multiple purposes. Attempting to transform these systems before taking the time to fully understand them and building a safety net for changes is a misadventure where every step lands on a rake.

"Strike while the iron's hot" is common sense, and it's the perfect analogy for the strategy I recommend. Invest time in making your system easier to change before you try to radically transform it. Rewriting COBOL into portable, unit-tested Java, while adhering to functional and interface equivalence (both to protect from scope creep and to enable parallel production testing), is the most efficient path to software malleability. This critical first step of moving from COBOL to a more productive language can be the easiest step if you focus on it exclusively - or, if you ignore the defining challenge of these systems, it becomes the constant burden that makes the difference between sinking and swimming.

Honorable Not-Mentioneds

There are a lot of topics omitted from this paper: data modernization, interface modernization, the set of software utilities that enable COBOL/Java integration to help you hit the ground running, and so much more on project planning, management, and success metrics.

I may write about these topics if there’s enough interest, but I’m always happy to chat about this and more for those who are interested.

Reach Out

If you like the sound of this strategy and you'd like some help applying it to your particular project, I would like to help. I can be reached at:

About the Author

Sam Wiley is a software engineer and Boring Technology advocate specializing in mainframe modernization. He has hands-on experience as a lead developer on a modernization team with accomplishments ranging from building new capabilities from knowledge buried in IBM redbooks to building and overseeing successful production modernization deployments of software that processes billions of dollars annually. He has also acted as a technical advisor coordinating modernization work across multidisciplinary teams, planning and overseeing a multitude of projects. His approach to modernization emphasizes pragmatic, risk- and cost-minimized transformation strategies that balance technical innovation with operational stability.
