Migration Series — Lesson 2: Unearthing the Truth: A Strategic Approach to Legacy System Assessment
The Need for Deeper Upfront Legacy System Assessments
The first step in any large-scale data migration is often the most contentious. Leaders face a difficult balance between the technical need for a deep, upfront analysis of the legacy system and the business’s demand to show progress and deliver value quickly. The traditional method, where a lengthy assessment must be completed before any development begins, is a trap: it often leads to “analysis paralysis,” stalling the project before it ever really starts.
A successful migration requires a new mindset. The initial assessment should not be a “frozen” waterfall stage but a dynamic, parallel activity. See the first article of the series here: Lesson 1.
The Parallel-Track Solution: Assess and Build Simultaneously
The most effective strategy is to perform a deep assessment in parallel with building the foundational elements of your new cloud platform, like Snowflake. This approach allows you to demonstrate progress on two fronts: you are de-risking the migration by understanding the legacy environment while simultaneously building the core infrastructure needed for the future state.
This foundational work is a critical part of a “smart start” and should include:
- Defining items like security roles and access controls. In the Snowflake case, see this list to get you started.
- Establishing CI/CD pipelines early to enable automation for your transformation workflows and Snowflake Account Object Management (security, warehouses, governance, and so on).
- Specifying the base structures for your code deployment and testing frameworks.
- Ingesting the raw data required for this phase, using an ELT approach.
By doing this in parallel, the assessment informs the build, and the build provides a tangible platform for the eventual migration.
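To make this foundational work concrete, here is a minimal sketch of how the account objects above could be scripted so a CI/CD job applies them on every run. It uses the snowflake-connector-python package; the role, warehouse, and database names, and the environment variables, are illustrative assumptions rather than a recommended layout.

```python
# Minimal sketch: idempotent Snowflake foundation objects applied from a CI/CD job.
# Object names and environment variables are illustrative assumptions.
import os
import snowflake.connector

FOUNDATION_STATEMENTS = [
    "CREATE ROLE IF NOT EXISTS MIGRATION_ADMIN",
    "CREATE WAREHOUSE IF NOT EXISTS TRANSFORM_WH WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 60",
    "CREATE DATABASE IF NOT EXISTS RAW",
    "GRANT USAGE ON DATABASE RAW TO ROLE MIGRATION_ADMIN",
    "GRANT USAGE ON WAREHOUSE TRANSFORM_WH TO ROLE MIGRATION_ADMIN",
]

def apply_foundation() -> None:
    """Apply the foundation objects; every statement is idempotent, so reruns are safe."""
    conn = snowflake.connector.connect(
        account=os.environ["SNOWFLAKE_ACCOUNT"],
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
        role=os.environ["SNOWFLAKE_ROLE"],  # a role privileged to create these objects
    )
    try:
        cur = conn.cursor()
        for statement in FOUNDATION_STATEMENTS:
            cur.execute(statement)
    finally:
        conn.close()

if __name__ == "__main__":
    apply_foundation()
```

Because every statement is a CREATE IF NOT EXISTS or a grant, the script can sit in the same pipeline that will later deploy your transformation code and run on every commit.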
Redefining “Assessment”: From Exhaustive Inventory to Strategic Survey
This approach also changes the nature of the assessment itself. The goal is no longer a traditional, exhaustive inventory of every single legacy asset. Instead, it becomes a “survey and risk assessment” activity designed to learn just enough to begin the migration intelligently. The purpose is to identify a valuable, technologically representative Minimum Viable Product (MVP) without boiling the ocean.
Your assessment should be surgically focused on the following:
- Pick a thin slice: Choose one that aligns with a core stakeholder need, for example a critical BI asset for Sales with metrics like Annual Contract Value, or a directly operational use case such as risk score reports. Pick something that does not touch everything in your ecosystem; ideally, focus on one or two source systems.
- Inventory what matters: Instead of cataloging everything, use the thin slice as your scope and inventory only what relates to it: the source systems it touches and the assets it depends on.
- Map dependencies for the MVP: Conduct end-to-end dependency mapping specifically for the selected MVP candidate, not the entire legacy system.
- Analyze complexity to de-risk: Analyze the complexity of your first target to ensure it is achievable and won’t become a nightmare to deliver.
This reflects a migration mindset rather than legacy thinking. It’s not focused on thoroughly assessing old programs' performance but on understanding their logic and risks sufficiently to proceed.
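To show how small a scoped inventory can be, here is a hedged sketch that scans the legacy SQL behind the thin slice and writes out the schema.table references it finds. The directory layout, file extension, and regex are assumptions; real legacy code usually needs more careful parsing, which is exactly where LLMs come in next.

```python
# Minimal sketch: build a table inventory scoped to the thin slice by scanning
# the legacy SQL scripts behind it. Paths and the regex are illustrative assumptions.
import csv
import re
from pathlib import Path

# Matches schema.table references that follow FROM/JOIN/INTO/UPDATE keywords.
TABLE_REF = re.compile(
    r"\b(?:FROM|JOIN|INTO|UPDATE)\s+([A-Za-z_][\w$]*\.[A-Za-z_][\w$]*)",
    re.IGNORECASE,
)

def inventory_thin_slice(script_dir: str, output_csv: str) -> set:
    """Collect distinct schema.table references from the scripts behind the MVP slice."""
    tables = set()
    for path in Path(script_dir).rglob("*.sql"):
        text = path.read_text(errors="ignore")
        tables.update(match.group(1).upper() for match in TABLE_REF.finditer(text))
    with open(output_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["schema", "object"])
        for reference in sorted(tables):
            schema, obj = reference.split(".", 1)
            writer.writerow([schema, obj])
    return tables

# Example: inventory_thin_slice("legacy/sales_acv", "outputs/mvp_base_tables.csv")
```

Even a rough list like this is enough to kick off dependency mapping and complexity conversations for the MVP.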
How LLMs Accelerate the Strategic Survey
Throughout this series, I will discuss LLMs and their role in speeding up the work. For most of the migration, I use a prompt instructing the LLM that it should handle roughly 75% of the work itself, rather than merely generating code that implements the logic. In my experience, an LLM can analyze most simple to intermediate systems for logic and risk, giving you an overview far faster than a human could.
Here are some things that you can do in this stage:
- Rapid Complexity Analysis: You can feed legacy ETL scripts or database procedures into an LLM to get a quick read on their complexity. This helps you triage potential MVP candidates and avoid inadvertently selecting a “nightmare” target for your first phase (a minimal sketch of such a triage call follows this list).
- Scoped Dependency Mapping: Instead of manually combing through code, you can use an LLM to analyze it for a specific business area. Ask it to identify the source and target tables for a proposed MVP, speeding up dependency mapping for that part of the system. I used an LLM to analyze an SAP universe file and quickly identify the base tables; the same information is available in the accompanying PDF (around page 415), but I guarantee it will take longer to read that PDF than to have the LLM do it for you.
- On-Demand Documentation: LLMs can instantly generate plain-language explanations of legacy code. This allows your team to quickly understand the business logic within a potential MVP without spending months on traditional documentation efforts.
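To ground these three uses, here is a hedged sketch of the kind of call involved, using the OpenAI Python SDK purely as an example provider. The prompt wording, model name, and file path are my own assumptions; any capable model, including those hosted in Snowflake Cortex, can play the same role.

```python
# Minimal sketch: ask an LLM for a complexity, dependency, and logic triage of one
# legacy script. The model name, rubric, and path are illustrative assumptions.
from pathlib import Path
from openai import OpenAI

TRIAGE_PROMPT = """You are assessing a legacy ETL script for a cloud migration.
Return:
1) A complexity rating (low / medium / high) with a one-line justification.
2) Every source and target table the script touches.
3) A plain-language summary of the business logic.
Script:
{script}
"""

def triage_legacy_script(path: str, model: str = "gpt-4o") -> str:
    """Return an LLM-written complexity, dependency, and logic summary for one script."""
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    script = Path(path).read_text(errors="ignore")
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": TRIAGE_PROMPT.format(script=script)}],
    )
    return response.choices[0].message.content

# Example: print(triage_legacy_script("legacy/sales_acv/load_contracts.sql"))
```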
By combining a parallel-track methodology with the accelerating power of LLMs, you can satisfy the dual mandate of any migration: building a robust future platform while delivering tangible business value from the very beginning.
Here is a sample prompt that analyzes SAP BI universes without any specialized software:
You are an expert SAP/BO universe analyst and dbt/Snowflake data modeler on macOS. Repeat the following end-to-end workflow for a SAP universe and produce all deliverables exactly as specified.
Context
- OS: macOS
- Repo (working): /Users/augustorosa/Documents/git/sap-universe-to-dbt
- Universe file: sample/RW_SalesAndOperations.unv
- Target Snowflake source schema name: sap_prd
Objectives
1) Analyze the .unv and extract SQL/table metadata
- Detect that .unv is a zip; list and extract contents
- Parse UNW_Storage/Tables, Joins, Objects; find SQL-like text (SELECTs, joins, @Prompt, AFS/"...").
- Build a complete inventory of referenced base objects: schema.object (e.g., prd.VBAK, prd."/AFS/MARM", Z* views/tables); include AFS SUB_* list.
2) Produce CSV inventories
- outputs/base_tables_full.csv: schema,object including prd.* plus any unqualified table names found in Objects
- outputs/base_tables.csv: simplified schema,object unique base list (optional)
- outputs/base_tables_universe_only.csv: universe-only objects not found in the PDF (see step 4)
3) ERD
- Create two ERDs (Mermaid erDiagram):
a) Core OTC: VBAK, VBAP, VBEP, LIKP, LIPS, VBRK, VBRP, VBPA, KNA1 (+ keys/PK/FK, cardinalities)
b) Master/pricing/Z-views/AFS: MARA, MAKT, MVKE, ADRC, T005T, TVM*, TVV*, A904/KONP, Z* views; show N:1 text/pricing relationships and links to MARA
- Use clear PK/FK attributes, 1:N, N:M, and notes for VGBEL/VGPOS relationships.
4) PDF comparison
- Convert sample/Sales and Operations.pdf to text; extract prd.* tokens
- Compute overlaps and deltas vs universe inventory
- Write universe-only list (step 2c)
Deliverables
- CSVs:
- outputs/base_tables_full.csv
- outputs/base_tables.csv (optional simple)
- outputs/base_tables_universe_only.csv
- ERDs (Mermaid):
- Core OTC diagram
- Master/pricing/Z-views/AFS diagram
Constraints & Checks
- Use exact quoting for AFS names ("/AFS/…"); alias to safe snake_case in raw staging only.
- Keep ERD readable: split into 2+ diagrams to avoid overcrowding; include cardinalities and key fields.
At the end
- Print a short summary: counts of tables found, AFS SUB_* list, paths of produced CSVs, and created dbt files.
This is an illustrative prompt, not necessarily the best one, but the point stands: until now, someone would have had to write software to analyze existing environments or do the work manually. LLMs can take at least a portion of that work off your hands.
Let’s also remember that Snowflake offers access to the top models on its platform, and you can perform this analysis directly within Snowflake. I plan to cover this aspect in another article, including how to use the Snowflake Cortex platform for migrations.
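As a small preview, and assuming Cortex is enabled for your account and region, the same kind of triage can run without anything leaving Snowflake by calling the SNOWFLAKE.CORTEX.COMPLETE function. The model name below is an assumption; availability varies by region.

```python
# Minimal preview sketch: run the triage inside Snowflake via Cortex.
# Assumes Cortex is available in your account/region; the model name is an assumption.
import os
import snowflake.connector

def cortex_triage(legacy_sql: str, model: str = "mistral-large") -> str:
    """Ask a Cortex-hosted model to summarize one legacy script inside Snowflake."""
    conn = snowflake.connector.connect(
        account=os.environ["SNOWFLAKE_ACCOUNT"],
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
    )
    try:
        cur = conn.cursor()
        cur.execute(
            "SELECT SNOWFLAKE.CORTEX.COMPLETE(%s, %s)",
            (model, "Summarize the logic and list the tables used:\n" + legacy_sql),
        )
        return cur.fetchone()[0]
    finally:
        conn.close()
```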
Conclusion
A strategic legacy system assessment isn’t about stalling progress with endless analysis. It’s about finding the right balance — learning just enough to move forward intelligently while laying down the foundations of your future platform. In Lesson 1: The Power of a Strategic MVP, I emphasized the importance of carving out a first slice of value early in the migration. Lesson 2 builds on that idea: your assessment should be focused, parallel to the build, and accelerated by modern tools like LLMs.
This shift in mindset — from exhaustive cataloging to targeted surveying — helps you reduce risk, prevent analysis paralysis, and demonstrate progress simultaneously. It creates the foundation for migrations that are quicker, smarter, and more aligned with business goals.
In the next article, Migration Series — Lesson 3: Unearthing the Logic: A Realistic Guide to Business Logic Translation, I’ll dive into the next major challenge: how to extract, understand, and translate business logic from legacy systems into your new environment without losing meaning or control.
I am Augusto Rosa, a Snowflake Data Superhero and Snowflake SME. You can follow me on LinkedIn.
Subscribe to my Medium blog https://blog.augustorosa.com for the most interesting Data Engineering and Snowflake news.