I just watched a video published 2 days ago: "5 Tips to Understand Legacy Code" by Jonathan Boccara:
I like it a lot, yet I feel that something is missing here. The author gives 5 tips for how can you start figuring out a large codebase that is new to you.
- "Find a stronghold" - a small portion of the code (even single line) that you understand perfectly and expand from there, look at the code around it.
- "Analyze stacks" - put a breakpoint to capture a representative moment in the execution of the program (with someone's help) and look at call stack to understand layers of the code.
- "Start from I/O" - analyze the code that processes data at the very beginning (input data, e.g. source file) or at the end (output data).
- "Decoupling" - learn by trying to do some refactoring of the code, especially to decouple some of its parts.
- "Padded-room" - find a code that doesn't depend on any other (has limited scope) and go from there.
These are all great advices. I agree with them. But computer programs have two aspects: dynamic (the way the code executes over time - algorithms, functions, control flow) and static (the way data are stored in memory - data structures). I have a feeling that these 5 points focus mostly on dynamic aspect, so as an advocate of "data-oriented design" I would add another point:
6. "Core data structure": Find structures, classes, variables, enums, and other definitions that describe the most important data that the program is operating on. For example, if it's a game engine, see how objects of a 3D scene are defined (some abstract base CGameObject class), how they can relate to each other (forming a tree structure or so-called scene graph), what properties do they have (name, ID, position, size, orientation, color), what kinds of them are available (mesh, camera, light). Or if that's a compiler, look for definition of Abstract Syntax Tree (AST) node and enum with list of all opcodes. Draw UML diagram that will show what data types are defined, what member variables do they contain and how do they relate to each other (inheritance, composition). After visualizing and understanding that, it will be much easier to analyze the dynamic aspect - code of algorithms that operate on this data. Together, they form the essence of the program. All the rest are just helpers.