Shai Almog

Posted on Dec 23, 2021

End to end Debug Object Modeling. First hard problem in Building DDTJ - Day 4

#startup #opensource #java #programming

Yesterday I got the second PR out the door (and the third, although that one was automated). Today wasn't as productive...

Winter just started in full force around these parts, and yesterday, everything was flooded. This slowed down some of my work, especially after my son's kindergarten was flooded and he had to stay home. Still, I could make progress thanks to the fact that the code now runs end to end. That means I can start debugging the whole thing by using the command line and setting a breakpoint in the backend. Very convenient.

I can now finally start collecting data. I've also run into my first “hard” problem that I’ve been avoiding mentally for a while…

The Problem

Debugging the app is mostly trivial. Let’s say we invoke a method and get an event. Specifically, a MethodEntryEvent. In that event, we can grab the values of the arguments, etc. This is also the case later on when we invoke the APIs we need to mock. But that’s for later… Right now, let’s focus on that.

The debugger returns a somewhat problematic value type: a reference to data that lives in the debugged process. The debuggee is a remote VM, so we’ll need to pull all of that data locally to work with it. The core problem is with a deep object graph. E.g. let’s say I have a method such as:

void method(Root obj) {
    //...
}

Now Root contains a reference to every object in the system. If we want to invoke this method correctly in a future execution, do we need to pull Root and all its data?
That’s insane. There has to be a better way...

But that’s the smaller problem. Let’s say we have an object's data locally. How do we physically create an instance with the right values to pass to a method?
Do we use a constructor? A builder? Or is it all 100% mocked?

If it’s the latter, will the code be readable?

Will the test we generate pass?

I’ve given this a lot more thought over the past day, and I’m a bit conflicted here. An approach I sometimes take in these cases is to see these things all the way through, then refine the result to something I like. Sometimes the solutions reveal themselves as we get closer.

Here are a few ideas I had to solve the problem...

The “Insane” Idea

One idea I had is:

Watch every object creation and log that
If a new type of object is created or it’s created in a new way, we can connect the creation process to the fields
If the constructor is public, we can use that
If it’s a builder, we can follow the process from there

As I write this, the idea seems even crazier than it sounded in my head, but it might be doable. At least partially.

Not as Insane Idea - Heuristics

The insane idea is indeed a bit too much. But there are things we can do to reach a similar effect for most common code.

I’d say 98% of objects fall into pretty common patterns for creation and conventions. By just programming the most common heuristics, we can probably auto-generate 98% of the tests correctly and the last 2% well… That’s probably a tiny fix to make.

If a class has a setter matching the field name… Great, problem solved
If the class doesn’t have a default constructor but has a constructor that accepts parameters matching the fields, pass the fields based on argument name, or on type if the name doesn’t match
If neither of those matches, check for a static method that returns the class instance
Finally, look for builder calls

If I choose to take this route, I’ll probably implement the first two for the MVP.
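A minimal sketch of what the first two heuristics could look like, assuming plain reflection on locally loaded classes rather than the debugger API. All names here (`CreationHeuristics`, `choose`, `Person`, `Point`) are made up for illustration and are not DDTJ code. Constructor matching is done by type only, since parameter names are only available when the code is compiled with `-parameters`.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not taken from DDTJ.
class CreationHeuristics {
    enum Strategy { SETTERS, CONSTRUCTOR, UNKNOWN }

    static Strategy choose(Class<?> type) {
        List<Field> fields = instanceFields(type);
        if (hasPublicNoArgConstructor(type) && allFieldsHaveSetters(type, fields)) {
            return Strategy.SETTERS;
        }
        if (hasConstructorMatchingFields(type, fields)) {
            return Strategy.CONSTRUCTOR;
        }
        return Strategy.UNKNOWN;
    }

    private static List<Field> instanceFields(Class<?> type) {
        List<Field> result = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            if (!Modifier.isStatic(field.getModifiers()) && !field.isSynthetic()) {
                result.add(field);
            }
        }
        return result;
    }

    private static boolean hasPublicNoArgConstructor(Class<?> type) {
        try {
            type.getConstructor();
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    // Heuristic 1: every field has a public setter named set<FieldName>(fieldType).
    private static boolean allFieldsHaveSetters(Class<?> type, List<Field> fields) {
        for (Field field : fields) {
            String name = field.getName();
            String setter = "set" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
            try {
                type.getMethod(setter, field.getType());
            } catch (NoSuchMethodException e) {
                return false;
            }
        }
        return true;
    }

    // Heuristic 2: a public constructor whose parameter types match the field types
    // (compared as sorted type names, so argument order is ignored in this sketch).
    private static boolean hasConstructorMatchingFields(Class<?> type, List<Field> fields) {
        String[] fieldTypes = fields.stream()
                .map(f -> f.getType().getName()).sorted().toArray(String[]::new);
        for (Constructor<?> constructor : type.getConstructors()) {
            String[] paramTypes = Arrays.stream(constructor.getParameterTypes())
                    .map(Class::getName).sorted().toArray(String[]::new);
            if (Arrays.equals(fieldTypes, paramTypes)) {
                return true;
            }
        }
        return false;
    }
}

// Sample bean: default constructor plus setters.
class Person {
    private String name;
    private int age;
    public Person() {}
    public void setName(String name) { this.name = name; }
    public void setAge(int age) { this.age = age; }
}

// Sample immutable type: only a constructor that takes every field.
class Point {
    private final int x;
    private final int y;
    public Point(int x, int y) { this.x = x; this.y = y; }
}
```

With these samples, `choose(Person.class)` picks the setter route and `choose(Point.class)` picks the constructor route. Anything that falls through to `UNKNOWN` would be a candidate for the static factory or builder heuristics, or for mocking.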

The Hack

Another approach is to skip this altogether and serialize the object. We can allocate and inject the fields. Unfortunately, I doubt the code would look great. I think that’s the main reason the ideas above are so appealing.

However, for pure data objects, this might not be a terrible idea.
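As a rough illustration of the serialization route for a pure data object, here is a sketch that uses standard Java serialization. This is an assumption on my part: the post doesn't say which mechanism would be used. Deserialization allocates the instance and injects the fields without running the class's own constructor. `SerializationSnapshot` and `Money` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration only; not taken from DDTJ.
class SerializationSnapshot {
    // Turns the object's state into bytes that could be stored with a recorded call.
    static byte[] capture(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Rebuilds the object: the instance is allocated and its fields are injected,
    // without invoking the serializable class's constructor.
    static Object restore(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}

// Sample pure data object.
class Money implements Serializable {
    private static final long serialVersionUID = 1L;
    final String currency;
    final long cents;
    Money(String currency, long cents) {
        this.currency = currency;
        this.cents = cents;
    }
}
```

A generated test built this way would have to embed the captured bytes (for example as a Base64 string) and call something like `restore`, which works but tells the reader nothing about the object's contents.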

The Obvious Idea (Mocking)

Obviously, mocking all objects that are passed in has its value. But I’m not sure if that’s what we would really like to do. Mocking code is pretty verbose. It also doesn’t increase the coverage of the mocked class.

Even if we go with the mocked approach, this still wouldn’t be trivial and can end up nesting a lot since the mocked object might need to return another (mocked) object and so forth. The nesting can become quite difficult.
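To make the nesting concrete, here is a small sketch that stands in for a mocking library by using `java.lang.reflect.Proxy` from the JDK. The interfaces (`Order`, `Customer`, `Address`), the `RecordedStubs` helper and the recorded value are all hypothetical. A single call chain such as `order.customer().address().city()` already needs three stubs.

```java
import java.lang.reflect.Proxy;
import java.util.Map;

// Hypothetical domain interfaces for illustration only.
interface Address { String city(); }
interface Customer { Address address(); }
interface Order { Customer customer(); }

// Hand-rolled stubs standing in for a mocking library; not DDTJ code.
class RecordedStubs {
    // Builds a stub whose methods return recorded values keyed by method name.
    @SuppressWarnings("unchecked")
    static <T> T stub(Class<T> type, Map<String, Object> recorded) {
        return (T) Proxy.newProxyInstance(
                type.getClassLoader(),
                new Class<?>[] {type},
                (proxy, method, args) -> {
                    // Keep hashCode/equals/toString sane on the stub itself.
                    if (method.getDeclaringClass() == Object.class) {
                        switch (method.getName()) {
                            case "hashCode": return System.identityHashCode(proxy);
                            case "equals": return proxy == args[0];
                            default: return type.getSimpleName() + " stub";
                        }
                    }
                    return recorded.get(method.getName());
                });
    }

    // Each level of the object graph needs its own stub, built innermost first.
    static Order recordedOrder() {
        Address address = stub(Address.class, Map.of("city", "Haifa"));
        Customer customer = stub(Customer.class, Map.of("address", address));
        return stub(Order.class, Map.of("customer", customer));
    }
}
```

With a real mocking library the shape is the same: one mock per level, wired from the inside out, which is where the verbosity comes from.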

What I’m Doing Right Now

I’m still at the early stage of collecting data, so I’m just collecting value object data into a simplified object: essentially, a string holding the type plus an object holding the value. If it's a primitive value, it’s simple: I just store the value.
Initially, I tried to simplify the approach for an object. I thought that if it’s an object, I can create a Map with the fields of the object that aren’t transient. I’d have a nesting constant and a recursion blocker that would stop me from going too deep into an object hierarchy. This would have allowed me to detect a link to the same object and use the same reference internally to avoid a problem. That last part I might still need. I also have a plan to block recursion so it won’t go deeper than 3 levels by default.
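The Map idea can be sketched roughly as follows. This version uses local reflection as a stand-in for reading fields through the debugger, and every name in it (`ObjectSnapshot`, `capture`, `MAX_DEPTH`, `Node`) is hypothetical. It skips transient fields, reuses the same snapshot when it meets the same object again, and stops expanding at three levels.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not taken from DDTJ.
class ObjectSnapshot {
    static final int MAX_DEPTH = 3;
    static final String TRUNCATED = "<truncated>";

    static Object capture(Object value) {
        return capture(value, 0, new IdentityHashMap<>());
    }

    private static Object capture(Object value, int depth, Map<Object, Object> seen) {
        if (value == null || isLeaf(value)) {
            return value; // primitives (boxed), strings and other JDK values are stored as-is
        }
        Object known = seen.get(value);
        if (known != null) {
            return known; // same object seen before: reuse the same snapshot reference
        }
        if (depth >= MAX_DEPTH) {
            return TRUNCATED; // recursion blocker
        }
        Map<String, Object> fields = new LinkedHashMap<>();
        seen.put(value, fields); // register before recursing so cycles resolve
        for (Class<?> c = value.getClass();
             c != null && !c.getName().startsWith("java.");
             c = c.getSuperclass()) {
            for (Field field : c.getDeclaredFields()) {
                int mods = field.getModifiers();
                if (Modifier.isStatic(mods) || Modifier.isTransient(mods) || field.isSynthetic()) {
                    continue;
                }
                try {
                    field.setAccessible(true);
                    fields.put(field.getName(), capture(field.get(value), depth + 1, seen));
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        return fields;
    }

    // Simplification: JDK types, enums and arrays are not expanded in this sketch.
    private static boolean isLeaf(Object value) {
        Class<?> type = value.getClass();
        return type.isArray() || value instanceof Enum || type.getName().startsWith("java.");
    }
}

// Sample type with a self-reference and a transient field.
class Node {
    String name;
    Node next;
    transient int cacheHits;
    Node(String name) { this.name = name; }
}
```

For two `Node` objects that point at each other, the snapshot of the first contains the snapshot of the second, whose `next` entry is the very same Map instance as the first. That is the "same reference" behavior described above.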

But I don’t think that’s a workable solution. We’ll run into problems when we try to generate code based on that. I think we need to implement the “not as insane” idea, and I think that’s a workable approach. To get this working, I need an additional data point: a cache of the classes that we can create and those that we can’t. If we can create a class, we will mark it. Assuming there’s a process to create the class, we’ll know the fields that need saving.

If we can’t create a class, we might still have the option of generating a mock for that class, so a test might still be possible.
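A minimal sketch of such a cache, with mocking as the fallback. The names (`CreationCache`, `Verdict`, `verdictFor`) are hypothetical, and the `analyze` rule is a deliberately crude placeholder: a real version would plug in the creation heuristics. The cache is keyed by class name on the assumption that, in a debugger, the class itself lives in the other VM.

```java
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not taken from DDTJ.
class CreationCache {
    enum Verdict { CREATABLE, MOCK_ONLY }

    private final Map<String, Verdict> verdicts = new HashMap<>();

    // Analyzes each class once and remembers the answer by class name.
    Verdict verdictFor(Class<?> type) {
        return verdicts.computeIfAbsent(type.getName(), name -> analyze(type));
    }

    int size() {
        return verdicts.size();
    }

    // Placeholder rule: a concrete class with at least one public constructor
    // counts as creatable; everything else falls back to a mock.
    private static Verdict analyze(Class<?> type) {
        boolean concrete = !type.isInterface() && !Modifier.isAbstract(type.getModifiers());
        return concrete && type.getConstructors().length > 0
                ? Verdict.CREATABLE
                : Verdict.MOCK_ONLY;
    }
}
```

For example, `verdictFor(StringBuilder.class)` comes back as `CREATABLE`, while an interface such as `Runnable` is `MOCK_ONLY`. Asking about the same class again is just a map lookup.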

As I’m writing this, I’m also coding the logic, which will be extensive. Tomorrow it might turn out that I implemented something completely different...

Increasing Coverage

I merged the PR I worked on yesterday with decent coverage. Turns out I was missing the lombok.config file. It isn’t essential, but you need to add the entry:

lombok.addLombokGeneratedAnnotation=true

Otherwise, code generated by Lombok isn’t marked as generated code and is included in the coverage statistics. I had 30% coverage with 11 lines uncovered... Adding this entry raised it to 83%.

Today

One thing that sucked at the end of day yesterday… Turns out I copy pasted a typo in the document title for the last couple of days and it said DDJT instead of DDTJ. Ugh. I can’t fix it. It’s in the URLs, it’s syndicated etc. Spell checkers should really check the titles, but to be fair, they don’t even check the acronyms at all.

Hopefully, today will be better. I plan to pull out the object state and fields as we step over the code. I was hoping for something running by the end of the week, but that will probably only happen next week.

If you find this interesting/useful, it would be great if you follow me on Twitter...
