Hunting for a Bug in a Prototype Project

#miniscript #miniscript2 #troubleshooting #bug

Every now and then, you end up hunting for a bug that seems like it couldn't possibly be happening, and you really have to wonder how you got there.

I had decided to submit some pull requests to the prototype project that Joe Strout is working on for MiniScript 2.0. Among the components of the prototype are an assembly language, an assembler, and a VM to run the assembled bytecode. (The latest benchmarks are very promising, by the way, suggesting that MiniScript really can enter a new era of high-speed scripting.)

I had already added a sample factorial assembly program, as well as support in both the assembler and VM for MULT and DIV opcodes. I thought I was quickly getting a sense of how the prototype was designed, and of what I would need to do to add anything further. RedSpark, from the MiniScript Discord, began adding support for more comparison opcodes to the VM. Today, I added support in the assembler for these same comparison opcodes, and that is where I came across the unexpected bug.

I wrote an assembly program to test the opcodes:

# A program that tests the comparison operators.
#  0 -> Working
# -1 -> Not working

@main:
    LOAD r0, 0
    LOAD r1, 1

    IFLT r1, r0, error # Testing <
    IFEQ r0, r1, error # Testing ==
    IFNE r0, r0, error # Testing !=
    IFLE r1, r0, error # Testing <=
    RETURN
error:
    LOAD r0, -1
    RETURN

This program should return 0, but it was returning -1, and only because of the line with IFEQ. The new IFEQ opcode was supposed to test whether two registers are equal and, if so, jump to the given label. In this case, it was jumping when the two values (0 and 1) were clearly NOT equal.

Where to start?

Well, the assembly program was very simple, and it was in fact correct.

I checked the code changes I had made to the assembler and couldn't find anything wrong. I checked RedSpark's changes to the VM, and I couldn't find anything wrong there either. I began adding some debug print statements. Quick-and-dirty debugging like this isn't pretty and certainly isn't proper, but reaching for a debugger seemed like overkill at this point.

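Well, this wasn't right. The debug output showed the VM executing IFLT on the line where it should have been executing IFEQ.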

Why was it executing IFLT instead of IFEQ? I saw two main possibilities. Either:

the assembler was translating the wrong opcode; or else
the VM was executing the wrong opcode.

Again, I went over both my changes and RedSpark's changes, but I still couldn't find anything wrong. I needed more information.

Well, this was even more puzzling: the opcodes produced by my code in the assembler were correct, but they did not match what the VM was executing. I added more debug printing in the assembler and turned on the existing debug variable in the VM to get additional output.

With the offending line commented out in the assembly code, the output told a strange story. My code was emitting opcodes 7, 9, and 10 correctly, but they were somehow all reaching the VM as opcode 7! Opcode 7 was IFLT, the original comparison opcode. This meant all of the comparison opcodes were somehow being run by the VM as IFLT. Somewhere in between my output from the assembler and the VM's execution, something was going awfully wrong. It was as if something was rewriting my output...

... and that's exactly what was happening. A little bit of background first, though. All of these comparison opcodes take 3 arguments: 2 registers to compare, and then an offset to jump by if the comparison is true. The offset can be either an immediate value (e.g. IFEQ r0, r1, 2) or a label (e.g. IFEQ r0, r1, someLabel). In this case, I was using a label ("error"). When you use a label, it has to be translated into an immediate offset during assembly.
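To make that translation concrete, here is a minimal Python sketch of how a two-pass assembler might turn a label into an offset. It is not the prototype's actual code: the function names are hypothetical, treating `@main:` as an ordinary label named `main` is an assumption, and so is the choice to count offsets from the instruction after the branch.

```python
def collect_labels(lines):
    """First pass: map each label to the address of the instruction after it."""
    labels, address = {}, 0
    for line in lines:
        text = line.split("#")[0].strip()  # drop comments and whitespace
        if not text:
            continue
        if text.endswith(":"):
            # Assumption: "@main:" is treated as a plain label named "main".
            labels[text.rstrip(":").lstrip("@")] = address
        else:
            address += 1  # every other non-blank line is one instruction
    return labels


def resolve_target(target, address, labels):
    """Second pass: turn a label into an offset; pass immediates through."""
    try:
        return int(target)
    except ValueError:
        # Assumption: offsets are relative to the instruction after the branch.
        return labels[target] - (address + 1)


SOURCE = """
@main:
    LOAD r0, 0
    LOAD r1, 1
    IFLT r1, r0, error # Testing <
    IFEQ r0, r1, error # Testing ==
    IFNE r0, r0, error # Testing !=
    IFLE r1, r0, error # Testing <=
    RETURN
error:
    LOAD r0, -1
    RETURN
""".splitlines()

labels = collect_labels(SOURCE)
```

Under these assumptions, `resolve_target("error", 3, labels)` gives the offset for the IFEQ line at address 3, while an immediate such as `"2"` passes through unchanged. Nothing in this step needs to know which comparison opcode the line uses.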

The bug was actually occurring during the assembler's second pass. When it translated the label into a direct offset, it also forcibly rewrote the opcode on that line, so every comparison that used a label left the assembler as IFLT, no matter which opcode I had emitted. Since IFLT was the only comparison opcode around when this part of the project was written, there were no consequences until people started adding more opcodes.
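The effect can be modeled in a few lines of Python. This is only a sketch under assumptions, not the prototype's code: the function names are made up, and the opcode numbers other than IFLT = 7 are guesses chosen to match the 7, 9, 10 seen in the debug output.

```python
import operator

# Assumed opcode numbers; the post confirms only that IFLT is 7.
OPCODES = {"IFLT": 7, "IFEQ": 8, "IFNE": 9, "IFLE": 10}
COMPARE = {7: operator.lt, 8: operator.eq, 9: operator.ne, 10: operator.le}


def rewrite_with_offset(instruction, offset, buggy):
    """Second-pass rewrite of a branch whose target was a label."""
    opcode, a, b, _label = instruction
    if buggy:
        # Models the old behavior: the rewrite hard-codes IFLT,
        # which was harmless while IFLT was the only comparison opcode.
        opcode = OPCODES["IFLT"]
    return (opcode, a, b, offset)


def jumps(instruction, regs):
    """Return True if the VM would take the branch."""
    opcode, a, b, _offset = instruction
    return COMPARE[opcode](regs[a], regs[b])


REGS = {"r0": 0, "r1": 1}
TESTS = [
    (OPCODES["IFLT"], "r1", "r0", "error"),
    (OPCODES["IFEQ"], "r0", "r1", "error"),
    (OPCODES["IFNE"], "r0", "r0", "error"),
    (OPCODES["IFLE"], "r1", "r0", "error"),
]


def failing_tests(buggy):
    """Indices of the test lines that would branch to `error`."""
    # The offset value does not matter for this model, so 0 is a placeholder.
    rewritten = [rewrite_with_offset(t, 0, buggy) for t in TESTS]
    return [i for i, t in enumerate(rewritten) if jumps(t, REGS)]
```

With `buggy=False` no line branches to `error`. With `buggy=True` only index 1, the IFEQ line, does: run as IFLT it asks whether 0 < 1, which is true, while the IFNE and IFLE lines happen to stay false as IFLT for these register values. That matches the symptom of the program failing only because of the IFEQ line.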

Programmers often say that programming is about thinking outside of the box. Troubleshooting can be that way, too. I had started with the assumption that the error was most likely going to be in the new code, but it's worth considering that an error can be in existing code that held to assumptions that aren't true anymore.

Hopefully we see MiniScript 2.0 soon!
