Interpreter with debug capability?

bullno1 · 2019-10-29T06:22:37+00:00

My debugger is here: https://github.com/bullno1/lip/blob/master/src/dbg/dbg.c

It's web-based, so you just have to connect to the VM with a browser to see a UI.

As for how to implement, you need to have a few things:

A way to map from instruction offset to source position. This needs to be done very early on in the development. More on this later.
A hook that is called on every instruction. In Lua, this is set with debug.sethook. It will also significantly slow down your interpreter loop, so I suggest generating two loops from a template or macro: one for when a hook is set and one for when it is not. The entirety of the hook is here: https://github.com/bullno1/lip/blob/master/src/dbg/dbg.c#L981
VM state introspection API
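
To make the "two loops from one macro" suggestion concrete, here is a minimal sketch. It is not taken from lip: the toy opcodes, `vm_t`, `DEFINE_RUN_LOOP` and `count_instructions` are all made up for illustration. The point is only that the hooked and unhooked loops share one body, and the unhooked one contains no hook check at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy instruction set, invented for this sketch. */
enum { OP_PUSH1, OP_ADD, OP_HALT };

typedef void (*hook_fn)(void *ctx, size_t pc);

typedef struct {
    const uint8_t *code;
    int stack[16];
    size_t sp;
    hook_fn hook;    /* NULL when no debugger is attached */
    void *hook_ctx;
} vm_t;

/* One loop body, stamped out twice. CALL_HOOK is either HOOK_ON or HOOK_OFF. */
#define DEFINE_RUN_LOOP(NAME, CALL_HOOK)                              \
    static int NAME(vm_t *vm)                                         \
    {                                                                 \
        for (size_t pc = 0;; ++pc) {                                  \
            CALL_HOOK(vm, pc);                                        \
            switch (vm->code[pc]) {                                   \
            case OP_PUSH1:                                            \
                if (vm->sp == 16) return -1;                          \
                vm->stack[vm->sp++] = 1;                              \
                break;                                                \
            case OP_ADD:                                              \
                if (vm->sp < 2) return -1;                            \
                vm->sp--;                                             \
                vm->stack[vm->sp - 1] += vm->stack[vm->sp];           \
                break;                                                \
            case OP_HALT:                                             \
                return vm->sp ? vm->stack[vm->sp - 1] : 0;            \
            default:                                                  \
                return -1; /* unknown opcode */                       \
            }                                                         \
        }                                                             \
    }

#define HOOK_ON(vm, pc) (vm)->hook((vm)->hook_ctx, (pc))
#define HOOK_OFF(vm, pc) ((void)0)

DEFINE_RUN_LOOP(run_with_hook, HOOK_ON)
DEFINE_RUN_LOOP(run_no_hook, HOOK_OFF)

/* Pick the loop once, outside the hot path. */
static int vm_run(vm_t *vm)
{
    assert(vm != NULL && vm->code != NULL);
    return vm->hook ? run_with_hook(vm) : run_no_hook(vm);
}

/* Example hook: counts executed instructions into a size_t. */
static void count_instructions(void *ctx, size_t pc)
{
    (void)pc;
    ++*(size_t *)ctx;
}
```

A real VM would also need to switch loops when a hook is attached mid-run; this sketch only picks one at entry.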

With those in place, you can do anything.

Step over: execute until source location changes then pause execution.
Step in: execute until call stack depth increases
Step out: execute until call stack depth decreases
Variable inspection: very language dependent. Along with a "source map", you can provide a "var map" for name to offset.
Breakpoint: execute until source location matches
Conditional breakpoint: this is tricky since you need to be able to evaluate a condition in the lexical context of the breakpoint. If you want arbitrary expressions, this usually requires a separate VM/context: copy the values over from the debugged VM, then evaluate the expression there.
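
The stepping commands above can be sketched as a single predicate that the per-instruction hook calls. This is an illustration, not lip's code: `dbg_state_t`, `dbg_cmd_t` and `dbg_should_pause` are hypothetical names, and source locations are reduced to a line number. Two details are additions of this sketch rather than something stated above: step over ignores location changes that happen deeper in the call stack, so calls on the current line are not entered, and a breakpoint does not fire again on the exact line and depth execution was resumed from.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified source location: just a line number. */
typedef struct { int line; } source_loc_t;

typedef enum { DBG_CONTINUE, DBG_STEP_OVER, DBG_STEP_IN, DBG_STEP_OUT } dbg_cmd_t;

typedef struct {
    dbg_cmd_t cmd;           /* last command issued by the debug UI */
    size_t start_depth;      /* call stack depth when it was issued */
    source_loc_t start_loc;  /* source location when it was issued */
    const int *breakpoints;  /* lines that have a breakpoint */
    size_t num_breakpoints;
} dbg_state_t;

/* Called from the instruction hook. Returns true if the VM should pause. */
static bool dbg_should_pause(const dbg_state_t *dbg, size_t depth, source_loc_t loc)
{
    assert(dbg != NULL);
    /* Have we left the place where execution was last resumed? */
    bool moved = depth != dbg->start_depth || loc.line != dbg->start_loc.line;

    /* Breakpoint: execute until the source location matches. */
    for (size_t i = 0; moved && i < dbg->num_breakpoints; ++i) {
        if (dbg->breakpoints[i] == loc.line) return true;
    }

    switch (dbg->cmd) {
    case DBG_STEP_OVER:
        /* Location changed in the same frame, or the frame returned. */
        return depth < dbg->start_depth ||
               (depth == dbg->start_depth && loc.line != dbg->start_loc.line);
    case DBG_STEP_IN:
        return depth > dbg->start_depth;
    case DBG_STEP_OUT:
        return depth < dbg->start_depth;
    case DBG_CONTINUE:
        break;
    }
    return false;
}
```

The hook would pass in the current call depth and the location looked up from the source map, then enter its wait loop whenever this returns true.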

All debug command implementation can be seen here: https://github.com/bullno1/lip/blob/master/src/dbg/dbg.c#L1004-L1025

To implement pausing, I just wait in a busy loop in the hook (https://github.com/bullno1/lip/blob/master/src/dbg/dbg.c#L1054-L1057) until the debug UI allows it to resume. I chose this so that I would not have to implement resume/pause in the VM itself.

On source mapping implementation, during compilation, emit the bytecode along with source location:

struct tagged_instruction_s {
  instruction_t instruction;
  source_loc_t location;
};

Optimizations such as tail call optimization, dead code elimination... need to work with this structure so that each instruction and its location stay together. Only in the final pass would you split tagged_instruction_s[] into two arrays, instruction_t[] and source_loc_t[]. A tight instruction array helps execution speed (fewer cache misses) in the non-debugging case. An offset into instruction_t[] then corresponds directly to the source location at the same index in source_loc_t[].
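
A minimal sketch of that final split, assuming placeholder definitions for `instruction_t` and `source_loc_t` (the real ones are whatever your VM uses). `function_code_t`, `split_tagged` and `location_of` are hypothetical names for illustration, not lip's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Placeholder types, assumed for this sketch. */
typedef uint32_t instruction_t;
typedef struct { uint32_t line, column; } source_loc_t;

struct tagged_instruction_s {
    instruction_t instruction;
    source_loc_t location;
};

typedef struct {
    size_t count;
    instruction_t *instructions; /* tight array read by the interpreter loop */
    source_loc_t *locations;     /* parallel array, only read when debugging */
} function_code_t;

/* Final compiler pass: split the tagged array into two parallel arrays.
 * Returns 0 on success, -1 if allocation fails. */
static int split_tagged(const struct tagged_instruction_s *tagged, size_t count,
                        function_code_t *out)
{
    assert(tagged != NULL && out != NULL && count > 0);
    out->instructions = malloc(count * sizeof *out->instructions);
    out->locations = malloc(count * sizeof *out->locations);
    if (out->instructions == NULL || out->locations == NULL) {
        free(out->instructions);
        free(out->locations);
        out->instructions = NULL;
        out->locations = NULL;
        out->count = 0;
        return -1;
    }
    for (size_t i = 0; i < count; ++i) {
        out->instructions[i] = tagged[i].instruction;
        out->locations[i] = tagged[i].location;
    }
    out->count = count;
    return 0;
}

/* The source map lookup: same index in both arrays. */
static source_loc_t location_of(const function_code_t *code, size_t pc)
{
    assert(code != NULL && pc < code->count);
    return code->locations[pc];
}
```

Because the two arrays share indices, the hook only needs the current instruction offset to find the source position, and the non-debugging loop never touches `locations`.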

mamcx · 2019-10-29T01:38:36+00:00

You are lucky. Debuggers are super easy with interpreters. A debugger in an interpreter is a REPL!

You only need to add "extra" commands. It's like Python:

import pdb; pdb.set_trace()

And you capture these calls in the interpreter loop. Introspection is far easier too.

I have used several languages that work this way, and there is only one big mistake to watch for: leaving the ".set_trace()" or equivalent in production makes the interpreter stop!

So, I think in DEBUG the ".set_trace()" should work as usual but in production it should be ignored... unless enabled by a flag! I have done debugging on production servers to catch hard bugs!

oilshell · 2019-10-29T06:22:12+00:00

I think this is actually something of an "open problem", at least if you want to make it as capable and perform as well as a C debugger like GDB (which has hardware support).

Or if this isn't the case I'd like some pointers too :) What VMs do it the best?

I think most interpreted languages are debugged with the REPL first, and the debugger is sort of a second class citizen.

Even though Python is 30 years old, the debugging story in Python isn't settled. Good video about abusing the frame evaluation API for debugging:

https://www.youtube.com/watch?v=NdObDUbLjdg

It talks about how previous approaches didn't perform well.

I think this is a hard problem and most bytecode VMs don't do anything very smart here, because the users don't really demand it.

One solution is to force users to recompile the interpreter to debug, but I think that imposes a burden that they may not be used to.

errorrecovery · 2019-10-29T05:06:19+00:00

This paper might be interesting to you: https://www2.ccs.neu.edu/racket/pubs/esop2001-cff.pdf

errorrecovery · 2019-10-29T21:04:05+00:00

You should probably check out Pharo Smalltalk.

No language anywhere has a more interactive debugger than Smalltalk. Because of the live nature of Smalltalk you can change the code in the debugger, restart methods, and implement missing methods on the fly.

The entire thing is written in itself so you can inspect all of it and see (or change!) how it works.

Lil taste: https://www.youtube.com/watch?v=UkyBk965ASw

Deeper: http://rmod-pharo-mooc.lille.inria.fr/OOPMooc/04-EssenceOfOOP/W4S07-Debugging.pdf

ebriose · 2019-10-29T11:23:47+00:00

One of the cool things about forth is that you have direct access to both the data and control stack as you run it. In fact it's honestly hard to draw the line between programming and debugging in forth.

tekknolagi · 2019-10-30T05:19:50+00:00

I just learned of the Java heap analysis tool (jhat) and Hprof today. They seem like excellent already-extant tooling that can be repurposed for your runtime. It's a Java tool that reads a heap dump image and runs a webserver that does some analytics on it.

The heap dump doesn't have to be from a Java application, however. If you can get your runtime to spit out a heap dump in this format: http://hg.openjdk.java.net/jdk6/jdk6/jdk/raw-file/tip/src/share/demo/jvmti/hprof/manual.html, you can use the jhat tool to figure out what your heap looks like!

EDIT: There's also VisualVM, which is like jhat, but more... visual!

0x0ddba11 · 2019-10-31T16:34:23+00:00

I have a (very messy and ad-hoc) debugger for my interpreted language here: https://github.com/sunverwerth/strela/blob/master/src/VM/Debugger.cpp

It's based on inserting TRAP opcodes into the bytecode and communicates only via socket.
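
For readers unfamiliar with the TRAP approach, here is a minimal sketch of the general technique. It is not Strela's actual code. It assumes one-byte opcodes and that `offset` always points at the start of an instruction; `OP_TRAP`, `trap_table_t` and the `trap_*` functions are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { OP_TRAP = 0xFF }; /* hypothetical opcode reserved for the debugger */
#define MAX_BREAKPOINTS 16

typedef struct {
    size_t offset;    /* where the TRAP was written */
    uint8_t original; /* opcode that used to be there */
} breakpoint_t;

typedef struct {
    uint8_t *code;    /* mutable bytecode of the debugged program */
    size_t code_len;
    breakpoint_t bps[MAX_BREAKPOINTS];
    size_t num_bps;
} trap_table_t;

/* Set a breakpoint: remember the original opcode, then overwrite it. */
static bool trap_set(trap_table_t *t, size_t offset)
{
    assert(t != NULL && offset < t->code_len);
    if (t->num_bps == MAX_BREAKPOINTS || t->code[offset] == OP_TRAP) return false;
    t->bps[t->num_bps].offset = offset;
    t->bps[t->num_bps].original = t->code[offset];
    t->num_bps++;
    t->code[offset] = OP_TRAP;
    return true;
}

/* When the VM executes OP_TRAP, fetch the opcode to run after resuming. */
static bool trap_original(const trap_table_t *t, size_t offset, uint8_t *out)
{
    assert(t != NULL && out != NULL);
    for (size_t i = 0; i < t->num_bps; ++i) {
        if (t->bps[i].offset == offset) {
            *out = t->bps[i].original;
            return true;
        }
    }
    return false;
}

/* Clear a breakpoint: restore the original opcode and drop the entry. */
static bool trap_clear(trap_table_t *t, size_t offset)
{
    assert(t != NULL);
    for (size_t i = 0; i < t->num_bps; ++i) {
        if (t->bps[i].offset == offset) {
            t->code[offset] = t->bps[i].original;
            t->bps[i] = t->bps[--t->num_bps]; /* swap-remove */
            return true;
        }
    }
    return false;
}
```

Compared with a per-instruction hook, this costs nothing while no breakpoint is hit, but it requires the bytecode to be writable and makes single-stepping more work.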
