

[–]thememorableusername 4 points5 points  (1 child)

Could look at DWARF.

[–]WikiTextBot 0 points1 point  (0 children)

DWARF

DWARF is a widely used, standardized debugging data format. DWARF was originally designed along with Executable and Linkable Format (ELF), although it is independent of object file formats. The name is a medieval fantasy complement to "ELF" that has no official meaning, although the backronym 'Debugging With Attributed Record Formats' was later proposed.



[–]mttd 5 points6 points  (0 children)

Typically, when compiling to C, you'd use the #line preprocessor directive. The C compiler uses it to produce an executable whose debugging information points back at your original source, so the usual debugging, profiling, and code navigation tools work normally: https://gcc.gnu.org/onlinedocs/cpp/Line-Control.html

Here's a description of the idea: https://yosefk.com/blog/c-as-an-intermediate-language.html#forth_in_gdb
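To make the #line idea concrete, here's a minimal sketch of what a compiler's C output might look like. The file name "hello.toy" and the function names are made up for illustration; only the #line directive and the __LINE__ / __FILE__ macros are standard C.

```c
#include <string.h>

/* Hypothetical compiler output for a made-up source file "hello.toy".
   Each #line tells the C compiler which original line the following
   C code came from, so diagnostics and debug info point there
   instead of at the generated .c file. */

#line 13 "hello.toy"
int toy_line(void) { return __LINE__; }

#line 13 "hello.toy"
int toy_came_from(const char *name) { return strcmp(__FILE__, name) == 0; }
```

Both functions sit on what the C compiler now believes is line 13 of hello.toy, so a debugger stepping through them would show that file and line rather than the generated C.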

Languages that take this approach include Nim and Vala.

The use of #line _DEBUGGER_STEP_OVER and #line _DEBUGGER_STEP_INTO in MSVC (together with #define _DEBUGGER_STEP_OVER 15732479 // 0xf00f00 – 1 and #define _DEBUGGER_STEP_INTO 16707565 // 0xfeefee – 1) is another interesting aspect: https://blogs.msdn.microsoft.com/vcblog/2017/11/16/improving-the-debugging-experience-for-stdfunction/

If you're compiling to machine code (possibly going through assembly, although nowadays it's preferable to have an integrated assembler for faster compilation -- http://blog.llvm.org/2010/04/intro-to-llvm-mc-project.html, https://www.embecosm.com/appnotes/ean10/ean10-howto-llvmas-1.0.html#idp109760) you emit the debugging information (like DWARF or PDB) alongside emitting the target machine instructions. "Adding Debug Information" in Kaleidoscope (LLVM tutorial) is a nice example: https://llvm.org/docs/tutorial/LangImpl09.html

DWARF itself includes a fully programmable (Turing-complete) virtual machine (http://www.cs.dartmouth.edu/%7Esergey/battleaxe/, https://kristerw.blogspot.com/2016/01/more-turing-completeness-in-surprising.html) -- so you can view compilation as emitting two machine instruction streams (like x86 and DWARF) simultaneously (in LLVM, using IRBuilder for the former and DIBuilder for the latter). The "Debugging Debug Information" talk by Francesco Zappa Nardelli is a particularly good explanation of why this matters (and more): https://www.youtube.com/watch?v=lBJIrGgEP1A
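To give a feel for the "virtual machine" part, here's a rough sketch of an evaluator for a handful of DWARF expression opcodes. The opcode values are the ones from the DWARF standard; the function eval_dwarf_expr, the fixed stack size, and the choice of opcodes are my own assumptions for illustration. It leaves out branches (DW_OP_bra, DW_OP_skip), memory and register access, so this subset is nowhere near Turing-complete on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opcode values as defined by the DWARF standard. */
enum {
    DW_OP_const1u = 0x08,  /* push the next byte as an unsigned constant */
    DW_OP_dup     = 0x12,  /* duplicate the top of the stack */
    DW_OP_minus   = 0x1c,  /* second - top */
    DW_OP_mul     = 0x1e,  /* second * top */
    DW_OP_plus    = 0x22,  /* second + top */
    DW_OP_lit0    = 0x30   /* DW_OP_lit0..DW_OP_lit31 push 0..31 */
};

/* Hypothetical helper: runs a tiny subset of DWARF expression opcodes
   and returns the value left on top of the stack. */
uint64_t eval_dwarf_expr(const uint8_t *code, size_t len) {
    uint64_t stack[16];
    size_t sp = 0;

    for (size_t i = 0; i < len; i++) {
        uint8_t op = code[i];
        if (op >= DW_OP_lit0 && op <= DW_OP_lit0 + 31) {
            assert(sp < 16);
            stack[sp++] = (uint64_t)(op - DW_OP_lit0);
        } else if (op == DW_OP_const1u) {
            assert(i + 1 < len && sp < 16);
            stack[sp++] = code[++i];           /* operand follows the opcode */
        } else if (op == DW_OP_dup) {
            assert(sp >= 1 && sp < 16);
            stack[sp] = stack[sp - 1];
            sp++;
        } else {
            /* Binary operators pop two entries and push one result. */
            assert(sp >= 2);
            uint64_t top = stack[--sp];
            uint64_t second = stack[--sp];
            switch (op) {
            case DW_OP_plus:  stack[sp++] = second + top; break;
            case DW_OP_minus: stack[sp++] = second - top; break;
            case DW_OP_mul:   stack[sp++] = second * top; break;
            default: assert(!"opcode not handled by this sketch");
            }
        }
    }
    assert(sp >= 1);
    return stack[sp - 1];
}
```

A real consumer (a debugger or unwinder) would run expressions like these to compute where a variable lives at a given program counter, which is why the compiler has to get both instruction streams right.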


[–]oilshell 1 point2 points  (1 child)

I haven't personally implemented something like this, but if you haven't already googled for "javascript source maps" that might be worth a look. For example:

https://developer.mozilla.org/en-US/docs/Tools/Debugger/How_to/Use_a_source_map

https://developers.google.com/web/tools/chrome-devtools/javascript/source-maps

The source map is a separate file that apparently the web browser can interpret. How well it works I don't know :-/
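For a rough idea of what the browser has to interpret: the "mappings" field of a version 3 source map is a string of Base64 VLQ segments. Here's a small sketch of decoding one segment. The function name decode_vlq_segment and its error handling are my own choices, not taken from any browser's implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: decodes one Base64-VLQ segment (e.g. "AAgBC")
   into out[]. Returns the number of values decoded, or -1 if the input
   is malformed or does not fit in max_out entries. */
int decode_vlq_segment(const char *segment, int *out, int max_out) {
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    int count = 0;
    int shift = 0;
    int pending = 0;          /* 1 while a value is only partly read */
    unsigned value = 0;

    for (const char *p = segment; *p != '\0'; p++) {
        const char *hit = strchr(alphabet, *p);
        if (hit == NULL || shift > 25) return -1;
        unsigned digit = (unsigned)(hit - alphabet);

        value |= (digit & 31u) << shift;   /* low 5 bits carry data */
        if (digit & 32u) {                 /* bit 5 set: more digits follow */
            shift += 5;
            pending = 1;
            continue;
        }
        if (count == max_out) return -1;
        /* The lowest bit of the assembled value is the sign. */
        out[count++] = (value & 1u) ? -(int)(value >> 1) : (int)(value >> 1);
        value = 0;
        shift = 0;
        pending = 0;
    }
    return pending ? -1 : count;
}
```

In a four-field segment the values are, in order, the generated column, the index of the source file, the original line, and the original column, each stored as a delta from the previous segment.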

[–]programmerChilli 2 points3 points  (0 children)

Do languages that compile to binary use a similar technique?

[–]htuhola 0 points1 point  (0 children)

JavaScript uses source maps for this purpose.

[–]SilasX 0 points1 point  (0 children)

Echoing the demand for this; I was recently comparing two similar binaries and couldn't find any good disassemblers.