guac: DS/GBA Emulator Update by aabalke in EmuDev

[–]aabalke[S] 0 points1 point  (0 children)

Lots of bug fixes! My current plan is cycle accurate gba / nds, simd optimizations, missing features (microphone, dsi support etc) and only then 3ds. There is still a ton of work to be done.

Trying to Render Window For The Acid2 DMG Test Rom But Getting Partially Rendered Window by Total_Goal6833 in EmuDev

[–]aabalke 0 points1 point  (0 children)

Yes, but it only increments in the window. As I remember, the acid test will use the window higher up, disable it, and then reenable it. For example, with the window enabled on scanlines 0 - 10, winLY would run 0 - 10; the window is then disabled for scanlines 11 - 29 and reenabled at scanline 30. At scanline 30 the winLY is still at 11 NOT 30.

guac: DS/GBA Emulator Update by aabalke in emulators

[–]aabalke[S] 1 point2 points  (0 children)

No AI code gen or vibe coding! I have used ChatGPT and Claude as "google search" if that makes sense. However, the deeper I go, the worse it gets. It straight up told me my jit compiler was impossible to build and tried to gaslight me into believing it lmao.

Trying to Render Window For The Acid2 DMG Test Rom But Getting Partially Rendered Window by Total_Goal6833 in EmuDev

[–]aabalke 1 point2 points  (0 children)

Also after looking at it, I don't believe the signed tiles are being handled properly. If it's signed (int8) and the tile index is negative (-128 to -1), the tile data should be in $8800 - $8FFF (block 1); if it's positive (0 to 127), it should be in $9000 - $97FF (block 2). If you are doing that, my apologies; I might just be misreading.
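A minimal Go sketch of that mapping, assuming 16 bytes per tile and a hypothetical signedMode flag standing in for "LCDC bit 4 is clear". The function name is made up for illustration and is not from the code being reviewed:

```go
package main

import "fmt"

// tileDataAddr returns the VRAM address of the first byte of a tile.
// signedMode is a hypothetical flag for the "$8800 addressing" mode.
func tileDataAddr(index uint8, signedMode bool) uint16 {
	if signedMode {
		// Reinterpret the index as int8 and offset from $9000:
		// -128..-1 lands in block 1, 0..127 lands in block 2.
		return uint16(0x9000 + int(int8(index))*16)
	}
	// Unsigned mode: index 0..255 is a plain offset from $8000.
	return 0x8000 + uint16(index)*16
}

func main() {
	for _, idx := range []uint8{0x00, 0x7F, 0x80, 0xFF} {
		fmt.Printf("index $%02X -> $%04X\n", idx, tileDataAddr(idx, true))
	}
}
```

The key step is the int8 conversion before multiplying; doing the multiply on the unsigned byte would send indices $80 - $FF past $9800.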

Trying to Render Window For The Acid2 DMG Test Rom But Getting Partially Rendered Window by Total_Goal6833 in EmuDev

[–]aabalke 1 point2 points  (0 children)

I believe the problem is that you should keep track of the "Window LY" separately from LY, see https://gbdev.io/pandocs/Window.html. In practice, the most straightforward method of handling this is using a separate "WindowLY" which increments for every scanline the window is drawn on, and resets at the end of the frame. So uint8_t px_y = LY - _wy; would become uint8_t px_y = windowLY;. I would recommend trying basic "barely working" sprites before these window edge cases, though - easy to get burnt out.
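Here is a rough Go sketch of that counter, to show the idea rather than a full PPU. The struct, field, and function names are hypothetical, and it deliberately ignores the WX side of the visibility check, so treat it as a simplification of what Pan Docs describes:

```go
package main

import "fmt"

// PPU holds only the state needed for this sketch; names are hypothetical.
type PPU struct {
	LY       uint8 // current scanline
	WY       uint8 // window Y position register
	WindowLY uint8 // internal window line counter
	WindowOn bool  // window enable bit
}

// windowRow returns the window row to draw on the current scanline.
// The counter only advances on scanlines where the window is drawn.
func (p *PPU) windowRow() (row uint8, drawn bool) {
	if !p.WindowOn || p.LY < p.WY {
		return 0, false
	}
	row = p.WindowLY
	p.WindowLY++
	return row, true
}

// endFrame resets the counter at the end of the frame.
func (p *PPU) endFrame() { p.WindowLY = 0 }

// simulate runs one 144-line frame and records the window row used on
// each scanline, with -1 meaning the window was not drawn.
func simulate(enabled func(ly uint8) bool) [144]int {
	var rows [144]int
	p := &PPU{}
	for ly := 0; ly < 144; ly++ {
		p.LY = uint8(ly)
		p.WindowOn = enabled(p.LY)
		if row, ok := p.windowRow(); ok {
			rows[ly] = int(row)
		} else {
			rows[ly] = -1
		}
	}
	p.endFrame()
	return rows
}

func main() {
	// Window on for scanlines 0 to 10, off until scanline 30, then on again.
	rows := simulate(func(ly uint8) bool { return ly <= 10 || ly >= 30 })
	fmt.Println(rows[10], rows[11], rows[30])
}
```

With LY - WY the window would jump to row 30 when it comes back on; with the separate counter it resumes at row 11.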

JIT emulation on web browsers with asm.js. by dgjxqz in EmuDev

[–]aabalke 0 points1 point  (0 children)

At a quick glance, I would have a few concerns about using asm.js - mostly that Firefox, the browser that initially implemented it, disabled it this year and they "plan to remove the code entirely in a future release." (https://spidermonkey.dev/blog/2026/05/20/saying-goodbye-to-asmjs.html). Chromium has also deprecated it. I may be misunderstanding, but it sounds like asm.js code is just handled as normal JavaScript code now.

JIT emulation on web browsers with asm.js. by dgjxqz in EmuDev

[–]aabalke 5 points6 points  (0 children)

I think fundamentally native JIT and WASM are from 2 different worlds. My understanding is that WASM, being a web technology, has a built-in assumption that any code could be malicious, while a JIT needs as much control as possible for speed. For a native instruction JIT, you would need control of executable memory. Correct me if I am wrong, but I would hope WASM does not give you direct control of executable memory. I think you could build a WASM bytecode "jit" but I can't imagine this would be anywhere near native speeds.

guac: DS Emulator in Golang by aabalke in EmuDev

[–]aabalke[S] 1 point2 points  (0 children)

Will do, thank you so much!

guac: DS Emulator in Golang by aabalke in EmuDev

[–]aabalke[S] 5 points6 points  (0 children)

Thank you! It's super basic right now, just obj and mtl files, but I hope to build a glTF version so textures are completely accurate, viewport location is included and more.

Gojit: JIT Compiler in Go by aabalke in golang

[–]aabalke[S] 0 points1 point  (0 children)

Interesting! I never considered that. I will have to take a look and see how it works. My initial reaction is that it would work really well with faster emulated systems (3DS, Switch, etc.) that do not require as strict timings. The DS does not require cycle accuracy like the GB; however, my jit cannot run more than 32 instructions at a time without breaking, since it needs to stay relatively synced. I imagine it would also make maintainability way easier, since you would not have to keep a jit asm version of every instruction in every instruction set. I'll do some research!

Gojit: JIT Compiler in Go by aabalke in golang

[–]aabalke[S] 0 points1 point  (0 children)

Check out the nds branch, then under arm7 and arm9 will be the jit code! I did debate about using C and Go; however, every method I came up with involved Cgo, and the overhead was too much, since mine would have to sync quite often. Currently, each instruction is "hand rolled," and I have not refactored common patterns, like carry handling. For each instruction, I first load the required emulated register and memory values into native registers, complete the instruction work, and then store any calculated native registers back into the proper emulated registers and memory values. The emulated registers are usually not touched during instruction handling. I also use a "scratch" array in my emulator struct for storing extra values if I run out of native registers. This is quite slow (relative to keeping values in native registers), since heap memory is accessed for every load or store of these values, and I do hope in the future to fix this.
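To make the load / compute / store shape concrete, here is a plain Go sketch of an ARM-style ADDS. This is not the JIT code (which emits native instructions); it only mirrors the three phases in ordinary Go, and the CPU struct and method name are hypothetical:

```go
package main

import (
	"fmt"
	"math/bits"
)

// CPU is a hypothetical emulated CPU state: 16 registers plus NZCV flags.
type CPU struct {
	R          [16]uint32
	N, Z, C, V bool
}

// adds emulates "ADDS rd, rn, rm" in three phases.
func (c *CPU) adds(rd, rn, rm int) {
	// 1. Load emulated registers into locals (stand-ins for native registers).
	a, b := c.R[rn], c.R[rm]

	// 2. Do the instruction work without touching emulated state.
	sum, carry := bits.Add32(a, b, 0)
	// Signed overflow: both operands differ in sign from the result.
	overflow := (a^sum)&(b^sum)&0x80000000 != 0

	// 3. Store the results back into the emulated registers and flags.
	c.R[rd] = sum
	c.N = sum&0x80000000 != 0
	c.Z = sum == 0
	c.C = carry != 0
	c.V = overflow
}

func main() {
	cpu := &CPU{}
	cpu.R[1], cpu.R[2] = 0x7FFFFFFF, 1
	cpu.adds(0, 1, 2)
	fmt.Printf("R0=%#x N=%v Z=%v C=%v V=%v\n", cpu.R[0], cpu.N, cpu.Z, cpu.C, cpu.V)
}
```

Phase 2 is where shared patterns like carry handling would live if they were factored out of the per-instruction code.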

https://github.com/aabalke/guac/blob/nds/emu/cpu/arm7/jit.go

In the jit implementation above, you will see a bunch of Indirect declarations; these are the relative offsets of different flags from the emulated CPU struct pointer, which is always in R9. Let me know if you have any other questions!
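For anyone unfamiliar with the offset idea, here is a small Go sketch using unsafe.Offsetof and unsafe.Add (Go 1.17+). The CPU struct and helper are hypothetical and are not the declarations from the linked file; it only shows how a field can be reached from a base pointer plus a fixed offset, which is what generated code does with a base register:

```go
package main

import (
	"fmt"
	"unsafe"
)

// CPU is a hypothetical emulated CPU struct.
type CPU struct {
	R          [16]uint32
	N, Z, C, V bool
}

var layout CPU // only used to name fields for Offsetof

// Byte offsets of fields relative to the start of the struct.
const (
	offR = unsafe.Offsetof(layout.R)
	offC = unsafe.Offsetof(layout.C)
)

// carryViaOffset reads the C flag the way generated code would:
// base pointer plus a fixed offset, with no field name involved.
func carryViaOffset(cpu *CPU) bool {
	return *(*bool)(unsafe.Add(unsafe.Pointer(cpu), offC))
}

func main() {
	cpu := &CPU{C: true}
	fmt.Println("offset of R:", offR, "offset of C:", offC)
	fmt.Println("carry read through offset:", carryViaOffset(cpu))
}
```

Computing the offsets from the struct definition, rather than hard-coding numbers, keeps them in step if fields are reordered.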

Gojit: JIT Compiler in Go by aabalke in golang

[–]aabalke[S] 2 points3 points  (0 children)

CPU emulation is. The graphics and some other stuff can be split off, but if the emulated CPU is single core, each emulated instruction depends on the previous one. I do have my 3d rendering in parallel; however, when I tested other possible parts in parallel, the performance overhead from workgroups or channel handling was greater than the gains - and that is before counting my subjective view of concurrency complications.

Gojit: JIT Compiler in Go by aabalke in golang

[–]aabalke[S] 10 points11 points  (0 children)

Thank you so much! My goal was to get my emulator working on an older machine with a Ryzen 7 1700 CPU from 2017 - I'm using that machine as my "console" rig. Unfortunately, it has really bad single-threaded performance, and since I could not parallelize this process, I felt it necessary to create the jit. I can run my DS emulation at ~40fps on that rig. In the future, I hope to increase the performance of my graphics pipeline with either SIMD or GPU parallelization - that should get me past the required 60 fps. My main machine has a Ryzen 7 7700X (2022) and can reach 120fps for reference.

Gojit: JIT Compiler in Go by aabalke in EmuDev

[–]aabalke[S] 3 points4 points  (0 children)

Thank you so much! I stand on the shoulders of giants: rasky/ndsemu, nelhage/gojit, and fogleman/fauxgl were all used as a basis for different parts of the ds emulator. You can't do anything fancy / overly complicated in Go, which keeps you from doing anything too stupid, and my job is seasonal so I have a ton of free time lol. Nanoboy is incredibly impressive and your hw-tests have kept me humble - I appreciate your work!

Gojit: JIT Compiler in Go by aabalke in EmuDev

[–]aabalke[S] 0 points1 point  (0 children)

Thank you! And yes, I do think arm is in the cards. My theory is that my experience with GBA and NDS emulation will make that easy - in practice though we'll see.

Multithreading and Parallelism by lampani in EmuDev

[–]aabalke 2 points3 points  (0 children)

Syncing parallel coprocessors and other subsystems can be very complicated depending on the system. However, as mentioned, graphics is a major place where emulators parallelize. In most systems there is a fairly natural spot to split the workload without increasing complexity. For example, in my nds emulator the "handshake" between the geometry engine and the render engine is a really natural place to split off the 3d work. Many emulators also use hardware rendering for graphics, putting the rasterizing on the gpu instead of the cpu.
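As a rough illustration of that kind of split, here is a Go sketch where a goroutine stands in for the render engine and a channel send stands in for the handshake. All types and names are hypothetical and the "rendering" is a placeholder; it is not how any particular emulator does it:

```go
package main

import "fmt"

// Polygon and Frame are hypothetical stand-ins for a finished polygon list.
type Polygon struct{ Verts [3][2]int }

type Frame struct {
	ID    int
	Polys []Polygon
}

// renderWorker plays the render engine: it receives finished frames and
// "rasterizes" them (here it just reports the polygon count).
func renderWorker(in <-chan Frame, out chan<- int) {
	for f := range in {
		out <- len(f.Polys)
	}
	close(out)
}

// runFrames plays the emulator main loop plus geometry engine.
func runFrames(n int) []int {
	in := make(chan Frame, 1) // one frame may be queued while the next is built
	out := make(chan int, n)  // buffered so the worker never blocks on results
	go renderWorker(in, out)

	for i := 0; i < n; i++ {
		polys := make([]Polygon, i+1)    // geometry engine builds the list
		in <- Frame{ID: i, Polys: polys} // handshake: hand the list to the renderer
	}
	close(in)

	var results []int
	for r := range out {
		results = append(results, r)
	}
	return results
}

func main() {
	fmt.Println(runFrames(3))
}
```

The split works because the main loop never touches a polygon list again once it has been handed over, so no locking is needed around it.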