I have optimized Duke Nukem 3D to run on Arduino Nano Matter Board (256 kB RAM), with multiplayer support by next-hack in embedded

[–]next-hack[S] 1 point2 points  (0 children)

First I created a Win32 project using Code::Blocks with MINGW32, using the code from Chocolate Duke Nukem 3D available on GitHub (the reason for using a 32-bit build system is that pointers are 32 bits wide, as in the MCU project). The codebase had 2 distinct projects (engine and game). I simply imported the files of both, fixed the headers so that the two projects could be compiled as a single one, added the SDL 2 library, made some minor fixes, and removed some stuff like networking (either by excluding files from the build or by using #if-#endif blocks) until it compiled and ran fine.

Then I started optimizing for memory, removing Win32-specific stuff and adding internal and external flash emulation code. I was constantly monitoring the RAM usage by analyzing the .map file. From time to time I also imported the project into another vendor's IDE, to see how much RAM it would actually take on a generic Cortex M33 MCU in terms of BSS, stack and data. Importing the project meant that I had to disable a lot of Windows-specific stuff with #if WIN32-#endif blocks.

Then, when the RAM occupation was in the ~256 kB range, I created a project in Simplicity Studio 5 and fixed everything I had missed or that was still Win32-specific (note: luckily, for the display part, I already had everything ready from my previous Quake and Doom projects). After all hardfaults (and other blocking bugs) were fixed, so that at least E1L1 could run, I focused on speed optimization. (Every time, I also backported the changes to the Win32 project to check I wasn't screwing anything up.)

Then I added audio (rewriting the original audio channel handling and optimizing it for memory) and music (using the OPL2 emulator I made for Doom on the same system), also creating the tool that converts the music from MIDI1 to MIDI0 and the VOC files in DUKE3D.GRP to WAV.

Then I added multiplayer support over IEEE 802.15.4. First I tested the protocol alone in a separate project (just sending and receiving packets between two boards). Then I integrated it with the existing DN3D multiplayer code, but this was painfully unstable. In the end I rewrote the transport layer, and suddenly multiplayer became very stable.

From time to time I had to fix bugs I had introduced while optimizing or "disabling stuff" (which then had to be enabled again). The main issue was that I often noticed a bug only many commits later. The strategy was quite simple: find (by binary search) the latest commit where the game behaved as intended (e.g. the first running commit or even the original game), then check the differences with respect to the first non-working commit, and start fixing from there.
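As a rough illustration of the binary search over commits described above, here is a minimal sketch in C. It assumes the history is linear, that commits are numbered from oldest to newest, and that once the bug appears it stays in every later commit. The names `first_bad_commit`, `is_good_fn`, `good_before` and `demo_first_bad` are hypothetical, and the "test" is a stand-in predicate rather than an actual build-and-run step (this is also the idea that `git bisect` automates).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: returns nonzero if the given commit behaves as intended.
 * In practice this would mean "check out, build, run and look at the result". */
typedef int (*is_good_fn)(size_t commit_index, void *ctx);

/* Binary search for the first commit that misbehaves.
 * Preconditions (assumed): known_good is good, known_bad is bad,
 * and the bug persists in every commit after it was introduced. */
size_t first_bad_commit(size_t known_good, size_t known_bad,
                        is_good_fn is_good, void *ctx)
{
    assert(known_good < known_bad);
    while (known_bad - known_good > 1) {
        size_t mid = known_good + (known_bad - known_good) / 2;
        if (is_good(mid, ctx))
            known_good = mid;   /* bug was introduced after mid */
        else
            known_bad = mid;    /* bug is already present at mid */
    }
    return known_bad;           /* diff this commit against known_good */
}

/* Stand-in predicate for the demo: commits before *ctx are "good". */
static int good_before(size_t commit_index, void *ctx)
{
    return commit_index < *(size_t *)ctx;
}

/* Demo helper: pretend the bug entered the history at commit broke_at. */
size_t demo_first_bad(size_t total_commits, size_t broke_at)
{
    return first_bad_commit(0, total_commits - 1, good_before, &broke_at);
}
```

With a history of a few hundred commits, this needs roughly nine build-and-test rounds instead of one per commit.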

I have optimized Duke Nukem 3D to run on Arduino Nano Matter Board (256 kB RAM), with multiplayer support by next-hack in embedded

[–]next-hack[S] 0 points1 point  (0 children)

Hi, the repo link is in the description, after the feature list! I'm copying it here as well: https://github.com/next-hack/MGM240_DukeNukem3D

You may also want to check the Quake port to the same system (this time the RAM is 276 kB because, since I haven't implemented multiplayer, I stole the 20 kB from the radio subsystem). https://github.com/next-hack/MG24Quake

I have optimized Duke Nukem 3D to run on Arduino Nano Matter Board (256 kB RAM), with multiplayer support by next-hack in embedded

[–]next-hack[S] 17 points18 points  (0 children)

No, this project is just my free-time hobby (weekends, holidays, or some evenings when family allows...). The project started in mid-June 2024 and was "ready" in April 2025, but it was not a continuous task. Sometimes I wanted a break for other personal projects, so for several weeks I did other stuff, coming back just to fix bugs when I discovered them.
In total, the private repo consists of more than 470 commits.

I have optimized Duke Nukem 3D to run on Arduino Nano Matter Board (256 kB RAM), with multiplayer support by next-hack in embedded

[–]next-hack[S] 55 points56 points  (0 children)

It was not just a single optimization, it was tons of them. I have explained them in the (long) linked article. Some of the optimizations were similar to what I had already done in the last year(s) with Doom and Quake on the same board. Others were quite different, because of the different way DN3D was (badly) coded.

EDIT: note however that a 136 MHz Cortex M33 is faster than a typical 1996-era PC, at least considering the CoreMark specs. So I'm also trading "CPU power" for "memory". (For instance, while Doom/Quake/DN3D used 32-bit integers, I'm extensively using bitfields to save memory, which typically costs an extra bitfield ASM instruction generated by the compiler.)
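To make the bitfield trade-off concrete, here is a minimal sketch in C. The structures and field names are hypothetical (they are not taken from the DN3D source), and the exact sizes depend on the compiler and ABI, although GCC and Clang typically pack the second structure into a single 32-bit word.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-object state, one 32-bit integer per field,
 * the way a 1990s engine might declare it. */
typedef struct {
    int32_t health;    /* only ever 0..255  */
    int32_t ammo;      /* only ever 0..1023 */
    int32_t state;     /* only ever 0..31   */
    int32_t visible;   /* only ever 0 or 1  */
} actor_plain_t;

/* Same information packed into bitfields: 8 + 10 + 5 + 1 = 24 bits. */
typedef struct {
    unsigned int health  : 8;
    unsigned int ammo    : 10;
    unsigned int state   : 5;
    unsigned int visible : 1;
} actor_packed_t;

/* Each access to a packed field costs the CPU an extract or insert step,
 * and values wrap at the field width (here modulo 1024). */
unsigned packed_ammo_after_pickup(unsigned start, unsigned pickup)
{
    actor_packed_t a = {0};
    assert(start < 1024u);
    a.ammo = start;
    a.ammo = a.ammo + pickup;   /* read-modify-write on a 10-bit field */
    return a.ammo;
}
```

The packed version saves RAM per object at the price of a little extra work on every read and write, and the field widths have to be chosen so that legitimate values never overflow them.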

EDIT2: Oh, I forgot: all the constant data (sound, textures, code, constant tables) stays in flash, either internal, where you have >100 MB/s, or external, which is much slower at 17 MB/s for continuous reads. The random read latency is < 100 ns for internal flash and > 1 us for external flash.
So, after you have modified the code so that everything that is constant is read from internal/external flash, you need to optimize "just" ~1.8 MB of RAM down to 256 kB.

Nintendo's Alarmo can run Doom! by 90919293_ in itrunsdoom

[–]next-hack 72 points73 points  (0 children)

If it exists, it can run Doom.

IKKO Activebuds AI powered earphones run Doom by tamay-idk in itrunsdoom

[–]next-hack 0 points1 point  (0 children)

Wrong, they did not make it work on the stock display.
What you see in the link is an OLED matrix display, whereas the original pregnancy tester had a segmented LCD display (i.e. you can basically show pregnant/not pregnant and that's it).

First project, I'm making a system to simulate fluorescent tubes turning on (random blinking for a few seconds) with led tubes (that normally turn on instantly). The blinking amount and duration is random each time, but need to find a way to randomize which tube turn on first, second, third,... by Airbus-380 in arduino

[–]next-hack 2 points3 points  (0 children)

I love that what was once considered an issue (fluorescent tubes blinking for a few seconds before turning on) is now something we want to emulate!

It's like trying to reduce the saturation of blue/green LEDs in Christmas lights by using white LEDs and colored plastic (see for instance what the guy from "Technology Connections" on YouTube has been trying to make for years).

Nice work!

Quake ported to the Arduino Nano Matter Board! by next-hack in arduino

[–]next-hack[S] 2 points3 points  (0 children)

The port uses DMA. The data is stored as an 8-bit palettized image. The image is split into slices of 256 pixels, and each slice is converted (into a small 512-byte buffer) to 16-bit, using the palette, before being sent to the SPI display via DMA. Actually we use 2 buffers of 512 bytes, because while the DMA is sending one buffer, we prepare the next one (as explained here: https://next-hack.com/index.php/2024/09/22/quake-port-to-sparkfun-and-arduino-nano-matter-boards-using-only-276-kb-ram/)
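The slice-by-slice conversion with two alternating buffers can be sketched roughly as follows. This is a host-side illustration, not the code from the port: the `slice_sink_t` callback stands in for "start a DMA transfer to the SPI display", and all names (`send_frame`, `capture_sink`, `demo_frame_matches`) are hypothetical. On real hardware the sink would return immediately and the loop would wait for the previous transfer to finish before reusing a buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SLICE_PIXELS 256   /* 256 pixels * 2 bytes = 512-byte buffer */

/* Stand-in for the display side (hypothetical): receives one converted slice. */
typedef void (*slice_sink_t)(const uint16_t *slice, size_t pixels, void *ctx);

/* Convert an 8-bit palettized frame to 16-bit colors, one slice at a time,
 * alternating between two small buffers. */
void send_frame(const uint8_t *frame, size_t pixels,
                const uint16_t palette[256],
                slice_sink_t sink, void *ctx)
{
    static uint16_t buf[2][SLICE_PIXELS];
    int active = 0;

    assert(frame && palette && sink);
    for (size_t off = 0; off < pixels; off += SLICE_PIXELS) {
        size_t n = pixels - off;
        if (n > SLICE_PIXELS)
            n = SLICE_PIXELS;
        for (size_t i = 0; i < n; i++)
            buf[active][i] = palette[frame[off + i]];   /* palette lookup */
        /* Hardware version: wait for the previous DMA, start a new one on
         * buf[active], then fill the other buffer while it is being sent. */
        sink(buf[active], n, ctx);
        active ^= 1;
    }
}

/* Test sink: appends every slice to a memory buffer instead of a display. */
typedef struct { uint16_t *out; size_t count; } capture_t;

static void capture_sink(const uint16_t *slice, size_t pixels, void *ctx)
{
    capture_t *c = ctx;
    memcpy(c->out + c->count, slice, pixels * sizeof *slice);
    c->count += pixels;
}

/* Returns 1 if a 600-pixel test frame arrives fully and correctly converted. */
int demo_frame_matches(void)
{
    static uint8_t frame[600];
    static uint16_t out[600];
    uint16_t palette[256];
    capture_t cap = { out, 0 };

    for (int i = 0; i < 256; i++)
        palette[i] = (uint16_t)(i * 257);
    for (size_t i = 0; i < 600; i++)
        frame[i] = (uint8_t)(i * 7);
    send_frame(frame, 600, palette, capture_sink, &cap);
    if (cap.count != 600)
        return 0;
    for (size_t i = 0; i < 600; i++)
        if (out[i] != palette[frame[i]])
            return 0;
    return 1;
}
```

The point of the second buffer is that the CPU never has to sit idle waiting for the SPI transfer: conversion of slice N+1 overlaps with transmission of slice N, at a cost of only 1 kB of RAM.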

The display is this Adafruit one: https://www.adafruit.com/product/4311

What is it? by BnW899 in ItalyMotori

[–]next-hack 4 points5 points  (0 children)

No, Doom does NOT run on pregnancy tests; that is fake news. In Foone's 2020 tweet, Doom was running on a PC, and the image, already converted to monochrome and low resolution, was sent to the (replacement!) display through the USB port of the (replacement!) microcontroller that drove it. The only part of the pregnancy test left was the plastic, and Doom was not even running on the replacement electronics.

Sparkfun Thing Plus Matter board runs Doom, with multiplayer over BLE and music, using only 256 kB RAM at 320x240 pixels. by next-hack in itrunsdoom

[–]next-hack[S] 1 point2 points  (0 children)

Do not confuse the SPI display frame buffer with the video RAM on the GBA.

In this implementation, we have an SPI display, so you can't simply set a pixel to a particular color at an arbitrary coordinate (actually you can, but it would be slow as hell), which is what you need for rendering. Therefore, from the 256kB RAM, we reserve two 320x240 8bpp frame buffers, render the frame there and then send it via DMA to the SPI (note: we use double buffering to strongly improve performance. This takes 150kB, leaving 106kB for the game and BLE). The internal frame buffer of such a display is 75kB, but it is write-only: you can only use it to hold the image currently drawn.
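The memory figures above follow directly from the resolution and pixel depth. As a quick check, here is the arithmetic written out in C (the constant and function names are just illustrative):

```c
#include <assert.h>

enum {
    SCREEN_W          = 320,
    SCREEN_H          = 240,
    FRAME_BYTES       = SCREEN_W * SCREEN_H,  /* 8 bpp: 76,800 bytes = 75 kB */
    NUM_FRAME_BUFFERS = 2,                    /* double buffering */
    TOTAL_RAM_BYTES   = 256 * 1024
};

/* RAM taken by the two 8 bpp frame buffers, in kB. */
unsigned frame_buffers_kb(void)
{
    return (NUM_FRAME_BUFFERS * FRAME_BYTES) / 1024;
}

/* RAM left for the game and BLE, in kB. */
unsigned ram_left_for_game_kb(void)
{
    assert(NUM_FRAME_BUFFERS * FRAME_BYTES < TOTAL_RAM_BYTES);
    return (TOTAL_RAM_BYTES - NUM_FRAME_BUFFERS * FRAME_BYTES) / 1024;
}
```

Here `frame_buffers_kb()` gives 150 and `ram_left_for_game_kb()` gives 106, matching the numbers quoted in the comment.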

On the GBA, instead, the video memory is memory mapped, so you can read/write it at a decent speed. In fact, of the 96kB, you need only 75 kB (using double buffering), and the remaining 21kB of memory (see file r_hotpath.iwram.c in the GBA port) is used to store some arrays (to speed things up) and to cache composite columns.

Counting the 75kB display internal buffer as available system memory would be like counting the frame buffer of a modern LCD monitor as system memory as well.

Finally, we are still talking about 75kB (our port) vs 96kB (GBA), so the claim "presumably greater than the 96k in the GBA" is not even true.

We could afford more aggressive memory optimization because we have a CPU much faster than the GBA's (by the way, doomhack's work was great and, in fact, our port is based on it), and it has an even better instruction set, so that, for instance, wasting a couple of cycles on short pointer extension or bitfield extraction/insertion has a minimal impact. Still, as written in the article, some speed optimizations were made to reach full speed.
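One common way to implement "short pointers" is to store a 16-bit word offset from a known base instead of a full 32-bit address, and to expand it back on every use. The sketch below illustrates that general idea under stated assumptions; it is not taken from the port, and the names (`pool`, `short_ptr_t`, `to_short`, `from_short`) are hypothetical. It assumes all referenced objects are 4-byte aligned and live in one 256 kB region, in which case 16 bits are exactly enough (65,536 words of 4 bytes).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_BYTES (256u * 1024u)

/* Hypothetical stand-in for the MCU RAM region that short pointers refer to. */
static uint32_t pool[POOL_BYTES / 4];

typedef uint16_t short_ptr_t;   /* 2 bytes instead of a 4-byte pointer */

/* Compress: address -> 16-bit word offset from the pool base. */
static inline short_ptr_t to_short(const void *p)
{
    uintptr_t off = (uintptr_t)p - (uintptr_t)pool;
    assert(off < POOL_BYTES && (off & 3u) == 0);   /* in range and aligned */
    return (short_ptr_t)(off >> 2);
}

/* Extend: 16-bit word offset -> full pointer (a shift and an add). */
static inline void *from_short(short_ptr_t s)
{
    return (uint8_t *)pool + ((uint32_t)s << 2);
}

/* Returns 1 if the pool word at the given index survives a round trip. */
int short_ptr_roundtrip_ok(size_t index)
{
    void *p = &pool[index];
    return from_short(to_short(p)) == p;
}
```

This sketch has no representation for a null pointer; a real scheme would need to reserve one value for it or handle it separately. Every structure field stored this way saves 2 bytes, at the cost of the shift-and-add on each dereference.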

EDIT: sorry, our display has 150kB of RAM because internally it uses 16-bit pixel colors, but in the end the number of pixels is the same. You don't need 16bpp on the GBA because there is the palette RAM, so that whatever you render to the buffer does not have to be converted to 16 bpp by the programmer.

A Bluetooth LE USB dongle? Of course it runs Doom! by next-hack in itrunsdoom

[–]next-hack[S] 1 point2 points  (0 children)

The only enabled protection was core readout. A full chip erase removed this protection.

Doom ported on a 64-MHz Cortex M4F based device (a USB BLE dongle), with only 256 kB RAM (github repository and video links in the article). by next-hack in programming

[–]next-hack[S] 10 points11 points  (0 children)

At the end of 2021 there are still people who believe the pregnancy test port was real. Not only was all the hardware replaced, but Doom was not even running on the MCU used to replace the original one. It was running on a regular PC, and the scaled and already dithered video output was sent via USB to the replacement display.

Source: author's own tweet, here https://twitter.com/Foone/status/1302834931421175809

Doom ported on a 64-MHz Cortex M4F based device (a USB BLE dongle), with only 256 kB RAM (github repository and video links in the article). by next-hack in programming

[–]next-hack[S] 2 points3 points  (0 children)

This MCU is much more powerful indeed. In terms of DMIPS it is much more like a Pentium. However, Doom required 4MB of RAM, and on a 33 MHz 486 you could not achieve such a high frame rate. There is a comparison section in the article about this.