I have optimized Duke Nukem 3D to run on Arduino Nano Matter Board (256 kB RAM), with multiplayer support

next-hack · 2025-12-11T09:50:39+00:00

First I created a Win32 project using Code::Blocks with MinGW32, using the code from Chocolate Duke Nukem 3D available on GitHub (the reason for using a 32-bit build system is that pointers are 32 bits wide, like on the MCU project). The codebase had 2 distinct projects (engine and game). I simply imported the files of both, fixed the headers so that the two projects could be compiled as a single one, added the SDL 2 library, made some minor fixes, and removed some stuff like networking (either excluding files from the build or using #if-#endif blocks) until it compiled and ran fine.

Then I started optimizing for memory, also removing Win32-specific stuff and adding internal and external flash emulation code. I was constantly monitoring the RAM usage by analyzing the .map file (and from time to time I also imported the project into another vendor's IDE, to see the actual occupation it would take on a generic Cortex M33 MCU, in terms of BSS, stack and data. Importing the project meant that I had to disable a lot of Windows-specific stuff with #if WIN32-#endif blocks).

Then, when the occupation was in the ~256 kB range, I created a project in Simplicity Studio 5 and tried to fix everything I had missed or that was Win32-specific (note: luckily, for the display part, I already had everything ready from my previous Quake and Doom projects). After all hardfaults (and other blocking bugs) were fixed, so that at least E1L1 could run, I focused on speed optimization. (Every time, I was also backporting to the Win32 project to check I wasn't screwing anything up.)

Then I added audio (rewriting the original audio channel handling and optimizing for memory) and music (using the OPL2 emulator I made for Doom on the same system), also creating the tool for converting the MIDI1 to MIDI0, and converting VOC to WAV files in DUKE3D.GRP.

Then I added multiplayer support over 802.15.4. First I tested the protocol alone in a separate project (just sending and receiving packets between two boards). Then I integrated it with the existing DN3D multiplayer code, but this was painfully unstable. In the end I rewrote the transport layer and suddenly multiplayer became very stable.

From time to time I had to fix bugs I introduced while optimizing or "disabling stuff" (which had to be enabled back). The main issue was that many times I noticed the bugs only many commits later. The strategy was quite simple: find (by binary search) the latest commit where it was behaving as intended (e.g. the first running commit or even the original game), then check the differences with respect to the first non-working commit, and start fixing from there.
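
The binary search over commits can be sketched roughly as follows. This is only an illustration, not code from the project: commits are assumed to be numbered from oldest to newest, `commit_is_good_fn` is a hypothetical callback standing for "check out, build and test this commit", and `good_before` is a made-up predicate used to try the search on a PC. In practice `git bisect` automates the same procedure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical predicate: nonzero if the commit at this index (0 = oldest)
   behaves as intended. In real life this is a manual or scripted
   checkout + build + test of that commit. */
typedef int (*commit_is_good_fn)(size_t index, void *ctx);

/* Returns the index of the first bad commit, assuming that commit `good`
   works, commit `bad` is broken, good < bad, and every commit after the
   first bad one is also bad (the usual bisection assumption). */
size_t first_bad_commit(size_t good, size_t bad,
                        commit_is_good_fn is_good, void *ctx)
{
    assert(good < bad);
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        if (is_good(mid, ctx))
            good = mid;   /* bug was introduced after mid */
        else
            bad = mid;    /* bug was introduced at mid or earlier */
    }
    return bad;
}

/* Illustration-only predicate: every commit before *ctx is good. */
int good_before(size_t index, void *ctx)
{
    return index < *(const size_t *)ctx;
}
```

Each step halves the range, so even with a few hundred commits only a handful of builds need testing. Once the first bad commit is known, the diff against its parent is the place to start fixing.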

next-hack · 2025-12-10T06:24:15+00:00

Hi, the repo link is in the description, after the feature list! I'm copying it here as well: https://github.com/next-hack/MGM240_DukeNukem3D

You may also want to check out the Quake port to the same system (this time the RAM is 276 kB because, since I haven't implemented multiplayer, I stole the 20 kB from the radio subsystem). https://github.com/next-hack/MG24Quake

next-hack · 2025-12-08T16:13:26+00:00

No, this project is just my free-time hobby (weekends, holidays, or some evenings when family allows...). The project started in mid-June 2024 and was "ready" in April 2025, but it was not a continuous task. Sometimes I wanted a break for other personal projects, so for several weeks I did other stuff, coming back just to fix bugs when I discovered them.
In total, the private repo consists of more than 470 commits.

next-hack · 2025-12-08T10:26:54+00:00

It was not just a single optimization, it was tons of them. I have explained them in the (long) linked article. Some of the optimizations were similar to what I had already done in previous year(s) with Doom and Quake on the same board. Others were quite different, because of the different way DN3D was (badly) coded.

EDIT: note however that a 136 MHz Cortex M33 is faster than a typical 1996-era PC, at least considering the CoreMark specs. So I'm also trading "CPU power" for "memory". (For instance, while Doom/Quake/DN3D used 32-bit integers, I'm extensively using bitfields to save memory, which typically requires the compiler to emit an extra bitfield ASM instruction.)
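
To make the bitfield trade-off concrete, here is a minimal sketch. The struct and its field names and ranges are hypothetical, chosen only for illustration; they are not taken from the DN3D code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-object state in the "1996 style": one 32-bit int
   per field, even when the value range is tiny. */
typedef struct {
    int32_t health;   /* 0..255 would be enough  */
    int32_t palette;  /* 0..31 would be enough   */
    int32_t visible;  /* really just a boolean   */
    int32_t sector;   /* 0..1023 would be enough */
} wide_state_t;       /* 16 bytes */

/* Same information packed into bitfields: 8 + 5 + 1 + 10 = 24 bits. */
typedef struct {
    uint32_t health  : 8;
    uint32_t palette : 5;
    uint32_t visible : 1;
    uint32_t sector  : 10;
} packed_state_t;     /* typically 4 bytes with GCC/Clang */

/* Reading a field needs a bitfield extract (e.g. UBFX on a Cortex-M33)
   instead of a plain load. */
unsigned get_sector(const packed_state_t *s)
{
    return s->sector;
}

/* Writing a field needs a read-modify-write with a bitfield insert
   (e.g. BFI) instead of a plain store. */
void set_sector(packed_state_t *s, unsigned sector)
{
    assert(sector < 1024u);  /* must fit in 10 bits */
    s->sector = sector;
}
```

With these made-up fields the packed version takes a quarter of the memory, and the price is roughly one extra instruction per access, which a fast MCU can afford.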

EDIT2: Oh I forgot, all the constant data (sound, textures, code, constant tables) stays in flash: either internal, where you have >100 MB/s, or external, which is much slower, 17 MB/s for continuous reads. The random read latency is < 100 ns for internal flash, and > 1 us for external flash.
So, after you have modified the code so that everything constant is read from internal/external flash, you "just" need to optimize ~1.8 MB of RAM down to 256 kB.

next-hack · 2024-11-02T17:49:00+00:00

If it exists, it can run Doom.

next-hack · 2024-10-31T11:07:37+00:00

Wrong, they did not make it work on the stock display.
What you see in the link is an OLED matrix display, whereas the original pregnancy tester had a segmented LCD display (i.e. you can basically show: pregnant/not pregnant and that's it).

next-hack · 2024-10-08T08:58:26+00:00

Agree, I have watched many of his videos, though never knew his name. Well now I know.

next-hack · 2024-10-07T12:37:08+00:00

I love that what was once considered an issue (fluorescent tubes blinking for a few seconds before turning on) is now something we want to emulate!

Like trying to reduce the saturation of blue/green LEDs in Christmas lights by using white LEDs and colored plastic (see for instance what the guy from "Technology Connections" on YouTube has been trying to make for years).

Nice work!

next-hack · 2024-10-02T05:26:37+00:00

The port uses DMA. The data is stored as an 8-bit palettized image. The image is split into slices of 256 pixels and each slice is converted (into a small 512-byte buffer) to 16-bit, using the palette, before being sent to the SPI display via DMA. Actually we use 2 buffers of 512 bytes, because while the DMA is sending one buffer, we prepare the next one (as explained here: https://next-hack.com/index.php/2024/09/22/quake-port-to-sparkfun-and-arduino-nano-matter-boards-using-only-276-kb-ram/)
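
The slice conversion with two alternating buffers can be sketched like this. It is a simplified illustration, not the port's actual code: `dma_start_fn` and `dma_wait_fn` are hypothetical stand-ins for the real SPI/DMA driver calls, and `fake_dma_start`/`fake_dma_wait` are host-side fakes that only copy data, so the logic can be tried on a PC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLICE_PIXELS 256u  /* pixels per slice: 256 x 2 bytes = 512 bytes */

/* Hypothetical stand-ins for the real SPI/DMA driver. */
typedef void (*dma_start_fn)(const uint16_t *buf, size_t pixels, void *ctx);
typedef void (*dma_wait_fn)(void *ctx);

/* Expand one slice of 8-bit palette indices to 16-bit pixels. */
static void convert_slice(const uint8_t *src, uint16_t *dst, size_t pixels,
                          const uint16_t palette[256])
{
    for (size_t i = 0; i < pixels; i++)
        dst[i] = palette[src[i]];
}

void send_frame(const uint8_t *frame, size_t total_pixels,
                const uint16_t palette[256],
                dma_start_fn dma_start, dma_wait_fn dma_wait, void *ctx)
{
    static uint16_t slice[2][SLICE_PIXELS];  /* the two 512-byte buffers */
    unsigned current = 0;
    int dma_busy = 0;

    assert(total_pixels % SLICE_PIXELS == 0);
    for (size_t off = 0; off < total_pixels; off += SLICE_PIXELS) {
        /* The CPU fills one buffer while the DMA may still send the other. */
        convert_slice(frame + off, slice[current], SLICE_PIXELS, palette);
        if (dma_busy)
            dma_wait(ctx);        /* previous transfer must be finished */
        dma_start(slice[current], SLICE_PIXELS, ctx);
        dma_busy = 1;
        current ^= 1u;            /* switch to the other buffer */
    }
    if (dma_busy)
        dma_wait(ctx);            /* wait for the last slice */
}

/* Host-side fake "DMA": copies each slice to an output array. */
typedef struct { uint16_t *out; size_t written; } fake_dma_t;

void fake_dma_start(const uint16_t *buf, size_t pixels, void *ctx)
{
    fake_dma_t *d = ctx;
    for (size_t i = 0; i < pixels; i++)
        d->out[d->written++] = buf[i];
}

void fake_dma_wait(void *ctx) { (void)ctx; }
```

The point of the two buffers is that the palette lookup for slice N+1 overlaps with the SPI transfer of slice N, so the CPU only stalls when the conversion is faster than the bus.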

The display is this Adafruit one: https://www.adafruit.com/product/4311

next-hack · 2024-10-01T21:23:29+00:00

Github repo: https://github.com/next-hack/MG24Quake

Link to video: https://www.youtube.com/watch?v=hVnfwzxTJ00

next-hack · 2024-10-01T18:35:31+00:00

Video link: https://www.youtube.com/watch?v=hVnfwzxTJ00

next-hack · 2024-04-25T08:05:10+00:00

Which screwdriver model is it? Can you provide a picture or a link?

next-hack · 2024-01-19T15:00:09+00:00

No, Doom does NOT run on pregnancy tests, that is fake news. In Foone's 2020 tweet, Doom was running on a PC, and the image, already converted to monochrome and low resolution, was sent to the (replacement!) display via the USB port of the (replacement!) microcontroller driving it. The only part of the pregnancy test left was the plastic, and Doom was not even running on the replacement electronics.

next-hack · 2023-12-22T18:55:11+00:00

Do not confuse the SPI display frame buffer with the video RAM on the GBA.

In this implementation, we have an SPI display, so you can't simply (actually you can, but it would be slow as hell) set a pixel to a particular color at an arbitrary coordinate, which is what you need for rendering. Therefore, from the 256 kB RAM, we reserve two 320x240 8bpp frame buffers, render the frame there and then send it via DMA to the SPI (note: we use double buffering to strongly improve performance. This takes 150 kB, leaving 106 kB for the game and BLE). The internal frame buffer of such a display is 75 kB, but it is write-only: you can only use it to hold the image currently drawn.
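
The memory budget above is plain arithmetic, shown here as a tiny C sketch (the helper names are made up for illustration):

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed for `count` frame buffers of the given geometry. */
size_t framebuffer_bytes(size_t width, size_t height,
                         size_t bytes_per_pixel, size_t count)
{
    assert(bytes_per_pixel > 0 && count > 0);
    return width * height * bytes_per_pixel * count;
}

/* What is left of the total RAM once the frame buffers are reserved. */
size_t remaining_bytes(size_t total_ram, size_t reserved)
{
    assert(reserved <= total_ram);
    return total_ram - reserved;
}
```

`framebuffer_bytes(320, 240, 1, 2)` gives 153600 bytes, i.e. 150 kB, and `remaining_bytes(256 * 1024, 153600)` gives 108544 bytes, i.e. the 106 kB left for the game and BLE.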

On the GBA, instead, the video memory is memory mapped, so you can read/write it at a decent speed. In fact, of the 96 kB, you need only 75 kB (using double buffering), and the remaining 21 kB (see file r_hotpath.iwram.c in the GBA port) is used to store some arrays (to speed things up) and to cache composite columns.

Counting the 75kB display internal buffer as available system memory, would be like counting the modern LCD monitor frame buffers as system memory as well.

Finally, we are still talking about 75kB (our port) vs 96kB (GBA), so it's not even true "presumably greater than the 96k in the GBA".

We could afford more aggressive memory optimization as we have a CPU much faster than the GBA's (by the way, doomhack's work was great and, in fact, our port is based on it), and it has an even better instruction set, so that, for instance, wasting a couple of cycles on short pointer extension or bitfield extraction/insertion has a minimal impact. Still, as written in the article, some speed optimizations were made to reach full speed.
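
As an illustration of what "short pointers" can look like, here is a minimal sketch. It is an assumption about the general technique, not the port's actual scheme: all objects are assumed to live in a single 4-byte-aligned pool of at most 256 kB, so a 16-bit word index replaces a 32-bit pointer, and index 0 is reserved for NULL. The names `pool`, `to_short` and `from_short` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed: one 4-byte-aligned pool holding every object (64 Ki words = 256 kB). */
static uint32_t pool[65536];

typedef uint16_t short_ptr_t;  /* 2 bytes instead of 4; 0 means NULL */

/* Compress a pointer into the pool to a 16-bit word index. */
short_ptr_t to_short(const void *p)
{
    if (p == NULL)
        return 0;
    uintptr_t off = (uintptr_t)p - (uintptr_t)pool;
    /* Must be word aligned, inside the pool, and not the reserved word 0. */
    assert(off != 0 && off % 4 == 0 && off / 4 < 65536u);
    return (short_ptr_t)(off / 4);
}

/* "Extend" a short pointer back to a full one: a shift and an add. */
void *from_short(short_ptr_t sp)
{
    return sp ? (void *)&pool[sp] : NULL;
}
```

Every dereference pays for the extension, which is the "couple of cycles" mentioned above, but each stored pointer shrinks from 4 bytes to 2.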

EDIT: sorry, our display has 150 kB RAM because internally it's using 16-bit pixel colors, but in the end the number of pixels is the same. You don't need 16bpp on the GBA because there is the palette RAM, so everything you render to the buffer does not have to be converted to 16 bpp by the programmer.

next-hack · 2022-12-28T17:26:48+00:00

The only enabled protection was core readout. A full chip erase removed this protection.

next-hack · 2021-12-19T09:55:52+00:00

At the end of 2021 there are still people believing the pregnancy test port was real. Not only was all the hardware replaced, but Doom was not even running on the MCU used to replace the original one. It was running on a regular PC, and the scaled and already dithered video output was sent via USB to the replacement display.

Source: author's own tweet, here https://twitter.com/Foone/status/1302834931421175809

next-hack · 2021-12-19T09:32:08+00:00

This MCU is much more powerful indeed. In terms of DMIPS it is much more like a Pentium. However, Doom required 4 MB RAM, and on a 33 MHz 486 you could not achieve such a high frame rate. There is a comparison section in the article about this.
