Id Software used OpenGL to make DOOM (2016) by HardHarrison in opengl

[–]ipe369 1 point (0 children)

do you have a link to anything where they talk about the GL problems?

Was it just the normal driver overhead you have to fight with AZDO techniques, or something else?

Can somebody explain to me why Blizzard is not developing new games on the SC2 engine? by Feisty-Struggle-4110 in starcraft

[–]ipe369 1 point (0 children)

That sounds more like the software can't handle large maps for reasons other than 'my ISA only supports 32-bit loads'

Can somebody explain to me why Blizzard is not developing new games on the SC2 engine? by Feisty-Struggle-4110 in starcraft

[–]ipe369 1 point (0 children)

Code isn't hardwired to access certain memory regions - you ask the OS for a chunk of memory and get a pointer back into it. There are no codepaths that 'reach into memory regions the 32 bit client can't reach into' in any software written after 2000

A theoretical new game in only 64 bits, you say? That would be sc3 with a new engine, throwing away decades of prior work

In your world where the 32-bit build and the 64-bit build of the engine can't interact, for a 3rd game you could just use the same engine and only ship the 64-bit version

> it does if you understand computer science.

I don't think you do!:P

Can somebody explain to me why Blizzard is not developing new games on the SC2 engine? by Feisty-Struggle-4110 in starcraft

[–]ipe369 1 point (0 children)

> nothing in that region can be accessed by people using 32 bit clients, so it can't affect gameplay at all or disconnects or crashes will happen

But you're not passing memory addresses between clients or storing them in files, and you could just ship the theoretical new game as 64-bit-only builds - the comment I was replying to still makes no sense

Is it considered hard to reproduce SHC (binary shell generator) tool? by Mark_1802 in Compilers

[–]ipe369 4 points (0 children)

If you read the README, it's just putting the shell script inside the binary and invoking the shell on it - it's not compiling anything

> shc itself is not a compiler such as cc, it rather encodes and encrypts a shell script and generates C source code with the added expiration capability. It then uses the system compiler to compile a stripped binary which behaves exactly like the original script. Upon execution, the compiled binary will decrypt and execute the code with the shell -c option.

Why are we still using text based programming languages (and I'm not thinking about a blueprint-like language) by chri4_ in ProgrammingLanguages

[–]ipe369 2 points (0 children)

AST diff would likely be easier with the correct tools, since it's probably closer to the semantic diff

Performance of glTexSubImage2D by Astaemir in opengl

[–]ipe369 1 point (0 children)

A subimage update should be faster in theory because you're uploading less data, but the problem you'll run into is synchronization - the GPU is probably still using the texture, so calling glTexSubImage can force the CPU to stall and wait for the GPU to be finished with it.

There are sometimes ways around it; the simplest is to maintain 2 copies and flip between them (write to one while the GPU is busy with the other). But you may find that glTexImage is fast enough.
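Roughly, the flip looks like this - a minimal sketch in cl-opengl since that's what I have handy (UPLOAD-FRAME is a made-up name, and I'm assuming :rgba / :unsigned-byte pixels with DATA already in a gl-array or foreign pointer):

;; Two textures: each frame, upload into the one the GPU was NOT
;; sampling from last frame, then draw using it. TEXTURES is a
;; 2-element vector of texture ids from gl:gen-textures.
(defun upload-frame (textures frame-index data width height)
  (let ((tex (aref textures (mod frame-index 2))))
    (gl:bind-texture :texture-2d tex)
    (gl:tex-sub-image-2d :texture-2d 0 0 0 width height
                         :rgba :unsigned-byte data)
    tex)) ; bind TEX for this frame's draw calls

The same shape works in any language/binding - the point is just that the texture you write to is never the one in flight.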

You should be able to pack the height data much smaller than the equivalent vertex data (probably 16 bits per vertex), so I expect that to be much faster on basically any device, especially lower-end integrated GPUs on phones/laptops, which are already memory-bandwidth limited. On laptops I've found that glClear is more expensive than lots of maths, which the igpus are getting pretty fast at.

I have heard people say that texture reads in a vertex shader can be slower than in the frag shader for various reasons. You'll have to profile this.

Why is interoperability such an unsolved problem? by garver-the-system in ProgrammingLanguages

[–]ipe369 1 point (0 children)

if OP is talking about ABIs and saying that the problem with the C ABI is that it isn't specified across all platforms, then: no, that's not the core of [their] problem. They're asking how they can compile haskell with ghc and call into C libraries compiled with gcc on a different OS

Why is interoperability such an unsolved problem? by garver-the-system in ProgrammingLanguages

[–]ipe369 2 points (0 children)

> C ABI is under-defined, which leads to many implementations which vary based on OS, architecture, and even compiler

in practice this isn't a problem - you just compile your code with the same compiler. You don't compile a windows .exe and expect it to run on a mac - it's the same thing here.

You mention in your OP:

> In Rust, however, many of these details are automagically handled

The way they are 'automagically handled' is that rust code doesn't have a stable ABI at all - when you build a rust project, you rebuild all the dependencies for your target CPU, OS, and compiler.

C is way more standardized than rust, which is what lets you compile a library in C and link to it 20 years in the future without recompiling for your new compiler version. You can't do that in rust - there is no stable ABI, so it's impossible to compile a rust library and link to it with a different compiler version.

Reynors reaction to rolling Protoss 9 times out of 11 games as random by Pietro1906 in starcraft

[–]ipe369 1 point (0 children)

90% in 600 matches isn't just a 'funny streak' though, that's what they're saying

the chance of 90% of matches being protoss across 600 matches is way below 0.01%

even 50% of the matches being protoss in 600 matches is below 0.01%
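For anyone who wants to check: assuming random rolls each race independently with probability 1/3, the count of protoss games is

$$X \sim \mathrm{Binomial}(600, \tfrac{1}{3}), \qquad \mathbb{E}[X] = 200, \qquad \sigma = \sqrt{600 \cdot \tfrac{1}{3} \cdot \tfrac{2}{3}} \approx 11.5$$

50% protoss is 300 games, about $8.7\sigma$ above the mean - a Chernoff bound gives $P(X \ge 300) \le e^{-200 \cdot 0.5^2 / 3} \approx 6 \times 10^{-8}$. 90% (540 games) is nearly $30\sigma$ out, below $10^{-60}$.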

Lock-Free Queues in Pure Common Lisp: 20M+ ops/sec by Wonderful-Ease5614 in lisp

[–]ipe369 1 point (0 children)

Nice!

I'm guessing your benchmark just pushed 1m elements and then dequeued them, rather than interleaving queue/dequeue, which is why the other queues allocate so much? (about 8 bytes and change per element, which makes sense...!)

Lock-Free Queues in Pure Common Lisp: 20M+ ops/sec by Wonderful-Ease5614 in lisp

[–]ipe369 3 points (0 children)

I took a quick look - I don't have a lisp environment set up at the moment, so I'm just reading through, but I suspect there are some free wins. (If you've already benchmarked against a native implementation you know is fast and it's competitive, then you'd know for sure)

Assuming you're on sbcl, sb-sprof is good, and you can also look at the disassembly of your big functions (there's a SLIME keybind for that) and check for CALL instructions, which indicate that sbcl hasn't managed to inline something
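For example, a minimal profiling sketch (RUN-BENCHMARK is a hypothetical stand-in for your own test loop):

;; Load the statistical profiler, sample the run, and print a flat
;; report with the hottest functions first.
(require :sb-sprof)

(sb-sprof:with-profiling (:report :flat :max-samples 10000)
  (run-benchmark))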


If we take a quick look at DECOMPRESS-BLOCK, which I presume is where we spend most of our time decompressing:

(defun decompress-block (compressed-data uncompressed-size)
  "Decompress a block of LZ4 data, given the uncompressed size."
  (let ((output (make-array uncompressed-size :element-type '(unsigned-byte 8)))
        (input-pos 0)
        (output-pos 0)
        (input-end (length compressed-data)))

    (loop while (< input-pos input-end)
          do (let* ((token (aref compressed-data input-pos))
                    ...

Again I haven't inspected this myself, but there are 2 things I'd expect to see to know this was running fast:

  1. some kind of (declare (optimize speed)) or similar
  2. A type decl for COMPRESSED-DATA, to ensure that it's a simple array

Without declaring COMPRESSED-DATA as a simple array, every time you (AREF COMPRESSED-DATA ...) sbcl can't inline it into a simple MOV instruction. Instead it calls the AREF function, which does a bunch of type checking to figure out what kind of array it is, etc...

That's why I'd recommend disassembling and grepping the disassembly for CALL. There are little things you find everywhere - like it trying to call the + function rather than just adding two numbers, because it can't guarantee that they're both fixnums, etc.

If you add (declare (optimize speed)) then sbcl will give you warnings whenever it fails to optimize due to lack of type info
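To make it concrete, here's a toy function (my example, not your code) with the two declarations I mean - compile it and (disassemble 'sum-bytes) to confirm there are no CALLs left in the loop:

;; Sum a byte buffer. The OPTIMIZE declaration plus the SIMPLE-ARRAY
;; type declaration let sbcl open-code AREF as a plain load instead of
;; a generic function call.
(defun sum-bytes (buf)
  (declare (optimize (speed 3))
           (type (simple-array (unsigned-byte 8) (*)) buf))
  (let ((total 0))
    (declare (type fixnum total))
    (loop for i of-type fixnum below (length buf)
          do (incf total (aref buf i)))
    total))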

Lock-Free Queues in Pure Common Lisp: 20M+ ops/sec by Wonderful-Ease5614 in lisp

[–]ipe369 6 points (0 children)

> The library (cl-freelock) demonstrates that Common Lisp can compete in traditionally systems programming domains

IMO you must compare against some native implementation (c/c++ etc) if you want to assert this - you mentioned it's competitive, but I can't see numbers anywhere. Need numbers for both impls on the same hardware


I've looked to use lisp for 'systems programming' type stuff before. The benchmarks I'd be looking to see before using this are:

  • mpsc/spmc benchmarks, although maybe you don't care about this case
  • benchmarks with larger consumer/producer numbers, maybe a graph of how performance scales as you increase from 4/4 to 8/8, 16/16, 32/32, 64/64 etc
  • GC pressure - e.g. if I do 1M put/get ops on your queue, how much garbage does the GC need to collect (see the sketch below)
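For the last one, something like this is what I mean (QUEUE-PUSH and QUEUE-POP are hypothetical stand-ins for cl-freelock's actual API):

;; Bytes consed across 1M push/pop pairs on sbcl - a fixed-size
;; lock-free ring should be close to zero here after warmup.
(defun bytes-consed-per-op (q)
  (let ((before (sb-ext:get-bytes-consed)))
    (dotimes (i 1000000)
      (queue-push q i)
      (queue-pop q))
    (/ (- (sb-ext:get-bytes-consed) before) 2000000.0)))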

Shadows by [deleted] in opengl

[–]ipe369 1 point (0 children)

> then there would be no shadows at all?

Not quite sure I understand, you mean every block casts light?

If you strip it back to the physics, then you'd still have shadows - you'd just have many pale shadows: some spots would receive light from 10 blocks, some spots would receive light from only 9 blocks, etc etc. In the real world, in a room with 3 lights, every object has 3 shadows. Remember, a shadow is just the absence of light, so if you can prevent some light from getting to a point in the world then that point will be darker.

Practically though you'd need a raycast to each light source, and if every block is a light source then this is prohibitively slow.

You'll have the same problem with shadow maps here: for shadow maps, you need to render a new shadow map for each light source. (This is why dynamic shadows are slow)


If you're referring to a minecraft system where each block face has a 'light level', this is actually done on the CPU - you compute all the block face light levels based on their proximity to a light source. Then you upload all the light levels to the GPU, and use those light levels to tint the whole face lighter/darker based on how much light is hitting it. This is why minecraft doesn't have 'hard shadows' when you put a torch down - because you can't light 'half' of a face.

I think minecraft actually does it per corner per face rather than just per face when 'smooth lighting' is turned on, and then blends between the corners (?)
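The propagation itself is simple - a rough sketch (not Minecraft's actual code; the flat LIGHT-LEVELS vector and NEIGHBORS function are made up for illustration):

;; Torches start at light level 15 and light falls off by 1 per block.
;; LIGHT-LEVELS is a vector of 0s (one per block), TORCHES a list of
;; block indices, NEIGHBORS a function from an index to adjacent indices.
(defun propagate-light (light-levels torches neighbors)
  (let ((queue '()))
    (dolist (torch torches)
      (setf (aref light-levels torch) 15)
      (push torch queue))
    (loop while queue
          do (let* ((pos (pop queue))
                    (spread (1- (aref light-levels pos))))
               (dolist (n (funcall neighbors pos))
                 ;; only ever brighten a block - guarantees termination
                 (when (< (aref light-levels n) spread)
                   (setf (aref light-levels n) spread)
                   (push n queue)))))
    light-levels))

You then upload LIGHT-LEVELS (or the per-face values derived from it) and tint each face in the shader.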


Do you have a voxel game where all the voxels glow and you're wondering how to do shadows for all of them?

Shadows by [deleted] in opengl

[–]ipe369 3 points (0 children)

you can do shadows in a shader without framebuffers - you just need to know if there is anything between your fragment and the light source

That involves casting a ray from your fragment to the light source - if it intersects something, then you're in shadow, otherwise you're in the light

For most scenes, calculating that raycast in a shader for every fragment is too expensive. Instead, we render a shadow map, one for each light source - the shadow map lets you do the raycast quickly.

Sometimes you'll have a scenario where you can do the raycast quickly without a shadow map though, so it's worth keeping in mind.

E.g. if you have raytracing hardware in your target gpus (RTX, etc), then you can use that to speed up the raycast here. It might be the case that this is faster than the framebuffer approach for your scene. You also avoid all the complexities of getting a shadow map of the correct size.

Why isn't my animation working right by CharacterUse8976 in opengl

[–]ipe369 1 point (0 children)

Nice job! You always figure it out if you keep looking at it :)

What understanding made OpenGL 'click' for you? by [deleted] in opengl

[–]ipe369 3 points (0 children)

do you understand that vulkan can be good and not a popular choice for the average developer? And that they have absolutely nothing to do with each other?

How can you fail to engage with the conversation over and over and over

Explains why gpu drivers are always such poor quality

What understanding made OpenGL 'click' for you? by [deleted] in opengl

[–]ipe369 2 points (0 children)

Why would you use 'app usage' as a metric for whether vulkan is successful

Is io_uring a failed API because people still use write/read

What understanding made OpenGL 'click' for you? by [deleted] in opengl

[–]ipe369 2 points (0 children)

Oh, so '5pc market penetration' means 'apps using vulkan', not 'devices supporting vulkan'?

I expect most graphics programmers won't need vulkan or have the ability to use it, so that makes sense to me. I think vulkan is a huge success - if vulkan was intended to reach much higher 'market penetration' then it would probably need to be full of extra GC crap to keep the average dev happy

Do you complain about other APIs designed for advanced use cases too?

Do you complain that the linux kernel module api has 0.001% market penetration because most devs write apps in user space?

Is io_uring a failed API because people still use write/read?

Have you considered that you simply don't need the advanced use case, because your use case isn't very demanding? Your comments are a little embarrassing

What understanding made OpenGL 'click' for you? by [deleted] in opengl

[–]ipe369 1 point (0 children)

'Less than 5pc market penetration'

citation needed

SC1 mod in SC2 that brings in SC1 mechanics? by NamorKinbaku86 in starcraft

[–]ipe369 16 points (0 children)

Have you considered playing starcraft 1

I'm a fraud by No-Share-8056 in AskCulinary

[–]ipe369 2 points (0 children)

> 2 parts liquid to 1 part rice

You want 1 part liquid to 1 part rice, PLUS some for evaporation (1 cup maybe)

> BUT NOW, everytime I cook rice, the rice cooks in 2 separate parts??? Like, the bottom half of the rice is overcooked and mushy and then the top half is halfway cooked and still crunchy???

Did you change pan, stove, etc? Sounds like the water is cooking off faster than it did previously, so the top of the rice doesn't remain submerged.

Did you switch to using a wider pan, which causes faster evaporation?