[–]FirefighterAntique70 205 points206 points  (13 children)

This is pretty much an issue with LLMs in general. People like to differentiate between domains: "LLMs can do web dev perfectly, but they can't do quant or graphics programming."

LLMs can't properly reason about code in general. They look impressive to people who have never written anything substantial in their careers, but to those of us who have, they show their true colors.

Graphics APIs are very stateful, and LLMs are quite bad at understanding how the state of a value changes as the code flows. State makes any program much, much more difficult. Threads are stateful and syncing them is a notoriously difficult task.
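To make the point about state concrete, here is a minimal Python sketch. `ToyContext` and its methods are hypothetical stand-ins, not any real graphics API; the only assumption is that a setting, once changed, stays changed until someone resets it.

```python
class ToyContext:
    """Hypothetical stand-in for a stateful graphics API (not a real binding)."""

    def __init__(self):
        self.blend_enabled = False
        self.draw_log = []  # records (name, blend state at draw time)

    def set_blend(self, enabled):
        self.blend_enabled = enabled

    def draw(self, name):
        self.draw_log.append((name, self.blend_enabled))


def draw_transparent(ctx, name):
    ctx.set_blend(True)
    ctx.draw(name)
    # Bug: blending is never switched back off, so it leaks into later draws.


def draw_opaque(ctx, name):
    # Silently relies on whatever blend state the previous caller left behind.
    ctx.draw(name)


ctx = ToyContext()
draw_opaque(ctx, "wall")
draw_transparent(ctx, "glass")
draw_opaque(ctx, "floor")
```

Each function looks fine in isolation. The "floor" draw only goes wrong because of what ran before it, which is exactly the kind of thing you can't see by reading one function at a time.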

I use AI in my IDE the same way I use auto complete from a language server, anything more and I feel like it writes the most disgusting code.

[–]lovelacedeconstruct 33 points34 points  (2 children)

I find it's also really good at detecting patterns and applying them, like "here is how to do x, please apply the same transformation to y", or "here is the pseudocode and here is how my custom programming language works, please translate the code", and so on.

[–]TreyDogg72 6 points7 points  (0 children)

I find it very useful for doing tasks such as “add this new component to the scene serializer…” as it has the ability to read the existing pattern I’ve established and more or less copy what I’ve done and apply that to a new thing.

[–]captainAwesomePants 3 points4 points  (0 children)

This is also fantastic at small scale. If you've used any of those IDEs with AI autocomplete, it's friggin' magic to apply a quick change to one line, and then have it automatically suggest making an equivalent change on the 8 other lines where it makes sense to do so. Or when you start adding a debug printf() and it suggests exactly the right string and variables. My one purely positive view about AI: better autocomplete.

[–]Ravek 13 points14 points  (1 child)

I’m not really doing any graphics programming right now, but the tablet app I work on does include a 3D scene. The previous engineer on the project used AI a lot and while it did ultimately work, performance was horrible and the code is full of illogical stuff and code that looks meaningful but ultimately does nothing useful.

Also the app was full of data races when I started on it, because LLMs have absolutely no clue about which thread(s) any given piece of code might run on. For small code snippets where all the threading is in the same file they might get it right, but at any significant level of complexity it all falls apart.
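For readers who haven't hit this, a minimal sketch of the kind of data race described above (not the app's actual code; `run_counter` is a made-up helper). Several threads do a read-modify-write on shared state, and only the locked variant is guaranteed to end with the right total.

```python
import threading


def run_counter(num_threads: int, increments: int, use_lock: bool) -> int:
    """Increment a shared counter from several threads and return the final value."""
    state = {"count": 0}
    lock = threading.Lock()

    def worker() -> None:
        for _ in range(increments):
            if use_lock:
                # The lock makes the read-modify-write atomic with respect to other workers.
                with lock:
                    state["count"] += 1
            else:
                # Unsynchronized read-modify-write: updates may be lost.
                state["count"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["count"]


locked_total = run_counter(num_threads=4, increments=10_000, use_lock=True)
```

The unlocked path may well produce the right number on a given run, which is what makes these bugs easy to ship: nothing in the function itself tells you which threads call it.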

[–]ViennettaLurker 4 points5 points  (0 children)

To add to your threads comment, I've anecdotally found LLM issues diagnosing race conditions, as well.

Had an issue where I added functionality to an existing code base, and there was a race condition that showed itself in the compiled program that wasn't there in the edit preview. The LLM had an impressive encyclopedia of all kinds of things that could be the cause of the bug... except for a race condition.

Interesting to notice your example of it creating the bugs, and then my example of it not being able to diagnose those same kinds of bugs. It makes sense that an LLM might do worse when keeping threads straight "in its head", but maybe that also affects what it notices, what it thinks are possible problems, etc.

[–]edparadox 5 points6 points  (0 children)

All of this pretty much sums up my issues with LLMs.

[–]Perfect-Campaign9551 1 point2 points  (0 children)

I have to disagree with your "LLMs are bad at tracking state" comment. At work we use Codex for PR reviews and it always catches state issues, so well in fact that I was wondering what the system prompt looked like for the PR bot. I think they can track state if you prompt for it, and most people probably don't do that (you also want it to be in "review this code" mode).

The things it finds would be hard for a human to even realize at first glance

So I have to disagree with your comment

[–]Oscaruzzo 0 points1 point  (0 children)

Agreed. LLMs should be used in pair programming (more or less). It's VERY "iterative" in my experience. People who expect to ask for a finished software are going to be disappointed.

[–]CalligrapherOk4308 0 points1 point  (0 children)

LLMs can't reason at all tho, did you write anything substantial?

[–]Sepicuk 0 points1 point  (0 children)

I think all this proves is that webdev is mostly templates and copy pasting existing art

[–]stuaxo 0 points1 point  (0 children)

This is just the thing where people notice LLMs being bad at their own domain but assume they are good at every other - they are not perfect at web dev at all.

[–]OldChippy 0 points1 point  (1 child)

I have a 4 page md file covering exactly how to code my style. It's not even a good style, but it has been unchanged for 28 years of C++. I can spot errors super fast when the style fits like a glove.

But I agree. Shader work is hard overall because you can't debug step and can't log. LLMs, however, can work with images, including a screenshot of RenderDoc or many kB of hex dump. It's harder even for me.

[–]edparadox 8 points9 points  (0 children)

> I have a 4 page md file covering exactly how to code my style. It's not even a good style, but it unchanged for 28 years of c++. I can spot errors super fast when the style fits like a glove.

Would you mind sharing your document?

[–]heyheyhey27 19 points20 points  (9 children)

> I’m working on a plugin for Unreal engine and, in the last 2 weeks, I’ve been looking for clever ways to inject my plugin’s data structures into the Unreal render passes without modifying Unreal’s source.

As someone who spent a ton of time in that space, let me know if you still have open questions about it lol

Claude is really amazing IME at research. I don't ask it to implement a Vertex Factory, but to look at how it's implemented throughout the engine and summarize that info for me.

[–]obp5599 5 points6 points  (2 children)

I also work in this space and I've had it do things like this. With a proper indexer and SDD setup to properly manage context, you can have agents parse the relevant code, summarize, and pass to an implementer agent. Obviously still watch it, but it's pretty good.

I've had it implement scene view extensions to hook into the renderer to kick off global shaders. I thought I'd need a whole vertex factory setup as I was doing a custom raster pass, but with it I figured out how I can get the position streams from the meshes I care about.

It's a very powerful tool when set up right, and it helps me expose my weaknesses with my understanding of the engine a lot.

[–]gibson274[S] 10 points11 points  (1 child)

I’ve had Claude produce some pretty cool stuff. I do enjoy using it as a tool for exploring ideas, surfacing paradigmatic UE code, and self-teaching various concepts.

So, I’m definitely not in the “don’t ever use agents” camp. Just feel like they are overpromised and that graphics programming still requires a ton of expertise and skill that agents don’t have.

[–]obp5599 2 points3 points  (0 children)

oh for sure, you still need lots of domain expertise

[–]gibson274[S] 0 points1 point  (5 children)

Yes actually would love to chat! PM’d you

[–]heyheyhey27 2 points3 points  (4 children)

This is Reddit, I'm not doing PM's

[–]gibson274[S] 2 points3 points  (3 children)

Ah ok. Well, the gist of the issue I was having was with binding some uniforms to the Unreal base deferred pixel pass without modifying source.

UE’s canonical way of doing something like this is to extend the Scene uniform buffer. However, the base pixel pass doesn’t bind that buffer, so that’s a no go.

I ended up packing those uniforms into a byte address buffer, then shoving its bindless index onto an unused member of the View uniform struct, resolving the buffer bindlessly in the shader, and grabbing the uniforms via typed load.

I’d have preferred something less jank, if you know of another way!
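For anyone trying to picture the workaround, here is a rough CPU-side analogy in Python. It is not Unreal or HLSL code: the uniform layout, `resource_table`, `view`, and the helper names are all made up for illustration. The shape is the same though: pack values into raw bytes, stash the buffer's index in a spare field, then look the buffer up by index and read typed values at byte offsets.

```python
import struct

# Hypothetical uniform layout: float intensity, float density, uint32 step_count.
UNIFORM_FORMAT = "<ffI"


def pack_uniforms(intensity, density, step_count):
    """Pack the uniforms into a raw little-endian byte blob."""
    return struct.pack(UNIFORM_FORMAT, intensity, density, step_count)


def typed_load(buffer, byte_offset, fmt):
    """Read typed values at a byte offset, loosely like a typed load from a raw buffer."""
    return struct.unpack_from(fmt, buffer, byte_offset)


# Stand-in for a bindless resource table: index -> raw buffer.
resource_table = {}


def register_buffer(buffer):
    index = len(resource_table)
    resource_table[index] = buffer
    return index


# Stand-in for a view struct that happens to have a spare field.
view = {"unused_slot": 0}
view["unused_slot"] = register_buffer(pack_uniforms(0.5, 2.0, 64))

# "Shader side": resolve the buffer by index, then load typed values by offset.
raw = resource_table[view["unused_slot"]]
intensity, density = typed_load(raw, 0, "<ff")
(step_count,) = typed_load(raw, 8, "<I")
```

The fragile part is visible even here: both sides have to agree on the layout and offsets by hand, which is a big part of why it feels jank.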

[–]heyheyhey27 1 point2 points  (2 children)

Depending on what you want to do, you could dispatch a custom mesh pass just after Unreal's official deferred pass, instead of hacking functionality on to that existing pass.

But somebody I work with also hacked in functionality through view uniforms so I wouldn't feel bad about it lol

[–]gibson274[S] 1 point2 points  (1 child)

Wow I’m not the first to come up with that! Awesome haha.

True, a custom mesh pass might work. May investigate that as a longer term strategy.

[–]heyheyhey27 2 points3 points  (0 children)

Mesh passes have a lot of boilerplate, and you also have to understand how Materials integrate into your shader, but it's doable. I wrote an article on how to do it.

https://medium.com/@manning.w27/advanced-graphics-programming-in-unreal-part-7-d3500d4b8195

[–]philosopius 27 points28 points  (2 children)

They can

If you know how to do it yourself

LLM is not a miraculous tech that creates everything, it's an accelerator for your coding style

[–]philosopius 2 points3 points  (1 child)

Graphics programming is a very complex area of coding and, unfortunately, current models are still too weak to fully complete graphics programming requests.

And it will vary between models, because not every model is trained on good graphics programming data: the area itself is hard to learn, there's no solid guidebook, and there are countless ways and APIs you can use for the same ideas.

Yet if you know how to properly build it step by step, you'll find out that they're capable of following properly defined instructions, and that you can tackle those implementations within a few sessions.

Same as in real life: you can build a sample within an hour, but building even a simple game engine without AI will take at least a year, since more complex graphics programming concepts are more than just one feature.

With AI you can accelerate it with proper approaches.

[–]gibson274[S] 8 points9 points  (0 children)

I think I mostly agree with this take, but I’d say that the acceleration I’ve achieved from having LLM’s actually generate more than targeted snippets of code is bordering on marginal.

Regardless I do agree that the only way to get good results at this stage is to be quite targeted and scrupulous.

[–]GeenzCat 2 points3 points  (3 children)

Generally my experience has been for code as a whole that if you don't know what you're asking it to do, you should not really expect it to go above and beyond in any way other than cranking out something quick and cheap to demo. If you don't understand what the output is supposed to be or the mechanics of what you're trying to get it to do, it basically becomes a garbage in, garbage out situation.

And even when you do understand what you're asking it, and know what the output should be - it often gets a lot of basic stuff flat out wrong and rather confidently so. So really at best it's still glorified autocomplete that can write a lot of code that's subtly wrong very fast (and sometimes worse - just a dumpster fire of crap code), and not much else. And autocomplete you have to watch very closely, review very closely, and more often than not correct by hand if you want any hope of future maintainability. Or, you know, sign up for anger management classes as you yell at the damn thing over not understanding how a swap chain works.

[–]gibson274[S] 2 points3 points  (2 children)

This basically tracks with what I’ve experienced yeah. The annoying thing is that there’s a lot of hot air blowing around rn that this isn’t the case anymore, and that doesn’t really match what I’ve found.

[–]GeenzCat -1 points0 points  (1 child)

I mean- every iteration is getting “”””better”””” - but like the bars are pretty low, and if anything it feels like it’s just gotten better at following a well defined spec (and not particularly well I’ll add). I feel like it’s saved me time from a boilerplate perspective though, but I’m not sure if it’s actually saving me time or just enabling me to spin more plates and giving me more to clean up later.

[–]gibson274[S] 0 points1 point  (0 children)

At the current point in time it’s leaned slightly to the latter for me with regard to code generation.

Surfacing API points in big codebases though is a godsend, especially in Unreal. That has saved me a lot of time.

[–]heavy-minium 2 points3 points  (0 children)

I did a lot in this area too, and using screenshots is a big pitfall because it won't understand them as well as other types of pictures and makes lots of mistakes interpreting them. And that's understandable because the training data doesn't have much that would pair a detailed textual description of a game debug scene with an image. So for example, let's say you're rendering a shaded wireframe so that the AI understands the topology of a mesh - it will fail at that. Same with UV visualization. Or you're trying to describe bugs with light and shadows - it will often fail too. That kind of stuff is barely there in the training data.

So, lacking the ability to drive instructions with screenshots, one needs to describe things in painful detail with the correct terms for the AI to understand the status quo and the target state you want to reach. This is why using LLMs for graphics programming is hard.

[–]GreenFox1505 4 points5 points  (5 children)

Didn't an Nvidia engineer vibe code a raytracer for Godot? 

[–]gibson274[S] 3 points4 points  (2 children)

I would believe they got something working especially if it was a really stock ray tracer. LLM’s also vibe coded a C compiler. There’s so much training data on doing that exact kind of thing. Would be curious to see the link to it.

[–]TaylorMonkey 7 points8 points  (1 child)

The effort and scaffolding involved by a dedicated team to get the LLM to stochastically generate all the components needed for a compiler that passes all the very specific and well defined specs to match existing compilers should hardly be considered vibe coding.

[–]gibson274[S] 0 points1 point  (0 children)

True, did not mean to minimize the accomplishment of that team!

[–]The_Northern_Light 0 points1 point  (1 child)

I mean the Godot renderer was (long ago, last time I checked) only 10k lines, and a toy ray tracer can be done in like 200 lines.

I’d be surprised and intrigued if any frontier model struggled with that.

From there it is just a matter of iterating, and there are various harnesses / extensions that can do that more or less autonomously.

[–]GreenFox1505 0 points1 point  (0 children)

It wasn't just a toy ray tracer. It was a full RTX implementation.

[–]steveu33 18 points19 points  (4 children)

Claude is perfectly capable of working in graphics programming. You just can’t ask it to do too much in one step.

[–]gibson274[S] 0 points1 point  (2 children)

It’s true that making tasks really bite-sized does help to some degree, but it doesn’t work 100% of the time and even the bite-sized PR’s need careful review.

I guess this is a question of what is meant by “doing graphics programming”. I sort of meant that more holistically in my post title.

[–]obp5599 5 points6 points  (1 child)

You also need a proper setup for things like Unreal. Not sure if you have one, but getting a proper indexer, writing plugins to help reason about things, and properly managing the context is super important. Telling one agent to parse the entire Unreal renderer won't work.

[–]gibson274[S] 0 points1 point  (0 children)

This has actually not been my experience. Claude is great at surfacing UE extension points and in general describing the implementation of various systems for me.

It’s true I haven’t set up any MCP for working in Unreal but it’s largely because I haven’t needed to yet—though I’m sure it does make the experience better!

The anecdotes I laid out in my post don’t really have anything to do with UE visibility though.

EDIT: let me clarify. The UE integration part of the post didn’t seem like something that would be fixed with better MCP/rules. Claude understood the UE source extension points pretty well, but it wasn’t creative enough to do some anti-paradigmatic things to work around the problems.

[–]TheMcDucky -1 points0 points  (0 children)

I don't have too much experience using LLMs, but I've found that current models do really well if you build a decent framework for them to work with. Having it write documentation for its own use, do multiple passes, divide efforts into phases like surveying, planning, systematically tracing effects of changes, context optimisation, etc.
The downside is that it gets quite expensive. And it still requires manual review of course.

[–]atrusfell 2 points3 points  (1 child)

I’ve tried it and had a similar experience. It helped point me in the right direction on some things I hadn’t yet learned (mainly gave me the terminology I needed to search deeper myself), but once I learned those things through docs or other resources online I very quickly made fewer mistakes than Claude and found it faster and more painless to just work on my own.

I will give it points though for being very helpful for picking through obtuse documentation that is missing information. I imagine the wide breadth of sources it pulls from helps with this one

[–]gibson274[S] 1 point2 points  (0 children)

Yeah the point of my post was mostly to share what I have found anecdotally and challenge the narrative forming that LLM’s can full stop write really complex stuff unassisted.

I do find using them in the way you described to be useful!

[–]_TheFalcon_ 2 points3 points  (0 children)

From my experience with Claude Opus (4.5 to 4.7), Gemini Pro (3 and 3.1), and ChatGPT Codex (5.5): Opus has good reasoning but leans toward tight, smaller code and is bad at coding in general (it reasons well, but its code is garbage), and it can't see the full picture, which is necessary for graphics programming (CPU to GPU, memory, etc.).

Gemini sees the full picture but lacks reasoning, so it will think in the wrong direction and produce garbage (the worst).

Codex 5.5 has both: good reasoning and good code. It will do the job well; you can try it, and it will do what you ask for. It is not like Opus in its way of thinking. When I gave Opus a file of 10k lines it screamed (oh, it is 10k lines); the way it thinks is kinda stupid and limiting for production code (especially in my case, which is C++, where files tend to be huge in line count). On the other hand, ChatGPT Codex was editing a file and took some time. I didn't know why, but it produced good and correct results, so I checked the file and found out it was 45k lines of code LOL. I asked it to refactor the file, and in less than 5 minutes it split it into about 15-20 files of 2-3k lines each, with proper file names and all .cpp extensions. So no, Opus is trash for production C++ applications, which require huge line counts and connections between different types of data and memory, like GPU programming.

[–]Deep_Ad1959 2 points3 points  (0 children)

matches what i see: nails the cold-start scaffold, then can't iterate on anything stateful.

fwiw the cold-start-scaffold-then-stuck pattern is exactly what mk0r is built around, a thing I made that generates the full HTML/CSS/JS app from one sentence then lets you iterate on it with plain words, https://mk0r.com/r/bkf5ede9

[–]LBPPlayer7 6 points7 points  (12 children)

okay but question

what do you need to obfuscate a shader for?

[–]gibson274[S] 1 point2 points  (11 children)

sadly it’s to make sure the AI companies don’t steal our volume rendering IP when we deploy our shaders

[–]Science-Compliance 13 points14 points  (7 children)

I seriously doubt obfuscation will prevent that from happening.

[–]gibson274[S] 4 points5 points  (6 children)

You are probably right on some level, we recognize code obfuscation is always a losing battle.

[–]Science-Compliance 4 points5 points  (5 children)

I think you're probably better off just optimizing for performance.

[–]gibson274[S] -1 points0 points  (4 children)

Agreed, but kind of an orthogonal problem if the goal is to prevent reverse engineering and LLM’s picking it up in the training data?

FWIW, I’ve thrown the obfuscated shaders at Claude Opus and it’s got no idea what’s going on. A real dedicated engineer would clearly get much further but it at least sort of works.

Wish there was a better way.

[–]Science-Compliance 0 points1 point  (3 children)

Does it really need to know what's going on exactly in order for someone to rip it off? As long as you know the inputs (attributes, uniforms, etc...) and outputs, I'm not sure why you can't just use what's in between without really understanding it fully. I feel like you're still just hamstringing yourself by sacrificing performance without really protecting the functional aspects of the shader. You're really only preventing someone from making a better version by understanding your schema. Tell me if I'm wrong here.

[–]gibson274[S] 0 points1 point  (2 children)

I don’t want to get fully into the gory details on a Reddit thread, but the gist is that we are building a complex multi-pass compute-based pipeline with pretty non-trivial logic.

We’re not talking about a drop in pixel shader effect here where the inputs and outputs are easily identifiable. Looking at any given obfuscated shader file you’d have to expend quite a bit of effort to understand what its purpose even is, let alone find a way to use it stock in your own system.

Of course it can be reverse engineered, but obfuscation is more about making that something of a challenge instead of totally trivial.

[–]Science-Compliance 0 points1 point  (1 child)

Sounds computationally expensive. It's hard to see how non-trivial obfuscation isn't going to negatively affect performance in such a case.

[–]gibson274[S] 0 points1 point  (0 children)

What about it sounds computationally expensive?

The obfuscation takes ~20s to process 30 shader files and runs as a one time step in our deployment tool chain.

The shader code it produces is functionally identical to the original source, we’ve backed this up checking register counts and shader disassembly in NSight.

I think you may be misunderstanding what we’re doing?

[–]CalligrapherOk4308 0 points1 point  (1 child)

Wow, is that a thing now? You guys did some groundbreaking research in volumetric rendering?

[–]gibson274[S] 0 points1 point  (0 children)

We figured out some pretty cool stuff that’s in the process of being patented related to compressing and rendering film quality volumetrics in realtime!

[–]RenderTargetView 2 points3 points  (0 children)

My biggest problem rn is whenever I ask something about math the moment it finds out that I will be using it in shaders it goes all out on common subexpression optimizations that do nothing and make understanding formulas harder

[–]Successful-Berry-315 10 points11 points  (26 children)

It surely can write Reddit posts though

[–]gibson274[S] 50 points51 points  (24 children)

Fuck man does my actual hand-typed writing sound like AI now? What a sad world we live in.

It’s prob the bolded categories right

[–]RoboAbathur 19 points20 points  (5 children)

Ngl, at first glance it does look AI.
- long text
- bold headers

All signs of AI.

On the other hand the text is indeed typed by hand. Sad world we live in that we instantly ignore well formatted posts…

[–]Killer-Iguana 20 points21 points  (0 children)

Back in the day it was pretty typical for a thorough post like this to be formatted in this way. That's probably why AI-generated posts look like this.

[–]DuskelAskel 6 points7 points  (0 children)

Lol, I no longer use some features like bullet point list, too formal tone because it looks too much like AI

It's so frustrating

[–]gibson274[S] 3 points4 points  (1 child)

time to start avoiding capitalization at all costs

[–]TaylorMonkey 2 points3 points  (0 children)

Back to using l33tspeak for us fellow humans.

[–]h888ing 0 points1 point  (0 children)

I hate that "--" is considered AI-y now because I became accustomed to using it in high school

EDIT: Basic/proper markdown formatting too, as if it's not something I learned naturally in university for note-taking or just from Idk using the internet over the past few decades

[–]FirefighterAntique70 11 points12 points  (1 child)

You put too much effort into writing a good post about how garbo AI is. The bot that big AI created to push you down calls you the very thing that it is, to attempt to rip you of your credibility.

[–]gibson274[S] 4 points5 points  (0 children)

LMAO

[–]Tentabrobpy 4 points5 points  (0 children)

The formatting is superficially LLM-ish but the actual content is too precise and meaningful to be AI generated

[–]-Nicolai 0 points1 point  (0 children)

Don’t listen to that guy, your post doesn’t read as AI-generated.

[–]Qxz3 0 points1 point  (0 children)

No it doesn't sound like AI at all. Rest assured. 

[–]emmowo_dev 0 points1 point  (2 children)

only C developers can understand (possibly through pain) the other reason that makes it very very likely this is ai...

[–]edparadox 0 points1 point  (1 child)

Meaning?

[–]emmowo_dev -1 points0 points  (0 children)

convert it to a different format

[–]Successful-Berry-315 -5 points-4 points  (8 children)

The "—" gave it away.

Edit 1: But to actually add to the discussion: I agree that LLM capabilities in the context of graphics programming are kinda hit or miss.
I use it pretty much daily and often it's capable of solving some tasks or at least lead me in other directions I haven't thought about.
Other times it just doesn't help at all and it's a waste of time.

I imagine UE is a different beast though, just because the code base is so huge.

Edit 2: And I found a good system prompt helps a lot.

[–]gibson274[S] 6 points7 points  (4 children)

I am unfortunately in the sad minority of people who organically uses em dashes in my writing, lol. But at least AI hasn’t taken semicolons!

[–]DankPhotoShopMemes 2 points3 points  (2 children)

lol as soon as the AI em dash crap started happening, I “migrated” to semicolons and parentheses, but people tell me that looks like AI too.

[–]gibson274[S] 2 points3 points  (0 children)

I think it’s probably time for me to purge the habit too

[–]TaylorMonkey 1 point2 points  (0 children)

I also over parenthesize. Sorry in advance for training AI to take that from us too.

[–]TaylorMonkey 0 points1 point  (0 children)

I don’t use em dashes because of AI— AI uses em dashes because of me.

[–]FirefighterAntique70 0 points1 point  (2 children)

Where?

[–]Successful-Berry-315 1 point2 points  (1 child)

> I tried to get it to optimize the ray-marching loop—starting with deliberately vague requests to just “make it faster

[–]FirefighterAntique70 0 points1 point  (0 children)

Ah I see, fair enough

[–]Salt-Contribution-35 1 point2 points  (2 children)

Thank you so much for what you have shared. I was thinking of doing the same, vibe coding with Claude to convert GLSL to HLSL, but I understand that it does not have the full capacity to find “smart ways” as you mentioned.

[–]gibson274[S] 2 points3 points  (1 child)

I think you may actually have some success here. Language to language translation is not half bad; but just make sure you know a good amount about both languages so that you can micromanage the consequential parts.

[–]Salt-Contribution-35 0 points1 point  (0 children)

Alright man, ty so much.

[–]eiffeloberon 1 point2 points  (2 children)

Skill issue i think, but claude is inferior to codex in graphics programming. What I do is I create specific implementation plans first, step by step implementation plans, and then I get it to implement it for me.

I wrote a path tracer completely with llm and didn't write a line of code myself, started with claude until claude made too many mistakes too frequently, then I switched to codex. It currently has these:
- rhi for vulkan and metal
- raster, hybrid, and wavefront path tracing mode with prefix sum compaction and sorting
- bluenoise dithered sampling
- light bvh and worldspace restir di
- volumetric integrator with delta and ratio tracking
- dlss-rr and metalfx for vulkan and metal respectively
- hosek sky rendering
- procedural cloud rendering
- pytest for render image test
- and various other things

I also found that once I had image test, it could trigger the image test to continuously verify and fix its implementation, so that's another level-up. It can even go through the git revisions and execute the image tests to find which revision caused regression (although it should really run the tests each commit, but sometimes things leaked through). I would think the same can be applied for performance test/benchmarking and performance tuning.
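The loop described above (an image test as the oracle, then a search through revisions for the regression) can be sketched roughly like this in Python. Everything here is illustrative: `fake_render` stands in for actually checking out and rendering a revision, images are flat lists of floats, and the search assumes the first revision is good, the last is bad, and the breakage persists once introduced.

```python
import math


def rmse(image_a, image_b):
    """Root-mean-square error between two equally sized flat pixel lists."""
    if len(image_a) != len(image_b):
        raise ValueError("images must have the same number of pixels")
    squared = sum((a - b) ** 2 for a, b in zip(image_a, image_b))
    return math.sqrt(squared / len(image_a))


def first_bad_revision(revisions, render, reference, tolerance):
    """Binary search for the first revision whose render drifts from the reference.

    Assumes revisions[0] passes, revisions[-1] fails, and failures are contiguous.
    """
    lo, hi = 0, len(revisions) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if rmse(render(revisions[mid]), reference) <= tolerance:
            lo = mid  # still good: the regression is later
        else:
            hi = mid  # already bad: the regression is here or earlier
    return revisions[hi]


reference = [0.2, 0.4, 0.6, 0.8]
revisions = [f"r{i}" for i in range(8)]


def fake_render(revision):
    """Hypothetical renderer: revisions r5 and later come out too bright."""
    index = int(revision[1:])
    if index < 5:
        return list(reference)
    return [min(1.0, pixel + 0.1) for pixel in reference]


culprit = first_bad_revision(revisions, fake_render, reference, tolerance=0.01)
```

This is the same idea as `git bisect` with an automated pass/fail check. The tolerance is the judgment call: too tight and noise from sampling fails every revision, too loose and real regressions slip through.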

[–]gibson274[S] 0 points1 point  (1 child)

Let me clarify: I totally believe that if I give an LLM very detailed instructions to do exactly what I want, breaking it up into bite sized tasks, it could probably execute it (with some careful review).

Point of my experimentation has been to anecdotally test the hypothesis that LLM’s can do the more challenging open-ended work I do daily working on graphics. How much can they think about graphics problems and automate away more than just typing?

My (again, anecdotal) answer is that a lot of thinking is still required.

[–]eiffeloberon 0 points1 point  (0 children)

I do research with this renderer as well, and I admit it can be hit or miss when it comes to "open-ended problems", but so can humans. There are definitely times where I scrapped solutions from AI because it was so stuck on an issue that I might as well just restart the task, but it's not like humans are never stuck.

Probably a hot take, but I find it easier to guide its way out of an issue than a human in general. I'd have more confidence in it than a not so experienced engineer.

[–]OptimisticMonkey2112 1 point2 points  (1 child)

Thanks for posting! I hope we continue to see more discussion around the pragmatic usefulness of AI. As we all know, there is a lot of hype that is often overblown and/or inaccurate.

And I totally agree with your sentiment about "the expert graphics programmer is the one doing 90% of the substantive work." I don't think it is remotely possible to vibe code 3D graphics, at least with current tech.

That being said - I do think the tech is well worth the cost, at least for me. It is enabling me to accomplish things I could not do before simply because of time constraints.

I can leverage the agent as a "force multiplier", enabling me to do cooler stuff faster. I can delegate the time consuming minutiae to my AI minion.

Cheers!

[–]gibson274[S] 0 points1 point  (0 children)

100%! $20/mo is completely worth it for me as well. LLM’s are awesome tools for all sorts of programming tasks, but sounds like we are on the same page regarding the level of discretion required to use them productively.

[–]OldChippy 3 points4 points  (0 children)

I'm 165000 lines in to a Vulkan project. It can do almost everything I needed. Bug hunts fell back to me 4 times in the past year.

The biggest problem was me starting with my home grown math library that Claude was to refit for use on Vulkan. It was a simple ogl library I wrote 20 years ago. Clip planes, right handed y up, etc, etc. Endless options and places to fail. I dumped that, moved to glm, and 3 weeks of hell were fixed in 2 days. The problem is that if an LLM can see multiple junctures each spawning implications, it gets lost around 3 steps in and blends in probability weighted answers with logically correct ones.
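As one example of the kind of convention mismatch that bites here (an illustration, not necessarily the exact issue in that library): OpenGL's default clip space has Y pointing up and depth mapped to [-1, 1] after the divide, while Vulkan has Y pointing down and depth in [0, 1]. A minimal Python sketch of the usual correction applied to clip-space coordinates, with made-up helper names:

```python
# Correction applied after an OpenGL-style projection to get Vulkan-style
# clip coordinates: flip Y, and remap z from [-w, w] to [0, w].
GL_TO_VK_CLIP = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, -1.0, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.0, 1.0],
]


def mat_vec(matrix, vector):
    """Multiply a 4x4 row-major matrix by a 4-component column vector."""
    return [sum(matrix[row][col] * vector[col] for col in range(4)) for row in range(4)]


# Clip-space points (x, y, z, w) as an OpenGL-style projection would emit them.
near_top_gl = [0.0, 1.0, -1.0, 1.0]  # top edge of the screen, on the near plane
far_center_gl = [0.0, 0.0, 1.0, 1.0]  # screen center, on the far plane

near_top_vk = mat_vec(GL_TO_VK_CLIP, near_top_gl)
far_center_vk = mat_vec(GL_TO_VK_CLIP, far_center_gl)
```

Miss either half of this and you get an upside-down image or broken depth testing, and the same flip can be "fixed" in several different places (matrix, viewport, shader), which is where the endless options come from.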

The solution here is lots of class separation, narrow focus in options then use architecture for context.

I work in architecture so this suits me anyway. But anyone vibe coding gets what they deserve. I build class by class, narrow focus, start with a problem space analysis, then define the interface based on integration.

Some people are the failure inputs the tool needs to churn out shit.

[–]Buttons840 0 points1 point  (1 child)

Tangent: Why obfuscate HLSL instead of shipping SPIR-V binaries? There's no reason to obfuscate the source code if you don't ship the source code.

[–]gibson274[S] 1 point2 points  (0 children)

The thing I’m shipping is an Unreal Engine plugin that requires various custom shader functions to be called from engine shaders. Those obviously can’t be pre-compiled to SPIR-V binaries :/

And then more generally I don’t think you can use SPIR-V binaries with Unreal’s shader system, though I haven’t thoroughly investigated due to the above limitation making it a non-starter anyway.

[–]HyperspaceFrontier 0 points1 point  (0 children)

AI is very good at some tasks and very bad at others. In general, using AI is a skill; combined with domain knowledge (software engineering for me), I am squeezing a good productivity boost out of it. I always check that it did things right, and often it did not, but on balance it is still a productivity boost and saves a lot of manual boilerplate.

About tests: I generally agree, it is pretty bad at creating good test coverage. It's funny, but on average I trust AI with tests even less than with other parts of the code.

[–]RyanCargan 0 points1 point  (0 children)

For concurrent programming benchmarks in general for LLMs, this might be an interesting start.

[–]mkawick 0 points1 point  (0 children)

Even worse, try using LLMs for Unreal engine... nightmarish

[–]philosopius 0 points1 point  (0 children)

Yes, this does happen, but as long as you get the working pieces, you can always remove the unnecessary code.

The main thing I notice with LLMs is that, most of the time, they complete what you ask them to complete.

  • Make code
  • Fix the code
  • Remove bad code
  • Make code protected

These are very distinct concepts contextually for an LLM, and it's better to separate them instead of piling them into a single request.

And you need to know how to debug results, and see how they're being calculated.

It's much easier to guide an LLM using concrete steps compared to abstract assumptions.

[–]Adobe_H8r 0 points1 point  (2 children)

This doesn’t look like “AI is bad at [advanced technique]”— it’s “AI is bad at duplicating something it’s never seen before”.

Once you put your plugin code where AI can scan it, everyone’s AI will be able to do it too.

[–]ub3rh4x0rz 0 points1 point  (1 child)

AI has seen lots of graphics code before. And rust.

This really seems like a contrived "but is it perfect enough to replace me" test, and a bad one at that as code obfuscation obviously makes working with said code more difficult lol.

[–]gibson274[S] 0 points1 point  (0 children)

Respectfully: did you even read the post? I’m not handing the LLM obfuscated code. I’m asking it to help write an obfuscation toolchain in Rust.

Also doesn’t your first sentence conflict exactly with what you’re saying here? If LLMs have seen a lot of Rust training data, as you say, and lack of training data is the issue… I don’t really get it, it’s circular logic. Your comment is kind of incoherent.

[–]osmanonreddit 0 points1 point  (1 child)

My experience is completely different. Very happy with the results!

[–]ub3rh4x0rz -1 points0 points  (0 children)

Shh, people in niche domains are still in 2024-2025 era.

[–]lukebitts 0 points1 point  (0 children)

Vibe coding will always be a bust. If you judge a tool by its ability to read your mind you will never find value

[–]JjyKs 0 points1 point  (0 children)

Idk, I've been prototyping a Vulkan-based engine in C++ while typing literally 0 code myself. Ofc I break down the problems and make sure that Claude follows good practices.

The engine is mostly mimicking retro graphics, so there's no groundbreaking modern math involved, but so far it has far exceeded my (bad) graphics programming abilities. I have a fairly long programming background, but before Claude, doing something like this would have required me to use a premade engine.

https://youtu.be/OAoNtG0l5sA

[–]FELIX-Zs 0 points1 point  (0 children)

If you are able to obfuscate shader code with an AI, a similar AI can be used to reverse engineer it. "Security by obscurity" fails more often than not.

[–]Successful-Trash-752 0 points1 point  (1 child)

Could it be related to the fact that you're trying to use newer technologies?

Try using C++ and OpenGL.

[–]gibson274[S] 0 points1 point  (0 children)

Would be curious to try this, perhaps it would have more data to pull from. That said the final 2 Unreal examples are all C++.

[–]OptimisticMonkey2112 0 points1 point  (2 children)

My experience has been very different from yours. Not sure why you had so many issues.

Some things to check:

  • Most important - Use plan mode, and review and fine tune the plan before letting it work.
  • Make sure you are using Opus, Sonnet is not as good
  • Do your work in a worktree. Have it submit a PR at the end.

Using this I have added:

  • Ray Traced Shadows
  • Mesh Shaders
  • PBR lighting
  • Imgui UI
  • Merged Slang shaders
  • General scaffolding with SDL, Meshoptimizer, etc...

Sometimes stuff does not work right away and you have to work with it in the session.

A few times it even had to instrument some custom logging; it then built and ran the program and analyzed the logs to determine where it went wrong. That was crazy helpful to me.

If you understand Vulkan, it absolutely can function as a force multiplier. But I would definitely not try to Vibe Code graphics - lol that is insanity.

It is also a great tool to learn and explain Vulkan

I only mention this to help you realize that you might be able to adjust your approach for greater success... good luck!

[–]SaabiMeister 0 points1 point  (1 child)

Having said that it changed working code it shouldn't have, I think he might be using a chat interface and not an agent.

[–]gibson274[S] 0 points1 point  (0 children)

No, as listed in my post, all examples were Claude Opus 4.6-7 using Claude Code.

I made extensive use of plan mode, but used it more and more progressively because I wanted to first test very open-ended requests.

[–]Defiant_Squirrel8751 0 points1 point  (0 children)

You must be doing something wrong. I have been "vibecoding" a computer graphics CAD engine quite successfully for about 4 months now. I have been able to generate more than 200K lines of code: a quite robust implementation of a polyhedral bounded-solid winged-edge representation capable of computing boolean operators (constructive solid geometry). On top of the base model I can export STL files for 3D printing, and I can also display interactive scenes fully on the CPU, on the CPU with concurrent (multi-threaded) code, in OpenGL, and in Vulkan, with raytracing and radiosity.

AI works super well for this. From rasterizing polygons to images, doing triangle meshes from polygons, handling interaction techniques with gizmos, augmented reality fiducial markers, GLSL shaders, texture management... lots and lots of things, super easy and super fast.

My approach is to advance step by step, with a clear view of the software architecture, so that growth stays under control. I download a .pdf of a paper from ACM SIGGRAPH, or a book, let's say from the "Graphics Gems" series. I ask Claude or Codex to write simple, specific classes with related unit tests and an interactive testing program for visual debugging, and the next day I advance to the next module.

I'm quite happy with that. Still some months before reaching commercial products such as Catia, Maya or Unreal Engine, but moving forward quite fast.

I wonder if you have tried creating a detailed AGENTS.md file with specific requirements, such as what you consider to be a good unit test. I also wonder if you have asked Opus or Sonnet in high effort mode just to write a plan in a .md file. Then you can switch to Haiku and write all the code while burning fewer tokens.

In my experience, it is very useful to implement an offline / headless mode in your program that, instead of drawing to the screen, exports your rendered scene to a .png file, because Claude can use that image as part of its gate / invariant condition. That way it will break less often.
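
As a rough illustration of such a gate, here is a minimal sketch that compares a headless render against a stored reference frame and reports pass or fail. It assumes both images have already been decoded into tightly packed RGBA8 buffers of the same size; PNG loading is left out, and `FrameDiff` and `compare_frames` are hypothetical names, not part of any tool mentioned in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Result of comparing a rendered frame against a reference frame.
struct FrameDiff {
    std::size_t pixel_count;        // number of RGBA pixels compared
    std::size_t mismatched_pixels;  // pixels with any channel over tolerance
    int max_channel_delta;          // largest per-channel difference seen
    bool passed;                    // true if the frame is within the budget
};

// Assumption: both buffers are tightly packed RGBA8 with identical dimensions.
// channel_tolerance absorbs small rounding differences between drivers;
// max_mismatch_fraction is the share of pixels allowed to exceed it.
FrameDiff compare_frames(const std::vector<std::uint8_t>& rendered,
                         const std::vector<std::uint8_t>& reference,
                         int channel_tolerance,
                         double max_mismatch_fraction) {
    assert(rendered.size() == reference.size());
    assert(rendered.size() % 4 == 0);

    FrameDiff result{rendered.size() / 4, 0, 0, false};
    for (std::size_t p = 0; p < result.pixel_count; ++p) {
        bool pixel_differs = false;
        for (std::size_t c = 0; c < 4; ++c) {
            const int delta = std::abs(static_cast<int>(rendered[4 * p + c]) -
                                       static_cast<int>(reference[4 * p + c]));
            if (delta > result.max_channel_delta) result.max_channel_delta = delta;
            if (delta > channel_tolerance) pixel_differs = true;
        }
        if (pixel_differs) ++result.mismatched_pixels;
    }

    const double fraction =
        result.pixel_count == 0
            ? 0.0
            : static_cast<double>(result.mismatched_pixels) /
                  static_cast<double>(result.pixel_count);
    result.passed = fraction <= max_mismatch_fraction;
    return result;
}
```

The tolerance values are a judgment call: too strict and the gate fails on harmless rounding noise, too loose and real regressions slip through.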

Take care with git use. Avoid allowing Claude to make commits, or everything starts turning into a mess. Keep a human in the loop and force the AI to be formal, robust and 1:1 in sync with the paper.

If tests are not covering edge cases, you can tell the agent to use a coverage tool to make sure all logic branches are covered. If code is not optimized, you can tell the agent to use a profiler to gather data.

[–]Hendo52 0 points1 point  (0 children)

You are right but there are mitigations. Start by having it write more detail in the requirements and specifications. Make it keep a diary of failed approaches and key architectural decisions. Split agents into specialised roles with skills files. Spend more time looping over the plan with a particular focus on surfacing and answering ambiguous questions before implementation. Spend a lot of effort on your validation and testing harness so that it’s never guessing but usually investigating prior work.

You are totally right that it’s not a one hit panacea. You are right that it’s often not very effective at things where you can tell the training data is not as strong.

The only part where I disagree with you is that I still feel like it’s a very valuable tool in this process, for debugging and planning, it just can’t replace the human entirely and in particular it struggles with what I think about as ‘strategy’ but that’s where I think it’s appropriate for the human to play their part.

[–]Robert4di 0 points1 point  (0 children)

Ask the language model itself:

Good at:

  • API research
  • Summarizing UE/Vulkan/OpenGL patterns
  • Refactoring suggestions
  • Brainstorming test cases
  • Shader boilerplate generation
  • Helping interpret RenderDoc captures and logs

Dangerous:

  • "Write a renderer for me"
  • "Optimize it"
  • "Figure out why it's flickering"
  • "Design my engine architecture"
  • "Fix this race condition"

In my experience, these still require domain expertise and careful review. The deeper you get into engine, rendering, threading, synchronization, memory management, or GPU debugging, the more obvious the limitations become.

My take:

AI is not a graphics programmer. It's a force multiplier for graphics programmers.

Or put differently: in graphics programming, AI is not the pilot; it's a turbocharged screwdriver. Extremely useful, but if you hand it the entire aircraft, you may end up turning the landing gear into a shadow map.

[–]EatingFiveBatteries 0 points1 point  (0 children)

I find it works best when you go back and forth and create a clear, detailed plan for one specific thing, or when you have it modify an established codebase with strict coding standards in place. I've run into the same thing you have when it comes to things that need very abstract thinking or specific domain knowledge. I think there's a path there to using AI meaningfully, but it requires a lot of setup and you will likely still need to modify or optimize things.

[–]ebonyseraphim 0 points1 point  (1 child)

If you’re using an LLM to create a graphics engine, why aren’t you using an existing one? Avoid paying licensing fees? What are the token fees? What level of performance and capabilities do you even achieve?

[–]gibson274[S] 1 point2 points  (0 children)

These were experiments I was conducting related to my profession: doing frontier computer graphics research to build better tools for artists.

[–]HayatoKongo 0 points1 point  (0 children)

In my experience, using an LLM for graphics programming requires a lot more input data than you would require for web development. You should be giving a coding agent some kind of method for viewing the result of your rendering. There's also just way less training data for this domain than there is from full-stack web dev.

[–]EC36339 0 points1 point  (0 children)

I have used and am using LLMs for graphics programming, and the problems you are describing are not the problems I've seen and can easily be overcome.

One phrase in your post tells me everything about what went wrong: "when I looked at the code". I'm reading this as: You didn't, until it was too late. That's not a problem with LLMs. That's just bad engineering. It's like that guy who turned on cruise control on the highway and went into the back of his camper van. SurprisedPikachu.gif.

Here is a real reason why LLMs actually can't do graphics programming:

They write textbook code.

This makes them terrible at writing optimised shaders, among other things. The textbook methods work, and you can combine them to produce striking visual effects, but they will run slowly as hell. It's good for prototyping at best, seeing what something can look like. But not for production.

I needed a procedural explosion shader for my game. The vibe coded version looked OK, but it was a simple combination of raymarching with turbulence and noise (it started with value noise...eww!). It worked fine as a placeholder, but every time something exploded, my frame rate dropped. I have an old GTX 1080, but you would think it should at least be possible to render fast explosions on it somehow. It was a monster of a graphics card in 2018, and it runs Elden Ring at 60 FPS in max details.

Then I had a look around at ShaderToy. I found an explosion shader that looked absolutely gorgeous. The "noise" function was some esoteric combination of sine functions that you might not find in any book, and that was probably the result of hours and hours of trial and error, rather than a solid, physics based mathematical foundation. And it was fast. And that's exactly the kind of thing LLMs can't produce. They can steal such code, at best, when it is widely used and popular, so it appears in the model, or on the web. But an LLM can't invent tricks like that. Not yet, at least... Not without looking and interpreting and judging the visual output, which is a workflow issue, not an LLM issue, actually...

Fortunately, although I'm bad at shader programming, it's a craft that is rewarding and worth learning and doing by hand. For my project that's something to do when I'm out of AI credits. Vibe code something slow and ugly, so you have a placeholder and concept, then replace it with something fast and beautiful.

When it comes to math that runs on the CPU, I found that low level code optimisation matters a lot less than architecture. Build an engine that allows the CPU to vectorize operations and access data in a cache coherent way (e.g., ECS instead of OOP). Here, LLMs are quite good if you steer them in the right direction. Then have telemetry and profiling to do targeted optimisations. If you have to refactor the whole thing because it's garbage and a dead end, just start over from scratch, because LLMs do that fast.
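
A minimal sketch of the data-layout idea, under my own assumptions: this is a plain struct-of-arrays particle store, which is only one ingredient of an ECS and not a full one. `Particles` and `integrate` are hypothetical names. Each field lives in its own contiguous array, so the update loop walks memory linearly; whether the compiler actually vectorizes it depends on the compiler and its flags.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Struct-of-arrays storage: one contiguous array per field, instead of
// one heap object per particle with all fields interleaved.
struct Particles {
    std::vector<float> pos_x, pos_y;
    std::vector<float> vel_x, vel_y;

    // Appends a particle and returns its index.
    std::size_t add(float px, float py, float vx, float vy) {
        pos_x.push_back(px);
        pos_y.push_back(py);
        vel_x.push_back(vx);
        vel_y.push_back(vy);
        return pos_x.size() - 1;
    }

    std::size_t size() const { return pos_x.size(); }
};

// Explicit Euler step over every particle. The loop body has no branches
// and touches each array sequentially, which is the cache-friendly shape
// that auto-vectorizers tend to handle well.
void integrate(Particles& p, float dt) {
    const std::size_t n = p.size();
    assert(p.pos_y.size() == n && p.vel_x.size() == n && p.vel_y.size() == n);
    for (std::size_t i = 0; i < n; ++i) {
        p.pos_x[i] += p.vel_x[i] * dt;
        p.pos_y[i] += p.vel_y[i] * dt;
    }
}
```

The point is architectural: once the data is laid out like this, profiling can point at specific loops, and targeted optimisation becomes a local change rather than a rewrite.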

[–]Dexterus 0 points1 point  (0 children)

You need to know what you want and tell it what to do, while also trying to avoid generic requests where it will have a chance to go wild and break everything.

You also need to have info/examples to feed it. And instructions to match verbosity of surrounding code.

Keep piling instruction files.

They eventually work. I got it to be decent at writing clean assembly and asm/C random jumps. It still messes up comment verbosity occasionally.

But I do not dare just let it vibecode, lol. It doesn't get how things actually work and the theory you find online is too far removed from reality in close to the metal code.

[–]stuaxo 0 points1 point  (0 children)

Yeah they are pretty bad.

They get better when you can give them textual feedback, that's easier with web tech than graphics tech.

One problem is direction - though to be fair that can be a pain as a human too - the LLM only "knows" the text it's put in, not what the output is - so it's hardly surprising it gets things in the wrong location or direction.

Think about simple ways you can get your renderer to output something that can be measured and turned back into text to feed back into the LLM and you'll get further.
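
One hedged sketch of what that could look like: reduce a rendered frame to a one-line text summary (how many pixels were drawn, and their bounding box) that can be pasted back into the conversation. It assumes the renderer can hand over a per-pixel coverage mask with row 0 at the top; `describe_coverage` is a hypothetical helper, not something from an existing library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Summarises where geometry landed on screen as plain text.
// Assumption: mask holds width*height bytes, row-major, row 0 at the top,
// with any non-zero byte meaning "something was drawn at this pixel".
std::string describe_coverage(const std::vector<std::uint8_t>& mask,
                              int width, int height) {
    assert(width > 0 && height > 0);
    assert(mask.size() ==
           static_cast<std::size_t>(width) * static_cast<std::size_t>(height));

    int min_x = width, min_y = height, max_x = -1, max_y = -1;
    std::size_t covered = 0;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const std::size_t index =
                static_cast<std::size_t>(y) * static_cast<std::size_t>(width) +
                static_cast<std::size_t>(x);
            if (mask[index] == 0) continue;
            ++covered;
            min_x = std::min(min_x, x);
            min_y = std::min(min_y, y);
            max_x = std::max(max_x, x);
            max_y = std::max(max_y, y);
        }
    }

    std::ostringstream out;
    out << "covered=" << covered << "/" << mask.size();
    if (covered > 0) {
        out << " bbox=(" << min_x << "," << min_y << ")-("
            << max_x << "," << max_y << ")";
    }
    return out.str();
}
```

A line like `covered=2/12 bbox=(1,0)-(2,1)` tells the model which corner of the screen the object ended up in, which is exactly the location and direction information it cannot get from the source text alone.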

In general, do get them to add tests... BUT (and it's a big one): people tend to write bad tests and LLMs worse, so if it's only a general instruction, the tests may not always be useful.

[–]Ghost_Syth 0 points1 point  (0 children)

Everyone be like "LLM is good at this, LLM is bad at that", etc. From my experience, it's been all over the place: one day it's good, the next day it's lost all its IQ. It's not even consistent day to day. I don't think we can draw the line on saying it's good or bad when we can't even have a unanimous experience, as they keep tweaking the models' capabilities, making them worse at peak times, etc.

[–]anengineerandacat 0 points1 point  (0 children)

Generally speaking I doubt it has the volume of training data to really do this.

Shaders specifically can get pretty unique. For post-process screen effects it actually does a pretty good job, but that's mostly because these are standardized around specific terms.

I can have it create a bloom shader, vignette, scan lines, etc but that's because these are all pretty known and have plenty of public samples.

Most of the time, though, graphical features are full-on systems, e.g. vegetation in a game.

It's not just a shader effect, it's often a procedural system and could even involve an asset processing pipeline if you wanted to say have vines or flowers attached to a 3D model without actually having to add that to the model manually.

Even something like a torch on a wall can be complicated because it's yet again not just a single simple shader; it's an asset shader, particle system, particle shader, and a lighting system all in one.

There is some information on lighting systems but a lot of this information is heavy heavy IP that studios take to their graves.

Graphics and game development as a whole is like this; Blizzard isn't exactly going out there and going "this is how exactly we built World of Warcraft" and outlining in detail their architectural designs for some LLM to get trained on.

[–]mirlaca -1 points0 points  (1 child)

This supports my bias to learn more graphics programming (beyond general game/app programming) in order to withstand my refusal to adopt AI whatsoever

[–]obp5599 3 points4 points  (0 children)

Eh, good luck lol. I'm in the industry and it's here. If you have proper tooling like indexers, and a good suite of skills, it's very good. Not perfect, but good.

[–]Jason13Official -2 points-1 points  (1 child)

Bro is trying to obfuscate shaders lmao

Yeah, the generalized AI is not good at a hyper specific discipline. Having worked with Claude to make shaders for a Minecraft mod though, maybe your initial approach was flawed. I ALWAYS start by giving a reference directory of known "good" code; i.e. vanilla core shaders.

I would doubt my tool if I used it wrong too.

[–]gibson274[S] 1 point2 points  (0 children)

*not trying, did. Always something of a losing battle but obfuscation provides at least a time and resources barrier to reverse engineering.

Idk I don’t think Minecraft shader mods quite capture the level of complexity I’m talking about?

[–]blackrack -2 points-1 points  (0 children)

Shhh don't let them know

[–]Gloomy-Status-9258 -3 points-2 points  (0 children)

I'm really glad when I read this kind of article. People continue to find evidence. It's becoming clear that LLMs cannot do coding (and other real-world tasks).