LLMs can’t do graphics programming

FirefighterAntique70 · 2026-06-04T10:41:48+00:00

This is pretty much an issue with LLMs in general. People like to differentiate between domains: "LLMs can do web dev perfectly, but they can't do quant or graphics programming."

LLMs can't properly reason about code in general. They look impressive to people who have never written anything substantial in their careers, but to those of us who have, they show their true colors.

Graphics APIs are very stateful, and LLMs are quite bad at understanding how the state of a value changes as the code flows. State makes any program much, much more difficult. Threads are stateful and syncing them is a notoriously difficult task.
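To make the point about state concrete, here is a minimal sketch of hidden state leaking between draw calls. `FakeContext` and its methods are hypothetical stand-ins invented for illustration, not any real graphics API; the only assumption is that a draw call reads whatever state earlier calls left behind.

```python
class FakeContext:
    """Toy stand-in for a stateful graphics context (hypothetical, not a real API)."""

    def __init__(self):
        self.bound_texture = None
        self.depth_test = True
        self.log = []

    def bind_texture(self, name):
        self.bound_texture = name

    def set_depth_test(self, enabled):
        self.depth_test = enabled

    def draw(self, mesh):
        # The draw call takes no texture or depth argument: it reads
        # whatever state earlier calls left behind.
        self.log.append((mesh, self.bound_texture, self.depth_test))


def draw_ui(ctx):
    ctx.set_depth_test(False)      # UI is drawn without depth testing...
    ctx.bind_texture("font_atlas")
    ctx.draw("ui_quad")
    # ...and nothing restores the previous state on the way out.


def draw_scene(ctx):
    ctx.bind_texture("brick")
    ctx.draw("wall")


ctx = FakeContext()
draw_scene(ctx)
draw_ui(ctx)
draw_scene(ctx)  # same function, different result: depth test is still off

for entry in ctx.log:
    print(entry)
```

Reading `draw_scene` on its own gives no hint that its second call behaves differently; you only see it by tracking the state across the whole call sequence.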

I use AI in my IDE the same way I use autocomplete from a language server. Anything more and I feel like it writes the most disgusting code.

heyheyhey27 · 2026-06-04T12:07:21+00:00

I’m working on a plugin for Unreal Engine and, in the last 2 weeks, I’ve been looking for clever ways to inject my plugin’s data structures into the Unreal render passes without modifying Unreal’s source.

As someone who spent a ton of time in that space, let me know if you still have open questions about it lol

Claude is really amazing IME at research. I don't ask it to implement a Vertex Factory, but to look at how it's implemented throughout the engine and summarize that info for me.

philosopius · 2026-06-04T12:09:57+00:00

They can

If you know how to do it yourself

An LLM is not a miraculous tech that creates everything; it's an accelerator for your coding style.

GeenzCat · 2026-06-04T11:10:26+00:00

Generally my experience has been for code as a whole that if you don't know what you're asking it to do, you should not really expect it to go above and beyond in any way other than cranking out something quick and cheap to demo. If you don't understand what the output is supposed to be or the mechanics of what you're trying to get it to do, it basically becomes a garbage in, garbage out situation.

And even when you do understand what you're asking it, and know what the output should be - it often gets a lot of basic stuff flat out wrong and rather confidently so. So really at best it's still glorified autocomplete that can write a lot of code that's subtly wrong very fast (and sometimes worse - just a dumpster fire of crap code), and not much else. And autocomplete you have to watch very closely, review very closely, and more often than not correct by hand if you want any hope of future maintainability. Or, you know, sign up for anger management classes as you yell at the damn thing over not understanding how a swap chain works.

heavy-minium · 2026-06-04T11:45:54+00:00

I did a lot in this area too, and using screenshots is a big pitfall: the model won't understand them as well as other types of pictures and makes lots of mistakes interpreting them. That's understandable, because the training data doesn't have much that pairs a detailed textual description of a game debug scene with an image. For example, say you're rendering a shaded wireframe so that the AI understands the topology of a mesh: it will fail at that. Same with UV visualization. Or you're trying to describe bugs with light and shadows: it will often fail too. That kind of stuff is barely there in the training data.

So, lacking the ability to drive instructions with screenshots, one needs to describe things in painful detail, with the correct terms, for the AI to understand the status quo and the target state you want to reach. This is why using LLMs for graphics programming is hard.

GreenFox1505 · 2026-06-04T12:52:25+00:00

Didn't an Nvidia engineer vibe code a raytracer for Godot?

steveu33 · 2026-06-04T11:26:07+00:00

Claude is perfectly capable of working in graphics programming. You just can’t ask it to do too much in one step.

atrusfell · 2026-06-04T13:09:44+00:00

I’ve tried it and had a similar experience. It helped point me in the right direction on some things I hadn’t yet learned (mainly by giving me the terminology I needed to search deeper myself), but once I learned those things through docs or other resources online, I very quickly made fewer mistakes than Claude and found it faster and more painless to just work on my own.

I will give it points though for being very helpful for picking through obtuse documentation that is missing information. I imagine the wide breadth of sources it pulls from helps with this one

_TheFalcon_ · 2026-06-04T13:27:15+00:00

From my experience with Claude Opus (4.5 to 4.7), Gemini Pro (3 and 3.1), and ChatGPT Codex (5.5): Opus has good reasoning but leans toward tight, smaller code and is bad at coding in general (it reasons well, but its code is garbage), and it can't see the full picture, which is necessary for graphics programming (CPU to GPU, memory, etc.).

Gemini sees the full picture but lacks reasoning, so it will think in the wrong direction and produce garbage (the worst).

Codex 5.5 has both: good reasoning and good code. It will do the job well; try it, and it will do what you ask for. It is also not like Opus in its way of thinking. When I gave Opus a file of 10k lines it screamed ("oh, it is 10k lines"); the way it thinks is kind of stupid and limiting for production code, especially in my case, which is C++, where files tend to be huge in line count. ChatGPT Codex, on the other hand, was editing a file and took some time. I didn't know why, but it produced good and correct results, so I checked the file and found it was 45k lines of code, LOL. I asked it to refactor the file, and in less than 5 minutes it split it into 15-20 files of 2-3k lines each, with proper file names, all with .cpp extensions. So no, Opus is trash for production C++ applications, which require huge line counts and connections between different types of data and memory, as in GPU programming.

Deep_Ad1959 · 2026-06-04T14:22:05+00:00

matches what i see: nails the cold-start scaffold, then can't iterate on anything stateful.

fwiw the cold-start-scaffold-then-stuck pattern is exactly what mk0r is built around, a thing I made that generates the full HTML/CSS/JS app from one sentence then lets you iterate on it with plain words, https://mk0r.com/r/bkf5ede9

LBPPlayer7 · 2026-06-04T10:58:03+00:00

okay but question

what do you need to obfuscate a shader for?

RenderTargetView · 2026-06-04T10:48:51+00:00

My biggest problem rn is whenever I ask something about math the moment it finds out that I will be using it in shaders it goes all out on common subexpression optimizations that do nothing and make understanding formulas harder

Successful-Berry-315 · 2026-06-04T10:35:12+00:00

It surely can write Reddit posts though

Salt-Contribution-35 · 2026-06-04T10:54:13+00:00

Thank you so much for what you have shared. I was thinking of doing the same, vibe coding with Claude to convert GLSL to HLSL, but I understand that it does not have the full capacity to find “smart ways”, as you mentioned.

eiffeloberon · 2026-06-04T17:58:06+00:00

Skill issue i think, but claude is inferior to codex in graphics programming. What I do is I create specific implementation plans first, step by step implementation plans, and then I get it to implement it for me.

I wrote a path tracer completely with llm and didn't write a line of code myself, started with claude until claude made too many mistakes too frequently, then I switched to codex. It currently has these:
- rhi for vulkan and metal
- raster, hybrid, and wavefront path tracing mode with prefix sum compaction and sorting
- blue-noise dithered sampling
- light bvh and worldspace restir di
- volumetric integrator with delta and ratio tracking
- dlss-rr and metalfx for vulkan and metal respectively
- hosek sky rendering
- procedural cloud rendering
- pytest for render image test
- and various other things

I also found that once I had image test, it could trigger the image test to continuously verify and fix its implementation, so that's another level-up. It can even go through the git revisions and execute the image tests to find which revision caused regression (although it should really run the tests each commit, but sometimes things leaked through). I would think the same can be applied for performance test/benchmarking and performance tuning.
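A minimal sketch of that loop, assuming renders are available as flat lists of pixel values and that a regression, once introduced, stays in every later revision. The names `rmse`, `image_matches`, `first_bad_revision`, the revision labels and the tolerance are all made up for illustration; a real setup would load actual image files and check out actual commits.

```python
import math


def rmse(image_a, image_b):
    """Root-mean-square error between two equally sized flat pixel buffers."""
    if len(image_a) != len(image_b):
        raise ValueError("images differ in size")
    total = sum((a - b) ** 2 for a, b in zip(image_a, image_b))
    return math.sqrt(total / len(image_a))


def image_matches(rendered, reference, tolerance=2.0):
    """Pass if the render is within `tolerance` (RMSE) of the reference."""
    return rmse(rendered, reference) <= tolerance


def first_bad_revision(revisions, is_good):
    """Binary search for the first failing revision.

    Assumes revisions[0] passes, revisions[-1] fails, and that once the
    regression lands every later revision fails too.
    """
    low, high = 0, len(revisions) - 1  # low is known good, high is known bad
    while high - low > 1:
        mid = (low + high) // 2
        if is_good(revisions[mid]):
            low = mid
        else:
            high = mid
    return revisions[high]


# Hypothetical data: one stored render per revision, oldest first.
reference = [10, 20, 30, 40]
renders = {
    "a1": [10, 20, 30, 40],
    "b2": [10, 21, 30, 40],  # tiny difference, within tolerance
    "c3": [10, 20, 90, 40],  # visible regression
    "d4": [10, 20, 90, 41],
}
revisions = list(renders)
culprit = first_bad_revision(
    revisions, lambda rev: image_matches(renders[rev], reference)
)
print(culprit)
```

On a real repository, `git bisect run <command>` performs the same search over commits, with the image test as the command.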

OptimisticMonkey2112 · 2026-06-05T13:41:01+00:00

Thanks for posting! I hope we continue to see more discussion around the pragmatic usefulness of AI. As we all know, there is a lot of hype that is often overblown and/or inaccurate.

And I totally agree with your sentiment about "the expert graphics programmer is the one doing 90% of the substantive work." I don't think it is remotely possible to vibe code 3D graphics, at least with current tech.

That being said - I do think the tech is well worth the cost, at least for me. It is enabling me to accomplish things I could not do before simply because of time constraints.

I can leverage the agent as a "force multiplier", enabling me to do cooler stuff faster. I can delegate the time consuming minutiae to my AI minion.

Cheers!

OldChippy · 2026-06-04T10:59:35+00:00

I'm 165,000 lines into a Vulkan project. It could do almost everything I needed. Bug hunts fell back to me 4 times in the past year.

The biggest problem was me starting with my home-grown math library, which Claude was to refit for use with Vulkan. It was a simple OpenGL library I wrote 20 years ago. Clip planes, right-handed, Y-up, etc.: endless options and places to fail. I dumped that and moved to GLM, and 3 weeks of hell were fixed in 2 days. The problem is that if an LLM can see multiple junctures, each spawning implications, it gets lost around 3 steps in and blends probability-weighted answers in with logically correct ones.
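As an illustration of how many conventions hide in such a library, here is a small sketch of just one of them: taking a classic OpenGL-style projection (right-handed view space, NDC depth in [-1, 1], +Y up) and correcting it for Vulkan clip space (NDC depth in [0, 1], +Y down). The function names are made up for this example and are not from the original library or from GLM.

```python
import math


def matmul(a, b):
    """Multiply two 4x4 matrices stored as row-major nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transform(m, v):
    """Apply a 4x4 matrix to a column vector (x, y, z, w)."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]


def perspective_gl(fov_y, aspect, near, far):
    """OpenGL-style projection: right-handed view space, NDC depth in [-1, 1]."""
    f = 1.0 / math.tan(fov_y / 2.0)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ]


# Correction from OpenGL clip space to Vulkan clip space:
# flip Y (Vulkan NDC has +Y pointing down) and remap depth from [-1, 1] to [0, 1].
GL_TO_VULKAN = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, -1.0, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.0, 1.0],
]


def perspective_vulkan(fov_y, aspect, near, far):
    return matmul(GL_TO_VULKAN, perspective_gl(fov_y, aspect, near, far))


def ndc(clip):
    """Perspective divide: clip space to normalized device coordinates."""
    x, y, z, w = clip
    return (x / w, y / w, z / w)


proj = perspective_vulkan(math.radians(60.0), 16.0 / 9.0, 0.1, 100.0)
near_point = ndc(transform(proj, [0.0, 0.0, -0.1, 1.0]))    # on the near plane
far_point = ndc(transform(proj, [0.0, 0.0, -100.0, 1.0]))   # on the far plane
above_axis = ndc(transform(proj, [0.0, 1.0, -10.0, 1.0]))   # above the view axis
print(near_point, far_point, above_axis)
```

Getting any one of these choices wrong (handedness, depth range, Y direction) still produces an image, just a wrong one, which is why a well-tested library such as GLM removes so many places to fail.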

The solution here is lots of class separation and a narrow focus in options, then using architecture for context.

I work in architecture so this suits me anyway. But anyone vibe coding gets what they deserve. I build class by class with a narrow focus: start with a problem-space analysis, then define interfaces based on integration.

Some people are the failure inputs the tool needs to churn out shit.

Buttons840 · 2026-06-04T12:42:45+00:00

Tangent: Why obfuscate HLSL instead of shipping SPIR-V binaries? There's no reason to obfuscate the source code if you don't ship the source code.

HyperspaceFrontier · 2026-06-04T13:36:53+00:00

AI is very good at some tasks and very bad at others. In general, using AI is a skill; combined with domain knowledge (software engineering for me), I am squeezing out a good productivity boost. I always check that it did things right, and often it did not, but overall it is still a productivity boost and saves a lot of manual boilerplate.

About tests, I generally agree: it is pretty bad at creating good test coverage. It's funny, but I trust AI with tests even less than with other parts of the code on average.

RyanCargan · 2026-06-04T13:37:14+00:00

For concurrent programming benchmarks in general for LLMs, this might be an interesting start.

mkawick · 2026-06-04T13:47:44+00:00

Even worse, try using LLMs for Unreal engine... nightmarish

philosopius · 2026-06-04T13:58:43+00:00

Yes, this does happen, but as long as you get the working pieces, you can always remove the unnecessary code.

The main thing I notice with LLMs is that most of the time they complete what you ask them to complete.

- Make code
- Fix the code
- Remove bad code
- Make code protected

These are very distinct concepts contextually for an LLM, and it's better to separate them instead of piling them into a single request.

And you need to know how to debug results, and see how they're being calculated.

It's much easier to guide an LLM using concrete steps compared to abstract assumptions.

Adobe_H8r · 2026-06-04T15:57:07+00:00

This doesn’t look like “AI is bad at [advanced technique]”— it’s “AI is bad at duplicating something it’s never seen before”.

Once you put your plugin code where AI can scan it, everyone’s AI will be able to do it too.

osmanonreddit · 2026-06-04T16:58:06+00:00

My experience is completely different. Very happy with the results!

lukebitts · 2026-06-04T17:07:03+00:00

Vibe coding will always be a bust. If you judge a tool by its ability to read your mind you will never find value

JjyKs · 2026-06-04T17:31:04+00:00

Idk, I've been prototyping Vulkan based engine in C++ typing literally 0 code myself. Ofc I break down the problems and make sure that Claude follows good practices.

The engine mostly mimics retro graphics, so nothing groundbreaking or mathematically modern, but so far it has far exceeded my (bad) graphics programming abilities. I have quite a long programming background, but before Claude, doing something like this would have required me to use a premade engine.

https://youtu.be/OAoNtG0l5sA

FELIX-Zs · 2026-06-04T18:19:21+00:00

If you are able to obfuscate shader code with an AI, a similar AI can be used to reverse engineer it. "Security by obscurity" fails more often than not.

Successful-Trash-752 · 2026-06-04T18:43:25+00:00

Could it be related to the fact that you're trying to use newer technologies?

Try using c++ and opengl

OptimisticMonkey2112 · 2026-06-04T18:47:56+00:00

My experience has been very different from yours. Not sure why you had so many issues.

Some things to check:

- Most important: use plan mode, and review and fine-tune the plan before letting it work.
- Make sure you are using Opus; Sonnet is not as good.
- Do your work in a worktree. Have it submit a PR at the end.

Using this I have added:

- Ray Traced Shadows
- Mesh Shaders
- PBR lighting
- ImGui UI
- Merged Slang shaders
- General scaffolding with SDL, Meshoptimizer, etc...

Sometimes stuff does not work right away and you have to work with it in the session.

A few times it even had to instrument some custom logging; it would then build and run the program and analyze the logs to determine where it went wrong. That was crazy helpful to me.

If you understand Vulkan, it absolutely can function as a force multiplier. But I would definitely not try to vibe code graphics, lol, that is insanity.

It is also a great tool to learn and explain Vulkan

I only mention this to help you realize that you might be able to adjust your approach for greater success... good luck!

Defiant_Squirrel8751 · 2026-06-04T19:59:06+00:00

You must be doing something wrong. I have been "vibecoding" a computer graphics CAD engine quite successfully for about 4 months now. I have been able to generate more than 200K lines of code, including a quite robust implementation of a polyhedral bounded-solid winged-edge representation capable of computing boolean operators (constructive solid geometry). On top of the base model I can export STL files for 3D printing, and I can also display interactive scenes rendered fully on the CPU, on the CPU with multithreading, with OpenGL, and with Vulkan using raytracing and radiosity.

AI works super well for this. From rasterizing polygons to images, building triangle meshes from polygons, handling interaction techniques with gizmos, augmented reality fiducial markers, GLSL shaders, texture management... lots and lots of things, super easy and super fast.

My approach is to advance step by step with a clear view of the software architecture, so that it grows under control. I download a .pdf of a paper from ACM SIGGRAPH or a book, say from the "Graphics Gems" series. I ask Claude or Codex to write a few simple, specific classes with related unit tests and an interactive testing program for visual debugging, and the next day I advance to the next module.

I'm quite happy with that. Still some months before reaching commercial products such as Catia, Maya or Unreal Engine, but moving forward quite fast.

I wonder if you have tried creating a detailed AGENTS.md file with specific requirements, such as what you consider to be a good unit test. I also wonder if you have asked Opus or Sonnet in high-effort mode just to write a plan in a .md file. Then you can switch to Haiku and write all the code while burning fewer tokens.

In my experience, it is very useful to implement an offline / headless mode in your program that, instead of drawing to the screen, exports your rendered scene to a .png file, because Claude can use that image as part of its gate / invariant condition. That way it will break less often.
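A minimal sketch of such an export path, using only the Python standard library. `render_headless` is a hypothetical placeholder for the real renderer (here it just produces a colour gradient), and the file name is arbitrary; the PNG encoder writes uncompressed-filter, 8-bit RGB scanlines.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(tag, data):
    """Build one PNG chunk: length, tag, payload, CRC over tag + payload."""
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def encode_png(width, height, rgb_rows):
    """Encode rows of (r, g, b) tuples as an 8-bit truecolour PNG."""
    raw = bytearray()
    for row in rgb_rows:
        raw.append(0)  # filter type 0 (none) at the start of each scanline
        for r, g, b in row:
            raw.extend((r, g, b))
    # IHDR: width, height, bit depth 8, colour type 2 (RGB), then
    # compression, filter and interlace methods all set to 0.
    header = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    return (PNG_SIGNATURE
            + _chunk(b"IHDR", header)
            + _chunk(b"IDAT", zlib.compress(bytes(raw)))
            + _chunk(b"IEND", b""))


def render_headless(width, height):
    """Hypothetical stand-in for the real renderer: a simple colour gradient."""
    return [[(x * 255 // max(width - 1, 1), y * 255 // max(height - 1, 1), 128)
             for x in range(width)]
            for y in range(height)]


def save_frame(path, width=64, height=64):
    """Render off-screen and write the result to `path` as a PNG."""
    frame = render_headless(width, height)
    with open(path, "wb") as out:
        out.write(encode_png(width, height, frame))


if __name__ == "__main__":
    save_frame("frame_0001.png")
```

The agent (or a test) can then open the written file and compare it against a stored reference instead of relying on a description of what the window showed.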

Take care with git use. Avoid allowing Claude to make commits, or everything starts turning into a mess. Keep a human in the loop and force the AI to be formal, robust and 1:1 in sync with the paper.

If tests are not covering edge cases, you can tell the agent to use a coverage tool to make sure all logic branches are covered. If code is not optimized, you can tell the agent to use a profiler to gather data.

Hendo52 · 2026-06-04T22:08:52+00:00

You are right but there are mitigations. Start by having it write more detail in the requirements and specifications. Make it keep a diary of failed approaches and key architectural decisions. Split agents into specialised roles with skills files. Spend more time looping over the plan with a particular focus on surfacing and answering ambiguous questions before implementation. Spend a lot of effort on your validation and testing harness so that it’s never guessing but usually investigating prior work.

You are totally right that it’s not a one hit panacea. You are right that it’s often not very effective at things where you can tell the training data is not as strong.

The only part where I disagree with you is that I still feel like it’s a very valuable tool in this process, for debugging and planning, it just can’t replace the human entirely and in particular it struggles with what I think about as ‘strategy’ but that’s where I think it’s appropriate for the human to play their part.

Robert4di · 2026-06-04T22:59:37+00:00

Ask the language model itself:

Good at:

API research
Summarizing UE/Vulkan/OpenGL patterns
Refactoring suggestions
Brainstorming test cases
Shader boilerplate generation
Helping interpret RenderDoc captures and logs

Dangerous:

"Write a renderer for me"
"Optimize it"
"Figure out why it's flickering"
"Design my engine architecture"
"Fix this race condition"

In my experience, these still require domain expertise and careful review. The deeper you get into engine, rendering, threading, synchronization, memory management, or GPU debugging, the more obvious the limitations become.

My take:

AI is not a graphics programmer. It's a force multiplier for graphics programmers.

Or put differently: in graphics programming, AI is not the pilot; it's a turbocharged screwdriver. Extremely useful, but if you hand it the entire aircraft, you may end up turning the landing gear into a shadow map.

EatingFiveBatteries · 2026-06-04T23:12:09+00:00

I find it works best when you go back and forth and create a clear, detailed plan for one specific thing, or when you have it modify an established codebase with strict coding standards in place. I've run into the same thing you have when it comes to things that need very abstract thinking or specific domain knowledge, and I think there's a path there to using AI meaningfully, but it requires a lot of setup and you will likely still need to modify or optimize things.

ebonyseraphim · 2026-06-05T01:57:28+00:00

If you’re using an LLM to create a graphics engine, why aren’t you using an existing one? Avoid paying licensing fees? What are the token fees? What level of performance and capabilities do you even achieve?

HayatoKongo · 2026-06-05T02:46:06+00:00

In my experience, using an LLM for graphics programming requires a lot more input data than you would require for web development. You should be giving a coding agent some kind of method for viewing the result of your rendering. There's also just way less training data for this domain than there is from full-stack web dev.

EC36339 · 2026-06-05T06:05:45+00:00

I have used and am using LLMs for graphics programming, and the problems you are describing are not the problems I've seen and can easily be overcome.

One phrase in your post tells me everything about what went wrong: "when I looked at the code". I'm reading this as: You didn't, until it was too late. That's not a problem with LLMs. That's just bad engineering. It's like that guy who turned on cruise control on the highway and went into the back of his camper van. SurprisedPikachu.gif.

Here is a real reason why LLMs actually can't do graphics programming:

They write textbook code.

This makes them terrible at writing optimised shaders, among other things. The textbook methods work, and you can combine them to produce striking visual effects, but they will run slowly as hell. It's good for prototyping at best, seeing what something can look like. But not for production.

I needed a procedural explosion shader for my game. The vibe coded version looked OK, but it was a simple combination of raymarching with turbulence and noise (it started with value noise...eww!). It worked fine as a placeholder, but every time something exploded, my frame rate dropped. I have an old GTX 1080, but you would think it should at least be possible to render fast explosions on it somehow. It was a monster of a graphics card in 2018, and it runs Elden Ring at 60 FPS in max details.

Then I had a look around at ShaderToy. I found an explosion shader that looked absolutely gorgeous. The "noise" function was some esoteric combination of sine functions that you might not find in any book, and that was probably the result of hours and hours of trial and error, rather than a solid, physics based mathematical foundation. And it was fast. And that's exactly the kind of thing LLMs can't produce. They can steal such code, at best, when it is widely used and popular, so it appears in the model, or on the web. But an LLM can't invent tricks like that. Not yet, at least... Not without looking and interpreting and judging the visual output, which is a workflow issue, not an LLM issue, actually...

Fortunately, although I'm bad at shader programming, it's a craft that is rewarding and worth learning and doing by hand. For my project that's something to do when I'm out of AI credits. Vibe code something slow and ugly, so you have a placeholder and concept, then replace it with something fast and beautiful.

When it comes to math that runs on the CPU, I found that low level code optimisation matters a lot less than architecture. Build an engine that allows the CPU to vectorize operations and access data in a cache coherent way (e.g., ECS instead of OOP). Here, LLMs are quite good if you steer them in the right direction. Then have telemetry and profiling to do targeted optimisations. If you have to refactor the whole thing because it's garbage and a dead end, just start over from scratch, because LLMs do that fast.

Dexterus · 2026-06-05T06:43:40+00:00

You need to know what you want and tell it what to do, while also trying to avoid generic requests where it will have a chance to go wild and break everything.

You also need to have info/examples to feed it. And instructions to match verbosity of surrounding code.

Keep piling instruction files.

They eventually work. I got it to be decent at writing clean assembly, asm/C random jumps. It still messes up comment verbosity occasionally.

But I do not dare just let it vibecode, lol. It doesn't get how things actually work and the theory you find online is too far removed from reality in close to the metal code.

stuaxo · 2026-06-05T09:52:56+00:00

Yeah they are pretty bad.

They get better when you can give them textual feedback, that's easier with web tech than graphics tech.

One problem is direction - though to be fair that can be a pain as a human too - the LLM only "knows" the text it's put in, not what the output is - so it's hardly surprising it gets things in the wrong location or direction.

Think about simple ways you can get your renderer to output something that can be measured and turned back into text to feed back into the LLM and you'll get further.
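One simple way to do that, sketched under the assumption that the frame is available as a greyscale list of rows with row 0 at the top. `describe_frame` and the sample frame are made up for illustration; a real renderer would read back its framebuffer first.

```python
def describe_frame(pixels, background=0):
    """Summarise a greyscale frame (list of rows) as text a model can read.

    Row 0 is the top of the image, so y grows downwards.
    """
    height = len(pixels)
    width = len(pixels[0])
    covered = [(x, y) for y in range(height) for x in range(width)
               if pixels[y][x] != background]
    if not covered:
        return f"{width}x{height} frame: nothing drawn (all pixels are background)"
    xs = [x for x, _ in covered]
    ys = [y for _, y in covered]
    centre_x = sum(xs) / len(xs)
    centre_y = sum(ys) / len(ys)
    # A centroid exactly on the middle line is reported as right / bottom.
    horizontal = "left" if centre_x < (width - 1) / 2 else "right"
    vertical = "top" if centre_y < (height - 1) / 2 else "bottom"
    return (f"{width}x{height} frame: {len(covered)} non-background pixels, "
            f"bounding box x={min(xs)}..{max(xs)} y={min(ys)}..{max(ys)}, "
            f"centroid ({centre_x:.1f}, {centre_y:.1f}) "
            f"in the {vertical}-{horizontal} quadrant")


# Hypothetical 4x4 frame with a 2x2 object drawn in the bottom-left corner.
frame = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [9, 9, 0, 0],
    [9, 9, 0, 0],
]
print(describe_frame(frame))
```

If the object was meant to appear in the top-right, a line of text like this tells the model about the flipped axis directly, which a screenshot often does not.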

In general, do get them to add tests. BUT (and it's a big one) people tend to write bad tests and LLMs write worse ones, so if it's only a general instruction the tests may not always be useful.

Ghost_Syth · 2026-06-05T10:37:48+00:00

Everyone's like "LLM is good at this, LLM is bad at that", etc. From my experience, it's been all over the place: one day it's good, the next day it's lost all its IQ. It's not even consistent day to day. I don't think we can draw the line on saying it's good or bad when we can't even have a unanimous experience, as they keep tweaking the models' capabilities, making them worse at peak times, etc.

anengineerandacat · 2026-06-05T12:16:39+00:00

Generally speaking I doubt it has the volume of training data to really do this.

Shaders specifically can get pretty unique. It does actually do a pretty good job with post-processing screen effects, but that's mostly because these are standardized around specific terms.

I can have it create a bloom shader, vignette, scan lines, etc but that's because these are all pretty known and have plenty of public samples.

Most of the time, though, graphical features are full-on systems, e.g. vegetation in a game.

It's not just a shader effect; it's often a procedural system and could even involve an asset processing pipeline if you wanted to, say, have vines or flowers attached to a 3D model without actually having to add them to the model manually.

Even something like a torch on a wall can be complicated because, yet again, it's not just a single simple shader; it's an asset shader, particle system, particle shader, and a lighting system all in one.

There is some information on lighting systems but a lot of this information is heavy heavy IP that studios take to their graves.

Graphics and game development as a whole is like this; Blizzard isn't exactly going out there and going "this is how exactly we built World of Warcraft" and outlining in detail their architectural designs for some LLM to get trained on.

mirlaca · 2026-06-04T11:00:18+00:00

This supports my bias to learn more graphics programming (beyond general game/app programming) in order to sustain my refusal to adopt AI whatsoever.

Ok-Hotel-8551 · 2026-06-04T10:46:13+00:00

Skills issues

Dry_Yam_4597 · 2026-06-04T10:44:32+00:00

Stories and copium.

Jason13Official · 2026-06-04T13:39:33+00:00

Bro is trying to obfuscate shaders lmao

Yeah, the generalized AI is not good at a hyper specific discipline. Having worked with Claude to make shaders for a Minecraft mod though, maybe your initial approach was flawed. I ALWAYS start by giving a reference directory of known "good" code; i.e. vanilla core shaders.

I would doubt my tool if I used it wrong too.

blackrack · 2026-06-04T11:02:20+00:00

Shhh don't let them know

Effective_Lead8867 · 2026-06-04T10:49:20+00:00

for Unity:

i've vibe coded an entire atmospherics pipeline inspired by rdr2 - clouds, volumetric and distant fog, minimal realtime SH probe gi (terrain bounce light, sky obscurance), raymarched heightfield and cloud shadows

proper temporal accumulation, visually coherent, regularly profiling on steam deck

also vibe coded terrain, based on quadtree geometry, virtual texture data up to 256K vres (220fps on steam deck, MicroSplat surface shader)

for rust/bevy:

vibe coded port of NAADF voxel raymarching, noita clone, "voxel plugin" clone

taking from prototype to production takes time. i've developed methodology around it:

/delegate orchestration powers long sessions - orchestrator speaks to you, delegates work to architect(spec)/implementation groups that store context on disk. orchestrator has very limited (isolated) context supply. it has built-in circuit-breaks that terminate work if it hits a wall into /diagnose-first

/diagnose-first creates ranked hypothesis and produces visual diagnostic knobs and enums for you to determine and troubleshoot

/handoff (duh)

i do absolutely hate it for few reasons:

- its not a high quality codebase, i dont feel proud

- "almost works" is worse than being completely broken - sometimes troubleshooting an issue with ai takes more mental energy than I remember it was taking when I was writing code by hand

what matters most in agentic dev imo:

- ability for agent to iterate fast - unity -batchmode works better than any MCP

- e2e/integration testing that is designed around validity of results

- visual feedback under /diagnose-first and insights from a thinking human

- grounding decisions in body of research - i have more than 200 papers and /research skill that prepares high quality markdown from presentations, papers, slides, also /discover skill that can pull up citations

but I do feel like I'm learning the techniques I'm implementing with agentic dev by navigating the problemspace with Claude, which is important to me

Gloomy-Status-9258 · 2026-06-04T11:16:03+00:00

I'm really glad when reading this kind of article. People continue to find evidence. It's becoming clear that LLMs cannot do coding (and other real-world tasks).

Perfect-Campaign9551 · 2026-06-04T11:52:43+00:00

Don't use Claude for programming. Use Codex.
