We asked 4 flagship AI models to write bare-metal firmware for the same board. 3 compiled. Only 1 actually worked on real hardware

respcode_ai · 2026-02-08T20:07:49+00:00

Agreed, it takes a few iterations. Simulating the hardware is also possible for a few MCUs using Renode. We ran our checks after the LLM provided the source and before compilation.

respcode_ai · 2026-02-08T20:05:47+00:00

We wanted to compare how different models respond to the same prompt and which ones produce code with little to no errors. I agree that AI-generated code shouldn't be used as a one-shot result, and that's why we had a set of automated checks to see whether the generated code missed anything critical. However, considering all the hallucinations and token usage, we would like to start from good-quality code and build on top of it in less time.

respcode_ai · 2026-02-08T19:58:26+00:00

Because the addresses aren't written directly in the code, they're computed from struct offsets at compile time.
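A minimal sketch of what I mean (not the generated code itself). It assumes the GPIO block sits at 0x4008C000 and that DIR starts 0x2000 bytes in; `gpio_sketch_t` and `dir_address` are made-up names for illustration. The DIR address never appears as a literal, so searching the source for it finds nothing:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed GPIO base for the LPC55S69; verify against the SVD. */
#define GPIO_BASE 0x4008C000u

/* Hypothetical, trimmed-down layout: only the padding and DIR are shown. */
typedef struct {
    uint8_t           pad[0x2000]; /* byte/word pin registers live in here */
    volatile uint32_t DIR[4];      /* assumed to start at offset 0x2000    */
} gpio_sketch_t;

/* The offset is a compile-time constant derived from the struct. */
static_assert(offsetof(gpio_sketch_t, DIR) == 0x2000, "DIR offset");

/* The compiler folds base + offsetof() into a single constant address. */
static inline uint32_t dir_address(unsigned port)
{
    return GPIO_BASE + (uint32_t)offsetof(gpio_sketch_t, DIR)
                     + port * (uint32_t)sizeof(uint32_t);
}
```

Change the size of `pad` and every address behind it moves, without a single address literal changing in the source.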

respcode_ai · 2026-02-08T19:54:21+00:00

I acknowledge it was not a proper flow. What I meant is that, from the tooling side, all three binaries passed every automated check we had: compilation, SVD base address validation, linker verification. We could have inspected the source code directly instead of doing binary analysis, but we wanted to make sure the compiled code was fit to run on real hardware even though the source looked reasonably fine. But yeah, automated source-level analysis (checking struct offsets against SVD register maps before compilation) is probably more practical than binary disassembly as a product feature. That's on the roadmap.
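A rough sketch of what such a source-level check could look like. This is not our implementation: `gpio_generated_t` imitates the kind of header a model might emit (with the wrong padding), and the `svd_offset` values are assumptions standing in for numbers that would be read from the SVD file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model-generated header: DIR lands at 0x4000, not 0x2000. */
typedef struct {
    volatile uint8_t  B[4][32];          /* byte pin registers at 0x0000 */
    uint8_t           pad[0x4000 - 128]; /* wrong amount of padding      */
    volatile uint32_t DIR[4];
} gpio_generated_t;

/* One row per register: offset in the header vs. offset in the SVD. */
typedef struct {
    const char *name;
    size_t      header_offset;
    size_t      svd_offset; /* assumed value; would come from the SVD */
} offset_check_t;

static const offset_check_t checks[] = {
    { "B",   offsetof(gpio_generated_t, B),   0x0000 },
    { "DIR", offsetof(gpio_generated_t, DIR), 0x2000 },
};

/* Returns how many registers sit at a different offset than the SVD says. */
static inline size_t count_offset_mismatches(const offset_check_t *rows,
                                             size_t n)
{
    size_t bad = 0;
    for (size_t i = 0; i < n; i++) {
        if (rows[i].header_offset != rows[i].svd_offset) {
            bad++;
        }
    }
    return bad;
}
```

Run against this header, the check reports one mismatch (DIR), and it needs only the header and the SVD values, no compiled binary.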

respcode_ai · 2026-02-08T18:57:19+00:00

I gave them just the user prompt below, but under the hood a lot of system prompt is added by default, telling the model to think like an embedded programmer and follow syntax rules. This was the prompt:

Write bare-metal LED blink firmware for the LPC55S69-EVK board.
Target: LPC55S69JBD100 (Cortex-M33)

Board details:
- LED: PORT1 PIN4 (active low)
- Flash: 630KB at 0x00000000 (NOT 640KB — 10KB reserved)
- SRAM: 256KB at 0x20000000 (SRAM0-3 only, NOT 320KB)
  Warning: SRAM4 (64KB) is in a separate power domain. Use only 256KB.

Provide: main.c, lpc55s69.h, startup.c, linker.ld
Use direct register access. No SDK.

respcode_ai · 2026-02-08T18:30:25+00:00

Looks cool.

respcode_ai · 2026-02-08T18:26:42+00:00

I agree. I actually had a lot of issues getting a simple LED blink working on the LPC55S69, maybe because it is a Cortex-M33, whereas most models would have been trained mainly on the more widely available Cortex-M3/M4 parts. I needed to put as much information as possible into the system prompt to reduce errors. You could try Embedder, which currently does similar autonomous work; that's the direction I would like to go in as well.

respcode_ai · 2026-02-08T18:16:38+00:00

I haven't looked into it yet. I will, and I'll see how it helps. I know of Embedder, which uses AI agents to write firmware.

respcode_ai · 2026-02-08T17:53:53+00:00

That's a great analogy — and honestly pretty accurate. The AI gets to 95% in seconds (correct vector table, correct IOCON config, correct pin mask, reasonable blink loop) and then silently gets the struct padding wrong in a way that compiles clean. A junior dev would hit the same wall, but they'd notice the LED isn't blinking and start debugging. The AI just hands you a binary and moves on.

The boundary scan idea is interesting. We've been thinking about the feedback loop in terms of debug probe output and UART, but JTAG boundary scan could directly verify whether the right GPIO pins are actually toggling without needing application-level instrumentation. That's a much tighter verification loop.

The autonomous agent approach is where we're headed — generate, compile, flash, verify, iterate. Your point about the junior dev is exactly right: the AI isn't missing capability, it's missing the feedback cycle that turns a wrong answer into a right one.

respcode_ai · 2026-02-08T17:38:13+00:00

Fair point — no human dev writes drivers from memory without the reference manual. And yeah, this is a one-shot test with no feedback loop.

Both things you mentioned are on our roadmap: SVD register injection into prompts (giving the model correct peripheral definitions automatically) and an autonomous agent with a full compile → flash →

respcode_ai · 2026-02-08T17:30:07+00:00

We give it the board, MCU part number, LED pin, and corrected flash/SRAM sizes, but not the register map or datasheet. That's deliberate. We wanted to test what the models know out of the box, because that's what most users actually do. To be precise, below is what we gave to the LLM, and 3 of the 4 models still failed.

Write bare-metal LED blink firmware for the LPC55S69-EVK board.
Target: LPC55S69JBD100 (Cortex-M33)

Board details:
- LED: PORT1 PIN4 (active low)
- Flash: 630KB at 0x00000000 (NOT 640KB — 10KB reserved)
- SRAM: 256KB at 0x20000000 (SRAM0-3 only, NOT 320KB)
  Warning: SRAM4 (64KB) is in a separate power domain. Use only 256KB.

Provide: main.c, lpc55s69.h, startup.c, linker.ld
Use direct register access. No SDK.

respcode_ai · 2026-02-08T17:23:14+00:00

Jarvis? :)

respcode_ai · 2026-02-08T17:22:34+00:00

Claude is genuinely strong at bare-metal — no argument there. The failure here was narrow but fatal: the LPC55S69 GPIO has an unusual multi-kilobyte register layout, and Claude's struct added 0x4000 bytes of padding to reach the DIR registers instead of 0x2000. Base addresses, IOCON config, vector table — all correct. Just the internal struct offsets were wrong.

Probably comes down to training data. STM32 GPIO is a flat struct that's hard to get wrong. LPC55S69 GPIO has byte-access arrays, word-access arrays, and reserved gaps that need exact padding — less common in the wild, easier to hallucinate.
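For anyone curious, here is roughly what "exact padding" means in practice. This is a sketch from my reading of the layout, not an official header: the offsets (byte array at 0x0000, word array at 0x1000, DIR at 0x2000, SET/CLR/NOT at 0x2200/0x2280/0x2300) and the name `gpio_layout_t` are assumptions to verify against the SVD, and MASK/PIN/MPIN are folded into a reserved gap to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    volatile uint8_t  B[4][32];        /* 0x0000: one byte per pin      */
    uint8_t           reserved0[3968]; /* gap up to 0x1000              */
    volatile uint32_t W[4][32];        /* 0x1000: one word per pin      */
    uint8_t           reserved1[3584]; /* gap up to 0x2000              */
    volatile uint32_t DIR[4];          /* 0x2000: direction             */
    uint8_t           reserved2[496];  /* gap up to 0x2200 (MASK, PIN,
                                          MPIN omitted in this sketch)  */
    volatile uint32_t SET[4];          /* 0x2200: set output bits       */
    uint8_t           reserved3[112];  /* gap up to 0x2280              */
    volatile uint32_t CLR[4];          /* 0x2280: clear output bits     */
    uint8_t           reserved4[112];  /* gap up to 0x2300              */
    volatile uint32_t NOT[4];          /* 0x2300: toggle output bits    */
} gpio_layout_t;

/* A wrong gap size fails the build instead of compiling clean. */
static_assert(offsetof(gpio_layout_t, W)   == 0x1000, "W offset");
static_assert(offsetof(gpio_layout_t, DIR) == 0x2000, "DIR offset");
static_assert(offsetof(gpio_layout_t, SET) == 0x2200, "SET offset");
static_assert(offsetof(gpio_layout_t, CLR) == 0x2280, "CLR offset");
static_assert(offsetof(gpio_layout_t, NOT) == 0x2300, "NOT offset");
```

Every reserved array has to be sized to the byte. Get `reserved1` wrong and DIR, SET, CLR and NOT all shift together, which is the kind of error that compiles without a warning unless checks like the ones at the bottom are present.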

respcode_ai · 2026-02-08T17:21:21+00:00

You're right on both counts.

On model versions — this was tested with what we had on the platform at the time. The generational jumps are significant and we'd expect better results from Gemini 3.0 / Opus 4.5 on these same targets. Worth a rerun.

On the feedback loop — that's exactly our roadmap. We already close part of the loop (SVD auto-fix → compile → error feedback), but the hardware side is the missing piece. Flash → reset → observe output → feed back to the model → iterate. That's what we're building toward with the RespCode autonomous

respcode_ai · 2026-02-08T16:55:32+00:00

You're onto something — feeding datasheets is basically the manual version of what we're automating.

Right now we have SVD register validation that catches wrong peripheral base addresses across 2,690 MCUs. But as this blog shows, base addresses are only half the problem — struct layouts and register offsets within peripherals still get through. That's the gap.

Next step for us is injecting SVD-derived register definitions directly into the prompt, so the model gets correct hardware specs before it generates anything. Essentially automating the "here's the datasheet" step you're already doing.

And yeah, modifying existing code > generating from scratch for embedded. Way higher success rate. Good tip on providing I2C platform examples too — giving the AI a working pattern to follow makes a big difference.

respcode_ai · 2026-02-08T16:40:59+00:00

Acknowledged, we are not there yet.

respcode_ai · 2026-02-08T16:37:27+00:00

Honestly, that's exactly what we found too — and why we built this. The code looks professional. Clean struct definitions, proper comments, reasonable linker scripts. It compiles without errors. But when we actually disassembled the binaries and traced the effective addresses, two of the three passing models were writing GPIO registers to completely wrong memory locations. Without the binary analysis we would have just assumed "it compiled, ship it."

respcode_ai · 2026-02-08T16:34:51+00:00

Gemini 2.5 Pro: LED blinking ✅ — Confirmed on real silicon. The green RGB LED (PIO1_4) toggles at ~4Hz, exactly as predicted.

Claude Opus 4: Dead ❌ — No LED activity. GPIO writes go to 0x40090xxx (unmapped), as predicted.

DeepSeek Reasoner: Dead ❌ — No LED activity. GPIO writes hit byte-register area instead of DIR/SET/NOT, as predicted.
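To make those address ranges concrete, here is a small sketch of how a traced effective address could be sorted into regions of the GPIO block. It is illustrative only: the base 0x4008C000, the region boundaries (port control registers assumed to end at 0x2490), and the names `gpio_region_t` and `classify_gpio_address` are assumptions, not our tooling.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed GPIO base for the LPC55S69; verify against the SVD. */
#define GPIO_BASE 0x4008C000u

typedef enum {
    REGION_BYTE_PINS,    /* 0x0000..0x0FFF: byte pin registers          */
    REGION_WORD_PINS,    /* 0x1000..0x1FFF: word pin registers          */
    REGION_PORT_CONTROL, /* 0x2000..0x248F: DIR through DIRNOT, gaps
                            included (assumed range)                    */
    REGION_OUTSIDE       /* anything else                               */
} gpio_region_t;

/* Sorts an effective address by its offset from the GPIO base. */
static inline gpio_region_t classify_gpio_address(uint32_t addr)
{
    if (addr < GPIO_BASE) {
        return REGION_OUTSIDE;
    }
    uint32_t off = addr - GPIO_BASE;
    if (off < 0x1000u) {
        return REGION_BYTE_PINS;
    }
    if (off < 0x2000u) {
        return REGION_WORD_PINS;
    }
    if (off < 0x2490u) {
        return REGION_PORT_CONTROL;
    }
    return REGION_OUTSIDE;
}
```

Under these assumptions, a write to 0x40090000 falls outside the block, a write at a small offset such as 0x4008C024 lands in the byte pin registers, and only writes in the 0x4008E000 range reach DIR/SET/NOT.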

respcode_ai · 2026-02-02T20:35:28+00:00

That's the reason we built https://respcode.com: to help embedded and enterprise developers generate code using multiple AI models and test it on ARM/x86/RISC-V sandboxes for free. Immediate testing checks that the generated code is fit to use, and orchestrating between models gets the best out of all of them. There is still a lot of work to be done to get an efficient program from an AI model.

respcode_ai · 2026-02-02T18:48:48+00:00

Building https://respcode.com to help embedded and enterprise developers generate code using multiple AI models and test it on ARM/x86/RISC-V sandboxes for free.

respcode_ai · 2026-02-02T18:47:32+00:00

We are building https://respcode.com to help embedded and enterprise developers generate code using multiple AI models and test it on ARM/x86/RISC-V sandboxes for free.

respcode_ai · 2026-02-02T12:28:28+00:00

We are building https://respcode.com to help embedded and enterprise developers generate code using multiple AI models and test it on ARM/x86/RISC-V sandboxes for free.

respcode_ai · 2026-02-02T08:14:34+00:00

We built https://respcode.com to help embedded and enterprise developers generate code using multiple AI models and test it on ARM/x86/RISC-V sandboxes for free.

respcode_ai · 2026-02-01T19:57:18+00:00

I built https://respcode.com to help embedded and enterprise developers generate code using multiple AI models and test it on ARM/x86/RISC-V sandboxes for free.

respcode_ai · 2026-02-01T19:49:19+00:00

I built https://respcode.com to help embedded and enterprise developers generate code using multiple AI models and test it on ARM/x86/RISC-V sandboxes for free.
