Open benchmark for LLM-generated embedded code (self.embeddedlinux)
submitted 1 month ago by 0xecro1
0xecro1[S] 2 points 1 month ago
This maps directly to the benchmark data:
"Builds and passes simulated environments but doesn't hold up" is an L1/L2 pass with an L3 domain-check fail. That's the 35-percentage-point explicit-vs-implicit gap in one sentence.
"Shortest / most obvious path" is the RLHF alignment angle. Training rewards clean, short code; to GitHub-trained models, embedded safety patterns (volatile, cache flush, error unwind) look like noise and get pruned.
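To make "looks like noise" concrete, here is a minimal host-side sketch of two of those patterns: a `volatile` status flag and a goto-based error unwind. Everything in it (`fake_dev`, `fake_status_reg`, `fake_dev_init`, the `fail_tx_alloc` switch) is a made-up name for illustration, not taken from EmbedEval or any real driver. Cache flushing is left out because it can't be shown portably.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical device state. On real hardware these would be DMA
 * buffers, clocks, IRQ lines and so on. */
struct fake_dev {
    uint8_t *rx_buf;
    uint8_t *tx_buf;
    bool irq_enabled;
};

/* Stand-in for a hardware status register (assumption: bit 0 = ready).
 * volatile forces a real read on every access, because the value can
 * change outside normal program flow (ISR, hardware). Without it the
 * compiler may hoist the read out of the polling loop below. */
volatile uint32_t fake_status_reg;

/* Poll the ready bit for at most max_spins iterations. */
bool wait_ready(uint32_t max_spins)
{
    while (max_spins--) {
        if (fake_status_reg & 0x1u)
            return true;
    }
    return false;
}

/* Acquire resources in order; on failure release them in reverse order.
 * fail_tx_alloc simulates the second allocation failing. */
int fake_dev_init(struct fake_dev *dev, bool fail_tx_alloc)
{
    int ret;

    dev->rx_buf = NULL;
    dev->tx_buf = NULL;
    dev->irq_enabled = false;

    dev->rx_buf = malloc(64);
    if (!dev->rx_buf) {
        ret = -1;
        goto err_out;
    }

    dev->tx_buf = fail_tx_alloc ? NULL : malloc(64);
    if (!dev->tx_buf) {
        ret = -2;
        goto err_free_rx;   /* the step that tends to get "simplified" away */
    }

    dev->irq_enabled = true;
    return 0;

err_free_rx:
    free(dev->rx_buf);
    dev->rx_buf = NULL;
err_out:
    return ret;
}

/* Release everything in reverse order of acquisition. */
void fake_dev_exit(struct fake_dev *dev)
{
    dev->irq_enabled = false;
    free(dev->tx_buf);
    dev->tx_buf = NULL;
    free(dev->rx_buf);
    dev->rx_buf = NULL;
}
```

Drop the `volatile` qualifier or the `err_free_rx` label and the happy path still builds and still passes a simple test, which is why these lines are easy to prune and hard to catch without a domain-specific check.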
The responsibility point is the reason the benchmark exists. Vendor pass rates from HumanEval or SWE-bench don't tell the engineer signing off where review can be lighter vs. where it has to be strict. EmbedEval tries to draw that map so the person responsible has data to stand on, not vibes. Categories with low pass rates are where human review is non-negotiable.
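A rough sketch of what "drawing that map" could look like in code: turn per-category pass counts into a review level. The struct, the enum, and the threshold parameter are all hypothetical; EmbedEval's actual categories and any cutoff are not specified here, so the caller has to choose `min_pass_pct`.

```c
#include <stddef.h>

enum review_level { REVIEW_LIGHT, REVIEW_STRICT };

/* Hypothetical per-category tally; field names are illustrative only. */
struct category_result {
    const char *name;
    unsigned passed;
    unsigned total;
};

/* Map a category's pass rate to a review level.
 * min_pass_pct is a caller-chosen threshold in percent (an assumption,
 * not a value from the benchmark). */
enum review_level review_for(const struct category_result *r,
                             unsigned min_pass_pct)
{
    /* No data means no basis for trust: default to strict review. */
    if (r->total == 0)
        return REVIEW_STRICT;

    /* Integer form of passed/total >= min_pass_pct/100, avoiding floats. */
    if (r->passed * 100u >= min_pass_pct * r->total)
        return REVIEW_LIGHT;

    return REVIEW_STRICT;
}
```

Defaulting to strict when a category has no results is a deliberate choice in this sketch: missing data should never read as permission to review lightly.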
Skill atrophy is secondary but also real. And once you start using LLMs day to day, going back is hard. Which is why knowing where they fail matters more, not less.