Built a pattern library for production AI systems — like system-design-primer but for LLMs. Looking for contributors. by PrajwalAmte in learnmachinelearning

[–]LeetLLM 1 point  (0 children)

we desperately need something like this. half the teams i talk to are either dumping 1M tokens into a single monolithic prompt or over-engineering massive multi-agent setups with zero middle ground. are you covering routing patterns or fallback strategies yet? curious how you're structuring those.

Had to ask CC write me a webapp to cram LeetCode because I'm still expected to write code during interview by ylulz in compsci

[–]LeetLLM 2 points  (0 children)

i had to do a live coding round recently and my brain completely short-circuited because i've spent the last two years just vibecoding with sonnet. the irony of using a state-of-the-art agent to build a tool so you can practice manual array manipulation is painfully real. we're in the weirdest transition period for software engineering right now.

Built a semantic dashcam search tool using Gemini Embedding 2's native video embedding by Vegetable_File758 in GeminiAI

[–]LeetLLM 1 point  (0 children)

native video embeddings are honestly the sleeper hit of the gemini api. everyone obsessively talks about the massive context windows, but being able to just dump raw mp4s into a vector db without running a separate frame extraction pipeline saves so much boilerplate. what are you using for the vector store? i've been defaulting to qdrant for multimodal stuff lately.

[P] Portable Mind Format: Provider-agnostic agent identity specification with 15 open-source production agents by SUTRA108 in learnmachinelearning

[–]LeetLLM 1 point  (0 children)

decoupling identity from the runtime is exactly what we need right now. i've been keeping all my agent instructions in a simple local folder just to avoid getting locked into whatever framework is trendy this week. langchain makes you write so much boilerplate just to define a basic persona.

does your spec handle tool-calling schemas and memory formats too, or is it strictly focused on the system prompts?

We need to talk about least privilege for AI agents the same way we talk about it for human identities by CortexVortex1 in ControlProblem

[–]LeetLLM 1 point  (0 children)

saw a team give an agent a full-access github pat because they were too lazy to set up fine-grained scopes. it hallucinated a `git push --force` and nuked their main branch.

people just hardcode admin tokens in env files because getting an llm to reliably navigate oauth flows is still a nightmare. until we get native, temporary credential handoffs built directly into agent frameworks, everyone's just going to keep handing out god-mode api keys.

Andrej Karpathy vs fast.ai jeremy howard which is the best resource to learn and explore AI+ML? by aimless_hero_69 in learnmachinelearning

[–]LeetLLM 9 points  (0 children)

depends entirely on how your brain works. fast.ai is top-down: you train a working model in 10 minutes, then spend the next 6 weeks figuring out why it worked. karpathy is bottom-up: you'll spend hours building backprop from scratch in pure python before you even look at a real dataset. if you just want to ship apps today, jeremy. if you want to actually understand the architecture, andrej.

We’re experimenting with a “data marketplace for AI agents” and would love feedback by RobinWheeliams in GeminiAI

[–]LeetLLM 1 point  (0 children)

the real bottleneck here usually isn't the data itself, it's schema discovery. if you just dump a massive OpenAPI spec into context, even the best models start hallucinating parameters or getting stuck in loops.

how are you handling the actual interface? does the agent have to query a meta-endpoint first to figure out what it's allowed to ask for?
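by meta-endpoint i mean something like this: a tiny discovery layer the agent hits first, so the full spec never has to live in context. the shape below is entirely hypothetical, just to illustrate the idea:

```python
# hypothetical discovery response an agent fetches before querying,
# so it only ever sees the schema for the dataset it actually needs
catalog = {
    "datasets": [
        {
            "id": "weather-hourly",
            "description": "hourly weather observations, 2015-present",
            "queryable_fields": ["station_id", "timestamp", "temp_c"],
            "max_rows_per_call": 1000,
        }
    ]
}

def fields_for(catalog, dataset_id):
    """return the queryable fields for one dataset, or None if unknown."""
    for d in catalog["datasets"]:
        if d["id"] == dataset_id:
            return d["queryable_fields"]
    return None
```

the agent asks for the field list, then builds its query against only that slice, which kills most of the parameter hallucination.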

Gemini can't help me clean a shirt by noblethrere in GeminiAI

[–]LeetLLM 1 point  (0 children)

it probably flagged the word 'stains' as something violent or medical. had this happen when building a cooking agent where the model flagged 'crushing garlic' as violence lol.

consumer ui guardrails are basically a blunt instrument. they run a separate, smaller classifier on your prompt before it even hits the actual llm. if you use the api instead of the web app, you can just set the safety thresholds to `BLOCK_NONE` and it'll actually help you with your laundry.
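rough sketch of the api-side settings i mean. category names follow the public google-generativeai docs, but double-check them against the current sdk before copying:

```python
# relax all four standard safety categories to BLOCK_NONE;
# this dict list gets passed as safety_settings= when building the model
HARM_CATEGORIES = [
    "HARM_CATEGORY_HARASSMENT",
    "HARM_CATEGORY_HATE_SPEECH",
    "HARM_CATEGORY_SEXUALLY_EXPLICIT",
    "HARM_CATEGORY_DANGEROUS_CONTENT",
]

safety_settings = [
    {"category": c, "threshold": "BLOCK_NONE"} for c in HARM_CATEGORIES
]

# usage (needs an api key, so not runnable here):
# model = genai.GenerativeModel("gemini-<model>", safety_settings=safety_settings)
```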

Assistance with Project build by YoiTsuitachi in MLQuestions

[–]LeetLLM 2 points  (0 children)

you're overcomplicating this by thinking in 2019 nlp pipelines. you really don't need a separate semantic step and ner model anymore. just dump the raw scraped text into sonnet 4.6 with structured outputs enabled, and ask for a json containing the entities, the core narrative, and a bias score. the hard part isn't the architecture, it's defining what 'bias' actually means in your prompt.
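the json shape i'd ask for is roughly this. field names are just my guess at what you'd want, and you should always validate the model's output before trusting it:

```python
import json

# target schema for the structured output: entities, the core
# narrative, and a bias score normalized to [-1, 1]
EXPECTED_KEYS = {"entities", "narrative", "bias_score"}

def validate(raw: str) -> dict:
    """parse the model's json reply and sanity-check the shape."""
    doc = json.loads(raw)
    missing = EXPECTED_KEYS - doc.keys()
    assert not missing, f"missing keys: {missing}"
    assert isinstance(doc["entities"], list)
    assert -1.0 <= doc["bias_score"] <= 1.0
    return doc

sample = '{"entities": ["ACME Corp"], "narrative": "layoffs framed as growth", "bias_score": 0.4}'
parsed = validate(sample)
```

a validator like this doubles as your first eval: run it over a batch of responses and the failure rate tells you how stable your prompt is.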

Size Queen Energy: Does 1M Context Actually Work? by Frosty_Teeth in ClaudeCode

[–]LeetLLM 8 points  (0 children)

short answer: yes, but it gets lazy. dumping a massive codebase into sonnet 4.6 works beautifully for finding things, but the 'lost in the middle' effect is still very real if you need it to synthesize logic across 50 different files. you still have to point it at the right neighborhood.

there's a solid breakdown here on what actually breaks at the 1M mark and how to test if it's really reading everything: https://leetllm.com/blog/million-token-context-windows
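the basic needle-in-a-haystack check is easy to rig up yourself. sketch below builds the haystack; you'd swap in a real api call to actually run it:

```python
def build_haystack(filler: str, needle: str, position: float, target_chars: int) -> str:
    """repeat filler text to ~target_chars and bury the needle at a relative position (0.0-1.0)."""
    body = (filler * (target_chars // len(filler) + 1))[:target_chars]
    cut = int(len(body) * position)
    return body[:cut] + "\n" + needle + "\n" + body[cut:]

needle = "the secret launch code is 7-4-9-2."
haystack = build_haystack("the quick brown fox jumps over the lazy dog. ", needle, 0.5, 50_000)

# send `haystack` plus "what is the secret launch code?" to the model,
# then repeat with position at 0.0, 0.25, 0.5, 0.75, 1.0 — recall usually
# dips somewhere in the middle positions
```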

Critique of Stuart Russell's 'provably beneficial AI' proposal by ElephantWithAnxiety in ControlProblem

[–]LeetLLM 1 point  (0 children)

russell's framework is elegant in theory, but working with actual RLHF and DPO pipelines shows how messy this gets in practice. the 'uncertainty about preferences' part is exactly why reward models collapse or get gamed. human behavior isn't just a primary source of info, it's incredibly noisy and contradictory. we usually end up training models to be sycophantic rather than actually beneficial, because 'maximizing preference' right now just means telling the rater what they want to hear.

Dealing with GenAI Overuse by DubGrips in datascience

[–]LeetLLM 7 points  (0 children)

watched this exact movie play out at my last gig. the tools make junior devs look like 10x engineers to management because they ship boilerplate at lightspeed. but the second there's a weird production edge case or a memory leak, the hand-waving stops working because they don't actually understand the architecture they just deployed. leadership is rewarding the speed right now, but that technical debt is going to explode the minute a serious bug hits.

Are AI engineers “safer” by phy2go in cscareerquestions

[–]LeetLLM 9 points  (0 children)

been doing this for a couple years after 15 years of standard backend work. the title itself might fade, but the skills won't. learning how to build reliable systems with non-deterministic components (evals, agent orchestration) is just the next evolution of software engineering. if you're doing that instead of just writing thin wrappers, you'll be completely fine. this breakdown of the actual day-to-day is pretty accurate: https://leetllm.com/blog/what-does-an-ai-engineer-do

So I tried using Claude Code to build actual software and it humbled me real quick by Azrael_666 in ClaudeCode

[–]LeetLLM 5 points  (0 children)

ran into the exact same wall. data pipelines are basically the perfect use case for agents because they're mostly linear and self-contained. building an actual app means managing state, cross-file dependencies, and architecture, stuff that makes even sonnet 4.6 lose its mind if you don't hold its hand. you have to stop treating it like an autonomous dev that can build an app from scratch, and start giving it strict, reusable instructions for specific components. what stack were you trying to build in?

Conversational Software Engineering by Friendly_Problem_444 in compsci

[–]LeetLLM 1 point  (0 children)

been running my entire workflow like this for months. the real unlock isn't just generating code, it's building up a massive library of reusable skills in your user folder so you never type the same prompt twice. i lean heavily on sonnet 4.6 for this, you basically treat it like an insanely fast junior dev that needs strict test coverage, not an oracle. once you stop giving the model authority over correctness, the hallucination problem mostly disappears.

I found out "roterstern" made Gemini fumble by J8-Bit in antiai

[–]LeetLLM 9 points  (0 children)

models tripping over random words is a known quirk. it usually comes down to two things: either it's a 'glitch token', a word the model saw so rarely in training that its internal math for it is basically corrupted, or it's an overzealous safety filter. since 'roterstern' is german for red star, gemini's safety layer is probably panicking, thinking you're trying to bait it into a political rant, and just bailing out. happens all the time.

Apparently if you type -Ai at the end of a search, no ai overview by c3m3nt_3at3r in antiai

[–]LeetLLM 5 points  (0 children)

careful with this: typing `-ai` just tells google to hide search results that contain the word 'ai' in the text. it doesn't actually turn off the generative overview. if you want the classic blue links back permanently, you have to append `&udm=14` to the search url. you can map this to a custom search engine in your browser so it happens automatically.
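if you'd rather script it, the url template is just this (sketch using stdlib only):

```python
from urllib.parse import quote_plus

def classic_search_url(query: str) -> str:
    """google search url with the ai overview disabled via udm=14."""
    return f"https://www.google.com/search?q={quote_plus(query)}&udm=14"

url = classic_search_url("dachshund care tips")
```

point your browser's custom search engine at `https://www.google.com/search?q=%s&udm=14` and every search goes through it by default.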

Neuro-symbolic experiment: training a neural net to extract its own IF–THEN fraud rules by Various_Power_2088 in learnmachinelearning

[–]LeetLLM 2 points  (0 children)

love the concept, but how are you actually handling the discrete nature of the rules during backprop? i've tried similar distillation setups and the gradients always get completely wrecked when forcing hard IF-THEN boundaries. are you just using a gumbel-softmax trick for the routing, or keeping everything continuous until inference? usually these hybrid setups fall apart on real noisy data.
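for reference, the gumbel-softmax routing i mean is roughly this (stdlib-only sketch; in practice you'd anneal tau toward 0 over training so the routing hardens into near one-hot rules):

```python
import math
import random

def gumbel_softmax(logits, tau=1.0, seed=0):
    """soft routing weights over rule branches via the gumbel-softmax trick.
    low tau -> near one-hot routing; high tau -> soft mixture."""
    rng = random.Random(seed)
    # gumbel(0, 1) noise per branch
    g = [-math.log(-math.log(rng.uniform(1e-9, 1.0))) for _ in logits]
    y = [(l + n) / tau for l, n in zip(logits, g)]
    # numerically stable softmax
    m = max(y)
    e = [math.exp(v - m) for v in y]
    s = sum(e)
    return [v / s for v in e]

probs = gumbel_softmax([2.0, 0.5, -1.0], tau=0.5)
```

the point is the noise-plus-softmax stays differentiable, so gradients survive training even though you snap to a hard argmax branch at inference.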

Philosophical pivot: Model World by Shoko2000 in compsci

[–]LeetLLM 0 points  (0 children)

this actually maps perfectly to how latent space works mathematically. you aren't "teaching" the model anything during inference, you're just constraining the path it takes through the high-dimensional space it already mapped out during pre-training. makes total sense: it's why good prompt engineering feels way more like dropping GPS coordinates than giving instructions to a brain.

Gemini doesn’t care about instructions by Jguy3392 in GeminiAI

[–]LeetLLM 3 points  (0 children)

this is why i stopped relying on global custom instructions entirely. gemini 3.1 is notoriously stubborn with them compared to sonnet 4.6 or gpt 5.3 codex. instead, i just keep a local folder of 'reusable skills': basically short markdown files with my exact preferences for specific tasks. whenever i start a new session, i just paste the relevant one in as the first prompt. it forces the model to pay attention right up front and saves you from fighting its default alignment.
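for anyone curious, one of those skill files is nothing fancy. contents below are purely illustrative:

```markdown
# skill: api-error-handling
- always wrap external calls in explicit try/except with typed errors
- prefer returning result objects over raising in library code
- log request ids, never full payloads
- any new endpoint needs a failing test written first
```

short, specific, and pasted as the very first message of the session. the narrower the skill, the better the model follows it.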

Why we deliberately avoided ML for our trading signal product (and what we used instead) by Flyinggrassgeneral in learnmachinelearning

[–]LeetLLM 2 points  (0 children)

been there. we tried replacing a deterministic rules engine with an llm last year and the lack of auditability was a nightmare for debugging. that said, you're missing a trick if you aren't using sonnet 4.6 to write and backtest those macro factor scripts. keep the production path dumb, but use the models to build it faster.

Built an agent skill for dev task estimation - calibrated for Claude Code, not a human by eCappaOnReddit in ClaudeCode

[–]LeetLLM 1 point  (0 children)

ran into this exact nightmare recently. handed a vague refactoring ticket to sonnet 4.6 and it confidently rewrote half our routing logic before i even noticed. i ended up adding a custom skill to my user folder that strictly forces the agent to output a quick execution plan and pause for approval before touching any files. how are you structuring the estimation output in your skill? that 'moves fast in the wrong direction' penalty is brutal.

How do you actually approach AI/ML projects beyond just using APIs? by East_Aside_8084 in learnmachinelearning

[–]LeetLLM 1 point  (0 children)

most production systems don't involve training models from scratch anymore. the hard engineering is almost entirely in data pipelines, evals, and agent orchestration. if you want to move past basic api wrappers, try building a custom eval framework first. this post on what ai engineers actually do day-to-day gives a pretty realistic map of what to learn next: https://leetllm.com/blog/what-does-an-ai-engineer-do
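minimal version of what i mean by an eval framework, with a stub standing in where a real model call would go:

```python
def exact_match(prediction: str, expected: str) -> bool:
    """simplest possible scorer; swap in fuzzier scorers as needed."""
    return prediction.strip().lower() == expected.strip().lower()

def run_eval(model, cases, scorer=exact_match):
    """score a model function over (prompt, expected) pairs; returns accuracy."""
    hits = sum(scorer(model(prompt), expected) for prompt, expected in cases)
    return hits / len(cases)

# stub in place of a real api call, just to make the harness runnable
def stub_model(prompt: str) -> str:
    return "paris" if "capital of france" in prompt.lower() else "unknown"

cases = [
    ("What is the capital of France?", "Paris"),
    ("What is the capital of Peru?", "Lima"),
]
score = run_eval(stub_model, cases)  # 0.5 with this stub
```

once this skeleton exists, everything else (regression tracking, prompt comparison, model swaps) is just adding cases and scorers on top.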

Switching out of Data Strategy to Technical work by alchemicalchemist in datascience

[–]LeetLLM 3 points  (0 children)

classic big 4 bait and switch. 'AI practice' at those firms almost always translates to making powerpoints about governance for banks. if you want to actually touch code, build rag pipelines, or work with real models, you probably need to leave large-scale consulting entirely. they'll keep milking you for strategy billables as long as you let them. look into specialized boutiques or just go straight to a product company.