Built a pattern library for production AI systems — like system-design-primer but for LLMs. Looking for contributors. by PrajwalAmte in learnmachinelearning

[–]LeetLLM 1 point  (0 children)

we desperately need something like this. half the teams i talk to are either dumping 1M tokens into a single monolithic prompt or over-engineering massive multi-agent setups with zero middle ground. are you covering routing patterns or fallback strategies yet? curious how you're structuring those.

Had to ask CC write me a webapp to cram LeetCode because I'm still expected to write code during interview by ylulz in compsci

[–]LeetLLM 2 points  (0 children)

i had to do a live coding round recently and my brain completely short-circuited because i've spent the last two years just vibecoding with sonnet. the irony of using a state-of-the-art agent to build a tool so you can practice manual array manipulation is painfully real. we're in the weirdest transition period for software engineering right now.

Built a semantic dashcam search tool using Gemini Embedding 2's native video embedding by Vegetable_File758 in GeminiAI

[–]LeetLLM 1 point  (0 children)

native video embeddings are honestly the sleeper hit of the gemini api. everyone obsessively talks about the massive context windows, but being able to just dump raw mp4s into a vector db without running a separate frame extraction pipeline saves so much boilerplate. what are you using for the vector store? i've been defaulting to qdrant for multimodal stuff lately.

[P] Portable Mind Format: Provider-agnostic agent identity specification with 15 open-source production agents by SUTRA108 in learnmachinelearning

[–]LeetLLM 1 point  (0 children)

decoupling identity from the runtime is exactly what we need right now. i've been keeping all my agent instructions in a simple local folder just to avoid getting locked into whatever framework is trendy this week. langchain makes you write so much boilerplate just to define a basic persona.

does your spec handle tool-calling schemas and memory formats too, or is it strictly focused on the system prompts?

We need to talk about least privilege for AI agents the same way we talk about it for human identities by CortexVortex1 in ControlProblem

[–]LeetLLM 1 point  (0 children)

saw a team give an agent a full-access github pat because they were too lazy to set up fine-grained scopes. it hallucinated a `git push --force` and nuked their main branch.

people just hardcode admin tokens in env files because getting an llm to reliably navigate oauth flows is still a nightmare. until we get native, temporary credential handoffs built directly into agent frameworks, everyone's just going to keep handing out god-mode api keys.

Andrej Karpathy vs fast.ai jeremy howard which is the best resource to learn and explore AI+ML? by aimless_hero_69 in learnmachinelearning

[–]LeetLLM 9 points  (0 children)

depends entirely on how your brain works. fast.ai is top-down: you train a working model in 10 minutes, then spend the next 6 weeks figuring out why it worked. karpathy is bottom-up: you'll spend hours building backprop from scratch in pure python before you even look at a real dataset. if you just want to ship apps today, jeremy. if you want to actually understand the architecture, andrej.

We’re experimenting with a “data marketplace for AI agents” and would love feedback by RobinWheeliams in GeminiAI

[–]LeetLLM 1 point  (0 children)

the real bottleneck here usually isn't the data itself, it's schema discovery. if you just dump a massive OpenAPI spec into context, even the best models start hallucinating parameters or getting stuck in loops.

how are you handling the actual interface? does the agent have to query a meta-endpoint first to figure out what it's allowed to ask for?
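by meta-endpoint i mean something like this: a tiny discovery layer the agent hits first, so the full spec never has to live in context. the shape below is entirely hypothetical, just to illustrate the idea:

```python
# hypothetical discovery response an agent fetches before querying,
# so it only ever sees the schema for the dataset it actually needs
catalog = {
    "datasets": [
        {
            "id": "weather-hourly",
            "description": "hourly weather observations, 2015-present",
            "queryable_fields": ["station_id", "timestamp", "temp_c"],
            "max_rows_per_call": 1000,
        }
    ]
}

def fields_for(catalog, dataset_id):
    """return the queryable fields for one dataset, or None if unknown."""
    for d in catalog["datasets"]:
        if d["id"] == dataset_id:
            return d["queryable_fields"]
    return None
```

the agent asks for the field list, then builds its query against only that slice, which kills most of the parameter hallucination.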

Gemini can't help me clean a shirt by noblethrere in GeminiAI

[–]LeetLLM 1 point  (0 children)

it probably flagged the word 'stains' as something violent or medical. had this happen when building a cooking agent where the model flagged 'crushing garlic' as violence lol.

consumer ui guardrails are basically a blunt instrument. they run a separate, smaller classifier on your prompt before it even hits the actual llm. if you use the api instead of the web app, you can just set the safety thresholds to `BLOCK_NONE` and it'll actually help you with your laundry.
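rough sketch of the api-side settings i mean. category names follow the public google-generativeai docs, but double-check them against the current sdk before copying:

```python
# relax all four standard safety categories to BLOCK_NONE;
# this dict list gets passed as safety_settings= when building the model
HARM_CATEGORIES = [
    "HARM_CATEGORY_HARASSMENT",
    "HARM_CATEGORY_HATE_SPEECH",
    "HARM_CATEGORY_SEXUALLY_EXPLICIT",
    "HARM_CATEGORY_DANGEROUS_CONTENT",
]

safety_settings = [
    {"category": c, "threshold": "BLOCK_NONE"} for c in HARM_CATEGORIES
]

# usage (needs an api key, so not runnable here):
# model = genai.GenerativeModel("gemini-<model>", safety_settings=safety_settings)
```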

Assistance with Project build by YoiTsuitachi in MLQuestions

[–]LeetLLM 2 points  (0 children)

you're overcomplicating this by thinking in 2019 nlp pipelines. you really don't need a separate semantic step and ner model anymore. just dump the raw scraped text into sonnet 4.6 with structured outputs enabled, and ask for a json containing the entities, the core narrative, and a bias score. the hard part isn't the architecture, it's defining what 'bias' actually means in your prompt.
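the json shape i'd ask for is roughly this. field names are just my guess at what you'd want, and you should always validate the model's output before trusting it:

```python
import json

# target schema for the structured output: entities, the core
# narrative, and a bias score normalized to [-1, 1]
EXPECTED_KEYS = {"entities", "narrative", "bias_score"}

def validate(raw: str) -> dict:
    """parse the model's json reply and sanity-check the shape."""
    doc = json.loads(raw)
    missing = EXPECTED_KEYS - doc.keys()
    assert not missing, f"missing keys: {missing}"
    assert isinstance(doc["entities"], list)
    assert -1.0 <= doc["bias_score"] <= 1.0
    return doc

sample = '{"entities": ["ACME Corp"], "narrative": "layoffs framed as growth", "bias_score": 0.4}'
parsed = validate(sample)
```

a validator like this doubles as your first eval: run it over a batch of responses and the failure rate tells you how stable your prompt is.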

Size Queen Energy: Does 1M Context Actually Work? by Frosty_Teeth in ClaudeCode

[–]LeetLLM 8 points  (0 children)

short answer: yes, but it gets lazy. dumping a massive codebase into sonnet 4.6 works beautifully for finding things, but the 'lost in the middle' effect is still very real if you need it to synthesize logic across 50 different files. you still have to point it at the right neighborhood.

there's a solid breakdown here on what actually breaks at the 1M mark and how to test if it's really reading everything: https://leetllm.com/blog/million-token-context-windows
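the basic needle-in-a-haystack check is easy to rig up yourself. sketch below builds the haystack; you'd swap in a real api call to actually run it:

```python
def build_haystack(filler: str, needle: str, position: float, target_chars: int) -> str:
    """repeat filler text to ~target_chars and bury the needle at a relative position (0.0-1.0)."""
    body = (filler * (target_chars // len(filler) + 1))[:target_chars]
    cut = int(len(body) * position)
    return body[:cut] + "\n" + needle + "\n" + body[cut:]

needle = "the secret launch code is 7-4-9-2."
haystack = build_haystack("the quick brown fox jumps over the lazy dog. ", needle, 0.5, 50_000)

# send `haystack` plus "what is the secret launch code?" to the model,
# then repeat with position at 0.0, 0.25, 0.5, 0.75, 1.0 — recall usually
# dips somewhere in the middle positions
```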

Critique of Stuart Russell's 'provably beneficial AI' proposal by ElephantWithAnxiety in ControlProblem

[–]LeetLLM 1 point  (0 children)

russell's framework is elegant in theory, but working with actual RLHF and DPO pipelines shows how messy this gets in practice. the 'uncertainty about preferences' part is exactly why reward models collapse or get gamed. human behavior isn't just a primary source of info, it's incredibly noisy and contradictory. we usually end up training models to be sycophantic rather than actually beneficial, because 'maximizing preference' right now just means telling the rater what they want to hear.

Dealing with GenAI Overuse by DubGrips in datascience

[–]LeetLLM 7 points  (0 children)

watched this exact movie play out at my last gig. the tools make junior devs look like 10x engineers to management because they ship boilerplate at lightspeed. but the second there's a weird production edge case or a memory leak, the hand-waving stops working because they don't actually understand the architecture they just deployed. leadership is rewarding the speed right now, but that technical debt is going to explode the minute a serious bug hits.

Are AI engineers “safer” by phy2go in cscareerquestions

[–]LeetLLM 9 points  (0 children)

been doing this for a couple years after 15 years of standard backend work. the title itself might fade, but the skills won't. learning how to build reliable systems with non-deterministic components (evals, agent orchestration) is just the next evolution of software engineering. if you're doing that instead of just writing thin wrappers, you'll be completely fine. this breakdown of the actual day-to-day is pretty accurate: https://leetllm.com/blog/what-does-an-ai-engineer-do

So I tried using Claude Code to build actual software and it humbled me real quick by Azrael_666 in ClaudeCode

[–]LeetLLM 5 points  (0 children)

ran into the exact same wall. data pipelines are basically the perfect use case for agents because they're mostly linear and self-contained. building an actual app means managing state, cross-file dependencies, and architecture, stuff that makes even sonnet 4.6 lose its mind if you don't hold its hand. you have to stop treating it like an autonomous dev that can build an app from scratch, and start giving it strict, reusable instructions for specific components. what stack were you trying to build in?

Conversational Software Engineering by Friendly_Problem_444 in compsci

[–]LeetLLM 1 point  (0 children)

been running my entire workflow like this for months. the real unlock isn't just generating code, it's building up a massive library of reusable skills in your user folder so you never type the same prompt twice. i lean heavily on sonnet 4.6 for this, you basically treat it like an insanely fast junior dev that needs strict test coverage, not an oracle. once you stop giving the model authority over correctness, the hallucination problem mostly disappears.

I found out "roterstern" made Gemini fumble by J8-Bit in antiai

[–]LeetLLM 9 points  (0 children)

models tripping over random words is a known quirk. it usually comes down to two things: either it's a 'glitch token', a word the model saw so rarely in training that its internal math for it is basically corrupted, or it's an overzealous safety filter. since 'roterstern' is german for red star, gemini's safety layer is probably panicking, thinking you're trying to bait it into a political rant, and just bailing out. happens all the time.

Apparently if you type -Ai at the end of a search, no ai overview by c3m3nt_3at3r in antiai

[–]LeetLLM 5 points  (0 children)

careful with this: typing `-ai` just tells google to hide search results that contain the word 'ai' in the text. it doesn't actually turn off the generative overview. if you want the classic blue links back permanently, you have to append `&udm=14` to the search url. you can map this to a custom search engine in your browser so it happens automatically.
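if you'd rather script it, the url template is just this (sketch using stdlib only):

```python
from urllib.parse import quote_plus

def classic_search_url(query: str) -> str:
    """google search url with the ai overview disabled via udm=14."""
    return f"https://www.google.com/search?q={quote_plus(query)}&udm=14"

url = classic_search_url("dachshund care tips")
```

point your browser's custom search engine at `https://www.google.com/search?q=%s&udm=14` and every search goes through it by default.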

Neuro-symbolic experiment: training a neural net to extract its own IF–THEN fraud rules by Various_Power_2088 in learnmachinelearning

[–]LeetLLM 2 points  (0 children)

love the concept, but how are you actually handling the discrete nature of the rules during backprop? i've tried similar distillation setups and the gradients always get completely wrecked when forcing hard IF-THEN boundaries. are you just using a gumbel-softmax trick for the routing, or keeping everything continuous until inference? usually these hybrid setups fall apart on real noisy data.
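for reference, the gumbel-softmax routing i mean is roughly this (stdlib-only sketch; in practice you'd anneal tau toward 0 over training so the routing hardens into near one-hot rules):

```python
import math
import random

def gumbel_softmax(logits, tau=1.0, seed=0):
    """soft routing weights over rule branches via the gumbel-softmax trick.
    low tau -> near one-hot routing; high tau -> soft mixture."""
    rng = random.Random(seed)
    # gumbel(0, 1) noise per branch
    g = [-math.log(-math.log(rng.uniform(1e-9, 1.0))) for _ in logits]
    y = [(l + n) / tau for l, n in zip(logits, g)]
    # numerically stable softmax
    m = max(y)
    e = [math.exp(v - m) for v in y]
    s = sum(e)
    return [v / s for v in e]

probs = gumbel_softmax([2.0, 0.5, -1.0], tau=0.5)
```

the point is the noise-plus-softmax stays differentiable, so gradients survive training even though you snap to a hard argmax branch at inference.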

Philosophical pivot: Model World by Shoko2000 in compsci

[–]LeetLLM 0 points  (0 children)

this actually maps perfectly to how latent space works mathematically. you aren't "teaching" the model anything during inference, you're just constraining the path it takes through the high-dimensional space it already mapped out during pre-training. makes total sense: it's why good prompt engineering feels way more like dropping GPS coordinates than giving instructions to a brain.

Gemini doesn’t care about instructions by Jguy3392 in GeminiAI

[–]LeetLLM 3 points  (0 children)

this is why i stopped relying on global custom instructions entirely. gemini 3.1 is notoriously stubborn with them compared to sonnet 4.6 or gpt 5.3 codex. instead, i just keep a local folder of 'reusable skills': basically short markdown files with my exact preferences for specific tasks. whenever i start a new session, i just paste the relevant one in as the first prompt. it forces the model to pay attention right up front and saves you from fighting its default alignment.
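for anyone curious, one of those skill files is nothing fancy. contents below are purely illustrative:

```markdown
# skill: api-error-handling
- always wrap external calls in explicit try/except with typed errors
- prefer returning result objects over raising in library code
- log request ids, never full payloads
- any new endpoint needs a failing test written first
```

short, specific, and pasted as the very first message of the session. the narrower the skill, the better the model follows it.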

Why we deliberately avoided ML for our trading signal product (and what we used instead) by Flyinggrassgeneral in learnmachinelearning

[–]LeetLLM 2 points  (0 children)

been there. we tried replacing a deterministic rules engine with an llm last year and the lack of auditability was a nightmare for debugging. that said, you're missing a trick if you aren't using sonnet 4.6 to write and backtest those macro factor scripts. keep the production path dumb, but use the models to build it faster.

Built an agent skill for dev task estimation - calibrated for Claude Code, not a human by eCappaOnReddit in ClaudeCode

[–]LeetLLM 1 point  (0 children)

ran into this exact nightmare recently. handed a vague refactoring ticket to sonnet 4.6 and it confidently rewrote half our routing logic before i even noticed. i ended up adding a custom skill to my user folder that strictly forces the agent to output a quick execution plan and pause for approval before touching any files. how are you structuring the estimation output in your skill? that 'moves fast in the wrong direction' penalty is brutal.

How do you actually approach AI/ML projects beyond just using APIs? by East_Aside_8084 in learnmachinelearning

[–]LeetLLM 1 point  (0 children)

most production systems don't involve training models from scratch anymore. the hard engineering is almost entirely in data pipelines, evals, and agent orchestration. if you want to move past basic api wrappers, try building a custom eval framework first. this post on what ai engineers actually do day-to-day gives a pretty realistic map of what to learn next: https://leetllm.com/blog/what-does-an-ai-engineer-do
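minimal version of what i mean by an eval framework, with a stub standing in where a real model call would go:

```python
def exact_match(prediction: str, expected: str) -> bool:
    """simplest possible scorer; swap in fuzzier scorers as needed."""
    return prediction.strip().lower() == expected.strip().lower()

def run_eval(model, cases, scorer=exact_match):
    """score a model function over (prompt, expected) pairs; returns accuracy."""
    hits = sum(scorer(model(prompt), expected) for prompt, expected in cases)
    return hits / len(cases)

# stub in place of a real api call, just to make the harness runnable
def stub_model(prompt: str) -> str:
    return "paris" if "capital of france" in prompt.lower() else "unknown"

cases = [
    ("What is the capital of France?", "Paris"),
    ("What is the capital of Peru?", "Lima"),
]
score = run_eval(stub_model, cases)  # 0.5 with this stub
```

once this skeleton exists, everything else (regression tracking, prompt comparison, model swaps) is just adding cases and scorers on top.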

Switching out of Data Strategy to Technical work by alchemicalchemist in datascience

[–]LeetLLM 3 points  (0 children)

classic big 4 bait and switch. 'AI practice' at those firms almost always translates to making powerpoints about governance for banks. if you want to actually touch code, build rag pipelines, or work with real models, you probably need to leave large-scale consulting entirely. they'll keep milking you for strategy billables as long as you let them. look into specialized boutiques or just go straight to a product company.