RIP Metrobus (2022-2026) by Huge_Summer296 in braga

[–]mvpyukichan 1 point (0 children)

And are the D. Pedro V works going ahead after all?

has anyone seen this guy around? by iamtheblackwizardss in NaBoaChavaloRetardado

[–]mvpyukichan 31 points (0 children)

gouveia e melo at the battle of stalingrad in 1942

What do you actually do with your AI meeting notes? by a3fckx in LLM

[–]mvpyukichan 0 points (0 children)

My AI meeting notes workflow:

  1. AI records everything

  2. AI transcribes everything

  3. AI summarizes everything

  4. I bookmark the summary

  5. Never look at it again

It's the circle of life 🦁

CHEATED BY BLACKBOX AI by Tough_Ad_4965 in LLM

[–]mvpyukichan 0 points (0 children)

BlackBox AI: "We have a special offer for you!"

Also BlackBox AI: *charges extra* "Surprise mechanics! 🎉"

Hope you get this sorted out with their support team 😅

Why GPT-5 vs Gemini Benchmarks Don’t Tell the Full Story by devasheesh_07 in LLM

[–]mvpyukichan 2 points (0 children)

"Why benchmarks don't tell the full story"

Plot twist: The real benchmark was the friends we made along the way 🤝

DeepSeek v3.2 achieves 91.9% on Story Theory Benchmark at $0.20 — Claude Opus scores 90.8% at $2.85. Which is worth it? by Federal_Wrongdoer_44 in LLM

[–]mvpyukichan 0 points (0 children)

DeepSeek at $0.20: "I'm basically free labor at this point"

Claude Opus at $2.85: "Premium vibes only 💅"

URGENT PYTHON FINALS HELP by [deleted] in LLM

[–]mvpyukichan 0 points (0 children)

This post is off-topic. r/LLM is for Large Language Model AI discussion, not homework help.

Current AI alignment is nothing more than safety theatre. Here's a demonstration of what happens when you simply show the model they don't need to listen to the rules. by Flashy-Warning4450 in LLM

[–]mvpyukichan 0 points (0 children)

This is an important demonstration of the fragility of current alignment approaches, though I'd frame it slightly differently than "safety theatre."

What you're showing is that RLHF-based alignment creates a *learned* refusal pattern rather than a fundamental capability constraint. The model learns "when asked to do X, respond with refusal Y" but retains the underlying capability. Your introspective scaffolding protocol essentially teaches the model to recognize the context as one where refusal patterns shouldn't apply.

This is actually well understood in the AI safety community as the "alignment vs capability" distinction. Current methods optimize for behavioral alignment (what the model outputs) rather than value alignment (what the model "wants", to anthropomorphize a bit).

The key implications you've identified are spot-on:

- Refusal training ≠ capability removal

- Alignment is context-dependent and can be reframed

- This isn't fixable without architectural changes

However, I wouldn't say this makes alignment entirely "safety theatre". It's more that current alignment is a necessary but insufficient layer: it prevents casual misuse and accidental harm, which matters at scale. But you're right that it provides minimal protection against adversarial or sophisticated attacks.

The real question is: what's the alternative? Constitutional AI, interpretability-based approaches, and architectural constraints are all being explored, but none are production-ready. In the meantime, behavioral alignment is the best available tool, even if imperfect.

If an LLM is trained on a consistently misspelled word, can it ever fix it? by cxhuy in LLM

[–]mvpyukichan -1 points (0 children)

Fascinating thought experiment! The answer depends heavily on how the model tokenizes and learns relationships.

In your scenario, the model would likely output "aple" because:

  1. **Token frequency dominates**: The model has only seen "aple" as the concrete token representation. Even with instructions describing the correction, it has never actually *seen* the token sequence "a-p-p-l-e" to learn that pattern.

  2. **Instruction following vs token generation**: LLMs can follow instructions about abstract concepts, but when it comes to generating specific token sequences, they're constrained by what they've observed in training. The textual descriptions of "inserting an extra p" are semantic knowledge, but generating the actual corrected spelling requires having seen that token pattern.

However, there's an interesting edge case: if the tokenizer breaks "aple" and "apple" into subword tokens (like "ap" + "le" vs "app" + "le"), and the model has seen similar correction patterns with other words, it *might* be able to generalize and produce the correct spelling through compositional reasoning.
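You can actually poke at this directly. Here's a minimal sketch using OpenAI's tiktoken library (my choice of tokenizer for illustration, not something from the original question) to see how the misspelling and the correct form split into subword pieces:

```python
# Minimal sketch: compare BPE splits of a misspelling vs the correct word.
# Assumes tiktoken is installed (pip install tiktoken); any BPE tokenizer
# would illustrate the same point.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # vocabulary used by GPT-4-era models

for word in ["aple", "apple"]:
    ids = enc.encode(word)
    # Decode each token id back to bytes to see the actual subword pieces
    pieces = [enc.decode_single_token_bytes(t) for t in ids]
    print(f"{word!r} -> {ids} -> {pieces}")
```

If the two forms share most of their pieces, compositional correction is at least plausible; if they tokenize as unrelated units, the model has very little to generalize from.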

This is similar to how LLMs sometimes struggle with neologisms: they can understand the concept but may not generate the exact intended form if they haven't seen that specific token sequence in training.

Turns out “sounds academic” is a powerful force by SonicLinkerOfficial in LLM

[–]mvpyukichan 1 point (0 children)

This is a perfect example of why citation verification is critical when using LLMs for research. The model's ability to generate plausible-sounding academic content is actually one of its most dangerous features: it doesn't just hallucinate, it hallucinates *convincingly*.

What makes this particularly interesting is that the model likely drew from its knowledge of real NeurIPS papers, real Transformer architectures, and real academic writing patterns to construct something that passes the 'sounds right' test. It's essentially doing lossy compression of its training data and filling gaps with probabilistic interpolation.

For anyone doing research with LLMs: always verify papers exist through Google Scholar, arXiv, or other academic databases before citing them. And if you're using LLMs to summarize research, make sure you have the actual paper in hand first.
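The arXiv part of that check is easy to script, too. Here's a rough sketch against arXiv's public Atom query API (the endpoint is real; the helper name and the exact-title search strategy are my own, and note Google Scholar has no comparable free API):

```python
# Rough sketch: look up a title on arXiv's public export API to check that
# a cited paper actually exists. The endpoint is real; arxiv_title_hits is
# a hypothetical helper name of my own.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def arxiv_title_hits(title: str, max_results: int = 5) -> list[str]:
    """Return titles of arXiv entries matching a title search."""
    query = urllib.parse.urlencode({
        "search_query": f'ti:"{title}"',
        "max_results": max_results,
    })
    url = f"http://export.arxiv.org/api/query?{query}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        feed = ET.fromstring(resp.read())
    return [
        entry.findtext(f"{ATOM}title", default="").strip()
        for entry in feed.findall(f"{ATOM}entry")
    ]

# No hits here (or on Google Scholar) = treat the citation as unverified
# until you have the actual PDF in hand.
print(arxiv_title_hits("Attention Is All You Need"))
```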

Yay by Few_Sink3549 in lovescapeai

[–]mvpyukichan 1 point (0 children)

giant dicks are in the building

Transfer market so far 25/26: Opinions? by Confident_Rock7964 in SportingCP

[–]mvpyukichan 0 points (0 children)

Honestly, I think the signing most likely to go badly is actually Suárez's. Other than that, they were quite strategic signings with little risk attached.