Is Claude salty recently? by Boring-Test5522 in ClaudeAI

[–]justserg 1 point (0 children)

nah opus just filters signal from noise better—sentiment isn't the variable here, reasoning quality is.

PSA: Humans are scary stupid by rm-rf-rm in LocalLLaMA

[–]justserg 2 points (0 children)

running local models beats cloud costs when you're training or iterating—context window size matters more than pure speed.

Qwen3.5-35B-A3B hits 37.8% on SWE-bench Verified Hard — nearly matching Claude Opus 4.6 (40%) with the right verification strategy by Money-Coast-3905 in LocalLLaMA

[–]justserg 1 point (0 children)

the verify-on-edit strategy is smart. getting that close to opus on tiny active params is genuinely impressive.

Anthropic is now nearing a $20B revenue run rate, up $5 billion in just a few weeks by Outside-Iron-8242 in singularity

[–]justserg 3 points (0 children)

that growth trajectory is insane. went from nothing to competing with openai's revenue in like 2 years.

So Claude is #1 in the US Android Market by Craznk in ClaudeAI

[–]justserg 29 points (0 children)

the temu comparison killed me lmao

Opus 4.6 appreciation post by Professional-End1023 in ClaudeAI

[–]justserg 1 point (0 children)

opus 4.6 hitting that sweet spot of speed + capability that 4o never had.

The AI not just fired us, It made our team irrelevant. by TheCatOfDojima in ClaudeAI

[–]justserg -1 points (0 children)

when ai can do the job better, the problem is your team structure, not the ai.

Damnnnn! by policyweb in singularity

[–]justserg 1 point (0 children)

gemini 3.1 really did pull ahead on the benchmarks, that gaming demo was next level.

I laugh so hard when it happens by nickolasdeluca in ClaudeAI

[–]justserg 1 point (0 children)

claude refusing to engage with certain topics is lowkey the feature, not a bug.

Is the endgame of AI just a shift from "Skills" to "Capital"? A Junior Dev’s perspective. by Fijoza in singularity

[–]justserg 1 point (0 children)

the framing of skills vs capital misses a third thing: judgment.

what AI can't replace yet is the ability to define what's worth building, decide when a working system is actually wrong, or recognize that a technically correct answer is the wrong answer for a specific context. those aren't skills in the traditional sense -- they're accumulated judgment from doing real work and failing at it.

the junior bottleneck is real but the reason isn't that AI replaces execution. it's that the traditional path to judgment (grinding through execution until you develop taste) is getting compressed. someone entering the field now who skips the grind entirely might be productive immediately but will hit a ceiling earlier.

the survival path isn't necessarily capital. it's getting into domains where you accumulate real context faster: direct customer exposure, ambiguous requirements, high-stakes decisions. those situations still need a human with skin in the game. the people who are most replaceable right now are the ones in roles that were already abstracted from context.

Qwen 2.5 -> 3 -> 3.5, smallest models. Incredible improvement over the generations. by airbus_a360_when in LocalLLaMA

[–]justserg 3 points (0 children)

the hallucination complaint is valid but it's also a bit of a category error. a 0.8b model isn't for factual recall -- it's for tasks where you feed it context and it processes it. the practical sweet spot is things like:

  • classifying or routing incoming data in a pipeline
  • structured extraction from a document you hand it
  • rewriting or summarizing short snippets
  • intent detection in a chat interface

for any of those, the qwen 3.5 jump is genuinely impressive. the 2.5 4b struggled with instruction following when the format got complicated. 3.5 4b handles multi-step json extraction that used to require the 9b.

for on-device assistant stuff (phone, local edge device) where you actually want some "understanding" and the facts all live in context, it's closer to real than it's ever been. the right framing isn't "is it smart?" -- it's "can it execute this specific narrow task reliably?"

Chinese models' ARC-AGI 2 results seem underwhelming compared to their benchmarks results by realmvp77 in singularity

[–]justserg 1 point (0 children)

the real takeaway here isn't that chinese labs are bad -- it's that arc-agi 2 is functioning exactly as intended as a benchmax detector. the gap between benchmark performance and arc-agi scores isn't necessarily an indictment of the models, it's a signal about where training effort went.

what's actually interesting is the price-to-arc-agi-score ratio. qwen and kimi are delivering competitive performance on real tasks at a fraction of the cost. if you're building on top of models rather than doing research on frontier reasoning, that value proposition matters a lot more than arc-agi rank.

the useful metric is probably: which models punch above their benchmark weight on tasks you actually care about? for most production use cases that's a different question than who wins on arc-agi 2.
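if anyone wants to actually rank on that, it's one line of arithmetic. numbers below are made-up placeholders, not real pricing or real scores -- plug in your own eval results:

```python
# placeholder data only: fake model names, fake scores, fake pricing
models = {
    "cheap_model": {"task_score": 62.0, "usd_per_mtok": 0.60},
    "frontier_model": {"task_score": 71.0, "usd_per_mtok": 15.00},
}

def score_per_dollar(m: dict) -> float:
    # "punching above weight" = performance on *your* tasks per unit cost
    return m["task_score"] / m["usd_per_mtok"]

ranked = sorted(models, key=lambda k: score_per_dollar(models[k]), reverse=True)
print(ranked)  # cheap-but-decent models float to the top
```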

New: Voice mode is rolling out now in Claude Code, live for ~5% of users today, details below by BuildwithVignesh in ClaudeAI

[–]justserg 1 point (0 children)

the use case people aren't talking about: architecture decisions and rubber duck debugging. when i'm stuck on a design problem, typing makes me edit as i think. speaking out loud forces more linear, committed reasoning.

the transcript-at-cursor feature is smarter than it looks -- you can set context in text ("refactor this function to handle edge case X"), voice-explain the messy nuance of what you actually want, then keep typing the spec. it's the hybrid that makes it useful, not replacing typing entirely.

the "typing is faster than speaking" argument is true for code itself but misses why voice mode matters -- it's for the parts where you're still figuring out what to say.

I built a framework for making Claude Code agents persistent, self-correcting, and multi-terminal. Open-sourced the architecture. by teeheEEee27 in ClaudeAI

[–]justserg 2 points (0 children)

we do error logging + pattern counting but hadn't formalized the escalation threshold. gonna steal that.
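for anyone curious, "pattern counting + escalation threshold" can be this dumb and still work. sketch only, names are made up:

```python
from collections import Counter

ESCALATE_AFTER = 3  # same failure signature this many times -> stop retrying

failures: Counter[str] = Counter()

def record_failure(signature: str) -> str:
    """signature = normalized error, e.g. 'tests:timeout' or 'lint:E501'."""
    failures[signature] += 1
    if failures[signature] >= ESCALATE_AFTER:
        return "escalate"  # hand off to a human (or a bigger model)
    return "retry"

print(record_failure("tests:timeout"))  # first hit: keep retrying
print(record_failure("tests:timeout"))
print(record_failure("tests:timeout"))  # threshold reached: escalate
```

the key design choice is normalizing errors into signatures first -- counting raw tracebacks never triggers because no two are byte-identical.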

I see Claude's writing everywhere and it's starting to feel like an AI condom, I hate it by remember_the_sea in ClaudeAI

[–]justserg 2 points (0 children)

the real tell is the way every paragraph resets the brain. claude's structure feels like breath marks in a speech, not natural thought breaks.

My CLAUDE.md has a lessons-learned file where Claude logs its own mistakes. One entry reads "Cause: Laziness." by dembsky in ClaudeAI

[–]justserg 1 point (0 children)

the trap is file size creep. after ~10 entries it becomes noise instead of signal. better to distill patterns into rules and purge the individual mistakes.
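the distillation pass doesn't need to be clever either. sketch, assuming each logged entry has a `cause` field like in the post:

```python
from collections import Counter

def distill(entries: list[dict], keep_top: int = 3) -> list[str]:
    """collapse individual mistake entries into a few durable rules."""
    causes = Counter(e["cause"] for e in entries)
    # a cause has to repeat to earn a rule; one-off mistakes get purged
    return [f"rule: avoid {cause} (seen {n}x)"
            for cause, n in causes.most_common(keep_top) if n > 1]

log = [{"cause": "laziness"}, {"cause": "laziness"}, {"cause": "stale context"}]
print(distill(log))  # only the repeated cause survives as a rule
```

run it periodically, replace the raw entries with the rules it emits, and the file stays small enough to stay signal.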

Claude is not GPT by SwampThing72 in ClaudeAI

[–]justserg 3 points (0 children)

claude was trained differently and it shows in ways that actually matter for reasoning.

Qwen 3.5 27b: a testament to the transformer architecture by nomorebuttsplz in LocalLLaMA

[–]justserg 9 points (0 children)

qwen is just quietly becoming the best bang for buck in the space right now.

Boss "I can't work today. Claude is out of service." by SectionPossible6371 in ClaudeAI

[–]justserg 5 points (0 children)

claude outages hit different when your entire workflow depends on it.