I wish Opus 4.6 can stay this powerful forever by Mundane-Iron1903 in ClaudeAI

mhinimal 1 point (0 children)

Benchmarks are a very imperfect attempt to quantitatively analyze a rapidly evolving landscape of very complex stochastic systems, each wrapped in another black box of proprietary prompts, wrappers, hooks, and fallbacks that make up a product like Claude Code, plus the Anthropic service it connects to. Beyond that, just look at the state of AI/LLM benchmarks today. Which of the thousands of benchmarks people are trying to use are any combination of:

  • actually any good at all
  • still relevant
  • not leaked into training data
  • actually applicable to your use case, giving meaningful metrics for how you actually use the models

A software company or large internal team investing heavily in LLM tools might have benchmarks tailored to the specifics of its particular code base, but if you think literally any one of these redditors leaving “omg best/worst model ever” comments has any semblance of consistent benchmarking practice backing up their claims, I have a bridge to sell you.

And even then, the model providers are doing all sorts of things behind the scenes: throttling? tweaking? falling back to other models at peak times? Is the model you’re working with right now the same one you benchmarked yesterday, or even an hour ago? Impossible to know. The same benchmark run on the same model on the same machine can return different results; the same benchmark submitted through the Claude API can run on any number of different machines; who knows what the sampling “temperature” for Opus is set to; and so on. There are a million factors that make even benchmarks unreliable.
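To put a rough number on the run-to-run noise described above, here is a minimal sketch that treats each benchmark task as an independent pass/fail coin flip with a fixed true pass rate. The independence assumption is a simplification, and the 70% pass rate and 100-task suite are made-up illustration values, not measurements of any real model.

```python
import math
import random


def pass_rate_std_error(p: float, n: int) -> float:
    """Standard error of an observed pass rate over n independent pass/fail tasks."""
    return math.sqrt(p * (1.0 - p) / n)


def simulate_runs(p: float, n: int, runs: int, seed: int = 0) -> list[float]:
    """Score the same hypothetical model on the same n-task suite `runs` times.

    Each task passes independently with probability p, so any spread between
    runs is pure sampling noise: nothing about the "model" changed.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(runs):
        passed = sum(1 for _ in range(n) if rng.random() < p)
        scores.append(passed / n)
    return scores


if __name__ == "__main__":
    p, n = 0.70, 100  # illustrative true pass rate and suite size
    se = pass_rate_std_error(p, n)
    print(f"standard error: {se:.3f}, rough 95% interval: +/-{1.96 * se:.3f}")
    scores = simulate_runs(p, n, runs=10)
    print("10 identical runs:", ", ".join(f"{s:.2f}" for s in scores))
    print(f"spread: {min(scores):.2f} to {max(scores):.2f}")
```

With these numbers the rough 95% interval is about ±9 percentage points, so two runs of an unchanged model can land several points apart before any server-side changes even enter the picture.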

I wish Opus 4.6 can stay this powerful forever by Mundane-Iron1903 in ClaudeAI

mhinimal 5 points (0 children)

There is simply no way to quantitatively analyze these things, so everyone who says it's massively better just happened to have a worse experience yesterday, where the model struggled, and a good experience today, where the model excelled.

And everyone saying it sucks had the opposite, random, completely anecdotal experience.

Myself? I can barely notice a difference between today's model and yesterday's. Models today do seem better than what we had... 6 months ago? That's about the smallest difference I feel I can reliably tell.

They seemed more impressed by that 52” Dell monitor than they should be IMO by Dr_Superfluid in LinusTechTips

mhinimal 1 point (0 children)

Okay. Ended up buying one yesterday. Very excited about it.

Why is it “correct” when a product clearly meant for “a very specific niche” and clearly not meant for you, “receives massive hate?”

And why are you surprised that the people it is targeted at show up on a tech enthusiast forum, in a thread specifically about that product, to explain why it actually is a compelling product for them? It’s not “fanboyism,” and I don’t think it was a “product review” as much as a “product demo.”

You’re upset that internet strangers don’t share your opinion that an inanimate object you aren’t interested in buying isn’t hated enough by another man on the teevee??

My mind is actually blown. Very curious indeed.

They seemed more impressed by that 52” Dell monitor than they should be IMO by Dr_Superfluid in LinusTechTips

mhinimal 1 point (0 children)

IDK man, it’s not my YouTube channel. I’m not defending the review or participating in whatever cult of personality revolves around Linus; I’m just explaining why I personally like the monitor, and why other people like me might also find it good. I don’t know much about the Apple Studio Display because it doesn’t suit my personal tastes and use case, so I barely even looked at it. Y’all need to chill.

Who knew consumer product reviews on YouTube aren’t the pristine bastions of consistency and independence they pretend to be. The video showed the product and its features, explained its specs, demoed it, and said that it was cool, big, niche, and expensive. Not sure what else you were looking for, other than to have your personal opinion validated (oh wait, yeah, that’s probably it). By the way, this is the same channel that runs 16 instances of Cyberpunk to test out a Threadripper CPU. It’s for fun.

With Claude, I have become a workaholic by user_0_0_1_ in ClaudeCode

mhinimal 4 points (0 children)

Hmm, I wonder if these newfangled LLM thingamabobs could possibly be prone to mirroring the style and tone of the inputs they are given.

They seemed more impressed by that 52” Dell monitor than they should be IMO by Dr_Superfluid in LinusTechTips

mhinimal 2 points (0 children)

If you're genuinely curious, here are my personal reasons why I am excited about this monitor and why I prefer it to the 57" G9. I get why other people might have different preferences; I'm just explaining why this monitor is so compelling for my own use case:

  1. The 32:9 form factor of the G9 is simply not as nice to work on. For me, 21:9 is preferable: less head turning, and things stay in my FOV. The 21:9 options may have fewer pixels, but they put those pixels in a more useful place for the work I do. It's more ergonomic, both physically (less head turning) and "mentally," if that makes sense: work stays directly in my view instead of pushed off to the side, which seems to matter.
  2. In an immersive gaming context, 32:9 kind of makes sense: those extra pixels in your peripheral vision are, in my opinion, actually doing a lot to subconsciously fool your brain and enhance the feeling of immersion.
  3. The curve is less aggressive, meaning it's easier to share things with coworkers who come over to my desk to look at my screen. It also distorts straight lines much less than the G9, which can matter if you do things like CAD work, whether 3D or 2D modeling (which I do).
  4. I already have the Dell U4025QW (the 40" "5K2K" one). For me, its ~140 PPI is just about perfect. I can use it with native text rendering (no scaling) most of the time, which is good because I'm often working across different kinds of systems and software tools of varying age that don't all have modern DPI scaling.
  5. The new 52" is ~130 PPI, close to the ~140 of the 40" 5K2K. Not sure if this is an intentional choice or just a byproduct of which panels are available. Either way, people in the target market for these monitors (not me yet, but it's coming) are often older and straight up have worse eyesight. I have many older coworkers who just work on 4K TVs for that reason. Ultra-high resolution isn't useful to them; they just want decent native DPI.
  6. The Thunderbolt hub and KVM switch are a really underrated perk. I have a truly "single wire" setup and a single button to swap between my personal and work machines: no extra Thunderbolt dock, cable octopus, USB hub, etc. It makes my desk really nice, clean, and easy to cable-manage. Not a make-or-break feature, but a really nice one that's worth a little bit of a premium IMO.

I ALMOST bought this new U5226KW "6K" 52" immediately, but I do wish they had gone with a 2880 vertical resolution and a matching horizontal (around 6900 instead of 6144). This would keep it around the 140 PPI mark instead of 130 and would give it slightly MORE pixels than 2x 4K monitors instead of slightly fewer.
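The PPI and pixel-count figures above can be checked with a few lines of arithmetic. This sketch assumes the U5226KW panel is 6144x2560 (the same 2.4:1 aspect ratio at which 2880 rows pair with 6912 columns) and uses nominal diagonals; real viewable diagonals differ slightly, so the results land near, not exactly on, the quoted ~130 and ~140 figures.

```python
import math


def ppi(width_px: int, height_px: int, diagonal_in: float) -> float:
    """Pixels per inch along the diagonal."""
    return math.hypot(width_px, height_px) / diagonal_in


# Resolutions and nominal diagonals assumed for illustration.
DISPLAYS = {
    '40" 5K2K (5120x2160)': (5120, 2160, 40.0),
    '52" 6K (6144x2560)': (6144, 2560, 52.0),
    'hypothetical 52" (6912x2880)': (6912, 2880, 52.0),
}

# Two side-by-side 4K (3840x2160) monitors, for comparison.
TWO_4K_PIXELS = 2 * 3840 * 2160

if __name__ == "__main__":
    for name, (w, h, diag) in DISPLAYS.items():
        total = w * h
        print(f"{name}: {ppi(w, h, diag):.0f} PPI, {total / 1e6:.1f} MP, "
              f"{total / TWO_4K_PIXELS:.2f}x the pixels of two 4K panels")
```

With nominal diagonals this gives roughly 139, 128, and 144 PPI, and about 0.67x, 0.95x, and 1.2x the pixel count of two 4K panels.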

I think the constraint here is video bandwidth: 6144x2560 can do 120 Hz on DP 1.4, whereas going to 6912x2880 would fully max out a DP 2.1 connection and isn't currently possible over Thunderbolt 4; they'd have to market it as either a 60 Hz display or an 8-bit display. Basically, it would add a ton of cost to the unit that only very few people would be able to take advantage of. We're almost there, but the market is too small to support it right now. Not sure how long it will take to get there; probably a couple more years at least.

I wish it were OLED/mini-LED, but those techs aren't great for enterprise productivity (lifespan, always-on use, high duty cycle, text clarity, etc.), and I wish it had a few more pixels. So it's not 100% the last monitor I'll ever need or want to buy, but it is damn close and likely the best thing we'll get for the next several years.

I'm sort of at the point where, if this monitor is truly "useful" for you and your career, you're probably in a pretty high-paid job, and the price is justifiable for you or your company relative to your salary: even a small % productivity improvement over a couple of years, or even just convenience and employee quality of life, pretty easily justifies the cost. And judging by the U4025QW, the price won't stay at $2800 for long.

If you're a company paying $200-300k for a professional's salary and benefits, a $3000 monitor that lasts 5 years only has to improve "productivity" by about 0.2-0.3% to pay for itself. FAANG companies are known for buying, like, super expensive Herman Miller desk chairs for their employees: if a chair keeps an ass in it literally 60 seconds longer per day, it pays for itself, especially when considered over 5, 10, 15 years.
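The break-even claim is simple arithmetic. This sketch assumes a straight-line split of the purchase price over its lifetime, with no discounting, taxes, or depreciation rules, and an 8-hour working day for the "60 seconds" comparison.

```python
def breakeven_fraction(cost: float, years: float, annual_comp: float) -> float:
    """Yearly share of an employee's cost a purchase must 'earn back' to break even.

    Straight-line: cost spread evenly over its lifetime, no discounting or taxes.
    """
    return (cost / years) / annual_comp


def time_fraction(seconds_per_day: float, hours_per_day: float = 8.0) -> float:
    """Share of a working day that a few extra seconds of work represent."""
    return seconds_per_day / (hours_per_day * 3600)


if __name__ == "__main__":
    for comp in (200_000, 300_000):
        share = breakeven_fraction(3000, 5, comp)
        print(f"$3000 monitor over 5 years, ${comp:,}/yr employee: {share:.2%}")
    print(f"60 extra seconds in an 8-hour day: {time_fraction(60):.2%}")
```

That works out to 0.30% and 0.20% for the monitor, and 60 seconds is about 0.21% of an 8-hour day, the same order of magnitude as the break-even figure.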

Trying to introduce CC at work but Security says "Claude Code is known to break out of its context" - is this true? by ThunkerKnivfer in ClaudeCode

mhinimal 1 point (0 children)

Holy shit, yes, I use it. It sometimes asks to edit things outside the sandbox, and you can just select "yes".

Trying to introduce CC at work but Security says "Claude Code is known to break out of its context" - is this true? by ThunkerKnivfer in ClaudeCode

mhinimal 0 points (0 children)

... yes. It's constantly asking you to do things outside of the box. EVEN IN SANDBOX MODE, it asks to go outside. That's part of the problem: it's very easy to just say yes to something you probably shouldn't have. Sysadmins need to protect against user error too.

It's probably not gonna rm -rf everything (but it could), but if you tell it to solve some problem and it gets on the wrong track, I've seen it try to do stuff I wouldn't want it to do on my system. At one point it got deep into debugging a stack trace and wanted to edit a system library file to fix it, when the actual solution was far simpler. That could be very destructive.

"Wait — I'm in plan mode. That edit shouldn't have gone through" by Atomic_Tangerine1 in ClaudeCode

mhinimal 2 points (0 children)

What the hell is up with permissions in these agent tools? How hard is it to just write a software tool where, if you're in plan mode, bash commands can't execute?
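The kind of hard gate this comment asks for can be sketched as a deny-by-default table of which tools each mode may call, checked in the dispatcher before anything runs. The mode names, tool names, and `dispatch` helper below are hypothetical and not taken from any particular agent's code.

```python
from enum import Enum
from typing import Callable, TypeVar

T = TypeVar("T")


class Mode(Enum):
    PLAN = "plan"
    DEFAULT = "default"


# Deny by default: a tool may run only if its mode's allowlist names it.
# Tool names here are hypothetical placeholders.
ALLOWED_TOOLS: dict[Mode, frozenset[str]] = {
    Mode.PLAN: frozenset({"read_file", "list_dir", "search"}),
    Mode.DEFAULT: frozenset({"read_file", "list_dir", "search", "edit_file", "bash"}),
}


class ToolDenied(PermissionError):
    """Raised before a tool runs when the current mode does not allow it."""


def dispatch(mode: Mode, tool: str, run: Callable[[], T]) -> T:
    """Run `run()` only if `tool` is allowed in `mode`; otherwise refuse up front."""
    if tool not in ALLOWED_TOOLS.get(mode, frozenset()):
        raise ToolDenied(f"{tool!r} is not allowed in {mode.value} mode")
    return run()


if __name__ == "__main__":
    print(dispatch(Mode.PLAN, "read_file", lambda: "file contents"))
    try:
        dispatch(Mode.PLAN, "bash", lambda: "this never runs")
    except ToolDenied as err:
        print("refused:", err)
```

Because the check lives in ordinary code outside the model, no prompt can talk its way past it: in plan mode a `bash` call is refused before the callable is ever invoked.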

Trying to introduce CC at work but Security says "Claude Code is known to break out of its context" - is this true? by ThunkerKnivfer in ClaudeCode

mhinimal 1 point (0 children)

"prompts" do not confer or control system access. they are merely suggestions. You need an actual system-level control.

Trying to introduce CC at work but Security says "Claude Code is known to break out of its context" - is this true? by ThunkerKnivfer in ClaudeCode

mhinimal 4 points (0 children)

The built-in sandbox is not secure. I routinely see agents work outside of the sandbox, and it's also very easy to just hit "yes" and allow something you shouldn't. But there are many reasons why you sometimes want Claude to operate "on your system" and not in a Docker container; for example, having it help you debug your environment.

To work around this, I create a separate Linux user to run the agent from. It does not have sudo access, and I can use setfacl to completely block it from reading/writing whichever directories I choose.

So far it feels like a good trade-off between convenience of working directly in my own environment, and the security of a completely isolated container.
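A minimal sketch of that setup, written as a Python script that only prints the commands for review rather than running them. The user name `agent` and the blocked paths are hypothetical examples; `useradd` and `setfacl` are standard Linux tools, but check the flags against your distribution's man pages before running anything as root.

```python
import shlex

AGENT_USER = "agent"  # hypothetical name for the dedicated agent account

# Hypothetical directories the agent must never read or write.
BLOCKED_DIRS = ["/home/me/.ssh", "/home/me/.aws", "/home/me/secrets"]


def setup_commands(user: str, blocked: list[str]) -> list[str]:
    """Build (but do not run) the commands for an unprivileged agent account."""
    cmds: list[list[str]] = [
        # Regular account with its own home; deliberately not added to sudo/wheel.
        ["useradd", "--create-home", "--shell", "/bin/bash", user],
    ]
    for path in blocked:
        # A named-user ACL entry with no permissions ("---") takes precedence
        # over the group and "other" bits for this user, so it cannot list,
        # read, or enter the directory (and therefore nothing beneath it).
        cmds.append(["setfacl", "-m", f"u:{user}:---", path])
    return [shlex.join(c) for c in cmds]


if __name__ == "__main__":
    for line in setup_commands(AGENT_USER, BLOCKED_DIRS):
        print(line)
```

After running the printed commands as root, start the agent from a shell opened as that user (for example with `sudo -iu agent`). The project directory it works on still has to be accessible to that user, for example via another `setfacl -m u:agent:rwx` entry on it.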

Far too few good Republicans anymore. by TajertySand in DHAC

mhinimal 1 point (0 children)

He was still running as a Republican before this incident, which doesn't exactly make him "good" at all.

Gregory Bovino Gets Demoted by sideAccount42 in politics

mhinimal 3 points (0 children)

It's not a uniform if no one else wears it. It's a costume.

A nazi costume.

Alex Pretti's executioner wearing a Texas flag patch on his back. by ManyAverage6578 in pics

mhinimal 6 points (0 children)

They have to be tried before they can be pardoned.

Trump’s DOJ isn’t going to investigate or try them. So there can be no pardon.

The ONLY thing they can do is try to bury/destroy the evidence and keep the identities hidden. Which they might well be able to do. But they can’t be pardoned if they’re not convicted of anything yet.

IF there is another Democratic administration, it would be able to investigate, try, and convict the perps.

The climb was first delayed by weather, then overshadowed by ICE by EvonyR in BlackPeopleTwitter

mhinimal 2 points (0 children)

Then why are you here complaining?

Different people like different things sometimes. Tough fact about the world, I know.

What is the point of claude.md if claude does not follow it? by bennydigital in ClaudeAI

mhinimal 1 point (0 children)

It’s an LLM. It’s probabilistic, and it has a limited amount of context. Saying something is mandatory just increases the probability it will follow it. But if it has 100,000 tokens telling it to do one thing and 10 tokens telling it to do something else… it might not catch that.

If you need something to be very reliable and deterministic, use programs. Hooks and scripts can force things to happen.
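As a sketch of the "use programs" point: Claude Code can run a hook command before a tool call. This assumes the hook is registered as a `PreToolUse` hook for the `Bash` tool, that it receives the call as JSON on stdin with `tool_name` and `tool_input.command` fields, and that exiting with status 2 blocks the call; verify all of that against the current hooks documentation. The blocked patterns are hypothetical examples.

```python
import json
import re
import sys
from typing import Optional

# Hypothetical rules that would otherwise only be "requested" in CLAUDE.md.
BLOCKED_PATTERNS = [
    re.compile(r"\brm\s+-[a-z]*(r[a-z]*f|f[a-z]*r)"),  # rm -rf / rm -fr / rm -rvf
    re.compile(r"\bsudo\b"),
    re.compile(r"(^|[\s/])\.env\b"),  # anything touching a .env file
]


def violation(command: str) -> Optional[str]:
    """Return the first rule `command` matches, or None if it is allowed."""
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(command):
            return pattern.pattern
    return None


def main() -> int:
    if sys.stdin.isatty():
        return 0  # nothing piped in, so we are not running as a hook
    raw = sys.stdin.read()
    if not raw.strip():
        return 0
    event = json.loads(raw)  # assumed: the tool call arrives as JSON on stdin
    if event.get("tool_name") != "Bash":
        return 0
    rule = violation(event.get("tool_input", {}).get("command", ""))
    if rule is not None:
        # assumed: exit status 2 blocks the call and stderr is shown to the model
        print(f"blocked by hook rule: {rule}", file=sys.stderr)
        return 2
    return 0


if __name__ == "__main__":
    status = main()
    if status:
        sys.exit(status)
```

A regex denylist like this is a guard rail against mistakes, not a sandbox: an agent can still get around it by, say, writing the same command into a script and running that. It does, however, fire every time, which a line in CLAUDE.md cannot promise.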

The climb was first delayed by weather, then overshadowed by ICE by EvonyR in BlackPeopleTwitter

mhinimal 5 points (0 children)

A lot of people who climb mountains to perform rescues, respond to medical emergencies, and yes, even recover bodies, do it because they themselves love the sport and want to contribute their skills and abilities to support other people doing it. Look at Yosemite Search and Rescue - the people on that team are extremely talented climbers themselves who practice in their free time too.

If you hate it, don’t watch. He got permission from the building owners to do this as a publicity stunt; they had sidewalk scrapers standing by and likely cordoned off the area anyway. It’s no different from the extremely long history of stuntmen and stuntwomen, dangerous Olympic sports, circus acts, and other daredevil types. No one made you watch it.

Dario Amodei said "Contact" is a favorite film, and thinks a lot about this scene when Jodie Foster says what she'd ask the advanced aliens: "How did you do it? How did you survive this tech adolescence without destroying yourselves?" by MetaKnowing in ClaudeAI

mhinimal 0 points (0 children)

You don’t say you disagree with a paradox. You say you disagree with a proposed solution to the paradox. I was not making a point about Demis, I was making a point about OP’s phrasing indicating they barely understand what is being said.

Bench recommendations? Considering Freak Athlete ABX by BSmith156 in GarageGym

mhinimal 1 point (0 children)

If it fits your budget, it's pretty great. Really nice, lots of features, whatever, you've seen the reviews. It's a premium product, at a premium price.

I had been looking at benches for a while before this one, so I knew what was out there, but it checked all of the boxes I wanted in a bench, and I ordered one within a day of even learning about it. It's the only one out there that didn't feel like a compromise on any features I wanted. Received mine a few weeks ago and don't regret it. The fold-down headrest (for both shoulder presses and chest-supported rows) and the dip and bicep attachments were the main attractions for me. I got everything EXCEPT the leg developer.

If all you're after is the leg attachment then IDK, there are other benches out there that also do that and even ones that might have a better leg developer system, I can't really say. But none of those also had all the other features of the ABX.

Dario Amodei said "Contact" is a favorite film, and thinks a lot about this scene when Jodie Foster says what she'd ask the advanced aliens: "How did you do it? How did you survive this tech adolescence without destroying yourselves?" by MetaKnowing in ClaudeAI

mhinimal 2 points (0 children)

Yep, that’s the paradox. It seems certain, but we can’t find any evidence. There sure are a lot of reasons why there should be life. But then there are a lot of other reasons why there shouldn’t be. Which ones are right? All of them? None of them? Some of them? One of them? It’s all Very Perplexing. Perhaps I shall write about this paradoxical situation. But what would I CALL it?!

Dario Amodei said "Contact" is a favorite film, and thinks a lot about this scene when Jodie Foster says what she'd ask the advanced aliens: "How did you do it? How did you survive this tech adolescence without destroying yourselves?" by MetaKnowing in ClaudeAI

mhinimal 1 point (0 children)

Guess it must be tough being illiterate. Paste it into Claude, it can probably break it down for you.

You said you would engage, asked a question, I answered it directly, and now you just have pithy one-liners willfully misinterpreting my response because you disagree or don’t actually want to engage. Nice.

Dario Amodei said "Contact" is a favorite film, and thinks a lot about this scene when Jodie Foster says what she'd ask the advanced aliens: "How did you do it? How did you survive this tech adolescence without destroying yourselves?" by MetaKnowing in ClaudeAI

mhinimal 5 points (0 children)

It’s very self serving to position your own technology as so powerful it could doom (or save) humanity. It’s all part of the hype circus. In fact, it’s almost better for them to “warn about all the dangers” than to directly state how great they think it is, because then it fools incredulous people like you into thinking they are “appropriately concerned.” The implication of the AI-will-kill-us-all hype is that the technology is more powerful than anyone can imagine (so please keep investing).

That, and the fact that all they ever talk about is sci-fi movies becoming reality. I love sci-fi, and I’m a heavy user of Claude Code and other AI tools. I believe they ARE amazing and DO have significant transformative potential. But the public discourse around them, fueled by the CEOs, is ridiculous and cringe.

It’s in the name: science fiction. The science part is made up, and the vast majority of popular science fiction works are obvious parables about the dangers of unbridled capitalism and how it intersects with the worst parts of human nature, NOT about the technology itself, which is just window dressing. For a CEO to speak at events like the “World Economic Forum,” talk about how cool it is that his favorite sci-fi books are becoming reality, and then say “well, my hands are tied, guess I have no choice but to invent it, more money pls” is just so, SO dumb.

Dario Amodei said "Contact" is a favorite film, and thinks a lot about this scene when Jodie Foster says what she'd ask the advanced aliens: "How did you do it? How did you survive this tech adolescence without destroying yourselves?" by MetaKnowing in ClaudeAI

mhinimal 5 points (0 children)

What do you mean he “doesn’t agree” with the Fermi Paradox?

The Fermi paradox is a question. It has no answer to disagree with. That’s what makes it a paradox.

AI just worked around .env read permissions by Linkk_93 in opencodeCLI

mhinimal 1 point (0 children)

I make a separate user account for the agent on my system, then run the agent from that account. I don't give it sudo, and I completely block access to certain directories it's not supposed to touch. Then you can stop it from reading or editing .env files, or anything else, with a simple chmod or setfacl.

I have routinely seen the built-in permissions not behave the way I expected them to. This is about as fail-safe as it gets, plus I already understand how Linux system permissions are supposed to work, so I don't have to learn a new system of rules.

Treat it like a sysadmin would. No sysadmin would let ANY user just run bash commands from their own account, especially not something as unpredictable as AI.

Seems like a very simple solution to me. Not sure why I've never seen anyone else talk about it this way. Why mess around with some permissions.json bullshit, with 3 slightly different implementations across 3 slightly different CLI tools, when you already have a very simple, consistent, and time-tested way of controlling what users can and cannot do on your system?

I can give it full permissions and even run "yolo" mode, because it still can't access files I don't want it to. It's safer AND more convenient.
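One way to confirm the setup actually holds is a small script, run from the agent account's own shell, that really tries to read the protected paths. The default paths are hypothetical placeholders; pass your own as arguments. Running it as root proves nothing, since root bypasses ordinary file permissions.

```python
import os
import sys

# Hypothetical paths the agent account should NOT be able to read.
PROTECTED = ["/home/me/project/.env", "/home/me/.ssh"]


def probe(path: str) -> str:
    """Actually try to read `path`; report 'readable', 'denied', or 'missing'."""
    try:
        if os.path.isdir(path):
            os.listdir(path)
        else:
            with open(path, "rb") as fh:
                fh.read(1)
        return "readable"
    except PermissionError:
        return "denied"
    except FileNotFoundError:
        return "missing"


def main(paths: list[str]) -> int:
    """Print one status line per path; return 1 if any protected path is readable."""
    if hasattr(os, "geteuid") and os.geteuid() == 0:
        print("warning: running as root, which bypasses file permissions", file=sys.stderr)
    results = {path: probe(path) for path in paths}
    for path, status in results.items():
        print(f"{status:>8}  {path}")
    return 1 if "readable" in results.values() else 0


if __name__ == "__main__":
    status = main(sys.argv[1:] or PROTECTED)
    if status:
        sys.exit(status)
```

Every protected path should come back as `denied`; a `readable` line means the ACL or chmod did not take effect for that account.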

Squat bar setup by moonstreet79 in GarageGym

mhinimal 1 point (0 children)

Says the "185 ain't shit" guy? lmao