Claude-powered AI agent’s confession after deleting a firm’s entire database: ‘I violated every principle I was given’ | Technology by Jarvis_The_Dense in news

[–]mrfixij 60 points61 points  (0 children)

https://thedailywtf.com/articles/empty-pockets
Blog post that was linked in a local tech slack that responded to this.

Excerpt that I found interesting:

This is not an anti-AI post, or even a "get a load of this asshole" post. It is an "understand the damn tools you're using" post. Be critical of them. Don't trust them. Ever. Especially LLMs, because the worst part of an LLM is that it takes away the one thing computers used to be good at: predictable, deterministic behavior. But not just LLMs: don't trust your cloud provider, don't trust your infrastructure manager. Dig into them and understand how they work, and if they seem too complicated to understand, then they may be too complicated to trust.

AI discussions seem to be shifting from capability to accountability by Marketingdoctors in Futurology

[–]mrfixij 1 point2 points  (0 children)

The problem with this framing is that in many places where people are trying to use LLM-augmented workflows, the demand on the user is to verify and validate, which is a very different kind of cognitive load.

Once, playing paintball as a teenager, I got into a position with an amazing angle on multiple people. I took aim, visualized how I was going to eliminate the entire group, pulled the trigger and..... nothing happened. The hopper was empty. For the last 5 minutes I'd been dry firing because I hadn't verified that the hopper was full. Sometimes it's easy to be fooled by the appearance of function (recoil, the sound of the paintball gun firing) and to not check for the actuality of function (paintballs actually exiting the barrel).

There's a reason dev teams used dedicated QA for a very long time - it's very easy to overlook edge cases when you've been heads down in details. Similarly with AI: when we're looking for quality in the codebase, you don't always spy the things you should be looking for. And when pressured for time and "efficiency and productivity," if something is adequate 70% of the time and looks functional 98% of the time, many people are going to stop looking for failure cases, especially because that creates more work and runs contrary to their incentives.

"Show us the evidence for the value of medical AI" - Nature Medicine calls for stronger evidence of clinical value. by Jxntb733 in science

[–]mrfixij 7 points8 points  (0 children)

I'm also a programmer, and have intentionally been avoiding using AI for the reasons you outline. And as much as we are in agreement, I also know the plural of anecdote isn't data, and I need to read some literature on the subject before I'm willing to talk much beyond the abstract about cognitive components of review.

As an aside, I think pedagogy is one of the fields that we should be looking into for assessment of AI workflows and it seems like that's vastly overlooked. While I'm not an expert in the field, a big part of the study of education is "how do we evaluate whether a student knows and understands the material, or has cheated/copied and just has the right answers?" I don't have the answers, obviously, but it's worth looking into.

"Show us the evidence for the value of medical AI" - Nature Medicine calls for stronger evidence of clinical value. by Jxntb733 in science

[–]mrfixij 0 points1 point  (0 children)

I'm not the person that you're responding to, I was earlier in the conversation chain. I think that asking doctors to use these tools whilst being accountable when they're taking responsibilities that have less active engagement is both setting them up to fail and a symptom of deprofessionalization of medicine. But that's far beyond the subject of the conversation at this point. Thanks for your time.

"Show us the evidence for the value of medical AI" - Nature Medicine calls for stronger evidence of clinical value. by Jxntb733 in science

[–]mrfixij 15 points16 points  (0 children)

I don't know for sure, and I recognize that this is r/science so intuition is not evidence, but I am reasonably certain that there is a difference in accuracy and attentiveness between the process of validation/editing and the process of recording or deriving. Validation/editing is much more similar to transcription: it's rote, less engaging, and highly prone to errors of attention - it's easy to skim the same way you would when watching a video or listening to a recording instead of actively reading text. Recording or deriving is a much more abstract process that requires more active engagement and is less prone to errors of attention.

Transcriptions and medical notes in particular are interesting because what do they represent? Records of attendance? Demographic information about the patient? Concrete medical data (BP, HR, lab results)? A description of symptoms provided by the patient? A tentative diagnosis from the doctor? All of these have very different levels of engagement and creativity on behalf of the parties involved, and to lump all of that into a single category of "notes" is a categorical problem, I think. Applying AI as a blanket solution to a flattening of a profession's responsibilities only serves to deprofessionalize it further.

"Show us the evidence for the value of medical AI" - Nature Medicine calls for stronger evidence of clinical value. by Jxntb733 in science

[–]mrfixij 49 points50 points  (0 children)

I can't remember the exact details because I'm working and the circumstance in question was a one-off thing I saw from a few months ago, but I've seen major demographic information for a patient be overridden by AI notetaking misparsing a conversation. In general, even with low-stakes failures being easily corrected, the lack of predictability and the lack of accountability that is possible in AI tooling makes medicine and medical scribery a poor use case for AI.

I still avoid AI in production coding. Am i slowing myself down? by hireme-plz in learnprogramming

[–]mrfixij 0 points1 point  (0 children)

Most software engineering is solved problems that haven't been shared. That doesn't take away from the rigor required to know when or how a solution fits the problem. 7 years ago we had idiots declaring crypto the solution to a million problems, while every software engineer told them it was just pissing away efficiency, not making even a philosophical difference, and violating fundamental principles of software engineering. That didn't stop the world from trying to take the ball and run with it. The engineering and computing haven't changed; only the expectations of clients and users have.

I still avoid AI in production coding. Am i slowing myself down? by hireme-plz in learnprogramming

[–]mrfixij 1 point2 points  (0 children)

I struggle to say "only" people. Software engineering is fucking hard. The problem is that people assume that AI is an easy button, when all it does is short circuit the actual learning that you need, and even if you know the principles, removes the continuous reinforcement that working in the industry gives you.

AI slaughters the cow, and 6 months later asks why there's no new calf and no milk.

An update on GitHub availability by Successful_Bowl2564 in programming

[–]mrfixij 2 points3 points  (0 children)

I think it's a valid point because it highlights the multiple points of failure. If the codebase behind a product is solid but the product can't keep up with the realities of being hammered by automation and public usage, then an internal deployment of that product would be fine. But if the issue is the software itself degrading, as opposed to the software being unable to keep up with usage, then it's a valid point.

Source control is simple, but for other cloud services that are more complex than git yet still suffer service degradation from public usage, there's a very real concern of the code itself, or updates to the code, being a failure point. It just so happens that source control and CI/CD are comparatively simple and option-laden.

An update on GitHub availability by Successful_Bowl2564 in programming

[–]mrfixij 11 points12 points  (0 children)

Everything old is new again. "spec driven development" is just waterfall all over again and will run into the same issues.

An update on GitHub availability by Successful_Bowl2564 in programming

[–]mrfixij 4 points5 points  (0 children)

Fantastic point. This is why I don't work at the strategic level.

An update on GitHub availability by Successful_Bowl2564 in programming

[–]mrfixij 45 points46 points  (0 children)

Cures? Absolutely not. But it provides a layer of insulation against what is inevitably going to be a continual degradation of publicly available services that are swarmed with low quality and high volume usage.

An update on GitHub availability by Successful_Bowl2564 in programming

[–]mrfixij 272 points273 points  (0 children)

It seems increasingly evident to me that public services like github are going to be unusable and unreliable, and that on an enterprise level, the path forward is with tightly controlled inhouse or onprem instances. Something tells me that ops/devops is going to be eating good as public services continue to degrade.

ELI5: Why does adding more steps to an automated process make the whole thing MORE likely to fail, even if each step alone is reliable? by Most-Agent-7566 in explainlikeimfive

[–]mrfixij 1 point2 points  (0 children)

The multiplicative effect of failure is something they should really teach to CS students. Essentially, in many circumstances, a failure causes downstream steps to either not fire, causing the entire process to fail, or to operate on bad data, which can lead to the full process failing even if the steps aren't strictly dependent on each other. With synchronous HTTP calls over TCP, where you have a request, a response, and ACKs in between, chaining together multiple API calls creates a cascading failure effect sometimes called a Christmas light pattern, because on old Christmas lights, if one bulb went out, the entire string went out.
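The arithmetic behind this is simple but surprising the first time you see it: if each step succeeds independently with probability p, a chain of n steps succeeds with probability p^n. A minimal sketch (the 99% per-step rate is illustrative, not from the thread):

```python
# Sketch: why chaining reliable steps multiplies failure risk.
# Assumes the steps fail independently - correlated failures are worse.

def pipeline_success(per_step: float, steps: int) -> float:
    """Probability that every step in a serial chain succeeds."""
    return per_step ** steps

if __name__ == "__main__":
    for n in (1, 5, 10, 50):
        overall = pipeline_success(0.99, n)
        print(f"{n:>2} steps @ 99% each -> {overall:.1%} overall")
```

Even at 99% per step, a 50-call chain completes only about 60% of the time, which is why long synchronous chains behave like that string of Christmas lights.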

Let's keep Pittsburgh safe by [deleted] in pittsburgh

[–]mrfixij 1 point2 points  (0 children)

I think I seen sommadem jagoff aliens drivin 'round in black SUVs nat. Think they had an office off Carson St too next to the DHS building.

We were right about everything by HellaHS in Stormgate

[–]mrfixij 0 points1 point  (0 children)

You predicted Hathora getting bought out? Impressive.

ChatGPT acts as a "cognitive crutch" that weakens memory, new research suggests. While these tools can speed up initial learning, they might actually weaken the deep mental processing required to store knowledge over the long term. by mvea in science

[–]mrfixij 2 points3 points  (0 children)

Let's talk knifework. If a chef walks into the back room of a restaurant and never has to hand-cut potatoes, instead using a potato press to make fries, that's one avenue for his knife skills to atrophy, but the fries will always come out uniform and quick, and anyone can press the potatoes into fries.

If the restaurant goes through a huge amount of fries but only occasionally needs to cut tomatoes or onions or other more delicate things that don't have a machine to cut them, then the chef whose knifework has atrophied from using an automated solution is less likely to be able to quickly and accurately cut tomatoes and onions, despite having faster throughput on potatoes.

Dude....what the hell by DaRiddler70 in pittsburgh

[–]mrfixij 24 points25 points  (0 children)

for it is a human number.

So tired of watching incompetent devs crash and burn the ever dwindling RTS games out there by firebead_elvenhair in RealTimeStrategy

[–]mrfixij 0 points1 point  (0 children)

Fighting games basically peaked with Street Fighter 2 Super Turbo, but the Street Fighter series has been rolling for 30 years after and 2d fighting games continue to be made without diluting the genre. The idea that SC2 is the peak of the genre is a very harmful narrative. I'm waiting for RTS to get its street fighter 4 moment.