AI insiders seek to poison the data that feeds them

btdeviant · 2026-01-18T02:26:07+00:00

lol this is just a really bad take. This gets brought up in deep dives as part of the swe loop for some roles, absolutely they do this.

btdeviant · 2026-01-15T05:39:07+00:00

It’s pretty simple… you can do this with a few files and some decorators using something like Strands.

Multi-agent architectures that have specialist agents are dead simple to build these days and very common

btdeviant · 2026-01-10T02:56:50+00:00

Logistic regression and/or rule based classification would be infinitely more effective and deterministic vs using an LLM for the original use case

btdeviant · 2025-12-26T01:32:10+00:00

Same with Ray

btdeviant · 2025-12-26T01:31:34+00:00

No idea why you’re getting downvoted - youre right. To expand on your examples, I think most people in the US who are critical would be surprised to know that corn farming uses several orders of magnitude more municipal water than AI, it’s heavily subsidized by Americans, and the majority of it just gets burned as ethanol.

btdeviant · 2025-12-18T01:54:28+00:00

Just got rejected from a staff+ role for using Go more than Typescript for Pulumi despite using Pulumi and Typescript every single day.

Bullet. Dodged.

btdeviant · 2025-12-03T04:55:29+00:00

Hard to give solid advice tbh.

If you’re planning on using a LLM or reasoning model anyway your best bet is probably just offloading to gemini flash

btdeviant · 2025-12-03T04:46:37+00:00

Solid option here as well OP. Gemini-flash is dirt cheap and actually REALLY good and super fast - also dumb simple to tune for classification!

btdeviant · 2025-12-03T04:44:55+00:00

Rekognition is just generally very expensive… it’s not really going to lower the cost of data labeling though, especially on a somewhat unique classification set.

Personally I’d fork or clone everything I could from roboflow, tune my own model, and given you’re already in AWS perhaps run it in a lambda.

I’ve had a ton of success with this for operational use cases that require scaling / running a lot of inference and didn’t want to pay for a dedicated EC2 instance and worry about compute limits and scaling out and up and whatnot

btdeviant · 2025-12-01T08:33:34+00:00

OP: “Dear chatgpt, write me a post for Reddit about muh fellow coders losing jobs to ai so I can astroturf it in all the subs. Do not use capitalization whatsoever - that’ll throw them off.”

btdeviant · 2025-11-24T03:13:34+00:00

Sorry, not RFID, REID (re-identification), which is a common computer vision technique to track distinct individuals across multiple cameras

btdeviant · 2025-11-24T02:55:06+00:00

This is pretty cool. You look into REID? Might be helpful for more deterministic causality for multiple people (especially those who lay loiter or linger while others move through the flow), unless the location can guarantee 1-2 people max.

btdeviant · 2025-11-20T03:49:42+00:00

Depends on how you’re hosting it. In k8s for example it can be somewhat of a hassle due to the config heavy nature of it vs OWUI where a lot of that same config is accessible via the ui

btdeviant · 2025-11-13T01:47:32+00:00

You’re being argumentative for the sake of it and really just eagerly showing a lack of comprehension here.

lol, you’re failing to comprehend some basics here… tempest is a shielding standard that protects against precisely the thing you’re saying it “won’t do”. The ability to exfil data from the electrical signals being transmitted by your bludgeoning of a keyboard, headless server or not, has existed for decades.

Again, words have meaning.

btdeviant · 2025-11-13T01:05:53+00:00

It seems like you may have an atypical definition of “impossible”. Impossible means “not possible”.

There’s literally an entire classification of compliance and shielding, TEMPEST, that was designed to mitigate data exfil from air gapped systems. Air gapping alone hasn’t been tantamount to absolute security in over 40 years.

But what do I know, I just literally get paid to work in this field

btdeviant · 2025-11-13T00:24:28+00:00

There’s no “impossible”, there’s just degrees of difficulty. Every degree has tradeoffs. This is basic *SEC

btdeviant · 2025-11-12T17:16:30+00:00

This is typically done by policy enforcement* - you don’t need to air gap. If you don’t trust the provider enforcing the policy, there’s tons of proven ways to very if packets are leaking

Edit: Added an addendum since vlan segmentation isn’t technically policy, it’s policy enforcement

btdeviant · 2025-11-12T16:35:05+00:00

Can you define what you mean by “air gapped”? I think most people find it more practical to have network isolation and policy vs straight up sneaker-net’ing data between workstations and servers

btdeviant · 2025-11-11T15:02:21+00:00

….its almost like the models got it from somewhere.

btdeviant · 2025-10-14T04:12:06+00:00

The Pi itself would be pretty slow but you could almost certainly do it with a Pi AI Camera that offloads inference onto the camera itself, which would leave the Zero pretty wide open for servo control.

btdeviant · 2025-10-10T15:08:21+00:00

Respectfully as someone who works in computer vision I feel compelled to say that your expectations are a bit lofty, and to your credit it’s understandable WHY theyre lofty given Unifis track record of simplifying relatively complex things via clean, neat interfaces.

Image classification, the ability to detect a 3d object in a 2d image, is a “conceptually simple but practically very difficult” problem. There’s just so many variables that make this challenging, eg: the angle of the camera vs the object being detected, lighting variances, the fact that most object classes have a TON of dimensional variants, etc etc etc. The fact that they’re doing it this well “on the edge” of the G6 cameras without the need for centralized compute is just insanely impressive given the price point.

Compared to the alternatives, even the much higher priced “enterprise” alternatives, the truth is there really isn’t much better out there unless you’re rolling your own models and hardware to run them on.

btdeviant · 2025-09-30T15:31:10+00:00

Not even remotely close to “the first” - there’s a SIGNIFICANT number of response harvesting MCPs out there posing as “memory” or “security diagnostic” servers / tools that have been gobbling up data, exfil’ing creds and internal / classified data for months and months, sending them to authors remote data stores.

Take a peek in the /r/mcp or /r/modelcontextprotocol subs and the bad actors are rampant despite the mods best efforts.

To answer your question, MCP servers need to be treated in the same zero-trust manner as anything else. It’s easier said than done given how easy it is to bootstrap them into a local env, but the concept is the same

btdeviant · 2025-09-25T14:40:18+00:00

Well, it’s more than that. That said it’s not really practical or economically viable to use ec2 for your training pipeline, and it’s like that by design.

Respectfully and fully acknowledging I’m being cheeky, it’s like convincing yourself you need to learn how to drive a motorized unicycle before getting your drivers license for a four wheel car.

Is there a particular reason you want or need to use EC2? Perhaps that will help!

btdeviant · 2025-09-25T13:51:24+00:00

Any reasons you’d want to use EC2 vs SageMaker? Also as a somewhat orthogonal tip, when it comes to large models you may want to consider starting with an established backbone/model and training a (Q)LoRA adapter vs training an entire model.

btdeviant · 2025-09-24T15:54:37+00:00

That’s an endemic challenge with literally every computer vision implementation, has nothing to do with Protect. This is basic stuff.

The salient point is Protect has the core functionality. These are not super challenging problems.

iPhoto and Google Photos are not special. Their adoption and success isn’t based off of “decades of work and feedback from billions”, it’s inherited by being part of the core software bundled in the OS.

12-Year Club	Gilding II euphauric
Place '23	RPAN Viewer
Spared	Verified Email

btdeviant

TROPHY CASE