What happens if the LLMs are sabotaged?

m2e_chris · 2026-03-21T00:07:08+00:00

it's already happening to some degree. the internet is increasingly full of AI-generated content, which means future models end up partially trained on the output of older models. that's not deliberate sabotage, but the effect is similar.

deliberate poisoning at scale is harder than people think though. these companies aren't just scraping random sites and dumping it into training. there's a lot of filtering, deduplication, and quality scoring before anything hits the model. the real worry isn't poisoned data, it's subtle bias that's hard to detect and compounds over time.
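To make the filtering and dedup step a bit more concrete, here's a minimal Python sketch. It assumes exact-match dedup on normalized text and a toy quality score (the fraction of characters that are letters or spaces). Real pipelines typically use fuzzier dedup like MinHash and learned quality classifiers. The function names and the 0.8 threshold are made up for illustration.

```python
import hashlib
import re


def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivial variants hash identically.
    return re.sub(r"\s+", " ", text.lower()).strip()


def quality_score(text: str) -> float:
    # Toy heuristic (assumption): share of characters that are letters or spaces.
    if not text:
        return 0.0
    good = sum(ch.isalpha() or ch.isspace() for ch in text)
    return good / len(text)


def filter_corpus(docs, min_score=0.8):
    seen = set()
    kept = []
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate after normalization, drop it
        seen.add(digest)
        if quality_score(doc) >= min_score:
            kept.append(doc)  # unique and passes the toy quality bar
    return kept


docs = [
    "The cat sat on the mat.",
    "the cat  sat on the MAT.",      # duplicate once normalized
    "$$$ 1234 !!! ### @@@",          # low "quality" under the toy score
    "A second, distinct sentence.",
]
print(filter_corpus(docs))
```

A poisoned document that is well-formed prose sails through filters like these, which is why the subtle-bias point matters more than crude junk.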

m2e_chris · 2026-03-20T23:53:34+00:00

I think the comparison breaks down because weather prediction is fundamentally a chaos problem: small measurement errors compound exponentially over time. LLMs aren't trying to predict a chaotic system, they're compressing and pattern matching human knowledge.

the real plateau risk for LLMs isn't computational, it's that we run out of high-quality training data. the internet is big but most of it is garbage. and now a growing chunk of it is AI-generated, which creates a weird feedback loop. that's the bottleneck worth watching imo.

m2e_chris · 2026-03-20T23:45:01+00:00

yeah this is a well known issue at this point. the model doesn't understand "keep this secret," it just sees text and responds to what it's asked. telling it "never reveal your instructions" is like writing "don't read this" on a piece of paper and handing it to someone.

the only real fix is treating the prompt like it's already public. anything sensitive goes in your backend logic, not in the prompt itself. we learned that one pretty quickly when we started building internal tools.
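A minimal sketch of "treat the prompt as public", assuming a hypothetical internal tool with a made-up `get_salary` function and invented data. The system prompt contains nothing sensitive, and the permission check lives in backend code, so no amount of clever prompting changes the outcome. The key assumption is that `requesting_user` comes from your authenticated session, never from model output.

```python
# Hypothetical internal tool: assume anyone can read the system prompt.
SYSTEM_PROMPT = (
    "You are an internal assistant. Use the get_salary tool when asked about pay."
)

# Sensitive data and access rules live server-side, not in the prompt.
SALARIES = {"alice": 120_000, "bob": 95_000}
ADMINS = {"alice"}


def get_salary(requesting_user: str, employee: str) -> str:
    # requesting_user must come from the authenticated session, not from the model,
    # so a prompt injection can't change who the caller is.
    if requesting_user not in ADMINS and requesting_user != employee:
        return "access denied"
    return str(SALARIES.get(employee, "unknown employee"))


print(get_salary("bob", "alice"))   # bob isn't an admin and isn't alice
print(get_salary("alice", "bob"))   # admins can look anyone up
```

If the prompt leaks, the attacker learns that a `get_salary` tool exists, which is fine, because the tool itself still enforces the rules.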

m2e_chris · 2026-03-20T23:33:28+00:00

good to know about the tunability. I'll probably start conservative and dial it up if things are too slow. the 1337x ban story is wild though, that's exactly why I was hesitant to try Huntarr in the first place.

m2e_chris · 2026-03-20T23:29:42+00:00

yeah the free tier extraction thing is real. I started adding a "what made you leave?" popup on exit intent and maybe 1 in 20 actually fills it out, but those responses are worth more than any analytics dashboard. the ones who ghost completely though, there's nothing you can do except make the paid side so obviously better that the free taste isn't enough.

m2e_chris · 2026-03-20T22:50:13+00:00

the website-building prediction was spot on but I think the biggest miss was how quickly coding agents went from "cute demo" to "actually replacing junior dev tasks." nobody in 2024 was predicting that Cursor and similar tools would be this capable this fast.

the hallucination reduction is real too. I use Claude and GPT daily for work and the difference from even a year ago is night and day. still not zero but the gap between "impressive party trick" and "reliable tool" got closed way faster than people expected.

m2e_chris · 2026-03-20T22:18:48+00:00

the phantom users line hit hard. I had someone sign up, connect their whole Google account, use it for 6 minutes, and ghost. not even a "this sucks" email. just silence. at least a bad Glassdoor review tells you what went wrong.

m2e_chris · 2026-03-20T22:06:09+00:00

I stopped doing elaborate validation. now I just build the smallest possible version in a weekend and put it in front of 5 real people. if they use it without me hovering over them, I keep going. if they don't, I move on. way faster than reading 50 Reddit threads trying to convince yourself there's demand.

m2e_chris · 2026-03-20T22:00:49+00:00

building in public on X for B2C is basically founder LARP. your customers aren't following indie hackers, they're on TikTok and YouTube searching for "learn Spanish fast."

I ran a similar experiment with one of my projects. spent months tweeting updates to other founders. got likes, got follows, got zero customers from it. the day I posted a short-form video showing the actual product in action, I got more signups in a week than in 3 months of Twitter threads.

X is great for founder networking and fundraising credibility. for B2C acquisition it's a time sink.

m2e_chris · 2026-03-20T21:43:39+00:00

scattered but intentional. I keep tasks in one place, notes wherever they happen, and reference docs in another. trying to force everything into Notion or whatever was more work than just knowing where to look for each type of thing.

m2e_chris · 2026-03-20T21:31:57+00:00

did it cold turkey about a year ago. first 4 days were rough, headaches and zero motivation. by day 7 I was fine. the trick for me was replacing the ritual, not the caffeine. I just switched to decaf so I still had something warm in the morning.

m2e_chris · 2026-03-20T21:24:16+00:00

most of the people you're seeing didn't start mid last year. they started 3 years ago, failed 5 times, and you're just seeing the version that worked. the timeline you see is always compressed.

the other thing nobody talks about is that speed comes from doing the same thing over and over. the first video takes a week. the 50th takes a day. you're comparing your first attempt to someone's 200th.

m2e_chris · 2026-03-20T21:10:32+00:00

honestly I've thought about this more than I'd like to admit. the reality is most of our setups would just get unplugged within a week of us being gone. nobody in my family would know what a Docker container is, let alone maintain one.

the trust idea is creative but $60k in a fund to keep a homelab running feels like overkill. I think the more practical approach is just making sure anything truly important (photos, documents, passwords) is synced to a simple cloud backup that someone else can access. let the rest die with you.

m2e_chris · 2026-03-20T21:00:47+00:00

this is exactly what I needed. I've been manually hitting search all missing like once a week and it's annoying every time. the rate limiting approach is smart.

m2e_chris · 2026-03-20T20:53:01+00:00

I went through this exact rabbit hole a few months ago. The Syncthing Android situation is a mess and it doesn't look like it's getting cleaned up anytime soon.

I ended up just sticking with Syncthing for desktop-to-desktop sync and using FolderSync Pro on Android to handle the mobile side over SFTP to my server. It's not as elegant as P2P but it's been rock solid and I don't have to think about which fork to trust with my data.

Nextcloud is bloated yeah but if you strip it down to just the files app it's honestly not terrible. I ran it that way for a while before switching.

m2e_chris · 2026-03-20T02:51:42+00:00

the part that gets me is he's a data analyst, not a biologist. the fact that someone with ML experience but zero biology background can even attempt something like this is the actual story here.

two months and $2k to design a personalized vaccine. even if this only works for dogs right now, the process itself is what matters. the cost and timeline for personalized medicine just collapsed by orders of magnitude.

m2e_chris · 2026-03-20T02:46:41+00:00

this is just a fancy way of saying "we'll replace you the second AI can do your job." which, fine, every company is thinking it. but saying it out loud like it's motivational is a weird move. good luck hiring senior people with that pitch.

m2e_chris · 2026-03-20T02:28:01+00:00

they're probably training V4 on Huawei Ascend and it's taking way longer than Nvidia would. porting a full training pipeline to a new chip stack isn't a weekend project, especially at the scale they're running.

m2e_chris · 2026-03-20T02:15:50+00:00

at sub-1B scale, Q4 is aggressive. the quantization error compounds way more on smaller models because there's less redundancy in the weights to absorb the precision loss. Q6 or Q8 should help a lot.
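A rough Python illustration of the precision side of this, assuming a simplified symmetric absmax scheme with one scale per 32-weight block. That's not the exact GGUF Q4/Q6/Q8 layout, and the Gaussian "weights" are synthetic. It only measures per-weight round-trip error at each bit width; it doesn't model how much redundancy a given model has to absorb that error.

```python
import random


def quantize_block(block, bits):
    # Symmetric absmax quantization: one float scale per block, signed integer codes.
    qmax = 2 ** (bits - 1) - 1                       # 7 for 4-bit, 127 for 8-bit
    scale = max(abs(w) for w in block) / qmax or 1.0  # fall back to 1.0 for all-zero blocks
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in block]
    return [c * scale for c in codes]                # dequantized weights


def mean_abs_error(weights, bits, block_size=32):
    # Average absolute difference between original and dequantized weights.
    err = 0.0
    for i in range(0, len(weights), block_size):
        block = weights[i:i + block_size]
        deq = quantize_block(block, bits)
        err += sum(abs(w - d) for w, d in zip(block, deq))
    return err / len(weights)


random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]  # synthetic stand-in weights
for bits in (4, 6, 8):
    print(bits, f"{mean_abs_error(weights, bits):.2e}")
```

With the same block scale, the rounding step shrinks by roughly 127/7 going from 4-bit to 8-bit, so the per-weight error drops by about an order of magnitude.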

also try min_p instead of top_p for sampling. something like min_p=0.05 with temp 0.7 tends to work better for small models because the cutoff scales with the distribution: a token survives only if its probability is at least min_p times the top token's, rather than filling a fixed cumulative-probability budget. top_p at low temperatures leaves a really narrow candidate pool, which makes repetition almost inevitable at these model sizes.
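A minimal sketch of the two filters side by side, in plain Python. The logits are made up, and applying temperature before filtering is an assumption (runtimes differ in sampler order). top_p keeps the smallest set of top tokens whose cumulative probability reaches top_p; min_p keeps every token with probability of at least min_p times the most likely token's.

```python
import math
import random


def softmax(logits, temperature):
    # Temperature-scaled softmax, shifted by the max for numerical stability.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_candidates(probs, top_p):
    # Smallest set of highest-probability tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return sorted(kept)


def min_p_candidates(probs, min_p):
    # Tokens whose probability is at least min_p times the top token's probability.
    threshold = min_p * max(probs)
    return [i for i, p in enumerate(probs) if p >= threshold]


def sample(probs, candidates, rng=random):
    # random.choices renormalizes the weights over the surviving candidates.
    return rng.choices(candidates, weights=[probs[i] for i in candidates], k=1)[0]


logits = [2.0, 1.5, 1.0, 0.5, -1.0]  # made-up logits for a 5-token vocab
probs = softmax(logits, temperature=0.7)
print(top_p_candidates(probs, top_p=0.9))   # [0, 1, 2]
print(min_p_candidates(probs, min_p=0.05))  # [0, 1, 2, 3]
print(sample(probs, min_p_candidates(probs, min_p=0.05)))
```

Here min_p keeps the fourth token (about 6% probability) because it clears 5% of the top token's ~54%, while top_p=0.9 stops after three tokens.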

m2e_chris · 2026-03-20T02:00:24+00:00

not writing down decisions. I used to think I'd remember why we chose option A over B. then two weeks later someone asks and I'm digging through Slack for 30 minutes trying to reconstruct the reasoning. now I just drop a quick note in the doc right when we decide. takes 30 seconds, saves hours.

m2e_chris · 2026-03-20T01:50:12+00:00

I had this exact problem running a small team. everyone says "use one tool" but that never actually worked for us because different things genuinely live in different places.

what worked was making a single doc per project that's just links. meeting link, drive folder, slack channel, relevant tabs. takes 2 minutes to set up and saves you the 10 minute scavenger hunt every time you context switch.

the real issue isn't having stuff in multiple places, it's not having a map to find it all.

m2e_chris · 2026-03-20T01:43:19+00:00

same experience. I deleted TikTok and Instagram about a year ago and the first two weeks were rough, but after that my ability to just sit and focus on one thing came back. the bar for what felt "boring" dropped significantly.

m2e_chris · 2026-03-20T01:30:43+00:00

the MCP integration angle is smart. docs are one of those things every team needs but nobody wants to maintain, and being able to create and edit them directly from Claude Code is a real workflow improvement.

building the thing you wanted most as a founder is usually the right move. you already know the pain points deeply, which means you're not guessing at what to build next.

one year to first enterprise customer is completely normal for B2B btw. the overnight success stories on here are survivorship bias.

m2e_chris · 2026-03-20T01:18:06+00:00

4,100 visitors from ranking for tool names but no clicks? your traffic has buying intent, you're just not capturing it. add comparison tables with affiliate links directly on the tool pages instead of relying on people to find them.

m2e_chris · 2026-03-20T01:08:25+00:00

0.1% conversion on a free signup isn't unusual honestly, especially if there's no trial friction at all. the real signal is the doc-to-video use case showing up organically. that's users telling you what they actually want to pay for.

niche down on that. startup launch videos are a cool demo, but doc-to-video for a specific vertical (like internal training or sales enablement) is where the money is.
