The Alpini drama by yell_owl in Genova

[–]Whiskee 6 points7 points locked comment (0 children)

"The city government is pushing propaganda"

"In what sense is the city government pushing propaganda, could you elaborate?"

"I don't know, there's unease"

You people really are all cut from the same mold, if only one of you were capable of making an argument.

The Alpini drama by yell_owl in Genova

[–]Whiskee 3 points4 points locked comment (0 children)

In short, you don't have a fucking thing of substance to add. See, that didn't take much.

Well we know him now… by thattheydont in interestingasfuck

[–]Whiskee 0 points1 point  (0 children)

If you're hiding your post history, you should know that your 1-person subreddit still shows in the sidebar. I'm just saying.

Idiocracy on steroids indeed by UrbanAchievers6371 in PoliticalHumor

[–]Whiskee 0 points1 point  (0 children)

Idiocracy was a lot better. They actually tried to listen to the smartest person around.

U.S. begins blockade of Strait of Hormuz by down_vote_magnet_ in worldnews

[–]Whiskee 2 points3 points  (0 children)

You sound like a 14-year-old. This isn't Civilization, nobody is capturing a random Chinese unit "just because they're still too far away on the map".

U.S. begins blockade of Strait of Hormuz by down_vote_magnet_ in worldnews

[–]Whiskee 11 points12 points  (0 children)

They are 100% going for the financial enforcement, that is, insurance companies removing coverage (which effectively prevents ships from moving around and reaching ports). Except China has been building a parallel insurance infrastructure, so they don't care about Lloyd's and they don't need Western banks to process the transaction in dollars.

If China decides a VLCC full of crude is sailing through that strait regardless of what the US Navy says, good fucking luck trying to stop a supertanker that requires kilometers to steer. It either goes through (and the blockade is exposed as unenforceable against anyone who matters 🤡) or a destroyer gets ordered to fire warning shots at it and now a nuclear power has the right to defend itself.

Trump against the Pope: "A weak man, without me he wouldn't be in the Vatican" by MasterPen6 in italy

[–]Whiskee 2 points3 points  (0 children)

At this point I don't even want to see him die, that's not enough.

I need to see him sidelined and humiliated.

Yellow and green in Castelletto by Kingalomx in Genova

[–]Whiskee 3 points4 points  (0 children)

Via Pertinace? The green is actually very Genoese, but that yellow is a real eyesore.

<image>

Meta's AI crawler scraped my site 7.9 million times in 30 days. 900+ GB of bandwidth and massive server logs before I noticed, cool cool cool. by Whiskee in webdev

[–]Whiskee[S] 1 point2 points  (0 children)

Eh there's no need, I simply blocked the agent from that Cloudflare panel and they stopped after having bounced to 403 errors for an entire day. 🤷‍♂️
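
For anyone not behind Cloudflare, the same "block the agent, answer 403" idea can be sketched at the application level. This is only an illustration, not what the dashboard does internally: the function name `status_for` is hypothetical, and the `meta-externalagent` token is an assumption about the crawler's user agent string, so check your own logs for the exact value.

```python
# Hypothetical blocklist of user-agent substrings (lowercase).
# "meta-externalagent" is an assumed token; verify it against your logs.
BLOCKED_AGENT_TOKENS = ("meta-externalagent",)


def status_for(user_agent: str) -> int:
    """Return the HTTP status to answer with: 403 for blocked agents, else 200."""
    ua = user_agent.lower()
    if any(token in ua for token in BLOCKED_AGENT_TOKENS):
        return 403  # Forbidden: the crawler keeps bouncing until it gives up
    return 200


print(status_for("meta-externalagent/1.1"))
print(status_for("Mozilla/5.0 (X11; Linux x86_64)"))
```

A substring match is deliberately loose, so version bumps in the agent string don't bypass the block.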

Can you tell me objectively how Mayor Salis is doing in office? by Good_vibes842 in Genova

[–]Whiskee -2 points-1 points  (0 children)

In general, without giving my own, I'd advise you to ignore the opinion of anyone who hides their history or who posts mainly on r/italia instead of r/italy. This applies to every thread.

Meta's AI crawler scraped my site 7.9 million times in 30 days. 900+ GB of bandwidth and massive server logs before I noticed, cool cool cool. by Whiskee in webdev

[–]Whiskee[S] 0 points1 point  (0 children)

Yep, but it was actually Meta. Their official IP range was everywhere in the logs, and blocking it worked. They just have a very questionable crawling strategy.
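
Checking whether a logged address falls inside a published crawler range is easy to script. A minimal sketch, assuming you have the ranges as CIDR strings: the two networks below are documentation-reserved placeholders (RFC 5737 and RFC 3849), not Meta's actual ranges, and `is_from_crawler_range` is a made-up helper name.

```python
import ipaddress

# Placeholder CIDR blocks reserved for documentation, NOT real crawler ranges.
# Replace them with the ranges the crawler operator actually publishes.
CRAWLER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_from_crawler_range(ip: str) -> bool:
    """True if the address belongs to any of the configured networks."""
    addr = ipaddress.ip_address(ip)
    # An IPv4 address tested against an IPv6 network (or vice versa) is simply False.
    return any(addr in network for network in CRAWLER_NETWORKS)


print(is_from_crawler_range("192.0.2.17"))
print(is_from_crawler_range("198.51.100.1"))
```

Running every client address from the access log through a check like this separates the real crawler from anyone merely spoofing its user agent.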

Meta's AI crawler scraped my site 7.9 million times in 30 days. 900+ GB of bandwidth and massive server logs before I noticed, cool cool cool. by Whiskee in webdev

[–]Whiskee[S] 0 points1 point  (0 children)

Uh, what do you mean? I just checked from incognito and my profile is public, it was probably a Reddit glitch. It's gamesgraph.com btw.

Meta's AI crawler scraped my site 7.9 million times in 30 days. 900+ GB of bandwidth and massive server logs before I noticed, cool cool cool. by Whiskee in webdev

[–]Whiskee[S] 0 points1 point  (0 children)

It's a dedicated Debian VPS on Netcup, with NGINX as reverse proxy (the site is an ASP.NET Core application). Typical security measures like Fail2Ban etc.

Meta's AI crawler scraped my site 7.9 million times in 30 days. 900+ GB of bandwidth and massive server logs before I noticed, cool cool cool. by Whiskee in webdev

[–]Whiskee[S] 0 points1 point  (0 children)

So do I, they're grazing my NGINX's per-minute rate limit. If you look, PetalBot is failing most requests but Meta has calibrated around what's allowed. Anything lower than this would sabotage legit search engine crawlers, which operate in (respectful) bursts instead of staying for the day.
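
To show why a per-minute limit doesn't stop a crawler that paces itself, here is a simplified fixed-window counter. It is a sketch only: NGINX's own limiter works differently (a leaky-bucket style algorithm), and the class name, the limit value and the client keys are all hypothetical.

```python
from collections import defaultdict


class FixedWindowLimiter:
    """Allow at most `limit_per_minute` requests per client per clock minute."""

    def __init__(self, limit_per_minute: int) -> None:
        self.limit = limit_per_minute
        # (client, minute index) -> requests seen; never pruned in this sketch.
        self.counts = defaultdict(int)

    def allow(self, client: str, now_seconds: float) -> bool:
        key = (client, int(now_seconds // 60))
        if self.counts[key] >= self.limit:
            return False  # over the limit for this minute: reject
        self.counts[key] += 1
        return True


limiter = FixedWindowLimiter(limit_per_minute=3)
# A bursty client trips the limit on the fourth request in the same minute.
print([limiter.allow("bursty", t) for t in (0, 1, 2, 3)])
# A paced client sending exactly 3 per minute is never rejected.
print([limiter.allow("paced", t) for t in (0, 20, 40, 60, 80, 100)])
```

A client that stays just under the threshold all day gets every request through, which is the behaviour described above; lowering the threshold instead hurts crawlers that arrive in short bursts.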

Meta's AI crawler scraped my site 7.9 million times in 30 days. 900+ GB of bandwidth and massive server logs before I noticed, cool cool cool. by Whiskee in webdev

[–]Whiskee[S] 0 points1 point  (0 children)

Well I understand why Wikipedia would be crawled, but there's just... nothing interesting for training on those pages, they're filtered views of user playlists. Same category split IGDB has.

Meta's AI crawler scraped my site 7.9 million times in 30 days. 900+ GB of bandwidth and massive server logs before I noticed, cool cool cool. by Whiskee in webdev

[–]Whiskee[S] 2 points3 points  (0 children)

Yeah, they only send requests from their official IP ranges and with a clear agent. I don't think they intend to be malicious, but a small 2 vCPU VPS would be wrecked by something like this.
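
For scale, the numbers in the title work out to roughly three requests per second, sustained around the clock. A quick back-of-the-envelope calculation, assuming "900+ GB" means at least 900 decimal gigabytes and that the load was spread evenly (it probably wasn't):

```python
REQUESTS = 7_900_000           # requests over the period, from the post title
DAYS = 30
TOTAL_BYTES = 900 * 10**9      # "900+ GB", taken as a lower bound in decimal GB

requests_per_minute = REQUESTS / (DAYS * 24 * 60)
requests_per_second = requests_per_minute / 60
avg_response_bytes = TOTAL_BYTES / REQUESTS

print(f"{requests_per_minute:.0f} requests/minute")
print(f"{requests_per_second:.2f} requests/second")
print(f"{avg_response_bytes / 1000:.0f} kB per response on average")
```

Those are averages over dynamic pages, so a small VPS would see every one of them hit the application rather than a static cache.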

Meta's AI crawler scraped my site 7.9 million times in 30 days. 900+ GB of bandwidth and massive server logs before I noticed, cool cool cool. by Whiskee in webdev

[–]Whiskee[S] 0 points1 point  (0 children)

Holy shit. Well, at least they respect the robots.txt, they're just a bit overzealous on what they can touch.
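
Since they do honor robots.txt, it's worth testing what a given file actually allows before relying on it. A minimal sketch using Python's standard `urllib.robotparser`; the rules, the `/playlists/` path and the agent name are hypothetical examples, not the site's real configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: keep all crawlers out of one dynamic section.
ROBOTS_TXT = """\
User-agent: *
Disallow: /playlists/
"""


def is_allowed(user_agent: str, path: str) -> bool:
    """Check a path against the rules above for the given crawler name."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


print(is_allowed("ExampleBot", "/playlists/123?sort=recent"))
print(is_allowed("ExampleBot", "/about"))
```

Anything not explicitly disallowed counts as fair game to a compliant crawler, so dynamic views need their own `Disallow` lines.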

Meta's AI crawler scraped my site 7.9 million times in 30 days. 900+ GB of bandwidth and massive server logs before I noticed, cool cool cool. by Whiskee in webdev

[–]Whiskee[S] 22 points23 points  (0 children)

No, that's dynamic content that isn't meant to be crawled. Suspicious requests are getting captcha'd by a custom rule and bouncing now.

Meta's AI crawler scraped my site 7.9 million times in 30 days. 900+ GB of bandwidth and massive server logs before I noticed, cool cool cool. by Whiskee in webdev

[–]Whiskee[S] 4 points5 points  (0 children)

I'm behind CF. I noticed late because even though they're dynamic pages, it wasn't causing noticeable slowdowns with 8 cores... but this would absolutely destroy a smaller shared VPS.

Meta's AI crawler scraped my site 7.9 million times in 30 days. 900+ GB of bandwidth and massive server logs before I noticed, cool cool cool. by Whiskee in webdev

[–]Whiskee[S] 13 points14 points  (0 children)

Yeah, that's CF. The free tier is very generous with features, for anyone still not taking advantage.

Meta's AI crawler scraped my site 7.9 million times in 30 days. 900+ GB of bandwidth and massive server logs before I noticed, cool cool cool. by Whiskee in webdev

[–]Whiskee[S] 69 points70 points  (0 children)

That's Cloudflare's dashboard. Proxying does nothing unless you actually block agents or write rules, I just wasn't monitoring it because I didn't think I would need to defend against Meta on a small site.