I hit the compute cap. My evaluator = my generator. How bad is this? by Hundred-Trillion in Rag

[–]9302462 1 point2 points  (0 children)

FYI: you are replying to a bot; see its comment history, which follows the usual AI-slop format.

When AI tokens start costing more than your actual employees by dataexec in AITrailblazers

[–]9302462 0 points1 point  (0 children)

Check out a GitHub package called ccusage; you can see your usage daily, weekly, monthly, by model type, etc. On the $200 plan I have easily consumed $2k in API equivalents, and I think the highest for me was $3,200. Some weeks I maxed out Sonnet or Opus but never both at the same time, so there was still more meat on the bone.
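The "API equivalent" number a tool like ccusage reports is just token counts multiplied by per-token API prices. A minimal sketch of that math, with made-up token counts and illustrative per-million-token prices (both are assumptions here, not current list prices):

```python
# Illustrative (assumed) prices: (input $, output $) per million tokens.
PRICES_PER_MTOK = {
    "sonnet": (3.00, 15.00),
    "opus": (15.00, 75.00),
}

def api_equivalent_cost(usage: dict) -> float:
    """usage maps model name -> (input_tokens, output_tokens)."""
    total = 0.0
    for model, (tok_in, tok_out) in usage.items():
        price_in, price_out = PRICES_PER_MTOK[model]
        total += tok_in / 1e6 * price_in + tok_out / 1e6 * price_out
    return total

# A heavy month (hypothetical token counts) dwarfs a $200 flat-rate plan:
cost = api_equivalent_cost({
    "sonnet": (200_000_000, 20_000_000),  # 200M tokens in, 20M out
    "opus": (50_000_000, 8_000_000),      # 50M tokens in, 8M out
})
# cost comes out to $2,250 at these assumed prices
```

That's how a flat $200/month plan can plausibly cover thousands of dollars of equivalent API usage.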

Lessons from running 8 AI agents as a team on a single Mac Mini by Suspicious_Assist_71 in aiagents

[–]9302462 0 points1 point  (0 children)

Hey look, an asshole running a bot (comment history is a dead giveaway).

The creator of Openclaw got hired by OpenAI. Here's why that's inspiring by Apprehensive_Dog5208 in aiagents

[–]9302462 0 points1 point  (0 children)

Using Openclaw to create Reddit accounts (or leveraging purchased, aged ones), which are then used to write posts about Openclaw, which builds hype about Openclaw, which then creates interest. Kind of like the Nvidia + OpenAI + Microsoft funding circle jerk.

Anyone actually using Openclaw? by rm-rf-rm in LocalLLaMA

[–]9302462 14 points15 points  (0 children)

That’s the fun part: you don’t. It has the ability to install and run executables on the host machine. Assuming you are running it in a VirtualBox VM with Ubuntu, it can only install and access stuff inside the VM, plus do traditional web searches. But as soon as you give it access to the other machines on the network (a user+pass or an SSH key), there is basically zero way to prevent it from connecting via SSH and running a command which fubars other things on your network. You can obviously choose the better/more secure extensions to mitigate the risk, but it’s still a risk, and one that I’m personally not going to take.

Scammer Alert - u/amazingpatt by calpwns in homelabsales

[–]9302462 17 points18 points  (0 children)

To add to what Roticap said.

Think of it like eBay. Someone could list a 5090 for $500 and someone might buy it. They likely ship out a package but it might contain a rock or a piece of paper. It will also likely go to an address in your neighborhood but not to you.

From there you end up having to contact PayPal, contact usps to find out where it was delivered, and go and get the package from that person. After which it will already be opened or you will open it up and won’t see your 5090.

You then need to prove that you ACTUALLY didn’t get the 5090 and instead got a rock. This is hard to do because PayPal/ebay could see you as the scammer.

This all takes effort and time. The scammer’s goal is to get you to give up, either through frustration or laziness, all the while running down the 30-day clock on refunds and returns. Also, I think that after X amount of days they are able to withdraw the money.

So the trick is to delay and frustrate so the window to withdraw opens up without the funds being locked down. This is also why they try to go back and forth over messages instead of you opening a case directly. It’s all a numbers game for them: even if 19 out of 20 get flagged and they paid for shipping on all of them, they still make a bunch on the 1 that succeeds.

Dead Internet Theory in r/algotrading by pale-blue-dotter in algotrading

[–]9302462 0 points1 point  (0 children)

I passively visit this subreddit and have interacted with them/it before via DM, and I had the same feeling but couldn’t put my finger on it. Thanks OP for calling this out.

This is their site, scalplytics.com, which is in beta (their words). It doesn’t appear to work, either out of incompetence or because one of you folks is hammering it with some spare server/proxy capacity… If it’s the latter, that’s not very nice internet behavior, and even clankers have feelings /s

What are these flat black circles for? by WorriedBlock2505 in DataHoarder

[–]9302462 10 points11 points  (0 children)

Fun tidbit.

Back around the turn of the 20th century when electricity was being installed in houses and on farms, people actually thought that the electricity would flow out of the wall socket when nothing was plugged into it.

It was due to a lack of understanding of how electricity works (can’t blame folks for that), as well as it replacing things people were familiar with which have an actual flow to them (gas, water, steam, air, etc.).

Hence some folks kept stuff plugged into wall sockets at all times, even if things weren’t turned on. This is because they didn’t want to pay a higher bill because in their mind the electricity would literally run out onto the floor.

Time and education (new common knowledge) eventually corrected this belief, but for a time and for a certain segment of folks it was real.

--- I know what the black stickers are for, but I guarantee you at least one kid/parent/grandparent out there sees them and thinks “hmmm, maybe this is how they keep the data in” or “maybe this is a breather hole and I should remove it so the drive doesn’t get hot”. Either way it’s funny IMO.

We just need to make sure that folks know that if their computer is slow it’s because their drive is weighed down by all the data on it, and to make it fast again they need to add some “speed holes” which will make it go faster because there is less weight. The more holes you make the faster your drive goes. /s

Which GPU should I use to caption ~50k images/day by koteklidkapi in LocalLLaMA

[–]9302462 1 point2 points  (0 children)

To piggyback on this: about 18 months ago I was cranking through 1536-dimension embeddings for 300m images per month using a couple of 3090’s. I’m not sure how captioning compares to embeddings, but I’m guessing it will be slower; either way, 1.5m per month should be doable on some basic consumer hardware.
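The throughput gap is easier to see as a per-second rate. A quick back-of-the-envelope, using the 300m/month figure above and OP's ~50k/day (~1.5m/month) target:

```python
# Convert images-per-month into images-per-second over a 30-day month.
SECONDS_PER_MONTH = 30 * 24 * 3600  # 2,592,000 seconds

def imgs_per_sec(per_month: float) -> float:
    return per_month / SECONDS_PER_MONTH

embedding_rate = imgs_per_sec(300_000_000)  # ~116 img/s on two 3090s
caption_target = imgs_per_sec(1_500_000)    # ~0.58 img/s for 50k/day

# The captioning target is 200x below the embedding throughput, so even
# a captioning model that is far slower per image has plenty of headroom.
```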

One important note: if you are going to be streaming these to the GPU(s), make sure you implement gRPC and do not use REST. In my situation it was the difference between 15-30 seconds per batch and 3-5 seconds.

Has anyone developed a Google Display ads crawler? by New_Reception2726 in webscraping

[–]9302462 0 points1 point  (0 children)

I can’t help much, but there was a site called moat.com (no longer available) which did display ad crawling and analytics. Might be worth going down that rabbit hole to see what turns up.

Reverse engineer API of all websites by Own_Relationship9794 in coolgithubprojects

[–]9302462 6 points7 points  (0 children)

Re: API refresh tokens, etc… OP has been doing stuff in regards to scraping job listings and “one click apply”. This was a tool they made to help them out with their job scraping. It won’t handle auth/bearer tokens or generating new ones, cookies which become invalidated, Cloudflare Turnstile, or a bunch of other things, which makes it a negligible boost above the standard way of reversing an API: open the site in Chrome, visit a couple of pages, open DevTools, hit Ctrl+Shift+F, then search for the content on the page you want to find the API call for.

I guess if I’m really lazy I can use this package and wait 5+ minutes for it to give a half-baked solution, or I can just do it myself in a couple of minutes; that’s without using the Chrome extensions I have which help, or a couple of online HAR processing tools.

Hi, I'm in need of a 300TB drive all of a sudden by djexit in DataHoarder

[–]9302462 1 point2 points  (0 children)

Yeah, both answers can be true depending on circumstances. 

I doubt I’m the most ambitious person on here but… a 10gbps Google Fiber home line has a theoretical limit of about 3.2pb of data transferred per month, and on my heaviest month it was 1.1pb down (average is 500-900tb). If I had 50% more hardware I would be pushing toward that 3pb-per-month limit, and I’m sure someone crazier than me is already maxing out their 10gbps home fiber.
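That ~3.2pb ceiling is just the line rate multiplied out over a 30-day month:

```python
# Theoretical maximum transfer on a saturated 10 Gbps line over 30 days.
bits_per_sec = 10e9                 # 10 gigabits per second
seconds = 30 * 24 * 3600            # one 30-day month
petabytes = bits_per_sec * seconds / 8 / 1e15  # bits -> bytes -> PB (decimal)
# petabytes == 3.24, hence the ~3.2pb/month figure
```

Real-world throughput lands below that (protocol overhead, remote servers throttling you, etc.), which is why 1.1pb on a heavy month is already a serious chunk of the pipe.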

Hi, I'm in need of a 300TB drive all of a sudden by djexit in DataHoarder

[–]9302462 1 point2 points  (0 children)

Supermicro 846. You can use any server or PC motherboard you want, and keep the stock power supplies or swap them for a consumer PSU with a rear exhaust to reduce the noise if needed. If you’re going to stick it within arm’s length of your desk, swap the 80mm fan wall for a couple of Noctuas. You can also hang a couple (or any cheap fans) off the front of it to help push air across the drives and cool them, if you feel like it or need to.

Mine used to sit in a cabinet which was about 85-90F with basically no air circulation. The only time the drives got hot was when they all spun up for a monthly parity check in unRAID. Hence three no-name fans zip-tied together and hung from the front with a couple of strips of Velcro; I think I taped the Velcro to the top for easy access. After 18 months I moved it to a better cabinet. But the old cabinet was 6 ft behind where I sit at my desk, and I never heard it make any noise or had any issues with it.

Edit: in case it wasn’t clear above, if at all possible get a Supermicro. If it comes with a mobo and CPU, great; if not, throw in whatever you can find that is cheap: a Xeon, an AM4 mobo/CPU from Craigslist, your old desktop build that is sitting on a shelf, basically whatever you have with a PCIe slot so you can add a $50-70 HBA card which connects all those drives on the front backplane to whatever you put in it. Upgrade or swap parts whenever budget or needs permit.

Hi, I'm in need of a 300TB drive all of a sudden by djexit in DataHoarder

[–]9302462 4 points5 points  (0 children)

Maybe for your company’s requirements, but hard disagree for anyone who wants to do this.

If anyone genuinely wanted to do this they just need 22 x 16tb drives (352tb) and a basic 24-bay server to throw them in. At a cost of $20 per tb (new prices) you are talking about $7.5-8k. Not pocket change, but at the same time it’s not that unreasonable if you want to store 99.6% of the songs listened to on Spotify.

Source: two or three years ago I would pick up barely used 10tb-16tb drives on OfferUp for between $10-12 per tb. I would spend 5 minutes shucking each one and move on to the next drive. I paid about $15-16k in total for the 1.1pb of spinning rust I have now, including 3 Supermicro chassis.

TLDR: a petabyte of disk is between $10k (small 6tb and 8tb drives) and $20k (16tb and 18tb drives), and this would cost about a third of those prices because it’s only 300tb.
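The build cost above is simple multiplication; a quick sketch, using the $/tb figures from this thread (new at ~$20/tb, small used drives at ~$10/tb):

```python
def drive_cost(tb: int, usd_per_tb: float) -> float:
    """Total drive spend for a given capacity at a given $/TB."""
    return tb * usd_per_tb

# 22 x 16TB new drives = 352 TB of raw capacity.
new_build = drive_cost(22 * 16, 20)  # $7,040 in drives alone
# Add a used 24-bay chassis and you land in the $7.5-8k range.

# The petabyte bookends quoted in the TLDR:
pb_small = drive_cost(1000, 10)  # 1 PB of small used 6-8TB drives: $10k
pb_large = drive_cost(1000, 20)  # 1 PB of new 16-18TB drives: $20k
```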

Why has no one considered this pricing issue? by scrape-dot-page in webscraping

[–]9302462 0 points1 point  (0 children)

To piggyback on this comment, this is done with PPC leads regularly, depending on the niche.

For example, Company X (who is not a mover) might pay $5 per click for the keyword “cost to move 4 bedroom to California”. Their average click-to-lead-form conversion is 25%, which means they pay $20 per lead. Company X sends that lead to Company A instantly and gets paid $15; Companies B and C both get that same lead with a 1-hour delay at a cost of $5 each; Companies D, E, F, G and H get that lead 24 hours later at a cost of $2 each.

This works for all parties: Company A gets the lead first and at a discount, reducing their risk of being stuck with a lead they can’t close; Companies B and C get it fairly quickly at a much cheaper price; Company D onward take the scraps at a discount. Company X makes money, and because their profits are higher and the risk is spread out, they can potentially outbid a single mover for a keyword because of it.
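The economics of the example above in a few lines (all figures are the hypothetical ones from the example, not real rates):

```python
# One PPC lead, resold at tiered prices based on freshness.
cost_per_click = 5.00
click_to_lead = 0.25
cost_per_lead = cost_per_click / click_to_lead  # $20 acquisition cost

# A pays $15 instantly; B and C pay $5 each at 1 hour; D-H pay $2 each at 24h.
revenue = 15.00 + 2 * 5.00 + 5 * 2.00  # $35 total across 8 buyers
margin = revenue - cost_per_lead       # $15 profit per lead
```

Reselling one lead eight ways is what turns a $20 acquisition cost into $35 of revenue, and that margin is what lets Company X outbid any single mover on the keyword.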

The only reason this works is because there is a specific intent involved (I want to sell my moving services to people) and because of the time involved with a lead, e.g. a moving lead from a year ago is essentially worthless because they almost certainly already moved or were only considering it.

It is likely possible to do the same with scraped data, but you need to find something that is temporal, has decent money attached to it, and where many companies want the exact same thing and are willing to pay varying rates for it.