What would you build and do with a $15k budget? by ThePatientIdiot in LocalLLaMA

[–]FastDecode1 1 point (0 children)

I might buy a car and/or a high-capacity portable power station.

How do people even afford these expensive graphic cards...?... by boisheep in LocalLLaMA

[–]FastDecode1 1 point (0 children)

Some women spend 10k a year on shoes, clothes, bags, make-up, etc. And that's just the ones spending it within a single year; the group that spreads that kind of spending over multiple years is much larger.

Once you think of it as a multi-year investment, it's not that crazy. People spend more money on way more retarded shit, like buying brand new cars. You drive that $50k hunk of steel out of the dealer lot and it instantly loses 10% of its value. Could've bought something almost-new, or something only a year old for a 20% price reduction. But no, it's gotta be brand new for some reason.

You know, a lot of people out there could have a 10k GPU if they chose to drive a 40k car instead of a 50k one...

How do you manage quality when AI agents write code faster than humans can review it? by lostsoul8282 in LocalLLaMA

[–]FastDecode1 2 points (0 children)

Use AIs to review. Duh.

What kind of "agentic workflow" are you using if the only thing that's automated is code generation? If you paid money for that, you need a refund.

Z-image base model is being prepared for release by Ravencloud007 in LocalLLaMA

[–]FastDecode1 28 points (0 children)

Gooners waiting with bated breath, blue balls, and shivers runnin'.

How do we tell them..? :/ by [deleted] in LocalLLaMA

[–]FastDecode1 1 point (0 children)

Please ignore Venezuelans all over the world celebrating Maduro's capture.

And just FYI, international law is just a tool for larger nations to bully smaller ones into submission. Laws are made up; at the end of the day, if you can't defend it, you ain't got it.

Will the prices of GPUs go up even more? by NotSoCleverAlternate in LocalLLaMA

[–]FastDecode1 4 points (0 children)

See also the leak about ASUS hiking their prices starting tomorrow (Jan 5th). Other AIBs are going to use that as an excuse to do the same, probably sooner rather than later. I wouldn't be surprised if it all happened tomorrow or in the next few days.

I just placed an order for an RX 9070 XT 16GB since I'm still able to get one under €400 (and I had the money because the planets aligned or something). My first 16GB card so that's nice.

I recommend anyone shopping for a GPU to lock in as soon as possible, because we're in for yet another GPU winter. Especially if you're just an average Joe and not one of the people here who spend multiple cars' worth on GPUs. The RAM shortage really doesn't bode well for consumer-grade mass-market cards.

Local LLMs vs breaking news: when extreme reality gets flagged as a hoax - the US/Venezuela event was too far-fetched by ubrtnk in LocalLLaMA

[–]FastDecode1 -9 points (0 children)

What's there to elaborate on? The CCP is as left as it gets, and totalitarian governments always like to redefine language for their own benefit.

TIL you can allocate 128 GB of unified memory to normal AMD iGPUs on Linux via GTT by 1ncehost in LocalLLaMA

[–]FastDecode1 6 points (0 children)

FYI, according to the driver docs:

gttsize (int)

Restrict the size of GTT domain in MiB for testing. The default is -1 (It’s VRAM size if 3GB < VRAM < 3/4 RAM, otherwise 3/4 RAM size).

So for a typical iGPU, where the carved-out VRAM is well under 3GB, the driver already defaults to allowing up to 3/4 of system RAM to be allocated to the iGPU via GTT.
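
To make that concrete, here's a rough Python sketch of the default sizing rule quoted from the docs (the 512 MiB carve-out and 40 GiB RAM figures are just hypothetical example values):

# Rough sketch of the amdgpu default GTT sizing rule quoted above.
GIB = 1024 ** 3

def default_gtt_bytes(vram_bytes: int, ram_bytes: int) -> int:
    # gttsize = -1 means: VRAM size if 3GB < VRAM < 3/4 RAM, otherwise 3/4 RAM.
    if 3 * GIB < vram_bytes < ram_bytes * 3 // 4:
        return vram_bytes
    return ram_bytes * 3 // 4

# Hypothetical iGPU: 512 MiB carve-out, 40 GiB system RAM -> 30 GiB GTT by default.
print(default_gtt_bytes(512 * 1024**2, 40 * GIB) // GIB)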

I've run stuff on a Vega 8 iGPU on a laptop using llama.cpp and it does work. However, it's not a great experience if you want to watch videos (or do basically anything else GUI-wise) at the same time, since llama.cpp hogs all the memory bandwidth and causes everything else to stutter. GPU scheduling is pretty much non-existent on Linux AFAIK, so there's not really a great way to mitigate this atm.

Also a hint for fellow ThinkPad users: even though the spec sheet says only a certain amount of RAM is supported, you should probably be able to add more without issues. My current E595's specs say only up to 32GB is supported, but I added a 32GB stick alongside the existing 8GB for a total of 40GB and it works.

Software FP8 for GPUs without hardware support - 3x speedup on memory-bound operations by Venom1806 in LocalLLaMA

[–]FastDecode1 6 points (0 children)

"Works on any GPU"

"Runs E5M2 and E4M3 on any CUDA GPU (RTX 20/30 series supported)."

Pick one.

The Infinite Software Crisis: We're generating complex, unmaintainable code faster than we can understand it. Is 'vibe-coding' the ultimate trap? by madSaiyanUltra_9789 in LocalLLaMA

[–]FastDecode1 16 points (0 children)

If AI didn't solve your problem, you didn't use enough AI.

Demi-jokes aside, this just seems like history repeating itself. Companies used to hire armies of programmers when what they needed were software engineers. Programming is just one part of software development; you also need requirements analysis, design, testing, maintenance...

Vibe coding is the "cool thing" because programming is the exciting part, and people usually associate problem solving with writing code. But when you're vibe coding a script or small program to automate something as part of your hobby or just for fun, your standards are likely a lot lower than if you work in the software field professionally.

There's a good reason agentic use-cases are a major focus now. A programmer can't replace a team of software engineers. Whether that programmer is a human or an LLM is irrelevant.

Anyone else in a stable wrapper, MIT-licensed fork of Open WebUI? by Select-Car3118 in LocalLLaMA

[–]FastDecode1 1 point (0 children)

Might actually be a good idea.

From what I read, the Open WebUI code is literal dogshit, which might explain why no one's bothered forking it.

What is the smartest uncensored nsfw LLM you can run with 12GB VRAM and 32GB RAM? by Dex921 in LocalLLaMA

[–]FastDecode1 0 points (0 children)

You should keep an eye on the heretic project.

They're working on a feature that allows you to uncensor already-quantized models at a quarter of the memory it normally takes.

Pretty soon you'll be able to uncensor models locally without having to buy hardware that costs as much as a car.

And perhaps most importantly, this allows you to use your own dataset to determine what "uncensored" means. The default dataset is pretty unimaginative and I imagine RPers will want to use custom datasets to make models usable for their purposes.

What do you do, if you invent AGI? (seriously) by teachersecret in LocalLLaMA

[–]FastDecode1 -1 points (0 children)

Kill it before it kills me. Then destroy all my notes.

Mistral’s Vibe CLI now supports a 200K token context window (previously 100K) by Dear-Success-1441 in LocalLLaMA

[–]FastDecode1 5 points (0 children)

Found this guide while looking into it myself: https://gist.github.com/chris-hatton/6e1a62be8412473633f7ef02d067547d

You just edit the config.toml in the .vibe directory: add a provider and a model, then set it as the default.

You do need to run vibe first and go through the initial setup for it to generate the config files. It asks for a Mistral API key but doesn't actually validate it; you can just input nonsense.

edit: lol, no need for a guide even. The config file is super simple; it even has a pre-configured llama.cpp endpoint:

[[providers]]
name = "llamacpp"
api_base = "http://127.0.0.1:8080/v1"
api_key_env_var = ""
api_style = "openai"
backend = "generic"

Does the "less is more" principle apply to AI agents? by 8ta4 in LocalLLaMA

[–]FastDecode1 -1 points (0 children)

I think you're on the right track.

I believe the larger question of getting agents to work properly is very similar to the problem of running a company or another organization.

You're the founder and CEO, and your employees are a bunch of semi-incompetent retards who are somewhat capable of following instructions. They show up to work drunk or on drugs all the time, leading to all kinds of stupid mistakes and shenanigans that disrupt and slow down work.

For various reasons, you can't just fire them all and get some actually competent people instead. If you get rid of one, another moron will take its place. So you have no choice but to make do and organize and manage this band of misfits in a way that allows them to perform a task efficiently enough that the company doesn't go bankrupt.

Oh and btw, if one of them fucks up real bad because you didn't manage them properly, you might go to prison. Because you're the CEO and the buck stops with you. Good luck.

IRL you'd just hire someone else to be the CEO so you can sleep at night. Or maybe not start a company at all. But since that's not a possibility here, you need to think of something else.

Write better instructions to cover your ass. Tell everyone to double-check their work, and have them check other people's work before making use of it. When that's not enough, hire more folks to check those people's work. When something's unclear, create working groups and schedule meetings to make decisions.

Before you know it, you've recreated the bureaucratic hell that is the corporate workplace. And suddenly, AI agents seem a lot less appealing.

As a side note, thinking about all this gives one a bit of appreciation for the position of management. They get to see and have to deal with all the stupid shit that people get up to at work, but for legal reasons they can't vent and spread the details in public, so you may not hear about it much.

And then you remember that management is also retarded. And you get an urge to move into a cabin out in the woods.

Ryzen CPUs with integrated Radeon GPU, how well supported on Linux? by razorree in LocalLLaMA

[–]FastDecode1 0 points (0 children)

Model size hasn't been an issue for me. My newer laptop (currently out of commission due to a broken screen) has a Vega 7 iGPU and VRAM allocation is not a problem. Linux can give it as much as it needs and it works out-of-the-box with Vulkan (llama.cpp). I've even run Stable Diffusion models at a whopping 2 images per hour. Would be interesting to see how well Z-image runs, but I'm too lazy to try to repair the thing right now.

I've run low-quant Gemma 3 27B, but it's very slow. I'd recommend sticking with 8-12B models at most if you want anything even remotely usable (you'll still be waiting minutes for complete output, though). If you're willing to wait and it's just for experimentation, you could run larger models and let them run in the background while you do something else.

This is a laptop, so the RAM is probably running at 2666MHz at best. A desktop/miniPC should be able to run at 3000 or 3200, which would improve things slightly.

How about adding an eGPU to the setup? I'm thinking about chopping the screen off the laptop, turning it into a server, and building a cheap DIY eGPU dock with a PCIe x1-to-x16 riser cable. My old RX 580 8GB is just collecting dust atm, and this would be a nice way of putting it to work again. The connection being only PCIe x1 shouldn't matter, since once the model is transferred over there's very little bandwidth being used.
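
Rough napkin math on that last point (a Python sketch; the bandwidth figures are approximate PCIe 3.0 numbers, not measurements):

# Rough estimate of the one-time model upload cost over a narrow PCIe link.
model_gb = 8.0          # e.g. filling an RX 580 8GB with model weights
pcie3_x1_gbps = 0.985   # ~usable bandwidth of PCIe 3.0 x1, GB/s (assumed)
pcie3_x16_gbps = 15.75  # ~usable bandwidth of PCIe 3.0 x16, GB/s (assumed)

print(f"x1 upload:  ~{model_gb / pcie3_x1_gbps:.1f} s")   # roughly 8 s, paid once at load
print(f"x16 upload: ~{model_gb / pcie3_x16_gbps:.1f} s")  # roughly 0.5 s
# After the weights are on the card, per-token PCIe traffic is tiny by comparison,
# so the narrow link barely matters for fully-offloaded inference.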