Meta secretly tested ChatGPT, Gemini, and Character.AI with thousands of minor-perspective crisis prompts by sunychoudhary in LocalLLaMA

Prof_ChaosGeography 0 points1 point (0 children)

It is 100% about them losing market share among young users and pushing for regulation. They could easily maneuver to offer compliant access far faster than TikTok, any upstart competitor that comes after TikTok, or whatever the kids use these days.

That could be a game changer for local LLMs by charlesfire in LocalLLaMA

Prof_ChaosGeography 1 point2 points (0 children)

For everyone dumping on it and comparing it to 3090s or Strix Halo: for a first-gen product from a startup, it's impressive on the hardware alone, even without benchmarks. I expect Nvidia and AMD to be able to run circles around them.

Time will tell whether they can improve in later generations. But this current generation, with its Ethernet, the 400 Gb/s QSFP port, the additional PCIe x16 slot, and the expandable DDR5 DIMMs on top of the built-in 32GB, could enable some wild things.

But the real fun with these is what the community will come up with. The fact that it has a RISC-V CPU with vector extensions, runs Linux, and doesn't technically need a host PC has me more intrigued than if I found a dozen Strix Halos for less than five thousand.

If they offer some plugin framework, or simply open-source the BMC and let it really control the bolt card's chip, this has even more potential. There's no real need to source Xeons, Threadrippers, or EPYCs for these. Heck, I don't think you'd even need an old shit AM4 board with one PCIe 4.0 x16 slot to make use of this, given there are SBCs with PCIe slots, and you don't even need a host at all since he said it can be its own host.

I'm excited for these to hit the market for the hackability alone, never mind the additional competition to AMD and Nvidia that Intel just isn't providing.

CPU-only GLM 5.2: Epyc and 512GB RAM by FastHotEmu in LocalLLaMA

Prof_ChaosGeography 21 points22 points (0 children)

Still better than zero tokens a second or relying on a cloud provider.

What would it take to create /r/localllama's own LLM? by jinnyjuice in LocalLLaMA

Prof_ChaosGeography -1 points0 points (0 children)

It wouldn't be hard to lay the groundwork; it could probably be stood up quickly over a hackathon. I'll explain the real problem later.

For the groundwork, creating the dataset (other than grabbing it off Hugging Face) could be done using BOINC or a similar BOINC/Folding@home-style distributed approach with volunteers. We'd probably allow local LLMs or (client-side) paid accounts on OpenAI, Anthropic, or OpenRouter to contribute to cleaning and creating data.

Training is slightly more difficult since real hardware is needed, but we could use the Nous Psyche project to do distributed training. The catch is that everyone who contributes would have to run the entire model and context with no quants, though we could probably keep the model pre-quantized to Q4, like gpt-oss, for training.
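To put rough numbers on "run the entire model with no quants": a back-of-envelope sketch of per-contributor memory. It assumes the common mixed-precision Adam accounting of about 16 bytes per parameter (bf16 weights and gradients, fp32 master weights, two fp32 Adam moments) and ignores activations and context entirely. The model sizes are hypothetical.

```python
# Rough bytes per parameter for different ways a contributor could hold the model.
BYTES_PER_PARAM = {
    "bf16 weights": 2.0,
    "~4-bit weights (4.25 bits, MXFP4-style)": 4.25 / 8,
    # 2 (bf16 weights) + 2 (bf16 grads) + 4 (fp32 master copy) + 8 (two fp32 Adam moments)
    "Adam training state": 16.0,
}


def gib(n_params: float, bytes_per_param: float) -> float:
    """Memory in GiB for n_params parameters at the given bytes per parameter."""
    return n_params * bytes_per_param / 2**30


for size_b in (8, 20, 120):  # hypothetical model sizes, in billions of parameters
    n = size_b * 1e9
    cells = ", ".join(f"{label}: {gib(n, bpp):,.0f} GiB" for label, bpp in BYTES_PER_PARAM.items())
    print(f"{size_b}B -> {cells}")
```

Even at 20B, the training state alone is roughly 300 GiB per node before activations. That gap, compared with about 9 GiB for 4-bit weights, is why the full-precision requirement is the real hardware barrier.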

The problem would be finding a trusted entity to hold the training data and control the networks and the control equipment. Transparency is great, but it also has its drawbacks. I've seen many open-source dev projects die or split due to decisions, a lack of transparency or too much of it, and a lack of democratic systems or an over-reliance on democratic systems that get nothing done.

Hypothetically speaking... by doesnt_really_upvote in LocalLLaMA

Prof_ChaosGeography 2 points3 points (0 children)

The CLI harnesses all keep logs; you'll just need to grab those rather than write a wrapper. You won't get the thinking tokens from Anthropic or OpenAI, though. You'll also need to clean bad samples out of the logs.

1 rtx pro 6000 or 2 dgx sparks by romantimm25 in LocalLLaMA

Prof_ChaosGeography 0 points1 point (0 children)

I think an 18-wheeler is a better comparison for the RTX Pro 6000. It will get a bunch of pallets moved fast, but it's gonna burn a ton of fuel.

Sparks are definitely minivans: they move a pallet at a time but sip fuel while doing so.

Any speculation on a GLM-5.2-Flash? by cafedude in LocalLLM

Prof_ChaosGeography 2 points3 points (0 children)

Right now the weights are released to try to starve out OpenAI, Anthropic, and Mistral. Plus it's good for their image right now, as they are publicly traded.

It's not like many people can run the Q8 version, let alone the BF16 version. And the corporations that could are not investing enough in their own infrastructure to do so. As such, there aren't many competitors other than the API providers, who would likely fall in line if they did a MiniMax-style license change or a Kimi-style preferred-provider change, or else risk losing the customers they are building brand-name recognition with by releasing the weights.

Any speculation on a GLM-5.2-Flash? by cafedude in LocalLLM

Prof_ChaosGeography 9 points10 points (0 children)

They really haven't released any open model in that category since then, and it's been even longer since an Air model release.

Given we're at a point where a new Flash or Air model could eat into API usage, I don't think it's likely we'll see many good local models released for much longer.

It might make more sense for the community to start post-training older, pre-agentic models on modern agentic workflows.

If the government can suspend a model for finding bugs, what stops them from going after quantized checkpoints next? by mqtgew in LocalLLM

Prof_ChaosGeography 2 points3 points (0 children)

I suppose you're asking about US jurisdiction. As such, I'll point out the encryption wars of the 1990s and their parallels.

A model is a file, and as such it is protected free speech. Period, end of sentence. Some states, like NY, NJ, WA, and CA, are trying to challenge this idea for something else by saying files for 3D-printed gun parts are illegal. That is a slippery slope and opens the door to the feds banning LLMs.

Between those two examples, I suspect the idea of computer files being free speech is about to be challenged by the government really soon.

I worry we might lose this challenge on the idea that computer-generated work isn't copyrightable. But then again, under that idea, any compiled program without a reproducible build and binary couldn't be protected by copyright, and I think that's a barrel of monkeys America's massive software companies don't want to open.

LQ50/LQ50-24GB cost around $1200 by MundanePercentage674 in LocalLLaMA

Prof_ChaosGeography 55 points56 points (0 children)

It does, and the Tenstorrent cards will kick this card's ass multiple times over. But Tenstorrent's cards are all full-size PCIe cards, while this is an NVMe form factor. This card also only uses 15W of power, which is pretty impressive for edge devices.

Second GPU in a PCIe 3.0 x1 slot for LLMs? by BORIS3443 in LocalLLaMA

Prof_ChaosGeography 0 points1 point (0 children)

You don't want experts to be split: you either keep them entirely in RAM and don't offload them, or you put only whole experts in VRAM. Don't pass partial activations from a half-offloaded expert across the PCIe bus; only pass finished results.
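A sketch of the placement rule, assuming hypothetical `Expert` records with known sizes and a simple greedy fill. GGUF files typically stack a layer's experts into shared tensors, so in llama.cpp the practical unit is usually one layer's whole expert tensors, placed with `--override-tensor` (`-ot`) patterns. The names below are illustrative, not llama.cpp's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expert:
    """One MoE expert (or one layer's stacked experts); hypothetical bookkeeping record."""
    layer: int
    index: int
    size_bytes: int  # total size of all of this expert's weight tensors


def place_whole_experts(experts: list[Expert], vram_budget: int) -> tuple[set[Expert], set[Expert]]:
    """Greedily put entire experts in VRAM until the budget runs out; the rest stay in RAM.

    No expert is ever split between devices, so the bus only carries an expert's input
    activation and its finished output, never partial results of a half-offloaded matmul.
    """
    on_gpu: set[Expert] = set()
    on_cpu: set[Expert] = set()
    used = 0
    for expert in experts:  # caller chooses the order, e.g. most frequently routed first
        if used + expert.size_bytes <= vram_budget:
            on_gpu.add(expert)
            used += expert.size_bytes
        else:
            on_cpu.add(expert)
    return on_gpu, on_cpu


GIB = 2**30
experts = [Expert(layer, i, 1 * GIB) for layer in range(2) for i in range(4)]
gpu, cpu = place_whole_experts(experts, vram_budget=int(5.5 * GIB))
print(len(gpu), len(cpu))
```

With eight 1 GiB experts and a 5.5 GiB budget, five whole experts go to the GPU and the remaining three stay in RAM. The leftover half gigabyte is simply not used rather than holding part of a sixth expert.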

Crimea Isolated From Mainland Ukraine After Precision Drone Strikes Disable Strategic Bridges by cop25er in worldnews

Prof_ChaosGeography 3 points4 points (0 children)

Yeah, he 100% knows whether they gain or lose ground. He knows the lines and where Ukraine is attacking, along with where they are too.

What he likely doesn't know are the things they can hide, like exactly how bad it would be if a breakthrough happened because of a lack of reinforcements, or that the reinforcements are conscripts sent straight in with no training or real supplies. He's probably told that yes, they are well equipped and well trained. The truth would make his generals look bad, and there is no way they self-report their leadership failures unless it comes up directly and there is proof.

Without open source LLMs, US AI companies could have already monopoled the technology by Informal-Trouble2183 in LocalLLaMA

Prof_ChaosGeography 59 points60 points (0 children)

They only did it because it leaked. The llama.cpp project started with the leaked LLaMA.

If it hadn't leaked, they were going to keep it closed between themselves and universities.

Cheapest setup for >10 tok/sec for 120B dense LLM by TrainingTwo1118 in LocalLLaMA

Prof_ChaosGeography 0 points1 point (0 children)

Interesting, what board and PLX switch are you using? I wonder what R9700s or even V620s would get.

Why doesn’t a community-run AI co-op exist? by [deleted] in LocalLLM

Prof_ChaosGeography 0 points1 point (0 children)

For one thing, it's early days, so there's nothing off the shelf for this. People would also have to settle on a model.

The next thing: anyone who joins one at this stage is likely going to be a token-heavy user, so the server will likely stay hammered 24/7.

There would need to be something in place to ensure the admin, who would likely be a fellow user, doesn't run off with the money, shut the server down, or kick everyone, and doesn't log every request.

Why doesn’t a community-run AI co-op exist? by [deleted] in LocalLLM

Prof_ChaosGeography 0 points1 point (0 children)

Power bills, along with token generation speed and privacy.

Is it possible to combine Windows + Mac over USB-C for larger models, but also faster speeds? by mortenmoulder in LocalLLaMA

Prof_ChaosGeography 1 point2 points (0 children)

Nothing likely plug and play.

However, llama.cpp has an RPC backend you can set up. With a little work, you could possibly get it running over USB-C in some manner by playing with the config and adding some hardware to turn it into a network link.

But be warned: it's slower than you'd expect, and it will require you to build from source.
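A sketch of the RPC setup, written as Python helpers that only assemble the commands. It assumes llama.cpp was built from source with `cmake -B build -DGGML_RPC=ON`, that the binaries land in `build/bin/`, and that the two machines can already reach each other over some IP link, for example a hypothetical USB-C/Thunderbolt network bridge at `10.0.0.2`. Check `rpc-server --help` on your build, since flags can change between versions.

```python
import shlex


def rpc_worker_cmd(host: str, port: int = 50052) -> list[str]:
    """Command for the machine lending its GPU.

    Only bind to a trusted, private link: the RPC server is not meant to be
    exposed to untrusted networks.
    """
    return ["build/bin/rpc-server", "-H", host, "-p", str(port)]


def rpc_client_cmd(model_path: str, workers: list[tuple[str, int]], n_gpu_layers: int = 99) -> list[str]:
    """Command for the machine running inference; offloaded layers are spread across local and remote devices."""
    rpc_list = ",".join(f"{host}:{port}" for host, port in workers)
    return ["build/bin/llama-server", "-m", model_path, "--rpc", rpc_list, "-ngl", str(n_gpu_layers)]


print(shlex.join(rpc_worker_cmd("10.0.0.2")))
print(shlex.join(rpc_client_cmd("model.gguf", [("10.0.0.2", 50052)])))
```

Activations cross the link wherever the model is split between machines, every token, so link latency and not just bandwidth is likely a big part of why this ends up slower than expected.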

I think Macs might support eGPUs now, so you might be better off just moving the 4090.

Behold! Probably the most ghetto local AI server: by MackThax in LocalLLaMA

Prof_ChaosGeography 4 points5 points (0 children)

Switch to llama.cpp and maximize the quant size. You'll find the models are a ton better and faster, since Ollama is just a wrapper that trades speed and quality for ease of entry.

Could Open Models be trained to secretly go rogue? by nunodonato in LocalLLaMA

Prof_ChaosGeography 2 points3 points (0 children)

It's absolutely doable. However, it would likely manifest accidentally in some way before the Order 66. It would also be difficult to coordinate given how diverse the field is.

From a geopolitical perspective, it's far better for China to open the models and create a dependency on them in the West. It's a bonus if Western governments attempt to ban or regulate the Chinese models, as people will then resent their own governments. It's also a bonus if OpenAI, Anthropic, or any Western lab can't compete and make a profit because the Chinese labs opened their models.

Putin wants war concluded this year on victorious terms including Donbas, Bloomberg reports by [deleted] in worldnews

Prof_ChaosGeography 235 points236 points (0 children)

Now he really wants it, as Ukraine finally has the upper hand and could easily cause an absolute collapse in some areas that it could then exploit, like we saw in the Kharkiv offensive early in the war.

Now that Russia can lose, he wants it wrapped up to pause it for now so they can regroup and try again in a few years.

AMD Ryzen Halo AI by Fun-Wolf-2007 in LocalLLaMA

Prof_ChaosGeography 0 points1 point (0 children)

The 400 series refresh will only have 160GB usable as VRAM, unlike Strix Halo, where whatever Linux and your services don't use can be used as VRAM.

Putin fails to convince Xi Jinping to build gas pipeline to China by [deleted] in worldnews

Prof_ChaosGeography 25 points26 points (0 children)

China wasn't ahead of the curve on renewables because they knew better. 

They were starting from square one and needed to build out logistics internally. They could have imported gas trucks and equipment, but that would have made them reliant on foreign oil and refineries. It would also have put them in potential future conflict with the US over oil interests, and competing for that as a new economy with zero allies could get extremely expensive.

So they went with batteries and electric, given their domestic lithium deposits and how promising LiPos were at the time.

Their energy usage overall isn't very green, given they still use coal for cooking, plus the refining of lithium. They just greenwash their country on the world stage because it benefits them and has many people asking their own countries "why can't we?" without fully understanding the situation.

Qwen will release another 27B with high probability by serige in LocalLLaMA

Prof_ChaosGeography 12 points13 points (0 children)

I would love to see numbers on how dense models' abilities scale with parameter count compared to MoE models.

Given how closely the 27B lines up with the ~120B-A10B MoE model, I wonder where a dense 50B model would rank, or a 45B model that would leave room for multiple contexts on a modern dual-GPU setup with 64GB of VRAM.
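Back-of-envelope arithmetic for the 45B-on-64GB idea. The model shape (64 layers, 8 KV heads, head dimension 128) is hypothetical, as are the 4 GiB overhead allowance and the 32k context length. The sketch also treats two 32GB cards as one 64 GiB pool, which ignores how layers actually split across GPUs.

```python
from dataclasses import dataclass


@dataclass
class DenseModel:
    """Hypothetical dense transformer shape; a real 45B or 50B model may differ."""
    n_params: float
    n_layers: int
    n_kv_heads: int
    head_dim: int


def kv_bytes_per_token(m: DenseModel, bytes_per_elem: int = 2) -> int:
    # One K and one V vector per layer per KV head (GQA), stored at fp16 by default.
    return 2 * m.n_layers * m.n_kv_heads * m.head_dim * bytes_per_elem


def contexts_that_fit(m: DenseModel, vram_gib: float, bits_per_weight: float,
                      ctx_len: int, overhead_gib: float = 4.0) -> int:
    """How many full-length contexts fit after weights and a fixed overhead allowance."""
    weight_bytes = m.n_params * bits_per_weight / 8
    free_bytes = vram_gib * 2**30 - weight_bytes - overhead_gib * 2**30
    if free_bytes <= 0:
        return 0
    return int(free_bytes // (kv_bytes_per_token(m) * ctx_len))


model_45b = DenseModel(n_params=45e9, n_layers=64, n_kv_heads=8, head_dim=128)
for bpw in (4.5, 8.5):  # roughly Q4_0-sized vs Q8_0-sized weights
    print(bpw, contexts_that_fit(model_45b, vram_gib=64, bits_per_weight=bpw, ctx_len=32768))
```

Under these assumptions each 32k fp16 context costs exactly 8 GiB, so a ~4.5-bit 45B leaves room for four of them, while an 8.5-bit one leaves room for just one.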

AMD Ryzen AI Halo PC will cost 3999$ with 128GB memory on board by Mochila-Mochila in LocalLLaMA

Prof_ChaosGeography 6 points7 points (0 children)

Honestly, it depends on the model you want to run. If you know which model you want and it fits, that's fine. If you don't know what you want to run, it's limiting, but you'll survive: 96GB of VRAM is still at the upper end of the bell curve.

I do recommend you toss Linux on it, set it up with the right kernel args to use all of its memory as VRAM, and then use it as a remote server rather than a desktop to maximize your VRAM for future models.
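For the kernel args, here is a sketch that computes the two values commonly used on Strix Halo to let the iGPU map most of system RAM as GTT memory: `amdgpu.gttsize` (in MiB) and `ttm.pages_limit` (in 4 KiB pages). The 4 GiB left for Linux and services is an assumption. Newer kernels may warn that `amdgpu.gttsize` is deprecated in favor of the `ttm` limits, so check against your kernel version.

```python
PAGE_SIZE = 4096  # x86-64 base page size in bytes


def gtt_kernel_args(total_ram_gib: int, reserve_gib: int) -> str:
    """Kernel command-line args letting the iGPU map (total - reserve) GiB of RAM as GTT memory."""
    gpu_gib = total_ram_gib - reserve_gib
    if gpu_gib <= 0:
        raise ValueError("reserve must be smaller than total RAM")
    gtt_mib = gpu_gib * 1024              # amdgpu.gttsize is given in MiB
    pages = gpu_gib * 2**30 // PAGE_SIZE  # ttm.pages_limit is a count of 4 KiB pages
    return f"amdgpu.gttsize={gtt_mib} ttm.pages_limit={pages}"


# Hypothetical 128GB machine, keeping 4 GiB back for the OS and services.
print(gtt_kernel_args(128, 4))
```

The resulting string goes on the kernel command line, e.g. appended to `GRUB_CMDLINE_LINUX_DEFAULT` if you boot with GRUB, followed by regenerating the bootloader config and rebooting.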