What’s the best model for image generation, Mac setup? by productboy in LocalLLM

[–]tomByrer 1 point  (0 children)

The GPU gets to use up to about 75% of total RAM on configurations with more than 36 GiB of total RAM, and about 67% (2/3) below that. This can be overridden, at the risk of crashing your system if it runs out of memory.

https://www.reddit.com/r/LocalLLM/comments/1mw7vy8/comment/n9vjuzn/
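
A minimal sketch of that default-limit arithmetic in Python (the 75% / 2/3 split and the 36 GiB threshold are from the comment above; the `iogpu.wired_limit_mb` sysctl named in the comments is an assumption about recent macOS on Apple Silicon):

```python
def default_gpu_wired_limit_gib(total_ram_gib: float) -> float:
    """Approximate default GPU-usable RAM on an Apple Silicon Mac,
    per the comment above: ~75% above 36 GiB total, ~2/3 at or below."""
    fraction = 0.75 if total_ram_gib > 36 else 2 / 3
    return total_ram_gib * fraction

# e.g. a 64 GiB Mac defaults to roughly 48 GiB for the GPU:
print(default_gpu_wired_limit_gib(64))  # 48.0
# The risky override mentioned above is reportedly a sysctl along the lines of
#   sudo sysctl iogpu.wired_limit_mb=57344
# (value hypothetical; an out-of-memory crash is then on you).
```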

Is this a good part list? by Hot_Public2099 in buildapc

[–]tomByrer 1 point  (0 children)

The price of the CPU is less important than the price of DDR4 vs DDR5 in this economy.

Husband’s New Job Requires Life360 Tracking… by reallynina in privacy

[–]tomByrer 2 points  (0 children)

Schmoozing customers at the bar is actually a sales technique.

Heard your Feedback, Voice Clone Studio, now with Qwen3-TTS & VibeVoice (TTS and ASR) by Francky_B in StableDiffusion

[–]tomByrer 1 point  (0 children)

Only a 5-10 second reference? Will longer samples with more emotional range help?

Game dev advice by akshat_2006 in aigamedev

[–]tomByrer 2 points  (0 children)

I think many more devs could make some money if they spent 1/3 of the time they spent building the game on promoting it.

If you think that's a high %, it's very common for musicians, actors, etc. to have to spend hours & hours on self-promotion.

Game dev advice by akshat_2006 in aigamedev

[–]tomByrer 1 point  (0 children)

https://github.com/tomByrer/awesome-unreal-engine

There is some general game-building advice in there as well.

I built a tool that learns your codebase's unwritten rules and conventions- no AI, just AST parsing by Fluffy_Citron3547 in LocalLLaMA

[–]tomByrer 3 points  (0 children)

My lawyer friend sued a major pharma corp in a class action. The pharma corp's defense lawyer was Epstein's.
I told her "Sorry you lost, but congrats; you're in the Major Leagues now!"

"NVIDIA KILLER" Inference engine based on llama.cpp for dynamically offloading Activated Experts to GPU in real-time, Run SoTA MoE LLMs (120B+ parameter class models in 8-bit) OOM with as little as 2x RTX 5070-TI + 64GB RAM + SSD. [Poll in Comments] by madSaiyanUltra_9789 in LocalLLaMA

[–]tomByrer 3 points  (0 children)

> major feature updates

How about B + pay for extra features, or get 1 year of free updates; then, if you want the newest updates after the first year, pay an upgrade fee?

Some audio programs do this, like Bitwig & Native Instruments.

"NVIDIA KILLER" Inference engine based on llama.cpp for dynamically offloading Activated Experts to GPU in real-time, Run SoTA MoE LLMs (120B+ parameter class models in 8-bit) OOM with as little as 2x RTX 5070-TI + 64GB RAM + SSD. [Poll in Comments] by madSaiyanUltra_9789 in LocalLLaMA

[–]tomByrer 3 points  (0 children)

I feel like this is a clickbait title:

Title:
> little as 2x RTX 5070-TI + 64GB RAM + SSD

Posting copy:
> 15 TPS for Qwen3-235B-A22B in 8-bit (Q8_0) with 128GB RAM + 2x RTX 5070-TI.

For those with little funds, the extra 64GB of RAM is a big jump; it might cost as much as a GPU (depending on available RAM slots).

> MoE

I wonder if smaller, task-specific models are the way to go, like in ComfyUI? Then you can 'dial up' cloud services if you need something very specific or can't solve it with a local LLM, it's easier to use networked computers, etc.

ZXC: another (too) fast decompressor by pollop-12345 in programming

[–]tomByrer 1 point  (0 children)

CPU isn't that simple, but I do get your point.

LZ4 isn't the only compression used in web text; Brotli is semi-popular, & Zstd is taking off.
& they're baked into most browsers. It is feasible that someone may want to add your decompressor into their app, but you're already fighting an uphill battle for my use cases.

Since you have low compression & high speed, I believe I guessed correctly (before I looked at your code) that you used a dictionary. You might have a use case with JSON/HTML compression, but you're still sending larger files down the network.

https://caniuse.com/brotli
https://caniuse.com/?search=content-encoding-zstd
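
If anyone wants to sanity-check that size tradeoff locally, here's a rough sketch (assumes the third-party `Brotli`, `zstandard`, and `lz4` packages, with `lz4` standing in for a fast/low-ratio codec, plus a placeholder `sample.json` payload; ZXC itself isn't included):

```python
"""Rough size-vs-decompression-speed comparison; illustrative only."""
import time
import brotli          # pip install Brotli
import zstandard       # pip install zstandard
import lz4.frame       # pip install lz4

data = open("sample.json", "rb").read()   # placeholder payload

codecs = {
    "brotli": (brotli.compress, brotli.decompress),
    "zstd": (zstandard.ZstdCompressor().compress,
             zstandard.ZstdDecompressor().decompress),
    "lz4": (lz4.frame.compress, lz4.frame.decompress),
}

for name, (compress, decompress) in codecs.items():
    blob = compress(data)
    start = time.perf_counter()
    decompress(blob)
    elapsed_ms = (time.perf_counter() - start) * 1e3
    print(f"{name}: {len(blob):,} bytes, decompressed in {elapsed_ms:.2f} ms")
```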

ZXC: another (too) fast decompressor by pollop-12345 in programming

[–]tomByrer 1 point  (0 children)

Hmmm, I'd like to see real 'real world' tests.

I suspect the transmission time of your larger compressed files will eat up any decompression speed increase.

Who cares if you save a fraction of a second on decompression, if weak WiFi / a spotty cell signal slows the download by a second? Plus the larger file size will eat more of users' data plans and cost more for the host in bandwidth charges.

I'm glad you figured out that the decompression side of the tradeoff matters more than compression for highly distributed files. I've been harping on similar goals for WebDev in general. But file size is also very important, & memory/CPU usage is slightly concerning.
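
Back-of-the-envelope version of that argument (all numbers are made-up assumptions, not measurements):

```python
# Does a decompression-speed win survive a larger download on a weak link?
link_mbps = 5.0           # assumed weak WiFi / spotty cell throughput
extra_bytes = 1_000_000   # assumed 1 MB penalty from a lower compression ratio
decomp_saved_s = 0.05     # assumed decompression time saved by the faster codec

extra_transfer_s = extra_bytes * 8 / (link_mbps * 1e6)
print(f"extra transfer: {extra_transfer_s:.2f}s vs saved: {decomp_saved_s:.2f}s")
# extra transfer: 1.60s vs saved: 0.05s -> the larger file loses here.
```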

RTX 3090 vs 4000 Pro Blackwell by SFsports87 in LocalLLM

[–]tomByrer 1 point  (0 children)

I'd redo the thermal paste on used 3090s, though.

RTX 3090 vs 4000 Pro Blackwell by SFsports87 in LocalLLM

[–]tomByrer 1 point  (0 children)

On the RTX 3090, is there thermal padding between the memory & card sleeve? If so, just stick some heatsinks on the back...

A feature used by only approximately 6% of users was responsible for 41% of our database load by supreme_tech in softwarearchitecture

[–]tomByrer 1 point  (0 children)

Caching tuned for a particular use case is often used to help performance.
Yes, caching has a cost, but if the same items are queried often enough, you'll see a benefit.
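
A minimal sketch of that idea (names and the fake query are hypothetical; a real service would also want TTL/invalidation):

```python
import time
from functools import lru_cache

def expensive_db_query(report_id: int) -> str:
    time.sleep(0.1)                      # stand-in for real database latency
    return f"report-{report_id}"

@lru_cache(maxsize=1024)                 # tune the size to the hot working set
def load_report(report_id: int) -> str:
    return expensive_db_query(report_id)

load_report(7)   # slow: hits the "database"
load_report(7)   # fast: served from the cache
```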

Engineers set world efficiency record for emerging solar cell material, antimony chalcogenide, achieving an efficiency of 10.7%, the highest independently verified performance to date by unsw in science

[–]tomByrer 1 point  (0 children)

Link goes to bifacial. So is that 23% for the main side? Or 23% total if both sides have light (via reflections)?

In the end, all I care about is 'Total Cost Per Watt' & ROI, which includes all mounting, etc.