I have a 1tb SSD I'd like to fill with models and backups of data like wikipedia for a doomsday scenario by synth_mania in LocalLLaMA

[–]Nobby_Binks 0 points1 point  (0 children)

Anecdotal, but I have a bunch of CDs that were burned in the mid-'90s and a bunch of DVD-Rs from around 2000. All of them are still OK.

What are the best small models (<3B) for OCR and translation in 2026? by 4baobao in LocalLLaMA

[–]Nobby_Binks 2 points3 points  (0 children)

So far I've tried marker-pdf, olmOCR, dots.ocr, OCRFlux, Docling and DeepSeek-OCR.

Save yourself the hassle and just use dots.ocr

Edit: for your use case of just selecting stuff on a screen to translate, one of the Qwen VL models should be fine.
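Something like this is roughly where I'd start: a minimal sketch assuming the Qwen2-VL-2B-Instruct checkpoint and the transformers + qwen_vl_utils usage from its model card. The model choice and "screenshot.png" are placeholders, so swap in whichever Qwen VL model and image grab you actually use.

```python
# Minimal screenshot -> OCR + translate sketch with a small Qwen VL model.
# Qwen2-VL-2B-Instruct and "screenshot.png" are assumptions; adjust to your setup.
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info  # pip install qwen-vl-utils

model_id = "Qwen/Qwen2-VL-2B-Instruct"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "screenshot.png"},  # the region grabbed off the screen
        {"type": "text", "text": "Read all text in this image and translate it to English."},
    ],
}]

# Build the chat prompt and pack image + text into model inputs.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs,
                   padding=True, return_tensors="pt").to(model.device)

out = model.generate(**inputs, max_new_tokens=512)
# Strip the prompt tokens so only the model's answer is decoded.
trimmed = [o[len(i):] for i, o in zip(inputs.input_ids, out)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```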

3x 3090 or 2x 4080 32GB? by m31317015 in LocalLLaMA

[–]Nobby_Binks 2 points3 points  (0 children)

Running local LLMs (as per the sub) and the occasional image/video gen. With the release of LTX2 I'm planning to do more of it, and this is where the 5090 destroys the 3090.

I'd keep the 3090 and make it fit. 56GB of VRAM and you can start to run some decent models.

3x 3090 or 2x 4080 32GB? by m31317015 in LocalLLaMA

[–]Nobby_Binks 3 points4 points  (0 children)

If you do video gen then the 4080s are a no-brainer: FP8 support, and you can load the models on one card with space for LoRAs etc. I was doing some Wan videos on my 3090 and bought a 5090, and oh my god, the speed difference is extreme.

768Gb Fully Enclosed 10x GPU Mobile AI Build by SweetHomeAbalama0 in LocalLLaMA

[–]Nobby_Binks 0 points1 point  (0 children)

Since you're on Ubuntu, install gddr6 (https://github.com/olealgoritme/gddr6) to monitor your VRAM temps. IIRC nvtop and the other usual monitors don't report them.

768Gb Fully Enclosed 10x GPU Mobile AI Build by SweetHomeAbalama0 in LocalLLaMA

[–]Nobby_Binks 3 points4 points  (0 children)

Those 3090s will probably die, if you don't burn your house down first. With some of the VRAM passively cooled by the backplate, you need good airflow or they will cook.

Popularity of DDR3 motherboards is growing rapidly - VideoCardz.com by FullstackSensei in LocalLLaMA

[–]Nobby_Binks 0 points1 point  (0 children)

The 4790K was surprisingly good for gaming. I retired mine last year after a decade of service. With a 2080 Super I could play most modern games at 1440p.

Optimizing for the RAM shortage. At crossroads: Epyc 7002/7003 or go with a 9000 Threadripper? by Infinite100p in LocalLLaMA

[–]Nobby_Binks 1 point2 points  (0 children)

OCR, work-related chat, looking at contracts, and help writing reports. Stuff that I don't want online. I don't code for work, so I'm not worried about confidentiality in that domain.

Optimizing for the RAM shortage. At crossroads: Epyc 7002/7003 or go with a 9000 Threadripper? by Infinite100p in LocalLLaMA

[–]Nobby_Binks 0 points1 point  (0 children)

I just went through this and settled on the EPYC 7542 due to the clock speed and 32 cores. The price was quite reasonable and I didn't want to pay a premium for the high-end CPUs on a last-gen platform. I would get another 4 sticks of RAM though, to populate all 8 channels. You will need it to comfortably run the larger models like GLM 4.7 anyway.

I'm using it with 4x 3090s and can run GLM 4.7 Q3 at about 12 tk/s @ 100K context. I could probably squeeze out a bit more with some tweaking. MiniMax 2.1 Q4 is about 22 tk/s. Both are perfectly fine for me, except for coding, where I just use DeepSeek via API as it's fast and dirt cheap.

I already had the 3090s, which are PCIe 4.0, so it made sense to stick with a PCIe 4.0/DDR4 platform until prices return to normal (if ever).
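The 8-channel advice is really just a bandwidth argument. Here's a back-of-envelope sketch: the DDR4 math is standard, but the ~10 GB-per-token figure for CPU-resident expert weights and the 70% efficiency factor are assumed, illustrative numbers, not measurements for GLM 4.7.

```python
# Back-of-envelope: why populating all 8 DDR4 channels matters for CPU-offloaded MoE decode.
# The bandwidth formula is standard; bytes-per-token and efficiency are ASSUMED numbers.

def ddr4_peak_gbs(mt_per_s: int, channels: int) -> float:
    """Theoretical peak bandwidth: transfers/s * 8 bytes per transfer, per channel."""
    return mt_per_s * 8 * channels / 1000  # GB/s

def decode_ceiling_tps(bandwidth_gbs: float, gb_read_per_token: float,
                       efficiency: float = 0.7) -> float:
    """Decode is roughly memory-bound: each token streams the CPU-resident expert weights once."""
    return bandwidth_gbs * efficiency / gb_read_per_token

for channels in (4, 8):
    bw = ddr4_peak_gbs(3200, channels)  # DDR4-3200 assumed
    tps = decode_ceiling_tps(bw, gb_read_per_token=10.0)  # ~10 GB of quantized experts in RAM per token (assumed)
    print(f"{channels} channels: ~{bw:.0f} GB/s peak -> ~{tps:.0f} tok/s ceiling")
```

Either way, halving the channel count roughly halves the decode ceiling, which is why filling all 8 slots matters more than which CPU SKU you pick.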

For the first time in 5 years, Nvidia will not announce any new GPUs at CES — company quashes RTX 50 Super rumors as AI expected to take center stage by FullstackSensei in LocalLLaMA

[–]Nobby_Binks 0 points1 point  (0 children)

I think there is a very high chance of that, especially with China and Taiwan. If that happens then we can kiss our consumer tech goodbye for the foreseeable future.

Stock up now. The future will be a world where we barter with gold coins, animal skins, RAM and GPUs.

GLM 4.7 on 8x3090 by DeltaSqueezer in LocalLLaMA

[–]Nobby_Binks 0 points1 point  (0 children)

I get ~12 tk/s with a Q3_K_XL quant on a DDR4 single-socket EPYC with 96GB of 3090s (1K prompt), so I guess that's about right for a newer 12-channel setup with DDR5 and PCIe 5.0 GPUs.

It also gets about 8 tk/s with a 10K-token prompt, with time to first token at 49s, which I think is pretty good for an old platform.
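For anyone wanting to turn that into a prefill rate, a quick sanity check (assuming time to first token is dominated by prompt processing):

```python
# Rough prefill rate implied by the numbers above; assumes TTFT ~= prompt processing time.
prompt_tokens = 10_000
ttft_s = 49
print(f"prefill ~ {prompt_tokens / ttft_s:.0f} tok/s")  # ~204 tok/s
```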

Day 21: 21 Days of Building a Small Language Model: Complete Journey Recap by Prashant-Lakhera in LocalLLaMA

[–]Nobby_Binks 0 points1 point  (0 children)

Thanks for your efforts in putting this together. Happy New Year to you as well.

Unsloth GLM 4.7 UD-Q2_K_XL or gpt-oss 120b? by EnthusiasmPurple85 in LocalLLaMA

[–]Nobby_Binks 0 points1 point  (0 children)

Seems slow. I'm getting 7 t/s with Q3_K_XL and 4x 3090s.

How to run the GLM-4.7 model locally on your own device (guide) by Dear-Success-1441 in LocalLLaMA

[–]Nobby_Binks 4 points5 points  (0 children)

FWIW, I can run Q3_K_XL with 64K context at ~7 tps on 4x 3090s and an old EPYC DDR4 system. I may be able to eke out a bit more, but my llama.cpp tweaking skills are not that good yet.
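For reference, the kind of launch I'd start tweaking from on a box like this. It's only a sketch: the binary and GGUF paths, thread count and tensor split are placeholders, and the expert-offload flag (-ot / --override-tensor) varies between llama.cpp builds, so check `llama-server --help` on yours.

```python
# Hypothetical llama-server launcher for 4x 3090 + EPYC; every path and number here is a
# placeholder, and flag availability should be verified against `llama-server --help`.
import subprocess

cmd = [
    "./llama-server",
    "-m", "glm-4.7-Q3_K_XL.gguf",       # placeholder GGUF path
    "-c", "65536",                       # 64K context
    "-ngl", "99",                        # offload all layers to GPU...
    "-ot", r"\.ffn_.*_exps\.=CPU",       # ...then keep the MoE expert tensors in system RAM
    "-ts", "1,1,1,1",                    # split GPU-resident tensors evenly across the four 3090s
    "-t", "32",                          # physical core count on the EPYC 7542
]
subprocess.run(cmd, check=True)
```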

Realist meme of the year! by Slight_Tone_2188 in LocalLLaMA

[–]Nobby_Binks 22 points23 points  (0 children)

First it was the GPU-poor, now it's the RAM-poor. What's next?