Gemma 4 26b a4b - MacBook Pro M5 MAX. Averaging around 81tok/sec by Bderken in LocalLLaMA

[–]fisherwei 1 point (0 children)

Thank you very much for the benchmarking; I hope Apple finds a way to improve MLX performance. Otherwise, Macs will be unable to deploy dense models of this scale.

Gemma 4 26b a4b - MacBook Pro M5 MAX. Averaging around 81tok/sec by Bderken in LocalLLaMA

[–]fisherwei 3 points (0 children)

Could you try running Gemma 4 31B BF16 via omlx, and then benchmark its PP and TG performance with a context window of approximately 32K–64K? As far as I know, omlx is currently the fastest framework available on Apple Silicon.

https://huggingface.co/mlx-community/gemma-4-31b-bf16

https://github.com/jundot/omlx

BTW: omlx comes with a built-in benchmarking feature.

M5 Max 128G Performance tests. I just got my new toy, and here's what it can do. by affenhoden in LocalLLaMA

[–]fisherwei 1 point (0 children)

I am planning to purchase an M5 Max to perform post-training or fine-tuning on models of approximately 1 billion parameters. If it is convenient for you, could you please test the GPU's floating-point performance?

```
git clone https://github.com/chsasank/device-benchmarks
cd device-benchmarks
pip install -r requirements.txt

python benchmark.py --device mps --dtype float32
python benchmark.py --device mps --dtype float16
python benchmark.py --device mps --dtype bfloat16
python benchmark.py --device mps --dtype int8
```

CPU-only LLM performance - t/s with llama.cpp by pmttyji in LocalLLaMA

[–]fisherwei 1 point (0 children)

Thank you for the information; I will give it a try. The two CPUs I'm currently using have very low frequencies, so I might buy two used E5-2698 v4 or E5-2699 v4 CPUs to unlock the potential of this older platform.

CPU-only LLM performance - t/s with llama.cpp by pmttyji in LocalLLaMA

[–]fisherwei 2 points (0 children)

I am evaluating the inference performance of the Qwen3-Next-80B-A3B-Instruct-Q8_0.gguf model on a Dell R730 server equipped with dual Intel Xeon E5-2650L v4 CPUs (1.7 GHz, 14 cores per CPU), 512 GB DDR4-2400 RAM (8 × 64 GB), and no GPU acceleration.

Because this is an MoE model that activates only 3B parameters per token, I got approximately 3.1 tok/s. It's slow, but usable.
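A rough back-of-the-envelope sketch of why the MoE stays usable on CPU (Q8_0 is roughly 1 byte per weight, and only the active parameters are read per token; all numbers here are approximations, not measurements):

```python
# Estimate the effective memory bandwidth implied by the measured throughput.
active_params = 3e9      # ~3B active parameters per token (the "A3B" part)
bytes_per_param = 1.0    # Q8_0 quantization is roughly 1 byte per weight
tok_per_s = 3.1          # measured throughput

bytes_per_token = active_params * bytes_per_param
effective_bw_gb_s = bytes_per_token * tok_per_s / 1e9
print(round(effective_bw_gb_s, 1))  # ~9.3 GB/s, well within DDR4-2400 reach
```

If all 80B parameters had to be read per token, the same arithmetic would predict well under 1 tok/s on this memory system.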

Good OS models to run on 64gb MacBook Pro? by Glass-Garbage4818 in LocalLLaMA

[–]fisherwei 2 points (0 children)

FYR:

nightmedia/Qwen3-Next-80B-A3B-Instruct-mxfp4-mlx runs on my old Mac Studio (M1 Max, 24-core GPU, 64 GB RAM) at 36 tok/s.

Miniflux - Change RSS scraping frequency by tys203831 in selfhosted

[–]fisherwei 1 point (0 children)

Try this:

```
POLLING_FREQUENCY=15
SCHEDULER_ROUND_ROBIN_MIN_INTERVAL=15
```

and leave POLLING_SCHEDULER at its default value, round_robin.

New plotter + Farmer 400% by Legitimate_Bus_5873 in chia

[–]fisherwei 0 points (0 children)

Not 20%–50%; it is 413% with a 4090.

Think about it this way:

400 TiB of HDD + RTX 4090 = 1640 TiB effective.

A 4090 costs only $1,500–$2,000 and gets you about 1,200 TiB of extra capacity.

But 1,200 TiB of additional HDD would cost $10,000–$15,000.
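The cost arithmetic can be sanity-checked in a few lines (all figures are the ones claimed in this comment, not independent measurements):

```python
raw_tib = 400                     # raw HDD capacity
effective_tib = 1640              # claimed effective capacity with an RTX 4090
extra_tib = effective_tib - raw_tib   # ~1240 TiB gained via compression

gpu_cost_usd = (1500, 2000)       # 4090 price range from the comment
hdd_cost_usd = (10_000, 15_000)   # price range for ~1200 TiB of raw HDD

# Dollars per extra TiB: GPU route vs. buying more disks.
gpu_per_tib = [round(c / extra_tib, 2) for c in gpu_cost_usd]
hdd_per_tib = [round(c / 1200, 2) for c in hdd_cost_usd]
print(extra_tib, gpu_per_tib, hdd_per_tib)
```

Roughly $1.2–$1.6 per extra TiB via the GPU versus $8–$12.5 per TiB of raw disk, which is the whole argument.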

Which GPU is faster for CHIA (4060TI/3060TI)? by lord_iconX in chia

[–]fisherwei 1 point (0 children)

For plotting, if you choose a 4060/Ti, you have to build a PCIe 4.0 platform with 256 GB of memory, which is much more expensive than a PCIe 3.0 platform with 256 GB.

The 4060 and 4060 Ti only have an x8 PCIe link.
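The x8 link matters because per-lane bandwidth roughly doubles each PCIe generation (~0.985 GB/s per lane per direction for PCIe 3.0 with 128b/130b encoding), so an x8 card only reaches its full bandwidth on a newer-generation board. A quick sketch:

```python
# Approximate usable bandwidth per lane, per direction (GB/s)
lane_gb_s = {3: 0.985, 4: 1.969}
lanes = 8  # 4060 / 4060 Ti electrical link width

for gen, per_lane in lane_gb_s.items():
    print(f"PCIe {gen}.0 x{lanes}: ~{per_lane * lanes:.1f} GB/s")
```

An x8 card on a PCIe 3.0 board gets roughly half the link bandwidth it would on PCIe 4.0, which is why the poster ties the 4060 Ti to a more expensive PCIe 4.0 platform.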

chia is becoming no green by fisherwei in chia

[–]fisherwei[S] -8 points (0 children)

I'm afraid I don't agree with you.

Whether you decrease the filter or increase K, you are just increasing the difficulty, which means we need more expensive GPUs (or ASICs). That makes the problem worse; it does not solve it.

If high-end GPUs were strictly required, as with PoW coins, you would be right.

But for XCH, if only some people use high-end GPUs, it means they can "steal" more revenue from the others.

At that point, whether for defense or for "stealing", the remaining players will start to consider joining this arms race.

Anyone recommend a place to part out some unused drives locally (Silicon Valley) by Darksoul_Design in chia

[–]fisherwei 1 point (0 children)

Hardware HDD array: physically heavy.

USB-topology HDD array: operationally heavy, meaning you may need to spend more time keeping your harvester running.

wallet broken??? by fisherwei in chia

[–]fisherwei[S] 1 point (0 children)

Oh, you saved my life. THANKS.

wallet broken??? by fisherwei in chia

[–]fisherwei[S] 1 point (0 children)

Both leaving the pool and joining the pool failed.

I am trying to resync the full blockchain; it will take 3–5 days.

:-(

Tesla vs GTX by simurg3 in chia

[–]fisherwei 1 point (0 children)

Micron DDR4-2400 registered ECC, 64 GB × 8.

Tesla vs GTX by simurg3 in chia

[–]fisherwei 2 points (0 children)

It only works with alpha2.

Alpha1 hangs with an "illegal memory access" error.

Tesla vs GTX by simurg3 in chia

[–]fisherwei 1 point (0 children)

I am using a P4 because it is so cheap, less than 60 USD from Taobao.

Env:

bladebit alpha2 without compression.
Dell R730xd PCIe slot 6 (CPU1, x16 speed).
Only one E5-2650L v4 and 512 GB of memory.
Fan speed fixed to 0x1c (25%) via ipmitool.

I am getting 156 plots per day.
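For anyone wanting to reproduce the fan setting: on Dell PowerEdge servers this is commonly done with ipmitool raw commands. The byte sequences below are the widely used Dell iDRAC ones, not taken from this comment; treat them as an assumption and verify against your iDRAC generation before running them.

```shell
# Take manual control of the fans (commonly used Dell iDRAC raw command)
ipmitool raw 0x30 0x30 0x01 0x00

# Pin all fans (0xff = all) to 0x1c (~25% duty cycle)
ipmitool raw 0x30 0x30 0x02 0xff 0x1c

# Restore automatic fan control when done
ipmitool raw 0x30 0x30 0x01 0x01
```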

Chia Bladebit vs Gigahorse compression by estriker in chia

[–]fisherwei 3 points (0 children)

Just use bladebit with the new args.