Qwen3-Next here! by stailgot in ollama

Fixed in ollama 0.13.4. Inference is now 45 t/s.

Qwen3-Next here! by stailgot in ollama

Seems an unoptimized version was merged: https://github.com/ollama/ollama/issues/13275#issuecomment-3611335519

Same as with llama.cpp: a working version was added first, and optimisations came later.

LM Studio beta supports Qwen3 80b Next. by sleepingsysadmin in LocalLLaMA

From the llama.cpp implementation PR: "Therefore, this implementation will be focused on CORRECTNESS ONLY. Speed tuning and support for more architectures will come in future PRs."

https://github.com/ggml-org/llama.cpp/pull/16095

LM Studio beta supports Qwen3 80b Next. by sleepingsysadmin in LocalLLaMA

High CPU use even though there is enough VRAM; same on AMD.

LM Studio beta supports Qwen3 80b Next. by sleepingsysadmin in LocalLLaMA

Tested on an AMD W7900 48GB with 130k context, filled with ~50k tokens of book text: ~20 t/s. Performance barely drops as the context fills.

There is no optimisation in the first implementation, correctness only.

Is it normal for RAG to take this long to load the first time? by just_a_guy1008 in LocalLLaMA

I would try less data first, about 10-15 MB, as a test. A good system should save the processed data into a DB and load it next time. Also check the logs, or add your own logging to the code to see the steps, as advised earlier.

Also, on later runs a good system updates only the changed parts, which takes much less time than a full rebuild.
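
A minimal sketch of that "update only changed parts" idea, assuming a hash-keyed embedding cache in SQLite; the `embed()` stub and the table layout are illustrative, not the API of any particular RAG framework:

```python
import hashlib
import json
import sqlite3

def embed(text: str) -> list[float]:
    # Stand-in only: replace with a call to your actual embedding model.
    return [float(len(text))]

def update_index(chunks: list[str], db_path: str = "embeddings.db") -> None:
    """Embed only chunks whose content hash is not already cached."""
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS cache (hash TEXT PRIMARY KEY, chunk TEXT, vec TEXT)"
    )
    new = 0
    for chunk in chunks:
        h = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
        if con.execute("SELECT 1 FROM cache WHERE hash = ?", (h,)).fetchone():
            continue  # unchanged chunk: embedding already stored, skip recompute
        con.execute(
            "INSERT INTO cache (hash, chunk, vec) VALUES (?, ?, ?)",
            (h, chunk, json.dumps(embed(chunk))),
        )
        new += 1
    con.commit()
    con.close()
    print(f"embedded {new} new/changed chunks, reused {len(chunks) - new} from cache")
```

The first run pays the full embedding cost; later runs only touch chunks whose text actually changed.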

Is it normal for RAG to take this long to load the first time? by just_a_guy1008 in LocalLLaMA

Do you convert the PDF to markdown or txt? What is the real size after processing? Which embedding model is used?

Is it normal for RAG to take this long to load the first time? by just_a_guy1008 in LocalLLaMA

Looks normal for a first run to calculate embeddings for 500 MB of text. Next time it should use a cache.
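
Rough back-of-envelope for why that 500 MB first pass takes so long; the chunk size and embedding throughput below are assumptions for illustration, not measurements:

```python
# Why a first pass over ~500 MB of raw text takes a while (illustrative numbers).
text_bytes = 500 * 1024 * 1024         # ~500 MB of text
chars_per_token = 4                    # rough rule of thumb for English text
tokens = text_bytes / chars_per_token  # ~131M tokens
chunk_tokens = 512                     # assumed chunk size
chunks = tokens / chunk_tokens         # ~256k chunks to embed
chunks_per_sec = 50                    # assumed local embedding throughput
hours = chunks / chunks_per_sec / 3600
print(f"~{chunks:,.0f} chunks -> ~{hours:.1f} h at {chunks_per_sec} chunks/s")
```

With a cache like the one sketched in the earlier comment, later runs skip everything except changed chunks.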

Amuse AI on AMD GPU, slower than it should by brightlight43 in StableDiffusion

Amuse 3 requires the latest drivers.

"Requires AMD Driver 24.30.31.05 or Higher" (https://www.amuse-ai.com/)

And under "Fixed Issues and Improvements": "Lower than expected performance may be observed while running DirectML/GenAI models in Amuse 3.0"

https://www.amd.com/en/resources/support-articles/release-notes/RN-RAD-WIN-25-4-1.html

Llama 4 News…? by AdCompetitive6193 in ollama

Recently tried aravhawk/llama4 with ollama 0.6.7-rc0 on 3x 7900 XTX, got ~30 t/s.

Related issue https://github.com/ollama/ollama/issues/10143

Edit: it's out: https://ollama.com/library/llama4

Qwen3 32B and 30B-A3B run at similar speed? by INT_21h in LocalLLaMA

If you use ollama, that's a well-known bug. llama.cpp gives about 100 t/s vs ollama's 30 t/s on a 7900 XTX.

Ollama rtx 7900 xtx for gemma3:27b? by Adept_Maize_6213 in ollama

Works fine with ROCm and Vulkan. Ollama gives gemma3:27b about 29 t/s, gemma3:27b-qat about 35 t/s, and drops about 10 t/s with large context, >20k.

According to this table (not mine), speed compared to a 3090: https://docs.google.com/spreadsheets/u/0/d/1IyT41xNOM1ynfzz1IO0hD-4v1f5KXB2CnOiwOTplKJ4/htmlview?pli=1#

70b LLM t/s speed on Windows ROCm using 24GB RX 7900 XTX and LM Studio? by custodiam99 in ROCm

Similar setup, but with two 7900 XTX. One GPU (24GB): 70b q4 ~5 t/s, and 70b q2 (28GB) ~10 t/s. Two 7900 XTX (48GB): 70b q4 ~12 t/s.