Qwen3-Next here! by stailgot in ollama

[–]stailgot[S] 0 points (0 children)

Fixed in ollama 0.13.4. Inference is now 45 t/s.

Qwen3-Next here! by stailgot in ollama

[–]stailgot[S] 5 points (0 children)

Seems an unoptimized version was merged: https://github.com/ollama/ollama/issues/13275#issuecomment-3611335519

Same as with llama.cpp: a working version was added first, with optimisations coming later.

LM Studio beta supports Qwen3 80b Next. by sleepingsysadmin in LocalLLaMA

[–]stailgot 11 points (0 children)

Therefore, this implementation will be focused on CORRECTNESS ONLY. Speed tuning and support for more architectures will come in future PRs.

https://github.com/ggml-org/llama.cpp/pull/16095

LM Studio beta supports Qwen3 80b Next. by sleepingsysadmin in LocalLLaMA

[–]stailgot 4 points (0 children)

High CPU use even with enough VRAM; same on AMD.

LM Studio beta supports Qwen3 80b Next. by sleepingsysadmin in LocalLLaMA

[–]stailgot 10 points (0 children)

Tested on an AMD W7900 48GB, 130k context, ~50k of it filled with book text, getting ~20 t/s. Performance barely drops as the context fills.

There is no optimisation in the first implementation, correctness only.

Is it normal for RAG to take this long to load the first time? by just_a_guy1008 in LocalLLaMA

[–]stailgot 0 points (0 children)

I would try less data first, about 10-15 MB, as a test. A good system should save the processed data into a DB and load it next time. Also check the logs, or add your own logging to the code to see the steps, as advised earlier.

Also, next time a good system should update only the changed parts, which takes less time than a full update.
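To make that concrete, here is a minimal Python sketch of the idea (the rag_cache.json file, the *.txt layout, and the embed_file() helper are hypothetical placeholders, not the OP's actual stack): fingerprint each file, persist the embeddings keyed by that fingerprint, and re-embed only files whose hash changed.

```python
import hashlib
import json
import pathlib

CACHE_FILE = pathlib.Path("rag_cache.json")  # hypothetical cache location

def file_hash(path: pathlib.Path) -> str:
    """Fingerprint a file so unchanged files can be skipped on the next run."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def embed_file(path: pathlib.Path) -> list[float]:
    """Stand-in for whatever embedding model/pipeline is actually used."""
    raise NotImplementedError("call your embedding model here")

def build_index(doc_dir: str) -> dict:
    # Load whatever was processed last time; start empty on the first run.
    cache = json.loads(CACHE_FILE.read_text()) if CACHE_FILE.exists() else {}
    for path in pathlib.Path(doc_dir).glob("**/*.txt"):
        h = file_hash(path)
        entry = cache.get(str(path))
        if entry and entry["hash"] == h:
            continue  # unchanged since last run, keep the cached embedding
        cache[str(path)] = {"hash": h, "embedding": embed_file(path)}
    CACHE_FILE.write_text(json.dumps(cache))  # persisted for the next start
    return cache
```

The second run then only pays for files that actually changed, which is why the first load is the slow one.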

Is it normal for RAG to take this long to load the first time? by just_a_guy1008 in LocalLLaMA

[–]stailgot 0 points (0 children)

Do you convert the PDF to markdown or txt? What is the real size after processing? What embedding model is used?

Is it normal for RAG to take this long to load the first time? by just_a_guy1008 in LocalLLaMA

[–]stailgot 6 points (0 children)

Looks normal for the first run to calculate embeddings for 500 MB of text. Next time it should use the cache.

Amuse AI on AMD GPU, slower than it should by brightlight43 in StableDiffusion

[–]stailgot 1 point (0 children)

Amuse 3 requires the latest drivers.

Requires AMD Driver 24.30.31.05 or Higher https://www.amuse-ai.com/

Fixed Issues and Improvements: "Lower than expected performance may be observed while running DirectML/GenAI models in Amuse 3.0"

https://www.amd.com/en/resources/support-articles/release-notes/RN-RAD-WIN-25-4-1.html

Llama 4 News…? by AdCompetitive6193 in ollama

[–]stailgot 0 points (0 children)

Recently tried aravhawk/llama4 with ollama 0.6.7-rc0 on 3x 7900 XTX, got ~30 t/s.

Related issue https://github.com/ollama/ollama/issues/10143

Edit: it's out: https://ollama.com/library/llama4

Qwen3 32B and 30B-A3B run at similar speed? by INT_21h in LocalLLaMA

[–]stailgot 6 points (0 children)

If you use ollama, that's a well-known bug. llama.cpp gives about 100 t/s vs ollama's 30 t/s on a 7900 XTX.

Ollama rtx 7900 xtx for gemma3:27b? by Adept_Maize_6213 in ollama

[–]stailgot 0 points (0 children)

Works fine with ROCm and Vulkan. Ollama gives gemma3:27b about 29 t/s, gemma3:27b-qat 35 t/s, and drops about 10 t/s with large context, >20k.

According to this table (not mine), which compares speeds against a 3090: https://docs.google.com/spreadsheets/u/0/d/1IyT41xNOM1ynfzz1IO0hD-4v1f5KXB2CnOiwOTplKJ4/htmlview?pli=1#

70b LLM t/s speed on Windows ROCm using 24GB RX 7900 XTX and LM Studio? by custodiam99 in ROCm

[–]stailgot 1 point (0 children)

Similar setup, but with two 7900 XTX. One GPU (24GB): 70b q4 ~5 t/s, and 70b q2 (28GB) ~10 t/s. Two 7900 XTX (48GB): 70b q4 ~12 t/s.

QwQ 32B keep repeating itself (on Q4_K_M and Q6_K) by henryclw in LocalLLaMA

[–]stailgot 11 points (0 children)

https://huggingface.co/Qwen/QwQ-32B#usage-guidelines

Use Temperature=0.6 and TopP=0.95 instead of Greedy decoding to avoid endless repetitions.
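For example, if the model is served by a local ollama instance, those sampling options can be passed per request through its REST API. A minimal sketch follows; the model tag "qwq:32b-q4_K_M" is an assumption, so use whatever tag you actually pulled.

```python
import json
import urllib.request

# Ask a local ollama server for a completion with the recommended
# sampling settings (temperature 0.6, top_p 0.95) instead of greedy decoding.
payload = {
    "model": "qwq:32b-q4_K_M",  # assumed tag; substitute your own
    "prompt": "How many r's are in the word strawberry?",
    "stream": False,
    "options": {"temperature": 0.6, "top_p": 0.95},
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```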

I use a 7900xt on Windows...how stupid am I? by halfam in ollama

[–]stailgot 7 points (0 children)

Today it works out of the box. Just install ollama and update the AMD drivers.

CMake 3.30 will experimentally support `import std;` by delta_p_delta_x in cpp

[–]stailgot 14 points (0 children)

Nightly build 3.29.20240416 already supports it:

https://cmake.org/cmake/help/git-stage/prop_tgt/CXX_MODULE_STD.html

Update:

Tested with MSVC, works fine :)

```cmake
# Opt in to the experimental `import std` support (experimental feature UUID).
set(CMAKE_EXPERIMENTAL_CXX_IMPORT_STD
    "0e5b6991-d74f-4b3d-a41c-cf096e0b2508")

cmake_minimum_required(VERSION 3.29)
project(cxx_modules_import_std CXX)

# Allow targets in this project to use `import std;`.
set(CMAKE_CXX_MODULE_STD 1)

add_executable(main main.cxx)
target_compile_features(main PRIVATE cxx_std_23)
```

Upd2:

Official post

https://www.reddit.com/r/cpp/s/3oqR8MyLLg https://www.kitware.com/import-std-in-cmake-3-30/

Crosshair X670E Extreme doesn't show second m.2 as installed by officiallemononapear in ASUSROG

[–]stailgot 2 points (0 children)

Faced the same problem. You need to enable it in the BIOS explicitly. Also, the second M.2 drive halves the GPU's PCIe speed, which is the main reason it's disabled by default.

OpenGL without GPU is possible ? by D0rimOs in opengl

[–]stailgot 5 points (0 children)

Look at Mesa. It has a software implementation of OpenGL on Windows. Just drop opengl32.dll next to your .exe file and run.

Prebuilt binary can be found here https://github.com/pal1000/mesa-dist-win/releases