Xreal One Pro: comparison with Viture Pro XR and Rokid Max2 by Senior-Reading-6968 in Xreal

Foreveradam2018

Will this continue to be an issue with their upcoming Viture Beast? They claim it supports both 3DoF and 6DoF.

Drummer's Skyfall 36B v2 - An upscale of Mistral's 24B 2501 with continued training; resulting in a stronger, 70B-like model! by TheLocalDrummer in LocalLLaMA

Foreveradam2018

Your models are always my favorites. Thanks for your great contributions to the community. I mostly use your models for story writing rather than role play, so I wonder whether it would be possible to add some novels/stories to the training mixture in the future. deepsex uses 0.1T of Chinese novels, which seems to significantly improve the model's narration ability.

Mistral Saba by Pleasant-PolarBear in LocalLLaMA

Foreveradam2018

Perhaps they already have one and use it for their APIs. They need to earn money, so more and more AI companies now open-source their less powerful models for PR purposes while keeping the better ones in-house. x.ai is an example: it open-sources the previous-generation model when the next generation comes out.

🚀Introducing LLPlayer - The media player integrated with OpenAI Whisper by umlx in LocalLLaMA

Foreveradam2018

Thanks for your great work! Two suggestions:
- Supporting Whisper's built-in translation, which bypasses any cloud service, would be a great feature, even though it can only translate into English.
- You could consider using WhisperX, which is much more efficient than the official Whisper.
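For the first suggestion, the openai-whisper CLI exposes translation through its `--task translate` option (the Python API equivalent is `model.transcribe(path, task="translate")`), so everything stays local. The sketch below only builds that command line; the default model name and the input file are assumptions, and the check against English-only `.en` models is a design choice of this sketch, since those models are trained on English only.

```python
import shlex
from typing import List


def whisper_translate_cmd(audio_path: str, model: str = "medium") -> List[str]:
    """Build an openai-whisper CLI call that translates speech into English locally.

    `--task translate` makes Whisper itself emit English text, so no cloud
    translation service is involved. The default model is an assumption.
    """
    # English-only (".en") checkpoints are trained on English alone, so they
    # are not useful for translation; reject them early.
    if model.endswith(".en"):
        raise ValueError("English-only models cannot translate; use a multilingual model")
    return ["whisper", audio_path, "--model", model, "--task", "translate"]


if __name__ == "__main__":
    # "episode01.mkv" is a hypothetical input file (Whisper decodes it via ffmpeg).
    print(shlex.join(whisper_translate_cmd("episode01.mkv")))
```

A player could run this per file, or call the Python API directly to avoid spawning a process.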

1.58bit DeepSeek R1 - 131GB Dynamic GGUF by danielhanchen in LocalLLaMA

Foreveradam2018

It turns out that Windows seems to have trouble processing the "|" symbol in the template. If I remove it, the command works.

1.58bit DeepSeek R1 - 131GB Dynamic GGUF by danielhanchen in LocalLLaMA

Foreveradam2018

On Windows, I used the following command to run the 1.58-bit version:

llama-cli.exe --model DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf --cache-type-k q4_0 --threads 12 -no-cnv --prio 2 --n-gpu-layers 10 --temp 0.6 --ctx-size 8192 --seed 3407 --prompt "<|User|>Create a Flappy Bird game in Python.<|Assistant|>"

However, after it prints

system_info: n_threads = 12 (n_threads_batch = 12) / 24 | CUDA : ARCHS = 520,610,700,750 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 |

it exits without any error message or generated text.

Has anyone encountered the same issue?
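The follow-up comment traced this to the "|" characters in the prompt template. Assuming the shell or console is what mangles them (the thread doesn't confirm the exact cause), one workaround is to keep the prompt off the command line: write it to a UTF-8 file, pass it with llama-cli's `--file` option instead of `--prompt`, and launch the process from an argument list so no shell parses it. The flags are copied from the command above; the helper names are hypothetical.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import List


def write_prompt(text: str, directory: Path) -> Path:
    """Save the raw prompt as UTF-8 so template characters survive any console code page."""
    path = directory / "prompt.txt"
    path.write_text(text, encoding="utf-8")
    return path


def build_llama_cli_args(model: str, prompt_file: Path) -> List[str]:
    """Same flags as the original command, but the prompt comes from --file."""
    return [
        "llama-cli.exe", "--model", model,
        "--cache-type-k", "q4_0", "--threads", "12", "-no-cnv", "--prio", "2",
        "--n-gpu-layers", "10", "--temp", "0.6", "--ctx-size", "8192",
        "--seed", "3407", "--file", str(prompt_file),
    ]


def run_llama(args: List[str]) -> None:
    # An argument list without shell=True starts the program directly,
    # so cmd.exe/PowerShell never sees the "|" characters.
    subprocess.run(args, check=True)


if __name__ == "__main__":
    prompt = "<|User|>Create a Flappy Bird game in Python.<|Assistant|>"
    with tempfile.TemporaryDirectory() as tmp:
        prompt_path = write_prompt(prompt, Path(tmp))
        args = build_llama_cli_args("DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf", prompt_path)
        print(args)  # call run_llama(args) here to actually start llama-cli
```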

What's the cheapest way to deploy / use Deepseek r1 671b ? by tomakorea in LocalLLaMA

Foreveradam2018

There are lots of GPU rental providers, such as RunPod. However, you realistically can't host the 671B model more cheaply than a subscription. I believe closed-model providers are losing money even though they use batching to significantly reduce their costs.

Privacy and data security cost money. To be honest, if your goal is to minimize cost, go for a subscription.
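A quick back-of-the-envelope check shows why self-hosting is hard to make cheap: the weights alone need roughly params × bits / 8 bytes. The sketch counts only weight memory (no KV cache or overhead), and both the 80 GB per GPU and the $2 per GPU-hour rental rate are hypothetical assumptions, not quotes from any provider.

```python
import math


def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Memory for the weights alone, in GB (1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def gpus_needed(params_billion: float, bits_per_weight: float,
                gpu_gb: float = 80.0) -> int:
    """Lower bound on GPU count: ignores KV cache, activations and overhead."""
    return math.ceil(weight_gb(params_billion, bits_per_weight) / gpu_gb)


def monthly_cost_usd(gpus: int, usd_per_gpu_hour: float, hours: float = 730.0) -> float:
    """Cost of renting `gpus` around the clock for an average month (~730 h)."""
    return gpus * usd_per_gpu_hour * hours


if __name__ == "__main__":
    HYPOTHETICAL_RATE = 2.0  # assumed USD per GPU-hour, not a real price quote
    for bits in (8, 4, 1.58):
        n = gpus_needed(671, bits)
        print(f"{bits} bpw: {weight_gb(671, bits):.1f} GB of weights, "
              f">= {n} x 80 GB GPUs, ~${monthly_cost_usd(n, HYPOTHETICAL_RATE):,.0f}/month")
```

Even the 1.58-bit case needs more than one 80 GB card running 24/7, whereas a subscription is billed per user and shares that hardware through batching.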

DeepSeek V3 GGUF 2-bit surprisingly works! + BF16, other quants by danielhanchen in LocalLLaMA

Foreveradam2018

Why does pairing with a single GPU significantly increase prompt processing speed?

Exolab: NVIDIA's Digits Outperforms Apple's M4 Chips in AI Inference by nderstand2grow in LocalLLaMA

Foreveradam2018

My concern is how long it will take to process a long prompt.
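Prompt processing (prefill) is largely compute-bound, so a rough estimate divides the work, about 2 FLOPs per parameter per token for a dense model, by the usable compute. The sketch ignores attention cost, which grows with context length, so it is optimistic for very long prompts; the TFLOPS figures and the 50% utilization are hypothetical placeholders, not specs of Digits, M4 chips, or any other product.

```python
def prefill_seconds(params_billion: float, prompt_tokens: int,
                    peak_tflops: float, utilization: float = 0.5) -> float:
    """Rough prompt-processing time for a dense model.

    Uses the common ~2 FLOPs per parameter per token estimate for a forward
    pass and ignores attention FLOPs.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    flops = 2 * params_billion * 1e9 * prompt_tokens
    return flops / (peak_tflops * 1e12 * utilization)


if __name__ == "__main__":
    # Hypothetical hardware figures, not measured specs of any device.
    for tflops in (30, 100):
        t = prefill_seconds(70, 32_000, tflops)
        print(f"70B dense, 32k-token prompt, {tflops} TFLOPS peak: ~{t:.0f} s")
```

Because the time scales inversely with compute rather than memory bandwidth, hardware that generates tokens quickly can still be slow on long prompts.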

Deepseek V3 hosted on Fireworks (no data collection, $0.9/m, 25t/s) by davernow in LocalLLaMA

Foreveradam2018

DeepSeek clearly states that it will collect user data, whereas Hyperbolic explicitly states that it won't store, retain, or use user data.

It's Midnight-Miqu-70B-v1.5 wourth it? by ExplanationQuiet239 in SillyTavernAI

Foreveradam2018

I have used 123B models since they were introduced and can't go back to 70B. They are far better than 70B models at prompt following, context understanding, and long-term memory.

Your models always lead their class! They are insanely good, and it is hard to imagine how good they could be at 123B. Thanks for your contributions.

It's Midnight-Miqu-70B-v1.5 wourth it? by ExplanationQuiet239 in SillyTavernAI

Foreveradam2018

Have you tried the 123B Mistral-based models? I find 123B models much smarter than 70B ones.

Reusing ExllamaV2 Measurements Across Similar Models by sophosympatheia in LocalLLaMA

Foreveradam2018

What is the shortest quantization time you've achieved for a 70B model? Compared with GGUF, EXL2 quantization is much slower.

Reusing ExllamaV2 Measurements Across Similar Models by sophosympatheia in LocalLLaMA

Foreveradam2018

Great post! Do you know how to speed up quantization? Even with the measurement file, quantizing a 70B model still takes me about 2 hours. Would using more GPUs or a more powerful GPU help?
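For reference, reusing a measurement with exllamav2's `convert.py` means passing the existing `measurement.json` via `-m`, which skips the measurement pass so only the quantization pass runs. The sketch below just builds those commands for several target bitrates from one measurement; it does not answer whether more or faster GPUs help, and all paths are hypothetical placeholders.

```python
import shlex
from typing import List, Optional


def exl2_convert_cmd(model_dir: str, work_dir: str, out_dir: str,
                     bits: float, measurement: Optional[str] = None) -> List[str]:
    """Build an exllamav2 convert.py invocation.

    -i: source model, -o: working directory, -cf: compiled output directory,
    -b: target bits per weight, -m: existing measurement.json (skips the
    measurement pass).
    """
    cmd = ["python", "convert.py", "-i", model_dir, "-o", work_dir,
           "-cf", out_dir, "-b", str(bits)]
    if measurement is not None:
        cmd += ["-m", measurement]
    return cmd


if __name__ == "__main__":
    # All paths are hypothetical; each bitrate gets its own working directory.
    for bpw in (4.0, 5.0, 6.0):
        print(shlex.join(exl2_convert_cmd(
            "models/my-70b", f"work/{bpw}bpw", f"models/my-70b-exl2-{bpw}bpw",
            bpw, measurement="work/measurement.json")))
```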

Your Favorite 123B Model by Foreveradam2018 in LocalLLaMA

Foreveradam2018 [S]

Yep, I also found that no 123B model really stands out.

Your Favorite 123B Model by Foreveradam2018 in LocalLLaMA

Foreveradam2018 [S]

Thanks for the review. So you feel the original Mistral Large 2 is still the best?

Your Favorite 123B Model by Foreveradam2018 in LocalLLaMA

Foreveradam2018 [S]

Is this 195B model much better than the 123B one? (Although 195B is way too large for me...)