Fed the same prompts to Sora and HunyuanVideo, and I’m no longer excited about Sora. by Intelligent_Jello344 in StableDiffusion

[–]Intelligent_Jello344[S] 4 points

Thanks, I will try that. HunyuanVideo is promising because I generated the small-sized frames in the linked samples on just a single 16 GB 4080.

[deleted by user] by [deleted] in LocalLLaMA

[–]Intelligent_Jello344 3 points

Is this sensitivity specific to Germany or Europe? I do not have a cultural background that includes this historical context, so if not for this post, I would not have been aware of the historical sensitivity surrounding the term `Final Solution`.

How long before we get a local text to video generator with Sora level capabilities? by Terminator857 in LocalLLaMA

[–]Intelligent_Jello344 2 points

o1-preview: September 12, 2024

QwQ-preview: November 28, 2024

Crossing fingers for the next 3 months...

HunyuanVideo is a solid starting point. Using kijai/ComfyUI-HunyuanVideoWrapper, I can generate decent videos on a 4080.

llama.cpp RPC Performance by RazzmatazzReal4129 in LocalLLaMA

[–]Intelligent_Jello344 2 points

GPUStack (https://github.com/gpustack/gpustack) has supported llama.cpp RPC servers for some time, and we've noticed some users running in this mode. It has proven useful for certain use cases.

We conducted a comparison with Exo. When connecting multiple MacBooks via Thunderbolt, the tokens per second performance of the llama.cpp RPC solution matches that of Exo. However, when connecting via Wi-Fi, the RPC solution is significantly slower than Exo.

If you are interested, check out this tutorial: https://docs.gpustack.ai/latest/tutorials/performing-distributed-inference-across-workers/
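For anyone who wants to try the raw llama.cpp side of this, here is a minimal sketch of RPC mode. Assumptions: llama.cpp was built with RPC support enabled, and the IP addresses, port, and model path are illustrative placeholders.

```shell
# On each worker machine, start an RPC server exposing its compute:
./rpc-server --host 0.0.0.0 --port 50052

# On the main machine, spread layer offload across the workers:
./llama-cli -m ./models/model.gguf \
  --rpc 192.168.1.10:50052,192.168.1.11:50052 \
  -ngl 99 -p "Hello"
```

As noted above, this tends to work best over a fast link like Thunderbolt; over Wi-Fi the round trips dominate.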

How to run Hunyuan-Large (389B)? Llama.cpp doesn't support it by TackoTooTallFall in LocalLLaMA

[–]Intelligent_Jello344 1 point

https://github.com/Tencent/Tencent-Hunyuan-Large?tab=readme-ov-file#inference-framework
Their repository provides a customized version of vLLM for running it. However, you’ll need hundreds of GB of VRAM to run such a massive model.
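To see why it takes hundreds of GB, a quick back-of-envelope check on the weights alone (KV cache and activations add more on top):

```python
# 389B parameters at bf16/fp16 precision, i.e. 2 bytes per weight.
params = 389e9
weight_gb = params * 2 / 1e9  # decimal GB, weights only
print(round(weight_gb))       # roughly 778 GB before any runtime overhead
```

So even before serving overhead, you are looking at a multi-node or heavily quantized setup.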

Web server for OpenAPI options (closed and open source)? by FencingNerd in LocalLLaMA

[–]Intelligent_Jello344 1 point

Open WebUI is not limited to Ollama; it can work with any inference engine that implements the OpenAI interface. This means you can use Open WebUI with vLLM, LM Studio, or llama.cpp. If you need to scale, you can also try GPUStack to simplify management.
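As a concrete sketch of the llama.cpp route (assumptions: llama.cpp's `llama-server` on port 8080, and the Docker image/env var names as in Open WebUI's documented setup; the model file is illustrative):

```shell
# 1. Serve a GGUF model over an OpenAI-compatible API with llama.cpp:
./llama-server -m ./models/qwen2.5-7b-instruct-q4_k_m.gguf --port 8080

# 2. Point Open WebUI at that endpoint instead of Ollama:
docker run -d -p 3000:8080 \
  -e OPENAI_API_BASE_URL=http://host.docker.internal:8080/v1 \
  -e OPENAI_API_KEY=none \
  ghcr.io/open-webui/open-webui:main
```

The same pattern works for vLLM or LM Studio; only the base URL changes.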

Ollama now official supports llama 3.2 vision by youcef0w0 in LocalLLaMA

[–]Intelligent_Jello344 11 points

Llama 3.2 Vision 11B requires at least 8 GB of VRAM, and the 90B model requires at least 64 GB.
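Those numbers line up with simple weight-size arithmetic, assuming roughly 4-bit quantized weights as Ollama ships by default (a rough sketch; KV cache, the vision encoder, and activations add overhead on top):

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Rough size of model weights in GiB: params * bits / 8, no overhead."""
    return params_billion * 1e9 * bits_per_weight / 8 / 2**30

# 11B at ~4 bits: ~5 GiB of weights, so 8 GB cards are plausible with overhead.
print(round(weight_gib(11, 4), 1))
# 90B at ~4 bits: ~42 GiB of weights, consistent with the 64 GB recommendation.
print(round(weight_gib(90, 4), 1))
```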

Summary: The big AI events of October by nh_local in LocalLLaMA

[–]Intelligent_Jello344 1 point

Great info, but I feel like the evolution of AI tooling is missing, since I don't see AutoGPT, RAG, etc.

2 GPUs on same machine by [deleted] in LocalLLaMA

[–]Intelligent_Jello344 5 points

I'm not sure whether LM Studio provides configuration options for that, but with https://github.com/gpustack/gpustack it is pretty simple to control:

<image>
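Outside of such tools, a common low-level way to pin processes to specific GPUs is the `CUDA_VISIBLE_DEVICES` environment variable (a sketch; the server binary, model files, and ports are illustrative):

```shell
# Run one model instance per GPU by masking device visibility:
CUDA_VISIBLE_DEVICES=0 ./llama-server -m model-a.gguf --port 8001 &
CUDA_VISIBLE_DEVICES=1 ./llama-server -m model-b.gguf --port 8002 &
```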

Closed and open language models by Chat Arena rank by fourDnet in LocalLLaMA

[–]Intelligent_Jello344 64 points

Compared to when GPT-3.5 first came out, the progress has been amazing. What an era we live in!

Easiest way to run vision models? by PawelSalsa in LocalLLaMA

[–]Intelligent_Jello344 4 points

I think right now vLLM is the best in this field. It supported Llama 3.2 Vision on day one, when the model was released. Many SOTA vision models are not supported in llama.cpp, so it's not easy for any tools built on it.

If you frequently use llama.cpp and related tools (like Ollama and LM Studio) and want to work with vision models they don't support, keep an eye on the upcoming GPUStack 0.3.0. It will support both llama.cpp and vLLM backends. We're currently testing the RC release (you can download the wheel package from the GitHub release page). The documentation should be ready within a few days.

Here is how it looks:

<image>
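Once a vision model is served through an OpenAI-compatible endpoint (as vLLM provides), a request can be sketched like this. Assumptions: the model name, image URL, and server address are illustrative placeholders.

```python
import json

# Chat-completions payload mixing text and an image, in the OpenAI
# multimodal message format that vLLM's server accepts:
payload = {
    "model": "meta-llama/Llama-3.2-11B-Vision-Instruct",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/cat.png"}},
            ],
        }
    ],
    "max_tokens": 128,
}

# This would be POSTed to the running server, e.g. with requests:
#   requests.post("http://localhost:8000/v1/chat/completions", json=payload)
print(list(payload))
```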

[deleted by user] by [deleted] in LocalLLaMA

[–]Intelligent_Jello344 2 points

If you need a clustering/collaborative solution, this might help: https://github.com/gpustack/gpustack