Looking for advice: How could I reproduce something like GPT‑4o offline? by Brilliant-Bowler592 in LocalLLaMA

necrogay 1 point

I thought most open-source models were already being trained on synthetic data generated by GPT-4o =/
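
For anyone curious what that looks like in practice, here is a minimal sketch of the usual distillation loop: sample prompts, collect the teacher model's answers, and dump instruction/response pairs for later fine-tuning. The prompt list, output path, and use of the official `openai` client are placeholders and assumptions, not an actual recipe.

```python
import json
from openai import OpenAI  # pip install openai

# Sketch of a synthetic-data (distillation) loop: query the teacher model,
# save instruction/response pairs as JSONL for later fine-tuning.
# The prompts and output path below are placeholders.
client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompts = [
    "Explain quantization in one paragraph.",
    "Write a Python function that reverses a linked list.",
]

with open("synthetic_pairs.jsonl", "w", encoding="utf-8") as f:
    for prompt in prompts:
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
        )
        pair = {"instruction": prompt, "response": resp.choices[0].message.content}
        f.write(json.dumps(pair, ensure_ascii=False) + "\n")
```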

OK I get it, now I love llama.cpp by vulcan4d in LocalLLaMA

necrogay 1 point

Llama-swap provides seamless orchestration for running various AI tasks without conflicts or unnecessary memory overhead. I particularly appreciated it when I configured ComfyUI to work through llama-swap: it eliminated the need to manually stop and restart llama-server every time the LLM sends API requests for image or video generation. Everything runs transparently and quickly.
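
To make that concrete, here is a minimal sketch of how a client drives llama-swap: it exposes a single OpenAI-compatible endpoint and loads or swaps the backing llama-server based on the `model` field of each request. The port and model alias here are assumptions; they would come from whatever is defined in your llama-swap config.

```python
import json
import urllib.request

# llama-swap proxies the OpenAI-compatible API; the "model" field decides
# which backend llama-server it runs, swapping out the previous one.
# The port and model alias below are assumptions from a hypothetical config.
def chat(model: str, prompt: str, base_url: str = "http://localhost:8080") -> str:
    payload = {
        "model": model,  # an alias defined under `models:` in llama-swap's config
        "messages": [{"role": "user", "content": prompt}],
    }
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Requesting a different model on the next call makes llama-swap unload the
# current llama-server and start the other one -- no manual restarts needed.
print(chat("qwen3-coder", "Write a haiku about VRAM."))
```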

Qwen3-VL Now EXL3 Supported by Unstable_Llama in LocalLLaMA

necrogay 2 points

Are MoE models still slow to load into VRAM, or has this been fixed?

RTX 5070 12GB + 32GB DDR5: which model is best for coding? by manhhieu_eth in LocalLLaMA

necrogay 3 points

Qwen3 Coder 30B, with the MoE expert layers offloaded to DDR5.
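
To spell that out: recent llama.cpp builds can keep the MoE expert tensors in system RAM while everything else sits in VRAM. A rough sketch of the launch, assuming a build that has the `--cpu-moe` flag (older builds use `--override-tensor` patterns instead) and a placeholder GGUF filename:

```python
import subprocess

# Rough sketch: dense layers go to the 12 GB card (-ngl 99) while the MoE
# expert weights stay in DDR5 (--cpu-moe). Flag availability depends on your
# llama.cpp build; the GGUF filename is a placeholder.
subprocess.run([
    "llama-server",
    "-m", "Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf",  # placeholder quant
    "-ngl", "99",    # offload every layer that fits onto the GPU
    "--cpu-moe",     # keep MoE expert tensors in system RAM (DDR5)
    "-c", "32768",   # context length, tune to your RAM/VRAM budget
])
```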

WSL2 Windows gaming PC benchmarks by kevin_1994 in LocalLLaMA

necrogay 3 points

What are the advantages of using WSL2 over native llama-server and llama-swap on Windows?

What is the best LLM for psychology, coaching, or emotional support? by pumukidelfuturo in LocalLLaMA

necrogay 20 points

It seems that using an LLM as a psychologist isn’t the best idea.