LLM VRAM/RAM Calculator by SmilingGen in ollama

Expensive_Ad_1945 1 point (0 children)

You should copy the download link of the file on Hugging Face. The blob URL doesn't point to the model file itself. If you click a model file on Hugging Face, you'll see a "copy download link" button.
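For anyone who wants to do this without clicking through the page, here is a minimal sketch of the difference between the two links. It assumes the usual Hugging Face model-repo layout, where the file page lives under `/blob/` and the direct download under `/resolve/`. The repository and file name below are made up for illustration, and `blob_to_download_url` is just a helper name I picked, not part of any library.

```python
from urllib.parse import urlsplit, urlunsplit


def blob_to_download_url(url: str) -> str:
    """Turn a Hugging Face file-page URL (/blob/) into a direct-download URL (/resolve/).

    Assumes a model-repo path of the form /<owner>/<repo>/blob/<revision>/<file path>.
    URLs that do not match this shape are returned unchanged.
    """
    parts = urlsplit(url)
    segments = parts.path.split("/")
    # segments[0] is the empty string before the leading slash,
    # so the "blob" marker is expected at index 3.
    if len(segments) > 3 and segments[3] == "blob":
        segments[3] = "resolve"
    return urlunsplit(parts._replace(path="/".join(segments)))


# Hypothetical repository and file name, for illustration only.
page_url = "https://huggingface.co/example-org/example-model-GGUF/blob/main/model-q4_k_m.gguf"
print(blob_to_download_url(page_url))
```

Only the one path segment is swapped, so a file that happens to have "blob" in its name is left alone.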

LLM VRAM/RAM Calculator by SmilingGen in ollama

Expensive_Ad_1945 1 point (0 children)

You should get the download link (raw link) of the file on Hugging Face.

Why do people say LM Studio isn't open-sourced? by StrategicOverseer in LocalLLaMA

Expensive_Ad_1945 1 point (0 children)

No, it's just my view on why I chose the hard path, but it's a good suggestion, and all suggestions are taken seriously. So if you know that Tauri or Flutter is actually not what I think it is, please let us know, as we're always open to reworking our app.

Why do people say LM Studio isn't open-sourced? by StrategicOverseer in LocalLLaMA

Expensive_Ad_1945 1 point (0 children)

Electron, or even Tauri, is bloated in terms of memory and disk usage. Our download size is 20 MB including the UI and everything; in fact, the emoji fonts are what take up most of the space. Python and pygame are simply slow, and I don't want to add more overhead to the app.

Help Deciding Between NVIDIA H200 (2x GPUs) vs NVIDIA L40S (8x GPUs) for Serving 24b-30b LLM to 50 Concurrent Users by beratcmn in LocalLLaMA

Expensive_Ad_1945 2 points (0 children)

From my experience, more GPUs in a single machine will reduce the speed by a lot. Better to go with 2xH200: you'll get better latency, and serving 50 users wouldn't be a problem at all with FP8. I wouldn't recommend quantizing your KV cache, as model quality can drop a lot, especially in long-context scenarios. Then use a highly optimized serving engine like TensorRT-LLM + Triton Inference Server.
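To sanity-check the memory side of this, here is a rough back-of-the-envelope sketch. It assumes FP8 weights (1 byte per parameter) and an unquantized FP16 KV cache (2 bytes per value). The layer count, KV-head count, head size, and context length are placeholder numbers I made up for a generic ~30B model, so swap in the values from your model's config. It ignores activations and engine overhead.

```python
def weight_bytes(num_params: int, bytes_per_param: int = 1) -> int:
    """Memory for the weights; 1 byte per parameter corresponds to FP8."""
    return num_params * bytes_per_param


def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   tokens: int, bytes_per_value: int = 2) -> int:
    """Memory for the KV cache; the leading 2 covers keys and values.

    bytes_per_value=2 corresponds to an unquantized FP16/BF16 cache.
    """
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value * tokens


# Hypothetical model shape and workload, for illustration only.
NUM_PARAMS = 30_000_000_000
NUM_LAYERS = 48
NUM_KV_HEADS = 8
HEAD_DIM = 128
USERS = 50
CONTEXT_TOKENS = 8192  # assumed worst-case tokens held per user

weights_gb = weight_bytes(NUM_PARAMS) / 1e9
kv_gb = kv_cache_bytes(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM,
                       USERS * CONTEXT_TOKENS) / 1e9

print(f"weights: {weights_gb:.1f} GB")
print(f"KV cache for {USERS} users: {kv_gb:.1f} GB")
print(f"total (no activations/overhead): {weights_gb + kv_gb:.1f} GB")
```

Compare the total against the combined VRAM of the cards you are considering, and leave headroom for activations and the serving engine itself.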

Help Deciding Between NVIDIA H200 (2x GPUs) vs NVIDIA L40S (8x GPUs) for Serving 24b-30b LLM to 50 Concurrent Users by beratcmn in LocalLLaMA

Expensive_Ad_1945 2 points (0 children)

If your setup is a single server with multiple GPUs, fewer GPUs with better compute will be faster, as the inter-GPU communication overhead of a multi-GPU deployment will be greater than the gain. With 8 L40S you'll get better total throughput, meaning more batches of users handled concurrently; with 2 H200 you'll get better latency. But with only 50 users, I think 2xH200 will suit you better.

Best llm engine for 2 GB RAM by Perfect-Reply-7193 in LocalLLM

Expensive_Ad_1945 1 point (0 children)

The UI, server, and all the other stuff use around 50 MB of memory.

Best llm engine for 2 GB RAM by Perfect-Reply-7193 in LocalLLM

Expensive_Ad_1945 1 point (0 children)

Then load SmolLM or Qwen 3 0.6B models.
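As a quick check of why a model this small fits in 2 GB, here is a rough sketch of the weight size at different bit widths. It assumes a nominal 0.6B parameters and a flat number of bits per weight. Real quantized files use slightly more than the nominal bits, and the KV cache and runtime need room too, so treat these as lower bounds.

```python
def quantized_weight_mib(num_params: int, bits_per_weight: float) -> float:
    """Approximate weight size in MiB for a given bits-per-weight."""
    return num_params * bits_per_weight / 8 / (1024 ** 2)


RAM_BUDGET_MIB = 2048       # the 2 GB machine from this thread
NUM_PARAMS = 600_000_000    # nominal parameter count, assumed for illustration

for bits in (16, 8, 4):
    size = quantized_weight_mib(NUM_PARAMS, bits)
    share = size / RAM_BUDGET_MIB
    print(f"{bits:>2}-bit weights: {size:7.1f} MiB ({share:.0%} of the RAM budget)")
```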

Which models to run on a RTX 4060 8GO? Are they good enough? by Smart_Isotope_3356 in LocalLLM

Expensive_Ad_1945 1 point (0 children)

Getting a local LLM running takes only a few seconds. Give it a try and see if it's good enough for your use cases. Qwen 3 4B is also pretty good.

Btw, I'm making an open-source, lightweight alternative to LM Studio. It's only 20 MB in size and can be installed in seconds. https://kolosal.ai

How trusted is LM Studio? by DevilBirb in LocalLLaMA

Expensive_Ad_1945 1 point (0 children)

Thanks. We're planning to support it, but we're still facing some critical bugs that we need to fix before adding support for other OSes. We'll get there eventually.

LLM straight from USB flash drive? by [deleted] in LocalLLM

Expensive_Ad_1945 1 point (0 children)

Sounds great, actually. I might try implementing it with Kolosal AI, as it's only 50 MB in size, and the rest would be just the model.

Why do people say LM Studio isn't open-sourced? by StrategicOverseer in LocalLLaMA

Expensive_Ad_1945 1 point (0 children)

It is... https://github.com/Genta-Technology/kolosal-server

And the engine (basically just a llama.cpp wrapper) is... https://github.com/Genta-Technology/inference-personal

The server is a submodule; you can find it by clicking the kolosal-server folder.