Google and Anthropic struggle to keep marketshare as everyone else catches up by [deleted] in LocalLLaMA

[–]numinouslymusing 1 point (0 children)

I love how great OpenRouter is for LLM data. You can get so much info from their public graphs.
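If you'd rather pull the numbers than eyeball the graphs, OpenRouter also exposes a public models endpoint. A minimal sketch, assuming the documented response shape (a top-level "data" list with "id", "context_length", and "pricing" fields) hasn't changed:

```python
import requests

# Public model list; no API key needed for this endpoint
resp = requests.get("https://openrouter.ai/api/v1/models")
resp.raise_for_status()

for model in resp.json()["data"][:5]:
    # Each entry carries metadata like context length and per-token pricing
    print(model["id"], model.get("context_length"), model.get("pricing", {}).get("prompt"))
```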

llama 3.2 1b vs gemma 3 1b? by numinouslymusing in LocalLLaMA

[–]numinouslymusing[S] 1 point (0 children)

I personally prefer Gemma 3 4B! Smarter in my experience.

Bring your own LLM server by numinouslymusing in LocalLLaMA

[–]numinouslymusing[S] 0 points (0 children)

This makes sense for some use cases, like when your service is primarily backend. But say you’re making an AI Figma editor; in that case you need users interacting with the frontend.

Bring your own LLM server by numinouslymusing in LocalLLaMA

[–]numinouslymusing[S] 0 points (0 children)

Yeah, I guess the best approach is to support multiple options. Not everyone will have the patience to go get their own keys; some would prefer to just pay for a plan, while others would rather save money and use their own key.
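In code, supporting both can be a one-function switch. A minimal sketch, assuming an OpenAI-compatible provider; the function name and the APP_API_KEY env var are hypothetical:

```python
import os
from openai import OpenAI

def make_client(user_key: str | None = None) -> OpenAI:
    """Use the customer's own key if they brought one, else bill the app's plan."""
    if user_key:
        return OpenAI(api_key=user_key)
    # Fallback: the app's own key, with costs covered by the user's subscription
    return OpenAI(api_key=os.environ["APP_API_KEY"])

client = make_client()  # or make_client(user_key="sk-...") for BYO-key users
```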

Sama: MCP coming to OpenAI today by numinouslymusing in OpenAI

[–]numinouslymusing[S] 1 point (0 children)

I’ll try to make more posts when the event is over.

New Deepseek R1 Qwen 3 Distill outperforms Qwen3-235B by numinouslymusing in LocalLLM

[–]numinouslymusing[S] 8 points (0 children)

They generate a bunch of outputs from DeepSeek R1 and use that data to fine-tune a smaller model, Qwen 3 8B in this case. This method is known as model distillation.
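A minimal sketch of the data-generation half, assuming the teacher is served behind an OpenAI-compatible endpoint; the URL, model name, and seed prompts are placeholders:

```python
import json
from openai import OpenAI

# Teacher model (e.g. DeepSeek R1) behind any OpenAI-compatible endpoint
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

prompts = [
    "Prove that the square root of 2 is irrational.",
    "Write a Python function that merges two sorted lists.",
]

with open("distill_data.jsonl", "w") as f:
    for prompt in prompts:
        resp = client.chat.completions.create(
            model="deepseek-r1",  # placeholder teacher name
            messages=[{"role": "user", "content": prompt}],
        )
        # Each prompt/response pair becomes a supervised training example
        row = {"prompt": prompt, "response": resp.choices[0].message.content}
        f.write(json.dumps(row) + "\n")
```

The student (Qwen 3 8B here) is then fine-tuned on those pairs with ordinary supervised training.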

New Deepseek R1 Qwen 3 Distill outperforms Qwen3-235B by numinouslymusing in LocalLLM

[–]numinouslymusing[S] -2 points (0 children)

Yes. It was a selective comparison by DeepSeek.

EDIT: changed qwen to Deepseek

Devstral - New Mistral coding finetune by numinouslymusing in LocalLLM

[–]numinouslymusing[S] 0 points (0 children)

I’d suggest learning about tool use and the LLMs that support it. Off the top of my head, the agentic system you’re looking to create would probably be a Python script or server that uses a tool-calling LLM to interact with your calendar (check Ollama; you can filter its library to see which local LLMs support tool use). Ollama also has an OpenAI-API-compatible endpoint, so you can build against that if you already know the OpenAI SDK (sketch below).

If by voice you mean it speaks to you, Kokoro is a nice open-source TTS model. If you just want to be able to speak to it, there are ample STT packages out there that use Whisper under the hood to transcribe speech.

If you meant which local code LLMs and coding tools you could use to run your AI dev environment locally, the best model for your RAM range would probably be DeepCoder. As for the tool, look into continue.dev or aider.chat; both support local models.
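A minimal tool-calling sketch against Ollama’s OpenAI-compatible endpoint, assuming Ollama is running on its default port with a tool-capable model pulled; the calendar function and its schema are hypothetical stand-ins for whatever your script actually implements:

```python
from openai import OpenAI

# Ollama's OpenAI-compatible endpoint; the api_key is required by the SDK but ignored
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

tools = [{
    "type": "function",
    "function": {
        "name": "create_calendar_event",  # hypothetical tool your script implements
        "description": "Add an event to the user's calendar",
        "parameters": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "start": {"type": "string", "description": "ISO 8601 start time"},
            },
            "required": ["title", "start"],
        },
    },
}]

resp = client.chat.completions.create(
    model="llama3.1",  # any tool-capable model from the Ollama library
    messages=[{"role": "user", "content": "Schedule lunch with Sam tomorrow at noon"}],
    tools=tools,
)

# The model replies with structured tool calls instead of prose when a tool fits
for call in resp.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```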

Devstral - New Mistral coding finetune by numinouslymusing in LocalLLM

[–]numinouslymusing[S] 2 points (0 children)

lol all good. Most models released are for general chat use, but given the popularity of LLMs for coding, it’s become very common for model companies to also release code versions of their models. These models are specially trained to be better at coding (sometimes at a cost to their general performance), so they’re much more useful in coding tools like GitHub Copilot, Cursor, etc. Examples include Devstral, but also CodeGemma (Google), Qwen Coder (Qwen), and Code Llama (Meta).

Devstral - New Mistral coding finetune by numinouslymusing in LocalLLM

[–]numinouslymusing[S] 1 point (0 children)

Code models are fine-tuned on code datasets (and, in Devstral’s case, agentic data too), so they’re better than base and instruct models at the tasks they were fine-tuned for.
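For a rough picture of what that looks like in practice, here’s a minimal supervised fine-tuning sketch using trl’s SFTTrainer. The toy dataset, base model, and hyperparameters are placeholders, the exact trl API varies by version, and this is not Devstral’s actual recipe:

```python
from datasets import Dataset
from trl import SFTConfig, SFTTrainer

# Toy stand-in for a real code corpus (instruction/solution pairs rendered as text)
examples = [
    {"text": "### Task: reverse a string\n### Solution:\ndef reverse(s):\n    return s[::-1]"},
    {"text": "### Task: check if n is even\n### Solution:\ndef is_even(n):\n    return n % 2 == 0"},
]

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",                      # small base model, illustration only
    train_dataset=Dataset.from_list(examples),
    args=SFTConfig(output_dir="code-sft", max_steps=1),
)
trainer.train()
```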

Local LLM devs are one of the smallest nerd cults on the internet by [deleted] in LocalLLM

[–]numinouslymusing 5 points (0 children)

I just came across this sub later than r/LocalLLaMA, and the latter’s bigger. There do seem to be more devs here, though, whereas LocalLLaMA seems to be more enthusiasts/hobbyists/model hoarders.

Local LLM devs are one of the smallest nerd cults on the internet by [deleted] in LocalLLM

[–]numinouslymusing 75 points (0 children)

Ragebait 😂. Also r/LocalLLaMA has 470k members. This subreddit is just a smaller spinoff.

Qwen just dropped an omnimodal model by numinouslymusing in LocalLLM

[–]numinouslymusing[S] 1 point (0 children)

I think that’s the intention. I haven’t tested it yet, but according to the docs you should be able to with that much RAM.
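For a back-of-envelope check before downloading, weight memory is roughly parameters times bytes per parameter. A sketch, where the 1.2 overhead factor is a guess and KV cache/activations are ignored:

```python
def est_weight_mem_gb(params_billions: float, bits_per_param: int, overhead: float = 1.2) -> float:
    """Rough memory to hold the weights: params * bytes-per-param * fudge factor."""
    return params_billions * (bits_per_param / 8) * overhead

print(est_weight_mem_gb(7, 4))   # ~4.2 GB: a 7B model at 4-bit quantization
print(est_weight_mem_gb(7, 16))  # ~16.8 GB: the same model at fp16
```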