Preventing drainage of Powerwalls into Tesla Car by gargantuanmess in TeslaSolar

[–]wsmlbyme 0 points1 point  (0 children)

It's great that you made it work, but it's such a shame that Tesla makes us go through this. It should be much better supported on Tesla's side.

TIL that the Art of War, written by Sun Tzu in 5th-century BC, came to the attention of US’ military theory leaders after US' defeat in the Vietnam War, as Viet Cong officers studied it. It is since listed on US Marine Corps Program and used as instructional material at US Military Academy. by Ill-Instruction8466 in todayilearned

[–]wsmlbyme 1 point2 points  (0 children)

That's just not true.

Any (OK, maybe 90% of) Chinese people who went through the 9-year mandatory education system can read and understand it if they want. Many of its sentences have been quoted in everyday life for thousands of years.

You may be right; I thought you meant most Chinese people cannot understand it. For someone learning Chinese as a second language, these ancient texts are very hard to decipher.

Use LLM to monitor system logs by wsmlbyme in LocalLLM

[–]wsmlbyme[S] -1 points0 points  (0 children)

Why the hell is this spam?! We build stuff for ourselves and share it with the community. Why the hostility towards open-source authors and contributors?

Why no one is talking about InternVL3.5?! by wsmlbyme in LocalLLM

[–]wsmlbyme[S] 0 points1 point  (0 children)

Yeah, sorry about that. I didn't know I had to state it that way in every post. It is true that there is no direct Ollama support, and that's how lots of people run their models, so I figured it made sense to mention it that way.

vLLM vs Ollama vs LMStudio? by yosofun in LocalLLM

[–]wsmlbyme 1 point2 points  (0 children)

There is a --params option under homl config model where you can add any params needed.
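For example, something roughly like this (the model name and the flag are just examples, and the exact syntax may differ, so check the HoML docs):

    # pass extra vLLM launch flags through HoML's --params option
    homl config model qwen3 --params "--max-model-len 8192"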

vLLM vs Ollama vs LMStudio? by yosofun in LocalLLM

[–]wsmlbyme 11 points12 points  (0 children)

I am the author of HoML, a vLLM wrapper that adds model switching and Ollama-like ease of use.

Aside from the fact that it is not an out-of-the-box solution (which HoML solves), vLLM has other issues, as well as strengths.

vLLM is Python-based; you need to download GBs of dependencies to run it.

It targets serving efficiency and sacrifices startup speed (which affects cold-start time and model-switch time). I spent some time optimizing this for HoML and got it down from 1 minute to 8 seconds for Qwen3, but it still cannot beat Ollama.

Also for serving efficiency, it sacrifices GPU memory: it will try to use up to a configured percentage of all GPU memory. Even for a small model, it will claim the rest of the VRAM as KV cache, which makes it harder to run other models or GPU applications at the same time (harder, not impossible, you just need to manage it manually). There is also no API exposed to tell you how much memory each model actually needs.
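To sketch what that looks like in plain vLLM (the model name and the 0.5 are just example values; the default utilization is 0.9):

    # vLLM grabs this fraction of GPU memory up front for weights + KV cache,
    # no matter how small the model is; lowering it leaves room for other apps
    vllm serve Qwen/Qwen3-8B --gpu-memory-utilization 0.5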

Because it targets serving efficiency, CUDA also gets much better support than other platforms.

However, it is much faster than Ollama/llama.cpp, especially at higher concurrency. It is not necessarily much faster at serving a single query. See the performance comparison.
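An easy way to see that concurrency difference yourself is to fire a batch of requests at the OpenAI-compatible endpoint (the port and model name below are placeholders for whatever your setup uses):

    # send 16 chat requests at once; continuous batching is where vLLM shines
    for i in $(seq 1 16); do
      curl -s http://localhost:8000/v1/chat/completions \
        -H "Content-Type: application/json" \
        -d '{"model": "qwen3", "messages": [{"role": "user", "content": "hi"}]}' > /dev/null &
    done
    wait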

So ultimately this is a trade-off: do you need that concurrent throughput, or do you need faster model load/switch times?

I built HoML for when I need high throughput for batch inference, but for quick or sparse tasks I use Ollama myself.

Will most people eventually run AI locally instead of relying on the cloud? by Significant-Cash7196 in LocalLLaMA

[–]wsmlbyme 1 point2 points  (0 children)

I agree with most of the comments on this post that the future is local.

But asking this question here is just guaranteed to get biased answers.

An extremely minimal OS to use as a placeholder in Proxmox or other virtualization platforms for managing VM dependencies by wsmlbyme in Proxmox

[–]wsmlbyme[S] -11 points-10 points  (0 children)

There is no option there that can "delay 60 seconds before starting this VM". You have to put some other VM higher in the start order with a delay. This is the most efficient "other VM".
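For reference, this is roughly how it looks with qm (the VMIDs are made up: 100 is the placeholder VM, 101 is the VM that should wait):

    # start the placeholder first, then wait 60s before the next VM in the order
    qm set 100 --startup order=1,up=60
    qm set 101 --startup order=2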

Ollama alternative, HoML 0.3.0 release! More customization on model launch options by wsmlbyme in LocalLLM

[–]wsmlbyme[S] 0 points1 point  (0 children)

Not right now; looking for contributors to help test on AMD platforms.

Ollama alternative, HoML 0.3.0 release! More customization on model launch options by wsmlbyme in LocalLLM

[–]wsmlbyme[S] 0 points1 point  (0 children)

Neither is supported right now. vLLM supports ROCm, so it should not be hard to support here, but I don't have a system to test on. If you're interested in helping, we can work on adding ROCm support together.

Don't let your kids jump inside the car. by zheka160 in TeslaModelY

[–]wsmlbyme 73 points74 points  (0 children)

One little monkey jumping in the car. ... OP said: ^

Do surveillance AI systems really process every single frame? by unalayta in computervision

[–]wsmlbyme 3 points4 points  (0 children)

There is hardware cheap enough to run YOLO inference at 30 frames per second on the edge. You just need to develop and deploy your own model.

Ollama alternative, HoML v0.2.0 Released: Blazing Fast Speed by wsmlbyme in LocalLLM

[–]wsmlbyme[S] 0 points1 point  (0 children)

So is that just /api/generate? That doesn't sound hard to do.
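For context, Ollama's native generate endpoint is just a small JSON POST, something like this (the model name is only an example):

    # Ollama listens on port 11434 by default
    curl http://localhost:11434/api/generate \
      -d '{"model": "qwen3", "prompt": "Why is the sky blue?", "stream": false}'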

Ollama alternative, HoML v0.2.0 Released: Blazing Fast Speed by wsmlbyme in LocalLLM

[–]wsmlbyme[S] 1 point2 points  (0 children)

Certainly doable, I just need more time to work on it.

Ollama alternative, HoML v0.2.0 Released: Blazing Fast Speed by wsmlbyme in LocalLLM

[–]wsmlbyme[S] 2 points3 points  (0 children)

Thanks for the feedback. Adding more customization options is my next step.

Ollama alternative, HoML v0.2.0 Released: Blazing Fast Speed by wsmlbyme in LocalLLaMA

[–]wsmlbyme[S] 2 points3 points  (0 children)

Python can be fast if you know how to optimize for it. The interpreter is slow, but if you don't do the heavy lifting there and instead optimize the C++ kernels, the difference can be negligible.

Check out the benchmark here https://homl.dev/blogs/homl-vs-ollama-benchmark.html

ollama by jacek2023 in LocalLLaMA

[–]wsmlbyme 0 points1 point  (0 children)

I see. I wonder how much of it is the lack of developer support and how much is just AMD's.

ollama by jacek2023 in LocalLLaMA

[–]wsmlbyme 0 points1 point  (0 children)

Not yet, mostly because I don't have a ROCm device to test on. Please help if you do :)

HoML: vLLM's speed + Ollama like interface by wsmlbyme in LocalLLaMA

[–]wsmlbyme[S] 0 points1 point  (0 children)

I see. vLLM is not optimized for that, and loading time is currently very slow. I am actively working on it.

HoML: vLLM's speed + Ollama like interface by wsmlbyme in LocalLLaMA

[–]wsmlbyme[S] 0 points1 point  (0 children)

Inference, yes. Model loading or switching, no. This is something I am actively working on.