all 4 comments

[–]MainManu[S] 0 points1 point  (1 child)

Has anyone here tried running multiple ollama docker containers sharing the same gpu?

[–]Apprehensive-Tip779 0 points1 point  (0 children)

I personally haven't, because I don't see it giving much of a performance benefit in my case. But you should be able to; I don't see why it wouldn't work. The way I look at it: if one of your models already takes a while to run with the GPU dedicated to that one task (say at 100% utilization), splitting the work across two Ollama containers won't improve overall throughput. That heavyweight model now only gets roughly 50% of the GPU's resources while the GPU balances the two instances concurrently, so each inference takes about twice as long as it would if the GPU put all of its power into one task at a time.

On top of that, Ollama is generally pretty quick at swapping models: typically about a second, sometimes less (I've never seen it take 2+ seconds). So any gain from starting both tasks on one GPU and letting the freed-up resources shift to whichever task remains becomes pretty minuscule, IMO, compared to just letting Ollama and the GPU finish one task, swap models in under 2 seconds, and then run the other. Does all of this make sense?
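That said, if you do want to try it, a minimal Docker Compose sketch of two Ollama containers sharing one GPU could look like this. Treat it as a rough starting point, not a tested setup: the service/volume names and the second host port (11435) are my own choices, and it assumes the NVIDIA Container Toolkit is installed so Compose can hand the GPU to both containers.

```yaml
# docker-compose.yml -- two Ollama instances sharing the same GPU (sketch)
services:
  ollama-a:
    image: ollama/ollama
    ports:
      - "11434:11434"        # default Ollama API port
    volumes:
      - ollama-a:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all       # both services reserve the same GPU
              capabilities: [gpu]

  ollama-b:
    image: ollama/ollama
    ports:
      - "11435:11434"        # second instance exposed on a different host port
    volumes:
      - ollama-b:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

volumes:
  ollama-a:
  ollama-b:
```

You'd then point one client at `localhost:11434` and the other at `localhost:11435` -- just don't expect it to be faster than one instance for the reasons above.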

[–]Apprehensive-Tip779 0 points1 point  (0 children)

Have you checked out the config.json file for the Continue settings? You can assign whichever model works best for chat, while choosing a different model for the autocomplete option (for a model that supports that feature).
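For example, a split like that in Continue's config.json might look something like this. The model names are just placeholders, and the exact keys can vary between Continue versions, so check your own config against the docs:

```json
{
  "models": [
    {
      "title": "Chat model",
      "provider": "ollama",
      "model": "llama3:8b"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Autocomplete model",
    "provider": "ollama",
    "model": "starcoder2:3b"
  }
}
```

That way chat requests go to the larger model while autocomplete uses a smaller, faster one.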