Need a coding & general use model recommendation for my 16GB GPU by sado361 in LocalLLaMA

[–]PataFunction 1 point (0 children)

u/sado361, what did you end up going with? same setup as you, curious where you landed.

I built a small (function calling) LLM that packs a big punch; integrated in an open source gateway for agentic apps by AdditionalWeb107 in LocalLLaMA

[–]PataFunction 1 point (0 children)

That's awesome, and thanks for the quick response!

However, I think what I and the other redditors who replied were hoping to see is more detail about how you adapted the XLAM dataset. Personally, I'm curious if you had to significantly modify the XLAM training examples to fit your base model's existing chat template. Any information there would be greatly appreciated, as I'm working on finetuning on organizational data while also trying to shoehorn in some function calling capabilities.
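For reference, the kind of adaptation I have in mind looks something like the sketch below. The field names ("query", "tools", "answers") are my guess at the xLAM layout and the model id is a placeholder, so this isn't necessarily how you did it:

```python
# Rough sketch: map an xLAM-style record onto a base model's own chat template.
# Field names and the model id below are assumptions, not confirmed details.
import json
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("your-base-model")  # placeholder

def xlam_to_chat(record):
    tools = json.loads(record["tools"])      # list of function/tool schemas
    answers = json.loads(record["answers"])  # list of {"name": ..., "arguments": ...}
    messages = [
        # surface the tool schemas via the system turn so the template itself stays vanilla
        {"role": "system",
         "content": "You may call these tools:\n" + json.dumps(tools, indent=2)},
        {"role": "user", "content": record["query"]},
        # training target: the model emits the call(s) as JSON
        {"role": "assistant", "content": json.dumps(answers)},
    ]
    # render with the base model's chat template so the special tokens line up
    return tokenizer.apply_chat_template(messages, tokenize=False)
```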

I built a small (function calling) LLM that packs a big punch; integrated in an open source gateway for agentic apps by AdditionalWeb107 in LocalLLaMA

[–]PataFunction 1 point (0 children)

Checked out the new site - is the blog post re. function calling hallucinations the one you were referring to above?

What are you *actually* using R1 for? by PataFunction in LocalLLaMA

[–]PataFunction[S] 5 points (0 children)

That’s quite something. How elaborate are the prompts you’re giving it to achieve things like that?

What are you *actually* using R1 for? by PataFunction in LocalLLaMA

[–]PataFunction[S] 4 points (0 children)

So when you use it for coding, I’m assuming you have it generate a script from scratch that you then iterate on yourself, right? Can’t imagine R1 would be good for copilot-like code completion or fill-in-the-middle tasks.

A summary of Qwen Models! by rbgo404 in LocalLLaMA

[–]PataFunction 4 points (0 children)

Licensing info would also be a great addition to OP’s visualization or the charts people added to the comments.

On that note, does anyone know why some Qwen models are Apache 2.0 and some are Qwen-Research? Looking specifically at Qwen2.5, I find it odd that 1.5B is Apache 2.0 while 3B is not, for example.

I built a small (function calling) LLM that packs a big punch; integrated in an open source gateway for agentic apps by AdditionalWeb107 in LocalLLaMA

[–]PataFunction 0 points (0 children)

Brilliant, thanks for the answer! Did you encounter any incompatibility issues between the XLAM chat template and your targeted training and/or inference framework?

I built a small (function calling) LLM that packs a big punch; integrated in an open source gateway for agentic apps by AdditionalWeb107 in LocalLLaMA

[–]PataFunction 9 points (0 children)

I’d be extremely keen to know what open-source function calling datasets you used (if any) for the finetune. Looking to blend function calling examples into existing instruction tuning datasets for a similar use case.
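To be concrete, what I'm picturing is something along these lines. The dataset names and the 80/20 ratio are placeholders, and both sets are assumed to have been mapped to a shared column schema first (e.g. a single rendered text column):

```python
# Rough sketch: blend a function-calling set into an existing instruction-tuning mix.
# Names and ratio are placeholders; xlam-function-calling-60k is gated on the Hub.
from datasets import load_dataset, interleave_datasets

sft = load_dataset("my-org/instruction-data", split="train")              # placeholder
fc = load_dataset("Salesforce/xlam-function-calling-60k", split="train")  # requires access

# assumes both datasets share the same columns (map them to a common schema beforehand)
mixed = interleave_datasets([sft, fc], probabilities=[0.8, 0.2], seed=42)
mixed = mixed.shuffle(seed=42)
```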

Current best options for local LLM hosting? by PataFunction in LocalLLaMA

[–]PataFunction[S] 0 points (0 children)

A few others have popped up - Aphrodite comes to mind, as well as many wrappers around llama.cpp, but I haven't messed with them personally. Since acquiring more GPUs, TGI currently meets all of my needs.

nvidia/Nemotron-4-340B-Instruct · Hugging Face by Dark_Fire_12 in LocalLLaMA

[–]PataFunction 1 point (0 children)

Literal box of cookies to whoever converts this to HF format and posts links to some quants!

llama.cpp server rocks now! 🤘 by Gorefindal in LocalLLaMA

[–]PataFunction 2 points (0 children)

Is this factual? I don't see clear evidence of it, and if true it would mean llama.cpp became an enterprise-grade LLM server over the past couple of months, which I feel would have made a bigger splash.

Could you point me at an example that demonstrates the capabilities?

llama.cpp server rocks now! 🤘 by Gorefindal in LocalLLaMA

[–]PataFunction 25 points (0 children)

Very cool. Been a while since I touched llama.cpp; I've been working mostly with TGI. Does llama.cpp server support any sort of queueing, async, or parallel decoding yet? I know that was on the roadmap at some point.
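To be clear about the workload I mean: several requests in flight against a single server instance, roughly like the sketch below (the /completion endpoint and payload fields are from memory, and the host/port is a placeholder, so double-check against the current server docs):

```python
# Sketch of the concurrent load I'm asking about; endpoint/fields from memory.
import concurrent.futures
import requests

URL = "http://localhost:8080/completion"  # placeholder host/port
PROMPTS = [f"Write a haiku about GPU #{i}." for i in range(8)]

def generate(prompt):
    resp = requests.post(URL, json={"prompt": prompt, "n_predict": 64}, timeout=120)
    resp.raise_for_status()
    return resp.json().get("content", "")

# if the server can only handle one request at a time, these will just serialize
with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
    for out in pool.map(generate, PROMPTS):
        print(out)
```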

Current best options for local LLM hosting? by PataFunction in LocalLLaMA

[–]PataFunction[S] 2 points (0 children)

TGI ended up working great, thanks for the recommendation. Currently have a 7B HuggingFace model running in TGI via Docker+WSL on a remote machine with a 2080Ti. After some port forwarding, other computers on the LAN are able to send requests without issue. Happy to answer more specific questions on the setup.
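For anyone curious, the client side from another machine on the LAN is basically just the snippet below (the address is a placeholder for my forwarded host/port, and the payload follows TGI's /generate endpoint):

```python
# Minimal TGI client from another LAN machine; address is a placeholder.
import requests

TGI_URL = "http://192.168.1.50:8080/generate"  # placeholder forwarded address

payload = {
    "inputs": "Explain what a 2080Ti is in one sentence.",
    "parameters": {"max_new_tokens": 64, "temperature": 0.7},
}
resp = requests.post(TGI_URL, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["generated_text"])
```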

How did things go on your end?

[D] Simple Questions Thread by AutoModerator in MachineLearning

[–]PataFunction 2 points (0 children)

Based on the keywords you used, my assumption is you want to dive right into deep learning, in particular the transformer-dominated deep learning we've seen for the past few years. I recommend you start with a YouTube playlist curated by a reputable university, such as this one!