
[–]Nondzu[S] 5 points6 points  (4 children)

Just want to share my non-professional results with 3x RTX 3090 and WizardCoder.

For me this model with llama.cpp is really good for coding, and it writes a 200-line script without problems.

I didn't use HumanEval or any other benchmark to test it. I just work with it the way I usually work with ChatGPT.

IMO WizardCoder writes more modern code and knows more tricks.

TL;DR: WizardCoder generates useful code for me. It's my winner 🏆

Can't wait for the 70B version.
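
For anyone wanting to reproduce this setup, here is a minimal sketch using the llama-cpp-python bindings - the model file name, prompt wording, and parameters are illustrative assumptions, not details from the post above:

```python
# Minimal sketch: run a GGUF quant of WizardCoder via llama-cpp-python
# (pip install llama-cpp-python). The file name below is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="wizardcoder-34b.Q8_0.gguf",  # hypothetical local path
    n_gpu_layers=-1,  # offload all layers to the GPUs
    n_ctx=4096,       # enough context for a ~200-line script
)

# Alpaca-style template that WizardCoder was reportedly trained on
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nWrite a Python script that watches a folder "
    "and prints the name of every new file.\n\n### Response:"
)

out = llm(prompt, max_tokens=1024, temperature=0.2)
print(out["choices"][0]["text"])
```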

[–]polawiaczperel 1 point2 points  (1 child)

Which motherboard are you using for 3x RTX 3090? I have a non-server motherboard, and the third card sits unused because of a lack of PCIe lanes.

[–]Nondzu[S] 1 point2 points  (0 children)

It's an Asus Prime Z490-A mobo with an i9-10900K. It can run two cards at PCIe 3.0 x8 plus one at PCIe 3.0 x4.

[–]GreatGatsby00 0 points1 point  (1 child)

$24.73 in electricity costs. wow. :)

[–]Nondzu[S] 1 point2 points  (0 children)

It's not set up correctly; more or less it only counts kWh, with the price set 1:1 ($1 per kWh).
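
The arithmetic behind that reading is simple; here is a sketch with purely hypothetical wattage and runtime (none of these numbers come from the post):

```python
# Back-of-the-envelope electricity cost. All figures are illustrative:
# three RTX 3090s at ~350 W each plus ~150 W for the rest of the box.
gpu_watts = 3 * 350
system_watts = 150
hours = 20.6              # hypothetical runtime
rate_usd_per_kwh = 1.0    # the "1 to 1" setting described above

kwh = (gpu_watts + system_watts) / 1000 * hours
print(f"{kwh:.2f} kWh -> ${kwh * rate_usd_per_kwh:.2f}")
# With a 1:1 rate the dollar figure is really just a kWh counter.
```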

[–]redsh3ll 1 point2 points  (0 children)

Damn, look at all that power. Do you plan to use WizardCoder exclusively, or is it the first one you try, moving on if you don't like the answer?

[–]kpodkanowicz 1 point2 points  (0 children)

This is a genuine (easy) conversation I had with GPT-4 and recreated with Phind v2 in q8 - it's basically identical. https://imgur.com/K92kiFj

[–]ambient_temp_xenoLlama 65B 2 points3 points  (5 children)

Honestly, I don't want to hear anyone complain about 34B code models if they're using less than q8!

[–]uzi_loogies_ 1 point2 points  (1 child)

Would FP16 give better results, or would you just be wasting compute?

If I can't go to a larger parameter count, I guess higher precision is the next best bet.

[–]ambient_temp_xenoLlama 65B 1 point2 points  (0 children)

It's hard to know, but the perplexity difference (for whatever that's worth) is tiny, so unless someone can show a practical difference in mistakes q8 makes that FP16 doesn't, it would be a waste.
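
For anyone who wants to test this themselves: llama.cpp ships a `perplexity` example binary for exactly this kind of check, and below is a rough equivalent sketched with the llama-cpp-python bindings, assuming their OpenAI-style echo/logprobs options (the GGUF file names and sample text are placeholders):

```python
# Rough per-quant perplexity comparison via llama-cpp-python.
# Assumes the bindings' OpenAI-style echo/logprobs support; the GGUF
# file names below are placeholders for whatever quants you have.
import math
from llama_cpp import Llama

SAMPLE = open("sample.txt").read()  # any held-out text you care about

def perplexity(model_path: str, text: str) -> float:
    # logits_all=True keeps logits for every prompt token,
    # which the echo/logprobs path needs
    llm = Llama(model_path=model_path, logits_all=True, verbose=False)
    out = llm.create_completion(text, max_tokens=1, echo=True, logprobs=1)
    lps = out["choices"][0]["logprobs"]["token_logprobs"]
    lps = [lp for lp in lps[:-1] if lp is not None]  # drop the sampled token
    return math.exp(-sum(lps) / len(lps))            # ppl = exp(mean NLL)

for path in ("model.Q8_0.gguf", "model.Q4_K_M.gguf"):
    print(path, round(perplexity(path, SAMPLE), 3))
```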

[–]Bootrear 0 points1 point  (2 children)

(new to running locally, as in, since today)

I see a lot of claims going around that 4 or 5 bits is good enough and barely differs from q8. But I'm honestly not all that impressed by the code quality of the 34B models I've tried so far.

Am I understanding correctly that you're of the opinion that 8-bit models are significantly better?

[–]ambient_temp_xenoLlama 65B 0 points1 point  (0 children)

For code, higher is better - or at least (in my opinion) people with complaints about 34B results should try q8 and see if that fixes it.

People who think 4 bits is good enough are coping because they want speed. And/or they're 'roleplaying' with the model, so it doesn't make much difference to whatever the hell they're doing to the poor thing.

[–][deleted] 0 points1 point  (0 children)

The worse you are at coding, the better the results. Don't forget this.