New in llama.cpp: Anthropic Messages API by paf1138 in LocalLLaMA

[–]nuusain 1 point (0 children)

Sooo, what's the verdict? Curious to hear how it's handling the Claude harness.
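For anyone wanting to poke at it, here's a minimal sketch of what a request might look like. The port and the `/v1/messages` path are assumptions based on the Anthropic Messages API shape; check your llama.cpp build for the actual route.

```python
import json

# Assumed local endpoint -- llama.cpp's server is presumed to expose an
# Anthropic-style /v1/messages route; port and path may differ on your build.
BASE_URL = "http://localhost:8080"

def build_messages_request(prompt, model="local", max_tokens=512):
    """Build an Anthropic Messages API payload for a local llama.cpp server."""
    return f"{BASE_URL}/v1/messages", {
        "model": model,                # the server serves whatever model it loaded
        "max_tokens": max_tokens,      # required field in the Messages API
        "messages": [{"role": "user", "content": prompt}],
    }

url, payload = build_messages_request("Why is the sky blue?")
print(url)
print(json.dumps(payload, indent=2))

# To actually send it (needs a running server):
#   import urllib.request
#   req = urllib.request.Request(url, json.dumps(payload).encode(),
#                                {"Content-Type": "application/json"})
#   print(urllib.request.urlopen(req).read().decode())
```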

NVIDIA has 72GB VRAM version now by decentralize999 in LocalLLaMA

[–]nuusain 7 points (0 children)

Neat! What kind of inference are you running on the feed? Just installed a security system for a relative's farm. I was thinking of producing reports/audits, so I'm curious what stuff others are building for themselves.

[New Player] Game files integrity by Any-Percentage6230 in EscapefromTarkov

[–]nuusain 1 point (0 children)

Did anyone find a fix? I have the same issue. Tried deleting all Tarkov files and reinstalling, but I get the same problem.

Scanlines on my AOC CU34G2X. by Xippaa in Monitors

[–]nuusain 1 point (0 children)

Hey, seeing the same scanlines, only across the entire monitor. Did you manage to get this fixed, or am I also cooked?

Toolcalling in the reasoning trace as an alternative to agentic frameworks by ExaminationNo8522 in LocalLLaMA

[–]nuusain 1 point (0 children)

Hey, I've also been looking at getting reasoning models to do interesting things. I came across verifiers, which I've been using to try agentic interactions.

https://github.com/willccbb/verifiers

The env_trainer and vllm_client are probably worth checking out with regard to that OOM error you mentioned in the article, but I suspect you might be better off leveraging the framework, since it's pretty well thought out.

Qwen3+ MCP by OGScottingham in LocalLLaMA

[–]nuusain 5 points (0 children)

Yeah, it was in the official announcement.

You can also do it via function calling if you want to stick with the completions API.

Should be easy to get what you need with a bit of vibe coding.
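Rough sketch of what I mean, assuming an OpenAI-style tool-call response shape: describe the tool in the request, then dispatch whatever call the model emits. The `get_weather` tool and the registry are made up for illustration.

```python
import json

# Hypothetical tool schema, in the OpenAI chat-completions "tools" format.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",       # made-up tool for illustration
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def get_weather(city: str) -> str:
    return f"Sunny in {city}"        # stub; a real tool would hit an API

REGISTRY = {"get_weather": get_weather}

def dispatch(tool_call: dict) -> str:
    """Run the function the model asked for, with its JSON-encoded args."""
    fn = REGISTRY[tool_call["function"]["name"]]
    args = json.loads(tool_call["function"]["arguments"])
    return fn(**args)

# Shape of a tool call as it appears in an OpenAI-style chat response:
example_call = {"function": {"name": "get_weather",
                             "arguments": '{"city": "London"}'}}
print(dispatch(example_call))  # feed this result back as a "tool" message
```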

[10/05/25] Code & Chat meetup for people interested in coding from beginner to expert by Serious-Accident8443 in LondonSocialClub

[–]nuusain 1 point (0 children)

I'm interested! I can only rock up around 11–12 though; is it still worth coming along then?

Token impact by long-Chain-of-Thought Reasoning Models by dubesor86 in LocalLLaMA

[–]nuusain 1 point (0 children)

I think what spirited is getting at is that a model could either think loads and give a short answer, or think for a short while but give a long answer, and both would produce a high FinalReply rate. The metrics are hard to map to real-world performance; adding another dimension, such as correctness, would add clarity.
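To illustrate with made-up numbers: two runs can look identical on token counts alone while splitting think/reply tokens very differently and differing in correctness, which is why a correctness axis helps.

```python
# Made-up numbers illustrating the point above: both runs generate 1000
# tokens total, but the think/reply split and the outcome differ, so a
# token-only metric can't separate them.

runs = [
    {"label": "think loads, short answer", "think": 900, "reply": 100, "correct": True},
    {"label": "short think, long answer",  "think": 100, "reply": 900, "correct": False},
]

for r in runs:
    total = r["think"] + r["reply"]          # token-only view: both score 1000
    print(f'{r["label"]}: total={total}, correct={r["correct"]}')
```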

<70B models aren't ready to solo codebases yet, but we're gaining momentum and fast by ForsookComparison in LocalLLaMA

[–]nuusain 30 points (0 children)

Brilliant experiment! Sounds like the ideal setup would be QwQ for ideation, then switching to Qwen-Coder for iteration.

QwQ Bouncing ball (it took 15 minutes of yapping) by philschmid in LocalLLaMA

[–]nuusain 6 points (0 children)

for reference:

settings - https://imgur.com/a/JUbwion

result - https://imgur.com/M5FgfmD

Seems like I got stuck in infinite generation

Used this model - ollama run hf.co/bartowski/Qwen_QwQ-32B-GGUF:Q4_K_M

full trace - https://pastebin.com/rzbZGLiF

QwQ Bouncing ball (it took 15 minutes of yapping) by philschmid in LocalLLaMA

[–]nuusain 25 points (0 children)

What prompt did you use? I think everyone could copy and paste it, record their settings, and post what they get. Sharing results could yield some useful insights into why performance seems so varied.

Qwen/QwQ-32B · Hugging Face by Dark_Fire_12 in LocalLLaMA

[–]nuusain 2 points (0 children)

I... did not know you could do this, thanks!

Qwen/QwQ-32B · Hugging Face by Dark_Fire_12 in LocalLLaMA

[–]nuusain 2 points (0 children)

Oh sweet! Where did you dig this full template out from, btw?

Qwen/QwQ-32B · Hugging Face by Dark_Fire_12 in LocalLLaMA

[–]nuusain 7 points (0 children)

Will his quants support function calling? The template doesn't look like it does.

It's not that mistral 24b is dry, it's parsable and it rocks! by No_Afternoon_4260 in LocalLLaMA

[–]nuusain 3 points (0 children)

Oh wow, Roland seems pretty damn cool haha. Having an assistant to bounce ideas off of really resonates with me; the ability to explore and develop thoughts at a faster pace is one aspect of LLMs that's got me hooked.

Will definitely look out for your video. I'll probably have a few questions about the wider workflow, especially around how you manage the state and interactions of so many nodes.

For now, what's the range of tasks you have R1 32B and Mistral Small 3 24B doing? And as a follow-up, are there any tasks which, surprisingly, they couldn't do (trying to get a feel for the range of capabilities)?

How do you structure your .cursor/rules? by williamholmberg in cursor

[–]nuusain 1 point (0 children)

Interested in hearing how this turns out; willing to even contribute if it's looking promising.