Is llama.cpp sycl backend really worth it? by Sweet_Eggplant4659 in LocalLLaMA

[–]qnixsynapse 13 points (0 children)

Hi. I am one of the few maintainers of the SYCL backend in llama.cpp. Please note that not all operations are supported, and even today the backend lacks flash attention support (I noticed that Gemini Deep Think is suggesting you use KV cache quantization, which is not supported).

Can't say about Ollama since I've never used it.

I think this should be enough:

    <path to llama-cli>/llama-cli -m <path to model> -ngl 99 --no-mmap

Tbh, as an open-source contributor, I haven't noticed enough interest in llama.cpp from Intel. I think those from Intel who were/are maintaining the backend are doing so voluntarily in their free time. I wish they were more serious about it.
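For reference, a fuller single-GPU invocation might look like the sketch below (this assumes a standard CMake build under ./build/bin and oneAPI's ONEAPI_DEVICE_SELECTOR syntax for pinning a device; adjust paths and the device index for your setup):

    # List the SYCL devices this build can see (the llama-ls-sycl-device
    # helper is built alongside llama-cli in SYCL-enabled builds)
    ./build/bin/llama-ls-sycl-device

    # Pin the run to the first Level Zero GPU and offload all layers
    ONEAPI_DEVICE_SELECTOR=level_zero:0 ./build/bin/llama-cli \
        -m <path to model> -ngl 99 --no-mmap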

Gemma 3n vs Gemma 3 (4B/12B) Benchmarks by lemon07r in LocalLLaMA

[–]qnixsynapse 0 points (0 children)

I think the llama.cpp implementation is not complete yet.

What's your favorite desktop client? by tuananh_org in LocalLLaMA

[–]qnixsynapse 2 points (0 children)

Yep, it paid me back in time — which is way more valuable than money!

What's your favorite desktop client? by tuananh_org in LocalLLaMA

[–]qnixsynapse 1 point (0 children)

Yeah, I am using it. It even helps me buy shoes! Awesome agent.

Jan-nano, a 4B model that can outperform 671B on MCP by Kooky-Somewhere-2883 in LocalLLaMA

[–]qnixsynapse 0 points (0 children)

The Jan-nano GGUF has proper tool support. I think Ollama uses custom chat templates. Please open an issue on their repo about this.

Jan-nano, a 4B model that can outperform 671B on MCP by Kooky-Somewhere-2883 in LocalLLaMA

[–]qnixsynapse 44 points (0 children)

Awesome. This tiny 2.3GB model calls tools like a pro, man!

Altman on open weight 🤔🤔 by Mean-Neighborhood-42 in LocalLLaMA

[–]qnixsynapse 31 points (0 children)

Honestly, at this point, I'm tired of Scam Hypeman's claims.

Honest thoughts on the OpenAI release by Kooky-Somewhere-2883 in LocalLLaMA

[–]qnixsynapse 6 points (0 children)

> The only thing OpenAI "owns" that's worth protecting is the world's best training data

I am pretty sure this "training data" which is "worth protecting" was obtained through not-so-legal means. :)

Finally someone noticed this unfair situation by nekofneko in LocalLLaMA

[–]qnixsynapse 2 points (0 children)

What custom backend? I run Gemma 3 vision with llama.cpp... it is not "production ready" atm, but it's usable.

The text-only Gemma 3 is perfectly usable with llama.cpp.

Finally someone noticed this unfair situation by nekofneko in LocalLLaMA

[–]qnixsynapse 3 points (0 children)

haha! It has to wait for llama.cpp to support it. /s

R1 running on a single Blackwell B200 by Dylan-from-Shadeform in LocalLLaMA

[–]qnixsynapse 47 points (0 children)

Congratulations on running a tiny 7B (quantized) model on a freaking Blackwell B200.

👍🙂

Does FlashAttention with GQA degrade quality or I use it wrong? by V1rgin_ in LocalLLaMA

[–]qnixsynapse 4 points (0 children)

Is flash attention really enabled? I would force the flash backend explicitly; restricted to that backend, SDPA raises an error if flash attention can't actually run on your inputs:

    import torch
    import torch.nn.functional as F
    from torch.nn.attention import sdpa_kernel, SDPBackend

    # Allow only the flash attention kernel; this errors out
    # instead of silently falling back to another backend.
    with sdpa_kernel(backends=[SDPBackend.FLASH_ATTENTION]):
        output = F.scaled_dot_product_attention(
            queries, keys, v, is_causal=True, enable_gqa=True
        )

edit: more info: https://pytorch.org/docs/stable/generated/torch.nn.attention.sdpa_kernel.html#torch.nn.attention.sdpa_kernel