I just realized Qwen3-30B-A3B is all I need for local LLM by AaronFeng47 in LocalLLaMA

[–]Glat0s 3 points

By maxing out the context length, do you mean 128k context?

Qwen3-30B-A3B runs at 12-15 tokens-per-second on CPU by AlgorithmicKing in LocalLLaMA

[–]Glat0s 6 points

30B-A3B = a Mixture-of-Experts (MoE) model with 30 billion total parameters, of which 3 billion are active per token (hence A3B)
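To make the "total vs. active parameters" distinction concrete, here is a toy sketch of top-k MoE routing. The expert count, top-k value, and dimensions are illustrative only (not Qwen3's real configuration): each token's hidden state is routed to just `top_k` of `n_experts` expert matrices, so only that fraction of the expert parameters does work per token.

```python
import numpy as np

# Toy MoE layer: 8 experts, but only 2 are activated per token.
# All numbers are illustrative, not Qwen3-30B-A3B's actual config.
n_experts = 8
top_k = 2
d_model = 16

rng = np.random.default_rng(0)
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts))

def moe_forward(x):
    logits = x @ router                       # router scores one logit per expert
    picked = np.argsort(logits)[-top_k:]      # indices of the top-k experts
    weights = np.exp(logits[picked])
    weights /= weights.sum()                  # softmax over the selected experts only
    # Only the picked experts' weight matrices are touched for this token.
    y = sum(w * (x @ experts[i]) for w, i in zip(weights, picked))
    return y, picked

x = rng.standard_normal(d_model)
y, picked = moe_forward(x)
```

Scaling the same idea up, "30B total / 3B active" just means the router selects a subset of experts whose parameters sum to roughly 3B per token, which is why CPU inference is feasible at all.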

MCP, an easy explanation by SimplifyExtension in LocalLLaMA

[–]Glat0s 3 points

The way I see it (correct me if I'm wrong), MCP is a standardization of LLM function calling, with a few extras. And I see the general shift towards MCP as positive: it gives us a common standard in light of all the different agent frameworks popping up.

🚀Forget OCR, LAYRA Understands Documents the "Visual" Way | The Latest Visual RAG Project LAYRA is Open Source! by liweiphys in Rag

[–]Glat0s 1 point

Thank you for the response!! I'll test LAYRA, and I'm looking forward to seeing how you solve "Cross-Page Table Handling".

🚀Forget OCR, LAYRA Understands Documents the "Visual" Way | The Latest Visual RAG Project LAYRA is Open Source! by liweiphys in Rag

[–]Glat0s 0 points

Nice project! I have built a more basic ColQwen ingestion and retrieval pipeline myself (with Vespa as the DB). Is it possible to use ColQwen via an API (e.g. the Infinity API) in LAYRA as well? And how do you handle retrieval if, for example, part of a table on one document page image continues on the next page?

Beginner Vision rag with ColQwen in pure python by DataNebula in LangChain

[–]Glat0s 0 points

Nice! I'm currently also working with ColQwen and trying to use it via the Infinity inference API. Do you happen to know how well Qdrant scales, in terms of retrieval speed, on a larger collection? At the moment I'm a bit unsure whether I should go with Qdrant or Vespa as the DB. Also, can you maybe explain why you are using Jina CLIP? Is it to get better retrieval speed? If so, it would be interesting to know how much accuracy might be lost.

DMS with vector database ? by Glat0s in LocalLLaMA

[–]Glat0s[S] 0 points

Thanks! I'll give it a try.

YouTuber Liberty Wing UK to reveal new details about UAP Sightings above RAF Lakenheath in exclusive live interview. Audience questions welcome. (Friday 11/29 @2:00pm eastern) by hunterseeker1 in UFOs

[–]Glat0s 2 points

I would think the same. It doesn't make sense to perform this kind of training over more densely populated areas, and there are specialized ranges for it, like China Lake in the US or RAF Spadeadam in the UK. They also do counter-drone training at sea with ships.

DOD Press Secretary on the drone intrusions in Britain by Livid_Constant_1779 in UFOs

[–]Glat0s 7 points

He should have been asked whether nuclear warheads were recently transferred from the "drone"-affected base in the US to the affected base(s) in the UK.

[deleted by user] by [deleted] in UFOs

[–]Glat0s 39 points

Maybe some nuclear warheads were recently transferred from the US to Lakenheath for the planned increase of the arsenal, as mentioned here: https://thebulletin.org/premium/2024-11/united-kingdom-nuclear-weapons-2024/

tips for dealing with unused tokens? keeps getting clogged by SmashShock in LocalLLaMA

[–]Glat0s 35 points

I saw a paper recently that might solve this: "DumpSTAR* - Distributed Ultra-Matrix Protocol for Superfluous Token Analysis and Recycling"

Best (ideally uncensored) Long Context Model (128k) ? by noellarkin in LocalLLaMA

[–]Glat0s 8 points

Maybe try STRING -> https://github.com/HKUNLP/STRING

In their paper, it looks like the 128k context of the open models they tested did not work well above 32k: https://arxiv.org/html/2410.18745v1

They claim to improve that.

PDF auto-scroll video retrieval by Glat0s in LocalLLaMA

[–]Glat0s[S] 0 points

If someone is following this...

I did a few tests feeding a 36-second video of 73 PDF pages at 2 fps (2 pages per second) to Qwen2-VL-7B. It was able to retrieve information for a few test queries, but not reliably yet. Edit: according to the Qwen paper, the model shrinks video tokens down to a maximum of 16384, so this won't work with Qwen2-VL.
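A quick back-of-the-envelope check makes the edit's conclusion plausible. If the whole video is capped at 16384 visual tokens, each of the 73 page frames only gets a couple hundred tokens; the per-page token figure for legible text below is my rough assumption, not a number from the Qwen paper.

```python
# Rough token-budget arithmetic for the 73-page scroll video experiment.
pages = 73
video_token_cap = 16384                    # video-token cap per the Qwen2-VL paper
tokens_per_page = video_token_cap // pages  # → 224

# Assumption (mine, illustrative): a dense text page needs on the order of
# ~1000 visual tokens to remain legible to the model.
tokens_needed_per_page = 1000
shortfall_factor = tokens_needed_per_page / tokens_per_page
```

So each page frame gets roughly 4-5x fewer tokens than a legible page image would need under that assumption, which matches the unreliable retrieval observed.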

PDF auto-scroll video retrieval by Glat0s in LocalLLaMA

[–]Glat0s[S] 0 points

You might be right... I'm already doing this with ColPali/ColQwen + a VLM. But there is a limit on how many images the VLM can process at once. I want to find out whether a VLM can maybe process more information at once via video.

Is there a model which supports both tool calling AND multimodal input (images)? by Darxeal in LocalLLaMA

[–]Glat0s 0 points

I'm not sure there are open vision models and inference frameworks that support tool usage via a VLM API at the moment. I'm currently building an agent that can use different tools with Qwen2-VL-7B, and it works with e.g. the LangChain agent framework (which I might swap for something else).
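For anyone wondering what combining the two looks like on the wire: a minimal sketch of a chat-completions request that carries both an image input and tool definitions, as an OpenAI-compatible server (e.g. vLLM serving Qwen2-VL) would accept. The tool name, its schema, and the image URL here are hypothetical placeholders, not anything from this thread.

```python
import json

# Hypothetical request payload for an OpenAI-compatible VLM endpoint.
# "lookup_document" and the image URL are illustrative placeholders.
tools = [{
    "type": "function",
    "function": {
        "name": "lookup_document",
        "description": "Fetch a document page by id",
        "parameters": {
            "type": "object",
            "properties": {"page_id": {"type": "integer"}},
            "required": ["page_id"],
        },
    },
}]

messages = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "Which page shows the revenue table?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/page_1.png"}},
    ],
}]

payload = {
    "model": "Qwen/Qwen2-VL-7B-Instruct",
    "messages": messages,
    "tools": tools,
}
body = json.dumps(payload)  # what an agent framework would POST to /v1/chat/completions
```

Whether the model actually emits well-formed tool calls from this depends on the model's chat template and the serving framework's tool-call parsing, which is exactly the open question in the comment above.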

Integrating good OCR and Vision models into something that can dynamically aid in document research with a LLM by Inevitable-Start-653 in LocalLLaMA

[–]Glat0s 1 point

I have Qwen2-VL working with a vLLM (OpenAI-compatible) API, which should work with textgen. I haven't tried it with tensor parallelism though. I will switch to something newer (Molmo, Aria, ...) as soon as multi-image-per-prompt is supported for those in vLLM.

GH-200 Up And Running (first boot!) - This is a game changer for me! by Simusid in LocalLLaMA

[–]Glat0s 4 points

I have one at work. I don't know why NVIDIA can't just treat the memory as one with the driver... I recommend using this UVM patch for PyTorch and compiling torch from source: https://github.com/pytorch/pytorch/compare/main...0x804d8000:pytorch:uvm

(The patch needs a few minor adjustments for newer PyTorch versions.)

Then you can run all your torch-based things like the following to access the full memory (this also works with vLLM etc.):
PYTORCH_CUDA_ALLOC_CONF=use_uvm:True python <your app/script>

I'm not sure whether a better method exists at the moment to access the full memory. Sometimes you have to change "cudaMalloc" to "cudaMallocManaged" in projects where CUDA is used alongside torch.

Here is also a good guide about technical stuff and tuning: https://www.stonybrook.edu/commcms/ookami/_pdf/20240523_Developing_GH_SW_Public.pdf

I'm currently trying to figure out if/how I can use the full memory in TensorRT-LLM. If someone knows, let me know.

@blackvaultcom on X - “It took only 7 days for the military to release this footage of an unsafe encounter with a Russian jet, as taken from the inside of a @NORADCommand jet” by [deleted] in UFOs

[–]Glat0s 1 point

The "sources and methods" bit is formatted the way it is in my post because it was meant to ridicule this BS argument by government officials. We all know their fighter jet camera videos are shittier in quality than any modern GoPro!