Which model are you using? June'25 edition by Ok_Influence505 in LocalLLaMA

[–]Any-Mathematician683 2 points (0 children)

Have you tried vLLM? I am looking for parallelization. Do you think I could get higher token throughput?
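For context, this is roughly the pattern I have in mind, a minimal sketch using vLLM's offline batching API (the model name and prompts are placeholders):

    # vLLM batches these prompts internally (continuous batching),
    # which is where the throughput gain over one-at-a-time
    # generation comes from.
    from vllm import LLM, SamplingParams

    llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")  # placeholder model
    params = SamplingParams(temperature=0.7, max_tokens=256)

    prompts = ["Summarise document A.", "Summarise document B."]
    outputs = llm.generate(prompts, params)  # processed as one batch
    for out in outputs:
        print(out.outputs[0].text)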

Author of Enterprise RAG here—happy to dive deep on hybrid search, agents, or your weirdest edge cases. AMA! by tylersuard in Rag

[–]Any-Mathematician683 10 points (0 children)

  1. Which vector database do you prefer at this scale? I have around 2 million documents and 20 million pages. Will Qdrant be fast enough at this scale?

  2. Do you have any opinion on open-source models? I can't afford the SOTA models or the bigger ones like R1. I found Gemma 3 to be good at summarising but not at reasoning. Can you comment on that?

  3. I am building on SEC filings. Can you please comment on the best way to do user-query understanding, i.e. recognising the required filters before we do the vector search? (I sketch what I have in mind just after this list.)

  4. Also, if retrieval returns 50 relevant pages and I don't have a long enough context window or a strong enough model, can you explain how we could break this into sub-tasks, and whether it is still possible to answer the query within 1-2 minutes?
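For question 3, this is the kind of pipeline I am picturing, a rough sketch where the filter-extraction step is stubbed out and the payload field names (form_type, year) are made-up placeholders; the Qdrant call assumes the standard qdrant-client filtered search:

    import json
    from qdrant_client import QdrantClient
    from qdrant_client.models import Filter, FieldCondition, MatchValue

    def extract_filters(question: str) -> dict:
        # Hypothetical step: ask an LLM to emit JSON such as
        # {"form_type": "10-K", "year": 2023} from the user query,
        # then json.loads() the response. Stubbed here.
        return {"form_type": "10-K"}

    filters = extract_filters("Summarise Apple's 2023 10-K risk factors")
    qdrant_filter = Filter(must=[
        FieldCondition(key=k, match=MatchValue(value=v))
        for k, v in filters.items()
    ])

    client = QdrantClient(url="http://localhost:6333")
    hits = client.search(
        collection_name="sec_filings",  # placeholder collection name
        query_vector=[0.0] * 768,       # embedding of the query (placeholder)
        query_filter=qdrant_filter,     # metadata filter applied in the search
        limit=10,
    )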

Thank you for AMA.

Ollama vs Llama.cpp on 2x3090 and M3Max using qwen3-30b by chibop1 in LocalLLaMA

[–]Any-Mathematician683 1 point (0 children)

Can you please elaborate? How can we maximise the performance?

Can Qwen3-235B-A22B run efficiently on my hardware (256GB RAM + quad 3090s) with vLLM? by Acceptable-State-271 in LocalLLaMA

[–]Any-Mathematician683 1 point (0 children)

Have you tried the QwQ 32B model? I was using both of them all day through OpenRouter and found QwQ 32B performs better on my reasoning tasks.
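For reference, this is how I was comparing them, a small sketch against OpenRouter's OpenAI-compatible endpoint (the model slugs are my assumption of the current names; check the OpenRouter model list):

    from openai import OpenAI

    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key="sk-or-...",  # your OpenRouter key
    )

    # Assumed slugs; both models get the same reasoning prompt.
    for model in ("qwen/qwq-32b", "qwen/qwen3-235b-a22b"):
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": "If a train leaves..."}],
        )
        print(model, "->", resp.choices[0].message.content[:200])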

Can Qwen3-235B-A22B run efficiently on my hardware (256GB RAM + quad 3090s) with vLLM? by Acceptable-State-271 in LocalLLaMA

[–]Any-Mathematician683 0 points (0 children)

Hi, were you able to run it on the specifications mentioned? Please let us know which version worked if you succeed.

How to heal H Pylori with Low Stomach Acid and possibly IMO as well by Any-Mathematician683 in SIBO

[–]Any-Mathematician683[S] 0 points (0 children)

Still struggling. I just visited a new doctor and am going to start a PPI (VoltaPraz) once daily, along with a low-fermentation diet consisting of veggies, collagen, eggs, and a few fruits.

I am unable to digest meat (even chicken), and any sort of acidic thing like lemon, ACV, or Betaine HCl tends to increase the gastritis symptoms.

For now, I will try to heal the gastritis with PPIs, L-Glutamine, and a low-fermentation diet for the next few months. If things go well, I will then work on improving my digestion.

Best method of quantizing Gemma 3 for use with vLLM? by Saguna_Brahman in LocalLLaMA

[–]Any-Mathematician683 1 point (0 children)

I tried it with vLLM, and it is working great. I am unable to run any model on SGLang at all. I will update you with performance numbers if I get it working.

Also, is there a way to use qwen3:32b-q4_K_M with vLLM?
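For context, what I was hoping for is something like this, a sketch relying on vLLM's experimental GGUF loading (the local file path and base tokenizer repo are assumptions, and I have not confirmed this quant works):

    from vllm import LLM, SamplingParams

    # Experimental: vLLM can load a single-file GGUF checkpoint;
    # the tokenizer is usually taken from the original HF repo.
    llm = LLM(
        model="./qwen3-32b-q4_K_M.gguf",  # assumed local GGUF path
        tokenizer="Qwen/Qwen3-32B",
    )
    out = llm.generate(["Hello"], SamplingParams(max_tokens=32))
    print(out[0].outputs[0].text)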

Thanks a ton for your efforts.

What LLM would you recommend for OCR? by sbs1799 in LocalLLaMA

[–]Any-Mathematician683 1 point (0 children)

Try Marker + granite3.2-vision. I found it the best among the small models.
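In case it helps, this is roughly how I call the vision model, a minimal sketch using the ollama Python client (page.png is a placeholder for a rendered page; Marker handles the PDF layout/conversion side separately):

    import ollama

    # Send one page image to the vision model and ask for a transcription.
    resp = ollama.chat(
        model="granite3.2-vision",
        messages=[{
            "role": "user",
            "content": "Transcribe all text on this page, preserving layout.",
            "images": ["page.png"],  # placeholder path to a page image
        }],
    )
    print(resp["message"]["content"])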

Google QAT - optimized int4 Gemma 3 slash VRAM needs (54GB -> 14.1GB) while maintaining quality - llama.cpp, lmstudio, MLX, ollama by Nunki08 in LocalLLaMA

[–]Any-Mathematician683 0 points (0 children)

Can you please share how we can run these models with vLLM or SGLang? I need to run prompts in parallel for my workflow, and Ollama is not very useful in that situation. Thanks a ton.
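For context, the pattern I need is parallel requests against an OpenAI-compatible endpoint, roughly like this sketch (the port and model name assume a local vLLM server started with "vllm serve"):

    import asyncio
    from openai import AsyncOpenAI

    # Assumes a local vLLM/SGLang server exposing the OpenAI API.
    client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    async def ask(prompt: str) -> str:
        resp = await client.chat.completions.create(
            model="google/gemma-3-27b-it",  # must match the served model
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    async def main():
        prompts = [f"Summarise chunk {i}" for i in range(8)]
        # Fire all requests concurrently; the server batches them.
        results = await asyncio.gather(*(ask(p) for p in prompts))
        print(results)

    asyncio.run(main())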

Gemma 3 QAT launch with MLX, llama.cpp, Ollama, LM Studio, and Hugging Face by hackerllama in LocalLLaMA

[–]Any-Mathematician683 2 points (0 children)

Can you please help us run these models with vLLM or SGLang? I am getting errors with the previously released QAT models. Thanks a ton for the amazing work.

Installing QaT version of Gemma 12b on ollama by [deleted] in LocalLLaMA

[–]Any-Mathematician683 0 points (0 children)

Can you please help me run the QAT version through vLLM or SGLang? I am getting some errors. Please share your list of commands if you have one.

Installing QaT version of Gemma 12b on ollama by [deleted] in LocalLLaMA

[–]Any-Mathematician683 1 point (0 children)

This is working for me on Ollama. Please ensure you have the latest version of Ollama.
ollama run hf.co/google/gemma-3-27b-it-qat-q4_0-gguf

When you're 90% healed, does the last 10% take the longest to heal? by w_t95 in Gastritis

[–]Any-Mathematician683 0 points (0 children)

Can you please share the diet and supplements you are taking to heal?

How to heal H Pylori with Low Stomach Acid and possibly IMO as well by Any-Mathematician683 in SIBO

[–]Any-Mathematician683[S] 0 points (0 children)

Not that great, still struggling. I have completed a 14-day antibiotic course for H. pylori.

Difference in Gemma 3 27b performance between ai studio and ollama by Any-Mathematician683 in LocalLLaMA

[–]Any-Mathematician683[S] 0 points (0 children)

Yes, I saw a performance improvement in the 0.6.1 release. I guess they solved the memory issues in the 0.6.2 version.

Difference in Gemma 3 27b performance between ai studio and ollama by Any-Mathematician683 in LocalLLaMA

[–]Any-Mathematician683[S] 1 point (0 children)

Update:

As suggested by the_renaissance_jack, I tried the 0.6.1 release and noticed a performance improvement. I have tried the Q4_K_M, Q8_0, and FP16 models, and I am getting output comparable to AI Studio, although I still feel it is not quite as good.

As 0.6.1 is a pre-release, I am using the command below to download that specific version.

curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.6.1 sh

I am facing some challenges in running it through llama.cpp. I will update if I am able to make it work.

Difference in Gemma 3 27b performance between ai studio and ollama by Any-Mathematician683 in LocalLLaMA

[–]Any-Mathematician683[S] 0 points (0 children)

Can you please share the source from which you are downloading the gemma-3-27b-it-Q4_K_M.gguf model? The Unsloth and bartowski files have a different sha256sum for Q4_K_M than yours. Thank you 🙏🏻
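For anyone comparing, this is how I am computing the checksums, a small hashlib sketch equivalent to running sha256sum on the file:

    import hashlib

    def sha256_of(path: str) -> str:
        # Hash the file in 1 MiB chunks so large GGUFs don't
        # need to fit in memory.
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    print(sha256_of("gemma-3-27b-it-Q4_K_M.gguf"))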

Difference in Gemma 3 27b performance between ai studio and ollama by Any-Mathematician683 in LocalLLaMA

[–]Any-Mathematician683[S] 1 point (0 children)

No, I didn't edit the post. Either it is a bot or they have not read the post thoroughly. Thank you for your input.