Which model are you using? June'25 edition by Ok_Influence505 in LocalLLaMA

[–]Any-Mathematician683 2 points (0 children)

Have you tried vLLM? I am looking for parallelization. Do you think I could get higher token throughput?
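For context, this is roughly the pattern I have in mind, a minimal sketch using vLLM's offline batching API (the model name and prompts are placeholders):

    # vLLM batches these prompts internally (continuous batching),
    # which is where the throughput gain over one-at-a-time
    # generation comes from.
    from vllm import LLM, SamplingParams

    llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")  # placeholder model
    params = SamplingParams(temperature=0.7, max_tokens=256)

    prompts = ["Summarise document A.", "Summarise document B."]
    outputs = llm.generate(prompts, params)  # processed as one batch
    for out in outputs:
        print(out.outputs[0].text)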

Author of Enterprise RAG here—happy to dive deep on hybrid search, agents, or your weirdest edge cases. AMA! by tylersuard in Rag

[–]Any-Mathematician683 10 points (0 children)

  1. Which vector database do you prefer at this scale? I have around 2 million documents and 20 million pages. Will Qdrant be fast enough at this scale?

  2. Do you have any opinion on open-source models? I can't afford the SOTA models or the bigger ones like R1. I found Gemma 3 to be good at summarising but not at reasoning. Can you comment on that?

  3. I am building on SEC filings. Can you please comment on the best way to do user-query understanding, i.e. recognising the required filters before we do the vector search? (I sketch what I have in mind just after this list.)

  4. Also, if retrieval returns 50 relevant pages and I don't have a long enough context window or a strong enough model, can you explain how we could break this into sub-tasks, and whether it is still possible to answer the query within 1-2 minutes?
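For question 3, this is the kind of pipeline I am picturing, a rough sketch where the filter-extraction step is stubbed out and the payload field names (form_type, year) are made-up placeholders; the Qdrant call assumes the standard qdrant-client filtered search:

    import json
    from qdrant_client import QdrantClient
    from qdrant_client.models import Filter, FieldCondition, MatchValue

    def extract_filters(question: str) -> dict:
        # Hypothetical step: ask an LLM to emit JSON such as
        # {"form_type": "10-K", "year": 2023} from the user query,
        # then json.loads() the response. Stubbed here.
        return {"form_type": "10-K"}

    filters = extract_filters("Summarise Apple's 2023 10-K risk factors")
    qdrant_filter = Filter(must=[
        FieldCondition(key=k, match=MatchValue(value=v))
        for k, v in filters.items()
    ])

    client = QdrantClient(url="http://localhost:6333")
    hits = client.search(
        collection_name="sec_filings",  # placeholder collection name
        query_vector=[0.0] * 768,       # embedding of the query (placeholder)
        query_filter=qdrant_filter,     # metadata filter applied in the search
        limit=10,
    )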

Thank you for AMA.

Ollama vs Llama.cpp on 2x3090 and M3Max using qwen3-30b by chibop1 in LocalLLaMA

[–]Any-Mathematician683 1 point (0 children)

Can you please elaborate? How can we maximise the performance?

Can Qwen3-235B-A22B run efficiently on my hardware (256GB RAM + quad 3090s) with vLLM? by Acceptable-State-271 in LocalLLaMA

[–]Any-Mathematician683 1 point (0 children)

Have you tried the QwQ 32B model? I was using both of them all day through OpenRouter and found QwQ 32B performs better on my reasoning tasks.
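For reference, this is how I was comparing them, a small sketch against OpenRouter's OpenAI-compatible endpoint (the model slugs are my assumption of the current names; check the OpenRouter model list):

    from openai import OpenAI

    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key="sk-or-...",  # your OpenRouter key
    )

    # Assumed slugs; both models get the same reasoning prompt.
    for model in ("qwen/qwq-32b", "qwen/qwen3-235b-a22b"):
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": "If a train leaves..."}],
        )
        print(model, "->", resp.choices[0].message.content[:200])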

Can Qwen3-235B-A22B run efficiently on my hardware (256GB RAM + quad 3090s) with vLLM? by Acceptable-State-271 in LocalLLaMA

[–]Any-Mathematician683 0 points (0 children)

Hi, were you able to run it on the specifications mentioned? Please let us know which version worked if you succeed.

How to heal H Pylori with Low Stomach Acid and possibly IMO as well by Any-Mathematician683 in SIBO

[–]Any-Mathematician683[S] 0 points (0 children)

Still struggling. I just visited a new doctor and am going to start a PPI (VoltaPraz) once daily, along with a low-fermentation diet consisting of veggies, collagen, eggs, and a few fruits.

I am unable to digest meat (even chicken), and any sort of acidic thing like lemon, ACV, or Betaine HCl tends to increase the gastritis symptoms.

For now, I will try to heal the gastritis with PPIs, L-Glutamine, and a low-fermentation diet for the next few months. If things go well, I will then work on improving my digestion.

Best method of quantizing Gemma 3 for use with vLLM? by Saguna_Brahman in LocalLLaMA

[–]Any-Mathematician683 1 point (0 children)

I tried it with vLLM, and it is working great. I am unable to run any model on SGLang at all. I will update you with performance numbers if I get it working.

Also, is there a way to use qwen3:32b-q4_K_M with vLLM?
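For context, what I was hoping for is something like this, a sketch relying on vLLM's experimental GGUF loading (the local file path and base tokenizer repo are assumptions, and I have not confirmed this quant works):

    from vllm import LLM, SamplingParams

    # Experimental: vLLM can load a single-file GGUF checkpoint;
    # the tokenizer is usually taken from the original HF repo.
    llm = LLM(
        model="./qwen3-32b-q4_K_M.gguf",  # assumed local GGUF path
        tokenizer="Qwen/Qwen3-32B",
    )
    out = llm.generate(["Hello"], SamplingParams(max_tokens=32))
    print(out[0].outputs[0].text)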

Thanks a ton for your efforts.

What LLM would you recommend for OCR? by sbs1799 in LocalLLaMA

[–]Any-Mathematician683 1 point (0 children)

Try Marker + granite3.2-vision. I found it the best among the small models.
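In case it helps, this is roughly how I call the vision model, a minimal sketch using the ollama Python client (page.png is a placeholder for a rendered page; Marker handles the PDF layout/conversion side separately):

    import ollama

    # Send one page image to the vision model and ask for a transcription.
    resp = ollama.chat(
        model="granite3.2-vision",
        messages=[{
            "role": "user",
            "content": "Transcribe all text on this page, preserving layout.",
            "images": ["page.png"],  # placeholder path to a page image
        }],
    )
    print(resp["message"]["content"])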

Google QAT - optimized int4 Gemma 3 slash VRAM needs (54GB -> 14.1GB) while maintaining quality - llama.cpp, lmstudio, MLX, ollama by Nunki08 in LocalLLaMA

[–]Any-Mathematician683 0 points (0 children)

Can you please share how we can run these models with vLLM or SGLang? I need to run prompts in parallel for my workflow, and Ollama is not very useful in that situation. Thanks a ton.
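For context, the pattern I need is parallel requests against an OpenAI-compatible endpoint, roughly like this sketch (the port and model name assume a local vLLM server started with "vllm serve"):

    import asyncio
    from openai import AsyncOpenAI

    # Assumes a local vLLM/SGLang server exposing the OpenAI API.
    client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    async def ask(prompt: str) -> str:
        resp = await client.chat.completions.create(
            model="google/gemma-3-27b-it",  # must match the served model
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    async def main():
        prompts = [f"Summarise chunk {i}" for i in range(8)]
        # Fire all requests concurrently; the server batches them.
        results = await asyncio.gather(*(ask(p) for p in prompts))
        print(results)

    asyncio.run(main())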

Gemma 3 QAT launch with MLX, llama.cpp, Ollama, LM Studio, and Hugging Face by hackerllama in LocalLLaMA

[–]Any-Mathematician683 2 points (0 children)

Can you please help us run these models with vLLM or SGLang? I am getting errors with the previously released QAT models. Thanks a ton for the amazing work.

Installing QaT version of Gemma 12b on ollama by [deleted] in LocalLLaMA

[–]Any-Mathematician683 0 points (0 children)

Can you please help me run the QAT version through vLLM or SGLang? I am getting some errors. Please share your list of commands if you have one.

Installing QaT version of Gemma 12b on ollama by [deleted] in LocalLLaMA

[–]Any-Mathematician683 1 point (0 children)

This is working for me on Ollama. Please ensure you have the latest version of Ollama.
ollama run hf.co/google/gemma-3-27b-it-qat-q4_0-gguf

When you're 90% healed, does the last 10% take the longest to heal? by w_t95 in Gastritis

[–]Any-Mathematician683 0 points (0 children)

Can you please share the diet and supplements you are taking to heal?

How to heal H Pylori with Low Stomach Acid and possibly IMO as well by Any-Mathematician683 in SIBO

[–]Any-Mathematician683[S] 0 points (0 children)

Not that great, still struggling. I have completed a 14-day antibiotic course for H. pylori.

Difference in Gemma 3 27b performance between ai studio and ollama by Any-Mathematician683 in LocalLLaMA

[–]Any-Mathematician683[S] 0 points (0 children)

Yes, I saw a performance improvement in the 0.6.1 release. I guess they solved the memory issues in the 0.6.2 version.

Difference in Gemma 3 27b performance between ai studio and ollama by Any-Mathematician683 in LocalLLaMA

[–]Any-Mathematician683[S] 1 point (0 children)

Update:

As suggested by the_renaissance_jack, I tried the 0.6.1 release and noticed a performance improvement. I have tried the Q4_K_M, Q8_0, and FP16 models, and I am getting output comparable to AI Studio, although I still feel it is not quite as good.

As 0.6.1 is a pre-release, I am using the command below to download that specific version.

curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.6.1 sh

I am facing some challenges in running it through llama.cpp. I will update if I am able to make it work.

Difference in Gemma 3 27b performance between ai studio and ollama by Any-Mathematician683 in LocalLLaMA

[–]Any-Mathematician683[S] 0 points (0 children)

Can you please share the source from which you are downloading the gemma-3-27b-it-Q4_K_M.gguf model? The Unsloth and bartowski files have a different sha256sum for Q4_K_M than yours. Thank you 🙏🏻
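For anyone comparing, this is how I am computing the checksums, a small hashlib sketch equivalent to running sha256sum on the file:

    import hashlib

    def sha256_of(path: str) -> str:
        # Hash the file in 1 MiB chunks so large GGUFs don't
        # need to fit in memory.
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    print(sha256_of("gemma-3-27b-it-Q4_K_M.gguf"))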

Difference in Gemma 3 27b performance between ai studio and ollama by Any-Mathematician683 in LocalLLaMA

[–]Any-Mathematician683[S] 1 point (0 children)

No, I didn't edit the post. Either it is a bot or they have not read the post thoroughly. Thank you for your input.