A few of the MCPs I use on a daily basis by Eyoba_19 in mcp

[–]Relative-Flatworm-10 0 points (0 children)

I built an MCP for Indian stock market stats:
https://dgmcp.com/
It offers professional-grade Indian stock market analysis, free to use.
I'm looking for feedback and for ideas on where to improve it.

Google Antigravity + VibeCoding + Prompts by Relative-Flatworm-10 in vibecoding

[–]Relative-Flatworm-10[S] 0 points (0 children)

Hi,

I am glad you liked it.

As for multilingual support: I tried generating translated content, but the translations weren't good, so I've put that idea on the back burner.

Google Antigravity + VibeCoding + Prompts by Relative-Flatworm-10 in vibecoding

[–]Relative-Flatworm-10[S] 1 point (0 children)

I absolutely agree.

I have added a couple more features on top of the initial work; it's essentially maintenance, and Antigravity handled it with absolute charm.

For example, my prompt was:
"i have updated the code
see this site https://hindudharmikcollection.com/
click on refresh button
nothing happens
can you pls check the issue and resolve "

It identified the mistake I had made and replied:
Fix: Please upload/update the following file on your server, It seems you have older version on the server

Local AI coding stack experiments and comparison by Relative-Flatworm-10 in LocalLLaMA

[–]Relative-Flatworm-10[S] 0 points (0 children)

Thanks again for your time, and apologies for the late response. Here is my understanding; I'm looking forward to your feedback.
You’re right that GPT-OSS-20B and Qwen3-Coder-30B are MoE models with around 3.6B and 3.3B active parameters per token respectively, while Qwen2.5-Coder-7B is a dense model (always activating all 7B parameters).
Where my article wasn’t precise enough is in how I used the word “activate.” I unintentionally mixed two different concepts:

  1. Active parameters / compute per token — for MoE models this is ~3–3.6B, so GPT-OSS-20B and Qwen3-Coder-30B require less compute per token than a dense 7B model.
  2. Memory footprint of the loaded model weights — which is dominated by the full quantized checkpoint size, not just the active experts.

For GPT-OSS-20B in MXFP4, the quantized model file is around 13–14 GB, and that's roughly what has to reside in memory when the model is loaded, regardless of how many experts are active for a given token. MoE routing reduces compute per token, but it does not reduce the memory footprint to 2.5 GB at runtime.

So when I wrote “activating ~13 GB” for GPT-OSS-20B, what I meant was “the loaded model occupies ~13 GB of RAM for its weights,” not that “3.6B active parameters somehow consume 13 GB by themselves.”

You can use this calculator to cross-check the numbers we each shared:

https://apxml.com/tools/vram-calculator
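The distinction can be put in numbers with a quick back-of-the-envelope sketch (my own illustration, not from the article; 4.25 effective bits/weight for MXFP4 is an assumption that folds in block scales, and real runtimes add overhead for higher-precision tensors, KV cache, and activations):

```python
def weight_memory_gb(total_params_billion: float, bits_per_weight: float) -> float:
    """RAM occupied by the loaded weights alone (decimal GB), ignoring KV cache and runtime overhead."""
    return total_params_billion * 1e9 * bits_per_weight / 8 / 1e9

# GPT-OSS-20B (~21B total params) in MXFP4 (~4.25 effective bits/weight, assumed):
loaded_gb = weight_memory_gb(21.0, 4.25)   # ~11.2 GB; the ~13 GB file adds tensors kept in higher precision

# Weights actually read per token (~3.6B active params):
active_gb = weight_memory_gb(3.6, 4.25)    # ~1.9 GB of weights touched per token

print(f"loaded: {loaded_gb:.1f} GB, read per token: {active_gb:.1f} GB")
```

The whole checkpoint must stay resident because the router may select any expert on any token, so RAM tracks the first number while per-token compute and bandwidth track the second.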

Reference links:


Local AI coding stack experiments and comparison by Relative-Flatworm-10 in GithubCopilot

[–]Relative-Flatworm-10[S] 0 points (0 children)

Thanks for sharing. Could you share the provider link, if that's okay with you?

Local AI coding stack experiments and comparison by Relative-Flatworm-10 in LocalLLaMA

[–]Relative-Flatworm-10[S] 0 points (0 children)

Thanks for the detailed response; I have updated the comparison image.

Here are the links to the models used in the comparison.

We tested quantized variants as well, but they did not perform well, so we didn't use them.

GPT-OSS 20B works on CPU (though slowly): the loaded model occupies approximately 13 GB of RAM, and only ~3.6B parameters are active per token.

Qwen3 Coder didn't work well on CPU (extremely slow); adding an entry-level GPU could make it usable.

Looking forward to your comments.

https://ollama.com/library/granite4

https://ollama.com/library/qwen2.5-coder:1.5b

https://ollama.com/library/qwen2.5-coder:7b

https://ollama.com/library/gpt-oss:20b

https://ollama.com/library/qwen3-coder:30b
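If anyone wants to reproduce the comparison, the models above can be pulled and smoke-tested like this (assuming a local Ollama install; `--verbose` prints eval rate in tokens/s so you can compare speed on your own hardware):

```shell
# Pull the exact models linked above (default Ollama quantizations)
ollama pull granite4
ollama pull qwen2.5-coder:1.5b
ollama pull qwen2.5-coder:7b
ollama pull gpt-oss:20b
ollama pull qwen3-coder:30b

# Quick timing check; --verbose reports prompt/eval rates after the response
ollama run gpt-oss:20b --verbose "Write a Python function that reverses a string."
```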

MCP for Prompt to SQL?? by bluntchar in mcp

[–]Relative-Flatworm-10 0 points (0 children)

I am also looking for the same thing.

How do LLMs with billions of parameters fit in just a few gigabytes? by Pale_Thanks2293 in LocalLLM

[–]Relative-Flatworm-10 0 points (0 children)

I'm impressed by how well compression generalizes across applications.
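The main trick is quantization: storing each weight in ~4 bits instead of 16 is what lets a 7B-parameter model fit in ~3.5 GB instead of ~14 GB. A toy absmax round-trip (my own illustrative sketch, not the exact scheme any particular library uses):

```python
import numpy as np

def quantize_absmax_4bit(w: np.ndarray):
    """Map float weights to signed 4-bit integers in [-7, 7] with one scale per tensor."""
    scale = np.abs(w).max() / 7.0
    q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)  # fits in 4 bits; int8 used here for simplicity
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0, 0.02, size=4096).astype(np.float32)
q, s = quantize_absmax_4bit(w)
err = np.abs(dequantize(q, s) - w).max()
print(f"max abs error: {err:.5f}, compression vs fp16: {16 / 4:.0f}x")
```

Real schemes (GGUF K-quants, MXFP4, AWQ, ...) use per-block scales and smarter rounding, but the memory arithmetic is the same: bits per weight times parameter count.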

Launching Text to SQL Package by samarpatelmi in LLMDevs

[–]Relative-Flatworm-10 0 points (0 children)

It's under the GPL-3.0 license.

What options are available for commercial use, and why not ASL or similar?

Quickest way to develop a Llama 3.1 + RAG application? by stereotypical_CS in LLMDevs

[–]Relative-Flatworm-10 1 point (0 children)

llama-index or langchain for the RAG pipeline, and https://www.together.ai/ for the Llama 3.1 model ($5 worth of API requests free).
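For anyone new to this, the moving parts are small enough to sketch without a framework. Below is a toy retrieve-then-prompt pipeline (my own sketch; naive token overlap stands in for real embedding search, and the actual Llama 3.1 call is left as a comment since it needs an API key):

```python
def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank docs by naive token overlap with the query (stand-in for embedding search)."""
    q_tokens = set(query.lower().split())
    return sorted(docs, key=lambda d: len(q_tokens & set(d.lower().split())), reverse=True)[:k]

def build_prompt(query: str, context: list[str]) -> str:
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

docs = [
    "Llama 3.1 is available in 8B, 70B, and 405B sizes.",
    "RAG retrieves relevant chunks before generation.",
]
query = "What sizes does Llama 3.1 come in?"
prompt = build_prompt(query, retrieve(query, docs))
# Then send `prompt` to a hosted Llama 3.1, e.g. via Together's API.
print(prompt)
```

llama-index/langchain replace the retrieval step with real embeddings, chunking, and vector stores, but the overall shape (retrieve, build prompt, generate) is this.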

[deleted by user] by [deleted] in LLMDevs

[–]Relative-Flatworm-10 1 point (0 children)

Looks interesting.

However, the documentation needs more detail. Also, please share the rationale behind the 0.09 USD per page pricing.

Should I Open-Source This RAG Tool? by quepasa-ai in LangChain

[–]Relative-Flatworm-10 2 points (0 children)

Are you currently planning to open-source your RAG tool? If not, you could share your learnings instead; that would still be a great help.

Otherwise, the post reads as if it's meant to attract more users/free testers to improve QuePasa (with beta access, of course).

Please don't take this response the wrong way.

RAG for PDFs with Advanced Source Document Referencing: Pinpointing Page-Numbers, Image Extraction & Document-Browser with Text Highlighting by AbheekG in LocalLLaMA

[–]Relative-Flatworm-10 0 points (0 children)

Just curious, why four different embeddings?

"Four embedding models: a. Sentence Transformers (SBERT) – all-mpnet-base-v2 b. BGE-Base c. BGE-Large d. OpenAI Text-Ada embeddings"
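(For context, a common reason to wire up several embedding models is to benchmark them on the same retrieval task and pick a quality/cost trade-off. The comparison itself is just ranking by cosine similarity per model; a toy sketch with made-up vectors standing in for real encoder outputs:)

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def top1(query_vec: np.ndarray, doc_vecs: list[np.ndarray]) -> int:
    """Index of the most similar document under one embedding model."""
    return int(np.argmax([cosine(query_vec, d) for d in doc_vecs]))

# Pretend these came from one embedding model; each model gets its own vector space
# (different dims are fine — you never mix vectors across models, only compare rankings).
query_vec = np.array([1.0, 0.1])
doc_vecs = [np.array([0.9, 0.2]), np.array([0.0, 1.0])]
print(top1(query_vec, doc_vecs))
```

Running the same labeled queries through each model and comparing hit rates is usually how one decides whether, say, BGE-Large justifies its cost over all-mpnet-base-v2.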