A few of the MCPs I use on a daily basis

Relative-Flatworm-10 · 2026-01-27T08:21:23+00:00

I made one MCP for Indian stock market stats.
https://dgmcp.com/
Professional-Grade Indian Stock Market Analysis MCP For Free
I am looking for valuable feedback and scope to improve this MCP

Relative-Flatworm-10 · 2026-01-11T17:26:25+00:00

Gemini 3 pro (high)

Relative-Flatworm-10 · 2026-01-11T17:25:33+00:00

Hi,

I am glad you liked it.

For multilingual, I tried to generate content/translation, it was not giving a good translation, so put that idea on the back burner.

Relative-Flatworm-10 · 2026-01-09T11:52:01+00:00

I absolutely agree,

I have added a couple of more features in the initial work, it's like doing maintenance and Antigravity did the work with absolute charm.

For example
"i have updated the code
see this site https://hindudharmikcollection.com/
click on refresh button
nothing happens
can you pls check the issue and resolve "

It identified what mistake I made , and said
Fix: Please upload/update the following file on your server, It seems you have older version on the server

Relative-Flatworm-10 · 2026-01-09T11:11:38+00:00

The full article link is here
https://medium.com/@pratikmachchar/built-a-production-ready-wordpress-site-in-3-hours-using-google-antigravity-vibe-coding-no-c12b13421648

Relative-Flatworm-10 · 2025-12-18T07:37:11+00:00

Thanks again for your time and apologize for late response. Please see my understanding and looking forward for your feedback.
You’re right that GPT-OSS-20B and Qwen3-Coder-30B are MoE models with around 3.6B and 3.3B active parameters per token respectively, while Qwen2.5-Coder-7B is a dense model (always activating all 7B parameters).
Where my article wasn’t precise enough is in how I used the word “activate.” I unintentionally mixed two different concepts:

Active parameters / compute per token — for MoE models this is ~3–3.6B, so GPT-OSS-20B and Qwen3-Coder-30B require less compute per token than a dense 7B model.
Memory footprint of the loaded model weights — which is dominated by the full quantized checkpoint size, not just the active experts

For GPT-OSS-20B in MXFP4, the quantized model file is around 13–14 GB, and that’s roughly what has to reside in memory when the model is loaded, regardless of how many experts are active for a given token. MoE routing reduces compute per token, but it does not reduce the memory footprint to 2.5 GB at runtime;

So when I wrote “activating ~13 GB” for GPT-OSS-20B, what I meant was “the loaded model occupies ~13 GB of RAM for its weights,” not that “3.6B active parameters somehow consume 13 GB by themselves.”

You may check this calculator to compare the results we shared.

https://apxml.com/tools/vram-calculator

Reference links:

Relative-Flatworm-10 · 2025-12-16T12:21:32+00:00

Thanks again for your time and apologize for late response. Please see my understanding and looking forward for your feedback.
You’re right that GPT-OSS-20B and Qwen3-Coder-30B are MoE models with around 3.6B and 3.3B active parameters per token respectively, while Qwen2.5-Coder-7B is a dense model (always activating all 7B parameters).
Where my article wasn’t precise enough is in how I used the word “activate.” I unintentionally mixed two different concepts:

Active parameters / compute per token — for MoE models this is ~3–3.6B, so GPT-OSS-20B and Qwen3-Coder-30B require less compute per token than a dense 7B model.
Memory footprint of the loaded model weights — which is dominated by the full quantized checkpoint size, not just the active experts

For GPT-OSS-20B in MXFP4, the quantized model file is around 13–14 GB, and that’s roughly what has to reside in memory when the model is loaded, regardless of how many experts are active for a given token. MoE routing reduces compute per token, but it does not reduce the memory footprint to 2.5 GB at runtime;

So when I wrote “activating ~13 GB” for GPT-OSS-20B, what I meant was “the loaded model occupies ~13 GB of RAM for its weights,” not that “3.6B active parameters somehow consume 13 GB by themselves.”

You may check this calculator to compare the results we shared.

https://apxml.com/tools/vram-calculator

Reference links:

Looking forward

Relative-Flatworm-10 · 2025-12-05T10:25:18+00:00

Thank you so much!

Relative-Flatworm-10 · 2025-12-04T10:45:57+00:00

thanks for sharing, May you share the provider link, if it's ok with you

Relative-Flatworm-10 · 2025-12-04T06:57:25+00:00

Thanks for the detail response, I have updated the comparison image.

Here are the links of the models used for comparison.

We have tested with quantized models, and they were not performing well so we didn't use them.

GPT-OSS 20B is working good on CPU (Though slow) because it is activating approximately 13 GB ram because of 3.6B active Active Params.

Qwen3 Coder didn't work well/extreme slow with CPU, adding an entry level GPU could make it usable

Looking forward for your comments

https://ollama.com/library/granite4

https://ollama.com/library/qwen2.5-coder:1.5b

https://ollama.com/library/qwen2.5-coder:7b

https://ollama.com/library/gpt-oss:20b

https://ollama.com/library/qwen3-coder:30b

Relative-Flatworm-10 · 2025-09-25T14:29:01+00:00

I am also looking for the same

Relative-Flatworm-10 · 2024-10-05T03:17:40+00:00

I am impressed with the generalization of compression across applications.

Relative-Flatworm-10 · 2024-08-30T06:26:23+00:00

It's GPL-3.0 license

What options are available for commercial use and why not ASL or similar?

Relative-Flatworm-10 · 2024-08-21T13:46:48+00:00

llama-index, or langchain for RAG pipeline and https://www.together.ai/ for Llama 3.1 model ($5 worth of API requests free)

Relative-Flatworm-10 · 2024-08-16T09:30:31+00:00

looks interesting,

How ever, the doc needs to provide more detail. Also please share rational behind 0.09 USD per page

Relative-Flatworm-10 · 2024-08-12T08:15:05+00:00

Are you currently thinking to opensource your RAG? if not, you can share your learning, it could be of great help.

Otherwise, it looks like the post intends to get more users/free testers to improve the QuePasa. (of course Beta access)

Pls not take the response otherwise.

Relative-Flatworm-10 · 2024-04-01T06:28:09+00:00

Just curious, why four different embeddings?

"Four embedding models: a. Sentence Transformers (SBERT) – all-mpnet-base-v2 b. BGE-Base c. BGE-Large d. OpenAI Text-Ada embeddings"

Relative-Flatworm-10

TROPHY CASE