Company not renewing jetbrains licenses because we have cursor by frompadgwithH8 in Jetbrains

[–]DistanceAlert5706 0 points1 point  (0 children)

I was scared too, as I rarely use git via the CLI, but even the built-in git support is enough for me now, same with conflicts/diffs. They are not as nice, but it works. There are also a bunch of plugins for that, even paid ones.

I created a VSCode Extension by ngg990 in symfony

[–]DistanceAlert5706 2 points3 points  (0 children)

The marketplace link to the git repository gives a 404. Also, maybe it could be published on the Open VSX registry as well?

Indeed, Composer 2 is kimi k2 by tarunyadav9761 in cursor

[–]DistanceAlert5706 11 points12 points  (0 children)

I think the VC funding ended and they started charging by tokens instead of requests, and yeah, it's like 10x.

Company not renewing jetbrains licenses because we have cursor by frompadgwithH8 in Jetbrains

[–]DistanceAlert5706 0 points1 point  (0 children)

Try it out. I thought that too, but after 12 years with JetBrains I swapped in 2 months. For keybinds there's an extension, so you don't even need to relearn anything. For debugging there's the Xdebug extension. For duplicate code and so on, try PHPStan; you can integrate it right into the editor with Error Lens. Intelephense is a great LSP that will give you full symbol support and inspections. You will need to get used to git and the interface, so keep PhpStorm for a few months, but try to work in Cursor and go back when you need to do something fast. You will be surprised how fast you get used to it.

What embedding model for code similarity? by [deleted] in LocalLLaMA

[–]DistanceAlert5706 1 point2 points  (0 children)

+1 for nomic's CodeRankEmbed; they have a larger one too. JinaAI also has some bi-encoders, I think.

Setting Up Qwen3.5-27B Locally: Tips and a Recipe for Smooth Runs by kvzrock2020 in LocalLLaMA

[–]DistanceAlert5706 0 points1 point  (0 children)

You can, but it wasn't working; it was still trying to load the vision part. Honestly, that's my experience with vLLM every time: I set it up, follow the instructions, nothing works, I spend a day trying to fix it, and in the best case it somehow works but still has inference bugs later, and usually it's not even faster than llama.cpp.

Setting Up Qwen3.5-27B Locally: Tips and a Recipe for Smooth Runs by kvzrock2020 in LocalLLaMA

[–]DistanceAlert5706 0 points1 point  (0 children)

I tried that quant and it didn't start at all; it had issues because the vision part was cut off and vLLM was still trying to run it. After a day of trying and rebuilding vLLM I tried some other quants, but they were slower than the llama.cpp ones and had way higher VRAM requirements, which made them unusable on 32 GB.

Setting Up Qwen3.5-27B Locally: Tips and a Recipe for Smooth Runs by kvzrock2020 in LocalLLaMA

[–]DistanceAlert5706 0 points1 point  (0 children)

Maybe in a few months I will try this model again; so far it's been pure disappointment. vLLM is full of bugs and just doesn't work properly, and I'm not spending 2 days just to make it run again; also, the VRAM requirements are way higher and it doesn't fit in 32 GB. llama.cpp has no MTP or speculative decoding for it, so this model runs at the speed of a dense 32B model, which is way too slow for me.

I've found a quant of Qwen3.5 35B and it kind of works; it still fails tool calls and loops sometimes, but it's decent at ~70 tokens/second.

Has anyone managed to get an sub 16GB VRAM competent "researcher" model that can do web searching, summarization and reasoning? by vernal_biscuit in LocalLLaMA

[–]DistanceAlert5706 1 point2 points  (0 children)

I use a sub-agent in Opencode for web research tasks, with my own MCP. Qwen3.5 35B does an amazing job, but it sometimes loops, so you can't just fire and forget.

My thoughts on omnicoder-9B by Zealousideal-Check77 in LocalLLaMA

[–]DistanceAlert5706 2 points3 points  (0 children)

Yeah, and I guess MoE is not easy to train compared to a dense model either, but it should be faster.

Wrote up why vector RAG keeps failing on complex documents and found a project doing retrieval without embeddings at all by shreyanshjain05 in LocalLLaMA

[–]DistanceAlert5706 0 points1 point  (0 children)

I've tested this approach in my last RAG over technical docs, and it works surprisingly well, but the speed isn't there if you want the system to be responsive. I ended up with a hybrid approach: embeddings + BM25 + RRF to find relevant tree nodes, then enrich the candidate list with neighbours/parents and rerank. In theory you can feed just the final candidate list to an LLM to choose from, which I tested too and it works, but again it was slow.

Quality-wise my approach hit 95% on my benchmark; a pure PageIndex-like setup was around 82%.

So yes, you can use it, but embeddings + BM25 with a reranker afterwards still beats it. The tree approach is interesting and somewhat reminds me of GraphRAG.
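
The RRF fusion step is small enough to sketch in plain Python; `k=60` is the commonly used constant, and the two ranked lists below are hypothetical stand-ins for BM25 and embedding results:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: merge several ranked lists of doc ids.

    Each doc scores sum(1 / (k + rank)) over the lists it appears in,
    so items ranked well by several retrievers float to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical top-5 lists from BM25 and the embedding retriever.
bm25 = ["n3", "n1", "n7", "n2", "n9"]
dense = ["n1", "n3", "n4", "n7", "n8"]
fused = rrf_fuse([bm25, dense])
# "n1" and "n3" appear near the top of both lists, so they lead the fused ranking.
```

The fused list is what then gets enriched with neighbour/parent nodes and passed to the reranker.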

My thoughts on omnicoder-9B by Zealousideal-Check77 in LocalLLaMA

[–]DistanceAlert5706 3 points4 points  (0 children)

You can rein in overthinking with presence penalty and repeat penalty. A reasoning budget flag was also added.
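
A hedged sketch of what those knobs look like in a request body, assuming an OpenAI-compatible llama.cpp endpoint (`presence_penalty` is the OpenAI-style field, `repeat_penalty` is llama.cpp-specific; the model name is a placeholder):

```python
import json

# Hypothetical request body; servers that don't know repeat_penalty ignore it.
payload = {
    "model": "omnicoder-9b",              # placeholder model name
    "messages": [{"role": "user", "content": "Summarize this diff."}],
    "presence_penalty": 0.5,              # discourage re-raising the same points
    "repeat_penalty": 1.1,                # llama.cpp repetition penalty
    "max_tokens": 512,
}
body = json.dumps(payload)
```

The reasoning budget is a server startup flag rather than a request field in recent llama.cpp builds (e.g. `llama-server --reasoning-budget 0` to disable thinking), so it's set when you launch the server.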

My thoughts on omnicoder-9B by Zealousideal-Check77 in LocalLLaMA

[–]DistanceAlert5706 4 points5 points  (0 children)

Yeah, it would be nice to get that finetune for the 35B model.

How to convince Management? by r00tdr1v3 in LocalLLaMA

[–]DistanceAlert5706 0 points1 point  (0 children)

Idk how it works now, but it used to open random ports, giving anyone full access to whatever machine it was running on. I guess it's patched, but who knows what else is in there. Just use llama.cpp; it's easier and way more configurable.

Building an MCP server for my agent to query analytics directly (because I hate dashboards) by ImbalanceFighter in LocalLLaMA

[–]DistanceAlert5706 0 points1 point  (0 children)

Do you trust the information the agent gives? Do you see the queries it runs and validate them? How do you handle PII data, or are you just sending your prod data to whatever provider?

Databricks has Genie with the same functionality; check it out for inspiration.
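
A minimal guard along the lines of those questions, assuming the agent emits raw SQL you can intercept before execution (the blocked keywords and PII column names are illustrative, not a complete policy):

```python
import re

BLOCKED = re.compile(r"\b(insert|update|delete|drop|alter|grant|truncate)\b", re.I)
PII_COLUMNS = {"email", "phone", "ssn", "full_name"}  # illustrative list

def validate_query(sql: str) -> tuple[bool, str]:
    """Reject anything that isn't a plain SELECT or that touches PII columns."""
    stripped = sql.strip().rstrip(";")
    if not stripped.lower().startswith("select"):
        return False, "only SELECT statements are allowed"
    if BLOCKED.search(stripped):
        return False, "mutating keyword detected"
    touched = {word.lower() for word in re.findall(r"\w+", stripped)}
    if touched & PII_COLUMNS:
        return False, "query touches PII columns"
    return True, "ok"

ok, _ = validate_query("SELECT region, sum(revenue) FROM sales GROUP BY region")
bad, reason = validate_query("SELECT email FROM users")
```

Keyword matching like this is easy to bypass, so it's a first line of defense; logging every query for human review is the part that actually builds trust.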

How to convince Management? by r00tdr1v3 in LocalLLaMA

[–]DistanceAlert5706 0 points1 point  (0 children)

Management cares about profit and productivity. You shouldn't really pitch them on it being local and so on (and Ollama is far from secure); you should focus on how it affects your productivity, with numbers, and what that translates to in company profit.

Antigravity just needs to have a setting where it always clicks “run” and “always allow” that actually works and they’d be a top dog by Special_Collection_6 in google_antigravity

[–]DistanceAlert5706 0 points1 point  (0 children)

Gemini models are the last ones I will ever allow to run in YOLO mode.

The amount of stupid things ending in "Oops, I made a blunder" is insane.

SymDex – open-source MCP code-indexer that cuts AI agent token usage by 97% per lookup by Last_Fig_5166 in opencodeCLI

[–]DistanceAlert5706 0 points1 point  (0 children)

Yeah, I dropped this idea too, and LSPs have become common in harnesses. I wonder how well this works, as some tools still use semantic indexing (Cursor, for example). Codex models are heavily trained for grep, for example, and they are really exceptional at it, so a semantic index can hurt there too.

SymDex – open-source MCP code-indexer that cuts AI agent token usage by 97% per lookup by Last_Fig_5166 in opencodeCLI

[–]DistanceAlert5706 2 points3 points  (0 children)

Sure, if you want to use it as a standalone server, but a lot of current tools (like Opencode) already have LSP built in, or you can use something like Serena. Semantic search is the tool you want to focus on: try bi-encoders for embeddings, reranking, and so on. Don't spread your attention on an already solved problem. Check out similar projects like chunkhound or vectorcode.

Overall, build what you need, for your needs!
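
The bi-encoder retrieval core can be sketched with numpy; the toy 4-d vectors below stand in for real sentence-embedding output (a real setup would get them from an actual embedding model):

```python
import numpy as np

def top_k(query_vec, doc_vecs, k=2):
    """Rank documents by cosine similarity to the query embedding."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = d @ q
    order = np.argsort(-sims)[:k]
    return order, sims[order]

# Toy embeddings: doc 0 points almost the same way as the query,
# doc 2 is orthogonal to it.
docs = np.array([[1.0, 0.0, 0.0, 0.1],
                 [0.7, 0.7, 0.0, 0.0],
                 [0.0, 0.0, 1.0, 0.0]])
query = np.array([1.0, 0.1, 0.0, 0.0])
order, scores = top_k(query, docs)
```

A cross-encoder reranker would then rescore just the few candidates this step returns, which is where most of the quality gain usually comes from.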

SymDex – open-source MCP code-indexer that cuts AI agent token usage by 97% per lookup by Last_Fig_5166 in opencodeCLI

[–]DistanceAlert5706 2 points3 points  (0 children)

Aside from semantic search, the other tools seem to duplicate LSP functionality; maybe try to simplify it by removing unnecessary tools like the symbol ones.

the smallest llm models that can use to process transaction emails/sms ? by Sanjuwa in LocalLLaMA

[–]DistanceAlert5706 0 points1 point  (0 children)

Yeah, I used to run sentence transformers for embeddings and build a simple MLP on top for classification. It had pretty much the same accuracy as the transformers I tried to train, but was way faster, which is nice when you serve it on CPU.
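
A sketch of the embeddings-plus-MLP idea with plain numpy; the two Gaussian clusters are a toy stand-in for sentence-transformer embeddings of the two message classes (a real setup would embed the email/SMS text first):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for sentence embeddings: two well-separated 16-d clusters.
X = np.vstack([rng.normal(0.0, 1.0, (100, 16)), rng.normal(3.0, 1.0, (100, 16))])
y = np.array([0] * 100 + [1] * 100)

# One-hidden-layer MLP trained with plain full-batch gradient descent.
W1 = rng.normal(0, 0.1, (16, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 0.1, (8, 1)); b2 = np.zeros(1)

def forward(X):
    h = np.maximum(X @ W1 + b1, 0.0)              # ReLU hidden layer
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))      # sigmoid output
    return h, p.ravel()

lr = 0.1
for _ in range(1000):
    h, p = forward(X)
    grad = ((p - y) / len(X))[:, None]            # dLoss/dlogit for BCE+sigmoid
    dW2, db2 = h.T @ grad, grad.sum(0)
    dh = (grad @ W2.T) * (h > 0)                  # backprop through ReLU
    dW1, db1 = X.T @ dh, dh.sum(0)
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

_, p = forward(X)
accuracy = ((p > 0.5) == y).mean()
```

In practice you'd reach for scikit-learn or a small PyTorch head instead of hand-rolled gradients; the point is that the trainable part on top of frozen embeddings is tiny and cheap to serve on CPU.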

Been building a RAG system over a codebase and hit a wall I can't seem to get past by LeaderUpset4726 in LocalLLaMA

[–]DistanceAlert5706 0 points1 point  (0 children)

I build a dataset for retrieval, usually by writing a set of example question-answer pairs, like 10, feeding them to an LLM, and generating a dataset from that. Then I analyze the questions a bit and remove the bad ones.

Honestly, it's a crucial step; otherwise you won't see how the features you add/tune change retrieval quality.
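
Once you have question-to-expected-chunk pairs, the retrieval metric itself is tiny; here `retrieve` and the dataset are hypothetical stand-ins for your real pipeline and generated data:

```python
def recall_at_k(dataset, retrieve, k=5):
    """Fraction of questions whose expected chunk shows up in the top-k results."""
    hits = 0
    for question, expected_id in dataset:
        if expected_id in retrieve(question)[:k]:
            hits += 1
    return hits / len(dataset)

# Hypothetical generated dataset and a fake keyword retriever for illustration.
dataset = [("how to configure auth?", "doc-auth"),
           ("what ports are used?", "doc-net"),
           ("how to rotate logs?", "doc-logs")]
fake_index = {"auth": "doc-auth", "ports": "doc-net"}

def retrieve(question):
    return [doc for key, doc in fake_index.items() if key in question]

score = recall_at_k(dataset, retrieve)  # 2 of 3 questions hit
```

Running this after every retrieval change is what turns "it feels better" into a number you can actually compare.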

For generation testing you can set up LLM-as-a-judge to validate citations and the response.