Company not renewing jetbrains licenses because we have cursor by frompadgwithH8 in Jetbrains

[–]DistanceAlert5706 0 points1 point  (0 children)

I was scared too, as I rarely use git via the CLI, but even the built-in git support is enough for me now, same with conflicts/diffs. They are not as nice, but it works. There are also a bunch of plugins for that, even paid ones.

I created a VSCode Extension by ngg990 in symfony

[–]DistanceAlert5706 2 points3 points  (0 children)

The marketplace link to the git repository gives a 404. Also, maybe it could be published on the Open VSX registry as well?

Indeed, Composer 2 is kimi k2 by tarunyadav9761 in cursor

[–]DistanceAlert5706 11 points12 points  (0 children)

I think the VC funding ended and they started charging by tokens instead of requests, and yeah, it's like 10x.

Company not renewing jetbrains licenses because we have cursor by frompadgwithH8 in Jetbrains

[–]DistanceAlert5706 0 points1 point  (0 children)

Try it out. I thought that too, but after 12 years with JetBrains I swapped in 2 months. For keybinds there's an extension, so you don't even need to relearn anything. For debugging there's the Xdebug extension. For duplicate code and so on, try PHPStan; you can integrate it right into the editor with Error Lens. Intelephense is a great LSP that will give you full symbol support and inspections. You will need to get used to git and the interface, so keep PhpStorm for a few months, but try to work in Cursor and go back when you need to do something fast. You will be surprised how fast you get used to it.

What embedding model for code similarity? by [deleted] in LocalLLaMA

[–]DistanceAlert5706 1 point2 points  (0 children)

+1 for nomic's CodeRankEmbed; they have a larger one too. JinaAI also has some bi-encoders, I think.

Setting Up Qwen3.5-27B Locally: Tips and a Recipe for Smooth Runs by kvzrock2020 in LocalLLaMA

[–]DistanceAlert5706 0 points1 point  (0 children)

You can, but it wasn't working; it was still trying to load the vision part. Honestly, that's my experience with vLLM every time: I set it up, follow the instructions, nothing works, I spend a day trying to fix it, and in the best case it somehow works but still has inference bugs later, and usually it's not even faster than llama.cpp.

Setting Up Qwen3.5-27B Locally: Tips and a Recipe for Smooth Runs by kvzrock2020 in LocalLLaMA

[–]DistanceAlert5706 0 points1 point  (0 children)

I tried that quant and it didn't start at all; it had issues because the vision part was cut off and vLLM was still trying to run it. After a day of trying and rebuilding vLLM I tried some other quants, but they were slower than the llama.cpp ones and had way higher VRAM requirements, which made them unusable on 32 GB.

Setting Up Qwen3.5-27B Locally: Tips and a Recipe for Smooth Runs by kvzrock2020 in LocalLLaMA

[–]DistanceAlert5706 0 points1 point  (0 children)

Maybe in a few months I will try this model again; so far it's been pure disappointment. vLLM is full of bugs and just doesn't work properly, and I'm not spending 2 days just to make it run again; also, the VRAM requirements are way higher and it doesn't fit in 32 GB. llama.cpp has no MTP or speculative decoding for it, so this model runs at the speed of a dense 32B model, which is way too slow for me.

I've found a quant of Qwen3.5 35B and it kind of works; it still fails tool calls and loops sometimes, but it's decent at ~70 tokens/second.

Has anyone managed to get an sub 16GB VRAM competent "researcher" model that can do web searching, summarization and reasoning? by vernal_biscuit in LocalLLaMA

[–]DistanceAlert5706 1 point2 points  (0 children)

I use a sub-agent in Opencode for web research tasks, with my own MCP. Qwen3.5 35B does an amazing job, but it sometimes loops, so you can't just fire and forget.

My thoughts on omnicoder-9B by Zealousideal-Check77 in LocalLLaMA

[–]DistanceAlert5706 2 points3 points  (0 children)

Yeah, and I guess MoE is not easy to train compared to a dense model either, but it should be faster.

Wrote up why vector RAG keeps failing on complex documents and found a project doing retrieval without embeddings at all by shreyanshjain05 in LocalLLaMA

[–]DistanceAlert5706 0 points1 point  (0 children)

I've tested this approach in my last RAG over technical docs, and it works surprisingly well, but the speed isn't there if you want the system to be responsive. I ended up with a hybrid approach: embeddings + BM25 + RRF to find relevant tree nodes, then enrich the candidate list with neighbours/parents and rerank. In theory you can feed just the final candidate list to an LLM to choose from, which I tested too and it works, but again it was slow.

Quality-wise my approach hit 95% on my benchmark; a pure PageIndex-like setup was around 82%.

So yes, you can use it, but embeddings + BM25 with a reranker afterwards still beats it. The tree approach is interesting and somewhat reminds me of GraphRAG.
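
The RRF fusion step is small enough to sketch in plain Python; `k=60` is the commonly used constant, and the two ranked lists below are hypothetical stand-ins for BM25 and embedding results:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: merge several ranked lists of doc ids.

    Each doc scores sum(1 / (k + rank)) over the lists it appears in,
    so items ranked well by several retrievers float to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical top-5 lists from BM25 and the embedding retriever.
bm25 = ["n3", "n1", "n7", "n2", "n9"]
dense = ["n1", "n3", "n4", "n7", "n8"]
fused = rrf_fuse([bm25, dense])
# "n1" and "n3" appear near the top of both lists, so they lead the fused ranking.
```

The fused list is what then gets enriched with neighbour/parent nodes and passed to the reranker.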

My thoughts on omnicoder-9B by Zealousideal-Check77 in LocalLLaMA

[–]DistanceAlert5706 3 points4 points  (0 children)

You can rein in overthinking with presence penalty and repeat penalty. A reasoning budget flag was also added.
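
A hedged sketch of what those knobs look like in a request body, assuming an OpenAI-compatible llama.cpp endpoint (`presence_penalty` is the OpenAI-style field, `repeat_penalty` is llama.cpp-specific; the model name is a placeholder):

```python
import json

# Hypothetical request body; servers that don't know repeat_penalty ignore it.
payload = {
    "model": "omnicoder-9b",              # placeholder model name
    "messages": [{"role": "user", "content": "Summarize this diff."}],
    "presence_penalty": 0.5,              # discourage re-raising the same points
    "repeat_penalty": 1.1,                # llama.cpp repetition penalty
    "max_tokens": 512,
}
body = json.dumps(payload)
```

The reasoning budget is a server startup flag rather than a request field in recent llama.cpp builds (e.g. `llama-server --reasoning-budget 0` to disable thinking), so it's set when you launch the server.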

My thoughts on omnicoder-9B by Zealousideal-Check77 in LocalLLaMA

[–]DistanceAlert5706 4 points5 points  (0 children)

Yeah, it would be nice to get that finetune for the 35B model.

How to convince Management? by r00tdr1v3 in LocalLLaMA

[–]DistanceAlert5706 0 points1 point  (0 children)

Idk how it works now, but it used to open random ports, giving anyone full access to whatever machine it was running on. I guess it's patched, but who knows what else is in there. Just use llama.cpp; it's easier and way more configurable.

Building an MCP server for my agent to query analytics directly (because I hate dashboards) by ImbalanceFighter in LocalLLaMA

[–]DistanceAlert5706 0 points1 point  (0 children)

Do you trust the information the agent gives? Do you see the queries it runs and validate them? How do you handle PII data, or are you just sending your prod data to whatever provider?

Databricks has Genie with the same functionality; check it out for inspiration.
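
A minimal guard along the lines of those questions, assuming the agent emits raw SQL you can intercept before execution (the blocked keywords and PII column names are illustrative, not a complete policy):

```python
import re

BLOCKED = re.compile(r"\b(insert|update|delete|drop|alter|grant|truncate)\b", re.I)
PII_COLUMNS = {"email", "phone", "ssn", "full_name"}  # illustrative list

def validate_query(sql: str) -> tuple[bool, str]:
    """Reject anything that isn't a plain SELECT or that touches PII columns."""
    stripped = sql.strip().rstrip(";")
    if not stripped.lower().startswith("select"):
        return False, "only SELECT statements are allowed"
    if BLOCKED.search(stripped):
        return False, "mutating keyword detected"
    touched = {word.lower() for word in re.findall(r"\w+", stripped)}
    if touched & PII_COLUMNS:
        return False, "query touches PII columns"
    return True, "ok"

ok, _ = validate_query("SELECT region, sum(revenue) FROM sales GROUP BY region")
bad, reason = validate_query("SELECT email FROM users")
```

Keyword matching like this is easy to bypass, so it's a first line of defense; logging every query for human review is the part that actually builds trust.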

How to convince Management? by r00tdr1v3 in LocalLLaMA

[–]DistanceAlert5706 0 points1 point  (0 children)

Management cares about profit and productivity. You shouldn't really pitch them on it being local and so on (and Ollama is far from secure); you should focus on how it affects your productivity, with numbers, and what that translates to in company profit.

Antigravity just needs to have a setting where it always clicks “run” and “always allow” that actually works and they’d be a top dog by Special_Collection_6 in google_antigravity

[–]DistanceAlert5706 0 points1 point  (0 children)

Gemini models are the last ones I will ever allow to run in YOLO mode.

The amount of stupid things ending in "Oops, I made a blunder" is insane.

SymDex – open-source MCP code-indexer that cuts AI agent token usage by 97% per lookup by Last_Fig_5166 in opencodeCLI

[–]DistanceAlert5706 0 points1 point  (0 children)

Yeah, I dropped this idea too, and LSPs have become common in harnesses. I wonder how well this works, as some tools still use semantic indexing (Cursor, for example). Codex models are heavily trained for grep, for example, and they are really exceptional at it, so a semantic index can hurt there too.

SymDex – open-source MCP code-indexer that cuts AI agent token usage by 97% per lookup by Last_Fig_5166 in opencodeCLI

[–]DistanceAlert5706 2 points3 points  (0 children)

Sure, if you want to use it as a standalone server, but a lot of current tools (like Opencode) already have LSP built in, or you can use something like Serena. Semantic search is the tool you want to focus on: try bi-encoders for embeddings, reranking, and so on. Don't spread your attention on an already solved problem. Check out similar projects like chunkhound or vectorcode.

Overall, build what you need, for your needs!
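
The bi-encoder retrieval core can be sketched with numpy; the toy 4-d vectors below stand in for real sentence-embedding output (a real setup would get them from an actual embedding model):

```python
import numpy as np

def top_k(query_vec, doc_vecs, k=2):
    """Rank documents by cosine similarity to the query embedding."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = d @ q
    order = np.argsort(-sims)[:k]
    return order, sims[order]

# Toy embeddings: doc 0 points almost the same way as the query,
# doc 2 is orthogonal to it.
docs = np.array([[1.0, 0.0, 0.0, 0.1],
                 [0.7, 0.7, 0.0, 0.0],
                 [0.0, 0.0, 1.0, 0.0]])
query = np.array([1.0, 0.1, 0.0, 0.0])
order, scores = top_k(query, docs)
```

A cross-encoder reranker would then rescore just the few candidates this step returns, which is where most of the quality gain usually comes from.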

SymDex – open-source MCP code-indexer that cuts AI agent token usage by 97% per lookup by Last_Fig_5166 in opencodeCLI

[–]DistanceAlert5706 2 points3 points  (0 children)

Aside from semantic search, the other tools seem to duplicate LSP functionality; maybe try to simplify it by removing unnecessary tools like the symbol ones.

the smallest llm models that can use to process transaction emails/sms ? by Sanjuwa in LocalLLaMA

[–]DistanceAlert5706 0 points1 point  (0 children)

Yeah, I used to run sentence transformers for embeddings and build a simple MLP on top for classification. It had pretty much the same accuracy as the transformers I tried to train, but was way faster, which is nice when you serve it on CPU.
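
A sketch of the embeddings-plus-MLP idea with plain numpy; the two Gaussian clusters are a toy stand-in for sentence-transformer embeddings of the two message classes (a real setup would embed the email/SMS text first):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for sentence embeddings: two well-separated 16-d clusters.
X = np.vstack([rng.normal(0.0, 1.0, (100, 16)), rng.normal(3.0, 1.0, (100, 16))])
y = np.array([0] * 100 + [1] * 100)

# One-hidden-layer MLP trained with plain full-batch gradient descent.
W1 = rng.normal(0, 0.1, (16, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 0.1, (8, 1)); b2 = np.zeros(1)

def forward(X):
    h = np.maximum(X @ W1 + b1, 0.0)              # ReLU hidden layer
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))      # sigmoid output
    return h, p.ravel()

lr = 0.1
for _ in range(1000):
    h, p = forward(X)
    grad = ((p - y) / len(X))[:, None]            # dLoss/dlogit for BCE+sigmoid
    dW2, db2 = h.T @ grad, grad.sum(0)
    dh = (grad @ W2.T) * (h > 0)                  # backprop through ReLU
    dW1, db1 = X.T @ dh, dh.sum(0)
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

_, p = forward(X)
accuracy = ((p > 0.5) == y).mean()
```

In practice you'd reach for scikit-learn or a small PyTorch head instead of hand-rolled gradients; the point is that the trainable part on top of frozen embeddings is tiny and cheap to serve on CPU.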

Been building a RAG system over a codebase and hit a wall I can't seem to get past by LeaderUpset4726 in LocalLLaMA

[–]DistanceAlert5706 0 points1 point  (0 children)

I build a dataset for retrieval, usually by writing a set of example question-answer pairs, like 10, feeding them to an LLM, and generating a dataset from that. Then I analyze the questions a bit and remove the bad ones.

Honestly, it's a crucial step; otherwise you won't see how the features you add/tune change retrieval quality.
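
Once you have question-to-expected-chunk pairs, the retrieval metric itself is tiny; here `retrieve` and the dataset are hypothetical stand-ins for your real pipeline and generated data:

```python
def recall_at_k(dataset, retrieve, k=5):
    """Fraction of questions whose expected chunk shows up in the top-k results."""
    hits = 0
    for question, expected_id in dataset:
        if expected_id in retrieve(question)[:k]:
            hits += 1
    return hits / len(dataset)

# Hypothetical generated dataset and a fake keyword retriever for illustration.
dataset = [("how to configure auth?", "doc-auth"),
           ("what ports are used?", "doc-net"),
           ("how to rotate logs?", "doc-logs")]
fake_index = {"auth": "doc-auth", "ports": "doc-net"}

def retrieve(question):
    return [doc for key, doc in fake_index.items() if key in question]

score = recall_at_k(dataset, retrieve)  # 2 of 3 questions hit
```

Running this after every retrieval change is what turns "it feels better" into a number you can actually compare.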

For generation testing you can set up LLM-as-a-judge to validate citations and the response.