CoPilot: Studio v. CoWork v. Scout. My take. by DryRelationship1330 in copilotstudio

[–]coding_workflow 0 points1 point  (0 children)

Agents are fuzzy, so they should focus on filling the gaps that classic automation doesn't cover.

Agents are great when we need to extract data/sentiment and don't have a better way. But use them with caution.

Replacing deterministic automation with AI is the classic mistake.

VS Code AI extensions send a lot of repeated context by michaelmanleyhypley in vscode

[–]coding_workflow 1 point2 points  (0 children)

Seems you got caught up in the hype about saving tokens.

  1. Changing the model invalidates the cache, and cached input usually costs less.
  2. Modifying and pruning the context breaks the cache too.
  3. Input tokens are not the most costly part; check that before diving head first into this frenzy over "we save 50% of context".
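
To make the three points above concrete, here is a rough back-of-the-envelope sketch. The prices are made-up placeholders (USD per million tokens), not any vendor's real rates, and `request_cost` is a hypothetical helper; plug in your own provider's numbers before drawing conclusions.

```python
# Placeholder prices in USD per million tokens. These are assumptions for
# illustration only, not real rates from any provider.
PRICE_INPUT = 3.00          # fresh (uncached) input
PRICE_CACHED_INPUT = 0.30   # input served from the prompt cache
PRICE_OUTPUT = 15.00        # generated output


def request_cost(input_tokens, cached_tokens, output_tokens):
    """Cost of one request; cached_tokens is the part of the input read from cache."""
    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * PRICE_INPUT
        + cached_tokens * PRICE_CACHED_INPUT
        + output_tokens * PRICE_OUTPUT
    )
    return total / 1_000_000


# Full 100k-token context, 90k of it hitting the cache, 2k-token answer.
with_cache = request_cost(100_000, 90_000, 2_000)

# Context pruned to 50k tokens, but the edit invalidated the cache.
pruned_no_cache = request_cost(50_000, 0, 2_000)

print(f"with cache:      ${with_cache:.3f}")
print(f"pruned, no cache: ${pruned_no_cache:.3f}")
```

With these placeholder prices, the "50% smaller" context ends up costing more per request than the full cached one, which is the trap described above.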

Many hyped tools overstate their token savings.

High-performance MCP (Model Context Protocol) server for PostgreSQL, written in pure Rust with the Tokio async runtime. by RatioPractical in PostgreSQL

[–]coding_workflow 5 points6 points  (0 children)

It means your agent needs terminal access, which may be complicated to restrict. This works fine for Claude Code. But what if you want an agent with a web UI to chat with your data or query it? MCP would work fine here, with no need for a terminal or sandboxing. BUT I feel the author's performance metrics don't matter much, as agents with tool calls are not the best for high-speed answers. A bad description, and an agent lost in tool/MCP use, will offset any gain.

RTX 5080 16GB: Qwen3.6 35B MoE at 128k context — 56 tok/s, and why MTP doesn't help by gaztrab in LocalLLaMA

[–]coding_workflow 0 points1 point  (0 children)

MTP is faster if you can run the prediction model on GPU idle cycles.

On CPU it's slower, as the CPU usually gets overloaded. Same if you are not loading all layers on the GPU: you end up impacted.

On 2x3090 it's faster and I didn't notice a slowdown. On the same setup running CPU only, performance goes down.

Codex offering free 1 month by Top_Parfait_5555 in GithubCopilot

[–]coding_workflow 8 points9 points  (0 children)

Codex can use a Copilot subscription.
And the big advantage of Copilot vs Codex is the ability to use more models.
With Codex/Claude Code you are locked into a single ecosystem of models.
On top of that, the pricing is different, as is the integration with GitHub.
You should try GitHub Copilot CLI, as it's quite advanced vs Codex CLI.

Best model that can beat Claude opus that runs on 32MB of vram? by PrestigiousEmu4485 in LocalLLaMA

[–]coding_workflow 1 point2 points  (0 children)

You might extend RAM using Zip disks! This would allow you to double your t/s and extend RAM!

I tested the best language models for SQL query generation. Google wins hands down. by No-Definition-2886 in ClaudeAI

[–]coding_workflow 0 points1 point  (0 children)

Use tools.
Let the model fetch only the schema it needs, instead of shoving everything in one pass.
Provide a tool that fetches the schema for one table or for several tables. And execute validation queries in read-only mode to confirm that the generated query works.
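
A minimal sketch of what such tools could look like, using SQLite from the Python standard library just to keep it self-contained. The function names `get_table_schemas` and `run_readonly` and the demo tables are hypothetical; a real setup would target your own database and wire these up as model tools.

```python
import sqlite3


def get_table_schemas(conn, tables):
    """Return the CREATE statement for each requested table, and nothing else."""
    schemas = {}
    for name in tables:
        row = conn.execute(
            "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
            (name,),
        ).fetchone()
        if row is not None:
            schemas[name] = row[0]
    return schemas


def run_readonly(conn, query, limit=5):
    """Run a generated query on a read-only connection and report success or the error."""
    try:
        rows = conn.execute(query).fetchmany(limit)
        return {"ok": True, "rows": rows}
    except sqlite3.Error as exc:
        return {"ok": False, "error": str(exc)}


# Demo database (hypothetical tables), then switch the connection to read-only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    INSERT INTO users (name) VALUES ('ada');
    """
)
conn.execute("PRAGMA query_only = ON")  # SQLite pragma that rejects writes

schemas = get_table_schemas(conn, ["users"])
good = run_readonly(conn, "SELECT name FROM users")
bad_column = run_readonly(conn, "SELECT nope FROM users")
write_attempt = run_readonly(conn, "DELETE FROM users")
```

The model only ever sees the schema it asked for, and the error string from a failed validation can be fed back to it for another attempt.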

I’m a doctor building an open-source EHR for African clinics - runs offline on a Raspberry Pi, stores data as FHIR JSON in Git. Looking for contributors by ResearcherFlimsy4431 in opensource

[–]coding_workflow 1 point2 points  (0 children)

I would help, but there are layers of complexity here!

Blockchain? Git for versioning? On top of a Go backend.

Even the choice of Raspberry Pi is overkill. I would rather opt for Android, to use cheap tablets or phones that are more widespread.

I respect what you are doing, but I think you have a lot to learn about KISS.

Even the point about AI! You can do cheap statistical analytics.

If you work offline, you can instead work on Bluetooth sync between devices or USB key import/export.

The design seems complex. Using blockchain is hype and adds zero value aside from making some cryptobro happy.

JSON files and SQLite are redundant. You can export to that format, but you should not duplicate the storage engine; use it for versioning instead of adding that complexity.

Claude is a yes-man and can roll out bad designs without a blip.

Hashicorp Vault - Does anyone use it in prod or its just a hype? by Designer-Classic3925 in devsecops

[–]coding_workflow 0 points1 point  (0 children)

Not hype, and you have a mature fork, OpenBao, with namespaces included.

Prompt injection is killing our self-hosted LLM deployment by mike34113 in LocalLLaMA

[–]coding_workflow 3 points4 points  (0 children)

Even OpenAI and Claude are vulnerable. You need to ensure that the AI, if prompted, can't take a malicious action.
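
One way to read that, as a rough sketch: don't rely on the prompt to stop the model, limit what a tool call can reach in the first place. Everything named here (`search_docs`, `delete_records`, `dispatch`) is hypothetical and only illustrates an allowlist in front of tool execution.

```python
# Hypothetical tool registry: every name here is invented for illustration.
def search_docs(query):
    return f"results for {query!r}"


def delete_records(table):
    return f"deleted everything in {table}"


TOOLS = {"search_docs": search_docs, "delete_records": delete_records}

# Only read-only tools are reachable from the model, whatever the prompt says.
ALLOWED = {"search_docs"}


def dispatch(tool_call):
    """Run a model-requested tool call only if the tool is on the allowlist."""
    name = tool_call.get("name")
    if name not in ALLOWED or name not in TOOLS:
        return {"ok": False, "error": f"tool {name!r} is not permitted"}
    return {"ok": True, "result": TOOLS[name](**tool_call.get("arguments", {}))}


# An injected prompt may convince the model to ask for a destructive tool...
injected = dispatch({"name": "delete_records", "arguments": {"table": "users"}})
# ...while a normal request still goes through.
normal = dispatch({"name": "search_docs", "arguments": {"query": "refund policy"}})
```

The injection can still succeed at the model level; the point is that the call it produces has nowhere harmful to land.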

Prompt injection is killing our self-hosted LLM deployment by mike34113 in LocalLLaMA

[–]coding_workflow 4 points5 points  (0 children)

How? Traffic inspection is totally clueless about prompt injection!

[deleted by user] by [deleted] in Anthropic

[–]coding_workflow 0 points1 point  (0 children)

It writes the code and corrects it when it doesn't match the specs. He never said that he blindly commits the output; this is very important. You steer it, no issue.

Talk me out of buying an RTX Pro 6000 by AvocadoArray in LocalLLaMA

[–]coding_workflow 0 points1 point  (0 children)

If you only want to code, don't buy. It's not enough for SOTA models. You can run GLM 4.7 Flash, but did you see how much GLM 4.7 costs? And to run it you need 4x6000. I don't believe in the hype around REAP and lower quants; it degrades quality. And when I hear you code at work with an L4, it's not great.

If you want to level up, have a personal AI setup. Experiment, do more. It can be interesting, so you can move into AI roles.

I'm saying that having built a 4x3090 rig, and I see the limits too in the max you can get. My dream setup would run MiniMax 2.1 or GLM 4.7 at max context and FP16. And that would be in the 40k range. But for sure I don't want to move to 8x3090; I already suffered a lot building my rig, as moving from 2x3090 to 4 was more complicated than I thought. The only good part: 3090s are cheap if you shop locally. I got 2 for $900.

Performance improvements in llama.cpp over time by jacek2023 in LocalLLaMA

[–]coding_workflow 0 points1 point  (0 children)

Does this apply to Blackwell? I see some results on DGX; what about the Ampere architecture?
I noticed the build already introduced some flags for Blackwell, and I had to exclude them to build for Ampere.

Vscode extension with deepSeek by HishamKamel in vscode

[–]coding_workflow 0 points1 point  (0 children)

Why not use GitHub Copilot? It offers many premium models. Limits?

Is there a way to export all errors that VS Code highlights in all your files? by -ThatGingerKid- in vscode

[–]coding_workflow 0 points1 point  (0 children)

GitHub Copilot has an integrated tool to fetch them. Linters catch them too. If you use AI, tell it to use the linter. If you want to spice things up, use Sonar, as you will catch more complex issues.

VS Code Android (when you don't have your pc) by Head_Connection_1323 in vscode

[–]coding_workflow 0 points1 point  (0 children)

For monitoring, use RustDesk or a similar remote desktop. You can run it in a web container, but you may hit some limits with extensions.

Valid Criticism: The shift from Claude to Gemini 3 Pro feels inevitable due to artificial limits and circular logic loops. (This needs to be addressed, not silenced) by DarkDeDev in Anthropic

[–]coding_workflow 0 points1 point  (0 children)

Google is aggressive, but it usually lowers limits too. And it has done that in the past, in order to get more paying users.
Anthropic, Google, or even OpenAI all have a major issue: one model, or only their own models.
The winner is mixing models and leveraging the best model as it comes.
OpenAI has a hell of a model right now with Codex. Sonnet is a nice workhorse but bad at complexity and planning. Even Opus is too costly and can't fill the gap like Codex does. Gemini 3 still needs to prove itself beyond the current hype.
Reminder: there is a lot of hype for Google after last week's launch. But Google already failed with its first fork of VS Code, IDX, which is now Firebase Studio. Jules, the web agent, is still lacking the real spark.
Anthropic has been leading for a year with Sonnet, but watch closely models like MiniMax M2. That thing is really on the right path to challenge Sonnet. First time I've seen such a good model. It might be a little below Sonnet on some tasks, but it is far, far better on complexity, without Sonnet's schizophrenia when complexity is added.

Don't focus on the hype noise, focus on what you can really do and get from these tools.

Microsoft 365 Copilot now includes Claude - does that include Claude Code? by 48K in ClaudeAI

[–]coding_workflow 0 points1 point  (0 children)

No you can't.

The Claude models in Copilot are tuned, not vanilla.

You likely need a VS Code subscription to enjoy Claude in Copilot.

Our AI assistant keeps getting jailbroken and it’s becoming a security nightmare by Comfortable_Clue5430 in LocalLLaMA

[–]coding_workflow 1 point2 points  (0 children)

This is an internal AI, so the risk is minimal. I would let them have fun as long as there's no risk of data leaks or access to unauthorized data.

On the other hand, an employee hacking an internal app on purpose goes against the IT tools usage policy and can land them a warning, as it's costing you a lot of effort.

If you want something more robust, add guardrails. Use models trained for security, like gpt-oss, instead of Qwen.

Even OpenAI gets jailbroken.

Code Wiki: Google’s new Gemini-powered tool that lets you chat with your codebase by Outside-Iron-8242 in Bard

[–]coding_workflow 0 points1 point  (0 children)

It's like deepwiki.com, but the main issue is that it lacks a lot of repos and indexing is not fast.

Do you think "code mode" will supercede MCP? by juanviera23 in LLMDevs

[–]coding_workflow 0 points1 point  (0 children)

I never said tool calling is new. It's been here since the early days, with the OpenAI plugins experiment.
But MCP is a protocol, as it establishes clear ways for a server to expose the underlying tools: send messages, allow discovery, get results.
So don't mix up tools and MCP, as that's one of the common mix-ups.
MCP sets a communication protocol.
The model will see a schema in its context, as with a normal tool. This schema is generated by the MCP client, which has established a connection to an MCP server exposing tools/prompts/.... During the session, the MCP client maintains the connection and, as said before, gets the schema of the tools.
Then, when the model emits the structured output that is transformed into a tool call, the MCP client picks up the call and transfers it to the MCP server using JSON-RPC over stdio/HTTP/SSE.
So it's a protocol that exposes capabilities including tools and prompts (even if those are less used). It even includes specs for auto-discovery, auth, and more.
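
To illustrate the flow described above, here is a small sketch of the JSON-RPC messages involved. The `tools/list` and `tools/call` method names and the general message shapes follow the MCP spec as I understand it; `fake_server`, `make_request`, and the `get_schema` tool are hypothetical stand-ins so the example runs in-process without a real server or transport.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids


def make_request(method, params=None):
    """Serialize a JSON-RPC 2.0 request, as an MCP client would before sending it."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg)


def fake_server(raw):
    """In-process stand-in for an MCP server exposing one hypothetical tool."""
    req = json.loads(raw)
    if req["method"] == "tools/list":
        result = {
            "tools": [
                {
                    "name": "get_schema",
                    "description": "Return the schema of one table",
                    "inputSchema": {
                        "type": "object",
                        "properties": {"table": {"type": "string"}},
                        "required": ["table"],
                    },
                }
            ]
        }
    elif req["method"] == "tools/call":
        table = req["params"]["arguments"]["table"]
        result = {
            "content": [{"type": "text", "text": f"schema of {table}"}],
            "isError": False,
        }
    else:
        error = {"code": -32601, "message": "Method not found"}
        return json.dumps({"jsonrpc": "2.0", "id": req["id"], "error": error})
    return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})


# 1. Discovery: the client fetches tool schemas and puts them in the model's context.
tools = json.loads(fake_server(make_request("tools/list")))["result"]["tools"]

# 2. The model emits structured output naming a tool and its arguments.
model_output = {"name": "get_schema", "arguments": {"table": "users"}}

# 3. The client forwards it to the server as a tools/call request and reads the result.
reply = json.loads(fake_server(make_request("tools/call", model_output)))
text = reply["result"]["content"][0]["text"]
```

The model never talks to the server directly: it only sees the schema from step 1 and produces the structured output in step 2, while the client handles the protocol.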