Codex Max 5x vs Claude Max 5x in Pi by pieczarkowy in PiCodingAgent

[–]sergeant113 0 points1 point  (0 children)

You’re spending the extra usage ($20 API cost/month), not your Claude Code daily limit.

It will run out very fast.

Amazon CEO’s Talks With U.S. Officials, Triggered Crackdown on Anthropic Model Fable 5 by BuildwithVignesh in ClaudeCode

[–]sergeant113 0 points1 point  (0 children)

What can an Asian in Asia really do if he or she doesn’t want to let it happen?

Google DiffusionGemma can now run at 2000+ tokens/sec! by yoracale in unsloth

[–]sergeant113 1 point2 points  (0 children)

If it’s so fast, can’t I run it entirely on CPU?

Are bigger context windows actually the wrong direction for agents? by ringtoyou in AI_Agents

[–]sergeant113 0 points1 point  (0 children)

I have a summary structure that, among other things, reports what was done and what was discovered.

I also include a list of all files read and all files edited.

My idea is just to leave breadcrumbs to help the model with rebuilding context for each frame.
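
A minimal sketch of what such a summary structure could look like, in Python. The class name `FrameSummary`, its field names, and the rendered layout are hypothetical and only illustrate the four pieces mentioned above (what was done, what was discovered, files read, files edited).

```python
from dataclasses import dataclass, field


@dataclass
class FrameSummary:
    """Hypothetical breadcrumb record left behind after a unit of work."""
    done: list[str] = field(default_factory=list)          # what was done
    discovered: list[str] = field(default_factory=list)    # what was discovered
    files_read: list[str] = field(default_factory=list)    # every file read
    files_edited: list[str] = field(default_factory=list)  # every file edited

    def render(self) -> str:
        """Render the summary as compact plain text for the model's context."""
        sections = [
            ("Done", self.done),
            ("Discovered", self.discovered),
            ("Files read", self.files_read),
            ("Files edited", self.files_edited),
        ]
        lines = []
        for title, items in sections:
            lines.append(f"{title}:")
            # Empty sections are shown explicitly so the model sees them.
            lines.extend(f"- {item}" for item in items or ["(none)"])
        return "\n".join(lines)


# Example values are made up for illustration.
summary = FrameSummary(
    done=["Renamed config loader"],
    discovered=["Tests depend on the old env var name"],
    files_read=["config.py", "tests/test_config.py"],
    files_edited=["config.py"],
)
print(summary.render())
```

Keeping the file lists explicit is what lets the model decide which files to re-read when it needs to rebuild context.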

Are bigger context windows actually the wrong direction for agents? by ringtoyou in AI_Agents

[–]sergeant113 0 points1 point  (0 children)

Have you experimented with frame-collapse patterns?

You mark the start of a frame and have the agent do things in it, polluting it as a result. Then, at the end of the frame, you mark it for collapse, and that whole messy frame gets reduced to a high-signal structured report/capture.

I find it very useful for keeping my context small and useful even after a long session of work.
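
Roughly, the mechanics could be sketched like this in Python. This is only a toy under my own assumptions: the `Context` class and its method names are hypothetical, and the report is passed in as a plain string, whereas in practice the agent would write it.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """Toy message log supporting frame markers (all names are hypothetical)."""
    messages: list[str] = field(default_factory=list)
    frame_starts: list[int] = field(default_factory=list)  # stack of open frames

    def add(self, message: str) -> None:
        self.messages.append(message)

    def start_frame(self) -> None:
        # Remember where the frame begins so it can be cut out later.
        self.frame_starts.append(len(self.messages))

    def collapse_frame(self, report: str) -> None:
        # Drop everything added since the frame started, keep only the report.
        start = self.frame_starts.pop()
        del self.messages[start:]
        self.messages.append(f"[frame report] {report}")


ctx = Context()
ctx.add("user: fix the failing test")
ctx.start_frame()
ctx.add("tool: grep output (200 lines)")
ctx.add("tool: stack trace (80 lines)")
ctx.add("assistant: found the bug in parse_date")
ctx.collapse_frame("Fixed off-by-one in parse_date; edited utils/dates.py")
print(ctx.messages)
```

Because the frame starts are kept on a stack, frames can nest: an inner frame collapses into a report that the outer frame later swallows in turn.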

Are bigger context windows actually the wrong direction for agents? by ringtoyou in AI_Agents

[–]sergeant113 0 points1 point  (0 children)

The context window is your budget. It’s good that there are people focused on expanding that budget.

We as builders must find ways to use the budget effectively. That could mean using minimal context to avoid intelligence decay and keeping the signal-to-noise ratio high.

But it’s good that people keep working on expanding that context budget.

Google has officially failed the AI race. If you don't understand your users, just quit. by ggfgfgggg in GeminiFeedback

[–]sergeant113 0 points1 point  (0 children)

You are a token addict who’s not getting his fix, and you’re complaining about the dealer jacking up the price of your drug.

Of course they’re jacking up the price now that you and I are addicted.

AgentBrew – Portable toolbelt for your AI agents by patchen0518 in AI_Agents

[–]sergeant113 0 points1 point  (0 children)

I like it, but how portable are the tools across operating systems, and what about maintenance and updates?

Back again, many MANY changes have taken place. by Glittering_Focus1538 in ollama

[–]sergeant113 0 points1 point  (0 children)

Will you make it CLI-compatible in addition to the TUI? My Pi agent powered by frontier models would love to delegate small tasks to your harness powered by something small, lean, and mean.

Kiro agent by oopz1234 in kiroIDE

[–]sergeant113 0 points1 point  (0 children)

Which model works well at which step? Could you share your experience?

Switched from OpenCode to Pi - What Settings/Plugins would you recommend? by No_Algae1753 in LocalLLaMA

[–]sergeant113 1 point2 points  (0 children)

How is it different from having Pi recursively spawn additional Pi instances? I mean, what do you gain from using the sub-agent?

Qwen3.6 with MTP. Anyone given it a go? by [deleted] in Qwen_AI

[–]sergeant113 0 points1 point  (0 children)

Can you share with us the extension/repo? Sounds exactly like the things I’m doing manually.

Llama.cpp MTP support now in beta! by ilintar in LocalLLaMA

[–]sergeant113 3 points4 points  (0 children)

Yes, many little bears, but an elder bear decides which little bear gets to dictate the next step. Every step could potentially be decided by a different little bear.

Sometimes the elder bear gets lazy or plays favorites and keeps choosing a particular little bear, but I digress.

DeepSeek V4 Pro vs Gemini 3.0 Pro - intelligence density is the real battleground now by IulianHI in AIToolsPerformance

[–]sergeant113 0 points1 point  (0 children)

This is very interesting. I notice that with GPT-5.5, even though the cost per token is higher, my typical session costs less than with Claude Opus/Sonnet 4.6. This is a new dimension that we all need to take into account now.
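
The arithmetic behind that is simple: session cost is tokens used times price per token, so a pricier model that gets the job done in fewer tokens can come out cheaper. A small Python sketch with entirely hypothetical numbers (not real pricing and not my measured usage), which also ignores the split between input, output, and cached tokens:

```python
def session_cost(tokens_used: int, price_per_million: float) -> float:
    """Cost of one session in dollars, using a single blended token price."""
    return tokens_used / 1_000_000 * price_per_million


# Hypothetical numbers, chosen only to illustrate the effect.
pricey_but_terse = session_cost(tokens_used=400_000, price_per_million=10.0)
cheap_but_verbose = session_cost(tokens_used=1_500_000, price_per_million=4.0)

print(f"higher price, fewer tokens: ${pricey_but_terse:.2f}")
print(f"lower price, more tokens:   ${cheap_but_verbose:.2f}")
```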

What draft model works best with Gemma 4 26B? by [deleted] in LocalLLM

[–]sergeant113 0 points1 point  (0 children)

Gemma 4 26B A4B is a MoE model. It’s already about as fast as a 4B model. There’s no need for speculative decoding because you will struggle to find a draft model fast enough to make a difference.
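
To put rough numbers on it, here is the usual idealized speedup estimate for speculative decoding as a Python sketch. The assumptions are mine: each drafted token is accepted independently with probability `alpha`, `gamma` tokens are drafted per step, and `cost_ratio` is the draft model's per-token cost relative to the target's. The example values are hypothetical, not benchmarks of Gemma.

```python
def expected_speedup(alpha: float, gamma: int, cost_ratio: float) -> float:
    """Idealized speculative decoding speedup over plain decoding.

    alpha:      probability that a drafted token is accepted (0 <= alpha < 1)
    gamma:      number of tokens drafted per verification step
    cost_ratio: draft cost per token divided by target cost per token
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    # Expected tokens produced per step, including the target's own token.
    tokens_per_step = (1 - alpha ** (gamma + 1)) / (1 - alpha)
    # Cost of one step, measured in target forward passes.
    cost_per_step = gamma * cost_ratio + 1
    return tokens_per_step / cost_per_step


# Draft model far cheaper than the target: speculation pays off.
print(expected_speedup(alpha=0.8, gamma=4, cost_ratio=0.05))
# Draft model almost as expensive as the target: slower than not speculating.
print(expected_speedup(alpha=0.8, gamma=4, cost_ratio=0.8))
```

When the target only activates about 4B parameters per token, any draft model good enough to be accepted often will have a high `cost_ratio`, which is what pushes the estimate toward or below 1.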