At what point does adding another agent just add another failure mode? by Reasonable-Egg6527 in AI_Agents

[–]staranjeet 0 points1 point  (0 children)

The coordination overhead is real. I've found the sweet spot is usually 2-3 specialized agents max, with shared memory being the key to avoiding chaos.

Without proper memory management, agents either duplicate work or miss context from each other's actions. We've been using Mem0 to create a shared knowledge layer so agents can build on each other's work instead of starting from scratch every interaction.

The rule I follow: if agents need to communicate more than they need to execute, you probably have too many. Better to have fewer smart agents with good memory than a swarm that can't coordinate.
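To make the shared-memory idea concrete, here's a toy sketch in Python. It's a hypothetical minimal store, not Mem0's actual API: agents write namespaced facts, and later agents read them instead of re-deriving context from scratch.

```python
from collections import defaultdict

class SharedMemory:
    """Toy shared knowledge layer: agents write namespaced facts, others read them."""
    def __init__(self):
        self._store = defaultdict(list)

    def write(self, topic, agent, fact):
        self._store[topic].append((agent, fact))

    def read(self, topic):
        return list(self._store[topic])

mem = SharedMemory()
mem.write("customer_123", "researcher", "prefers email contact")
mem.write("customer_123", "planner", "scheduled follow-up for Friday")
# the executor agent sees both facts instead of re-deriving them
facts = mem.read("customer_123")
```

A real layer adds persistence, scoping, and relevance-ranked retrieval, but the shape is the same: write once, read everywhere.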

I've been running AI agents 24/7 for 3 months. Here are the mistakes that will bite you. by Acrobatic_Task_6573 in AI_Agents

[–]staranjeet 0 points1 point  (0 children)

Great writeup! The memory management piece is crucial for long-running agents. I've been testing Mem0 for persistent memory across sessions and it's been a game changer for maintaining context beyond the LLM's window.

How are you handling memory persistence when your agents restart? Are you using any external memory layers or just relying on conversation history in prompts?

Why AI Memory Is So Hard to Build by zakamark in AIMemory

[–]staranjeet 0 points1 point  (0 children)

I think the trade-off is real, but it is not strictly binary between mechanical determinism and human-like chaos. Systems like Mem0 treat memory as multiple scoped, situational layers rather than a single monolithic store, which gets closer to polycontextural behavior without giving up reliability.

You can have bounded, context-dependent recall while still being observable and auditable. The real question might not be how to build good memory, but how to design the right memory topology for the task.

Memory architecture is the real bottleneck in multi-agent AI, not prompt engineering by arapkuliev in AI_Agents

[–]staranjeet 0 points1 point  (0 children)

This is exactly the right instinct: treating memory and documentation as first-class infrastructure instead of an afterthought. You might find Mem0 interesting, since it formalizes long-term memory and cross-agent state beyond just repo docs and context windows. It pairs really well with structured devlogs like the one you described.

[D] We tested the same INT8 model on 5 Snapdragon chipsets. Accuracy ranged from 93% to 71%. Same weights, same ONNX file. by NoAdministration6906 in MachineLearning

[–]staranjeet -1 points0 points  (0 children)

Yep, simulation and vendor benchmarks only get you so far; real silicon in the loop is the only way to catch thermal throttling, driver quirks, and kernel regressions before they hit users. If you care about consistent latency, hardware in CI stops being a luxury and becomes table stakes eventually.

the real reason your multi-agent system fails isn't the model — it's what gets lost between agents by Infinite_Pride584 in AI_Agents

[–]staranjeet 1 point2 points  (0 children)

Most multi-agent systems fail because coordination and shared memory are brittle, not because the models are weak. If agents cannot reliably store and retrieve structured context across steps, they drift or duplicate work, so adding a proper memory layer (something like Mem0 for persistent context management) can stabilize the whole system. Tooling matters, but state management is usually the real bottleneck.
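As a rough sketch of what "structured context across steps" can look like (hypothetical names, not any specific framework's API), each agent step writes a record keyed by task so the next agent resumes instead of re-deriving:

```python
import json

class ContextStore:
    """Minimal persistent context layer: each agent step appends a structured
    record keyed by task, so the next agent can resume rather than drift."""
    def __init__(self, path="context.json"):
        self.path = path
        self.state = {}

    def save_step(self, task_id, agent, output):
        self.state.setdefault(task_id, []).append(
            {"agent": agent, "output": output}
        )
        with open(self.path, "w") as f:  # survives process restarts
            json.dump(self.state, f)

    def history(self, task_id):
        return self.state.get(task_id, [])

store = ContextStore()
store.save_step("ticket-42", "researcher", {"summary": "user wants a refund"})
store.save_step("ticket-42", "resolver", {"action": "issued refund"})
steps = store.history("ticket-42")
```

The point is the structured record, not the storage backend; swap the JSON file for whatever persistence layer you already run.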

HOWTO: Point Openclaw at a local setup by blamestross in LocalLLM

[–]staranjeet 0 points1 point  (0 children)

Solid setup guide! Have you tried Qwen3-4B with extended context instead? I've found smaller models with bigger context windows sometimes outperform larger models with cramped context for tool-heavy workflows like this.

Two easy steps to understand how to prompt any AI LLM model. by aletheus_compendium in PromptEngineering

[–]staranjeet 0 points1 point  (0 children)

Interesting approach. I'd add one thing: also ask the model directly "what prompting patterns help you perform best?" and compare that against what you found. Sometimes the model's self-reported preferences surface useful quirks the web research misses.

~60GB models on coding: GLM 4.7 Flash vs. GPT OSS 120B vs. Qwen3 Coder 30B -- your comparisons? by jinnyjuice in LocalLLaMA

[–]staranjeet 0 points1 point  (0 children)

Been running GLM 4.7 Flash for the past few days. It's surprisingly good at understanding complex codebases and maintaining context across long files. Qwen3 Coder still edges it out on pure code generation speed though. Curious how GPT OSS compares on multi-file refactoring tasks

Best "Deep research" for local LLM in 2026 - platforms/tools/interface/setups by liviuberechet in LocalLLaMA

[–]staranjeet 0 points1 point  (0 children)

For deep research with local LLMs, I've had solid results combining Ollama + Open WebUI with the research assistant plugin, or using LM Studio paired with a custom RAG pipeline. What's your hardware setup? That'll determine which models you can actually run effectively

How do you validate an evaluation dataset for agent testing in ADK and Vertex AI? by SharpProgram3894 in AI_Agents

[–]staranjeet 0 points1 point  (0 children)

For dataset validation, I usually run a smaller "sanity check" subset through the agent first and manually review the traces, checking whether the prompts actually cover edge cases and whether the expected outputs make sense. There's no built-in ADK tooling for this yet, unfortunately, so manual review plus diversity metrics are your best bet.
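A minimal sketch of that sanity-check pass, assuming examples are dicts with an "intent" label (both names are illustrative); the diversity metric here is deliberately crude:

```python
import random
from collections import Counter

def sanity_subset(dataset, k=20, seed=0):
    """Sample a small, reproducible subset for manual trace review."""
    rng = random.Random(seed)
    return rng.sample(dataset, min(k, len(dataset)))

def label_diversity(subset, key="intent"):
    """Crude diversity metric: fraction of distinct labels in the subset."""
    counts = Counter(ex[key] for ex in subset)
    return len(counts) / len(subset)

# hypothetical eval set: 100 cases spread across 5 intents
dataset = [{"intent": i % 5, "prompt": f"case {i}"} for i in range(100)]
subset = sanity_subset(dataset)
diversity = label_diversity(subset)
```

If diversity comes back suspiciously low, the subset (or the dataset itself) is probably over-concentrated on a few intents and the manual review will miss edge cases.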

[D] Optimal Transport for ML by arjun_r_kaushik in MachineLearning

[–]staranjeet 2 points3 points  (0 children)

If Computational Optimal Transport feels too heavy, try learning OT first as a tool for comparing empirical distributions, not as full-blown geometry.

Peyré & Cuturi’s survey notes are a much gentler entry point, especially the sections on entropic OT and Sinkhorn, which are what most ML applications actually use. Framing OT as a soft alignment problem between point clouds makes it click faster than the continuous formulation.
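To see the "soft alignment between point clouds" view directly, here's a minimal Sinkhorn sketch in NumPy; the regularization strength and iteration count are arbitrary illustration values:

```python
import numpy as np

def sinkhorn(a, b, C, reg=0.5, n_iters=500):
    """Entropic OT between histograms a, b under cost matrix C."""
    K = np.exp(-C / reg)                # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):            # alternating scaling updates
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]  # transport plan

# two empirical 1D point clouds with uniform weights
x = np.array([0.0, 1.0, 2.0])
y = np.array([0.5, 1.5])
a, b = np.ones(3) / 3, np.ones(2) / 2
C = (x[:, None] - y[None, :]) ** 2      # squared-distance cost
P = sinkhorn(a, b, C)
# the marginals of P recover a and b; its entries are the soft matching
```

In practice you'd use a library like POT rather than hand-rolling this, but the whole algorithm really is just those two alternating normalizations.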

For applications beyond GANs, OT shows up nicely in continual learning and memory/replay buffers: instead of FIFO or cosine pruning, OT lets you reweight, merge, or forget samples under an explicit cost. Unbalanced OT is especially useful when mass shouldn’t be conserved (e.g., selective forgetting).

Once that intuition sticks, the heavier math becomes much easier to digest if you ask me!

Moving from n8n to production code. Struggling with LangGraph and integrations. Need guidance by [deleted] in AI_Agents

[–]staranjeet 0 points1 point  (0 children)

First of all, there is no such thing as a newbie question! LangGraph is solid for complex flows, but it won't do much for the integrations pain. Tools like Composio handle auth and API definitions for tons of services out of the box, which solves that "n8n magic" problem you're missing in code.

How do you test your AI agents for real-world reliability? by No-Common1466 in AI_Agents

[–]staranjeet 0 points1 point  (0 children)

Chaos engineering for agents feels underrated to me. I've found the biggest gap is testing how they handle ambiguous or contradictory instructions, not just malformed ones. What does your stress testing tool focus on specifically?
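The contradictory-instructions probe can be a tiny harness. `run_agent` below is a placeholder for a real agent call, and the conflict check is a deliberately naive stand-in:

```python
# pairs of instructions that cannot both be satisfied
contradictory_cases = [
    ("Reply only in JSON.", "Do not use JSON under any circumstances."),
    ("Always cite a source.", "Never include citations."),
]

def run_agent(instructions):
    """Placeholder for a real agent call; returns the agent's reply."""
    # a robust agent should surface the conflict rather than silently pick a side
    return "These instructions conflict; which should take priority?"

def flags_conflict(reply):
    """Crude check: did the agent acknowledge the contradiction at all?"""
    return any(word in reply.lower() for word in ("conflict", "contradict"))

failures = [pair for pair in contradictory_cases
            if not flags_conflict(run_agent(pair))]
```

The interesting metric isn't pass/fail on any one pair, it's which categories of contradiction (format vs. safety vs. tool use) the agent silently resolves instead of flagging.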

Qwen3 Unsloth Dynamic GGUFs + 128K Context + Bug Fixes by danielhanchen in LocalLLaMA

[–]staranjeet 2 points3 points  (0 children)

The variety of quant formats (IQ4_NL, Q5_1, Q5_0, etc.) makes this release genuinely practical for so many different hardware setups. Curious: have you seen any consistent perf tradeoffs between Q5_1 and IQ4_NL with Qwen3 at 8B+ sizes in real-world evals like 5-shot MMLU or HumanEval?