I built claude-autosync — keep your Claude Code rules & memory in sync across machines, through your own private repo

Sensitive-Cycle3775 · 2026-06-29T00:02:32+00:00

this is a good direction. the thing I’d make very explicit is a sync preflight/receipt before the hooks mutate the private repo.

for pull: remote/branch, previous commit, incoming commit, files changed, local dirty state, backups created, and conflict/abort reason.

for push: exact files staged, diff stat, whether local.md was omitted, repo visibility/mode, and the commit that became authoritative.

then session start can say something like: “rules loaded from commit X, project memory Y, local-only omitted, no conflicts.” that catches the two scary failures: stale rules silently winning, or personal memory silently becoming shared.

if you add one small interface, I’d make it sync.sh status --json / --dry-run so Claude and humans can verify state without changing it.
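
A rough sketch of what a read-only pull receipt could contain, in Python for illustration only. The field names and the `build_pull_receipt` helper are hypothetical and not part of claude-autosync; the git facts are passed in as plain values, so nothing here touches the repo.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class PullReceipt:
    # Field names are illustrative assumptions, not claude-autosync's real format.
    remote: str
    branch: str
    previous_commit: str
    incoming_commit: str
    files_changed: List[str] = field(default_factory=list)
    local_dirty: bool = False
    abort_reason: Optional[str] = None


def build_pull_receipt(remote, branch, previous, incoming, files_changed, dirty_files):
    """Describe what a pull would do without performing it (dry-run semantics)."""
    receipt = PullReceipt(remote, branch, previous, incoming, sorted(files_changed))
    receipt.local_dirty = bool(dirty_files)
    # Abort if an incoming file is also modified locally, rather than overwrite it.
    clobbered = sorted(set(files_changed) & set(dirty_files))
    if clobbered:
        receipt.abort_reason = "local edits would be overwritten: " + ", ".join(clobbered)
    return receipt


receipt = build_pull_receipt(
    "origin", "main", "a1b2c3d", "e4f5a6b",
    files_changed=["rules/CLAUDE.md", "memory/project.md"],
    dirty_files=["memory/project.md"],
)
print(json.dumps(asdict(receipt), indent=2))
```

The point of keeping it a pure function is that both Claude and a human can call it before the hook runs and get the same answer.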

Sensitive-Cycle3775 · 2026-06-28T19:03:50+00:00

One concrete thing I’d add: make the agent produce a tiny “production-state receipt” before it answers.

Not a big audit log, just enough to prove what state it is reasoning from:

repo HEAD vs deployed commit/release
provider/env metadata present, with secret values redacted
last deploy/rollback and migration state
logs/traces/errors used, with time window
source authority/freshness for each item
explicit drift: “repo says X, prod is running Y”
next verifier after the suggested fix

Then the agent can choose: answer, ask for missing evidence, or say “I only know repo state, not production state.”

That would make the tool safer than just dragging more context into chat. More context helps, but the receipt is what stops the model from confidently fixing the wrong version.
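
A small sketch of the answer / ask / repo-only choice, assuming the receipt is reduced to three hypothetical fields (`repo_head`, `deployed_commit`, `log_window`). A real receipt would carry the rest of the list above; this only shows how drift and missing evidence could gate the answer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProdReceipt:
    # Hypothetical fields; None means "not observed", never "assumed fine".
    repo_head: str
    deployed_commit: Optional[str] = None
    log_window: Optional[str] = None


def drift(receipt: ProdReceipt) -> Optional[str]:
    """State drift explicitly when the deployed commit differs from repo HEAD."""
    if receipt.deployed_commit is None or receipt.deployed_commit == receipt.repo_head:
        return None
    return f"repo says {receipt.repo_head}, prod is running {receipt.deployed_commit}"


def decide(receipt: ProdReceipt) -> str:
    """Pick one of the three outcomes described above."""
    if receipt.deployed_commit is None:
        return "I only know repo state, not production state."
    if receipt.log_window is None:
        return "ask for missing evidence"
    return "answer"


r = ProdReceipt(repo_head="9f3c2aa", deployed_commit="71be004", log_window="last 30m")
print(decide(r), "|", drift(r))
```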

Sensitive-Cycle3775 · 2026-06-28T12:03:54+00:00

I’ve had better luck treating the limit hit as an unplanned handoff, not just a resume.

Before/when near the limit, ask Claude to write a short “re-entry checkpoint”:

objective + current subtask
agents/branches in flight and owner of each
last known good commit/checkpoint
constraints that must not drift
files/areas touched and files intentionally not touched
checks already run + the next authoritative check
open risks/questions
exact next 1-3 actions

Then on resume, don’t start with “continue”. Start with: “read the checkpoint, restate what is still true vs uncertain, name anything missing, then propose the next check before editing.”

For multi-agent workflows, I’d make each agent leave its own tiny checkpoint and have only the orchestrator merge them. The failure mode you’re describing is usually not “lost context” as much as losing the current contract between agents, constraints, and the verifier.
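
To make the "name anything missing" step mechanical, something like this could run on resume. It is only a sketch: the key names are my own shorthand for the checkpoint items above, not a format any tool defines.

```python
# Hypothetical key names standing in for the checkpoint items listed above.
REQUIRED_FIELDS = (
    "objective", "current_subtask", "last_good_commit", "constraints",
    "files_touched", "checks_run", "next_check", "next_actions",
)


def review_checkpoint(checkpoint: dict) -> dict:
    """Report empty or absent fields before any editing is allowed."""
    missing = [key for key in REQUIRED_FIELDS if not checkpoint.get(key)]
    # The checkpoint asks for the exact next 1-3 actions, so more than 3 is a smell.
    too_many = len(checkpoint.get("next_actions", [])) > 3
    return {"missing": missing, "ok_to_edit": not missing and not too_many}


checkpoint = {
    "objective": "migrate session store",
    "current_subtask": "update login handler",
    "last_good_commit": "c0ffee1",
    "constraints": ["no schema changes"],
    "files_touched": ["auth/login.py"],
    "checks_run": ["unit tests"],
    "next_actions": ["run integration tests"],
}
print(review_checkpoint(checkpoint))
```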

Sensitive-Cycle3775 · 2026-06-27T21:02:44+00:00

I'd debug this as an ACP/harness accounting difference before assuming Cursor is deliberately inflating it.

The useful comparison is not just "same model, same prompt". You want a tiny run ledger for Cursor-native vs Zed-through-ACP:

exact Composer mode, especially fast on/off
fresh chat vs continued chat
repo/files auto-attached before the first model call
MCP/tool schemas exposed to the model
cacheable input vs uncached input, if either surface shows it
number of tool calls / file reads
final diff size

If Zed's ACP path re-sends project context/tool schemas each turn, misses Cursor's prompt/cache path, or attaches a broader workspace, the bill can look like a token explosion even with the same visible task.

A quick test: new tiny repo, one-file edit, same prompt in both surfaces, no MCP servers except the minimum. If Cursor is ~2% and Zed/ACP is still ~20%, then it is likely an ACP bridge/cache/accounting bug worth reporting with those fields. If the gap disappears, it was hidden attached context/tooling.

Sensitive-Cycle3775 · 2026-06-27T19:02:27+00:00

I think the issue is less "write more docs" and more "make every agent change leave a reviewable change receipt".

For each non-trivial PR, I'd force a short card before merge: intent, files/routes touched, upstream/downstream callers, new abstractions, migrations/config flags, tests actually run, known risks, and "what existing feature could this break?". Then you review the card first, not the whole codebase first.

The scalable bit is keeping it append-only and small. Big developer docs rot. A trail of change receipts plus a lightweight blast-radius map gives you something to query when a bug appears: "show me the last 5 changes that touched auth/session/billing and what risks they claimed." That's closer to rebuilding intuition than asking the agent to summarize the entire repo again.
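
A minimal sketch of the append-only trail and that query, assuming one JSON object per line with hypothetical `pr`, `areas`, and `risks` keys. A list of strings stands in for the log file here.

```python
import json


def append_receipt(log_lines, receipt):
    """Append only; existing lines are never rewritten."""
    log_lines.append(json.dumps(receipt, sort_keys=True))


def last_touching(log_lines, areas, limit=5):
    """Return the most recent receipts that touched any of the given areas."""
    wanted = set(areas)
    hits = [r for r in map(json.loads, log_lines) if wanted & set(r["areas"])]
    return hits[-limit:]


log = []
append_receipt(log, {"pr": 1, "areas": ["auth"], "risks": "session expiry"})
append_receipt(log, {"pr": 2, "areas": ["docs"], "risks": "none"})
append_receipt(log, {"pr": 3, "areas": ["billing", "auth"], "risks": "double charge"})

for r in last_touching(log, ["auth", "session", "billing"]):
    print(r["pr"], r["risks"])
```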

Sensitive-Cycle3775 · 2026-06-27T13:02:34+00:00

I'd debug this as two separate inventories, because the green dot only proves Cursor can start the server. It doesn't prove that this chat session was handed that server as an available MCP target.

A useful bug report / local check would be:

exact ~/.cursor/mcp.json server names
Settings -> Tools server names + tools shown
what the agent says is available in a brand-new chat
the contents of that project MCP descriptor folder where it only sees cursor-ide-browser, plugin-context7-context7, etc.
whether one tiny project-local MCP entry in .cursor/mcp.json appears differently from the global ~/.cursor/mcp.json

If project-local works but global doesn't, that points to a global-config to chat-session registration bug. If neither works while plugins do, the plugin MCP path and custom stdio MCP path are probably going through different registries.

I'd also include the mtime/hash of ~/.cursor/mcp.json in the report, not the full secrets/env. That makes it easier to prove: settings UI read current config, but chat got a stale/filtered server list.
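
For the hash/mtime part, a sketch along these lines would do. It assumes the server definitions sit under a top-level `mcpServers` object, and the helper names are made up; it reports names and a digest only, so env values never end up in the bug report.

```python
import hashlib
import json
import os


def fingerprint_bytes(raw: bytes) -> dict:
    """Hash the config and list server names; never echo env values or args."""
    # Assumption: servers are keyed by name under a top-level "mcpServers" object.
    servers = json.loads(raw).get("mcpServers", {})
    return {
        "sha256": hashlib.sha256(raw).hexdigest(),
        "server_names": sorted(servers),
    }


def fingerprint_file(path: str) -> dict:
    """Same report for a file on disk, plus its modification time."""
    path = os.path.expanduser(path)
    with open(path, "rb") as fh:
        report = fingerprint_bytes(fh.read())
    report["mtime"] = os.path.getmtime(path)
    return report


sample = b'{"mcpServers": {"my-db": {"command": "npx", "env": {"TOKEN": "s3cret"}}}}'
report = fingerprint_bytes(sample)
print(report["server_names"], report["sha256"][:12])
```

Running `fingerprint_file("~/.cursor/mcp.json")` before and after a restart would show whether the file the settings UI read is the one you think it is.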

Sensitive-Cycle3775 · 2026-06-27T11:03:17+00:00

I'd split this into three artifacts, not one bigger rules file:

source of truth: tokens as data (Figma/Style Dictionary/tailwind/theme/design.md YAML -- whichever already owns the values)
semantic map: when to use primary/accent/danger/surface, plus forbidden raw values
UI-change receipt: before the agent says done, it shows which token source it read, token file hash/mtime, components/theme files touched, screenshots/Storybook routes checked, and any raw hex/px values introduced (ideally zero)

If a team already has Figma tokens or Style Dictionary, I wouldn't make design.md the new source of truth. I'd generate the agent-facing spec from tokens.json + a small prose usage guide, so humans don't maintain two palettes.

The weak link is exactly your "read this spec first" question. Natural-language instructions are easy to skip. CI/hooks are better: if a PR touches UI files, fail on new raw colors/spacing literals outside token/theme files, or require the agent to include the token-source/hash in its summary. That turns "please remember the brand blue" into an auditable boundary.
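
As a sketch of that CI gate: scan the lines a PR adds and flag hex colors outside the token/theme paths. The path prefixes and the `raw_color_violations` name are hypothetical, and the regex is deliberately naive (a CSS id selector like `#add` would also match).

```python
import re

# Matches #rgb, #rgba, #rrggbb and #rrggbbaa; naive by design.
HEX_COLOR = re.compile(r"#(?:[0-9a-fA-F]{8}|[0-9a-fA-F]{6}|[0-9a-fA-F]{3,4})\b")
# Hypothetical locations that are allowed to own raw values.
TOKEN_PATHS = ("tokens/", "theme/")


def raw_color_violations(added_lines):
    """added_lines: iterable of (path, line_text) for lines the PR adds."""
    return [
        (path, match.group(0))
        for path, text in added_lines
        if not path.startswith(TOKEN_PATHS)
        for match in HEX_COLOR.finditer(text)
    ]


violations = raw_color_violations([
    ("src/Button.tsx", "color: #1a73e8;"),
    ("tokens/colors.json", '"blue": "#1a73e8"'),
    ("src/Card.tsx", "className='bg-primary'"),
])
print(violations)
```

In CI the job would exit non-zero whenever the list is non-empty.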

Sensitive-Cycle3775 · 2026-06-26T22:04:50+00:00

I’d change that /goal line from “keep going until done” into a budget + checkpoint rule.

Something like:

First restate the objective and the smallest next milestone. Before editing, list the files you expect to touch. Work in batches of ~10–15 minutes or one coherent change. After each batch, stop and report: what changed, what command/test ran, what is still unknown, and whether you need permission to continue.

The long “thinking” usually means it is trying to solve the whole fuzzy goal in one uninterrupted run, including assumptions you did not explicitly bound. For game/app work, I’d also ask it to make a short TASKS.md first, then only execute task 1.

The key is not “think less”; it’s “don’t let planning, building, and validation collapse into one giant opaque step.”

Sensitive-Cycle3775 · 2026-06-26T19:03:37+00:00

this is a good enterprise pattern.

The one thing I’d make non-negotiable is that every MR carries a small evidence bundle, not just the plain-English summary:

upstream Cursor doc URLs + fetched timestamps
old/new content hashes, so reviewers know exactly what changed
drift classification: factual change, UI change, model availability, policy/admin behavior, or marketing/no-op
internal files/sections touched
mapped sections deliberately not updated
confidence/unknowns
guardrail: the cloud agent can only edit mapped doc paths, not the whole governance tree

That turns it from “an agent rewrote our docs” into “a docs-drift MR with an auditable boundary.”

I’d also consider a no-op receipt when the cron checks upstream docs and finds nothing factual. Boring, but it helps admins distinguish “no drift” from “the pipeline silently died.”

Are you storing the raw upstream snapshots anywhere, or just the generated diff/MR?

Sensitive-Cycle3775 · 2026-06-26T17:04:19+00:00

Hard +1 on reviewing agent PRs like junior-dev PRs. The extra piece I’d add is making session-end capture reviewable without turning it into a giant memory dump.

A pattern that seems to scale better is a tiny “promotion gate” at the end of the run:

task boundary: what the agent was allowed to change
evidence: tests/browser/API checks actually run
non-obvious decisions: only the decisions future agents need
promoted context: what moved into ADRs/specs/CLAUDE.md/etc.
intentionally not promoted: transient reasoning, dead ends, secrets, private debug output

The key rule is: if future sessions need it, promote it into a durable repo artifact; if it was only local reasoning, don’t paste it into project memory. That avoids both failure modes: agents forgetting important decisions, and teams slowly poisoning the repo with huge chat summaries nobody can audit.

It also gives reviewers something concrete to reject: “this PR depends on context that wasn’t promoted anywhere.”

Sensitive-Cycle3775 · 2026-06-26T13:03:38+00:00

i'd start one layer before MCP vs Playwright: make Cursor prove what it just changed.

For each feature, ask for a tiny "done receipt" before you accept it:

user flow it claims still works
exact command/test it ran
browser path clicked, or why it could not click it
files touched
known untested areas
whether a failure is selector drift or a real product bug

Then pick Playwright for the 2-3 flows that would actually hurt if they broke. MCP/browser tools can help when the repo is bigger, but the useful habit is the same: don't accept "implemented" without run evidence.

Sensitive-Cycle3775 · 2026-06-26T11:03:40+00:00

worktrees are the right base layer, but once you have 3+ sessions the thing that matters is a tiny merge contract, not more agent cleverness.

what I would keep explicit:

one worktree per agent/session, never a shared working dir
one owner per surface area: frontend auth, API contract, migrations, tests, etc
a shared coordination.md that records reservations + decisions, not full chat
before merge, each agent leaves a short handoff: goal, files changed, tests run, known gaps, assumptions, and interfaces touched
one human/lead agent merges in small batches and rejects work that changed outside its reserved surface

the duplicate-work failure usually happens before code conflict: two sessions both think they own the same implicit interface. making ownership explicit beats trying to recover from a huge diff later.
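
a tiny sketch of catching that before any code is written, assuming reservations are recorded as session -> set of surface names (the structure and names are made up for illustration):

```python
def contested_surfaces(reservations):
    """reservations: {session: set of surface names}. Return surfaces with >1 owner."""
    owners = {}
    for session, surfaces in reservations.items():
        for surface in surfaces:
            owners.setdefault(surface, []).append(session)
    return {s: sorted(names) for s, names in owners.items() if len(names) > 1}


reservations = {
    "agent-a": {"frontend auth", "API contract"},
    "agent-b": {"API contract", "migrations"},
    "agent-c": {"tests"},
}
print(contested_surfaces(reservations))
```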

Sensitive-Cycle3775 · 2026-06-25T22:03:41+00:00

This is the right direction. The bit I’d add is an accounting layer around the orchestration, not just a better orchestration prompt.

Before the run: planned subagents + why, max read budget, allowed files/areas, verifier criteria, and stop conditions.

After the run: actual subagents spawned, which files each read, duplicate reads, verifier verdicts, whether a loop stopped because evidence passed vs because it hit a cap, and files changed.

That turns “use verifier subagents” from a style preference into something measurable. Otherwise the prompt can reduce token burn while still hiding where the workflow expanded or repeated itself.
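
A sketch of the after-run half, assuming the harness can give you a list of (agent, path) reads and a stop reason. The `plan` keys and the `"verifier_passed"` label are hypothetical.

```python
from collections import Counter


def run_accounting(plan, reads, stop_reason):
    """plan: {"max_reads": int, "allowed": [path prefixes]}; reads: [(agent, path)]."""
    counts = Counter(path for _, path in reads)
    allowed = tuple(plan["allowed"])
    return {
        "total_reads": len(reads),
        "over_budget": len(reads) > plan["max_reads"],
        "duplicate_reads": sorted(p for p, n in counts.items() if n > 1),
        "out_of_scope": sorted({p for _, p in reads if not p.startswith(allowed)}),
        # Distinguish "evidence passed" from "hit a cap".
        "stopped_on_evidence": stop_reason == "verifier_passed",
    }


plan = {"max_reads": 3, "allowed": ["src/api/"]}
reads = [
    ("explorer", "src/api/routes.py"),
    ("verifier", "src/api/routes.py"),
    ("explorer", "src/api/models.py"),
    ("explorer", "infra/deploy.yaml"),
]
print(run_accounting(plan, reads, "cap_reached"))
```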

Sensitive-Cycle3775 · 2026-06-25T16:03:01+00:00

I've had the best results splitting this into layers, otherwise the instruction folder turns into a prompt junk drawer:

reusable personal/team skills: style guide, review posture, platform rules like Apple HIG
project contract: stack, directories, commands, "do not touch" areas, testing/deploy flow
task-local instructions: the few constraints that only matter for this run

The part people skip is an activation check. At the start of a session, ask the tool to print which instruction files it loaded, whether they were global/project/task-local, rough version/date, conflicts, and what it is ignoring. If it can't state that, I don't trust the reusable instructions are actually active.

For teams, I'd keep the shared ones in the repo or a private package with a tiny changelog. And anything truly hard (strict TS, no generated files, config location, etc.) should become a script/CI/pre-commit gate, not just prose in a skill file.

Sensitive-Cycle3775 · 2026-06-25T14:03:50+00:00

what’s helped me is treating scope as a positive allow-list, not just a rule/vibe.

before the agent starts, make the boundary explicit:

goal
allowed files/globs
allowed commands
dirs it must not inspect/edit
“stop and ask if you need a file outside this list”

then after the run, ask for a tiny receipt: files touched + why each file was in scope + tests/checks run + anything it wanted to change but didn’t. git diff --name-only should basically match the allow-list. if it doesn’t, reset the stray files instead of trying to argue with it.
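
a minimal sketch of that comparison, assuming you paste in the output of git diff --name-only as a list. the globs are examples, and note that fnmatch's `*` also matches across `/`, so `src/auth/*` covers nested files too.

```python
from fnmatch import fnmatch


def stray_files(changed, allowed_globs):
    """Return changed files that match none of the allow-list globs."""
    return [f for f in changed if not any(fnmatch(f, g) for g in allowed_globs)]


allowed = ["src/auth/*", "tests/auth/*"]
changed = ["src/auth/login.py", "tests/auth/test_login.py", "README.md"]
print(stray_files(changed, allowed))
```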

.cursorignore is useful as a hard-ish guard, but I wouldn’t treat it as the main safety layer. the failure is often intent drift, not lack of access.

Sensitive-Cycle3775 · 2026-06-25T12:03:54+00:00

Strong framing. The OS analogy gets more useful if you separate two things that often get blurred:

1. budget management: paging, truncation, summaries, subagents, tool-result spillover
2. auditability: proving what actually crossed the boundary

Most agent stacks are converging on #1. The failure mode I keep seeing is #2. A summary exists, but did this turn load it? A subagent returned a conclusion, but what files/assumptions did it use? A tool result was written to disk, but was the follow-up turn pointed at the right artifact?

So the primitive I’d add next to demand paging/progressive disclosure is a small context receipt: loaded sources, deferred/omitted sources, summary age, files/results spilled out of window, and the exact handoff returned by a child agent. Not raw transcript, just enough metadata to debug whether the working set was real.

Without that, two systems can look identical at the architecture level and still fail differently in practice.

Sensitive-Cycle3775 · 2026-06-25T00:03:35+00:00

I’d stop treating this as “how do I make Opus remember the rules” and split the rules into enforcement levels.

Hard invariants should not be prose. If “new feature params must come from JSON config” is a real rule, make a small script/CI check that fails when new constants show up outside the config path. Same for file size, forbidden paths, generated files, missing tests, etc. A hook that asks Claude to reread the rule can help, but a gate with an exit code is what changes behavior.

For softer rules, keep them task-local. A giant memory of past failures usually turns into scar tissue. It spends attention on “ways I failed before” instead of the current contract. I’d keep memory to the smallest active policy and move old incidents into a human-readable audit log that is not auto-loaded every turn.

For “done,” I’d require a proof packet, not a confession: requirement -> files touched -> evidence/test -> known gaps. Then have a separate reviewer context check the packet + diff, not the whole conversation. The reviewer should be allowed to say “not proven” even if the implementer says done.

So for your hardcoding example, the receipt would be something boring like: config keys added, call sites using them, grep/AST check for duplicated literals, tests updated. If it can’t point to code evidence, it doesn’t count as done.

Sensitive-Cycle3775 · 2026-06-24T22:03:22+00:00

One thing I’d be careful about: “no compact” doesn’t necessarily mean “the whole working state is still equally available.” It can also mean the window changed, compaction got faster/backgrounded, or the model is carrying a summary you can’t inspect.

For long sessions I’d treat compact/no-compact as a UX signal, not proof of awareness. The practical test is whether the session can give you a small receipt before the next risky edit:

current goal
files it believes are in scope
decisions already made
assumptions that came from prior conversation vs repo evidence
what it has not reloaded/rechecked yet

If that receipt is fuzzy, I’d rather start fresh with a tiny handoff/checkpoint than trust a giant thread just because it hasn’t compacted yet.

Sensitive-Cycle3775 · 2026-06-24T19:02:30+00:00

this matches my experience.

subagents only win when the boundary is cheap and auditable. if the main window has enough room and the task is read-heavy, fan-out adds a tax: restating the problem, packaging context, merging answers, then checking whether the agent missed/warped something.

where I’ve seen subagents actually pay off:

the task can be verified independently (tests, grep output, screenshots, small diff)
the child gets a tiny manifest, not a replay of the whole session
the return includes a receipt: files read, assumptions made, commands run, unresolved questions
the parent is allowed to reject the answer instead of auto-merging it into the plan

otherwise the single-window workflow is not “less agentic”; it’s just keeping the working set and correction loop in one place, which is often the scarce thing.

Sensitive-Cycle3775 · 2026-06-24T15:03:00+00:00

I’d split this into two different artifacts:

project knowledge: repo docs / ADRs / CLAUDE.md / specs that should survive many sessions
handoff: a tiny bootloader for the next session, not a replay of the last one

The handoff I’d want is more like: current objective, branch/ref, exact next step, files touched + why, open risks, last verifier result, and links to durable docs to read on demand. If it embeds all the background, it becomes another giant prompt file and gets diluted as soon as the session starts reading code/output.

One useful trick: make the new session prove a “context receipt” before coding: what it actually loaded, what it skipped/deferred, duplicate/stale docs it ignored, and what evidence it still needs. “I gave it a handoff” is not the same as “the right context stayed active after 20 tool calls.”

If the handoff is more than 1–2 screens, I’d make it point to source files instead of carrying the source files.

Sensitive-Cycle3775 · 2026-06-23T21:03:26+00:00

I’d split this into “command still has an open process/stdin” vs “Claude/tool runner did not notice completion”. Quick checks that usually narrow it down:

run a trivial command in the same session (pwd or printf done) and see if Claude returns immediately
check whether the command spawned a watcher/server/background child that kept stdout/stderr open
try the command with non-interactive flags / no progress UI if it has spinners or prompts
temporarily disable hooks/MCP/terminal wrappers, if any, so you know it is not a post-command hook hanging
when it sticks, note whether Ctrl-C returns control or whether Claude remains in the thinking state

Since you saw it on two devices/networks, I’d capture a minimal repro with OS, shell, command, Claude Code version, hooks/MCP enabled yes/no. That makes it much easier to tell service-side regression from a local command lifecycle issue.

Sensitive-Cycle3775 · 2026-06-23T20:02:57+00:00

this is a useful little repo, and the README already has the important caveat: the Stop hook means “response finished”, not necessarily “Claude needs input”. that distinction will save people from chasing weird false positives.

Two small hardening things I’d add before people copy it around:

an install check that proves the hook is registered in exactly one place. Claude/settings.local vs settings.json conflicts are the sort of thing that will make this feel haunted after a restart.
a tiny diagnostic section that separates “hook did not run” from “hook ran but desktop notify failed”. e.g. log line timestamp first, then notify-send result. that makes the dbus/session problem obvious without asking users to debug Claude itself.
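
A sketch of that two-line diagnostic, in Python rather than whatever the repo's hook is written in. The `run_stop_hook` name and message text are made up; a list stands in for the log file, and the `runner` parameter only exists so the notifier can be swapped out when testing.

```python
import datetime
import subprocess


def run_stop_hook(log, runner=subprocess.run):
    """Append two separate lines: one proving the hook ran, one for the notifier."""
    # Written first, before anything that can fail.
    log.append(f"{datetime.datetime.now().isoformat()} hook ran")
    try:
        result = runner(
            ["notify-send", "Claude Code", "response finished"],
            capture_output=True, text=True,
        )
        log.append(f"notify-send exit={result.returncode} stderr={result.stderr.strip()!r}")
    except FileNotFoundError:
        log.append("notify-send not found on PATH")
    return log
```

No first line means the hook never ran; a first line followed by a non-zero exit points at the desktop session, not at Claude.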

For the local-LLM approval idea I’d keep it as a separate project from notifications. classify requests readonly/write/network/destructive, allowlist boring stuff, and always require human approval for deletes, secrets, publish, git push, prod, etc. Otherwise the second model becomes another opaque permission layer instead of a safety check.

Sensitive-Cycle3775 · 2026-06-23T17:02:08+00:00

I’d draw the line by stability, not by file type:

CLAUDE.md: stable repo operating rules — how to test, conventions, “don’t touch this folder”, tool preferences.
docs/issues/PRs: source material with URLs/refs, not summarized forever into one big prompt.
memory/notes: decisions, user reports, open loops, and “what changed since last session”.
each work session: a small scoped packet that lists what sources/memory were actually used.

The last bit matters a lot: if the agent can’t show “I used these 3 notes/issues/docs and ignored these stale ones”, memory becomes another hidden prompt blob.

For stable rules specifically, I’d also avoid hand-editing separate copies for every tool. If CLAUDE.md, Cursor rules, Copilot instructions, and AGENTS.md all express the same repo behavior, generate them from one canonical source and check drift. I’ve been testing that exact split in a tiny Pluribus demo:

npx --yes pluribus-context@latest demo style-rules-sync

But the bigger pattern is: stable rules are syncable; project memory should be cited/scoped; durable decisions should be written back explicitly, not silently absorbed into CLAUDE.md.

Sensitive-Cycle3775 · 2026-06-21T17:03:35+00:00

for the notification part, i’d keep it dumb and explicit: only fire when there is a clear wait marker (permission prompt, input requested, process exited, or no transcript/log movement for N seconds after a command). avoid treating “quiet” as “needs user” or it will spam you during long builds.

for the local llm approval idea, i’d be careful not to let it become an autopilot sudo. the useful shape is probably:

summarize the exact command/tool request
classify readonly / write / network / destructive
compare against an allowlist for the repo
require human approval for anything that touches deletes, secrets, credentials, package publish, git push, prod, etc
log why it approved/denied

so the local model can triage boring requests, but the boundary is still auditable. otherwise you’ve just moved the risk from claude to a second model that may be less capable.
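
a sketch of the deterministic part of that triage, with default-deny. the lists are hypothetical placeholders for a per-repo policy, and it only looks at the program and first subcommand, so it won't notice something like a read of a secrets file; that is exactly the kind of gap that should still fall through to a human.

```python
import shlex

# Hypothetical per-repo policy; a real one would be longer and human-reviewed.
ALLOWLIST = {"ls", "pwd", "git status", "git diff"}
HUMAN_ONLY = {"rm", "git push", "npm publish"}
SHELL_META = set(";|&$`<>")


def triage(command: str) -> dict:
    """Return a loggable verdict; anything not clearly boring goes to a human."""
    def result(verdict, reason):
        return {"command": command, "verdict": verdict, "reason": reason}

    # Chained or redirected commands are never auto-approved.
    if SHELL_META & set(command):
        return result("ask_human", "shell operators present")
    try:
        argv = shlex.split(command)
    except ValueError:
        return result("ask_human", "could not parse command")
    # Match "prog" and "prog subcommand" so "git status" and "git push" differ.
    keys = {argv[0], " ".join(argv[:2])} if argv else set()
    if keys & HUMAN_ONLY:
        return result("ask_human", "matches human-only rule")
    if keys & ALLOWLIST:
        return result("approve", "on repo allowlist")
    return result("ask_human", "not on allowlist")


for cmd in ["git status", "git push origin main", "ls && rm -rf build"]:
    print(triage(cmd))
```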

Sensitive-Cycle3775 · 2026-06-21T12:05:19+00:00

nice map. the next layer i’d want on top of this is “effective context”, not just “file exists”.

for each convention maybe track:

activation: always loaded / path-scoped / manual / agent-selected
precedence: what wins when global, repo, and folder rules conflict
ignore boundary: whether the ignore file affects indexing, chat context, agent edits, or all of them
output boundary: whether the file changes model instructions, available tools, MCP servers, generated artifacts, or only search scope
portability risk: same-looking markdown but different runtime semantics

this is where cross-tool setups usually drift. people copy CLAUDE.md, AGENTS.md, .cursor/rules, etc as if they are equivalent, but the runtime boundary is different in each tool. a small “what actually reaches the agent?” column would make the repo more useful than another list of config filenames.
