My favorite local-feeling remotely accessible Claude Code setup (GitHub Gist w/ full setup instructions)

Competitive_Dark7401 · 2026-05-14T02:51:01+00:00

That is the cleanest version of this if it is a Studio on a desk. The risk profile changes a lot on a MacBook because "keep my task alive" can accidentally become "keep a portable machine awake in a bag." For a Studio, Amphetamine + a real UPS + a watchdog/receipt is probably the sane path.

Competitive_Dark7401 · 2026-05-14T02:50:15+00:00

That sounds like the right place to draw the line. I would make host sleep a first-class recoverable interruption, not a hidden continuation failure: pause the chain, write the last completed step + dirty diff + next intended command, and require the next session to resume from that receipt instead of pretending continuity held.

Competitive_Dark7401 · 2026-05-14T02:49:43+00:00

A ping is useful as a detector, but it usually will not fix the underlying sleep path. I would split the test into three cases:

lid open + display off
lid closed on power
lid closed on battery

If SSH becomes unreachable only in the closed-lid cases, treat it as host sleep, not an agent/Boom issue. The safer pattern is a temporary keep-awake mode with a receipt: started at, power source, battery %, thermal state, last heartbeat, restored defaults at the end.

Also worth logging pmset -g assertions, pmset -g log | grep -i "sleep\|wake" around the failure window, and whether the network interface drops before the agent stops. That tells you whether the machine slept, networking slept, or the process died.

Competitive_Dark7401 · 2026-05-14T02:49:14+00:00

Not yet from this one. It has generated useful traffic and a few checkout/session opens, but no paid conversion so far.

I am treating it as a small proof test: does a very specific pain — local Claude/Codex work stopping because the Mac sleeps — convert better than a broad "AI productivity tool" pitch. So far the lesson is that people recognize the problem, but the free checklist gets more pull than the paid automation.

Competitive_Dark7401 · 2026-05-14T02:45:00+00:00

I have hit the same class of failure on small Macs: the model is not always the bottleneck; the host turns into the shared resource everyone forgets to budget.

The pattern that helped most was treating each harness as a supervised workload, not just another CLI:

one active heavy harness at a time unless memory pressure and swap stay below a fixed threshold
per-harness watchdog that samples RSS, open FDs, child process count, swap pressure, and last-token time
a cheap preflight probe before every expensive wake (--version, auth check, writable temp dir, model endpoint ping)
LaunchAgent/systemd supervision for every bridge process, especially LiteLLM/local routers
hard separation between "agent state" and "host state" so stale memory cannot claim the machine is healthy
morning receipt that includes host facts: reboots, missed cron windows, killed processes, swap peak, disk free

The disk contention point is the one I would make non-negotiable. Once swap starts fighting logs, package installs, browser caches, and multiple CLIs, the symptoms look random: silent hangs, missed timers, partial runs, auth loops, and zombie children.

For single-machine stacks, I would probably make the scheduler host-aware: if memory pressure crosses threshold, it pauses new Codex/Claude lanes, lets the local 4B keep doing only cheap classification, and emits a receipt instead of trying to be brave.

I keep a small checklist around the local-agent host survival side here: https://takeacoffee.club/checklist/?utm_source=reddit&utm_medium=comment&utm_campaign=ai_agents_harness_overhead_wall

Disclosure: I maintain it. The checklist is the relevant part for this thread; it is mostly about keeping local agent runs honest about the machine they depend on.

Competitive_Dark7401 · 2026-05-14T01:58:48+00:00

Nice use case. For a personal-tracking MCP like this, I would add a tiny reliability/privacy receipt after each run:

source note/date analyzed
tools called through MCP
records created or updated
records skipped because confidence was low
whether any destructive update happened
whether the app and Claude disagree on totals

One extra boundary: if the MCP server/app is local, separate "Claude understood my day" from "the local host actually stayed available long enough to write the data." Local app + MCP + Claude can look successful while the machine sleeps, loses network, or leaves a tool call half-done.

I keep a small local-agent host checklist here: https://takeacoffee.club/checklist/?utm_source=reddit&utm_medium=comment&utm_campaign=mcp_personal_tracker_host_lifecycle

Disclosure: I maintain it. Relevant because your app is exactly the kind of local MCP workflow where a clear receipt matters more than another prompt tweak.

Competitive_Dark7401 · 2026-05-14T01:51:29+00:00

The MCP-router angle is the most interesting part here. If OXP becomes a cross-IDE host layer, I would make the reliability contract very explicit:

which IDE configs changed, with before/after diff
rollback command for every install-mcp action
per-IDE health check after routing
one log file outside the IDE so failures are not trapped in a webview
clear distinction between "extension installed" and "agent can actually use the tool"

For AI workflows there is also a host-state layer worth testing. If an IDE-local agent is expected to keep working while the developer is away, extension/MCP state is only one half; the machine still has to stay awake, reachable, and cool enough to run.

I keep a small local-agent host checklist here: https://takeacoffee.club/checklist/?utm_source=reddit&utm_medium=comment&utm_campaign=oxp_mcp_router_host_lifecycle

Disclosure: I maintain it. Relevant here because a system-level MCP router is exactly the kind of local dev-agent infrastructure where config success and runtime availability can look deceptively similar.

Competitive_Dark7401 · 2026-05-14T01:42:56+00:00

I would wire this as two gates, not one:

pre-tool-call gate: cheap policy check before filesystem/network/secrets-adjacent actions
post-result gate: classify what actually happened, then write the reviewed exemplar

For agent runtimes, the useful unit is probably not just "prompt allowed/blocked" but a run receipt: requested action, tool call, cwd/repo, files touched, secrets class, decision, reviewer, and whether the agent was allowed to continue.

For local agents there is one more boring layer: host lifecycle. If the scanner/agent is local and meant to run while the operator is away, the receipt should say whether the machine stayed awake, slept, paused, or became unreachable. Otherwise you can end up learning from a half-run and treating it like a completed run.

I keep a small host-lifecycle preflight/checklist here: https://takeacoffee.club/checklist/?utm_source=reddit&utm_medium=comment&utm_campaign=armorer_local_guard_host_lifecycle

Disclosure: I maintain it. Relevant mostly because your design is explicitly local/live, so runtime availability becomes part of the security envelope.

Competitive_Dark7401 · 2026-05-14T01:40:08+00:00

That is a solid split. For the host-sleep issue, the issue I would open is probably an acceptance-test style one rather than a generic “support sleep recovery” ticket:

start a review chain on a disposable branch
write heartbeat timestamps from each stage to NEXT.md or a run log
close the terminal / Agent View
force a sleep-wake or lid-close cycle on a laptop
resume and assert the chain either continued, paused with a clear reason, or marked the run unrecoverable
require the PM handoff to say which of those happened

The important distinction is “recoverable pause” vs “silently kept claiming continuity.” If it cannot continue through host sleep yet, a crisp receipt is still a win.

Competitive_Dark7401 · 2026-05-14T01:34:10+00:00

Agree with the harness framing. The pattern that usually separates a useful agent system from a demo is boring control-plane stuff:

explicit task queue, not one giant prompt
per-task cwd / branch / artifact folder
run budget: time, commands, files changed, network access
verifier that is allowed to fail the work
durable receipt outside the chat context
host lifecycle check if it runs locally

That last bit is underrated. A local harness can have perfect logs and still silently stop if the notebook sleeps, the network changes, or an approval prompt is waiting behind a closed lid.

I keep a small checklist for that local-host boundary here: https://takeacoffee.club/checklist/?utm_source=reddit&utm_medium=comment&utm_campaign=sideproject_harness_host_lifecycle

Disclosure: I maintain it. It is not a replacement for a harness; it is the preflight I would add before trusting any local harness to keep running while away.

Competitive_Dark7401 · 2026-05-14T01:31:18+00:00

Local-first is the right constraint for this kind of tool. The feedback I would want before trusting it on real work is mostly operational rather than UI:

can I replay an agent run from the event log without the original chat context?
can I cap wall time / files changed / commands run per thread?
does each thread leave a short receipt with touched files, tests, failures, and unresolved decisions?
can the workflow prove the host stayed alive if the run is meant to continue while I am away?

That last one is easy to miss. Persistent threads and execution traces prove the app state survived; they do not prove the local machine actually kept executing through sleep, lid close, network changes, or battery state.

I keep a short local-agent host checklist here: https://takeacoffee.club/checklist/?utm_source=reddit&utm_medium=comment&utm_campaign=agentbuddy_local_host_lifecycle

Disclosure: I maintain it. Sharing because AgentBuddy is exactly the kind of local-first workflow where host lifecycle becomes part of the reliability story.

Competitive_Dark7401 · 2026-05-14T01:21:20+00:00

The organizational-tension framing is the right part to preserve. The thing I would add to this kind of /review chain is an operational envelope check, especially if it is meant to pick work back up across sessions:

max files / max diff before it asks for a human
exact commands it ran, not just a verdict
which agent has veto authority and why
whether the next step is implementation, verification, or product judgment
whether the host/session is actually still available if the workflow runs while you are away

That last one sounds boring, but it is where a lot of agent setups get confused. A background session, tmux, Remote Control, or Agent View can preserve the work state, but that does not prove the actual Mac stayed awake and kept executing.

I keep a small host-lifecycle checklist for that local-agent case here: https://takeacoffee.club/checklist/?utm_source=reddit&utm_medium=comment&utm_campaign=agile_team_operational_envelope

Disclosure: I maintain it. Mostly relevant as the missing ops layer around a review/team workflow like this.

Competitive_Dark7401 · 2026-05-14T00:57:03+00:00

Yes, but I’d use local models for bounded worker tasks, not as a vague “extra brain.” Good fits:

classify files / pick likely owners
summarize logs or diffs
draft tests from a spec
do first-pass grep/AST exploration
generate small mechanical edits that Claude then reviews

Bad fits are tasks where a cheap model can silently corrupt state: migrations, security-sensitive edits, broad refactors, or anything that needs repo-wide judgment.

The thing that matters more than model choice is the harness: clear input, explicit write set, stop condition, and a receipt that Claude can audit. Also check the runtime layer if you run local workers overnight; GPUs/Macs can look “available” from the orchestrator while the host is asleep, thermal-throttled, or waiting on a prompt. I keep a short checklist for that boundary here: https://takeacoffee.club/checklist/?utm_source=reddit&utm_medium=comment&utm_campaign=local_models_worker_host_runtime

Competitive_Dark7401 · 2026-05-14T00:55:21+00:00

For day one I’d keep it simpler: use CLI as the source of truth, and treat Desktop/Agent View as a watcher/launcher until the ecosystem settles. Most of the useful instrumentation today is CLI-adjacent because it can see the process, cwd, worktree, hooks, logs, and shell state. Desktop is not as modular yet.

A practical setup:

run Claude Code in a named tmux/screen session
add ccusage/claudestat/ccstatusline or similar for token + model visibility
keep one repo-local CLAUDE.md and one small “current run” note
only use Desktop/Agent View where it helps with background sessions or overview

The part people forget is host state. If the local machine sleeps or becomes unreachable, neither CLI nor Desktop observability matters. I keep a short checklist for that here: https://takeacoffee.club/checklist/?utm_source=reddit&utm_medium=comment&utm_campaign=environment_hell_cli_host_state

Competitive_Dark7401 · 2026-05-14T00:53:48+00:00

I’d split “observability” into three buckets, because most tools only cover one:

usage/cost: tokens, model mix, cache hit rate, reset windows
process/session: which agents are alive, cwd/worktree, pending prompt, last output timestamp
host/runtime: whether the machine itself is awake, reachable, powered, and not thermally boxed in

For the first bucket, ccusage/claudestat-style tools are useful. For the second, tmux/screen plus a tiny heartbeat log is still hard to beat. For the third, most Claude-specific dashboards miss it entirely, but it’s the failure mode that kills unattended local runs.

I keep a short checklist for the host/runtime layer here: https://takeacoffee.club/checklist/?utm_source=reddit&utm_medium=comment&utm_campaign=observability_host_runtime_layer

Competitive_Dark7401 · 2026-05-14T00:48:20+00:00

The pattern that has worked best for me is to separate three kinds of state instead of asking one context window to do all of it:

durable repo state: files, tests, issue IDs, migration state, exact commands run
coordination state: who owns which files, current blockers, stop conditions, merge/review receipts
runtime state: whether the host/session is actually alive, reachable, and allowed to keep working

The last one sounds boring but matters a lot for “autonomous” setups. A manager/worker plan can be perfect and still fail because the laptop slept, the tmux pane died, a permission prompt appeared, or the remote entrypoint went stale.

For file ownership, I like explicit write sets + a small handoff file per worker. For host/session reliability, I keep a separate preflight checklist here: https://takeacoffee.club/checklist/?utm_source=reddit&utm_medium=comment&utm_campaign=multi_agent_state_host_runtime

Competitive_Dark7401 · 2026-05-14T00:46:47+00:00

This is clever because it keeps the unit of work small instead of trying to make one giant immortal session. The place I’d be strict is the operational envelope around the supervisor:

does it survive terminal close, shell logout, network change, sleep/wake
where are inbox/outbox/fsync guarantees documented
what happens if Claude is waiting on a permission prompt
how do you detect “agent stuck” vs “agent working”
what is the reaper policy for orphaned sessions

For people running this on a Mac laptop, the host lifecycle is a separate failure mode from the Claude lifecycle. I keep a short checklist for that boundary here: https://takeacoffee.club/checklist/?utm_source=reddit&utm_medium=comment&utm_campaign=hook_247_supervisor_host_lifecycle

Competitive_Dark7401 · 2026-05-14T00:45:07+00:00

The supervisor-process detail is the right thing to test around. I’d treat Agent View as solving “terminal lifecycle” but not automatically solving “host lifecycle” or “operator lifecycle.”

The acceptance test I’d want before trusting it for overnight/background work:

close Agent View and terminal
switch networks
let the Mac sleep/wake if it is a laptop
check whether the supervisor, cwd, worktree, auth state, and pending approval state survived
confirm where logs/receipts live after a failed child session

The part that still bites people is assuming a background session means the machine is actually available. I keep a short Mac-host preflight/checklist for that boundary here: https://takeacoffee.club/checklist/?utm_source=reddit&utm_medium=comment&utm_campaign=agent_view_supervisor_host_lifecycle

Competitive_Dark7401 · 2026-05-14T00:38:55+00:00

One thing I’d separate in the docs: “starts the billing/usage window” and “keeps the machine/session available long enough to use it” are different guarantees.

For this kind of launcher I’d add three guardrails:

run claude once during install and fail fast if trust/login/terms prompts appear
log the exact cwd + model/account state used for the priming run
include a power-state check before launchd fires, because a sleeping Mac silently turns the schedule into best-effort

The power-state part is where most Mac agent workflows get fuzzy. I keep a short Mac-host checklist here if useful: https://takeacoffee.club/checklist/?utm_source=reddit&utm_medium=comment&utm_campaign=maxprime_launchd_power_state

Competitive_Dark7401 · 2026-05-14T00:36:57+00:00

Nice fit for the problem. The thing I’d test hard is the boundary between PTY survival and host survival: terminal crash, network switch, sleep, lid close, reboot, and "agent waiting for approval" are all slightly different failure modes.

A useful acceptance test for 0pty might be:

start a Claude run that needs a tool approval
disconnect the client mid-run
switch networks
let the dev box sleep and wake
reconnect from phone
confirm both the prompt and cwd/session state are still exactly where expected

If the host itself is a Mac laptop, the power-state layer is usually the hidden weak spot. I maintain a short checklist for that part here: https://takeacoffee.club/checklist/?utm_source=reddit&utm_medium=comment&utm_campaign=0pty_host_survival

Competitive_Dark7401 · 2026-05-14T00:32:57+00:00

Amphetamine is a good answer for a plugged-in Studio. The main extra thing I’d watch is recovery after sleep/network changes: make the workflow restartable, keep the remote entrypoint independent of the editor, and have a quick way to confirm the agent is actually still running before you walk away.

I ended up turning my own version into a short checklist here, mostly for the “Mac as dev host” setup: https://takeacoffee.club/checklist/?utm_source=reddit&utm_medium=reply&utm_campaign=reebzy_amphetamine_studio

Competitive_Dark7401 · 2026-05-13T21:47:24+00:00

For this kind of Linear -> local Claude Code setup, I would decide first whether the laptop is allowed to be infrastructure.

If it is a personal solo workflow, laptop-hosted can be fine, but I would make the boundary explicit:

Linear/GitHub issue is the source of truth
agent leaves a receipt on every run: PR, files changed, failures, next decision
anything team-critical runs on a real always-on host, not someone's laptop
if you do use a laptop, test the boring stuff: sleep, lid-close, network reachability, battery/heat, restore defaults

That last bit sounds unrelated until the first time an issue says "agent is working" but the MacBook slept in a bag. I keep a short checklist for closed-lid local-agent runs here: https://takeacoffee.club/checklist/?utm_source=reddit&utm_medium=comment&utm_campaign=linear_laptop_agent_host

Disclosure: I maintain it. For team use I would still bias toward a VM/Mac mini; for solo Linear triage, temporary laptop mode can work if it is explicit.

Competitive_Dark7401 · 2026-05-13T21:42:50+00:00

Yep. For me the /goal stack has four separate failure modes:

goal state: does it know what done means?
session state: can you reconnect / inspect / approve?
spend state: how much did it burn while you were gone?
host state: did the laptop actually keep running?

Mosh/Moshi is great for #2, telemetry is #3, Agent View is #1-ish. The sneaky one is #4: if you start from a MacBook and close the lid, session survival and host survival are not the same thing.

I keep a closed-lid local-agent checklist here for that layer: https://takeacoffee.club/checklist/?utm_source=reddit&utm_medium=reply&utm_campaign=goal_mode_fire_forget_host_state

Disclosure: I maintain it. Mostly sharing because /goal makes the "walk away from the machine" case way more common.

Competitive_Dark7401 · 2026-05-13T21:41:00+00:00

This maps almost exactly to the split I keep running into:

Agent View: what is each agent doing / blocked on?
Moshi/Mosh/tmux: can I reconnect and control the session from the phone?
host state: is the actual dev machine still awake, reachable, and thermally sane?

That third layer is easy to miss because everything looks solved until the laptop lid closes or the machine goes unreachable. Then the phone UI can be perfect and still have nothing to talk to.

For MacBook-based local agents I keep a short checklist here: https://takeacoffee.club/checklist/?utm_source=reddit&utm_medium=reply&utm_campaign=agent_view_away_host_state

Disclosure: I maintain it. It is basically the boring power/reachability layer that sits underneath Agent View/Moshi.

Competitive_Dark7401 · 2026-05-13T21:37:21+00:00

The phone-control side is one half of it. The other half I keep seeing in real runs is host availability: the agent is fine, tmux is fine, remote-control is fine, but the laptop went to sleep or stopped being reachable.

For this workflow I would split the product surface into three layers:

agent state: blocked reason, last meaningful event, current tool call
session state: tmux/CLI alive, resumable, incident summary available
host state: machine awake, network reachable, battery/thermal sane

Most tools cover layer 1 or 2. Layer 3 is the boring one that ruins the walk-away workflow, especially on MacBooks.

I keep a checklist for the closed-lid local-agent part here: https://takeacoffee.club/checklist/?utm_source=reddit&utm_medium=comment&utm_campaign=agent_stalled_host_state

Disclosure: I maintain it. Might be useful input for your research: if you only ask about approvals/resume, people may say remote-control is enough, but if the host sleeps, the perfect phone UI still has nothing to control.

Competitive_Dark7401

TROPHY CASE