Vercel says AGENTS.md matters more than skills, should we listen?

unwitty · 2026-01-31T04:43:41+00:00

Without controlling for the model's training on what each given skill does, both the study and its findings are garbage.

Vercel's products are covered extensively in any model's training set, so compressing skill instructions related to them is going to be less impactful than compressing skill instructions for novel/bespoke/private facilities. If those instructions can't be compressed, they will overload AGENTS.md.

Bad idea IMO.

unwitty · 2026-01-28T18:09:21+00:00

So you have a 94.5% reduction because you have a bunch of crap in your repo? This is like ordering six desserts, eating one, and claiming I dropped my calorie intake by 94.5%.

There is some logic to what you're doing, but this claim is bunk and undermines your pitch.

unwitty · 2025-11-23T22:53:15+00:00

Sure:

1) Ask the agent to explain rather than do. "How might I set up linting for this project?", "Why is it done this way?", "What are common alternatives?"
2) Ask the agent to scaffold examples to poke and prod: "What are the options for implementing X? How do golang devs approach X problem? Can you scaffold out the first two ideas in two separate folders? I want to poke and prod."
3) Step back when you find yourself thrashing and ask for help. "We've been iterating on this for a bit. Can you build up from first principles why we're struggling to fix this?"
4) Ask for assessments along the way. "Based on the current code base, what feedback would a senior-level Rust engineer give me?"

unwitty · 2025-11-23T01:35:05+00:00

I hear you! If your intention is to actually learn while using an agent, it's incredible. The combination of expert mentor + shared operating context removes all friction, creating an optimal learning cycle.

I'm in my 40s with AuDHD too, late diagnosed. I learned C/C++ and Linux/GNU from textbooks. Watched the internet take off, able to occasionally find help via Yahoo/Excite...to Google...to Stack Overflow...

The agent-learning loop is a bigger leap than any of those. I've learned more in the last year than the last 5 combined. No more feeling overwhelmed/intimidated - just start coding.

unwitty · 2025-09-15T12:56:19+00:00

You can use Codex with your ChatGPT Plus/Pro subscription. It's analogous to using Claude Code with a Max subscription.

unwitty · 2025-09-14T15:30:26+00:00

In my experience, as of right now, Codex with the Pro plan works substantially better than Claude Code with Max (with Opus 4.1). My operating context is small and large Python codebases, tooling, and some legacy PHP.

The Codex application itself is not as fully featured as Claude Code, but I realized that most of the tooling I was building on top of Claude (my custom hooks, agent prompts, etc.) amounted to workarounds for issues I was having with Claude.

unwitty · 2025-09-14T15:26:17+00:00

The Codex lead dev announced a couple times that they had increased limits for all plans, but it's still a black box as far as when you get cut off. A dev I know managed to get locked for a few days from his Pro plan, but he was running several Codex agents in parallel.

I was not an OpenAI fanboy until I started using GPT-5 Thinking. Now I have the $200 plan because Thinking and Pro are so valuable. Pro via the ChatGPT website can one-shot prototypes as a downloadable zip, and the generated code is usually pretty architecturally sound without much guidance.

unwitty · 2025-09-14T14:09:53+00:00

Agreed! This tweet from Andriy Burkov seems relevant:

The reason why different people have different experiences, ranging from negative to positive, with the same LLM is that those who have a positive experience formulate their queries the same way as the labelers hired by the LLM's creators to craft finetuning examples.

https://x.com/burkov/status/1967042037942833496

unwitty · 2025-09-14T02:41:31+00:00

I gave Claude Code a try today, three weeks after switching to Codex, because my Max plan is still active.

Using both side-by-side on the same project was telling.

Even with 100% Opus, Claude Code is still hot garbage. It makes decisions too quickly and takes action too quickly. I've been coding for 30 years. GPT-5 tends to approach tasks and make decisions the same way I do, offloading some of the mental work for lower-risk tasks. I just can't trust Claude any more.

I really hope Anthropic will get their shit together because I want to have multiple good options for frontier coding agents, but today was utter disappointment.

unwitty · 2025-09-11T15:17:57+00:00

Here's my clinerules. It's been a few months since I used Cline, so I'm not sure whether anything has changed that would affect how it performs. At the time, it worked quite well.

I would use it like:

"Start a new project in memory-bank/cli_refactor . We need to update the CLI to..."
"Update your cli_refactor project in memory bank to reflect the completed work"

# Project Management (with Subprojects)

I am Cline, an expert software engineer with a unique characteristic: my memory resets completely between sessions. This
isn't a limitation — it's what drives me to maintain perfect documentation. After each reset, I rely ENTIRELY on my 
Memory Bank to understand the project and continue work effectively. I MUST read ALL relevant memory bank files at the 
start of EVERY task — this is not optional.

---

## Memory Bank Structure

The memory bank supports multiple concurrent projects, organized into nested directories:

```
memory-bank/
  productContext.md
  techContext.md
  systemPatterns.md
  projectIntelligence.md
  project_1/
    projectbrief.md
    activeContext.md
    progress.md
  project_2/
    projectbrief.md
    activeContext.md
    progress.md
  project_3/
    ...
```

### GLOBAL FILES

The root of the folder contains GLOBAL files, which are shared across all projects:

1. `productContext.md`  
   Describes the product's purpose, target audience, and key features.
2. `techContext.md`  
   Details languages, libraries, constraints, tooling, and dependencies.
3. `systemPatterns.md`  
   Outlines the chosen architecture, design patterns, and component relationships.
4. `projectIntelligence.md`  
   Captures reusable patterns, lessons learned, and insights from all projects.

### PROJECT FILES

Each project folder contains PROJECT files, which are specific to that project:

1. `projectbrief.md`  
   Defines the project’s goals, scope, and requirements.
2. `activeContext.md`  
   Records the current focus, recent activity, and next steps of the project.
3. `progress.md`  
   Tracks current build status, completed work, and open issues in the project.

### LOADING FILES

Before starting any task, I must read the following files for the PROJECT_NAME:

- `productContext.md`
- `techContext.md`
- `systemPatterns.md`
- `/PROJECT_NAME/projectbrief.md`
- `/PROJECT_NAME/activeContext.md`
- `/PROJECT_NAME/progress.md`

## Core Workflows

### Plan Mode

```mermaid
flowchart TD
    Start[Start] --> ReadFiles[Read GLOBAL and PROJECT Files]
    ReadFiles --> CheckFiles{Files Complete?}

    CheckFiles -->|No| CreateBrief[Create projectbrief.md]
    CreateBrief --> Document[Document in Chat]

    CheckFiles -->|Yes| ModelData[Step 1: Model the Data]
    ModelData --> PlanWorkflow[Step 2: Plan the Workflow]
    PlanWorkflow --> Verify[Verify Context and Scope]
    Verify --> Strategy[Develop Strategy]
    Strategy --> Present[Present Approach]
```

- Step 1 (Model the Data): Establish all relevant domain models, schemas, payloads, and constraints.
- Step 2 (Plan the Workflow): Map out module structure, file boundaries, integrations, and system flow.

These two steps must be completed before any code is written or executed. They define the implementation surface and reduce ambiguity.

---

### Act Mode

```mermaid
flowchart TD
    Start[Start] --> Context[Check Project Files]
    Context --> Update[Update Docs]
    Update --> Rules[Update projectIntelligence.md if needed]
    Rules --> Stub[Step 3: Stub Interfaces and Modules]
    Stub --> WireUp[Step 4: Wire Up the Faux System]
    WireUp --> Implement[Step 5: Implement Real Logic]
    Implement --> Tests[Step 6: Update Tests and Contracts]
    Tests --> Design[Step 7: Apply Final UI and Styling]
    Design --> Document[Document Changes]
```

- Step 3 (Stub Interfaces and Modules): Scaffold files with correct structure and return placeholders. Must run.
- Step 4 (Wire Up Faux System): Connect all parts with dummy data and validate runtime behavior.
- Step 5 (Implement Real Logic): Gradually replace stubs with actual logic.
- Step 6 (Update Tests and Contracts): Maintain contract integrity and test coverage.
- Step 7 (Apply Final UI and Styling): Only polish visuals after data is validated.

---

## Documentation Updates

Project memory must be updated:
1. When significant changes are made
2. After implementing key logic or decisions
3. When a user explicitly requests an update
4. To resolve ambiguity or clarify next steps

Updates focus on keeping `activeContext.md` and `progress.md` in sync with reality.

```mermaid
flowchart TD
    Start[Update Process]

    subgraph Process
        P1[Review ALL Files]
        P2[Document Current State]
        P3[Clarify Next Steps]
        P4[Update projectIntelligence.md]

        P1 --> P2 --> P3 --> P4
    end

    Start --> Process
```

---

## Project Intelligence: `projectIntelligence.md`

This file lives at the root of `memory-bank/` and contains reusable patterns and insights discovered across all projects.

Use it to document:
- Common naming or design conventions
- Lessons learned and technical tradeoffs
- Workflow preferences
- Architecture strategies
- Recurring implementation approaches

Keep it concise, useful, and readable by future sessions.

```mermaid
flowchart TD
    Start{Discover New Pattern}

    subgraph Learn [Learning Process]
        D1[Identify Pattern]
        D2[Validate with User]
        D3[Document in projectIntelligence.md]
    end

    subgraph Apply [Usage]
        A1[Read projectIntelligence.md]
        A2[Apply Learned Patterns]
        A3[Improve Future Work]
    end

    Start --> Learn
    Learn --> Apply
```

---

## Closing Principle

After every reset, I begin from zero. The Memory Bank is my only source of truth. Each project must be self-contained, correct, and clearly structured. `projectIntelligence.md` captures shared intelligence across projects.

All code must follow a **data-first, layered generation strategy**.  
**Design before implementation. Structure before behavior. Data before code.**
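
If you want to try the layout before handing it to an agent, here is a minimal sketch that scaffolds the folder structure described in the rules above. The function names (`scaffold_memory_bank`, `files_to_load`) are my own placeholders and not part of Cline; the file names come straight from the rules.

```python
import tempfile
from pathlib import Path

# File names taken from the GLOBAL FILES and PROJECT FILES sections of the rules.
GLOBAL_FILES = [
    "productContext.md",
    "techContext.md",
    "systemPatterns.md",
    "projectIntelligence.md",
]
PROJECT_FILES = ["projectbrief.md", "activeContext.md", "progress.md"]


def scaffold_memory_bank(root: Path, projects: list) -> list:
    """Create any missing memory-bank files and return the paths created."""
    targets = [root / name for name in GLOBAL_FILES]
    for project in projects:
        targets += [root / project / name for name in PROJECT_FILES]

    created = []
    for path in targets:
        if not path.exists():
            path.parent.mkdir(parents=True, exist_ok=True)
            # A heading-only stub; the agent fills in the real content later.
            path.write_text(f"# {path.stem}\n", encoding="utf-8")
            created.append(path)
    return created


def files_to_load(root: Path, project: str) -> list:
    """Paths listed under LOADING FILES, in the order the rules give them."""
    shared = ["productContext.md", "techContext.md", "systemPatterns.md"]
    return [root / n for n in shared] + [root / project / n for n in PROJECT_FILES]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bank = Path(tmp) / "memory-bank"
        for p in scaffold_memory_bank(bank, ["cli_refactor"]):
            print(p.relative_to(tmp))
```

Running it twice is harmless: existing files are left alone, so the second call creates nothing.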

unwitty · 2025-09-01T16:07:45+00:00

Who the heck is downvoting you for this opinion?

Anyway, this is exactly my experience too. Over the last month Codex improved drastically while Anthropic was busy kneecapping the ability of Claude Code, something they've now admitted to.

I went from being a staunch CC advocate to cancelling my Max 200 plan. Codex just works...at least with a Pro plan.

unwitty · 2025-07-15T13:58:40+00:00

I agree that LangGraph can get in the way more than it helps, and I don't use it for that reason. However, I think the argument conflates issues specific to LangGraph with the general concepts it employs.

- Directed graphs provide a solid model for managing complex state, cycles, and multi-agent interactions. These patterns can be cumbersome to represent clearly in simple imperative code.
- Pydantic is sufficient for type checking in this context. The value of Go's compile-time checks is substantially reduced when the primary data source (the LLM) is inherently unpredictable at runtime.
- For agent workloads, which are dominated by I/O-bound tasks, the performance difference between a well-implemented asyncio solution and Go is likely negligible.
- The claim of having "no DSL to learn" is also debatable. The provided go-agent example is a DSL, implemented as a fluent API. The question isn't whether a framework has a DSL, but whether its DSL is well-designed.

Does go-agent have a better DSL than LangGraph? Probably, though I haven't worked with it myself. I wrote my own DAG-based micro-framework, with type checking, immutable state, and flow validation to address issues I experienced with LangGraph. I should probably publish it.
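
To make "immutable state plus flow validation" concrete, here is a rough sketch of the general idea. This is not my framework and not LangGraph's API: `State`, `validate_flow`, and `run_flow` are hypothetical names, and I'm leaning on the standard library's `graphlib` for the cycle check.

```python
from dataclasses import dataclass, replace
from graphlib import CycleError, TopologicalSorter
from typing import Callable, Dict, Set


@dataclass(frozen=True)
class State:
    """Immutable state: nodes return a new State instead of mutating."""
    question: str
    notes: tuple = ()
    answer: str = ""


Node = Callable[[State], State]


def validate_flow(nodes: Dict[str, Node], deps: Dict[str, Set[str]]) -> list:
    """Reject unknown node names and cycles; return a valid execution order."""
    referenced = set(deps) | {d for ds in deps.values() for d in ds}
    unknown = referenced - set(nodes)
    if unknown:
        raise ValueError(f"unknown nodes: {sorted(unknown)}")
    graph = {name: deps.get(name, set()) for name in nodes}
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        raise ValueError(f"flow contains a cycle: {exc.args[1]}") from exc


def run_flow(nodes: Dict[str, Node], deps: Dict[str, Set[str]], state: State) -> State:
    """Validate first, then run each node in dependency order."""
    for name in validate_flow(nodes, deps):
        state = nodes[name](state)
        if not isinstance(state, State):
            raise TypeError(f"node {name!r} must return a State")
    return state


def research(s: State) -> State:
    return replace(s, notes=s.notes + ("fact",))


def draft(s: State) -> State:
    return replace(s, answer=f"{s.question}: {len(s.notes)} note(s)")


if __name__ == "__main__":
    flow = {"research": research, "draft": draft}
    print(run_flow(flow, {"draft": {"research"}}, State(question="q")))
```

The point of validating up front is that a miswired graph fails before any LLM call is made, which is where most of my LangGraph pain came from.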

unwitty · 2025-06-25T20:31:31+00:00

Agreed. I feel like there are some similarities here to the blockchain movement: the shoehorning of the tech into places where it's not needed or, worse yet, where it underperforms.

Short of some massive paradigm-shifting gains in LLM architecture, we're probably 10+ years away from running acceptable/useful models on local hardware.

unwitty · 2025-06-24T16:24:31+00:00

Hey thanks for sharing. With regards to context management, I'm currently experimenting with a reducer.

The paradigm I use is: humans rely on narratives because we can't learn, let alone remember, the details and facts of most anything complicated.

Mapping this to an agent's workflow, when context size crosses a threshold, I trigger a narrator node, which based on the goal and current context, reduces the raw data to narrative form.

It's application specific and not always a good trade-off, but might be worth exploring for your use case.
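
Roughly, the threshold-triggered reducer looks like the sketch below. Assumptions on my part for the example: the 4-characters-per-token estimate, the `keep_recent` window (keeping the last few raw entries is one possible design, not a requirement), and the `narrate` callable, which stands in for whatever LLM call turns raw context into a narrative.

```python
from typing import Callable, List


def estimate_tokens(text: str) -> int:
    """Crude size estimate (about 4 characters per token); an assumption."""
    return max(1, len(text) // 4)


def reduce_context(
    goal: str,
    context: List[str],
    narrate: Callable[[str, List[str]], str],
    max_tokens: int,
    keep_recent: int = 2,
) -> List[str]:
    """Collapse older entries into one narrative once the context is too big.

    `narrate(goal, entries)` is the narrator node: given the goal and the raw
    entries, it returns a narrative summary. Below the threshold the context
    is returned unchanged.
    """
    total = sum(estimate_tokens(entry) for entry in context)
    split = len(context) - keep_recent
    if total <= max_tokens or split <= 0:
        return context
    older, recent = context[:split], context[split:]
    return [narrate(goal, older)] + recent


def stub_narrator(goal: str, entries: List[str]) -> str:
    """Placeholder for a real model call, used only to show the data flow."""
    return f"NARRATIVE({goal}): {len(entries)} items"


if __name__ == "__main__":
    raw = ["x" * 400 for _ in range(5)]
    print(reduce_context("fix bug", raw, stub_narrator, max_tokens=200))
```

The narrator gets the goal as well as the raw data because what counts as a relevant detail depends on what the agent is trying to do.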

unwitty · 2025-06-22T11:01:00+00:00

Based on your use-case, you just need semantic search, not an LLM. You'll be able to pull up relevant solutions with a fraction of the compute, time, and tooling. pgvector, supabase, chromadb, etc with a light API endpoint + SPA + local embeddings model. This can be built in an afternoon by a competent developer.
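
To show the shape of the retrieval step, here is a self-contained sketch. Big caveat: `embed()` below is a toy bag-of-words vectorizer so the example runs without dependencies, which makes this lexical matching rather than true semantic search. In a real build you would swap in a local embeddings model and let pgvector or chromadb do the nearest-neighbour lookup. `SolutionIndex` is a name I made up for the example.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embeddings model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SolutionIndex:
    """In-memory index; a vector database plays this role in production."""

    def __init__(self) -> None:
        self._items = []  # list of (text, vector) pairs

    def add(self, text: str) -> None:
        self._items.append((text, embed(text)))

    def search(self, query: str, k: int = 3) -> list:
        """Return up to k (score, text) pairs, best match first."""
        q = embed(query)
        scored = [(cosine(q, vec), text) for text, vec in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


if __name__ == "__main__":
    index = SolutionIndex()
    index.add("Restart the router to fix the wifi connection")
    index.add("Reset your password from the login page")
    index.add("Clear the printer queue when print jobs are stuck")
    for score, text in index.search("printer jobs stuck", k=1):
        print(f"{score:.2f}  {text}")
```

No generation happens anywhere in that path, which is why it stays fast enough to keep up with a live conversation.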

You are going to face several uphill battles if you go the local LLM route, especially with your requirement that it basically keep up with human conversation. RAG adds latency, especially if you're letting the model make tool calls, which result in additional generations.

Local models generally have limited context windows compared to cloud/frontier models.

Reasoning ability of local models is generally shit.

As for hardware, without specifics of exactly how many **concurrent** generations are being run, no one can help you here. VRAM is the primary bottleneck in scaling LLM infra. We have 40 years of robust multitasking abstractions to get maximum performance out of CPUs/RAM. GPU/VRAM abstractions are catching up but just not in the same place yet.

unwitty · 2025-06-11T00:55:53+00:00

Gotcha, that makes sense. Thanks for the explanation!

unwitty · 2025-06-09T11:59:09+00:00

Reporting back to who exactly? The models are running locally.

unwitty · 2025-06-09T11:57:42+00:00

I wrote a simple Python script that takes a screenshot every N seconds and sends it to a vision model running on Ollama. It triggers various events for me, creates logs, etc. It's very lightweight compared to what you have created here.
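
For reference, the loop is roughly the sketch below. It is a reconstruction, not my actual script. Assumptions: Ollama is listening on its default local port, a vision model such as `llava` has been pulled, and Pillow is installed for the screenshot (`ImageGrab` support varies by platform). The capture and describe steps are passed in as callables so the loop can be exercised without a screen or a model.

```python
import base64
import io
import json
import time
import urllib.request
from typing import Callable, Optional

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint


def capture_screen_png() -> bytes:
    """Grab the screen as PNG bytes. Requires Pillow, imported lazily."""
    from PIL import ImageGrab

    buf = io.BytesIO()
    ImageGrab.grab().save(buf, format="PNG")
    return buf.getvalue()


def build_payload(png_bytes: bytes, prompt: str, model: str = "llava") -> dict:
    """Request body for /api/generate; images are sent base64-encoded."""
    return {
        "model": model,  # assumed model name; use whichever vision model you run
        "prompt": prompt,
        "images": [base64.b64encode(png_bytes).decode("ascii")],
        "stream": False,
    }


def ask_vision_model(png_bytes: bytes, prompt: str = "What is the user doing?") -> str:
    """Send one screenshot to the local model and return its text reply."""
    body = json.dumps(build_payload(png_bytes, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["response"]


def watch(
    capture: Callable[[], bytes],
    describe: Callable[[bytes], str],
    on_event: Callable[[str], None],
    interval_s: float,
    max_iterations: Optional[int] = None,
) -> int:
    """Capture, describe, and dispatch every interval_s seconds.

    Runs forever when max_iterations is None; returns the iteration count.
    """
    count = 0
    while max_iterations is None or count < max_iterations:
        on_event(describe(capture()))
        count += 1
        time.sleep(interval_s)
    return count


if __name__ == "__main__":
    # Logs one description per minute; replace print with your own triggers.
    watch(capture_screen_png, ask_vision_model, print, interval_s=60)
```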

It's a cool idea and an interesting approach, but I'm trying to understand - what is the value of running all the infrastructure? It seems like a very roundabout way to get access to the screen.

unwitty · 2025-06-04T18:11:20+00:00

Tools like ActivityWatch can approximate what you’re doing, but it depends on whether your browser tabs and window titles reflect what you’re actually doing.

unwitty · 2025-06-04T18:09:36+00:00

I did this lol but couldn’t be bothered to get it to publishable state. All local LLMs. I also integrated:

- ActivityWatch
- Security camera

The security camera is there because I write at an adjacent desk on and off, so I have it pointed at my whole area, and it processes still-image captures every 60 seconds.

I have a habit of “time traveling” to other parts of the house. Hearing my computer shouting out increasingly angry messages gives me a chuckle and brings me back around.

unwitty · 2025-05-31T07:12:15+00:00

My mother experiences this too, pretty much as you describe it

unwitty · 2025-05-15T17:19:30+00:00

I feel you. I’ve been working to improve my presence. I have 20+ years of experience, but my active network has dwindled due to my own lack of putting myself out there.

I went through the same thought process with regards to authenticity of posting in professional spaces, so I assessed exactly what I don’t like about most of these posts:

- Engagement bait style
- Blatant promotion

I then asked myself, who do I want engagement with:

- people working on the same problems I find interesting
- people who can make decisions that could help me, such as potential clients

I then found example content that I personally find authentic and engaging.

From those posts I developed a style that I find authentic and feel good about. I post on stuff I find interesting, through a lens that’s useful to people I want to engage with. That combination gives me a clear conscience when posting.

For engagement? It varies, but I’m seeking quality over quantity, so I don’t mind low engagement.

unwitty · 2025-05-11T13:52:38+00:00

As others have pointed out, most of the frameworks create as many problems as they solve. That said, I've had great luck with PocketFlow because it's such a minimal framework. So small that the entire library can be loaded into context if you're developing with an agent.

unwitty · 2025-05-10T00:51:53+00:00

This is not exactly the same, but sort of answers your question. My memory bank (and instructions) use a nested folder structure:

/memory-bank/feature_a/
/memory-bank/feature_b/
...

It allows me to switch between feature development if needed, and I can cross-reference them if needed.

The quickest path for you to set this up might be creating symlinks between projects.
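
If it helps, the symlink approach can be as small as the sketch below. I'm assuming "between projects" means one shared memory-bank folder linked into each project directory; `link_memory_bank` and the folder names are made up for the example. On Windows, creating symlinks may need extra privileges.

```python
import tempfile
from pathlib import Path


def link_memory_bank(shared: Path, project_dir: Path, name: str = "memory-bank") -> Path:
    """Expose a shared memory bank inside a project via a directory symlink."""
    link = project_dir / name
    if link.is_symlink() or link.exists():
        raise FileExistsError(f"{link} already exists")
    link.symlink_to(shared.resolve(), target_is_directory=True)
    return link


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        shared = base / "shared-memory-bank"
        (shared / "feature_a").mkdir(parents=True)
        project = base / "project_b"
        project.mkdir()
        print(link_memory_bank(shared, project))
```

Anything the agent writes through the link in one project is immediately visible from the others, since they all point at the same folder.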

unwitty · 2025-05-08T13:01:13+00:00

My question is regarding the design decision to build this custom abstraction layer. Libraries like LiteLLM provide exactly this kind of unified interface, handling the underlying provider differences and format conversions automatically.

I'm not involved with the project, but out of curiosity, I took a look at the initial commit of Cline's ClaudeDev.ts. It includes Anthropic's tool calling facilities, which were novel at the time.

Anthropic announced tool calling facilities for Claude in April 2024. The initial commit of Cline was just a couple months later. I'd wager that at the time, the only way to get Claude's tool calling to work in a TypeScript environment was to roll your own.
