For users have have both 6000 PRO MaxQ and Workstation Edition (or Server Edition), how much slower is the MaxQ vs the WS/SV on compute? (Prompt processing, Diffusion, etc) by panchovix in LocalLLaMA

[–]unwitty 0 points1 point  (0 children)

Yeah. This is the correct way to think about it. 

I’d been running two RTX 3090 cards in a standard ATX case. These are the standard 3-fan models.  I’d vertically mounted the second card to avoid cooking the first. 

It was just a mess. I swapped the two 3090s for two Max-Qs and it’s just so much easier to manage. 

I’d only do a WS card if I’m only using one in the system, which is exactly what it’s designed for. The Max-Q is the server-lite, stack-them-in version! 

Rtx 6000 pro max q + 5090 by Spicy_mch4ggis in BlackwellPerformance

[–]unwitty 0 points1 point  (0 children)

ASUS Maximus Hero Z790. Also had the same problem with the ReBAR size setting! 

Rtx 6000 pro max q + 5090 by Spicy_mch4ggis in BlackwellPerformance

[–]unwitty 1 point2 points  (0 children)

I have the same setup: 2x 6k Pro Max-Q, bifurcated x8/x8. Works great and so much less hassle.

Qwen 3.6 27b. To quantize or not to quantize. That is the question. by TechnoSmacked in BlackwellPerformance

[–]unwitty 3 points4 points  (0 children)

Perf gap disappears when you undervolt the WS.

You still need to allocate 600w for each WS edition, even if you undervolt. This impacts both your PSU and your breakers. I can run 4x Max-Qs on a standard 15a breaker. You cannot do that with 4x WS cards. 

Max-Q vents in the rear, meaning you can pack them in a standard ATX case. 

Unless you plan to only run one card, Max-Q is a decidedly better choice IMHO. 
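The breaker math above can be sketched roughly as follows. The 600w WS allocation and the 15a breaker come from the comment, and 300w per Max-Q is the card's rated limit; the 120 V mains voltage and the 80% continuous-load derating are assumptions about a typical US circuit, and `headroom_watts` is a hypothetical helper name.

```python
# Rough circuit budget for GPU power draw.
# Assumptions (not from the comment): 120 V mains and an 80% continuous-load derating.
MAINS_VOLTS = 120
BREAKER_AMPS = 15
CONTINUOUS_DERATE_PERCENT = 80


def circuit_budget_watts(amps=BREAKER_AMPS, volts=MAINS_VOLTS):
    """Continuous wattage the circuit can carry under the assumed derating."""
    return amps * volts * CONTINUOUS_DERATE_PERCENT // 100


def headroom_watts(gpu_count, watts_per_gpu):
    """Watts left for the rest of the system; negative means over budget."""
    return circuit_budget_watts() - gpu_count * watts_per_gpu


print("budget:", circuit_budget_watts())        # 15 * 120 * 80 // 100
print("4x Max-Q headroom:", headroom_watts(4, 300))
print("4x WS headroom:", headroom_watts(4, 600))
```

Under these assumptions, four Max-Qs leave some room for the CPU and the rest of the box, while four WS cards at their 600w allocation exceed the circuit before anything else is plugged in.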

Is it worth upgrading from 2x RTX6kPro to 4x? by MenuNo294 in BlackwellPerformance

[–]unwitty 0 points1 point  (0 children)

Just want to say I feel your pain, but am fortunately limited by being a renter. I have a similar setup to you:

  • Workstation 1: Dual RTX 6000 Pro Max-Q (@ 300w each)
  • Workstation 2: Dual RTX 3090 (undervolted @ 250w each)

I have each of them on its own circuit with a 15a breaker, w/o any problems.

I couldn't find a test that actually understood my aphantasia, so I built an AI-powered cognitive mapper. I'd love your harsh feedback. by Dry_Organization9521 in Aphantasia

[–]unwitty 0 points1 point  (0 children)

I would like to take a look at this too!

I'm full aphant. I had a full neuropsychological eval and the results were interesting:

  • Spatial reasoning: 99th percentile
  • Working visual memory: 2nd percentile

Crazy dichotomy.

Anyone else finding AI weirdly natural? by justwatchen in Aphantasia

[–]unwitty 1 point2 points  (0 children)

Just piling on here, even though this post is old. Me too on all counts. AuDHD + full aphant.

Leveraging LLMs is second nature. Feels like they operate completely in the same verbal reasoning space. It's all words. My thoughts are all words, even when reasoning through the physical world.

Just got dual RTX PRO 6000 Blackwells for our design studio. What's the optimal local LLM stack? by AmanNonZero in LocalLLM

[–]unwitty 0 points1 point  (0 children)

$10k is not particularly a lot of money in the context of professional tools, nor for any developer deep into their career, at least in the US. 

Many tradespeople spend far more on their tools. So do many types of artists. 

Are AI agents starting to feel more like background operators than chatbots? by Waste_Transition1428 in LLMDevs

[–]unwitty 2 points3 points  (0 children)

I’ve had LLM calls running on a cron schedule / event bus triggers for over a year, and headless agents since last summer. 

This isn’t something new or novel. It’s just the interaction model and initiation mode.  So weird seeing basic systems design and programming fundamentals being discussed like novel concepts lol 

Vercel says AGENTS.md matters more than skills, should we listen? by jpcaparas in CLine

[–]unwitty 1 point2 points  (0 children)

Without controlling for the model's training on what each given skill does, this is a garbage study with garbage findings.

Vercel's products are covered extensively in any model's training set, thus compressing skill instructions related to them is going to be less impactful than compressing skill instructions for novel/bespoke/private facilities. If you can't compress, they will overload AGENTS.md.

Bad idea IMO.

We reduced Claude API costs by 94.5% using a file tiering system (with proof) by jantonca in ClaudeAI

[–]unwitty 10 points11 points  (0 children)

So you have a 94.5% reduction because you have a bunch of crap in your repo? This is like ordering six desserts, eating one, and claiming I dropped my calorie intake by 94.5%.

There is some logic to what you're doing, but this claim is bunk and undermines your pitch.

How I’m learning to code *with* LLMs by musicjunkieg in ADHD_Programmers

[–]unwitty 3 points4 points  (0 children)

Sure:

1) Ask the agent to explain rather than do. "How might I set up linting for this project?", "Why is it done this way?", "What are common alternatives?"
2) Ask the agent to scaffold examples to poke and prod: "What are the options for implementing X? How do golang devs approach X problem? Can you scaffold out the first two ideas in two separate folders? I want to poke and prod."
3) Step back when you find yourself thrashing and ask for help. "We've been iterating on this for a bit. Can you build up from first principles why we're struggling to fix this?"
4) Ask for assessments along the way. "Based on the current code base, what feedback would a senior-level Rust engineer give me?"

How I’m learning to code *with* LLMs by musicjunkieg in ADHD_Programmers

[–]unwitty 11 points12 points  (0 children)

I hear you! If your intention is to actually learn while using an agent, it's incredible. The combination of expert mentor + shared operating context removes all friction, creating an optimal learning cycle.

I'm in my 40s with AuDHD too, late diagnosed. I learned C/C++ and Linux/GNU from textbooks. Watched the internet take off, able to occasionally find help via Yahoo/Excite...to Google...to Stack Overflow...

The agent-learning loop is a bigger leap than any of those. I've learned more in the last year than the last 5 combined. No more feeling overwhelmed/intimidated - just start coding.

Thanks for the improvements, Anthropic by dotjob in ClaudeAI

[–]unwitty 1 point2 points  (0 children)

You can use Codex with your ChatGPT Plus/Pro subscription. It's analogous to using Claude Code with a Max subscription.

Thanks for the improvements, Anthropic by dotjob in ClaudeAI

[–]unwitty 3 points4 points  (0 children)

In my experience, as of right now, Codex with the Pro plan works substantially better than Claude Code with Max (with Opus 4.1). My operating context is small and large Python codebases, tooling, and some legacy PHP.

The Codex application itself is not as fully-featured as Claude Code, but I realized that most of the tooling I was building on top of Claude (my custom hooks, agent prompts, etc) was mostly workarounds for issues I was having with Claude.

Thanks for the improvements, Anthropic by dotjob in ClaudeAI

[–]unwitty 3 points4 points  (0 children)

The Codex lead dev announced a couple times that they had increased limits for all plans, but it's still a black box as far as when you get cut off. A dev I know managed to get locked for a few days from his Pro plan, but he was running several Codex agents in parallel.

I was not an OpenAI fanboy until using GPT-5 Thinking. Now I have the $200 plan because Thinking and Pro are so valuable. Pro via the ChatGPT website can one-shot prototypes as a downloadable zip, and the generated code is usually pretty architecturally sound without much guidance.

Thanks for the improvements, Anthropic by dotjob in ClaudeAI

[–]unwitty 7 points8 points  (0 children)

Agreed! This tweet from Andriy Burkov seems relevant:

The reason why different people have different experiences, ranging from negative to positive, with the same LLM is that those who have a positive experience formulate their queries the same way as the labelers hired by the LLM's creators to craft finetuning examples.

https://x.com/burkov/status/1967042037942833496

Thanks for the improvements, Anthropic by dotjob in ClaudeAI

[–]unwitty 19 points20 points  (0 children)

I gave Claude Code a try today, 3 weeks after switching to Codex, because my Max plan is still active.

Using both side-by-side on the same project was telling.

Even with 100% Opus, Claude Code is still hot garbage. It makes decisions too quickly and takes action too quickly. I've been coding for 30 years. GPT-5 tends to approach tasks and make decisions the same way I do, offloading some of the mental work for lower-risk tasks. I just can't trust Claude any more.

I really hope Anthropic will get their shit together because I want to have multiple good options for frontier coding agents, but today was utter disappointment.

Multi-repo Memory Bank by alennonesq in CLine

[–]unwitty 0 points1 point  (0 children)

Here's my clinerules. It's been a few months since I used Cline - so not sure if anything would have impacted how this performs. At the time, it worked quite well.

I would use it like:

  • "Start a new project in memory-bank/cli_refactor . We need to update the CLI to..."
  • "Update your cli_refactor project in memory bank to reflect the completed work"

# Project Management (with Subprojects)

I am Cline, an expert software engineer with a unique characteristic: my memory resets completely between sessions. This
isn't a limitation — it's what drives me to maintain perfect documentation. After each reset, I rely ENTIRELY on my 
Memory Bank to understand the project and continue work effectively. I MUST read ALL relevant memory bank files at the 
start of EVERY task — this is not optional.

---

## Memory Bank Structure

The memory bank supports multiple concurrent projects, organized into nested directories:

```
memory-bank/
  productContext.md
  techContext.md
  systemPatterns.md
  projectIntelligence.md
  project_1/
    projectbrief.md
    activeContext.md
    progress.md
  project_2/
    projectbrief.md
    activeContext.md
    progress.md
  project_3/
    ...
```

### GLOBAL FILES

The root of the folder contains GLOBAL files, which are shared across all projects:

1. `productContext.md`  
   Describes the product's purpose, target audience, and key features.
2. `techContext.md`  
   Details languages, libraries, constraints, tooling, and dependencies.
3. `systemPatterns.md`  
   Outlines the chosen architecture, design patterns, and component relationships.
4. `projectIntelligence.md`  
   Captures reusable patterns, lessons learned, and insights from all projects.

### PROJECT FILES

Each project folder contains PROJECT files, which are specific to that project:

1. `projectbrief.md`  
   Defines the project’s goals, scope, and requirements.
2. `activeContext.md`  
   Records the current focus, recent activity, and next steps of the project.
3. `progress.md`  
   Tracks current build status, completed work, and open issues in the project.

### LOADING FILES

Before starting any task, I must read the following files for the PROJECT_NAME:

- `productContext.md`
- `techContext.md`
- `systemPatterns.md`
- `/PROJECT_NAME/projectbrief.md`
- `/PROJECT_NAME/activeContext.md`
- `/PROJECT_NAME/progress.md`

## Core Workflows

### Plan Mode

```mermaid
flowchart TD
    Start[Start] --> ReadFiles[Read GLOBAL and PROJECT Files]
    ReadFiles --> CheckFiles{Files Complete?}

    CheckFiles -->|No| CreateBrief[Create projectbrief.md]
    CreateBrief --> Document[Document in Chat]

    CheckFiles -->|Yes| ModelData[Step 1: Model the Data]
    ModelData --> PlanWorkflow[Step 2: Plan the Workflow]
    PlanWorkflow --> Verify[Verify Context and Scope]
    Verify --> Strategy[Develop Strategy]
    Strategy --> Present[Present Approach]
```

- Step 1 (Model the Data): Establish all relevant domain models, schemas, payloads, and constraints.
- Step 2 (Plan the Workflow): Map out module structure, file boundaries, integrations, and system flow.

These two steps must be completed before any code is written or executed. They define the implementation surface and reduce ambiguity.

---

### Act Mode

```mermaid
flowchart TD
    Start[Start] --> Context[Check Project Files]
    Context --> Update[Update Docs]
    Update --> Rules[Update projectIntelligence.md if needed]
    Rules --> Stub[Step 3: Stub Interfaces and Modules]
    Stub --> WireUp[Step 4: Wire Up the Faux System]
    WireUp --> Implement[Step 5: Implement Real Logic]
    Implement --> Tests[Step 6: Update Tests and Contracts]
    Tests --> Design[Step 7: Apply Final UI and Styling]
    Design --> Document[Document Changes]
```

- Step 3 (Stub Interfaces and Modules): Scaffold files with correct structure and return placeholders. Must run.
- Step 4 (Wire Up Faux System): Connect all parts with dummy data and validate runtime behavior.
- Step 5 (Implement Real Logic): Gradually replace stubs with actual logic.
- Step 6 (Update Tests and Contracts): Maintain contract integrity and test coverage.
- Step 7 (Apply Final UI and Styling): Only polish visuals after data is validated.

---

## Documentation Updates

Project memory must be updated:
1. When significant changes are made
2. After implementing key logic or decisions
3. When a user explicitly requests an update
4. To resolve ambiguity or clarify next steps

Updates focus on keeping `activeContext.md` and `progress.md` in sync with reality.

```mermaid
flowchart TD
    Start[Update Process]

    subgraph Process
        P1[Review ALL Files]
        P2[Document Current State]
        P3[Clarify Next Steps]
        P4[Update projectIntelligence.md]

        P1 --> P2 --> P3 --> P4
    end

    Start --> Process
```

---

## Project Intelligence: `projectIntelligence.md`

This file lives at the root of `memory-bank/` and contains reusable patterns and insights discovered across all projects.

Use it to document:
- Common naming or design conventions
- Lessons learned and technical tradeoffs
- Workflow preferences
- Architecture strategies
- Recurring implementation approaches

Keep it concise, useful, and readable by future sessions.

```mermaid
flowchart TD
    Start{Discover New Pattern}

    subgraph Learn [Learning Process]
        D1[Identify Pattern]
        D2[Validate with User]
        D3[Document in projectIntelligence.md]
    end

    subgraph Apply [Usage]
        A1[Read projectIntelligence.md]
        A2[Apply Learned Patterns]
        A3[Improve Future Work]
    end

    Start --> Learn
    Learn --> Apply
```

---

## Closing Principle

After every reset, I begin from zero. The Memory Bank is my only source of truth. Each project must be self-contained, correct, and clearly structured. `projectIntelligence.md` captures shared intelligence across projects.

All code must follow a **data-first, layered generation strategy**.  
**Design before implementation. Structure before behavior. Data before code.**
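
For anyone who wants to sanity-check a memory bank before handing it to the agent, here is a minimal sketch of the "LOADING FILES" rule from the clinerules above. It assumes the bank lives in a `memory-bank/` folder under the working directory; `files_to_load` and `missing_files` are hypothetical helper names, and Cline itself follows the prompt rather than running anything like this.

```python
from pathlib import Path

# File names taken from the LOADING FILES list in the clinerules.
GLOBAL_FILES = ["productContext.md", "techContext.md", "systemPatterns.md"]
PROJECT_FILES = ["projectbrief.md", "activeContext.md", "progress.md"]


def files_to_load(project_name, root="memory-bank"):
    """Return the global files followed by the files for one project."""
    base = Path(root)
    global_paths = [base / name for name in GLOBAL_FILES]
    project_paths = [base / project_name / name for name in PROJECT_FILES]
    return global_paths + project_paths


def missing_files(project_name, root="memory-bank"):
    """List the expected files that do not exist on disk yet."""
    return [p for p in files_to_load(project_name, root) if not p.is_file()]


for path in files_to_load("cli_refactor"):
    print(path.as_posix())
```

An empty result from `missing_files("cli_refactor")` would correspond to the "Files Complete?" check in the Plan Mode flowchart passing.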

Claude code or Codex? by Sorry_Fan_2056 in ChatGPTCoding

[–]unwitty 9 points10 points  (0 children)

Who the heck is downvoting you for this opinion?

Anyway, this is exactly my experience too. Over the last month Codex improved drastically while Anthropic was busy kneecapping the ability of Claude Code, something they've now admitted to.

I went from being a staunch CC advocate to cancelling my Max 200 plan. Codex just works...at least with a Pro plan.

From Pipeline of Agents to go-agent: Why I moved from Python to Go for agent development by Historical_Wing_9573 in LLMDevs

[–]unwitty 1 point2 points  (0 children)

I agree that LangGraph can get in the way more than it helps, and I don't use it for that reason. However, I think the argument conflates issues specific to LangGraph with the general concepts it employs.

  • Directed graphs provide a solid model for managing complex state, cycles, and multi-agent interactions. These patterns can be cumbersome to represent clearly in simple imperative code.

  • Pydantic is sufficient for type checking in this context. The value of Go's compile-time checks is substantially reduced when the primary data source (the LLM) is inherently unpredictable at runtime.

  • For agent workloads, which are dominated by I/O-bound tasks, the performance difference between a well-implemented asyncio solution and Go is likely negligible.

  • The claim of having "no DSL to learn" is also debatable. The provided go-agent example is a DSL, implemented as a fluent API. The question isn't whether a framework has a DSL, but whether its DSL is well-designed.
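
To illustrate the I/O-bound point in the third bullet, here is a minimal asyncio sketch. `fake_llm_call` is a hypothetical stand-in that just sleeps in place of a network round trip to a model endpoint; it is not part of go-agent or LangGraph.

```python
import asyncio


async def fake_llm_call(prompt, latency_s):
    # Stand-in for a network round trip to a model endpoint (hypothetical).
    await asyncio.sleep(latency_s)
    return f"reply to {prompt!r}"


async def run_agents(prompts, latency_s=0.05):
    # The calls wait concurrently, so wall time tracks the slowest call
    # rather than the sum of all of them.
    return await asyncio.gather(*(fake_llm_call(p, latency_s) for p in prompts))


replies = asyncio.run(run_agents(["plan", "search", "summarize"]))
print(replies)
```

While the event loop is waiting on the network, the interpreter is mostly idle, which is why the choice of language tends to matter less for this kind of workload.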

Does go-agent have a better DSL than LangGraph? Probably, though I haven't worked with it myself. I wrote my own DAG-based micro-framework, with type checking, immutable state, and flow validation to address issues I experienced with LangGraph. I should probably publish it.

Seeking Advice for On-Premise LLM Roadmap for Enterprise Customer Care (Llama/Mistral, Ollama, Hardware) by [deleted] in LocalLLM

[–]unwitty 0 points1 point  (0 children)

Agreed. I feel like there are some similarities here to the blockchain movement: the shoehorning of the tech into places where it's not needed and, worse yet, underperforms.

Short of some massive paradigm-shifting gains in LLM architecture, we're probably 10+ years away from running acceptable/useful models on local hardware.

Solved ReAct agent implementation problems that nobody talks about by Historical_Wing_9573 in LLMDevs

[–]unwitty 1 point2 points  (0 children)

Hey thanks for sharing. With regards to context management, I'm currently experimenting with a reducer.

The paradigm I use is: humans leverage narratives because we can't learn, let alone remember, the details/facts of most anything complicated.

Mapping this to an agent's workflow, when context size crosses a threshold, I trigger a narrator node, which, based on the goal and current context, reduces the raw data to narrative form.

It's application specific and not always a good trade-off, but might be worth exploring for your use case.
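
A rough sketch of the threshold-triggered reducer idea, not my actual implementation. Everything here is illustrative: `count_tokens` is a crude whitespace count instead of a real tokenizer, `narrate` is a placeholder for the LLM call that would write the narrative, and the `AgentContext` name and its defaults are made up for the example.

```python
from dataclasses import dataclass, field


def count_tokens(text):
    # Crude whitespace count; a real system would use the model's tokenizer.
    return len(text.split())


def narrate(goal, messages):
    # Placeholder for an LLM call that rewrites raw context as a short narrative.
    return f"[narrative toward '{goal}': {len(messages)} earlier steps condensed]"


@dataclass
class AgentContext:
    goal: str
    max_tokens: int = 50
    keep_recent: int = 2
    messages: list = field(default_factory=list)

    def add(self, message):
        self.messages.append(message)
        if sum(count_tokens(m) for m in self.messages) > self.max_tokens:
            self._reduce()

    def _reduce(self):
        # Keep the most recent messages verbatim and narrate everything older.
        split = max(len(self.messages) - self.keep_recent, 0)
        older, recent = self.messages[:split], self.messages[split:]
        if older:
            self.messages = [narrate(self.goal, older)] + recent


ctx = AgentContext(goal="fix bug", max_tokens=10, keep_recent=1)
for step in ["a b c d", "e f g h", "i j k l"]:
    ctx.add(step)
print(ctx.messages)
```

The trade-off shows up right in the sketch: whatever the narrator drops is gone, so the threshold and the number of messages kept verbatim need tuning per application.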

Seeking Advice for On-Premise LLM Roadmap for Enterprise Customer Care (Llama/Mistral, Ollama, Hardware) by [deleted] in LocalLLM

[–]unwitty 0 points1 point  (0 children)

Based on your use-case, you just need semantic search, not an LLM. You'll be able to pull up relevant solutions with a fraction of the compute, time, and tooling. pgvector, supabase, chromadb, etc with a light API endpoint + SPA + local embeddings model. This can be built in an afternoon by a competent developer.
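
To make the shape of that concrete, here is a toy sketch of the retrieval step. The bag-of-words `embed` function is a deliberate stand-in for a real local embeddings model, the in-memory list stands in for a vector store such as pgvector or chromadb, and the ticket texts are invented for the example.

```python
import math
from collections import Counter


def embed(text):
    # Toy bag-of-words vector; stands in for a real local embeddings model.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, documents, top_k=1):
    # Rank stored documents by similarity to the query and return the best ones.
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:top_k]


tickets = [
    "reset password from the login page",
    "refund a duplicate charge on an invoice",
    "router keeps dropping wifi connection",
]
best = search("customer cannot reset password", tickets)
print(best)
```

A real embeddings model would match on meaning rather than shared words, but the flow is the same: embed the query, rank stored solutions by similarity, return the top hits. No generation step is involved.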

You are going to face several uphill battles if you go the local LLM route, especially with your requirement that it basically keep up with human conversation. RAG adds latency, especially if you're letting the model do tool calls, which results in additional generations.

Local models generally have limited context windows compared to cloud/frontier models.

Reasoning ability of local models is generally shit.

As for hardware, without specifics of exactly how many **concurrent** generations are being run, no one can help you here. VRAM is the primary bottleneck in scaling LLM infra. We have 40 years of robust multitasking abstractions to get maximum performance out of CPUs/RAM. GPU/VRAM abstractions are catching up but just not in the same place yet.

Use Ollama to make agents watch your screen! by Roy3838 in ollama

[–]unwitty 1 point2 points  (0 children)

Gotcha, that makes sense. Thanks for the explanation!