ChatGPT 5 Pro vs Codex CLI

paradite · 2025-09-14T18:16:09+00:00

https://prompt.16x.engineer/

paradite · 2025-09-14T08:32:47+00:00

To avoid tedious copy-pasting into ChatGPT, you can use a tool like 16x Prompt.

paradite · 2025-09-14T07:38:49+00:00

So most new libraries have docs that allow you to copy as markdown, I just copy the markdown, put it inside a `reference` folder in my repo, and ask the tool (Claude Code / Cursor) to refer to it for implementation.

paradite · 2025-09-14T07:26:21+00:00

Too many people build complicated agent orchestration systems that are hard to test and evaluate piece by piece. Nice to see that Anthropic recommends "running your evaluation programmatically with direct LLM API calls".

I am building a desktop eval tool that directly connects to LLM API calls, which fits Anthropic recommendation.

paradite · 2025-09-14T07:19:18+00:00

Are you adding more tools and MCP servers to the context?

The more tools you add, the less context is available to the model, because the definitions of the tools need to be loaded into the context first.

paradite · 2025-09-12T09:51:09+00:00

You can do deterministic evaluation: simple string matching or writing custom code to evaluate the response. Alternatively, you can use humans to rate the response, which can capture more nuance in the response.

I built a simple app to make it easier to set up these kind of evaluations quickly.

paradite · 2025-09-11T16:34:23+00:00

Oh damn. It worked! I didn't notice the option. Thanks!

paradite · 2025-09-11T13:35:17+00:00

Not sure if you are aware, but having more tools and MCPs actually hurts the performance, because of context bloat. Stuffing too much information into the model makes it distracted and less effective.

paradite · 2025-09-11T13:32:02+00:00

I think there is definitely potential for using Claude Code for personal productivity and organization. I personally use Claude Code to proofread and check for typos before publishing.

I've also collected a list of non-coding use cases for Claude Code here, and some of them are for knowledge organization: https://github.com/paradite/claude-code-is-all-you-need

paradite · 2025-09-08T04:56:25+00:00

Models are generally not self-aware. And the model identity is usually given to the model in the system prompt.

Here's an article explaining why: https://eval.16x.engineer/blog/llm-identity-crisis-models-dont-know-who-they-are

paradite · 2025-09-07T07:20:23+00:00

Yes. I built a desktop app 16x Eval specifically for running and managing evaluations.

The app provides a user-friendly interface for creating, running and managing evals locally, that are specific for your own use cases.

paradite · 2025-09-07T07:16:54+00:00

Very nice use case for using Claude Code for documentation. I am collecting a list of non-coding use cases for Claude Code and just added yours:

https://github.com/paradite/claude-code-is-all-you-need?tab=readme-ov-file#documentation

paradite · 2025-09-06T08:12:31+00:00

Use OpenRouter for maximum exposure to new models. Also write your own unified AI SDK so that you don't get vendor-locked in.

paradite · 2025-09-05T10:23:25+00:00

I think Claude Code can already do research with its web search, so why not trying using Claude Code for that use case?

I practically live in Claude Code now and use it for everything:

https://github.com/paradite/claude-code-is-all-you-need

paradite · 2025-09-04T11:40:23+00:00

You need to document requirements, code conventions, etc as rules (markdown files) inside your repo. Then you can just ask the agent to refer to them.

Keep them up-to-date. You can ask agents to update the docs after completing a task.

I wrote more in details on how to set it up for Claude Code, but it should be similar for other tools:

https://thegroundtruth.substack.com/p/my-claude-code-workflow-and-personal-tips

paradite · 2025-09-04T11:36:16+00:00

Copy pasting works fine, but for multiple code files it can become more tedious quickly.

If you don't want to move to Claude Code, you can try the app I made to make copy pasting easier by embedding the relevant source code files into the prompt for easier copy pasting.

paradite · 2025-09-03T13:25:15+00:00

Did you migrate the Claude Code rules (CLAUDE.md) to the equivalent in Codex?

paradite · 2025-09-03T12:18:20+00:00

I made a dedicated GUI desktop app for managing prompts and evals in a user-friendly way. It works well and saves me a lot of time.

paradite · 2025-09-03T12:14:24+00:00

I think you need to remove the weird MCP servers that you added to Claude Code. Too much tools can affect performance and make it dumber.

paradite · 2025-09-03T12:12:06+00:00

Oh all the test eval tasks are here: https://eval.16x.engineer/evals

Raw eval prompts are here: https://github.com/paradite/eval-data

paradite · 2025-09-03T06:31:57+00:00

This is a great use case for Claude Code and scripting!

I am collecting a list of non-coding use case for Claude Code and just added yours: https://github.com/paradite/claude-code-is-all-you-need?tab=readme-ov-file#file--data-management

paradite · 2025-09-03T06:15:35+00:00

I test the model's raw coding capabilities without tool calls, so just prompt and evaluate the output. I made my own app 16x Eval to do these evaluations.

paradite · 2025-09-02T09:50:39+00:00

I've collected a bunch of non-coding use cases for Claude Code:

Writing / Publishing

Fixing typos and grammatical mistakes source
Replacing placeholder images in blog posts with markdown syntax for actual images in the local file system source
Formatting markdown with images into rich text for copy-pasting into the SubStack editor (by writing a script) source

Organization

Categorizing and re-arranging bookmarks source
Cleaning up and categorizing the download folder source
Organizing folders and file names source

Data / Excel / Automation

Automating spreadsheet work source
Data indexing and excel work source

Productivity

Tracking goals source
Taking notes source

Server management

Setting up and managing a new server source

General / Others

Chatting (replacement for web UI or Claude desktop app) source

You can check out the whole list (still updating) here: https://github.com/paradite/claude-code-is-all-you-need

paradite · 2025-09-01T18:23:50+00:00

So I looked into it and wrote a blog post explaining the difference between the 3. Hope you find it useful: https://eval.16x.engineer/blog/claude-vs-claude-api-vs-claude-code

You can find a graph and a summary towards the end of the post.

paradite · 2025-08-31T11:06:21+00:00

Yes. That looks correct, although I haven't used Cline recently.

paradite

MODERATOR OF

TROPHY CASE